I’m trying to manually run a model generated using relay.build. Basically, I’m pretending to be the graph runtime (for an experimental system that uses CUDA kernels but not the local CUDA drivers).
So far I know how to get the raw source, the compiled PTX, the execution graph, and the parameters. What I’m missing are the block and grid dimensions. Here’s a minimal example:
```python
import numpy as np
import tvm
from tvm import relay
from tvm.relay import testing

mod, params = relay.testing.mlp.get_workload(1)
target = tvm.target.cuda()
with tvm.transform.PassContext():
    graphMod = relay.build(mod, target, params=params)

lib = graphMod.get_lib()
cudaLib = lib.imported_modules[0]
executionGraph = graphMod.get_json()
rawParams = {k: p.asnumpy() for k, p in params.items()}
rawSource = cudaLib.get_source()
cudaLib.save("foo.ptx")

# gridDim, blockDim = ????
```
From cuda_module.cc I see that the launch dimensions are encoded in the TVMArgs passed to a CUDAWrappedFunc, but I’m not clear on how to access them from Python, or how to derive them ahead of time. Ideally I’d do this without having to instrument TVM and run the model once.
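One thing I did notice: `cudaLib.save("foo.ptx")` also writes a `foo.tvm_meta.json` next to the PTX, and its `func_info` entries at least record which launch axes each kernel uses, even though the extents themselves are only computed at run time and appended to the TVMArgs. Here is how I’m reading it, with a hand-written excerpt standing in for my real file (field names are from my reading of the runtime metadata code and may differ between TVM versions):

```python
import json

# Hypothetical excerpt of the foo.tvm_meta.json written alongside the PTX.
# Older TVM serializes the axis list as "thread_axis_tags"; newer versions
# renamed it to "launch_param_tags".
sample_meta = """
{
  "tvm_version": "0.7.0",
  "func_info": {
    "fused_nn_dense_nn_relu_kernel0": {
      "name": "",
      "arg_types": ["handle", "handle", "handle"],
      "thread_axis_tags": ["blockIdx.x", "threadIdx.x"]
    }
  }
}
"""

meta = json.loads(sample_meta)
for fname, info in meta["func_info"].items():
    # Fall back across the two field names used by different TVM versions.
    tags = info.get("launch_param_tags", info.get("thread_axis_tags", []))
    print(fname, tags)
```

So the metadata tells me *which* of `gridDim`/`blockDim` axes a kernel uses, but not their sizes, which is exactly the part I’m stuck on.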
One other question about this: the parameters have names in `params`, but are numbered in the execution graph. Are the arrays in `params` ordered the same way as in the graph?
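To make that question concrete, this is how I’m currently pulling the input-node names out of the graph JSON (a hand-written miniature here, since the real graph is long, but with the same top-level keys that `graphMod.get_json()` produces):

```python
import json

# Minimal stand-in for graphMod.get_json(); the real graph has the same
# top-level structure ("nodes", "arg_nodes", ...) but many more nodes.
sample_graph = """
{
  "nodes": [
    {"op": "null", "name": "data", "inputs": []},
    {"op": "null", "name": "p0", "inputs": []},
    {"op": "null", "name": "p1", "inputs": []},
    {"op": "tvm_op", "name": "fused_nn_dense_nn_relu",
     "inputs": [[0, 0, 0], [1, 0, 0], [2, 0, 0]]}
  ],
  "arg_nodes": [0, 1, 2]
}
"""

graph = json.loads(sample_graph)
# Each arg_nodes entry is an input slot; the numbered ones (p0, p1, ...)
# are presumably the bound parameters.
input_names = [graph["nodes"][i]["name"] for i in graph["arg_nodes"]]
print(input_names)  # ['data', 'p0', 'p1']
```

What I can’t tell is whether `p0`, `p1`, … correspond positionally to the entries of the `params` dict I passed to `relay.build`, or whether I should be reading them back out of the built module instead.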