The argument list of tvm.build and of the Function is the same; however, it does not match the generated CUDA kernel. For the CUDA kernel, I found that sometimes the output arguments come before the inputs, and sometimes they don't. Can I get the order, or a mapping, of the CUDA kernel arguments?
Does anyone know where the code is that specifies the order of the CUDA kernel argument list, or that maps the build args to the CUDA kernel arguments?
The argument order of the CUDA kernel is currently decided at the time of the device/host split, so it is not deterministic at the moment.
To call the CUDA function, we usually invoke a related host function, which has the same argument order as the arguments passed to build.
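To make the split concrete, here is a toy Python sketch (purely illustrative, not TVM internals; all names are made up) of the idea: the host entry point keeps the build-time argument order, while the underlying kernel uses a permuted order recorded at split time.

```python
# Toy illustration (not TVM code): a host wrapper preserves the build-time
# argument order, while the "device kernel" uses its own permuted order.

def fake_kernel(out, a, b):
    # Device-side signature: here the output happens to come first.
    for i in range(len(out)):
        out[i] = a[i] + b[i]

# Hypothetical permutation recorded at host/device split time:
# host position -> kernel position, for host args (a, b, out).
KERNEL_ARG_ORDER = [1, 2, 0]

def host_func(a, b, out):
    # Host entry point: same order as the args passed to build.
    host_args = (a, b, out)
    kernel_args = [None] * len(host_args)
    for host_pos, kernel_pos in enumerate(KERNEL_ARG_ORDER):
        kernel_args[kernel_pos] = host_args[host_pos]
    fake_kernel(*kernel_args)

a, b, out = [1, 2], [10, 20], [0, 0]
host_func(a, b, out)
# out is now [11, 22]
```

The caller only ever sees the stable host-side order; the permutation stays an internal detail, which is why relying on the raw kernel's order is fragile.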
Thank you for your reply. It would be great if the CUDA kernel argument list stayed the same as the tvm.build args; then we could deploy the CUDA kernel directly without the TVM runtime, avoiding the C++ ABI problem, etc.
Since we can't know the CUDA kernel argument list, does TVM at least keep the input and output arguments contiguous, and in topological order?
I would still recommend using the PackedFunc interface, as quite a few things are needed to use the raw kernel directly; for example, the launch-parameter calculation is part of the host code, as is the data unpacking.
Depending on the schedule, we could also generate a function that contains multiple CUDA kernel launches.
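As a concrete example of work the host code does, here is a minimal Python sketch (illustrative only, not TVM code) of the launch-parameter calculation for a 1-D elementwise kernel: the host computes the grid size from the problem size and a thread-block size before launching.

```python
# Illustrative sketch (not TVM code): the host side computes launch
# parameters such as the grid size before invoking the device kernel.

def launch_params(n, threads_per_block=256):
    """Return (grid_dim, block_dim) covering n elements."""
    # Ceiling division: enough blocks so grid * block >= n.
    grid = (n + threads_per_block - 1) // threads_per_block
    return grid, threads_per_block

grid, block = launch_params(1000)
# grid == 4, block == 256: 4 * 256 = 1024 threads cover 1000 elements
```

Calling the raw kernel yourself means reimplementing this kind of logic for every kernel, which is part of why the PackedFunc path is easier.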
If the primary concern is the C++ ABI, the TVM runtime contains a C interface with a stable ABI; see https://github.com/apache/incubator-tvm/blob/main/include/tvm/runtime/c_runtime_api.h
Thank you, I will try the C interface. It's true that manually calling a CUDA kernel can't handle the multiple-kernel case.