I have been playing around with TVM's BYOC, and I have implemented a CUSTOM codegen that generates buffers and inserts buffer-copy calls.
However, some submodules are NOT offloaded to the CUSTOM codegen and instead run through TVM's default compiler.
With this scenario in mind, the runtime flow I have is:
CUSTOM submodule ------> NON-CUSTOM (default) submodule ------> CUSTOM submodule
Now let's talk about buffer management in the codegen.
In the CUSTOM codegen, I create a buffer for each input and copy the input data from HOST -> device (ROCm).
Although this works correctly, I am a little confused about why it works. Allow me to explain.
I compiled the Relay graph with:

```python
with tvm.transform.PassContext(opt_level=2):
    graph, lib, params = relay.build(
        partitioned_mod, target="rocm", target_host="llvm", params=None
    )
```
This means I am compiling the code for the GPU target, so all the inputs should already reside on the device.
Now, as I mentioned, in the CUSTOM codegen I create buffers on the device and copy data from HOST -> device. This is confusing because the NON-CUSTOM submodule runs on the device, so its output, which is the input to the last CUSTOM module, is also already on the device.
How, then, is the CUSTOM codegen able to copy data from HOST to device when the data is actually already on the device?
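For reference, the guard I would expect to need looks roughly like this. This is a self-contained Python sketch with a made-up `Tensor` stand-in, not TVM's actual runtime API; in the real codegen the check would presumably inspect the `device_type` field of the incoming `DLTensor`:

```python
from dataclasses import dataclass

# DLPack device-type codes (these values match dlpack's DLDeviceType enum)
kDLCPU = 1
kDLROCM = 10

@dataclass
class Tensor:
    """Stand-in for a DLTensor: just tracks where its data lives."""
    data: list
    device_type: int

def ensure_on_device(t: Tensor) -> Tensor:
    """Copy HOST -> device only when the input is actually on the host.

    If the producer (e.g. the NON-CUSTOM submodule) already left the
    tensor on the device, the copy is skipped.
    """
    if t.device_type == kDLROCM:
        return t  # already on device: no HOST->device copy needed
    # hypothetical device allocation + copy
    return Tensor(data=list(t.data), device_type=kDLROCM)

host_t = Tensor([1, 2, 3], kDLCPU)
dev_t = ensure_on_device(host_t)
print(dev_t.device_type)  # 10
```

My codegen currently does the copy unconditionally, which is exactly why I am surprised it works when the data is already on the device.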
So I want to ask two questions:
- How do I figure out which device the output of a CallNode resides on? In other words, how do I get the device type of all inputs to a CallNode?
- What are the rules for annotating a target device type on a CallNode?
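To make the second question concrete, here is the rule I would naively assume. This is a toy sketch, not TVM's actual device-planning pass: a CallNode is placed on the device its inputs agree on, inputs default to the build target, and a mismatch is where an explicit device copy or annotation would have to be inserted:

```python
# DLPack device-type code for ROCm (matches dlpack's DLDeviceType enum)
kDLROCM = 10

def call_device(input_devices, default=kDLROCM):
    """Return the device a CallNode would (I assume) be placed on.

    Raises when the inputs disagree -- the point at which an explicit
    device_copy / device annotation would be required.
    """
    devices = set(input_devices) or {default}
    if len(devices) != 1:
        raise ValueError(f"inputs live on multiple devices: {devices}")
    return devices.pop()

print(call_device([kDLROCM, kDLROCM]))  # 10
```

I would like to know whether TVM's actual annotation rules match this intuition, and where they are implemented.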
This all seems a little confusing to me, and any help would be highly appreciated. Thanks!