Hi, I have a question regarding heterogeneous execution in TVM, mainly in the context of BYOC experimentation.
Let's assume I have a BYOC module that will be executed on an accelerator that resides on the GPU itself, which means the accelerator can use GPU memory.
From what I can tell, a BYOC module is executed from the host side, so all inputs and outputs are transferred back to the host once the module has executed. But consider a scenario like this:
GPU module ----> BYOC module
Currently, the output of the GPU module is first copied to the host, and the BYOC module then uses this host buffer. Since the accelerator is part of the GPU, the BYOC module could instead use the GPU output directly as its input, avoiding the host/device copies entirely.
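To make the scenario concrete, here is a toy model in plain Python that counts the transfers in the flow I see today versus the one I would like. It does not use any TVM API: `Buffer`, `copy_to`, `run_module`, and `pipeline` are hypothetical names made up purely for illustration, and the device labels are just strings.

```python
# Toy model only: plain Python, no TVM APIs. All names here are hypothetical.
from dataclasses import dataclass


@dataclass
class Buffer:
    device: str  # "host" or "gpu"


copies = []  # log of (source_device, destination_device) transfers


def copy_to(buf, device):
    """Return a buffer on `device`, logging a transfer only if one is needed."""
    if buf.device == device:
        return buf
    copies.append((buf.device, device))
    return Buffer(device)


def run_module(inp, input_device, output_device):
    """Model a module that needs its input on `input_device`
    and leaves its output on `output_device`."""
    copy_to(inp, input_device)
    return Buffer(output_device)


def pipeline(byoc_input_device):
    """GPU module ----> BYOC module, starting from a host input."""
    copies.clear()
    x = Buffer("host")
    gpu_out = run_module(x, "gpu", "gpu")
    run_module(gpu_out, byoc_input_device, byoc_input_device)
    return list(copies)


current = pipeline("host")  # BYOC module is fed a host buffer
desired = pipeline("gpu")   # BYOC module consumes the GPU buffer directly

print("current transfers:", current)
print("desired transfers:", desired)
```

In this model the only difference between the two runs is the extra GPU-to-host transfer at the module boundary, which is the copy I am hoping to avoid.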
After tracing through a lot of the TVM code, I am still unable to figure out whether this is possible with the current BYOC support, and if it is, how we could achieve it.
@sanirudh @tqchen @comaniac @zhiics, any insights on this?
Any help is very much appreciated.