Hi,
I have a question about runtime module execution with respect to memory management.
Consider a scenario where the following modules are executed in sequence:
Module1 (CPU) -----> Module2 (GPU) -------> Module3 (CPU)
Let's say I built the model with:
with tvm.transform.PassContext(opt_level=2):
    graph, lib, params = relay.build(mod, target="rocm", target_host="llvm", params=None)
and then ran the runtime module using:
lib.export_library("libx.so", fcompile=False)
lib = runtime.load_module("libx.so")
rt_mod = graph_runtime.create(graph, lib, tvm.device("rocm", 0))
# rt_mod = debug_runtime.create(graph, lib, tvm.device("rocm", 0))
execute_rt_mod(rt_mod)
What I need to understand is where exactly the device_copy calls are embedded between Module1 (CPU) and Module2 (GPU), and similarly between Module2 (GPU) and Module3 (CPU). Thanks.
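
For reference, here is a minimal sketch of one way to look for such copies by inspecting the graph JSON. It assumes that `graph` from relay.build is a JSON string and that cross-device copies show up as nodes whose attrs carry func_name "__copy"; that tag is an assumption and may differ between TVM versions. The helper name find_copy_nodes and the toy graph are hypothetical and only illustrate the idea.

```python
import json


def find_copy_nodes(graph_json, copy_name="__copy"):
    """Return (index, node name, input node names) for every node whose
    attrs["func_name"] equals copy_name.

    Assumption: copy nodes are tagged with func_name "__copy".
    """
    graph = json.loads(graph_json)
    nodes = graph["nodes"]
    found = []
    for idx, node in enumerate(nodes):
        attrs = node.get("attrs", {})
        if attrs.get("func_name") == copy_name:
            # Each input entry is [node_index, output_index, version].
            inputs = [nodes[entry[0]]["name"] for entry in node.get("inputs", [])]
            found.append((idx, node["name"], inputs))
    return found


# Hand-written toy graph for illustration only, NOT real TVM output.
sample = json.dumps({
    "nodes": [
        {"op": "null", "name": "data", "inputs": []},
        {"op": "tvm_op", "name": "cpu_op1",
         "attrs": {"func_name": "fused_add"}, "inputs": [[0, 0, 0]]},
        {"op": "tvm_op", "name": "copy_to_gpu",
         "attrs": {"func_name": "__copy"}, "inputs": [[1, 0, 0]]},
        {"op": "tvm_op", "name": "gpu_op",
         "attrs": {"func_name": "fused_nn_conv2d"}, "inputs": [[2, 0, 0]]},
        {"op": "tvm_op", "name": "copy_to_cpu",
         "attrs": {"func_name": "__copy"}, "inputs": [[3, 0, 0]]},
        {"op": "tvm_op", "name": "cpu_op2",
         "attrs": {"func_name": "fused_nn_relu"}, "inputs": [[4, 0, 0]]},
    ]
})

copies = find_copy_nodes(sample)
for idx, name, inputs in copies:
    print(idx, name, "<-", inputs)
```

With a real build, the call would be find_copy_nodes(graph), using the `graph` value from the snippet above. If it returns an empty list, either no copy nodes were inserted or they are tagged differently in that version.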