I run TVM inference on CUDA with a compiled MXNet model using C++, and it works well. But if I want to run another model, how do I release the memory used by the first model? Is there an API like "tvm.graph_runtime.release"?
// Load the compiled operator library for the first model.
tvm::runtime::Module mod_syslib = tvm::runtime::Module::LoadFromFile("../../../dl_compiler_py/libdeploy.so");
//...
// Create the graph runtime module for this model on the given device.
tvm::runtime::Module mod = (*tvm::runtime::Registry::Get("tvm.graph_runtime.create"))(json_data, mod_syslib, device_type, device_id);
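For context, my deploy code for the first model roughly follows the standard C++ graph runtime flow sketched below. The deploy.json / deploy.params file names, the input name "data", and the tensor shapes are placeholders from my setup, not anything special:

#include <dlpack/dlpack.h>
#include <tvm/runtime/module.h>
#include <tvm/runtime/registry.h>
#include <tvm/runtime/packed_func.h>
#include <tvm/runtime/ndarray.h>
#include <fstream>
#include <iterator>
#include <string>

int main() {
  // Load the compiled operator library.
  tvm::runtime::Module mod_syslib =
      tvm::runtime::Module::LoadFromFile("../../../dl_compiler_py/libdeploy.so");

  // Read the graph JSON produced at compile time (placeholder file name).
  std::ifstream json_in("../../../dl_compiler_py/deploy.json");
  std::string json_data((std::istreambuf_iterator<char>(json_in)),
                        std::istreambuf_iterator<char>());

  // Run on CUDA device 0.
  int device_type = kDLGPU;
  int device_id = 0;

  // Create the graph runtime module for this model.
  tvm::runtime::Module mod =
      (*tvm::runtime::Registry::Get("tvm.graph_runtime.create"))(
          json_data, mod_syslib, device_type, device_id);

  // Load the parameter blob (placeholder file name).
  std::ifstream params_in("../../../dl_compiler_py/deploy.params", std::ios::binary);
  std::string params_data((std::istreambuf_iterator<char>(params_in)),
                          std::istreambuf_iterator<char>());
  TVMByteArray params_arr;
  params_arr.data = params_data.c_str();
  params_arr.size = params_data.length();
  mod.GetFunction("load_params")(params_arr);

  // Set the input, run, and fetch the output.
  // The input name "data" and the shapes are placeholders for my model.
  DLContext ctx{kDLGPU, device_id};
  tvm::runtime::NDArray input =
      tvm::runtime::NDArray::Empty({1, 3, 224, 224}, DLDataType{kDLFloat, 32, 1}, ctx);
  mod.GetFunction("set_input")("data", input);
  mod.GetFunction("run")();
  tvm::runtime::NDArray output =
      tvm::runtime::NDArray::Empty({1, 1000}, DLDataType{kDLFloat, 32, 1}, ctx);
  mod.GetFunction("get_output")(0, output);

  return 0;
}

After this first model is done, I want to load a second compiled model in the same process, which is why I am asking how to free the GPU memory that this graph runtime module is still holding.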