Hi.
I’m inspired by an idea from the paper Spatial Sharing of GPU for Autotuning DNN models [[2008.03602] Spatial Sharing of GPU for Autotuning DNN models].
The ideas I focused on are ‘upload multiple models onto one GPU’ and ‘execute the models concurrently’.
But as far as I have searched, there is no native way in TVM to execute multiple models on one GPU at once.
For example,
```python
# SINGLE MODEL EXECUTION
model0 = tvm.contrib.graph_executor.GraphModule(lib0["default"](dev))
for _ in range(iteration):
    model0.set_input('input_1', input_data)
    model0.run()
    model0.get_output(0).numpy()
```

```python
# MULTIPLE MODEL EXECUTION
model0 = tvm.contrib.graph_executor.GraphModule(lib0["default"](dev))
model1 = tvm.contrib.graph_executor.GraphModule(lib1["default"](dev))
for _ in range(iteration):
    model0.set_input('input_1', input_data)
    model1.set_input('input_1', input_data)
    model0.run()
    model1.run()
    model0.get_output(0).numpy()
    model1.get_output(0).numpy()
```
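To make the concurrency question concrete, here is a minimal sketch of driving each model from its own Python thread instead of interleaving them in one loop. The helpers `run_model` and `make_dummy_runner` are illustrative stand-ins of my own, not TVM API; in real code each dummy runner would be replaced by a `GraphModule`'s set_input/run/get_output sequence. Even with this pattern, whether the GPU kernels actually overlap depends on the runtime and driver, not on the Python threading alone.

```python
import threading

def run_model(model_run, iterations, results, idx):
    # Each thread drives one model's inference loop independently,
    # accumulating that model's outputs into its own results slot.
    total = 0
    for _ in range(iterations):
        total += model_run()
    results[idx] = total

def make_dummy_runner(value):
    # Stand-in for a GraphModule's run(); returns a fixed value
    # per "inference" so the threading pattern itself can be tested.
    def run():
        return value
    return run

results = [None, None]
threads = [
    threading.Thread(target=run_model, args=(make_dummy_runner(1), 10, results, 0)),
    threading.Thread(target=run_model, args=(make_dummy_runner(2), 10, results, 1)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results)  # [10, 20]
```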
and the result would look like this
So, my question is: “Did I miss something?” or “Is this normal behaviour?”
Thanks in advance.