Multiple model concurrent execution on one GPU in TVM

Hi.

I’m inspired by an idea from the paper Spatial Sharing of GPU for Autotuning DNN models [[2008.03602] Spatial Sharing of GPU for Autotuning DNN models].

The idea I focused on is ‘upload multiple models onto one GPU’ and ‘execute the models concurrently’.

But as far as I have searched, there is no way to execute multiple models on one GPU at once in TVM natively.

For example,

    # SINGLE MODEL EXECUTION
    model0 = tvm.contrib.graph_executor.GraphModule(lib0["default"](dev))
    for _ in range(iteration):
        model0.set_input('input_1', input_data)
        model0.run()
        model0.get_output(0).numpy()

    # MULTIPLE MODEL EXECUTION
    model0 = tvm.contrib.graph_executor.GraphModule(lib0["default"](dev))
    model1 = tvm.contrib.graph_executor.GraphModule(lib1["default"](dev))
    for _ in range(iteration):
        model0.set_input('input_1', input_data)
        model1.set_input('input_1', input_data)
        model0.run()
        model1.run()
        model0.get_output(0).numpy()
        model1.get_output(0).numpy()

and the result looks like this:

(screenshot: profiler timeline showing the two models executing one after the other, not overlapped)

So, my question is: “Did I miss something?”, or “Is this normal behaviour?”

Thanks in advance.

Yes, I think we only use one GPU stream, so the two executions happen serially.
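To make the effect of a single stream concrete, here is a minimal sketch in plain Python (no TVM or GPU required) that models the stream as a lock: two “model runs” submitted from two threads still execute back to back, so the wall-clock time is roughly the sum of the two runs. The names (`STREAM_LOCK`, `run_model`) are illustrative only and are not TVM API.

```python
import threading
import time

# Stands in for the single GPU stream TVM submits work to.
STREAM_LOCK = threading.Lock()

def run_model(run_time):
    # Every "kernel launch" must hold the one stream, so concurrent
    # callers are serialized, just like the two GraphModules above.
    with STREAM_LOCK:
        time.sleep(run_time)

start = time.perf_counter()
threads = [threading.Thread(target=run_model, args=(0.1,)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
elapsed = time.perf_counter() - start

# elapsed is ~0.2 s: the two 0.1 s "models" ran serially, not overlapped.
print(f"elapsed: {elapsed:.3f} s")
```

This is only an analogy for the observed behaviour, of course; actually overlapping the two models would require TVM to issue their kernels on separate CUDA streams.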


Thank you for your reply.

And for those interested in this topic,

Python’s multiprocessing module also seems unable to run multiple models concurrently on a single GPU.

(More profiling is needed, but running two models takes about twice as long as running one.)

I’m looking for another way to accomplish this.

Thank you again, @masahi.