How can we request an inference synchronously?

daeinki · November 14, 2018, 8:07am

Hello,

I have tested inference with several models (squeezenet1.1, resnet18_v1 and inceptionv3) on a Mali GPU, measured the performance, and compared the CPU and GPU results.

While measuring performance on the GPU, I found that the GPU operations are not complete when run() returns.
Instead, they seem to complete at TVMArrayCopyFromTo(gpu_y, cpu_y, ...).

Is there an API that waits for all GPU operations to complete?

Thanks,
Inki Dae
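
For readers who want a picture of the behaviour described above, here is a toy model in C++. It is not TVM or OpenCL code: ToyQueue and its methods are hypothetical names made up for illustration. It also simplifies things, since a real GPU queue does its work in the background instead of inside the blocking call. It only shows why a timer stopped right after run() can miss most of the cost, which is then paid at the first call that has to wait for the results.

```cpp
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Toy stand-in for a device command queue (hypothetical, not a TVM type).
class ToyQueue {
public:
    // Plays the role of run(): records the work and returns immediately.
    void enqueue(std::function<void()> work) {
        pending_.push_back(std::move(work));
    }

    // Plays the role of a blocking copy or an explicit sync:
    // every recorded piece of work has finished when this returns.
    void finish() {
        for (auto& work : pending_) {
            work();
        }
        pending_.clear();
    }

    // Number of work items that have been enqueued but not finished.
    std::size_t pending() const { return pending_.size(); }

private:
    std::vector<std::function<void()>> pending_;
};
```

In this model, a result written by an enqueued work item is only visible after finish(), which matches the observation that the output is ready at the copy and not at run().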

masahi · November 14, 2018, 10:41am

You can use ctx.sync().

daeinki · November 15, 2018, 4:52am

Thanks for the answer.

Thanks,
Inki Dae

daeinki · November 21, 2018, 8:35am

BTW, I am using C++ code on the device, so I cannot use ctx.sync(). Is there a C++-based sync API?
I see the TVMSynchronize function, but it seems to require creating a runtime stream. Is there an example of this?

Thanks,
Inki Dae

masahi · November 21, 2018, 9:32am

You can check what ctx.sync() does, here.

So you should be able to pass a null pointer to TVMSynchronize on the C++ side too. Or you can always use the OpenCL runtime API directly.
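
A minimal sketch of how the timing could be done in C++ with that advice. The helper timed_run_ms is a hypothetical name, not part of TVM. It takes the launch step and the sync step as callables, so it compiles without TVM headers. The TVM calls appear only in the trailing comment, which assumes TVMSynchronize(int device_type, int device_id, TVMStreamHandle stream) from tvm/runtime/c_runtime_api.h, OpenCL device 0, and a `run` function handle obtained from the graph runtime module.

```cpp
#include <chrono>

// Measures one inference, including the time the device needs to finish.
// `run` launches the work and may return early; `sync` must block until
// the device has completed everything queued so far.
template <typename Run, typename Sync>
double timed_run_ms(Run&& run, Sync&& sync) {
    using clock = std::chrono::steady_clock;

    sync();  // drain earlier work so it is not billed to this run
    const auto start = clock::now();
    run();   // may return before the device is done
    sync();  // wait for completion before stopping the timer
    const auto stop = clock::now();

    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Assumed usage with TVM (not compiled here; `run` is a hypothetical
// handle to the graph runtime's "run" function):
//
//   double ms = timed_run_ms(
//       [&] { run(); },
//       [] { TVMSynchronize(kDLOpenCL, 0, nullptr); });  // null stream
```

The sync before the timer starts is a design choice: without it, work left over from a previous call (for example the input copy) could be counted as part of the inference.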