Hello,
I ran inference with several models (squeezenet1.1, resnet18_v1, and inceptionv3) on a Mali GPU and compared the performance against the CPU.
While measuring on the GPU, I found that the GPU operations have not completed by the time run() returns.
Instead, they seem to complete only during TVMArrayCopyFromTo(gpu_y, cpu_y, ...).
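To make the observation concrete, here is a toy Python model of the behavior. It is not my actual measurement code and it does not call TVM. It assumes that run() only enqueues work on an asynchronous queue, while the device-to-host copy blocks until that work has finished. ToyAsyncDevice, synchronize(), and the 0.2 s kernel time are made-up names and values for illustration only.

```python
import time
from concurrent.futures import ThreadPoolExecutor


class ToyAsyncDevice:
    """Toy stand-in for an asynchronous GPU queue (hypothetical, not TVM)."""

    def __init__(self, kernel_seconds):
        self._kernel_seconds = kernel_seconds
        self._queue = ThreadPoolExecutor(max_workers=1)
        self._pending = None

    def run(self):
        # Enqueue the work and return immediately, like an async kernel launch.
        self._pending = self._queue.submit(time.sleep, self._kernel_seconds)

    def synchronize(self):
        # Hypothetical blocking call: wait until the enqueued work is done.
        if self._pending is not None:
            self._pending.result()
            self._pending = None

    def copy_to_host(self):
        # The copy needs the finished result, so it has to wait as well.
        self.synchronize()
        return "output"

    def close(self):
        self._queue.shutdown()


def timed(fn):
    # Wall-clock time spent inside a single call.
    start = time.perf_counter()
    fn()
    return time.perf_counter() - start


def measure(kernel_seconds=0.2):
    dev = ToyAsyncDevice(kernel_seconds)
    run_time = timed(dev.run)             # small: work was only enqueued
    copy_time = timed(dev.copy_to_host)   # large: absorbs the kernel time
    dev.close()
    return run_time, copy_time
```

In this model, nearly all of the elapsed time is charged to the copy rather than to run(), which matches what I measured. Calling something like synchronize() right after run() would move that time back where it belongs.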
Is there an API that waits for all GPU operations to complete?
Thanks,
Inki Dae