Auto-scheduling target on CUDA: Poor Performance (real-time classification)

andrewwang0612 · May 13, 2022, 4:04am

Hello. When I run real-time classification on a camera feed without tuning, I get about 13~14 fps. The network is EfficientNetV2-B0, and I converted the ckpt from the official repository to ONNX: https://github.com/google/automl/tree/master/efficientnetv2

With the auto-scheduler, I set target='cuda sm_61' and n_trial=10000, and the fps dropped to 0.2~0.3. I thought my TVM version might be too old, so I reinstalled it and tuned again with target='cuda sm_61' and n_trial=5000, but I still get poor performance.

Am I applying the best results from the tuning file the wrong way?

Does anyone have an idea why the auto-scheduled model is slower than the untuned one?

Thanks!!

merrymercy · June 12, 2022, 11:05pm

It seems your usage is wrong. You should put relay.build under auto_scheduler.ApplyHistoryBest.
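
A minimal sketch of that structure, assuming `mod` and `params` come from the ONNX frontend and `log_file` is the JSON log written during tuning. The helper name `build_with_history` is made up for illustration, and the PassContext settings follow the standard TVM auto-scheduler tutorials rather than the original poster's script:

```python
def build_with_history(mod, params, log_file, target="cuda"):
    """Compile a Relay module with the best schedules recorded in log_file.

    Hypothetical helper: the name and argument order are illustrative only.
    """
    # Imported inside the function so this file loads even where TVM is absent.
    import tvm
    from tvm import auto_scheduler, relay

    # relay.build has to run inside ApplyHistoryBest so that the tuned
    # schedules from the log are looked up during compilation.
    with auto_scheduler.ApplyHistoryBest(log_file):
        # The auto-scheduler flag tells Relay to use the Ansor schedules
        # instead of the default hand-written templates.
        with tvm.transform.PassContext(
            opt_level=3,
            config={"relay.backend.use_auto_scheduler": True},
        ):
            return relay.build(mod, target=target, params=params)
```

It would be called as something like `lib = build_with_history(mod, params, "tuning.json")`, where `tuning.json` is a placeholder for your own log file.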

You can follow the scripts in this repo to correctly set the tuning parameters of Ansor.

Note that Ansor works better for NHWC layout, so probably you can follow the scripts above to convert your model into NHWC layout.
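
For reference, a rough sketch of the layout conversion using Relay's `ConvertLayout` pass. This is not taken from the scripts mentioned above; the helper name `convert_to_nhwc` and the choice to convert only `nn.conv2d` are assumptions, and the conversion should be done before extracting tuning tasks so that tuning and the final build see the same layout:

```python
# Ask for NHWC data layout on conv2d and let TVM pick the kernel layout.
# Converting only nn.conv2d is an assumption; other ops can be added here.
DESIRED_LAYOUTS = {"nn.conv2d": ["NHWC", "default"]}


def convert_to_nhwc(mod):
    """Return a copy of the Relay module with conv2d ops in NHWC layout.

    Hypothetical helper: the name is illustrative only.
    """
    # Imported inside the function so this file loads even where TVM is absent.
    import tvm
    from tvm import relay

    seq = tvm.transform.Sequential(
        [
            relay.transform.RemoveUnusedFunctions(),
            relay.transform.ConvertLayout(DESIRED_LAYOUTS),
        ]
    )
    with tvm.transform.PassContext(opt_level=3):
        return seq(mod)
```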

andrewwang0612 · June 14, 2022, 2:21pm

Apologies for the delayed response, and thank you so much! I have already solved my problem. On the Jetson Nano there is still no obvious performance improvement, though. Maybe the model's layout is the problem; I will follow the scripts in your repo to check.