Auto-scheduling Problem on CUDA

Hello, I am running real-time classification on a camera feed with the auto-scheduler. The model is MobileNetV2, which I converted to ONNX format.

But the result doesn't improve much: I get about 50 FPS with the auto-scheduler and about 45 FPS without it.

I have two 1080 Ti GPUs in my computer, so I am curious: do I have to specify the GPU device (0 or 1) for the auto-scheduler? Or am I using the auto-scheduler the wrong way?
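On the device question, one generic way to make sure tuning and inference both land on the same card is to expose only that card to the process. This is a minimal sketch, assuming the standard `CUDA_VISIBLE_DEVICES` environment variable of the CUDA runtime (it is not a TVM-specific setting); the helper name `pin_gpu` is hypothetical.

```python
import os


def pin_gpu(index: int) -> str:
    """Expose only one physical GPU to this process.

    This has to run before any CUDA context is created (that is,
    before the first TVM/CUDA call); setting it later has no effect.
    """
    value = str(index)
    os.environ["CUDA_VISIBLE_DEVICES"] = value
    return value


# Hypothetical usage: tune and run on the second 1080 Ti.
pin_gpu(1)
# From here on, the process sees the chosen card as device 0,
# so target "cuda" and tvm.cuda(0) would refer to physical GPU 1.
```

With only one card visible, the tuning measurements and the final benchmark cannot end up on different GPUs by accident.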

You can follow the scripts in this repo to correctly set the tuning parameters of Ansor

Note that Ansor works better with the NHWC layout, so you can probably follow the scripts above to convert your model into NHWC layout.

Excuse me, sorry to bother you. I am using EfficientNetV2-B0 for real-time classification on a 1080 Ti. With the NCHW layout and 20000 trials, I get about 33 FPS. When I convert the model to the NHWC layout and tune with the same parameters as before, I get about 30 FPS.

EfficientNetV2-B0 has 45 tasks to tune, so I am curious: is the number of trials too small for Ansor to work better with the NHWC layout?
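As a rough check on the trial budget, the arithmetic can be sketched as below. It assumes the rule of thumb from the TVM auto-scheduler network tuning tutorial (about 900 measurement trials per task); the function names are hypothetical, and the per-task figure is only an average, since the task scheduler does not split trials evenly across tasks.

```python
def average_trials_per_task(total_trials: int, num_tasks: int) -> int:
    """Average number of measurement trials each task gets (rounded down)."""
    return total_trials // num_tasks


def recommended_total_trials(num_tasks: int, per_task: int = 900) -> int:
    """Total budget under the assumed ~900 trials per task rule of thumb."""
    return num_tasks * per_task


num_tasks = 45       # tasks reported for EfficientNetV2-B0 above
used_trials = 20000  # trials used above

print(average_trials_per_task(used_trials, num_tasks))  # 444
print(recommended_total_trials(num_tasks))              # 40500
```

Under that assumption, 20000 trials for 45 tasks is roughly half of the suggested budget, so the search may not have converged for either layout.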

I also tested MobileNetV2 with the NCHW and NHWC layouts, both without the auto-scheduler (just converted to TVM Relay). NCHW runs at about 58~60 FPS and NHWC at about 45~47 FPS. The original model layout is NCHW, so I think the NHWC version spends extra time converting the layout, which is why its performance isn't as good as NCHW. Is that right?
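To compare the two layouts without camera capture or display time mixed in, the inference call can be timed on its own. This is a minimal sketch using only the Python standard library; `measure_fps` and `run_once` are hypothetical names, and `run_once` is assumed to be a callable that runs one inference and waits for the result (for example by copying the output back to the host).

```python
import time


def measure_fps(run_once, warmup: int = 10, iters: int = 100) -> float:
    """Return calls per second of run_once, excluding warm-up calls."""
    # Warm-up runs absorb one-time costs such as memory allocation.
    for _ in range(warmup):
        run_once()

    start = time.perf_counter()
    for _ in range(iters):
        run_once()
    elapsed = time.perf_counter() - start

    # Guard against a zero reading on very coarse timers.
    return iters / elapsed if elapsed > 0 else float("inf")


# Stand-in workload; replace with one inference call of the compiled model.
fps = measure_fps(lambda: sum(range(1000)))
print(f"{fps:.1f} calls per second")
```

Running this once per layout with the same input gives numbers that reflect only the model, which makes it easier to tell whether the gap comes from the layout or from the rest of the pipeline.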

Another question: when I convert the model layout to NHWC, it shows this warning: Desired layout(s) not specified for op: nn.global_avg_pool2d. Does this warning also affect the performance of the auto-scheduler?
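The warning comes from the `ConvertLayout` pass when an op that supports layout conversion is missing from the desired-layouts dictionary; that op then likely keeps its original layout, which can leave extra layout transforms around it. A minimal sketch of listing the op explicitly, assuming a TVM version that registers a convert-layout function for `nn.global_avg_pool2d`; the helper names are hypothetical, and TVM is imported lazily so the dictionary can be inspected without it.

```python
def nhwc_desired_layouts() -> dict:
    """Desired layouts per op for an NHWC conversion."""
    return {
        # Data layout NHWC, kernel layout left to TVM's default choice.
        "nn.conv2d": ["NHWC", "default"],
        # Assumption: listing the pooling op silences the warning.
        "nn.global_avg_pool2d": ["NHWC"],
    }


def convert_to_nhwc(mod):
    """Apply ConvertLayout to a Relay module (requires TVM)."""
    import tvm
    from tvm import relay

    seq = tvm.transform.Sequential(
        [
            relay.transform.RemoveUnusedFunctions(),
            relay.transform.ConvertLayout(nhwc_desired_layouts()),
        ]
    )
    with tvm.transform.PassContext(opt_level=3):
        return seq(mod)


print(sorted(nhwc_desired_layouts()))
```

Hypothetical usage: call `convert_to_nhwc(mod)` on the module returned by `relay.frontend.from_onnx` before extracting tuning tasks, so the tasks are generated for the NHWC graph.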

Many thanks!!