Excuse me. Sorry to bother you.
I am using EfficientNetV2-B0 for real-time classification on a GTX 1080 Ti.
With the NCHW layout and 20000 tuning trials, I get about 33 FPS.
After converting the model to NHWC and tuning with the same parameters, I get about 30 FPS.
EfficientNetV2-B0 has 45 tasks to tune.
So I am curious:
is the number of trials too small for Ansor to work well with the NHWC layout?
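For reference, here is a back-of-the-envelope calculation of the average budget per task, using the numbers above. It is only a rough sketch: it assumes the trials are split evenly across tasks, which the Ansor task scheduler does not necessarily do.

```python
# Rough average tuning budget per task, using the numbers from this post.
# Assumption: trials are split evenly; the task scheduler may allocate
# more trials to some tasks than to others.
total_trials = 20000
num_tasks = 45

trials_per_task = total_trials / num_tasks
print(f"average trials per task: {trials_per_task:.0f}")
```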
I also tested MobileNetV2 with both the NCHW and NHWC layouts,
in both cases without the auto-scheduler, just converted to TVM Relay.
NCHW runs at about 58~60 FPS and NHWC at about 45~47 FPS.
The original model layout is NCHW, so I suspect the NHWC version
spends extra time converting the layout, which makes it slower than NCHW. Is that right?
Another question: when I convert the model layout to NHWC,
it shows this warning:
Desired layout(s) not specified for op: nn.global_avg_pool2d.
Does this warning also affect the performance of the auto-scheduler?
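
For context, a minimal sketch of how an NCHW-to-NHWC conversion is typically set up with `relay.transform.ConvertLayout`. It is not the exact script used here. The helper name `convert_to_nhwc` is hypothetical, and the `nn.global_avg_pool2d` entry is an assumption: it is what one might try in order to give that op a desired layout, and it is not confirmed to remove the warning or to change performance.

```python
# Layouts requested from ConvertLayout.
# Assumption: the "nn.global_avg_pool2d" entry is an untested guess at how to
# give that op a desired layout so the warning no longer appears.
desired_layouts = {
    "nn.conv2d": ["NHWC", "default"],
    "nn.global_avg_pool2d": ["NHWC"],
}


def convert_to_nhwc(mod):
    """Return the Relay module `mod` with the ops above converted to NHWC."""
    # Imported lazily so the dictionary above can be inspected without TVM.
    import tvm
    from tvm import relay

    seq = tvm.transform.Sequential(
        [
            relay.transform.RemoveUnusedFunctions(),
            relay.transform.ConvertLayout(desired_layouts),
        ]
    )
    with tvm.transform.PassContext(opt_level=3):
        return seq(mod)
```

Usage would be `mod = convert_to_nhwc(mod)` on the module returned by the Relay frontend, before building or extracting tuning tasks.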