The opt level has no direct relation to the graph tuner. The issue here is that depthwise conv2d is slow, so the question is whether that comes from AutoTVM itself or from something else, such as layout transforms. A simple way to verify is to check the best config cost for each workload in the AutoTVM tuning log file.
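A rough sketch of that check, using only the Python standard library. It assumes the log is the usual one-JSON-record-per-line AutoTVM format, where `"input"` (or `"i"` in older logs) holds `[target, task_name, args, ...]` and `"result"` (or `"r"`) holds `[costs, error_no, ...]`. Please verify that layout against your own log file. The helper names `best_cost_per_workload` and `print_summary` are made up for illustration.

```python
import json


def best_cost_per_workload(lines):
    """Return {workload_key: best mean cost in seconds} from AutoTVM log lines.

    Assumed record layout (check against your own log):
      "input"/"i":  [target, task_name, args, ...]
      "result"/"r": [costs, error_no, all_cost, timestamp]
    """
    best = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        rec = json.loads(line)
        inp = rec.get("input", rec.get("i"))
        res = rec.get("result", rec.get("r"))
        if inp is None or res is None:
            continue
        costs, error_no = res[0], res[1]
        # Skip failed measurements (non-zero error code) and empty results.
        if error_no != 0 or not costs:
            continue
        mean_cost = sum(costs) / len(costs)
        # Use task name + args as the workload key.
        key = json.dumps(inp[1:3])
        if key not in best or mean_cost < best[key]:
            best[key] = mean_cost
    return best


def print_summary(path):
    """Print the best cost per workload, slowest first."""
    with open(path) as f:
        best = best_cost_per_workload(f)
    for key, cost in sorted(best.items(), key=lambda kv: -kv[1]):
        print(f"{cost * 1e3:10.4f} ms  {key}")
```

Calling `print_summary("tuning.log")` (with your own log path) lists the workloads slowest first. If the depthwise conv2d entries already show high best costs, the tuned kernels themselves are probably the bottleneck. If their costs are small compared to the measured end-to-end time, the overhead more likely comes from elsewhere, such as layout transforms.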