[BUG] Performance drop with batch and opt_level=3

guyzsarun · February 23, 2021, 8:06am

When autotuning an MXNet model with multiple outputs, using a batch_size greater than 1 and opt_level=3, I see a performance drop in some of the outputs.
opt_level 0, 1 and 2 seem to be fine.

Performance of the TVM model with batch_size=2, opt_level=3:

Similarity Score for output 1 : 0.88 
Similarity Score for output 2 : 0.55 
Similarity Score for output 3 : 1.00 
Similarity Score for output 4 : 0.86 
Similarity Score for output 5 : 1.00
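
The post does not say how these similarity scores were computed. One plausible way, assumed here purely for illustration, is the cosine similarity between each flattened TVM output and the matching reference output from the original MXNet model. The function names below (`cosine_similarity`, `report_similarity`) are hypothetical helpers, not part of TVM or MXNet.

```python
import math
from typing import Iterable, List, Sequence


def cosine_similarity(a: Iterable[float], b: Iterable[float]) -> float:
    """Cosine similarity of two flat sequences (e.g. array.ravel())."""
    a = [float(x) for x in a]
    b = [float(x) for x in b]
    if len(a) != len(b):
        raise ValueError("outputs must have the same number of elements")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Two all-zero outputs count as identical; otherwise no similarity.
        return 1.0 if norm_a == norm_b else 0.0
    return dot / (norm_a * norm_b)


def report_similarity(
    reference_outputs: Sequence[Iterable[float]],
    tvm_outputs: Sequence[Iterable[float]],
) -> List[float]:
    """Print and return one score per model output, in output order."""
    scores = []
    for i, (ref, out) in enumerate(zip(reference_outputs, tvm_outputs), start=1):
        score = cosine_similarity(ref, out)
        print(f"Similarity Score for output {i} : {score:.2f}")
        scores.append(score)
    return scores
```

With NumPy arrays, each output would be flattened first (for example with `.ravel()`) before being passed in. Running the same comparison at opt_level 0 through 3 and at batch sizes 1 and 2 would show which combinations diverge.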

gasgallo · March 1, 2021, 4:28am

I’m facing the same issue and I can reproduce it with a retinaface model (this model).

When batch size is 1, outputs are correct.

When batch size is 2 or bigger and opt_level is 1 or 2, outputs are correct.

When batch size is 2 or bigger and opt_level is 3, some outputs are wrong/different when compared to the original model.

Is this a bug? Any idea why this happens? @tqchen

masahi · March 1, 2021, 4:43am

Hmm, maybe a pass or compute/schedule that kicks in only at opt_level = 3 is assuming that the batch size is one. If you have a repro, please open an issue and I can take a look.

To be clear, is this an accuracy or performance problem? Maybe both?

gasgallo · March 1, 2021, 4:51am

It’s an accuracy issue for sure. I haven’t tested performance yet, but I will.

I’ll open an issue on GitHub then, thanks!

masahi · March 1, 2021, 4:53am

Ah, the “performance” @guyzsarun mentioned seems to mean “accuracy”. OK.

tqchen · March 3, 2021, 3:41pm

Thanks @gasgallo, can you try selectively disabling the passes listed in tvm/build_module.cc at main · apache/tvm · GitHub and see which pass causes the accuracy drop?

One possible pass could be FoldScaleAxis
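
A rough sketch of how that experiment could be set up, assuming the model is already imported as a Relay module. It uses the `disabled_pass` argument of `tvm.transform.PassContext` to compile at opt_level=3 with one pass switched off at a time. The candidate pass names and the `score_fn` callback are assumptions for illustration; the names should be checked against build_module.cc for the TVM version in use.

```python
from typing import Callable, Dict, Sequence

# Assumed candidates: passes believed to be enabled only at opt_level=3.
# Confirm these names against build_module.cc before relying on them.
CANDIDATE_PASSES = [
    "FoldScaleAxis",
    "AlterOpLayout",
    "CanonicalizeCast",
    "EliminateCommonSubexpr",
]


def build_with_disabled(mod, params, target, disabled: Sequence[str]):
    """Compile a Relay module at opt_level=3 with the given passes disabled."""
    # Imported lazily so the helper below can be used without TVM installed.
    import tvm
    from tvm import relay

    with tvm.transform.PassContext(opt_level=3, disabled_pass=list(disabled)):
        return relay.build(mod, target=target, params=params)


def find_suspect_passes(
    score_fn: Callable[[Sequence[str]], float],
    candidates: Sequence[str] = CANDIDATE_PASSES,
    threshold: float = 0.99,
) -> Dict[str, float]:
    """Disable one candidate pass at a time and keep those that restore accuracy.

    score_fn is a hypothetical callback: given the list of disabled passes,
    it builds and runs the model and returns the lowest similarity score
    across all outputs against the reference model.
    """
    suspects = {}
    for name in candidates:
        score = score_fn([name])
        if score >= threshold:
            suspects[name] = score
    return suspects
```

Here `score_fn` would typically call `build_with_disabled`, run the compiled module on a batch of size 2, and compare against the MXNet outputs. A pass that shows up in the returned dict is one whose removal brings the outputs back in line, which makes it a good starting point for the GitHub issue.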

guyzsarun · March 5, 2021, 7:27am

Please refer to Accuracy drop when use batch and opt_level=3 · Issue #7563 · apache/tvm (github.com)