Why does the accuracy of the compiled model drop to nearly 0 after tuning only 2 times?

When I do not use AutoTVM, the accuracy of the compiled model is very high. However, after tuning the model for only 2 trials, the accuracy is nearly 0.

How does auto-tuning affect the accuracy?

To my knowledge, auto-tuning is the process of finding the optimal configuration for an operator implementation; it should only change the performance.

Please correct me if I misunderstand AutoTVM.

Could anyone give me some advice? Thanks a lot!

You could have found a bug in the auto-tuner; you're right that auto-tuning normally should not change the behaviour of the network, only its performance.

What deep learning framework are you importing your model from? TensorFlow? PyTorch? Other?

Would you be able to share a reproducible example of your network?

If you could generate some random input data in that framework, record what the correct output is, and save all of that to a file (e.g. with Pickle), you could then pass the same input data to the network in TVM and compare the original expected output against the actual output you get.
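A minimal sketch of that save-and-compare workflow. Note this is an illustration, not the poster's actual script: the file name `reference_io.pkl` is an arbitrary choice, and the `mean` call merely stands in for a real `model.predict(x)` / TVM module run.

```python
import pickle

import numpy as np

# --- In the source framework (e.g. Keras) ---
x = np.random.rand(1, 224, 224, 3).astype("float32")  # random input batch
y_ref = x.mean(axis=(1, 2))  # stand-in for model.predict(x)

# Save both the input and the expected output to one file.
with open("reference_io.pkl", "wb") as f:
    pickle.dump({"input": x, "output": y_ref}, f)

# --- Later, after running the same input through TVM ---
with open("reference_io.pkl", "rb") as f:
    ref = pickle.load(f)

y_tvm = ref["input"].mean(axis=(1, 2))  # stand-in for the TVM module's output

# Raises AssertionError with a mismatch report if the outputs diverge.
np.testing.assert_allclose(ref["output"], y_tvm, rtol=1e-5)
print("outputs match")
```

The key point is that the identical input tensor is fed to both frameworks, so any difference in the outputs must come from the compilation/tuning side.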


I use a resnet50-imagenet.h5 Keras model, and test on the ImageNet dataset.

You can download the model, test data, script, and tuning results from the following links.

Put all of these files in the same directory and then run the script; the bug should reproduce.


test data:

script && tuning results:


Thanks !!!

PS: the CPU is an Intel® Xeon® E5-2640 v4 @ 2.40GHz.

Thanks, would you be able to simplify your problem further?

E.g. create a new script that doesn't use AutoTVM, just loads the Keras model in TVM and passes a single image through it. Then you can use something like np.testing.assert_allclose to see if the value matches the output from vanilla Keras.
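For reference, here is what that comparison looks like; the two arrays below are illustrative stand-ins for the softmax outputs of vanilla Keras and of the TVM-compiled model, and the tolerances are typical float32 choices, not values from the thread.

```python
import numpy as np

# Stand-ins for the two frameworks' outputs on the same single image.
keras_out = np.array([[0.1, 0.7, 0.2]], dtype="float32")
tvm_out = np.array([[0.1000001, 0.6999999, 0.2]], dtype="float32")

# Passes silently if the outputs agree within tolerance; otherwise raises
# AssertionError with a "Mismatched elements: X / N (P%)" report.
np.testing.assert_allclose(keras_out, tvm_out, rtol=1e-5, atol=1e-7)
print("TVM output matches Keras")
```

When it fails, the mismatch report tells you how many elements differ and by how much, which is far more informative than an end-to-end accuracy number.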

Right now, passing a whole dataset through the autotuned model brings in a lot of complexity that makes it more difficult to debug. Starting simple and building from there, we should be able to figure out what's wrong.


Thanks for your advice! Following your suggestions, I ran some tests:

Testing 1:

The result shows that TVM and vanilla Keras give the same prediction on a single image. ----> no bug

Testing 2:

Based on Testing 1, predict more than one image at the same time. ----> no bug

Testing 3:

Based on Testing 1, only add a single line: with autotvm.apply_graph_best("resnet50-imagenet_origin.h5_graph_opt.log"): ----> no bug

Testing 4:

Based on Testing 3, predict more than one image (e.g., 2) at the same time. ----> the bug appears

In short, the bug only appears when predicting more than one image while using the tuning result (with autotvm.apply_graph_best()) at the same time.

The Testing 4 script, one test image, and the tuning result are here: https://drive.google.com/drive/folders/1wsnwXCaUNNzOJHiU4Jt0LFPgih6-hAuV?usp=sharing

Thanks @sqchao for the investigation, this makes identifying the problem a lot easier.

My initial suspicion is that the auto-tuned network has been tuned for a batch size of 1. So if you run it with a larger batch size, it uses code that expects a batch size of 1, so gives the wrong output.

However, looking at the bug_1.py script you provide, I see you’re tuning with a batch size of 100. So that doesn’t seem like it could be it.

Could you see if you can reproduce this behaviour in your script using another Keras ImageNet model, even tuning for just a small number of iterations? E.g. export MobileNetV2 with:

from tensorflow.keras.applications.mobilenet_v2 import MobileNetV2 as Net

model = Net(weights='imagenet')

I'm sorry for the confusion. The batch_size = 100 in the script stands for the number of input images rather than the number of tuning trials. The number of tuning trials is n_trial = 2 on line 62.

I have run the "mobilenetv2.h5" model with 2 tuning trials, and the bug also appears, so this bug may not be related to the model.

After that, I only changed the number of tuning trials (4, 8, 10) and got the different results below (testing on 2 images).

tuning trials = 4:
(screenshot of the assert_allclose mismatch report)

tuning trials = 8:
(screenshot of the assert_allclose mismatch report)

tuning trials = 10:
np.testing.assert_allclose() passed! ----> the model predicts the 2 images very well.

From the above results, we can see that as the number of tuning trials increases, the fraction of mismatched elements decreases gradually (100% -> 39.6% -> 0%).
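Those percentages are the "Mismatched elements" figures that np.testing.assert_allclose prints in its failure report. A small sketch of how that fraction can be computed directly with np.isclose; the arrays and tolerances here are illustrative, not the actual model outputs:

```python
import numpy as np

# Illustrative outputs: "expected" from Keras, "actual" from the tuned TVM model.
expected = np.array([0.10, 0.20, 0.30, 0.40], dtype="float32")
actual = np.array([0.10, 0.25, 0.30, 0.45], dtype="float32")

# Element-wise closeness test; assert_allclose uses the same idea internally
# (its defaults are rtol=1e-7, atol=0; relaxed slightly here for float32).
close = np.isclose(actual, expected, rtol=1e-5, atol=1e-8)
mismatch_pct = 100.0 * (1 - close.mean())
print(f"Mismatched elements: {int((~close).sum())} / {close.size} ({mismatch_pct:.1f}%)")
# → Mismatched elements: 2 / 4 (50.0%)
```

Tracking this fraction across tuning runs, as done above, is a good way to quantify how wrong the tuned model is rather than just whether it is wrong.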

When checking the predicted probability of each class, I find that with 2 tuning trials, the predicted probabilities are almost identical across all classes.

As the number of tuning trials increases, the predicted probability for the correct class gradually increases.

If you figure out the root cause of this bug, please let me know. Thank you very much!

Hi there. I've taken a look, and found that yes, the model does work with varying batch sizes without AutoTVM, if you recompile the model for each different batch size.

See this gist, which is just a Jupyter notebook rewrite of your bug_simple.py script. The issue seems to appear only if you try a different batch size while applying an autotuned log.

I think that the batch size you do auto-tuning with is fixed. So if you run the network with the auto-tuning log at a different batch size, it will use schedules tuned for the wrong shape and give the wrong output. I could be wrong, but I think that's the issue. Does that sound right @merrymercy @eqy (listed as AutoTVM-knowledgeable in CONTRIBUTORS.md)?