You could have found a bug in the auto-tuner; you're right that autotuning normally shouldn't change the behaviour of the network.
What deep learning framework are you importing your model from? Tensorflow? PyTorch? Other?
Would you be able to share a reproducible example of your network?
If you could generate some random input data in that framework, record the correct output, and save both to a file (e.g. with pickle), you could then pass the same input data to the network in TVM and compare the expected output to the actual output you get.
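To make that workflow concrete, here is a minimal sketch of the save-and-compare loop. The function names, shapes, and tolerances are all hypothetical placeholders; `forward` stands in for whatever call runs your model in the original framework or in TVM.

```python
import pickle
import numpy as np

def save_reference(forward, input_shape, path="reference.pkl"):
    """Run random input through the original framework's model and save
    both the input and the expected output (names here are made up)."""
    x = np.random.uniform(size=input_shape).astype("float32")
    y = np.asarray(forward(x))  # framework-specific forward pass
    with open(path, "wb") as f:
        pickle.dump({"input": x, "expected_output": y}, f)

def check_against_reference(forward, path="reference.pkl", rtol=1e-5, atol=1e-5):
    """Feed the saved input to the TVM-compiled model and compare outputs."""
    with open(path, "rb") as f:
        ref = pickle.load(f)
    actual = np.asarray(forward(ref["input"]))
    np.testing.assert_allclose(actual, ref["expected_output"], rtol=rtol, atol=atol)
```

You would call `save_reference` once in the original framework, then `check_against_reference` with the TVM runtime's forward function; a mismatch raises an `AssertionError` summarising how many elements differ.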
Thanks, would you be able to simplify your problem further?
E.g. create a new script that doesn't use autoTVM, just loads the Keras model in TVM and passes a single image through it? Then you can use something like np.testing.assert_allclose to check whether the output matches vanilla Keras.
Right now, passing a whole dataset through the autotuned model brings in a lot of complexity that makes debugging harder. Starting simple and building from there, we should be able to figure out what's wrong.
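For reference, a minimal example of the comparison call itself (the array values here are made up stand-ins for the two models' outputs):

```python
import numpy as np

keras_out = np.array([[0.10, 0.90]])           # stand-in for vanilla Keras predictions
tvm_out = np.array([[0.1000001, 0.8999999]])   # stand-in for TVM module output

# Passes when every element matches within the given tolerances;
# otherwise it raises an AssertionError reporting the mismatch percentage.
np.testing.assert_allclose(tvm_out, keras_out, rtol=1e-5, atol=1e-6)
```

The mismatch summary in the error message (e.g. "Mismatched elements: 39.6%") is exactly the figure discussed later in this thread.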
In short: the bug only appears when predicting more than one image while also applying the tuning result with autotvm.apply_graph_best() at the same time.
The Testing 4 script (one test image plus the tuning result) is as follows:
Thanks @sqchao for the investigation, this makes identifying the problem a lot easier.
My initial suspicion is that the auto-tuned network has been tuned for a batch size of 1. So if you run it with a larger batch size, it uses code that expects a batch size of 1, so gives the wrong output.
However, looking at the bug_1.py script you provide, I see you're tuning with a batch size of 100. So that doesn't seem like it could be it.
Could you see if you can reproduce this behaviour in your script using another Keras ImageNet model, even tuning for just a small number of iterations. E.g. export MobileNetV2 with:
from tensorflow.keras.applications.mobilenet_v2 import MobileNetV2 as Net
model = Net(weights='imagenet')
model.save('mobilenetv2.h5')
Sorry for the confusion. The batch_size = 100 in the script stands for the number of input images, not the number of tuning trials. The number of trials is n_trial = 2 on line 62.
I ran the 'mobilenetv2.h5' model with 2 tuning trials and the bug also appears, so it may not be related to the model.
After that, I changed only the number of tuning trials (4, 8, 10) and got the different results below (testing on 2 images).
Tuning 10 times: np.testing.assert_allclose() passed! The model predicts both images correctly.
From the above results, we can see that as the number of tuning trials increases, the proportion of mismatched elements gradually decreases (100% → 39.6% → 0%).
Checking the predicted probability for each class, I find that with 2 tuning trials the predicted probabilities are nearly identical across all classes.
As the number of tuning trials increases, the predicted probability for the correct class gradually increases.
Hi there. I've taken a look, and found that the model does work with varying batch sizes without autoTVM, provided you recompile the model for each different batch size.
See this gist - which is just a Jupyter notebook rewrite of your bug_simple.py script. The issue seems to be if you try a different batch size when applying an autotuned log.
I think the batch size you do autotuning with is fixed. So if you try to run the network with the autotuning log at a different batch size, it will give the wrong output. I could be wrong, but I think that's the issue. Does that sound right @merrymercy @eqy (listed as autoTVM knowledgeable in CONTRIBUTORS.md)?
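If that hypothesis is right, a rough intuition for the failure mode can be sketched in plain NumPy. This is only an analogy, not what TVM actually generates; the weight matrix and function are made up for illustration:

```python
import numpy as np

# Toy analogy: a dense layer "compiled" with batch size 1 baked in.
W = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])  # 3 inputs -> 2 outputs

def dense_specialized_for_batch1(x_flat, out):
    # Code specialized for N=1 only computes the first row of the output;
    # any further rows in the output buffer are never written.
    out[0] = x_flat[:3] @ W
    return out

x = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])   # batch of 2 images
correct = x @ W                   # what a batch-2 kernel would produce
out = np.zeros((2, 2))
dense_specialized_for_batch1(x.ravel(), out)
# Row 0 of `out` matches `correct`, but row 1 is left at whatever was in
# the buffer, so the batch-2 output is wrong even though batch-1 looks fine.
```

This would also be consistent with the earlier observation that single-image predictions are correct while multi-image predictions are garbage.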