I’m trying to use TVM’s stack to deploy INT8-quantized Transformer-based models.
I tried Relay + Ansor (AutoScheduler) on a single-layer Transformer (# layers = 1), and the results weren’t as good as I had hoped.
|TVM (Relay, optimized)||130||120|
|TVM (Relay, optimized), Ansor (it=20k)||17||44|
- Number of runs: 100
- The standard deviation across runs was very small.
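
The measurement notes above (100 runs, mean with a small standard deviation) correspond to a protocol along these lines. This is only a rough sketch under assumptions: `benchmark` and `dummy_workload` are hypothetical names, the warm-up count is arbitrary, and the workload is a placeholder for whatever call runs one inference of the compiled model.

```python
import statistics
import time


def benchmark(fn, n_runs=100, n_warmup=10):
    """Return (mean_ms, stdev_ms) of fn() over n_runs timed calls."""
    # Warm-up calls are not timed (assumed count, adjust as needed).
    for _ in range(n_warmup):
        fn()
    samples_ms = []
    for _ in range(n_runs):
        start = time.perf_counter()
        fn()
        samples_ms.append((time.perf_counter() - start) * 1e3)
    return statistics.mean(samples_ms), statistics.stdev(samples_ms)


def dummy_workload():
    # Hypothetical stand-in for one inference call of the compiled model.
    sum(i * i for i in range(10_000))


mean_ms, stdev_ms = benchmark(dummy_workload)
print(f"mean = {mean_ms:.3f} ms, stdev = {stdev_ms:.3f} ms over 100 runs")
```

Reporting the standard deviation next to the mean makes it easier to tell whether a gap between two configurations is larger than run-to-run noise.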
In your opinion, what would be the best next steps? Could you recommend a good starting point or useful references?