[RFC] Discuss TVM v0.7 Roadmap

Nvidia recently published an interesting post demonstrating an int4 fine-tuning method that does not require retraining the original model and incurs only ~1% accuracy loss on ResNet-50. Their results could serve as a competitive baseline for TVM's 4-bit quantization.

Assuming high-accuracy int4 models are already available, we should also focus on int4 Tensor Core optimization for model inference (mostly for convolution). This would speed up inference in a way that other DL acceleration libraries don't support yet.
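For anyone unfamiliar with what "int4" means at the data level, here is a minimal sketch in plain Python of symmetric 4-bit quantization and of packing two int4 values into one byte. This is only an illustration under my own assumptions: the function names (`quantize_int4`, `pack_int4`, `unpack_int4`), the fixed per-tensor `scale`, and the low-nibble-first layout are hypothetical, and it is neither TVM's API nor the method from the Nvidia post.

```python
def quantize_int4(values, scale):
    """Symmetric quantization to signed 4-bit integers in [-8, 7].

    `scale` is an assumed per-tensor step size chosen by the caller.
    """
    return [max(-8, min(7, round(v / scale))) for v in values]


def pack_int4(q):
    """Pack signed int4 values two per byte, low nibble first (assumed layout)."""
    if len(q) % 2:
        q = q + [0]  # pad to an even count without mutating the caller's list
    return bytes(((hi & 0xF) << 4) | (lo & 0xF)
                 for lo, hi in zip(q[0::2], q[1::2]))


def unpack_int4(packed, count):
    """Recover `count` signed int4 values from the packed bytes."""
    out = []
    for b in packed:
        for nibble in (b & 0xF, b >> 4):
            # Nibbles 8..15 encode the negative values -8..-1 (two's complement).
            out.append(nibble - 16 if nibble >= 8 else nibble)
    return out[:count]


weights = [1.0, -4.0, 3.4, 10.0, -0.6]
q = quantize_int4(weights, scale=0.5)
packed = pack_int4(q)
restored = unpack_int4(packed, len(q))
```

The point is that int4 halves the memory footprint relative to int8, and values outside the representable range (10.0 above) get clamped, which is where the accuracy loss comes from. An actual Tensor Core kernel would consume packed data in whatever layout the hardware instructions require, which may differ from the one assumed here.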
