INT8 Quantization - Code generation for backends

jackwish · November 6, 2018, 3:33am

Hi guys, I have a question about this topic. Is TVM going to quantize the network as a whole, or only the Conv/FC layers? If it's the whole network, what's the plan for operators like softmax, which involve floating-point computation (the exp in softmax)? My concern is that TensorFlow Lite and PyTorch/Caffe2 use gemmlowp in their quantized softmax, so would it be tricky to express similar functionality in TVM IR?
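To make the softmax problem concrete, here is a minimal sketch (plain Python, not TVM code) of the naive fallback: dequantize the uint8 input, run a floating-point softmax, then requantize. The exp in the middle is exactly the floating-point step in question; gemmlowp-based kernels instead approximate it in fixed point. The affine scheme (`real = scale * (q - zero_point)`), the function names, and the default output scale of 1/256 with zero point 0 are all assumptions chosen for illustration.

```python
import math
from typing import List


def dequantize(q: List[int], scale: float, zero_point: int) -> List[float]:
    # Affine dequantization (assumed scheme): real = scale * (q - zero_point)
    return [scale * (v - zero_point) for v in q]


def quantize(x: List[float], scale: float, zero_point: int) -> List[int]:
    # Affine quantization to uint8: Python's round() rounds halves to even,
    # then the result is clamped to the uint8 range [0, 255]
    return [min(255, max(0, int(round(v / scale)) + zero_point)) for v in x]


def softmax_via_float(q_in: List[int], in_scale: float, in_zp: int,
                      out_scale: float = 1 / 256, out_zp: int = 0) -> List[int]:
    # Hypothetical helper: the "dequantize -> float softmax -> requantize" path
    x = dequantize(q_in, in_scale, in_zp)
    m = max(x)                          # subtract the max for numerical stability
    e = [math.exp(v - m) for v in x]    # the floating-point exp an integer-only kernel must avoid
    s = sum(e)
    return quantize([v / s for v in e], out_scale, out_zp)


# Four equal inputs give probabilities of 0.25 each, i.e. 64 at scale 1/256
print(softmax_via_float([128, 128, 128, 128], 0.1, 128))  # [64, 64, 64, 64]
```

With an output scale of 1/256, a probability of 1.0 would map to 256, so the clamp saturates it to 255. An integer-only backend would need to replace the `math.exp` call with a fixed-point approximation to avoid this float round trip.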

Thanks.