TVM support for Quantized INT8 Models

Hi all,

I am trying to import an INT8-quantized BERT model from ONNX into TVM Relay using the frontend's "from_onnx" method, but the conversion fails with the following error:

tvm.error.OpNotImplemented: The following operators are not supported for frontend ONNX: MatMulInteger, DequantizeLinear, DynamicQuantizeLinear
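For context, here is a rough pure-Python sketch of what the three unsupported operators compute, following the ONNX operator spec (uint8 case). This is only an illustration of the semantics a frontend converter would need to lower, not TVM code:

```python
# Sketch of the three unsupported ONNX ops (uint8), per the ONNX operator spec.
# Pure Python, for illustration only.

def dynamic_quantize_linear(x):
    """DynamicQuantizeLinear: derive scale/zero_point from the data at
    runtime, then quantize float32 -> uint8."""
    qmin, qmax = 0, 255
    # The range must include 0 so that 0.0 is exactly representable.
    rmin = min(min(x), 0.0)
    rmax = max(max(x), 0.0)
    scale = (rmax - rmin) / (qmax - qmin) or 1.0  # avoid div-by-zero
    zero_point = int(round(min(max(qmin - rmin / scale, qmin), qmax)))
    q = [int(min(max(round(v / scale) + zero_point, qmin), qmax)) for v in x]
    return q, scale, zero_point

def matmul_integer(a, b, a_zp, b_zp):
    """MatMulInteger: int32 matmul of quantized inputs, with each input's
    zero point subtracted before multiplying."""
    rows, inner, cols = len(a), len(b), len(b[0])
    return [[sum((a[i][k] - a_zp) * (b[k][j] - b_zp) for k in range(inner))
             for j in range(cols)] for i in range(rows)]

def dequantize_linear(q, scale, zero_point):
    """DequantizeLinear: quantized integers -> float32."""
    return [(v - zero_point) * scale for v in q]
```

A dynamically quantized MatMul in an ONNX graph is essentially `dequantize_linear(matmul_integer(...))` over the outputs of `dynamic_quantize_linear`, so supporting these three ops together would cover the pattern the quantizer emits.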

Is there any plan to support these operators in TVM in the near future? Or can you suggest another way to import INT8-quantized models into TVM Relay?
