I am trying to use Nervana Distiller to quantize models and graphs (for example, FP32 to INT8).
https://github.com/NervanaSystems/distiller#using-venv
Distiller takes a model and quantizes it. Its output is a set of .yaml files and a quantized model checkpoint, checkpoint.pth.tar.
How do I import these quantized model files into the TVM compiler or the TVM runtime?
In the TVM runtime examples:
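To make concrete what I mean by FP32-to-INT8 quantization, here is a minimal pure-Python sketch of symmetric linear quantization. This is only an illustration under my own assumptions: the function names quantize_int8 and dequantize are hypothetical, and this is not necessarily the scheme Distiller applies.

```python
# Illustrative sketch only: symmetric linear quantization of FP32 values
# to the int8 range. NOT taken from Distiller; names are hypothetical.

def quantize_int8(values):
    """Map floats to integers in [-127, 127] using a single shared scale."""
    max_abs = max(abs(v) for v in values)
    # Guard against an all-zero input, where any scale would work.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from the int8 representation."""
    return [q * scale for q in quantized]


weights = [0.5, -1.0, 0.25, 0.0]
q_weights, scale = quantize_int8(weights)
restored = dequantize(q_weights, scale)

print("int8 values:", q_weights)
print("scale:", scale)
print("restored:", restored)
```

The point is that the quantized checkpoint carries integer weights plus scale information, and it is unclear to me how that representation maps onto what TVM expects.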
https://github.com/dmlc/tvm/tree/master/apps/howto_deploy
and https://docs.tvm.ai/deploy/cpp_deploy.html
It seems the TVM runtime needs three files in order to load and execute a graph from C++:
graph.json
params.json
deploy.so
but Distiller does not produce any of these as output.
How can I use Nervana Distiller quantization together with the TVM runtime?
Or is there an example of someone who has integrated Distiller quantization with TVM?
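The gap can be shown with a small standard-library check, sketched below. The artifact names are the three listed in this post, the helper missing_artifacts is hypothetical, and the temporary directory only simulates a Distiller output folder containing checkpoint.pth.tar.

```python
# Sketch: report which of the files named in this post are absent from
# an output directory. Uses only the standard library; no TVM or Distiller.
import tempfile
from pathlib import Path

# File names as listed in this post (assumption: your TVM build may name them differently).
EXPECTED_ARTIFACTS = ("graph.json", "params.json", "deploy.so")


def missing_artifacts(output_dir, expected=EXPECTED_ARTIFACTS):
    """Return the expected file names that do not exist in output_dir."""
    root = Path(output_dir)
    return [name for name in expected if not (root / name).is_file()]


# Simulate a Distiller output directory that holds only the checkpoint.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "checkpoint.pth.tar").write_bytes(b"")
    missing = missing_artifacts(tmp)

print("missing:", missing)
```

Run against my actual Distiller output directory, a check like this reports all three files as missing.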