I have an int8-quantized model (graph JSON and params) compiled to WASM, along with a separate params file that holds the dequantization data. When ingesting the model in a NodeJS app, I need to dequantize the weights back to fp32, which means reading the parameter files, dequantizing the weights, and saving the parameter dictionary again.
Is there a way to support the TVM Relay functions that give access to the graph and parameters (save_param_dict, load_param_dict) from a WASM library, the way the TVM Runtime is supported?
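For reference, this is roughly the round-trip I can do today on the Python side with `tvm.runtime.load_param_dict` / `save_param_dict`, and what I'd like to be able to do from the WASM runtime in NodeJS. The file names, and the assumption that the dequantization data is itself a param dict keyed as `<name>_scale` / `<name>_zero_point`, are placeholders for my setup, not TVM conventions:

```python
import numpy as np
import tvm

# Read the int8 parameter blob produced at compile time.
with open("model_int8.params", "rb") as f:
    int8_params = tvm.runtime.load_param_dict(f.read())

# Assumed: a second param dict keyed by the same names, holding scale/zero_point.
with open("dequant.params", "rb") as f:
    dequant_info = tvm.runtime.load_param_dict(f.read())

fp32_params = {}
for name, arr in int8_params.items():
    qdata = arr.numpy().astype(np.float32)
    scale = dequant_info[name + "_scale"].numpy()            # assumed key naming
    zero_point = dequant_info[name + "_zero_point"].numpy()  # assumed key naming
    fp32_params[name] = tvm.nd.array(scale * (qdata - zero_point))

# Re-serialize the dequantized dictionary.
with open("model_fp32.params", "wb") as f:
    f.write(tvm.runtime.save_param_dict(fp32_params))
```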
Here’s a great example by Tianqi that shows how to compile the model.
I took this and applied it to a pretrained TensorFlow model to compile it to WASM. If you look carefully at the repo, you’ll also find a sample script for deploying the model with WebGPU.
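For context, this is roughly the compile step I ended up with after adapting that example to a frozen TensorFlow graph. It's only a sketch: the model path, input name and shape, and output file names are placeholders for my setup, and the WASM target string plus the `emcc.create_tvmjs_wasm` helper come from the TVM web example build scripts (Emscripten needs to be on the PATH):

```python
import tvm
from tvm import relay
from tvm.contrib import emcc
import tensorflow as tf

# Load a frozen TensorFlow graph (assumed: a frozen .pb with a single input).
with tf.io.gfile.GFile("model_frozen.pb", "rb") as f:
    graph_def = tf.compat.v1.GraphDef()
    graph_def.ParseFromString(f.read())

# Import into Relay; input name and shape are placeholders.
mod, params = relay.frontend.from_tensorflow(
    graph_def, shape={"input": (1, 224, 224, 3)}
)

# Target string taken from the TVM web example.
target = "llvm -mtriple=wasm32-unknown-unknown-wasm"
with tvm.transform.PassContext(opt_level=3):
    lib = relay.build(mod, target=target, params=params)

# Emit the graph JSON, params, and the wasm library.
lib.get_lib().export_library("model.wasm", emcc.create_tvmjs_wasm)
with open("model.json", "w") as f:
    f.write(lib.get_graph_json())
with open("model.params", "wb") as f:
    f.write(tvm.runtime.save_param_dict(lib.get_params()))
```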