I’m looking to quantize our models to int4 and run them on ARM processors. As far as I can tell, open-source TVM does not support int4, while the OctoML version claims support for quantized data types below 8 bits. Is this a limitation of the open-sourced TVM? Do we need to use the OctoML ecosystem for int4 quantization?
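For concreteness, here is a minimal sketch of what I imagined doing with the automatic quantization pass (`relay.quantize.qconfig` / `relay.quantize.quantize`) on a toy conv2d model; the `nbit_weight=4` / `dtype_weight="int4"` settings are my guess at how sub-8-bit quantization would be expressed, and that is exactly the part I'm unsure upstream TVM actually honors:

```python
import numpy as np
import tvm
from tvm import relay

# Toy placeholder model (a single conv2d); our real models would be
# imported through relay.frontend instead.
data = relay.var("data", shape=(1, 3, 224, 224), dtype="float32")
weight = relay.var("weight", shape=(16, 3, 3, 3), dtype="float32")
out = relay.nn.conv2d(data, weight, kernel_size=(3, 3), channels=16, padding=(1, 1))
mod = tvm.IRModule.from_expr(relay.Function([data, weight], out))
params = {"weight": np.random.uniform(-1, 1, size=(16, 3, 3, 3)).astype("float32")}

# Ask the automatic quantization pass for 4-bit weights.
# NOTE: nbit_weight=4 / dtype_weight="int4" is my assumption about the knobs;
# this is the part I'm not sure open-source TVM supports.
with relay.quantize.qconfig(nbit_input=8,
                            nbit_weight=4,
                            nbit_activation=8,
                            dtype_input="int8",
                            dtype_weight="int4",
                            dtype_activation="int8",
                            skip_conv_layers=[],   # don't skip the first conv in this toy example
                            calibrate_mode="global_scale",
                            global_scale=8.0):
    qmod = relay.quantize.quantize(mod, params=params)

# Build for an AArch64 target; this is where I'd want the int4 kernels to end up.
target = tvm.target.Target("llvm -mtriple=aarch64-linux-gnu -mattr=+neon")
with tvm.transform.PassContext(opt_level=3):
    lib = relay.build(qmod, target=target)
```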
Any help is highly appreciated.