I’m looking to quantize our models to int4 and run them on ARM processors. I noticed that TVM does not appear to support int4, while the OctoML version claims support for quantized variables below 8 bits. Is this a limitation of open-source TVM? Do we need to use the OctoML ecosystem for int4 quantization?
Just curious about int4 support on ARM architecture. Are there any (efficient) instructions that work with int4 on any ARM platform? Or is this question mostly about the feasibility of executing an int4 model under TVM, rather than about performance?
I’m curious about both. First, is there any way to get a performance boost from running operations in int4 instead of int8 on ARM? In other words, does ARM support int4 computation? And second, does TVM plan to support int4 as a dtype?
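For context, a minimal Python sketch of the usual workaround when hardware has no native int4 instructions: store two signed int4 values packed per byte, then sign-extend each nibble to int8 before computing with int8 arithmetic. This is not TVM or ARM-specific code; `pack_int4` and `unpack_int4` are hypothetical helper names for illustration only.

```python
def pack_int4(values):
    """Pack signed int4 values (-8..7) into bytes, two per byte (low nibble first)."""
    assert len(values) % 2 == 0, "need an even number of values"
    out = bytearray()
    for lo, hi in zip(values[::2], values[1::2]):
        out.append((lo & 0xF) | ((hi & 0xF) << 4))
    return bytes(out)

def unpack_int4(packed):
    """Unpack bytes into signed int4 values, sign-extending each 4-bit nibble."""
    def sext(n):  # interpret a 4-bit unsigned nibble as signed int4
        return n - 16 if n >= 8 else n
    vals = []
    for b in packed:
        vals.append(sext(b & 0xF))        # low nibble
        vals.append(sext((b >> 4) & 0xF)) # high nibble
    return vals

# Round trip: int4 weights survive packing at half the storage of int8.
weights = [-8, 7, 3, -1]
assert unpack_int4(pack_int4(weights)) == weights
```

The packing halves memory footprint and bandwidth even when the arithmetic itself still runs on int8 units, which is where most of the practical int4 speedup on existing CPUs comes from.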