Int4 support on ARM processors

I’m looking to quantize our models to int4 and run them on ARM processors. I noticed that TVM does not seem to have support for int4, while the OctoML version claims support for quantized variables below 8 bits. Is this a limitation of the open-source TVM? Do we need to use the OctoML ecosystem for int4 quantization?

Any help is highly appreciated.

Hey,

As far as I can tell, there is only support for int4 (and apparently int2) on CUDA targets with tensor cores. I am unsure what you are referring to when you say the OctoML version “claims support for quantized variables below 8 bits”.

I have seen slides where int4 results on tensor cores were reported, but I am unsure whether that was really “an OctoML version”.
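For concreteness, here is roughly what that CUDA path looks like at the tensor-expression level. This is only a sketch, and it assumes “int4” is accepted as a dtype string (as far as I know it only lowers to real sub-byte code through the tensor core schedules, not on ARM):

```python
import tvm
from tvm import te

# Sketch only: declare int4 operands and accumulate in int32.
# The "int4" dtype string parses, but efficient codegen for it
# currently targets CUDA tensor cores, not ARM.
n = 1024
A = te.placeholder((n, n), dtype="int4", name="A")
B = te.placeholder((n, n), dtype="int4", name="B")
k = te.reduce_axis((0, n), name="k")
C = te.compute(
    (n, n),
    lambda i, j: te.sum(A[i, k].astype("int32") * B[k, j].astype("int32"), axis=k),
    name="C",
)
```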

Also, there is an old post about using bitserial operators to implement aggressively quantized NNs.
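If you want to experiment with that route, here is a rough sketch of what the bitserial path looks like in Relay. I am assuming the relay.nn.bitserial_conv2d operator and these parameter names, so double-check against the current API:

```python
import tvm
from tvm import relay

# Rough sketch: low-bit convolution via bit-plane decomposition and
# popcount, which is how aggressively quantized NNs have been run on
# ARM without native sub-byte instructions.
data = relay.var("data", shape=(1, 64, 56, 56), dtype="int16")
weight = relay.var("weight", shape=(64, 64, 3, 3), dtype="int16")
out = relay.nn.bitserial_conv2d(
    data,
    weight,
    kernel_size=(3, 3),
    padding=(1, 1),
    channels=64,
    activation_bits=2,  # 2-bit activations
    weight_bits=2,      # 2-bit weights
    pack_dtype="uint32",
    out_dtype="int16",
)
mod = tvm.IRModule.from_expr(relay.Function([data, weight], out))
```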

That being said, I would also like more information, and I will most likely post a question in the first thread I linked.

Just curious about int4 support on the ARM architecture: are there any (efficient) instructions that work with int4 on any ARM platform? Or is this question mostly about the feasibility of executing an int4 model under TVM, rather than performance?

I’m curious about both. First, is there any way to get a performance boost from running operations in int4 instead of int8 on ARM? In other words, does ARM support int4 computation? And second, does TVM plan to support int4 as a dtype?
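Not a full answer, but on the hardware side: to my knowledge there are no native int4 arithmetic instructions in the ARM ISA today, so the usual scheme is to pack two int4 values per byte (halving storage and memory bandwidth) and widen to int8 for the actual math, e.g. with NEON SDOT/UDOT on cores that have them. A toy NumPy illustration of that pack/unpack layout (the helper names are made up, this is just to show the bit manipulation):

```python
import numpy as np

def pack_int4(x):
    """Pack an even-length int8 array of values in [-8, 7], two per byte."""
    u = x.astype(np.uint8) & 0x0F            # low nibble keeps two's complement bits
    return (u[0::2] | (u[1::2] << 4)).astype(np.uint8)

def unpack_int4(packed):
    """Sign-extend both nibbles of each byte back to int8."""
    lo = (packed & 0x0F).astype(np.int16)
    hi = (packed >> 4).astype(np.int16)
    out = np.empty(packed.size * 2, dtype=np.int16)
    out[0::2] = np.where(lo > 7, lo - 16, lo)
    out[1::2] = np.where(hi > 7, hi - 16, hi)
    return out.astype(np.int8)

a = np.random.randint(-8, 8, size=64, dtype=np.int8)
b = np.random.randint(-8, 8, size=64, dtype=np.int8)
# Storage is halved, but the multiply-accumulate still runs at int8/int32.
dot = np.dot(unpack_int4(pack_int4(a)).astype(np.int32),
             unpack_int4(pack_int4(b)).astype(np.int32))
assert dot == np.dot(a.astype(np.int32), b.astype(np.int32))
```

So on ARM the win from int4 is mostly memory footprint and bandwidth rather than raw compute throughput, unless the target has dedicated sub-byte hardware.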