Hi, I’m working with some int8 data on CUDA. If we vectorize a loop of extent 4, four int8 elements can be packed into a single int, e.g. https://gist.github.com/vinx13/8bb465e948d5f5883c67bc82d56167c9
However, it is currently impossible to set the split factor to 16, because CUDA does not have an int8x16 type. Such a vector could actually be mapped to int4 (CUDA's vector of four 32-bit ints), which has the same 16-byte size. I wonder if we need support for this in TVM.
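To make the packing idea concrete, here is a minimal sketch in plain Python (not TVM's codegen, and not taken from the gist). It only shows that 4 int8 lanes fit in one 32-bit int and 16 int8 lanes fit in four 32-bit ints, which is the layout of a CUDA `int4`. The helper names `pack_int8` and `unpack_int8` are made up for illustration, and little-endian byte order is an assumption.

```python
import struct

def pack_int8(values):
    """Pack signed 8-bit values into 32-bit ints, four lanes per int.

    Hypothetical helper for illustration; assumes little-endian byte order.
    """
    if len(values) % 4 != 0:
        raise ValueError("number of int8 lanes must be a multiple of 4")
    # Serialize the lanes as raw signed bytes, then reinterpret as int32 words.
    raw = struct.pack(f"<{len(values)}b", *values)
    return list(struct.unpack(f"<{len(values) // 4}i", raw))

def unpack_int8(words):
    """Inverse of pack_int8: split 32-bit ints back into int8 lanes."""
    raw = struct.pack(f"<{len(words)}i", *words)
    return list(struct.unpack(f"<{len(raw)}b", raw))

lanes4 = [1, -2, 3, -4]        # an int8x4 vector
lanes16 = list(range(-8, 8))   # an int8x16 vector

word = pack_int8(lanes4)       # one 32-bit int, like a single CUDA `int`
quad = pack_int8(lanes16)      # four 32-bit ints, like the x/y/z/w of a CUDA `int4`

print(len(word), len(quad))
```

The round trip `unpack_int8(pack_int8(v)) == v` holds for any list of int8 values whose length is a multiple of 4, which is why a split factor of 16 could in principle be lowered to a single 16-byte vector access.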
This question was resolved in https://github.com/dmlc/tvm/pull/1503