Hey,
As far as I can tell, there is only support for int4 (and apparently int2) for CUDA targets with tensorcores. I am unsure what you are referring to when you say
I have seen slides where int4 results on tensorcores were reported, but I am unsure whether that was really “an OctoML version”.
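For context, here is a minimal sketch of what declaring an int4 workload could look like in TVM's tensor expression language. I have not run this end-to-end; the plain `"int4"` dtype string and the int32 accumulation are my assumptions based on the tensorcore support mentioned above, and the actual tensorcore schedule/intrinsics are a separate step that is not shown.

```python
# Minimal sketch only: declare an int4 matmul with int32 accumulation.
# The "int4" dtype string is an assumption; scheduling it onto
# tensorcores is not shown here.
import tvm
from tvm import te

M, N, K = 1024, 1024, 1024
A = te.placeholder((M, K), dtype="int4", name="A")
B = te.placeholder((K, N), dtype="int4", name="B")
k = te.reduce_axis((0, K), name="k")
C = te.compute(
    (M, N),
    lambda i, j: te.sum(A[i, k].astype("int32") * B[k, j].astype("int32"), axis=k),
    name="C",
)
```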
Also, there is an old post about using bitserial operators to implement aggressively quantized NNs.
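For anyone following along, the bitserial trick roughly amounts to splitting low-bit operands into bit planes and reducing the dot product to AND + popcount. Here is a toy NumPy version of the idea (my own sketch for illustration, not the operators from that post):

```python
# Toy illustration of a bitserial dot product: decompose unsigned low-bit
# vectors into bit planes, then combine AND + popcount results weighted by
# the bit significance of each plane pair.
import numpy as np

def bitserial_dot(a, b, a_bits=2, b_bits=2):
    """Dot product of two unsigned low-bit vectors via bit planes."""
    acc = 0
    for i in range(a_bits):
        a_plane = (a >> i) & 1          # i-th bit plane of a
        for j in range(b_bits):
            b_plane = (b >> j) & 1      # j-th bit plane of b
            # AND then popcount, weighted by combined bit significance
            acc += (1 << (i + j)) * int(np.sum(a_plane & b_plane))
    return acc

a = np.array([1, 2, 3, 0], dtype=np.uint8)   # 2-bit values
b = np.array([3, 1, 2, 1], dtype=np.uint8)
assert bitserial_dot(a, b) == int(np.dot(a.astype(np.int32), b.astype(np.int32)))
```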
That being said, I would also like more information and will most likely post a question in the first thread I linked.