I see that the TVM 0.4 Roadmap has Quantized network support as a feature target.
I am interested in helping out for this feature. Currently, I am looking into Intel Skylake and future processors that support INT8 operations in hardware. My focus is on generating schedules suitable for these types of operations.
Is anybody else working on quantization (maybe for the ARM backend)? If so, it would be good to discuss and reuse code where applicable. I have a few questions regarding mixed precision in quantization that should apply to ARM backends as well, so please let me know.
If you have specific questions that need to be discussed, it would be great to discuss them on the forum by opening threads so everyone can see and benefit from the discussion.
Thanks @tqchen
I am planning to outline a proposal by the end of this week and post it here as an RFC. Community feedback on it would be very helpful.
For this thread, the specific question is: is there any way to specify a higher precision for the output of a TVM computation than that of its inputs? A motivating example:
N = 138
A = tvm.placeholder((N,), name='A', dtype='int8')
B = tvm.placeholder((N,), name='B', dtype='int8')
k = tvm.reduce_axis((0, N), 'k')
C = tvm.compute((1,), lambda i: tvm.sum(A[k] * B[k], axis=k), name='C')
s = tvm.create_schedule(C.op)
print(tvm.lower(s, [A, B, C], simple_mode=True))
Here, I want C to have dtype 'int32'. This need arises from INT8 quantization, where intermediate results (accumulators) must be stored at a higher precision to avoid overflow.
You can specify out_dtype if you are using conv2d or other TOPI interfaces. For your own functions, you can find a reference by reading the TOPI source code.
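For a custom compute like the dot product above, the pattern used throughout the TOPI source is to cast the operands up to the accumulation dtype inside the reduction. Here is a minimal sketch against the same 0.x API as the snippet above; the placement of the astype casts is the point, everything else mirrors the original example:

import tvm

N = 138
A = tvm.placeholder((N,), name='A', dtype='int8')
B = tvm.placeholder((N,), name='B', dtype='int8')
k = tvm.reduce_axis((0, N), 'k')

# Cast each int8 operand to int32 before multiplying, so both the
# product and the running sum are accumulated at 32-bit precision.
C = tvm.compute(
    (1,),
    lambda i: tvm.sum(A[k].astype('int32') * B[k].astype('int32'), axis=k),
    name='C')

s = tvm.create_schedule(C.op)
print(tvm.lower(s, [A, B, C], simple_mode=True))

The lowered IR should then declare C as int32 and apply the casts to the int8 loads of A and B before the multiply-accumulate, which is essentially what TOPI's out_dtype parameter does internally for conv2d and similar ops.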