For example, in the code below the result C's dtype is uint8:
A = tvm.placeholder((1080, 1920), name='A', dtype='uint8')
K = tvm.placeholder((3, 3), name='K', dtype='uint8')
ry = tvm.reduce_axis((0, 3), name='ry')
rx = tvm.reduce_axis((0, 3), name='rx')
C = tvm.compute((1080-2, 1920-2), lambda i, j: tvm.sum(A[i + ry, j + rx] * K[ry, rx], axis=[ry, rx]), name='C')
print("C dtype:", C.dtype)
If astype is used to cast the operands, the result C's dtype is uint32:
C = tvm.compute((1080-2, 1920-2), lambda i, j: tvm.sum(A[i + ry, j + rx].astype('uint32') * K[ry, rx].astype('uint32'), axis=[ry, rx]), name='C')
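As an aside, the cast is also needed for correctness, not just dtype: summing uint8 products in uint8 wraps around. A small NumPy sketch of the same arithmetic (NumPy used for illustration only; the values are hypothetical, not from the TVM kernel above):

```python
import numpy as np

# Hypothetical 3x3 patch and kernel, chosen so the products overflow uint8.
patch = np.full((3, 3), 200, dtype=np.uint8)
kernel = np.full((3, 3), 3, dtype=np.uint8)

# Pure uint8 arithmetic wraps: 200*3 = 600 -> 600 % 256 = 88 per element,
# and the uint8 accumulator wraps again: (9 * 88) % 256 = 24.
wrapped = int(np.sum(patch * kernel, dtype=np.uint8))

# Casting to uint32 before multiplying gives the true sum: 9 * 600 = 5400.
exact = int(np.sum(patch.astype(np.uint32) * kernel.astype(np.uint32)))

print(wrapped, exact)  # 24 5400
```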
But the lowered code indicates that the actual multiply-add operates on uint32 values. I suspect this will prevent LLVM from doing any int8 vectorization, even if the hardware supports it.
produce C {
  for (i, 0, 1080) {
    for (j, 0, 1920) {
      C[(((i*1918) + j) + -1919)] = (uint32)0
      for (ry, 0, 3) {
        for (rx, 0, 3) {
          if (likely((1 <= i))) {
            if (likely((i < 1079))) {
              if (likely((1 <= j))) {
                if (likely((j < 1919))) {
                  C[(((i*1918) + j) + -1919)] = (C[(((i*1918) + j) + -1919)] + (uint32(A[(((((i + ry)*1920) + j) + rx) + -1921)])*uint32(K[((ry*3) + rx)])))
                }
              }
            }
          }
        }
      }
    }
  }
}
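For what it's worth, hardware int8 paths (e.g. widening multiply instructions) typically do the multiply in 16 bits and only widen to 32 bits for the accumulation, since an 8-bit by 8-bit product always fits in 16 bits (255*255 = 65025 < 2^16). A NumPy sketch of that pattern (illustration only, not the TVM schedule; whether LLVM recovers it from the uint32-by-uint32 form is exactly the question):

```python
import numpy as np

a = np.full(9, 255, dtype=np.uint8)   # worst-case uint8 operands
k = np.full(9, 255, dtype=np.uint8)

# 8-bit x 8-bit products fit in 16 bits without overflow.
prod16 = a.astype(np.uint16) * k.astype(np.uint16)

# Widen to 32 bits only for the accumulation.
acc = int(np.sum(prod16, dtype=np.uint32))
print(acc)  # 9 * 65025 = 585225
```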
@janimesh