Sure, I agree that i32+i32 should map to i32 under the normal rules. I just mean that in the Relay build, we might want to directly create vars whose dtype is i64.
I made a simple pass: https://github.com/hzfan/tvm/pull/3/files
Basically, I implemented a DataTypeRewrite pass and a SymIntBoundAnalyzer. I use the analyzer to check whether an expression is purely symbolic, as in Case5 and Case6 below.
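For reference, here is a minimal sketch of how inputs like Case1 and Case5 can be set up and lowered to get the dumps below. It assumes the pre-te Python API of the time (tvm.placeholder, tvm.compute, tvm.var, tvm.const, tvm.create_schedule, tvm.lower); newer releases expose the same functions under tvm.te.

import tvm

# Case1-style input: constant int32 shape (2, 2, 2)
shape = (tvm.const(2, 'int32'),) * 3
a = tvm.placeholder(shape, name='a')
b = tvm.placeholder(shape, name='b')
c = tvm.compute(shape, lambda *i: a(*i) + b(*i), name='c')
s = tvm.create_schedule(c.op)
print(tvm.lower(s, [a, b, c], simple_mode=True))

# Case5-style input: purely symbolic int32 shape (n, m, k)
n, m, k = tvm.var('n'), tvm.var('m'), tvm.var('k')
a = tvm.placeholder((n, m, k), name='a')
b = tvm.placeholder((n, m, k), name='b')
c = tvm.compute((n, m, k), lambda *i: a(*i) + b(*i), name='c')
s = tvm.create_schedule(c.op)
print(tvm.lower(s, [a, b, c], simple_mode=True))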
The results are as follows:
Case1 (2.i32, 2.i32, 2.i32) + (2.i32, 2.i32, 2.i32):
produce c {
for (i0.int32, 0, 2) {
for (i1.int32, 0, 2) {
for (i2.int32, 0, 2) {
c.handle[(((i0.int32*4) + (i1.int32*2)) + i2.int32)] = (a.handle[(((i0.int32*4) + (i1.int32*2)) + i2.int32)] + b.handle[(((i0.int32*4) + (i1.int32*2)) + i2.int32)])
}
}
}
}
Case2 (2.i64, 2.i64, 2.i64) + (2.i64, 2.i64, 2.i64):
Same as Case1. This indicates that tvm.const(2, dtype='int64') gets narrowed to i32; in other words, the dtype passed to tvm.const(2, dtype='int64') is essentially discarded.
Case3 (2^16.i32, 2^16.i32) + (2^16.i32, 2^16.i32):
produce c {
for (i0.int32, 0, 65536) {
for (i1.int32, 0, 65536) {
c.handle[((int64(i0.int32)*(int64)65536) + int64(i1.int32))] = (a.handle[((int64(i0.int32)*(int64)65536) + int64(i1.int32))] + b.handle[((int64(i0.int32)*(int64)65536) + int64(i1.int32))])
}
}
}
This indicates automatic type promotion to prevent possible overflow.
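For a quick worked check of why Case3's index needs the promotion (plain arithmetic, assuming the usual row-major flattening shown above):

INT32_MAX = 2**31 - 1            # 2147483647
max_index = 65536 * 65536 - 1    # largest flat index for shape (2^16, 2^16): 4294967295
print(max_index > INT32_MAX)     # True, so the flattened index must be computed in int64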
Case4 (2^16.i64, 2^16.i64) + (2^16.i64, 2^16.i64):
Same as Case3.
Case5 (n.i32, m.i32, k.i32) + (n.i32, m.i32, k.i32):
produce c {
for (i0.int32, 0, n.int32) {
for (i1.int32, 0, m.int32) {
for (i2.int32, 0, k.int32) {
c.handle[(((i0.int32*stride.int32) + (i1.int32*stride.int32)) + (i2.int32*stride.int32))] = (a.handle[(((i0.int32*stride.int32) + (i1.int32*stride.int32)) + (i2.int32*stride.int32))] + b.handle[(((i0.int32*stride.int32) + (i1.int32*stride.int32)) + (i2.int32*stride.int32))])
}
}
}
}
With purely symbolic shapes, we do NOT promote the dtype based on the bound provided by the Analyzer, so that i32+i32 remains i32.
Case6 (n.i64, m.i64, k.i64) + (n.i64, m.i64, k.i64):
produce c {
for (i0.int64, (int64)0, n.int64) {
for (i1.int64, (int64)0, m.int64) {
for (i2.int64, (int64)0, k.int64) {
c.handle[(((i0.int64*stride.int64) + (i1.int64*stride.int64)) + (i2.int64*stride.int64))] = (a.handle[(((i0.int64*stride.int64) + (i1.int64*stride.int64)) + (i2.int64*stride.int64))] + b.handle[(((i0.int64*stride.int64) + (i1.int64*stride.int64)) + (i2.int64*stride.int64))])
}
}
}
}
Some notes (sorry for not being clear earlier):
I don't think we should automatically promote i32 + i32 -> i64 when there is an overflow, because that differs from the standard convention of typed languages (e.g. C++).
for (int64 i = 0; i < 100; ++i) {
A[i * 2 + 1] = 0
}
For example, i in the above program can be downgraded to i32, because all the expressions that refer to it, in this case i * 2 + 1, fall within the i32 bound.
Instead, we should be able to downgrade i64 expressions to i32 (or even i16) expressions, e.g. change the type of a loop var, if we find that all the intermediate results fall within the bound of i32 (use ConstIntBoundAnalyzer; no need to add a new analyzer).
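To illustrate that check (not the pass itself, and the exact Python bindings of the analyzer may differ across TVM versions), something along these lines is enough to decide whether a loop var can be downgraded:

import tvm

analyzer = tvm.arith.Analyzer()
i = tvm.var('i', dtype='int64')
# The loop above bounds i to [0, 99].
analyzer.update(i, tvm.arith.ConstIntBound(0, 99))

idx = i * 2 + 1                       # the only expression that refers to i
bound = analyzer.const_int_bound(idx)
fits_i32 = -(2**31) <= bound.min_value and bound.max_value <= 2**31 - 1
print(fits_i32)                       # True -> i (and idx) can be narrowed to int32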
Thanks for the clarification. I implemented a new pass. The idea is to first rewrite the type of a loop var by examining all the expressions that contain it, and then do the necessary promotion to make sure the operands of an operation (like add) share one type. The code: https://github.com/hzfan/tvm/pull/4/files
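As a rough illustration of the second step only (promote_operands is a hypothetical helper for this post, not code from the PR), the promotion amounts to casting the narrower operand so both sides share one dtype:

import tvm

def promote_operands(lhs, rhs):
    # Hypothetical helper, signed int dtypes only: cast the narrower operand
    # so both sides of a binary op (like add) share one dtype.
    def bits(expr):
        return int(expr.dtype.replace('int', ''))  # e.g. 'int32' -> 32
    if bits(lhs) < bits(rhs):
        lhs = lhs.astype(rhs.dtype)
    elif bits(rhs) < bits(lhs):
        rhs = rhs.astype(lhs.dtype)
    return lhs, rhs

i = tvm.var('i', dtype='int32')
n = tvm.var('n', dtype='int64')
lhs, rhs = promote_operands(i, n)     # i is cast to int64 to match n
print((lhs * rhs).dtype)              # int64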
The controversial case is that, for purely symbolic shapes, i64 will be deduced for a loop var:
(n.i32, m.i32) + (n.i32, m.i32):
stmt = produce c {
for (i0.int64, (int64)0, int64(n.int32)) {
for (i1.int64, (int64)0, int64(m.int32)) {
c.handle[((i0.int64*int64(stride.int32)) + (i1.int64*int64(stride.int32)))] = (a.handle[((i0.int64*int64(stride.int32)) + (i1.int64*int64(stride.int32)))] + b.handle[((i0.int64*int64(stride.int32)) + (i1.int64*int64(stride.int32)))])
}
}
}
Const shapes behave as expected:
Case1 (2, 2, 2) + (2, 2, 2)
stmt = produce c {
for (i0.int32, 0, 2) {
for (i1.int32, 0, 2) {
for (i2.int32, 0, 2) {
c.handle[(((i0.int32*4) + (i1.int32*2)) + i2.int32)] = (a.handle[(((i0.int32*4) + (i1.int32*2)) + i2.int32)] + b.handle[(((i0.int32*4) + (i1.int32*2)) + i2.int32)])
}
}
}
}
Case2 (2^16, 2^16) + (2^16, 2^16)
stmt = produce c {
for (i0.int64, (int64)0, (int64)65536) {
for (i1.int64, (int64)0, (int64)65536) {
c.handle[((i0.int64*(int64)65536) + i1.int64)] = (a.handle[((i0.int64*(int64)65536) + i1.int64)] + b.handle[((i0.int64*(int64)65536) + i1.int64)])
}
}
}
Not sure if this is good; should we respect the i32 var type that was specified?
Agree. I made a few modifications according to your suggestions: https://github.com/hzfan/tvm/pull/4/files
Case1 (n.i32, m.i32) + (n.i32, m.i32):
produce c {
for (i0.int32, 0, n.int32) {
for (i1.int32, 0, m.int32) {
c.handle[((i0.int32*stride.int32) + (i1.int32*stride.int32))] = (a.handle[((i0.int32*stride.int32) + (i1.int32*stride.int32))] + b.handle[((i0.int32*stride.int32) + (i1.int32*stride.int32))])
}
}
}
Case2 (n.i64, m.i64) + (n.i64, m.i64):
produce c {
for (i0.int64, (int64)0, n.int64) {
for (i1.int64, (int64)0, m.int64) {
c.handle[((i0.int64*stride.int64) + (i1.int64*stride.int64))] = (a.handle[((i0.int64*stride.int64) + (i1.int64*stride.int64))] + b.handle[((i0.int64*stride.int64) + (i1.int64*stride.int64))])
}
}
}
Case3 (2.i32, 2.i32, 2.i32) + (2.i32, 2.i32, 2.i32):
produce c {
for (i0.int32, 0, 2) {
for (i1.int32, 0, 2) {
for (i2.int32, 0, 2) {
c.handle[(((i0.int32*4) + (i1.int32*2)) + i2.int32)] = (a.handle[(((i0.int32*4) + (i1.int32*2)) + i2.int32)] + b.handle[(((i0.int32*4) + (i1.int32*2)) + i2.int32)])
}
}
}
}
Case4 (2.i64, 2.i64, 2.i64) + (2.i64, 2.i64, 2.i64)
In this case, we narrow the type to i32.
produce c {
for (i0.int32, 0, 2) {
for (i1.int32, 0, 2) {
for (i2.int32, 0, 2) {
c.handle[(((i0.int32*4) + (i1.int32*2)) + i2.int32)] = (a.handle[(((i0.int32*4) + (i1.int32*2)) + i2.int32)] + b.handle[(((i0.int32*4) + (i1.int32*2)) + i2.int32)])
}
}
}
}
Case5 (65536.i32, 65536.i32) + (65536.i32, 65536.i32)
In this case, i32 is used; we do not auto-promote the type even in the case of possible overflow.
produce c {
for (i0.int32, 0, 65536) {
for (i1.int32, 0, 65536) {
c.handle[((i0.int32*65536) + i1.int32)] = (a.handle[((i0.int32*65536) + i1.int32)] + b.handle[((i0.int32*65536) + i1.int32)])
}
}
}
Case6 (65536.i64, 65536.i64) + (65536.i64, 65536.i64)
produce c {
for (i0.int64, (int64)0, (int64)65536) {
for (i1.int64, (int64)0, (int64)65536) {
c.handle[((i0.int64*(int64)65536) + i1.int64)] = (a.handle[((i0.int64*(int64)65536) + i1.int64)] + b.handle[((i0.int64*(int64)65536) + i1.int64)])
}
}
}
Is the PR ready for review upstream?
Yes. I will submit a PR upstream.
Are int64 tensors fully supported in the main branch now?