I’d like to introduce a new pass that automatically promotes the datatype of `IterVar`. After this pass, the promoted type can be narrowed back down by the type-narrowing pass introduced in #5092 (as discussed in #5643). It has two benefits:
- With this pass, users don’t have to wrap an integer with `IntImm(x, 'int64')` in order to use i64 indices; integers will be promoted to `i64` automatically.
- It helps avoid unexpected overflow.
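As an illustration, here is a minimal sketch of the manual wrapping needed today, using the standard `tvm.tir.IntImm` and `te` APIs (the tensor names are mine); with the proposed pass, the plain-integer version would get `i64` indices automatically:

```python
import tvm
from tvm import te

# Today: to get i64 indices, the shape must be wrapped manually.
n = tvm.tir.IntImm('int64', 1024)
A = te.placeholder((n,), name='A')
B = te.compute((n,), lambda i: A[i] + 1.0, name='B')  # i is i64

# With the proposed pass, a plain Python int would suffice; the
# IterVar of D would be promoted to i64 automatically where needed,
# instead of defaulting to i32.
A2 = te.placeholder((1024,), name='A2')
D = te.compute((1024,), lambda i: A2[i] + 1.0, name='D')
```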
Strategy
One can certainly promote everything to `i64` without ruining the correctness of the program. But unrestricted datatype promotion hinders the narrowing afterwards: the more fine-grained our promotion strategy is, the more narrowing we can do later.
In order to make the promotion as reversible as possible, we use the following rules to determine which expressions are to be promoted:
- Unbounded variables like `n = te.var('n')` are not promoted, because `var.i32` and `var.i64` represent different things: the former ranges within i32 while the latter ranges within i64.
- If an expression contains unbounded variables, its sub-expressions are not promoted either. `2.i32 * var.i32` and `2.i64 * i64(var.i32)` represent different things: the former suggests that the variable fits in i32 and the expression as a whole also fits in i32, while the latter suggests that the variable fits in i32 but the expression as a whole does not.
In my implementation:
- an expression is bounded if it does not contain unbounded variables; otherwise it’s unbounded.
- a variable is bounded if its range (i.e., its min and extent) consists of bounded expressions; otherwise it’s unbounded.
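A minimal sketch of this mutually recursive check, assuming a `ranges` dict mapping each `tir.Var` to its `tvm.ir.Range` is available; the helper names are mine, not part of the proposal:

```python
from tvm import tir

def expr_is_bounded(expr, ranges):
    """An expression is bounded iff it contains no unbounded variables."""
    bounded = True

    def visit(node):
        nonlocal bounded
        if isinstance(node, tir.Var) and not var_is_bounded(node, ranges):
            bounded = False

    tir.stmt_functor.post_order_visit(expr, visit)
    return bounded

def var_is_bounded(var, ranges):
    """A variable is bounded iff its min and extent are bounded expressions."""
    rng = ranges.get(var)
    if rng is None:  # e.g. a free te.var('n'): unbounded
        return False
    return (expr_is_bounded(rng.min, ranges)
            and expr_is_bounded(rng.extent, ranges))
```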
Phase
P0: Relay
I find it difficult to do this in Relay. Some Relay ops are implemented in Python (mostly ops with autotvm). These ops unpack Relay shapes (represented in C++ as an array of `IntImm`) into Python integers, perform some operations, then send them back to C++. In this process, we lose the datatype of the `IntImm` when converting to Python integers.
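A minimal illustration of the dtype loss, using `tvm.runtime.convert` as a stand-in for the Python-to-C++ boundary:

```python
import tvm

shape_dim = tvm.tir.IntImm('int64', 4)
py_int = shape_dim.value            # plain Python int; 'int64' is lost
back = tvm.runtime.convert(py_int)  # converted back with the default dtype
print(back.dtype)                   # 'int32', not the original 'int64'
```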
P1: TE Operation Level
API:
```cpp
Operation PromoteIterVarType(Operation op);
```
Take the `compute` operation as an example:
```python
def compute():
    # original code...
    ret = PromoteIterVarType(ret)  # promote IterVar dtypes just before returning the op
    return ret
```
There are two possible problems:
- Variables can be introduced at schedule level. For example, the `factor` of `split` can be a variable. In this case, datatype promotion should not have been performed, but we cannot detect this variable at operation level, before schedules are given (see the sketch below).
- Variables can be introduced by inlining. We cannot detect them at operation level, before the inline pass.
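Here is a minimal sketch of the first problem, assuming the standard `te` schedule API (the variable factor `f` and the tensor names are mine):

```python
from tvm import te

A = te.placeholder((1024,), name='A')
B = te.compute((1024,), lambda i: A[i] * 2.0, name='B')
s = te.create_schedule(B.op)

f = te.var('f')                              # factor introduced at schedule time
xo, xi = s[B].split(B.op.axis[0], factor=f)  # an unbounded var enters the loop nest
# If B's IterVar had already been promoted to i64 at operation level,
# the i32 variable f would now be mixed into i64 index arithmetic.
```

The second problem is illustrated by the example below: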
```python
n = te.var('n')
a = te.placeholder((10, 2), name='a')
b = te.placeholder((n, 2), name='b')
c = te.compute((10, 2), lambda i, j: a[i, j])
d = te.compute((10 + n, 2), lambda i, j: te.if_then_else(i >= n, c[i - n, j], b[i, j]))
```
In this case, by our rules, `c` is promoted and `d` is not. After inlining `c` into `d`, we have `c[i - n, j] = a[i64(i - n), i64(j)] = a[i64(i - n) * i64(2) + i64(j)]`, which cannot be narrowed back to `(i - n) * 2 + j`.
P2: TE Schedule Level, between `normalize` and `InferBound`
API:
```cpp
Schedule Schedule::promote_iter_var_type();
```
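If a Python binding were exposed, usage might look like this (hypothetical; only `create_schedule` and `lower` are existing APIs):

```python
import tvm
from tvm import te

A = te.placeholder((1024,), name='A')
B = te.compute((1024,), lambda i: A[i] + 1.0, name='B')

s = te.create_schedule(B.op)
s = s.promote_iter_var_type()  # hypothetical binding of the proposed API
mod = tvm.lower(s, [A, B])     # promotion runs between normalize and InferBound
```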
A possible problem is that the buffer stride datatype will be promoted along with the IterVar type, but there is no guarantee that the stride does not get mixed with variable indices in operations that take this tensor as an input.
P3: TIR Level, between `StorageFlatten` and `NarrowDataType`
I haven’t dived deep into this. I guess it’s similar to P2, except that in P2 `IterVar` is promoted, while here `For` is promoted. It does not have the buffer-stride datatype problem, because it is performed after the creation of buffers.
A possible problem is that TIR is more expressive than TE, so P3 may be a bit more complex than P2. For example, at TE level a nested loop with `i in (0, 10), j in (0, i)` is not permitted (except for reduction axes), but TIR permits it.
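For concreteness, here is such a triangular loop nest built with `tvm.tir.ir_builder` (a minimal sketch; the buffer and names are mine):

```python
import tvm

ib = tvm.tir.ir_builder.create()
A = ib.allocate('float32', (100,), name='A', scope='global')
with ib.for_range(0, 10, name='i') as i:
    with ib.for_range(0, i, name='j') as j:  # inner extent depends on the outer var
        A[i * 10 + j] = 0.0
stmt = ib.get()  # a valid TIR loop nest that TE compute cannot express
```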
It would be great if you could share some thoughts. @junrushao @yzhliu @tqchen @ziheng @vinx13