I’d like to introduce a new pass, which automatically promotes the datatype of `IterVar`. After this pass, the promoted type can be narrowed down by the type narrowing pass introduced in #5092 (as discussed in 5643). It has two benefits:
- With this pass, users don’t have to wrap an integer with `IntImm(x, 'int64')` in order to use i64 indices; they will be promoted to i64 automatically.
- It helps avoid unexpected overflow.
One can certainly promote everything to i64 without ruining the correctness of the program. But unrestricted data type promotion hinders the narrowing afterwards. The more fine-grained our type promotion strategy is, the more narrowing we can do afterwards.
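The overflow benefit can be illustrated with a small pure-Python sketch (not TVM code): plain Python integers never overflow, so a hypothetical `i32` helper is used here to mimic wrap-around 32-bit arithmetic on a flattened buffer index.

```python
def i32(x):
    """Wrap a Python int to signed 32-bit two's-complement, mimicking i32 arithmetic."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x >= (1 << 31) else x

# Flattened index of element (60000, 50000) in a 70000 x 70000 buffer:
row, col, stride = 60000, 50000, 70000
assert row * stride + col == 4_200_050_000   # correct in i64
assert i32(row * stride + col) < 0           # the same index silently overflows in i32
```

With i64 indices the flattened offset stays correct; with i32 indices it wraps to a negative value.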
In order to make the promotion as reversible as possible, we use the following rule to determine which expression is to be promoted:
- Unbounded variables like `n = te.var('n')` are not promoted, because `var.i32` and `var.i64` represent different things: the former ranges within i32 while the latter within i64.
- If an expression contains unbounded variables, its sub-expressions are not promoted. `2.i32 * var.i32` and `2.i64 * i64(var.i32)` represent different things: the former suggests that the variable fits in i32 and the expression as a whole also fits in i32, while the latter suggests the variable fits in i32 but the expression as a whole does not.
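The difference between the two forms can be sketched in pure Python (this is not TVM code; the `i32` helper below is a stand-in for wrap-around 32-bit arithmetic): multiplying in i32 and multiplying in i64 only agree while the product still fits in i32.

```python
def i32(x):
    """Wrap a Python int to signed 32-bit two's-complement, mimicking i32 arithmetic."""
    x &= 0xFFFFFFFF
    return x - (1 << 32) if x >= (1 << 31) else x

var = 2_000_000_000                # fits in i32, but 2 * var does not

mul_in_i32 = i32(2 * var)          # 2.i32 * var.i32: product wraps around
mul_in_i64 = 2 * var               # 2.i64 * i64(var.i32): product computed exactly in i64
assert mul_in_i32 != mul_in_i64    # widening before vs. after the multiply differs
```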
In my implementation:
- an expression is bounded if it does not contain unbounded variables; otherwise it’s unbounded.
- a variable is bounded if its range (its min and extent) consists of bounded expressions; otherwise it’s unbounded.
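These two mutually recursive definitions can be sketched over a toy expression tree (the classes below are hypothetical stand-ins, not TVM's real TIR nodes):

```python
class Const:
    def __init__(self, value):
        self.value = value

class Var:
    def __init__(self, name, rng=None):
        self.name = name
        self.rng = rng   # (min, extent) pair of expressions, or None if unknown

class BinOp:
    def __init__(self, a, b):
        self.a, self.b = a, b

def is_bounded(e):
    """An expression is bounded iff it contains no unbounded variables;
    a variable is bounded iff its min and extent are bounded expressions."""
    if isinstance(e, Const):
        return True
    if isinstance(e, Var):
        if e.rng is None:          # e.g. te.var('n'): no range attached
            return False
        lo, extent = e.rng
        return is_bounded(lo) and is_bounded(extent)
    if isinstance(e, BinOp):
        return is_bounded(e.a) and is_bounded(e.b)
    raise TypeError(type(e))

n = Var('n')                          # unbounded: promotion is skipped
i = Var('i', (Const(0), Const(10)))   # IterVar with range [0, 10): bounded
m = Var('m', (Const(0), n))           # extent depends on n: unbounded
```

Promotion would then only be applied where `is_bounded` returns `True`.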
I find it difficult to do this in Relay. Some Relay ops are implemented in Python (mostly ops with AutoTVM). These ops unpack Relay shapes (from C++, represented as an array of `IntImm`) to Python integers, perform some operations, then send them back to C++. In this process, we lose the datatype of `IntImm` when converting to Python integers.
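The loss can be illustrated with a toy round-trip (the `IntImm` class below is a hypothetical stand-in, not TVM's real class):

```python
class IntImm:
    """Stand-in for a typed integer constant crossing the C++/Python boundary."""
    def __init__(self, dtype, value):
        self.dtype = dtype
        self.value = value

shape = [IntImm('int64', 128), IntImm('int64', 2)]

# Python-side ops unpack shapes to plain ints: the dtype is gone.
py_shape = [s.value for s in shape]

# When packing the result back for C++, a dtype must be guessed; defaulting
# to 'int32' would silently undo the promotion.
restored = [IntImm('int32', v) for v in py_shape]
assert restored[0].dtype != shape[0].dtype
```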
P1: TE Operation Level.

```c++
Operation PromoteIterVarType(Operation op);
```

Take the `compute` operation as an example:

```python
def compute():
    # original code ...
    ret = PromoteIterVarType(ret)
    return ret
```
There are two possible problems:
- Variables can be introduced at schedule level. For example, the factor of `split` can be a variable. In this case, data type promotion should not have been performed, but we fail to detect this variable at operation level, before schedules are given.
- Variables can be introduced by inlining. We cannot detect them at operation level, before the inline pass.
```python
n = te.var('n')
a = te.placeholder((10, 2), name='a')
b = te.placeholder((n, 2), name='b')
c = te.compute((10, 2), lambda i, j: a[i, j])
d = te.compute((10 + n, 2), lambda i, j: te.select(i >= n, c[i - n, j], b[i, j]))
```
In this case, by our rules, `c` is promoted and `d` is not. After inlining `c` into `d`, we have `c[i - n, j] = a[i64(i - n), i64(j)] = a[i64(i - n) * i64(2) + i64(j)]`, which cannot be narrowed to `(i - n) * 2 + j`.
P2: TE Schedule Level, between
A possible problem is that the buffer stride datatype will be promoted along with it, but there is no guarantee that the stride is not involved with variable indices in operations that take this tensor as input.
P3: TIR Level, between StorageFlatten and NarrowDataType
I haven’t dived deep into this. I guess it’s similar to P2, except that in P2 `IterVar` is promoted, while here `For` is promoted.
It does not have the buffer stride datatype problem because it is performed after the creation of buffers.
A possible problem is that TIR is more expressive than TE, so P3 may be a bit more complex than P2. For example, at TE level, a nested loop with `i in (0, 10), j in (0, i)` is not permitted (except for reduction axes), but TIR permits it.