In the "Limitations of PassUpDomain" section of the InferBound Pass doc, I ran the code from Ex. 6 with TVM 0.8, and my result does not match the one given in the doc. The doc's result:
// attr [B] storage_scope = "global"
allocate B[float32 * 16]
produce C {
  for (ci.cj.fused.outer, 0, 4) {
    produce B {
      for (bi, 0, 4) {
        for (bj, 0, 4) {
          B[((bi*4) + bj)] = (A[((bi*4) + bj)] + 2.000000f)
        }
      }
    }
    for (ci.cj.fused.inner, 0, 4) {
      C[((ci.cj.fused.outer*4) + ci.cj.fused.inner)] = (B[((ci.cj.fused.outer*4) + ci.cj.fused.inner)]*3.000000f)
    }
  }
}
My result using TVM 0.8:
@main = primfn(A_1: handle, C_1: handle) -> ()
  attr = {"from_legacy_te_schedule": True, "global_symbol": "main", "tir.noalias": True}
  buffers = {C: Buffer(C_2: Pointer(float32), float32, [4, 4], []),
             A: Buffer(A_2: Pointer(float32), float32, [4, 4], [])}
  buffer_map = {A_1: A, C_1: C} {
  allocate(B: Pointer(global float32), float32, [4]), storage_scope = global;
  for (ci.cj.fused.outer: int32, 0, 4) {
    for (bj: int32, 0, 4) {
      B[bj] = ((float32*)A_2[((ci.cj.fused.outer*4) + bj)] + 2f32)
    }
    for (ci.cj.fused.inner: int32, 0, 4) {
      C_2[((ci.cj.fused.outer*4) + ci.cj.fused.inner)] = ((float32*)B[ci.cj.fused.inner]*3f32)
    }
  }
}
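It seems that the limitation described in the doc no longer exists in TVM 0.8: B is now allocated with 4 elements instead of 16, and only the row needed by the current outer iteration is computed. So the doc needs to be updated.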
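
To make the difference between the two dumps concrete, here is a rough plain-Python emulation of both loop nests. It does not use TVM at all; the function names, the write counter, and the input values are my own and only for illustration. It just shows that both versions should give the same C, while the doc's version recomputes all 16 elements of B on every outer iteration.

```python
# Plain-Python emulation of the two lowered loop nests above (no TVM needed).
# Function names, the write counter and the input values are illustrative only.

A = [float(i) for i in range(16)]  # flattened 4x4 input, arbitrary values


def doc_version(A):
    """Loop nest from the doc: B has 16 elements, all recomputed per outer iteration."""
    B = [0.0] * 16
    C = [0.0] * 16
    b_writes = 0
    for outer in range(4):
        for bi in range(4):
            for bj in range(4):
                B[bi * 4 + bj] = A[bi * 4 + bj] + 2.0
                b_writes += 1
        for inner in range(4):
            C[outer * 4 + inner] = B[outer * 4 + inner] * 3.0
    return C, b_writes


def tvm08_version(A):
    """Loop nest from my TVM 0.8 output: B has 4 elements, one row per outer iteration."""
    B = [0.0] * 4
    C = [0.0] * 16
    b_writes = 0
    for outer in range(4):
        for bj in range(4):
            B[bj] = A[outer * 4 + bj] + 2.0
            b_writes += 1
        for inner in range(4):
            C[outer * 4 + inner] = B[inner] * 3.0
    return C, b_writes


c_doc, writes_doc = doc_version(A)
c_new, writes_new = tvm08_version(A)
print(c_doc == c_new)          # True: both nests compute the same C
print(writes_doc, writes_new)  # 64 16: writes to B in each version
```

So the bound inferred for B in the 0.8 output looks like the tight one (extent 4), not the full 4x4 region shown in the doc.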