The compact_buffer_region pass modifies the shared buffer so that stride[0] is
T.int64(72) * T.min((n + T.int64(63)) // T.int64(64) * T.int64(64), T.int64(96))
and stride[1] is T.int64(72),
but the LowerOpaqueBlock pass then reports an error:
InternalError: Check failed: (is_zero(floormod(buffer->strides[i - 1], buffer->strides[i]))) is false:
is_zero(floormod(buffer->strides[i - 1], buffer->strides[i])) is not true.
Generally not. The TIR part is shared. It would not be a surprise if the current TIR handling runs into certain issues on dynamic-shape workloads from unity. In this case, it simply does not know that c * some_index is divisible by c. If it works, would you mind sending this quick fix to the TVM repo? Thank you!
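To make the divisibility point concrete, here is a small plain-Python sketch (not TVM code) that evaluates the two strides from the error above for concrete values of n. The function name `strides` is hypothetical and only used for illustration. It shows that stride[0] is always a multiple of stride[1], even though the simplifier fails to prove this for symbolic n.

```python
from typing import Tuple


def strides(n: int) -> Tuple[int, int]:
    """Evaluate the strides from the post for a concrete n (hypothetical helper)."""
    inner = 72                         # stride[1] = T.int64(72)
    padded = (n + 63) // 64 * 64       # n rounded up to a multiple of 64
    outer = inner * min(padded, 96)    # stride[0] = 72 * min(padded, 96)
    return outer, inner


# stride[0] has the form c * k with c = 72, so stride[0] % stride[1] is 0
# for every concrete n, which is exactly what the failing check asks for.
for n in range(1, 1025):
    outer, inner = strides(n)
    assert outer % inner == 0
```

The check in LowerOpaqueBlock works on the symbolic expression, so it needs the arithmetic analyzer to recognize the `c * expr` pattern rather than evaluate concrete values as this sketch does.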
Could you please do a code review?
There is one more bug I found, in the InjectPTXAsyncCopy pass:
dst_offset.dtype can be int64, while the dtype of PrimExpr(index_factor) defaults to int32.
This causes a dtype inconsistency when calling tir::Mul.
quick fix
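For readers who want to see the shape of the problem, here is a toy plain-Python model of it. It is not the TVM API and not the actual patch: `Expr`, `mul`, `make_const`, and the two `scale_offset_*` functions are hypothetical names, and the strict dtype check in `mul` is an assumption standing in for the one in tir::Mul. The idea is only that the constant factor should be created with the dtype of dst_offset instead of the int32 default.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Expr:
    """Toy stand-in for a typed integer expression (not a TVM class)."""
    value: int
    dtype: str


def make_const(value: int, dtype: str = "int32") -> Expr:
    # Mirrors the behaviour described above: the dtype defaults to int32.
    return Expr(value, dtype)


def mul(a: Expr, b: Expr) -> Expr:
    # Assumed to reject mixed dtypes, like the check described for tir::Mul.
    if a.dtype != b.dtype:
        raise TypeError(f"dtype mismatch: {a.dtype} vs {b.dtype}")
    return Expr(a.value * b.value, a.dtype)


def scale_offset_buggy(dst_offset: Expr, index_factor: int) -> Expr:
    # The factor silently becomes int32, so an int64 offset fails in mul().
    return mul(dst_offset, make_const(index_factor))


def scale_offset_fixed(dst_offset: Expr, index_factor: int) -> Expr:
    # Build the factor with the same dtype as the offset before multiplying.
    return mul(dst_offset, make_const(index_factor, dst_offset.dtype))
```

With an int64 offset, `scale_offset_buggy` raises a `TypeError` in this model, while `scale_offset_fixed` returns an int64 result.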