I have an implementation in my fork that introduces sugar at the parser/printer level. In the IR, local variables remain buffers.
A = T.alloc_cell("int32")
A = A + 1
T.cp_async(A.buffer.data, ...)
The parser creates a buffer with dtype int32 and shape [1], but in the parser's value table, A is recorded as A[0] (a BufferLoad). As a result, A works naturally anywhere it is used as a PrimExpr.
BufferStore is the place that needs additional handling: the parser accepts an assignment (or augmented assignment) whose left-hand side is a BufferLoad of a shape-[1] buffer and turns it into a store to that buffer.
If the user wants to access Buffer attributes, or hits any other case where the underlying buffer of A is needed, A.buffer can be used.
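To make the mechanism concrete, here is a toy model in plain Python. It is only a sketch and not the code in my fork or TVM's actual parser: the classes below are simplified stand-ins, and the names ToyParser, alloc_cell and assign are hypothetical, chosen just for illustration. It shows the value table recording A as A[0], an assignment being lowered to a store, and the buffer staying reachable through .buffer.

```python
from dataclasses import dataclass


# Simplified stand-ins for IR nodes (hypothetical, not the real TVM classes).
@dataclass(frozen=True)
class Buffer:
    name: str
    dtype: str
    shape: tuple


@dataclass(frozen=True)
class BufferLoad:
    buffer: Buffer
    indices: tuple


@dataclass(frozen=True)
class BufferStore:
    buffer: Buffer
    value: object
    indices: tuple


@dataclass(frozen=True)
class Add:
    a: object
    b: object


class ToyParser:
    """Hypothetical parser state: a value table plus the emitted statements."""

    def __init__(self):
        self.values = {}  # variable name -> expression it stands for
        self.stmts = []   # statements emitted so far

    def alloc_cell(self, name, dtype):
        # Allocate a shape-[1] buffer, but record the name as a load of
        # element 0, so the name can be used wherever an expression is expected.
        buf = Buffer(name, dtype, (1,))
        load = BufferLoad(buf, (0,))
        self.values[name] = load
        return load

    def assign(self, name, value):
        # The special case: the assignment target is a BufferLoad of a
        # shape-[1] buffer, so the assignment becomes a BufferStore.
        target = self.values[name]
        if not (isinstance(target, BufferLoad) and target.buffer.shape == (1,)):
            raise TypeError(f"{name} is not a scalar cell")
        self.stmts.append(BufferStore(target.buffer, value, target.indices))


# Models: A = T.alloc_cell("int32"); A = A + 1
parser = ToyParser()
A = parser.alloc_cell("A", "int32")
parser.assign("A", Add(A, 1))
store = parser.stmts[0]
```

After the assignment, the value table still maps A to the same BufferLoad, so later uses of A read the updated cell, and A.buffer gives the shape-[1] buffer for calls that need it.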
The primary motivation for this solution is that:
- I would rather avoid heavy solutions such as introducing more nodes into the IR.
- For backend code (such as CUDA), keeping local variables as arrays doesn’t seem to matter. If it does, you can modify the code generator; either way, it doesn’t affect other parts of the system.