Using TVM to build a compiler stack for AI ASIC accelerators

I noticed that in the TVM codebase, the different levels of memory (storage scopes) seem to be designed primarily for GPU architectures, for example (a small sketch of how they are used follows this list):

  • "global"
  • "shared"
  • "warp"
  • "local"
  • "wmma.matrix_a"
  • "wmma.matrix_b"
  • "wmma.accumulator"

The concept of threads also seems to be designed mainly for GPUs, for example (a sketch of thread binding follows this list):

  • "blockIdx.。"
  • "threadIdx.。"

Moreover, the relationship between the rank of a memory scope and the rank of the threads it sits under leads to different results in loop bound (range) inference during lowering.

I would like to ask: if TVM is used as the basis for building a compiler stack for an AI ASIC accelerator, and we want to represent the accelerator's different levels of memory (such as DDR, shared memory, and local memory) as well as its different levels of parallelism (such as Tile and Core), how should these memory levels and hardware parallel levels be expressed? Is there an officially recommended, or at least reasonable, technical route? A rough sketch of what I would like to be able to express follows.
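To make the question concrete, here is a hypothetical sketch using a TensorIR schedule. The "shared" scope and the "blockIdx.x"/"threadIdx.x" tags are only stand-ins borrowed from the GPU vocabulary, and the Tile/Core mapping is my own guess; I do not know whether reusing these names, introducing tagged scopes (e.g. something like "global.ddr"), or a different mechanism is the intended route:

```python
import tvm
from tvm import tir
from tvm.script import tir as T

@T.prim_func
def vec_add(a: T.handle, b: T.handle, c: T.handle) -> None:
    # inputs/outputs resident in DDR; "global" is used as a stand-in for DDR
    A = T.match_buffer(a, (1024,), "float32")
    B = T.match_buffer(b, (1024,), "float32")
    C = T.match_buffer(c, (1024,), "float32")
    for i in T.serial(1024):
        with T.block("C"):
            vi = T.axis.spatial(1024, i)
            C[vi] = A[vi] + B[vi]

sch = tir.Schedule(vec_add)
blk = sch.get_block("C")

# stage A into what would be the accelerator's shared on-chip memory;
# "shared" is a stand-in -- should this be a tagged scope instead?
A_stage = sch.cache_read(blk, 0, "shared")

# map the outer loop onto a Tile and the inner loop onto a Core;
# "blockIdx.x"/"threadIdx.x" are stand-ins for those hardware parallel levels
i, = sch.get_loops(blk)
tile, core = sch.split(i, factors=[None, 64])
sch.bind(tile, "blockIdx.x")
sch.bind(core, "threadIdx.x")
sch.compute_at(A_stage, tile)
print(sch.mod)
```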