Dynamic shape support with tvm.tir

Hi all, I understand that there are two kinds of dynamic shape issues:

  1. the output shape can be inferred from the input shape.

In this case, a tvm.tir.Var can be used as a symbolic dim. The symbolic dim can then be used to represent other output shapes, which can be inferred from the actual shape of the input at runtime. (P.S. I currently use TVM v0.8.0.)

This scenario can be handled by TVM.
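For example, here is a minimal sketch of what I mean with the te API (which also works on v0.8): the symbolic extent n is a tvm.tir.Var that gets bound from the actual input shape at runtime.

```python
import numpy as np
import tvm
from tvm import te

n = te.var("n")                                        # a tvm.tir.Var used as a symbolic dim
A = te.placeholder((n,), name="A", dtype="float32")
B = te.compute((n,), lambda i: A[i] * 2.0, name="B")   # output shape follows the symbolic n

s = te.create_schedule(B.op)
f = tvm.build(s, [A, B], target="llvm")

a = tvm.nd.array(np.arange(5, dtype="float32"))        # n is bound to 5 at call time
b = tvm.nd.array(np.zeros(5, dtype="float32"))
f(a, b)
```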

  2. the output shape is determined by the actual input content rather than the shape of the input.

For example, in a TensorFlow graph, a sparse tensor can be represented by three input placeholders: dense_shape (whose shape is [2]), values, and indices. I want to perform some processing on the values and then scatter them according to indices into a tensor of shape dense_shape. I only know that dense_shape has two elements, but I cannot know their specific values before runtime, so I am not sure what the shape of the output will be.
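For illustration, the semantics I want, written in plain NumPy (the doubling of values is just a placeholder for the real processing):

```python
import numpy as np

# Placeholder runtime inputs for the three placeholders above.
dense_shape = np.array([3, 4])                 # only known when the graph runs
indices = np.array([[0, 1], [2, 3]])           # one row per stored element
values = np.array([1.0, 2.0], dtype="float32")

out = np.zeros(dense_shape, dtype="float32")   # output shape is data-dependent
out[tuple(indices.T)] = values * 2.0           # "process" values, then scatter by indices
```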

I am trying to use IRBuilder to build the graph directly, expressing the same semantics as the TensorFlow graph.

Can this scenario be handled by TVM? If I use two symbolic tvm.tir.Var to represent dense_shape, how can I let TVM know that these two Vars should be assigned the actual values contained in dense_shape when building with IRBuilder?

Thanks a lot

The first issue is already supported in TVM; just use Var as you mentioned (btw, v0.8.0 is a bit out-dated; the version has been bumped to v0.14).
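For reference, the same symbolic-dim idea can be written in TVMScript against newer releases (a minimal sketch; n is a tir.Var that is bound from the input buffer's shape at runtime):

```python
import tvm
from tvm.script import tir as T

@T.prim_func
def scale(a: T.handle, b: T.handle):
    n = T.int32()                                 # symbolic dim (a tir.Var)
    A = T.match_buffer(a, (n,), "float32")
    B = T.match_buffer(b, (n,), "float32")
    for i in range(n):
        with T.block("scale"):
            vi = T.axis.spatial(n, i)
            B[vi] = A[vi] * T.float32(2)

lib = tvm.build(scale, target="llvm")
```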

The second issue is about data-dependent operators such as nonzero and SpGEMM. My opinion is that it should be handled at a higher-level IR such as Relax.

The generic workflow of a data-dependent operator could be:

  1. Call an operator (f_estimate) to estimate the memory requirement of the output buffer.
  2. Allocate enough memory for the output buffer.
  3. Call the compute operator (f_compute) to perform actual computation.

After such a 3-step decomposition, each operator’s output buffer size is determined before its execution, and both f_estimate and f_compute can be described in TensorIR.
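To make the decomposition concrete, here is a plain-Python sketch of the pattern (run_data_dependent_op and the two helpers are hypothetical illustrations, not an existing TVM API):

```python
import numpy as np

def run_data_dependent_op(f_estimate, f_compute, *inputs):
    # 1. Estimate an upper bound on the number of output elements.
    max_out = f_estimate(*inputs)
    # 2. Allocate the output buffer up front, so its size is known before execution.
    out = np.empty(max_out, dtype="float32")
    # 3. Run the actual computation; it reports how many elements are valid.
    valid = f_compute(out, *inputs)
    return out[:valid]

# Example: a nonzero-style operator whose output length depends on the data.
x = np.array([0.0, 1.5, 0.0, 2.0], dtype="float32")
f_estimate = lambda x: x.size            # worst case: every element is nonzero
def f_compute(out, x):
    nz = x[x != 0.0]
    out[:nz.size] = nz
    return nz.size

print(run_data_dependent_op(f_estimate, f_compute, x))   # -> [1.5 2.0]
```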

One example is cuSPARSE’s cusparseSpGEMM operator, which exposes three APIs:

  1. cusparseSpGEMM_workEstimation
  2. cusparseSpGEMM_estimateMemory
  3. cusparseSpGEMM_compute

They are responsible for estimating the buffer size and performing the actual computation, respectively. We can design a construct such as call_tir_data_dependent in Relax for such a workflow:

output = R.call_tir_data_dependent((f_estimate, f_compute), args, ...)