Some thoughts about TVMScript

Hi,

I have recently been studying TVMScript and a few other JIT languages, which are similar in some ways and different in others. Here are a few things I think could help the further development of TVMScript.

  1. At present, TVMScript does not support a variable number of dimensions (ndim), which makes some programs laborious to write. Taichi uses templates to support this (refer to [Metaprogramming | Taichi Docs](https://meta programming)).
@ti.kernel
def copy_1D(x: ti.template(), y: ti.template()):
    for i in x:
        y[i] = x[i]

@ti.kernel
def copy_2d(x: ti.template(), y: ti.template()):
    for i, j in x:
        y[i, j] = x[i, j]

@ti.kernel
def copy_3d(x: ti.template(), y: ti.template()):
    for i, j, k in x:
        y[i, j, k] = x[i, j, k]

# Kernels listed above can be unified into one kernel using `ti.grouped`:
@ti.kernel
def copy(x: ti.template(), y: ti.template()):
    for I in ti.grouped(y):
        # I is a vector with dimensionality same to y
        # If y is 0D, then I = ti.Vector([]), which is equivalent to `None` used in x[I]
        # If y is 1D, then I = ti.Vector([i])
        # If y is 2D, then I = ti.Vector([i, j])
        # If y is 3D, then I = ti.Vector([i, j, k])
        # ...
        y[I] = x[I]

If TVMScript supported something similar, it would be more convenient and save users a lot of time and effort.
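To make the request concrete, here is a minimal plain-Python sketch of a rank-generic copy. It is not Taichi or TVMScript code: nested lists stand in for tensors, and the helper names (`shape_of`, `get_at`, `set_at`, `copy_nd`) are made up for illustration. It only shows the idea of looping over one grouped index instead of writing one kernel per rank.

```python
from itertools import product


def shape_of(nested):
    """Shape of a non-empty rectangular nested list, e.g. [[1, 2]] -> (1, 2)."""
    shape = []
    while isinstance(nested, list):
        shape.append(len(nested))
        nested = nested[0]
    return tuple(shape)


def get_at(nested, index):
    """Read the element addressed by a tuple index of any length."""
    for i in index:
        nested = nested[i]
    return nested


def set_at(nested, index, value):
    """Write the element addressed by a tuple index of length >= 1."""
    *outer, last = index
    for i in outer:
        nested = nested[i]
    nested[last] = value


def copy_nd(src, dst):
    """Copy src into dst with one loop over a grouped index (rank >= 1)."""
    for index in product(*(range(n) for n in shape_of(dst))):
        set_at(dst, index, get_at(src, index))


x = [[1, 2, 3], [4, 5, 6]]
y = [[0, 0, 0], [0, 0, 0]]
copy_nd(x, y)
print(y)  # [[1, 2, 3], [4, 5, 6]]
```

The same `copy_nd` works unchanged for 1D or 3D inputs, which is the property the `ti.grouped` kernel above provides.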

  2. Compared with Numba, Taichi, etc., TVMScript looks somewhat complicated. A new user needs to learn new concepts such as blocks and iterator domains, which are not easy to understand. Can we simplify these? For example, if the iterator domain can be inferred from the user’s program, then users would not need to specify it. Explicit block declarations would then no longer be mandatory for scheduling either (we could create blocks with “blockize” and generate a generic schedule), and could be treated as an advanced feature. This would make TVMScript more Pythonic and user-friendly.

  3. I think it would be better if TVMScript supported Python tuples, for example, M, N = a.shape.
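As a rough illustration of the iterator-domain idea in the second point, the sketch below classifies loop variables from a single statement using Python's standard `ast` module. The heuristic is an assumption of this sketch, not how TVM does it: a loop variable that indexes the written buffer is treated as spatial ("S"), and any other loop variable as a reduction axis ("R"). The function name `infer_axis_kinds` is hypothetical.

```python
import ast


def infer_axis_kinds(stmt, loop_vars):
    """Return a string such as "SSR", one letter per loop variable.

    Assumed heuristic: variables used in the indices of the written
    buffer are spatial ("S"); all other loop variables are reductions ("R").
    """
    node = ast.parse(stmt).body[0]
    if isinstance(node, ast.Assign):
        target = node.targets[0]
    elif isinstance(node, ast.AugAssign):
        target = node.target
    else:
        raise ValueError("expected an assignment statement")
    if not isinstance(target, ast.Subscript):
        raise ValueError("expected a buffer store such as C[i, j] = ...")
    # Collect every name that appears inside the store indices.
    written = {n.id for n in ast.walk(target.slice) if isinstance(n, ast.Name)}
    return "".join("S" if v in written else "R" for v in loop_vars)


print(infer_axis_kinds("C[i, j] = C[i, j] + A[i, k] * B[k, j]", ["i", "j", "k"]))  # SSR
print(infer_axis_kinds("B[i, j] = A[i, j] * 2", ["i", "j"]))  # SS
```

The resulting string has the same shape as the kind string that `T.axis.remap` takes, so a frontend could in principle fill it in for simple workloads and leave explicit declarations for the cases where the heuristic is not enough.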

@junrushao @Hzfengsy @masahi

yeah, I agree 100% :slight_smile:

  • On point 1, variable ndim support is going to be part of the TVMScript unified parser by @yelite and @cyx. I’m sure it’s doable given that our POC is ready, but we need some extra time to fill in the actual code :slightly_smiling_face:
  • On point 2, I’m not sure what the best approach is, given that the core idea of TIR itself is block isolation. One potential solution is to refer to @tqchen’s upcoming summer tutorials; another could be to introduce more syntactic sugar. Please feel free to suggest ideas and we will be more than happy to adopt them!
  • On point 3, yes, it’s going to be supported by the TVMScript unified parser!

I’m +1 for syntactic sugar or anything else that can hide the details of TIR mechanics that new users shouldn’t need to be concerned about. Other Python-based DSLs like Taichi and Triton certainly feel more “pythonic”. Also, I have always thought that T.reads(...), T.writes(...), etc. should be something that can be inferred automatically from the program text.
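To show what inferring reads and writes from the program text could look like, here is a toy sketch built on Python's standard `ast` module. It is an assumption-laden illustration rather than TVM's actual detection logic: it only lists the buffer accesses of one statement as strings, whereas real region inference has to reason about index ranges. The name `infer_accesses` is hypothetical.

```python
import ast


def infer_accesses(stmt):
    """Return (reads, writes) as sorted lists of access strings for one statement."""
    tree = ast.parse(stmt)
    reads, writes = set(), set()
    for node in ast.walk(tree):
        if isinstance(node, ast.AugAssign) and isinstance(node.target, ast.Subscript):
            # `C[i] += ...` reads its target as well as writing it.
            reads.add(ast.get_source_segment(stmt, node.target))
        if isinstance(node, ast.Subscript):
            text = ast.get_source_segment(stmt, node)
            if isinstance(node.ctx, ast.Store):
                writes.add(text)
            else:
                reads.add(text)
    return sorted(reads), sorted(writes)


reads, writes = infer_accesses("C[vi, vj] = C[vi, vj] + A[vi, vk] * B[vj, vk]")
print(reads)   # ['A[vi, vk]', 'B[vj, vk]', 'C[vi, vj]']
print(writes)  # ['C[vi, vj]']
```

For simple statements like this one, the access lists are fully determined by the source text, which is why explicit annotations feel redundant there.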

We are always trying to hide the details and introduce sugar for the script syntax. Any reasonable sugar is welcome.

I’m not sure if TVMScript should be the interface for new users. TVMScript is not only an interface language but also a one-to-one mapping of TIR, the whole data structure. In that sense, TVMScript is comparable to the textual syntax of LLVM IR rather than to a source language like C++.

For beginner users, TE compute may be a better choice.

T.reads and T.writes can be auto-detected by now for simple workloads :slight_smile:

Actually, I also agree with this. I’m quite happy with the current state of TVMScript as something that is primarily generated (via some TIR pass or meta-programming trick), rather than written by hand.

But recently I’ve been seeing more activity around promoting TVMScript as a gateway to using TVM for, say, PyTorch users (the “Unity” vision, etc.). I assume @lightzhan and others have a similar motivation for TVMScript.

Maybe we should have a clearer distinction in the roles of TVMScript as (1) interface for both new and experienced users and (2) data structure representation? Currently, it is kind of a mix of both, and the “internal” aspect of (2) is leaking to (1). I’d love to see TVMScript becoming more pythonic toward (1).

I agree about the two goals (1) and (2).

We started from goal (2) as a printable state, and it is important to keep this perspective for productive development and transformation. Having (2) helps us not only at input time but also during transformation.

We have started to see more activity towards goal (1), which is great. Note that goal (1) does not conflict with (2): we mainly need improvements on the input side, and can still make sure the printed output contains the details.

A more pythonic input (1) would help more people to be able to construct those workloads more easily. In the meantime, the ability to have more fine-grained control (e.g. the ability to explicitly annotate block for perf optimization) when needed would help more people to be able to leverage TVM to solve their needs.

I’m not sure if TVMScript should be the recommended path for new users (it could be dependent on use case, for example), but I think it is a very important piece in defining the API between the core compiler and the code-generator/runtime. I strongly believe it should be easy for someone working at that level to author TVMScript to build test cases, specify the optimizations they want TVM to perform, and manually implement layers via e.g. Relax.

I also really like the idea that one could bypass most of the compiler stack and provide a test case written in TVMScript that demonstrates “here is how you achieve good performance on this codegen/hardware.” What do folks think about that?

One thing is that this is a very AOT-centric view, since that’s the only place where TIR fully specifies the program. For VM users, it’s harder to rely on TIR for purposes like this. Moving towards unifying the VM and AOT executors via translating VM instructions into TIR would remove this limitation.

cc @kparzysz @mousius @manupa-arm