Implementing AOT in TVM

Thanks everyone for the great discussion so far, and thanks @giuseros and others for bringing in the initial AOT PoC. I finally got time to look into the proposed changes; this is great work.

My main comments so far concern interface design and how to make things consistent with the overall architecture.

Specifically, it would be great to think about the general API design and consolidation. In particular, we should decouple the implementation of the API (AOT vs. interpreter-based) from the design of the API interface.

Ideally, a user should follow a similar path for compiling (except for a different flag), exporting, and then loading an AOT module.

Right now there are a few variants of ways to expose the model generated by AOT:

  • W0: Through the runtime.Module and PackedFunc interface. The executor is a runtime.Module that contains three packed functions (set/get/run). This is in alignment with the Module based runtime interface mentioned in the previous
  • W1a: A standardized C API for graph/AOT model execution, available only in the C runtime.
  • W1b: A standardized C API runtime that wraps the module-based API (W0) and exposes a higher-level API to the user.
  • W2: A separate C API that allows direct invocation of the generated model, specifically for AOT.

Going from W2 => W1 => W0, different levels of standardization are involved.

For example, if AOT generates code that obeys the W0 convention, then we can naturally test the result generated by AOT directly through Python and run the code through RPC using the current infrastructure. The AOT tutorial can then sit directly inside the uTVM tutorials via Python.
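To make the W0 point concrete, here is a minimal sketch in plain Python. It does not use any TVM API: `ToyModule`, `make_graph_executor`, `make_aot_executor`, and `run_model` are all hypothetical names invented for illustration, and the "model" is just `(x + 1) * 2`. The only thing it tries to show is that when both backends expose the same set/run/get functions through a module, the driver and test code does not need to know which backend it is talking to.

```python
from typing import Callable, Dict


class ToyModule:
    """Hypothetical stand-in for a runtime module: a bag of named functions."""

    def __init__(self, functions: Dict[str, Callable]):
        self._functions = functions

    def get_function(self, name: str) -> Callable:
        return self._functions[name]


def make_graph_executor() -> ToyModule:
    """Interpreter-style backend: walks a list of ops at run time."""
    state = {"inputs": {}, "outputs": []}
    ops = [lambda x: x + 1, lambda x: x * 2]

    def set_input(name, value):
        state["inputs"][name] = value

    def run():
        x = state["inputs"]["data"]
        for op in ops:
            x = op(x)
        state["outputs"] = [x]

    def get_output(index):
        return state["outputs"][index]

    return ToyModule({"set_input": set_input, "run": run, "get_output": get_output})


def make_aot_executor() -> ToyModule:
    """AOT-style backend: the op sequence is baked into a single function."""
    state = {"inputs": {}, "outputs": []}

    def set_input(name, value):
        state["inputs"][name] = value

    def run():
        # Same computation as the graph backend, but with no op list to walk.
        state["outputs"] = [(state["inputs"]["data"] + 1) * 2]

    def get_output(index):
        return state["outputs"][index]

    return ToyModule({"set_input": set_input, "run": run, "get_output": get_output})


def run_model(module: ToyModule, value):
    """Backend-agnostic driver: only relies on the set/run/get convention."""
    module.get_function("set_input")("data", value)
    module.get_function("run")()
    return module.get_function("get_output")(0)
```

With this shape, the same `run_model` call can be pointed at either executor, which is the property that would let existing Python tests and RPC tooling be reused for AOT output.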

W1a and W1b are similar to each other (from the users’ PoV), except that in the case of W1b, W0 is the first-class citizen and the common part. W1a models things the other way around. Finally, W2 means the developers need to be aware of the backend that is being used.

Given the importance of the embedded setting, I think it is useful to have some form of W1 (a or b) that gives users a common set of conventions for the C runtime. However, such an API ideally should not be AOT-specific, but should instead be the “official” way to use all generated results through the C API.
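As a rough illustration of the W1b idea (a higher-level API layered on top of W0), here is a small Python sketch. Everything in it is an assumption made for the example: `make_toy_module`, `Model`, and `infer` are hypothetical names, the module is modelled as a plain dict of functions, and the real proposal would be a C API rather than Python. The point is only that the wrapper depends on the set/run/get convention and nothing else, so it is not tied to AOT.

```python
from typing import Callable, Dict


def make_toy_module(scale: int) -> Dict[str, Callable]:
    """Hypothetical W0-style module: a dict of set_input/run/get_output."""
    state = {}

    def set_input(name, value):
        state[name] = value

    def run():
        # Toy "model": multiply the input named "data" by a constant.
        state["out0"] = state["data"] * scale

    def get_output(index):
        return state[f"out{index}"]

    return {"set_input": set_input, "run": run, "get_output": get_output}


class Model:
    """Hypothetical W1b-style wrapper with fixed, backend-agnostic entry points."""

    def __init__(self, module: Dict[str, Callable]):
        # Look the functions up once; the wrapper never inspects the backend.
        self._set_input = module["set_input"]
        self._run = module["run"]
        self._get_output = module["get_output"]

    def infer(self, **inputs):
        """Set all named inputs, run once, and return the first output."""
        for name, value in inputs.items():
            self._set_input(name, value)
        self._run()
        return self._get_output(0)
```

A user would then write something like `Model(make_toy_module(2)).infer(data=5)` and never touch the three underlying functions directly; swapping in a module produced by a different executor would not change that call.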

I also think it would be useful to always start by thinking about W0 support. Although W0 introduces an indirection (e.g. the run function could be a direct C API instead of a PackedFunc), we already use PackedFunc for the per-operator functions, so using PackedFunc for the general case won’t add much overhead, and it would enable the benefits mentioned above.

Would love to get everyone’s take on (1) the engineering feasibility/overhead of the Ws, and (2) the preferred interface choice.