Implementing AOT in TVM

@giuseros thanks for your reply! I think this approach makes sense to me. I want to clarify a few more things.

First, we have unfortunately overloaded the word “runtime.” There are 2 different families of runtimes:

  • C and C++ runtime – describes the implementation of c_runtime_api.h and c_backend_api.h.
  • graph, vm, aot runtime – describes how the operator functions are invoked in a model. Eventually this could be stated, similarly to the above, as “describes the implementation of the module-based model interface.” These should really be called GraphExecutor or something, but that’s another topic.

I am actually going to send an RFC to propose we rename GraphRuntime and family to e.g. GraphExecutor this week.

For the AOT runtime, I agree we do not need JSON parsing or any of the underlying facilities it brings. However, given that you’re planning to reuse the C-runtime memory allocator and the interfaces in include/tvm/crt/platform.h, I think it would be great to continue using --runtime=c in the target string, and express the AOT choice either as an additional flag or as another tvm.relay.build() argument. I don’t know that the (graph) runtime specification belongs in the Target string.
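To make the two options concrete, here is an illustration of where the executor choice could live. The flag name --executor and the executor keyword argument below are hypothetical, not an agreed-upon interface, and build() is a stand-in rather than the real tvm.relay.build():

```python
# Illustration only: two places the executor choice could live.
# The flag and argument names here are hypothetical.

# (a) packed into the Target string, which this post argues against:
target_string = "c --runtime=c --executor=aot"

# (b) kept out of the Target, as a separate build argument
#     (sketch, not the real tvm.relay.build signature):
def build(mod, target, executor="graph"):
    assert executor in ("graph", "aot", "vm")
    return {"target": target, "executor": executor}

artifacts = build(mod=None, target="c --runtime=c", executor="aot")
```

Option (b) keeps the Target string describing only the code-generation target, while the executor becomes an orthogonal build-time choice.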

> The main point, as you correctly spotted, is to understand how to populate the resource_handle in the call to the run_func.

Could you say why you need this set? Currently it’s always NULL. I think it would be great to develop a pattern to use it, but right now the most natural pattern is to set it to the TVMModule instance that contains the operator function.
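To sketch that pattern: the backend packed-function signature in c_backend_api.h passes resource_handle as its final argument. The following is a plain-Python model of the idea, not the actual C API; ModuleContext and the reduced my_operator signature are stand-ins for illustration only:

```python
# Conceptual sketch (plain Python, not the TVM C API) of threading a
# module instance through resource_handle. In the C runtime the real
# signature is:
#   int f(TVMValue* args, int* type_codes, int num_args,
#         TVMValue* out_ret, int* out_ret_tcode, void* resource_handle);
# here it is reduced to the parts the example needs.

class ModuleContext:
    """Hypothetical stand-in for the TVMModule that owns the operator."""
    def __init__(self, name):
        self.name = name
        self.workspace = {}

def my_operator(args, type_codes, resource_handle):
    # resource_handle is currently always NULL in TVM; here it carries
    # the owning module, the "most natural pattern" described above.
    ctx = resource_handle
    ctx.workspace["last_call"] = args
    return args[0] + args[1]

mod = ModuleContext("aot_module")
result = my_operator([40, 2], ["int", "int"], resource_handle=mod)
```

The point of the pattern is that operator functions gain access to per-module state (workspaces, imported modules) without any global lookup.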

> Since we are getting rid of the JSON, I don’t think we need this mapping any more.

A couple of thoughts:

  1. It would be nice to keep the logic for assembling PackedFunc args and handling return values in tir.call_packed. This way if we change the interface, we don’t have to look in too many places.
  2. Mainly, to simplify the compiler, I’m trying to make sure we implement the same conceptual TIR on both the C++ and C runtimes. In the C++ runtime, we use PackedFunc as a “calling convention” to avoid effectively hardcoding C in the various code generators. For instance, when dispatching to a compute library (e.g. on CUDA), a PackedFunc serves as a sort of adapter or glue layer between TVM and CUDA.
  3. In the C++ runtime, not all PackedFunc live in the same runtime::Module. So, we need the string lookup to do a sort of “late-binding.” In the C runtime, you’re right that the primary use case for this late-binding is with the RPC server. Perhaps we should just change CodeGenC and CodeGenLLVM to implement tir.call_packed when targeting C runtime by calling the symbol directly with the PackedFunc API instead of invoking TVMBackendGetFuncFromEnv. Would this address your concerns?
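The contrast in point 3 can be sketched as follows. This is a toy model in plain Python, not TVM source: the registry dict plays the role of the environment table that TVMBackendGetFuncFromEnv consults, and fused_add is a hypothetical operator symbol:

```python
# Toy contrast between the two lowerings of tir.call_packed.

def fused_add(args):            # hypothetical operator symbol
    return args[0] + args[1]

# --- Late binding: look the function up by name at call time ---
# (the role TVMBackendGetFuncFromEnv plays in the C runtime).
registry = {"fused_add": fused_add}

def call_packed_late_bound(name, args):
    func = registry[name]       # string lookup at call time
    return func(args)

# --- Direct call: codegen emits a call to the symbol itself, ---
# resolved at link time, with no lookup in the generated code.
def call_packed_direct(args):
    return fused_add(args)

late = call_packed_late_bound("fused_add", [1, 2])
direct = call_packed_direct([1, 2])
```

The direct form is what CodeGenC/CodeGenLLVM would emit for the C runtime under this proposal; the late-bound form remains necessary when the callee may live in a different runtime::Module.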

> The main thing here is to move the control code inside the library, and deliver the minimal API to use it.

Ok, that makes sense.

> I modified the strawman image (in the RFC) with a proper self-contained example to show the overall flow. Please let me know if that explains things more clearly.

Yeah this makes sense. Sounds good to me.

> I was thinking to have a separate module AOTModule that will import the different modules within it.

That also makes sense. I think my question was poorly worded before. Just confirming: similar to MetadataModule, this would be lib in the return value from graph_json, lib, params = tvm.relay.build()? At present those things are wrapped in GraphRuntimeFactoryModule, and we’ll need to address that. I have another RFC forthcoming in a week or so to discuss changes there, designed to support µTVM and accelerator use cases.
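For reference, the import relationship being described can be modeled like this. Again a toy sketch, not TVM code: the Module class and its method names here are illustrative stand-ins mirroring how runtime::Module imports work, where function lookup falls through to imported submodules:

```python
# Toy model of "AOTModule imports the operator modules within it."

class Module:
    """Illustrative stand-in for runtime::Module."""
    def __init__(self, name, funcs=None):
        self.name = name
        self.funcs = funcs or {}
        self.imports = []

    def import_module(self, other):
        self.imports.append(other)

    def get_function(self, name):
        # Search this module first, then imported modules depth-first.
        if name in self.funcs:
            return self.funcs[name]
        for m in self.imports:
            f = m.get_function(name)
            if f is not None:
                return f
        return None

ops = Module("ops", {"fused_add": lambda a, b: a + b})
aot = Module("AOTModule")
aot.import_module(ops)
run = aot.get_function("fused_add")
```

Under this structure, whatever wraps the build result (today GraphRuntimeFactoryModule) only needs to hand callers the top-level module; lookups into the imported operator modules happen transparently.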