Hi @areusch ,
Thanks for your comments! Before replying in-line, let me first clarify two things:
- We are not changing the C runtime or the C++ runtime. We are creating a parallel runtime, namely AOT, which will live in `src/runtime/aot`. The user will specify `--runtime=aot` to access this runtime.
- We are mainly targeting embedded scenarios, for now. Indeed, while for other environments AOT is nice-to-have, for embedded platforms it is a must-have.
That said, let me reply to your points.
> Would it be possible to do a first cut omitting accelerator support?
Yes, this is fine. We can omit the boolean value for now and work on this at a later stage. The main point, as you correctly spotted, is to understand how to populate the `resource_handle` in the call to the `run_func`.
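For reference, this is (as far as I can tell) the packed C function signature from `include/tvm/runtime/c_backend_api.h`; the `resource_handle` we need to populate is its last argument:

```c
// From include/tvm/runtime/c_backend_api.h: the signature of the functions
// emitted by the codegen. resource_handle is the slot we still have to
// decide how to fill for accelerator support.
typedef int (*TVMBackendPackedCFunc)(TVMValue* args, int* type_codes, int num_args,
                                     TVMValue* out_ret_value, int* out_ret_tcode,
                                     void* resource_handle);
```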
> It almost seems like this could be orthogonal to AOT: you could create an AOT module with `--system-lib`, but you don't have to.
Yes, this is correct, but since we are trying not to use packed calls to the functions, I am wondering why we would need to add it to the library. In other words, given that we use `tir.call_extern`, why do you think we need a mapping [string -> function pointer] in the library?
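To make the difference concrete, here is a rough sketch of what the two lowering strategies produce (an illustrative fragment, not actual codegen output; `mod_node` and the subcall buffers would be set up by the surrounding generated code):

```c
TVMValue subcall_values[3];
int subcall_tcodes[3];
TVMValue subcall_ret_value;
int subcall_ret_tcode;
void* mod_node = NULL;  // provided by the runtime in real generated code

// tir.call_packed: the function is resolved by name at runtime through the
// module's function table -- this is what needs the [string -> pointer] map.
TVMFunctionHandle fh;
TVMBackendGetFuncFromEnv(mod_node, "fused_nn_contrib_conv2d_NCHWc_right_shift_cast", &fh);
TVMFuncCall(fh, subcall_values, subcall_tcodes, 3, &subcall_ret_value, &subcall_ret_tcode);

// tir.call_extern: a direct symbol reference, resolved by the linker at build
// time, so no runtime lookup table is needed.
fused_nn_contrib_conv2d_NCHWc_right_shift_cast(subcall_values, subcall_tcodes, 3,
                                               &subcall_ret_value, &subcall_ret_tcode, NULL);
```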
> The FuncRegistry in the C runtime is meant to replace this tree lookup with a single function table. I think you're right that it's more important in the GraphRuntime or RPC case, but considering we would like to also target the C++ runtime, perhaps it would be good to start with `tir.call_packed`, and we could consider a follow-on to move to `tir.call_extern` for the C runtime use case, if needed?
From what I understood, the `CodegenC` path is used if we specify the `c` back-end, independently of the runtime. Also independently of the runtime, all the operators will live in the same library. The only difference when we specify `--runtime=aot` is that we will have an additional function, namely `run_func`, which contains a series of calls like:
```c
rv = fused_nn_contrib_conv2d_NCHWc_right_shift_cast(subcall_values, subcall_tcodes, 3,
                                                    &subcall_ret_value, &subcall_ret_tcode, NULL);
```
This will compile fine, since `fused_nn_contrib_conv2d_NCHWc_right_shift_cast` will live in the same translation unit, i.e., lib.o or lib.c (I am trying to avoid the word "module" here so as not to create confusion with the TVM modules). To be absolutely clear, let's consider this code:
```python
lib = tvm.relay.build(mod, target, params=params)
lib.lib.save('lib.o')  # lib.lib.save('lib.c') if the codegen target is c
```
If I execute `nm lib.o`, I see that the functions are all there. I understand that in the JSON case we need a way to translate a string from the JSON into a function call in the library, and to achieve that translation (without `dlopen`) we need a function table embedded in the library. Since we are getting rid of the JSON, I don't think we need this mapping any more.
As for the RPC case: the main AOT requirement for now is deployability. To tune a given board we will stick with the C runtime, at least initially.
> I like that this runtime looks quite minimal. However, there is a separate Module-based model runtime interface we should consider as well. In particular, this interface splits apart the setup (e.g. memory allocation) and run phases of inference. It would be great to see if we could implement this interface with AOT, either here or with runtime shims; or, whether changes to that interface would make that possible. I do think in particular that avoiding the need to copy data to SetInput is a good thing, and that may not be contained within that interface. However, some broader changes could be made when implementing it in C, particularly around memory management.
I did read that RFC, and this was my reasoning:
- We are trying here to implement the basics of AOT. The main part will be in the code generation. As for the interface, we thought we would propose a very minimal one within a shim layer, so that the user can easily deploy the network on an embedded device (see the sketch after this list).
- Once we get this right, we can implement more complex interfaces within `aot_runtime.h`, and those interfaces can be offered to the user in the form of the Module-based interface or any other interface. The main thing here is to move the control code inside the library and deliver the minimal API needed to use it.
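Just to give an idea of the scale of the shim, a minimal sketch (every name below is a hypothetical placeholder, not a committed interface):

```c
// aot_runtime.h -- hypothetical minimal shim over the generated run_func.
#include <stddef.h>
#include <stdint.h>
#include <tvm/runtime/c_runtime_api.h>  // for DLTensor

// Point the runtime at a user-provided workspace buffer.
int tvm_aot_init(uint8_t* workspace, size_t workspace_size_bytes);

// Run one inference. Inputs/outputs are caller-owned DLTensors, so no copy
// into a SetInput-style staging area is required.
int tvm_aot_run(DLTensor* inputs, size_t num_inputs,
                DLTensor* outputs, size_t num_outputs);
```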
> Could you speak a bit more about how you want to handle memory allocation in the initial implementation? Which DLTensor would need to be allocated from within the generated code? Who defines these constants?
Sure. For now we will essentially be using the CRT memory allocator, but as a copy living inside `src/runtime/aot/`. This is because the main scope of this RFC is to bring in AoT compilation; later on we can take further steps to improve on it or provide "helper" allocators that are better than what is in the CRT.
So there will be a preallocated, statically initialized buffer (whose size can default to some value, but can be changed manually by the user), and functions like `TVMBackendAllocWorkspace` will work on that buffer. The constants I mention concern the size of this buffer, which can be preset or provided directly by the user. At a later date this will need to go away, as the compiler should automatically compute the total static size of the buffer it needs.
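To illustrate the idea, a toy sketch (the buffer name and the size macro are made up, and a real allocator would track allocations so that `TVMBackendFreeWorkspace` can release them; only the `TVMBackendAllocWorkspace` signature is taken from `c_backend_api.h`):

```c
#include <stddef.h>
#include <stdint.h>

// Hypothetical compile-time default; the user can override it at build time.
#ifndef TVM_AOT_WORKSPACE_SIZE_BYTES
#define TVM_AOT_WORKSPACE_SIZE_BYTES (64 * 1024)
#endif

// Statically allocated pool backing all workspace requests.
static uint8_t g_aot_workspace[TVM_AOT_WORKSPACE_SIZE_BYTES];
static size_t g_aot_workspace_used = 0;

// Toy bump allocator standing in for the CRT-derived one (no alignment, no
// free-list), just to show where the generated code's requests would land.
void* TVMBackendAllocWorkspace(int device_type, int device_id, uint64_t nbytes,
                               int dtype_code_hint, int dtype_bits_hint) {
  (void)device_type; (void)device_id; (void)dtype_code_hint; (void)dtype_bits_hint;
  if (g_aot_workspace_used + nbytes > TVM_AOT_WORKSPACE_SIZE_BYTES) return NULL;
  void* ptr = &g_aot_workspace[g_aot_workspace_used];
  g_aot_workspace_used += (size_t)nbytes;
  return ptr;
}
```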
As for the DLTensors:
- For the intermediate tensors, we internally allocate through `TVMBackendAllocWorkspace` and then wrap the allocated memory in DLTensors (in the same spirit as lower_builtin.h).
- For the I/O tensors, the user initializes the input/output buffers and wraps them in DLTensors with a call to `TVMInitializeDLTensor` (a sketch of this case follows below).
- For the params, we are linking them in. So we would call `_lookup_linked_param` (still through an extern call) to get hold of the parameters.
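For the I/O case, a minimal sketch of the wrapping, done by hand against the DLPack struct (the shapes are made up, the field is named `ctx` rather than `device` in older DLPack headers, and `TVMInitializeDLTensor` would essentially be a convenience wrapper around this):

```c
#include <dlpack/dlpack.h>

// User-owned input buffer, e.g. filled from a sensor.
static float input_data[1 * 3 * 224 * 224];
static int64_t input_shape[] = {1, 3, 224, 224};

// Wrap the raw buffer in a DLTensor without copying it.
static DLTensor input = {
    .data = input_data,
    .device = {kDLCPU, 0},
    .ndim = 4,
    .dtype = {kDLFloat, 32, 1},  // float32, one lane
    .shape = input_shape,
    .strides = NULL,             // NULL means compact row-major layout
    .byte_offset = 0,
};
```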
I modified the strawman image (in the RFC) with a proper self-contained example to show the overall flow. Please let me know if that explains things more clearly.
> Would the generated TIR AOT module be the runtime::Module instance (either LLVMModule or CSourceModule) returned from tvm.relay.build?
I was thinking of having a separate module, AOTModule, which will import the different modules within it. This is in the same spirit as the Metadata module: just as we use the metadata module to share the Function Registry among the different TVM modules, we will use the AOTModule to share the `run_func` among the different TVM modules.