[Unity][Tutorial] TVM Unity BYOC

As we presented at TVMCon ’23, the compilation flow in TVM Unity is modular and composable, and this includes BYOC.

This tutorial walks through how BYOC offloading works in the Unity pass infrastructure and how it composes with other passes, such as lowering and the MetaSchedule tuning pass.
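At a high level, the offloading part of the flow composes three Relax passes. Here is a minimal sketch (the helper name `offload_to_byoc` is mine; the pattern list would come from your backend, e.g. via `tvm.relax.backend.get_patterns_with_prefix`):

```python
import tvm
from tvm import relax

def offload_to_byoc(mod: tvm.IRModule, patterns) -> tvm.IRModule:
    """Sketch of the BYOC offloading pipeline; `patterns` is a list of
    (name, pattern) pairs or FusionPattern objects for your backend."""
    seq = tvm.transform.Sequential(
        [
            # Outline regions matching the backend patterns as Composite functions
            relax.transform.FuseOpsByPattern(patterns),
            # Merge neighboring composite regions headed to the same codegen
            relax.transform.MergeCompositeFunctions(),
            # Invoke the external codegen; the resulting runtime modules are
            # attached to the module as the "external_mods" attribute
            relax.transform.RunCodegen(),
        ]
    )
    return seq(mod)
```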

Hope this helps, and please follow up on this thread if you have any questions or feedback. Thank you! :slight_smile:


Great! BTW, when will ‘relax’ be merged into the main branch?

Check out Establish TVM Unity Branch for the background. More tutorials will be published, and you can play with the code in the unity branch at the moment.


Please do post follow-up questions here (there is a unity tag in the forum). I am sure the community would love to hear more feedback and bring the discussions together.

Great tutorial! I have watched the TVMCon ’23 recording for this tutorial.

My situation is: I have my Python Runtime that can call my HW primitives library from the host CPU to a PCIe accelerator card. I can follow the ‘FuseOpsByPattern’ way to map a Relax subgraph (conv_relu, for example) to one of my primitives. I want to compile the CPU Runtime together with my Python Runtime into one executable, so I can accelerate certain subgraphs within each layer of a multi-layer neural network model.
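For reference, the pattern side of what I do looks roughly like this (a minimal sketch with toy shapes; ‘my_backend.conv_relu’ is just a placeholder name for my codegen):

```python
import tvm
from tvm import relax
from tvm.relax.dpl import is_op, wildcard
from tvm.script import ir as I, relax as R

# Toy module with a conv2d + relu chain (shapes are arbitrary).
@I.ir_module
class Model:
    @R.function
    def main(
        data: R.Tensor((1, 16, 32, 32), "float32"),
        weight: R.Tensor((16, 16, 3, 3), "float32"),
    ):
        with R.dataflow():
            conv = R.nn.conv2d(data, weight)
            out = R.nn.relu(conv)
            R.output(out)
        return out

# Pattern: conv2d followed by relu.
conv_pat = is_op("relax.nn.conv2d")(wildcard(), wildcard())
conv_relu_pat = is_op("relax.nn.relu")(conv_pat)

# Outline the matched region as a Composite function for "my_backend".
mod = relax.transform.FuseOpsByPattern([("my_backend.conv_relu", conv_relu_pat)])(Model)
mod.show()
```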

My question is: how can I call my Python Runtime from the Relax VM (possibly multiple times)?

My assumption is: I need to register my Python Runtime via MyMod.attrs[‘external_mods’]. This part is missing from the tutorial, since TensorRT is already a registered BYOC runtime.

Is it possible to do the registration the way the UMA tutorial does, i.e. tvm/apps/uma/_template/backend.py? That suits my situation since I also have a C Runtime API for my accelerator.

Or can I register my Python Runtime API without re-compiling TVM, as shown in ‘Registering Runtime Function’ in Chapter 4 of the MLC.ai course?
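Concretely, I am thinking of something along these lines, based on my reading of that chapter (a rough sketch; the registered name ‘my_accel.conv_relu’, the shapes, and the zero-fill body are placeholders for my real runtime call):

```python
import numpy as np
import tvm
from tvm import relax
from tvm.script import ir as I, relax as R

# Register a Python function in TVM's global registry at runtime
# (no TVM recompilation); "my_accel.conv_relu" is a placeholder name.
@tvm.register_func("my_accel.conv_relu", override=True)
def conv_relu(data, weight, out):
    # Here I would call into my PCIe accelerator runtime; for the sketch
    # the destination tensor is just zero-filled.
    out.copyfrom(np.zeros(out.shape, dtype="float32"))

@I.ir_module
class Model:
    @R.function
    def main(
        data: R.Tensor((1, 16, 30, 30), "float32"),
        weight: R.Tensor((16, 16, 3, 3), "float32"),
    ):
        # call_dps_packed looks the callee up by name at run time,
        # so the Python function registered above gets invoked.
        out = R.call_dps_packed(
            "my_accel.conv_relu",
            (data, weight),
            out_sinfo=R.Tensor((1, 16, 28, 28), "float32"),
        )
        return out

ex = relax.build(Model, target="llvm")
vm = relax.VirtualMachine(ex, tvm.cpu())
res = vm["main"](
    tvm.nd.array(np.random.rand(1, 16, 30, 30).astype("float32")),
    tvm.nd.array(np.random.rand(16, 16, 3, 3).astype("float32")),
)
```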

Just verified: We are able to call the Python Runtime API from Relax.