As we presented at TVMCon’23, the compilation flow in TVM Unity is modular and composable, and that includes BYOC.
This tutorial walks through how BYOC offloading works in the Unity pass infrastructure and how it composes with other passes, such as lowering and MetaSchedule tuning.
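For readers who have not seen the tutorial yet, the idea of composing an offloading pass with a later lowering pass can be sketched in plain Python. This is a toy model only: it imports nothing from TVM, and every name in it (`offload_conv_relu`, `lower_remaining`, `run_pipeline`, the `"byoc"` and `"cpu"` target labels) is a hypothetical stand-in rather than a Unity API.

```python
# Toy model only: plain Python, no TVM imports. All names are hypothetical.
from typing import Callable, List, Tuple

Op = Tuple[str, str]  # (operator name, target that will run it)
Pass = Callable[[List[Op]], List[Op]]


def offload_conv_relu(ops: List[Op]) -> List[Op]:
    """Fuse each adjacent, still-unassigned conv2d+relu pair and mark it for the external backend."""
    out: List[Op] = []
    i = 0
    while i < len(ops):
        is_pair = (
            i + 1 < len(ops)
            and ops[i] == ("conv2d", "unassigned")
            and ops[i + 1] == ("relu", "unassigned")
        )
        if is_pair:
            out.append(("conv2d_relu", "byoc"))
            i += 2
        else:
            out.append(ops[i])
            i += 1
    return out


def lower_remaining(ops: List[Op]) -> List[Op]:
    """Assign every op that no earlier pass claimed to the default target."""
    return [(name, "cpu" if target == "unassigned" else target) for name, target in ops]


def run_pipeline(ops: List[Op], passes: List[Pass]) -> List[Op]:
    """Apply the passes in order; each pass sees the output of the previous one."""
    for p in passes:
        ops = p(ops)
    return ops


model = [("conv2d", "unassigned"), ("relu", "unassigned"), ("softmax", "unassigned")]
result = run_pipeline(model, [offload_conv_relu, lower_remaining])
```

In this toy, the offloading pass claims the `conv2d`+`relu` pair first and the lowering pass only handles what is left, which mirrors the ordering the tutorial describes. Swapping the two passes would leave nothing for the offloading pass to claim.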
Hope this helps, and please follow up on this thread if you have any questions or feedback. Thank you!
Please do post follow-up questions here (there is a unity tag in the forum). I am sure the community would love to hear more feedback and bring the discussions together.
Great tutorial! I have watched the TVMCon23 recording for this tutorial.
My situation is: I have a Python runtime that calls my HW primitives library from the host CPU to drive a PCIe accelerator card. I can follow the ‘FuseOpsByPattern’ approach to map a Relax subgraph (conv_relu, for example) to one of my primitives. I want to combine the CPU runtime with my Python runtime into one executable, so that I can accelerate certain subgraphs within each layer of a multi-layer neural network model.
My question is: How can I call my Python runtime from the Relax VM, possibly multiple times?
My assumption is: I need to register my Python runtime with MyMod.attrs[‘external_mods’]. This part is not covered in the tutorial, since TensorRT is already a registered BYOC runtime.
Is it possible to do the registration the way the UMA tutorial does (tvm/apps/uma/_template/backend.py)? That would suit my situation, since I also have a C runtime API for my accelerator.