Great tutorial! I have watched the TVMCon23 recording for this tutorial.
My situation is: I have my own Python Runtime that calls my HW primitives library from the host CPU to drive a PCIe accelerator card. I can follow the `FuseOpsByPattern` approach to map a Relax subgraph (conv_relu, for example) to one of my primitives. I want to compile the CPU Runtime together with my Python Runtime into one executable, so that I can accelerate certain subgraphs within each layer of a multi-layer neural network model.
My question is: How can I call my Python Runtime from the Relax VM, possibly multiple times in a single run?
My assumption is: I need to register my Python Runtime with `MyMod.attrs["external_mods"]`. The tutorial does not cover this step, since TensorRT is already a registered BYOC runtime.