[TVM Runtime] Is it possible to output an independent TVM Runtime or Operator Kernel?

TVM is a really handy ML compiler for model optimization. However, when it comes to deployment, cloning and building the whole TVM repository seems tedious and unnecessary for inference-only scenarios.

I am wondering what the lightest-weight practice is for running the TVM runtime in a Python-based pipeline: how can we minimize the requirements and build only what is strictly needed for inference? Or, going further, can we output a completely dependency-free runtime.so that doesn't need any other library to run?
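For reference, here is roughly the inference-only flow I would like to slim down to. This is just a sketch assuming the model was already compiled with `relay.build` and exported via `export_library` on a full TVM installation elsewhere; the file name `deploy_lib.so` and the input name/shape are placeholders:

```python
import numpy as np
import tvm
from tvm.contrib import graph_executor

# Load a previously compiled and exported module (placeholder file name).
lib = tvm.runtime.load_module("deploy_lib.so")

# Create a graph executor on the target device (CPU here for simplicity).
dev = tvm.cpu(0)
module = graph_executor.GraphModule(lib["default"](dev))

# Feed an input, run inference, and fetch the result.
# Input name and shape are illustrative and depend on the exported model.
data = np.random.rand(1, 3, 224, 224).astype("float32")
module.set_input("data", tvm.nd.array(data, dev))
module.run()
out = module.get_output(0).numpy()
print(out.shape)
```

My understanding is that a flow like this should only need the runtime library rather than the full compiler stack, which is why I am asking whether the runtime can be built and shipped on its own.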

The other question is: since TVM already has a very well-developed codegen system, is it doable to generate single operators (like CUDA kernels) instead of a complete IR or runtime for the whole model, so that we can plug the auto-optimized operators into frameworks like PyTorch and TensorRT?

Thanks!

@anijain2305 @masahi @AndrewZhaoLuo @lhutton1

Yep, it is pretty much doable to directly generate CUDA. Example: https://github.com/apache/tvm/blob/main/apps/topi_recipe/conv/depthwise_conv2d_test.py#L46
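As a minimal sketch of the idea (the operator and schedule below are illustrative, not the recipe from that file), you can build a single TE operator for the `cuda` target and dump the generated CUDA C source:

```python
import tvm
from tvm import te

# Define a trivial elementwise operator with TE (illustrative only).
n = te.var("n")
A = te.placeholder((n,), name="A")
B = te.compute((n,), lambda i: A[i] * 2.0, name="B")

# Create a simple schedule and bind the axis to GPU threads.
s = te.create_schedule(B.op)
bx, tx = s[B].split(B.op.axis[0], factor=64)
s[B].bind(bx, te.thread_axis("blockIdx.x"))
s[B].bind(tx, te.thread_axis("threadIdx.x"))

# Build for CUDA; the host module wraps an imported CUDA device module.
f = tvm.build(s, [A, B], target="cuda", name="scale_by_two")

# Print the generated CUDA C source for this single kernel.
print(f.imported_modules[0].get_source())
```

One caveat: if you only take the generated .cu source into another framework, you have to manage the kernel launch configuration (grid/block dimensions) yourself, since that normally lives in TVM's generated host code.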

Many thanks for the reply! This looks like what I am searching for. Just to be sure, is this topi_recipe directory under apps intended for producing optimized .cu operators via TVM that can be shared with and plugged into other frameworks, such as TensorRT, PyTorch, etc.?