[DEPLOY] Deploy and run generated module without tvm runtime

Hi, as a newbie to TVM, after trying some demos I have some basic questions:

  1. I have compiled a model and exported it to .so/.o (relay.build and export_library). Is it possible to load this library, or link the .o against a basic main, without any TVM runtime dependencies? The generated library itself does not seem to rely on any TVM shared libraries, and I have no tuning needs, so I don’t understand why the TVM runtime has to be deployed on the target device (e.g. ARM).
  2. If the TVM runtime is not needed, how can I find the entry API or export the header files for this generated lib? Without that, I don’t know how to use the lib to feed the input and run inference.

The reason I think this should be possible is that TVM supports codegen for CUDA, and I believe the generated code itself does not include any TVM-related headers; for the LLVM target it is LLVM IR. So this code should be able to be compiled and run without any third-party dependencies.

To add to the first question: according to the documentation, relay.build builds a Relay function to run on the TVM graph runtime, which may explain why a lib exported through this method must rely on the TVM runtime. So my basic point is: is it possible to compile a model and build a lib without TVM runtime dependencies?

Thanks in advance. Reference Code Example: https://tvm.apache.org/docs/tutorials/frontend/from_tflite.html#sphx-glr-tutorials-frontend-from-tflite-py


hi @huangteng,

I’m not super-familiar with CUDA, but here are some higher-level answers; I’ll defer to others on whether you can do this with CUDA.

  1. I have compiled a model and exported it to .so/.o (relay.build and export_library). Is it possible to load this library, or link the .o against a basic main, without any TVM runtime dependencies? The generated library itself does not seem to rely on any TVM shared libraries, and I have no tuning needs, so I don’t understand why the TVM runtime has to be deployed on the target device (e.g. ARM).

Right now the generated code produced by TVM is a set of functions, each of which implements a part of the model. A runtime (the graph runtime, as you mentioned) is needed to invoke these operators in the correct order and to manage the tensor memory they require; the TVM runtime library contains that graph runtime plus additional utilities. A lightweight example is given in apps/bundle_deploy, but I don’t think that example includes CUDA driver code; you may be able to modify the C++ variant in that demo to add the CUDA parts.
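For concreteness, here is a minimal sketch (C++ against the TVM runtime headers; "deploy_lib.so" and the operator name "fused_nn_conv2d" are placeholders I made up) of what the exported library actually gives you:

```cpp
#include <tvm/runtime/module.h>
#include <tvm/runtime/packed_func.h>

int main() {
  // Load the library produced by export_library(). It is essentially a
  // container of packed functions, one per fused operator.
  tvm::runtime::Module mod = tvm::runtime::Module::LoadFromFile("deploy_lib.so");

  // Hypothetical operator name -- real names are generated by the compiler.
  // Each operator is callable on its own, but nothing in the .so records the
  // order to call them in or allocates the intermediate tensors; that is the
  // graph runtime's job.
  tvm::runtime::PackedFunc op = mod.GetFunction("fused_nn_conv2d");
  return op == nullptr ? 1 : 0;
}
```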

  2. If the TVM runtime is not needed, how can I find the entry API or export the header files for this generated lib? Without that, I don’t know how to use the lib to feed the input and run inference.

The entry API is the GraphRuntime API, but this is being abstracted into a module-based model runtime API. An example of using the C++ GraphRuntime is in bundle_deploy.
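Roughly, the pattern in bundle_deploy looks like the sketch below. The file names, the input name "input", and the MobileNet-style shapes are placeholders taken from the TFLite tutorial linked above; also note that newer TVM releases rename the graph runtime to the "graph executor" (so the creator function becomes "tvm.graph_executor.create"), while older headers use DLContext instead of DLDevice.

```cpp
#include <tvm/runtime/module.h>
#include <tvm/runtime/ndarray.h>
#include <tvm/runtime/packed_func.h>
#include <tvm/runtime/registry.h>

#include <fstream>
#include <iterator>
#include <string>

int main() {
  // Load the compiled operators plus the graph JSON and params blob that
  // relay.build produced (file names are placeholders).
  tvm::runtime::Module mod_lib = tvm::runtime::Module::LoadFromFile("deploy_lib.so");
  std::ifstream json_in("deploy_graph.json");
  std::string graph_json((std::istreambuf_iterator<char>(json_in)),
                         std::istreambuf_iterator<char>());
  std::ifstream params_in("deploy_param.params", std::ios::binary);
  std::string params_blob((std::istreambuf_iterator<char>(params_in)),
                          std::istreambuf_iterator<char>());

  // Instantiate the graph runtime on CPU device 0.
  int device_type = kDLCPU, device_id = 0;
  tvm::runtime::Module gmod = (*tvm::runtime::Registry::Get("tvm.graph_runtime.create"))(
      graph_json, mod_lib, device_type, device_id);

  tvm::runtime::PackedFunc set_input = gmod.GetFunction("set_input");
  tvm::runtime::PackedFunc load_params = gmod.GetFunction("load_params");
  tvm::runtime::PackedFunc run = gmod.GetFunction("run");
  tvm::runtime::PackedFunc get_output = gmod.GetFunction("get_output");

  TVMByteArray params_arr{params_blob.data(), params_blob.size()};
  load_params(params_arr);

  // Shapes and the input name follow the TFLite MobileNet tutorial;
  // adjust them for your own model.
  DLDevice dev{kDLCPU, 0};
  auto input = tvm::runtime::NDArray::Empty({1, 224, 224, 3},
                                            DLDataType{kDLFloat, 32, 1}, dev);
  auto output = tvm::runtime::NDArray::Empty({1, 1001},
                                             DLDataType{kDLFloat, 32, 1}, dev);
  // ... fill input->data with the preprocessed image ...
  set_input("input", input);
  run();
  get_output(0, output);
  return 0;
}
```

In bundle_deploy this kind of program is linked against a stripped-down static build of the runtime rather than the full libtvm.so.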

To add to the first question: according to the documentation, relay.build builds a Relay function to run on the TVM graph runtime, which may explain why a lib exported through this method must rely on the TVM runtime. So my basic point is: is it possible to compile a model and build a lib without TVM runtime dependencies?

We’re working to build an ahead-of-time compiler in the coming months, which would remove the need for the graph runtime. A v1 of that is here, but we expect it to change considerably before we merge it into master.

Although TVM can generate CUDA code, that code only covers the GPU side; you still need host code to compute the CUDA launch dimensions, manage device memory, and so on. That is why the TVM runtime is still preferred.
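To make that concrete, here is a rough sketch of what a standalone host program would have to do by hand to call one TVM-generated CUDA kernel through the CUDA driver API. The PTX file name, kernel name, argument list, and launch dimensions are all made-up placeholders; the TVM runtime's CUDA module normally does this work for you, using the launch configuration the compiler recorded.

```cpp
#include <cuda.h>

int main() {
  cuInit(0);
  CUdevice dev;
  cuDeviceGet(&dev, 0);
  CUcontext ctx;
  cuCtxCreate(&ctx, 0, dev);

  // Load the device code TVM generated (file name is a placeholder).
  CUmodule mod;
  cuModuleLoad(&mod, "deploy_kernels.ptx");

  // Hypothetical kernel name; real fused-op kernel names are generated.
  CUfunction kernel;
  cuModuleGetFunction(&kernel, mod, "fused_nn_conv2d_kernel0");

  // Allocate device buffers and build the argument list (sizes are placeholders).
  size_t n = 1 * 224 * 224 * 3;
  CUdeviceptr d_in, d_out;
  cuMemAlloc(&d_in, n * sizeof(float));
  cuMemAlloc(&d_out, n * sizeof(float));
  void* args[] = {&d_in, &d_out};

  // This is the host-side work the TVM runtime normally handles:
  // picking grid/block dimensions, shared memory size, and the stream.
  unsigned grid_x = 1024, block_x = 256;
  cuLaunchKernel(kernel, grid_x, 1, 1, block_x, 1, 1,
                 /*sharedMemBytes=*/0, /*hStream=*/nullptr,
                 args, /*extra=*/nullptr);
  cuCtxSynchronize();

  cuMemFree(d_in);
  cuMemFree(d_out);
  cuCtxDestroy(ctx);
  return 0;
}
```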

The TVM runtime itself is quite minimal (~50 KB for a CPU-only build), with support for PackedFunc and device allocation, etc., so it does not hurt to introduce such a dependency.
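To make "support for PackedFunc" concrete: PackedFunc is the type-erased calling convention everything goes through, whether it is a generated operator, the graph runtime's set_input/run/get_output, or your own code. A tiny sketch with a made-up function name ("demo.add_one"):

```cpp
#include <tvm/runtime/packed_func.h>
#include <tvm/runtime/registry.h>

// Register a function in the global registry under a made-up name. Runtime
// entry points such as tvm.graph_runtime.create live in this same registry;
// generated operators are reached through the same packed calling convention
// via their module.
TVM_REGISTER_GLOBAL("demo.add_one").set_body_typed([](int x) { return x + 1; });

int main() {
  // Look the function up and call it through the type-erased interface.
  const tvm::runtime::PackedFunc* f = tvm::runtime::Registry::Get("demo.add_one");
  int y = (*f)(41);
  return (y == 42) ? 0 : 1;
}
```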
