[DEPLOY] Deploy and run generated module without tvm runtime

Hi, as a newbie to TVM, after trying some demos I have some basic questions:

  1. I have compiled a model and exported it to .so/.o (relay.build and export_library). Is it possible to load this library, or link the .o against a basic main, without any TVM runtime dependencies? The generated library itself does not seem to rely on any TVM shared libraries, and I have no tuning needs, so I don’t understand why the TVM runtime has to be deployed on the target device (e.g. ARM).
  2. If the TVM runtime is not needed, how can I find the entry API or export the header files for this generated lib? Without that, I don’t know how to use the lib to feed the input and run inference.

The reason I think this should be possible is that TVM supports codegen for CUDA, and I believe the generated code itself does not include any TVM-related headers; for the LLVM target it is LLVM IR. So this code should be able to be compiled and run without any third-party dependencies.

To add to the first question: according to the documentation, relay.build builds a Relay function to run on the TVM graph runtime, which may explain why a lib exported through this method must rely on the TVM runtime. So my basic point is: is it possible to compile a model and build a lib without TVM runtime dependencies?

Thanks in advance. Reference Code Example: https://tvm.apache.org/docs/tutorials/frontend/from_tflite.html#sphx-glr-tutorials-frontend-from-tflite-py


hi @huangteng,

I’m not super-familiar with CUDA, but here are some higher-level answers; I’ll defer to others on whether you can do this with CUDA.

  1. I have compiled a model and exported it to .so/.o (relay.build and export_library). Is it possible to load this library, or link the .o against a basic main, without any TVM runtime dependencies? The generated library itself does not seem to rely on any TVM shared libraries, and I have no tuning needs, so I don’t understand why the TVM runtime has to be deployed on the target device (e.g. ARM).

Right now the generated code produced by TVM is a set of functions, each of which implements a part of the model. A runtime (the graph runtime, as you mentioned) is needed to invoke these operators in the correct order and to manage the tensor memory they require; the TVM runtime library contains that graph runtime plus additional utilities. A lightweight example is given in apps/bundle_deploy, but I don’t think that example includes CUDA driver code; you may be able to modify the C++ variant in that demo to add the CUDA parts.
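For concreteness, here is a minimal sketch (C++ against the TVM runtime headers; "deploy_lib.so" and the operator name "fused_nn_conv2d" are placeholders I made up) of what the exported library actually gives you:

```cpp
#include <tvm/runtime/module.h>
#include <tvm/runtime/packed_func.h>

int main() {
  // Load the library produced by export_library(). It is essentially a
  // container of packed functions, one per fused operator.
  tvm::runtime::Module mod = tvm::runtime::Module::LoadFromFile("deploy_lib.so");

  // Hypothetical operator name -- real names are generated by the compiler.
  // Each operator is callable on its own, but nothing in the .so records the
  // order to call them in or allocates the intermediate tensors; that is the
  // graph runtime's job.
  tvm::runtime::PackedFunc op = mod.GetFunction("fused_nn_conv2d");
  return op == nullptr ? 1 : 0;
}
```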

  2. If the TVM runtime is not needed, how can I find the entry API or export the header files for this generated lib? Without that, I don’t know how to use the lib to feed the input and run inference.

The entry API is the GraphRuntime API, but this is being abstracted into a module-based model runtime API. An example of using the C++ GraphRuntime is in bundle_deploy.
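Roughly, the pattern in bundle_deploy looks like the sketch below. The file names, the input name "input", and the MobileNet-style shapes are placeholders taken from the TFLite tutorial linked above; also note that newer TVM releases rename the graph runtime to the "graph executor" (so the creator function becomes "tvm.graph_executor.create"), while older headers use DLContext instead of DLDevice.

```cpp
#include <tvm/runtime/module.h>
#include <tvm/runtime/ndarray.h>
#include <tvm/runtime/packed_func.h>
#include <tvm/runtime/registry.h>

#include <fstream>
#include <iterator>
#include <string>

int main() {
  // Load the compiled operators plus the graph JSON and params blob that
  // relay.build produced (file names are placeholders).
  tvm::runtime::Module mod_lib = tvm::runtime::Module::LoadFromFile("deploy_lib.so");
  std::ifstream json_in("deploy_graph.json");
  std::string graph_json((std::istreambuf_iterator<char>(json_in)),
                         std::istreambuf_iterator<char>());
  std::ifstream params_in("deploy_param.params", std::ios::binary);
  std::string params_blob((std::istreambuf_iterator<char>(params_in)),
                          std::istreambuf_iterator<char>());

  // Instantiate the graph runtime on CPU device 0.
  int device_type = kDLCPU, device_id = 0;
  tvm::runtime::Module gmod = (*tvm::runtime::Registry::Get("tvm.graph_runtime.create"))(
      graph_json, mod_lib, device_type, device_id);

  tvm::runtime::PackedFunc set_input = gmod.GetFunction("set_input");
  tvm::runtime::PackedFunc load_params = gmod.GetFunction("load_params");
  tvm::runtime::PackedFunc run = gmod.GetFunction("run");
  tvm::runtime::PackedFunc get_output = gmod.GetFunction("get_output");

  TVMByteArray params_arr{params_blob.data(), params_blob.size()};
  load_params(params_arr);

  // Shapes and the input name follow the TFLite MobileNet tutorial;
  // adjust them for your own model.
  DLDevice dev{kDLCPU, 0};
  auto input = tvm::runtime::NDArray::Empty({1, 224, 224, 3},
                                            DLDataType{kDLFloat, 32, 1}, dev);
  auto output = tvm::runtime::NDArray::Empty({1, 1001},
                                             DLDataType{kDLFloat, 32, 1}, dev);
  // ... fill input->data with the preprocessed image ...
  set_input("input", input);
  run();
  get_output(0, output);
  return 0;
}
```

In bundle_deploy this kind of program is linked against a stripped-down static build of the runtime rather than the full libtvm.so.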

To add to the first question: according to the documentation, relay.build builds a Relay function to run on the TVM graph runtime, which may explain why a lib exported through this method must rely on the TVM runtime. So my basic point is: is it possible to compile a model and build a lib without TVM runtime dependencies?

We’re working to build an ahead-of-time compiler in the coming months, which would remove the need for the graph runtime. A v1 of that is here, but we expect it to change considerably before we merge it into master.

Although TVM can generate CUDA code, that code only covers the GPU side; you still need host code to compute the CUDA launch dimensions, manage device memory, and so on. That is why the TVM runtime is still preferred.
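To make that concrete, here is a rough sketch of what a standalone host program would have to do by hand to call one TVM-generated CUDA kernel through the CUDA driver API. The PTX file name, kernel name, argument list, and launch dimensions are all made-up placeholders; the TVM runtime's CUDA module normally does this work for you, using the launch configuration the compiler recorded.

```cpp
#include <cuda.h>

int main() {
  cuInit(0);
  CUdevice dev;
  cuDeviceGet(&dev, 0);
  CUcontext ctx;
  cuCtxCreate(&ctx, 0, dev);

  // Load the device code TVM generated (file name is a placeholder).
  CUmodule mod;
  cuModuleLoad(&mod, "deploy_kernels.ptx");

  // Hypothetical kernel name; real fused-op kernel names are generated.
  CUfunction kernel;
  cuModuleGetFunction(&kernel, mod, "fused_nn_conv2d_kernel0");

  // Allocate device buffers and build the argument list (sizes are placeholders).
  size_t n = 1 * 224 * 224 * 3;
  CUdeviceptr d_in, d_out;
  cuMemAlloc(&d_in, n * sizeof(float));
  cuMemAlloc(&d_out, n * sizeof(float));
  void* args[] = {&d_in, &d_out};

  // This is the host-side work the TVM runtime normally handles:
  // picking grid/block dimensions, shared memory size, and the stream.
  unsigned grid_x = 1024, block_x = 256;
  cuLaunchKernel(kernel, grid_x, 1, 1, block_x, 1, 1,
                 /*sharedMemBytes=*/0, /*hStream=*/nullptr,
                 args, /*extra=*/nullptr);
  cuCtxSynchronize();

  cuMemFree(d_in);
  cuMemFree(d_out);
  cuCtxDestroy(ctx);
  return 0;
}
```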

The TVM runtime itself is quite minimal (~50 KB for a CPU-only build), with support for PackedFunc and device allocation, etc., so it does not hurt to introduce such a dependency.
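To make "support for PackedFunc" concrete: PackedFunc is the type-erased calling convention everything goes through, whether it is a generated operator, the graph runtime's set_input/run/get_output, or your own code. A tiny sketch with a made-up function name ("demo.add_one"):

```cpp
#include <tvm/runtime/packed_func.h>
#include <tvm/runtime/registry.h>

// Register a function in the global registry under a made-up name. Runtime
// entry points such as tvm.graph_runtime.create live in this same registry;
// generated operators are reached through the same packed calling convention
// via their module.
TVM_REGISTER_GLOBAL("demo.add_one").set_body_typed([](int x) { return x + 1; });

int main() {
  // Look the function up and call it through the type-erased interface.
  const tvm::runtime::PackedFunc* f = tvm::runtime::Registry::Get("demo.add_one");
  int y = (*f)(41);
  return (y == 42) ? 0 : 1;
}
```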
