Recently I have been working on a minimal TVM runtime deployment through the Python API. After walking through the examples in how_to_deploy and several related posts on this forum, I am still quite confused. Here are my questions:
- Per this tutorial, we can get a minimal runtime of around 300K ~ 600K. How do we achieve that in the Python API scenario? Can we deploy an already compiled model_runtime.so without building the whole TVM project on a new machine? (Assume the new machine and the machine used to compile model_runtime share the same environment configuration.)
- To put my confusion simply: is there any way to bypass the “build TVM” procedure and provide an out-of-the-box, ready-to-use runtime bundle? (i.e., just put a minimal dependency package/folder on the target machine and be able to import the model into whatever Python ML pipeline for inference.)
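For context, here is a minimal sketch of the loading side I have in mind, using the TVM runtime Python API. It assumes the target machine can import tvm (full or runtime-only build) and that model_runtime.so was exported via export_library from relay.build; the input name "data" and the shape in the usage comment are placeholders, not from any specific model:

```python
import numpy as np

def run_compiled_model(lib_path, input_name, input_array):
    """Load a compiled TVM module from disk and run a single inference.

    Assumes `lib_path` points to a .so exported with
    relay.build(...).export_library(...), and that a TVM runtime
    is importable on this machine.
    """
    import tvm
    from tvm.contrib import graph_executor

    # Load the compiled artifact; only the runtime is needed here,
    # not the full compiler stack.
    lib = tvm.runtime.load_module(lib_path)
    dev = tvm.cpu(0)

    # "default" is the module factory created by relay.build.
    gmod = graph_executor.GraphModule(lib["default"](dev))
    gmod.set_input(input_name, tvm.nd.array(input_array))
    gmod.run()
    return gmod.get_output(0).numpy()

# Example call (placeholder names/shapes):
# out = run_compiled_model("model_runtime.so", "data",
#                          np.random.rand(1, 3, 224, 224).astype("float32"))
```

What I would like to know is whether this can work with only a small runtime package on the target machine, rather than a full TVM build.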
Thanks for the help!
@junrushao @merrymercy @masahi @AndrewZhaoLuo @lhutton1 @anijain2305