I am currently trying to deploy a CNN model. I set everything up in Python, tested it (ran inference) and exported my module as a shared library (.so). Then I went further by looking at the sample “how_to_deploy” in the apps. I built the runtime for my target and then set up my own inference code and Makefile following the tutorial.
In another thread I found out that the basic function name for the packed function is the same as in the graph.json, so now I can retrieve the packed function. I am a little bit confused though: my graph JSON (as well as the compiler) tells me that the function needs four arguments: input, p0, p1, and the last one must be the output.
So what must be fed in for p0 and p1? Also, I have seen samples where they use “set_input”, “run”, etc. Did I miss something here? I am using the latest master.
Constant tensors are renamed ‘p0’, ‘p1’, etc. after compiling. You should get a params dict from the call to relay.build. Use it together with a GraphRuntime object. It should look similar to the following:
#include <dlpack/dlpack.h>
#include <tvm/runtime/module.h>
#include <tvm/runtime/packed_func.h>
#include <tvm/runtime/registry.h>
#include <fstream>
#include <iterator>
#include <stdexcept>
#include <string>

// load the compiled operator library
tvm::runtime::Module mod_syslib = tvm::runtime::Module::LoadFromFile("lib.so");
// json graph
std::ifstream json_in("graph.json", std::ios::in);
std::string json_data((std::istreambuf_iterator<char>(json_in)), std::istreambuf_iterator<char>());
json_in.close();
// parameters
std::ifstream params_in("params.bin", std::ios::binary);
std::string params_data((std::istreambuf_iterator<char>(params_in)), std::istreambuf_iterator<char>());
params_in.close();
// parameters need to be TVMByteArray type to indicate the binary data
TVMByteArray params_arr;
params_arr.data = params_data.c_str();
params_arr.size = params_data.length();
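// tensor data type (float32) and target device (CPU) used for the allocations below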
int dtype_code = kDLFloat;
int dtype_bits = 32;
int dtype_bytes = dtype_bits / 8;
int dtype_lanes = 1;
int cpu_dev_type = kDLCPU;
int cpu_dev_id = 0;
// get global function module for graph runtime
auto graph_runtime = tvm::runtime::Registry::Get("tvm.graph_runtime.create");
tvm::runtime::Module mod = (*graph_runtime)(json_data, mod_syslib, cpu_dev_type, cpu_dev_id);
// Create input tensor
DLTensor* x;
int in_ndim = 2;
// First index of input shape is BATCH_SIZE
int64_t in_shape[2] = {1, 784};
TVMArrayAlloc(in_shape, in_ndim, dtype_code, dtype_bits, dtype_lanes, cpu_dev_type, cpu_dev_id, &x);
// load image data saved in binary
const std::string data_filename = "input.bin";
std::ifstream data_fin(data_filename, std::ios::binary);
if(!data_fin) throw std::runtime_error("Could not open: " + data_filename);
data_fin.read(static_cast<char*>(x->data), 1 * 784 * dtype_bytes);
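// bind the input tensor; "data" is the graph input name and must match the input name used when the model was built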
tvm::runtime::PackedFunc set_input = mod.GetFunction("set_input");
set_input("data", x);
tvm::runtime::PackedFunc load_params = mod.GetFunction("load_params");
load_params(params_arr);
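// execute one forward pass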
tvm::runtime::PackedFunc run = mod.GetFunction("run");
run();
So you need a call to load_params to load the weights, and then another call to set_input before you can run an inference.
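To actually read the result back after run(), the graph runtime also exposes a get_output packed function. A minimal sketch, assuming output index 0 and a 1x10 float32 output (the output shape is just a placeholder for a classifier matching the 784-element input; adjust it to your model):
// allocate a tensor for the output (shape is an assumption, adjust to your model)
DLTensor* y;
int out_ndim = 2;
int64_t out_shape[2] = {1, 10};
TVMArrayAlloc(out_shape, out_ndim, dtype_code, dtype_bits, dtype_lanes, cpu_dev_type, cpu_dev_id, &y);
// copy the first output of the graph into y
tvm::runtime::PackedFunc get_output = mod.GetFunction("get_output");
get_output(0, y);
float* result = static_cast<float*>(y->data);
// free the tensors when done
TVMArrayFree(x);
TVMArrayFree(y);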
Thanks @adb for providing this, it worked for me! My misunderstanding was the expectation that the .so already contains the “ready to go” runtime module, which is obviously not the case.
@heliqi
Run the script “run.sh” in the apps/how_to_deploy folder. This will give you runtime.o; link it against your executable and also provide the header files to the compiler. The Makefile in the how_to_deploy folder does exactly this. If you want to use OpenCL, you will also need to link against it.
You may also have a look here: https://docs.tvm.ai/deploy/nnvm.html