C++ deployment and inference for a CNN model - how to?

Hello everyone,

I am currently trying to deploy a CNN model. I set everything up in Python, tested it (ran inference) and exported my module as a shared library (.so). Then I went further by looking at the “how_to_deploy” sample in the apps folder. I built the runtime for my target and then set up my own inference code and Makefile following the tutorial.

In another thread I found out that the name of the packed function is the same as in the graph.json, so now I can retrieve the packed function. I am a little bit confused, though: my graph JSON (as well as the compiler) tells me that the function needs four arguments: the input, p0, p1, and the last one must be the output.

So what must be fed in for p0 and p1? Also, I have seen samples where they use “set_input”, “run”, etc. Did I miss something here? I am using the latest master.

Any help on this would be highly appreciated.

Constant tensors are renamed ‘p0’, ‘p1’, etc. after compiling. You should get a params dict from the call to relay.build. Use this with a GraphRuntime object. It should look similar to the following:

    #include <dlpack/dlpack.h>
    #include <tvm/runtime/module.h>
    #include <tvm/runtime/packed_func.h>
    #include <tvm/runtime/registry.h>

    #include <fstream>
    #include <stdexcept>
    #include <string>

    // compiled operators exported from Python
    tvm::runtime::Module mod_syslib = tvm::runtime::Module::LoadFromFile("lib.so");

    // json graph
    std::ifstream json_in("graph.json", std::ios::in);
    std::string json_data((std::istreambuf_iterator<char>(json_in)), std::istreambuf_iterator<char>());
    json_in.close();

    // parameters
    std::ifstream params_in("params.bin", std::ios::binary);
    std::string params_data((std::istreambuf_iterator<char>(params_in)), std::istreambuf_iterator<char>());
    params_in.close();

    // parameters need to be TVMByteArray type to indicate the binary data
    TVMByteArray params_arr;
    params_arr.data = params_data.c_str();
    params_arr.size = params_data.length();

    int dtype_code = kDLFloat;
    int dtype_bits = 32;
    int dtype_bytes = dtype_bits / 8;
    int dtype_lanes = 1;

    int cpu_dev_type = kDLCPU;
    int cpu_dev_id = 0;

    // get global function module for graph runtime
    auto graph_runtime = tvm::runtime::Registry::Get("tvm.graph_runtime.create");
    tvm::runtime::Module mod = (*graph_runtime)(json_data, mod_syslib, cpu_dev_type, cpu_dev_id);

    // Create input tensor
    DLTensor* x;
    int in_ndim = 2;

    // First index of input shape is BATCH_SIZE
    int64_t in_shape[2] = {1, 784};

    TVMArrayAlloc(in_shape, in_ndim, dtype_code, dtype_bits, dtype_lanes, cpu_dev_type, cpu_dev_id, &x);

    // load image data saved in binary
    const std::string data_filename = "input.bin";
    std::ifstream data_fin(data_filename, std::ios::binary);
    if(!data_fin) throw std::runtime_error("Could not open: " + data_filename);
    data_fin.read(static_cast<char*>(x->data), 1 * 784 * dtype_bytes);
    data_fin.close();

    // "data" is the input name of this model; adjust it if your graph uses a different one
    tvm::runtime::PackedFunc set_input = mod.GetFunction("set_input");
    set_input("data", x);

    tvm::runtime::PackedFunc load_params = mod.GetFunction("load_params");
    load_params(params_arr);

    tvm::runtime::PackedFunc run = mod.GetFunction("run");
    run();

So you need a call to load_params to load the weights, and then another call to set_input before you can run an inference.
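
To read the prediction back out, the same pattern works with the graph runtime’s “get_output” function. Here is a minimal sketch that continues the snippet above; the output index 0 and the {1, 10} output shape are only assumptions, so use whatever your model actually produces:

    // Allocate an output tensor and copy graph output 0 into it.
    // The shape {1, 10} is an assumption; replace it with your model's output shape.
    DLTensor* y;
    int out_ndim = 2;
    int64_t out_shape[2] = {1, 10};
    TVMArrayAlloc(out_shape, out_ndim, dtype_code, dtype_bits, dtype_lanes, cpu_dev_type, cpu_dev_id, &y);

    tvm::runtime::PackedFunc get_output = mod.GetFunction("get_output");
    get_output(0, y);

    // The raw scores are now accessible through y->data.
    const float* scores = static_cast<const float*>(y->data);

    // Free the tensors allocated with TVMArrayAlloc once you are done.
    TVMArrayFree(x);
    TVMArrayFree(y);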

After writing the C++ code, how do I write the Makefile and then compile the .cpp code? Is there a demo like the C++ code above?

Thanks @adb for providing this, it worked for me! My misunderstanding was the expectation that the .so already contains a “ready to go” runtime module, which is obviously not the case.

@heliqi run the script “run.sh” in the apps/how_to_deploy folder. This will give you runtime.o; link it against your executable and add the TVM header files to your include path. The Makefile in the how_to_deploy folder does exactly this. If you want to use OpenCL, you will also need to link against it. You may also have a look here: https://docs.tvm.ai/deploy/nnvm.html