Python and C++ inference give different results?

When using the Python API, everything works fine. If I run the inference through C++, all channels of the output image have the same value (so r, g and b are always the same; it looks like an intermediate tensor). This happens on macOS with OpenCL. Any ideas why this is happening? The module is built as a dynamic library via Python.

The code for both is pretty much identical:

set_input(), run(), get_output(0)
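To confirm the symptom described above (every channel of the output holding the same value) on the raw data rather than on a rendered image, a small check over the output buffer can help. This is a TVM-independent sketch: it assumes the output has been copied into a planar, channel-first float buffer (one plane of height * width values per channel), and the helper name all_channels_identical is hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical diagnostic helper (not part of TVM). Returns true when every
// channel of a planar, channel-first float buffer (C planes of H*W values)
// equals channel 0 within `tol`, i.e. the "r, g and b are always the same"
// symptom. Returns false if the buffer size does not match the given shape.
bool all_channels_identical(const std::vector<float>& planar, std::size_t channels,
                            std::size_t height, std::size_t width, float tol = 1e-6f) {
  assert(tol >= 0.0f);
  const std::size_t plane = height * width;  // values per channel
  if (channels == 0 || planar.size() != channels * plane) return false;
  for (std::size_t c = 1; c < channels; ++c) {
    for (std::size_t i = 0; i < plane; ++i) {
      // Compare pixel i of channel c against the same pixel of channel 0.
      if (std::fabs(planar[c * plane + i] - planar[i]) > tol) return false;
    }
  }
  return true;
}
```

If this returns true for the C++ output but false for the Python output of the same input image, that suggests the difference lies in the C++ call sequence or buffer handling rather than in the compiled module itself.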

Please try calling ctx.sync() in C++ before you get the output; asnumpy() synchronizes the copy by default.

Thanks for the hint! How do I get the context?

I tried TVMSynchronize(kDLOpenCL, 0, NULL), but the result is still the same.

The synchronization part seems to be fine, so you might want to double-check that your code is really equivalent. It might also help to switch to, say, the CPU version and see whether the error disappears. Note that, depending on what you call, you either pre-allocate the output NDArray on the CPU before calling get_output(0), or you call get_output to obtain the output NDArray (in the OpenCL context) and then call Copy to copy it out.
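Whichever of the two options is used, the CPU-side buffer should match the size of the output tensor. A TVM-independent sketch of that size computation, assuming a dense tensor described by its shape plus the same dtype_bits / dtype_lanes values that TVMArrayAlloc takes (the helper name tensor_nbytes is hypothetical). Bits times lanes is rounded up to whole bytes, so for float32 the result reduces to sizeof(float) times the element count:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: number of bytes a dense tensor occupies, i.e. the
// product of all dimensions times ceil(dtype_bits * dtype_lanes / 8).
// An empty shape is treated as a scalar (one element).
std::size_t tensor_nbytes(const std::vector<int64_t>& shape, int dtype_bits, int dtype_lanes = 1) {
  assert(dtype_bits > 0 && dtype_lanes > 0);
  std::size_t count = 1;
  for (int64_t dim : shape) {
    assert(dim >= 0);
    count *= static_cast<std::size_t>(dim);
  }
  // Round bits * lanes up to a whole number of bytes per element.
  return count * static_cast<std::size_t>((dtype_bits * dtype_lanes + 7) / 8);
}
```

For example, a float32 output of shape {1, 3, 224, 224} gives 3 * 224 * 224 * 4 = 602112 bytes, the same value as writing sizeof(float) * width * height * channel by hand.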

Mhm, I tried that as well, with llvm as the target. The output is then just a grey image… I built the module in Python with:

json, lib, params =, target=target, params=params)
with open(os.path.join("./", 'graph.json'), 'w') as f_graph_json:

with open(os.path.join("./", "deploy.params"), "wb") as fo:

Then I run some inference in Python:

    runtime = tvm.contrib.graph_runtime.create(json, lib, tvm.context(target, 0))
    runtime.set_input("input", tvm_array)
    image_tensor_relay_module_lib, _ = runtime.get_output(0).asnumpy(), runtime

In C++, I load the .so library and retrieve the runtime:

    tvm::runtime::Module mod_dylib = tvm::runtime::Module::LoadFromFile(...);
    std::ifstream json_in("./graph.json", std::ios::in);
    std::string json_data((std::istreambuf_iterator<char>(json_in)), std::istreambuf_iterator<char>());
    std::ifstream params_in("./deploy.params", std::ios::binary);
    std::string params_data((std::istreambuf_iterator<char>(params_in)), std::istreambuf_iterator<char>());

    TVMByteArray params_arr;
    params_arr.data = params_data.c_str();
    params_arr.size = params_data.length();

    auto graph_runtime = tvm::runtime::Registry::Get("tvm.graph_runtime.create");
    tvm::runtime::Module mod = (*graph_runtime)(json_data, mod_dylib, kDLOpenCL, cpu_dev_id);
    tvm::runtime::PackedFunc set_input = mod.GetFunction("set_input");

    int64_t in_shape[4] = {1, channel, width, height}; // NCWH
    int dtype_code = kDLFloat;
    int dtype_bits = 32;
    int dtype_bytes = dtype_bits / 8;
    int dtype_lanes = 1;
    TVMArrayAlloc(in_shape, in_ndim, dtype_code, dtype_bits, dtype_lanes, kDLOpenCL, cpu_dev_id, &x);
    TVMArrayCopyFromBytes(x, normalized_image, sizeof(float) * width * height * channel);
    tvm::runtime::PackedFunc run = mod.GetFunction("run");
    tvm::runtime::PackedFunc get_output = mod.GetFunction("get_output");
    TVMSynchronize(kDLOpenCL, 0, NULL);
    get_output(0, y);
    TVMArrayCopyToBytes(y, output_image, sizeof(float) * width * height * channel);



was missing. Maybe we could add a check for this to the C++ API?
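One way such a check could look, sketched independently of TVM: a small state machine over the set_input, run, get_output sequence that refuses to hand out outputs unless run() happened after the most recent set_input(). The class name GraphCallGuard is hypothetical and not an existing TVM API; a real wrapper would also forward each call to the matching PackedFunc of the graph runtime module.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical guard (not a TVM API): tracks whether run() was called after
// the most recent set_input() and throws from get_output() otherwise.
class GraphCallGuard {
 public:
  // Setting a new input makes any previously computed outputs stale.
  void set_input(const std::string& name) {
    assert(!name.empty() && "input name must not be empty");
    outputs_ready_ = false;
  }

  // Executing the graph produces fresh outputs.
  void run() { outputs_ready_ = true; }

  // Throws instead of silently returning storage that run() never filled.
  void get_output(int index) const {
    assert(index >= 0 && "output index must be non-negative");
    if (!outputs_ready_) {
      throw std::logic_error("get_output(" + std::to_string(index) +
                             ") called without run() after the last set_input()");
    }
  }

  bool outputs_ready() const { return outputs_ready_; }

 private:
  bool outputs_ready_ = false;
};
```

With a guard like this, calling get_output(0, y) without a preceding run() would fail loudly instead of returning whatever the output storage happened to hold.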

Now that tvm.graph_runtime.create is deprecated in favor of tvm.graph_executor.create, is there still a way to create the graph runtime from the three-artifact build (i.e. .so, JSON, and params)?

@W1k1, I have the same code logic as your reply above, but I changed this segment:

    int dtype_code = kDLFloat;
    int dtype_bits = 32;
    int dtype_lanes = 1;
    int device_type = kDLCPU;
    int device_id = 0;

    // ...
    // I read in all the json and params the same way. It's omitted here.
    // ...
    // ...

    auto tvm_graph_runtime_create = tvm::runtime::Registry::Get("tvm.graph_executor.create");
    tvm::runtime::Module gmod = (*tvm_graph_runtime_create)(json_data, mod_factory, device_type, device_id);
    tvm::runtime::PackedFunc set_input = gmod.GetFunction("set_input");
    tvm::runtime::PackedFunc get_output = gmod.GetFunction("get_output");

Changing from graph_runtime to graph_executor was a simple rename in Python, but that is not the case on the C++ side…