Different output in C and Python for ONNX model


I have an ONNX model that I'm trying to run in C, following the bundle deploy example (the static option). However, the results differ between Python and C; the Python output is the correct one.

Following the example, I generate the graph, lib and params in Python, then load them in C. With extremely simple models that only do slicing, this works. With a slightly more complex model, however, the results differ greatly between Python and C. In addition, when I change opt_level to 0, the output in C changes but is still wrong.

Generating graph, lib, params:

import os
import tvm
from tvm import relay

mod, params = relay.frontend.from_onnx(onnx_model, shape_dict)
with tvm.transform.PassContext(opt_level=3, config={"tir.disable_vectorize": True}):
    target = tvm.target.Target("llvm --runtime=c --system-lib")
    g_json, mmod, params = relay.build(mod, target=target, params=params)

# save artifacts
bin_params = tvm.runtime.save_param_dict(params)
lib_file_name = os.path.join(build_dir, file_format_str.format(name=model_name, ext="tar"))
# write the graph JSON as text
with open(
    os.path.join(build_dir, file_format_str.format(name=model_name + "_graph", ext="json")), "w"
) as f_graph_json:
    f_graph_json.write(g_json)
# write the serialized parameters as raw bytes
with open(
    os.path.join(build_dir, file_format_str.format(name=model_name + "_params", ext="bin")), "wb"
) as f_params:
    f_params.write(bin_params)

Loading in C:

char* json_data = (char*)(build_dvt_graph_c_json);
char* params_data = (char*)(build_dvt_params_c_bin);
uint64_t params_size = build_dvt_params_c_bin_len;

// more input and output config here, basically the same as in the example except for sizes

void* handle = tvm_runtime_create(json_data, params_data, params_size, argv[0]);
tvm_runtime_set_input(handle, "0", &input);
tvm_runtime_run(handle);
tvm_runtime_get_output(handle, 0, &output);
tvm_runtime_destroy(handle);

I used the debugger and this is the log for Python:

[17:10:44] /home/sapire/git/tvm_r/src/runtime/graph_executor/debug/graph_executor_debug.cc:103: Iteration: 0
[17:10:44] /home/sapire/git/tvm_r/src/runtime/graph_executor/debug/graph_executor_debug.cc:108: Op #0 fused_strided_slice: 0.198826 us/iter
[17:10:44] /home/sapire/git/tvm_r/src/runtime/graph_executor/debug/graph_executor_debug.cc:108: Op #1 fused_mean: 0.908446 us/iter
[17:10:44] /home/sapire/git/tvm_r/src/runtime/graph_executor/debug/graph_executor_debug.cc:108: Op #2 fused_strided_slice_1: 0.355717 us/iter
[17:10:44] /home/sapire/git/tvm_r/src/runtime/graph_executor/debug/graph_executor_debug.cc:108: Op #3 fused_mean_1: 1.0394 us/iter
[17:10:44] /home/sapire/git/tvm_r/src/runtime/graph_executor/debug/graph_executor_debug.cc:108: Op #4 fused_strided_slice_2: 0.805902 us/iter
[17:10:44] /home/sapire/git/tvm_r/src/runtime/graph_executor/debug/graph_executor_debug.cc:108: Op #5 fused_mean_2: 7.23777 us/iter
[17:10:44] /home/sapire/git/tvm_r/src/runtime/graph_executor/debug/graph_executor_debug.cc:108: Op #6 fused_take_concatenate_strided_slice_reshape_squeeze_subtract_strided_slice_abs: 0.108174 us/iter
[17:10:44] /home/sapire/git/tvm_r/src/runtime/graph_executor/debug/graph_executor_debug.cc:108: Op #7 fused_sum: 0.0805326 us/iter
[17:10:44] /home/sapire/git/tvm_r/src/runtime/graph_executor/debug/graph_executor_debug.cc:108: Op #8 fused_multiply_divide: 0.0795978 us/iter

Looking at the output of Op #1 fused_mean, there is already a difference between Python and C. (I also turned on TVM_CRT_DEBUG, and the same operators are called in C.)
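
For anyone who wants to reproduce the per-op comparison, here is a minimal sketch of how the two outputs could be checked against each other. It assumes the C side dumps the tensor to a file as raw little-endian float32 values; that dump step, the file layout, and the helper names read_f32 and compare are hypothetical and not part of the bundle deploy example. Only the Python standard library is used.

```python
import math
import struct


def read_f32(path):
    """Read a raw little-endian float32 dump into a list of floats."""
    with open(path, "rb") as f:
        data = f.read()
    count = len(data) // 4  # 4 bytes per float32; trailing bytes are ignored
    return list(struct.unpack("<%df" % count, data[: count * 4]))


def compare(ref, got, rtol=1e-5, atol=1e-6):
    """Return (max_abs_diff, index of the first mismatch or None)."""
    if len(ref) != len(got):
        raise ValueError("length mismatch: %d vs %d" % (len(ref), len(got)))
    max_diff = 0.0
    first_bad = None
    for i, (r, g) in enumerate(zip(ref, got)):
        diff = abs(r - g)
        if math.isnan(diff):
            diff = math.inf  # count NaN on either side as a mismatch
        max_diff = max(max_diff, diff)
        if first_bad is None and diff > atol + rtol * abs(r):
            first_bad = i
    return max_diff, first_bad
```

Here ref would be the flattened Python output of the op and got the values read back from the C dump. The index of the first mismatch helps tell whether the whole tensor is wrong or only part of it.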

Any help with this issue would be much appreciated.