How to use TVM to convert an ONNX model to a CMSIS-NN model

The TVM repository provides examples of microTVM. In these examples, TVM converts TFLite models to CMSIS-NN models and deploys them on Cortex-M microcontrollers. When I replaced the model with my quantized ONNX model, TVM also compiled it successfully. However, upon inspecting the “metadata.json” file, I noticed that TVM did not transform the quantized operators into the corresponding CMSIS-NN operators. Is this behavior normal?

Model

My ONNX Model

Bash Command:

python3 -m tvm.driver.tvmc compile \
    --target=cmsis-nn,c \
    --target-cmsis-nn-mcpu=cortex-m55 \
    --target-c-mcpu=cortex-m55 \
    --runtime=crt     \
    --executor=aot     \
    --executor-aot-interface-api=c     \
    --executor-aot-unpacked-api=1     \
    --pass-config tir.usmp.enable=1     \
    --pass-config tir.usmp.algorithm=hill_climb     \
    --pass-config tir.disable_storage_rewrite=1     \
    --pass-config tir.disable_vectorize=1 \
    --output-format=mlf     \
    --model-format=onnx \
    --dump-offloads="" \
    --module-name=clas \
    --input-shapes="inputs:[1,3,224,224]" \
    --output=output \
    model.onnx

@lhutton1 Do you have any good ideas? Could the reason for the issue be that the model’s operators have not been converted to `qnn` operators?

I’m not very familiar with ONNX although I took a look at the model and it seems the data layout is NCHW which is not supported by CMSIS-NN BYOC. Could you try to add --desired-layouts=NHWC to the tvmc compile command to attempt to convert the input graph?
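For context, a layout conversion only permutes the dimension order of each tensor. The sketch below (plain Python, illustrative values, not TVM's actual implementation) shows the NCHW → NHWC permutation on a nested list; the model's `[1, 3, 224, 224]` input would become `[1, 224, 224, 3]` under the same permutation:

```python
def nchw_to_nhwc(t):
    # t is a nested list indexed as [n][c][h][w]; returns one indexed [n][h][w][c]
    n_, c_, h_, w_ = len(t), len(t[0]), len(t[0][0]), len(t[0][0][0])
    return [[[[t[n][c][h][w] for c in range(c_)]
              for w in range(w_)]
             for h in range(h_)]
            for n in range(n_)]

# A tiny [1, 2, 2, 2] NCHW tensor:
t = [[[[1, 2], [3, 4]],        # channel 0
      [[5, 6], [7, 8]]]]       # channel 1
out = nchw_to_nhwc(t)
print(out[0][0][0])            # [1, 5]: the two channel values at h=0, w=0
```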

Also cc @ashutosh-arm who may have some more ideas

I tried adding `--desired-layout NHWC` to compile the model, but the situation seems to have gotten worse: TVM now warns that depthwise_conv2d with the NHWC layout is not optimized for the Arm CPU.

My compile command is:

python3 -m tvm.driver.tvmc compile \
    --desired-layout NHWC \
    --target=cmsis-nn,c \
    --target-cmsis-nn-mcpu=cortex-m55 \
    --target-c-mcpu=cortex-m55 \
    --runtime=crt     \
    --executor=aot     \
    --executor-aot-interface-api=c     \
    --executor-aot-unpacked-api=1     \
    --pass-config tir.usmp.enable=1     \
    --pass-config tir.usmp.algorithm=hill_climb     \
    --pass-config tir.disable_storage_rewrite=1     \
    --pass-config tir.disable_vectorize=1 \
    --output-format=mlf     \
    --model-format=onnx \
    --dump-offloads="" \
    --module-name=clas \
    --input-shapes="inputs:[1,3,224,224]" \
    --output=output.tar \
    model.onnx

Out of curiosity, did any of the operators in the graph get offloaded to CMSIS-NN? The depthwise operation mentioned in the warning might be a specific case not supported by CMSIS-NN, in which case it must run on the fallback schedules provided by TVM.

@Zheng-Bicheng as you have mentioned above, the partitioner specifically looks for qnn.conv2d as can be seen here: https://github.com/apache/tvm/blob/e2c8d7b33ea158a6775273431cb09aec776d311e/python/tvm/relay/op/contrib/cmsisnn.py#L95. I see that the model you have shared contains float32 conv2ds.

@lhutton1 None of this model’s operators were offloaded to CMSIS-NN at all. Could they be run by the DSP? The warning can be seen in arm_cpu.py.

@ashutosh-arm I think I’ve identified the cause of the problem: it is related to the ONNX quantization format. For instance, the quantization information for the conv2d operator in this model is stored in the QuantizeLinear and DequantizeLinear operators surrounding it, rather than on the conv2d itself. Because the conv2d operator carries no quantization information, its data type is FP32.
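To make the structure concrete, here is a toy illustration of ONNX's DequantizeLinear semantics in plain Python (made-up values; not TVM or ONNX Runtime code). The point is that the Conv node downstream only ever sees the float output:

```python
def dequantize_linear(q, scale, zero_point):
    # ONNX DequantizeLinear: real = (q - zero_point) * scale
    return [(v - zero_point) * scale for v in q]

q_input = [-10, 0, 25]                  # int8 output of the previous layer
x = dequantize_linear(q_input, 0.1, 0)  # -> [-1.0, 0.0, 2.5], float32
# The Conv node consumes only `x`, so its op type is a plain float conv2d;
# a partitioner that looks for qnn.conv2d never matches it.
print(x)
```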

@ashutosh-arm I’ve been thinking about it, and the best way to convert similar ONNX/Paddle models to CMSIS-NN models might be to consolidate the three operators mentioned above (or even more) into a single `qnn` operator. I plan to submit a Pull Request to accomplish this in the future. Could I ask for your assistance with it?

Converting those three operators into one as a Relay transform sounds like a good idea.
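As a sanity check on the idea: the fused operator is numerically equivalent to the DequantizeLinear → float op → QuantizeLinear chain, up to rounding, because the integer accumulation can simply be requantized by the combined scale. A toy "dot-product conv" in plain Python (all values illustrative; symmetric quantization with zero point 0 assumed):

```python
def dequant(q, scale, zp):
    return [(v - zp) * scale for v in q]

def quant_one(x, scale, zp, qmin=-128, qmax=127):
    return max(qmin, min(qmax, round(x / scale) + zp))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# What the ONNX QDQ graph computes: dequantize both inputs,
# run the op in float32, then quantize the result.
def qdq_path(qx, qw, sx, sw, sy):
    return quant_one(dot(dequant(qx, sx, 0), dequant(qw, sw, 0)), sy, 0)

# What a single fused qnn-style operator computes: integer accumulation,
# then one requantize by the combined scale sx * sw / sy.
def fused_path(qx, qw, sx, sw, sy):
    acc = dot(qx, qw)  # stays in integer arithmetic
    return max(-128, min(127, round(acc * (sx * sw / sy))))

qx, qw = [3, -5, 7], [2, 4, -1]
sx, sw, sy = 0.02, 0.05, 0.01
print(qdq_path(qx, qw, sx, sw, sy), fused_path(qx, qw, sx, sw, sy))  # -2 -2
```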

As @lhutton1 mentioned, watch out for the layout differences.