How to use microTVM with mimxrt1060_evk

I noticed that microTVM directly supports the mimxrt1050_evk, but I'm working with a mimxrt1060_evk. I'm wondering whether microTVM can also run on the 1060_evk, since the two boards are quite similar: both are RT10xx Cortex-M7 boards, and both are supported by Zephyr.

So far I have made some attempts based on the guide Executing a Tiny Model with TVMC Micro (tvm 0.9.dev0 documentation), but there are still some problems.

My environment:

Ubuntu 20.04 (VMware)

TVM 0.9 (commit 7e376e2599b6422bb1562bdf1823413276914d5d), with set(USE_MICRO ON)

LLVM 10

Zephyr 2.7.0 (installed directly instead of via the Reference VM), SDK 0.14.2

J-Link 7.66c

What I have tried:

  1. compile

python -m tvm.driver.tvmc compile magic_wand.tflite --target='c -keys=cpu -link-params=0 -mcpu=cortex-m7 -model=imxrt10xx' --runtime=crt --runtime-crt-system-lib 1 --executor='graph' --executor-graph-link-params 0 --output model.tar --output-format mlf --pass-config tir.disable_vectorize=1 --disabled-pass=AlterOpLayout

output:

conv2d NHWC layout is not optimized for x86 with autotvm.
conv2d NHWC layout is not optimized for x86 with autotvm.
One or more operators have not been tuned. Please tune your model for better performance. Use DEBUG logging level to see more details.
  2. flash project

edit boards.json in build/microtvm_template_projects/zephyr/

add:

"mimxrt1060_evk": {
    "board": "mimxrt1060_evk",
    "model": "imxrt10xx",
    "is_qemu": false,
    "fpu": true,
    "vid_hex": "1366",
    "pid_hex": "0105"
},

python -m tvm.driver.tvmc micro create project model.tar zephyr --project-option project_type=host_driven zephyr_board=mimxrt1060_evk

python -m tvm.driver.tvmc micro build project zephyr --project-option zephyr_board=mimxrt1060_evk

python -m tvm.driver.tvmc micro flash project zephyr --project-option zephyr_board=mimxrt1060_evk

No explicit error

  3. run
 python -m tvm.driver.tvmc run \
     --device micro \
     project \
     --project-option zephyr_board=mimxrt1060_evk \
     --fill-mode ones \
     --print-top 4

Error: Could not open a session with the micro target.

And there is some output on the serial port, but it doesn't look right either.

Can anyone help me? Thanks!

hey @YAETHeryk thanks for trying this out! it does look like you are seeing the expected RPC server traffic in your console (it is binary), so it looks like the problem here is that tvm.driver.tvmc isn’t seeing that traffic.

could you try adding '-vvvv' to the tvmc run command and paste the log here? perhaps i can see what's going wrong.

another thing that could be happening is: apps/microtvm/zephyr/template_project/microtvm_api_server.py needs to pick a local TTY to use for sending RPC traffic. a common task in supporting new boards is pointing this file at a new serial port. perhaps you might need to tweak the port used in that file?
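(For reference, a minimal sketch of that kind of VID/PID-to-TTY lookup, assuming pyserial is installed; the actual logic in microtvm_api_server.py may differ:)

import serial.tools.list_ports

# Hypothetical helper: find the local TTY for a board by its USB VID/PID.
def find_tty(vid_hex, pid_hex):
    vid, pid = int(vid_hex, 16), int(pid_hex, 16)
    for port in serial.tools.list_ports.comports():
        if port.vid == vid and port.pid == pid:
            return port.device  # e.g. "/dev/ttyACM0"
    raise FileNotFoundError(f"no serial port matching {vid_hex}:{pid_hex}")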

Thanks a lot for your help!

Now I have solved this problem. I would like to share my experience.

Actually, we don't need to modify anything in microtvm_api_server.py; the device can be recognized via its USB VID and PID.

My first mistake was choosing the wrong ID:

"vid_hex": "1366",
"pid_hex": "0105"

I chose this because it is what was written in the JSON for the mimxrt1050_evk, and I do have a device with this ID.

But on my computer this ID belongs to the SEGGER J-Link. I'm not sure whether it can also be used for transport, and I don't know how to do that.

Anyway, in my case, the UART serial port for my device is 0d28:0204.

[screenshot: USB device list showing the 0d28:0204 serial device]
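(For anyone reading without the screenshot: the VID/PID of each connected serial device can also be listed from Python with pyserial; a small sketch, not part of my original steps:)

import serial.tools.list_ports

# Print device path, VID:PID, and description for every serial port found.
for p in serial.tools.list_ports.comports():
    ids = f"{p.vid:04x}:{p.pid:04x}" if p.vid is not None else "?"
    print(p.device, ids, p.description)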

Therefore, just modify boards.json in the project:

"mimxrt1060_evk": {
    "board": "mimxrt1060_evk",
    "model": "imxrt10xx",
    "is_qemu": false,
    "fpu": true,
    "vid_hex": "0d28",
    "pid_hex": "0204"
},
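If you prefer to script the change, a minimal sketch (the path matches the tvmc layout used above; adjust it to your project):

import json
import pathlib

# Hypothetical one-off patch script for boards.json.
boards_file = pathlib.Path("build/microtvm_template_projects/zephyr/boards.json")
boards = json.loads(boards_file.read_text())
boards["mimxrt1060_evk"] = {
    "board": "mimxrt1060_evk",
    "model": "imxrt10xx",
    "is_qemu": False,
    "fpu": True,
    "vid_hex": "0d28",
    "pid_hex": "0204",
}
boards_file.write_text(json.dumps(boards, indent=4))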

The second tricky thing: I had opened a connection with a serial tool (cutecom) to monitor the output (just like when running the hello_world example). But this occupies the serial port, causing a connection error.

After closing that connection, everything is OK.

Update:

One extra point: after flashing, the SW9 button (power reset) must be pressed once before the run command can succeed; otherwise it is Error: Could not open a session with the micro target. again. I don't know why.

Then I tried another example, Autotuning with microTVM (tvm 0.9.dev0 documentation), and there are also some problems.

I didn't modify the code in the example except for the board config.

boards[BOARD]

{'board': 'mimxrt1060_evk',
 'model': 'imxrt10xx',
 'is_qemu': False,
 'fpu': True,
 'vid_hex': '0d28',
 'pid_hex': '0204'}

And I get this result from the auto-tuning code.

In the terminal:

 Current/Best:    0.00/   0.00 MFLOPS | Progress: (0/10) | 0.00 s

model/codegen/host/src/lib1.c: In function ‘default_function’:
model/codegen/host/src/lib1.c:24:9: warning: unused variable ‘arg_data_shape’ [-Wunused-variable]
   24 |   void* arg_data_shape = (((DLTensor*)arg_data)[0].shape);
      |         ^~~~~~~~~~~~~~
model/codegen/host/src/lib1.c:21:9: warning: unused variable ‘arg_kernel_shape’ [-Wunused-variable]
   21 |   void* arg_kernel_shape = (((DLTensor*)arg_kernel)[0].shape);
      |         ^~~~~~~~~~~~~~~~
model/codegen/host/src/lib1.c:17:9: warning: unused variable ‘arg_conv2d_NCHWc_shape’ [-Wunused-variable]
   17 |   void* arg_conv2d_NCHWc_shape = (((DLTensor*)arg_conv2d_NCHWc)[0].shape);
      |         ^~~~~~~~~~~~~~~~~~~~~~
model/codegen/host/src/lib1.c:15:11: warning: unused variable ‘arg_data_code’ [-Wunused-variable]
   15 |   int32_t arg_data_code = arg_type_ids[2];
      |           ^~~~~~~~~~~~~
model/codegen/host/src/lib1.c:13:11: warning: unused variable ‘arg_kernel_code’ [-Wunused-variable]
   13 |   int32_t arg_kernel_code = arg_type_ids[1];
      |           ^~~~~~~~~~~~~~~
model/codegen/host/src/lib1.c:11:11: warning: unused variable ‘arg_conv2d_NCHWc_code’ [-Wunused-variable]
   11 |   int32_t arg_conv2d_NCHWc_code = arg_type_ids[0];
      |           ^~~~~~~~~~~~~~~~~~~~~

 Current/Best:  359.22/ 359.22 MFLOPS | Progress: (1/10) | 4.82 s

microTVM runtime: 0-length read, exiting!

(The same six unused-variable warnings and the "microTVM runtime: 0-length read, exiting!" line repeated before every remaining trial; they are elided here.)

 Current/Best:  276.81/ 359.22 MFLOPS | Progress: (2/10) | 7.82 s
 Current/Best:  369.08/ 369.08 MFLOPS | Progress: (3/10) | 10.76 s
 Current/Best:  607.20/ 607.20 MFLOPS | Progress: (4/10) | 13.56 s
 Current/Best:  537.81/ 607.20 MFLOPS | Progress: (5/10) | 16.40 s
 Current/Best:  280.94/ 607.20 MFLOPS | Progress: (6/10) | 19.44 s
 Current/Best:  274.39/ 607.20 MFLOPS | Progress: (7/10) | 22.23 s
 Current/Best:  319.04/ 607.20 MFLOPS | Progress: (8/10) | 25.18 s
 Current/Best:  231.81/ 607.20 MFLOPS | Progress: (9/10) | 28.03 s
 Current/Best:  272.80/ 607.20 MFLOPS | Progress: (10/10) | 30.78 s Done.

microTVM runtime: 0-length read, exiting!

in the log file:

{"input": ["c -keys=cpu -link-params=0 -mcpu=cortex-m7 -model=imxrt10xx", "conv2d_NCHWc.x86", [["TENSOR", [1, 3, 10, 10], "float32"], ["TENSOR", [6, 3, 5, 5], "float32"], [1, 1], [2, 2, 2, 2], [1, 1], "NCHW", "NCHW", "float32"], {}], "config": {"index": 86, "code_hash": null, "entity": [["tile_ic", "sp", [-1, 1]], ["tile_oc", "sp", [-1, 6]], ["tile_ow", "sp", [-1, 8]], ["unroll_kw", "ot", false]]}, "result": [[0.000262], 0, 2.5539603233337402, 1655886068.604009], "version": 0.2, "tvm_version": "0.9.dev0"}
{"input": ["c -keys=cpu -link-params=0 -mcpu=cortex-m7 -model=imxrt10xx", "conv2d_NCHWc.x86", [["TENSOR", [1, 3, 10, 10], "float32"], ["TENSOR", [6, 3, 5, 5], "float32"], [1, 1], [2, 2, 2, 2], [1, 1], "NCHW", "NCHW", "float32"], {}], "config": {"index": 85, "code_hash": null, "entity": [["tile_ic", "sp", [-1, 3]], ["tile_oc", "sp", [-1, 3]], ["tile_ow", "sp", [-1, 8]], ["unroll_kw", "ot", false]]}, "result": [[0.00034], 0, 2.8432226181030273, 1655886071.5829275], "version": 0.2, "tvm_version": "0.9.dev0"}
{"input": ["c -keys=cpu -link-params=0 -mcpu=cortex-m7 -model=imxrt10xx", "conv2d_NCHWc.x86", [["TENSOR", [1, 3, 10, 10], "float32"], ["TENSOR", [6, 3, 5, 5], "float32"], [1, 1], [2, 2, 2, 2], [1, 1], "NCHW", "NCHW", "float32"], {}], "config": {"index": 23, "code_hash": null, "entity": [["tile_ic", "sp", [-1, 3]], ["tile_oc", "sp", [-1, 6]], ["tile_ow", "sp", [-1, 4]], ["unroll_kw", "ot", true]]}, "result": [[0.000255], 0, 2.7958567142486572, 1655886074.537234], "version": 0.2, "tvm_version": "0.9.dev0"}
{"input": ["c -keys=cpu -link-params=0 -mcpu=cortex-m7 -model=imxrt10xx", "conv2d_NCHWc.x86", [["TENSOR", [1, 3, 10, 10], "float32"], ["TENSOR", [6, 3, 5, 5], "float32"], [1, 1], [2, 2, 2, 2], [1, 1], "NCHW", "NCHW", "float32"], {}], "config": {"index": 25, "code_hash": null, "entity": [["tile_ic", "sp", [-1, 3]], ["tile_oc", "sp", [-1, 1]], ["tile_ow", "sp", [-1, 5]], ["unroll_kw", "ot", true]]}, "result": [[0.000155], 0, 2.6651906967163086, 1655886077.3360674], "version": 0.2, "tvm_version": "0.9.dev0"}
{"input": ["c -keys=cpu -link-params=0 -mcpu=cortex-m7 -model=imxrt10xx", "conv2d_NCHWc.x86", [["TENSOR", [1, 3, 10, 10], "float32"], ["TENSOR", [6, 3, 5, 5], "float32"], [1, 1], [2, 2, 2, 2], [1, 1], "NCHW", "NCHW", "float32"], {}], "config": {"index": 16, "code_hash": null, "entity": [["tile_ic", "sp", [-1, 1]], ["tile_oc", "sp", [-1, 1]], ["tile_ow", "sp", [-1, 4]], ["unroll_kw", "ot", true]]}, "result": [[0.000175], 0, 2.7136850357055664, 1655886080.1852794], "version": 0.2, "tvm_version": "0.9.dev0"}
{"input": ["c -keys=cpu -link-params=0 -mcpu=cortex-m7 -model=imxrt10xx", "conv2d_NCHWc.x86", [["TENSOR", [1, 3, 10, 10], "float32"], ["TENSOR", [6, 3, 5, 5], "float32"], [1, 1], [2, 2, 2, 2], [1, 1], "NCHW", "NCHW", "float32"], {}], "config": {"index": 75, "code_hash": null, "entity": [["tile_ic", "sp", [-1, 3]], ["tile_oc", "sp", [-1, 2]], ["tile_ow", "sp", [-1, 5]], ["unroll_kw", "ot", false]]}, "result": [[0.000335], 0, 2.898169994354248, 1655886083.2180862], "version": 0.2, "tvm_version": "0.9.dev0"}
{"input": ["c -keys=cpu -link-params=0 -mcpu=cortex-m7 -model=imxrt10xx", "conv2d_NCHWc.x86", [["TENSOR", [1, 3, 10, 10], "float32"], ["TENSOR", [6, 3, 5, 5], "float32"], [1, 1], [2, 2, 2, 2], [1, 1], "NCHW", "NCHW", "float32"], {}], "config": {"index": 49, "code_hash": null, "entity": [["tile_ic", "sp", [-1, 3]], ["tile_oc", "sp", [-1, 1]], ["tile_ow", "sp", [-1, 1]], ["unroll_kw", "ot", false]]}, "result": [[0.000343], 0, 2.653069257736206, 1655886086.0076249], "version": 0.2, "tvm_version": "0.9.dev0"}
{"input": ["c -keys=cpu -link-params=0 -mcpu=cortex-m7 -model=imxrt10xx", "conv2d_NCHWc.x86", [["TENSOR", [1, 3, 10, 10], "float32"], ["TENSOR", [6, 3, 5, 5], "float32"], [1, 1], [2, 2, 2, 2], [1, 1], "NCHW", "NCHW", "float32"], {}], "config": {"index": 38, "code_hash": null, "entity": [["tile_ic", "sp", [-1, 1]], ["tile_oc", "sp", [-1, 6]], ["tile_ow", "sp", [-1, 8]], ["unroll_kw", "ot", true]]}, "result": [[0.000295], 0, 2.7871172428131104, 1655886088.933378], "version": 0.2, "tvm_version": "0.9.dev0"}
{"input": ["c -keys=cpu -link-params=0 -mcpu=cortex-m7 -model=imxrt10xx", "conv2d_NCHWc.x86", [["TENSOR", [1, 3, 10, 10], "float32"], ["TENSOR", [6, 3, 5, 5], "float32"], [1, 1], [2, 2, 2, 2], [1, 1], "NCHW", "NCHW", "float32"], {}], "config": {"index": 61, "code_hash": null, "entity": [["tile_ic", "sp", [-1, 3]], ["tile_oc", "sp", [-1, 3]], ["tile_ow", "sp", [-1, 2]], ["unroll_kw", "ot", false]]}, "result": [[0.000406], 0, 2.718482255935669, 1655886091.8119724], "version": 0.2, "tvm_version": "0.9.dev0"}
{"input": ["c -keys=cpu -link-params=0 -mcpu=cortex-m7 -model=imxrt10xx", "conv2d_NCHWc.x86", [["TENSOR", [1, 3, 10, 10], "float32"], ["TENSOR", [6, 3, 5, 5], "float32"], [1, 1], [2, 2, 2, 2], [1, 1], "NCHW", "NCHW", "float32"], {}], "config": {"index": 59, "code_hash": null, "entity": [["tile_ic", "sp", [-1, 3]], ["tile_oc", "sp", [-1, 2]], ["tile_ow", "sp", [-1, 2]], ["unroll_kw", "ot", false]]}, "result": [[0.000345], 0, 2.614313840866089, 1655886094.5607457], "version": 0.2, "tvm_version": "0.9.dev0"}

I'm not sure whether the result is correct, and I don't know what microTVM runtime: 0-length read, exiting! means.
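(For what it's worth, a tuning log like this can be consumed the same way as in the tutorial when building; a sketch, where relay_mod, params, and target are the tutorial's variables and the log path is whatever was passed to the tuner:)

import tvm
from tvm import autotvm

# Build with the best schedules found during tuning applied.
with autotvm.apply_history_best("microtvm_autotune.log.txt"):
    with tvm.transform.PassContext(opt_level=3, config={"tir.disable_vectorize": True}):
        lowered_tuned = tvm.relay.build(relay_mod, target=target, params=params)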

Then another error occurs when I try the "Timing the untuned program" step:

########## Build without Autotuning ##########

[15:26:21] ../src/runtime/graph_executor/debug/graph_executor_debug.cc:181: Got op timing: 3.21982e-06
[15:26:21] ../src/runtime/graph_executor/debug/graph_executor_debug.cc:181: Got op timing: 0.000815525
[15:26:22] ../src/runtime/graph_executor/debug/graph_executor_debug.cc:181: Got op timing: 3.20417e-06

---------------------------------------------------------------------------
TVMError                                  Traceback (most recent call last)
Input In [12], in <cell line: 1>()
      5 debug_module.set_input(**lowered.get_params())
      6 print("########## Build without Autotuning ##########")
----> 7 debug_module.run()
      8 del debug_module

File ~/Desktop/tvm/python/tvm/contrib/debugger/debug_executor.py:274, in GraphModuleDebug.run(self, **input_dict)
    271     self.set_input(**input_dict)
    273 # Step 1. Execute the graph
--> 274 self._run_debug()
    275 # Step 2. Dump the output tensors to the dump folder
    276 self.debug_datum.dump_output_tensor()

File ~/Desktop/tvm/python/tvm/contrib/debugger/debug_executor.py:234, in GraphModuleDebug._run_debug(self)
    231 self.debug_datum._time_list = [[float(t)] for t in self.run_individual(10, 1, 1)]
    233 # Get outputs.
--> 234 self._run_per_layer()

File ~/Desktop/tvm/python/tvm/contrib/debugger/debug_executor.py:222, in GraphModuleDebug._run_per_layer(self)
    218     for j in range(num_outputs):
    219         logging.info(
    220             "running node=%d, output_ind=%d, with node_name: %s", i, j, node["name"]
    221         )
--> 222         output_tensors.append(self._get_node_output(i, j))
    223 self.debug_datum.update_output_tensors(output_tensors)

File ~/Desktop/tvm/python/tvm/_ffi/_ctypes/packed_func.py:237, in PackedFuncBase.__call__(self, *args)
    225 ret_tcode = ctypes.c_int()
    226 if (
    227     _LIB.TVMFuncCall(
    228         self.handle,
   (...)
    235     != 0
    236 ):
--> 237     raise get_last_ffi_error()
    238 _ = temp_args
    239 _ = args

TVMError: MicroSessionTimeoutError: failed to read reply message after timeout 5s

Maybe I should start a new topic for this?

thanks so much for posting this writeup, and glad you were able to get it working! If you’d like to open a PR to adjust boards.json i’d be happy to review it.

I wonder if IMXRT1060 could be reset via a Zephyr command e.g. something like make reset. We could add this to microtvm_api_server.py if so.

Interesting. I suspect there are memory problems running the full model with the GraphExecutor on device. AOTExecutor dramatically improves that situation by removing dynamic allocations, so I’d be curious to know whether you are able to get this working via AOTExecutor. We don’t quite have the host-driven AOTExecutor written up into a tutorial yet iirc, but we do have this test: https://github.com/apache/tvm/blob/main/tests/micro/zephyr/test_zephyr_aot_exec.py#L127

Curious if you’re able to get that working? I think you could time end-to-end with something like (adapted on top of the code link above):

run_func = aot_executor.module.time_evaluator("run", sess._device, number=1)
print(run_func())

We’re working on ways to profile per-layer in AOTExecutor, but that’s not ready yet.

cc @alanmacd

Unfortunately, it doesn’t work either, with the same error.

Here's the result in my Jupyter notebook; I extracted the code from test_zephyr_aot_exec.py. Running

pytest test_zephyr_aot_exec.py --zephyr-board=mimxrt1060_evk

directly gives the same error.

set:

with tvm.transform.PassContext(opt_level=3, config={"tir.disable_vectorize": True}):
    mod = tvm.relay.build(ir_mod, target=target, runtime=runtime, executor=executor)

Then build and flash.
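(Concretely, my build-and-flash step followed the tutorial pattern; a sketch, where mod is the relay.build output above and the project options match the ones I used with tvmc:)

import pathlib

import tvm.micro
import tvm.contrib.utils

template_project_path = pathlib.Path(tvm.micro.get_microtvm_template_projects("zephyr"))
project_options = {"project_type": "host_driven", "zephyr_board": "mimxrt1060_evk"}

# Generate the Zephyr project from the template, then build and flash it.
temp_dir = tvm.contrib.utils.tempdir()
generated_project = tvm.micro.generate_project(
    template_project_path, mod, temp_dir / "generated-project", project_options
)
generated_project.build()
generated_project.flash()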

The memory information (although it doesn't illustrate much):

[screenshot: Zephyr build memory usage report]

run:

Sometimes there are extra messages:

Exception ignored in: <function NDArrayBase.__del__ at 0x7f1ac63a03a0>
Traceback (most recent call last):
  File "/home/yyk/Desktop/tvm/python/tvm/_ffi/_ctypes/ndarray.py", line 82, in __del__
    check_call(_LIB.TVMArrayFree(self.handle))
  File "/home/yyk/Desktop/tvm/python/tvm/_ffi/base.py", line 348, in check_call
    raise get_last_ffi_error()
ValueError: Traceback (most recent call last):
  185: 0xffffffffffffffff
  184: 0x0000564671b4cad6
  183: __libc_start_main
        at ../csu/libc-start.c:308
  ... (Python interpreter and ctypes frames 182 through 14 elided) ...
  13: TVMArrayFree
  12: tvm::runtime::RemoteNDArrayDeleter(tvm::runtime::Object*)
  11: tvm::runtime::RPCClientSession::FreeHandle(void*, int)
  10: tvm::runtime::PackedFuncObj::Extractor<tvm::runtime::PackedFuncSubObj<tvm::runtime::RPCEndpoint::Init()::{lambda(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)#2}> >::Call(tvm::runtime::PackedFuncObj const*, tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)
  9: tvm::runtime::RPCEndpoint::HandleUntilReturnEvent(bool, std::function<void (tvm::runtime::TVMArgs)>)
  8: tvm::runtime::micro_rpc::MicroTransportChannel::Send(void const*, unsigned long)
  7: tvm::runtime::micro_rpc::Session::SendMessage(tvm::runtime::micro_rpc::MessageType, unsigned char const*, unsigned long)
        at src/runtime/crt/microtvm_rpc_common/session.cc:115
  6: tvm::runtime::micro_rpc::Session::SendInternal(tvm::runtime::micro_rpc::MessageType, unsigned char const*, unsigned long)
        at src/runtime/crt/microtvm_rpc_common/session.cc:48
  5: tvm::runtime::micro_rpc::Session::StartMessage(tvm::runtime::micro_rpc::MessageType, unsigned long)
        at src/runtime/crt/microtvm_rpc_common/session.cc:69
  4: tvm::runtime::micro_rpc::Framer::StartPacket(unsigned long)
        at src/runtime/crt/microtvm_rpc_common/framing.cc:344
  3: tvm::runtime::micro_rpc::Framer::WriteAndCrc(unsigned char const*, unsigned long, bool, bool)
        at src/runtime/crt/microtvm_rpc_common/framing.cc:386
  2: tvm::runtime::micro_rpc::WriteStream::WriteAll(unsigned char*, unsigned long, unsigned long*)
        at src/runtime/crt/microtvm_rpc_common/write_stream.cc:36
  1: tvm::runtime::micro_rpc::CallbackWriteStream::Write(unsigned char const*, unsigned long)
  0: tvm::runtime::PackedFuncObj::Extractor<tvm::runtime::PackedFuncSubObj<TVMFuncCreateFromCFunc::{lambda(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)#2}> >::Call(tvm::runtime::PackedFuncObj const*, tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) [clone .cold]
  File "/home/yyk/Desktop/tvm/python/tvm/_ffi/_ctypes/packed_func.py", line 81, in cfun
    rv = local_pyfunc(*pyargs)
  File "/home/yyk/Desktop/tvm/python/tvm/micro/session.py", line 110, in _wrap_transport_write
    self.transport.write(
  File "/home/yyk/Desktop/tvm/python/tvm/micro/transport.py", line 255, in write
    raise err
  File "/home/yyk/Desktop/tvm/python/tvm/micro/transport.py", line 234, in write
    self.child.write(data, timeout_sec)
  File "/home/yyk/Desktop/tvm/python/tvm/micro/project.py", line 53, in write
    self._api_client.write_transport(data, timeout_sec)
  File "/home/yyk/Desktop/tvm/python/tvm/micro/project_api/client.py", line 189, in write_transport
    return self._request_reply(
  File "/home/yyk/Desktop/tvm/python/tvm/micro/project_api/client.py", line 110, in _request_reply
    self.write_file.write(request_str)
ValueError: I/O operation on closed file.

I'm not an expert on this; if there's anything I can do to help pinpoint the problem, please tell me and I'd be glad to give it a try. Thanks.

hmm… this continues to smell like a memory allocation problem. @alanmacd is working on a change that will make the amount of memory required at runtime in these non-deployed flows evident to the C compiler (e.g. the top-level uint8_t workspace array will be all that's needed). my hope is that this can help; it's pretty hard to debug something like this further from here without seeing an on-device traceback.

I found an interesting new thing: I ran a small model (magic_wand.tflite) with microTVM in Jupyter, using the same code as in microTVM with TFLite Models (tvm 0.9.dev0 documentation).

First I used the graph executor, and everything of course ran correctly (setting "link-params" to True or False both work). I think this is equivalent to using tvmc, right?

Then I switched to the AOT executor, and the old problem, MicroSessionTimeoutError: failed to read reply message after timeout 5s, appeared again.

code change:

executor:

executor = tvm.relay.backend.Executor("aot")
# executor = tvm.relay.backend.Executor("graph", {"link-params": True})

execute:

with tvm.micro.Session(transport_context_manager=generated_project.transport()) as session:
#     graph_mod = tvm.micro.create_local_graph_executor(
#         module.get_graph_json(), session.get_system_lib(), session.device
#     )

#     # Set the model parameters using the lowered parameters produced by `relay.build`.
#     graph_mod.set_input(**module.get_params())

#     graph_mod.set_input(input_tensor, tvm.nd.array(np.ones(input_shape, dtype=input_dtype)))
#     graph_mod.run()

#     tvm_output = graph_mod.get_output(0).numpy()
#     print("result is: " + str(tvm_output))
    aot_executor = tvm.runtime.executor.aot_executor.AotModule(session.create_aot_executor())
    data = np.ones(input_shape, dtype=input_dtype)
    aot_executor.get_input(input_tensor).copyfrom(data)
    aot_executor.run()
    tvm_output = aot_executor.get_output(0).numpy()
    print("result is: " + str(tvm_output))

After seeing this result, I made another attempt: I modified the code in test_zephyr_aot_exec.py from the AOT executor to the graph executor, and then, miraculously, everything was back to normal.

So, do you think this is purely a memory problem?

Also, I tried a bigger model (mobilenet_v1_0.25_128_quant.tflite, with the graph executor), and this time I got a different traceback:

---------------------------------------------------------------------------
RPCError                                  Traceback (most recent call last)
Input In [18], in <cell line: 1>()
      1 with tvm.micro.Session(transport_context_manager=generated_project.transport()) as session:
----> 2     graph_mod = tvm.micro.create_local_graph_executor(
      3         module.get_graph_json(), session.get_system_lib(), session.device
      4     )
      6     # Set the model parameters using the lowered parameters produced by `relay.build`.
      7     graph_mod.set_input(**module.get_params())

File ~/Desktop/tvm/python/tvm/micro/session.py:221, in create_local_graph_executor(graph_json_str, mod, device)
    218 device_type_id = [device.device_type, device.device_id]
    219 fcreate = get_global_func("tvm.graph_executor.create")
    220 return graph_executor.GraphModule(
--> 221     fcreate(graph_json_str, mod, lookup_remote_linked_param, *device_type_id)
    222 )

File ~/Desktop/tvm/python/tvm/_ffi/_ctypes/packed_func.py:237, in PackedFuncBase.__call__(self, *args)
    225 ret_tcode = ctypes.c_int()
    226 if (
    227     _LIB.TVMFuncCall(
    228         self.handle,
   (...)
    235     != 0
    236 ):
--> 237     raise get_last_ffi_error()
    238 _ = temp_args
    239 _ = args

RPCError: Traceback (most recent call last):
  9: TVMFuncCall
  8: tvm::runtime::{lambda(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)#1}::operator()(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const [clone .isra.0]
  7: tvm::runtime::GraphExecutorCreate(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, tvm::runtime::Module const&, std::vector<DLDevice, std::allocator<DLDevice> > const&, tvm::runtime::PackedFunc)
  6: tvm::runtime::GraphExecutor::Init(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, tvm::runtime::Module, std::vector<DLDevice, std::allocator<DLDevice> > const&, tvm::runtime::PackedFunc)
  5: tvm::runtime::GraphExecutor::SetupStorage()
  4: tvm::runtime::NDArray::Empty(tvm::runtime::ShapeTuple, DLDataType, DLDevice, tvm::runtime::Optional<tvm::runtime::String>)
  3: tvm::runtime::RPCDeviceAPI::AllocDataSpace(DLDevice, int, long const*, DLDataType, tvm::runtime::Optional<tvm::runtime::String>)
  2: tvm::runtime::RPCClientSession::AllocDataSpace(DLDevice, int, long const*, DLDataType, tvm::runtime::Optional<tvm::runtime::String>)
  1: tvm::runtime::PackedFuncObj::Extractor<tvm::runtime::PackedFuncSubObj<tvm::runtime::RPCEndpoint::Init()::{lambda(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)#2}> >::Call(tvm::runtime::PackedFuncObj const*, tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)
  0: tvm::runtime::RPCEndpoint::HandleUntilReturnEvent(bool, std::function<void (tvm::runtime::TVMArgs)>)
  File "../src/runtime/rpc/rpc_endpoint.cc", line 376
RPCError: Error caught from RPC call:

This traceback looks more like a memory allocation problem. I’m still looking into this because I don’t think this model is that large.

Could you try increasing the stack size? We found it necessary to increase the stack size when running on most hardware, due to the default Zephyr settings. You can look at this PR for reference.
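In case it helps, in the Zephyr template project this is typically a one-line config change; a sketch of the prj.conf fragment, assuming your Zephyr version uses this option name:

# prj.conf: raise the main thread stack above the Zephyr default
CONFIG_MAIN_STACK_SIZE=2048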

@YAETHeryk here’s a new version of micro_tflite.py to run the microTVM with TFLite demo. I’ve verified it runs on both QEMU and the nrf5340dk_nrf5340_cpuapp hardware. It did need an increase of the default stack size to 2048 to run on hardware, which you’ll find with a few other changes:

import os
import json
import tarfile
import pathlib
import tempfile
import numpy as np

import tvm
from tvm import relay
import tvm.contrib.utils
from tvm.contrib.download import download_testdata

use_physical_hw = bool(os.getenv("TVM_MICRO_USE_HW"))
model_url = "https://people.linaro.org/~tom.gall/sine_model.tflite"
model_file = "sine_model.tflite"
model_path = download_testdata(model_url, model_file, module="data")

tflite_model_buf = open(model_path, "rb").read()

######################################################################
# Using the buffer, transform into a tflite model python object
try:
    import tflite

    tflite_model = tflite.Model.GetRootAsModel(tflite_model_buf, 0)
except AttributeError:
    import tflite.Model

    tflite_model = tflite.Model.Model.GetRootAsModel(tflite_model_buf, 0)

######################################################################
# Print out the version of the model
version = tflite_model.Version()
print("Model Version: " + str(version))

######################################################################
# Parse the python model object to convert it into a relay module
# and weights.
# It is important to note that the input tensor name must match what
# is contained in the model.
#
# If you are unsure what that might be, this can be discovered by using
# the ``visualize.py`` script within the Tensorflow project.
# See `How do I inspect a .tflite file? <https://www.tensorflow.org/lite/guide/faq>`_

input_tensor = "dense_4_input"
input_shape = (1,)
input_dtype = "float32"

mod, params = relay.frontend.from_tflite(
    tflite_model, shape_dict={input_tensor: input_shape}, dtype_dict={input_tensor: input_dtype}
)

######################################################################
# Defining the target
# -------------------
#
# Now we create a build config for relay, turning off two options and then calling relay.build which
# will result in a C source file for the selected TARGET. When running on a simulated target of the
# same architecture as the host (where this Python script is executed), choose "host" below for the
# TARGET, the C Runtime as the RUNTIME, and a proper board/VM to run it (Zephyr will create the right
# QEMU VM based on BOARD). In the example below the x86 arch is selected and an x86 VM is picked accordingly:
#
RUNTIME = tvm.relay.backend.Runtime("crt", {"system-lib": True})
TARGET = tvm.target.target.micro("host")
EXECUTOR = tvm.relay.backend.Executor("aot")
#
# Compiling for physical hardware
#  When running on physical hardware, choose a TARGET and a BOARD that describe the hardware. The
#  STM32F746 Nucleo target and board are chosen in the example below. Another option would be to
#  choose the STM32F746 Discovery board instead. Since that board has the same MCU as the Nucleo
#  board but a couple of wirings and configs differ, it's necessary to select the "stm32f746g_disco"
#  board to generate the right firmware image.
#

if use_physical_hw:
    boards_file = pathlib.Path(tvm.micro.get_microtvm_template_projects("zephyr")) / "boards.json"
    with open(boards_file) as f:
        boards = json.load(f)

    BOARD = os.getenv("TVM_MICRO_BOARD", default="nucleo_f746zg")
    TARGET = tvm.target.target.micro(boards[BOARD]["model"])

#
#  For some boards, Zephyr runs them emulated by default, using QEMU. For example, below is the
#  TARGET and BOARD used to build a microTVM firmware for the mps2-an521 board. Since that board
#  runs emulated by default on Zephyr the suffix "-qemu" is added to the board name to inform
#  microTVM that the QEMU transporter must be used to communicate with the board. If the board name
#  already has the prefix "qemu_", like "qemu_x86", then it's not necessary to add that suffix.
#
#  TARGET = tvm.target.target.micro("mps2_an521")
#  BOARD = "mps2_an521-qemu"

######################################################################
# Now, compile the model for the target:

with tvm.transform.PassContext(
    opt_level=3, config={"tir.disable_vectorize": True}, disabled_pass=["AlterOpLayout"]
):
    module = relay.build(mod, target=TARGET, runtime=RUNTIME, params=params, executor=EXECUTOR)

# Compiling the generated code
# ----------------------------
#
# Now we need to incorporate the generated C code into a project that allows us to run inference on the
# device. The simplest way to do this is to integrate it yourself, using microTVM's standard output format
# (:doc:`Model Library Format </dev/model_library_format>`). This is a tarball with a standard layout:

# Get a temporary path where we can store the tarball (since this is running as a tutorial).

fd, model_library_format_tar_path = tempfile.mkstemp()
os.close(fd)
os.unlink(model_library_format_tar_path)
tvm.micro.export_model_library_format(module, model_library_format_tar_path)

with tarfile.open(model_library_format_tar_path, "r:*") as tar_f:
    print("\n".join(f" - {m.name}" for m in tar_f.getmembers()))

# Cleanup for tutorial:
os.unlink(model_library_format_tar_path)


# TVM also provides a standard way for embedded platforms to automatically generate a standalone
# project, compile and flash it to a target, and communicate with it using the standard TVM RPC
# protocol. The Model Library Format serves as the model input to this process. When embedded
# platforms provide such an integration, they can be used directly by TVM for both host-driven
# inference and autotuning. This integration is provided by the
# `microTVM Project API <https://github.com/apache/tvm-rfcs/blob/main/rfcs/0008-microtvm-project-api.md>`_.
#
# Embedded platforms need to provide a Template Project containing a microTVM API Server (typically,
# this lives in a file ``microtvm_api_server.py`` in the root directory). Let's use the example ``host``
# project in this tutorial, which simulates the device using a POSIX subprocess and pipes:

template_project_path = pathlib.Path(tvm.micro.get_microtvm_template_projects("crt"))
project_options = {}  # You can use options to provide platform-specific options through TVM.

# Compiling for physical hardware (or an emulated board, like the mps2_an521)
# --------------------------------------------------------------------------
#  For physical hardware, you can try out the Zephyr platform by using a different template project
#  and options:
#

if use_physical_hw:
    template_project_path = pathlib.Path(tvm.micro.get_microtvm_template_projects("zephyr"))
    project_options = {"project_type": "host_driven", "zephyr_board": BOARD, "config_main_stack_size": 2048}

# Create a temporary directory

temp_dir = tvm.contrib.utils.tempdir()
generated_project_dir = temp_dir / "generated-project"
generated_project = tvm.micro.generate_project(
    template_project_path, module, generated_project_dir, project_options
)

# Build and flash the project
generated_project.build()
generated_project.flash()


######################################################################
# Next, establish a session with the device and run the computation.
# On physical hardware, the `with session` line opens the transport to
# the board we just flashed; with the ``host`` project it simply
# launches a subprocess to stand in for an attached microcontroller.

with tvm.micro.Session(transport_context_manager=generated_project.transport()) as session:

    aot_executor = tvm.runtime.executor.aot_executor.AotModule(session.create_aot_executor())
    data = np.array([0.5])
    aot_executor.get_input(input_tensor).copyfrom(data)
    aot_executor.run()
    tvm_output = aot_executor.get_output(0).numpy()
    print("result is: " + str(tvm_output))

Yes, this works.

I can pass pytest test_zephyr_aot_exec.py --zephyr-board=mimxrt1060_evk now after increasing the main stack size. I will try more cases to see whether this solves all the problems.

Thanks a lot!

I’ve tried some other things and have some new questions:

  1. After increasing the main stack size, the AOT executor runs correctly, and I could run the auto-tuning tutorial code on my board and get a log file. However, the debug executor still has some problems, and increasing the stack size didn't solve them this time.

  2. I tried a bigger model (mobilenet_v1_0.25_128_quant.tflite) with the AOT executor, but it failed due to a memory issue. I don't think this model is too big, though, because NXP provides a sample project that can run it (using a very involved method) on the board.

  3. About auto-tuning: for each trial, microTVM needs to rebuild the project, flash it onto the board, and run, so 10 trials means 10 build-flash-run cycles. How many trials do we need to get a meaningful result? If we follow the TVM tutorial's guidance of over 1000 trials for CPU, this seems really exhausting.

That's all. Any suggestions? Thank you for any help. @areusch

@YAETHeryk I still think for #1 and #2 you're running into memory problems. We still have some loose ends to clean up around the AOTExecutor: specifically, when used over the RPC link, the AOTExecutor still allocates input and output tensors dynamically. I think that's the issue you're hitting in #1, though I don't see a smoking gun.

In #2, it looks like you’re hitting a CHECK-fail in the firmware specific to the Zephyr runtime. I can tell this by checking the error code you’re seeing against the CRT error codes header.

Agreed, this can be problematic. However, without programming the model implementation into flash, it can be hard to ensure an accurate result from tuning. The number of trials depends on the size of the workload you're trying to tune, and there are some other tricks that our more advanced (but unfortunately still in-development) optimization strategies (e.g. the meta-scheduler and Relax) can play. You can also parallelize the tuning process by defining multiple runners; see the sketch below. We're working on leveraging these strategies in the coming months.
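For reference, here's a minimal sketch of the knobs involved, based on the microTVM autotuning tutorial (names like mod, TARGET, RUNTIME, template_project_path, and project_options follow the script earlier in this thread; the option values are illustrative, not recommendations):

from tvm import autotvm

# Extract the tunable tasks from the model.
with tvm.transform.PassContext(opt_level=3, config={"tir.disable_vectorize": True}):
    tasks = autotvm.task.extract_from_program(mod["main"], {}, TARGET)

# The module loader rebuilds and reflashes the project for each measured
# candidate, which is why every trial costs one build-flash-run cycle.
module_loader = tvm.micro.AutoTvmModuleLoader(
    template_project_dir=template_project_path,
    project_options=project_options,
)
builder = autotvm.LocalBuilder(
    n_parallel=1,  # more parallel builders/runners can shorten wall-clock time
    build_kwargs={"build_option": {"tir.disable_vectorize": True}},
    do_fork=True,
    build_func=tvm.micro.autotvm_build_func,
    runtime=RUNTIME,
)
runner = autotvm.LocalRunner(number=1, repeat=1, timeout=100, module_loader=module_loader)

for task in tasks:
    tuner = autotvm.tuner.GATuner(task)
    tuner.tune(
        n_trial=10,  # total build-flash-run cycles spent on this task
        measure_option=autotvm.measure_option(builder=builder, runner=runner),
        callbacks=[autotvm.callback.log_to_file("microtvm_autotune.log.txt")],
        si_prefix="M",
    )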

Thank you.

I'm still curious about this function: TVMBackendAllocWorkspace. What does it do?

The following message appears every time the executor fails, and the Check failed error seems to be related to this function.

[11:28:37] /home/yyk/Desktop/tvm/src/runtime/micro/micro_session.cc:368: remote: CMAKE_SOURCE_DIR/crt/src/runtime/crt/common/crt_backend_api.c:42: Check failed: err == kTvmErrorNoError: TVMBackendAllocWorkspace(1, 0, 589824, 2, 32) → 1283

Oh, I checked the file you mentioned; error code 1283 means:

kTvmErrorPlatformNoMemory

This is certainly a memory problem.

Sorry for the delayed reply. TVMBackendAllocWorkspace is called when an operator implementation needs scratchpad memory; it is like malloc. However, for microTVM, when the AOT compilation flow is used with USMP enabled, you should never see this: the intent (though we are not quite there yet in the default flow) is that code generated for microTVM avoids all dynamic memory allocations and instead makes all memory references into pre-defined buffers with sizes specified by the compiler (and constrained by the user).
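For reference, this is the signature from TVM's include/tvm/runtime/c_backend_api.h, with my reading of the failing call from the log above decoded in the comment:

// Allocates scratchpad ("workspace") memory for an operator at runtime; in the
// CRT this is forwarded to the platform's memory manager.
void* TVMBackendAllocWorkspace(int device_type, int device_id, uint64_t nbytes,
                               int dtype_code_hint, int dtype_bits_hint);

// The failing call TVMBackendAllocWorkspace(1, 0, 589824, 2, 32) asks for
// device_type=1 (kDLCPU), device_id=0, nbytes=589824 (576 KB of scratch space),
// with dtype hints code=2 (float) and bits=32, a large request for a board
// with 1 MB of on-chip RAM.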

I'd suggest trying to compile with USMP enabled. I think passing the 'tir.usmp.enable': True PassContext option should be enough, per one of the unit tests; a sketch is below. That may at least remove the need for TVMBackendAllocWorkspace. cc @manupa-arm in case he has more to add here.
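A minimal sketch of what that looks like on top of the build step from the script above (TARGET, RUNTIME, EXECUTOR, mod, and params as defined there):

with tvm.transform.PassContext(
    opt_level=3,
    config={"tir.disable_vectorize": True, "tir.usmp.enable": True},
    disabled_pass=["AlterOpLayout"],
):
    module = relay.build(mod, target=TARGET, runtime=RUNTIME, params=params, executor=EXECUTOR)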

I added the 'tir.usmp.enable': True config to my code, and there are some changes:

  1. For the magic_wand.tflite case (which could run correctly before), it now fails and shows another failure message:

Is this an expected result? The comment for the test case says: "Test should fail if const pool is supplied to executor as these are currently not supported".

  2. For the mobilenet_v1_0.25_128_quant.tflite case (which was already problematic), I don't know the exact reason, but increasing the main stack size again was needed. After that, it still fails, and that problematic function appears again:

Things are more complicated than we expected, right?

@YAETHeryk

Ah, that does indeed look like it's expected for now. Hopefully we can add support for constant pools in the host-driven AOT executor soon. @manupa-arm might have guidance on whether this would work right now if you deploy standalone (e.g. set project_type="aot-demo" and drive inference fully from the µC).

Yeah, unfortunately I wouldn't expect that mobilenet to work without using the AOTExecutor. I do think the configuration in this tutorial should get mobilenet running; I don't believe it uses USMP, though, so it may not work on as many boards as we'd like right now.

This example app shows USMP working (driving inference fully from the µC) using the embedded C runtime APIs, so I would expect that to work. @areusch might confirm whether that is the type of project generated with "aot-demo" (which I believe it is).

I’m not as familiar with the Zephyr bindings, but microTVM also supports (via Arduino) the Teensy 4.0 and 4.1, which have the same IMXRT1060 chip as your mimxrt1060_evk, and I’ve used TVM with both in the past.

To help reproduce the problem, I wrote up a short script on Colab that builds mobilenet_v1_0.25_128_quant.tflite for the IMXRT1060, using microTVM’s Arduino bindings and the AOT executor. Using the AOT executor instead of the graph executor allows us to detect the inevitable memory overflow at compile time, instead of at runtime.

First, it should not be necessary to use 'tir.usmp.enable': True just to get a board working. Instead, your issue is where your variables are being stored. To briefly summarize the datasheet: the IMXRT1060 has 1 MB of OCRAM (on-chip RAM), which is divided into 512 KB of tightly coupled ITCM/DTCM (whose ITCM/DTCM split can be specified in 32 KB increments) and 512 KB of regular, general-purpose OCRAM. Along with 128 KB of boot ROM (which isn't relevant here), this is all the memory on the chip.

Importantly, this means the IMXRT1060 has no flash memory onboard (instead, it is located on an external chip connected via one of the memory interfaces). Given this memory layout, the compiler can make interesting (read: bad) choices about where to store variables, and will try really hard to put all variables, including static ones, into TCRAM (tightly coupled RAM, the first 512 KB block mentioned above). This is what's causing your memory issue: the compiler is trying to store all variables inside DTCM, which can be at most 512 KB (and is in practice less).

The solution? Store the variables in places that make sense. Static variables (like the model weights in default_lib2.c) should go on the external flash chip, which can be done by specifying PROGMEM. Our AOT memory array should go in regular OCRAM, which can be done by specifying DMAMEM, e.g.

DMAMEM uint8_t g_aot_memory[WORKSPACE_SIZE]
    __attribute__((aligned(TVM_RUNTIME_ALLOC_ALIGNMENT_BYTES)));
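And correspondingly, a hypothetical sketch of the PROGMEM change; the array name and contents here are illustrative, since in practice you would annotate the constant arrays the codegen emits in default_lib2.c:

// Keep the model weights in external flash instead of letting the compiler
// place them in TCRAM.
const uint8_t model_weights[] PROGMEM = { /* generated constants */ };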

You can easily verify that these changes help: the Colab example above tells you the amount by which the DTCM region of memory is exceeded. Making either one of these changes will decrease the amount by which we exceed it, and making both will solve the problem.

This is not a good long-term solution, however, as it requires you to mess with the compiler’s output. @alanmacd is doing work on memory pools that will let us make these decisions automatically, but it will not be ready for a little while.