Enabling µTVM on different device

max1996 · January 27, 2021, 12:25pm

As I do not have one of the supported boards on hand, how difficult is it to port the current state of µTVM to a different board? I happen to have a couple of STM32L496 Nucleo-144 boards.

These are Cortex-M4 based rather than Cortex-M7 based. What do I have to do to get the microtvm-eval blog repository to work?

leandron · January 27, 2021, 2:06pm

Hi @max1996, there is a PR in progress right now that adds support for a new board. Maybe you can have a look at that?

cc @areusch @mdw-octoml

max1996 · January 27, 2021, 3:33pm

Thank you, that helped a lot, as I had missed some of these locations.

I tried running the blogpost demo, but it is unable to flash the firmware to the device: it times out after 10 seconds. That is most likely not a TVM problem, though. (I am not an expert on these development boards.)

areusch · January 27, 2021, 5:44pm

hi @max1996, can you share a pointer to your code so I may be able to help? unfortunately we have not yet written documentation describing how to port to a new board.

-andrew

max1996 · January 28, 2021, 8:32am

hi @areusch, I will create a public fork to share the current state, but the changes are small. The main differences are the added entries for the platform in some dictionaries.

The full error at the flashing and running step looks like this:

max1996 · January 28, 2021, 9:37am

Source Code:

GitHub - Blogpost - Fork

GitHub - TVM - Fork

I hope that I did not introduce additional errors while forking and merging my changes into the latest version.

max1996 · January 28, 2021, 9:52am

I was able to find the terminal output of the flashing process:

Open On-Chip Debugger 0.10.0+dev-01341-g580d06d9d-dirty (2020-05-16-15:41)
Licensed under GNU GPL v2
For bug reports, read
	http://openocd.org/doc/doxygen/bugs.html
0665FF544851717867230824
Info : The selected transport took over low-level target control. The results might differ compared to plain JTAG/SWD
Info : clock speed 500 kHz
Info : STLINK V2J30M20 (API v2) VID:PID 0483:374B
Info : Target voltage: 3.261782
Info : stm32l4x.cpu: hardware has 6 breakpoints, 4 watchpoints
Info : Listening on port 3333 for gdb connections
    TargetName         Type       Endian TapName            State       
--  ------------------ ---------- ------ ------------------ ------------
 0* stm32l4x.cpu       hla_target little stm32l4x.cpu       running

Info : Unable to match requested speed 500 kHz, using 480 kHz
Info : Unable to match requested speed 500 kHz, using 480 kHz
target halted due to debug-request, current mode: Thread 
xPSR: 0x01000000 pc: 0x08001df0 msp: 0x2003e368
Info : device idcode = 0x20006461 (STM32L49/L4Axx - Rev: B)
Info : flash size = 1024kbytes
Info : flash mode : dual-bank
Warn : Adding extra erase range, 0x08021768 .. 0x080217ff
auto erase enabled
wrote 137064 bytes from file /tmp/tvm-debug-mode-tempdirs/2021-01-28T09-48-51___7oxhvrt4/00000/build/runtime/zephyr/zephyr.hex in 8.824641s (15.168 KiB/s)

Info : Unable to match requested speed 500 kHz, using 480 kHz
Info : Unable to match requested speed 500 kHz, using 480 kHz
target halted due to debug-request, current mode: Thread 
xPSR: 0x01000000 pc: 0x08001df0 msp: 0x2003e368
verified 137064 bytes in 4.516465s (29.636 KiB/s)

Info : Unable to match requested speed 500 kHz, using 480 kHz
Info : Unable to match requested speed 500 kHz, using 480 kHz
shutdown command invoked

areusch · January 28, 2021, 4:41pm

hi @max1996, thanks for posting that up. your changes look fine to me. at this point i’d guess there is a problem with either:

1. the RPC server is not starting properly (you should see uTVM On-Device Runtime written on the UART immediately after startup)
2. TVM is trying to use the wrong serial port to communicate
3. the session setup logic is causing the board to crash. This would typically only happen if there were problems allocating memory on the board, but some memory is allocated during startup so it’s unlikely.

I think we have debugged cause 3 fairly well, so I would see if you can investigate cause 1 or 2. here are some pointers:

1. you should find the generated Zephyr project under workspace/builds/<datetime>. try flashing the project and then use python -mserial.tools.miniterm <port> 115200 to verify you see that debug output. if not, use west debug to launch GDB and investigate. Most likely, you need to adjust the size of your memory pool or the UART being used for zephyr_console output.
2. if you do see that traffic on the serial port, verify TVM is using the correct port. see python/tvm/micro/transport/serial.py and python/tvm/micro/contrib/zephyr.py.
3. you can also use --debug-micro-execution to debug the firmware binary while TVM is sending it commands. You need to launch a separate terminal window and run python -mtvm.exec.microtvm_debug_shell, and then TVM will launch GDB in that terminal after it has flashed the device.
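
The banner check from the first pointer can also be scripted instead of watched by eye in miniterm. The following is only a minimal sketch and not part of TVM: saw_banner is a hypothetical helper that reads from any object with a read(n) method, so it can be tried on an in-memory stream first. The banner text is the one quoted above; the sample bytes in the dry run are made up.

```python
import io
import time

BANNER = b"uTVM On-Device Runtime"  # banner text quoted in this thread


def saw_banner(stream, banner=BANNER, timeout_sec=5.0, chunk_size=64):
    """Return True if `banner` shows up on `stream` within `timeout_sec`.

    `stream` is anything with read(n) -> bytes. This helper is hypothetical
    and not part of TVM.
    """
    deadline = time.monotonic() + timeout_sec
    seen = b""
    while time.monotonic() < deadline:
        chunk = stream.read(chunk_size)
        if not chunk:
            # Nothing available right now; wait briefly and retry.
            time.sleep(0.01)
            continue
        seen += chunk
        if banner in seen:
            return True
        # Keep only a tail long enough to catch a banner split across reads.
        seen = seen[-(len(banner) - 1):]
    return False


# Dry run on made-up bytes standing in for the UART output.
fake_uart = io.BytesIO(b"boot noise\r\nuTVM On-Device Runtime\r\n")
print(saw_banner(fake_uart))  # True
```

With pyserial installed, the same function should work on a real port opened with a read timeout, for example serial.Serial("/dev/ttyACM0", 115200, timeout=0.5), where the device path is an assumption. Since the banner is written right after startup, the board has to be reset after the port is opened.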

let me know if this helps.

-andrew

max1996 · January 29, 2021, 11:37am

Hi @areusch,

thank you for the initial pointers. I am not very experienced with such embedded devices, but I think I managed to follow your points:

1. I guess the workspace/builds path only applies when I am not using the Jupyter Notebook to run the example, so I took the files from the path that is returned in one of its cells.
2. west flash returns the same error as when it is executed from Python.
3. west debug seems to indicate that just a reset has been executed and that the board is idling afterwards.

I also tried to flash the standalone project, but it is too big for the available SRAM.

areusch · January 29, 2021, 5:06pm

hi @max1996,

the flash output (for the non-standalone projects) looks correct to me. the next step in debugging is to open a serial console with python -m serial.tools.miniterm and verify whether or not you see the uTVM On-Device Runtime debug message printed. you need to discover the serial device assigned to your Nucleo board’s Virtual COM Port. this is typically of the form /dev/ttyACMn, /dev/ttyUSBn, or sometimes /dev/tty.usb*. The easiest way to find it is to look for these in /dev while the board is connected, then unplug the board and see which entry disappears.
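
The plug/unplug comparison described above can be sketched in a few lines of Python. This is only an illustration: the helper names are made up, the glob patterns are the device name forms listed above, and the paths in the dry run are hypothetical.

```python
import glob

# Device name patterns mentioned in the post above.
PATTERNS = ("/dev/ttyACM*", "/dev/ttyUSB*", "/dev/tty.usb*")


def list_serial_devices(patterns=PATTERNS):
    """Return the set of device paths currently matching the patterns."""
    found = set()
    for pattern in patterns:
        found.update(glob.glob(pattern))
    return found


def devices_removed(before, after):
    """Return, sorted, the devices present in `before` but gone in `after`."""
    return sorted(set(before) - set(after))


# Dry run with made-up device lists (hypothetical paths).
before = {"/dev/ttyACM0", "/dev/ttyUSB0"}
after = {"/dev/ttyUSB0"}
print(devices_removed(before, after))  # ['/dev/ttyACM0']
```

On a real machine, call list_serial_devices() once with the board plugged in and once after unplugging it, then pass both results to devices_removed().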

Once you know that, run python -mserial.tools.miniterm <path to serial device> 115200 and you should be able to confirm whether you see the debug message. if you do see the message, it’s likely TVM is just using the wrong serial device internally (it auto-detects the device to use, but this logic is fairly rough and specific to each board family). you can probably fix it by overriding the chosen device in Python. otherwise, there is likely an error in startup between Zephyr and the µTVM Runtime. unfortunately, that kind of error may be tricky to debug over the forum.

The standalone unfortunately requires 512KB RAM so it’s unlikely it will work with your particular dev board. You could eventually try supplying a smaller network, though.

Andrew

max1996 · February 1, 2021, 7:34am

Hi @areusch,

thank you very much for your help. It seems like you are right.

I looked at the serial output while trying to run the example Jupyter notebook, and it looks like the board is stuck in a loop due to TVM using the wrong device for communication.

max1996 · February 1, 2021, 9:07am

Okay, I finally got my hands on an F746ZG board and experienced the same problem. I added some print output for the port path, and it is using the correct device (/dev/ttyACM0).

max1996 · February 1, 2021, 1:57pm

The error seems to alternate between a timeout during the handshake and “Check failed: bytes_consumed <= pending_chunk_.size() == false: consumed 18446744073709551605 want <= 149”

Does that indicate some kind of race condition? I tried inserting some prints to get more information on where it fails, but I could not find anything useful.
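
One observation on the number in that check failure: 18446744073709551605 sits just below 2^64, which is what a small negative value looks like once it is stored in an unsigned 64-bit integer. The sketch below only does that arithmetic. Whether the runtime really produced a negative value at this point (an error code, for example) is an assumption that the log does not confirm.

```python
def as_signed_64(value):
    """Reinterpret an unsigned 64-bit integer as signed (two's complement)."""
    if not 0 <= value < 2**64:
        raise ValueError("value does not fit in 64 bits")
    return value - 2**64 if value >= 2**63 else value


consumed = 18446744073709551605  # number from the check failure above
print(as_signed_64(consumed))  # -11
```

Read this way, "consumed" would be -11 rather than a huge byte count, which points more towards an unchecked negative result than towards timing.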

areusch · February 1, 2021, 6:16pm

hey @max1996 can you try pulling this PR: Update to support nRF5340. by mdw-octoml · Pull Request #2 · areusch/microtvm-blogpost-eval · GitHub

I think perhaps the runtime in the blogpost did not get TVMPlatformGenerateRandom yet, so it’s using the default weak-linked impl that doesn’t do anything (but which is useful when doing standalone deployment).

max1996 · February 2, 2021, 8:56am

Hi @areusch, sorry to bother you again, but the linked PR did not work with the F746ZG. I realized that the system time of my VM was fluctuating a lot. After fixing that, I was able to get debug information during execution (it outputs a lot), and during the handshake the problem really does seem to be related to the session id / random generator.

Here is the seemingly important part (this part repeats until it times out):

2021-02-02 09:47:58.633 DEBUG base.py:248 micro-rpc: read { 0.00s}  128 B -> [ 46 B]:
0000  69 74 68 75 62 2f 74 76 6d 2f 73 72 63 2f 72 75  ithub/tvm/src/ru
0010  6e 74 69 6d 65 2f 63 72 74 2f 75 74 76 6d 5f 72  ntime/crt/utvm_r
0020  70 63 5f 73 65 72 76 65 72 2f 72 70 63 5f        pc_server/rpc_
2021-02-02 09:47:58.637 DEBUG base.py:248 micro-rpc: read { 0.00s}  128 B -> [ 46 B]:
0000  73 65 72 76 65 72 2e 63 63 3a 31 32 37 3a 20 43  server.cc:127:.C
0010  68 65 63 6b 20 66 61 69 6c 65 64 3a 20 6b 54 76  heck.failed:.kTv
0020  6d 45 72 72 6f 72 4e 6f 45 72 72 6f 72 20        mErrorNoError.
2021-02-02 09:47:58.773 DEBUG base.py:248 micro-rpc: read { 0.00s}  128 B -> [ 39 B]:
0000  3d 3d 20 65 72 72 6f 72 3a 20 67 65 6e 65 72 61  ==.error:.genera
0010  74 69 6e 67 20 72 61 6e 64 6f 6d 20 73 65 73 73  ting.random.sess
0020  69 6f 6e 20 69 64 00                             ion.id.
2021-02-02 09:47:58.892 DEBUG base.py:248 micro-rpc: read { 0.00s}  128 B -> [129 B]:
0000  fe ff fd 90 00 00 00 00 00 03 2f 68 6f 6d 65 2f  ........../home/
0010  6d 61 78 2f 67 69 74 68 75 62 2f 74 76 6d 2f 73  max/github/tvm/s
0020  72 63 2f 72 75 6e 74 69 6d 65 2f 63 72 74 2f 75  rc/runtime/crt/u
0030  74 76 6d 5f 72 70 63 5f 73 65 72 76 65 72 2f 72  tvm_rpc_server/r
0040  70 63 5f 73 65 72 76 65 72 2e 63 63 3a 31 32 37  pc_server.cc:127
0050  3a 20 43 68 65 63 6b 20 66 61 69 6c 65 64 3a 20  :.Check.failed:.
0060  6b 54 76 6d 45 72 72 6f 72 4e 6f 45 72 72 6f 72  kTvmErrorNoError
0070  20 3d 3d 20 65 72 72 6f 72 3a 20 67 65 6e 65 72  .==.error:.gener
0080  61                                               a
2021-02-02 09:47:58.897 DEBUG base.py:248 micro-rpc: read { 0.00s}  128 B -> [ 23 B]:
0000  74 69 6e 67 20 72 61 6e 64 6f 6d 20 73 65 73 73  ting.random.sess
0010  69 6f 6e 20 69 64 00                             ion.id.
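
Because the message is split across several reads, it can help to stitch the hex rows back into bytes. The following is a small sketch, assuming the row layout seen above (a 4-digit offset, up to 16 byte pairs, then an ASCII column after at least two spaces). decode_hexdump is a made-up helper, not a TVM function.

```python
import re

# One hex-dump row: 4-digit offset, up to 16 space-separated byte pairs,
# then (optionally) the ASCII column after at least two spaces.
ROW = re.compile(
    r"^([0-9a-f]{4})\s+((?:[0-9a-f]{2} ){0,15}[0-9a-f]{2})(?:\s{2,}.*)?$"
)


def decode_hexdump(lines):
    """Concatenate the bytes of all hex-dump rows, skipping other lines."""
    out = bytearray()
    for line in lines:
        match = ROW.match(line.strip())
        if match:  # timestamp/header lines do not match and are skipped
            out += bytes.fromhex(match.group(2))
    return bytes(out)


# Rows copied from the third read in the log above.
rows = [
    "2021-02-02 09:47:58.773 DEBUG base.py:248 micro-rpc: read",
    "0000  3d 3d 20 65 72 72 6f 72 3a 20 67 65 6e 65 72 61  ==.error:.genera",
    "0010  74 69 6e 67 20 72 61 6e 64 6f 6d 20 73 65 73 73  ting.random.sess",
    "0020  69 6f 6e 20 69 64 00        ion.id.",
]
print(decode_hexdump(rows).rstrip(b"\x00").decode("ascii"))
# == error: generating random session id
```

Applied to the first three reads in order, the text joins up as "...rpc_server.cc:127: Check failed: kTvmErrorNoError == error: generating random session id".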

areusch · February 2, 2021, 5:21pm

weird. can you try to add this implementation of TVMPlatformGenerateRandom?

also, be sure you have CONFIG_TIMER_RANDOM_GENERATOR=y in prj.conf.

max1996 · February 3, 2021, 6:58am

hi @areusch,

I tested it with and without the pull request, with both my modified version (using the Nucleo L496ZG) and the original version (using the Nucleo F746ZG). The error stays the same.

I did not find “CONFIG_TIMER_RANDOM_GENERATOR=y” in any project file inside the blogpost repository, but it is present in the prj.conf in tvm (/home/max/github/tvm/tests/micro/qemu/zephyr-runtime/prj.conf). Is this correct? Where do I have to add this line, given that the prj.conf file in the blogpost repository seems to be created anew for each run of the build process?

areusch · February 3, 2021, 3:46pm

ah yeah, it depends on the board. try adding it to the template prj.conf (you will need to comment out line 46). that prj.conf will be copied into each created project tree under workspace/
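
As a rough illustration of that kind of template edit, here is a sketch that comments out one option and sets another in prj.conf-style text. Both helpers are hypothetical, and the CONFIG_EXAMPLE_* names are placeholders: the actual content of line 46 is not shown in this thread. Whether CONFIG_TIMER_RANDOM_GENERATOR is accepted also depends on the board's Kconfig.

```python
def comment_out_option(conf_text, name):
    """Return conf_text with any `name=...` assignment commented out."""
    lines = [
        "# " + line if line.strip().startswith(name + "=") else line
        for line in conf_text.splitlines()
    ]
    return "\n".join(lines) + "\n"


def set_option(conf_text, name, value):
    """Return conf_text with `name=value`, replacing an existing assignment."""
    new_line = f"{name}={value}"
    lines = conf_text.splitlines()
    replaced = False
    for i, line in enumerate(lines):
        if line.strip().startswith(name + "="):
            lines[i] = new_line
            replaced = True
    if not replaced:
        lines.append(new_line)
    return "\n".join(lines) + "\n"


# Placeholder template; real prj.conf contents will differ.
template = "CONFIG_EXAMPLE_A=y\nCONFIG_EXAMPLE_B=y\n"
edited = comment_out_option(template, "CONFIG_EXAMPLE_B")
edited = set_option(edited, "CONFIG_TIMER_RANDOM_GENERATOR", "y")
print(edited)
```

Working on the text rather than the file keeps the sketch easy to test; reading and writing the template file itself is left out.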

max1996 · February 4, 2021, 9:11am

hi, I tried it. It shows this warning:

warning: The choice symbol TIMER_RANDOM_GENERATOR (defined at subsys/random/Kconfig:29) was selected (set =y), but no symbol ended up as the choice selection. See http://docs.zephyrproject.org/latest/reference/kconfig/CONFIG_TIMER_RANDOM_GENERATOR.html and/or look up TIMER_RANDOM_GENERATOR in the menuconfig/guiconfig interface. The Application Development Primer, Setting Configuration Values, and Kconfig - Tips and Best Practices sections of the manual might be helpful too.

If I do not pass the --log-level=DEBUG argument, it gets stuck at the same point, but without reaching the timeout. If I pass the --log-level argument, it times out while trying to connect, just as before.

As I am now using the same board that you used in the demo and it is still not working, I will try to erase everything and start with a new VM. I must have done something wrong.

EDIT: I recloned the TVM and blogpost repositories, used the same board as you did and it worked after I removed

CONFIG_RESET_ON_FATAL_ERROR=n

from the prj.conf (it would not compile otherwise). It is now working correctly (I have not yet retried with the pull request), so I can now proceed to re-add my changes for the L496ZG and test them. Thank you very much.

EDIT EDIT: I was able to modify a fork of your repository as well as of TVM to support the Nucleo-L496ZG board. Would it be useful to merge these changes into TVM to enable support for “smaller” CPUs (the fork of the blogpost would be less relevant)?

repository:

TVM + CortexM4+ STM32L496ZG

Blogpost + STM32L496ZG

areusch · February 4, 2021, 5:00pm

ah interesting, okay. i’m not sure what may have been wrong before, but i’m glad it’s working now.

i took a look at your changes and they look fine overall. we may need to think a little about templating the prj.conf and generating it per-board (actually I think the one checked-in at master right now works with nRF boards and not STM, so I should fix that).

do you want to open some PRs and we can iterate there?