I think there is an issue with the current TSIM backend when running resnet-18 on VTA. The failure reproduces consistently on both Linux and macOS, while the same runs with the FSIM backend succeed.
On both Linux and macOS, it crashes with a segmentation fault. To reproduce the error, configure vta/config/vta_config.json to use the tsim backend, and run deploy_vision_on_vta.py with python3.
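For anyone reproducing this, the backend switch can be scripted rather than edited by hand. This is only a minimal sketch: it assumes vta_config.json is a flat JSON object with a "TARGET" key, and set_vta_target is a hypothetical helper name, not part of VTA.

```python
import json
from pathlib import Path


def set_vta_target(config_path, target):
    """Rewrite the "TARGET" field of a VTA JSON config file in place.

    Hypothetical helper: assumes the file is a flat JSON object that
    contains a "TARGET" key. All other keys are preserved.
    """
    path = Path(config_path)
    config = json.loads(path.read_text())
    config["TARGET"] = target
    path.write_text(json.dumps(config, indent=2) + "\n")
    return config


# Example call, run from the root of a TVM checkout (path as given above):
# set_vta_target("vta/config/vta_config.json", "tsim")
```

Switching back to the working backend is the same call with the other target name from your config.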
With "TARGET" set to "tsim" in vta_config.json, I tried the demos in vta/tutorials and vta/tests/integration (including test_vta_insn.py). The error is:
File "/krzhang/tvm/tvm0809/tvm-0809-base/vta/python/vta/testing/simulator.py", line 43, in _load_all
m = tvm.module.load(lib[0], "vta-tsim")
IndexError: list index out of range
It seems that "libvta_hw.so" is missing.
Am I using too old a version?
Appreciate it!
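The IndexError above is what indexing an empty search result looks like. A quick way to confirm whether the library was ever built is to look for it yourself and fail with a clearer message. This is only a sketch: find_libs and require_lib are hypothetical helpers, and the directory walk here is an assumption, not the lookup logic that simulator.py actually uses.

```python
import os


def find_libs(root, basename):
    """Return sorted paths of all files under root whose name starts with basename."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.startswith(basename):
                matches.append(os.path.join(dirpath, name))
    return sorted(matches)


def require_lib(root, basename):
    """Return the first match, or raise a descriptive error instead of IndexError."""
    libs = find_libs(root, basename)
    if not libs:
        raise FileNotFoundError(
            f"{basename} not found under {root}; was the TSIM hardware library built?"
        )
    return libs[0]


# Example call, with the root of your TVM checkout as the search root:
# print(require_lib(".", "libvta_hw"))
```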
Thanks for your attention. For now, you should be able to run the test_benchmark_topi_conv2d.py script successfully with the TSIM backend. The script covers most of the workloads in resnet18, so I think the hardware implementation (Chisel VTA) together with the TSIM-based simulation should be fine for now.
As for the problem in evaluating the deploy_vision_on_vta.py script, the reported error seems to be related to the integration with Relay and the runtime.
Note that you might need to duplicate the lines with pynq_1x16_i8w8a32_15_15_18_17 in the file ~/.tvm/tophub/vta_v0.06.log, and replace pynq_1x16_i8w8a32_15_15_18_17 with tsim_1x16_i8w8a32_15_15_18_17 in order to load pre-tuned schedule parameters correctly.
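The duplication step can be scripted. This is a minimal sketch, assuming the log is a plain-text file with one record per line; add_tsim_entries is a hypothetical helper name, and the two device strings are the ones quoted above.

```python
from pathlib import Path

OLD_DEVICE = "pynq_1x16_i8w8a32_15_15_18_17"
NEW_DEVICE = "tsim_1x16_i8w8a32_15_15_18_17"


def add_tsim_entries(log_path, old=OLD_DEVICE, new=NEW_DEVICE):
    """Append a copy of every line mentioning `old`, with `old` replaced by `new`.

    The original lines are kept. Copies that already exist in the file are
    skipped, so running this twice does not add duplicates.
    Returns the number of lines appended.
    """
    path = Path(log_path).expanduser()
    lines = path.read_text().splitlines()
    seen = set(lines)
    added = []
    for line in lines:
        if old in line:
            copy = line.replace(old, new)
            if copy not in seen:
                added.append(copy)
                seen.add(copy)
    if added:
        path.write_text("\n".join(lines + added) + "\n")
    return len(added)


# Example call (back up the log first):
# add_tsim_entries("~/.tvm/tophub/vta_v0.06.log")
```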
Hi @thierry, since @stevenmburns can also evaluate test_benchmark_topi_conv2d.py successfully with the TSIM backend, I think there is no hardware issue in Chisel VTA that blocks end-to-end inference.
As we work towards enabling end-to-end inference with the deploy_vision_on_vta.py script, I observed that the segmentation fault actually takes place in the 1st layer of resnet18. I also observed that the 1st layer of resnet18 doesn't actually run on VTA, since it comes before the "nn.max_pool2d" layer.
The stack trace looks like the following (some functions actually exist in the generated code):
#0 0x00007fffba0a0cd6 in tvm::runtime::ThreadPool::RunWorker(int) (this=0x2032da8, worker_id=1)
at /home/liangfu/workspace/tvm_upstream/src/runtime/thread_pool.cc:365
#1 0x00007fffba0a04f9 in tvm::runtime::ThreadPool::ThreadPool()::{lambda(int)#1}::operator()(int) const
(__closure=0x22f0518, worker_id=1) at /home/liangfu/workspace/tvm_upstream/src/runtime/thread_pool.cc:291
...
#3 in __TVMBackendParallelLaunch
#4 in fused_nn_conv2d_add_nn_relu_compute_
#5 in fused_nn_conv2d_add_nn_relu
...
Do you have any suggestions for making this actually work?
We (Intel Strategic CAD Labs) would like to get the end-to-end flow working as well.
A few observations and questions from our end:
The end-to-end flow works with target "sim" but not with "tsim". The Verilator simulation resets and performs zero or one clock ticks out of reset before the crash occurs. I get four separate core dumps before the threading code produces a stack trace. When I run in gdb, I see that the first core dump is in the code generated by the runtime (libgraph, I think). There are no debugging symbols to show exactly what happened. (Perhaps there is a way to get more debug visibility here.)
Why would sim work and tsim not work before the tsim simulator starts doing anything real? Any thoughts?
Does the end-to-end flow work for you on the de10 nano, or do you get a similar issue with a runtime core dump? We have a de10 nano up and running, but I don't yet know the results of running the end-to-end flow. Is it expected to work? If it does work, what is different about that environment compared with the tsim Verilator setup that could cause the difference?
@stevenmburns Thanks for the comments. Unfortunately I don't have a fix either.
My previous post was a diagnosis of what needs fixing to bring TSIM support to the end-to-end workflow.
It doesn't work on my de10-nano either. The error takes place in the 1st layer of resnet18 inference, which is designed to run on the CPU instead, if I understand correctly.
The crash is caused by the use of virtual memory in the TSIM driver: the runtime is trying to access a virtual memory address. I opened a PR to fix the issue (see PR #4527).
After making the changes you suggested in the TSIM driver, I was able to run the program without any segmentation fault. But I am facing a couple of issues:
In the PR, you've mentioned that with multi-threading support it would take around 5 minutes to perform the cycle-accurate simulation. However, it took around 3-4 hours to run (I ran 8 threads on an 8-core processor).
At the end, the prediction shown as output turns out to be wrong (see figure).
Hi @kevinyuan, thanks for reporting.
Sorry for the mistake of leaving aluBits undefined in the module. Can you open a PR to fix that?
However, with that simple fix I can successfully detect the cat. Let's find out why you cannot get the cat detected with the TSIM backend.