[VTA] Workaround for Autotuning with One PYNQ Z1 Board

hht · November 26, 2020, 9:18am

One way is to create a global class that records CMA allocations and frees. When an exception occurs, the Python RPC server first calls a C API that invokes the Xilinx cma_free() to release the CMA memory, and then terminates the process.
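As a rough illustration of the bookkeeping described above, here is a minimal Python sketch. The names `CmaRegistry`, `record_alloc`, `record_free`, `release_all` and `run_guarded` are hypothetical and are not part of TVM or the PYNQ libraries. `free_fn` stands in for whatever C entry point ends up calling the Xilinx cma_free(). The sketch re-raises the exception instead of terminating the process, so it can be run on any machine.

```python
import threading


class CmaRegistry:
    """Hypothetical global record of live CMA buffers (not a TVM/PYNQ API)."""

    def __init__(self, free_fn):
        # free_fn is an assumption: a callable that releases one buffer,
        # e.g. a ctypes wrapper around the C function that calls cma_free().
        self._free_fn = free_fn
        self._live = set()
        self._lock = threading.Lock()

    def record_alloc(self, handle):
        """Remember a buffer handle right after it has been allocated."""
        with self._lock:
            self._live.add(handle)
        return handle

    def record_free(self, handle):
        """Forget a buffer handle once it has been freed normally."""
        with self._lock:
            self._live.discard(handle)

    def release_all(self):
        """Free every buffer still recorded as live; return how many were freed."""
        with self._lock:
            leaked = list(self._live)
            self._live.clear()
        for handle in leaked:
            self._free_fn(handle)
        return len(leaked)


def run_guarded(registry, task):
    """Run task(); on any exception, release leaked buffers, then re-raise."""
    try:
        return task()
    except Exception:
        registry.release_all()
        raise
```

In a real RPC server the `except` branch would be the place to exit the process after `release_all()` returns, as the post describes.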

hht · November 26, 2020, 9:24am

Another way is to avoid the RPC exception altogether and improve tuning efficiency. I replaced the batch-300 input tensor with a batch-1 input tensor, and used a simple Python script to generate a simpler network. My reasoning was that, for GEMM, a batch-300 conv2d should share nearly the same optimal schedule as a batch-1 conv2d.
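The batch reduction can be pictured with a small sketch. It assumes each conv2d workload is written as a pair of (NCHW data shape, kernel shape) tuples; this representation and the helper name `to_batch_one` are made up for illustration and are not AutoTVM's task format. The shapes in the example are placeholders, not taken from the actual network.

```python
def to_batch_one(workloads):
    """Set the batch (first) dimension of each data shape to 1 and drop duplicates."""
    seen = set()
    result = []
    for data_shape, kernel_shape in workloads:
        reduced = ((1,) + tuple(data_shape[1:]), tuple(kernel_shape))
        if reduced not in seen:
            seen.add(reduced)
            result.append(reduced)
    return result


# Placeholder workloads with batch 300; the layer shapes are illustrative only.
workloads = [
    ((300, 64, 56, 56), (64, 64, 3, 3)),
    ((300, 64, 56, 56), (64, 64, 3, 3)),
    ((300, 128, 28, 28), (128, 128, 3, 3)),
]

batch_one = to_batch_one(workloads)
```

Whether the schedule found for the batch-1 workload is close to optimal for batch 300 is the poster's hypothesis, so it is worth checking the final inference time on the board.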

isong · November 26, 2020, 10:47am

Hi @hht

Thank you for sharing your work. The second option seems to make more sense, though it would still be good to improve the RPC exception handling.

Cheers, ISS

thkim · December 4, 2020, 10:31pm

Hi,

I want to check several things.

1. Does your tutorial work with the ‘xgb’ tuner?
2. I compared the untuned and tuned results and saw no improvement in execution time after tuning. Did you see an improvement after tuning?
3. @isong I’m using one PYNQ-Z1 board. When I ran the tutorial, the mean inference time was 365 ~ 372 ms. Your example shows 69.65 ms. Could you give me some advice on how to decrease the mean inference time?

Thanks!

hht · December 6, 2020, 4:19am

@thkim

1. I am not sure, but it works with the random tuner.
2. That is because you are using the well-tuned TopHub parameters. If you start from the fallback config instead, tuning gives a large improvement.
3. I use a PYNQ Z1 and a ZCU104. I think 365 ms is reasonable.

isong · December 6, 2020, 10:39pm

Hi @thkim

For 1 and 2, I think @hht has already answered.

For 3, I used an Ultra96, which could explain the difference.

isong · December 13, 2020, 10:19am

@hht

I have created a PR that combines your change with mine, as discussed in this thread. I wanted to tag you on the PR, but I cannot find your GitHub ID when I type @.

github.com/apache/tvm

[Fix][Tutorial][VTA] Update tune_relay_vta.py to support single board

main ← insop:insop/tune_relay_vta

opened 10:10AM - 13 Dec 20 UTC

insop

+16 -15

- support single pynq board run, change is credited to @i24361, https://github.c…om/i24361/incubator-tvm/blob/0472b1f347976229a29be8a6e60b626a0604c8df/vta/tutorials/autotvm/tune_relay_vta_with_one_board.py
- fixes the save fail
- issues and changes are discussed in https://discuss.tvm.apache.org/t/vta-workaround-for-autotuning-with-one-pynq-z1-board/8091/9

hht · December 13, 2020, 12:33pm

@isong

I see you have already @-mentioned me. I hope this commit gets accepted.

youxiudeshouyeren · April 21, 2022, 3:33am

Recently, I ran into this problem again. With a single PYNQ Z2, I get an error saying that the device cannot be obtained from the remote. My TVM version is v0.8. Do you have any suggestions? Thank you.

thesinger · May 6, 2022, 5:36am

Have you solved the problem? I am running into it too. My TVM version is 0.9, and the remote device is a PYNQ Z2.

thesinger · May 6, 2022, 5:38am

@youxiudeshouyeren Do you have any ideas?

denis · November 1, 2022, 6:37am

@hht Hello, I have the same problem now.
If you remove the remote, how do you program the FPGA with the bitstream?
Also, should I run start_rpc_server.sh or start_rpc_server_to_tracker.sh? What is the difference?