[VTA] Workaround for Autotuning with One PYNQ Z1 Board

One way is to create a global class that records every CMA allocation and free. When an exception occurs, the Python RPC server first calls a C API that invokes the Xilinx cma_free() to release the CMA memory, and then terminates the process.
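A minimal sketch of this bookkeeping idea, in plain Python. The names `CmaRegistry` and `run_guarded` are hypothetical and used only for illustration; `free_fn` stands in for whatever binding ends up calling the Xilinx cma_free(), and the actual change in the RPC server may look quite different.

```python
class CmaRegistry:
    """Hypothetical process-wide record of live CMA buffers."""

    def __init__(self, free_fn):
        # free_fn is an assumption: a callable that releases one buffer,
        # e.g. a thin wrapper around the Xilinx cma_free().
        self._free_fn = free_fn
        self._live = set()

    def record_alloc(self, addr):
        """Remember a buffer that has just been allocated."""
        self._live.add(addr)

    def record_free(self, addr):
        """Forget a buffer that was freed through the normal path."""
        self._live.discard(addr)

    def release_all(self):
        """Free every buffer that is still recorded; return their addresses."""
        freed = []
        for addr in sorted(self._live):
            self._free_fn(addr)
            freed.append(addr)
        self._live.clear()
        return freed


def run_guarded(registry, work):
    """Run work(); on an exception, release leaked buffers and re-raise.

    Re-raising lets the caller (here, the RPC server) terminate the
    process only after the CMA memory has been given back.
    """
    try:
        return work()
    except Exception:
        registry.release_all()
        raise
```

The point of the design is that cleanup does not depend on the failing code path: whatever was allocated and not yet freed is released before the process exits, so the next tuning trial does not start with exhausted CMA memory.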

Another way is to avoid the RPC exception altogether, which also improves tuning efficiency. I replaced the batch 300 input tensor with a batch 1 input tensor, and used a simple Python script to generate a simpler net. My assumption is that, with GEMM, a batch 300 conv2d shares nearly the same optimal schedule as a batch 1 conv2d.
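As a rough illustration of the batch substitution, assuming the input tensors use an NCHW layout: the helper `with_batch_one` and the shape values below are hypothetical and not taken from the actual script.

```python
def with_batch_one(shape):
    """Return a copy of an NCHW shape with the batch dimension set to 1."""
    if len(shape) != 4:
        raise ValueError("expected a 4-D NCHW shape, got %r" % (shape,))
    _, channels, height, width = shape
    return (1, channels, height, width)


# Illustrative shapes only; the real network inputs may differ.
input_shapes = {"data": (300, 64, 56, 56)}

# Shapes used for tuning: same layers, but with batch 1.
tuning_shapes = {name: with_batch_one(s) for name, s in input_shapes.items()}
```

The tuning run then works on the batch 1 shapes, and the resulting schedules are reused for the batch 300 net under the assumption stated above.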

Hi @hht

Thank you for sharing your work. The 2nd option seems to make more sense, though it would also be good to improve the RPC exception handling.

Cheers, ISS

Hi,

I want to check several things.

  1. Does your tutorial work with the ‘xgb’ tuner?
  2. I compared running without tuning against running with tuning and saw no improvement in execution time. Did you see an improvement after tuning?
  3. @isong I’m using one PYNQ-Z1 board. When I ran the tutorial, the mean inference time was 365 ~ 372 ms. Your example shows 69.65 ms. Could you give me some advice to decrease the mean inference time?

Thanks!

@thkim

  1. I am not sure, but it works with the random tuner.

  2. That is because you are using the well-tuned TopHub params. If you compare against the fallback config, you will see a large improvement.

  3. I use PYNQ Z1 and ZCU104. I think 365ms is reasonable.

Hi @thkim

For 1 and 2, I think @hht has already answered.

For 3, I used an Ultra96, which could explain the difference.

@hht

I have created a PR that combines your change and my change, as discussed in this thread. I wanted to add you to the PR, but I cannot seem to find your GitHub ID to @-mention you.

@isong

I see you have already @-mentioned me. I hope this commit passes.

Recently, I encountered this problem again. When I use a single PYNQ Z2, I get a message that the device cannot be obtained from the remote. My TVM version is v0.8. Do you have any suggestions? Thank you.

Have you solved the problem? I have run into it as well. My TVM version is 0.9, and the remote device is a PYNQ Z2.

@youxiudeshouyeren Do you have any ideas?

@hht Hello, I am running into the same problem now.
If you remove the remote, how do you program the FPGA with the bitstream?
And should I run start_rpc_server.sh or start_rpc_server_to_tracker.sh? What’s the difference?