VTA configuration problem

patrick20190725 · July 26, 2019, 7:21am

In TVM 0.6, it seems the configuration in vta_config.json cannot be changed. When I change “LOG_BATCH”, “LOG_BLOCK_IN”, “LOG_BLOCK_OUT”, and other parameters, it causes a great performance regression and hugely increases the amount of data. I want to know how to configure it correctly. Thanks!

thierry · July 26, 2019, 7:50am

Thanks for sharing your issue on the forum. You are seeing these messages about great performance regression because the schedules for each ResNet operator need to be re-tuned for the new VTA variant that was synthesized.

To do so, there is a handy tutorial that explains the setup steps: https://docs.tvm.ai/vta/tutorials/autotvm/tune_relay_vta.html#sphx-glr-vta-tutorials-autotvm-tune-relay-vta-py
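To illustrate why a changed vta_config.json calls for re-tuned schedules, here is a minimal sketch of how the `LOG_*` fields decode into the GEMM tile shape. Each field is a base-2 logarithm, so changing one by 1 doubles or halves that dimension. The config values and the helper names `gemm_shape` and `macs_per_gemm_op` are hypothetical and chosen for illustration only; this is not code from VTA itself.

```python
import json

# Hypothetical config fragment; the values are illustrative, not from the thread.
CONFIG_TEXT = '{"LOG_BATCH": 0, "LOG_BLOCK_IN": 4, "LOG_BLOCK_OUT": 4}'


def gemm_shape(cfg):
    """Return (batch, block_in, block_out), each decoded as 2 ** LOG_x."""
    return (
        1 << cfg["LOG_BATCH"],
        1 << cfg["LOG_BLOCK_IN"],
        1 << cfg["LOG_BLOCK_OUT"],
    )


def macs_per_gemm_op(cfg):
    """Multiply-accumulates in one (batch x block_in) by (block_in x block_out) tile."""
    batch, block_in, block_out = gemm_shape(cfg)
    return batch * block_in * block_out


cfg = json.loads(CONFIG_TEXT)
print(gemm_shape(cfg), macs_per_gemm_op(cfg))

# Bumping LOG_BLOCK_IN by one doubles the input-channel tile, so tensors are
# packed and tiled differently and previously tuned schedules no longer apply.
wider = dict(cfg, LOG_BLOCK_IN=cfg["LOG_BLOCK_IN"] + 1)
print(gemm_shape(wider), macs_per_gemm_op(wider))
```

Because the tile shape changes the data layout and the loop tiling options, tuning logs produced for one variant are unlikely to match the workloads of another.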

patrick20190725 · July 26, 2019, 9:17am

Thanks for your guidance. I would also like to know how I can use ‘sim’ mode to re-tune the network without a physical device.

thierry · July 26, 2019, 4:29pm

I see. This may lead to slightly off schedules, since with the simulator we’d be optimizing for simulator runtime rather than actual hardware runtime. However, I believe it would be possible to deploy an autoTVM loop that uses simulation metrics (e.g. data moved) as an objective function to minimize. I believe that would be easy to do, and would give a pretty good first-order approximation of a good schedule. The only limitation is that the simulator would not be able to capture effects like thread-level parallelism.

There is an alternative, which is to use the cycle-accurate simulator to get on-chip cycle statistics, but (1) it would be very slow to run a whole tuning job (it takes minutes to execute a ResNet), and (2) it does not capture the effects of DRAM access for now (and some workloads are memory bound).

thierry · July 26, 2019, 4:30pm

Long story short: (1) it would be more realistic to run this optimization loop if you had an FPGA, and (2) if you really want to, I’d suggest using the fast simulator and guiding the optimization process with bytes moved as a first-order approximation of hardware performance.
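The "bytes moved as objective" idea can be sketched without any hardware or TVM dependency. The toy model below is an assumption made for illustration: it estimates DRAM traffic for a tiled int8 matrix multiply and picks the tiling that minimizes it under on-chip buffer limits. It is not the autoTVM API or the VTA simulator's counters; the function names, the buffer sizes, and the traffic formula are all hypothetical.

```python
from itertools import product


def divisors(n):
    """All positive divisors of n, used as candidate tile sizes."""
    return [d for d in range(1, n + 1) if n % d == 0]


def bytes_moved(M, N, K, tm, tn):
    """Toy DRAM-traffic estimate for C[M,N] = A[M,K] @ B[K,N] with int8 data.

    Assumes each (tm x tn) output tile is fully computed before moving on,
    so A is re-read once per column of tiles and B once per row of tiles.
    """
    a_bytes = M * K * (N // tn)
    b_bytes = K * N * (M // tm)
    c_bytes = M * N  # each output element is written once
    return a_bytes + b_bytes + c_bytes


def fits(K, tm, tn, inp_buf, wgt_buf, acc_buf, acc_bytes=4):
    """Check that one tile's operands fit in the (hypothetical) on-chip buffers."""
    return (
        tm * K <= inp_buf
        and K * tn <= wgt_buf
        and tm * tn * acc_bytes <= acc_buf
    )


def best_tiling(M, N, K, inp_buf, wgt_buf, acc_buf):
    """Return the feasible (tm, tn) with the least estimated traffic, or None."""
    candidates = [
        (tm, tn)
        for tm, tn in product(divisors(M), divisors(N))
        if fits(K, tm, tn, inp_buf, wgt_buf, acc_buf)
    ]
    return min(candidates, key=lambda t: bytes_moved(M, N, K, *t), default=None)


choice = best_tiling(64, 64, 64, inp_buf=1024, wgt_buf=2048, acc_buf=4096)
print(choice, bytes_moved(64, 64, 64, *choice))
```

In a real tuning loop, the estimate would be replaced by counters reported by the fast simulator for each candidate schedule. As noted above, such a metric ignores effects like thread-level parallelism, so the result is only a first-order guide.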