[VTA] Inference questions

Hi, as a new user I have some questions about using VTA in simulation and in RPC server mode:

  1. Are fully connected layers (and non-quantized convolutional layers) executed by the target CPU (the ARM CPU on the board) or by the host CPU (the x86 CPU of my computer)?

  2. What exactly is measured when using VTA in tsim with the timer() function: only the part offloaded to VTA, or also the layers executed by the target ARM CPU? This is related to question 1.

  3. The value returned by the timer() function when I run the MXNet tutorial (https://tvm.apache.org/docs/vta/tutorials/frontend/deploy_classification.html#sphx-glr-vta-tutorials-frontend-deploy-classification-py) in tsim is about 90 seconds! Why is it so far from the results in the publication?

  4. How should I interpret the simulation stats in tsim (cycle_count) and in fsim (inp_load_nbytes, etc.)?

  5. Is it possible to measure execution time layer by layer, to identify a bottleneck in the neural network?

Thanks in advance :smiley: