I am new to TVM and VTA. What confuses me is how the parallelism actually works. I think I get the idea from the virtual threading discussion and the block diagram in the VTA paper, but the code in vta-hw/src confuses me, because there the load, compute, and store modules are called one after another, in series. What am I missing? I would appreciate any clue you could give me.