Dear community members:
I am new to TVM/VTA. May I know the procedure to port TVM to a new accelerator other than the default VTA?
e.g., an accelerator design that looks like an array of VTA-style compute blocks connected by an AXI-like NoC.
I think the graph-level optimizations should be hardware agnostic, but the tensor-level optimizations should take care of the accelerator's microarchitecture, right?
Can you give me some basic ideas on how to do this?
Thanks very much 
Kevin
Hi Kevin,
The graph-level optimizations are actually not hardware agnostic. Say your accelerator has a tensor core that performs vector-matrix multiplication: you'll need to transform the data layout from NCHW to something like NCHWc so that the innermost dimension matches the width of the compute unit. In addition, you may need to perform graph rewrites such as quantization.
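As a concrete illustration, here is a minimal sketch of that kind of layout packing written in plain TVM tensor expressions (assuming the classic `te` API; the tiling factor of 16 and all names are illustrative, not tied to any particular accelerator):

```python
import tvm
from tvm import te

# Pack an NCHW tensor into a tiled "NCHW16c"-style layout so the
# innermost axis matches a hypothetical 16-wide compute unit.
N, C, H, W, c = 1, 64, 56, 56, 16  # illustrative shapes
x = te.placeholder((N, C, H, W), name="x")
packed = te.compute(
    (N, C // c, H, W, c),
    lambda n, co, h, w, ci: x[n, co * c + ci, h, w],
    name="packed",
)
s = te.create_schedule(packed.op)
mod = tvm.build(s, [x, packed], target="llvm")
```

Graph-level passes decide when such a layout transform is inserted; the tensor-level schedule then exploits the packed innermost axis.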
One starting point is to formalize your accelerator's programming interface. How do you program and synchronize all of your array compute blocks? How do you move data in and out of your accelerator? And finally, how do you orchestrate computation and data movement? Answering these gives us a starting point for understanding what needs to be implemented to target the design; a sketch of what such an interface might look like follows.
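To make those questions concrete, here is a purely hypothetical host-side driver interface for an array of compute blocks on a NoC. None of these names come from TVM or VTA; they just mark the primitive operations a backend would have to bottom out in:

```python
from typing import Protocol, Sequence

class NocAcceleratorDriver(Protocol):
    """Hypothetical host API for an array of compute blocks on an
    AXI-like NoC. All names are illustrative, not a real TVM/VTA API."""

    def load_tile(self, block: int, dram_addr: int, sram_addr: int,
                  nbytes: int) -> None:
        """DMA a tile from DRAM into one block's local scratchpad."""

    def launch(self, block: int, kernel: bytes) -> None:
        """Start a micro-kernel (e.g. a GEMM micro-op sequence) on a block."""

    def barrier(self, blocks: Sequence[int]) -> None:
        """Wait until the given compute blocks are idle (NoC-level sync)."""

    def store_tile(self, block: int, sram_addr: int, dram_addr: int,
                   nbytes: int) -> None:
        """DMA a result tile from a block's scratchpad back to DRAM."""
```

Once an interface like this is pinned down, the usual path in TVM is to expose the compute primitives as tensor intrinsics (via tensorize) and wrap the data-movement and synchronization calls in a runtime module, much as VTA does.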
Thierry