As I understand it, VTA is a TVM backend that reuses TVM's frontend and middle end, so the IR it receives is a combination of basic Halide ops and for loops. VTA then inserts DMA ops and tensorizes the basic ops.
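To make that picture concrete, here is a toy sketch in plain Python of what I mean. It is not TVM or VTA code: `TILE`, `dma_load`, `gemm_intrinsic` and the two `matmul_*` functions are hypothetical names I made up for illustration, and the sketch assumes square matrices whose size is a multiple of `TILE`. The first function stands for the generic loop nest, the second for the same computation after DMA copies are inserted and the inner loops are replaced by one tile-level call.

```python
# Toy illustration only: none of these names are TVM or VTA APIs.
TILE = 2  # hypothetical edge length of the hardware GEMM tile


def matmul_loops(a, b, n):
    """Scalar loop nest, standing in for the generic lowered IR."""
    c = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                c[i][j] += a[i][k] * b[k][j]
    return c


def dma_load(src, row, col):
    """Stand-in for a DMA op: copy one TILE x TILE block into a local buffer."""
    return [src[row + r][col:col + TILE] for r in range(TILE)]


def gemm_intrinsic(acc, a_tile, b_tile):
    """Stand-in for a tensorized op: one call covers the whole inner loop nest."""
    for r in range(TILE):
        for c in range(TILE):
            for k in range(TILE):
                acc[r][c] += a_tile[r][k] * b_tile[k][c]


def matmul_tensorized(a, b, n):
    """Same result as matmul_loops, expressed as DMA loads plus tile-level GEMMs."""
    c = [[0] * n for _ in range(n)]
    for io in range(0, n, TILE):
        for jo in range(0, n, TILE):
            acc = [[0] * TILE for _ in range(TILE)]  # on-chip accumulator
            for ko in range(0, n, TILE):
                a_tile = dma_load(a, io, ko)
                b_tile = dma_load(b, ko, jo)
                gemm_intrinsic(acc, a_tile, b_tile)
            # Stand-in for the DMA store of the finished tile.
            for r in range(TILE):
                c[io + r][jo:jo + TILE] = acc[r]
    return c
```

Both functions should return the same matrix for any valid input; the only difference is that the second one exposes the tile-level structure that an accelerator instruction could be matched against.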
Do you think it would be reasonable to support intrinsics in TVM's low-level IR that can easily be lowered into instructions common to AI chips?
By the way, how efficient is the current tensorization? Does it support detecting user-implemented nodes that use lambda expressions? What is the theory behind it?
Hi, I have just started studying TVM. Apart from VTA's vta.cc, which someone wrote by hand, can TVM automatically generate code for FPGAs the way it generates cuDNN code?