Hi all, I am developing a custom NN accelerator hardware.
I also provide a C API layer that implements basic operators like conv2d, dense, depthwise conv2d, and others.
This implementation utilizes the NN accelerator features in the most optimized way.
I would like to use TVM to parse any frontend (tflite, onnx, keras, …) and eventually generate C code that calls my custom C API for the operators I've implemented and falls back to the default implementation for the ones not implemented yet.
I would appreciate some guidance or explanation of what would be the correct way to do the above.
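To make the goal concrete, here is a rough sketch of the kind of C code I imagine being generated. This is not actual TVM output. The names `my_accel_dense`, `default_relu`, and `model_run` are hypothetical, and the accelerator call is stubbed in plain software so the example compiles on its own. The idea is only that the dense layer goes through the custom C API while the ReLU, which is not implemented for the accelerator, uses a default implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { N_IN = 4, N_OUT = 2 };

/* Hypothetical accelerator C API entry point. A real version would program
 * the hardware; this software stub computes out[j] = sum_i in[i] * w[j][i]
 * so the sketch is self-contained. Returns 0 on success. */
int my_accel_dense(const int8_t *in, const int8_t *weights, int32_t *out,
                   size_t n_in, size_t n_out) {
    assert(in != NULL && weights != NULL && out != NULL);
    for (size_t j = 0; j < n_out; ++j) {
        int32_t acc = 0;
        for (size_t i = 0; i < n_in; ++i) {
            acc += (int32_t)in[i] * (int32_t)weights[j * n_in + i];
        }
        out[j] = acc;
    }
    return 0;
}

/* Stand-in for a default (non-offloaded) operator: in-place ReLU. */
static void default_relu(int32_t *data, size_t n) {
    for (size_t i = 0; i < n; ++i) {
        if (data[i] < 0) {
            data[i] = 0;
        }
    }
}

/* Hypothetical generated entry point: dense is offloaded to the custom
 * C API, ReLU falls back to the default implementation. */
int model_run(const int8_t *input, const int8_t *weights, int32_t *output) {
    int status = my_accel_dense(input, weights, output, N_IN, N_OUT);
    if (status != 0) {
        return status;
    }
    default_relu(output, N_OUT);
    return 0;
}
```

A caller would pass a 4-element int8 input and a 2x4 row-major weight matrix to `model_run` and read two int32 results back.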
I would advise you to take a look at how the VTA accelerator is integrated into TVM, and also at the Ethos-U integration. I also think this RFC would make your life soooo much easier.
Thanks a lot, I think UMA (Universal Modular Accelerator) is what I’m looking for.
It is still in development though.
I will post my findings later in this thread.
@kslavka The pull request for the UMA baseline infrastructure is currently being integrated into the TVM toolchain: [UMA] UMA v1.0 by MichaelJKlaiber · Pull Request #12087 · apache/tvm · GitHub. It will surely need a few rounds of review before it finally lands in main, and I think it needs a few more additions before it significantly improves your user experience over current BYOC.
For the further adoption of UMA, it would be great if you could provide an example of the C code you would like TVM to generate.