I’d been thinking about this problem for a while (before I came across TVM/NNVM), trying to work out how it should be approached.
I have a set of accelerators that correspond very closely to NNVM operators (i.e., the accelerator performs a convolution, etc.).
So basically what I want to do is write something that finds places in a model’s NNVM graph and replaces what would normally execute there with a function call.
There’s an example of how to do this with a built-in CUDA function, but I don’t understand how to approach it when I have a pre-written function of my own that I want to compile down to.
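If it helps clarify what I mean, the pattern I’m referring to looks roughly like this (a minimal sketch using `tvm.te.extern`; "my_accel.matmul" is a made-up packed-function name standing in for the built-in one, and the exact API names may differ between TVM versions):

```python
import tvm
from tvm import te

# Minimal sketch: a tensor whose value is produced by an external function call
# rather than a generated loop nest. "my_accel.matmul" is hypothetical; a real
# setup would register it as a packed function (or expose it from C++).
n, m, k = 1024, 1024, 1024
A = te.placeholder((n, k), name="A")
B = te.placeholder((k, m), name="B")

C = te.extern(
    (n, m),
    [A, B],
    lambda ins, outs: tvm.tir.call_packed("my_accel.matmul", ins[0], ins[1], outs[0]),
    name="C",
)

s = te.create_schedule(C.op)
# The lowered code contains a call to the packed function instead of compute loops.
print(tvm.lower(s, [A, B, C], simple_mode=True))
```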
One thing I thought of was trying to do a replacement pass, but reading the example passes, like the GraphFusePartition pass, I’m having a hard time understanding how this would affect the later stages of compilation.
As I understand it:
NNVM operations can be made up of TVM operations.
So if an NNVM operation is replaced with a function call, how do I maintain the graph for the subsequent passes?
It sounds like your accelerator supports high-level operators, such as 2D convolution, etc. I recommend starting at the TVM abstraction layer, where you can write a TOPI operator library for your accelerator, with one entry for each type of operator it supports (e.g. 2D convolution, fully connected, max pooling, activation, grouped convolutions, etc.).
Once you have mapped TOPI operators to your accelerator, you can start to map NNVM graph nodes to those operators.
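To make that concrete, a TOPI-style operator has two pieces: a compute rule that declares what the operator produces, and a schedule that decides how it is lowered; the schedule is where the accelerator-specific work (tensorization, DMA insertion, etc.) would go. Here is a minimal sketch with a ReLU-like activation; the schedule is just a trivial placeholder and the function names are made up:

```python
import tvm
from tvm import te

def my_accel_relu(x):
    # Compute rule: what the operator produces (same style as topi.nn.relu).
    return te.compute(
        x.shape, lambda *i: tvm.te.max(x(*i), tvm.tir.const(0, x.dtype)), name="relu"
    )

def schedule_my_accel_relu(outs):
    # Schedule: how the operator is lowered. This is where accelerator-specific
    # transformations would be applied; a default schedule stands in here.
    return te.create_schedule([o.op for o in outs])

# Declare and build the operator the same way TOPI operators are used.
X = te.placeholder((1, 64, 56, 56), name="X")
Y = my_accel_relu(X)
s = schedule_my_accel_relu([Y])
mod = tvm.build(s, [X, Y], target="llvm")
```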
Note that our graph IR is going through a major revision/replacement (as you can see in this issue: https://github.com/dmlc/tvm/issues/1673). This is why I’d recommend creating an operator library in TVM first.
Also, regarding NNVM graph partitioning for heterogeneous inference: in case you want to offload operators across a CPU and a custom accelerator in a system without shared memory (VTA currently relies on the shared-memory assumption), I believe you may be interested in this WIP PR: https://github.com/dmlc/tvm/pull/1688
Yeah, the accelerators map very closely to NNVM operations.
They’re actually just hypothetical accelerators at this point, because I wanted to do a thought experiment on the idea of driving hardware design based on what the compiler can do. So I’m approaching this as if the accelerator maps exactly to NNVM operators.
Are you saying I should write a set of TVM/TOPI ops that correspond to the accelerator, and at the NNVM level replace those operations with calls into this library?
How would I modify the NNVM->TVM replacement stage to do the replacement that I want?
In terms of modifying the NNVM->TVM replacement stage to do what you want, we’ll release tutorials on how to do this in Relay, which is the successor to NNVM. Stay tuned for more info on that.
One thing I realized after posting was that what I’m really looking for isn’t heterogeneity so much as simply replacing nodes in a graph with a function call. It just happens that the function call contains an instruction that causes an accelerator to be activated.
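In other words, something like the following, where the registered function is the thing that would actually drive the hardware (the name is made up, and a numpy add stands in for the instruction that activates the accelerator); a `te.extern`/`call_packed` node like the one in my first post would be what invokes it from the compiled graph:

```python
import tvm

# Hypothetical accelerator "driver", registered as a packed function so that
# TVM-generated code can call it by name (e.g. from a te.extern node).
# In a real system the body would issue the instruction that activates the
# accelerator; a numpy add stands in for that here.
@tvm.register_func("my_accel.add")
def my_accel_add(a, b, out):
    out.copyfrom(a.numpy() + b.numpy())
```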