New backend for microTVM?

hi @hagonzalezdvb thanks for posting your question! It sounds like a fairly interesting system. It’s hard to say exactly without knowing more details, but a few thoughts:

  • If each processing element doesn’t need to worry about any accelerators (e.g. it’s just a single-core ARM CPU), you could simply model each element as a single Relay model and wrap our GraphExecutor or (soon) AotExecutor in such a way as to be compatible with your runtime. Since you already define a partitioner and mapper, I presume you also have some runtime component which can coordinate the cores (e.g. load code and tensors and dispatch tasks). There’s a rough compilation sketch for this case just below the list.
  • Should each processing element have additional CPUs or accelerators, you can use the same approach, but the microTVM side gets a bit more complex. This side isn’t fully implemented yet. See the [pre-RFC] C Device API thread for more on supporting generically heterogeneous systems from the C runtime. However, if you just have a single CPU with an accelerator and you want to synchronously offload compute to the accelerator, you could probably take the same approach being used for the Ethos-U accelerator (e.g. use tir.call_extern to invoke the driver directly from TVM); the second sketch below shows the basic mechanism.
  • Apart from the TVM RPC system, if memory serves, we don’t currently have a runtime component which could coordinate all of the various cores in your system.
  • You might also take a look at the pipelined GraphExecutor work done by @hjiang.
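
For the single-CPU case, here’s a minimal sketch of what compiling one processing element’s Relay model could look like. It assumes `mod`/`params` are whatever your partitioner produced for that element, and it uses the generic microTVM host target; the exact target flags (and, on newer TVM versions, the `executor`/`runtime` kwargs to `relay.build`) will depend on your cores and TVM version. The Model Library Format tarball is what your own runtime/loader would consume.

```python
import tvm
import tvm.micro
from tvm import relay

# Assumption: `mod` and `params` are the Relay module/params that your
# partitioner assigned to one processing element.
TARGET = tvm.target.target.micro("host")  # swap in your actual CPU model

with tvm.transform.PassContext(opt_level=3, config={"tir.disable_vectorize": True}):
    lowered = relay.build(mod, target=TARGET, params=params)

# Model Library Format bundles the generated C sources, the graph JSON and the
# serialized params into one tarball that an external runtime (yours) can
# unpack, build into firmware, and execute with the C-runtime GraphExecutor.
tvm.micro.export_model_library_format(lowered, "pe0_model.tar")
```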

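For the synchronous-offload case, here’s roughly what invoking an accelerator driver through tir.call_extern looks like at the TE level. The driver symbol `my_accel_conv2d` and all the shapes here are made up; the real Ethos-U integration is more involved (it goes through BYOC partitioning and its own codegen), but the underlying mechanism of emitting a direct call from generated C into a driver is the same idea.

```python
import tvm
from tvm import te


def accel_conv2d(data, weight, out_shape):
    # Emit a call to a hypothetical driver entry point, my_accel_conv2d(),
    # passing raw buffer pointers. The symbol must be linked into your firmware.
    return te.extern(
        out_shape,
        [data, weight],
        lambda ins, outs: tvm.tir.call_extern(
            "int32",
            "my_accel_conv2d",
            ins[0].access_ptr("r"),
            ins[1].access_ptr("r"),
            outs[0].access_ptr("w"),
        ),
        name="accel_conv2d",
        dtype="int8",
    )


# Illustrative NHWC shapes only.
data = te.placeholder((1, 16, 16, 8), dtype="int8", name="data")
weight = te.placeholder((3, 3, 8, 8), dtype="int8", name="weight")
out = accel_conv2d(data, weight, (1, 14, 14, 8))

sched = te.create_schedule(out.op)
lib = tvm.build(sched, [data, weight, out], target="c")
print(lib.get_source())  # generated C contains a call to my_accel_conv2d(...)
```

Past that point it’s mostly a question of how the driver call blocks (or polls) until the accelerator finishes, which is the kind of thing the C Device API pre-RFC is trying to generalize.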
Let me know if this helps.

-Andrew