New backend for microTVM?

hi @hagonzalezdvb thanks for posting your question! It sounds like a fairly interesting system. It’s hard to say exactly without knowing more details, but a few thoughts:

  • If each processing element doesn’t need to worry about any accelerators (e.g. it’s just a single-core ARM CPU), you could simply model each element as a single Relay model and wrap our GraphExecutor or (soon) AotExecutor in such a way as to be compatible with your runtime. Since you already define a partitioner and mapper, I presume you also have some runtime component which can coordinate the cores (e.g. load code and tensors and dispatch tasks). There’s a rough compilation sketch for this case just below the list.
  • Should each processing element have additional CPUs or accelerators, you can use the same approach, but the microTVM side gets a bit more complex. This side isn’t fully implemented yet. See the [pre-RFC] C Device API thread for more on supporting generically heterogeneous systems from the C runtime. However, if you just have a single CPU with an accelerator and you want to synchronously offload compute to the accelerator, you could probably take the same approach being used for the Ethos-U accelerator (e.g. use tir.call_extern to invoke the driver directly from TVM); the second sketch below shows the basic mechanism.
  • Apart from the TVM RPC system, if memory serves, we don’t currently have a runtime component which could coordinate all of the various cores in your system.
  • You might also take a look at the pipelined GraphExecutor work done by @hjiang.
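
For the single-CPU case, here’s a minimal sketch of what compiling one processing element’s Relay model could look like. It assumes `mod`/`params` are whatever your partitioner produced for that element, and it uses the generic microTVM host target; the exact target flags (and, on newer TVM versions, the `executor`/`runtime` kwargs to `relay.build`) will depend on your cores and TVM version. The Model Library Format tarball is what your own runtime/loader would consume.

```python
import tvm
import tvm.micro
from tvm import relay

# Assumption: `mod` and `params` are the Relay module/params that your
# partitioner assigned to one processing element.
TARGET = tvm.target.target.micro("host")  # swap in your actual CPU model

with tvm.transform.PassContext(opt_level=3, config={"tir.disable_vectorize": True}):
    lowered = relay.build(mod, target=TARGET, params=params)

# Model Library Format bundles the generated C sources, the graph JSON and the
# serialized params into one tarball that an external runtime (yours) can
# unpack, build into firmware, and execute with the C-runtime GraphExecutor.
tvm.micro.export_model_library_format(lowered, "pe0_model.tar")
```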

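For the synchronous-offload case, here’s roughly what invoking an accelerator driver through tir.call_extern looks like at the TE level. The driver symbol `my_accel_conv2d` and all the shapes here are made up; the real Ethos-U integration is more involved (it goes through BYOC partitioning and its own codegen), but the underlying mechanism of emitting a direct call from generated C into a driver is the same idea.

```python
import tvm
from tvm import te


def accel_conv2d(data, weight, out_shape):
    # Emit a call to a hypothetical driver entry point, my_accel_conv2d(),
    # passing raw buffer pointers. The symbol must be linked into your firmware.
    return te.extern(
        out_shape,
        [data, weight],
        lambda ins, outs: tvm.tir.call_extern(
            "int32",
            "my_accel_conv2d",
            ins[0].access_ptr("r"),
            ins[1].access_ptr("r"),
            outs[0].access_ptr("w"),
        ),
        name="accel_conv2d",
        dtype="int8",
    )


# Illustrative NHWC shapes only.
data = te.placeholder((1, 16, 16, 8), dtype="int8", name="data")
weight = te.placeholder((3, 3, 8, 8), dtype="int8", name="weight")
out = accel_conv2d(data, weight, (1, 14, 14, 8))

sched = te.create_schedule(out.op)
lib = tvm.build(sched, [data, weight, out], target="c")
print(lib.get_source())  # generated C contains a call to my_accel_conv2d(...)
```

Past that point it’s mostly a question of how the driver call blocks (or polls) until the accelerator finishes, which is the kind of thing the C Device API pre-RFC is trying to generalize.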
Let me know if this helps.

-Andrew