[RFC] UMA: Universal Modular Accelerator Interface

You are right, my apologies. I’ll edit the original post.

@areusch @paulpb is there going to be a discussion about this feature, perhaps at a community meeting? I would like to be there; I think this feature will greatly help the future integration of accelerators, something I am extremely interested in.


@fPecc, Andrew @areusch has agreed to put it on the agenda of the next community meeting. Would be great to have as many interested community members there as possible to collect requirements and find a sweet spot for the API :+1:.


Hi @MJKlaiber ,

Apologies for not getting back to this sooner. Thanks for the proposal! It broadly looks like it wraps the Target Hooks RFC (by @Mousius): https://github.com/apache/tvm-rfcs/blob/main/rfcs/0010-target-registered-compiler-flow-customisation.md and exposes a nice, structured interface to Python. It is nice to see progress on this :slight_smile:.

I would like to suggest some potential text changes for the formal RFC from the perspective of those of us who are familiar with the existing flow (especially around naming).

Maybe it is worth mentioning that these are currently implemented as partition_for_<backend/target> functions?

I am a bit curious why this interface is specifically positioned as an “accelerator” (as in UMA) partitioner, though. i.e., could it not also be used to support optimized libraries, as we currently do today with BYOC?

Since the proposal suggests using properly registered targets, is there any reason we should stick to target_name (str) as opposed to the actual TargetKind?

Following up on the above question, what are your thoughts on moving the UMAPartitioner inside relay.build(…) ?

Also, this seems to propose using S-TIR (as opposed to the “legacy” TE → TIR pipeline). Would you be able to share the motivation for splitting tir_schedules and tir_passes? (I’m asking mainly because they will all be S-TIR → S-TIR IRModule passes.)

Following on from the above question, is there an ambition to hand S-TIR back to the core compiler?

Following up on Mark’s comments,

Mark, we are very much looking forward to the RFC for this, especially the reference-level explanation, to see where this work is headed – which I believe would be good to know given our mutual interest in structuring BYOC targets.

However, I think we all share the ambition to replace kCompiler strings with targets if we can get more support from the community.

Our current PoC implementation uses kCompiler attributes and the standard MergeComposite, AnnotateTarget, and MergeCompilerRegions passes.
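Roughly, that standard sequence is composed like the following sketch (pattern_table and "my_accelerator" are placeholders for the patterns and compiler name a concrete backend would register):

from tvm import relay, transform

def partition_for_my_accelerator(mod):
    seq = transform.Sequential(
        [
            relay.transform.MergeComposite(pattern_table()),   # pattern_table: the registered patterns (placeholder)
            relay.transform.AnnotateTarget("my_accelerator"),  # sets the kCompiler attribute on matched ops
            relay.transform.MergeCompilerRegions(),
            relay.transform.PartitionGraph(),
        ]
    )
    return seq(mod)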

The current plan is to move to the collage implementation by @mbs-octoml as soon as possible, which would move partitioning into relay.build.

We discussed this at the TVM Community Meeting this morning. There was a presentation about the approach followed by some discussion. Thanks @MJKlaiber @cgerum @SebastianBoblestETAS @paulpb @PhilippvK @r.stahl @aca88 for bringing this to the meeting!

Here are some notes (please feel free to correct them if I got anything wrong!):

  • The current graph partitioning approach is the same one that’s used in the compiler today. It’s compatible with the collage partitioning which is in the works and not yet RFC’d.

  • Would the v1 support Tensor Expression (TE), or are we skipping that?

    • Michael understands that CreatePrimFunc can support TE, so it should be natively supported
    • Paolo: using standard lowering as is done by Ethos-U
  • The proposal has an explicit differentiation between S-TIR and NS-TIR. Would there be different hooks? e.g. here we can register TIR scheduling passes vs TIR passes.

    • Will it be possible to contribute S-TIR back to the compiler or just NS-TIR?
      • Scheduling passes work on S-TIR; passes in the boxes behind the schedules are injected into the lowering by pass context. Passes do not return S-TIR. They are part of the lowering from S-TIR to NS-TIR. At the moment, this means calling tvm.lower() and injecting those passes into tvm.lower() (see the sketch after these notes).
  • In the Relay-to-TIR hook, we are already trying to figure out the lowering order, which might not match the partitioning order. We want to see the memory available after compiling C functions but before lowering Ethos-U functions. Any thoughts on whether it’s possible to configure the order of partitioning in this flow?

    • Why? Need to see the amount of live memory available after running the default TVM flow.
    • Relay passes can see the whole IRModule, past that only functions for a particular target are seen by a TIR pass.
    • The order needs to be decided and it varies by registration point.
  • Q: Are there common accelerator passes that are in use in TVM, or does everyone do something different?

    • There are common touch points; those are the “plumbing” mentioned in the slide presentation, e.g. graph partitioning, scheduling, code generation.
    • UMA isn’t trying to box anyone into a particular flow; instead, it’s just trying to suggest one way of doing this, from a broader set of options, to serve as a guide for folks who may be new to TVM.
  • Question from Federico, who is integrating an accelerator of his own.

    • VTA uses memory scopes to define buffers in block-ram. Are we planning to accommodate that in UMA?
      • You could write your own schedules and passes to do this. storage_scope is kind of the way to do this at the runtime level. You can also leverage USMP to define memory pools and use it as a pass to schedule.
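For context, “injecting those passes into tvm.lower()” presumably refers to something like the tir.add_lower_pass option of the pass context; a rough sketch (my_tir_pass and the phase number are illustrative placeholders):

import tvm
from tvm import te, tir

@tir.transform.prim_func_pass(opt_level=0)
def my_tir_pass(func, mod, ctx):
    # a real backend pass would rewrite the PrimFunc here
    return func

n = te.var("n")
A = te.placeholder((n,), name="A")
B = te.compute((n,), lambda i: A[i] + 1.0, name="B")
s = te.create_schedule(B.op)

with tvm.transform.PassContext(config={"tir.add_lower_pass": [(2, my_tir_pass)]}):
    lowered = tvm.lower(s, [A, B])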

Thanks everyone for the detailed input and feedback!

To keep track of the latest version of the UMA pre-RFC and to add the great suggestions that we got from this discussion thread, I created a document in our tvm-rfc fork :

CC: @areusch @mbs-octoml @jroesch @cgerum @paulpb @PhilippvK @r.stahl @aca88 @SebastianBoblestETAS @manupa-arm

thanks! feel free to open an RFC PR and we can iterate there if you like.


PR in TVM-RFC:


Hi community,

we are going to present the progress on the UMA RFC in today’s TVM community meeting.

Most important discussion points during the RFC review phase:

  • Target attributes
  • Phase naming: int vs enum
  • Interaction/Overlap with Collage

Thanks for the great discussion and input @areusch @manupa-arm @mbs-octoml @lhutton1 @sunggg !

Concrete next steps are tracked in this issue:

CC: @tqchen @SebastianBoblestETAS @aca88 @UlrikHjort @Khoi @lhutton1 @sunggg

Tracking issue:

https://github.com/apache/tvm/issues/11260

Michael, I’ve tested the UMA CLI test script for the vanilla mockup.
Now I would like to compile my TFLite model with the UMA backend.
Could you share a sample script?

I first loaded a model using: mod = tvmc.load("model.tflite")
Then I created the UMA backend and registered it.
Then I passed the model to uma_backend.partition() but got multiple errors.

Could you post the code and the error messages you are getting?

CC: @cgerum @paulpb

Here’s the sample code:

mod = tvmc.load(r"/shared/model.tflite")
mod.summary()

uma_backend = VanillaAcceleratorBackend()
uma_backend.register()
mod = uma_backend.partition(mod)
target = tvm.target.Target("vanilla_accelerator", host=tvm.target.Target("c"))

package = tvmc.compile(mod, target=target)
result = tvmc.run(package, device=device)  # device defined elsewhere, e.g. "cpu"
print(result)


Got the following error:

Traceback (most recent call last):
  File "/shared/run_custom.py", line 107, in <module>
    main()
  File "/shared/run_custom.py", line 76, in main
    mod = uma_backend.partition(mod)
  File "/usr/uma/python/tvm/relay/backend/contrib/uma/backend.py", line 299, in partition
    return self._relay_to_relay.partition(mod, params)
  File "/usr/uma/python/tvm/relay/backend/contrib/uma/api/partitioner.py", line 96, in partition
    mod = relay.transform.InferType()(mod)
  File "/usr/uma/python/tvm/ir/transform.py", line 161, in __call__
    return _ffi_transform_api.RunPass(self, mod)
  File "/usr/uma/python/tvm/_ffi/_ctypes/packed_func.py", line 223, in __call__
    values, tcodes, num_args = _make_tvm_args(args, temp_args)
  File "/usr/uma/python/tvm/_ffi/_ctypes/packed_func.py", line 188, in _make_tvm_args
    raise TypeError("Don't know how to handle type %s" % type(arg))
TypeError: Don't know how to handle type <class 'tvm.driver.tvmc.model.TVMCModel'>
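So partition() apparently expects a Relay IRModule rather than the TVMCModel wrapper returned by tvmc.load(). An untested sketch of the workaround, assuming the Relay module and params are available as the TVMCModel’s mod/params attributes:

tvmc_model = tvmc.load(r"/shared/model.tflite")
mod, params = tvmc_model.mod, tvmc_model.params   # underlying Relay IRModule + params

uma_backend = VanillaAcceleratorBackend()
uma_backend.register()
mod = uma_backend.partition(mod, params)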

I modified the code and loaded the TFLite model as done in the TVM from_tflite.py example, then replaced the generation of “mod” in create_conv2d() in the run.py example. Now I am getting another error: it seems that vanilla_accelerator is not recognized by the scheduler.

1: tvm::relay::OpImplementation::Schedule(tvm::Attrs const&, tvm::runtime::Array<tvm::te::Tensor, void> const&, tvm::Target const&)
0: tvm::runtime::PackedFuncObj::Extractor<tvm::runtime::PackedFuncSubObj<TVMFuncCreateFromCFunc::{lambda(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)#2}> >::Call(tvm::runtime::PackedFuncObj const*, tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) [clone .cold]
  File "/usr/uma/python/tvm/_ffi/_ctypes/packed_func.py", line 81, in cfun
    rv = local_pyfunc(*pyargs)
  File "/usr/uma/python/tvm/relay/op/strategy/generic.py", line 114, in schedule_reduce
    return topi.generic.schedule_reduce(outs)
  File "/usr/uma/python/tvm/topi/generic/nn.py", line 597, in schedule_reduce
    return _default_schedule(outs, True)
  File "/usr/uma/python/tvm/topi/generic/default.py", line 28, in default_schedule
    raise RuntimeError("schedule not registered for '%s'" % target)
RuntimeError: schedule not registered for 'vanilla_accelerator'

We’ll shortly provide an example of importing an NN from an ONNX/TFLite file.
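In the meantime, a rough sketch of what that will probably look like (the input name, shape, and dtype are placeholders for the actual model):

import tflite
from tvm import relay

with open("model.tflite", "rb") as f:
    tflite_model = tflite.Model.GetRootAsModel(f.read(), 0)

mod, params = relay.frontend.from_tflite(
    tflite_model,
    shape_dict={"input": (1, 224, 224, 3)},
    dtype_dict={"input": "int8"},
)

uma_backend = VanillaAcceleratorBackend()
uma_backend.register()
mod = uma_backend.partition(mod)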

Just added the things we discussed about UMA into a branch of the UMA RFC:

  • TVMC integration
  • Mock-accelerators for tutorial

Additional things are in an early phase and are intended to enable an early discussion among the people interested in contributing to or helping shape UMA.

Feel free to comment here or in the PR:

CC: @areusch @SebastianBoblestETAS @aca88 @manupa-arm @cgerum @paulpb @PhilippvK @r.stahl @UlrikHjort @kslavka


@MJKlaiber @areusch I’ve run the latest UMA test pipeline on a custom TFLite model and would like to raise one issue. I checked out the latest TVM on the main branch (SHA1 038f15b5e204120709186a8791e5b49986060bb0), then ran tvm/tests/python/contrib/test_uma/test_uma_pipeline.py. UMA successfully generated C code, and here is the issue: the C code for the convolution implementation is repeated for each convolution function.

e.g. tvmgen_default_vanilla_accelerator_main_0, tvmgen_default_vanilla_accelerator_main_1, … tvmgen_default_vanilla_accelerator_main_k

These functions contain the same convolution implementation code. Based on Michael’s RFC, I assumed there would be multiple calls to the Vanilla my_ai_hw_conv2dnchw() function with the relevant kernel and input sizes.

Please let me know what you think: is this the way TVM is built, or did I make a mistake in my setup? How can UMA generate C code that calls my custom convolution implementation (a function call rather than duplicated C code)?

Thanks, Slava.

Hello, UMA is a great interface for custom accelerator vendors; it simplifies the BYOC process a lot.

I’m building a workflow from a pre-trained model to compiled C source for a backend (ARM core + custom accelerator). As our accelerator supports only int8/int16 operands, I took a quantized ONNX model (int8) into the frontend. From the Relay graph, I see the pattern of interest would be “qnn.conv2d”. The

uma_backend.partition(mod)

was successful, but I ran into some errors when creating the PrimFunc. I’m not sure if you could provide an example for a quantized operator; as far as I know, many custom accelerators work in the low-precision integer domain, so such an example would definitely make sense.

To register the operator strategy for “qnn.conv2d”, I used

wrap_compute_conv2d(topi.arm_cpu.conv2d_nchw_int8), wrap_topi_schedule(topi.arm_cpu.schedule_conv2d_nchw_int8),

But I’m not sure if this is the correct way.
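For context, my attempt looks roughly like this (simplified sketch; the strategy function and the _register_operator_strategy hook are my assumptions based on my reading of the RFC, and the topi kernels are the arm_cpu ones named above):

from tvm import topi
from tvm.relay.op import op as _op
from tvm.relay.op.strategy.generic import wrap_compute_conv2d, wrap_topi_schedule

def custom_conv2d_strategy(attrs, inputs, out_type, target):
    # single implementation reusing the arm_cpu int8 compute/schedule
    strategy = _op.OpStrategy()
    strategy.add_implementation(
        wrap_compute_conv2d(topi.arm_cpu.conv2d_nchw_int8),
        wrap_topi_schedule(topi.arm_cpu.schedule_conv2d_nchw_int8),
        name="conv2d_nchw_int8.accelerator",
    )
    return strategy

# registered in the backend's __init__ (assumption based on the RFC):
# self._register_operator_strategy("qnn.conv2d", custom_conv2d_strategy)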

I appreciate any hints from you. Chen

Hello Chen, yes, quantized operators are not directly lowerable to TIR; there are a few possibilities to handle this.

  1. Your approach is somewhat feasible, but it has the problem that you are most likely ignoring the zero point / scale of your computation. If your hardware accelerator only supports a single value for scale and zero_point, it might still be usable as is.
  2. Your approach can be augmented by adding the quantization parameters as attributes to the TE; for examples I need to refer you to the ethos-u backend for now. I hope I can provide a full-fledged example shortly.
  3. You can run relay.qnn.transform.CanonicalizeOps() as a pre- or post-partitioning pass (see the sketch below). In this case you do not need to register a custom operator strategy, but the generated TIR is much more complicated.
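A rough sketch of option 3 as a pre-partitioning step (assuming mod is the Relay module from the ONNX frontend and uma_backend is your registered backend; your MergeComposite patterns would then have to match the canonicalized nn.conv2d form):

from tvm import relay

mod = relay.qnn.transform.CanonicalizeOps()(mod)  # lowers qnn.conv2d etc. into plain Relay ops
mod = uma_backend.partition(mod)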

I personally would use option 2 at the moment.
