We discussed this at the TVM Community Meeting this morning. There was a presentation about the approach, followed by some discussion. Thanks @MJKlaiber @cgerum @SebastianBoblestETAS @paulpb @PhilippvK @r.stahl @aca88 for bringing this to the meeting!
Here are some notes (please feel free to correct them if I got anything wrong!):
- The current graph partitioning approach is the same one that's used in the compiler today. It's compatible with the Collage partitioning effort, which is in the works and not yet RFC'd. (A rough sketch of that flow is below.)
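  For reference, here is a minimal sketch of the existing Relay partitioning flow mentioned above. It assumes a hypothetical `"my_accel"` target whose operator patterns have already been registered; the function name is a placeholder, not part of the UMA proposal:

  ```python
  import tvm
  from tvm import relay
  from tvm.relay.op.contrib import get_pattern_table


  def partition_for_my_accel(mod):
      """Sketch: partition a Relay module for a hypothetical "my_accel" target."""
      seq = tvm.transform.Sequential(
          [
              # Fuse operator groups matching the target's registered patterns.
              relay.transform.MergeComposite(get_pattern_table("my_accel")),
              relay.transform.AnnotateTarget("my_accel"),
              relay.transform.MergeCompilerRegions(),
              relay.transform.PartitionGraph(),
          ]
      )
      return seq(mod)
  ```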
- Would the v1 support Tensor Expression (TE), or are we skipping that?
  - Mikael: his understanding is that CreatePrimFunc can support TE, so TE should be natively supported (see the sketch below).
  - Paolo: using the standard lowering flow, as is done by Ethos-U.
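  A minimal sketch of what "CreatePrimFunc can support TE" means in practice: a TE compute can be converted into a TIR PrimFunc, which can then enter the schedulable-TIR (S-TIR) flow:

  ```python
  import tvm
  from tvm import te

  # Define a computation in TE.
  A = te.placeholder((128, 128), name="A")
  B = te.compute((128, 128), lambda i, j: A[i, j] * 2.0, name="B")

  # Convert the TE compute into a TIR PrimFunc.
  prim_func = te.create_prim_func([A, B])
  mod = tvm.IRModule({"main": prim_func})

  # The result is schedulable TIR (S-TIR).
  sch = tvm.tir.Schedule(mod)
  ```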
- The proposal has an explicit differentiation between S-TIR and NS-TIR. Would there be different hooks, e.g. one to register TIR scheduling passes vs. plain TIR passes?
  - Will it be possible to contribute S-TIR back to the compiler, or just NS-TIR?
  - Scheduling passes work on S-TIR; the passes in the boxes behind the schedules are injected into the lowering via the pass context. Those passes do not return S-TIR: they are part of the lowering from S-TIR to NS-TIR. At the moment this is done by calling tvm.lower() and injecting the passes into it (see the sketch below).
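  A minimal sketch of that injection mechanism, using the existing `tir.add_lower_pass` hook in PassContext. The pass body and the phase number (3) are placeholders:

  ```python
  import tvm
  from tvm import te


  @tvm.tir.transform.prim_func_pass(opt_level=0)
  def my_lowering_pass(func, mod, ctx):
      # Placeholder pass: runs during lowering (S-TIR -> NS-TIR) and must
      # return a PrimFunc; it does not hand S-TIR back to the caller.
      return func


  A = te.placeholder((16,), name="A")
  B = te.compute((16,), lambda i: A[i] + 1.0, name="B")
  s = te.create_schedule(B.op)

  # Inject the pass into the lowering pipeline; phase 3 is just an example.
  with tvm.transform.PassContext(config={"tir.add_lower_pass": [(3, my_lowering_pass)]}):
      lowered = tvm.lower(s, [A, B])
  ```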
- In the Relay-to-TIR hook, Ethos-U is already trying to figure out the lowering order, which might not match the partitioning order. It wants to see the memory available after compiling the C functions but before lowering the Ethos-U functions. Any thoughts on whether it's possible to configure the order of partitioning in this flow?
  - Why? The backend needs to see the amount of live memory available after running the default TVM flow.
  - Relay passes can see the whole IRModule; past that point, a TIR pass only sees the functions for its particular target (see the sketch below).
  - The order needs to be decided, and it varies by registration point.
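  A minimal sketch of the visibility difference mentioned above: a Relay-level module pass sees the entire IRModule (all partitioned functions), which is where whole-program information such as remaining memory could be gathered. The pass below is a placeholder that just walks the module:

  ```python
  import tvm


  @tvm.transform.module_pass(opt_level=0)
  def inspect_whole_module(mod, ctx):
      # Module-level (e.g. Relay) passes see every function in the IRModule,
      # regardless of which target each function is destined for.
      for gvar, func in mod.functions.items():
          print(gvar.name_hint, type(func))
      return mod
  ```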
- Q: Are there common accelerator passes in use in TVM, or does everyone do something different?
  - There are common touch points; those are the "plumbing" mentioned in the slide presentation, e.g. graph partitioning, scheduling, and code generation.
  - UMA isn't trying to box anyone into a particular flow; it's just trying to suggest one way of doing this, out of a broader set of options, to serve as a guide for folks who may be new to TVM.
- Question from Federico, who is integrating an accelerator of his own:
  - VTA uses memory scopes to define buffers in block-RAM. Are we planning to accommodate that in UMA?
  - You could write your own schedules and passes to do this. storage_scope is the way to do this at the runtime level. You can also leverage USMP to define memory pools and schedule into them (see the sketch below).
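  A minimal sketch of the storage-scope idea, using a TE schedule. Here `"local"` stands in for an accelerator-specific scope such as VTA's block-RAM buffers; the compute itself is a placeholder:

  ```python
  import tvm
  from tvm import te

  n = 1024
  A = te.placeholder((n,), name="A")
  B = te.compute((n,), lambda i: A[i] * 2.0, name="B")
  s = te.create_schedule(B.op)

  # Stage A through an on-chip buffer; "local" stands in for an
  # accelerator-specific storage scope (e.g. VTA's block-RAM scopes).
  AA = s.cache_read(A, "local", [B])

  print(tvm.lower(s, [A, B], simple_mode=True))
  ```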