[RFC] Unified Static Memory Planning

areusch · June 1, 2021, 7:56pm

hi @manupa-arm, thanks for posting this! there’s a lot to unpack here.

I think we can break the work here into two parts:

P1. Implementing the unified memory planner based on information in TIR

P2. Modifying the codegen/output to implement various compiler optimizations based on P1.

I think that the debate around P1 is likely to center around the “how,” whereas the debate around P2 is likely to center around the “what.”

Modeling the whole program in TIR

So far the AOT effort has made some initial progress here by creating a top-level TIR function which describes the model as a whole. One open question related to this RFC is: how should we structure the compiler around this top-level program? In general, we have a couple of options:

S1. Place everything in TIR, and implement post-scheduling transforms as compiler passes. In the S1 world, any computed information e.g. memory placement for buffers would need to live in TIR. In this world, we should strive to avoid side-channel information carried outside of TIR.

S2. Keep with the piecewise representation, and build separate data structures to encapsulate compiler outputs from post-schedule passes e.g. memory planning.

I think currently @jroesch and @csullivan support S1 (see PR 7518, which as I understand it is still being worked on but frequently runs into merge conflicts). I also support this if it’s feasible to do so under all executors. I think the drawback is that non-AOT executors will need to run these passes, but the advantage is that it provides a clear framework under which we can consolidate post-scheduling whole-program modeling for both AOT and non-AOT use cases. Should we consider superseding the VM executor with AOT in the future, it also provides a more natural pathway. I’m curious as to your opinions on this?

I bring this up because I think a lot of questions raised here and elsewhere in the proposal can likely be decided based on how we decide this general design pattern.

Inline questions

A couple other questions:

static int32_t entrypoint(TVMInputs_my_model* inputs, 
                          TVMOutputs_my_model* outputs,
                          TVMContext* context){

Just to confirm–would TVMContext also be generated, e.g. TVMContext_my_model?

Inputs :

AoT TIR PrimFunc ( the control function describing the call graph to operators)

All Operator Functions

the maximum size for each pool

We could use “pinned_memory” (see below) to tag buffers with suggested priority order determined by the scheduler.

The idea is USMP will try to pool them using the preferred “pinned_memory” and fallback whenever the size is exceeding the user provided max size for each pool (if any)

Outputs :

AoT TIR PrimFunc accepting pool buffers from the user.

All Operator functions accepting pool buffers.

Each operator function should address using the correct offset in the correct pool buffer

I’m not certain the memory planner should necessarily encode all vars as buffer offsets–doing so could limit e.g. dynamic use cases, which may either a) need to express offsets as runtime-evaluated expressions or b) need to entirely defer such allocations to runtime, should it be impossible to pre-define such expressions.
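To make the distinction concrete, here is a minimal sketch (plain C++17, not TVM code) of a planner output that can say either "this buffer sits at a fixed offset" or "leave this one to the runtime." `Placement` and `PlanOffsets` are hypothetical names made up for illustration, and the bump-pointer assignment is only a placeholder for whatever algorithm the planner would actually use.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical planner output for one buffer (not a TVM type).
struct Placement {
  // Byte offset inside the pool when it is known at compile time;
  // empty when the allocation has to be resolved at runtime.
  std::optional<int64_t> static_offset;
  bool deferred_to_runtime() const { return !static_offset.has_value(); }
};

// Assign bump-pointer offsets to buffers whose size is known at compile time.
// A size of std::nullopt models a dynamically sized buffer, which is left
// for a runtime allocator instead of being forced into a fixed offset.
std::vector<Placement> PlanOffsets(
    const std::vector<std::optional<int64_t>>& sizes) {
  std::vector<Placement> out;
  int64_t next = 0;
  for (const auto& size : sizes) {
    if (size.has_value()) {
      assert(*size >= 0);
      out.push_back(Placement{next});
      next += *size;
    } else {
      out.push_back(Placement{std::nullopt});
    }
  }
  return out;
}
```

A later codegen step could then turn every `Placement` with a static offset into a pool offset and keep a runtime allocation call for the rest.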

This gets at my separation of concerns above–it would be nice to either

use the TIR-agnostic I/O format as a way to store the memory planner output and then inform further TIR modifications (e.g. either making everything buffer offsets when possible, passing those offsets in as positional arguments, or keeping TVMBAW for dynamic allocs), or

represent that abstract output as e.g. TIR attributes and perform any of the aforementioned optimizations by examining TIR attributes

The current proposal for the interface is as follows :
struct BufferInfo {
    Integer uid;
    Integer size_bytes;
    Integer alignment;
    Array<Integer> conflicts; // the conflicting uids of buffers
    Array<Integer> pool_candidates;
    Integer pool_id;
    Integer pool_offset;
};
void (*foo)(Array<BufferInfo> buffers, Map<Integer, Integer> pool_sizes)

In the tvmc command above, memory pools were identified by name. Any reason to translate to integers here?
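For illustration, a name-keyed version of that interface might look like the sketch below. It uses plain C++17 containers as stand-ins for the TVM `Integer`/`Array`/`Map` types, and `PlanGreedy` is a hypothetical first-fit planner, not the algorithm proposed in the RFC. It assumes conflicts are listed symmetrically and that buffers arrive unplaced.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Stand-in for the proposed BufferInfo, with pools identified by name.
struct BufferInfo {
  int uid;
  int64_t size_bytes;
  int64_t alignment;
  std::vector<int> conflicts;                // uids live at the same time
  std::vector<std::string> pool_candidates;  // in preference order
  std::string pool_name;                     // output: empty if unplaced
  int64_t pool_offset = -1;                  // output
};

static int64_t AlignUp(int64_t x, int64_t a) { return (x + a - 1) / a * a; }

// First-fit placement: try each candidate pool in order and fall back to the
// next one when the buffer does not fit. Returns true if all buffers fit.
bool PlanGreedy(std::vector<BufferInfo>& buffers,
                const std::map<std::string, int64_t>& pool_sizes) {
  bool all_placed = true;
  for (std::size_t i = 0; i < buffers.size(); ++i) {
    BufferInfo& b = buffers[i];
    assert(b.alignment > 0);
    for (const std::string& pool : b.pool_candidates) {
      auto it = pool_sizes.find(pool);
      if (it == pool_sizes.end()) continue;  // unknown pool name: skip
      // Already-placed buffers in this pool that conflict with b.
      std::vector<const BufferInfo*> placed;
      for (std::size_t j = 0; j < i; ++j) {
        const BufferInfo& o = buffers[j];
        bool conflict = std::find(b.conflicts.begin(), b.conflicts.end(),
                                  o.uid) != b.conflicts.end();
        if (conflict && o.pool_name == pool) placed.push_back(&o);
      }
      std::sort(placed.begin(), placed.end(),
                [](const BufferInfo* x, const BufferInfo* y) {
                  return x->pool_offset < y->pool_offset;
                });
      // Slide the candidate offset past each conflicting buffer until a
      // gap large enough for b is found.
      int64_t offset = 0;
      for (const BufferInfo* o : placed) {
        if (offset + b.size_bytes <= o->pool_offset) break;
        offset = std::max(
            offset, AlignUp(o->pool_offset + o->size_bytes, b.alignment));
      }
      if (offset + b.size_bytes <= it->second) {
        b.pool_name = pool;
        b.pool_offset = offset;
        break;
      }
    }
    if (b.pool_name.empty()) all_placed = false;
  }
  return all_placed;
}
```

Keying by name keeps the planner's output directly comparable with what the user passed on the command line; an integer id could still be derived internally if the passes need one.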

Special Considerations :

Let’s discuss these after resolving S1/S2 debate above.

cc @tqchen @junrushao if you have comments on representing this in TIR