[Discussion/Alignment] Memory Planning

areusch · June 3, 2021, 9:57pm

Thanks for posting your implementation!

First results are shown below. This is evaluated on RISC-V and using the code generator . There still might be some issue, because I would not expect the RAM usage to rise in any example. But looks promising so far with 10% reduction for a cifar10 model and 18% for resnet!

These are great results!

It would be awesome to find a way to contribute this to TVM. Here are some thoughts along those lines…

We’ve now landed initial work towards the AOT executor, and there’s been some parallel work to do memory planning with the AOT executor.
The AOT planner currently uses GraphPlanMemory but there is a similar proposal to replace GraphPlanMemory with something else.
It would be great if we could come to a single memory planning interface and use that for both Graph and AOT memory planning.
With AOT, memory planning happens at the TIR level, which I think is slightly better as it allows for planning scratchpad/workspace memory alongside intermediate/output tensor memory. However, I think the fundamental inputs to any memory planning algorithm are similar.
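To make those "fundamental inputs" concrete, here is a minimal sketch, assuming each buffer (tensor or workspace) can be described by a size and a live range over operator indices. None of this is TVM API: `Buffer`, `plan_offsets`, and `peak_usage` are hypothetical names, and the greedy-by-size strategy is just one simple choice, not what GraphPlanMemory or the AOT planner does.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Buffer:
    """Hypothetical planner input: not a TVM type."""
    name: str
    size: int       # bytes
    first_use: int  # index of the first operator touching the buffer
    last_use: int   # index of the last operator touching it (inclusive)


def overlaps_in_time(a, b):
    """True if the two buffers are live during at least one common operator."""
    return a.first_use <= b.last_use and b.first_use <= a.last_use


def plan_offsets(buffers):
    """Greedy-by-size placement: largest buffers first, each at the lowest
    offset that does not collide with an already placed, co-live buffer."""
    offsets = {}
    for buf in sorted(buffers, key=lambda b: (-b.size, b.name)):
        # Address ranges occupied by placed buffers that are live with `buf`.
        taken = sorted(
            (offsets[o.name], offsets[o.name] + o.size)
            for o in buffers
            if o.name in offsets and overlaps_in_time(buf, o)
        )
        candidate = 0
        for start, end in taken:
            if candidate + buf.size <= start:
                break  # buf fits in the gap before this range
            candidate = max(candidate, end)
        offsets[buf.name] = candidate
    return offsets


def peak_usage(buffers, offsets):
    """Highest address touched by any buffer, i.e. the arena size needed."""
    return max(offsets[b.name] + b.size for b in buffers)


# Illustrative data only: three tensors plus one workspace buffer.
example = [
    Buffer("conv1_out", 1024, 0, 1),
    Buffer("workspace", 512, 1, 1),
    Buffer("conv2_out", 1024, 1, 2),
    Buffer("dense_out", 256, 2, 3),
]
example_offsets = plan_offsets(example)
example_peak = peak_usage(example, example_offsets)
```

In this toy case `dense_out` reuses the space freed by `conv1_out`, so the arena is smaller than the sum of all buffer sizes. The same inputs apply whether the buffers come from the graph level or from TIR; the TIR level simply lets workspace buffers appear in the list.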

In [RFC] Unified Static Memory Planning, there is a proposal for a memory pool-based planner interface. Could you provide some input on the interface planned there, and see if it’s compatible with your work here? It seems like we could move forward by merging that interface and replacing GraphPlanMemory with something that uses the new interface. Then, if I’m understanding correctly, the planner work you’ve done here with networkx could become an implementation of that interface.

Previously you’d critiqued memory pools, so I also just wanted to follow up:

What we are used to from C compilers is a prioritization for performance or memory (-O3/-Os). In the non-prioritized category it still tries to do the best possible job while avoiding unreasonable trade-offs. In the micro-world I guess a higher amount of control for this trade-off could be desired, but not really for all TVM users, right?

My thought is that memory pools still work as an abstraction here, and additional parameters can be provided to any memory planning algorithm to enable tradeoffs such as these. If the user wishes to try for highest performance, they can offer as much memory as they can to the planner to see if it improves things.
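A rough sketch of how pools could carry that tradeoff, assuming the user hands the planner an ordered list of pools (fastest first) with a capacity each. `Pool`, `Tensor`, and `assign_pools` are hypothetical names for illustration, not the interface from the Unified Static Memory Planning RFC, and the sketch only checks per-step capacity; it ignores offsets and fragmentation inside a pool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pool:
    """Hypothetical memory pool offered to the planner."""
    name: str
    capacity: int  # bytes the user is willing to give the planner


@dataclass(frozen=True)
class Tensor:
    """Hypothetical planner input: size plus live range in operator indices."""
    name: str
    size: int
    first_use: int
    last_use: int  # inclusive


def assign_pools(tensors, pools):
    """Put each tensor in the first pool (in priority order) that still has
    room for it at every step of its lifetime."""
    horizon = max(t.last_use for t in tensors) + 1
    # Bytes in use per pool at each operator index.
    used = {p.name: [0] * horizon for p in pools}
    assignment = {}
    for t in sorted(tensors, key=lambda t: (-t.size, t.name)):
        steps = range(t.first_use, t.last_use + 1)
        for p in pools:
            if all(used[p.name][s] + t.size <= p.capacity for s in steps):
                for s in steps:
                    used[p.name][s] += t.size
                assignment[t.name] = p.name
                break
        else:
            raise MemoryError(f"no pool can hold {t.name}")
    return assignment


# Illustrative data only.
tensors = [
    Tensor("a", 600, 0, 1),
    Tensor("b", 600, 1, 2),
    Tensor("c", 300, 2, 3),
]

# Tight fast pool: one tensor spills to the slower pool.
tight = assign_pools(tensors, [Pool("sram", 1000), Pool("dram", 4096)])

# Offering more fast memory keeps everything in the fast pool.
generous = assign_pools(tensors, [Pool("sram", 2048), Pool("dram", 4096)])
```

With the 1000-byte fast pool, `b` spills to the slower pool; with 2048 bytes, everything stays in the fast one. The planning algorithm is unchanged between the two calls, and only the pool description the user passes in differs.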

I think in general it’s good to retain flexibility in TVM, as use cases tend to be quite varied. We may need to ensure there are sane defaults, though. Is there a use case you had in mind where the additional control is a drawback?