hi @r.stahl,
Great! I’m happy to put together an RFC around a memory planning interface, but it might be next week before it’s posted. Some follow-up replies below:
> The optimization process would become pretty hands-on, because if my application size changes, I may have to adjust the “remaining” memory pool for the ML model and make sure every time that memory is not wasted or performance suffers too much from too constrained memory.
For microTVM, this is where I think project generators could help. My thought is that we would place the memory planner’s output in Model Library Format metadata, and then downstream Project API implementations could consume that and declare global buffers sized to match the application’s demands.
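To make that concrete, here is a rough sketch of what a project generator might emit after reading a planner-produced pool entry from the metadata. The metadata fields, buffer name, macro, and alignment here are purely illustrative, not a settled interface:

```c
/* Sketch of code a Project API implementation might generate from a
 * hypothetical pool entry in Model Library Format metadata, e.g.
 *   {"name": "workspace_pool", "size_bytes": 65536, "alignment": 16}
 * All names and sizes below are illustrative. */
#include <stdint.h>

#define TVM_WORKSPACE_POOL_SIZE_BYTES 65536 /* from metadata: size_bytes */

/* One statically sized global buffer per pool requested by the planner.
 * (aligned attribute shown in GCC/Clang syntax.) */
static uint8_t __attribute__((aligned(16)))
    g_tvm_workspace_pool[TVM_WORKSPACE_POOL_SIZE_BYTES];
```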
On non-microTVM platforms (e.g. a traditional OS), I think we could still dynamically allocate memory pools using e.g. malloc, but this would effectively handle all of the system memory allocation with one malloc call per memory pool. GPU-specific memory pools can be allocated by the underlying GPU driver (however, some more work needs to be done to link the memory pool with the TVMDevice that provides it).
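A minimal sketch of that "one malloc per pool" idea, assuming illustrative pool names and sizes (the struct and init function here are hypothetical, not an existing TVM API):

```c
/* Rough sketch: allocate one backing buffer per planner-reported memory
 * pool at startup, rather than making per-tensor allocations at runtime.
 * Pool names, sizes, and types below are illustrative only. */
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct {
  const char* name;
  size_t size_bytes;
  uint8_t* base; /* filled in at init time */
} MemoryPool;

static MemoryPool g_pools[] = {
    {"workspace", 64 * 1024, NULL},
    {"constants", 32 * 1024, NULL},
};

int InitMemoryPools(void) {
  for (size_t i = 0; i < sizeof(g_pools) / sizeof(g_pools[0]); ++i) {
    /* One malloc call per memory pool. */
    g_pools[i].base = (uint8_t*)malloc(g_pools[i].size_bytes);
    if (g_pools[i].base == NULL) {
      return -1; /* out of memory */
    }
  }
  return 0;
}
```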
> I saw that at the end of `Optimize`, the AutoScheduler is invoked, so I assume this is the point where the schedule is regarded as final. Is this accurate?
Right now, we don’t finalize the schedule until we’re already in GraphExecutorCodegen, so I think this may be the source of the confusion. I would like to move to a world where GraphExecutorCodegen consumes the top-level AOT TIR (or something similar).
Then, later on after PR 7518 is merged, we would like to be able to define compiler passes over the entire scheduled program. This would allow for full dataflow analysis, which until then will be confined to the single top-level TIR function generated from AOT.