[RFC] Unified Static Memory Planning

manupa-arm · June 8, 2021, 3:50pm

areusch:

Ok—when you say “load a packed function of the tvm_main instead of json,” do you mean simply that GraphExecutor#run could just call tvm_main? If we make GraphExecutor effectively consume the results of this interface, seems like that would effectively change SetupStorage to issue basically 3 (or maybe a few more) allocate calls:

for the input data (optional)

for the CPU workspace pool

for the output data

there could be additional calls if there are additional e.g. accelerator buffers

I think such a proposal might work to unify the memory planning around this AoT-based approach, but there are some cases which might mean we need to relax this proposal a bit, for instance the part about passing only the memory pools to operators. It may be that in order to support overriding parameters at runtime (which GraphExecutor currently allows), we need to keep passing individual function arguments, but these can be arranged (by AOT or GraphExecutor) to merely be offsets into the memory pools (or then be overridden to user-supplied tensors).

That broadly aligns with our thinking.
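To make the idea concrete, here is a minimal sketch in plain Python (not TVM code) of operator arguments that are just offsets into a single workspace pool, with an optional override to a user-supplied buffer. The names `PLAN`, `POOL_SIZE` and `resolve_args` are hypothetical and exist only for illustration.

```python
# Hypothetical plan: tensor name -> (byte offset, size in bytes) inside one workspace pool.
PLAN = {"conv_out": (0, 64), "relu_out": (64, 64)}
POOL_SIZE = 128


def resolve_args(pool, plan, overrides=None):
    """Return one writable view per tensor: a slice of the pool, or a user override."""
    overrides = overrides or {}
    pool_view = memoryview(pool)
    args = {}
    for name, (offset, size) in plan.items():
        if name in overrides:
            # The caller supplied its own buffer, so the pool slot is simply unused.
            args[name] = memoryview(overrides[name])
        else:
            # Default case: the argument is nothing more than an offset into the pool.
            args[name] = pool_view[offset:offset + size]
    return args


pool = bytearray(POOL_SIZE)      # the single "allocate" call for the workspace
args = resolve_args(pool, PLAN)
args["relu_out"][0] = 7          # a write through the view lands in the pool
```

Because each argument is a view rather than a copy, an executor could keep the per-argument calling convention while still performing only one allocation per pool.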

The proposed USMP’s actual “component” interface will be quite similar to the TVMC CLI additions. The graph executor flow could therefore use “--with-parameter-buffer” to make USMP expose the parameter buffer to the executor runtime, so that the executor can update constants at known offsets.

Regarding updates to specific parameters: since the Relay pipeline runs passes such as FoldConstant, would it be safe to update specific parameters? If it is, we could reuse whatever mechanism the executor uses today to identify which parameters to update, and have it use offsets instead.
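As a rough illustration of updating constants at known offsets, the sketch below packs float32 values into an exposed parameter buffer. It is plain Python, not the TVM runtime; `PARAM_OFFSETS`, `update_param` and `read_param` are hypothetical names, and the layout is assumed to be produced by the planner. It only covers parameters that still exist as separate entries after constant folding.

```python
import struct

# Hypothetical layout emitted by the planner:
# parameter name -> (byte offset, number of float32 elements).
PARAM_OFFSETS = {"weight": (0, 4), "bias": (16, 2)}


def update_param(param_buffer, name, values):
    """Overwrite one parameter in place at its planned offset."""
    offset, count = PARAM_OFFSETS[name]
    if len(values) != count:
        raise ValueError(f"{name} expects {count} values, got {len(values)}")
    struct.pack_into(f"<{count}f", param_buffer, offset, *values)


def read_param(param_buffer, name):
    """Read one parameter back from the buffer as a list of floats."""
    offset, count = PARAM_OFFSETS[name]
    return list(struct.unpack_from(f"<{count}f", param_buffer, offset))


param_buffer = bytearray(24)                     # 6 float32 values in total
update_param(param_buffer, "bias", [0.5, -1.0])  # touches bytes 16..23 only
```

A parameter that has been folded into another constant would have no entry in such a table, which is the case the safety question above is about.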

Ack, yes maybe we should not limit this in the design.

I see, I think you are asking about the attributes of the pools themselves.

Initially, we are starting with “name” and “target” to identify the pool uniquely and which targets could access them, respectively.

However, going forward we are going to provide a guide “size” for each pool, which we could use to distribute tensors across pools (where there are options) based on memory pressure, which is why it is only a guide.
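The pool attributes described above could be modelled roughly as follows. This is a hedged sketch in plain Python, not the USMP interface: `PoolInfo`, `size_hint` and `place` are hypothetical names, the size guide is treated as a cap purely to keep the example short, and offsets are handed out by simple bump allocation with no liveness-based reuse.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class PoolInfo:
    name: str            # identifies the pool uniquely
    targets: List[str]   # targets that are able to access the pool
    size_hint: int       # guide size in bytes
    used: int = 0        # bytes handed out so far


def place(tensors: List[Tuple[str, int, str]],
          pools: List[PoolInfo]) -> Dict[str, Tuple[str, int]]:
    """Map each (tensor, size, target) to a (pool name, byte offset) pair."""
    placement = {}
    for tensor, size, target in tensors:
        candidates = [p for p in pools
                      if target in p.targets and p.used + size <= p.size_hint]
        if not candidates:
            raise MemoryError(f"no pool can hold {tensor}")
        # Prefer the pool under the least memory pressure (most room left).
        best = max(candidates, key=lambda p: p.size_hint - p.used)
        placement[tensor] = (best.name, best.used)
        best.used += size
    return placement


pools = [PoolInfo("sram", ["cpu", "npu"], 100), PoolInfo("dram", ["cpu"], 1000)]
plan = place([("a", 80, "npu"), ("b", 50, "cpu"), ("c", 10, "cpu")], pools)
```

Here tensor `a` can only go to `sram` because the `npu` target cannot access `dram`, while `b` and `c` are steered to `dram` by the remaining room in each pool.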

Going a bit further out, we are planning to append more metadata, such as “bandwidth”, for the buffers. This would be used by the scheduler to rule out pools based on where it wants buffers placed (in some cases we might use double_buffering or rolling_buffers via scheduling primitives), which goes hand in hand with the required performance.