Memory scope for vm alloc_storage builtins

Hi all,

Could anyone explain why there is no support for memory scopes in the alloc_storage builtins? I see that relax.memory.alloc_storage takes a storage_scope argument, but when it gets lowered in VMBuiltinLower, that argument is simply ignored.
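To make this concrete, here is a minimal sketch using the Python op constructor (my reading of the current signature in python/tvm/relax/op/memory; apologies if any keyword names are slightly off):

```python
import tvm
from tvm import relax

# Relax-level op: a storage scope is accepted here...
storage = relax.op.memory.alloc_storage(
    relax.ShapeExpr([1024]),   # size in bytes
    virtual_device_index=0,
    storage_scope="global",    # <-- this argument
    dtype="uint8",
)
# ...but after VMBuiltinLower, the emitted vm.builtin.alloc_storage
# call no longer carries the scope.
```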

I understand that the StaticPlanBlockMemory pass does not yet support any other memory scopes, but if we wanted to add that support, we would first need storage scopes to be supported in the VM builtins.

If there is no specific reason it was left out, I can add support for it, as we need it for our work. I just wanted to check whether there were any design-level issues behind its omission.

Thanks, Anirudh

Yes, I think it should be supported. We can do it as part of the overall effort in https://github.com/apache/tvm/issues/15101

Thanks for the quick reply @tqchen. I’ve gone through the RFC once, and will probably go through it more slowly later.

If I understand the overall flow correctly, we would still need to support storage_scope in vm.builtin.alloc_storage, and the to_vdevice builtin would then handle movement between devices. That is, the actual memory allocation would still go through an alloc_storage, one of which would probably be inserted for each vm.builtin.to_vdevice call, with the memory scope taken from the vdevice object.
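In pseudocode (illustrative names only, not actual TVM builtins):

```python
# y = R.to_vdevice(x, dst_vdevice)
# would conceptually lower to something like:
#
#   s = vm.builtin.alloc_storage(size, dst_device_index, dtype,
#                                scope=dst_vdevice.memory_scope)
#   y = copy_into(s, x)   # hypothetical data-movement builtin
```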

If that’s the case, would it be okay if I quickly added storage scope support to vm.builtin.alloc_storage? We need it to move ahead with our work.

Yes, I think that probably makes sense.

Great, thanks a lot for your help!

Awesome, keep us posted.

Hi @tqchen, I’ve gone through the code and have some initial ideas, but I’m not sure of the best approach for modifying the builtins.

The way it is right now:

On the one hand, the R.vm.alloc_storage op and the vm.builtin.alloc_storage function take the memory to be allocated as a number of bytes. On the other hand, the AllocDataSpace device API method supports mem_scope, but it takes the ndim and shape of the tensor as arguments.

So we would need to either add a new alloc_storage Relax function and builtin that take ndim and shape as arguments, or add another AllocDataSpace overload that takes mem_scope alongside the nbytes version.
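For what it’s worth, the scope-aware path is already reachable from Python through tvm.nd.empty, whose mem_scope argument routes to the ndim-based AllocDataSpace overload (a quick sketch; on CPU, a "global" scope simply falls back to the flat allocation):

```python
import tvm

# mem_scope flows into AllocDataSpace(dev, ndim, shape, dtype, mem_scope);
# the flat nbytes overload has no scope parameter.
dev = tvm.cpu(0)
arr = tvm.nd.empty((16, 16), dtype="float32", device=dev, mem_scope="global")
print(arr.shape)  # (16, 16)
```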

I guess adding new Relax functions with ndim and shape support makes more sense, but to lower that properly I might need to add support for a pooled allocator in the Relax VM that accepts ndim, shape, and mem_scope. I’m not sure it makes sense to implement a generic pooled allocator for potentially multi-dimensional memory allocations.

So I wanted to ask: does it make sense not to support the pooled allocator for other memory scopes, and instead let the appropriate device managers handle pooling within their own APIs? Only the naive allocator would be supported for the memory-scoped versions.

Also, would we need to go through an RFC for these changes, since we would be introducing new Relax functions/builtins and changing the Relax VM with new allocators?

The ndim allocation is less frequently used; the only place it was needed, I think, was texture, and more recently there is a better way to handle some of that specially through the allocator interface itself.

Given that builtins are only registered functions rather than permanent data structures, we can afford to iterate as needs arise. My recommendation is to start with a flat-memory allocator for now, then add more as the need comes. Let me know if that makes sense.

Hi @tqchen,

Thanks for the suggestion. We actually need the ndim version of AllocDataSpace for Hexagon, as it is used by hexagon_device_api for 2-d allocation on VTCM. I can add both versions if that’s better, but we do need the ndim version, as I couldn’t figure out any other way to pass 2-d allocation information down from Relax.

Right now, 2-d allocation on Hexagon is done through the axis_separators member of BufferNode, which is set using the transform_layout primitive in TIR. So the only way to allocate on VTCM has been to use cache_read/cache_write within a PrimFunc to copy from global memory. My current idea is to avoid those extra copies inside the PrimFuncs by allocating on VTCM directly from Relax functions (which is why I need memory-scope support here), but that also means I need to pass the ndim/shape information down somehow.
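For concreteness, here is roughly what the current TIR-side route looks like (a sketch with made-up shapes; the pattern follows the Hexagon tests):

```python
import tvm
from tvm import te
from tvm.script import tir as T

@T.prim_func
def copy(A: T.Buffer((64, 64), "float32"), B: T.Buffer((64, 64), "float32")):
    for i, j in T.grid(64, 64):
        with T.block("copy"):
            vi, vj = T.axis.remap("SS", [i, j])
            B[vi, vj] = A[vi, vj]

sch = tvm.tir.Schedule(copy)
blk = sch.get_block("copy")
# cache_read stages A through a scoped buffer -- the extra copy I want
# to avoid; "global.vtcm" is the Hexagon VTCM scope.
a_vtcm = sch.cache_read(blk, 0, "global.vtcm")
# The axis separator turns the flat layout into a 2-d allocation, which
# is what hexagon_device_api's 2-d AllocDataSpace path consumes.
sch.transform_layout(a_vtcm, ("write", 0), lambda i, j: [i, te.AXIS_SEPARATOR, j])
```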

I already have a version working for flat memory allocation. I thought of creating new Relax registered functions, called something like alloc_storage_with_scope or alloc_storage_nd, that would allow allocating n-d storage.

Should I consider other ways to pass the n-d allocation information to the runtime without adding these extra registered functions, or is this approach okay?

Thanks again for your quick replies.

Got it. I think from the interface point of view, the alloc_storage PackedFunc already takes in a shape, so directly enhancing that with a scope is not a bad idea (while still supporting the 1-d case).

Then it is a matter of implementing that allocator backend, in which case we can enhance the allocator interface and add another subclass such as an NDAllocator.
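Something like this, hypothetically (illustrative signature only; the details would be settled in the PR):

```python
# Hypothetical enhanced builtin (names illustrative only):
#
#   vm.builtin.alloc_storage(vm_state, shape, device_index,
#                            dtype_hint, storage_scope)
#
# A 1-d shape with scope "global" recovers today's flat-memory
# behaviour; scoped n-d requests would dispatch to the new Allocator
# subclass instead of the existing naive/pooled flat allocators.
```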

That makes perfect sense. Thanks a lot, I’ll ping here once the PR is ready.

I’ve raised PR #15178 with the changes.