Currently, the VM `PooledAllocator` releases its pooled memory only when the underlying device fails to allocate more memory (see `pooled_allocator.h` at 553778885388a9eff4d611e1022baecd75c69088 in apache/tvm). This causes a program crash when doing repeated inference with dynamic batch sizes. See [Bug] PyTorch MaskRCNN GPU OOM error, apache/tvm issue #8233, for a minimal repro.
It seems there are two issues with it:
- `AllocDataSpace` can be called outside of `PooledAllocator`, by `NDArray::Empty(...)` (see `ndarray.cc` at 4d9bc9b4a3e9e8d3420efe60a52964fcd4c29c8d in apache/tvm). That call is not protected by try/catch, so if almost all memory is held by `PooledAllocator` and `NDArray::Empty` is called, the program crashes with the following error:
```
terminate called after throwing an instance of 'tvm::runtime::InternalError'
what(): [19:12:54] /home/masa/projects/dev/tvm/src/runtime/vulkan/vulkan_stream.cc:123:
---------------------------------------------------------------
An error occurred during the execution of TVM.
For more information, please see: https://tvm.apache.org/docs/errors.html
---------------------------------------------------------------
Check failed: (__e == VK_SUCCESS) is false: Vulkan Error, code=-13: Unknown Vulkan error code
Stack trace:
0: tvm::runtime::vulkan::VulkanStream::Synchronize()
1: _ZN3tvm7runtime6vulkan15VulkanDeviceAPI13FreeDataSpac
2: tvm::runtime::NDArray::Internal::DefaultDeleter(tvm::runtime::Object*)
3: tvm::runtime::NDArray::CopyTo(DLDevice const&) const
4: tvm::runtime::vm::CopyTo(tvm::runtime::ObjectRef, DLDevice const&)
5: std::_Function_handler<void (tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*), tvm::runtime::vm::VirtualMachine::GetFunction(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, tvm::runtime::ObjectPtr<tvm::runtime::Object> const&)::$_6>::_M_invoke(std::_Any_data const&, tvm::runtime::TVMArgs&&, tvm::runtime::TVMRetValue*&&)
6: TVMFuncCall
```
- Even if I fix the above problem by making sure that all allocations go through `PooledAllocator`, my program still crashes due to excessive host-memory allocation (I haven't looked into why so much host memory is allocated when I'm running on a GPU target). Also, if I use the CPU target, the program is simply killed after reaching the memory limit, before the try/catch can catch the allocation failure.
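To make the first failure mode concrete, here is a minimal, self-contained sketch of the pooling pattern under discussion. `FakeDevice`, the size-keyed pool, and the exact class shapes are illustrative stand-ins, not TVM's actual implementation; the point is that an allocation going through the pool recovers via `ReleaseAll()` plus a retry, while a direct device allocation (which is effectively what `NDArray::Empty` does) throws with nothing to catch it:

```cpp
#include <cstddef>
#include <stdexcept>
#include <unordered_map>
#include <vector>

// Illustrative stand-in for a device API with a fixed memory budget.
struct FakeDevice {
  size_t capacity;
  size_t used = 0;
  void* Alloc(size_t nbytes) {
    if (used + nbytes > capacity) throw std::runtime_error("device OOM");
    used += nbytes;
    return ::operator new(nbytes);
  }
  void Free(void* ptr, size_t nbytes) {
    used -= nbytes;
    ::operator delete(ptr);
  }
};

// Sketch of the pooled-allocator pattern: freed buffers go back into a
// size-keyed pool, and ReleaseAll() is only called when the device itself
// reports an allocation failure.
class PooledAllocator {
 public:
  explicit PooledAllocator(FakeDevice* dev) : dev_(dev) {}

  void* Alloc(size_t nbytes) {
    auto it = pool_.find(nbytes);
    if (it != pool_.end() && !it->second.empty()) {
      void* ptr = it->second.back();
      it->second.pop_back();
      return ptr;
    }
    try {
      return dev_->Alloc(nbytes);
    } catch (const std::runtime_error&) {
      ReleaseAll();                // last resort: hand pooled memory back
      return dev_->Alloc(nbytes);  // retry once; rethrows on a true OOM
    }
  }

  // Freeing only returns the buffer to the pool; the device still sees
  // the memory as used until ReleaseAll() runs.
  void Free(void* ptr, size_t nbytes) { pool_[nbytes].push_back(ptr); }

  void ReleaseAll() {
    for (auto& kv : pool_)
      for (void* ptr : kv.second) dev_->Free(ptr, kv.first);
    pool_.clear();
  }

 private:
  FakeDevice* dev_;
  std::unordered_map<size_t, std::vector<void*>> pool_;
};
```

With a 1024-byte device, allocating and freeing two 512-byte buffers leaves the whole budget parked in the pool. A subsequent `alloc.Alloc(1024)` still succeeds, because the catch branch releases the pool and retries, but an equivalent `dev.Alloc(1024)` issued directly, bypassing the pool, would terminate the program just as in the stack trace above.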
So I think we need a better way to decide when to call `ReleaseAll()` early if necessary. Should we add a device API to query the maximum available memory and call `ReleaseAll()` when we reach, say, 90%? This doesn't work if other memory-hungry processes are in use…
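The 90% heuristic could be a small policy check on top of such a query. Note that `MemInfo` and the device-side memory query it represents are hypothetical; TVM's `DeviceAPI` exposes nothing like this today:

```cpp
#include <cstddef>

// Hypothetical result of a device-API memory query (assumed, not a real
// TVM API): total device memory and how much is currently free.
struct MemInfo {
  size_t total_bytes;
  size_t free_bytes;
};

// Returns true once device usage crosses `fraction` of the total, signalling
// that the pool should call ReleaseAll() before the next allocation instead
// of waiting for the device to fail.
inline bool ShouldReleaseAll(const MemInfo& info, double fraction = 0.9) {
  size_t used = info.total_bytes - info.free_bytes;
  return static_cast<double>(used) >=
         fraction * static_cast<double>(info.total_bytes);
}
```

One caveat this makes visible: `free_bytes` would reflect allocations by every process on the device, so a memory-hungry neighbour can push the check over the threshold (or, worse, consume the headroom right after we decide not to release), which is exactly the limitation noted above.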