LLVM ERROR when tracing Mask R-CNN model in PyTorch Object Detection tutorial

Hi,

I am trying to follow the tutorial in tutorials/frontend/deploy_object_detection_pytorch.py (from here), but I get the following error: LLVM ERROR: out of memory. Aborted (core dumped).

First of all, I built and installed TVM following the steps described in the "Host setup and docker build" section of the Vitis-AI integration tutorial (Vitis-AI Integration — tvm 0.8.dev0 documentation).

I am using TVM version 0.8.dev0, with:

  • torch version 1.7.0
  • torchvision version 0.8.1

Then, I launched the tutorial script under the pytorch conda environment, inside the Docker container (tvm.demo_vitis_ai):

python tutorials/frontend/deploy_object_detection_pytorch.py

And I got the following LLVM out-of-memory error:

(my-vitis-ai-pytorch) Vitis-AI ~/tvm > python tutorials/frontend/deploy_object_detection_pytorch.py
/home/vitis-ai-user/.conda/envs/my-vitis-ai-pytorch/lib/python3.6/site-packages/torch/tensor.py:593: RuntimeWarning: Iterating over a tensor might cause the trace to be incorrect. Passing a tensor of different shape won't change the number of iterations executed (and might lead to errors or silently give incorrect results).
  'incorrect results).', category=RuntimeWarning)
/home/vitis-ai-user/.conda/envs/my-vitis-ai-pytorch/lib/python3.6/site-packages/torch/nn/functional.py:3123: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  dtype=torch.float32)).float())) for i in range(dim)]
/home/vitis-ai-user/.conda/envs/my-vitis-ai-pytorch/lib/python3.6/site-packages/torchvision/models/detection/anchor_utils.py:147: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  torch.tensor(image_size[1] // g[1], dtype=torch.int64, device=device)] for g in grid_sizes]
/home/vitis-ai-user/.conda/envs/my-vitis-ai-pytorch/lib/python3.6/site-packages/torchvision/ops/boxes.py:128: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  boxes_x = torch.min(boxes_x, torch.tensor(width, dtype=boxes.dtype, device=boxes.device))
/home/vitis-ai-user/.conda/envs/my-vitis-ai-pytorch/lib/python3.6/site-packages/torchvision/ops/boxes.py:130: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  boxes_y = torch.min(boxes_y, torch.tensor(height, dtype=boxes.dtype, device=boxes.device))
/home/vitis-ai-user/.conda/envs/my-vitis-ai-pytorch/lib/python3.6/site-packages/torchvision/models/detection/transform.py:271: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  for s, s_orig in zip(new_size, original_size)
/home/vitis-ai-user/.conda/envs/my-vitis-ai-pytorch/lib/python3.6/site-packages/torchvision/models/detection/roi_heads.py:372: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  return torch.tensor(M + 2 * padding).to(torch.float32) / torch.tensor(M).to(torch.float32)
LLVM ERROR: out of memory
Aborted (core dumped)

This error message suggests that something may be wrong with the LLVM installation, so I repeated the process with different LLVM installations. The tvm.demo_vitis_ai Docker image includes several pre-installed versions of LLVM (4.0, 6.0, 7.0, 8.0, and 9.0). I tried all of them by selecting each one in the cmake configuration file before building TVM (setting the appropriate value for the USE_LLVM cmake variable). However, I always get an error when running the tutorial script. With LLVM versions 7 and 8 I get the LLVM out of memory error, and with versions 4, 6, and 9 I get a munmap_chunk(): invalid pointer. Aborted (core dumped) error.

(my-vitis-ai-pytorch) Vitis-AI ~/tvm > python tutorials/frontend/deploy_object_detection_pytorch.py
/home/vitis-ai-user/.conda/envs/my-vitis-ai-pytorch/lib/python3.6/site-packages/torch/tensor.py:593: RuntimeWarning: Iterating over a tensor might cause the trace to be incorrect. Passing a tensor of different shape won't change the number of iterations executed (and might lead to errors or silently give incorrect results).
  'incorrect results).', category=RuntimeWarning)
/home/vitis-ai-user/.conda/envs/my-vitis-ai-pytorch/lib/python3.6/site-packages/torch/nn/functional.py:3123: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  dtype=torch.float32)).float())) for i in range(dim)]
/home/vitis-ai-user/.conda/envs/my-vitis-ai-pytorch/lib/python3.6/site-packages/torchvision/models/detection/anchor_utils.py:147: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  torch.tensor(image_size[1] // g[1], dtype=torch.int64, device=device)] for g in grid_sizes]
/home/vitis-ai-user/.conda/envs/my-vitis-ai-pytorch/lib/python3.6/site-packages/torchvision/ops/boxes.py:128: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  boxes_x = torch.min(boxes_x, torch.tensor(width, dtype=boxes.dtype, device=boxes.device))
/home/vitis-ai-user/.conda/envs/my-vitis-ai-pytorch/lib/python3.6/site-packages/torchvision/ops/boxes.py:130: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  boxes_y = torch.min(boxes_y, torch.tensor(height, dtype=boxes.dtype, device=boxes.device))
/home/vitis-ai-user/.conda/envs/my-vitis-ai-pytorch/lib/python3.6/site-packages/torchvision/models/detection/transform.py:271: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  for s, s_orig in zip(new_size, original_size)
/home/vitis-ai-user/.conda/envs/my-vitis-ai-pytorch/lib/python3.6/site-packages/torchvision/models/detection/roi_heads.py:372: UserWarning: To copy construct from a tensor, it is recommended to use sourceTensor.clone().detach() or sourceTensor.clone().detach().requires_grad_(True), rather than torch.tensor(sourceTensor).
  return torch.tensor(M + 2 * padding).to(torch.float32) / torch.tensor(M).to(torch.float32)
munmap_chunk(): invalid pointer
Aborted (core dumped)

I’ve also tried LLVM versions 11 and 12, using the automatic installation script provided on the LLVM webpage. However, with versions 11 and 12 I also got the munmap_chunk(): invalid pointer error.

Finally, I’ve also tried building LLVM from source, following the steps described on the LLVM webpage. But I got the same result: the LLVM out of memory error when using LLVM versions 7 and 8, and the munmap_chunk(): invalid pointer error when using any other LLVM version.

Does anyone have any thoughts about what may be amiss, or whether I might have overlooked something in the process?

I am facing the same issue too… I tried installing LLVM 13, but no luck.

Looking at this now, I believe this is a symptom of the issue discussed in https://github.com/apache/tvm/issues/9362. A quick workaround to try is to swap the import order of PyTorch and TVM so that PyTorch is imported first. If that doesn’t work or is not possible, you can apply the solution in https://github.com/apache/tvm/issues/9362#issuecomment-955263494
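Concretely, the workaround only changes the order of the imports at the top of the tutorial script; everything else stays the same. A minimal sketch (assuming torch, torchvision, and tvm are installed in the environment):

```python
# Workaround for https://github.com/apache/tvm/issues/9362:
# import torch (and torchvision) BEFORE any tvm module. When TVM is
# imported first, symbol clashes between the libraries' bundled LLVM /
# allocator code can surface as crashes like "LLVM ERROR: out of memory"
# or "munmap_chunk(): invalid pointer".
import torch
import torchvision

# TVM imports must come after the PyTorch imports.
import tvm
from tvm import relay
```

Note that the import order matters even if the script never calls torch before TVM; the crash happens at library-load time, not at call time.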


Thanks @masahi. Importing torch first resolves this LLVM issue.