graph_runtime.create() crashes with OOM when batch_size is greater than 20 and target is cuda

Bug description:
When the model is inception_v3, the target is cuda, and the batch_size is relatively large (such as 30), the script below crashes.

The statement graph_runtime.create(graph, lib, ctx) triggers the crash. The error messages report out of memory (OOM), but the server's memory is large enough.

The factors that trigger this bug:

  1. model: inception_v3-imagenet.h5
  2. batch_size: >20
  3. target = cuda

In order to confirm that the crash has nothing to do with other factors, 3 comparative experiments were carried out:

  • experiment-1: change target to cpu and set batch_size to a very large value (such as 30, 50, and 100) —> no crash; only target=cuda triggers this bug!
  • experiment-2: change to a different model —> no crash; only **inception_v3** triggers this bug!
  • experiment-3: change the batch_size —> when batch_size is set to a very small number (such as 1, 2, 3, 4), no bug! However, when batch_size is set to a relatively large number (such as 25, 30, 100), it crashes!

Crash messages:

Environment:
TVM : 0.8.dev0
OS: Ubuntu 16.04
GPU: GeForce GTX 1080 Ti
CUDA: 10.1
Mem: 125 GB

The reproducible script:

import keras
import os
import tvm
from tvm import te
import tvm.relay as relay
import numpy as np
from PIL import Image
import tvm.runtime as runtime
from tvm.contrib import graph_runtime

input_tensor = 'input_6'

def image_resize(x, shape):
    x_return = []
    for x_test in x:
        tmp = np.copy(x_test)
        img = Image.fromarray(tmp.astype('uint8')).convert('RGB')
        img = img.resize(shape, Image.ANTIALIAS)
        x_return.append(np.array(img))
    return np.array(x_return)


input_preprocessor = keras.applications.vgg16.preprocess_input
input_shape = (32,32)
dataset_dir =  "/share_container/data/dataset/"
data_path = os.path.join(dataset_dir, "imagenet-val-1500.npz")
data = np.load(data_path)
x, y = data['x_test'], data['y_test']

x_resize = image_resize(np.copy(x),input_shape)
x_test = input_preprocessor(x_resize)
y_test = keras.utils.to_categorical(y, num_classes=1000)

model_path = '/share_container/data/keras_model/inception_v3-imagenet_origin.h5'
predict_model = keras.models.load_model(model_path)
print(predict_model.input)
print(predict_model.summary())
shape_dict = {input_tensor: (25, 3, 299, 299)}
irmod, params = relay.frontend.from_keras(predict_model, shape_dict)
target = 'cuda'
ctx = tvm.gpu(0)

with tvm.transform.PassContext(opt_level=0, disabled_pass=None):
    graph, lib, params = relay.build_module.build(irmod, target, params=params)

    module = graph_runtime.create(graph, lib, ctx)  # crash here !!!

You can download the model from this link:

Looks like not a bug to me. Although you have 125 GB of memory on the host, the GPU's shared memory is still limited to several MB, so it is likely to crash when executing a certain op with a large batch size.
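A quick way to see the limits being referred to is to query the device attributes through TVM. This is just an illustrative sketch using attribute names from the 0.8.dev0 device API:

import tvm

ctx = tvm.gpu(0)
if ctx.exist:
    # Per-block shared memory on a GTX 1080 Ti is only tens of KB, no matter
    # how much host RAM the server has.
    print("device:", ctx.device_name)
    print("compute capability:", ctx.compute_version)
    print("max threads per block:", ctx.max_threads_per_block)
    print("shared memory per block (bytes):", ctx.max_shared_memory_per_block)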


@sqchao Thanks for reporting your issue. You posted a script based on a keras model, but the model file you presented was from ONNX?

The ONNX model is failing due to a dynamic pad; I'm trying to figure out where that's coming from.

This fixes the ONNX model with the following script, but I can't reproduce the OOM. If you can send me the .h5 file, I'll give it a try.

import tvm
from tvm import relay
import onnx

shape_dict = {"input_7": (25, 299, 299, 3)} 
target = 'cuda'
ctx = tvm.gpu(0)

onnx_model = onnx.load("xception-imagenet_origin.onnx")
irmod, params = relay.frontend.from_onnx(onnx_model, shape_dict, freeze_params=True)

with tvm.transform.PassContext(opt_level=0, disabled_pass=None):
    graph, lib, params = relay.build_module.build(irmod, target, params=params)
    module = graph_runtime.create(graph, lib, ctx)  # crash here !!!

As a note, the ONNX model expects NHWC, so I had to transpose the input data. I'm wondering if the Keras model also expects NHWC instead of the NCHW in your script?
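For reference, switching a batch between the two layouts is just an axis transpose; a minimal NumPy illustration (the shapes here are assumptions matching the script above):

import numpy as np

# Hypothetical NCHW batch, e.g. (25, 3, 299, 299).
x_nchw = np.zeros((25, 3, 299, 299), dtype="float32")

# NCHW -> NHWC: move the channel axis to the last position, as the ONNX model expects.
x_nhwc = np.transpose(x_nchw, (0, 2, 3, 1))
assert x_nhwc.shape == (25, 299, 299, 3)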

@mbrookhart I am sorry, I uploaded the wrong model. You can download the Keras model from this link now:

BTW, the Keras model expects NCHW. Thanks for your help.

I also came across this bug, and I was wondering whether it really is a bug. Thanks for your fix.

BTW, this bug is related to opt_level: only opt_level = 0, 1, or 2 triggers it; opt_level = 3 does not.
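For anyone who wants to check this, here is a rough sketch of the sweep (not the exact code I ran), assuming irmod, params, target, and ctx are built as in the reproduction script above:

import tvm
from tvm import relay
from tvm.contrib import graph_runtime

# Rebuild and create the graph runtime at each opt_level to see which ones crash.
for opt_level in (0, 1, 2, 3):
    try:
        with tvm.transform.PassContext(opt_level=opt_level):
            graph, lib, build_params = relay.build_module.build(irmod, target, params=params)
        graph_runtime.create(graph, lib, ctx)
        print("opt_level", opt_level, ": ok")
    except Exception as err:  # if the OOM surfaces as a Python-level TVMError
        print("opt_level", opt_level, ": failed:", err)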

In addition to the bug you mentioned, I also found more than one bug caused by dynamic ops. I am happy to provide more trigger scenarios to help you fix them.

You posted xception-imagenet_origin.h5 but your script is calling for inception_v3-imagenet_origin.h5?

Dynamic shapes are often a feature, not a bug, and you need to run them on the VM instead of the graph runtime. If you expect the model to be static and it still has dynamic shapes after DynamicToStatic, that might be a bug.
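If the model really does keep dynamic shapes, the VM path looks roughly like this (a sketch only, with API names as in TVM 0.8.dev0; irmod, params, target, ctx, and x_test are assumed from the scripts above):

import tvm
from tvm import relay
from tvm.runtime.vm import VirtualMachine

# Try to remove dynamism first; shapes that stay dynamic after this pass
# need the VM rather than the graph runtime.
irmod = relay.transform.InferType()(irmod)
irmod = relay.transform.DynamicToStatic()(irmod)

# Compile for the Relay VM and execute it.
vm_exec = relay.vm.compile(irmod, target=target, params=params)
vm = VirtualMachine(vm_exec, ctx)
out = vm.run(tvm.nd.array(x_test.astype("float32"), ctx))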

ONNX is a very dynamic framework: it defines everything in terms of dynamic shapes. The PR I posted just catches more static-ification at import time instead of compile time.

The inception_v3-imagenet_origin.h5 link:

Thanks for your reply.

My server has 2 GPUs. When I use the other GPU by replacing ctx = tvm.gpu(0) with ctx = tvm.gpu(1), the bug disappears. It is very strange!

Problem solved! This crash is related to CUDA rather than TVM. @comaniac's comment is right.

Thank you very much. @comaniac @mbrookhart