PyTorch runtime error


When compiling models from PyTorch with TVM, I get a runtime error running MobileNet_V2 on CUDA devices (a 2080 Ti in my case). Strangely, reducing the depth multiplier to less than one lets the model execute with no errors.

The relevant section of the error looks like:

Check failed: ret == 0 (-1 vs. 0) : CUDAError: cuModuleLoadData(&(module_[device_id]), data_.c_str()) failed with error: CUDA_ERROR_INVALID_PTX

The minimal code to reproduce the error is:

import argparse
import numpy as np
import torch
from pytorchcv.model_provider import get_model
import tvm
from tvm import relay
from tvm.contrib import graph_runtime
if __name__ == '__main__':
    # Possible inputs 'mobilenetv2_w1', 'mobilenetv2_w3d4',
    # 'mobilenetv2_wd2', 'mobilenetv2_wd4', 'mobilenetv2b_w1',
    # 'mobilenetv2b_w3d4', 'mobilenetv2b_wd2', 'mobilenetv2b_wd4'
    parser = argparse.ArgumentParser(description='minimum breaking example')
    parser.add_argument('--model', type=str,
                        default=None,help='pytorchcv model name')
    args = parser.parse_args()
    model = get_model(args.model).cuda()
    bs = 2
    shape = (bs,3,224,224)
    dtype = 'float32'
    target = 'cuda'
    input = torch.rand(shape).cuda()
    input_name = "input0"
    shape_list = [(input_name, shape)]
    model_traced = torch.jit.trace(model, input)
    mod, params = relay.frontend.from_pytorch(model_traced, shape_list)
    with tvm.relay.build_config(opt_level=3):
        lib = relay.build(mod, target=target, target_host='llvm')
    ctx = tvm.gpu()
    m = graph_runtime.GraphModule(lib["default"](ctx))
    m.set_input(input_name, tvm.nd.array(np.array(input.cpu()).astype(dtype)))

Examples for running the code:

python3 --model mobilenetv2_w1        // Fails
python3 --model mobilenetv2_w3d4    // Works

I tried the different optimization levels [0…3] and all of them fail. It looks like a memory allocation problem, but I am not sure. Has anyone else seen this, found a solution, or can point me to where in the code I should start looking?
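
To narrow this down, every model/optimization-level combination can be tried in one run and the CUDA error name collected for each. This is only a minimal sketch: `build_and_load` is a hypothetical wrapper that you would write around the compile-and-load steps of the script above, and `fake_build` is a stand-in so the sketch runs without a GPU. Neither is part of TVM.

```python
import re
from itertools import product

# Matches CUDA driver error names such as CUDA_ERROR_INVALID_PTX.
CUDA_ERROR_RE = re.compile(r"CUDA_ERROR_[A-Z_]+")


def sweep(build_and_load, models, opt_levels):
    """Call build_and_load(model, opt_level) for every combination.

    build_and_load is a hypothetical user-supplied callable that wraps the
    compile-and-load steps and raises an exception on failure.
    Returns {(model, opt_level): "ok" or an error label}.
    """
    results = {}
    for model, level in product(models, opt_levels):
        try:
            build_and_load(model, level)
            results[(model, level)] = "ok"
        except Exception as err:
            # Prefer the CUDA error name if the message contains one,
            # otherwise fall back to the exception class name.
            match = CUDA_ERROR_RE.search(str(err))
            results[(model, level)] = match.group(0) if match else type(err).__name__
    return results


if __name__ == "__main__":
    # Stand-in for the real build: fails only for the full-width model,
    # mimicking the behaviour described in this post.
    def fake_build(model, level):
        if model == "mobilenetv2_w1":
            raise RuntimeError("... failed with error: CUDA_ERROR_INVALID_PTX")

    models = ["mobilenetv2_w1", "mobilenetv2_w3d4"]
    for key, status in sweep(fake_build, models, range(4)).items():
        print(key, status)
```

Replacing `fake_build` with a function that runs the real build would show at a glance whether the failure depends only on the model width or also on the optimization level.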