cudaMemcpy() fails in TensorRT INT8 calibration

I am trying to quantize and run a model via Apache TVM with the TensorRT backend and INT8 calibration. A call to cudaMemcpy() in the TensorRTCalibrator fails with "CUDA: invalid argument", which I think happens because of varying batch sizes/input dimensions.

I don't completely understand the TensorRT calibration process. It looks like the configured number of calibration runs (set via the environment variable TENSORRT_NUM_CALI_INT8) are executed successfully. These are invoked by:

tvm_data = tvm.nd.array(input)
module.set_input(input_name, tvm_data)
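To rule out varying input dimensions on my side, I added a quick sanity check over the calibration batches before feeding them via module.set_input(). This is a minimal sketch with made-up shapes; calibration_batches stands in for my actual data:

```python
import numpy as np

# Hypothetical check: TensorRT builds the INT8 engine for one fixed input
# shape, so every calibration batch should share exactly that shape.
calibration_batches = [
    np.zeros((1, 224, 224, 3), dtype="float32")  # batch_size 1, made-up dims
    for _ in range(4)  # 4 stands in for TENSORRT_NUM_CALI_INT8
]

shapes = {b.shape for b in calibration_batches}
assert len(shapes) == 1, f"mixed calibration shapes: {shapes}"
```

In my case this check passes, so all calibration batches really do use batch_size 1.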

After that I am trying to run the now-calibrated inference by calling module.set_input() again, and now cudaMemcpy() fails. With print debugging I have seen that the batch_size in the TensorRT runtime does not match the previously used batch_size of 1. The size/word count in the cudaMemcpy() call has also changed. Apparently cudaMemcpy() is trying to copy more data than was actually expected/allocated, and fails.
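To illustrate what I mean: the number of bytes cudaMemcpy() copies scales with the batch dimension, so a batch_size mismatch directly explains copying more data than the buffer was allocated for. A small back-of-the-envelope sketch (the shapes are made up; my real input dims differ):

```python
import numpy as np

def copy_nbytes(shape, dtype="float32"):
    # Bytes a device copy needs for one tensor of this shape/dtype.
    return int(np.prod(shape)) * np.dtype(dtype).itemsize

calib_shape = (1, 224, 224, 3)   # batch_size 1, as used during calibration
infer_shape = (8, 224, 224, 3)   # hypothetical larger batch at inference time

# The device buffer was sized for calib_shape; a copy sized for
# infer_shape would overrun it and cudaMemcpy rejects the request.
assert copy_nbytes(infer_shape) > copy_nbytes(calib_shape)
```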

I would be happy about any help!

I don't know if this helps, but here is TVM's error output:

terminate called after throwing an instance of 'tvm::runtime::InternalError'
  what(): [22:37:29] …/06_tvm_benchmarking/tvm_benchmarking/tvm/src/runtime/contrib/tensorrt/tensorrt_calibrator.h:81: InternalError: Check failed: (e == cudaSuccess || e == cudaErrorCudartUnloading) is false: CUDA: invalid argument
Stack trace:
  0: tvm::runtime::TensorRTCalibrator::getBatch(void**, char const**, int)
  1: 0x00007fe73e18c7f4
  2: 0x00007fe73e2f7126
  3: 0x00007fe73e1564e5
  4: 0x00007fe73e15b4ee
  5: 0x00007fe73e15be20
  6: tvm::runtime::contrib::TensorRTBuilder::BuildEngine()
  7: tvm::runtime::contrib::TensorRTRuntime::BuildEngineFromJson(int)
  8: tvm::runtime::contrib::TensorRTRuntime::GetOrBuildEngine()
  9: tvm::runtime::contrib::TensorRTRuntime::Run()
  10: tvm::runtime::PackedFuncObj::Extractor<tvm::runtime::PackedFuncSubObj<tvm::runtime::json::JSONRuntimeBase::GetFunction(tvm::runtime::String const&, tvm::runtime::ObjectPtr<tvm::runtime::Object> const&)::{lambda(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)#3}> >::Call(tvm::runtime::PackedFuncObj const*, tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)
  11: std::_Function_handler<void (), tvm::runtime::GraphExecutor::CreateTVMOp(tvm::runtime::TVMOpParam const&, std::vector<DLTensor*, std::allocator<DLTensor*> > const&)::{lambda()#3}>::_M_invoke(std::_Any_data const&)
  12: tvm::runtime::GraphExecutor::Run()
  13: tvm::runtime::LocalSession::CallFunc(void*, TVMValue const*, int const*, int, std::function<void (tvm::runtime::TVMArgs)> const&)
  14: tvm::runtime::RPCWrappedFunc::operator()(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const

Aborted (core dumped)


TensorRT Version:
GPU Type: A100
Nvidia Driver Version:
CUDA Version: 11.8
CUDNN Version: 8.9.0
Operating System + Version: Ubuntu 20.04
Python Version (if applicable): 3.8.10
TensorFlow Version (if applicable): 2.13.1