[Tensorflow][INT8][cuDNN] CUDNN_STATUS_BAD_PARAM

When I enabled cuDNN as the GPU backend on the main branch, it failed to run in INT8 mode with the error below.

Here is the test script:

import tvm
from tvm import relay
from tvm.contrib import graph_executor

import numpy as np
import tensorflow as tf
try:
    tf_compat_v1 = tf.compat.v1
except ImportError:
    tf_compat_v1 = tf

# Load the frozen TensorFlow graph
model_path = "/mnt/tvm-benchmark/pb_model/resnet50_fp32_pretrained_model.pb"
with tf_compat_v1.gfile.GFile(model_path, "rb") as f:
    graph_def = tf_compat_v1.GraphDef()
    graph_def.ParseFromString(f.read())

# Import the graph into Relay, keeping TensorFlow's NHWC layout
mod, params = relay.frontend.from_tensorflow(
    graph_def, layout="NHWC", shape={"input": [1, 224, 224, 3]}
)

# Quantize to INT8 using a fixed global scale
with relay.quantize.qconfig(calibrate_mode="global_scale", global_scale=8.0):
    mod = relay.quantize.quantize(mod, params)

# Build with cuDNN enabled via -libs=cudnn (the commented-out target is plain CUDA)
#target = tvm.target.Target("cuda -model=t4", host="llvm -mcpu=cascadelake")
target = tvm.target.Target("cuda -model=t4 -libs=cudnn", host="llvm -mcpu=cascadelake")
dev = tvm.cuda(0)
with tvm.transform.PassContext(opt_level=3):
    lib = relay.build(mod, target, params=params)

# Create the graph executor and feed a random input
m = graph_executor.GraphModule(lib["default"](dev))
m.set_input("input", tvm.nd.array(np.random.rand(1, 224, 224, 3).astype("float32")))

# Warm up
m.run()

And the error is:

  File "/mnt/tvm/python/tvm/_ffi/_ctypes/packed_func.py", line 81, in cfun
    rv = local_pyfunc(*pyargs)
  File "/mnt/tvm/python/tvm/relay/op/strategy/generic.py", line 240, in _compute_conv2d
    return [topi_compute(*args)]
  File "/mnt/tvm/python/tvm/autotvm/task/topi_integration.py", line 165, in wrapper
    node = topi_compute(cfg, *args)
  File "/mnt/tvm/python/tvm/topi/cuda/conv2d.py", line 134, in conv2d_cudnn
    groups=groups,
  File "/mnt/tvm/python/tvm/contrib/cudnn.py", line 357, in conv_forward
    groups,
  File "/mnt/tvm/python/tvm/contrib/cudnn.py", line 232, in conv_output_shape
    groups,
  File "/mnt/tvm/python/tvm/_ffi/_ctypes/packed_func.py", line 237, in __call__
    raise get_last_ffi_error()
  2: TVMFuncCall
  1: std::_Function_handler<void (tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*), tvm::contrib::{lambda(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)#3}>::_M_invoke(std::_Any_data const&, tvm::runtime::TVMArgs&&, tvm::runtime::TVMRetValue*&&)
  0: tvm::contrib::OutputShape(int, int, int, int const*, int const*, int const*, int const*, int const*, void*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)
  File "/mnt/tvm/src/runtime/contrib/cudnn/conv_forward.cc", line 174
TVMError: 
---------------------------------------------------------------
An error occurred during the execution of TVM.
For more information, please see: https://tvm.apache.org/docs/errors.html
---------------------------------------------------------------

  Check failed: e == CUDNN_STATUS_SUCCESS (3 vs. 0) : cuDNN: CUDNN_STATUS_BAD_PARAM
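
For reference, you can check which layouts actually reach the cuDNN path by printing the quantized Relay module before building; a quick debugging sketch, assuming the mod from the script above:

# Dump the Relay IR and look at the kernel_layout attribute on the
# nn.conv2d calls (it is "HWIO" for a TensorFlow import)
print(mod["main"])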

BTW, I had to revert commit 53370b9 per [Pytorch] [Quantization] Error during quantization - #7 by masahi; otherwise, the following error is encountered:

TVMError: 
---------------------------------------------------------------
An error occurred during the execution of TVM.
For more information, please see: https://tvm.apache.org/docs/errors.html
---------------------------------------------------------------
  Check failed: (it != type_definitions.end()) is false: There is no definition of List

With the change in [cuDNN]Fix cudnn param error for int8 nhwc mode by kehuanfeng · Pull Request #8265 · apache/tvm · GitHub, I can get cuDNN INT8 working.

cc @masahi

It's not a valid fix, though: the filter layout coming from the TensorFlow frontend is HWIO, while the cuDNN API expects OIHW, so an explicit transpose should exist somewhere. As to where it belongs, I'm not sure yet.
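
If an explicit transpose is the right answer, one way to get it is a layout-conversion pass before building. Below is a minimal sketch using relay.transform.ConvertLayout; I haven't verified that the NHWC/OIHW combination is supported by the conv2d convert-layout registration, so treat the desired_layouts value as an assumption, not a verified fix.

# Minimal sketch (assumption, not a verified fix): ask Relay to rewrite
# nn.conv2d to an OIHW kernel layout, which should materialize the
# HWIO -> OIHW transpose as an explicit layout_transform on the weights.
desired_layouts = {"nn.conv2d": ["NHWC", "OIHW"]}
seq = tvm.transform.Sequential([relay.transform.ConvertLayout(desired_layouts)])
with tvm.transform.PassContext(opt_level=3):
    mod = seq(mod)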