After PR #4353, we can run Tensor Core based convolution through cuDNN in TVM for fp16 and int8. But when I run the test file test_cudnn.py, the fp16 convolution intermittently gives wrong results, and the reported timing is always -1 ms. I wonder what causes these strange results. @Hzfengsy @masahi
Here are the results of running verify_conv2d("float16", "float32", tensor_format=1) on a Tesla T4 GPU. I changed the input shape as follows:
in_channel = 512
out_channel = 512
filter_h = 3
filter_w = 3
pad_h = 1
pad_w = 1
stride_h = 1
stride_w = 1
dilation_h = 1
dilation_w = 1
batch = 1
height = 7
weight = 7
Sometimes it fails with a mismatch error like this:
Mismatched elements: 1 / 25088 (0.00399%)
Max absolute difference: 8.17340421e-05
Max relative difference: 0.03795133
x: array([[[[-13.311087, 26.494438, -25.143475, ..., 11.120489,
0.849933, -5.120694],
[-10.676369, -19.9305 , -11.853168, ..., -8.573727,...
y: array([[[[-13.311075, 26.494409, -25.143463, ..., 11.120492,
0.849936, -5.120693],
[-10.676372, -19.930494, -11.853153, ..., -8.57373 ,...
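To make the numbers above easier to reason about, here is a minimal pure-Python sketch of the element-wise criterion documented for numpy.testing.assert_allclose, |actual - desired| <= atol + rtol * |desired|. The values and tolerances are hypothetical: they are chosen only to mimic the reported magnitudes (absolute difference around 8e-5, relative difference around 0.038) and are not taken from test_cudnn.py. The sketch only shows that an element close to zero can fail a relative-tolerance check while its absolute error stays small; it does not establish that this is what happens here.

```python
def is_close(actual, desired, rtol, atol):
    """Element-wise criterion documented for numpy.testing.assert_allclose."""
    return abs(actual - desired) <= atol + rtol * abs(desired)


# Hypothetical near-zero output element (not taken from the actual run).
desired = 2.15e-3
actual = desired + 8.17e-5  # absolute error similar to the reported maximum

rel_err = abs(actual - desired) / abs(desired)
print(f"relative error: {rel_err:.4f}")

# Assumed tight tolerances: the near-zero element is flagged as a mismatch.
tight = is_close(actual, desired, rtol=1e-3, atol=1e-5)
print("tight tolerances pass:", tight)

# Assumed looser absolute tolerance: the same element is accepted.
loose = is_close(actual, desired, rtol=1e-3, atol=1e-4)
print("loose tolerances pass:", loose)

# A large-magnitude element with a similar absolute error passes either way,
# because rtol * |desired| dominates the allowed error.
large = 26.494409
large_ok = is_close(large + 2.9e-5, large, rtol=1e-3, atol=1e-5)
print("large element passes:", large_ok)
```

If the single mismatched element is of this kind, the failure may come from fp16 rounding combined with a tight tolerance rather than from a wrong kernel, but that is only a guess.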
When the results are correct, the timing is strange:
[20:35:09] /home/ubuntu/workplace/tvm-1/src/runtime/contrib/cudnn/conv_forward.cc:344: CUDNN Found 8 fwd algorithms, choosing CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_GEMM
[20:35:09] /home/ubuntu/workplace/tvm-1/src/runtime/contrib/cudnn/conv_forward.cc:347: 0) CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_GEMM - time: -1 ms, Memory: 0
[20:35:09] /home/ubuntu/workplace/tvm-1/src/runtime/contrib/cudnn/conv_forward.cc:347: 1) CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_PRECOMP_GEMM - time: -1 ms, Memory: 0
[20:35:09] /home/ubuntu/workplace/tvm-1/src/runtime/contrib/cudnn/conv_forward.cc:347: 2) CUDNN_CONVOLUTION_FWD_ALGO_IMPLICIT_PRECOMP_GEMM - time: -1 ms, Memory: 0
[20:35:09] /home/ubuntu/workplace/tvm-1/src/runtime/contrib/cudnn/conv_forward.cc:347: 3) CUDNN_CONVOLUTION_FWD_ALGO_GEMM - time: -1 ms, Memory: 0
[20:35:09] /home/ubuntu/workplace/tvm-1/src/runtime/contrib/cudnn/conv_forward.cc:347: 4) CUDNN_CONVOLUTION_FWD_ALGO_DIRECT - time: -1 ms, Memory: 0
[20:35:09] /home/ubuntu/workplace/tvm-1/src/runtime/contrib/cudnn/conv_forward.cc:347: 5) CUDNN_CONVOLUTION_FWD_ALGO_FFT - time: -1 ms, Memory: 0
[20:35:09] /home/ubuntu/workplace/tvm-1/src/runtime/contrib/cudnn/conv_forward.cc:347: 6) CUDNN_CONVOLUTION_FWD_ALGO_FFT_TILING - time: -1 ms, Memory: 0
[20:35:09] /home/ubuntu/workplace/tvm-1/src/runtime/contrib/cudnn/conv_forward.cc:347: 7) CUDNN_CONVOLUTION_FWD_ALGO_WINOGRAD - time: -1 ms, Memory: 0
When I ran verify_conv2d("int8", "int32", tensor_format=1), there was no output except this warning:
/home/ubuntu/workplace/tvm-1/python/tvm/driver/build_module.py:259: UserWarning: Specified target cuda, but cannot find device code, did you do bind?
"bind?" % target)