tune_conv2d_cuda.py error

Hi,
When I run the [tune_conv2d_cuda.py] sample, the following error is reported:
Best config:
[('tile_f', [8, 2, 8, 2]), ('tile_y', [7, 2, 1, 2]), ('tile_x', [1, 1, 28, 1]), ('tile_rc', [128, 2, 2]), ('tile_ry', [1, 3, 1]), ('tile_rx', [1, 3, 1]), ('auto_unroll_max_step', 0), ('unroll_explicit', 1)],None,453139402
Finish loading 10724 records
Traceback (most recent call last):
  File "tune_conv2d_cuda.py", line 208, in <module>
    func(a_tvm, w_tvm, c_tvm)
  File "/home/yzw/tvm/python/tvm/_ffi/function.py", line 128, in __call__
    return f(*args)
  File "/home/yzw/tvm/python/tvm/_ffi/_ctypes/function.py", line 184, in __call__
    ctypes.byref(ret_val), ctypes.byref(ret_tcode)))
  File "/home/yzw/tvm/python/tvm/_ffi/base.py", line 66, in check_call
    raise TVMError(py_str(_LIB.TVMGetLastError()))
tvm._ffi.base.TVMError: [13:20:02] /home/yzw/tvm/src/runtime/module_util.cc:52: Check failed: ret == 0 (-1 vs. 0) Assert fail: (int32(arg0.shape[1]) == 512), Argument arg0.shape[1] has an unsatisfied constraint

Stack trace returned 10 entries:
[bt] (0) /home/yzw/tvm/build/libtvm.so(dmlc::StackTraceabi:cxx11+0x5b) [0x7f08de935dab]
[bt] (1) /home/yzw/tvm/build/libtvm.so(+0xa029ce) [0x7f08ded529ce]
[bt] (2) /home/yzw/tvm/build/libtvm.so(TVMFuncCall+0x5e) [0x7f08ded3778e]
[bt] (3) /home/lychee/anaconda3/envs/yzw/lib/python3.7/lib-dynload/…/…/libffi.so.6(ffi_call_unix64+0x4c) [0x7f08e43d5ec0]
[bt] (4) /home/lychee/anaconda3/envs/yzw/lib/python3.7/lib-dynload/…/…/libffi.so.6(ffi_call+0x22d) [0x7f08e43d587d]
[bt] (5) /home/lychee/anaconda3/envs/yzw/lib/python3.7/lib-dynload/_ctypes.cpython-37m-x86_64-linux-gnu.so(_ctypes_callproc+0x2ce) [0x7f08e45eaf8e]
[bt] (6) /home/lychee/anaconda3/envs/yzw/lib/python3.7/lib-dynload/_ctypes.cpython-37m-x86_64-linux-gnu.so(+0x129c4) [0x7f08e45eb9c4]
[bt] (7) python(_PyObject_FastCallKeywords+0x49b) [0x55f94951211b]
[bt] (8) python(_PyEval_EvalFrameDefault+0x523e) [0x55f94957e2ce]
[bt] (9) python(_PyEval_EvalCodeWithName+0x2e8) [0x55f9494b7528]

Test case: N, H, W, CO, CI, KH, KW, strides, padding = 1, 28, 28, 512, 256, 3, 3, (1, 1), (1, 1). The error is reproducible with this configuration.

It looks like something is wrong with the shapes. Can you print the shapes of a_np, w_np, and c_np from this part of the script?

# check correctness
a_np = np.random.uniform(size=(N, CI, H, W)).astype(np.float32)
w_np = np.random.uniform(size=(CO, CI, KH, KW)).astype(np.float32)
c_np = conv2d_nchw_python(a_np, w_np, strides, padding)

The shapes are:

a_np shape:
(1, 3, 108, 108)
w_np shape:
(64, 3, 3, 3)
c_np shape:
(1, 64, 54, 54)

How are you changing the shape parameters N, H, W, CO, CI, KH, KW, strides, padding? The printed shapes correspond to CI = 3, CO = 64 and H = W = 108, which do not match the tuning test case you described (CI = 256, CO = 512, H = W = 28).

To be safe, I would only set these parameters once, just like the example does.
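One way to catch this kind of mismatch before calling the compiled function is to recover the parameters from the array shapes and compare them with the declared ones. The following is only a sketch in plain Python: the helper name infer_conv_params is hypothetical (it is not part of TVM or the tutorial), and it assumes the NCHW data layout and (CO, CI, KH, KW) weight layout used in the script.

```python
def infer_conv_params(a_shape, w_shape):
    """Recover (N, H, W, CO, CI, KH, KW) from NCHW data and (CO, CI, KH, KW) weight shapes.

    Hypothetical helper for illustration; not part of TVM.
    """
    n, ci, h, w = a_shape
    co, ci_w, kh, kw = w_shape
    if ci != ci_w:
        raise ValueError("data has %d input channels, weight expects %d" % (ci, ci_w))
    return (n, h, w, co, ci, kh, kw)


names = ("N", "H", "W", "CO", "CI", "KH", "KW")

# Parameters of the tuning test case described in this thread.
declared = (1, 28, 28, 512, 256, 3, 3)

# Shapes of a_np and w_np as printed in this thread.
observed = infer_conv_params((1, 3, 108, 108), (64, 3, 3, 3))

# Collect every parameter whose declared and observed values differ.
mismatches = {
    name: (d, o)
    for name, d, o in zip(names, declared, observed)
    if d != o
}
print(mismatches)
```

If the dictionary is non-empty, the arrays were built from a different set of parameters than the one the kernel was tuned for, which is the situation the assertion in the traceback complains about.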

When I run the original case in [tune_conv2d_cuda.py], it works. Now I want to test some other cases to compare performance against DeepBench. Some of these cases run successfully, but others fail with the error reported above. The failing cases are listed below; can you help me reproduce and verify them?

N, H, W, CO, CI, KH, KW, strides, padding = 1, 224, 224, 64, 3, 3, 3, (1, 1), (1, 1)
N, H, W, CO, CI, KH, KW, strides, padding = 1, 112, 112, 128, 64, 3, 3, (1, 1), (1, 1)
N, H, W, CO, CI, KH, KW, strides, padding = 1, 28, 28, 512, 512, 3, 3, (1, 1), (1, 1)
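For anyone reproducing these cases, it may help to write down the tensor shapes each one implies before running the tuner. The sketch below uses the standard convolution output-size formula and assumes NCHW layout, symmetric padding and no dilation; conv2d_out_shape is a hypothetical helper name, not a TVM function.

```python
def conv2d_out_shape(N, H, W, CO, CI, KH, KW, strides, padding):
    """Output shape of an NCHW conv2d (symmetric padding, dilation 1 assumed)."""
    stride_h, stride_w = strides
    pad_h, pad_w = padding
    out_h = (H + 2 * pad_h - KH) // stride_h + 1
    out_w = (W + 2 * pad_w - KW) // stride_w + 1
    return (N, CO, out_h, out_w)


# The three failing cases listed above, in the same parameter order.
failed_cases = [
    (1, 224, 224, 64, 3, 3, 3, (1, 1), (1, 1)),
    (1, 112, 112, 128, 64, 3, 3, (1, 1), (1, 1)),
    (1, 28, 28, 512, 512, 3, 3, (1, 1), (1, 1)),
]

for case in failed_cases:
    N, H, W, CO, CI, KH, KW, strides, padding = case
    shapes = {
        "data": (N, CI, H, W),
        "kernel": (CO, CI, KH, KW),
        "out": conv2d_out_shape(*case),
    }
    print(shapes)
```

Comparing these expected shapes with a_np.shape, w_np.shape and c_np.shape in the script is a quick way to confirm that each run really uses the intended parameters.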

Sorry, I made a mistake in that script.
Fixed by https://github.com/dmlc/tvm/pull/1641

BTW, this script is not very good at 3x3 kernels because it does not use Winograd. This PR https://github.com/dmlc/tvm/pull/1638 adds a Winograd template.
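For context on why Winograd matters for 3x3 kernels, here is a small sketch of the textbook 1D algorithm F(2, 3) in plain Python. It is not the template from the PR, only an illustration of the idea: two outputs of a 3-tap filter are computed with 4 multiplications instead of 6. Nesting the same scheme in 2D gives F(2x2, 3x3), which needs 16 multiplications per 2x2 output tile instead of 36. The function names are hypothetical.

```python
def direct_f23(d, g):
    """Two outputs of a 3-tap filter over 4 inputs, using 6 multiplications."""
    return [
        d[0] * g[0] + d[1] * g[1] + d[2] * g[2],
        d[1] * g[0] + d[2] * g[1] + d[3] * g[2],
    ]


def winograd_f23(d, g):
    """Same two outputs via Winograd F(2, 3), using 4 multiplications."""
    # Filter transform: depends only on g, so it can be precomputed per kernel.
    u0 = g[0]
    u1 = (g[0] + g[1] + g[2]) / 2
    u2 = (g[0] - g[1] + g[2]) / 2
    u3 = g[2]
    # Element-wise products of transformed input and transformed filter.
    m1 = (d[0] - d[2]) * u0
    m2 = (d[1] + d[2]) * u1
    m3 = (d[2] - d[1]) * u2
    m4 = (d[1] - d[3]) * u3
    # Output transform.
    return [m1 + m2 + m3, m2 - m3 - m4]


d = [1.0, 2.0, 3.0, 4.0]
g = [1.0, 0.0, -1.0]
print(direct_f23(d, g), winograd_f23(d, g))
```

Both functions should agree up to floating-point rounding; the saving comes from trading multiplications for cheaper additions.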

Thanks, I will test them and report back.