I am following the official tutorial "Tuning High Performance Convolution on NVIDIA GPUs", and a Segmentation fault (core dumped) occurs every time I run this code:

    func = mod["main"]
    tasks = autotvm.task.extract_from_program(func, target=target, params=params, ops=(relay.op.nn.conv2d,))

When I run the model on its own for inference (without tuning), I see this output:
[10:07:59] src/nnvm/legacy_json_util.cc:209: Loading symbol saved by previous version v1.2.0. Attempting to upgrade...
[10:07:59] src/nnvm/legacy_json_util.cc:217: Symbol successfully upgraded!
Cannot find config for target=cuda -model=unknown, workload=('conv2d_nchw.cuda', ('TENSOR', (1, 3, 112, 112), 'float32'), ('TENSOR', (64, 3, 3, 3), 'float32'), (1, 1), (1, 1, 1, 1), (1, 1), 'float32'). A fallback configuration is used, which may bring great performance regression.
Cannot find config for target=cuda -model=unknown, workload=('conv2d_nchw_winograd.cuda', ('TENSOR', (1, 3, 112, 112), 'float32'), ('TENSOR', (64, 3, 3, 3), 'float32'), (1, 1), (1, 1, 1, 1), (1, 1), 'float32'). A fallback configuration is used, which may bring great performance regression.
Cannot find config for target=cuda -model=unknown, workload=('conv2d_nchw.cuda', ('TENSOR', (1, 64, 112, 112), 'float32'), ('TENSOR', (64, 64, 3, 3), 'float32'), (1, 1), (1, 1, 1, 1), (1, 1), 'float32'). A fallback configuration is used, which may bring great performance regression.
Cannot find config for target=cuda -model=unknown, workload=('conv2d_nchw_winograd.cuda', ('TENSOR', (1, 64, 112, 112), 'float32'), ('TENSOR', (64, 64, 3, 3), 'float32'), (1, 1), (1, 1, 1, 1), (1, 1), 'float32'). A fallback configuration is used, which may bring great performance regression.
Cannot find config for target=cuda -model=unknown, workload=('conv2d_nchw.cuda', ('TENSOR', (1, 64, 112, 112), 'float32'), ('TENSOR', (64, 64, 3, 3), 'float32'), (2, 2), (1, 1, 1, 1), (1, 1), 'float32'). A fallback configuration is used, which may bring great performance regression.
Cannot find config for target=cuda -model=unknown, workload=('conv2d_nchw.cuda', ('TENSOR', (1, 64, 112, 112), 'float32'), ('TENSOR', (64, 64, 1, 1), 'float32'), (2, 2), (0, 0, 0, 0), (1, 1), 'float32'). A fallback configuration is used, which may bring great performance regression.
Cannot find config for target=cuda -model=unknown, workload=('conv2d_nchw.cuda', ('TENSOR', (1, 64, 56, 56), 'float32'), ('TENSOR', (64, 64, 3, 3), 'float32'), (1, 1), (1, 1, 1, 1), (1, 1), 'float32'). A fallback configuration is used, which may bring great performance regression.
Cannot find config for target=cuda -model=unknown, workload=('conv2d_nchw.cuda', ('TENSOR', (1, 64, 56, 56), 'float32'), ('TENSOR', (128, 64, 3, 3), 'float32'), (1, 1), (1, 1, 1, 1), (1, 1), 'float32'). A fallback configuration is used, which may bring great performance regression.
Cannot find config for target=cuda -model=unknown, workload=('conv2d_nchw_winograd.cuda', ('TENSOR', (1, 64, 56, 56), 'float32'), ('TENSOR', (128, 64, 3, 3), 'float32'), (1, 1), (1, 1, 1, 1), (1, 1), 'float32'). A fallback configuration is used, which may bring great performance regression.
Cannot find config for target=cuda -model=unknown, workload=('conv2d_nchw.cuda', ('TENSOR', (1, 128, 28, 28), 'float32'), ('TENSOR', (128, 128, 3, 3), 'float32'), (1, 1), (1, 1, 1, 1), (1, 1), 'float32'). A fallback configuration is used, which may bring great performance regression.
Cannot find config for target=cuda -model=unknown, workload=('conv2d_nchw.cuda', ('TENSOR', (1, 128, 28, 28), 'float32'), ('TENSOR', (256, 128, 3, 3), 'float32'), (1, 1), (1, 1, 1, 1), (1, 1), 'float32'). A fallback configuration is used, which may bring great performance regression.
Cannot find config for target=cuda -model=unknown, workload=('conv2d_nchw_winograd.cuda', ('TENSOR', (1, 128, 28, 28), 'float32'), ('TENSOR', (256, 128, 3, 3), 'float32'), (1, 1), (1, 1, 1, 1), (1, 1), 'float32'). A fallback configuration is used, which may bring great performance regression.
Cannot find config for target=cuda -model=unknown, workload=('conv2d_nchw.cuda', ('TENSOR', (1, 256, 14, 14), 'float32'), ('TENSOR', (256, 256, 3, 3), 'float32'), (1, 1), (1, 1, 1, 1), (1, 1), 'float32'). A fallback configuration is used, which may bring great performance regression.
Cannot find config for target=cuda -model=unknown, workload=('conv2d_nchw.cuda', ('TENSOR', (1, 256, 14, 14), 'float32'), ('TENSOR', (512, 256, 3, 3), 'float32'), (1, 1), (1, 1, 1, 1), (1, 1), 'float32'). A fallback configuration is used, which may bring great performance regression.
Cannot find config for target=cuda -model=unknown, workload=('conv2d_nchw_winograd.cuda', ('TENSOR', (1, 256, 14, 14), 'float32'), ('TENSOR', (512, 256, 3, 3), 'float32'), (1, 1), (1, 1, 1, 1), (1, 1), 'float32'). A fallback configuration is used, which may bring great performance regression.
Cannot find config for target=cuda -model=unknown, workload=('conv2d_nchw.cuda', ('TENSOR', (1, 512, 7, 7), 'float32'), ('TENSOR', (512, 512, 3, 3), 'float32'), (1, 1), (1, 1, 1, 1), (1, 1), 'float32'). A fallback configuration is used, which may bring great performance regression.
Cannot find config for target=cuda -model=unknown, workload=('dense_small_batch.cuda', ('TENSOR', (1, 25088), 'float32'), ('TENSOR', (512, 25088), 'float32'), None, 'float32'). A fallback configuration is used, which may bring great performance regression.
create gragh_runtime take 3.6898140907287598 second
set input and params take 0.18898606300354004 second
run take 1.5326874256134033 second
get_output take 4.76837158203125e-06 second
(512,)
dist : 1.9115334749221802
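
To see which operators the warnings actually refer to, the workload names can be pulled out of the log. This is only a minimal sketch, assuming the output has been captured as text; `unconfigured_workloads` is a hypothetical helper written for this post, not part of TVM, and the three sample lines are copied from the log above.

```python
import ast
import re
from collections import Counter

# Matches one AutoTVM fallback warning and captures the workload tuple.
WARNING_RE = re.compile(
    r"Cannot find config for target=.+?, workload=(\(.+?\))\. A fallback"
)


def unconfigured_workloads(log_text):
    """Count fallback warnings per workload name (first element of the tuple).

    Hypothetical helper for illustration only; it just parses text.
    """
    # Forum rendering may turn straight quotes into curly ones; undo that.
    text = log_text.replace("\u2018", "'").replace("\u2019", "'")
    counts = Counter()
    for match in WARNING_RE.finditer(text):
        workload = ast.literal_eval(match.group(1))
        counts[workload[0]] += 1
    return counts


sample_log = "\n".join([
    "Cannot find config for target=cuda -model=unknown, "
    "workload=('conv2d_nchw.cuda', ('TENSOR', (1, 3, 112, 112), 'float32'), "
    "('TENSOR', (64, 3, 3, 3), 'float32'), (1, 1), (1, 1, 1, 1), (1, 1), 'float32'). "
    "A fallback configuration is used, which may bring great performance regression.",
    "Cannot find config for target=cuda -model=unknown, "
    "workload=('conv2d_nchw_winograd.cuda', ('TENSOR', (1, 3, 112, 112), 'float32'), "
    "('TENSOR', (64, 3, 3, 3), 'float32'), (1, 1), (1, 1, 1, 1), (1, 1), 'float32'). "
    "A fallback configuration is used, which may bring great performance regression.",
    "Cannot find config for target=cuda -model=unknown, "
    "workload=('dense_small_batch.cuda', ('TENSOR', (1, 25088), 'float32'), "
    "('TENSOR', (512, 25088), 'float32'), None, 'float32'). "
    "A fallback configuration is used, which may bring great performance regression.",
])

for name, count in sorted(unconfigured_workloads(sample_log).items()):
    print(name, count)
```

Run over the full log, this yields three distinct names: `conv2d_nchw.cuda`, `conv2d_nchw_winograd.cuda` and `dense_small_batch.cuda`.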
So do I need to tune these unconfigured operators? Is that right? I cannot find these operators in relay.op.nn, so does that mean I can't use autotvm.task.extract_from_program() for them? And can I define these missing operators myself?