When I try to convert an ONNX model to TVM, I get this warning:
WARNING:autotvm:Cannot find config for target=cuda, workload=('conv2d_transpose_nchw', (1, 256, 64, 64, 'float32'), (256, 128, 3, 3, 'float32'), (2, 2), (1, 1), 'float32'). A fallback configuration is used, which may bring great performance regression.
Also, when I run inference with the compiled .so, module.run() is very fast but module.get_output(0).asnumpy() is very slow, so the total time cost is high.
Nothing is wrong. When you call module.run(), you only enqueue your CUDA kernels into the default CUDA stream and return immediately. When you then call module.get_output(0).asnumpy(), it performs a device-to-host memory copy, which is a synchronizing operation, so it waits until all the computation in the default CUDA stream has finished.
In other words, the time between the point when you call module.run() and the point when you get the result from module.get_output(0).asnumpy() is the time your module needs to finish all the computation; the copy itself is not the bottleneck, so there is nothing to speed up there.
As for the warning: try using autotvm to tune the CUDA schedules for your network and hardware; the results will be stored in a log file. Then apply that log file when you call relay.build.