Can you share how you are compiling and running the model?
Note that if you add -libs=cudnn, TVM will not generate the GPU kernels itself, so there should be essentially no performance difference from darknet.
and set GPU mode:

GPU = 1
if not GPU:
    target = 'llvm'
    ctx = tvm.cpu(0)
else:
    #target = tvm.target.cuda()
    #target = 'cuda -libs=cudnn'
    target = 'cuda -libs=cudnn,cublas'
    ctx = tvm.gpu(0)
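For context, the compile step that produces the graph, lib, and params used below looks roughly like this; a minimal sketch assuming the NNVM darknet frontend, where sym, data_shape, and opt_level=3 are illustrative:

import nnvm.compiler

# build the network for the chosen target; these three outputs are saved below
with nnvm.compiler.build_config(opt_level=3):
    graph, lib, params = nnvm.compiler.build(
        sym, target, shape={'data': data_shape}, dtype='float32', params=params)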
then save the model:

from tvm.contrib import util

path_lib = "./data/deploy_lib.tar"
lib.export_library(path_lib)
with open("./data/deploy_graph.json", "w") as fo:
    fo.write(graph.json())
with open("./data/deploy_param.params", "wb") as fo:
    fo.write(nnvm.compiler.save_param_dict(params))
The last step is to load the saved model again and test it:
ctx = tvm.gpu(0)
data = nnvm.testing.darknet.load_image(test_image, netw, neth)
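For completeness, a minimal sketch of reloading those saved artifacts and feeding the image; the loading calls assume the NNVM-era runtime API, and the input name 'data' is an assumption:

import tvm
from tvm.contrib import graph_runtime

# reload the compiled library, graph JSON, and parameter blob saved above
loaded_lib = tvm.module.load("./data/deploy_lib.tar")
loaded_graph = open("./data/deploy_graph.json").read()
loaded_params = bytearray(open("./data/deploy_param.params", "rb").read())

m = graph_runtime.create(loaded_graph, loaded_lib, ctx)
m.load_params(loaded_params)
m.set_input('data', tvm.nd.array(data.astype('float32')))  # input name is an assumption
m.run()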
The autotvm warning should not be an issue as -libs=cudnn is being used. Can you try using a time evaluator instead to do the timing? I am not sure if there is some other overhead or if there is some dynamic compilation time being included that only occurs on the first run, and this can affect the timing results with your measurement method. You can use the time evaluator with something like:
f = m.module.time_evaluator('run', ctx)
results = f()
where results.mean gives you the mean running time in seconds. I get ~17 ms out-of-the-box with cuDNN on an RTX 2080 Ti.
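If the numbers are noisy, you can average over more runs; a sketch where number=100 and repeat=3 are arbitrary choices:

f = m.module.time_evaluator('run', ctx, number=100, repeat=3)
prof = f()  # a ProfileResult; prof.mean is the mean wall time in seconds
print('mean inference time: %.2f ms' % (prof.mean * 1e3))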
That is fine, but what are the results you get with later runs? That should agree with the time evaluator result, which ignores the first run.
Note that the first run can be slower for many different reasons, such as JIT compilation. This is typical and expected behavior for many framework/backend combinations.
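If you want a manual measurement to agree with the time evaluator, discard the first run and synchronize the device before reading the clock; a sketch where the run count of 100 is arbitrary:

import time

m.run(); ctx.sync()   # warm-up run absorbs one-time JIT/initialization cost
start = time.time()
for _ in range(100):
    m.run()
ctx.sync()            # wait for the GPU to finish before stopping the clock
print('mean per run: %.2f ms' % ((time.time() - start) / 100 * 1e3))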
The warning message "autotvm: Cannot find config for target=cuda" was due to the op name used when extracting the tasks in the tutorial script tune_relay_cuda.py. In my case I changed it to relay.nn.conv2d, and the auto-tuning now works fine.
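For reference, the task extraction in that tutorial looks roughly like this; exact argument names vary across TVM versions, so treat it as a sketch where mod, params, and target come from the earlier steps:

from tvm import relay, autotvm

# extract tuning tasks only for conv2d; the op reference passed here must
# match the ops actually present in the network
tasks = autotvm.task.extract_from_program(
    mod['main'], target=target, params=params,
    ops=(relay.op.nn.conv2d,))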
I use target = 'cuda', but inference is very slow when measured with m.module.time_evaluator. When I switch to target = 'cuda -libs=cudnn,cublas' it raises: ValueError: Cannot find global function tvm.contrib.cudnn.conv2d.output_shape
That is likely because your TVM build lacks cuDNN support. Try switching the USE_CUDNN and USE_CUBLAS flags to ON in config.cmake, then rebuild TVM.
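For reference, the relevant flags in config.cmake (copied into your build directory) would look like this:

set(USE_CUDA ON)
set(USE_CUDNN ON)    # needed for -libs=cudnn
set(USE_CUBLAS ON)   # needed for -libs=cublas

Then re-run cmake and make from the build directory.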