Hi, we are trying out TVM to accelerate inference of deep learning models on various devices, but we find our benchmark results inconsistent with the wiki pages. For example, resnet-18 takes 1.1 ms on a 1080 Ti according to this page:
However, when we run the benchmark in the tvmai/demo-gpu docker image, it takes 6.7 ms, which is about 6x slower. Results for vgg-19 are also much slower than the vgg-16 results on the wiki page.
linpengt@dev0002:~/tvm/apps/benchmark$ CUDA_VISIBLE_DEVICES=0 python3 gpu_imagenet_bench.py --model 1080ti
--------------------------------------------------
Network Name         Mean Inference Time (std dev)
--------------------------------------------------
[09:52:22] /usr/tvm/src/runtime/threading_backend.cc:63: The thread affinity cannot be set when the number of workers is larger than the number of available cores in the system.
resnet-50            6.74 ms             (0.42 ms)
mobilenet            1.04 ms             (0.13 ms)
vgg-19               20.89 ms            (0.68 ms)
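Incidentally, the thread-affinity warning suggests the runtime is spawning more CPU worker threads than the cores visible inside the container. We do not know whether this affects the GPU numbers; if it is relevant, capping the worker count (assuming TVM_NUM_THREADS is still the environment variable the threading backend reads) would look like:

# Hypothetical check: cap the CPU worker pool to silence the affinity warning
TVM_NUM_THREADS=4 CUDA_VISIBLE_DEVICES=0 python3 gpu_imagenet_bench.py --model 1080ti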
Steps to reproduce:
- Clone the repository:
git clone --recursive https://github.com/dmlc/tvm
- Run the demo-gpu docker image:
docker/bash.sh tvmai/demo-gpu
- Run the ImageNet benchmark:
CUDA_VISIBLE_DEVICES=0 python3 gpu_imagenet_bench.py --model 1080ti
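As a sanity check after these steps, we confirmed inside the container that TVM imports and the GPU is visible (these commands are our own addition, not part of the official benchmark instructions):

# Verify the installed TVM build and the GPU/driver the container sees
python3 -c "import tvm; print(tvm.__version__)"
nvidia-smi --query-gpu=name,driver_version --format=csv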
In addition, the benchmark script also failed to finish because nnvm.testing lacks the inception_v3 model, though that might be a separate, minor issue. The same run ended with this traceback:
Traceback (most recent call last):
  File "gpu_imagenet_bench.py", line 70, in <module>
    benchmark(network, target)
  File "gpu_imagenet_bench.py", line 19, in benchmark
    net, params, input_shape, output_shape = get_network(network, batch_size=1)
  File "/workspace/tvm/apps/benchmark/util.py", line 38, in get_network
    net, params = nnvm.testing.inception_v3.get_workload(batch_size=batch_size, dtype=dtype)
AttributeError: module 'nnvm.testing' has no attribute 'inception_v3'
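A quick probe from inside the container should confirm whether the installed nnvm simply predates the inception_v3 workload (our own hypothetical check, not part of the benchmark script):

# Prints False if the nnvm in the image lacks the inception_v3 workload
python3 -c "import nnvm.testing; print(hasattr(nnvm.testing, 'inception_v3'))"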
Here is the information on the GPU and the CUDA driver:
linpengt@dev0002:~/tvm/apps/benchmark$ nvidia-smi
Mon Dec 24 09:53:02 2018
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 410.48                 Driver Version: 410.48                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 108...  Off  | 00000000:04:00.0 Off |                  N/A |
| 21%   39C    P8    17W / 250W |     10MiB / 11178MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
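nvidia-smi only reports the driver version; if the toolkit version inside the container matters for this, we can also run the following and report the output:

# Report the CUDA toolkit version installed in the container
nvcc --version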
We also tried the tvmai/ci-gpu image, but the same commands failed to run:
$ tvm/docker/bash.sh tvmai/ci-gpu
$ cd tvm/apps/benchmark/
$ CUDA_VISIBLE_DEVICES=0 python gpu_imagenet_bench.py --model 1080ti
Traceback (most recent call last):
  File "gpu_imagenet_bench.py", line 9, in <module>
    import tvm
ImportError: No module named tvm
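Our guess is that the ci-gpu image is meant for building TVM from source and does not pre-install the Python packages, so they have to be put on the path manually. If that is right, something along these lines inside the container might be needed first (the /workspace paths are an assumption based on where bash.sh mounts the repository):

# Assumed layout: bash.sh mounts the tvm checkout at /workspace
export PYTHONPATH=/workspace/python:/workspace/topi/python:/workspace/nnvm/python:${PYTHONPATH}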
Let me know if more information is needed to debug the issue. Thanks!