--------------------------------------------------
Network Name         Mean Inference Time (std dev)
--------------------------------------------------
squeezenet v1.1      204.01 ms            (1.97 ms)
mobilenet            412.53 ms            (79.38 ms)
resnet-18            775.99 ms            (46.59 ms)
These appear noticeably slower than the ones reported on the repo page (shown below for reference):
--------------------------------------------------
Network Name         Mean Inference Time (std dev)
--------------------------------------------------
squeezenet v1.1      92.34 ms             (0.07 ms)
mobilenet            145.22 ms            (0.11 ms)
resnet-18            325.06 ms            (0.23 ms)
I’ve tried two Raspberry Pi 3Bs and two different host CPUs but cannot reproduce the reported results. Has anyone encountered this?
Can you report the individual collected measurements (the results field of the returned ProfileResult)? The variance of your measurements seems very high; you can try increasing the --number parameter in this case.
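For reference, here is a minimal sketch of how the per-run numbers can be pulled out; it assumes the module and ctx objects that the benchmark script already creates (the graph runtime module and the remote context).

import numpy as np

# assumes `module` is the graph runtime module and `ctx` the (remote) context
# created earlier by the benchmark script
ftimer = module.module.time_evaluator("run", ctx, number=10, repeat=3)
prof_res = ftimer()                        # returns a ProfileResult
print(prof_res)                            # mean plus the individual repeats
print(np.array(prof_res.results) * 1000)   # each repeat in milliseconds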
Also, if you are running all of these tests back to back on a small number of devices, it is likely that your Raspberry Pis are thermal throttling under the sustained load, unless you are using an effective aftermarket cooling solution.
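A quick way to check is to read the SoC temperature on the Pi between runs; sustained readings around 80 °C usually mean the firmware is throttling the clocks. A small sketch, assuming the standard sysfs path exposed by Raspbian:

# read the SoC temperature (Raspbian exposes it in millidegrees Celsius)
with open("/sys/class/thermal/thermal_zone0/temp") as f:
    temp_c = int(f.read().strip()) / 1000.0
print("SoC temperature: %.1f C" % temp_c)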
--------------------------------------------------
Network Name         Mean Inference Time (std dev)
--------------------------------------------------
ProfileResult(mean=0.6405168693, results=(0.6238276397, 0.6342533386, 0.6634696296))
resnet-18            640.52 ms            (16.78 ms)
I tried a single inference run of the resnet-18 model on a new Raspberry Pi that hadn’t been continuously stress-tested (results above), but even then I did not observe any noticeable speedup in runtime.
Can you give some more details on your host compilation environment, e.g. the llvm version?
Also try turning the number down, e.g. number=1. Your settings of number=10 and repeat=3 are more than enough to cause throttling, since they run 30 end-to-end inferences in a row.
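If you want to rule throttling out completely, you can also space the repeats out yourself rather than letting the evaluator run them back to back. A sketch, again assuming the module/ctx objects from the benchmark script; the 10-second cooldown is an arbitrary choice:

import time
import numpy as np

# one inference per evaluator call, with a cooldown in between so
# consecutive runs do not heat up the SoC
ftimer = module.module.time_evaluator("run", ctx, number=1)
times = []
for _ in range(3):
    times.append(ftimer().mean)   # single end-to-end inference
    time.sleep(10)                # let the Pi cool down between runs
print("%.2f ms (%.2f ms)" % (np.mean(times) * 1000, np.std(times) * 1000))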
I have llvm-6.0 installed.
gcc target is x86_64-linux-gnu.
What other information about my host environment would be helpful?
With number=1 and repeat=3, I get:
--------------------------------------------------
Network Name         Mean Inference Time (std dev)
--------------------------------------------------
ProfileResult(mean=0.6026539773333334, results=(0.604407877, 0.593555052, 0.609999003))
resnet-18            602.65 ms            (6.83 ms)
With number=1 and repeat=1, I get:
--------------------------------------------------
Network Name         Mean Inference Time (std dev)
--------------------------------------------------
ProfileResult(mean=0.609508829, results=(0.609508829,))
resnet-18            609.51 ms            (0.00 ms)
Ok, those results suggest that throttling is not the main cause of the difference here.
We have had issues with different llvm versions, so if it is not too tedious I would recommend also trying llvm-4.0. I think @merrymercy can confirm which version of llvm these schedules were tuned with.
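If you are using the cmake-based build, switching the llvm that tvm is built against is just a matter of pointing USE_LLVM at the right llvm-config and rebuilding; roughly (paths assumed for a standard checkout):

# in your tvm checkout (cmake-based build assumed)
mkdir -p build && cp cmake/config.cmake build && cd build
# edit config.cmake so that it reads:
#   set(USE_LLVM llvm-config-4.0)
cmake .. && make -j4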
Thanks, I rebuilt tvm on my host machine with llvm-4.0.
When I run the benchmark with number=1 and repeat=1, I still get:
--------------------------------------------------
Network Name         Mean Inference Time (std dev)
--------------------------------------------------
ProfileResult(mean=0.604963224, results=(0.604963224,))
resnet-18            604.96 ms            (0.00 ms)
It seems NEON is disabled by default in your llvm, because I can reproduce your results with NEON disabled.
However, NEON is enabled by default in my llvm. I will send a PR to add NEON explicitly for all targets.
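In the meantime you can work around it by adding +neon to the target attributes yourself when compiling for the board. A sketch; the target string is an assumption for a 32-bit Raspbian image, so adjust the triple for your OS:

import tvm

# cross-compilation target for a Raspberry Pi 3B on 32-bit Raspbian,
# with NEON requested explicitly via -mattr
target = tvm.target.create(
    "llvm -device=arm_cpu -target=armv7l-linux-gnueabihf -mattr=+neon")

You would then pass this target when building the model instead of the stock Raspberry Pi target.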
Thank you, I made that change but it did not resolve the problem for me:
--------------------------------------------------
Network Name         Mean Inference Time (std dev)
--------------------------------------------------
ProfileResult(mean=0.605326326, results=(0.605326326,))
resnet-18            605.33 ms            (0.00 ms)
One quick way to diagnose whether this is a (host) software environment problem or a problem with the Pi 3B is to use the docker image. You can follow https://github.com/dmlc/tvm/tree/master/docker and run
docker/bash.sh tvmai/ci-gpu
Then build tvm inside that environment with llvm-config set to llvm-config-4.0; this gives you exactly the same software environment that is used in our builds. If the problem persists, then we can at least confirm that the problem is on the device side.
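Roughly, the sequence would look like the following; the exact paths are assumptions (docker/bash.sh mounts your working directory into the container), so adjust them to your checkout:

# from the root of your tvm checkout on the host
docker/bash.sh tvmai/ci-gpu
# inside the container, rebuild as before with USE_LLVM pointed at llvm-config-4.0:
mkdir -p build && cp cmake/config.cmake build && cd build
# edit config.cmake so that it reads:  set(USE_LLVM llvm-config-4.0)
cmake .. && make -j4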