Hello,
I have a single Conv2D layer (Keras), and I am measuring its inference time on a BeagleBone device. I use RPC to autotune and to measure the inference time. I also manually cross-compile and time the `run()` function in C++ to check the performance.
The timing that the RPC evaluator reports and the one I get by wrapping `run()` with `std::chrono::high_resolution_clock::now()` are quite different: RPC reports a lower inference time.
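My manual measurement is roughly like the sketch below (simplified; it assumes the graph-executor factory C++ API, and the library name `conv2d_deploy.so` is just a placeholder):

```cpp
#include <chrono>
#include <iostream>

#include <dlpack/dlpack.h>
#include <tvm/runtime/module.h>
#include <tvm/runtime/packed_func.h>
#include <tvm/runtime/registry.h>

int main() {
  // Load the cross-compiled library (placeholder name).
  tvm::runtime::Module mod_factory =
      tvm::runtime::Module::LoadFromFile("conv2d_deploy.so");

  // Create a graph executor module on the CPU and grab run().
  DLDevice dev{kDLCPU, 0};
  tvm::runtime::Module gmod = mod_factory.GetFunction("default")(dev);
  tvm::runtime::PackedFunc run = gmod.GetFunction("run");

  // Time a single call to run() with std::chrono.
  auto t0 = std::chrono::high_resolution_clock::now();
  run();
  auto t1 = std::chrono::high_resolution_clock::now();
  double ms = std::chrono::duration<double, std::milli>(t1 - t0).count();
  std::cout << "run() took " << ms << " ms" << std::endl;
  return 0;
}
```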
Does anyone have an idea of why that might be?
Target: `llvm -mtriple=arm-linux-gnueabihf -mcpu=cortex-a8`