Schedule template for Adreno GPU?

Hi TVM community,

I’m tuning a resnet18 model on an SDM660 device with an Adreno 512 GPU, using the OpenCL interface. I’ve tried different target strings, but the tuning either finishes with poor performance or does not complete at all.

With the following combination:

target = tvm.target.mali('sdm660')
target_host = 'llvm -target=arm64-linux-android'
cc = '/opt/android-ndk-r18b/build/tools/arm64-toolchain-api21/bin/aarch64-linux-android-clang++'

The best performance I got for any of the workloads in the resnet model was ~13.2 GFLOPS as printed in debug mode, which is less than 6% of the peak GFLOPS of this GPU. In contrast, the Kryo CPU reaches more than 88 GFLOPS.
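To make the gap concrete, here is a small arithmetic sketch using only the figures quoted above. The variable names are illustrative, and the "implied peak" is just the lower bound that follows from "less than 6%", not a measured or datasheet value.

```python
# Figures quoted in the post; nothing here is measured independently.
gpu_best_gflops = 13.2       # best tuned workload on the Adreno 512
cpu_gflops = 88.0            # Kryo CPU figure quoted for comparison
peak_fraction_upper = 0.06   # "less than 6% of peak"

# If 13.2 GFLOPS is below 6% of peak, the peak must exceed this value.
implied_peak_lower_bound = gpu_best_gflops / peak_fraction_upper

# How many times faster the CPU result is than the best GPU result.
cpu_over_gpu = cpu_gflops / gpu_best_gflops

print(f"implied GPU peak: more than {implied_peak_lower_bound:.0f} GFLOPS")
print(f"CPU result is about {cpu_over_gpu:.1f}x the best GPU result")
```

In other words, the tuned GPU kernels are not only far from the GPU's own peak but also several times slower than the CPU on the same device.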

If I replace the target with the following line:

target = 'opencl -model=sdm660 -device=adreno'

I believe this greatly enlarges the search space, resulting in numerous RUN_TIMEOUT (err_no = 7) or invalid kernel errors (correct me if I’m wrong). This somehow destabilizes the device, and the RPC app always crashes or disconnects before reaching even a single successful execution.
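One way to check whether timeouts really dominate is to tally the error numbers recorded in the tuning log. The following is a minimal sketch, assuming the log is a JSON-lines file in which each record carries a "result" field (or "r" in older log versions) whose second element is the error number. That layout is an assumption and should be verified against the TVM version in use. The helper name `tally_error_numbers` and the sample records are hypothetical.

```python
import json
from collections import Counter

RUN_TIMEOUT = 7  # error number quoted in the post above


def tally_error_numbers(log_lines):
    """Count tuning records per error number.

    Assumes each non-empty line is a JSON object whose "result" (or "r")
    field is a list with the error number at index 1.
    """
    counts = Counter()
    for line in log_lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        result = record.get("result", record.get("r"))
        if result is None:
            continue  # not a measurement record in the assumed layout
        counts[int(result[1])] += 1
    return counts


# Hypothetical sample records, for illustration only.
sample_log = [
    '{"result": [[0.0123], 0, 1.9, 1550000000.0]}',
    '{"result": [[1000000000.0], 7, 10.0, 1550000001.0]}',
    '{"result": [[1000000000.0], 7, 10.0, 1550000002.0]}',
    '{"r": [[1000000000.0], 4, 3.2, 1550000003.0]}',
]

counts = tally_error_numbers(sample_log)
total = sum(counts.values())
print(f"{counts[RUN_TIMEOUT]} of {total} trials hit RUN_TIMEOUT")
```

With a real log, the lines would come from `open(path)` instead of the sample list. If most records share one error number, that points at a single failure mode (for example kernels that are too large for the device) rather than at the search space as a whole.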

So my question is: is there a target string that defines a search space suitable for the Adreno GPU and/or yields better performance?

I think you should implement a dedicated Adreno GPU schedule. Although both GPUs are driven through the OpenCL interface, Adreno has a different architecture from Mali, so you cannot rely on the Mali schedule to give you good performance.