Following the addition of new tunable GPU schedules for Mali, I was wondering whether there are plans to introduce schedules for newer Mali GPUs, particularly those using the Bifrost architecture. I’ve run some experiments of my own on a Hikey960 (G71 MP8 GPU) and found a few schedules that take better advantage of the new architecture (up to about 2x faster than the existing Mali schedules). This would probably also make for a fairer comparison against the Arm Compute Library, which now appears to be mostly focused on optimising for the newer GPUs.
This is true, and we are aware of it. We also have a Hikey960 board and some Huawei phones with Mali Bifrost GPUs, but so far we haven’t spent much time on this.
Your contribution is welcome. For example, you can share your optimizations for the Bifrost architecture. You can send tunable templates or simple recipes (for a single gemm or conv2d) to TOPI and show benchmark numbers. We are happy to test and improve them together.
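To make the idea of a "simple recipe" for a single gemm concrete, here is a minimal pure-Python sketch of a tile-size search. It is not TVM or TOPI code: the names `split_candidates`, `tiled_gemm` and `tune` are hypothetical, and the factor enumeration only loosely mirrors the idea behind split knobs in a tunable template. A real template would express the same search space through TVM's own schedule primitives and measure on the device.

```python
import time


def split_candidates(extent, num_outputs=2):
    """Enumerate every ordered factorization of `extent` into `num_outputs` parts."""
    if num_outputs == 1:
        return [(extent,)]
    candidates = []
    for factor in range(1, extent + 1):
        if extent % factor == 0:
            for rest in split_candidates(extent // factor, num_outputs - 1):
                candidates.append((factor,) + rest)
    return candidates


def tiled_gemm(a, b, tile_m, tile_n):
    """Compute C = A x B, visiting the output in tile_m x tile_n blocks."""
    m, k, n = len(a), len(b), len(b[0])
    c = [[0.0] * n for _ in range(m)]
    for mo in range(0, m, tile_m):          # outer loop over row tiles
        for no in range(0, n, tile_n):      # outer loop over column tiles
            for i in range(mo, min(mo + tile_m, m)):
                for j in range(no, min(no + tile_n, n)):
                    acc = 0.0
                    for p in range(k):      # reduction axis
                        acc += a[i][p] * b[p][j]
                    c[i][j] = acc
    return c


def tune(a, b):
    """Time every tile configuration once and return (seconds, tile_m, tile_n)."""
    m, n = len(a), len(b[0])
    best = None
    for _, tile_m in split_candidates(m):
        for _, tile_n in split_candidates(n):
            start = time.perf_counter()
            tiled_gemm(a, b, tile_m, tile_n)
            elapsed = time.perf_counter() - start
            if best is None or elapsed < best[0]:
                best = (elapsed, tile_m, tile_n)
    return best


if __name__ == "__main__":
    size = 16
    a = [[float(i + j) for j in range(size)] for i in range(size)]
    b = [[float(i - j) for j in range(size)] for i in range(size)]
    elapsed, tile_m, tile_n = tune(a, b)
    print(f"best tiles: {tile_m}x{tile_n} ({elapsed * 1e3:.3f} ms)")
```

The timings from a pure-Python loop say nothing about Mali performance; the point is only the shape of a recipe: a parameterised schedule, an enumerated search space, and a measured result for each configuration that can be posted as benchmark numbers.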
Can you share your optimization? Did you tweak the parameters or use a totally different schedule template?