The GPU peak FLOPS measured by TVM is about half the result measured by clpeak

As the title says, clpeak reports ~470 GFLOPS for fp32 and ~920 GFLOPS for fp16, but TVM reports only ~255 GFLOPS for fp32 and ~510 GFLOPS for fp16.
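
For reference, here is a quick sanity check of the ratios using the approximate figures quoted above. It is only a minimal sketch of the arithmetic; the dictionary and function names are mine for illustration and are not part of TVM or clpeak.

```python
# Approximate peak throughput figures quoted above, in GFLOPS.
clpeak = {"fp32": 470.0, "fp16": 920.0}
tvm = {"fp32": 255.0, "fp16": 510.0}


def tvm_to_clpeak_ratio(dtype):
    """Return the TVM figure as a fraction of the clpeak figure (hypothetical helper)."""
    return tvm[dtype] / clpeak[dtype]


for dtype in ("fp32", "fp16"):
    print(f"{dtype}: TVM / clpeak = {tvm_to_clpeak_ratio(dtype):.2f}")
# fp32: TVM / clpeak = 0.54
# fp16: TVM / clpeak = 0.55
```

So the TVM numbers are roughly 54-55% of what clpeak reports for both data types.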

I’m using an RK3588 SoC, whose GPU is a Mali-G610 MP4.