TVM OpenCL runtime for Adreno GPU

I’m trying to run a ResNet-50 model on an Adreno GPU, but the performance isn’t as good as expected. I’ve built TVM with OpenCL as the backend. I’ve tried changing texture_spatial_limit, but it has no effect on the inference time. I have a few doubts regarding TVM for Adreno GPUs:

  1. Does TVM know how to efficiently use the texture processor inside the Adreno GPU?

  2. How do I efficiently run a neural network on the Adreno GPU without OpenCLML?

OS: QNX 7.1, Target: Qualcomm board. @srkreddy1238

The spatial limit affects texture memory compatibility only. On most Qualcomm platforms it is 16K. We shouldn’t increase it beyond the hardware capability, and increasing this limit will not necessarily increase performance.

Texture enablement happens with the target “opencl -device=adreno”. Auto-tuning refines the kernels further.
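For reference, a minimal sketch of compiling ResNet-50 with that target (the host triple, the use of the Relay test workload, and the exported file name are assumptions for illustration; QNX cross-toolchain handling is omitted):

```python
import tvm
from tvm import relay
from tvm.relay import testing

# Grab a ResNet-50 workload from Relay's test models (stand-in for the real model).
mod, params = testing.resnet.get_workload(num_layers=50, batch_size=1)

# "opencl -device=adreno" selects the Adreno schedules, which place intermediate
# tensors in textures (clImage) whenever the shapes fit the spatial limit.
target = tvm.target.Target(
    "opencl -device=adreno",
    host="llvm -mtriple=aarch64-linux-gnu",  # assumption: adjust for the QNX host toolchain
)

with tvm.transform.PassContext(opt_level=3):
    lib = relay.build(mod, target=target, params=params)

# Export and deploy through RPC / the graph executor on the device; AutoTVM or
# the auto-scheduler can then be applied on top of this build to tune kernels.
lib.export_library("resnet50_adreno.so")
```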

Thanks for your reply. I’m getting the following results with TVM on QNX 7.1 with an Adreno 663 GPU for the ResNet-50 model:

| Inference time | Max_threads_per_block | Max_num_threads | Max_shared_mem_per_block | Texture_spatial_limit |
| --- | --- | --- | --- | --- |
| 42.78 | 8 | 256 | 8192 | 16384 |
| 42.49 | 8 | 256 | 8192 | 8192 |

Increasing texture_spatial_limit had no impact on inference time. Is this observation consistent with how the texture processor should work? With tuning I do get an improvement in inference time, but by default TVM still seems unable to use the full capability of the texture processor.
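For reference, the device limits in the table can be queried through TVM; a minimal sketch, assuming the OpenCL device is visible to the Python process and that this TVM build exposes texture_spatial_limit as a device attribute:

```python
import tvm

# Query the OpenCL device attributes reported in the table above.
dev = tvm.opencl(0)
print("max_threads_per_block    :", dev.max_threads_per_block)
print("max_shared_mem_per_block :", dev.max_shared_memory_per_block)
print("texture_spatial_limit    :", dev.texture_spatial_limit)  # assumption: exposed by this build
```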

Hi Varun,

An increased texture spatial limit only allows tensors with larger dimensions to use clImages (which are accessed through the texture hardware block). ResNet-50 may have small tensor shapes that fit within either of these limits (8K or 16K), so performance may not have been impacted here.
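To make that concrete, here is a rough sketch of the 2D image sizes ResNet-50’s largest activations would need. The exact buffer-to-image mapping used by the Adreno schedules is an assumption here (NCHW packed into NCHW4c, with N, C/4 and H folded into the image height); only the orders of magnitude matter:

```python
# Back-of-the-envelope check: none of ResNet-50's activation shapes come close
# to the 8K texture spatial limit, so raising it to 16K changes nothing.
resnet50_activations = [
    (1, 64, 112, 112),   # stem conv output
    (1, 256, 56, 56),    # stage 1 output
    (1, 512, 28, 28),    # stage 2 output
    (1, 1024, 14, 14),   # stage 3 output
    (1, 2048, 7, 7),     # stage 4 output
]

for n, c, h, w in resnet50_activations:
    height = n * -(-c // 4) * h   # N * ceil(C/4) * H rows in the packed image (assumed layout)
    width = w
    print(f"NCHW={n}x{c}x{h}x{w} -> ~{height} x {width} image")
# Every dimension stays well under 8192, so the 8K and 16K limits behave identically here.
```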