TVM OpenCL runtime for Adreno GPU

I’m trying to run a ResNet-50 model on an Adreno GPU, but the performance isn’t as good as expected. I’ve built TVM with OpenCL as the backend. I’ve tried changing texture_spatial_limit, but it has no effect on the inference time. I have a few doubts regarding TVM for Adreno GPUs:

  1. Does TVM know how to efficiently use the texture processor inside the Adreno GPU?

  2. How do I efficiently run a neural network on the Adreno GPU without OpenCLML?

OS: QNX 7.1, Target: Qualcomm board. @srkreddy1238

The spatial limit affects texture memory compatibility only. On most Qualcomm platforms it is 16K. We shouldn’t increase it beyond the hardware capability, and increasing this limit will not necessarily increase performance.

Texture enablement happens with the target “opencl -device=adreno”. Auto-tuning refines the kernels further.
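For reference, a minimal sketch of compiling ResNet-50 with that target (the host triple, the use of the Relay test workload, and the exported file name are assumptions for illustration; QNX cross-toolchain handling is omitted):

```python
import tvm
from tvm import relay
from tvm.relay import testing

# Grab a ResNet-50 workload from Relay's test models (stand-in for the real model).
mod, params = testing.resnet.get_workload(num_layers=50, batch_size=1)

# "opencl -device=adreno" selects the Adreno schedules, which place intermediate
# tensors in textures (clImage) whenever the shapes fit the spatial limit.
target = tvm.target.Target(
    "opencl -device=adreno",
    host="llvm -mtriple=aarch64-linux-gnu",  # assumption: adjust for the QNX host toolchain
)

with tvm.transform.PassContext(opt_level=3):
    lib = relay.build(mod, target=target, params=params)

# Export and deploy through RPC / the graph executor on the device; AutoTVM or
# the auto-scheduler can then be applied on top of this build to tune kernels.
lib.export_library("resnet50_adreno.so")
```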

Thanks for your reply. I’m getting the following results with TVM on QNX 7.1 with an Adreno 663 GPU for the ResNet-50 model:

| Inference time | Max_threads_per_block | Max_num_threads | Max_shared_mem_per_block | Texture_spatial_limit |
| --- | --- | --- | --- | --- |
| 42.78 | 8 | 256 | 8192 | 16384 |
| 42.49 | 8 | 256 | 8192 | 8192 |

Increasing texture_spatial_limit had no impact on inference time. Is this observation consistent with how the texture processor should work? With tuning I do get an improvement in inference time, but by default TVM still seems unable to use the full capability of the texture processor.
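For reference, the device limits in the table can be queried through TVM; a minimal sketch, assuming the OpenCL device is visible to the Python process and that this TVM build exposes texture_spatial_limit as a device attribute:

```python
import tvm

# Query the OpenCL device attributes reported in the table above.
dev = tvm.opencl(0)
print("max_threads_per_block    :", dev.max_threads_per_block)
print("max_shared_mem_per_block :", dev.max_shared_memory_per_block)
print("texture_spatial_limit    :", dev.texture_spatial_limit)  # assumption: exposed by this build
```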

Hi Varun,

An increased texture spatial limit only allows tensors with larger dimensions to use clImages (which are accessed through the texture hardware block). ResNet-50 may have small tensor shapes that fit within either of these limits (8K or 16K), so performance may not have been impacted here.
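To make that concrete, here is a rough sketch of the 2D image sizes ResNet-50’s largest activations would need. The exact buffer-to-image mapping used by the Adreno schedules is an assumption here (NCHW packed into NCHW4c, with N, C/4 and H folded into the image height); only the orders of magnitude matter:

```python
# Back-of-the-envelope check: none of ResNet-50's activation shapes come close
# to the 8K texture spatial limit, so raising it to 16K changes nothing.
resnet50_activations = [
    (1, 64, 112, 112),   # stem conv output
    (1, 256, 56, 56),    # stage 1 output
    (1, 512, 28, 28),    # stage 2 output
    (1, 1024, 14, 14),   # stage 3 output
    (1, 2048, 7, 7),     # stage 4 output
]

for n, c, h, w in resnet50_activations:
    height = n * -(-c // 4) * h   # N * ceil(C/4) * H rows in the packed image (assumed layout)
    width = w
    print(f"NCHW={n}x{c}x{h}x{w} -> ~{height} x {width} image")
# Every dimension stays well under 8192, so the 8K and 16K limits behave identically here.
```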