I have a network that I optimized with AutoTVM for the iPhone GPU.
When I deployed it to the phone in the app I developed, it crashed during the Metal API validation step. The log shows that the maximum number of threads per threadgroup (1024) was exceeded:
-[MTLDebugComputeCommandEncoder validateThreadsPerThreadgroup:]:906: failed assertion `(threadsPerThreadgroup.width(1) * threadsPerThreadgroup.height(34) * threadsPerThreadgroup.depth(34))(1156) must be <= 1024. (device threadgroup size limit)'
I turned off the validation step, and the network ran to completion. Unfortunately, the result was not correct. (I verified that the unoptimized network produces the correct result.)
Examining the Metal codegen, it seems that there is no check for the maximum number of threads per threadgroup, and no code to work around the limit.
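To make concrete the kind of check I mean, here is a minimal Python sketch. It is not TVM or Metal code: the function name is hypothetical, and the default limit of 1024 is simply the value from the assertion message above (other devices may report a different limit).

```python
from math import prod


def exceeds_threadgroup_limit(width, height, depth, limit=1024):
    """Return True if width * height * depth is larger than the limit.

    `limit` defaults to 1024 only because that is the value reported in
    the assertion message; it is an assumption, not a universal constant.
    """
    return prod((width, height, depth)) > limit


# Threadgroup shape from the assertion message: 1 x 34 x 34 = 1156 threads.
print(exceeds_threadgroup_limit(1, 34, 34))
```

A check like this could, in principle, be used to reject tuning configurations whose threadgroup shape is too large before they are ever deployed.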
My question is: what could be causing the incorrect network result, and could exceeding this thread limit be contributing to it?
Thanks, --C