The following two images show the tuning results of the same model.
The first one resulted in ~171ms latency after 61440 trials. After validating the output, all the output values from the model match the Keras model's. Best tune records: working_autoscheduler_mali - Pastebin.com
The second one resulted in ~71ms latency after 29632 trials. After validating the output, all the output values completely mismatch the Keras model's. Best tune records: broken_autoscheduler_mali - Pastebin.com
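For reference, the validation I mention is an elementwise comparison of the two runtimes' outputs. A minimal sketch of that check (the arrays below are synthetic stand-ins for the real TVM and Keras outputs, and the tolerances are my assumption, not values from the thread):

```python
import numpy as np

def outputs_match(a, b, rtol=1e-4, atol=1e-4):
    """Return True if two model output arrays agree elementwise within tolerance."""
    return np.allclose(a, b, rtol=rtol, atol=atol)

# Synthetic stand-ins for the flattened TVM / Keras output tensors.
ref = np.linspace(0.0, 1.0, 8, dtype=np.float32)
good = ref + 1e-6          # agrees within tolerance
bad = ref[::-1].copy()     # completely mismatched

print(outputs_match(ref, good))
print(outputs_match(ref, bad))
```

In the broken run, essentially every element fails this check, which is why I suspect miscompiled kernels rather than ordinary floating-point drift.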
First, this happens randomly, but if I include all the simple tasks in tuning, it is definitely going to happen!
For example, task #3 normally achieved 66.29 GFLOPS but suddenly jumped to 6308.76 GFLOPS. The only difference is in the SP (SplitStep) of the record, for example "SP", 3, 5, 256, [1, 2, 1, 1] → "SP", 3, 5, 256, [1, 2, 16, 2]
In task #8, there is also a difference in the PragmaStep auto-unroll: ["PR", 3, 0, "auto_unroll_max_step$64"] → ["PR", 3, 0, "auto_unroll_max_step$16"]
In task #13, again the only differences are in the SplitStep.
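The step-level differences above can be found mechanically: each auto-scheduler log line is a JSON record whose "i" field carries the transform steps. A minimal sketch of such a diff, assuming the v0.6 log layout where the step list sits at ["i"][1][1] (the two sample lines below are made up to mirror the SP difference, not the real records from the pastebins):

```python
import json

def diff_steps(rec_a, rec_b):
    """Compare the transform-step lists of two auto-scheduler log lines.

    Returns a list of (step_index, step_from_a, step_from_b) for every
    position where the two records disagree.
    """
    steps_a = json.loads(rec_a)["i"][1][1]
    steps_b = json.loads(rec_b)["i"][1][1]
    return [(i, sa, sb)
            for i, (sa, sb) in enumerate(zip(steps_a, steps_b))
            if sa != sb]

# Hypothetical records mirroring the task #3 SplitStep difference.
good_rec = '{"i": [["task3", "opencl"], [[], [["SP", 3, 5, 256, [1, 2, 1, 1], 1]]]], "r": [[0.001], 0, 1, 1]}'
bad_rec  = '{"i": [["task3", "opencl"], [[], [["SP", 3, 5, 256, [1, 2, 16, 2], 1]]]], "r": [[0.0001], 0, 1, 1]}'

for idx, sa, sb in diff_steps(good_rec, bad_rec):
    print(f"step {idx}: {sa} -> {sb}")
```

Running something like this over the matching tasks in the two record files is how I narrowed the mismatch down to the SP and PR steps quoted above.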
I have tried updating TVM to the newest version on GitHub, but the same thing still happens.
Please help me understand why this happens.
What GPU are you running your model on? There might be issues in different parts of the software stack, including the OpenCL compiler. We observed, for example, a similar issue: https://github.com/apache/tvm/issues/9242
Can you share a reference to the model (if it is publicly available) and the auto-scheduler records with correct/wrong behaviour?
Yes, the phone has a Qualcomm Snapdragon 888 chip with an Adreno GPU. It must be the same issue; the Qualcomm OpenCL compiler had bugs which were claimed to be fixed in the latest version of the Qualcomm OpenCL library.
Could you please share the version of the library by executing this command: