AutoScheduler on Mali GPU occasionally results in completely broken records (absurdly fast GFLOPS and meaningless output)

trungnt13 · November 16, 2021, 11:31am

The following two images show the tuning results for the same model.

The first run reached ~171ms latency after 61440 trials. When I validated the output, all the output values matched the Keras model. Best tune records: working_autoscheduler_mali - Pastebin.com
[Image: mali_autoscheduler_08Nov21_complex]
The second run reached ~71ms latency after 29632 trials. When I validated the output, none of the output values matched the Keras model. Best tune records: broken_autoscheduler_mali - Pastebin.com
[Image: mali_autoscheduler_16Nov21_complex]
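
The output validation described above can be sketched roughly as follows. This is only a minimal illustration, not the script that was actually used: the helper name `outputs_match` and the tolerance values are assumptions, and the inputs are assumed to be the Keras output and the TVM output already converted to NumPy arrays.

```python
import numpy as np


def outputs_match(reference, candidate, rtol=1e-3, atol=1e-3):
    """Compare a candidate output against a reference output.

    Returns (ok, max_abs_err). The tolerances are illustrative
    assumptions and should be adapted to the model's dtype.
    """
    reference = np.asarray(reference, dtype=np.float64)
    candidate = np.asarray(candidate, dtype=np.float64)
    # A shape mismatch or NaN/Inf values count as a failed validation.
    if reference.shape != candidate.shape:
        return False, float("inf")
    if not np.all(np.isfinite(candidate)):
        return False, float("inf")
    max_abs_err = float(np.max(np.abs(reference - candidate)))
    ok = bool(np.allclose(candidate, reference, rtol=rtol, atol=atol))
    return ok, max_abs_err


# Toy data standing in for the Keras (reference) and TVM (candidate) outputs.
keras_out = np.array([[0.10, 0.20], [0.30, 0.40]])
tvm_out = keras_out + 1e-5
print(outputs_match(keras_out, tvm_out))
```

Running such a check after every tuning session makes a broken set of records visible immediately, instead of only showing up as an implausibly low latency.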

This is the description of all the tasks that were tuned: 1/14 fused_nn_conv2d_multiply_expand_dims_add_nn_relu_negative_nn_relu_multiply_ - Pastebin.com

My observations are:

First, this happens randomly, but if I include all the simple tasks in tuning, it is definitely going to happen!
For example, task #3 normally achieves 66.29 GFLOPS but suddenly jumped to 6308.76 GFLOPS. The records differ only in the SP (SplitStep) entries, for example "SP", 3, 5, 256, [1, 2, 1, 1] versus "SP", 3, 5, 256, [1, 2, 16, 2].
In task #8, there is also a difference in the PragmaStep auto unroll: ["PR", 3, 0, "auto_unroll_max_step$64"] → ["PR", 3, 0, "auto_unroll_max_step$16"]
In task #13, again the only differences are in the SplitStep.
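
A step-by-step comparison like the one above can be automated with a small script. The sketch below is an illustration only: `load_steps` and `diff_steps` are hypothetical helper names, and `load_steps` assumes that each line of the log is a JSON object whose transform steps sit at `record["i"][1][1]`, which should be checked against the actual log files. The sample lists are built from the fragments quoted above and are not a complete record.

```python
import json
from itertools import zip_longest


def load_steps(record_line):
    """Parse one JSON record line and return its list of transform steps.

    Assumption: the steps are stored at record["i"][1][1].
    """
    record = json.loads(record_line)
    return record["i"][1][1]


def diff_steps(steps_a, steps_b):
    """Return (index, step_a, step_b) for every position where steps differ.

    If one list is shorter, the missing step is reported as None.
    """
    diffs = []
    for idx, (a, b) in enumerate(zip_longest(steps_a, steps_b)):
        if a != b:
            diffs.append((idx, a, b))
    return diffs


# Illustrative step lists assembled from the fragments quoted in this post.
good = [["SP", 3, 5, 256, [1, 2, 1, 1]],
        ["PR", 3, 0, "auto_unroll_max_step$64"]]
bad = [["SP", 3, 5, 256, [1, 2, 16, 2]],
       ["PR", 3, 0, "auto_unroll_max_step$16"]]

for idx, a, b in diff_steps(good, bad):
    print(f"step {idx}: {a} -> {b}")
```

Applying this to the best working and best broken record of the same task narrows the problem down to the few steps that changed, which is useful when reporting the issue.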

I have tried updating TVM to the newest version on GitHub, but the same thing still happens. Please help me understand why this happens.

Thanks in advance!

elvin-n · November 30, 2021, 6:02pm

What GPU are you running your model on? There might be issues in different parts of the software stack, including the OpenCL compiler. We observed, for example, a similar issue: https://github.com/apache/tvm/issues/9242

Can you share a reference to the model (if it is publicly available) and the auto-scheduler records with the correct and the wrong behaviour?

trungnt13 · December 3, 2021, 9:37am

Hi Elvin,

Indeed, the issue is very similar to https://github.com/apache/tvm/issues/9242

The phone I used is a Samsung S21 Ultra, which uses a Mali GPU similar to the one in the Samsung S21 from that issue.

I use the original MobileNet from tensorflow.keras.applications:

        import tensorflow as tf
        from tensorflow.keras import Input, Model
        from tensorflow.keras.layers import Conv2D
        from tensorflow.keras.applications import mobilenet

        # image_size is defined elsewhere in the script (not shown here)
        net = mobilenet.MobileNet(input_shape=[image_size, image_size, 3],
                                  weights=None,
                                  include_top=False)
        # 4-channel input with a fixed batch size of 1
        new_input = Input(shape=[image_size, image_size, 4],
                          batch_size=1,
                          dtype=tf.float32,
                          name='img_input')
        # A 3x3 convolution maps the 4 input channels to the 3 that MobileNet expects
        y = Conv2D(filters=3, kernel_size=3, padding='SAME',
                   activation='linear')(new_input)
        y = net(y)
        model = Model(inputs=new_input, outputs=y, name='MobilenetV1')

The tuning records for correct AutoScheduler behaviour: working_autoscheduler_mali - Pastebin.com

The tuning records for wrong AutoScheduler behaviour: broken_autoscheduler_mali - Pastebin.com

elvin-n · December 7, 2021, 11:53am

Yes, the phone has a Qualcomm Snapdragon 888 chip with an Adreno GPU. It must be the same issue: the Qualcomm OpenCL compiler had bugs that are claimed to be fixed in the latest version of the Qualcomm OpenCL library. Could you please share the version of the library by executing this command:

adb shell strings /vendor/lib/libllvm-qcom.so | grep E031

trungnt13 · December 10, 2021, 8:29am

The version on our phone is E031.38.01.02