AutoScheduler on Mali GPU occasionally results in completely broken records (absurdly fast GFLOPS and meaningless output)

The following two images show the tuning results of the same model

Here are the descriptions of all the tasks that were tuned: 1/14 fused_nn_conv2d_multiply_expand_dims_add_nn_relu_negative_nn_relu_multiply_ - Pastebin.com

My observations are:

  • First, this happens randomly, but if I include all the simple tasks in the tuning, it always happens.
  • For example, task #3 normally achieves 66.29 GFLOPS but suddenly jumped to 6308.76 GFLOPS (about 95x faster). The records differ only in the SP (SplitStep), for example "SP", 3, 5, 256, [1, 2, 1, 1] versus "SP", 3, 5, 256, [1, 2, 16, 2].
  • In task #8, the PragmaStep auto-unroll setting also differs: ["PR", 3, 0, "auto_unroll_max_step$64"] versus ["PR", 3, 0, "auto_unroll_max_step$16"].
  • In task #13, again only the SplitStep differs.
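To make this kind of comparison less error-prone than reading the records by eye, the step lists of two records can be diffed position by position. This is only a minimal sketch: `diff_steps` is a hypothetical helper (not part of TVM), it assumes both step lists have already been extracted from the records as plain Python lists, and the sample steps are the ones quoted above.

```python
def diff_steps(reference_steps, suspect_steps):
    """Return (index, reference_step, suspect_step) for every position that differs."""
    diffs = []
    longest = max(len(reference_steps), len(suspect_steps))
    for i in range(longest):
        # Use None when one record has fewer steps than the other.
        ref = reference_steps[i] if i < len(reference_steps) else None
        sus = suspect_steps[i] if i < len(suspect_steps) else None
        if ref != sus:
            diffs.append((i, ref, sus))
    return diffs


# Steps in the form quoted in this post. The "PR" step is kept identical in
# both lists only to show that matching steps are skipped.
good = [["SP", 3, 5, 256, [1, 2, 1, 1]], ["PR", 3, 0, "auto_unroll_max_step$64"]]
bad = [["SP", 3, 5, 256, [1, 2, 16, 2]], ["PR", 3, 0, "auto_unroll_max_step$64"]]

for index, ref, sus in diff_steps(good, bad):
    print(index, ref, "->", sus)
```

Only the split factors show up in the output, which matches what I see in tasks #3 and #13.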

I updated TVM to the latest version on GitHub, but the problem persists. Please help me understand why this happens.

Thanks in advance!

What GPU are you running your model on? There might be issues in different parts of the software stack, including the OpenCL compiler. We observed, for example, a similar issue: https://github.com/apache/tvm/issues/9242

Can you share a reference to the model (if it is publicly available) and the auto-scheduler records showing the correct and the wrong behaviour?

Hi Elvin,

Indeed, the issue is very similar to https://github.com/apache/tvm/issues/9242

The phone I used is a Samsung Ultra S21, which uses a Mali GPU similar to the one in the Samsung S21 from that issue.

I use the original MobileNet from tensorflow.keras.applications, with a small convolution in front of it:

        import tensorflow as tf
        from tensorflow.keras import Input, Model
        from tensorflow.keras.layers import Conv2D
        from tensorflow.keras.applications import mobilenet

        # image_size is defined elsewhere in my script (square input resolution).
        net = mobilenet.MobileNet(input_shape=[image_size, image_size, 3],
                                  weights=None,
                                  include_top=False)
        # 4-channel input, reduced to 3 channels by a 3x3 convolution
        # before it is fed to MobileNet.
        new_input = Input(shape=[image_size, image_size, 4],
                          batch_size=1,
                          dtype=tf.float32,
                          name='img_input')
        y = Conv2D(filters=3, kernel_size=3, padding='SAME',
                   activation='linear')(new_input)
        y = net(y)
        model = Model(inputs=new_input, outputs=y, name='MobilenetV1')

The tuning records for correct AutoScheduler behaviour: working_autoscheduler_mali - Pastebin.com

The tuning records for wrong AutoScheduler behaviour: broken_autoscheduler_mali - Pastebin.com
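In case it helps with triage, here is a rough sketch of how the two logs could be compared automatically to flag the workloads whose best measured time is implausibly faster in the broken log. It rests on assumptions: each log line is taken to be a JSON object where `"i"[0][0]` is the workload key and `"r"` holds `[costs_in_seconds, error_no, ...]`; the helper names and the `max_speedup` threshold are hypothetical and not part of TVM.

```python
import json


def best_cost_per_workload(lines):
    """Map workload key -> lowest mean cost (seconds) among error-free records."""
    best = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        # Assumed layout: "i"[0][0] is the workload key, "r" is [costs, error_no, ...].
        workload_key = record["i"][0][0]
        costs, error_no = record["r"][0], record["r"][1]
        if error_no != 0 or not costs:
            continue  # skip failed measurements
        mean_cost = sum(costs) / len(costs)
        if workload_key not in best or mean_cost < best[workload_key]:
            best[workload_key] = mean_cost
    return best


def suspicious_workloads(reference_lines, suspect_lines, max_speedup=10.0):
    """Return {workload_key: speedup} where the suspect log beats the reference by more than max_speedup."""
    reference = best_cost_per_workload(reference_lines)
    suspect = best_cost_per_workload(suspect_lines)
    flagged = {}
    for key, ref_cost in reference.items():
        sus_cost = suspect.get(key)
        if sus_cost is not None and sus_cost > 0:
            speedup = ref_cost / sus_cost
            if speedup > max_speedup:
                flagged[key] = speedup
    return flagged


# Usage with two log files (file names are placeholders):
# with open("working.json") as ref, open("broken.json") as sus:
#     print(suspicious_workloads(ref, sus))
```

A jump like the one in task #3 (about 95x) would be flagged with the default threshold, while ordinary run-to-run variation would not.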