Support for Mali Valhall

Hi! From my own testing, it seems that TVM currently does not support compiling models for Mali GPUs based on the 3rd-generation Valhall architecture. Do you know when support for it is planned?

Thanks

Bumping for further visibility

Hi, is there any particular feature in Valhall you are interested in? Otherwise, you can use the Mali target, which should generally work at least from the functionality side… It might not be very performant, though…

Hello, right now it seems that when you auto-schedule for Valhall, the compiled model often produces wrong outputs (significantly different from the original model's). If possible, I would like TVM to support auto-scheduling for devices with Valhall.

The model does run properly if I just compile it with -device=mali and without a tuning records file.

Thank you for your help.

Do you get wrong outputs with any particular model/operator, or with all of them? Do you use tvmc or a custom script to run the auto-scheduler? I think I have observed something like this for some conv2d workloads, but on Bifrost (the 2nd generation of Mali). It seemed like a generic auto-scheduler problem, though, not specific to Mali or to GPUs in general… These cases are pretty hard to debug, because the same workloads can work fine on other targets simply because Ansor uses different mutations…

I’m not sure. I tested it on various models, such as mobilenet_v1, and it failed on all of them (see this post I opened a few weeks ago). However, when I use a device with Bifrost, it works for all of the models.

From additional testing I did, this problem occurs even with PyTorch models that have only one small 2D convolutional layer. I also tested a network with one fully connected layer and I don’t recall it producing errors (but I may have forgotten…).

In addition, for the simple networks with only one layer the error was pretty rare (it would happen about 1 in 10 times), but for the larger models tuning almost always converged to an optimization that gave wrong outputs.

Interesting, thanks for sharing. I was able to run a tflite model with Ansor using “-device=mali” on Valhall and Bifrost GPUs. I think it was something fairly simple like resnet18… It might be useful to examine the outputs of the operators individually.
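One generic way to do that, independent of TVM, is to dump each operator's output from both the tuned and the untuned build and compare the arrays with numpy. A minimal sketch (the function name and tolerances are just illustrative choices, not TVM API):

```python
import numpy as np

def compare_outputs(reference, candidate, rtol=1e-4, atol=1e-4):
    """Compare one operator's output from a tuned build against an
    untuned reference; return a small report dict."""
    reference = np.asarray(reference)
    candidate = np.asarray(candidate)
    abs_err = np.abs(reference - candidate)
    return {
        "max_abs_err": float(abs_err.max()),
        "mean_abs_err": float(abs_err.mean()),
        # flag the suspicious "nothing was ever written" case
        "all_zero": not candidate.any(),
        "close": bool(np.allclose(reference, candidate,
                                  rtol=rtol, atol=atol)),
    }

# Synthetic data standing in for two dumps of the same operator:
ref = np.random.default_rng(0).standard_normal((1, 8, 8, 16)).astype("float32")
report = compare_outputs(ref, ref + 1e-6)
print(report["close"], report["all_zero"])  # True False
```

Running this per operator makes it easy to spot whether one specific workload diverges or the error accumulates across the whole network.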

I did notice that some depthwise_conv2d workloads returned the wrong result: the outputs were all zeroes. So it felt to me that some boundary condition was not transformed correctly somewhere, and therefore nothing was ever written into the output buffers. However, I have not been able to narrow down the problem yet.
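For checking a suspect depthwise_conv2d kernel in isolation, a plain-numpy reference implementation can serve as ground truth. A minimal sketch, assuming NCHW layout, stride 1, and no padding (the actual failing workloads may differ):

```python
import numpy as np

def depthwise_conv2d_ref(x, w):
    """Naive NCHW depthwise conv2d, stride 1, no padding.
    x: (N, C, H, W) input, w: (C, 1, KH, KW) per-channel filters.
    Reference only -- slow, but easy to trust."""
    n, c, h, wd = x.shape
    kc, _, kh, kw = w.shape
    assert kc == c, "depthwise: one filter per input channel"
    out = np.zeros((n, c, h - kh + 1, wd - kw + 1), dtype=x.dtype)
    for i in range(out.shape[2]):
        for j in range(out.shape[3]):
            patch = x[:, :, i:i + kh, j:j + kw]        # (N, C, KH, KW)
            # per-channel multiply-accumulate, no cross-channel mixing
            out[:, :, i, j] = (patch * w[:, 0]).sum(axis=(2, 3))
    return out

# Example: a 3x3 all-ones filter over an all-ones input gives 9 everywhere.
x = np.ones((1, 3, 5, 5), dtype="float32")
w = np.ones((3, 1, 3, 3), dtype="float32")
print(depthwise_conv2d_ref(x, w)[0, 0, 0, 0])  # 9.0
```

If the tuned kernel's output is all zeroes while this reference is not, that supports the boundary-condition hypothesis rather than a mere precision difference.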