X86 tasks appearing when tuning for ARM

yakovdan · April 2, 2022, 6:51pm

Hi All, I’m a TVM beginner, so thanks in advance for your patience.

I’ve followed the relevant tutorials on Android deployment, as well as on tuning ConvNets for ARM. Now I’m tuning a UNet CNN that uses MobileNetV2 as a backbone in an encoder-decoder architecture on a Snapdragon 855 mobile phone. I’ve also converted it to FP16 using a mixed precision pass. It works well, with a meaningful speedup relative to TFLite. As expected, tuning is rather slow. Looking at the tasks generated by

target = "llvm -device=arm_cpu -model=snapdragon855 -mtriple=aarch64-linux-android -mcpu=kryo -mattr=+neon,+fullfp16,+fp-armv8"
tasks = autotvm.task.extract_from_program(mod["main"], target=target, params=params)

I get tasks like Task(func_name=conv2d_NCHWc.x86 as well as tasks like Task(func_name=conv2d_nchw_spatial_pack.arm_cpu. I don’t understand why x86 tasks are generated or how they are useful for an ARM-based architecture. Are they a waste of the tuning time budget? If so, how do I remove them? If not, why are they needed?

In addition, it seems that I only have conv2d tasks. Of course, those are the slowest operators, but does it make sense to tune bias_add and ReLU too?

Thanks!

areusch · April 6, 2022, 12:26am

hi @yakovdan,

I believe that “generic” schedules (i.e. those with no platform knowledge) are currently named “x86” schedules, so apologies for the confusion. We do need to clean this up.
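
To see how the extracted tasks split between the two naming families before spending tuning budget, something like the sketch below may help. It only assumes that each task exposes its template name through a `name` attribute (the string shown as `func_name` in the task repr); `group_tasks_by_template` and `FakeTask` are hypothetical names made up for this illustration, and `FakeTask` merely stands in for real AutoTVM tasks so the snippet runs without TVM installed.

```python
from collections import defaultdict, namedtuple


def group_tasks_by_template(tasks):
    """Group tasks by the suffix of their template name (text after the last '.')."""
    groups = defaultdict(list)
    for task in tasks:
        # Template names look like "conv2d_NCHWc.x86"; the suffix names the schedule family.
        _, _, suffix = task.name.rpartition(".")
        groups[suffix].append(task)
    return dict(groups)


# Hypothetical stand-in for AutoTVM Task objects (assumption: real tasks have `.name`).
FakeTask = namedtuple("FakeTask", ["name"])

demo_tasks = [
    FakeTask("conv2d_NCHWc.x86"),
    FakeTask("conv2d_nchw_spatial_pack.arm_cpu"),
    FakeTask("conv2d_nchw_spatial_pack.arm_cpu"),
]

# Print how many tasks fall under each schedule family.
for suffix, group in group_tasks_by_template(demo_tasks).items():
    print(suffix, len(group))
```

With a real model, the list returned by `autotvm.task.extract_from_program` would be passed in place of `demo_tasks`. Since the “x86” names apparently cover generic schedules, this is meant for inspection rather than as a recommendation to drop those tasks.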

Regarding tuning bias_add and relu: I believe those broadcast operators are typically fused into other operators, such as conv2d, so they will typically reuse the results of tuning whatever operator they fuse into.

Andrew