[AutoTVM] "A fallback configuration is used, which may bring great performance regression"

Hi, I’m trying out AutoTVM using tune_relay_arm.py. I’m using Android phones and I’ve changed target to:

    target = tvm.target.Target("llvm -device=arm_cpu -mtriple=aarch64-linux-android")

But I still get the warning “A fallback configuration is used, which may bring great performance regression” at the end of the run. What am I missing here? I suspect the mtriple is wrong, but I don’t know how to run ‘gcc -v’ on an Android phone. How do I find the correct mtriple?

(python3.7) weiwe-macbookpro:autotvm weiwe$ python tune_relay_arm.py
Extract tasks...
Tuning...
[Task  1/38]  Current/Best:    4.15/   7.05 GFLOPS | Progress: (5/5) | 2.23 s Done.
[Task  2/38]  Current/Best:    2.67/   4.78 GFLOPS | Progress: (5/5) | 10.73 s Done.
[Task  3/38]  Current/Best:    3.67/   4.03 GFLOPS | Progress: (5/5) | 1.99 s Done.
[Task  4/38]  Current/Best:    0.78/   1.41 GFLOPS | Progress: (5/5) | 2.74 s Done.
[Task  5/38]  Current/Best:    5.99/   6.32 GFLOPS | Progress: (5/5) | 2.64 s Done.
[Task  6/38]  Current/Best:    3.37/   4.68 GFLOPS | Progress: (5/5) | 6.44 s Done.
[Task  7/38]  Current/Best:    0.35/   2.52 GFLOPS | Progress: (5/5) | 6.39 s Done.
[Task  8/38]  Current/Best:    1.06/   1.55 GFLOPS | Progress: (5/5) | 3.98 s Done.
[Task  9/38]  Current/Best:    5.21/   8.85 GFLOPS | Progress: (5/5) | 2.85 s Done.
[Task 10/38]  Current/Best:    2.73/   6.04 GFLOPS | Progress: (5/5) | 5.36 s Done.
[Task 11/38]  Current/Best:    2.53/   2.83 GFLOPS | Progress: (5/5) | 5.09 s Done.
[Task 12/38]  Current/Best:    1.70/   1.70 GFLOPS | Progress: (5/5) | 11.55 s Done.
[Task 13/38]  Current/Best:    4.61/   7.63 GFLOPS | Progress: (5/5) | 4.33 s Done.
[Task 14/38]  Current/Best:   16.99/  16.99 GFLOPS | Progress: (5/5) | 12.46 s Done.
[Task 15/38]  Current/Best:    0.96/   3.51 GFLOPS | Progress: (5/5) | 6.85 s Done.
[Task 16/38]  Current/Best:    0.73/   1.85 GFLOPS | Progress: (5/5) | 5.29 s Done.
[Task 17/38]  Current/Best:    3.17/   5.15 GFLOPS | Progress: (5/5) | 4.51 s Done.
[Task 18/38]  Current/Best:    1.68/   3.07 GFLOPS | Progress: (5/5) | 11.24 s Done.
[Task 19/38]  Current/Best:    1.78/   1.96 GFLOPS | Progress: (5/5) | 7.98 s Done.
[Task 20/38]  Current/Best:    0.27/   0.75 GFLOPS | Progress: (5/5) | 2.98 s Done.
[Task 21/38]  Current/Best:    5.33/   7.30 GFLOPS | Progress: (5/5) | 4.94 s Done.
[Task 22/38]  Current/Best:    0.88/  10.45 GFLOPS | Progress: (5/5) | 12.86 s Done.
[Task 23/38]  Current/Best:    2.49/   2.49 GFLOPS | Progress: (5/5) | 1.43 s Done.
[Task 24/38]  Current/Best:    2.58/   2.58 GFLOPS | Progress: (5/5) | 2.91 s Done.
[Task 25/38]  Current/Best:    2.60/   6.36 GFLOPS | Progress: (5/5) | 5.24 s Done.
[Task 26/38]  Current/Best:    3.05/   3.44 GFLOPS | Progress: (5/5) | 11.56 s Done.
[Task 27/38]  Current/Best:    0.80/   2.58 GFLOPS | Progress: (5/5) | 4.32 s Done.
[Task 28/38]  Current/Best:    0.48/   0.60 GFLOPS | Progress: (5/5) | 5.59 s Done.
[Task 29/38]  Current/Best:    6.40/   8.00 GFLOPS | Progress: (5/5) | 6.58 s Done.
[Task 30/38]  Current/Best:    0.82/   9.52 GFLOPS | Progress: (5/5) | 7.90 s Done.
[Task 31/38]  Current/Best:    2.43/   2.43 GFLOPS | Progress: (5/5) | 4.99 s Done.
[Task 32/38]  Current/Best:   14.19/  14.19 GFLOPS | Progress: (5/5) | 2.70 s Done.
[Task 33/38]  Current/Best:    4.10/   6.58 GFLOPS | Progress: (5/5) | 3.83 s Done.
[Task 34/38]  Current/Best:    4.88/   4.88 GFLOPS | Progress: (5/5) | 4.89 s Done.
[Task 35/38]  Current/Best:    2.24/   2.24 GFLOPS | Progress: (5/5) | 10.97 s Done.
[Task 36/38]  Current/Best:    0.42/   0.70 GFLOPS | Progress: (5/5) | 2.08 s Done.
[Task 37/38]  Current/Best:    4.83/   8.52 GFLOPS | Progress: (5/5) | 2.86 s Done.
[Task 38/38]  Current/Best:    2.13/   7.22 GFLOPS | Progress: (5/5) | 6.20 s Done.
Compile...
Cannot find config for target=llvm -keys=arm_cpu,cpu -device=arm_cpu -mtriple=aarch64-linux-android, workload=('dense_nopack.x86', ('TENSOR', (1, 1024), 'float32'), ('TENSOR', (1000, 1024), 'float32'), None, 'float32'). A fallback configuration is used, which may bring great performance regression.
Upload...
Evaluate inference time cost...
Mean inference time (std dev): 174.64 ms (8.54 ms)

Don’t be alarmed by this warning. It only means that no tuned configuration was found for the 'dense_nopack.x86' workload, because you didn’t ask the autotuner to tune it.
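To see at a glance which workloads fell back, the warning lines can be pulled out of the captured compile output. The following is only a rough sketch: the helper name `untuned_workloads` and the regular expression are my own, not part of TVM, and they assume the warning keeps the exact wording shown in the log above.

```python
import re

# Captures the workload name, i.e. the first element of the workload tuple
# in the warning text. The pattern is an assumption based on the log above.
FALLBACK_RE = re.compile(r"Cannot find config for target=.*?, workload=\('([^']+)'")


def untuned_workloads(log_text):
    """Return the distinct workload names that used a fallback config,
    in order of first appearance. (Hypothetical helper, not a TVM API.)"""
    names = []
    for match in FALLBACK_RE.finditer(log_text):
        name = match.group(1)
        if name not in names:
            names.append(name)
    return names


# The warning from the run above, pasted as a single line.
log = (
    "Compile...\n"
    "Cannot find config for target=llvm -keys=arm_cpu,cpu -device=arm_cpu "
    "-mtriple=aarch64-linux-android, workload=('dense_nopack.x86', "
    "('TENSOR', (1, 1024), 'float32'), ('TENSOR', (1000, 1024), 'float32'), "
    "None, 'float32'). A fallback configuration is used, which may bring "
    "great performance regression.\n"
    "Upload...\n"
)
print(untuned_workloads(log))
```

Every name this returns is a workload that was compiled with a default schedule rather than a tuned one.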

To fix this, add the missing op to the ops argument of extract_from_program, for example:

    tasks = autotvm.task.extract_from_program(
        mod["main"],
        target=target,
        params=params,
        ops=(relay.op.get("nn.conv2d"), relay.op.get("nn.dense")),
    )

Alternatively, you can simply omit that argument to extract tasks for all applicable ops:

    tasks = autotvm.task.extract_from_program(
        mod["main"],
        target=target,
        params=params
    )
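After extraction it can help to confirm that a dense task is actually in the list before spending time on tuning. A minimal sketch, assuming each extracted task exposes a `name` attribute as AutoTVM tasks do; `count_tasks_by_name` is a hypothetical helper, and the stand-in objects and task names below are illustrative only, so that the snippet runs without TVM installed:

```python
from collections import Counter
from types import SimpleNamespace


def count_tasks_by_name(tasks):
    """Count tasks per template name (hypothetical helper, not a TVM API)."""
    return Counter(task.name for task in tasks)


# Stand-ins for the list returned by autotvm.task.extract_from_program.
# The names are examples, not the actual output of the run above.
demo_tasks = [
    SimpleNamespace(name="conv2d_nchw_spatial_pack.arm_cpu"),
    SimpleNamespace(name="conv2d_nchw_spatial_pack.arm_cpu"),
    SimpleNamespace(name="dense_nopack.x86"),
]

for name, count in count_tasks_by_name(demo_tasks).items():
    print(f"{name}: {count} task(s)")
```

In a real script you would pass the `tasks` list from extract_from_program instead of `demo_tasks`; if no dense entry shows up, the fallback warning should be expected at compile time.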

Thank you both. It works now!