Auto-scheduler/autoTVM

I read some documentation about Auto-scheduler/AutoTVM. If I want to tune a model and compile it for an Android phone, is RPC the only way to do it? And if the RPC method is used, is the tuning result specific to my phone, so that other phones need to be re-tuned?

from tvm import auto_scheduler

tune_option = auto_scheduler.TuningOptions(
    num_measure_trials=200,  # change this to 20000 to achieve the best performance
    # Candidates are compiled on the host; "ndk" cross-compiles them for Android.
    builder=auto_scheduler.LocalBuilder(build_func="ndk" if use_ndk else "default"),
    # Measurements run on the phone registered in the RPC tracker under device_key.
    runner=auto_scheduler.RPCRunner(
        device_key,
        host=rpc_host,
        port=rpc_port,
        timeout=30,
        repeat=1,
        min_repeat_ms=200,
        enable_cpu_cache_flush=True,
    ),
    measure_callbacks=[auto_scheduler.RecordToFile(log_file)],
)

Does the TuningOptions above have to specify runner=auto_scheduler.RPCRunner? Is it impossible to tune on ARM/Android devices without it? Thanks!

Tuning always happens on a specific device. If you can compile the model on the same device where the tuning should happen, there is no need for RPC. (Whether compiling and measuring on the same CPU affects the reliability of the tuning is a separate question.)
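
For reference, a minimal sketch of what local (non-RPC) tuning could look like when tuning and execution happen on the same machine; `log_file` is a placeholder and the parameter values mirror the snippet above rather than anything mandated:

# Sketch: local tuning without a tracker or RPC, assuming the tuning runs on
# the same machine that will execute the model. `log_file` is a placeholder.
from tvm import auto_scheduler

tune_option = auto_scheduler.TuningOptions(
    num_measure_trials=200,
    builder=auto_scheduler.LocalBuilder(),   # build candidates locally
    runner=auto_scheduler.LocalRunner(       # measure locally, no RPC involved
        timeout=30,
        repeat=1,
        min_repeat_ms=200,
        enable_cpu_cache_flush=True,
    ),
    measure_callbacks=[auto_scheduler.RecordToFile(log_file)],
)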

In the case of Android, RPC is the only way:

  • start the tracker on the host
  • start tvm_rpc through the adb shell (or the android_rpc app) on the phone and connect it to the tracker
  • start tuning on the host, pointing it at the tracker and at the alias (device key) under which the phone is registered (a quick way to check the registration from Python is sketched below)
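
As a sanity check before the tuning run, you could confirm from Python that the phone has actually registered with the tracker; `rpc_host`, `rpc_port`, and `device_key` are the same placeholders as in the TuningOptions snippet above:

# Sketch: confirm the phone is registered with the tracker before tuning.
# rpc_host, rpc_port and device_key are the same placeholders used above.
from tvm import rpc

tracker = rpc.connect_tracker(rpc_host, rpc_port)
print(tracker.text_summary())        # the device_key alias should be listed here

# Requesting a session is a quick way to check the phone is actually reachable.
remote = tracker.request(device_key, session_timeout=60)
print(remote.cpu(0))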

As for the second question, it is a very good one. Right now TVM takes a rather IoT-style approach: you need to tune for each device independently, and only that guarantees the best performance. There is a hope, but no guarantee, that a configuration tuned on one model of a device will work well on another; you need to verify the performance and decide whether the results are acceptable.

For example, if you tune something on an x86 processor, the result can usually be safely reused on another x86 processor with the same ISA and still give good performance on that other model. For ARM or GPU targets it is an open question.
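
If you want to try reusing a log tuned on one phone and simply measure the result on another, compiling with the recorded schedules looks roughly like this; `mod`, `params`, `target`, and `log_file` are placeholders:

# Sketch: compile with schedules recorded on one device, then benchmark the
# resulting library on another device to see whether the transfer is acceptable.
# `mod`, `params`, `target`, and `log_file` are placeholders.
import tvm
from tvm import auto_scheduler, relay

with auto_scheduler.ApplyHistoryBest(log_file):
    with tvm.transform.PassContext(
        opt_level=3, config={"relay.backend.use_auto_scheduler": True}
    ):
        lib = relay.build(mod, target=target, params=params)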

Thank you very much for your detailed answer; this question has been bugging me for a long time. Because I provide an SDK for others to use, I cannot know in advance which phone the program will run on.

I can use target = "llvm -device=arm_cpu -model=android -mtriple=arm64-linux-android -mattr=+neon" and get the best results on my own phone through the auto-scheduler, but that is hard to guarantee on other phones, and it is impossible for me to tune and test every phone.
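
For completeness, a minimal sketch of the export step for that target, assuming the library was built as in the sketch above and the Android NDK environment is set up; the output path is a placeholder:

# Sketch: export the compiled library for Android with the NDK toolchain.
# Requires the TVM_NDK_CC environment variable to point to the NDK clang;
# `lib` comes from a relay.build call as in the sketch above.
from tvm.contrib import ndk

lib.export_library("tuned_model.so", ndk.create_shared)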

For example, I tested the MTK arm64 phone I use with arm_cpu_imagenet_bench.py: choosing p20pro as the model gives the best result, and there is a difference of more than 20% compared with the other phone-model presets. In practice that is a big impact. Thanks!