Thanks a lot @comaniac! I read the RPC-related issues and then located my bug.
Previously, I tried to parallelize my search by running multiple scripts:
```shell
CUDA_VISIBLE_DEVICES=0 python3 my_tvm_searching.py ...
CUDA_VISIBLE_DEVICES=1 python3 my_tvm_searching.py ...
...
```
Each `my_tvm_searching.py` launched its own local RPC session, which is not the correct usage:
```python
measure_ctx = auto_scheduler.LocalRPCMeasureContext(min_repeat_ms=300)
tune_option = auto_scheduler.TuningOptions(
    num_measure_trials=100,
    runner=measure_ctx.runner,
    measure_callbacks=[auto_scheduler.RecordToFile(log_file)],
    builder=tvm.auto_scheduler.LocalBuilder(timeout=100),
    verbose=2,
)
task.tune(tune_option)
del measure_ctx
```
After fixing the bug, my commands are as follows:
```shell
nohup python3 -m tvm.exec.rpc_tracker --host=0.0.0.0 --port=9190 & \
CUDA_VISIBLE_DEVICES=0 nohup python3 -m tvm.exec.rpc_server --tracker 127.0.0.1:9190 --key V100 --host 0.0.0.0 --port=9091 & \
CUDA_VISIBLE_DEVICES=1 nohup python3 -m tvm.exec.rpc_server --tracker 127.0.0.1:9190 --key V100 --host 0.0.0.0 --port=9092 &
...
python3 my_tvm_searching.py
```
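The per-GPU server commands follow a simple pattern (one `CUDA_VISIBLE_DEVICES` index and one port per GPU, all registering under the same key). As a sketch, a small helper like this (hypothetical, not part of TVM) can generate the launch line for each GPU:

```python
# Hypothetical helper (not part of TVM): build one rpc_server launch command
# per GPU, each pinned to its own device via CUDA_VISIBLE_DEVICES and given
# its own port, all registering with the same tracker under the same key.
def server_commands(num_gpus, tracker="127.0.0.1:9190", key="V100",
                    host="0.0.0.0", base_port=9091):
    cmds = []
    for gpu in range(num_gpus):
        cmds.append(
            f"CUDA_VISIBLE_DEVICES={gpu} nohup python3 -m tvm.exec.rpc_server "
            f"--tracker {tracker} --key {key} --host {host} "
            f"--port={base_port + gpu} &"
        )
    return cmds

# Print the launch command for each of 2 GPUs.
for cmd in server_commands(2):
    print(cmd)
```

Because every server registers with the same `--key`, the tracker treats the GPUs as one pool and the tuner can dispatch measurements to whichever device is free.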
The relevant part of `my_tvm_searching.py`:
```python
...
runner = tvm.auto_scheduler.RPCRunner(key="V100", host="localhost", port=9190, n_parallel=8, min_repeat_ms=300, timeout=1000)
tune_option = auto_scheduler.TuningOptions(
    num_measure_trials=100,  # change this to 1000 to achieve the best performance
    runner=runner,
    measure_callbacks=[auto_scheduler.RecordToFile(log_file)],
    builder=tvm.auto_scheduler.LocalBuilder(timeout=1000),
    verbose=2,
)
task.tune(tune_option)
...
```
NOTE: set `n_parallel` to the number of GPUs, and set `timeout` to a value large enough for your workload.
ref:
https://tvm.apache.org/docs/how_to/tune_with_autoscheduler/tune_network_arm.html?highlight=parallel
https://tvm.apache.org/docs/reference/api/python/auto_scheduler.html?highlight=tvm%20auto_scheduler%20rpcrunner#tvm.auto_scheduler.RPCRunner