I found a workaround for autotuning with a single PYNQ board, and I located the problem. In the VTA autotuning tutorial there is a handle named `remote`. This `remote` does two things. The first is to program the FPGA:
```python
if env.TARGET != "sim":
    # Get remote from fleet node
    remote = autotvm.measure.request_remote(
        env.TARGET, tracker_host, tracker_port, timeout=10000
    )
    # Reconfigure the JIT runtime and FPGA.
    vta.reconfig_runtime(remote)
    vta.program_fpga(remote, bitstream=None)
else:
    # In simulation mode, host the RPC server locally.
    remote = rpc.LocalSession()
```
The second is to run the whole network and report the result after autotuning:
```python
# Compile kernels with history best records
with autotvm.tophub.context(target, extra_files=[log_file]):
    # Compile network
    print("Compile...")
    if target.device_name != "vta":
        with tvm.transform.PassContext(opt_level=3, disabled_pass={"AlterOpLayout"}):
            lib = relay.build(
                relay_prog, target=target, params=params, target_host=env.target_host
            )
    else:
        with vta.build_config(opt_level=3, disabled_pass={"AlterOpLayout"}):
            lib = relay.build(
                relay_prog, target=target, params=params, target_host=env.target_host
            )

    # Export library
    print("Upload...")
    temp = util.tempdir()
    lib.save(temp.relpath("graphlib.o"))
    remote.upload(temp.relpath("graphlib.o"))
    lib = remote.load_module("graphlib.o")

    # Generate the graph runtime
    ctx = remote.ext_dev(0) if device == "vta" else remote.cpu(0)
    m = graph_runtime.GraphModule(lib["default"](ctx))

    # Upload parameters to device
    image = tvm.nd.array((np.random.uniform(size=(1, 3, 224, 224))).astype("float32"))
    m.set_input("data", image)

    # Evaluate
    print("Evaluate inference time cost...")
    timer = m.module.time_evaluator("run", ctx, number=1, repeat=10)
    tcost = timer()
    prof_res = np.array(tcost.results) * 1000  # convert to millisecond
    print(
        "Mean inference time (std dev): %.2f ms (%.2f ms)"
        % (np.mean(prof_res), np.std(prof_res))
    )
```
This `remote` occupies a device the whole time, yet it plays no role in the autotuning itself. So my workaround is to comment out the code above to remove the `remote`, and it works.
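One way to read the workaround, as a minimal sketch of the first block. This reuses the tutorial's own variable names, is my edit rather than official tutorial code, and is not runnable on its own; I also assume the board already has a valid bitstream loaded, since `program_fpga` is no longer called here:

```python
# Workaround sketch: do not hold a long-lived RPC session during tuning.
# With a single PYNQ board, this session would occupy the only device and
# starve the autotuner's measure workers.
if env.TARGET != "sim":
    pass
    # remote = autotvm.measure.request_remote(
    #     env.TARGET, tracker_host, tracker_port, timeout=10000
    # )
    # vta.reconfig_runtime(remote)
    # vta.program_fpga(remote, bitstream=None)
else:
    remote = rpc.LocalSession()
```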
```
Extract tasks...
Extracted 10 conv2d tasks:
(1, 14, 14, 256, 512, 1, 1, 0, 0, 2, 2)
(1, 28, 28, 128, 256, 1, 1, 0, 0, 2, 2)
(1, 56, 56, 64, 128, 1, 1, 0, 0, 2, 2)
(1, 56, 56, 64, 64, 3, 3, 1, 1, 1, 1)
(1, 28, 28, 128, 128, 3, 3, 1, 1, 1, 1)
(1, 56, 56, 64, 128, 3, 3, 1, 1, 2, 2)
(1, 14, 14, 256, 256, 3, 3, 1, 1, 1, 1)
(1, 28, 28, 128, 256, 3, 3, 1, 1, 2, 2)
(1, 7, 7, 512, 512, 3, 3, 1, 1, 1, 1)
(1, 14, 14, 256, 512, 3, 3, 1, 1, 2, 2)
Tuning...
[Task  1/10]  Current/Best: 0.00/28.79 GFLOPS | Progress: (480/480)   | 306.61 s Done.
[Task  2/10]  Current/Best: 0.00/31.41 GFLOPS | Progress: (576/576)   | 389.47 s Done.
[Task  3/10]  Current/Best: 0.00/43.20 GFLOPS | Progress: (1000/1000) | 667.90 s Done.
[Task  4/10]  Current/Best: 0.00/46.37 GFLOPS | Progress: (1000/1000) | 564.08 s Done.
[Task  5/10]  Current/Best: 0.00/38.90 GFLOPS | Progress: (1000/1000) | 641.09 s Done.
[Task  6/10]  Current/Best: 0.00/44.39 GFLOPS | Progress: (1000/1000) | 560.03 s Done.
[Task  7/10]  Current/Best: 0.00/40.67 GFLOPS | Progress: (1000/1000) | 731.33 s Done.
[Task  8/10]  Current/Best: 0.00/9.58 GFLOPS  | Progress: (1000/1000) | 1046.03 s Done.
[Task  9/10]  Current/Best: 0.00/12.51 GFLOPS | Progress: (1000/1000) | 1276.48 s Done.
[Task 10/10]  Current/Best: 0.31/11.95 GFLOPS | Progress: (480/480)   | 619.91 s Done.
```
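For reference, the GFLOPS figures in the log can be sanity-checked against each task's FLOP count. Below is a small sketch; the tuple layout `(batch, height, width, in_ch, out_ch, k_h, k_w, pad_h, pad_w, stride_h, stride_w)` is my reading of the extracted workloads (they match ResNet-18's conv2d shapes), not something the tutorial states:

```python
def conv2d_flops(task):
    """Estimate FLOPs of a conv2d workload tuple (assumed layout, see above)."""
    n, h, w, cin, cout, kh, kw, ph, pw, sh, sw = task
    h_out = (h + 2 * ph - kh) // sh + 1
    w_out = (w + 2 * pw - kw) // sw + 1
    # 2 ops (multiply + add) per MAC
    return 2 * n * cout * h_out * w_out * cin * kh * kw

tasks = [
    (1, 14, 14, 256, 512, 1, 1, 0, 0, 2, 2),
    (1, 56, 56, 64, 64, 3, 3, 1, 1, 1, 1),
]
for t in tasks:
    print("%s -> %.3f GFLOPs" % (t, conv2d_flops(t) / 1e9))
```

Dividing a task's FLOP count by its achieved GFLOPS gives the expected per-run latency, which is a quick way to spot tasks the tuner handled poorly (e.g. tasks 8-10 above).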