[VTA] + [AutoTVM] runtime/compiler errors in transform.py

Hi,

during the tuning of a single Conv2D layer with AutoTVM on an Ultra96 FPGA, most of the schedules fail with a build or runtime error. Only ~10-20% of the schedules are actually executed. The schedules that do run produce correct results (verified with the check_correctness flag of the RPCRunner), so this seems to be a software/setup issue.

My VTA setup uses the default Ultra96 configuration, the Conv2D configuration and schedule from vta/top, and TVM version 0.7.dev1.

The following errors are the most common ones:

...python/vta/transform.py, line 463, in _get_2d_pattern: RuntimeError: Scope[local.inp_buffer]: cannot detect 2d pattern with elem_block=16: shape=[2, 8, 17, 16], strides=[16384, 512, 16, 1]

...python/vta/transform.py, line 537, in _inject_copy: ValueError: Do not support pad on the innermost block
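For reference, this is my rough reading of the constraint behind the first error (a simplified sketch of my own, not the actual _get_2d_pattern code): the DMA copy has to be expressible as a 2D pattern, i.e. y_size rows of x_size contiguous elem_blocks with a fixed stride between rows, and the shape/strides above apparently cannot be folded that way:

def fold_2d(shape, strides, elem_block=16):
    # innermost dim must be exactly one contiguous elem_block
    if shape[-1] != elem_block or strides[-1] != 1:
        return None
    # fold inner dims outwards as long as they stay contiguous -> x_size blocks per row
    x_size, i = 1, len(shape) - 2
    while i >= 0 and strides[i] == x_size * elem_block:
        x_size *= shape[i]
        i -= 1
    if i < 0:
        return (1, x_size, x_size)          # fully contiguous: a single row
    if strides[i] % elem_block:
        return None
    x_stride, y_size = strides[i] // elem_block, shape[i]
    i -= 1
    # every remaining outer dim must fold into the row dimension
    while i >= 0:
        if strides[i] != y_size * x_stride * elem_block:
            return None                     # <- [2, 8, 17, 16] fails here
        y_size *= shape[i]
        i -= 1
    return (y_size, x_size, x_stride)

print(fold_2d([2, 8, 17, 16], [16384, 512, 16, 1]))  # None -> "cannot detect 2d pattern"
print(fold_2d([8, 17, 16], [512, 16, 1]))            # (8, 17, 32) -> a legal 2D copy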

My Question:

  • Are these errors caused only by schedules that cannot work on VTA, or is there some kind of issue with my setup (see below)?

Thanks for your time :slight_smile:

I adapted the Conv2d optimization tutorial to use AutoTVM:

import logging

from tvm import autotvm
import vta

env = vta.get_env()  # default ultra96 configuration

batch_size = 1
height = 32
width = 32
in_channels = 256
out_channels = 256
kernel_h = 3
kernel_w = 3
pad_h = 1
pad_w = 1
stride_h = 1
stride_w = 1
#
# buffer definitions from the tutorial
# register_vta_tuning_task() from tune_relay_vta.py
#
device = "vta"
target = env.target if device == "vta" else env.target_vta_cpu
task = autotvm.task.create("conv2d_packed.vta",
                           args=(data, kernel, (stride_h, stride_w), (pad_h, pad_w), (1, 1),
                                 "NCHW1n16c", env.acc_dtype),
                           target=target, target_host=env.target_host)

print(task.config_space)
logging.getLogger('autotvm').setLevel(logging.DEBUG)
logging.getLogger('autotvm').addHandler(logging.FileHandler("./single-conv-debug.log"))
tracker_host = "192.168.2.1"
tracker_port = 9190
measure_option = autotvm.measure_option(
    builder=autotvm.LocalBuilder(),
    runner=autotvm.RPCRunner(env.TARGET, host=tracker_host, port=tracker_port,
                             n_parallel=1, number=5, repeat=2, check_correctness=True))
tuner = autotvm.tuner.GATuner(task)
tuner.tune(n_trial=100,
           measure_option=measure_option,
           callbacks=[autotvm.callback.log_to_file('single_conv.log')])
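
To quantify how many of the 100 trials actually ran, I count the records in the tuning log afterwards (a small sketch, assuming the single_conv.log written by the callback above):

from tvm import autotvm

# error_no == 0 means the trial built and ran; anything else is a
# build / runtime / correctness error and the record carries no timing
records = list(autotvm.record.load_from_file("single_conv.log"))
valid = sum(1 for inp, res in records if res.error_no == 0)
print("valid schedules: %d / %d" % (valid, len(records)))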

Is this normal behavior? Can anyone share experiences with AutoTVM and VTA? @tqchen @thierry?

Hi @dsr91, thanks for the question. Indeed, the rationale behind using the autotuner was that it would eliminate invalid schedules during the schedule search. On VTA, many schedules lead to invalid code generation, either because the generated code does not match the patterns VTA expects (e.g. for conv2d), or for some other reason (runtime checks for valid schedules, etc.). Therefore many of the schedules get discarded.

In general I think there could be a much smarter way to approach the schedule search, e.g. by applying more constraints up front, but for that experiment on FPGAs it did the trick.
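
The discarded trials simply end up in the log with a non-zero error code and are never selected afterwards; once tuning is done you can extract and apply the best valid schedule as usual (a minimal sketch, assuming the single_conv.log from your script):

from tvm import autotvm

# keep only the best valid record per workload ...
autotvm.record.pick_best("single_conv.log", "single_conv_best.log")

# ... and use it whenever the matching task is compiled
with autotvm.apply_history_best("single_conv_best.log"):
    pass  # e.g. vta.build_config() + relay.build(...) as in tune_relay_vta.py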

Ok, good to know this is the intended behaviour. Thanks for the response :slight_smile:

But the VTA conv2d schedule is defined by vta.top.schedule_conv2d_packed, and this schedule template passes the test case in vta/tests/python/integration/test_benchmark_topi_conv2d.py.

So, using the same schedule template, why does tuning conv2d through the Relay build (tune_relay_vta.py) produce these errors, while the topi conv2d VTA tuning is OK?