Autotune not working with x86 AVX2 architecture

e0lithic · February 20, 2020, 7:44am

Compute Specifications

Ubuntu 16.04
Intel® Xeon® CPU E3-1275 v6 @ 3.80GHz
llvm - 9.0.1
tvm - ‘0.7.dev0’

Issue Description

I am trying to run the script tune_relay_x86.py with all the default configurations except target = "llvm -mcpu=core-avx2". However, I am still observing the following warnings for all convolution and dense layers.

Cannot find config for target=llvm -device=tracing, workload=('conv2d', (1, 128, 28, 28, 'float32'), (128, 128, 3, 3, 'float32'), (1, 1), (1, 1, 1, 1), (1, 1), 'NCHW' , 'float32'). A fallback configuration is used, which may bring great performance regression.

Even though these are merely warnings and the script executes successfully, the newly compiled models have slower inference times, so it seems that autotuning isn't having any effect.
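To see which targets and operators the fallback warnings refer to, the captured console output can be tallied. The sketch below is only an illustration: the regular expression is derived from the single warning quoted in this thread (other TVM versions may word it differently), and `summarize_fallback_warnings` is a hypothetical helper name, not part of TVM.

```python
import re
from collections import Counter

# Pattern derived from the warning text quoted above (an assumption:
# other TVM versions may format this message differently).
WARNING_RE = re.compile(
    r"Cannot find config for target=(?P<target>.+?), "
    r"workload=\((?P<workload>.*)\)\. A fallback"
)


def summarize_fallback_warnings(log_text):
    """Count fallback warnings per (target, operator name) pair."""
    counts = Counter()
    for line in log_text.splitlines():
        match = WARNING_RE.search(line)
        if match is None:
            continue
        # The first element of the workload tuple is the operator name.
        op_name = match.group("workload").split(",", 1)[0].strip().strip("'\"")
        counts[(match.group("target"), op_name)] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage: python script.py < captured_output.txt
    for (target, op_name), n in summarize_fallback_warnings(sys.stdin.read()).items():
        print(f"{n:4d}  target={target!r}  op={op_name}")
```

Run on the warning above, this reports the target as `llvm -device=tracing`, which is not the `llvm -mcpu=core-avx2` target that was set in the script.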

Please let me know where I am going wrong, or how to resolve the issue.

Thanks

sergiomatiz · February 20, 2020, 2:17pm

I am currently facing the same problem with that example (my CPU: Intel® Xeon® CPU E5-2620 v4 @ 2.10GHz). What I find interesting is that optimizing only one convolutional layer in a separate script, using the same options and task (topi_x86_conv2d_NCHWc, opt_lvl = 3, and llvm -mcpu=core-avx2) does not produce this error for me. I am wondering if this is related to the conversion done in the tutorial in the function “tune_kernels”:

def tune_kernels(tasks,
                 measure_option,
                 tuner='gridsearch',
                 early_stopping=None,
                 log_filename='tuning.log'):

    for i, tsk in enumerate(tasks):
        prefix = "[Task %2d/%2d] " % (i+1, len(tasks))

        # converting conv2d tasks to conv2d_NCHWc tasks
        op_name = tsk.workload[0]
        if op_name == 'conv2d':
            func_create = 'topi_x86_conv2d_NCHWc'
        elif op_name == 'depthwise_conv2d_nchw':
            func_create = 'topi_x86_depthwise_conv2d_NCHWc_from_nchw'
        else:
            raise ValueError("Tuning {} is not supported on x86".format(op_name))

        task = autotvm.task.create(func_create, args=tsk.args,
                                   target=target, template_key='direct')
        task.workload = tsk.workload

There have been reports of conversion issues in the past:

I am tagging @eqy since he may be familiar with this type of issue.

e0lithic · February 22, 2020, 5:40am

Tagging additional members who were involved in similar discussions on x86 autotuning. @kevinthesun @comaniac @apivovarov

heliqi · February 22, 2020, 7:29am

I am currently facing the same problem with that example.

I ran the script tune_relay_x86.py with all the default configurations except target = "llvm -mcpu=core-avx2", and modified the get_network function to use inceptionv3. It shows the same warning. The script runs, but auto-tuning does not appear to take effect.

Cannot find config for target=llvm -device=tracing, workload=('conv2d',xxx
Cannot find config for target=llvm -device=tracing, workload=('conv2d',xxx

kevinthesun · February 22, 2020, 8:41am

Are you comparing with the default schedule?

e0lithic · February 22, 2020, 10:25am

I compared the standard inference time on MXNet (CPU) against the TVM-compiled model. In both cases, inference was run from Python.

kevinthesun · February 24, 2020, 6:35pm

You can compare the tuned schedule with the default schedule to see whether the slowdown comes from the tuned schedule itself.
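A minimal timing harness for that comparison might look like the sketch below. It uses only the Python standard library and makes no TVM calls. `run_default` and `run_tuned` are hypothetical placeholders for zero-argument callables that run one inference on the model compiled without and with the tuning log applied, and the warm-up and repeat counts are arbitrary choices.

```python
import statistics
import time


def time_inference(run_once, warmup=3, repeats=10):
    """Return (mean_ms, std_ms) of wall-clock time over `repeats` calls."""
    # Warm-up calls are excluded from the statistics.
    for _ in range(warmup):
        run_once()
    samples_ms = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_once()
        samples_ms.append((time.perf_counter() - start) * 1000.0)
    return statistics.mean(samples_ms), statistics.pstdev(samples_ms)


def compare_schedules(run_default, run_tuned, warmup=3, repeats=10):
    """Time both callables under identical settings and report the ratio."""
    default_ms, default_std = time_inference(run_default, warmup, repeats)
    tuned_ms, tuned_std = time_inference(run_tuned, warmup, repeats)
    return {
        "default_ms": default_ms,
        "default_std_ms": default_std,
        "tuned_ms": tuned_ms,
        "tuned_std_ms": tuned_std,
        # Values above 1.0 mean the tuned build is faster.
        "speedup": default_ms / tuned_ms if tuned_ms > 0 else float("inf"),
    }
```

Timing both builds with the same input, warm-up, and repeat count keeps the comparison between the two TVM schedules, rather than between TVM and MXNet, which differ in more than the schedule.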