AutoScheduler failed to find a valid schedule

Hello,
I am trying to generate source code for an operator that involves conv2d_nchw and a gradient, using AutoScheduler. However, the search fails when the operator has a relatively larger shape, and when task.apply_best(log_file) is run, it reports that there is no valid schedule in the JSON file.

To be exact, the program works well when the layer looks like:

@auto_scheduler.register_workload
def conv2d_layer():
    x = tvm.te.placeholder((1, 64, 7, 7), name='x')
    w1 = tvm.te.placeholder((64, 64, 3, 3), name='w1')
    z1 = topi.nn.conv2d_nchw(x, w1, (1, 1), (1, 1), dilation=1, out_dtype="float32")
    [dw1] = tvm.te.gradient(z1, [w1])
    return [x, w1, dw1]

but will fail when it looks like:

@auto_scheduler.register_workload
def conv2d_layer():
    x = tvm.te.placeholder((1, 128, 7, 7), name='x')
    w1 = tvm.te.placeholder((128, 128, 3, 3), name='w1')
    z1 = topi.nn.conv2d_nchw(x, w1, (1, 1), (1, 1), dilation=1, out_dtype="float32")
    [dw1] = tvm.te.gradient(z1, [w1])
    return [x, w1, dw1]

The whole program looks like this:

import os

import numpy as np
import tvm
from tvm import te, auto_scheduler, topi
from tvm.topi.testing import conv2d_nchw_python

@auto_scheduler.register_workload
def conv2d_layer():
    x = tvm.te.placeholder((1, 128, 7, 7), name='x')
    w1 = tvm.te.placeholder((128, 128, 3, 3), name='w1')
    z1 = topi.nn.conv2d_nchw(x, w1, (1, 1), (1, 1), dilation=1, out_dtype="float32")
    [dw1] = tvm.te.gradient(z1, [w1])
    return [x, w1, dw1]

target = tvm.target.Target("cuda")

task = auto_scheduler.SearchTask(
    func=conv2d_layer, args=(), target=target
)


print("Computational DAG:")
print(task.compute_dag)

log_file = "conv2d.json"
measure_ctx = auto_scheduler.LocalRPCMeasureContext(min_repeat_ms=300)
tune_option = auto_scheduler.TuningOptions(
    num_measure_trials=1000,  # 1000 trials, as recommended for best performance
    runner=measure_ctx.runner,
    measure_callbacks=[auto_scheduler.RecordToFile(log_file)],
    verbose=2,
)
task.tune(tune_option)

sch, args = task.apply_best(log_file)

del measure_ctx

print("Lowered TIR:")
print(tvm.lower(sch, args, simple_mode=True))

print("CUDA source code:")
print(task.print_best(log_file, print_mode="cuda"))

The last part of the output is below:

==================================================
No: 1000        GFLOPS: 0.00 / 0.00     results: MeasureResult(error_type:RuntimeDeviceError, error_msg:Traceback (most recent call last):
  File "/root/nnfusion/artifacts/.deps/tvm-0.7/python/tvm/auto_scheduler/measure.py", line 1120, in _rpc_run
    random_fill(empty_array)
  File "/root/nnfusion/artifacts/.deps/tvm-0.7/python/tvm/_ffi/_ctypes/packed_func.
...
----------
An error occurred during the execution of TVM.
For more information, please see: https://tvm.apache.org/docs/errors.html
---------------------------------------------------------------
  Check failed: (code == RPCCode::kReturn) is false: code=1
, all_cost:13.82, Tstamp:1631297770.85)
==================================================
Placeholder: x, k
blockIdx.x nn.0@ff.0@yy.0@xx.0@ (0,14)
  threadIdx.x nn.2@ff.2@yy.2@xx.2@ (0,8)
    for rc.0 (0,8)
      for ax0@ax1@ax2@ax3@.0.0 (0,1152)
        threadIdx.x ax0@ax1@ax2@ax3@.0.1 (0,8)
          k.shared = ...
      for ax0@ax1@ax2@ax3@.0.0 (0,18)
        threadIdx.x ax0@ax1@ax2@ax3@.0.1 (0,8)
          vectorize ax0@ax1@ax2@ax3@.1 (0,3)
            pad_temp.shared = ...
      for rc.1 (0,16)
        for xx_c.3 (0,7)
          for ry.2 (0,3)
            for rx.2 (0,3)
              for ff_c.4 (0,8)
                compute.local = ...
    for ff.3 (0,8)
      for xx.3 (0,7)
        compute = ...
blockIdx.x ax0.0@ax1.0@ax2.0@ax3.0@ax4.0@ax5.0@ax6.0@ax7.0@ (0,1024)
  vthread ax0.1@ax1.1@ax2.1@ax3.1@ax4.1@ax5.1@ax6.1@ax7.1@ (0,14)
    threadIdx.x ax0.2@ax1.2@ax2.2@ax3.2@ax4.2@ax5.2@ax6.2@ax7.2@ (0,288)
      compute.k.grad.local auto_unroll: 1024
      for n0_n1_k2_shifted_shifted.0 (0,7)
        for n1_n2_k3_shifted_shifted.0 (0,7)
          for ax0@ax1@ax2@ax3@ax4@ax5@ax6@ax7@.0.0 (0,6)
            threadIdx.x ax0@ax1@ax2@ax3@ax4@ax5@ax6@ax7@.0.1 (0,288)
              compute.compute.grad.shared = ...
          for ax0@ax1@ax2@ax3@.0.0 (0,2)
            threadIdx.x ax0@ax1@ax2@ax3@.0.1 (0,288)
              pad_temp.d.shared = ...
          for ax3_c.3 (0,7)
            for ax4_c.3 (0,4)
              for ax4_c.4 (0,4)
                for ax5_c.4 (0,2)
                  compute.k.grad.local = ...
      for ax3.3 (0,7)
        for ax4.3 (0,16)
          for ax5.3 (0,2)
            compute.k.grad = ...

[18:16:10] /root/nnfusion/artifacts/.deps/tvm-0.7/src/auto_scheduler/measure.cc:299: Warning: Too many errors happened during tuning. Switching to debug mode.

Time elapsed for measurement: 426.86 s
----------------------------------------------------------------------
------------------------------  [ Done ]
----------------------------------------------------------------------
No valid state found in this search round. Check if it has traversed all of the search space.
/root/nnfusion/artifacts/.deps/anaconda3/lib/python3.6/site-packages/xgboost/training.py:17: UserWarning: Old style callback is deprecated.  See: https://xgboost.readthedocs.io/en/latest/python/callbacks.html
  warnings.warn(f'Old style callback is deprecated.  See: {link}', UserWarning)
MeasureInput with old format workload key ["conv2d_layer"] should be updated using the script from https://github.com/apache/tvm/pull/7317.
MeasureInput with old format workload key ["dense_layer"] should be updated using the script from https://github.com/apache/tvm/pull/7317.



Traceback (most recent call last):
  File "conv_layer_tuning_grad.py", line 38, in <module>
    sch, args = task.apply_best(log_file)
  File "/root/nnfusion/artifacts/.deps/tvm-0.7/python/tvm/auto_scheduler/search_task.py", line 522, in apply_best
    "Cannot find any valid schedule for %s in file %s" % (self.workload_key, log_file)
RuntimeError: Cannot find any valid schedule for ["conv2d_layer"] in file conv2d.json

I am wondering how to make it work. Can anyone help me?

Does anyone have any suggestions? I am hoping to get your advice! Thanks a lot!