Hello:
I am trying to generate the source code of an operator that involves conv2d_nchw and its gradient, using AutoScheduler. However, the search fails when the operator has a relatively larger shape, and when task.apply_best(log_file) is run, it tells me that there is no valid schedule in the JSON file.
To be exact, the program works well when the layer looks like:
@auto_scheduler.register_workload
def conv2d_layer():
    x = tvm.te.placeholder((1, 64, 7, 7), name="x")
    w1 = tvm.te.placeholder((64, 64, 3, 3), name="w1")
    z1 = topi.nn.conv2d_nchw(x, w1, (1, 1), (1, 1), dilation=1, out_dtype="float32")
    [dw1] = tvm.te.gradient(z1, [w1])
    return [x, w1, dw1]
but fails when it looks like:
@auto_scheduler.register_workload
def conv2d_layer():
    x = tvm.te.placeholder((1, 128, 7, 7), name="x")
    w1 = tvm.te.placeholder((128, 128, 3, 3), name="w1")
    z1 = topi.nn.conv2d_nchw(x, w1, (1, 1), (1, 1), dilation=1, out_dtype="float32")
    [dw1] = tvm.te.gradient(z1, [w1])
    return [x, w1, dw1]
The whole program looks like this:
import os
import numpy as np
import tvm
from tvm import te, auto_scheduler, topi
from tvm.topi.testing import conv2d_nchw_python

@auto_scheduler.register_workload
def conv2d_layer():
    x = tvm.te.placeholder((1, 128, 7, 7), name="x")
    w1 = tvm.te.placeholder((128, 128, 3, 3), name="w1")
    z1 = topi.nn.conv2d_nchw(x, w1, (1, 1), (1, 1), dilation=1, out_dtype="float32")
    [dw1] = tvm.te.gradient(z1, [w1])
    return [x, w1, dw1]

target = tvm.target.Target("cuda")
task = auto_scheduler.SearchTask(func=conv2d_layer, args=(), target=target)
print("Computational DAG:")
print(task.compute_dag)

log_file = "conv2d.json"
measure_ctx = auto_scheduler.LocalRPCMeasureContext(min_repeat_ms=300)
tune_option = auto_scheduler.TuningOptions(
    num_measure_trials=1000,  # change this to 1000 to achieve the best performance
    runner=measure_ctx.runner,
    measure_callbacks=[auto_scheduler.RecordToFile(log_file)],
    verbose=2,
)
task.tune(tune_option)
sch, args = task.apply_best(log_file)
del measure_ctx

print("Lowered TIR:")
print(tvm.lower(sch, args, simple_mode=True))
print("CUDA source code:")
print(task.print_best(log_file, print_mode="cuda"))
The last part of the output is below:
==================================================
No: 1000 GFLOPS: 0.00 / 0.00 results: MeasureResult(error_type:RuntimeDeviceError, error_msg:Traceback (most recent call last):
  File "/root/nnfusion/artifacts/.deps/tvm-0.7/python/tvm/auto_scheduler/measure.py", line 1120, in _rpc_run
    random_fill(empty_array)
  File "/root/nnfusion/artifacts/.deps/tvm-0.7/python/tvm/_ffi/_ctypes/packed_func.
...
----------
An error occurred during the execution of TVM.
For more information, please see: https://tvm.apache.org/docs/errors.html
---------------------------------------------------------------
Check failed: (code == RPCCode::kReturn) is false: code=1
, all_cost:13.82, Tstamp:1631297770.85)
==================================================
Placeholder: x, k
blockIdx.x nn.0@ff.0@yy.0@xx.0@ (0,14)
threadIdx.x nn.2@ff.2@yy.2@xx.2@ (0,8)
for rc.0 (0,8)
for ax0@ax1@ax2@ax3@.0.0 (0,1152)
threadIdx.x ax0@ax1@ax2@ax3@.0.1 (0,8)
k.shared = ...
for ax0@ax1@ax2@ax3@.0.0 (0,18)
threadIdx.x ax0@ax1@ax2@ax3@.0.1 (0,8)
vectorize ax0@ax1@ax2@ax3@.1 (0,3)
pad_temp.shared = ...
for rc.1 (0,16)
for xx_c.3 (0,7)
for ry.2 (0,3)
for rx.2 (0,3)
for ff_c.4 (0,8)
compute.local = ...
for ff.3 (0,8)
for xx.3 (0,7)
compute = ...
blockIdx.x ax0.0@ax1.0@ax2.0@ax3.0@ax4.0@ax5.0@ax6.0@ax7.0@ (0,1024)
vthread ax0.1@ax1.1@ax2.1@ax3.1@ax4.1@ax5.1@ax6.1@ax7.1@ (0,14)
threadIdx.x ax0.2@ax1.2@ax2.2@ax3.2@ax4.2@ax5.2@ax6.2@ax7.2@ (0,288)
compute.k.grad.local auto_unroll: 1024
for n0_n1_k2_shifted_shifted.0 (0,7)
for n1_n2_k3_shifted_shifted.0 (0,7)
for ax0@ax1@ax2@ax3@ax4@ax5@ax6@ax7@.0.0 (0,6)
threadIdx.x ax0@ax1@ax2@ax3@ax4@ax5@ax6@ax7@.0.1 (0,288)
compute.compute.grad.shared = ...
for ax0@ax1@ax2@ax3@.0.0 (0,2)
threadIdx.x ax0@ax1@ax2@ax3@.0.1 (0,288)
pad_temp.d.shared = ...
for ax3_c.3 (0,7)
for ax4_c.3 (0,4)
for ax4_c.4 (0,4)
for ax5_c.4 (0,2)
compute.k.grad.local = ...
for ax3.3 (0,7)
for ax4.3 (0,16)
for ax5.3 (0,2)
compute.k.grad = ...
[18:16:10] /root/nnfusion/artifacts/.deps/tvm-0.7/src/auto_scheduler/measure.cc:299: Warning: Too many errors happened during tuning. Switching to debug mode.
Time elapsed for measurement: 426.86 s
----------------------------------------------------------------------
------------------------------ [ Done ]
----------------------------------------------------------------------
No valid state found in this search round. Check if it has traversed all of the search space.
/root/nnfusion/artifacts/.deps/anaconda3/lib/python3.6/site-packages/xgboost/training.py:17: UserWarning: Old style callback is deprecated. See: https://xgboost.readthedocs.io/en/latest/python/callbacks.html
  warnings.warn(f'Old style callback is deprecated. See: {link}', UserWarning)
MeasureInput with old format workload key ["conv2d_layer"] should be updated using the script from https://github.com/apache/tvm/pull/7317.
MeasureInput with old format workload key ["dense_layer"] should be updated using the script from https://github.com/apache/tvm/pull/7317.
Traceback (most recent call last):
File "conv_layer_tuning_grad.py", line 38, in <module>
sch, args = task.apply_best(log_file)
  File "/root/nnfusion/artifacts/.deps/tvm-0.7/python/tvm/auto_scheduler/search_task.py", line 522, in apply_best
    "Cannot find any valid schedule for %s in file %s" % (self.workload_key, log_file)
RuntimeError: Cannot find any valid schedule for ["conv2d_layer"] in file conv2d.json
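As far as I can tell, the RuntimeError just means that every record written to conv2d.json carries a nonzero error code, so apply_best has nothing to pick from. I used the following sketch to confirm this; it assumes the record layout I see in the log file, where each line is a JSON object whose "r" field holds [costs, error_no, all_cost, timestamp] (my own helper, not a TVM API):

```python
import json

def count_valid_records(log_file):
    """Count records in an auto_scheduler JSON log whose error_no is 0.

    Assumes each line is a JSON object whose "r" field is
    [costs, error_no, all_cost, timestamp].
    """
    valid = total = 0
    with open(log_file) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            total += 1
            record = json.loads(line)
            error_no = record["r"][1]  # 0 means the measurement succeeded
            if error_no == 0:
                valid += 1
    return valid, total
```

In my case it reports 0 valid records out of 1000, matching the RuntimeDeviceError shown above for every trial.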
I am wondering how to make this work. Can anyone help me?