Problem using Ansor: Did you forget to bind?

I am using Ansor to tune and get a schedule, but an error occurred when calling tvm.build(). The error message is below:

Did you forget to bind?
    Variable `tensor_2` is directly accessed by host memory (it is not contained in a thread environment or in the function arguments.
    Variable `conv0_weight` is directly accessed by host memory (it is not contained in a thread environment or in the function arguments.
    Variable `data` is directly accessed by host memory (it is not contained in a thread environment or in the function arguments.
File "/home/sun/gitDownload/tvm/src/tir/analysis/verify_memory.cc", line 202
RuntimeError: Memory verification failed with the following errors:
PrimFunc([data, conv0_weight, tensor_2]) attrs={"global_symbol": "default_function", "tir.noalias": (bool)1, "target": cuda -keys=cuda,gpu -max_num_threads=1024 -thread_warp_size=32}......

Here is my source code:

import tvm
from tvm import te, auto_scheduler

@auto_scheduler.register_workload
def fused_12():
    data = te.placeholder((1,3,224,224), name='data', dtype='float32')
    # Explicit zero-padding for a 7x7, stride-2 convolution on a 224x224 input.
    tensor_0 = te.compute((1,3,112,112,7,7), lambda n,c,h1,w1,kh,kw: te.if_then_else(te.all(-3+2*h1+kh>=0, -3+2*h1+kh<224, -3+kw+2*w1>=0, -3+kw+2*w1<224), data[n,c,-3+2*h1+kh,-3+kw+2*w1], tvm.tir.const(0, dtype='float32')), name='tensor_0')
    conv0_weight = te.placeholder((64,3,7,7), name='conv0_weight', dtype='float32')
    # tensor_1 = te.compute((1,64,3,112,112,7,7), lambda n,oc,c,h1,w1,kh,kw: tensor_0[n,c,h1,w1,kh,kw] * conv0_weight[oc,c,kh,kw], name='tensor_1',)
    c = te.reduce_axis((0,3), name='c')
    kh = te.reduce_axis((0,7), name='kh')
    kw = te.reduce_axis((0,7), name='kw')
    # Reduce over the input-channel and kernel axes to produce the conv output.
    tensor_2 = te.compute((1,64,112,112), lambda n,oc,h1,w1: te.sum(tensor_0[n,c,h1,w1,kh,kw] * conv0_weight[oc,c,kh,kw], axis=[c,kh,kw]), name='tensor_2')
    return [data, conv0_weight, tensor_2]

target = tvm.target.Target("cuda")
target_host = tvm.target.Target("llvm -mtriple=aarch64-linux-gnu")
tune_option = auto_scheduler.TuningOptions(
    num_measure_trials=10,
    measure_callbacks=[auto_scheduler.RecordToFile('fused_12.json')],
)
task = tvm.auto_scheduler.SearchTask(func=fused_12, args=(), target=target)
task.tune(tune_option)

sch, args = task.apply_best('fused_12.json')
fused_12_func = tvm.build(sch, args, target=target, target_host=target_host)

And my environment: TVM 0.8, Ubuntu 18.04.

Thanks a lot!


This error usually indicates that no schedule was applied when building the model, because the GPU doesn't have a valid default schedule. Specifically, if you see a warning like "Cannot find a schedule …" during the build, then no schedule was applied. If so, you should check whether Ansor has found any valid schedule and written it to the log file; otherwise, you may need to increase the trial number or check your workload again. A quick way to check the log is sketched below.
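
For reference, here is a minimal sketch of such a check using auto_scheduler.load_records (the public API for reading measurement logs); the log file name matches the one in the script above:

from tvm import auto_scheduler

# Count records whose measurement finished without error (error_no == 0);
# apply_best can only succeed if at least one such record exists.
valid = sum(1 for inp, res in auto_scheduler.load_records('fused_12.json')
            if res.error_no == 0)
print('valid records in log:', valid)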


I increased the trial number to 500, and most of the trials show this error message:

No: 500 GFLOPS: 0.00 / 0.00     results: MeasureResult(error_type:CompileHostError, error_msg:Traceback (most recent call last):
  File "/root/tvm/python/tvm/auto_scheduler/measure.py", line 629, in _timed_func
    func = build_module.build(sch, args, target=task.target)
  File "/root/tvm/python/tvm/driver/build_module.py", line 354, in build
    m
...
e.current.cuda_target_arch)
  File "/root/tvm/python/tvm/contrib/nvcc.py", line 71, in compile_cuda
    raise ValueError("arch(sm_xy) is not passed, and we cannot detect it from env")
ValueError: arch(sm_xy) is not passed, and we cannot detect it from env
, all_cost:0.05, Tstamp:1642757259.10)

And the corresponding final error message:

No valid state found in this search round. Check if it has traversed all of the search space.
MeasureInput with old format workload key ["fused_12"] should be updated using the script from https://github.com/apache/tvm/pull/7317.
Traceback (most recent call last):
  File "conv2d_test.py", line 45, in <module>
    sch, args = task.apply_best('fused_12.json')
  File "/root/tvm/python/tvm/auto_scheduler/search_task.py", line 522, in apply_best
    "Cannot find any valid schedule for %s in file %s" % (self.workload_key, log_file)
RuntimeError: Cannot find any valid schedule for ["fused_12"] in file fused_12.json

And I tried this script:

import tvm
print(tvm.gpu(0).exist)
print(tvm.gpu(0).compute_version)

which outputs:

True
7.5

And my environment:

nvidia-smi
Fri Jan 21 17:41:19 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.67       Driver Version: 418.67       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla T4            Off  | 00000000:21:01.0 Off |                    0 |
| N/A   36C    P0    27W /  70W |      0MiB / 15079MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Wed_Apr_24_19:10:27_PDT_2019
Cuda compilation tools, release 10.1, V10.1.168

I can't figure it out and don't know how to fix it. Any tips?

This error explains the reason. In v0.8, nvcc.py has the following:

    if arch is None:
        if nd.cuda(0).exist:
            # auto detect the compute arch argument
            arch = "sm_" + "".join(nd.cuda(0).compute_version.split("."))
        else:
            raise ValueError("arch(sm_xy) is not passed, and we cannot detect it from env")

So it means nd.cuda(0).exist returned False. I'm not sure why, but you should be able to specify the arch in the target to work around this issue. Instead of just cuda, try a target like cuda -arch=sm_XY, where XY=75 for a Tesla T4, as in the sketch below.
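
A minimal sketch of that change against the script above, assuming v0.8's target syntax (where the cuda target accepts an -arch option):

# Pass the compute arch explicitly so compile_cuda does not have to
# auto-detect it; sm_75 is the Tesla T4 (compute capability 7.5).
target = tvm.target.Target("cuda -arch=sm_75")
task = tvm.auto_scheduler.SearchTask(func=fused_12, args=(), target=target)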

I passed "cuda -arch=sm_75" to tvm.target.Target(), and this error still happened on every trial. I also tried another GPU whose arch is sm_35 and passed "sm_35" to it, and the same error occurred. As I mentioned, I ran the following script

from tvm.runtime import ndarray as nd
print(nd.gpu(0).exist)

which correctly prints True.

I’m confused.

After updating TVM to the latest version (0.9), it works fine. Though I don't know the reason, it is solved.
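
For future readers, a hedged sketch of the equivalent build call on newer TVM, where the host target is attached to the device target instead of being passed as a separate target_host argument:

# Newer TVM deprecates the separate target_host argument;
# the host is carried inside the Target object instead.
target = tvm.target.Target("cuda", host="llvm -mtriple=aarch64-linux-gnu")
fused_12_func = tvm.build(sch, args, target=target)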