AutoTVM failure on GPU. Could not find any valid schedule for task Task(func_name=conv2d_nchw.cuda...)

sqchao · May 11, 2021, 2:54pm

When I use AutoTVM to tune a PyTorch model (resnet18), the tuning process fails.

The content of file /tmp/tvm_tuning_errors_95o934k0.log

Note: when the target is changed to CPU, tuning runs without problems.

Question 1: Is this a bug in TVM?

Question 2: The content of the above log file is incomplete. Should the logging be improved?

The runnable script:

import tvm
from tvm import relay
from tvm.contrib import graph_runtime
from tvm import autotvm
from tvm.autotvm.tuner import XGBTuner, GATuner, RandomTuner, GridSearchTuner

import torch
import torchvision.models as models
import numpy as np
import os

def tune_kernels(tasks,
                 measure_option,
                 tuner='xgb',         # or 'ga', 'random', 'gridsearch'
                 early_stopping=None,
                 log_filename='tuning.log'):

    for i, task in enumerate(tasks):
        prefix = "[Task %2d/%2d] " % (i+1, len(tasks))

        # create tuner
        if tuner == 'xgb' or tuner == 'xgb-rank':
            tuner_obj = XGBTuner(task, loss_type='rank')
        elif tuner == 'ga':
            tuner_obj = GATuner(task, pop_size=50)
        elif tuner == 'random':
            tuner_obj = RandomTuner(task)
        elif tuner == 'gridsearch':
            tuner_obj = GridSearchTuner(task)
        else:
            raise ValueError("Invalid tuner: " + tuner)

        # do tuning
        # n_trial=len(task.config_space)
        n_trial = 1  # set n_trial = 1 to reduce running time
        tuner_obj.tune(n_trial=n_trial,
                       early_stopping=early_stopping,
                       measure_option=measure_option,
                       callbacks=[
                           autotvm.callback.progress_bar(n_trial, prefix=prefix),
                           autotvm.callback.log_to_file(log_filename)])

model = models.resnet18(pretrained=True).eval()
batch_size = 1
input_shape = [batch_size, 3, 224, 224]
input_data = torch.randn(input_shape)
scripted_model = torch.jit.trace(model, input_data).eval()



from PIL import Image

img_url = "https://github.com/dmlc/mxnet.js/blob/main/data/cat.png?raw=true"
from tvm.contrib.download import download_testdata

img_path = download_testdata(img_url, "cat.png", module="data")
img = Image.open(img_path).resize((224, 224))

# Preprocess the image and convert to tensor
from torchvision import transforms

my_preprocess = transforms.Compose(
    [
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    ]
)
img = my_preprocess(img)
img = np.expand_dims(img, 0)

# Replicate the single image along the batch axis to reach batch_size
img = np.repeat(img, batch_size, axis=0)

input_name = "input0"
shape_list = [(input_name, img.shape)]
mod, params = relay.frontend.from_pytorch(scripted_model, shape_list)

# target = 'llvm'
# dev = tvm.cpu(0)
target = 'cuda'
dev = tvm.gpu(0)

log_file = 'temp_tune.log'



tuning_option = {
        'log_filename': log_file,
        'tuner': 'random',
        'early_stopping': None,
        'measure_option': autotvm.measure_option(
            builder=autotvm.LocalBuilder(),
            runner=autotvm.LocalRunner(number=1, repeat=1,
                                       min_repeat_ms=0, enable_cpu_cache_flush=True),
        ),
    }
tasks = autotvm.task.extract_from_program(mod["main"], target=target,
                                          params=params,
                                          ops=(relay.op.get("nn.conv2d"),))

if not os.path.exists('./' + log_file):
    print('run tuning kernel tasks:')
    tune_kernels(tasks, **tuning_option)
# -------------------------------- end tune ------------------------
with autotvm.apply_history_best(log_file):
    with tvm.transform.PassContext(opt_level=3):
        lib = relay.build(mod, target=target, params=params)

dtype = "float32"
m = graph_runtime.GraphModule(lib["default"](dev))
m.set_input(input_name, tvm.nd.array(img.astype(dtype)))
m.run()
tvm_output = m.get_output(0).asnumpy()

comaniac · May 11, 2021, 5:26pm

The error log is truncated so that it can be sent back to the host via RPC. For GPU, have you tried a larger number of trials? A given GPU schedule may not fit your GPU, so it's possible to find no valid schedule if you only try one.
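
One way to act on this suggestion, offered as a sketch rather than a recommended setting, is to cap the trial count at a fixed budget while never asking for more trials than the config space contains. The helper name `pick_n_trial` and the budget of 1000 are hypothetical and not part of AutoTVM; 844800 is the space size that the progress bar reports for Task 1 later in this thread.

```python
def pick_n_trial(config_space_size, budget=1000):
    """Return a trial count: at most `budget`, never more than the space size.

    `budget` is an illustrative value, not a tuned recommendation.
    """
    if config_space_size <= 0:
        raise ValueError("config space must contain at least one candidate")
    return min(budget, config_space_size)


# Space size reported for Task 1 in this thread: capped by the budget.
large_space_trials = pick_n_trial(844800)
# A hypothetical small space: every candidate can be tried.
small_space_trials = pick_n_trial(300)

print(large_space_trials, small_space_trials)
```

In the script above, the result would take the place of the hard-coded `n_trial = 1`, for example `n_trial = pick_n_trial(len(task.config_space))`.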

sqchao · May 12, 2021, 5:04am

Thanks for your reply.

With n_trial = 100, the error still occurs.
With n_trial = len(task.config_space), some new debug information is printed.
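
The debug output in this post mixes two failure kinds: `error_no = 1` appears together with `InstantiationError` ("Skipped because of invalid gpu kernel"), and `error_no = 4` appears together with a `RuntimeError` raised through the RPC endpoint. A rough way to see which kind dominates is to count the `error_no` values in the captured output. The snippet below is a minimal sketch using only the standard library; `tally_error_numbers` is a hypothetical helper, and `sample` is an abbreviated stand-in for the real lines, not actual AutoTVM output.

```python
import re
from collections import Counter

# Matches "error_no = 4" as printed in the DEBUG lines; spacing is tolerated.
ERROR_NO_RE = re.compile(r"error_no\s*=\s*(\d+)")


def tally_error_numbers(log_text):
    """Count how often each error_no value appears in the given text."""
    return Counter(int(n) for n in ERROR_NO_RE.findall(log_text))


# Abbreviated stand-in for three DEBUG lines (details elided with "...").
sample = (
    "DEBUG: autotvm: No: 161 ... error_no = 4, ...\n"
    "DEBUG: autotvm: No: 162 ... error_no = 1, ...\n"
    "DEBUG: autotvm: No: 163 ... error_no = 4, ...\n"
)

counts = tally_error_numbers(sample)
for error_no, count in sorted(counts.items()):
    print(f"error_no={error_no}: {count}")
```

Running the same function over the full captured output would show whether most candidates are rejected before they are built (`error_no = 1`) or fail when they run on the device (`error_no = 4`).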

Part of the debug info:

[Task 1 / 15] Current / Best: 0.00 / 0.00 GFLOPS | Progress: (160 / 844800) | 186.98 s
WARNING: autotvm: Too many errors happen in the tuning.Now is in debug mode
DEBUG: autotvm: No: 161 GFLOPS: 0.00 / 0.00 result: MeasureResult(costs = (RuntimeError('Traceback (most recent call last):\n  [bt] (5) /workplace/software/tvm/tvm8/build/libtvm.so(TVMFuncCall+0x10d) [0x7fe559f5e97d]\n  [bt] (4) /workplace/software/tvm/tvm8/build/libtvm.so(+0x4c3c3ca) [0x7fe55a05a3ca]\n  [bt] (3) /workplace/software/tvm/tvm8/build/libtvm.so(tvm::runtime::RPCWrappedFunc::operator()(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const+0xadf) [0x7fe55a0678ef]\n  [bt] (2) /workplace/software/tvm/tvm8/build/libtvm.so(tvm::runtime::RPCClientSession::CallFunc(void*, TVMValue const*, int const*, int, std::function<void (tvm::runtime::TVMArgs)> const&)+0x65) [0x7fe55a04c3b5]\n  [bt] (1) /workplace/software/tvm/tvm8/build/libtvm.so(tvm::runtime::RPCEndpoint::CallFunc(void*, TVMValue const*, int const*, int, std::function<void (tvm::runtime::TVMArgs)>)+0x594) [0x7fe55a032914]\n  [bt] (0) /workplace/software/tvm/tvm8/build/libtvm.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x11c) [0x7fe557a88cdc]\n  File "/workplace/software/tvm/tvm8/src/runtime/rpc/rpc_endpoint.cc", line 807\nTVMError: Check f'), ), error_no = 4, all_cost = 2.359102964401245, timestamp = 1620783122.138785)[('tile_f', [-1, 2, 1, 16]), ('tile_y', [-1, 7, 1, 1]), ('tile_x', [-1, 1, 7, 1]), ('tile_rc', [-1, 2]), ('tile_ry', [-1, 1]), ('tile_rx', [-1, 3]), ('auto_unroll_max_step', 1500), ('unroll_explicit', 0)], None, 357665
DEBUG: autotvm: No: 162 GFLOPS: 0.00 / 0.00 result: MeasureResult(costs = (InstantiationError('Traceback (most recent call last):\n  [bt] (8) /workplace/software/tvm/tvm8/build/libtvm.so(TVMFuncCall+0x10d) [0x7fe559f5e97d]\n  [bt] (7) /workplace/software/tvm/tvm8/build/libtvm.so(+0x2d3e246) [0x7fe55815c246]\n  [bt] (6) /workplace/software/tvm/tvm8/build/libtvm.so(tvm::transform::Pass::operator()(tvm::IRModule) const+0x217) [0x7fe557e81bb7]\n  [bt] (5) /workplace/software/tvm/tvm8/build/libtvm.so(tvm::transform::PassNode::operator()(tvm::IRModule) const+0xa0) [0x7fe557e9b460]\n  [bt] (4) /workplace/software/tvm/tvm8/build/libtvm.so(tvm::transform::SequentialNode::operator()(tvm::IRModule, tvm::transform::PassContext const&) const+0x913) [0x7fe5581569a3]\n  [bt] (3) /workplace/software/tvm/tvm8/build/libtvm.so(tvm::transform::Pass::operator()(tvm::IRModule, tvm::transform::PassContext const&) const+0x223) [0x7fe558166ca3]\n  [bt] (2) /workplace/software/tvm/tvm8/build/libtvm.so(tvm::tir::transform::PrimFuncPassNode::operator()(tvm::IRModule, tvm::transform::PassContext const&) const+0x90a) [0x7fe55889daba]\n  [bt] (1) /workplace/software/tvm/tvm8/build/libtvm.so(tvm::runtime::TVMRetValue tvm::runtime::PackedFunc::operator()<tvm::tir::PrimFunc, tvm::IRModule, tvm::transform::PassContext>(tvm::tir::PrimFunc&&, tvm::IRModule&&, tvm::transform::PassContext&&) const+0x177) [0x7fe5588a48a7]\n  [bt] (0) /workplace/software/tvm/tvm8/build/libtvm.so(+0x4b42e6b) [0x7fe559f60e6b]\n  File "tvm/_ffi/_cython/./packed_func.pxi", line 55, in tvm._ffi._cy3.core.tvm_callback\n  File "/workplace/software/tvm/tvm8/python/tvm/autotvm/measure/measure_methods.py", line 747, in verify_pass\n    raise InstantiationError("Skipped because of invalid gpu kernel")\ntvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel'), ), error_no = 1, all_cost = 0.059586286544799805, timestamp = 1620783111.542556)[('tile_f', [-1, 4, 2, 8]), ('tile_y', [-1, 1, 7, 1]), ('tile_x', [-1, 
1, 1, 1]), ('tile_rc', [-1, 128]), ('tile_ry', [-1, 1]), ('tile_rx', [-1, 3]), ('auto_unroll_max_step', 1500), ('unroll_explicit', 0)], None, 377225
DEBUG: autotvm: No: 163 GFLOPS: 0.00 / 0.00 result: MeasureResult(costs = (RuntimeError('Traceback (most recent call last):\n  [bt] (5) /workplace/software/tvm/tvm8/build/libtvm.so(TVMFuncCall+0x10d) [0x7fe559f5e97d]\n  [bt] (4) /workplace/software/tvm/tvm8/build/libtvm.so(+0x4c3c3ca) [0x7fe55a05a3ca]\n  [bt] (3) /workplace/software/tvm/tvm8/build/libtvm.so(tvm::runtime::RPCWrappedFunc::operator()(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const+0xadf) [0x7fe55a0678ef]\n  [bt] (2) /workplace/software/tvm/tvm8/build/libtvm.so(tvm::runtime::RPCClientSession::CallFunc(void*, TVMValue const*, int const*, int, std::function<void (tvm::runtime::TVMArgs)> const&)+0x65) [0x7fe55a04c3b5]\n  [bt] (1) /workplace/software/tvm/tvm8/build/libtvm.so(tvm::runtime::RPCEndpoint::CallFunc(void*, TVMValue const*, int const*, int, std::function<void (tvm::runtime::TVMArgs)>)+0x594) [0x7fe55a032914]\n  [bt] (0) /workplace/software/tvm/tvm8/build/libtvm.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x11c) [0x7fe557a88cdc]\n  File "/workplace/software/tvm/tvm8/src/runtime/rpc/rpc_endpoint.cc", line 807\nTVMError: Check f'), ), error_no = 4, all_cost = 2.060210943222046, timestamp = 1620783123.3094442)[('tile_f', [-1, 4, 16, 8]), ('tile_y', [-1, 1, 1, 1]), ('tile_x', [-1, 1, 1, 1]), ('tile_rc', [-1, 2]), ('tile_ry', [-1, 1]), ('tile_rx', [-1, 3]), ('auto_unroll_max_step', 512), ('unroll_explicit', 0)], None, 214880
DEBUG: autotvm: No: 164 GFLOPS: 0.00 / 0.00 result: MeasureResult(costs = (RuntimeError('Traceback (most recent call last):\n  [bt] (5) /workplace/software/tvm/tvm8/build/libtvm.so(TVMFuncCall+0x10d) [0x7fe559f5e97d]\n  [bt] (4) /workplace/software/tvm/tvm8/build/libtvm.so(+0x4c3c3ca) [0x7fe55a05a3ca]\n  [bt] (3) /workplace/software/tvm/tvm8/build/libtvm.so(tvm::runtime::RPCWrappedFunc::operator()(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const+0xadf) [0x7fe55a0678ef]\n  [bt] (2) /workplace/software/tvm/tvm8/build/libtvm.so(tvm::runtime::RPCClientSession::CallFunc(void*, TVMValue const*, int const*, int, std::function<void (tvm::runtime::TVMArgs)> const&)+0x65) [0x7fe55a04c3b5]\n  [bt] (1) /workplace/software/tvm/tvm8/build/libtvm.so(tvm::runtime::RPCEndpoint::CallFunc(void*, TVMValue const*, int const*, int, std::function<void (tvm::runtime::TVMArgs)>)+0x594) [0x7fe55a032914]\n  [bt] (0) /workplace/software/tvm/tvm8/build/libtvm.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x11c) [0x7fe557a88cdc]\n  File "/workplace/software/tvm/tvm8/src/runtime/rpc/rpc_endpoint.cc", line 807\nTVMError: Check f'), ), error_no = 4, all_cost = 1.7798535823822021, timestamp = 1620783124.7946866)[('tile_f', [-1, 1, 8, 16]), ('tile_y', [-1, 1, 1, 1]), ('tile_x', [-1, 1, 1, 7]), ('tile_rc', [-1, 1]), ('tile_ry', [-1, 3]), ('tile_rx', [-1, 3]), ('auto_unroll_max_step', 0), ('unroll_explicit', 0)], None, 108419
DEBUG: autotvm: No: 165 GFLOPS: 0.00 / 0.00 result: MeasureResult(costs = (RuntimeError('Traceback (most recent call last):\n  [bt] (5) /workplace/software/tvm/tvm8/build/libtvm.so(TVMFuncCall+0x10d) [0x7fe559f5e97d]\n  [bt] (4) /workplace/software/tvm/tvm8/build/libtvm.so(+0x4c3c3ca) [0x7fe55a05a3ca]\n  [bt] (3) /workplace/software/tvm/tvm8/build/libtvm.so(tvm::runtime::RPCWrappedFunc::operator()(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const+0xadf) [0x7fe55a0678ef]\n  [bt] (2) /workplace/software/tvm/tvm8/build/libtvm.so(tvm::runtime::RPCClientSession::CallFunc(void*, TVMValue const*, int const*, int, std::function<void (tvm::runtime::TVMArgs)> const&)+0x65) [0x7fe55a04c3b5]\n  [bt] (1) /workplace/software/tvm/tvm8/build/libtvm.so(tvm::runtime::RPCEndpoint::CallFunc(void*, TVMValue const*, int const*, int, std::function<void (tvm::runtime::TVMArgs)>)+0x594) [0x7fe55a032914]\n  [bt] (0) /workplace/software/tvm/tvm8/build/libtvm.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x11c) [0x7fe557a88cdc]\n  File "/workplace/software/tvm/tvm8/src/runtime/rpc/rpc_endpoint.cc", line 807\nTVMError: Check f'), ), error_no = 4, all_cost = 3.2494418621063232, timestamp = 1620783126.4023278)[('tile_f', [-1, 8, 4, 1]), ('tile_y', [-1, 7, 1, 1]), ('tile_x', [-1, 7, 1, 1]), ('tile_rc', [-1, 64]), ('tile_ry', [-1, 3]), ('tile_rx', [-1, 1]), ('auto_unroll_max_step', 512), ('unroll_explicit', 1)], None, 620642
DEBUG: autotvm: No: 166 GFLOPS: 0.00 / 0.00 result: MeasureResult(costs = (InstantiationError('Traceback (most recent call last):\n  [bt] (8) /workplace/software/tvm/tvm8/build/libtvm.so(TVMFuncCall+0x10d) [0x7fe559f5e97d]\n  [bt] (7) /workplace/software/tvm/tvm8/build/libtvm.so(+0x2d3e246) [0x7fe55815c246]\n  [bt] (6) /workplace/software/tvm/tvm8/build/libtvm.so(tvm::transform::Pass::operator()(tvm::IRModule) const+0x217) [0x7fe557e81bb7]\n  [bt] (5) /workplace/software/tvm/tvm8/build/libtvm.so(tvm::transform::PassNode::operator()(tvm::IRModule) const+0xa0) [0x7fe557e9b460]\n  [bt] (4) /workplace/software/tvm/tvm8/build/libtvm.so(tvm::transform::SequentialNode::operator()(tvm::IRModule, tvm::transform::PassContext const&) const+0x913) [0x7fe5581569a3]\n  [bt] (3) /workplace/software/tvm/tvm8/build/libtvm.so(tvm::transform::Pass::operator()(tvm::IRModule, tvm::transform::PassContext const&) const+0x223) [0x7fe558166ca3]\n  [bt] (2) /workplace/software/tvm/tvm8/build/libtvm.so(tvm::tir::transform::PrimFuncPassNode::operator()(tvm::IRModule, tvm::transform::PassContext const&) const+0x90a) [0x7fe55889daba]\n  [bt] (1) /workplace/software/tvm/tvm8/build/libtvm.so(tvm::runtime::TVMRetValue tvm::runtime::PackedFunc::operator()<tvm::tir::PrimFunc, tvm::IRModule, tvm::transform::PassContext>(tvm::tir::PrimFunc&&, tvm::IRModule&&, tvm::transform::PassContext&&) const+0x177) [0x7fe5588a48a7]\n  [bt] (0) /workplace/software/tvm/tvm8/build/libtvm.so(+0x4b42e6b) [0x7fe559f60e6b]\n  File "tvm/_ffi/_cython/./packed_func.pxi", line 55, in tvm._ffi._cy3.core.tvm_callback\n  File "/workplace/software/tvm/tvm8/python/tvm/autotvm/measure/measure_methods.py", line 747, in verify_pass\n    raise InstantiationError("Skipped because of invalid gpu kernel")\ntvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel'), ), error_no = 1, all_cost = 0.06744527816772461, timestamp = 1620783112.4252615)[('tile_f', [-1, 1, 16, 32]), ('tile_y', [-1, 1, 1, 7]), ('tile_x', 
[-1, 1, 1, 1]), ('tile_rc', [-1, 64]), ('tile_ry', [-1, 3]), ('tile_rx', [-1, 3]), ('auto_unroll_max_step', 512), ('unroll_explicit', 1)], None, 690779
DEBUG: autotvm: No: 167 GFLOPS: 0.00 / 0.00 result: MeasureResult(costs = (InstantiationError('Traceback (most recent call last):\n  [bt] (8) /workplace/software/tvm/tvm8/build/libtvm.so(TVMFuncCall+0x10d) [0x7fe559f5e97d]\n  [bt] (7) /workplace/software/tvm/tvm8/build/libtvm.so(+0x2d3e246) [0x7fe55815c246]\n  [bt] (6) /workplace/software/tvm/tvm8/build/libtvm.so(tvm::transform::Pass::operator()(tvm::IRModule) const+0x217) [0x7fe557e81bb7]\n  [bt] (5) /workplace/software/tvm/tvm8/build/libtvm.so(tvm::transform::PassNode::operator()(tvm::IRModule) const+0xa0) [0x7fe557e9b460]\n  [bt] (4) /workplace/software/tvm/tvm8/build/libtvm.so(tvm::transform::SequentialNode::operator()(tvm::IRModule, tvm::transform::PassContext const&) const+0x913) [0x7fe5581569a3]\n  [bt] (3) /workplace/software/tvm/tvm8/build/libtvm.so(tvm::transform::Pass::operator()(tvm::IRModule, tvm::transform::PassContext const&) const+0x223) [0x7fe558166ca3]\n  [bt] (2) /workplace/software/tvm/tvm8/build/libtvm.so(tvm::tir::transform::PrimFuncPassNode::operator()(tvm::IRModule, tvm::transform::PassContext const&) const+0x90a) [0x7fe55889daba]\n  [bt] (1) /workplace/software/tvm/tvm8/build/libtvm.so(tvm::runtime::TVMRetValue tvm::runtime::PackedFunc::operator()<tvm::tir::PrimFunc, tvm::IRModule, tvm::transform::PassContext>(tvm::tir::PrimFunc&&, tvm::IRModule&&, tvm::transform::PassContext&&) const+0x177) [0x7fe5588a48a7]\n  [bt] (0) /workplace/software/tvm/tvm8/build/libtvm.so(+0x4b42e6b) [0x7fe559f60e6b]\n  File "tvm/_ffi/_cython/./packed_func.pxi", line 55, in tvm._ffi._cy3.core.tvm_callback\n  File "/workplace/software/tvm/tvm8/python/tvm/autotvm/measure/measure_methods.py", line 747, in verify_pass\n    raise InstantiationError("Skipped because of invalid gpu kernel")\ntvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel'), ), error_no = 1, all_cost = 0.0637214183807373, timestamp = 1620783112.4254885)[('tile_f', [-1, 32, 4, 1]), ('tile_y', [-1, 1, 7, 1]), ('tile_x', [-1, 
7, 1, 1]), ('tile_rc', [-1, 256]), ('tile_ry', [-1, 1]), ('tile_rx', [-1, 3]), ('auto_unroll_max_step', 0), ('unroll_explicit', 0)], None, 99904
DEBUG: autotvm: No: 168 GFLOPS: 0.00 / 0.00 result: MeasureResult(costs = (InstantiationError('Traceback (most recent call last):\n  [bt] (8) /workplace/software/tvm/tvm8/build/libtvm.so(TVMFuncCall+0x10d) [0x7fe559f5e97d]\n  [bt] (7) /workplace/software/tvm/tvm8/build/libtvm.so(+0x2d3e246) [0x7fe55815c246]\n  [bt] (6) /workplace/software/tvm/tvm8/build/libtvm.so(tvm::transform::Pass::operator()(tvm::IRModule) const+0x217) [0x7fe557e81bb7]\n  [bt] (5) /workplace/software/tvm/tvm8/build/libtvm.so(tvm::transform::PassNode::operator()(tvm::IRModule) const+0xa0) [0x7fe557e9b460]\n  [bt] (4) /workplace/software/tvm/tvm8/build/libtvm.so(tvm::transform::SequentialNode::operator()(tvm::IRModule, tvm::transform::PassContext const&) const+0x913) [0x7fe5581569a3]\n  [bt] (3) /workplace/software/tvm/tvm8/build/libtvm.so(tvm::transform::Pass::operator()(tvm::IRModule, tvm::transform::PassContext const&) const+0x223) [0x7fe558166ca3]\n  [bt] (2) /workplace/software/tvm/tvm8/build/libtvm.so(tvm::tir::transform::PrimFuncPassNode::operator()(tvm::IRModule, tvm::transform::PassContext const&) const+0x90a) [0x7fe55889daba]\n  [bt] (1) /workplace/software/tvm/tvm8/build/libtvm.so(tvm::runtime::TVMRetValue tvm::runtime::PackedFunc::operator()<tvm::tir::PrimFunc, tvm::IRModule, tvm::transform::PassContext>(tvm::tir::PrimFunc&&, tvm::IRModule&&, tvm::transform::PassContext&&) const+0x177) [0x7fe5588a48a7]\n  [bt] (0) /workplace/software/tvm/tvm8/build/libtvm.so(+0x4b42e6b) [0x7fe559f60e6b]\n  File "tvm/_ffi/_cython/./packed_func.pxi", line 55, in tvm._ffi._cy3.core.tvm_callback\n  File "/workplace/software/tvm/tvm8/python/tvm/autotvm/measure/measure_methods.py", line 747, in verify_pass\n    raise InstantiationError("Skipped because of invalid gpu kernel")\ntvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel'), ), error_no = 1, all_cost = 0.0635826587677002, timestamp = 1620783112.4256418)[('tile_f', [-1, 8, 2, 16]), ('tile_y', [-1, 1, 7, 1]), ('tile_x', [-1, 
1, 1, 7]), ('tile_rc', [-1, 256]), ('tile_ry', [-1, 3]), ('tile_rx', [-1, 3]), ('auto_unroll_max_step', 512), ('unroll_explicit', 1)], None, 700213
DEBUG: autotvm: No: 169 GFLOPS: 0.00 / 0.00 result: MeasureResult(costs = (InstantiationError('Traceback (most recent call last):\n  [bt] (8) /workplace/software/tvm/tvm8/build/libtvm.so(TVMFuncCall+0x10d) [0x7fe559f5e97d]\n  [bt] (7) /workplace/software/tvm/tvm8/build/libtvm.so(+0x2d3e246) [0x7fe55815c246]\n  [bt] (6) /workplace/software/tvm/tvm8/build/libtvm.so(tvm::transform::Pass::operator()(tvm::IRModule) const+0x217) [0x7fe557e81bb7]\n  [bt] (5) /workplace/software/tvm/tvm8/build/libtvm.so(tvm::transform::PassNode::operator()(tvm::IRModule) const+0xa0) [0x7fe557e9b460]\n  [bt] (4) /workplace/software/tvm/tvm8/build/libtvm.so(tvm::transform::SequentialNode::operator()(tvm::IRModule, tvm::transform::PassContext const&) const+0x913) [0x7fe5581569a3]\n  [bt] (3) /workplace/software/tvm/tvm8/build/libtvm.so(tvm::transform::Pass::operator()(tvm::IRModule, tvm::transform::PassContext const&) const+0x223) [0x7fe558166ca3]\n  [bt] (2) /workplace/software/tvm/tvm8/build/libtvm.so(tvm::tir::transform::PrimFuncPassNode::operator()(tvm::IRModule, tvm::transform::PassContext const&) const+0x90a) [0x7fe55889daba]\n  [bt] (1) /workplace/software/tvm/tvm8/build/libtvm.so(tvm::runtime::TVMRetValue tvm::runtime::PackedFunc::operator()<tvm::tir::PrimFunc, tvm::IRModule, tvm::transform::PassContext>(tvm::tir::PrimFunc&&, tvm::IRModule&&, tvm::transform::PassContext&&) const+0x177) [0x7fe5588a48a7]\n  [bt] (0) /workplace/software/tvm/tvm8/build/libtvm.so(+0x4b42e6b) [0x7fe559f60e6b]\n  File "tvm/_ffi/_cython/./packed_func.pxi", line 55, in tvm._ffi._cy3.core.tvm_callback\n  File "/workplace/software/tvm/tvm8/python/tvm/autotvm/measure/measure_methods.py", line 747, in verify_pass\n    raise InstantiationError("Skipped because of invalid gpu kernel")\ntvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel'), ), error_no = 1, all_cost = 0.08813166618347168, timestamp = 1620783112.4258099)[('tile_f', [-1, 32, 4, 1]), ('tile_y', [-1, 1, 1, 1]), ('tile_x', 
[-1, 7, 1, 1]), ('tile_rc', [-1, 256]), ('tile_ry', [-1, 1]), ('tile_rx', [-1, 3]), ('auto_unroll_max_step', 512), ('unroll_explicit', 1)], None, 662664
DEBUG: autotvm: No: 170 GFLOPS: 0.00 / 0.00 result: MeasureResult(costs = (RuntimeError('Traceback (most recent call last):\n  [bt] (5) /workplace/software/tvm/tvm8/build/libtvm.so(TVMFuncCall+0x10d) [0x7fe559f5e97d]\n  [bt] (4) /workplace/software/tvm/tvm8/build/libtvm.so(+0x4c3c3ca) [0x7fe55a05a3ca]\n  [bt] (3) /workplace/software/tvm/tvm8/build/libtvm.so(tvm::runtime::RPCWrappedFunc::operator()(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const+0xadf) [0x7fe55a0678ef]\n  [bt] (2) /workplace/software/tvm/tvm8/build/libtvm.so(tvm::runtime::RPCClientSession::CallFunc(void*, TVMValue const*, int const*, int, std::function<void (tvm::runtime::TVMArgs)> const&)+0x65) [0x7fe55a04c3b5]\n  [bt] (1) /workplace/software/tvm/tvm8/build/libtvm.so(tvm::runtime::RPCEndpoint::CallFunc(void*, TVMValue const*, int const*, int, std::function<void (tvm::runtime::TVMArgs)>)+0x594) [0x7fe55a032914]\n  [bt] (0) /workplace/software/tvm/tvm8/build/libtvm.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x11c) [0x7fe557a88cdc]\n  File "/workplace/software/tvm/tvm8/src/runtime/rpc/rpc_endpoint.cc", line 807\nTVMError: Check f'), ), error_no = 4, all_cost = 1.749293327331543, timestamp = 1620783127.9215028)[('tile_f', [-1, 2, 2, 4]), ('tile_y', [-1, 1, 1, 1]), ('tile_x', [-1, 1, 1, 1]), ('tile_rc', [-1, 128]), ('tile_ry', [-1, 3]), ('tile_rx', [-1, 1]), ('auto_unroll_max_step', 0), ('unroll_explicit', 1)], None, 482349
DEBUG: autotvm: No: 171 GFLOPS: 0.00 / 0.00 result: MeasureResult(costs = (InstantiationError('Traceback (most recent call last):\n  [bt] (8) /workplace/software/tvm/tvm8/build/libtvm.so(TVMFuncCall+0x10d) [0x7fe559f5e97d]\n  [bt] (7) /workplace/software/tvm/tvm8/build/libtvm.so(+0x2d3e246) [0x7fe55815c246]\n  [bt] (6) /workplace/software/tvm/tvm8/build/libtvm.so(tvm::transform::Pass::operator()(tvm::IRModule) const+0x217) [0x7fe557e81bb7]\n  [bt] (5) /workplace/software/tvm/tvm8/build/libtvm.so(tvm::transform::PassNode::operator()(tvm::IRModule) const+0xa0) [0x7fe557e9b460]\n  [bt] (4) /workplace/software/tvm/tvm8/build/libtvm.so(tvm::transform::SequentialNode::operator()(tvm::IRModule, tvm::transform::PassContext const&) const+0x913) [0x7fe5581569a3]\n  [bt] (3) /workplace/software/tvm/tvm8/build/libtvm.so(tvm::transform::Pass::operator()(tvm::IRModule, tvm::transform::PassContext const&) const+0x223) [0x7fe558166ca3]\n  [bt] (2) /workplace/software/tvm/tvm8/build/libtvm.so(tvm::tir::transform::PrimFuncPassNode::operator()(tvm::IRModule, tvm::transform::PassContext const&) const+0x90a) [0x7fe55889daba]\n  [bt] (1) /workplace/software/tvm/tvm8/build/libtvm.so(tvm::runtime::TVMRetValue tvm::runtime::PackedFunc::operator()<tvm::tir::PrimFunc, tvm::IRModule, tvm::transform::PassContext>(tvm::tir::PrimFunc&&, tvm::IRModule&&, tvm::transform::PassContext&&) const+0x177) [0x7fe5588a48a7]\n  [bt] (0) /workplace/software/tvm/tvm8/build/libtvm.so(+0x4b42e6b) [0x7fe559f60e6b]\n  File "tvm/_ffi/_cython/./packed_func.pxi", line 55, in tvm._ffi._cy3.core.tvm_callback\n  File "/workplace/software/tvm/tvm8/python/tvm/autotvm/measure/measure_methods.py", line 747, in verify_pass\n    raise InstantiationError("Skipped because of invalid gpu kernel")\ntvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel'), ), error_no = 1, all_cost = 0.05807065963745117, timestamp = 1620783112.4261537)[('tile_f', [-1, 8, 8, 8]), ('tile_y', [-1, 1, 1, 1]), ('tile_x', [-1, 
1, 7, 1]), ('tile_rc', [-1, 32]), ('tile_ry', [-1, 1]), ('tile_rx', [-1, 3]), ('auto_unroll_max_step', 0), ('unroll_explicit', 0)], None, 89917
DEBUG: autotvm: No: 172 GFLOPS: 0.00 / 0.00 result: MeasureResult(costs = (InstantiationError('Traceback (most recent call last):\n  [bt] (8) /workplace/software/tvm/tvm8/build/libtvm.so(TVMFuncCall+0x10d) [0x7fe559f5e97d]\n  [bt] (7) /workplace/software/tvm/tvm8/build/libtvm.so(+0x2d3e246) [0x7fe55815c246]\n  [bt] (6) /workplace/software/tvm/tvm8/build/libtvm.so(tvm::transform::Pass::operator()(tvm::IRModule) const+0x217) [0x7fe557e81bb7]\n  [bt] (5) /workplace/software/tvm/tvm8/build/libtvm.so(tvm::transform::PassNode::operator()(tvm::IRModule) const+0xa0) [0x7fe557e9b460]\n  [bt] (4) /workplace/software/tvm/tvm8/build/libtvm.so(tvm::transform::SequentialNode::operator()(tvm::IRModule, tvm::transform::PassContext const&) const+0x913) [0x7fe5581569a3]\n  [bt] (3) /workplace/software/tvm/tvm8/build/libtvm.so(tvm::transform::Pass::operator()(tvm::IRModule, tvm::transform::PassContext const&) const+0x223) [0x7fe558166ca3]\n  [bt] (2) /workplace/software/tvm/tvm8/build/libtvm.so(tvm::tir::transform::PrimFuncPassNode::operator()(tvm::IRModule, tvm::transform::PassContext const&) const+0x90a) [0x7fe55889daba]\n  [bt] (1) /workplace/software/tvm/tvm8/build/libtvm.so(tvm::runtime::TVMRetValue tvm::runtime::PackedFunc::operator()<tvm::tir::PrimFunc, tvm::IRModule, tvm::transform::PassContext>(tvm::tir::PrimFunc&&, tvm::IRModule&&, tvm::transform::PassContext&&) const+0x177) [0x7fe5588a48a7]\n  [bt] (0) /workplace/software/tvm/tvm8/build/libtvm.so(+0x4b42e6b) [0x7fe559f60e6b]\n  File "tvm/_ffi/_cython/./packed_func.pxi", line 55, in tvm._ffi._cy3.core.tvm_callback\n  File "/workplace/software/tvm/tvm8/python/tvm/autotvm/measure/measure_methods.py", line 747, in verify_pass\n    raise InstantiationError("Skipped because of invalid gpu kernel")\ntvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel'), ), error_no = 1, all_cost = 0.15351152420043945, timestamp = 1620783112.4263077)[('tile_f', [-1, 64, 1, 4]), ('tile_y', [-1, 7, 1, 1]), ('tile_x', 
[-1, 1, 1, 7]), ('tile_rc', [-1, 16]), ('tile_ry', [-1, 1]), ('tile_rx', [-1, 3]), ('auto_unroll_max_step', 1500), ('unroll_explicit', 1)], None, 791446
DEBUG: autotvm: No: 173 GFLOPS: 0.00 / 0.00 result: MeasureResult(costs = (RuntimeError('Traceback (most recent call last):\n  [bt] (5) /workplace/software/tvm/tvm8/build/libtvm.so(TVMFuncCall+0x10d) [0x7fe559f5e97d]\n  [bt] (4) /workplace/software/tvm/tvm8/build/libtvm.so(+0x4c3c3ca) [0x7fe55a05a3ca]\n  [bt] (3) /workplace/software/tvm/tvm8/build/libtvm.so(tvm::runtime::RPCWrappedFunc::operator()(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const+0xadf) [0x7fe55a0678ef]\n  [bt] (2) /workplace/software/tvm/tvm8/build/libtvm.so(tvm::runtime::RPCClientSession::CallFunc(void*, TVMValue const*, int const*, int, std::function<void (tvm::runtime::TVMArgs)> const&)+0x65) [0x7fe55a04c3b5]\n  [bt] (1) /workplace/software/tvm/tvm8/build/libtvm.so(tvm::runtime::RPCEndpoint::CallFunc(void*, TVMValue const*, int const*, int, std::function<void (tvm::runtime::TVMArgs)>)+0x594) [0x7fe55a032914]\n  [bt] (0) /workplace/software/tvm/tvm8/build/libtvm.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x11c) [0x7fe557a88cdc]\n  File "/workplace/software/tvm/tvm8/src/runtime/rpc/rpc_endpoint.cc", line 807\nTVMError: Check f'), ), error_no = 4, all_cost = 1.7525591850280762, timestamp = 1620783129.3650928)[('tile_f', [-1, 2, 32, 1]), ('tile_y', [-1, 1, 1, 7]), ('tile_x', [-1, 1, 1, 7]), ('tile_rc', [-1, 1]), ('tile_ry', [-1, 1]), ('tile_rx', [-1, 1]), ('auto_unroll_max_step', 0), ('unroll_explicit', 0)], None, 3341
DEBUG: autotvm: No: 174 GFLOPS: 0.00 / 0.00 result: MeasureResult(costs = (InstantiationError('Traceback (most recent call last):\n  [bt] (8) /workplace/software/tvm/tvm8/build/libtvm.so(TVMFuncCall+0x10d) [0x7fe559f5e97d]\n  [bt] (7) /workplace/software/tvm/tvm8/build/libtvm.so(+0x2d3e246) [0x7fe55815c246]\n  [bt] (6) /workplace/software/tvm/tvm8/build/libtvm.so(tvm::transform::Pass::operator()(tvm::IRModule) const+0x217) [0x7fe557e81bb7]\n  [bt] (5) /workplace/software/tvm/tvm8/build/libtvm.so(tvm::transform::PassNode::operator()(tvm::IRModule) const+0xa0) [0x7fe557e9b460]\n  [bt] (4) /workplace/software/tvm/tvm8/build/libtvm.so(tvm::transform::SequentialNode::operator()(tvm::IRModule, tvm::transform::PassContext const&) const+0x913) [0x7fe5581569a3]\n  [bt] (3) /workplace/software/tvm/tvm8/build/libtvm.so(tvm::transform::Pass::operator()(tvm::IRModule, tvm::transform::PassContext const&) const+0x223) [0x7fe558166ca3]\n  [bt] (2) /workplace/software/tvm/tvm8/build/libtvm.so(tvm::tir::transform::PrimFuncPassNode::operator()(tvm::IRModule, tvm::transform::PassContext const&) const+0x90a) [0x7fe55889daba]\n  [bt] (1) /workplace/software/tvm/tvm8/build/libtvm.so(tvm::runtime::TVMRetValue tvm::runtime::PackedFunc::operator()<tvm::tir::PrimFunc, tvm::IRModule, tvm::transform::PassContext>(tvm::tir::PrimFunc&&, tvm::IRModule&&, tvm::transform::PassContext&&) const+0x177) [0x7fe5588a48a7]\n  [bt] (0) /workplace/software/tvm/tvm8/build/libtvm.so(+0x4b42e6b) [0x7fe559f60e6b]\n  File "tvm/_ffi/_cython/./packed_func.pxi", line 55, in tvm._ffi._cy3.core.tvm_callback\n  File "/workplace/software/tvm/tvm8/python/tvm/autotvm/measure/measure_methods.py", line 747, in verify_pass\n    raise InstantiationError("Skipped because of invalid gpu kernel")\ntvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel'), ), error_no = 1, all_cost = 0.07019519805908203, timestamp = 1620783112.4266315)[('tile_f', [-1, 1, 1, 256]), ('tile_y', [-1, 1, 1, 1]), ('tile_x', 
[-1, 7, 1, 1]), ('tile_rc', [-1, 512]), ('tile_ry', [-1, 3]), ('tile_rx', [-1, 3]), ('auto_unroll_max_step', 512), ('unroll_explicit', 1)], None, 701576
DEBUG: autotvm: No: 175 GFLOPS: 0.00 / 0.00 result: MeasureResult(costs = (InstantiationError('Traceback (most recent call last):\n  [bt] (8) /workplace/software/tvm/tvm8/build/libtvm.so(TVMFuncCall+0x10d) [0x7fe559f5e97d]\n  [bt] (7) /workplace/software/tvm/tvm8/build/libtvm.so(+0x2d3e246) [0x7fe55815c246]\n  [bt] (6) /workplace/software/tvm/tvm8/build/libtvm.so(tvm::transform::Pass::operator()(tvm::IRModule) const+0x217) [0x7fe557e81bb7]\n  [bt] (5) /workplace/software/tvm/tvm8/build/libtvm.so(tvm::transform::PassNode::operator()(tvm::IRModule) const+0xa0) [0x7fe557e9b460]\n  [bt] (4) /workplace/software/tvm/tvm8/build/libtvm.so(tvm::transform::SequentialNode::operator()(tvm::IRModule, tvm::transform::PassContext const&) const+0x913) [0x7fe5581569a3]\n  [bt] (3) /workplace/software/tvm/tvm8/build/libtvm.so(tvm::transform::Pass::operator()(tvm::IRModule, tvm::transform::PassContext const&) const+0x223) [0x7fe558166ca3]\n  [bt] (2) /workplace/software/tvm/tvm8/build/libtvm.so(tvm::tir::transform::PrimFuncPassNode::operator()(tvm::IRModule, tvm::transform::PassContext const&) const+0x90a) [0x7fe55889daba]\n  [bt] (1) /workplace/software/tvm/tvm8/build/libtvm.so(tvm::runtime::TVMRetValue tvm::runtime::PackedFunc::operator()<tvm::tir::PrimFunc, tvm::IRModule, tvm::transform::PassContext>(tvm::tir::PrimFunc&&, tvm::IRModule&&, tvm::transform::PassContext&&) const+0x177) [0x7fe5588a48a7]\n  [bt] (0) /workplace/software/tvm/tvm8/build/libtvm.so(+0x4b42e6b) [0x7fe559f60e6b]\n  File "tvm/_ffi/_cython/./packed_func.pxi", line 55, in tvm._ffi._cy3.core.tvm_callback\n  File "/workplace/software/tvm/tvm8/python/tvm/autotvm/measure/measure_methods.py", line 747, in verify_pass\n    raise InstantiationError("Skipped because of invalid gpu kernel")\ntvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel'), ), error_no = 1, all_cost = 0.05889439582824707, timestamp = 1620783112.4268005)[('tile_f', [-1, 16, 2, 16]), ('tile_y', [-1, 1, 7, 1]), ('tile_x', 
[-1, 7, 1, 1]), ('tile_rc', [-1, 512]), ('tile_ry', [-1, 3]), ('tile_rx', [-1, 1]), ('auto_unroll_max_step', 1500), ('unroll_explicit', 1)], None, 772374
DEBUG: autotvm: No: 176 GFLOPS: 0.00 / 0.00 result: MeasureResult(costs = (InstantiationError('Traceback (most recent call last):\n  [bt] (8) /workplace/software/tvm/tvm8/build/libtvm.so(TVMFuncCall+0x10d) [0x7fe559f5e97d]\n  [bt] (7) /workplace/software/tvm/tvm8/build/libtvm.so(+0x2d3e246) [0x7fe55815c246]\n  [bt] (6) /workplace/software/tvm/tvm8/build/libtvm.so(tvm::transform::Pass::operator()(tvm::IRModule) const+0x217) [0x7fe557e81bb7]\n  [bt] (5) /workplace/software/tvm/tvm8/build/libtvm.so(tvm::transform::PassNode::operator()(tvm::IRModule) const+0xa0) [0x7fe557e9b460]\n  [bt] (4) /workplace/software/tvm/tvm8/build/libtvm.so(tvm::transform::SequentialNode::operator()(tvm::IRModule, tvm::transform::PassContext const&) const+0x913) [0x7fe5581569a3]\n  [bt] (3) /workplace/software/tvm/tvm8/build/libtvm.so(tvm::transform::Pass::operator()(tvm::IRModule, tvm::transform::PassContext const&) const+0x223) [0x7fe558166ca3]\n  [bt] (2) /workplace/software/tvm/tvm8/build/libtvm.so(tvm::tir::transform::PrimFuncPassNode::operator()(tvm::IRModule, tvm::transform::PassContext const&) const+0x90a) [0x7fe55889daba]\n  [bt] (1) /workplace/software/tvm/tvm8/build/libtvm.so(tvm::runtime::TVMRetValue tvm::runtime::PackedFunc::operator()<tvm::tir::PrimFunc, tvm::IRModule, tvm::transform::PassContext>(tvm::tir::PrimFunc&&, tvm::IRModule&&, tvm::transform::PassContext&&) const+0x177) [0x7fe5588a48a7]\n  [bt] (0) /workplace/software/tvm/tvm8/build/libtvm.so(+0x4b42e6b) [0x7fe559f60e6b]\n  File "tvm/_ffi/_cython/./packed_func.pxi", line 55, in tvm._ffi._cy3.core.tvm_callback\n  File "/workplace/software/tvm/tvm8/python/tvm/autotvm/measure/measure_methods.py", line 747, in verify_pass\n    raise InstantiationError("Skipped because of invalid gpu kernel")\ntvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel'), ), error_no = 1, all_cost = 0.05487465858459473, timestamp = 1620783112.4269478)[('tile_f', [-1, 2, 1, 128]), ('tile_y', [-1, 1, 7, 1]), ('tile_x', 
[-1, 7, 1, 1]), ('tile_rc', [-1, 16]), ('tile_ry', [-1, 3]), ('tile_rx', [-1, 3]), ('auto_unroll_max_step', 1500), ('unroll_explicit', 0)], None, 402811
DEBUG: autotvm: No: 177 GFLOPS: 0.00 / 0.00 result: MeasureResult(costs = (RuntimeError('Traceback (most recent call last):\n  [bt] (5) /workplace/software/tvm/tvm8/build/libtvm.so(TVMFuncCall+0x10d) [0x7fe559f5e97d]\n  [bt] (4) /workplace/software/tvm/tvm8/build/libtvm.so(+0x4c3c3ca) [0x7fe55a05a3ca]\n  [bt] (3) /workplace/software/tvm/tvm8/build/libtvm.so(tvm::runtime::RPCWrappedFunc::operator()(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const+0xadf) [0x7fe55a0678ef]\n  [bt] (2) /workplace/software/tvm/tvm8/build/libtvm.so(tvm::runtime::RPCClientSession::CallFunc(void*, TVMValue const*, int const*, int, std::function<void (tvm::runtime::TVMArgs)> const&)+0x65) [0x7fe55a04c3b5]\n  [bt] (1) /workplace/software/tvm/tvm8/build/libtvm.so(tvm::runtime::RPCEndpoint::CallFunc(void*, TVMValue const*, int const*, int, std::function<void (tvm::runtime::TVMArgs)>)+0x594) [0x7fe55a032914]\n  [bt] (0) /workplace/software/tvm/tvm8/build/libtvm.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x11c) [0x7fe557a88cdc]\n  File "/workplace/software/tvm/tvm8/src/runtime/rpc/rpc_endpoint.cc", line 807\nTVMError: Check f'), ), error_no = 4, all_cost = 1.8380303382873535, timestamp = 1620783130.8207612)[('tile_f', [-1, 2, 1, 16]), ('tile_y', [-1, 1, 7, 1]), ('tile_x', [-1, 1, 7, 1]), ('tile_rc', [-1, 1]), ('tile_ry', [-1, 1]), ('tile_rx', [-1, 3]), ('auto_unroll_max_step', 512), ('unroll_explicit', 0)], None, 213565

The error may be related to the message in the log above, "Skipped because of invalid gpu kernel", but I cannot understand it.

GPU: GeForce GTX 1080 Ti

comaniac · May 12, 2021, 4:28pm

InstantiationError can be ignored. The problem is the RuntimeError. According to the traceback, it failed here:

https://github.com/apache/tvm/blob/main/src/runtime/rpc/rpc_endpoint.cc#L805

  code = HandleUntilReturnEvent(true, encode_return);
  ICHECK(code == RPCCode::kReturn) << "code=" << RPCCodeToString(code);
}


void RPCEndpoint::CopyToRemote(void* from_bytes, DLTensor* to, uint64_t nbytes) {
  std::lock_guard<std::mutex> lock(mutex_);
  RPCCode code = RPCCode::kCopyToRemote;


  uint64_t tensor_total_size_bytes = static_cast<uint64_t>(GetDataSize(*to));
  ICHECK_LE(to->byte_offset + nbytes, tensor_total_size_bytes)
      << "CopyToRemote: overflow in tensor size: (byte_offset=" << to->byte_offset
      << ", nbytes=" << nbytes << ", tensor_total_size=" << tensor_total_size_bytes << ")";


  uint64_t overhead = RemoteCopyCalculatePacketOverheadSize(to, code, nbytes);
  uint64_t packet_nbytes = overhead + nbytes;


  handler_->Write(packet_nbytes);
  handler_->Write(code);
  RPCReference::SendDLTensor(handler_, to);
  handler_->Write(nbytes);

It indicates that the tensor size is too large to be sent via RPC.
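To separate the two failure kinds in a long debug log, a small script along these lines may help. It is a minimal sketch and not part of TVM: the name `summarize_failures` is hypothetical, and it assumes each trial is printed on one line containing `costs = (SomeError(` and `error_no = N`, as in the excerpt above.

```python
import re
from collections import Counter

# Matches the exception class and the error_no in one AutoTVM debug line, e.g.
# "... MeasureResult(costs = (RuntimeError('...'), ), error_no = 4, ..."
_FAILED_TRIAL = re.compile(r"costs = \((\w+)\(.*?error_no = (\d+)")


def summarize_failures(log_text):
    """Count failed trials by (exception name, error_no)."""
    counts = Counter()
    for line in log_text.splitlines():
        if "MeasureResult" not in line:
            continue
        match = _FAILED_TRIAL.search(line)
        if match:
            counts[(match.group(1), int(match.group(2)))] += 1
    return counts
```

Entries with `InstantiationError` are the skipped configurations; anything else, such as the `RuntimeError` with `error_no = 4` above, is the part worth investigating.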

sqchao · May 17, 2021, 4:33am

batch_size has already been set to the minimum value of 1 (only a single picture is used as input), and I can't control the tensor size in the model. Is there an alternative way to solve this problem?

comaniac · May 17, 2021, 4:56am

(1, 3, 224, 224) is a very common tensor size and it should not be a problem. You probably need to check for RPC or buffer allocation issues. For example, you could dump the requested size and the actual buffer size from that error message, which is truncated by default, to see exactly which size causes this issue.
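For reference, the `ICHECK_LE` in the `CopyToRemote` snippet quoted above compares `byte_offset + nbytes` against the total size of the tensor. Below is a plain-Python sketch of that arithmetic for a float32 input of this shape. The helper names are made up for illustration and nothing here calls TVM.

```python
from math import prod


def tensor_nbytes(shape, itemsize):
    """Total size in bytes of a dense tensor with the given shape."""
    return prod(shape) * itemsize


def copy_fits(byte_offset, nbytes, tensor_total_size):
    """Mirror of the bounds check: byte_offset + nbytes <= tensor_total_size."""
    return byte_offset + nbytes <= tensor_total_size


total = tensor_nbytes((1, 3, 224, 224), 4)  # float32 is 4 bytes per element
print(total)  # 602112
print(copy_fits(0, total, total))  # True
```

If the real values printed from the untruncated error message do not satisfy this inequality, that would point to where the mismatch comes from.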

Least1924 · November 10, 2021, 9:46am

Hello, have you solved this problem? I hit the same one. Even with n_trial = len(task.config_space), the error still occurs. GPU: NVIDIA Tesla V100

guminhao0317 · November 10, 2021, 11:32am

I have the same problem when I use AutoTVM to tune a PyTorch model (resnet50). Have you solved it?

pzq · November 10, 2021, 12:38pm

What kind of error do you get: InstantiationError or RuntimeError?

Can you show us the debugging info?

Least1924 · November 11, 2021, 1:59am

Thanks for your reply.

I am running the TVM example code (http://tvm.apache.org/docs/tutorial/autotvm_relay_x86.html).
When target == llvm it works well, but after just changing the target to cuda (or tvm.target.cuda()), the tuning process fails.

Part of the content of /tmp/tvm_tuning_errors_abkxdv1y.log is here.

Traceback (most recent call last):
  9: TVMFuncCall
  8: tvm::runtime::TypedPackedFunc<tvm::IRModule (tvm::te::Schedule, tvm::runtime::Array<tvm::runtime::ObjectRef, void> const&, tvm::runtime::String const&, tvm::runtime::Map<tvm::te::Tensor, tvm::tir::Buffer, void, void> const&, bool)>::AssignTypedLambda<tvm::{lambda(tvm::te::Schedule, tvm::runtime::Array<tvm::runtime::ObjectRef, void> const&, tvm::runtime::String const&, tvm::runtime::Map<tvm::te::Tensor, tvm::tir::Buffer, void, void> const&, bool)#5}>(tvm::{lambda(tvm::te::Schedule, tvm::runtime::Array<tvm::runtime::ObjectRef, void> const&, tvm::runtime::String const&, tvm::runtime::Map<tvm::te::Tensor, tvm::tir::Buffer, void, void> const&, bool)#5}, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)::{lambda(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*)#1}::operator()(tvm::runtime::TVMArgs const, tvm::runtime::TVMRetValue) const
  7: tvm::LowerSchedule(tvm::te::Schedule, tvm::runtime::Array<tvm::runtime::ObjectRef, void> const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::unordered_map<tvm::te::Tensor, tvm::tir::Buffer, std::hash<tvm::te::Tensor>, std::equal_to<tvm::te::Tensor>, std::allocator<std::pair<tvm::te::Tensor const, tvm::tir::Buffer> > > const&, bool)
  6: tvm::LowerWithPassList(tvm::IRModule, tvm::runtime::Array<tvm::transform::Pass, void>)
  5: tvm::transform::Pass::operator()(tvm::IRModule) const
  4: tvm::transform::Pass::operator()(tvm::IRModule, tvm::transform::PassContext const&) const
  3: tvm::transform::SequentialNode::operator()(tvm::IRModule, tvm::transform::PassContext const&) const
  2: tvm::transform::Pass::operator()(tvm::IRModule, tvm::transform::PassContext const&) const
  1: tvm::tir::transform::PrimFuncPassNode::operator()(tvm::IRModule, tvm::transform::PassContext const&) const
  0: std::_Function_handler<void (tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*), TVMFuncCreateFromCFunc::{lambda(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)#2}>::_M_invoke(std::_Any_data const&, tvm::runtime::TVMArgs&&, tvm::runtime::TVMRetValue*&&)
  File "/home/tvm/python/tvm/_ffi/_ctypes/packed_func.py", line 81, in cfun
    rv = local_pyfunc(*pyargs)
  File "/home/tvm/python/tvm/autotvm/measure/measure_methods.py", line 814, in verify_pass
    raise InstantiationError("Skipped because of invalid gpu kernel")
tvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel

comaniac · November 11, 2021, 2:48am

Could you provide the complete log instead of just a part of it? As I mentioned above, InstantiationError should not be an issue as long as we have at least one schedule that can be instantiated.
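One way to check whether any schedule was measured successfully is to scan the tuning records file. This is a rough sketch rather than a TVM API: `count_valid_records` is a hypothetical helper, and it assumes the file written by `autotvm.callback.log_to_file` holds one JSON object per line with a `"result"` field laid out as `[costs, error_no, all_cost, timestamp]`, where `error_no == 0` means success.

```python
import json


def count_valid_records(lines):
    """Return (valid, total) over AutoTVM JSON record lines."""
    valid = 0
    total = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        error_no = record["result"][1]  # assumed layout, see the note above
        total += 1
        if error_no == 0:
            valid += 1
    return valid, total
```

For example, `with open("tuning.log") as f: print(count_valid_records(f))`. If the first number is 0, no configuration for any task produced a usable measurement.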

pzq · November 11, 2021, 2:57am

@Least1924 From the information you have provided, the cause cannot be determined yet.

Have you made any changes to the official TVM code? And which TVM commit are you using?

How about running the official sample code https://github.com/apache/tvm/blob/main/gallery/how_to/tune_with_autotvm/tune_conv2d_cuda.py in TVM to see if there are still errors?

If it hits the same error, then there may be something wrong with your environment and you need to check that first. If not, then we can discuss further.

Least1924 · November 11, 2021, 3:10am

Thanks, I will try the official sample code first.

Least1924 · November 11, 2021, 3:12am

The complete log:

Traceback (most recent call last):
  9: TVMFuncCall
  8: tvm::runtime::TypedPackedFunc<tvm::IRModule (tvm::te::Schedule, tvm::runtime::Array<tvm::runtime::ObjectRef, void> const&, tvm::runtime::String const&, tvm::runtime::Map<tvm::te::Tensor, tvm::tir::Buffer, void, void> const&, bool)>::AssignTypedLambda<tvm::{lambda(tvm::te::Schedule, tvm::runtime::Array<tvm::runtime::ObjectRef, void> const&, tvm::runtime::String const&, tvm::runtime::Map<tvm::te::Tensor, tvm::tir::Buffer, void, void> const&, bool)#5}>(tvm::{lambda(tvm::te::Schedule, tvm::runtime::Array<tvm::runtime::ObjectRef, void> const&, tvm::runtime::String const&, tvm::runtime::Map<tvm::te::Tensor, tvm::tir::Buffer, void, void> const&, bool)#5}, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)::{lambda(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*)#1}::operator()(tvm::runtime::TVMArgs const, tvm::runtime::TVMRetValue) const
  7: tvm::LowerSchedule(tvm::te::Schedule, tvm::runtime::Array<tvm::runtime::ObjectRef, void> const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::unordered_map<tvm::te::Tensor, tvm::tir::Buffer, std::hash<tvm::te::Tensor>, std::equal_to<tvm::te::Tensor>, std::allocator<std::pair<tvm::te::Tensor const, tvm::tir::Buffer> > > const&, bool)
  6: tvm::LowerWithPassList(tvm::IRModule, tvm::runtime::Array<tvm::transform::Pass, void>)
  5: tvm::transform::Pass::operator()(tvm::IRModule) const
  4: tvm::transform::Pass::operator()(tvm::IRModule, tvm::transform::PassContext const&) const
  3: tvm::transform::SequentialNode::operator()(tvm::IRModule, tvm::transform::PassContext const&) const
  2: tvm::transform::Pass::operator()(tvm::IRModule, tvm::transform::PassContext const&) const
  1: tvm::tir::transform::PrimFuncPassNode::operator()(tvm::IRModule, tvm::transform::PassContext const&) const
  0: std::_Function_handler<void (tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*), TVMFuncCreateFromCFunc::{lambda(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)#2}>::_M_invoke(std::_Any_data const&, tvm::runtime::TVMArgs&&, tvm::runtime::TVMRetValue*&&)
  File "/home/tvm/python/tvm/_ffi/_ctypes/packed_func.py", line 81, in cfun
    rv = local_pyfunc(*pyargs)
  File "/home/tvm/python/tvm/autotvm/measure/measure_methods.py", line 814, in verify_pass
    raise InstantiationError("Skipped because of invalid gpu kernel")
tvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel
Traceback (most recent call last):
  65: 0x00005556672ad3bf
        at ../sysdeps/x86_64/elf/start.S:103
  64: __libc_start_main
  63: _Py_UnixMain
        at /tmp/build/80754af9/python_1588882889832/work/Modules/main.c:3477
  62: pymain_main
        at /tmp/build/80754af9/python_1588882889832/work/Modules/main.c:3442
  61: pymain_run_python
        at /tmp/build/80754af9/python_1588882889832/work/Modules/main.c:2899
  60: pymain_run_module
        at /tmp/build/80754af9/python_1588882889832/work/Modules/main.c:355
  59: _PyFunction_FastCallDict
        at /tmp/build/80754af9/python_1588882889832/work/Objects/call.c:376
  58: _PyEval_EvalCodeWithName
        at /tmp/build/80754af9/python_1588882889832/work/Python/ceval.c:3930
  57: _PyEval_EvalFrameDefault
        at /tmp/build/80754af9/python_1588882889832/work/Python/ceval.c:3124
  56: call_function
        at /tmp/build/80754af9/python_1588882889832/work/Python/ceval.c:4616
  55: _PyFunction_FastCallKeywords
        at /tmp/build/80754af9/python_1588882889
Traceback (most recent call last):
  65: 0x000055b70eacd3bf
        at ../sysdeps/x86_64/elf/start.S:103
  64: __libc_start_main
  63: _Py_UnixMain
        at /tmp/build/80754af9/python_1588882889832/work/Modules/main.c:3477
  62: pymain_main
        at /tmp/build/80754af9/python_1588882889832/work/Modules/main.c:3442
  61: pymain_run_python
        at /tmp/build/80754af9/python_1588882889832/work/Modules/main.c:2899
  60: pymain_run_module
        at /tmp/build/80754af9/python_1588882889832/work/Modules/main.c:355
  59: _PyFunction_FastCallDict
        at /tmp/build/80754af9/python_1588882889832/work/Objects/call.c:376
  58: _PyEval_EvalCodeWithName
        at /tmp/build/80754af9/python_1588882889832/work/Python/ceval.c:3930
  57: _PyEval_EvalFrameDefault
        at /tmp/build/80754af9/python_1588882889832/work/Python/ceval.c:3124
  56: call_function
        at /tmp/build/80754af9/python_1588882889832/work/Python/ceval.c:4616
  55: _PyFunction_FastCallKeywords
        at /tmp/build/80754af9/python_1588882889
Traceback (most recent call last):
  65: 0x000055b7bf6033bf
        at ../sysdeps/x86_64/elf/start.S:103
  64: __libc_start_main
  63: _Py_UnixMain
        at /tmp/build/80754af9/python_1588882889832/work/Modules/main.c:3477
  62: pymain_main
        at /tmp/build/80754af9/python_1588882889832/work/Modules/main.c:3442
  61: pymain_run_python
        at /tmp/build/80754af9/python_1588882889832/work/Modules/main.c:2899
  60: pymain_run_module
        at /tmp/build/80754af9/python_1588882889832/work/Modules/main.c:355
  59: _PyFunction_FastCallDict
        at /tmp/build/80754af9/python_1588882889832/work/Objects/call.c:376
  58: _PyEval_EvalCodeWithName
        at /tmp/build/80754af9/python_1588882889832/work/Python/ceval.c:3930
  57: _PyEval_EvalFrameDefault
        at /tmp/build/80754af9/python_1588882889832/work/Python/ceval.c:3124
  56: call_function
        at /tmp/build/80754af9/python_1588882889832/work/Python/ceval.c:4616
  55: _PyFunction_FastCallKeywords
        at /tmp/build/80754af9/python_1588882889
Traceback (most recent call last):
  9: TVMFuncCall
  8: tvm::runtime::TypedPackedFunc<tvm::IRModule (tvm::te::Schedule, tvm::runtime::Array<tvm::runtime::ObjectRef, void> const&, tvm::runtime::String const&, tvm::runtime::Map<tvm::te::Tensor, tvm::tir::Buffer, void, void> const&, bool)>::AssignTypedLambda<tvm::{lambda(tvm::te::Schedule, tvm::runtime::Array<tvm::runtime::ObjectRef, void> const&, tvm::runtime::String const&, tvm::runtime::Map<tvm::te::Tensor, tvm::tir::Buffer, void, void> const&, bool)#5}>(tvm::{lambda(tvm::te::Schedule, tvm::runtime::Array<tvm::runtime::ObjectRef, void> const&, tvm::runtime::String const&, tvm::runtime::Map<tvm::te::Tensor, tvm::tir::Buffer, void, void> const&, bool)#5}, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)::{lambda(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*)#1}::operator()(tvm::runtime::TVMArgs const, tvm::runtime::TVMRetValue) const
  7: tvm::LowerSchedule(tvm::te::Schedule, tvm::runtime::Array<tvm::runtime::ObjectRef, void> const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::unordered_map<tvm::te::Tensor, tvm::tir::Buffer, std::hash<tvm::te::Tensor>, std::equal_to<tvm::te::Tensor>, std::allocator<std::pair<tvm::te::Tensor const, tvm::tir::Buffer> > > const&, bool)
  6: tvm::LowerWithPassList(tvm::IRModule, tvm::runtime::Array<tvm::transform::Pass, void>)
  5: tvm::transform::Pass::operator()(tvm::IRModule) const
  4: tvm::transform::Pass::operator()(tvm::IRModule, tvm::transform::PassContext const&) const
  3: tvm::transform::SequentialNode::operator()(tvm::IRModule, tvm::transform::PassContext const&) const
  2: tvm::transform::Pass::operator()(tvm::IRModule, tvm::transform::PassContext const&) const
  1: tvm::tir::transform::PrimFuncPassNode::operator()(tvm::IRModule, tvm::transform::PassContext const&) const
  0: std::_Function_handler<void (tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*), TVMFuncCreateFromCFunc::{lambda(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)#2}>::_M_invoke(std::_Any_data const&, tvm::runtime::TVMArgs&&, tvm::runtime::TVMRetValue*&&)
  File "/home/tvm/python/tvm/_ffi/_ctypes/packed_func.py", line 81, in cfun
    rv = local_pyfunc(*pyargs)
  File "/home/tvm/python/tvm/autotvm/measure/measure_methods.py", line 814, in verify_pass
    raise InstantiationError("Skipped because of invalid gpu kernel")
tvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel
Traceback (most recent call last):
  9: TVMFuncCall
  8: tvm::runtime::TypedPackedFunc<tvm::IRModule (tvm::te::Schedule, tvm::runtime::Array<tvm::runtime::ObjectRef, void> const&, tvm::runtime::String const&, tvm::runtime::Map<tvm::te::Tensor, tvm::tir::Buffer, void, void> const&, bool)>::AssignTypedLambda<tvm::{lambda(tvm::te::Schedule, tvm::runtime::Array<tvm::runtime::ObjectRef, void> const&, tvm::runtime::String const&, tvm::runtime::Map<tvm::te::Tensor, tvm::tir::Buffer, void, void> const&, bool)#5}>(tvm::{lambda(tvm::te::Schedule, tvm::runtime::Array<tvm::runtime::ObjectRef, void> const&, tvm::runtime::String const&, tvm::runtime::Map<tvm::te::Tensor, tvm::tir::Buffer, void, void> const&, bool)#5}, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)::{lambda(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*)#1}::operator()(tvm::runtime::TVMArgs const, tvm::runtime::TVMRetValue) const
  7: tvm::LowerSchedule(tvm::te::Schedule, tvm::runtime::Array<tvm::runtime::ObjectRef, void> const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::unordered_map<tvm::te::Tensor, tvm::tir::Buffer, std::hash<tvm::te::Tensor>, std::equal_to<tvm::te::Tensor>, std::allocator<std::pair<tvm::te::Tensor const, tvm::tir::Buffer> > > const&, bool)
  6: tvm::LowerWithPassList(tvm::IRModule, tvm::runtime::Array<tvm::transform::Pass, void>)
  5: tvm::transform::Pass::operator()(tvm::IRModule) const
  4: tvm::transform::Pass::operator()(tvm::IRModule, tvm::transform::PassContext const&) const
  3: tvm::transform::SequentialNode::operator()(tvm::IRModule, tvm::transform::PassContext const&) const
  2: tvm::transform::Pass::operator()(tvm::IRModule, tvm::transform::PassContext const&) const
  1: tvm::tir::transform::PrimFuncPassNode::operator()(tvm::IRModule, tvm::transform::PassContext const&) const
  0: std::_Function_handler<void (tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*), TVMFuncCreateFromCFunc::{lambda(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)#2}>::_M_invoke(std::_Any_data const&, tvm::runtime::TVMArgs&&, tvm::runtime::TVMRetValue*&&)
  File "/home/tvm/python/tvm/_ffi/_ctypes/packed_func.py", line 81, in cfun
    rv = local_pyfunc(*pyargs)
  File "/home/tvm/python/tvm/autotvm/measure/measure_methods.py", line 814, in verify_pass
    raise InstantiationError("Skipped because of invalid gpu kernel")
tvm.autotvm.task.space.InstantiationError: Skipped because of invalid gpu kernel
Traceback (most recent call last):
  65: 0x0000559574d283bf
        at ../sysdeps/x86_64/elf/start.S:103
  64: __libc_start_main
  63: _Py_UnixMain
        at /tmp/build/80754af9/python_1588882889832/work/Modules/main.c:3477
  62: pymain_main
        at /tmp/build/80754af9/python_1588882889832/work/Modules/main.c:3442
  61: pymain_run_python
        at /tmp/build/80754af9/python_1588882889832/work/Modules/main.c:2899
  60: pymain_run_module
        at /tmp/build/80754af9/python_1588882889832/work/Modules/main.c:355
  59: _PyFunction_FastCallDict
        at /tmp/build/80754af9/python_1588882889832/work/Objects/call.c:376
  58: _PyEval_EvalCodeWithName
        at /tmp/build/80754af9/python_1588882889832/work/Python/ceval.c:3930
  57: _PyEval_EvalFrameDefault
        at /tmp/build/80754af9/python_1588882889832/work/Python/ceval.c:3124
  56: call_function
        at /tmp/build/80754af9/python_1588882889832/work/Python/ceval.c:4616
  55: _PyFunction_FastCallKeywords
        at /tmp/build/80754af9/python_1588882889
Traceback (most recent call last):
  65: 0x000055b70eacd3bf
        at ../sysdeps/x86_64/elf/start.S:103
  64: __libc_start_main
  63: _Py_UnixMain
        at /tmp/build/80754af9/python_1588882889832/work/Modules/main.c:3477
  62: pymain_main
        at /tmp/build/80754af9/python_1588882889832/work/Modules/main.c:3442
  61: pymain_run_python
        at /tmp/build/80754af9/python_1588882889832/work/Modules/main.c:2899
  60: pymain_run_module
        at /tmp/build/80754af9/python_1588882889832/work/Modules/main.c:355
  59: _PyFunction_FastCallDict
        at /tmp/build/80754af9/python_1588882889832/work/Objects/call.c:376
  58: _PyEval_EvalCodeWithName
        at /tmp/build/80754af9/python_1588882889832/work/Python/ceval.c:3930
  57: _PyEval_EvalFrameDefault
        at /tmp/build/80754af9/python_1588882889832/work/Python/ceval.c:3124
  56: call_function
        at /tmp/build/80754af9/python_1588882889832/work/Python/ceval.c:4616
  55: _PyFunction_FastCallKeywords
        at /tmp/build/80754af9/python_1588882889
Traceback (most recent call last):
  65: 0x000055cdfe1513bf
        at ../sysdeps/x86_64/elf/start.S:103
  64: __libc_start_main
  63: _Py_UnixMain
        at /tmp/build/80754af9/python_1588882889832/work/Modules/main.c:3477
  62: pymain_main
        at /tmp/build/80754af9/python_1588882889832/work/Modules/main.c:3442
  61: pymain_run_python
        at /tmp/build/80754af9/python_1588882889832/work/Modules/main.c:2899
  60: pymain_run_module
        at /tmp/build/80754af9/python_1588882889832/work/Modules/main.c:355
  59: _PyFunction_FastCallDict
        at /tmp/build/80754af9/python_1588882889832/work/Objects/call.c:376
  58: _PyEval_EvalCodeWithName
        at /tmp/build/80754af9/python_1588882889832/work/Python/ceval.c:3930
  57: _PyEval_EvalFrameDefault
        at /tmp/build/80754af9/python_1588882889832/work/Python/ceval.c:3124
  56: call_function
        at /tmp/build/80754af9/python_1588882889832/work/Python/ceval.c:4616
  55: _PyFunction_FastCallKeywords
        at /tmp/build/80754af9/python_1588882889
Traceback (most recent call last):
  65: 0x000055a6e97ab3bf
        at ../sysdeps/x86_64/elf/start.S:103
  64: __libc_start_main
  63: _Py_UnixMain
        at /tmp/build/80754af9/python_1588882889832/work/Modules/main.c:3477
  62: pymain_main
        at /tmp/build/80754af9/python_1588882889832/work/Modules/main.c:3442
  61: pymain_run_python
        at /tmp/build/80754af9/python_1588882889832/work/Modules/main.c:2899
  60: pymain_run_module
        at /tmp/build/80754af9/python_1588882889832/work/Modules/main.c:355
  59: _PyFunction_FastCallDict
        at /tmp/build/80754af9/python_1588882889832/work/Objects/call.c:376
  58: _PyEval_EvalCodeWithName
        at /tmp/build/80754af9/python_1588882889832/work/Python/ceval.c:3930
  57: _PyEval_EvalFrameDefault
        at /tmp/build/80754af9/python_1588882889832/work/Python/ceval.c:3124
  56: call_function
        at /tmp/build/80754af9/python_1588882889832/work/Python/ceval.c:4616
  55: _PyFunction_FastCallKeywords
        at /tmp/build/80754af9/python_1588882889

Least1924 · November 11, 2021, 7:01am

The problem was solved when I adjusted tuning_option by following the official example 'tune_relay_cuda'.

I am new to TVM, thanks for your help! @pzq @comaniac

@guminhao0317 Hi, I hope this can help you too.

pzq · November 11, 2021, 7:24am

Good job!

Can you provide your new tuning_option here? It may help others who have the same problem.

LOL

Least1924 · November 11, 2021, 7:51am

I use AutoTVM to tune an ONNX model (resnet50). Here is the tuning_option:

from tvm import autotvm

number = 20
repeat = 3
min_repeat_ms = 4  # minimum time per repeat in ms; tuned on a GPU here, so the CPU tutorial's "can be set to 0" note does not apply
timeout = 150  # in seconds

# create a TVM runner
runner = autotvm.LocalRunner(
    number=number,
    repeat=repeat,
    timeout=timeout,
    min_repeat_ms=min_repeat_ms
)

tuning_option = {
    "tuner": "xgb",
    "trials": 2000,
    "early_stopping": 600,
    "measure_option": autotvm.measure_option(
        builder=autotvm.LocalBuilder(build_func="default"), runner=runner
    ),
    "tuning_records": "resnet50_1105_finetune_cp6_sim.json",
}

I think the most important setting is trials. I'm still trying to understand why.
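One plausible reason the trial count matters (an assumption, not something confirmed in this thread): many GPU configurations are skipped with InstantiationError, so with very few trials every sampled configuration may be invalid, leaving no valid schedule for the task. A back-of-the-envelope sketch, assuming configurations are drawn independently at random and using a made-up invalid fraction:

```python
def prob_at_least_one_valid(invalid_fraction, n_trial):
    """Chance that n_trial independent random draws include a valid config."""
    return 1.0 - invalid_fraction ** n_trial


# 0.9 is a hypothetical invalid fraction, not a measured value.
for n_trial in (1, 10, 100, 2000):
    print(n_trial, prob_at_least_one_valid(0.9, n_trial))
```

With a single trial the chance is only about 0.1 under these assumptions, while with thousands of trials it is effectively 1. Real tuners such as XGBTuner do not sample independently, so this is only a rough intuition.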

guminhao0317 · November 13, 2021, 3:01am

Thank you, I will try it.

David-D · June 17, 2023, 11:29am

Hi, I also met this issue.

David-D · June 19, 2023, 1:33am

Thank you for the hints. I increased the trials to 2000, but the error still occurs: "Skipped because of invalid gpu kernel". I have opened a new issue here:

Could someone help take a look?