Why doesn't the official tutorial work?

I installed TVM from source following the official website and passed the Python-based installation verification steps.

Subsequently, I tried running this chapter from the tutorial: End-to-End Optimize Model — tvm 0.20.dev0 documentation.

To quickly verify it, I replaced the resnet18 imported from torchvision with the following simple Torch model:

class SimpleCNN(nn.Module):
    def __init__(self):
        super(SimpleCNN, self).__init__()
        self.conv1 = nn.Conv2d(1, 16, kernel_size=3, padding=1)
        self.relu = nn.ReLU()

    def forward(self, x):
        x = self.relu(self.conv1(x))
        return x

And I modified TOTAL_TRIALS to 10.

However, at this step: ex = tvm.compile(mod, target="cuda"), I encountered an error:

TVMError: Traceback (most recent call last):
  0: operator()
        at /home/cuijianhua/llm_workspace/mlc_workspace/learning_tvm/tvm/src/tir/analysis/verify_memory.cc:203
  File "/home/cuijianhua/learning_tvm/tvm/src/tir/analysis/verify_memory.cc", line 203
RuntimeError: Memory verification failed with the following errors:
Did you forget to bind?
    Variable p_conv1_bias is directly accessed by host memory (it is not contained in a thread environment or in the function arguments.
    Variable T_reshape is directly accessed by host memory (it is not contained in a thread environment or in the function arguments.

As a beginner, I'm very confused and don't know where to start troubleshooting this kind of issue. In principle I only modified a small, seemingly unimportant part of the tutorial, so why isn't it working? Are there any relevant documents I can refer to? I couldn't find anything helpful via Google or ChatGPT.

Can you provide a full self-contained example that one can use to reproduce? The lack of formatting here makes it unclear.

10 is not large enough :slight_smile:

Thank you very much for your response. I’ll immediately adjust TOTAL_TRIALS to 8000 to match the tutorial and give it a try.

By the way, I'd like to ask where I can read up on this kind of issue (i.e., whether TOTAL_TRIALS affects thread binding in the generated kernels). I'm not sure where to find this kind of information.

I searched for both “get_pipeline” and “static_shape_tuning” on Apache TVM Documentation — tvm 0.20.dev0 documentation but couldn’t find relevant details. Have I possibly overlooked some documentation?

Sorry for the poor format, I’ve reorganized my code and pasted it here:

import torch
import torch.nn as nn

import tvm
from tvm import relax
from tvm.relax.frontend.torch import from_exported_program

class SimpleCNN(nn.Module):
    def __init__(self):
        super(SimpleCNN, self).__init__()
        self.conv1 = nn.Conv2d(1, 16, kernel_size=3, padding=1)
        self.relu = nn.ReLU()

    def forward(self, x):
        x = self.relu(self.conv1(x))
        return x
        
from torch.export import export
torch_model = SimpleCNN()
torch_model.eval()

example_args = (torch.randn(1, 1, 28, 28), )

with torch.no_grad():
    exported_program = export(torch_model, example_args)
    mod = from_exported_program(exported_program, keep_params_as_input=True)

mod, params = relax.frontend.detach_params(mod)
mod.show()

TOTAL_TRIALS = 10 # Change to 20000 for better performance if needed
target = tvm.target.Target("nvidia/geforce-gtx-1080-ti")  # Change to your target device
work_dir = "tuning_logs"

mod = relax.get_pipeline("static_shape_tuning", target=target, total_trials=TOTAL_TRIALS)(mod)

# Only show the main function
mod["main"].show()

ex = tvm.compile(mod, target="cuda")
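
The script stops at tvm.compile because that's where the error is raised. For completeness, the steps I planned to run afterwards roughly follow the tutorial's relax.VirtualMachine usage (just a sketch; the CUDA device index and the random float32 input are my own assumptions):

import numpy as np

# Run the compiled executable on the GPU, mirroring the tutorial's flow.
dev = tvm.device("cuda", 0)
vm = relax.VirtualMachine(ex, dev)

# Upload the input and the detached parameters to the device.
gpu_data = tvm.nd.array(np.random.rand(1, 1, 28, 28).astype("float32"), dev)
gpu_params = [tvm.nd.array(p, dev) for p in params["main"]]

gpu_out = vm["main"](gpu_data, *gpu_params)
print(gpu_out)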

Thank you for the reminder!

During the tuning pipeline, schedules (which determine thread binding) are only generated as part of the search. If TOTAL_TRIALS is too small for every kernel to find a valid schedule, the remaining kernels are left unscheduled (without thread binding), which is exactly what triggers this error.

In this case, a TOTAL_TRIALS value of 10 is too small: it likely allows only the first kernel to be tuned properly while leaving the subsequent kernels unscheduled. That is the root cause of the problem.
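
As a side note (not from the tutorial, just a workaround sketch assuming your build ships tvm.dlight): if you want the module to at least compile without spending a large tuning budget, you can let dlight apply a generic fallback GPU schedule, which adds thread bindings to the kernels the tuner left unscheduled:

from tvm import dlight as dl

# Apply a generic GPU schedule (block/thread bindings) to TIR functions that
# are still unscheduled after tuning; already-scheduled kernels should be skipped.
with target:
    mod = dl.ApplyDefaultSchedule(dl.gpu.Fallback())(mod)

ex = tvm.compile(mod, target="cuda")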

Thanks for the explanation; it works well after adjusting TOTAL_TRIALS to 8000.