I just built and installed the latest TVM following the instructions in Install from Source — tvm 0.22.dev0 documentation.
(I did run the echo "set(USE_CUDA ON)" >> config.cmake step before building.)
Tuning works well for a CPU kernel, but when I try the same matmul for GPU it does not work: every batch of candidates is built and sent to the runner, yet no measurement is ever recorded (Total trials stays 0 and the latency stays N/A in the log below). Could anyone help me?
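A quick sanity check that the CUDA runtime was actually compiled in and that the GPU is visible to TVM (this is just my own ad-hoc check, not something from the install guide):

import tvm

# Only reports the device as existing if libtvm was built with USE_CUDA ON
# and the driver can see a GPU.
dev = tvm.cuda(0)
print("CUDA device exists:", dev.exist)
if dev.exist:
    # These attributes are queried through the CUDA runtime, so printing them
    # also exercises TVM's CUDA device API.
    print("Compute version:", dev.compute_version)
    print("Max threads per block:", dev.max_threads_per_block)
    print("Max shared memory per block:", dev.max_shared_memory_per_block)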
Program:
import os, sys, time, argparse, tvm, json
from typing import Tuple

from tvm import te
from tvm import meta_schedule as ms
from tvm.meta_schedule.runner.config import EvaluatorConfig
from tvm.script import tir as T
from tvm.te import create_prim_func
from tvm.target import Target

M, N, K = 64, 64, 64


def matmul(
    n: int, m: int, k: int, in_dtype: str = "float32", out_dtype: str = "float32"
) -> Tuple[te.Tensor, te.Tensor, te.Tensor]:
    a = te.placeholder((n, k), name="A", dtype=in_dtype)
    b = te.placeholder((k, m), name="B", dtype=in_dtype)
    k = te.reduce_axis((0, k), name="k")
    c = te.compute(
        (n, m),
        lambda i, j: te.sum(a[i, k].astype(out_dtype) * b[k, j].astype(out_dtype), axis=[k]),
        name="C",
    )
    return (a, b, c)


if __name__ == "__main__":
    # target = tvm.target.Target("cuda -max_threads_per_block 1024 -max_shared_memory_per_block 49152")
    target = tvm.target.Target(
        {
            "kind": "cuda",
            "arch": "sm_70",
            "max_threads_per_block": 1024,
            "max_shared_memory_per_block": 49152,
        }
    )  # V100
    database = ms.tune_tir(
        mod=create_prim_func(matmul(N, K, M, in_dtype="float32", out_dtype="float32")),
        target=target,
        max_trials_global=128,
        num_trials_per_iter=32,
        work_dir="./",
        runner=ms.runner.LocalRunner(
            evaluator_config=EvaluatorConfig(
                number=3,
                enable_cpu_cache_flush=False,
            )
        ),
        cost_model=ms.cost_model.XGBModel(
            extractor=ms.feature_extractor.PerStoreFeature(),
            adaptive_training=False,
        ),
        strategy=ms.search_strategy.EvolutionarySearch(),
    )
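For context, once tuning succeeds this is roughly how I planned to pull the best schedule back out of the database (continuing in the same script); the compile_tir call is my reading of the meta_schedule docs, so treat it as a sketch rather than verified code:

# Sketch: query the tuned database for the best schedule of this workload.
mod = create_prim_func(matmul(N, K, M, in_dtype="float32", out_dtype="float32"))
sch = ms.tir_integration.compile_tir(database, mod, target)
if sch is None:
    print("no valid schedule found in the database")
else:
    sch.mod.show()    # the tuned TIR
    print(sch.trace)  # the schedule trace that produced it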
Result / output:
2025-10-14 05:51:09 [INFO] Logging directory: ./logs
2025-10-14 05:51:16 [INFO] LocalBuilder: max_workers = 16
2025-10-14 05:51:19 [INFO] [task_scheduler.cc:168] Initializing Task #0: "main"
2025-10-14 05:51:21 [INFO] [task_scheduler.cc:329]
ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done
-----------------------------------------------------------------------------------------------------
0 | main | 524288 | 1 | N/A | N/A | N/A | 0 |
-----------------------------------------------------------------------------------------------------
Total trials: 0
Total latency (us): 0
2025-10-14 05:51:21 [INFO] [task_scheduler.cc:189] TaskScheduler picks Task #0: "main"
2025-10-14 05:51:24 [INFO] [task_scheduler.cc:202] Sending 16 sample(s) to builder
2025-10-14 05:51:32 [INFO] [task_scheduler.cc:204] Sending 16 sample(s) to runner
2025-10-14 05:51:32 [INFO] [task_scheduler.cc:246] [Updated] Task #0: "main"
2025-10-14 05:51:32 [INFO] [task_scheduler.cc:329]
ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done
-----------------------------------------------------------------------------------------------------
0 | main | 524288 | 1 | N/A | N/A | N/A | 16 |
-----------------------------------------------------------------------------------------------------
Total trials: 0
Total latency (us): 0
2025-10-14 05:51:32 [INFO] [task_scheduler.cc:189] TaskScheduler picks Task #0: "main"
2025-10-14 05:51:35 [INFO] [task_scheduler.cc:202] Sending 16 sample(s) to builder
2025-10-14 05:51:43 [INFO] [task_scheduler.cc:204] Sending 16 sample(s) to runner
2025-10-14 05:51:43 [INFO] [task_scheduler.cc:246] [Updated] Task #0: "main"
2025-10-14 05:51:43 [INFO] [task_scheduler.cc:329]
ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done
-----------------------------------------------------------------------------------------------------
0 | main | 524288 | 1 | N/A | N/A | N/A | 32 |
-----------------------------------------------------------------------------------------------------
Total trials: 0
Total latency (us): 0
2025-10-14 05:51:43 [INFO] [task_scheduler.cc:189] TaskScheduler picks Task #0: "main"
2025-10-14 05:51:45 [INFO] [task_scheduler.cc:202] Sending 8 sample(s) to builder
2025-10-14 05:51:52 [INFO] [task_scheduler.cc:204] Sending 8 sample(s) to runner
2025-10-14 05:51:52 [INFO] [task_scheduler.cc:246] [Updated] Task #0: "main"
2025-10-14 05:51:52 [INFO] [task_scheduler.cc:329]
ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done
-----------------------------------------------------------------------------------------------------
0 | main | 524288 | 1 | N/A | N/A | N/A | 40 |
-----------------------------------------------------------------------------------------------------
Total trials: 0
Total latency (us): 0
2025-10-14 05:51:52 [INFO] [task_scheduler.cc:269] Task #0 has finished. Remaining task(s): 0
2025-10-14 05:51:52 [INFO] [task_scheduler.cc:329]
ID | Name | FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done
-----------------------------------------------------------------------------------------------------
0 | main | 524288 | 1 | N/A | N/A | N/A | 40 | Y
-----------------------------------------------------------------------------------------------------
Total trials: 0
Total latency (us): 0
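For what it's worth, one way to double-check from Python whether any candidate actually produced timings (my reading of the Database / TuningRecord API, so it may need adjusting):

# Sketch: count tuning records that carry actual run times.
records = database.get_all_tuning_records()
measured = [r for r in records if r.run_secs is not None and len(r.run_secs) > 0]
print(f"{len(records)} record(s) in the database, {len(measured)} with measurements")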