Meta_schedule tuning for GPU does not work

I just built and installed the latest TVM according to the instructions in "Install from Source — tvm 0.22.dev0 documentation".

(I did run echo "set(USE_CUDA ON)" >> config.cmake to enable CUDA.)

It works well for a CPU kernel, but when I try it for a GPU target, it does not work. Could anyone help me?
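(As a minimal sanity check, assuming a single local GPU at device 0, the following verifies that the TVM runtime can see the CUDA device at all; if this fails, meta_schedule cannot measure any CUDA candidate.)

import tvm

dev = tvm.cuda(0)
print("CUDA device exists:", dev.exist)
if dev.exist:
    print("Compute version:", dev.compute_version)
    print("Max threads per block:", dev.max_threads_per_block)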

Program:

from typing import Tuple

import tvm
from tvm import meta_schedule as ms
from tvm import te
from tvm.meta_schedule.runner.config import EvaluatorConfig
from tvm.te import create_prim_func


M, N, K = 64, 64, 64

def matmul(
    n: int, m: int, k: int, in_dtype: str = "float32", out_dtype: str = "float32"
) -> Tuple[te.Tensor, te.Tensor, te.Tensor]:
    # C[n, m] = sum_k A[n, k] * B[k, m]
    a = te.placeholder((n, k), name="A", dtype=in_dtype)
    b = te.placeholder((k, m), name="B", dtype=in_dtype)
    rk = te.reduce_axis((0, k), name="k")  # reduction axis (avoid shadowing the extent k)
    c = te.compute(
        (n, m),
        lambda i, j: te.sum(a[i, rk].astype(out_dtype) * b[rk, j].astype(out_dtype), axis=[rk]),
        name="C",
    )
    return (a, b, c)

if __name__ == "__main__":
    # Equivalent string form:
    # target = tvm.target.Target("cuda -max_threads_per_block 1024 -max_shared_memory_per_block 49152")
    target = tvm.target.Target(
        {
            "kind": "cuda",
            "arch": "sm_70",  # V100
            "max_threads_per_block": 1024,
            "max_shared_memory_per_block": 49152,
        }
    )

    database = ms.tune_tir(
        # matmul takes (n, m, k); all three dims are 64 here
        mod=create_prim_func(matmul(M, N, K, in_dtype="float32", out_dtype="float32")),
        target=target,
        max_trials_global=128,
        num_trials_per_iter=32,
        work_dir="./",
        runner=ms.runner.LocalRunner(
            evaluator_config=EvaluatorConfig(
                number=3,
                enable_cpu_cache_flush=False,  # cache flushing only matters for CPU tuning
            )
        ),
        cost_model=ms.cost_model.XGBModel(
            extractor=ms.feature_extractor.PerStoreFeature(),
            adaptive_training=False,
        ),
        strategy=ms.search_strategy.EvolutionarySearch(),
    )

Result / output:

2025-10-14 05:51:09 [INFO] Logging directory: ./logs
2025-10-14 05:51:16 [INFO] LocalBuilder: max_workers = 16
2025-10-14 05:51:19 [INFO] [task_scheduler.cc:168] Initializing Task #0: "main"
2025-10-14 05:51:21 [INFO] [task_scheduler.cc:329] 
 ID | Name |   FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done 
-----------------------------------------------------------------------------------------------------
  0 | main | 524288 |      1 |            N/A |          N/A |                   N/A |      0 |      
-----------------------------------------------------------------------------------------------------
Total trials: 0
Total latency (us): 0

2025-10-14 05:51:21 [INFO] [task_scheduler.cc:189] TaskScheduler picks Task #0: "main"
2025-10-14 05:51:24 [INFO] [task_scheduler.cc:202] Sending 16 sample(s) to builder
2025-10-14 05:51:32 [INFO] [task_scheduler.cc:204] Sending 16 sample(s) to runner
2025-10-14 05:51:32 [INFO] [task_scheduler.cc:246] [Updated] Task #0: "main"
2025-10-14 05:51:32 [INFO] [task_scheduler.cc:329] 
 ID | Name |   FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done 
-----------------------------------------------------------------------------------------------------
  0 | main | 524288 |      1 |            N/A |          N/A |                   N/A |     16 |      
-----------------------------------------------------------------------------------------------------
Total trials: 0
Total latency (us): 0

2025-10-14 05:51:32 [INFO] [task_scheduler.cc:189] TaskScheduler picks Task #0: "main"
2025-10-14 05:51:35 [INFO] [task_scheduler.cc:202] Sending 16 sample(s) to builder
2025-10-14 05:51:43 [INFO] [task_scheduler.cc:204] Sending 16 sample(s) to runner
2025-10-14 05:51:43 [INFO] [task_scheduler.cc:246] [Updated] Task #0: "main"
2025-10-14 05:51:43 [INFO] [task_scheduler.cc:329] 
 ID | Name |   FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done 
-----------------------------------------------------------------------------------------------------
  0 | main | 524288 |      1 |            N/A |          N/A |                   N/A |     32 |      
-----------------------------------------------------------------------------------------------------
Total trials: 0
Total latency (us): 0

2025-10-14 05:51:43 [INFO] [task_scheduler.cc:189] TaskScheduler picks Task #0: "main"
2025-10-14 05:51:45 [INFO] [task_scheduler.cc:202] Sending 8 sample(s) to builder
2025-10-14 05:51:52 [INFO] [task_scheduler.cc:204] Sending 8 sample(s) to runner
2025-10-14 05:51:52 [INFO] [task_scheduler.cc:246] [Updated] Task #0: "main"
2025-10-14 05:51:52 [INFO] [task_scheduler.cc:329] 
 ID | Name |   FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done 
-----------------------------------------------------------------------------------------------------
  0 | main | 524288 |      1 |            N/A |          N/A |                   N/A |     40 |      
-----------------------------------------------------------------------------------------------------
Total trials: 0
Total latency (us): 0

2025-10-14 05:51:52 [INFO] [task_scheduler.cc:269] Task #0 has finished. Remaining task(s): 0
2025-10-14 05:51:52 [INFO] [task_scheduler.cc:329] 
 ID | Name |   FLOP | Weight | Speed (GFLOPS) | Latency (us) | Weighted Latency (us) | Trials | Done 
-----------------------------------------------------------------------------------------------------
  0 | main | 524288 |      1 |            N/A |          N/A |                   N/A |     40 |    Y 
-----------------------------------------------------------------------------------------------------
Total trials: 0
Total latency (us): 0
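
The Speed column staying at N/A while the Trials count grows to 40 suggests that none of the measured candidates returned a valid latency, i.e., every build or run failed, which is also why the totals stay at 0. The per-candidate errors should be recorded under the work_dir. As a minimal sketch (assuming the default JSONDatabase files that tune_tir writes into the work_dir, database_workload.json and database_tuning_record.json), this checks whether anything was committed:

from tvm import meta_schedule as ms

db = ms.database.JSONDatabase(
    path_workload="./database_workload.json",
    path_tuning_record="./database_tuning_record.json",
)
records = db.get_all_tuning_records()
print(f"{len(records)} tuning record(s) committed")
for rec in records[:3]:
    print(rec.run_secs)  # no records, or huge sentinel values, indicate failed measurements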

The tuning code runs well on my machine. You can try installing the latest version of TVM, or you can run the tests under tests/python/meta_schedule and see whether they pass, to determine whether there is a problem with your TVM build.
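
For example, something like this from the TVM source tree (the exact pytest flags are just a suggestion):

python -m pytest tests/python/meta_schedule -q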

I tested the code with the latest release version (0.21.0), but it does not work either.