Does the Relax VM support multithreaded execution when targeting x86 CPUs?

When I profile with htop or other profiling tools (like VTune), the observed parallelism is close to 1, unlike the Relay VM, whose CPU utilization is close to the number of threads. Here is my code:

import tvm
from tvm import relax
import torch
import numpy as np
from torch import fx
from tvm.relax.frontend.torch import from_fx
from torchvision.models.resnet import ResNet18_Weights, resnet18
import os

# Cap TVM's runtime worker pool; takes effect when the thread pool is first created
num_threads = 16
os.environ["TVM_NUM_THREADS"] = str(num_threads)

device = tvm.cpu(0)
target = tvm.target.Target("llvm")

torch_model = resnet18(weights=ResNet18_Weights.DEFAULT)

# Give the input shape and data type
input_info = [((16, 3, 224, 224), "float32")]

# Convert the model to an IRModule via torch.fx
with torch.no_grad():
    torch_fx_model = fx.symbolic_trace(torch_model)
    mod = from_fx(torch_fx_model, input_info)

# Build for the CPU target and run in the Relax VM
ex = relax.build(mod, target=target)
vm = relax.VirtualMachine(ex, device=device)

input_data = tvm.nd.array(np.random.rand(16, 3, 224, 224).astype("float32"), device)
out = vm["main"](input_data).numpy()
print(out.shape)

For end-to-end build and optimization, please refer to the End-to-End Optimize Model — tvm 0.19.dev0 documentation.
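Roughly, the CPU path of that tutorial applies a tuning pipeline before building; a minimal sketch, assuming the static_shape_tuning pipeline available in recent TVM versions (the num-cores value and trial count are placeholders you should adjust):

target = tvm.target.Target("llvm -num-cores 16")

# Tune the TIR functions and apply the best found schedules; this is the step
# that introduces parallel CPU schedules into the module.
mod = relax.get_pipeline(
    "static_shape_tuning", target=target, total_trials=2000
)(mod)

ex = relax.build(mod, target=target)
vm = relax.VirtualMachine(ex, device=tvm.cpu(0))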

Sure, but is there any scheduler for CPU multithreading?

I also noticed that the LLVM code generated by the Relay build contains __TVMBackendParallelLaunch, while the Relax build's does not. I believe __TVMBackendParallelLaunch is one of the key runtime APIs for multithreaded execution.
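That matches how the CPU runtime works: __TVMBackendParallelLaunch only shows up once a TIR loop is explicitly marked parallel by a schedule. A small sketch with a toy add kernel (not ResNet) that confirms the connection, assuming tvm.build still accepts a TIR IRModule directly on a plain llvm target:

import tvm
from tvm.script import tir as T

@tvm.script.ir_module
class Toy:
    @T.prim_func
    def main(A: T.Buffer((1024,), "float32"),
             B: T.Buffer((1024,), "float32"),
             C: T.Buffer((1024,), "float32")):
        T.func_attr({"global_symbol": "main", "tir.noalias": True})
        for i in range(1024):
            with T.block("C"):
                vi = T.axis.spatial(1024, i)
                C[vi] = A[vi] + B[vi]

sch = tvm.tir.Schedule(Toy)
(i,) = sch.get_loops(sch.get_block("C"))
sch.parallel(i)  # without this, the generated LLVM IR has no parallel launch

lib = tvm.build(sch.mod, target="llvm")
print("__TVMBackendParallelLaunch" in lib.get_source())  # expected: True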

I got the point, thank you.

How about dynamic shape model tuning? I tried it with the meta scheduler, but it seems not to support loops with dynamic extents. The following code encountered a segmentation fault: the variable `extent` is a null pointer.

The above problem with dynamic shape model tuning is solved, but another issue occurred: InternalError: Check failed: (!rv_names->count(output)) is false: ValueError: The random variable has been produced once: _

The current auto-tuning mechanism only supports static shapes :slight_smile:

How can I limit the number of CPU cores used by the tuned model in end-to-end compilation?

Now I try to configure the target's num-cores like this:

target = tvm.target.Target(
    f"llvm -mtriple={tvm.target.codegen.llvm_get_system_triple()} "
    f"-mcpu={tvm.target.codegen.llvm_get_system_cpu()} -num-cores={parallelism}"
)

and limit max_jobs_per_core to 1 in the auto-tuning rules:

ms.schedule_rule.ParallelizeVectorizeUnroll(1, 16, None, True)  # first arg is max_jobs_per_core
ms.mutator.MutateParallel(1)  # max_jobs_per_core=1

But the compiled model running in the Relax VM seems to use only one core. Did I miss anything?
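As a side note, my understanding is that -num-cores only steers the tuning search space; the actual worker count at run time comes from TVM's thread pool, which can be capped with the TVM_NUM_THREADS environment variable (as in my first snippet). A minimal sketch, with 4 as a purely illustrative value:

import os

# Cap TVM's runtime worker pool. This has to be set before the thread pool is
# created, i.e. before the first parallel kernel runs (safest: before importing
# tvm). The value 4 is just for illustration.
os.environ["TVM_NUM_THREADS"] = "4"

import tvm
from tvm import relax

# ... build/load the tuned module and run it in the Relax VM as before;
# htop should then show at most 4 busy worker threads.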
