How does TVM measure performance when tuning for a different target device?

As I understand it, during tuning TVM generates a batch of candidates for each task, measures their performance (speed, latency, etc.), and updates the task with the best-performing candidate.

We can also specify many different hardware targets (Jetson Orin, Raspberry Pi, etc.) when tuning.

My question is: how does TVM measure performance if I don't physically have the target device? Is the performance measured on the host PC where TVM is running?

You can use TVM RPC to run and tune models on the target device.
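As a rough sketch of what that looks like, assuming an RPC server is already running on the device (for example started with `python -m tvm.exec.rpc_server --host 0.0.0.0 --port=9090`) and a library has already been cross-compiled and exported for it. The helper names, the `make_args` callback, and any host, port, or path you pass in are placeholders of mine, not part of TVM:

```python
import os


def remote_module_name(lib_path):
    """File name the library has on the device (upload keeps the basename)."""
    return os.path.basename(lib_path)


def time_remote(lib_path, host, port, func_name, make_args, number=10):
    """Upload a cross-compiled library and time one function on the device.

    make_args(dev) is a hypothetical callback that returns the argument
    list for the function, allocated on the remote device.
    """
    # Imported lazily so the helper above works without TVM installed.
    from tvm import rpc

    remote = rpc.connect(host, port)   # session with the device's RPC server
    remote.upload(lib_path)            # copy the library to the device
    rlib = remote.load_module(remote_module_name(lib_path))
    dev = remote.cpu()                 # the device's CPU, not the host's
    timer = rlib.time_evaluator(func_name, dev, number=number)
    return timer(*make_args(dev)).mean  # mean seconds per run, measured remotely
```

The point is that the timing happens on the device itself; the host only compiles, uploads, and collects the result. Without a reachable device there is nothing to measure on.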

Thanks for the response! One more question: is TVM RPC used automatically? For example, in the End-to-End Optimize Model tutorial (tvm 0.22.dev0 documentation), if I change the target to

target = tvm.target.Target("raspberry-pi/4b-aarch64")

and run the code, will TVM remotely simulate the Raspberry Pi's runtime environment even though I don't have the device? Sorry if this is an obvious question.

As I understand it, this only generates a binary for that target; it has nothing to do with execution. If you try to run that binary locally, it will fail. That's my experience with TIR modules. I think Relax modules have another runtime called relax_vm, and I'm not sure what happens there.
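To make the "compiles but won't run locally" point concrete, here is a small check you could do yourself before trying to execute a cross-compiled module. This is not a TVM API; `can_run_locally` and the alias table are hypothetical helpers, and the check only compares CPU architecture, not OS or ABI:

```python
import platform

# Common spellings of the same architecture (not exhaustive).
_ALIASES = {"arm64": "aarch64", "amd64": "x86_64"}


def normalize_arch(name):
    """Lower-case an architecture name and map known aliases."""
    name = name.lower()
    return _ALIASES.get(name, name)


def can_run_locally(target_triple, host_machine=None):
    """True if the architecture in an LLVM-style triple matches the host's."""
    host = normalize_arch(host_machine or platform.machine())
    target_arch = normalize_arch(target_triple.split("-")[0])
    return host == target_arch


# An aarch64 binary on an x86_64 host: needs the device (or RPC) to execute.
print(can_run_locally("aarch64-linux-gnu", host_machine="x86_64"))  # False
```

If this returns False, the build can still succeed, but execution and measurement have to happen somewhere else.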

As for tuning for heterogeneous targets, I'm not sure whether there's a better way, but my method was to limit parallelism and intrinsics on the computer running TVM so that it resembles the target device as closely as possible.
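One way that kind of restriction could be set up, as a sketch only: cap the runtime thread count through the `TVM_NUM_THREADS` environment variable and switch off CPU features in the LLVM target string. The `constrained_target` helper, the thread count, and the feature list are my assumptions for illustration, and I haven't checked that every feature name is accepted by every LLVM build:

```python
import os


def constrained_target(mcpu, disabled_features=()):
    """Build an LLVM target string with selected CPU features turned off."""
    target = f"llvm -mcpu={mcpu}"
    if disabled_features:
        # LLVM disables a feature when its name is prefixed with "-".
        mattr = ",".join(f"-{feat}" for feat in disabled_features)
        target += f" -mattr={mattr}"
    return target


# Cap the CPU runtime's worker threads; set this before running any workload.
os.environ["TVM_NUM_THREADS"] = "4"  # assumed core count of the target board

target_str = constrained_target("skylake", ["avx2", "avx512f"])
print(target_str)  # llvm -mcpu=skylake -mattr=-avx2,-avx512f
```

The resulting string would then be passed to `tvm.target.Target(...)`. This only approximates the device: timings taken on an x86 host will not match an ARM board, so treat the tuning result as a rough starting point.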