Is TVMRuntime thread-safe?

I’m going to deploy TVM on a server-class CPU. As far as I know, there are three steps to inference after loading the module: set_input, run, get_output.

My questions:

  1. Is TVMRuntime thread-safe?
  2. How can I achieve low latency and high throughput? Is batching meaningful?

It depends on what you mean by thread safety:

  • TVMRuntime functions are, in general, thread-safe. This means you can call a single TVM-built function from multiple threads without a problem.
  • The graph executor contains state, which means it is not thread-safe. You should not share one executor among multiple threads unless you guard it with a mutex.
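To make the second point concrete, here is a minimal sketch of the two usual options: one executor per thread, or one shared executor behind a lock. StandInExecutor is a hypothetical stand-in I wrote for illustration (it is not a TVM class); it only mimics the set_input / run / get_output sequence and the fact that state is kept between calls. In a real deployment I would assume each executor is instead created from the loaded module (for example via tvm.contrib.graph_executor), and the input name "data" is also an assumption.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class StandInExecutor:
    """Hypothetical stand-in for a stateful graph executor (not a TVM class)."""

    def __init__(self):
        self._inputs = {}
        self._output = None

    def set_input(self, name, value):
        # State written here is read later by run().
        self._inputs[name] = value

    def run(self):
        # Placeholder "model": doubles the input named "data".
        self._output = self._inputs["data"] * 2

    def get_output(self, index):
        return self._output


# Option 1: one executor per thread, so no state is ever shared.
_local = threading.local()


def infer_per_thread(x):
    if not hasattr(_local, "executor"):
        _local.executor = StandInExecutor()
    ex = _local.executor
    ex.set_input("data", x)
    ex.run()
    return ex.get_output(0)


# Option 2: one shared executor, with the whole
# set_input -> run -> get_output sequence held under a mutex.
_shared = StandInExecutor()
_shared_lock = threading.Lock()


def infer_locked(x):
    with _shared_lock:
        _shared.set_input("data", x)
        _shared.run()
        return _shared.get_output(0)


def run_all(fn, inputs, workers=4):
    # Dispatch the requests across a small pool of worker threads.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fn, inputs))
```

Option 1 lets threads run inference concurrently at the cost of one executor (and its buffers) per thread. Option 2 uses less memory but serializes all requests, so the lock has to cover all three steps, not just run.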

It is best to use one post per question so people can find the answers easily. The question about latency versus throughput is quite general, and I don't think there is a direct answer to it. I suggest trying things out.