TVM multithreading on LLVM backend

Hi,

I see that in the LLVM output code (obtained with lib.get_source() after building a model) there are calls to @__TVMBackendParallelLaunch. I assume this function is defined somewhere in the TVM runtime and it handles some kind of parallelization/multithreading.

Is there any documentation about this? Does anyone know where I could start looking?

The parallel launch API is defined in the TVM runtime API (TVMBackendParallelLaunch, declared in include/tvm/runtime/c_backend_api.h):

Here (codegen_cpu) is how calls to this API get emitted:

Thank you!
I have been looking at the code, and there is something I can’t figure out: it seems here


that each worker receives the same task. Is this correct?
I have to assume then that somewhere else there is a mechanism to split the input data so that each worker performs the same operations on a different set of data. Where would that be defined?

I think this section creates the parallel lambda, and it uses task_id to grab its assigned portion of the data:

Is it possible to disable the parallelization? Is it included maybe in one of the “optimization levels”?

You can set TVM_NUM_THREADS to 1.
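For example, before running the compiled module (run_model.py is a placeholder for your own script):

```shell
# Limit the TVM runtime thread pool to a single worker thread.
export TVM_NUM_THREADS=1
python run_model.py
```

Note this only constrains the runtime thread pool; it does not change what code is generated.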

Setting TVM_NUM_THREADS still causes the code generator to emit the call to @__TVMBackendParallelLaunch, which is what I want to avoid.

(I know I shouldn’t, but I am trying to get rid of the TVM runtime, so I need the LLVM IR to be as clean as possible.)

I’m looking for something similar — did you ever find a solution to this?