Hi,
I see that in the LLVM output code (obtained with lib.get_source() after building a model) there are calls to @__TVMBackendParallelLaunch. I assume this function is defined somewhere in the TVM runtime and that it handles some kind of parallelization/multithreading.
Is there any documentation about this? Does anyone know where I could start looking?
wweic · August 1, 2019, 12:41am · #2
The parallel launch API is defined in the TVM runtime:
int TVMBackendParallelLaunch(FTVMParallelLambda flambda,
                             void* cdata,
                             int num_task) {
  int res = tvm::runtime::ThreadPool::ThreadLocal()->Launch(
      flambda, cdata, num_task, 1);
  return res;
}
Here (codegen_cpu) is how calls to this API get emitted:
Thank you!
I have been looking at the code, and there is something I can’t find. From this snippet it seems
  CHECK_LE(num_task, num_workers_used_)
      << "Request parallel sync task larger than number of threads used "
      << " workers=" << num_workers_used_ << " request=" << num_task;
}
launcher->Init(flambda, cdata, num_task, need_sync != 0);
SpscTaskQueue::Task tsk;
tsk.launcher = launcher;
// if worker0 is taken by the master, queues_[0] is abandoned
for (int i = exclude_worker0_; i < num_task; ++i) {
  tsk.task_id = i;
  queues_[i]->Push(tsk);
}
// use the master thread to run task 0
if (exclude_worker0_) {
  TVMParallelGroupEnv* penv = &(tsk.launcher->env);
  if ((*tsk.launcher->flambda)(0, penv, cdata) == 0) {
    tsk.launcher->SignalJobFinish();
  } else {
    tsk.launcher->SignalJobError(tsk.task_id);
  }
}
that each worker receives the same task. Is this correct?
I have to assume, then, that somewhere else there is a mechanism to split the input data so that each worker performs the same operations on a different subset of it. Where would that be defined?
wweic · August 1, 2019, 3:58pm · #4
I think this section creates the parallel lambda, and it uses task_id
to grab its assigned portion of the data:
// Setup the closure function.
BasicBlock *lambda_entry = BasicBlock::Create(*ctx_, "entry", f);
builder_->SetInsertPoint(lambda_entry);
auto it = f->arg_begin();
llvm::Value* task_id = &(*it++);
llvm::Value* penv = &(*it++);
cdata = builder_->CreatePointerCast(&(*it++), cdata->getType());
// setup new variable map, swap it with current var context.
std::unordered_map<const Variable*, llvm::Value*> new_vmap;
UnpackClosureData(cdata, vfields, &new_vmap);
// setup parallel env
ParallelEnv par_env;
par_env.task_id = Var("task_id", Int(32));
par_env.num_task = Var("num_task", Int(32));
new_vmap[par_env.task_id.get()] = task_id;
new_vmap[par_env.num_task.get()] = builder_->CreateLoad(
    builder_->CreateInBoundsGEP(
        penv, {ConstInt32(0), ConstInt32(1)}));
par_env.penv = penv;
std::swap(function_, f);
Is it possible to disable the parallelization? Is it maybe included in one of the “optimization levels”?
masahi · December 10, 2019, 12:03pm · #6
You can set TVM_NUM_THREADS to 1.
Setting TVM_NUM_THREADS still causes the code generator to create the call to @__TVMBackendParallelLaunch, which is what I want to avoid.
(I know I shouldn’t, but I am trying to get rid of the TVM runtime, so I need the LLVM IR to be as clean as possible.)
I’m looking for something similar. Did you ever find a solution to this?