I am working with multiple models generated as TVM DSOs, following the howto_deploy and bundle_deploy examples.
I noticed that every time a thread runs a model, I have to call mod_factory.GetFunction("default")(ctx) and retrieve the get/set input and output functions again, since these objects are destroyed once the function returns. This adds an extra startup cost to every input that is processed.
What I’d like to do is cache these methods.
Is there a way?
Is this expected, or is it a missing feature?
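Here is roughly what each invocation currently looks like (a simplified sketch based on the howto_deploy example; the RunOnce wrapper and the "data" input name are placeholders, and DLContext is DLDevice on newer TVM versions):

```cpp
#include <dlpack/dlpack.h>
#include <tvm/runtime/module.h>
#include <tvm/runtime/ndarray.h>
#include <tvm/runtime/packed_func.h>

// Current pattern: every call rebuilds the graph module and re-fetches the
// PackedFuncs before doing the actual inference; this is the startup cost
// in question.
void RunOnce(tvm::runtime::Module& mod_factory,
             const tvm::runtime::NDArray& input,
             tvm::runtime::NDArray& output) {
  DLContext ctx{kDLCPU, 0};  // DLDevice on newer TVM versions
  tvm::runtime::Module gmod = mod_factory.GetFunction("default")(ctx);
  tvm::runtime::PackedFunc set_input = gmod.GetFunction("set_input");
  tvm::runtime::PackedFunc run = gmod.GetFunction("run");
  tvm::runtime::PackedFunc get_output = gmod.GetFunction("get_output");

  set_input("data", input);  // "data" is a placeholder input name
  run();
  get_output(0, output);
}
```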
That's exactly what I expected too.
But without calling mod_factory.GetFunction("default")(ctx) each time, it doesn't work: it doesn't crash right away, but get_output() produces random data.
If I retrieve the module and get the methods from it again, it produces correct results.
That is correct. You can call gmod = mod_factory.GetFunction("default")(ctx) to create the module, then cache the set/get/run PackedFuncs from the gmod once.
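A minimal sketch of that caching, assuming a single thread for now (the CachedGraph struct, the CreateCachedGraph helper, and the "data" input name are made up for illustration):

```cpp
#include <dlpack/dlpack.h>
#include <tvm/runtime/module.h>
#include <tvm/runtime/ndarray.h>
#include <tvm/runtime/packed_func.h>

// Created once per model; gmod is kept in the struct so the module and the
// cached PackedFuncs share one lifetime.
struct CachedGraph {
  tvm::runtime::Module gmod;
  tvm::runtime::PackedFunc set_input;
  tvm::runtime::PackedFunc run;
  tvm::runtime::PackedFunc get_output;
};

CachedGraph CreateCachedGraph(tvm::runtime::Module& mod_factory, DLContext ctx) {
  CachedGraph g;
  g.gmod = mod_factory.GetFunction("default")(ctx);
  g.set_input = g.gmod.GetFunction("set_input");
  g.run = g.gmod.GetFunction("run");
  g.get_output = g.gmod.GetFunction("get_output");
  return g;
}

// Per input, only the cheap calls remain:
//   g.set_input("data", input);
//   g.run();
//   g.get_output(0, output);
```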
If I uncomment the code in invoke(), then it works. Otherwise the cached PackedFuncs are invalid: output->data contains whatever was there before calling invoke(). And when the object is destroyed, these cached PackedFuncs become dangling pointers and crash.
I tried various ways to protect the cached PackedFuncs, but nothing worked other than the commented-out code in invoke().
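To make that concrete, the shape of the code is roughly this (a simplified sketch, not the exact code: the Filter member names, ctx_, and the "data" input name are approximations):

```cpp
#include <dlpack/dlpack.h>
#include <tvm/runtime/module.h>
#include <tvm/runtime/ndarray.h>
#include <tvm/runtime/packed_func.h>

class Filter {
 public:
  Filter(tvm::runtime::Module mod_factory, DLContext ctx)
      : mod_factory_(mod_factory), ctx_(ctx) {
    // Cache the module and PackedFuncs once at construction.
    gmod_ = mod_factory_.GetFunction("default")(ctx_);
    tvm_set_input_ = gmod_.GetFunction("set_input");
    tvm_run_ = gmod_.GetFunction("run");
    tvm_get_output_ = gmod_.GetFunction("get_output");
  }

  void invoke(const tvm::runtime::NDArray& input, tvm::runtime::NDArray& output) {
    // The "commented code": re-fetching everything per call restores correct
    // results, but brings back the startup cost caching was meant to remove.
    // gmod_ = mod_factory_.GetFunction("default")(ctx_);
    // tvm_set_input_ = gmod_.GetFunction("set_input");
    // tvm_run_ = gmod_.GetFunction("run");
    // tvm_get_output_ = gmod_.GetFunction("get_output");

    tvm_set_input_("data", input);
    tvm_run_();
    tvm_get_output_(0, output);  // symptom: output->data stays unchanged when
                                 // the re-fetch above remains commented out
  }

 private:
  tvm::runtime::Module mod_factory_;
  DLContext ctx_;
  tvm::runtime::Module gmod_;
  tvm::runtime::PackedFunc tvm_set_input_;
  tvm::runtime::PackedFunc tvm_run_;
  tvm::runtime::PackedFunc tvm_get_output_;
};
```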
We might want to dig deeper into how the PackedFuncs are transferred. It would be useful to check whether this is caused by multi-threading and a lack of synchronization (e.g. you would need a lock or another sync mechanism to make sure the effects of thread 1 are visible to thread 2), or whether the same problem occurs in a single-threaded setting. Can you also confirm that tvm_set_input and run are members of the Filter class?
Notably, the graph module does need to be created once per thread, since the internal activation data is private to each thread.
If you can post a minimal repro based on the howto_deploy example, we can dig a bit deeper.
I made a quick test with many std::thread sharing the gmod and PackedFuncs: no issue.
When I apply the same solution inside an extension of the other library, where I have no idea what threading model is used, I get the same issue as before. Very weird.
Ok, it is possible that the function you have is being called from multiple threads. The graph runtime itself needs to be thread-local because the activation memory is pre-planned and cannot be shared.
Try allocating a thread-local entry and caching the functions there.
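Something along these lines (a sketch assuming a single model; the ThreadLocalGraph and Invoke names and the "data" input are placeholders, and DLContext is DLDevice on newer TVM versions). Each thread pays the GetFunction cost once on its first call, and afterwards only set_input/run/get_output run per input:

```cpp
#include <dlpack/dlpack.h>
#include <tvm/runtime/module.h>
#include <tvm/runtime/ndarray.h>
#include <tvm/runtime/packed_func.h>

// One graph runtime per thread, so the pre-planned activation memory is
// never shared between threads, while the PackedFunc lookups happen only
// once per thread instead of once per input.
struct ThreadLocalGraph {
  tvm::runtime::Module gmod;
  tvm::runtime::PackedFunc set_input;
  tvm::runtime::PackedFunc run;
  tvm::runtime::PackedFunc get_output;
};

ThreadLocalGraph& GetThreadLocalGraph(tvm::runtime::Module& mod_factory, DLContext ctx) {
  // Assumes a single model; with several models you would key a
  // thread-local map by model instead of using a single entry.
  thread_local ThreadLocalGraph entry = [&] {
    ThreadLocalGraph e;
    e.gmod = mod_factory.GetFunction("default")(ctx);
    e.set_input = e.gmod.GetFunction("set_input");
    e.run = e.gmod.GetFunction("run");
    e.get_output = e.gmod.GetFunction("get_output");
    return e;
  }();
  return entry;
}

void Invoke(tvm::runtime::Module& mod_factory,
            const tvm::runtime::NDArray& input,
            tvm::runtime::NDArray& output) {
  DLContext ctx{kDLCPU, 0};  // DLDevice on newer TVM versions
  ThreadLocalGraph& g = GetThreadLocalGraph(mod_factory, ctx);
  g.set_input("data", input);  // "data" is a placeholder input name
  g.run();
  g.get_output(0, output);
}
```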