I am working with multiple models generated as TVM DSOs, following the howto_deploy and bundle_deploy examples.
I noticed that every time a thread runs a model, I have to call mod_factory.GetFunction("default")(ctx) and retrieve the get/set input and output functions again, since these objects are destroyed once the function returns. This adds an extra startup cost to every input that is processed.
What I’d like to do is cache these methods.
Is there a way?
Is this expected, or is it a missing feature?
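Here is roughly what each invocation currently looks like (a simplified sketch based on the howto_deploy example; the RunOnce wrapper and the "data" input name are placeholders, and DLContext is DLDevice on newer TVM versions):

```cpp
#include <dlpack/dlpack.h>
#include <tvm/runtime/module.h>
#include <tvm/runtime/ndarray.h>
#include <tvm/runtime/packed_func.h>

// Current pattern: every call rebuilds the graph module and re-fetches the
// PackedFuncs before doing the actual inference; this is the startup cost
// in question.
void RunOnce(tvm::runtime::Module& mod_factory,
             const tvm::runtime::NDArray& input,
             tvm::runtime::NDArray& output) {
  DLContext ctx{kDLCPU, 0};  // DLDevice on newer TVM versions
  tvm::runtime::Module gmod = mod_factory.GetFunction("default")(ctx);
  tvm::runtime::PackedFunc set_input = gmod.GetFunction("set_input");
  tvm::runtime::PackedFunc run = gmod.GetFunction("run");
  tvm::runtime::PackedFunc get_output = gmod.GetFunction("get_output");

  set_input("data", input);  // "data" is a placeholder input name
  run();
  get_output(0, output);
}
```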
That's exactly what I expected too.
But without calling mod_factory.GetFunction("default")(ctx) each time, it doesn't work: it doesn't crash right away, but get_output() produces random data.
If I retrieve the module and get the methods from it again, it produces correct results.
That is correct. You can call gmod = mod_factory.GetFunction("default")(ctx) to create the module, then cache the set/get/run PackedFuncs from the gmod once.
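A minimal sketch of that caching, assuming a single thread for now (the CachedGraph struct, the CreateCachedGraph helper, and the "data" input name are made up for illustration):

```cpp
#include <dlpack/dlpack.h>
#include <tvm/runtime/module.h>
#include <tvm/runtime/ndarray.h>
#include <tvm/runtime/packed_func.h>

// Created once per model; gmod is kept in the struct so the module and the
// cached PackedFuncs share one lifetime.
struct CachedGraph {
  tvm::runtime::Module gmod;
  tvm::runtime::PackedFunc set_input;
  tvm::runtime::PackedFunc run;
  tvm::runtime::PackedFunc get_output;
};

CachedGraph CreateCachedGraph(tvm::runtime::Module& mod_factory, DLContext ctx) {
  CachedGraph g;
  g.gmod = mod_factory.GetFunction("default")(ctx);
  g.set_input = g.gmod.GetFunction("set_input");
  g.run = g.gmod.GetFunction("run");
  g.get_output = g.gmod.GetFunction("get_output");
  return g;
}

// Per input, only the cheap calls remain:
//   g.set_input("data", input);
//   g.run();
//   g.get_output(0, output);
```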
If I uncomment the code in invoke(), then it works. Otherwise the cached PackedFuncs are invalid: output->data contains whatever was there before calling invoke(). And when the object is destroyed, these cached PackedFuncs become dangling pointers and crash.
I tried various ways to protect the cached PackedFuncs, but nothing worked other than the commented-out code in invoke().
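To make that concrete, the shape of the code is roughly this (a simplified sketch, not the exact code: the Filter member names, ctx_, and the "data" input name are approximations):

```cpp
#include <dlpack/dlpack.h>
#include <tvm/runtime/module.h>
#include <tvm/runtime/ndarray.h>
#include <tvm/runtime/packed_func.h>

class Filter {
 public:
  Filter(tvm::runtime::Module mod_factory, DLContext ctx)
      : mod_factory_(mod_factory), ctx_(ctx) {
    // Cache the module and PackedFuncs once at construction.
    gmod_ = mod_factory_.GetFunction("default")(ctx_);
    tvm_set_input_ = gmod_.GetFunction("set_input");
    tvm_run_ = gmod_.GetFunction("run");
    tvm_get_output_ = gmod_.GetFunction("get_output");
  }

  void invoke(const tvm::runtime::NDArray& input, tvm::runtime::NDArray& output) {
    // The "commented code": re-fetching everything per call restores correct
    // results, but brings back the startup cost caching was meant to remove.
    // gmod_ = mod_factory_.GetFunction("default")(ctx_);
    // tvm_set_input_ = gmod_.GetFunction("set_input");
    // tvm_run_ = gmod_.GetFunction("run");
    // tvm_get_output_ = gmod_.GetFunction("get_output");

    tvm_set_input_("data", input);
    tvm_run_();
    tvm_get_output_(0, output);  // symptom: output->data stays unchanged when
                                 // the re-fetch above remains commented out
  }

 private:
  tvm::runtime::Module mod_factory_;
  DLContext ctx_;
  tvm::runtime::Module gmod_;
  tvm::runtime::PackedFunc tvm_set_input_;
  tvm::runtime::PackedFunc tvm_run_;
  tvm::runtime::PackedFunc tvm_get_output_;
};
```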
We might want to dig deeper into how the PackedFuncs are transferred. It would be useful to check whether this is caused by multi-threading and a lack of synchronization (e.g. you would need a lock or another sync mechanism to make sure the effects of thread 1 are visible to thread 2), or whether the same problem occurs in a single-threaded setting. Can you also confirm that tvm_set_input and run are members of the Filter class?
Notably, the graph module does need to be created once per thread, since the internal activation data is private to each thread.
If you can post a minimal repro based on the howto_deploy example, we can dig a bit deeper.
I made a quick test with many std::thread sharing the gmod and PackedFuncs: no issue.
When I apply the same solution inside an extension of the other library, where I have no idea what threading model is used, I get the same issue as before. Very weird.
Ok, it is possible that the function you have is being called from multiple threads. The graph runtime itself needs to be thread-local because the activation memory is pre-planned and cannot be shared.
Try allocating a thread-local entry and caching the functions there.
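Something along these lines (a sketch assuming a single model; the ThreadLocalGraph and Invoke names and the "data" input are placeholders, and DLContext is DLDevice on newer TVM versions). Each thread pays the GetFunction cost once on its first call, and afterwards only set_input/run/get_output run per input:

```cpp
#include <dlpack/dlpack.h>
#include <tvm/runtime/module.h>
#include <tvm/runtime/ndarray.h>
#include <tvm/runtime/packed_func.h>

// One graph runtime per thread, so the pre-planned activation memory is
// never shared between threads, while the PackedFunc lookups happen only
// once per thread instead of once per input.
struct ThreadLocalGraph {
  tvm::runtime::Module gmod;
  tvm::runtime::PackedFunc set_input;
  tvm::runtime::PackedFunc run;
  tvm::runtime::PackedFunc get_output;
};

ThreadLocalGraph& GetThreadLocalGraph(tvm::runtime::Module& mod_factory, DLContext ctx) {
  // Assumes a single model; with several models you would key a
  // thread-local map by model instead of using a single entry.
  thread_local ThreadLocalGraph entry = [&] {
    ThreadLocalGraph e;
    e.gmod = mod_factory.GetFunction("default")(ctx);
    e.set_input = e.gmod.GetFunction("set_input");
    e.run = e.gmod.GetFunction("run");
    e.get_output = e.gmod.GetFunction("get_output");
    return e;
  }();
  return entry;
}

void Invoke(tvm::runtime::Module& mod_factory,
            const tvm::runtime::NDArray& input,
            tvm::runtime::NDArray& output) {
  DLContext ctx{kDLCPU, 0};  // DLDevice on newer TVM versions
  ThreadLocalGraph& g = GetThreadLocalGraph(mod_factory, ctx);
  g.set_input("data", input);  // "data" is a placeholder input name
  g.run();
  g.get_output(0, output);
}
```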