lyq
April 25, 2018, 3:27am
1
I’m inspecting graph_runtime.cc and trying to understand the internal inference data flow. I’ll be deploying on a server-class CPU.
My understanding is that the graph_runtime.create function binds each Node to its corresponding operator when loading graph.json and deploy.so.
When the function below is invoked, the graph is evaluated sequentially:
void Run() {
  // setup the array and requirements.
  for (size_t i = 0; i < op_execs_.size(); ++i) {
    if (op_execs_[i]) op_execs_[i]();
  }
}
How can I make graph_runtime run in parallel at the graph level?
AFAIK, we don’t have operator-level parallelization yet. I agree that this would be extremely useful for some networks like RNNs. Contributions are definitely welcome.
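To make the idea concrete, here is a minimal standalone sketch (not TVM code; the Node structure and all names are hypothetical) of what graph-level scheduling could look like: nodes are grouped into dependency levels, and every op within a level is launched concurrently with std::async.

#include <algorithm>
#include <cstdio>
#include <functional>
#include <future>
#include <vector>

struct Node {
  std::function<void()> exec;  // the operator's compiled function
  std::vector<int> deps;       // indices of upstream nodes
};

// Group nodes into levels: a node's level is 1 + the max level of its deps.
// Assumes the node array is already in topological order.
std::vector<std::vector<int>> BuildLevels(const std::vector<Node>& nodes) {
  std::vector<int> level(nodes.size(), 0);
  std::vector<std::vector<int>> levels;
  for (size_t i = 0; i < nodes.size(); ++i) {
    for (int d : nodes[i].deps) level[i] = std::max(level[i], level[d] + 1);
    if (static_cast<size_t>(level[i]) >= levels.size()) levels.resize(level[i] + 1);
    levels[level[i]].push_back(static_cast<int>(i));
  }
  return levels;
}

// Launch all ops in one level concurrently; barrier before the next level.
void RunParallel(const std::vector<Node>& nodes) {
  for (const auto& lvl : BuildLevels(nodes)) {
    std::vector<std::future<void>> futs;
    for (int i : lvl) futs.push_back(std::async(std::launch::async, nodes[i].exec));
    for (auto& f : futs) f.wait();
  }
}

int main() {
  // Diamond graph 0 -> {1, 2} -> 3: ops 1 and 2 can run in parallel.
  std::vector<Node> g(4);
  for (int i = 0; i < 4; ++i) g[i].exec = [i] { std::printf("op %d\n", i); };
  g[1].deps = {0};
  g[2].deps = {0};
  g[3].deps = {1, 2};
  RunParallel(g);
  return 0;
}

Level-by-level barriers are the simplest scheme; a real scheduler would more likely keep a per-node dependency counter and enqueue an op the moment all of its inputs are ready.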
aca88
March 21, 2019, 2:21pm
3
Hello,
I wanted to ask: what is the status of this question?
According to the repo code, it would seem that no parallelization is possible at the graph level.
But if I inspect other parts of the code base (for example):
namespace tvm {
namespace runtime {

// stride in the page, fit to cache line.
constexpr int kSyncStride = 64 / sizeof(std::atomic<int>);

/*!
 * \brief Thread local master environment.
 */
class ParallelLauncher {
 public:
  // Reset the task request.
  void Init(FTVMParallelLambda flambda,
            void* cdata,
            int num_task,
            bool need_sync) {
    num_pending_.store(num_task);
    this->cdata = cdata;
    this->flambda = flambda;
    this->env.num_task = num_task;
    // ...
Then I get the feeling that parallelization is implemented at other levels.
Can anyone please clarify?
Thanks
Hi, from my understanding, ParallelLauncher implements parallelism within a single operator, not graph-level parallelism. Currently there is no way to run different operators in parallel. I am also working on this issue; maybe we can discuss it.
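To illustrate the distinction, here is a minimal standalone sketch (not TVM's actual thread pool; AddTask and all names are made up) of intra-operator parallelism in the spirit of ParallelLauncher: a single operator's workload is split into num_task chunks, each executed by its own worker.

#include <algorithm>
#include <cstdio>
#include <thread>
#include <vector>

// Hypothetical "parallel lambda": compute one chunk of an elementwise add.
void AddTask(const float* a, const float* b, float* c, int n,
             int task_id, int num_task) {
  int chunk = (n + num_task - 1) / num_task;
  int begin = task_id * chunk;
  int end = std::min(n, begin + chunk);
  for (int i = begin; i < end; ++i) c[i] = a[i] + b[i];
}

int main() {
  const int n = 1000, num_task = 4;
  std::vector<float> a(n, 1.0f), b(n, 2.0f), c(n, 0.0f);
  std::vector<std::thread> workers;
  // One thread per task here; TVM's runtime instead reuses pooled workers.
  for (int t = 0; t < num_task; ++t)
    workers.emplace_back(AddTask, a.data(), b.data(), c.data(), n, t, num_task);
  for (auto& w : workers) w.join();
  std::printf("c[0] = %.1f\n", c[0]);  // expect 3.0
  return 0;
}

A real runtime would keep a pool of pinned worker threads and synchronize with atomics (as the kSyncStride snippet above hints), rather than spawning fresh threads on every operator call.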