Graph Partitioning and Heterogeneous Execution

@zhiics I’m curious about the timeline for this item - it will be particularly useful for our upcoming FPGA backends (especially for PCIe-based FPGA platforms).


@thierry I think it is on the 0.5 roadmap. I have a working version now, and we tested it successfully on a processor with an Intel CPU and Intel Graphics. I am focusing more on some design details at the moment.

Thanks, this will be very useful for FPGA backends down the road. Is a PR for this already open? If not, what’s the timeline? Thank you

@thierry It’s currently in my private repo. We are testing it with some use cases. If you want it soon, I can probably add you to the repo and we can start from there. Sound like a plan?

That would be fantastic! I’d be happy to provide feedback as well with respect to our FPGA examples. My github ID is tmoreau89.

Is it possible to make the WIP a public fork? I am assuming many people watching this thread would be interested in what is going on, and it would help make things more accessible to the broader community.

Good idea. If that’s not too much of a hassle, it would be great to have the community provide feedback on your WIP as well.

@tqchen @thierry Sounds good. I can check with somebody internally and see how to proceed.


Got the approval and sent the WIP out.


What do you think about having a dedicated module that handles memory syncing across devices, rather than explicit copy nodes in the graph? Maybe that’s too much of a rewrite… @tqchen

Dear All,

I am wondering about the current progress of this item. Can I partition a graph and run the subgraphs on different devices with TVM?


I am also curious about the current progress of partitioning the graph and running subgraphs on different devices in parallel.

BYOC has landed. Please check the blogpost: How to Bring Your Own Codegen to TVM

Meanwhile, although we can partition the graph and run subgraphs on different devices, we don’t have a mechanism to run them in parallel.

I see, thank you for the clarification. Is parallel execution across heterogeneous devices on the roadmap, or are there active RFCs for this work?

Unfortunately, we don’t have a plan to do so soon, mainly because 1) the current TVM runtime doesn’t support parallel execution, and 2) in most cases the parallelizable branches in a model are either all offloaded to the external device or none are. On the other hand, as I recall, there have recently been some discussions in the community about running multiple model inferences in parallel. That may be a good motivation for revisiting this issue.
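Just to make the missing mechanism concrete: this is purely illustrative (not TVM code), a sketch of the kind of parallel dispatch the runtime would need, with two stand-in functions playing the role of partitioned subgraphs:

```python
# Purely illustrative: parallel dispatch of two partitioned "subgraphs",
# sketched with Python threads. The subgraph bodies are stand-ins.
from concurrent.futures import ThreadPoolExecutor

def cpu_subgraph(xs):
    # Stand-in for the subgraph kept on the host CPU.
    return [v * 2 for v in xs]

def accel_subgraph(xs):
    # Stand-in for the subgraph offloaded to an external device.
    return [v + 1 for v in xs]

inputs = [1, 2, 3]
with ThreadPoolExecutor(max_workers=2) as pool:
    # Launch both branches concurrently, then join their results.
    a = pool.submit(cpu_subgraph, inputs)
    b = pool.submit(accel_subgraph, inputs)
    merged = [x + y for x, y in zip(a.result(), b.result())]
# merged == [4, 7, 10]
```

Today's graph runtime would instead run the two branches sequentially, which is why partitioning alone doesn't buy any concurrency.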

Hello, have you solved this problem? Our team recently ran into it as well; may I ask you for help?