Graph Partitioning and Heterogeneous Execution

@zhiics I’m curious about the timeline for this item - it will be particularly useful for our upcoming FPGA backends (especially for PCIe-based FPGA platforms).


@thierry I think it is on the 0.5 roadmap. I have a working version now, and we tested it successfully on a processor with an Intel CPU and Intel Graphics. I am focusing more on some design details at the moment.

Thanks, this will be very useful for FPGA backends down the road. Is a PR for this already open? If not, what’s the timeline? Thank you

@thierry It’s currently in my private repo. We are testing it with some use cases. If you want it soon, I can probably add you to the repo and we can start from there. Sound like a plan?

That would be fantastic! I’d be happy to provide feedback as well with respect to our FPGA examples. My github ID is tmoreau89.

Is it possible to make the WIP a public fork? I am assuming many people watching this thread would be interested in what is going on, and it would help make things more accessible to the broader community.

Good idea. If that’s not too much of a hassle, it would be great to have the community provide feedback on your WIP as well.

@tqchen @thierry Sounds good. I can check with somebody internally and see how to proceed.


Got the approval and sent the WIP out.


What do you think about having a dedicated module that handles memory syncing across devices, rather than explicit copy nodes in the graph? Maybe that’s too much of a rewrite… @tqchen

Dear All,

I am wondering about the current progress of this item. Can I partition a graph and run the subgraphs on different devices with TVM?


I am also curious about the current progress of partitioning the graph and running subgraphs on different devices in parallel.

BYOC has landed. Please check the blogpost: How to Bring Your Own Codegen to TVM

Meanwhile, although we can partition the graph and run subgraphs on different devices, we don’t have a mechanism to run them in parallel.

I see, thank you for the clarification. Is parallel execution across heterogeneous devices on the roadmap, or are there active RFCs for this work?

Unfortunately, we don’t have a plan to do so soon, mainly because 1) the current TVM runtime doesn’t support parallel execution, and 2) in most cases the parallelizable branches in a model are either all offloaded to the external device or none are. On the other hand, as I recall, there have recently been some discussions in the community about running multiple model inferences in parallel. That may be a good motivation for revisiting this issue.
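Just to make the missing mechanism concrete: this is purely illustrative (not TVM code), a sketch of the kind of parallel dispatch the runtime would need, with two stand-in functions playing the role of partitioned subgraphs:

```python
# Purely illustrative: parallel dispatch of two partitioned "subgraphs",
# sketched with Python threads. The subgraph bodies are stand-ins.
from concurrent.futures import ThreadPoolExecutor

def cpu_subgraph(xs):
    # Stand-in for the subgraph kept on the host CPU.
    return [v * 2 for v in xs]

def accel_subgraph(xs):
    # Stand-in for the subgraph offloaded to an external device.
    return [v + 1 for v in xs]

inputs = [1, 2, 3]
with ThreadPoolExecutor(max_workers=2) as pool:
    # Launch both branches concurrently, then join their results.
    a = pool.submit(cpu_subgraph, inputs)
    b = pool.submit(accel_subgraph, inputs)
    merged = [x + y for x, y in zip(a.result(), b.result())]
# merged == [4, 7, 10]
```

Today's graph runtime would instead run the two branches sequentially, which is why partitioning alone doesn't buy any concurrency.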

Hello, have you solved this problem? Our team recently ran into it as well; may I ask you for help?