[BYOC] How to partition specific region of a Relay graph to CPU

I’d like the post-processing part of the model to be executed on the CPU instead of the accelerator. Is there a way to tell TVM to ‘stop’ partitioning for the external BYOC compiler at specific Relay operators (e.g., the ArgMax and Gather ops in this case)? Should I write a custom pass, or can existing functionality achieve this (for example, using AnnotateTarget with a custom pattern to isolate this subgraph)?

I don’t quite understand your question. In particular, how do you define a “region”? If a “region” is just an operator, BYOC won’t partition it for the accelerator unless you mark that op as offloadable.

A “region” is a subgraph that can be offloaded to some target, as in the MergeCompilerRegions pass. Essentially, I want the two boundary operators, and every operator after them, to be unoffloadable, while the same kinds of operators can still be offloaded when they appear before the boundary. How can I do this?

Here is a simple demonstration of what I need:

Op (offloadable)
↓
Op (the_boundary, unoffloadable)
↓
Op (unoffloadable)
↓
Op (unoffloadable)
↓
...  (unoffloadable)
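The behaviour in the diagram amounts to a reachability rule: an op may be offloaded only if the accelerator supports it *and* it is not reachable from a boundary op. A minimal sketch of that rule, assuming a toy graph stored as a plain Python dict of producer-to-consumers edges rather than a real Relay graph (the function names, node names, and the set of supported ops are all hypothetical, not TVM APIs):

```python
from collections import deque
from typing import Dict, List, Set


def downstream_of(graph: Dict[str, List[str]], boundaries: Set[str]) -> Set[str]:
    """Return the boundary nodes plus every node reachable from them (BFS)."""
    seen: Set[str] = set(boundaries)
    queue = deque(boundaries)
    while queue:
        node = queue.popleft()
        for consumer in graph.get(node, []):
            if consumer not in seen:
                seen.add(consumer)
                queue.append(consumer)
    return seen


def annotate(
    graph: Dict[str, List[str]],
    op_type: Dict[str, str],
    supported: Set[str],
    boundary_types: Set[str],
) -> Dict[str, bool]:
    """Map each node to True if it may be offloaded to the accelerator."""
    boundaries = {n for n, t in op_type.items() if t in boundary_types}
    blocked = downstream_of(graph, boundaries)
    # Offloadable = supported by the accelerator AND not at/after the boundary.
    return {n: (t in supported and n not in blocked) for n, t in op_type.items()}


# Toy model: conv2d -> add -> {argmax, gather}; argmax -> gather -> add
graph = {
    "conv0": ["add0"],
    "add0": ["argmax0", "gather0"],
    "argmax0": ["gather0"],
    "gather0": ["add1"],
}
op_type = {
    "conv0": "nn.conv2d",
    "add0": "add",
    "argmax0": "argmax",
    "gather0": "gather",
    "add1": "add",
}
# Assume (hypothetically) the accelerator could run every one of these ops.
supported = {"nn.conv2d", "add", "argmax", "gather"}
result = annotate(graph, op_type, supported, boundary_types={"argmax", "gather"})
print(result)
# {'conv0': True, 'add0': True, 'argmax0': False, 'gather0': False, 'add1': False}
```

Note that `add0` and `add1` have the same op type but get different answers, which is exactly the “offloadable before the boundary, not after” behaviour. In real TVM code, one plausible (untested) route is to compute the blocked set with a Relay `ExprVisitor` (Relay expressions point from a call to its arguments, so the edges would need inverting) and have the `target.<compiler>` check functions registered via `tvm.ir.register_op_attr` return `False` for blocked nodes before running `AnnotateTarget`.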

Yes, this is a common issue that hardware vendors run into when integrating their toolchains through BYOC. You can refer to the method used by TensorRT:

  1. Partition every op your accelerator supports. You may end up with multiple accelerator subgraphs.
  2. Re-inline the subgraphs that don’t contain significant computation back into the Relay main function.
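Step 2 needs some criterion for “significant computation”. A minimal sketch, assuming each partitioned subgraph has already been summarised as a list of Relay op names, and that “significant” means containing at least one conv/dense/matmul op (the heuristic, `HEAVY_OPS`, `split_subgraphs`, and the `myacc_*` names are all hypothetical; the TensorRT integration in the TVM repo applies its own criteria):

```python
from typing import Dict, List, Set, Tuple

# Hypothetical notion of "significant computation": ops assumed to dominate runtime.
HEAVY_OPS = {"nn.conv2d", "nn.dense", "nn.batch_matmul"}


def split_subgraphs(
    subgraphs: Dict[str, List[str]], min_heavy: int = 1
) -> Tuple[Set[str], Set[str]]:
    """Return (kept, inlined): subgraphs left on the accelerator vs. inlined back."""
    kept: Set[str] = set()
    inlined: Set[str] = set()
    for name, ops in subgraphs.items():
        heavy = sum(op in HEAVY_OPS for op in ops)
        (kept if heavy >= min_heavy else inlined).add(name)
    return kept, inlined


# Toy output of step 1: three subgraphs for a hypothetical compiler "myacc".
subgraphs = {
    "myacc_0": ["nn.conv2d", "nn.relu", "nn.conv2d"],
    "myacc_1": ["add", "argmax"],  # small post-processing subgraph
    "myacc_2": ["nn.dense", "nn.softmax"],
}
kept, inlined = split_subgraphs(subgraphs)
print(sorted(kept), sorted(inlined))
# ['myacc_0', 'myacc_2'] ['myacc_1']
```

In an actual pass, the names in `inlined` would identify the partitioned functions whose bodies get inlined back into `main`, so a small post-processing subgraph like `myacc_1` ends up running on the CPU.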

The TensorRT integration code is already in the TVM repo.


Thanks, I’ll check that out.