I’d like the post-processing part of the model to run on the CPU instead of the accelerator. Is there a way to tell TVM to ‘stop’ partitioning for an external BYOC compiler at specific Relay operators (e.g., the ArgMax and Gather ops in this case)? Should I write a custom pass, or can I achieve this with existing functionality (for example, using AnnotateTarget and defining a custom pattern to isolate this subgraph)?
I don’t quite understand your question. In particular, how do you define a “region”? If a “region” is just an operator, then BYOC won’t partition those ops for the accelerator as long as you don’t mark them as offloadable.
A “region” is a subgraph that can be offloaded to some target, as in the MergeCompilerRegions pass. Essentially, I want every operator after two specific operators (the boundary) to be unoffloadable, while the same kinds of operators remain offloadable when they appear before the boundary. How can I do this?
Here is a simple demonstration of what I need:
Op (offloadable)
↓
Op (the_boundary, unoffloadable)
↓
Op (unoffloadable)
↓
Op (unoffloadable)
↓
... (unoffloadable)
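The rule in the diagram above can be sketched in plain Python on a toy graph. This is only an illustration of the intended behavior, not TVM code: the function name mark_offloadable, the (name, op_type) tuples, and the edge list are all hypothetical, and the sketch assumes that "after the boundary" means "reachable from a boundary op".

```python
from collections import deque

def mark_offloadable(ops, edges, supported, boundary):
    """Decide per op whether it may be offloaded (toy model, not a TVM API).

    ops:       list of (name, op_type) pairs
    edges:     list of (producer_name, consumer_name) pairs
    supported: op types the accelerator can run
    boundary:  op types at which offloading must stop
    """
    successors = {name: [] for name, _ in ops}
    for src, dst in edges:
        successors[src].append(dst)

    # Breadth-first walk from every boundary op: the boundary itself and
    # everything reachable from it is blocked from offloading.
    blocked = set()
    queue = deque(name for name, kind in ops if kind in boundary)
    while queue:
        node = queue.popleft()
        if node in blocked:
            continue
        blocked.add(node)
        queue.extend(successors[node])

    return {name: (kind in supported and name not in blocked) for name, kind in ops}

# The same op type ("add") appears before and after the boundary.
ops = [
    ("conv0", "nn.conv2d"),
    ("add0", "add"),
    ("argmax0", "argmax"),
    ("gather0", "gather"),
    ("add1", "add"),
]
edges = [
    ("conv0", "add0"),
    ("add0", "argmax0"),
    ("argmax0", "gather0"),
    ("gather0", "add1"),
]
result = mark_offloadable(
    ops, edges, supported={"nn.conv2d", "add"}, boundary={"argmax", "gather"}
)
```

Here add0 stays offloadable while add1 does not, even though both are the same op type, which is the position-dependent behavior that a per-operator "offloadable" flag alone cannot express.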
Yes, this is a common issue that hardware vendors run into when integrating their toolchains through BYOC. You can refer to the method used by TensorRT:
- partition every op your accelerator supports; you may end up with multiple accelerator subgraphs.
- re-inline the subgraphs that don’t contain significant computation back into the Relay main function.
The TensorRT code is already in the TVM repo.
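The two steps above can be sketched on a toy linear chain of ops. This is a plain-Python illustration, not the TensorRT integration itself: the function partition_and_prune, the per-op cost numbers, and the min_cost threshold are hypothetical stand-ins for whatever "significant computation" means for a given accelerator.

```python
from itertools import groupby

def partition_and_prune(ops, supported, min_cost):
    """Partition a linear op chain, then send cheap regions back to the host.

    ops:       list of (name, op_type, cost) in execution order
    supported: op types the accelerator can run
    min_cost:  regions whose summed cost is below this stay on the host
    """
    plan = []
    # Step 1: group consecutive supported ops into candidate regions.
    for is_supported, group in groupby(ops, key=lambda op: op[1] in supported):
        group = list(group)
        names = [name for name, _, _ in group]
        total_cost = sum(cost for _, _, cost in group)
        # Step 2: keep a region on the accelerator only if it is worth it.
        if is_supported and total_cost >= min_cost:
            plan.append(("accel", names))
        else:
            plan.append(("host", names))

    # Merge neighbouring host segments, mimicking re-inlining into "main".
    merged = []
    for target, names in plan:
        if merged and merged[-1][0] == target == "host":
            merged[-1][1].extend(names)
        else:
            merged.append((target, list(names)))
    return merged

ops = [
    ("conv0", "nn.conv2d", 100),
    ("relu0", "nn.relu", 1),
    ("argmax0", "argmax", 5),
    ("gather0", "gather", 5),
    ("add0", "add", 1),
]
plan = partition_and_prune(ops, supported={"nn.conv2d", "nn.relu", "add"}, min_cost=50)
```

In this toy run the trailing add0 forms its own small accelerator region after the unsupported ops, gets pruned because it is cheap, and ends up on the host together with argmax0 and gather0. In TVM the corresponding steps would be the partitioning passes followed by a pruning step over the partitioned module, as done in the TensorRT integration.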
Thanks, I’ll check that out.