I’m looking for a way to run a partitioned graph (subgraphs, composite functions, etc.) on CPU only, to help with integrating a backend through the BYOC framework. The main point is to be able to check, at each intermediate step, that our I/O for each operator and composite operator matches the ‘ground-truth’ implementation.
Currently, if I partition the graph and then build with the target set to LLVM, the build fails because there is no backend implementation for the parts of the partitioned subgraph that are supposed to be offloaded.
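To make the kind of per-step check described above concrete, here is a minimal sketch in plain Python. It assumes the intermediate outputs of both runs have already been collected into dictionaries keyed by operator name; the function `compare_outputs`, the operator names, and the tolerances are hypothetical and are not part of TVM.

```python
import math


def compare_outputs(reference, candidate, rel_tol=1e-5, abs_tol=1e-8):
    """Return the names of intermediate outputs that differ between two runs.

    Both arguments map a (hypothetical) operator name to a flat list of floats.
    """
    mismatches = []
    for name, ref_values in reference.items():
        cand_values = candidate.get(name)
        # A missing output or a length mismatch counts as a failure.
        if cand_values is None or len(cand_values) != len(ref_values):
            mismatches.append(name)
            continue
        # Element-wise comparison within the given tolerances.
        if not all(
            math.isclose(r, c, rel_tol=rel_tol, abs_tol=abs_tol)
            for r, c in zip(ref_values, cand_values)
        ):
            mismatches.append(name)
    return mismatches


# Illustrative data only: the CPU "ground truth" versus the backend under test.
reference = {"conv2d_0": [0.5, 1.25], "relu_0": [0.5, 1.25]}
candidate = {"conv2d_0": [0.5, 1.25], "relu_0": [0.5, 1.30]}
print(compare_outputs(reference, candidate))
```

The first mismatching name is usually the most useful one, since later operators consume its output and will tend to differ as well.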
This is a fair requirement. I guess you need to remove the kCompiler attribute from the partitioned functions so that they are treated as normal Relay functions, but this may raise two questions:
1. Whether the Relay fusion pass will fuse ops inside the function body (I guess the answer is yes, but this needs verification).
2. Whether the TE compiler can handle nested functions; if it cannot, a partitioned function will be treated as a fused function, and the TE compiler may not be able to schedule it because it contains multiple anchor ops.
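As a rough illustration of the attribute change suggested above, the sketch below models function attributes as plain Python dictionaries rather than using the real TVM API. It drops the `Compiler` key (the string behind kCompiler) from every function and leaves other attributes, such as `Composite`, untouched. The helper `strip_compiler_attr`, the function names, and the backend name `my_backend` are all hypothetical.

```python
def strip_compiler_attr(functions):
    """Return a copy of the attribute table without any "Compiler" entries.

    `functions` maps a function name to a dict of its attributes. This is a
    plain-Python stand-in for a Relay module, not the TVM API.
    """
    stripped = {}
    for name, attrs in functions.items():
        # Keep every attribute except the one that marks the function
        # for offloading to an external backend.
        stripped[name] = {k: v for k, v in attrs.items() if k != "Compiler"}
    return stripped


# Illustrative attribute table for a partitioned module.
functions = {
    "my_backend_main_0": {
        "Compiler": "my_backend",
        "global_symbol": "my_backend_main_0",
    },
    "composite_conv_relu": {"Composite": "my_backend.conv_relu"},
}

cpu_only = strip_compiler_attr(functions)
print(cpu_only["my_backend_main_0"])
```

In a real module the same idea would be applied with a pass over the partitioned functions, and the two questions above would still need to be checked against the actual fusion and TE lowering behaviour.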
Good points. Fusion inside the subgraph function bodies would probably be undesirable; if it only happened inside composite functions, I think that would be fine.
I thought I had read some time ago about a BYOC + PTQ (post-training quantization) infrastructure that ran inference on CPU for partitioned graphs as an initial step. I can’t seem to find it now, but I’m curious whether that line of work is still active.