Creating subgraphs from a Tensor Expression

mbaret · October 7, 2020, 3:36pm

Is there a way in Python that I can create a ‘subgraph’ from a Tensor Expression? In particular, I have a large TE graph containing many operators and would like to lower only a small subgraph from within it. I’d expect all the inputs to the subgraph to be replaced with equivalent placeholder tensors.

Thanks

aca88 · October 7, 2020, 7:00pm

I don’t think I completely follow.

You have some chain of TE stages. You want to pick a subset of these (i.e. the subgraph) and lower only (?) this subgraph?

What happens to the rest of the stages?

Wouldn’t tensorize be a way to technically handle a subset of the te stages “differently” (i.e. lowering to some intrinsic)?

Other than that, the only similar thing I think I have seen is at the TIR level (VTA handles parts of the TIR AST differently if they match some pattern).

mbaret · October 7, 2020, 7:09pm

Essentially I’m trying to ‘partition’ a TE graph (that is, the graph of tensors/ComputeOps). I want to do this because the graph is very large (it’s actually a whole network lowered to TE) and I want to try alternative scheduling options on only a small subgraph at a time. I can already do this by scheduling only the ops in my subgraph (and leaving the rest to ‘default’ scheduling), but then I need to lower the entire graph for every small change in scheduling rather than lowering just the subgraph.
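To make the partitioning idea concrete, here is a minimal sketch in plain Python. It does not use TVM at all: `Node`, `extract_subgraph`, and the example graph are hypothetical stand-ins for TE tensors/ComputeOps, meant only to illustrate cutting a subgraph out of a larger graph and replacing every input that crosses the boundary with a placeholder.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Toy stand-in for a TE tensor/op (hypothetical, not a TVM class)."""
    name: str
    kind: str           # "placeholder" or "compute"
    inputs: tuple = ()  # names of the producer nodes


def extract_subgraph(graph, selected):
    """Return a new graph holding only the `selected` nodes.

    Any input produced outside the selection is replaced by a fresh
    placeholder node with the same name.
    """
    sub = {}
    for name in selected:
        node = graph[name]
        for inp in node.inputs:
            if inp not in selected and inp not in sub:
                sub[inp] = Node(inp, "placeholder")
        sub[name] = node
    return sub


# A small chain standing in for the "whole network": x -> a -> b -> c -> d
graph = {
    "x": Node("x", "placeholder"),
    "a": Node("a", "compute", ("x",)),
    "b": Node("b", "compute", ("a",)),
    "c": Node("c", "compute", ("b", "a")),
    "d": Node("d", "compute", ("c",)),
}

# Keep only b and c; their external input a becomes a placeholder.
sub = extract_subgraph(graph, {"b", "c"})
```

In this toy model the subgraph can then be scheduled and lowered on its own, without touching `x`, `a`'s computation, or `d`. Doing the same on real TE objects would additionally require rewriting the compute bodies to read from the new placeholders, which this sketch does not attempt.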

aca88 · October 8, 2020, 5:46am

So in essence: normally a complete TE graph is lowered to a TIR representation. Your assumption is that small changes in part of the TE graph should not propagate throughout the whole TIR AST, and you want to somehow “cache” the part of the TIR that is independent of changes to this subgraph. I guess that, since you are describing a complete network in TE, lowering takes some time and you want to save this time. Correct?
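The “caching” idea can be sketched roughly as follows, again in plain Python rather than TVM. Everything here is an assumption made for illustration: `lower_partition` is a hypothetical stand-in for an expensive lowering step, and the network is assumed to be already split into independent partitions, each identified by a name and a hashable schedule description.

```python
lower_calls = []  # records which partitions were actually lowered


def lower_partition(name, schedule):
    """Hypothetical stand-in for an expensive per-partition lowering step."""
    lower_calls.append(name)
    return f"lowered({name}, {schedule})"


_cache = {}


def lower_cached(name, schedule):
    """Lower a partition only if this (name, schedule) pair is new."""
    key = (name, schedule)
    if key not in _cache:
        _cache[key] = lower_partition(name, schedule)
    return _cache[key]


def lower_network(schedules):
    """Lower every partition, reusing cached results where possible."""
    return [lower_cached(name, sched) for name, sched in schedules.items()]


schedules = {"part0": "default", "part1": "default", "part2": "default"}
lower_network(schedules)       # first run lowers all three partitions

schedules["part1"] = "tiled"   # change the schedule of one partition only
lower_network(schedules)       # second run re-lowers only part1
```

After the second call, only `part1` has been lowered again; the results for `part0` and `part2` come from the cache. This only works if the partitions really are independent of each other's schedules, which is the assumption in question.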

Sadly I don’t know how to help you. The only thing that comes to mind is to do it at the TIR level, but this would basically mean reimplementing the scheduling primitives (from TE) directly at the TIR level. Maybe the functionality from “[RFC] TensorIR: A schedulable IR for TVM” can help you?