Apologies for the delay in replying; I just needed to find some time to sit down and read the RFC all the way through. This is great work, @chunit, @haowhsu-quic and Zack! I’m supportive of moving this forward.
In “An enabling framework for int8 quantization”, we discussed how to effectively track frontend layers throughout the compiler. It seems like you guys have taken the same approach we discussed there: leveraging the graph edges (i.e. tensors) as the “stable” part of the graph and labelling the Relay ops between them as belonging to the same frontend layer (e.g. `RecursivelyFillSpan`). Right now this needs to be done per pass, but I wonder if we could get away with doing it once at the end of compilation if we also attach a reference to the frontend layer (or post-import variable) to each Relay `Var`.
It seems like by annotating `Var` we might be able to add this information.
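To make the annotate-once idea a bit more concrete, here is a toy sketch in plain Python. It is not TVM code: `Var`, `Op`, `frontend_layer` and `fill_spans_once` are hypothetical stand-ins, and it assumes the ops are available as a topologically ordered list. Tensors keep their frontend-layer tag through all passes, and a single sweep at the end walks the ops backwards, giving each op the tag of its output tensor or, if that tensor was introduced by a pass and is untagged, the layer of the op that consumes it.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Var:
    """Hypothetical stand-in for a Relay Var (a tensor / graph edge)."""
    name: str
    # Hypothetical annotation: the frontend layer that produced this tensor.
    # None means the tensor was introduced by a pass and carries no tag.
    frontend_layer: Optional[str] = None


@dataclass
class Op:
    """Hypothetical stand-in for an operator node after all passes ran."""
    name: str
    inputs: List[Var]
    output: Var
    span: Optional[str] = None  # filled in once, at the end


def fill_spans_once(ops: List[Op]) -> Dict[str, str]:
    """Label every op with a frontend layer in one backward sweep.

    Assumes `ops` is topologically ordered (producers before consumers).
    An op takes the tag of its output tensor; if that tensor is untagged,
    it inherits the layer of the op that consumes it.
    """
    inherited: Dict[str, str] = {}  # untagged var name -> consumer's layer
    for op in reversed(ops):
        layer = op.output.frontend_layer or inherited.get(op.output.name)
        op.span = layer
        if layer is None:
            continue
        for var in op.inputs:
            if var.frontend_layer is None:
                # Keep the first consumer seen while walking backwards.
                inherited.setdefault(var.name, layer)
    return {op.name: op.span for op in ops if op.span is not None}


# Example: a pass split frontend layer "conv1" into conv2d + bias_add,
# producing an untagged intermediate tensor "tmp".
x, w = Var("x"), Var("w")
tmp = Var("tmp")
conv1_out = Var("conv1_out", frontend_layer="conv1")
relu1_out = Var("relu1_out", frontend_layer="relu1")
ops = [
    Op("conv2d", [x, w], tmp),
    Op("bias_add", [tmp], conv1_out),
    Op("relu", [conv1_out], relu1_out),
]
print(fill_spans_once(ops))
# {'conv2d': 'conv1', 'bias_add': 'conv1', 'relu': 'relu1'}
```

When an untagged tensor feeds several consumers from different layers, this sketch just keeps the first one it meets while walking backwards; a real implementation would need an explicit policy for that case.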
One issue is that once we move outside of Relay (e.g. in the AOT flow), it’s harder to propagate span information back up through the compiler, since the layer variables have changed. I’m curious whether you guys have tried applying this to any TIR-based fusion.
Lastly, do you have any idea how much additional memory this takes, or what its performance impact is?