Profiling at the Relay level?

I am a bit confused; maybe I misunderstood your suggestion.

I am using the debug executor to measure the latency of the individual (fused) TIR functions, but I cannot tell which function corresponds to which part of the original or optimized Relay graph. (Example of a TIR function name: `fused_layout_transform_nn_batch_flatten`.)

I am aware of the n:m mapping between Relay nodes and TIR functions. However, I would like to keep information about filter sizes and about which operations are fused into each TIR function, because the model that predicts the performance needs this additional information.
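
To illustrate the gap, here is a minimal sketch of what I can recover from the function name alone. It assumes a hand-written list of operator names; `KNOWN_OPS` and `split_fused_name` are hypothetical helpers of my own, not part of TVM.

```python
# Hypothetical vocabulary of operator names as they show up inside fused
# function names. This list is an assumption for illustration, not TVM data.
KNOWN_OPS = [
    "layout_transform",
    "nn_batch_flatten",
    "nn_conv2d",
    "nn_dense",
    "nn_relu",
    "add",
]


def split_fused_name(name, known_ops=KNOWN_OPS):
    """Split a fused function name into operator names by greedy longest match."""
    prefix = "fused_"
    if not name.startswith(prefix):
        return [name]
    tokens = name[len(prefix):].split("_")
    ops = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate first, because operator names
        # themselves contain underscores.
        for j in range(len(tokens), i, -1):
            candidate = "_".join(tokens[i:j])
            if candidate in known_ops:
                ops.append(candidate)
                i = j
                break
        else:
            # Token not in the vocabulary: keep it unchanged.
            ops.append(tokens[i])
            i += 1
    return ops


print(split_fused_name("fused_layout_transform_nn_batch_flatten"))
```

This only gives back the operator names. Attributes such as filter sizes are not encoded in the name, which is why I am asking for a way to keep the link to the Relay nodes.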