For a project, I want to train a number of models that can predict the execution time of a layer (from its Relay description) on different hardware targets.
My current problem is that I cannot find a good way to do this. The Debug Runtime measures execution time per low-level function; these functions can contain several fused layers, so the timings cannot be mapped directly to Relay nodes.
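To illustrate the mapping problem, here is a minimal pure-Python sketch. It assumes fused kernel names follow a `fused_<op>_<op>...` pattern with dots in op names replaced by underscores. The timings, the `KNOWN_OPS` vocabulary and the helper names are all made up for illustration and are not taken from any real profile or TVM API.

```python
# Hypothetical per-kernel timings in microseconds (values are invented).
kernel_times_us = {
    "fused_nn_conv2d_add_nn_relu": 120.0,
    "fused_nn_max_pool2d": 15.0,
    "fused_nn_dense_add": 40.0,
}

# Assumed small vocabulary of op names, with "." replaced by "_".
KNOWN_OPS = ["nn_conv2d", "nn_max_pool2d", "nn_dense", "nn_relu", "add"]


def split_fused_name(name, known_ops=KNOWN_OPS):
    """Greedily split a fused kernel name into known op names."""
    rest = name[len("fused_"):] if name.startswith("fused_") else name
    candidates = sorted(known_ops, key=len, reverse=True)  # longest match first
    ops = []
    while rest:
        match = next((op for op in candidates if rest.startswith(op)), None)
        if match is None:
            raise ValueError(f"cannot parse remainder {rest!r} of {name!r}")
        ops.append(match)
        rest = rest[len(match):].lstrip("_")
    return ops


def group_times(kernel_times):
    """Map each group of fused ops to the single time measured for it."""
    return {tuple(split_fused_name(k)): t for k, t in kernel_times.items()}


if __name__ == "__main__":
    for ops, t in group_times(kernel_times_us).items():
        print(f"{' + '.join(ops)}: {t} us")
```

Even when the name can be parsed, the result is one time per group of ops (for example conv2d, add and relu together), not one time per Relay node, which is exactly the limitation described above. Real kernel names may also carry prefixes or numeric suffixes that this sketch does not handle.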
I looked into the Auto-Scheduler, as Ansor also works on a subgraph level, but it seems like it is also measuring individual TIR functions.
I would like to work with the Relay representation as it enables the targeting of BYOC backends, which might be more relevant for highly heterogeneous targets.