I agree that the current default rules for op fusion are not enough. A configurable op fusion pass, instead of fixed rules, would be a very useful feature for accelerators, 3rd-party codegen, and potentially training in the future.
I wonder if you'd be interested in working on an RFC or design that allows the op fusion pass to take in a set of predefined rules. Then we can move on to developing ML-based searchers that find the best fusion strategy based on these rules.
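
To make the idea concrete, here is a rough sketch of what "fusion driven by a rule set" could look like. This is plain Python and not the TVM API: `FusionRule`, `fuse_chain`, the op kind names, and both rule sets are hypothetical, and it only handles a linear chain of ops with pairwise producer/consumer rules. A real design would have to work on the dataflow graph.

```python
from typing import List, Set, Tuple

# Hypothetical rule format: a (producer_kind, consumer_kind) pair that may fuse.
FusionRule = Tuple[str, str]


def fuse_chain(ops: List[Tuple[str, str]], rules: Set[FusionRule]) -> List[List[str]]:
    """Greedily group a linear chain of (name, kind) ops.

    An op joins the current group only if the pair
    (kind of previous op, kind of this op) is listed in `rules`.
    """
    groups: List[List[str]] = []
    prev_kind = None
    for name, kind in ops:
        if groups and (prev_kind, kind) in rules:
            groups[-1].append(name)   # allowed by a rule: extend the group
        else:
            groups.append([name])     # no matching rule: start a new group
        prev_kind = kind
    return groups


# Two illustrative rule sets (assumptions, not TVM's actual defaults).
default_rules = {("conv", "elemwise"), ("elemwise", "elemwise")}
accel_rules = default_rules | {("elemwise", "conv")}

chain = [
    ("conv2d", "conv"),
    ("bias_add", "elemwise"),
    ("relu", "elemwise"),
    ("conv2d_1", "conv"),
]

print(fuse_chain(chain, default_rules))
print(fuse_chain(chain, accel_rules))
```

With the first rule set the second convolution starts its own group, while the second rule set lets the whole chain fuse. A searcher would then only need to explore the space of rule sets rather than modify the pass itself.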
Also cc @jroesch @zhiics @tqchen @MarisaKirisame @vinx13