[RFC] CSE Optimization

I think there are two potential ways to approach it: we can do common subexpression elimination (CSE) at the Expr level, or we can do it at the TE level. Each would bring part of the benefit, so it would be helpful to support both variants.
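To make the Expr-level variant a bit more concrete, here is a minimal toy sketch in plain Python. It does not use any real compiler IR or existing pass: the nested-tuple encoding of expressions and the helper names `count_subexprs` and `cse` are assumptions made up for illustration. The idea is only to show the general shape of the transformation: find subtrees that occur more than once and bind each of them to a temporary a single time.

```python
from collections import Counter

# Hypothetical encoding: an expression is either a variable name (str)
# or a tuple (op, arg0, arg1, ...). This is NOT a real IR.

def count_subexprs(expr, counts):
    """Count every occurrence of each non-leaf subtree."""
    if isinstance(expr, tuple):
        counts[expr] += 1
        for arg in expr[1:]:
            count_subexprs(arg, counts)

def cse(expr):
    """Return (bindings, body): repeated subtrees are bound once to temporaries."""
    counts = Counter()
    count_subexprs(expr, counts)
    bindings = []  # (temp_name, definition) in the order they must be evaluated
    names = {}     # original subtree -> temp_name already assigned to it

    def rewrite(e):
        if not isinstance(e, tuple):
            return e                      # leaf variable, nothing to do
        if e in names:
            return names[e]               # seen before: reuse the temporary
        new_e = (e[0],) + tuple(rewrite(a) for a in e[1:])
        if counts[e] > 1:                 # occurs more than once: hoist it
            name = f"t{len(bindings)}"
            bindings.append((name, new_e))
            names[e] = name
            return name
        return new_e

    body = rewrite(expr)
    return bindings, body

# (x * y) + (x * y): the product is computed twice before CSE.
expr = ("add", ("mul", "x", "y"), ("mul", "x", "y"))
bindings, body = cse(expr)
print(bindings)  # [('t0', ('mul', 'x', 'y'))]
print(body)      # ('add', 't0', 't0')
```

A TE-level variant would presumably apply the same idea to the expressions inside each compute definition rather than to the graph-level expression, which is why the two levels can catch different redundancies.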