I’m still stuck on `reverse_compute_at`, which seems like a long name and is still a bit too magical for me to understand
Yeah, I once discussed with @tqchen and @spectrometerHBH that `reverse_compute_at` is too long and tedious, and I agree it would be great to find a better name (admittedly we are always bad at naming)…
Perhaps @spectrometerHBH could give a specific example of using `reverse_compute_at` to avoid duplicate splitting?
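Not to speak for @spectrometerHBH, but here is a minimal sketch of the kind of example meant, assuming the current `tir.Schedule` API (the buffer shapes, names, and split factors are made up for illustration):

```python
import tvm
from tvm.script import tir as T

@T.prim_func
def before(a: T.handle, c: T.handle):
    A = T.match_buffer(a, (128,), "float32")
    C = T.match_buffer(c, (128,), "float32")
    B = T.alloc_buffer((128,), "float32")
    for i in T.serial(128):
        with T.block("B"):  # producer
            vi = T.axis.spatial(128, i)
            B[vi] = A[vi] * 2.0
    for i in T.serial(128):
        with T.block("C"):  # consumer
            vi = T.axis.spatial(128, i)
            C[vi] = B[vi] + 1.0

sch = tvm.tir.Schedule(before)
(i,) = sch.get_loops(sch.get_block("B"))
io, ii = sch.split(i, factors=[4, 32])  # split the producer loop once
# Move the consumer block under io; the schedule infers the region "C" must
# cover there, so the same split never has to be repeated on C's own loop.
sch.reverse_compute_at(sch.get_block("C"), io)
print(sch.mod)
```

With plain `compute_at` the attachment goes the other way (producer moved under a consumer loop), so one would have to perform the identical split on `C`'s loop first; `reverse_compute_at` lets the single split on the producer drive both blocks.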
`vmap` lets you do something, then “zoom” into the area, forget entirely about the outer dimension, and focus on that
Yeah, `vmap` is a powerful tool. It sounds more like “batching” (either static or dynamic), but its functionality can certainly go beyond that when applied multiple times.
One problem with `vmap`, if we want to introduce it to TensorIR, is its semantics: do we consider it as adding a data-parallel loop on top of the producer block of each output buffer? I believe @tqchen has more thoughts on this.
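As a point of reference, this is what that composability looks like in JAX (not TensorIR; the array shapes are made up): each application of `vmap` adds one data-parallel dimension on top of a function that knows nothing about batching.

```python
import jax
import jax.numpy as jnp

def dot(a, b):
    # Written against single vectors; knows nothing about batching.
    return jnp.dot(a, b)

batched = jax.vmap(dot)            # one data-parallel (batch) dimension
twice_batched = jax.vmap(batched)  # applied again: two batch dimensions

a = jnp.ones((2, 3, 4))
b = jnp.ones((2, 3, 4))
print(twice_batched(a, b).shape)   # (2, 3)
```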
write a separately scoped bit of code that doesn’t even know about the outer construction at all
The idea sounds pretty much like our “block isolation” construct. A “block” in TensorIR represents an isolated computation that does not need to know about its outer scope.
The difference is that the `with` scope you mentioned does not actually change the IR, so it is more like syntactic sugar hinting to the schedule class that certain primitives should operate locally, while the `Block` in TensorIR is an IR construct that actually enforces such restrictions.
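For concreteness, here is a minimal TVMScript sketch (buffer names made up) of that isolation: the block body is written only against the block’s own iteration variable `vi`, its binding to the outer loop `i` is confined to the block signature, and the block carries its own read/write region declarations.

```python
from tvm.script import tir as T

@T.prim_func
def func(a: T.handle, b: T.handle):
    A = T.match_buffer(a, (128,), "float32")
    B = T.match_buffer(b, (128,), "float32")
    for i in T.serial(128):
        with T.block("B"):
            vi = T.axis.spatial(128, i)  # the only link to the outer loop
            T.reads(A[vi])               # the block declares its own
            T.writes(B[vi])              # read/write regions
            B[vi] = A[vi] + 1.0          # body sees vi, never i
```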
CC @Hzfengsy @spectrometerHBH @tqchen, would love to hear your opinions.