If I have a tensor of shape (N, M, K) and want to scan along the M axis, I have to create “init” and “state” tensors that have M as the 0th dimension. If the result needs to have the same shape as the original, the output of the scan then has to be copied back into the (N, M, K) layout.
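
To make the workaround concrete, here is a minimal NumPy sketch of the data movement involved. It does not use the actual scan API: `scan_axis0` and `scan_axis1_via_transpose` are hypothetical helpers, and a cumulative sum is assumed as the scan body purely for illustration.

```python
import numpy as np

def scan_axis0(x):
    """Stand-in for a scan that only works along the leading axis (assumed: cumulative sum)."""
    out = np.empty_like(x)
    out[0] = x[0]                      # "init": first slice along the scan axis
    for t in range(1, x.shape[0]):
        out[t] = out[t - 1] + x[t]     # "update": state[t] depends on state[t - 1]
    return out

def scan_axis1_via_transpose(x):
    """Scan an (N, M, K) tensor along M using only a leading-axis scan."""
    # Temporary with M moved to the 0th dimension: shape (M, N, K).
    moved = np.ascontiguousarray(np.moveaxis(x, 1, 0))
    # Scan result is produced in the same (M, N, K) layout.
    scanned = scan_axis0(moved)
    # Extra copy to get back to the original (N, M, K) layout.
    return np.ascontiguousarray(np.moveaxis(scanned, 0, 1))

x = np.arange(2 * 3 * 4, dtype=np.float32).reshape(2, 3, 4)
y = scan_axis1_via_transpose(x)
```

The two `np.ascontiguousarray` calls are the temporary storage and the copy-back that the questions below are about.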
- Are there any tricks to avoid having the scan create temporary storage that then has to be copied?
- Is there any plan to allow doing a scan on a non-0th axis?
- Right now, a ScanOp cannot be inlined. Is this a design decision, or is it just a current limitation?