Motivation
TIR For
node is very simple, it always does a fixed number of iterations. But some irregular algorithm, like NMS, need the number of iterations that is dependent on input. This is because as soon as we find max_output_size
number of boxes, there is no need to find more boxes. Without While loop, however, we cannot realize this “early exit”, so our NMS is always slower than frameworks.
Binary search is another interesting use case of While loop. Binary search is needed to add support for numpy style searchsorted
function. @mbrookhart also needs binary search to improve his merge sort, using “merge-path” algorithm.
Design alternatives
As @tqchen pointed out, we have two alternatives for how we might approach adding While loop:
- V0: Put the condition into the For node as an additional field (what I’ve implemented)
- V1: Introduce a separate While node
The pros of V0 are:
- Implementation is very simple, minimal change
- If we want a single loop abstraction that encompasses all kinds of loops, to do more advanced analysis (than what TVM does today) in a unified manner, this approach might be better
Interestingly, ispc, which offers implicit SPMD programming model to programmers, takes this approach. For and While loop are represented by the same IR construct. Although While loop usually only makes sense in a serial loop, ispc allows programmers to use While loop and still vectorizes it (!!), using sophisticated analysis and codegen (It does so by carefully updating a mask per vector, to keep track of which lanes are active and only does the work for the active lane). So for ispc, unifying For and While makes a lot of sense.
The cons of V0 are:
- It adds a new field to For node which only makes sense for while-loop case (kind == “serial”). The other fields of For node are irrelevant to While loop (begin, extent, etc)
- Related to above, users always have to specify the loop upper bound, since this is required by For.
- I need to write a lot of
if(op->test) { ... }
, to check if this is a while loop or not. - It complicates analysis on For node. While loop doesn’t need any analysis (unless we want to vectorize it like ispc does), so this coupling is not nice.
The pros of V1 are:
- A clean decoupling of For and While.
- Codegen for While loop is simpler. It doesn’t need phi node.
- The user API would also be simpler (just needs condition)
The cons of V1 are the exact opposite of the pros of V0 (more effort, no unified analysis etc).
Discussion
I think for us and for now, V1 is the clear winner. Any other thoughts to favor V0? Personally I do find the design of ispc elegant and I’m amazed at what it does, but I think such implicit SPMD + auto vectorization is out of reach for TVM for now.
@tqchen @junrushao @vinx13 @mbrookhart @zhiics @kevinthesun @trevor-m @anijain2305