Enhance the robustness of division by handling division by -1

syang · October 20, 2021, 6:39am

In the C and C++ programming languages, the behavior division by −1 is actually undefined behavior and may return strange results. In TVM, I found that we have added many checks in terms of division by 0, but there are few checks about division by -1.

For example, this code will trigger a crash because of that:

import tvm
from tvm import tir
expr = tir.Div(tir.FloorDiv(tir.Cast('int32', 2.95148e+28), tir.atan(tir.IntImm('int32',1))), tir.FloorMod(301, 31))
stmt = tir.Evaluate(expr)
func = tir.PrimFunc({},stmt)
tvm.build(func)

I use GDB to trace the root cause. We can find that the program will crash when it executes int64_t rdiv = x / y;, and the value of y is -1.

To enhance the robustness of the division in TVM, should we add more checks about the division by -1?

junrushao · October 20, 2021, 4:21pm

Interesting. Thanks for reporting this bug with very detailed explanation!

To make sure I understand this issue correctly, does this happen in floordiv when x = -2^64 and y = -1, which causes the overflow?

syang · October 21, 2021, 3:51am

It seems that the overflow will occur when x=-2^63 and y = -1. As discussed in this blog, INT_MIN % -1 and INT_MIN / -1 in C/C++ are undefined behaviors.

I have simplified the test case:

import tvm
from tvm import tir
expr = tir.FloorDiv(tir.IntImm('int64', 0x8000000000000000), tir.IntImm('int64', -1))
analyzer = tvm.arith.Analyzer()
analyzer.rewrite_simplify(expr)

Similarly, FloorMod can also trigger this kind of bug:

import tvm
from tvm import tir
expr = tir.FloorMod(tir.IntImm('int64', 0x8000000000000000), tir.IntImm('int64', -1))
analyzer = tvm.arith.Analyzer()
analyzer.rewrite_simplify(expr)

junrushao · October 21, 2021, 4:29am

Ahh got it! Thanks for the explanation! Yes, it’s definitely a UB in C++

syang · October 31, 2021, 4:55pm

To explain more in detail, I have sorted out some examples to trigger this undefined behavior.

import tvm

expr = tvm.tir.Div(tvm.tir.IntImm('int64', 0x8000000000000000), tvm.tir.IntImm('int64', -1))
analyzer = tvm.arith.Analyzer()
analyzer.rewrite_simplify(expr)

import tvm

expr = tvm.tir.Mod(tvm.tir.IntImm('int64', 0x8000000000000000), tvm.tir.IntImm('int64', -1))
analyzer = tvm.arith.Analyzer()
analyzer.rewrite_simplify(expr)

import tvm

expr = tvm.tir.FloorDiv(tvm.tir.IntImm('int64', 0x8000000000000000), tvm.tir.IntImm('int64', -1))
analyzer = tvm.arith.Analyzer()
analyzer.rewrite_simplify(expr)

import tvm

expr = tvm.tir.FloorMod(tvm.tir.IntImm('int64', 0x8000000000000000), tvm.tir.IntImm('int64', -1))
analyzer = tvm.arith.Analyzer()
analyzer.rewrite_simplify(expr)

The root causes of these four test cases are located in src/arith/const_fold.h, as they only check whether the divisor is zero.

Similarly, the following two cases will also cause an arithmetic error.

import tvm
from tvm import tir

expr = tir.Div(tir.Div(tir.Cast('int32', 2.95148e+28), tir.atan(tir.IntImm('int32',1))), tir.FloorMod(101, 11))
analyzer = tvm.arith.Analyzer()
analyzer.rewrite_simplify(expr)

import tvm
from tvm import tir

expr = tir.Div(tir.FloorDiv(tir.Cast('int32', 2.95148e+28), tir.atan(tir.IntImm('int32',1))), tir.FloorMod(101, 11))
analyzer = tvm.arith.Analyzer()
analyzer.rewrite_simplify(expr)

This is because the function InfAwareDiv and InfAwareFloorDiv in src/arith/const_int_bound.cc also forget to cover this UB.

static int64_t InfAwareDiv(int64_t x, int64_t y) {
    ICHECK_NE(y, 0);
    if (x == kPosInf || x == kNegInf) {
        if (y > 0) return x;
        return -x;
    }
    return x / y;
}

static int64_t InfAwareFloorDiv(int64_t x, int64_t y) {
    ICHECK_NE(y, 0);
    if (x == kPosInf || x == kNegInf) {
        if (y > 0) return x;
        return -x;
    }
    return floordiv(x, y);
}