Why does relay.floor_mod allow float64?

Haoyang · February 12, 2022, 3:06pm

I am confused about why relay.floor_mod accepts a float64 variable as a parameter, so I wrote the following script:

import tvm
from tvm import relay
from tvm.contrib import graph_runtime
import numpy as np

def vmobj_to_list(o, dtype="float64"):
    if isinstance(o, tvm.nd.NDArray):
        return [o]
    elif isinstance(o, tvm.runtime.container.ADT):
        result = []
        for f in o:
            result.extend(vmobj_to_list(f, dtype))
        return result
    else:
        return o

# Build floor_mod(exp(x), c) on float64 with a scalar input
var_0 = relay.var("var_0", dtype="float64", shape=())
var_1 = relay.exp(var_0)
const_2 = relay.const([787.644532], dtype="float64")
var_3 = relay.floor_mod(var_1, const_2)
out_tuple = relay.Tuple([var_3])  # avoid shadowing the builtin `tuple`
F = relay.Function([var_0], out_tuple)
mod = tvm.IRModule()
mod['main'] = F
mod = relay.transform.InferType()(mod)

# First run: graph runtime on CPU (LLVM)
graph, lib, params = relay.build(mod, target='llvm')
module = graph_runtime.create(graph, lib, tvm.device('llvm', 0))
# Second run: graph executor on CUDA
intrp = relay.build_module.create_executor('graph', mod, tvm.device('cuda', 0), 'cuda')
input_0 = np.array(415.748715, dtype='float64')
module.set_input('var_0', input_0)
module.set_input(**params)
module.run()
res0_0 = module.get_output(0).asnumpy()
res1 = intrp.evaluate()(input_0)
res1 = vmobj_to_list(res1)
res1_0 = res1[0].asnumpy()

# Compare the CPU and CUDA results
np.testing.assert_allclose(res0_0, res1_0, atol=1e-3, rtol=1e-3)

assert_allclose then reported a difference between res0_0 and res1_0. I think this is because floor_mod on float64 is undefined and is therefore interpreted differently by LLVM and CUDA. If that is the case, why does TVM still allow floor_mod to take a float64 variable as a parameter?

BTW, this is the message thrown by np.testing.assert_allclose(res0_0, res1_0, atol=1e-3, rtol=1e-3): [screenshot: assert_allclose error message]

wrongtest · February 13, 2022, 9:23am

Hi, for context, other DL frameworks also allow this, for example tf.math.floormod (TensorFlow Core v2.8.0).

For TVM, below is where float floormod is lowered to ordinary arithmetic as a - floor(a/b) * b:

https://github.com/apache/tvm/blob/7396be5645fa59cb10ae8ee14b718dbf7737390b/src/tir/transforms/lower_intrin.cc#L184-L186

if (dtype.is_float()) {
  // a - floor(a / b) * b
  return op->a - (VisitExpr_(tvm::floor(op->a / op->b).as<CallNode>()) * op->b);
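
To make the lowering concrete, here is a minimal Python sketch of the same formula on ordinary-sized values. The helper name `floor_mod_lowered` is made up for illustration and is not a TVM function. It suggests that floor_mod on floats is well defined: the result takes the sign of the divisor, as with Python's `%` on floats, whereas C-style `fmod` truncates.

```python
import math

def floor_mod_lowered(a: float, b: float) -> float:
    """Float floor_mod written as a - floor(a / b) * b (hypothetical helper)."""
    return a - math.floor(a / b) * b

# The result takes the sign of the divisor, like Python's % on floats.
print(floor_mod_lowered(7.5, 2.0))    # 1.5
print(floor_mod_lowered(-7.5, 2.0))   # 0.5
print(-7.5 % 2.0)                     # 0.5
# C-style fmod truncates instead, so its result takes the sign of the dividend.
print(math.fmod(-7.5, 2.0))           # -1.5
```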

Since the test script computes floor_mod(e^c1, c2), I think it would be worth checking which sub-step actually causes the difference (exp, div, or floor). CUDA arithmetic is known not to match the precision of the standard C/C++ math libraries exactly; see the Programming Guide in the CUDA Toolkit Documentation.
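
One way to do that check is to record every intermediate value on each backend and report the first stage where they disagree. The sketch below is plain Python, not TVM: `stages`, `first_divergence`, and `exp_off_by_one_ulp` are hypothetical helpers, and the second trace is simulated by nudging exp up by one float64 step. That nudge is an assumption for illustration, not a measured CUDA result; in practice the second trace would come from the CUDA run.

```python
import math

def stages(x, b, exp_fn=math.exp):
    """Return every intermediate of floor_mod(exp(x), b) in evaluation order."""
    a = exp_fn(x)
    q = a / b
    fq = float(math.floor(q))
    return {"exp": a, "div": q, "floor": fq, "floor_mod": a - fq * b}

def first_divergence(ref, other, rel_tol=1e-3):
    """Name of the first stage whose values disagree, or None if all agree."""
    for name in ref:
        if not math.isclose(ref[name], other[name], rel_tol=rel_tol):
            return name
    return None

def exp_off_by_one_ulp(x):
    # Assumption for illustration only: an exp that lands one float64 step higher.
    a = math.exp(x)
    _, e = math.frexp(a)
    return a + math.ldexp(1.0, e - 53)

cpu_like = stages(415.748715, 787.644532)
other = stages(415.748715, 787.644532, exp_fn=exp_off_by_one_ulp)
for name in cpu_like:
    print(name, cpu_like[name], other[name])
print("first divergence:", first_divergence(cpu_like, other))
```

With a tolerance of 1e-3, the exp, div, and floor stages agree under this simulated nudge, so any disagreement can only show up at the final floor_mod stage.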

In my environment, the CPU result matches what I get from plain Python:

>>> import math
>>> def f(a,b): return a - math.floor(a/b) * b
...
>>> f(math.exp(415.748715), 787.644532)
4.606887725612233e+164
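
The value above is far larger than the divisor, which hints that at this magnitude the formula returns rounding error rather than a remainder in [0, b). A minimal sketch of why, assuming IEEE float64: exp(415.748715) is about 3.6e180, where adjacent float64 values are 2^547 (about 4.6e164) apart, so the subtraction can only produce zero or a multiple of that spacing. The `ulp` helper is hypothetical, and `math.fmod` is used here only as an exact reference for positive operands.

```python
import math

def ulp(x: float) -> float:
    """Spacing to the next larger float64 (x positive and normal; hypothetical helper)."""
    _, e = math.frexp(x)
    return math.ldexp(1.0, e - 53)

a = math.exp(415.748715)
b = 787.644532

lowered = a - math.floor(a / b) * b   # the formula used in the lowering
exact = math.fmod(a, b)               # exact remainder of the float64 value a

print("spacing near a:", ulp(a))      # 2**547, far larger than b
print("lowered formula:", lowered)
print("math.fmod:", exact)            # lies in [0, b)
```

Under these assumptions, two backends whose exp results differ by a single float64 step can produce floor_mod outputs that differ by about 4.6e164, which no practical atol or rtol would absorb.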

Haoyang · February 13, 2022, 11:42am

Thanks for your kind reply. I think I got the reason.