Potential bug: Floating point exception (core dumped) after relay.floor_mod()

Haoyang · January 8, 2022, 10:57am

import tvm
from tvm import relay
from tvm.contrib import graph_runtime
import numpy as np
x = relay.var("x", dtype = "uint64", shape = ())
y = relay.const([[[51,18,93,48],[67,99,64,92],[14,76,21,91],[57,4,53,1],[59,84,94,40],[21,75,84,28]],[[73,19,94,19],[54,47,50,5],[18,43,53,85],[94,69,29,9],[45,51,0,54],[7,6,55,66]],[[42,2,58,64],[77,42,92,3],[61,86,22,67],[86,24,72,4],[20,77,41,14],[46,71,75,91]],[[74,76,45,81],[34,53,48,76],[55,6,40,84],[1,84,87,14],[23,62,82,9],[86,6,65,6]],[[84,6,21,30],[77,96,74,3],[24,71,85,10],[24,85,87,31],[43,79,16,96],[16,3,11,39]],[[17,45,0,4],[51,17,62,35],[23,83,18,53],[32,92,56,8],[15,93,19,40],[78,58,23,74]],[[37,39,70,53],[95,81,44,12],[78,96,68,30],[13,31,17,37],[14,35,42,98],[79,98,59,95]]], dtype = "uint64") # shape=(7, 6, 4)
z = relay.floor_mod(x, y)
tuple = relay.Tuple([z])
F = relay.Function([x], tuple)
mod = tvm.IRModule()
mod['main'] = F
graph, lib, params = relay.build(mod, target='llvm')
module = graph_runtime.create(graph, lib,tvm.cpu(0))
input_x= np.array(1, dtype='uint64')
module.set_input('x', input_x)
module.set_input(**params)
module.run()

This Python script incurred a Floating point exception (core dumped), which is really confusing because the divisor is not zero. The only variable possessing a 0 value is z, the result. I do not know why TVM disallows this behaviour. By the way, I used NumPy to do a similar calculation with the following script, and there was no crash and no warning message.

import numpy as np
x = np.array(0, dtype='uint64')
y = np.array([[[51,18,93,48],[67,99,64,92],[14,76,21,91],[57,4,53,1],[59,84,94,40],[21,75,84,28]],[[73,19,94,19],[54,47,50,5],[18,43,53,85],[94,69,29,9],[45,51,0,54],[7,6,55,66]],[[42,2,58,64],[77,42,92,3],[61,86,22,67],[86,24,72,4],[20,77,41,14],[46,71,75,91]],[[74,76,45,81],[34,53,48,76],[55,6,40,84],[1,84,87,14],[23,62,82,9],[86,6,65,6]],[[84,6,21,30],[77,96,74,3],[24,71,85,10],[24,85,87,31],[43,79,16,96],[16,3,11,39]],[[17,45,0,4],[51,17,62,35],[23,83,18,53],[32,92,56,8],[15,93,19,40],[78,58,23,74]],[[37,39,70,53],[95,81,44,12],[78,96,68,30],[13,31,17,37],[14,35,42,98],[79,98,59,95]]], dtype = "uint64")
print(x)
print(np.fmod(x, y))

Haoyang · January 10, 2022, 3:45am

Sorry for asking this simple question… I think I know why.

AndrewZhaoLuo · January 10, 2022, 10:47pm

Hey Haoyang, I tried running your script on the latest TVM revision and could not reproduce the crash. Maybe try updating TVM from a dev branch?

Haoyang · January 11, 2022, 6:22am

Thanks for your kind reply. I will try the latest release.

Haoyang · January 25, 2022, 12:52pm

Hi Andrew. Although many days have passed, I’m still confused by this behaviour. I tried the latest version, but it still crashed with Floating point exception (core dumped) on the statement graph, lib, params = relay.build(mod, target='llvm'). Could you please tell me which environment you installed TVM on? I suspect different environments may produce different results. Besides that, I think TVM is too strict about "divide by 0". Sometimes the computational graph is structurally correct, but a variable that acts as a divisor may be assigned 0 at runtime. That situation is possible, yet TVM terminates outright, which confuses the user into thinking the computational graph / network itself has a problem.
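One possible user-side workaround while a zero divisor crashes: mask the divisor before the modulo so the division itself never sees 0, then select an explicit fallback at those positions. The sketch below only models the element-wise semantics in plain Python; the helper name `guarded_floor_mod` and the default of returning `x` as the fallback are assumptions for illustration. In Relay the same pattern could presumably be built from `relay.equal`, `relay.where` and `relay.floor_mod`, but that version is not shown or tested here.

```python
def guarded_floor_mod(x, divisors, fallback=None):
    """Hypothetical helper: floor modulo of scalar x by each divisor, never dividing by 0.

    Where a divisor is 0, `fallback` is returned instead (defaults to x itself).
    """
    if fallback is None:
        fallback = x
    result = []
    for d in divisors:
        safe_d = d if d != 0 else 1                # mask: replace a zero divisor with 1
        r = x % safe_d                             # Python's % on ints is floor modulo
        result.append(r if d != 0 else fallback)   # select the fallback where d was 0
    return result


print(guarded_floor_mod(100, [45, 51, 0, 54]))             # [10, 49, 100, 46]
print(guarded_floor_mod(100, [45, 51, 0, 54], fallback=0))  # [10, 49, 0, 46]
```

Masking first and selecting afterwards matters: computing the modulo unconditionally and only then discarding the bad lanes would still execute the division by 0.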

AndrewZhaoLuo · January 25, 2022, 7:31pm

Yeah, this is very perplexing. Perhaps it would be more useful for you to tell me about your setup? I am on an M1 Mac with LLVM 12.0.1.
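To compare setups precisely, a small script like the following can collect the details that seem to matter here (OS, CPU architecture, TVM and LLVM versions). It uses Python's standard `platform` module plus TVM if it is importable; reading the LLVM version through `tvm.support.libinfo()` and its `LLVM_VERSION` key is an assumption about the installed TVM build, so the sketch falls back to "unknown" if that is unavailable.

```python
import platform


def describe_setup():
    """Collect platform details that are relevant to a codegen/crash report."""
    info = {
        "os": platform.system(),
        "platform": platform.platform(),
        "machine": platform.machine(),      # e.g. 'x86_64' or 'arm64'
        "python": platform.python_version(),
    }
    try:
        import tvm
    except ImportError:
        info["tvm"] = "not installed"
    else:
        info["tvm"] = tvm.__version__
        try:
            # Assumption: libinfo() exposes the LLVM version TVM was built against.
            info["llvm"] = tvm.support.libinfo().get("LLVM_VERSION", "unknown")
        except AttributeError:
            info["llvm"] = "unknown"
    return info


for key, value in describe_setup().items():
    print(f"{key}: {value}")
```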

It could be a codegen difference of some sort. On my M1 Mac (so ARM), TVM's modulo by 0 just returns the original number. Perhaps there is an argument that modulo by 0 should not throw but return NaN or something.
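One plausible explanation for the platform difference, assuming the modulo ends up as a native unsigned divide instruction: on AArch64, UDIV by zero is defined to return 0 rather than trap, so a remainder computed as x - (x / y) * y comes back as x, which matches the "returns the original number" observation. On x86-64, DIV by zero raises a divide-error exception that Linux delivers as SIGFPE, and the shell reports that as "Floating point exception (core dumped)" even though no floating point is involved. The two hypothetical functions below only model those behaviours in Python; they are not TVM code.

```python
def urem_aarch64(x, y):
    """Model AArch64: UDIV returns 0 when y == 0, and the remainder is x - q*y."""
    q = 0 if y == 0 else x // y
    return x - q * y              # with y == 0 this is x - 0 == x


def urem_x86_64(x, y):
    """Model x86-64: DIV with a zero divisor traps (delivered as SIGFPE on Linux)."""
    if y == 0:
        raise ZeroDivisionError("divide error, reported as SIGFPE on Linux")
    return x % y


print(urem_aarch64(1, 0))    # 1, the "original number"
print(urem_aarch64(100, 45))  # 10
try:
    urem_x86_64(1, 0)
except ZeroDivisionError as err:
    print("x86-64:", err)
```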

There is an argument, though, that your example is malformed: y contains zeros, so it takes a modulo by 0 (which I just noticed). Do you expect a modulo by 0 not to do something wrong?
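A quick scan of the constant divisor shows exactly where the zeros are. This is plain Python over the nested list, with the values copied from the original script; `find_zeros` is a hypothetical helper for illustration.

```python
# The (7, 6, 4) constant divisor from the original script.
y = [[[51,18,93,48],[67,99,64,92],[14,76,21,91],[57,4,53,1],[59,84,94,40],[21,75,84,28]],[[73,19,94,19],[54,47,50,5],[18,43,53,85],[94,69,29,9],[45,51,0,54],[7,6,55,66]],[[42,2,58,64],[77,42,92,3],[61,86,22,67],[86,24,72,4],[20,77,41,14],[46,71,75,91]],[[74,76,45,81],[34,53,48,76],[55,6,40,84],[1,84,87,14],[23,62,82,9],[86,6,65,6]],[[84,6,21,30],[77,96,74,3],[24,71,85,10],[24,85,87,31],[43,79,16,96],[16,3,11,39]],[[17,45,0,4],[51,17,62,35],[23,83,18,53],[32,92,56,8],[15,93,19,40],[78,58,23,74]],[[37,39,70,53],[95,81,44,12],[78,96,68,30],[13,31,17,37],[14,35,42,98],[79,98,59,95]]]


def find_zeros(tensor):
    """Return the (i, j, k) index of every zero in a 3-D nested list."""
    return [
        (i, j, k)
        for i, plane in enumerate(tensor)
        for j, row in enumerate(plane)
        for k, value in enumerate(row)
        if value == 0
    ]


print(find_zeros(y))  # [(1, 4, 2), (5, 0, 2)]
```

Those two positions are where floor_mod(x, y) has to divide by 0, regardless of the value fed to x.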

Haoyang · January 26, 2022, 4:52am

Thanks for your reply. I am on CentOS (x86_64 architecture) with an AMD Ryzen Threadripper 3970X 32-Core Processor.

I think it’s more reasonable to emit a warning than to terminate outright on a modulo by 0. The reason is the one I stated previously: sometimes the computational graph is structurally correct, but a variable that acts as a divisor may be assigned 0 at runtime.
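The proposed behaviour could look roughly like this for a scalar modulo: detect the zero divisor, emit a warning, and return a fallback instead of letting the hardware trap. This is only a sketch of the suggested semantics in plain Python, not something TVM provides; the name `floor_mod_or_warn` and returning `x` (what was observed on ARM) as the default fallback are assumptions.

```python
import warnings


def floor_mod_or_warn(x, y, fallback=None):
    """Hypothetical floor modulo that warns instead of crashing when y == 0."""
    if y == 0:
        warnings.warn("floor_mod: divisor is 0, returning fallback", RuntimeWarning)
        return x if fallback is None else fallback
    return x % y  # Python's % on ints is floor modulo


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    print(floor_mod_or_warn(1, 0))  # 1, and one RuntimeWarning is recorded
    print(len(caught))              # 1
```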