TVM build with Segmentation fault

import tvm
from tvm import te, topi

N = 2 ** 5
D = 64 * 64
M = 512

with tvm.target.cuda():
    # simple perceptron: Y = X @ W + B
    X = te.placeholder((N, D), name="X")
    W = te.placeholder((D, M), name="W")
    B = te.placeholder((1, M), name="B")
    H = topi.einsum("ik,kj->ij", X, W)
    Y = H + B
    s = topi.cuda.schedule_injective([Y])
    code = tvm.lower(s, [X, W, B, H, Y], simple_mode=True)
    f = tvm.build(code, target="cuda", target_host="llvm")

I just want to build a simple perceptron with topi, but it fails with this error and prints some generated CUDA code even though there is no print call in my script.

I am using the 0.8 dev version of TVM, LLVM 10.0, GCC 7, and CUDA 11.2, if that helps.

On my machine the output is:

Traceback (most recent call last):
  File ".\dev_test\foo.py", line 38, in <module>
    code =tvm.lower(s, [X, W,B,H,Y], simple_mode=True)
  File "E:\projects\tvm\python\tvm\driver\build_module.py", line 131, in lower
    return ffi.lower_schedule(inp, args, name, binds, simple_mode)
  File "E:\projects\tvm\python\tvm\_ffi\_ctypes\packed_func.py", line 233, in __call__
    ctypes.byref(ret_tcode),
OSError: exception: stack overflow

If you use a smaller input size, everything works fine. And if you replace einsum with matmul, the original input size is also OK.
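
For reference, this is a minimal sketch of the matmul variant I mean, with the same shapes as your code; I haven't verified it against your exact versions, so treat it as an illustration rather than a confirmed fix:

# same result as topi.einsum("ik,kj->ij", X, W), but via topi.matmul
H = topi.matmul(X, W)
Y = H + B
s = topi.cuda.schedule_injective([Y])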

I think this may be a bug in topi.einsum.


Thanks for your suggestion. I am also learning TVM, but I don't see how the input size relates to the build process. IMO, build just translates the generated C-like code (the output of lower) into an executable dynamic lib (I saw a .so file being created while the script was running).
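
Just to make that concrete, this is the flow I have in mind (a rough sketch, not verified end to end; the output file name is just a placeholder):

# lower -> build -> export: "perceptron.so" is a placeholder name
mod = tvm.lower(s, [X, W, B, H, Y], simple_mode=True)   # C-like TIR
f = tvm.build(mod, target="cuda", target_host="llvm")   # compile to a runtime module
f.export_library("perceptron.so")                       # write the dynamic lib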