I was trying to compile the SSD model in this tutorial for the CUDA target. It initially failed here.
I understand that the current nms op only has a CPU version. I just want to get the compilation to pass first and then work on the GPU version of nms.
After changing tvm.make.Min/Max to tvm.min/max, it crashed with a segfault in HalideIR.
[21:04:38] /home/ubuntu/unison/tvm/src/pass/arg_binder.cc:87: Trying to bind buffer to another one with lower alignment requirement required_alignment=8, provided_alignment=4
[21:04:38] /home/ubuntu/unison/tvm/src/arithmetic/int_set.cc:514: cannot evaluate set type Load
[21:04:38] /home/ubuntu/unison/tvm/src/arithmetic/int_set.cc:514: cannot evaluate set type
Thread 1 "python" received signal SIGSEGV, Segmentation fault.
0x00007fffe4dd9542 in tvm::NodeBase::IncRef (this=<error reading variable: Cannot access memory at address 0x7fffff7fdfe8>)
at /home/ubuntu/unison/tvm/3rdparty/HalideIR/src/tvm/node/node_base.h:68
68 void IncRef() {
I can confirm this issue.
Initially, the problem with tvm.select(clip, tvm.make.Max(0, tvm.make.Min(1, ox - ow)), ox - ow) is a type mismatch: the integer constants 0 and 1 do not match the float type of ox - ow. Changing 0 and 1 to floats fixes that. However, I found that both this fix and switching to tvm.min/max still lead to the segfault.
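For reference, here is a plain-Python sketch of what that expression is meant to compute (this is illustrative only, not the TVM API): clamp ox - ow into [0, 1] when clip is set, and pass it through unchanged otherwise. The float constants 0.0/1.0 correspond to the type-mismatch fix described above.

```python
def clamp_coord(ox, ow, clip):
    """Mimic tvm.select(clip, max(0.0, min(1.0, ox - ow)), ox - ow).

    Float constants 0.0/1.0 are used instead of integer 0/1, mirroring
    the fix for the type mismatch against the float expression ox - ow.
    """
    x = ox - ow
    return max(0.0, min(1.0, x)) if clip else x

print(clamp_coord(1.5, 0.25, True))   # 1.0  (clamped to the upper bound)
print(clamp_coord(1.5, 0.25, False))  # 1.25 (clip disabled, passed through)
```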
The problem is an infinite recursion in ConvertSSA. Since the CPU and GPU have different multibox implementations, this is likely an issue in the multibox IR on the GPU side.
For a quick workaround, please check out the previous commit of the nms.py file. The most recent commit of nms.py on GitHub works for Intel graphics but hasn't been tested on CUDA devices.