I was trying to compile the SSD model in this tutorial for the CUDA target. It initially failed here.
I understand that the current nms op only has a CPU version. I just want to get the compilation to pass first and then work on the GPU version of nms.
After changing tvm.make.Min/Max to tvm.min/max, it crashed with a segfault in HalideIR.
[21:04:38] /home/ubuntu/unison/tvm/src/pass/arg_binder.cc:87: Trying to bind buffer to another one with lower alignment requirement required_alignment=8, provided_alignment=4
[21:04:38] /home/ubuntu/unison/tvm/src/arithmetic/int_set.cc:514: cannot evaluate set type Load
[21:04:38] /home/ubuntu/unison/tvm/src/arithmetic/int_set.cc:514: cannot evaluate set type
Thread 1 "python" received signal SIGSEGV, Segmentation fault.
0x00007fffe4dd9542 in tvm::NodeBase::IncRef (this=<error reading variable: Cannot access memory at address 0x7fffff7fdfe8>)
at /home/ubuntu/unison/tvm/3rdparty/HalideIR/src/tvm/node/node_base.h:68
68 void IncRef() {
I can confirm this issue.
Initially, the problem with tvm.select(clip, tvm.make.Max(0, tvm.make.Min(1, ox - ow)), ox - ow) is a type mismatch: the integer constants 0 and 1 do not match the float type of ox - ow. Changing 0 and 1 to floats fixes that. However, I found that both this fix and switching to tvm.min/max still lead to the segfault.
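For reference, here is a plain-Python sketch of what that expression is meant to compute (this is illustrative only, not the TVM API): clamp ox - ow into [0, 1] when clip is set, and pass it through unchanged otherwise. The float constants 0.0/1.0 correspond to the type-mismatch fix described above.

```python
def clamp_coord(ox, ow, clip):
    """Mimic tvm.select(clip, max(0.0, min(1.0, ox - ow)), ox - ow).

    Float constants 0.0/1.0 are used instead of integer 0/1, mirroring
    the fix for the type mismatch against the float expression ox - ow.
    """
    x = ox - ow
    return max(0.0, min(1.0, x)) if clip else x

print(clamp_coord(1.5, 0.25, True))   # 1.0  (clamped to the upper bound)
print(clamp_coord(1.5, 0.25, False))  # 1.25 (clip disabled, passed through)
```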
The problem is an infinite recursion in ConvertSSA. Since the CPU and GPU have different multibox implementations, this is likely an issue in the multibox IR on the GPU side.
For a quick workaround, please check out the previous commit of the nms.py file. The most recent commit of nms.py on GitHub works for Intel graphics but hasn't been tested on CUDA devices.