Wrong output from CUDA `vision.multibox_transform_loc`

vision.multibox_transform_loc gives wrong output when run on CUDA.

def multibox_transform_loc(
    cls_prob, loc_pred, anchor, clip=True, threshold=0.01, variances=(0.1, 0.1, 0.2, 0.2)
)
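For anyone who wants a CPU-side reference to compare against, here is a minimal sketch of the per-box decoding I believe this op performs. It assumes the MXNet-style SSD convention (anchors given as left, top, right, bottom; offsets scaled by `variances`). The helper name `decode_box` is hypothetical and not part of TVM, and the sketch ignores the `cls_prob` / `threshold` handling entirely.

```python
import math


def decode_box(loc, anchor, variances=(0.1, 0.1, 0.2, 0.2), clip=True):
    """Decode one box (assumed MXNet-style SSD convention, not TVM's code).

    loc    -- predicted offsets (dx, dy, dw, dh) for one anchor
    anchor -- anchor corners (left, top, right, bottom)
    """
    al, at, ar, ab = anchor
    aw, ah = ar - al, ab - at                  # anchor width / height
    ax, ay = (al + ar) / 2.0, (at + ab) / 2.0  # anchor center
    dx, dy, dw, dh = loc
    vx, vy, vw, vh = variances

    # Shift the center and scale the half-sizes.
    ox = dx * vx * aw + ax
    oy = dy * vy * ah + ay
    ow = math.exp(dw * vw) * aw / 2.0
    oh = math.exp(dh * vh) * ah / 2.0

    box = [ox - ow, oy - oh, ox + ow, oy + oh]
    if clip:
        # Clamp corners to the normalized image range [0, 1].
        box = [min(max(v, 0.0), 1.0) for v in box]
    return box
```

With all-zero offsets the decoded box should simply be the anchor itself, which makes a quick sanity check before comparing against the GPU result.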

Output of vision.multibox_transform_loc for three input combinations:

  1. wrong ---- cls_prob is GPU output, loc_pred is a const expr.
  2. wrong ---- cls_prob is GPU output, loc_pred is GPU output.
  3. correct ---- cls_prob is a const expr, loc_pred is a const expr.

Difference between the GPU output and the data dump (from Caffe blobs):

  1. cls_prob ------ mbox_conf_softmax 1.9020114e-06
  2. loc_pred ------ mbox_loc 4.2561445e-05
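
In case it helps others repeat the comparison, a figure like the ones above could be computed roughly as follows. This is only a sketch: it assumes the numbers are maximum absolute differences over the flattened tensors, and `max_abs_diff` is a hypothetical helper name.

```python
def max_abs_diff(actual, expected):
    """Return the largest element-wise absolute difference.

    Both arguments are flat sequences of floats, e.g. a flattened GPU
    output and the matching flattened Caffe blob dump.
    """
    if len(actual) != len(expected):
        raise ValueError("inputs must have the same number of elements")
    # default=0.0 covers the degenerate case of two empty inputs.
    return max((abs(a - e) for a, e in zip(actual, expected)), default=0.0)
```

Differences on the order of 1e-6 to 1e-5 would normally be float32 rounding noise, so on their own they do not point at the inputs as the cause.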

Shapes

  1. cls_prob ------ (1, 2, 7650)
  2. loc_pred ------ (1, 30600)

All three outputs are identical above line 5587. In image 3 (the correct output), the values below line 5587 are extremely small.
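
To locate a divergence point like line 5587 without reading the dumps by eye, something like the sketch below could be used. It assumes both outputs have been flattened to sequences of floats in the same order; `first_divergence` and the tolerance value are illustrative choices, not anything from TVM.

```python
def first_divergence(a, b, atol=1e-5):
    """Return the first index where a and b differ by more than atol.

    Returns None if no such index exists within the common length.
    """
    for i, (x, y) in enumerate(zip(a, b)):
        if abs(x - y) > atol:
            return i
    return None
```

Note that `zip` stops at the shorter input, so a length mismatch has to be checked separately.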

Any suggestions would be greatly appreciated!

This repo can reproduce the bug on tvm repo commit 4c77bae772ad68f3dc4dda009384cb65af9dfaec

How do you run these three tests? Are the CUDA kernels for multibox_transform_loc the same?

This repo can reproduce the bug on tvm repo commit 4c77bae772ad68f3dc4dda009384cb65af9dfaec

@vinx13

Exactly the same CUDA kernel.

Thanks. I haven’t found the exact reason, but it seems to be related to some optimization pass. Setting opt_level=0 produces the correct result.
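
One way to narrow this down further might be to rebuild at each optimization level and note where the output first goes wrong. The helper below is a hypothetical sketch of that loop only: `is_correct_at` is a callback you would write yourself (build the model at the given level, run it on CUDA, compare against the reference), and the default range of levels is an assumption.

```python
def lowest_failing_opt_level(is_correct_at, levels=range(0, 4)):
    """Return the lowest level for which is_correct_at(level) is False.

    is_correct_at -- user-supplied callable taking an int opt level and
                     returning True when the output matches the reference
    Returns None if every level in `levels` produces a correct result.
    """
    for level in sorted(levels):
        if not is_correct_at(level):
            return level
    return None
```

Inside the callback, recent TVM versions set the level with `tvm.transform.PassContext(opt_level=...)`, while older ones such as the commit in this thread used `relay.build_config(opt_level=...)`. Once the first failing level is known, the passes enabled at that level are the ones to look at.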

Thanks for testing this problem. I’m not familiar with CUDA, so I’ll try to learn enough to understand what’s going wrong.