We deploy SSD on a GPU with the thread count limited to 64, so we use target = 'cuda -max_num_threads=64'. Please refer to test_ssd.py.
However, in the generated host code for fused_vision_multibox_transform_loc, ((TVMValue*)stack_value)[6].v_int64 is 30 instead of 64. It seems that threadIdx and blockIdx are in reverse order.
We re-ordered the ib.scope_attr calls in multibox.py (please search for the two functions transform_loc and transform_loc_pre), and after that the generated host code has the correct thread setting.
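
To make the suspected behavior concrete, here is a toy model in plain Python. It is not TVM code: pack_launch_args and the slot layout are hypothetical, and it rests on the assumption (the very thing we are unsure about) that the host code fills its launch-parameter slots in the order the thread_extent attributes were emitted. Treating 30 as the block extent is also our guess.

```python
# Toy model only, NOT TVM code.
# Assumption: launch-parameter slots are filled in the order in which the
# thread_extent attributes were emitted by scope_attr.

def pack_launch_args(scope_attrs):
    """Hypothetical stand-in for host-code packing: extents in emission order."""
    return [extent for _tag, extent in scope_attrs]

# Two emission orders for the same pair of attributes.
# 64 is the requested thread limit; 30 is the value we observed (assumed block extent).
block_first = [("blockIdx.x", 30), ("threadIdx.x", 64)]
thread_first = [("threadIdx.x", 64), ("blockIdx.x", 30)]

slots_block_first = pack_launch_args(block_first)
slots_thread_first = pack_launch_args(thread_first)

# Under this assumption, a reader that expects the thread extent in a fixed
# slot sees a different value depending on the emission order.
print(slots_block_first, slots_thread_first)
```

If the real code generator behaves like this model, swapping the two scope_attr calls would be enough to change which value lands in a given slot, which matches what we observed.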
Our question: is scope_attr order-sensitive? If so, should multibox.py be modified? Thank you for your help. The test files can be found in the repo.