Auto-scheduler on Mask R-CNN raises "No tasks" error

I am trying to use the auto-scheduler to tune the official PyTorch Mask R-CNN model maskrcnn_resnet50_fpn_coco on CPU.

torch version: 1.8.1+cu102
torchvision version: 0.9.1+cu102

The running log is as follows:

TypeError: unsupported operand type(s) for *: 'Any' and 'int'

Traceback (most recent call last):
  File "/home2/zhangya9/TLCBench/tune_autoscheduler.py", line 113, in <module>
    auto_scheduler_tune(network, batch_size, dtype, target, log_file)
  File "/home2/zhangya9/TLCBench/tune_autoscheduler.py", line 63, in auto_scheduler_tune
    tuner = auto_scheduler.TaskScheduler(tasks, task_weights)
  File "/home2/zhangya9/tvm/python/tvm/auto_scheduler/task_scheduler.py", line 243, in __init__
    assert len(self.tasks) != 0, "No tasks"
AssertionError: No tasks

From the error message it seems to be due to the dynamic shapes in the model? But that should have been resolved by this issue. Are you using the latest TVM?

Also cc @masahi @merrymercy

I built the latest TVM from source (commit id 8d9a1dfe77cba7220d9313c32070f190b7eb30d8, updated on May 6th). I did not use the VM. Could you offer me an example of how to use the VM to extract tuning tasks?

By the way, with almost the same code, Faster R-CNN can be tuned.

There is an error in your log: TypeError: unsupported operand type(s) for *: 'Any' and 'int'. Unlike Faster R-CNN, Mask R-CNN has conv2d and conv2d_transpose workloads with a dynamic batch dimension. The Any there comes from that dynamism in the batch.
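For illustration, a dynamic batch dimension shows up in Relay as Any, and any concrete arithmetic on it fails with exactly this TypeError; the shapes below are made up:

```python
from tvm import relay

# Made-up shapes; the point is the dynamic batch dimension.
x = relay.var("x", shape=(relay.Any(), 256, 14, 14), dtype="float32")

# A TOPI compute that does concrete arithmetic on that dimension, e.g.
#     P = N * nH * nW
# raises: TypeError: unsupported operand type(s) for *: 'Any' and 'int'
```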

I thought I fixed this issue, but I may have forgotten to upstream the fix. Can you check which line the error is happening at? The fix to this common error is always trivial; we just need to add something like tvm/conv2d_nhwc.py at 813136401a11a49d6c15e6013c34dd822a5c4ff6 · apache/tvm · GitHub

Things should work with torch 1.7; I haven't tested 1.8.

Also, in terms of tuning, there is no difference between Mask R-CNN and Faster R-CNN: the only extra workloads in Mask R-CNN are those dynamic ops, which cannot be tuned anyway for now. So you can tune on Faster R-CNN and use the same tuning log on Mask R-CNN. Note that due to those dynamic ops that cannot be tuned, Mask R-CNN in particular is extremely slow on TVM right now.
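A minimal sketch of that workflow (mod_faster/params_faster and mod_mask/params_mask stand for whatever relay.frontend.from_pytorch returns for the two models; the target, trial count, and log file name are placeholders):

```python
import tvm
from tvm import auto_scheduler, relay

target = tvm.target.Target("llvm")      # placeholder target
log_file = "faster_rcnn_tuning.json"    # placeholder log path

# 1. Extract and tune tasks from the Faster R-CNN module.
tasks, task_weights = auto_scheduler.extract_tasks(mod_faster["main"], params_faster, target)
tuner = auto_scheduler.TaskScheduler(tasks, task_weights)
tune_option = auto_scheduler.TuningOptions(
    num_measure_trials=20000,
    measure_callbacks=[auto_scheduler.RecordToFile(log_file)],
)
tuner.tune(tune_option)

# 2. Reuse the same log when compiling Mask R-CNN; the VM is needed because of
#    the dynamic shapes / control flow in the model.
with auto_scheduler.ApplyHistoryBest(log_file):
    with tvm.transform.PassContext(
        opt_level=3, config={"relay.backend.use_auto_scheduler": True}
    ):
        vm_exec = relay.vm.compile(mod_mask, target=target, params=params_mask)
```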


I have now changed the torch and torchvision versions to 1.7.0 and 0.8.1, respectively. The full log is listed in the following Notion link. The error seems to happen at /tvm/python/tvm/topi/nn/conv2d.py line 1075, but I have no idea how to fix it.

  File "/home2/zhangya9/tvm/python/tvm/_ffi/_ctypes/packed_func.py", line 81, in cfun
    rv = local_pyfunc(*pyargs)
  File "/home2/zhangya9/tvm/python/tvm/relay/op/strategy/generic.py", line 240, in _compute_conv2d
    return [topi_compute(*args)]
  File "", line 2, in conv2d_winograd_nhwc
  File "/home2/zhangya9/tvm/python/tvm/target/generic_func.py", line 276, in dispatch_func
    return func(*args, **kwargs)
  File "/home2/zhangya9/tvm/python/tvm/topi/nn/conv2d.py", line 1196, in conv2d_winograd_nhwc
    return _conv2d_winograd_nhwc_impl(
  File "/home2/zhangya9/tvm/python/tvm/topi/nn/conv2d.py", line 1075, in _conv2d_winograd_nhwc_impl
    P = N * nH * nW

I see. I'm not sure if winograd can be used when N is dynamic; see tvm/conv2d.py at 813136401a11a49d6c15e6013c34dd822a5c4ff6 · apache/tvm · GitHub
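For reference, a sketch of the kind of guard the failing line usually needs; the helper name is made up and the actual upstream fix may differ:

```python
def num_winograd_tiles(N, nH, nW):
    """Hypothetical helper: only form the concrete product when the batch size N
    is a Python int. A dynamic batch (Any) would otherwise raise
    `TypeError: unsupported operand type(s) for *: 'Any' and 'int'`."""
    return N * nH * nW if isinstance(N, int) else nH * nW
```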

A workaround is to update the x86 conv2d strategy in tvm/x86.py at 7130e80204ff727c4947dbb928e0330b0f1d6117 · apache/tvm · GitHub so that it does not use winograd when the batch size is Any.
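A rough sketch of that workaround; the helper is hypothetical and the surrounding names in x86.py may differ:

```python
from tvm import tir

def batch_is_static(data):
    """Hypothetical helper: True when the batch dimension of `data` is a concrete
    value rather than a dynamic tir.Any."""
    return not isinstance(data.shape[0], tir.Any)

# In conv2d_strategy_cpu (python/tvm/relay/op/strategy/x86.py), the NHWC winograd
# implementation would then only be registered when the batch is static, roughly:
#
#     if judge_winograd_auto_scheduler and batch_is_static(data):
#         strategy.add_implementation(
#             wrap_compute_conv2d(topi.nn.conv2d_winograd_nhwc),
#             naive_schedule,
#             name="conv2d_nhwc.winograd",
#             plevel=15,
#         )
```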

Note that on GPU, Mask R-CNN doesn't work with the NHWC layout, because there is some issue with the default conv2d_transpose schedule.

For now, I recommend using the NCHW layout to work with Mask R-CNN.
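The PyTorch frontend already emits NCHW, so the simplest fix is to skip any NHWC ConvertLayout pass in the tuning script. To request NCHW explicitly, a sketch of the standard ConvertLayout usage (mod is the imported Relay module):

```python
import tvm
from tvm import relay

# Request NCHW for conv2d; "default" lets TVM pick the kernel layout.
desired_layouts = {"nn.conv2d": ["NCHW", "default"]}
seq = tvm.transform.Sequential([relay.transform.ConvertLayout(desired_layouts)])
with tvm.transform.PassContext(opt_level=3):
    mod = seq(mod)
```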

Thank you so much! This issue is solved by changing the layout to "NCHW".