I’m trying to auto-schedule BERT-like models with Ansor on a GPU. The same flow works on the CPU, but on the GPU it fails: all of the tuning tasks complete successfully, yet when I compile the full model with the tuning log file, it always crashes with the following error:
File "/home/wheest/tools/tvm/python/tvm/relay/op/strategy/generic.py", line 767, in _compute_batch_matmul
return [topi_compute(*args)]
File "/home/wheest/tools/tvm/python/tvm/autotvm/task/topi_integration.py", line 165, in wrapper
node = topi_compute(cfg, *args)
File "/home/wheest/tools/tvm/python/tvm/topi/cuda/batch_matmul.py", line 32, in batch_matmul
return nn.batch_matmul(x, y)
File "/home/wheest/tools/tvm/python/tvm/topi/nn/batch_matmul.py", line 57, in batch_matmul
assert len(x_shape) == 3 and len(y_shape) == 3, "only support 3-dim batch_matmul"
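For reference, the compile step is essentially the standard Ansor flow, roughly as below (a sketch only; `mod`, `params`, and `log_file` are placeholders for my actual model and tuning log):

```python
import tvm
from tvm import relay, auto_scheduler

# Placeholders: mod/params come from a Relay frontend import,
# log_file is the JSON log written during Ansor tuning.
target = tvm.target.Target("cuda")

with auto_scheduler.ApplyHistoryBest(log_file):
    with tvm.transform.PassContext(
        opt_level=3, config={"relay.backend.use_auto_scheduler": True}
    ):
        lib = relay.build(mod, target=target, params=params)
```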
Presumably Ansor is applying some sort of tiling or layout transformation, and that is what causes the problem.
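My understanding of the rule the assert enforces is that both operands of `topi.nn.batch_matmul` must already be 3-D, i.e. `(batch, M, K)` and `(batch, N, K)`. A minimal illustration (shapes here are made up, purely for illustration):

```python
from tvm import te, topi

# Both inputs 3-D: passes the "only support 3-dim batch_matmul" assert.
x = te.placeholder((1, 128, 64), name="x")   # (batch, M, K)
y = te.placeholder((1, 256, 64), name="y")   # (batch, N, K)
out = topi.nn.batch_matmul(x, y)

# A 4-D input (e.g. after some reshape/layout change) would trip the assert:
x4d = te.placeholder((1, 12, 128, 64), name="x4d")
# topi.nn.batch_matmul(x4d, y)  # AssertionError: only support 3-dim batch_matmul
```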
Is there an auto-scheduling flag or similar that ensures we don’t break this batch_matmul rule?
Or some other clever trick?