I’m having trouble compiling a small DLRM model with TVM. I’ve been going via PyTorch → ONNX → TVM, but importing the network fails with the error “relay.concatenate requires all tensors have the same ndim”. I believe this may be caused by strided slices inside an ONNX Loop, as mentioned in this PR: [Relay][Frontend][Onnx] Loop Support by jwfromm (apache/tvm Pull Request #6700).
@jwfromm, can you give me any pointers on where to start looking to fix this issue with ONNX imports?
To reproduce:
```shell
git clone 'https://github.com/facebookresearch/dlrm'
cd dlrm
# An empty CUDA_VISIBLE_DEVICES hides all GPUs, so this runs on CPU
CUDA_VISIBLE_DEVICES= python dlrm_s_pytorch.py --save-onnx --mini-batch-size=256 --data-size=100
```
This should take only a few seconds and produces a fresh `dlrm_s_pytorch.onnx` file containing a small DLRM model trained on random data.
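Then add this script to the repo directory:
```python
import onnx
import tvm
from tvm import relay

# Load the model exported above and make sure ONNX itself considers it valid
onnx_model = onnx.load('dlrm_s_pytorch.onnx')
onnx.checker.check_model(onnx_model)

# Importing into Relay is the step that raises the concatenate ndim error
mod, params = relay.frontend.from_onnx(onnx_model)
```
Running it reproduces the error.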