I am trying to use debug_chat.py, which runs on the Relax virtual machine. I followed the instructions in "Compile Model Libraries — mlc-llm 0.1.0 documentation" to download RedPajama, compile it, and then check the compiled model with debug_chat. However, I keep getting this error:

InternalError: Check failed: (e == cudaSuccess || e == cudaErrorCudartUnloading) is false: CUDA: operation would make the legacy stream depend on a capturing blocking stream

I have installed and uninstalled different versions of MLC-LLM, TVM-Unity, and CUDA, and I still get the same error. Has anyone run into this error before? How can I solve it?
For now I have CUDA 12.6 with driver 560.35.03, and the verified and tagged mlc-llm and tvm-unity v0.18.dev0, on Ubuntu 22.04.
I would appreciate any help. Thank you.