Regarding the Relax auto-scheduling issue: I heard that a package or third-party plugin named "dlight" is under development. It is supposed to replace DefaultGPUSchedule and offer great performance at no auto-tuning cost.
So, what's the status now? I'm really looking forward to it and can't wait.
The first version of dlight has already been integrated into TVM Unity and MLC-LLM. You can try this feature by upgrading Relax (TVM Unity) and MLC-LLM to the latest version.
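For anyone who wants to try it outside MLC-LLM, here is a minimal sketch of how the dlight rules can be applied to an IRModule. It assumes a TVM Unity build that ships tvm.dlight. The helper name apply_dlight and the "cuda" default target are my own choices for illustration, and the set of rules available may differ between versions.

```python
def apply_dlight(mod, target="cuda"):
    """Schedule the TIR functions in `mod` with dlight GPU rules (sketch).

    `apply_dlight` is a hypothetical helper, not part of TVM itself.
    """
    # Imported lazily so this helper can be defined without TVM installed.
    import tvm
    from tvm import dlight as dl

    # ApplyDefaultSchedule reads the current target, so it must run
    # inside a target context.
    with tvm.target.Target(target):
        # Rules are tried in order; Fallback handles whatever
        # the more specific Matmul rule does not match.
        return dl.ApplyDefaultSchedule(
            dl.gpu.Matmul(),
            dl.gpu.Fallback(),
        )(mod)
```

Calling apply_dlight(mod) on a Relax or TIR module should return a new module with thread-bound schedules, with no tuning step involved.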
I have seen dlight in MLC-LLM, but the default dl.gpu.Matmul() does not seem to use NVIDIA's Tensor Cores, which makes matrix multiplication quite slow. Any good suggestions?