[RFC][BYOC] NVIDIA CUTLASS Integration

Meteorix · February 19, 2021, 7:35am

Glad to see the RFC! TVM's performance on large GEMMs has troubled me for a long time. Looking forward to further benchmarks comparing CUTLASS with fusion against cuBLAS without fusion.

One potential issue: AutoTVM selects the best implementation between the autotuned GEMM and the cuBLAS GEMM based on measured performance, and only then applies fusion. If CUTLASS is integrated, we will need sub-graph-level autotuning, so that the best implementation is selected for the whole fused sub-graph rather than for the GEMM alone.
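To make the ordering problem concrete, here is a toy model in plain Python. It is not TVM's API: `Candidate`, `op_level_select`, and `subgraph_level_select` are hypothetical names, and the latencies in the usage section are made-up placeholders rather than measurements. The only point is that ranking GEMM kernels in isolation can pick a backend (cuBLAS here) whose epilogue cannot be fused, while scoring the whole GEMM+epilogue sub-graph can prefer a slightly slower GEMM that fuses the epilogue (CUTLASS here).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    gemm_us: int          # latency of the GEMM kernel alone (hypothetical, microseconds)
    fuses_epilogue: bool  # True if bias/activation can be fused into this kernel


def subgraph_cost(c: Candidate, epilogue_us: int) -> int:
    # A non-fusable backend pays for a separate elementwise kernel afterwards.
    return c.gemm_us + (0 if c.fuses_epilogue else epilogue_us)


def op_level_select(cands: list[Candidate], epilogue_us: int) -> tuple[str, int]:
    # Pick the fastest GEMM in isolation, then apply fusion only if the winner allows it.
    best = min(cands, key=lambda c: c.gemm_us)
    return best.name, subgraph_cost(best, epilogue_us)


def subgraph_level_select(cands: list[Candidate], epilogue_us: int) -> tuple[str, int]:
    # Rank candidates by the cost of the whole fused sub-graph instead.
    best = min(cands, key=lambda c: subgraph_cost(c, epilogue_us))
    return best.name, subgraph_cost(best, epilogue_us)


if __name__ == "__main__":
    # Made-up numbers, chosen only to illustrate the selection-order effect.
    cands = [
        Candidate("tvm_autotuned", 1400, True),
        Candidate("cublas", 1000, False),
        Candidate("cutlass", 1050, True),
    ]
    print(op_level_select(cands, epilogue_us=200))        # ('cublas', 1200)
    print(subgraph_level_select(cands, epilogue_us=200))  # ('cutlass', 1050)
```

With these placeholder numbers, op-level selection settles on cuBLAS plus an unfused epilogue, while sub-graph-level selection picks the fused CUTLASS kernel, which is the kind of decision the integration would need the tuner to make.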