[Announcement] CUDA 10.2 features for TensorCores

adityaatluri · November 20, 2019, 5:56pm

Hi, NVIDIA released CUDA 10.2 yesterday. Here are a few things TVM can be excited about:

Warp-wide mma operations for fp16 (mma.sync.m16n8k8)
Warp-wide mma operations for s8 (mma.sync.m8n8k16)
Warp-wide mma operations for s4 (mma.sync.m8n8k32)
Load matrices directly from shared memory into registers, ready to be consumed by the mma instruction (ldmatrix)

These are good additions to mma.sync.m8n8k4 from CUDA 10.1.
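
For anyone not used to the naming: the mMnNkK suffix gives the matrix dimensions, so the instruction computes D = A * B + C with A of size M x K, B of size K x N, and C/D of size M x N. Below is a rough Python sketch that decodes the three new shapes and estimates how much work one warp-level instruction does. The helper names (`parse_mma_shape`, `mma_summary`) are made up for illustration, and the per-thread numbers assume the A and B elements are split evenly across the 32 threads of a warp. It only covers the three shapes listed above.

```python
import re

WARP_SIZE = 32  # threads per warp on NVIDIA GPUs


def parse_mma_shape(shape):
    """Split a shape string such as 'm16n8k8' into (M, N, K)."""
    match = re.fullmatch(r"m(\d+)n(\d+)k(\d+)", shape)
    if match is None:
        raise ValueError(f"not an mma shape: {shape!r}")
    return tuple(int(group) for group in match.groups())


def mma_summary(shape, input_bits):
    """Describe one warp-wide mma: operand sizes, MACs, bits per thread.

    Assumption (not taken from the post): A and B elements are distributed
    evenly over the 32 threads of the warp.
    """
    m, n, k = parse_mma_shape(shape)
    return {
        "A": (m, k),
        "B": (k, n),
        "D": (m, n),
        "macs": m * n * k,  # multiply-accumulates per instruction
        "a_bits_per_thread": m * k * input_bits // WARP_SIZE,
        "b_bits_per_thread": k * n * input_bits // WARP_SIZE,
    }


if __name__ == "__main__":
    # (shape, input element width in bits) for the CUDA 10.2 additions
    new_shapes = [("m16n8k8", 16), ("m8n8k16", 8), ("m8n8k32", 4)]
    for shape, bits in new_shapes:
        print(shape, mma_summary(shape, bits))
```

Under that even-split assumption, the A and B operands for each of these shapes fit in one or two 32-bit registers per thread, which is why narrower input types get a proportionally larger K.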