[RFC][Tensor Core] Optimization of CNNs on Tensor Core

Hi Shawn,

I’m curious why AS_align needs to be set to chunk * wmma_k + offset, while WS_align is set to warp_col_tiles * block_col_warps * wmma_k + offset. Why does AS_align not include the corresponding warp_row_tiles and block_row_warps factors?
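
To make the asymmetry concrete, here is a minimal sketch that just evaluates the two expressions as written above. The numeric values are hypothetical, picked only for illustration, and are not taken from the actual schedule or its tuned configuration.

```python
# Hypothetical parameter values, for illustration only
wmma_k = 16
offset = 8
chunk = 2
warp_col_tiles = 2
block_col_warps = 4

# The two expressions exactly as quoted in the question
AS_align = chunk * wmma_k + offset
WS_align = warp_col_tiles * block_col_warps * wmma_k + offset

print("AS_align =", AS_align)
print("WS_align =", WS_align)
```

With these values, AS_align depends only on chunk, while WS_align scales with both warp_col_tiles and block_col_warps, which is the difference I am asking about.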

Thank you