For example, I’m optimizing a GEMV kernel on a GPU. How can I judge whether the kernel is efficient enough?
I found the roofline model and understand that arithmetic intensity limits the FLOPS I can reach.
But how can I tell whether the arithmetic intensity my kernel achieves is already the upper bound for GEMV?
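
To make the question concrete, here is a minimal sketch of how I currently understand the bound. It assumes a plain y = A·x with an m×n matrix, that every element of A, x, and y crosses DRAM exactly once (perfect reuse of x, no caching tricks), and FP32 storage. The function names and the peak numbers in the example are placeholders I made up for illustration, not values for any real GPU.

```python
def gemv_arithmetic_intensity(m, n, bytes_per_element=4):
    """Best-case arithmetic intensity (FLOP/byte) of y = A @ x.

    Assumes each element of A (m*n), x (n), and y (m) is moved
    to or from DRAM exactly once. This is an idealized lower bound
    on traffic, so the result is an upper bound on intensity.
    """
    flops = 2 * m * n  # one multiply and one add per matrix element
    bytes_moved = (m * n + n + m) * bytes_per_element
    return flops / bytes_moved


def roofline_bound(intensity, peak_flops, peak_bandwidth):
    """Attainable FLOP/s under the roofline model.

    peak_flops is in FLOP/s, peak_bandwidth is in bytes/s.
    """
    return min(peak_flops, intensity * peak_bandwidth)


def efficiency(measured_flops, bound):
    """Fraction of the roofline bound that a measured kernel reaches."""
    return measured_flops / bound


if __name__ == "__main__":
    # Hypothetical hardware numbers, chosen only for illustration.
    peak_flops = 10e12      # 10 TFLOP/s (placeholder)
    peak_bandwidth = 1e12   # 1 TB/s (placeholder)

    ai = gemv_arithmetic_intensity(4096, 4096)
    bound = roofline_bound(ai, peak_flops, peak_bandwidth)
    measured = 0.4e12       # placeholder for a measured kernel rate

    print(f"arithmetic intensity: {ai:.4f} FLOP/byte")
    print(f"roofline bound:       {bound:.3e} FLOP/s")
    print(f"efficiency:           {efficiency(measured, bound):.1%}")
```

If this reasoning is right, the intensity of FP32 GEMV approaches 2 / 4 = 0.5 FLOP/byte for large matrices no matter how the kernel is written, so the kernel is memory-bound and the number I should really compare against is the achieved fraction of peak memory bandwidth.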
Thank you!