Did you mean the number of cycles for a single GEMM instruction? AFAIK, it depends on the implementation. In the Chisel-based implementation, a single GEMM instruction takes 4 cycles to complete, since there are stages in the design that prepare the data stream for execution.