If LOG_BLOCK=4, then BLOCK_IN and BLOCK_OUT are both 16. A single GEMM instruction therefore performs 16x16 multiply-accumulates (MACs), that is, 256 MACs per GEMM instruction. By my calculation,
256 MACs * 0.142 GHz = 36.352 GOps
Note that in the current Chisel implementation, a single GEMM takes 4 cycles/stages to complete, so we might see a performance regression there.
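
For anyone who wants to redo the arithmetic with other settings, here is a rough sketch in Python. The function name `peak_gops` and the `cycles_per_gemm` parameter are made up for illustration and are not part of VTA. It assumes a batch of 1, counts each MAC as one operation (as in the figure above), and treats the 4-cycle case as a lower bound that only applies if the 4 stages are not pipelined.

```python
def peak_gops(log_block, freq_ghz, cycles_per_gemm=1):
    """Estimate peak throughput in GOps (one MAC counted as one op).

    log_block       -- LOG_BLOCK; BLOCK_IN and BLOCK_OUT are both 2**log_block
    freq_ghz        -- clock frequency in GHz
    cycles_per_gemm -- hypothetical knob: cycles between successive GEMMs
                       (1 if fully pipelined, 4 if the stages do not overlap)
    """
    block = 1 << log_block          # 16 when log_block is 4
    macs_per_gemm = block * block   # BLOCK_IN * BLOCK_OUT = 256
    return macs_per_gemm * freq_ghz / cycles_per_gemm


if __name__ == "__main__":
    # Ideal case from the post: one GEMM issued every cycle at 0.142 GHz.
    print("pipelined:", peak_gops(4, 0.142))
    # Pessimistic case, assuming each GEMM occupies the core for 4 cycles.
    print("not pipelined:", peak_gops(4, 0.142, cycles_per_gemm=4))
```

If the Chisel GEMM is pipelined so that a new GEMM can issue every cycle, the 4 stages only add latency and the first number still holds; the second number is only the worst case under the no-overlap assumption.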