I have been learning about TVM’s optimizations lately, and was wondering how TVM does fusion for RNNs. I understand how fusion can be done for operators like matrix multiplication and ReLU, but the computation in RNNs seems more complicated and has more dependencies. For example, this is a summary of the computation of LSTM.
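For reference, here is the standard LSTM cell formulation I have in mind (weight names follow the usual $W_{x\cdot}$ / $W_{h\cdot}$ convention, not anything TVM-specific):

$$
\begin{aligned}
i_t &= \sigma(W_{xi} x_t + W_{hi} h_{t-1} + b_i) \\
f_t &= \sigma(W_{xf} x_t + W_{hf} h_{t-1} + b_f) \\
o_t &= \sigma(W_{xo} x_t + W_{ho} h_{t-1} + b_o) \\
g_t &= \tanh(W_{xg} x_t + W_{hg} h_{t-1} + b_g) \\
c_t &= f_t \odot c_{t-1} + i_t \odot g_t \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
$$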
Hi masahi, thank you for the reply! Could you elaborate more? I don’t totally understand: do you mean fusion is done per expression, not across expressions? Also, I thought fusion is performed on the IR and between operators. Why would different lines matter, though? They would be converted to the IR anyway, right?
Sorry, what I said above didn’t make any sense… We automatically fuse a matmul with the following bias_add and elementwise ops like activations, but we never fuse two matmuls (like W_{xi} * x_t + W_{hi} * h_{t-1} above).
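As a minimal sketch (using the Relay Python API, with made-up shapes and variable names), you can see this by running FuseOps on a single LSTM-style gate: the dense → bias_add → sigmoid chain gets grouped, but the two dense ops never land in the same fused function:

```python
import tvm
from tvm import relay

# One LSTM gate: sigmoid(dense(x, W_x) + dense(h, W_h) + b)
x = relay.var("x", shape=(1, 128))
h = relay.var("h", shape=(1, 128))
w_x = relay.var("w_x", shape=(256, 128))
w_h = relay.var("w_h", shape=(256, 128))
b = relay.var("b", shape=(256,))

gate = relay.nn.dense(x, w_x) + relay.nn.dense(h, w_h)   # two matmuls
gate = relay.nn.bias_add(gate, b)                        # elementwise-ish
out = relay.sigmoid(gate)                                # elementwise

mod = tvm.IRModule.from_expr(relay.Function([x, h, w_x, w_h, b], out))
mod = relay.transform.InferType()(mod)
mod = relay.transform.FuseOps(fuse_opt_level=2)(mod)

# Inspect the fused primitive functions: the add/bias_add/sigmoid chain is
# fused with one of the dense ops, while the other dense stays in its own
# fused function, i.e. the two matmuls are never merged.
print(mod)
```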