I will do a bit more testing. I searched the codebase, and it appears the gemm code was just copy-pasted for all the contrib matmuls with little apparent effort to unify the approaches.
I propose:
A header file with a unified gemm routine that all the contrib implementations can use.
Update all the matmuls; they would then mostly just forward to the unified routine, each supplying its own gemm operator.
Add support for optional alpha and beta parameters. Old code would continue to work with no changes, since the new values would just be optional entries at the end of the argvec (a rough sketch follows below).
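To make the last two points concrete, here is a minimal sketch of what the shared routine might look like, assuming plain CBLAS. The names (CallGemm, CblasSgemmOp) are hypothetical and the real version would unpack DLTensor arguments rather than take raw pointers; this is just to show the shape of the idea.

```cpp
// Hypothetical sketch only -- CallGemm / CblasSgemmOp are illustrative names,
// not the actual TVM source.
#include <cblas.h>

// Each backend supplies just the raw BLAS call as a functor.
struct CblasSgemmOp {
  void operator()(bool ta, bool tb, int M, int N, int K, float alpha,
                  const float* A, int lda, const float* B, int ldb, float beta,
                  float* C, int ldc) const {
    cblas_sgemm(CblasRowMajor, ta ? CblasTrans : CblasNoTrans,
                tb ? CblasTrans : CblasNoTrans, M, N, K, alpha, A, lda, B, ldb,
                beta, C, ldc);
  }
};

// Shared entry point that every contrib matmul could forward to. alpha and
// beta default to 1 and 0, so callers that never pass them keep the current
// C = A * B behaviour unchanged.
template <typename GemmOp>
void CallGemm(GemmOp op, bool ta, bool tb, int M, int N, int K, const float* A,
              int lda, const float* B, int ldb, float* C, int ldc,
              float alpha = 1.0f, float beta = 0.0f) {
  op(ta, tb, M, N, K, alpha, A, lda, B, ldb, beta, C, ldc);
}
```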
This would touch all the matmuls in contrib. I cannot test all of those different options myself, however, so I would need help with that. I could test mkl, cpu, and cuda, and, if the Intel or NVIDIA OpenCL implementations support it, the rocblas pathway.
Does this sound like a good plan? It may allow TVM to be used in systems that, for instance, use gemm to sum gradients into an accumulator (beta of 1).
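For context on the beta parameter: BLAS gemm computes C = alpha*A*B + beta*C, so beta = 1 adds the product into whatever is already in C. A tiny standalone CBLAS example (not TVM code) of that accumulator pattern:

```cpp
#include <cblas.h>
#include <cstdio>

int main() {
  // 2x2 matrices in row-major order.
  float A[] = {1, 2, 3, 4};
  float B[] = {1, 0, 0, 1};      // identity, so A*B == A
  float C[] = {10, 10, 10, 10};  // pre-existing accumulator contents

  cblas_sgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans, 2, 2, 2,
              1.0f,              // alpha
              A, 2, B, 2,
              1.0f,              // beta = 1: accumulate into C
              C, 2);

  // Expected: C = A + old C = {11, 12, 13, 14}
  for (int i = 0; i < 4; ++i) printf("%g ", C[i]);
  printf("\n");
  return 0;
}
```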
If we want to introduce something like gemm (with alpha support), maybe we need to rename the external operator to gemm. Contributions are more than welcome.
I am hesitant to rename as I do not want to break anyone’s stuff. I could add a new operator called gemm (in the same files) that supports alpha, beta.
The changes I am making are not just adding alpha and beta. I am also verifying that the implementation works with strided matrices (submatrices). Also, for the cuda implementation, the stream and handle management were not handled as well as in contrib.cudnn, so I am following that model.
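To illustrate the stride/submatrix point with a plain CBLAS example (again, not the TVM code): operating on a 2x2 block of a larger 4x4 matrix means the leading dimension passed to gemm must be the parent matrix's row stride, not the block width.

```cpp
#include <cblas.h>
#include <cstdio>

int main() {
  const int LDA = 4;  // row stride of the parent matrix
  float big[LDA * LDA] = {0, 0, 0, 0,
                          0, 1, 2, 0,
                          0, 3, 4, 0,
                          0, 0, 0, 0};
  float* sub = &big[1 * LDA + 1];  // 2x2 submatrix {{1, 2}, {3, 4}}

  float B[] = {1, 0, 0, 1};  // 2x2 identity
  float C[4] = {0};

  // lda is 4 (the parent stride) even though the submatrix is 2x2;
  // ldb and ldc stay 2 because B and C are dense 2x2 matrices.
  cblas_sgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans, 2, 2, 2, 1.0f, sub,
              LDA, B, 2, 0.0f, C, 2);

  for (int i = 0; i < 4; ++i) printf("%g ", C[i]);  // prints 1 2 3 4
  printf("\n");
  return 0;
}
```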
My vote is to change nothing and simply have more things work than would have otherwise, but if there is some interface that matmul is supposed to adhere to (because, for instance, topi.matmul only supports the plain matmul form), then I should rename.