Currently TVM uses a transposed weight layout for both Dense & BatchMatmul: the data is [M, K] and the weight is [N, K].
In TensorFlow, however, the default weight layout is not transposed ([K, N]), so we get extra transpose ops in a relay model imported from TensorFlow. In most cases these transposes can be easily merged by constant folding, but in some models like BERT, the second input of BatchMatmul (usually seen as the weight) is not a constant parameter, which means it cannot be simplified directly. We've seen some performance problems caused by this.
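To make the layout difference concrete, here is a minimal NumPy sketch (shapes are illustrative, not taken from any real model) of the two BatchMatmul conventions: TensorFlow multiplies against a [B, K, N] second operand, while TVM's batch_matmul expects the transposed [B, N, K] layout.

```python
import numpy as np

# Hypothetical shapes for illustration only.
B, M, K, N = 2, 4, 8, 16
x = np.random.rand(B, M, K).astype("float32")
w_tf = np.random.rand(B, K, N).astype("float32")   # TensorFlow layout: [B, K, N]
w_tvm = w_tf.transpose(0, 2, 1)                    # TVM layout: [B, N, K]

# TensorFlow-style BatchMatmul: plain matrix product per batch.
y_tf = np.einsum("bmk,bkn->bmn", x, w_tf)
# TVM-style batch_matmul: second operand pre-transposed, reduce over its last axis.
y_tvm = np.einsum("bmk,bnk->bmn", x, w_tvm)
assert np.allclose(y_tf, y_tvm)
```

When the second operand is a constant, the importer's inserted transpose folds away; when it is an activation (as in BERT's attention), the transpose survives as a real op at runtime.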
Another concern is that, in my understanding, K is the reduce axis, and having K as the inner dimension of both operands is not friendly for vectorization.
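A naive loop-nest sketch (plain NumPy, not TVM scheduling code) shows the point: with the transposed [N, K] weight the innermost loop is a scalar reduction over K, while with a [K, N] weight the innermost work is a contiguous update over the output row, which maps naturally onto vector instructions.

```python
import numpy as np

M, K, N = 4, 8, 16
x = np.random.rand(M, K).astype("float32")
w = np.random.rand(K, N).astype("float32")  # non-transposed layout [K, N]

# Transposed layout ([N, K]): K is the inner dimension of both operands,
# so each output element is produced by a scalar reduction loop.
wt = w.T  # [N, K]
out_t = np.zeros((M, N), dtype="float32")
for m in range(M):
    for n in range(N):
        for k in range(K):              # reduce axis innermost
            out_t[m, n] += x[m, k] * wt[n, k]

# Non-transposed layout ([K, N]): n is contiguous in w, so the innermost
# operation is a vector update across a whole output row.
out_nt = np.zeros((M, N), dtype="float32")
for m in range(M):
    for k in range(K):
        out_nt[m, :] += x[m, k] * w[k, :]   # vectorizable over n

assert np.allclose(out_t, out_nt)
```

(A real schedule can of course tile and vectorize either layout, but the non-transposed form exposes the contiguous output axis directly.)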
I guess the design of using a transposed weight comes from MXNet? Since this implementation does have some inconveniences, would the community agree to extend these ops to support a non-transposed weight directly? (E.g., add extra attributes indicating whether the input or the weight is transposed.)