Hi, I’m new to TVM.
I’m exploring the FoldScaleAxis pass, but I don’t understand what it does.
Is there a conceptual example of this optimization that shows why it is needed and how it works?
Thank you.
For example, this enables pre-multiplying the batch norm scale into the weight of the preceding conv2d. The combination of SimplifyInference, FoldScaleAxis, and FoldConstant completely eliminates batch norm from an inference graph.
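Here is a minimal NumPy sketch (not TVM code) of the arithmetic behind that pipeline. It assumes a 1x1 conv2d so the convolution reduces to a matmul over channels. The shapes, random values, and the helpers `conv1x1` and `per_channel` are hypothetical, chosen only for illustration. It checks that conv2d + batch_norm, the multiply/add form that SimplifyInference produces, and the version with the scale folded into the weight all compute the same result.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical shapes; a 1x1 conv2d on NCHW input is a per-pixel matmul over channels.
n, c_in, c_out, height, width = 1, 3, 4, 5, 5
x = rng.standard_normal((n, c_in, height, width))
weight = rng.standard_normal((c_out, c_in))  # 1x1 kernel, spatial dims dropped

# Batch-norm parameters are constants at inference time.
gamma = rng.standard_normal(c_out)
beta = rng.standard_normal(c_out)
mean = rng.standard_normal(c_out)
var = rng.random(c_out) + 0.5
eps = 1e-5

def conv1x1(x, w):
    # out[n, o, h, w] = sum_i w[o, i] * x[n, i, h, w]
    return np.einsum("oi,nihw->nohw", w, x)

def per_channel(v):
    # Reshape a length-C vector so it broadcasts along the channel axis.
    return v.reshape(1, -1, 1, 1)

# 1) Original graph: conv2d followed by batch_norm.
z = conv1x1(x, weight)
reference = (per_channel(gamma) * (z - per_channel(mean))
             / np.sqrt(per_channel(var) + eps) + per_channel(beta))

# 2) SimplifyInference-style rewrite: batch_norm becomes multiply + add.
scale = gamma / np.sqrt(var + eps)
shift = beta - mean * scale
simplified = conv1x1(x, weight) * per_channel(scale) + per_channel(shift)

# 3) FoldScaleAxis-style rewrite: move the per-output-channel multiply into the
#    weight. FoldConstant would then compute folded_weight once, at compile time.
folded_weight = weight * scale[:, None]
folded = conv1x1(x, folded_weight) + per_channel(shift)

print(np.allclose(reference, simplified), np.allclose(reference, folded))
```

The fold is valid because the scale is indexed by the output channel, which is also the leading axis of the weight. Scaling row `o` of the weight therefore scales output channel `o` by the same factor.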
Thank you for the explanation. As far as I understand, this sounds like operator fusion: in the example above, the multiply and add ops of batch norm can be merged into the preceding conv2d op based on a mathematical property.
So I now think of FoldScaleAxis as one way to perform operator fusion. Why is FoldScaleAxis needed in addition to FuseOps? (I understood the example, but I’m confused about how it relates to operator fusion.)
FuseOps can only combine multiple ops into a single kernel; it cannot eliminate the computations themselves. FoldScaleAxis is a rewrite that moves the scale into the weight, so the transformed weight (the original weight multiplied by the scale) can be precomputed at compile time.
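To make the difference concrete, here is a minimal NumPy sketch (again not TVM code). It models a fused kernel as a single Python function and assumes a 1x1 conv2d. The function names and shapes are hypothetical. Both kernels give the same result, but the fused one still multiplies every output element by the scale on each inference. The folded one only pays for scaling the weight, and it does that once, ahead of time.

```python
import numpy as np

def fused_conv_bn(x, weight, scale, shift):
    # Roughly what fusion buys you: conv, multiply, and add run as one kernel,
    # but the multiply by `scale` still happens for every output element.
    out = np.einsum("oi,nihw->nohw", weight, x)
    return out * scale.reshape(1, -1, 1, 1) + shift.reshape(1, -1, 1, 1)

def folded_conv_bn(x, folded_weight, shift):
    # After folding, the scale already lives inside the weight, so the
    # per-element multiply is gone at run time; only the bias add remains.
    out = np.einsum("oi,nihw->nohw", folded_weight, x)
    return out + shift.reshape(1, -1, 1, 1)

rng = np.random.default_rng(1)
n, c_in, c_out, h, w = 8, 16, 32, 28, 28  # hypothetical shapes
x = rng.standard_normal((n, c_in, h, w))
weight = rng.standard_normal((c_out, c_in))
scale = rng.random(c_out) + 0.5
shift = rng.standard_normal(c_out)

# Computed once, before deployment (the role FoldConstant plays).
folded_weight = weight * scale[:, None]

same = np.allclose(fused_conv_bn(x, weight, scale, shift),
                   folded_conv_bn(x, folded_weight, shift))

runtime_scale_mults = n * c_out * h * w     # paid on every inference when only fused
compile_time_scale_mults = folded_weight.size  # paid once when folded

print(same, runtime_scale_mults, compile_time_scale_mults)
```

In practice both passes are used together. FoldScaleAxis and FoldConstant remove the multiply, and FuseOps can then still fuse the remaining bias add into the conv2d kernel.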
Now I understand the concept of FoldScaleAxis. Thank you all for the replies!