I am looking for a set of transformation passes in TVM that helps in fusing/folding the batch_norm ops into the previous or the next convolution-like layers.
My expectation:
before batchnorm fold: conv2d → bias_add → batch_norm
after batchnorm fold: conv2d → bias_add (with the batchnorm parameters folded into the conv2d weights and bias)
So far I have found the SimplifyInference pass in TVM, which is related to simplifying the batch_norm op.
However, from what I understand, this pass separates out the constant terms of the batchnorm operation and folds them, while the terms that involve the data remain in the Relay graph as two basic operations, “multiply” and “add”:
Add( Multiply( data, scale ), shift )
where:
scale is [gamma / sqrt(running_variance + epsilon)]
shift is [beta - (running_mean * gamma) / sqrt(running_variance + epsilon)]
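For reference, a quick numerical check of this decomposition (a standalone sketch with made-up shapes, not part of the original script):

import numpy as np

x = np.random.randn(1, 4, 8, 8).astype("float32")
gamma = np.random.rand(4).astype("float32")
beta = np.random.rand(4).astype("float32")
running_mean = np.random.rand(4).astype("float32")
running_var = np.random.rand(4).astype("float32")
epsilon = 1e-5

# reference batch_norm in inference mode
bn = gamma.reshape(1, 4, 1, 1) * (x - running_mean.reshape(1, 4, 1, 1)) \
    / np.sqrt(running_var.reshape(1, 4, 1, 1) + epsilon) + beta.reshape(1, 4, 1, 1)

# the multiply/add form produced by SimplifyInference
scale = gamma / np.sqrt(running_var + epsilon)
shift = beta - running_mean * scale
simplified = x * scale.reshape(1, 4, 1, 1) + shift.reshape(1, 4, 1, 1)

assert np.allclose(bn, simplified, atol=1e-5)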
I have applied the “FoldScaleAxis” and “FoldConstant” passes, in that order, after the SimplifyInference pass, but they do not achieve what I expect.
Can someone suggest whether TVM has any other set of transformation passes working at the Relay level that can give me the expected batchnorm fuse/fold transformation over the Relay graph?
Hi @masahi ! Thanks for the quick response. I tried the sequence of passes you suggested but still seeing the same effect, i.e., multiply and add ops in place of Batchnorm op.
I hadn’t run bind_params_by_name. I tried it now. I no longer see multiply ops; however, I still see add ops in place of the batch_norm ops. The script I am using is given below.
Thanks @masahi !
import onnx
import tvm
from tvm import relay
from tvm.relay.build_module import bind_params_by_name
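The rest of the script is not shown above; roughly, the part under discussion would look something like this (the model path, input name, and input shape below are placeholders, not the poster's actual values):

# load the ONNX model and import it into Relay
onnx_model = onnx.load("model.onnx")
mod, params = relay.frontend.from_onnx(
    onnx_model, shape={"input": (1, 3, 224, 224)}, freeze_params=False)

# bind the weights as constants so the folding passes can see them
mod["main"] = bind_params_by_name(mod["main"], params)

seq = tvm.transform.Sequential([
    relay.transform.SimplifyInference(),  # batch_norm -> multiply/add
    relay.transform.FoldScaleAxis(),      # fold the multiply into the conv2d weights
    relay.transform.FoldConstant(),       # fold the constant subexpressions
])
with tvm.transform.PassContext(opt_level=3):
    mod = seq(mod)
print(mod)  # the multiply is gone, but an add remains in place of each batch_norm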
Having an add there is expected, since batch norm has a shift by a constant. But the idea is that the new add can be folded into the conv2d bias add. The SimplifyExpr pass finds two such consecutive adds with constant rhs and folds them into one add.
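A minimal standalone illustration of that folding (shapes and values are made up):

import numpy as np
import tvm
from tvm import relay

x = relay.var("x", shape=(1, 8), dtype="float32")
c1 = relay.const(np.full((1, 8), 1.0, dtype="float32"))
c2 = relay.const(np.full((1, 8), 2.0, dtype="float32"))
mod = tvm.IRModule.from_expr(relay.Function([x], relay.add(relay.add(x, c1), c2)))

seq = tvm.transform.Sequential([
    relay.transform.InferType(),
    relay.transform.SimplifyExpr(),  # add(add(x, c1), c2) -> add(x, c1 + c2)
    relay.transform.FoldConstant(),
])
print(seq(mod))  # expect a single add against a folded constant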
I still get the “add” ops as-is; they are not getting folded into the preceding conv2d’s bias.
Also, suppose there is no bias_add after a conv2d, but a batchnorm is present after that conv2d. In this case, after folding the batchnorm, will a new bias_add op eventually be created to hold the shift, or will the shift remain as an add op?
Note that you are using SimplifyInference twice, but you want to replace the second one with SimplifyExpr.
But right, it seems bias_add and add are not folded. It seems relay.transform.CanonicalizeOps() converts bias_add to add, so you want to call it before SimplifyExpr. I tried your script but I didn’t get satisfying output after a brief attempt. Maybe you need to play around with it a bit more. If you believe there is a bug / missing functionality, you are welcome to open a GitHub issue.
Hi @masahi , I am not quite clear regarding this bias_add and add op folding that you mentioned.
Do you mean that after simplification, i.e., the CanonicalizeOps and SimplifyExpr passes, there will finally be a single add op? In other words, will the add op (the shift from batchnorm) and the bias_add effectively be rewritten as one “add” op?
That could be an issue for my use case. As I mentioned in my original question, I need to preserve the conv2d and bias_add ops after the batchnorm fold. It is more of a pattern-matching requirement than an optimization one.
Also, in the cases where a bias_add is not present, I would need the “shift” from batchnorm to be expressed as a bias_add op.
So, what I intend to achieve is as follows:
case 1:
before: conv2d → bias_add → add (shift from batchnorm)
after transform: conv2d → bias_add (bias values are changed by folding the add op into the bias_add op)
case 2:
before: conv2d → add (shift from batchnorm)
after transform: conv2d → bias_add (the add op expressed as a bias_add op)
Can you please confirm if this seems possible with existing TVM transformation passes?
Either case should be possible with TVM transformation passes.
If you need to preserve conv2d and bias_add, an alternative way to achieve this is to perform such a transform in the original model before exporting it to TVM.
For example, I used to use this script to fuse bn into conv.
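That script is not reproduced here; as a rough sketch of what such framework-level folding typically looks like in PyTorch (standard conv/BN algebra, no claim that this matches Lyken17's actual script):

import torch
import torch.nn as nn

def fuse_conv_bn(conv: nn.Conv2d, bn: nn.BatchNorm2d) -> nn.Conv2d:
    """Fold an inference-mode BatchNorm2d into the preceding Conv2d."""
    fused = nn.Conv2d(conv.in_channels, conv.out_channels, conv.kernel_size,
                      stride=conv.stride, padding=conv.padding,
                      dilation=conv.dilation, groups=conv.groups, bias=True)
    with torch.no_grad():
        scale = bn.weight / torch.sqrt(bn.running_var + bn.eps)  # gamma / sqrt(var + eps)
        fused.weight.copy_(conv.weight * scale.reshape(-1, 1, 1, 1))
        conv_bias = conv.bias if conv.bias is not None else torch.zeros_like(bn.running_mean)
        fused.bias.copy_(bn.bias + (conv_bias - bn.running_mean) * scale)
    return fused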
Thanks a lot @Lyken17. This is an interesting alternative! However, I am currently exploring possibilities at the Relay level so that all frontends can be covered by the same set of transformations.
Got it. I verified it by importing a Gemm operator through the ONNX frontend into Relay and saw the bias show up as an “add” op. So yeah, extending the pattern seems helpful here.
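For case 2, one way to prototype such a pattern-based rewrite is Relay's dataflow pattern API. A rough sketch, assuming NCHW layout and a per-channel constant of shape (1, C, 1, 1); the class name and usage are invented for illustration:

from tvm import relay
from tvm.relay.dataflow_pattern import DFPatternCallback, is_constant, is_op, rewrite, wildcard

class AddToBiasAdd(DFPatternCallback):
    """Rewrite add(conv2d_out, constant) into nn.bias_add(conv2d_out, bias)."""
    def __init__(self):
        super().__init__()
        self.data = is_op("nn.conv2d")(wildcard(), wildcard())
        self.bias = is_constant()
        self.pattern = is_op("add")(self.data, self.bias)

    def callback(self, pre, post, node_map):
        bias = node_map[self.bias][0]
        # flatten the (1, C, 1, 1) constant down to (C,) for bias_add on axis 1 (NCHW)
        flat = relay.const(bias.data.numpy().reshape(-1))
        return relay.nn.bias_add(node_map[self.data][0], flat, axis=1)

# usage sketch:
# mod["main"] = relay.Function(mod["main"].params, rewrite(AddToBiasAdd(), mod["main"].body))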
Thanks a lot @masahi for these clarifications and suggestions!
Just checking: after the BN params are folded into the conv-like layers, does TVM keep a copy of these params in some way? More specifically, does it maintain a mapping of which params got folded into which conv-like layer?
If this is not maintained, what would be a good place to maintain such a structure?
I am trying to enable support for an optimization library. The optimizations are focused on restoring the accuracy of quantized models.
These optimizations are originally available in the AIMET tool, which can be applied directly to TensorFlow and PyTorch models.
I am trying to enable the optimizations at the Relay level.
One of the optimization algorithms uses the batchnorm params in its heuristics.
The algorithm is called “High Bias Absorption” and is applied after “Cross Layer Scaling”. Cross Layer Scaling is used to normalize large variations in the weight tensor of a conv layer, so that when per-tensor quantization is applied to the optimized model, the scale and offset represent the whole tensor better.
But due to the mathematics of Cross Layer Scaling, the bias can sometimes become large. So “High Bias Absorption” is applied to reduce the bias and absorb the excess bias into the next layer, and so on. To decide whether a bias is large, the BN params (that were originally present in the model before the BN fold) corresponding to that conv layer are used.
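One possible way to keep such a mapping, sketched under the assumption that the BN parameters are collected from the Relay graph before SimplifyInference/FoldScaleAxis run (the class name and dictionary layout are invented for illustration):

from tvm import relay
from tvm.ir import Op
from tvm.relay.expr_functor import ExprVisitor

class CollectBNParams(ExprVisitor):
    """Record batch_norm params keyed by the conv2d call that feeds them."""
    def __init__(self):
        super().__init__()
        self.bn_of_conv = {}  # conv2d Call -> (gamma, beta, moving_mean, moving_var)

    def visit_call(self, call):
        if isinstance(call.op, Op) and call.op.name == "nn.batch_norm":
            producer = call.args[0]
            # look through an optional bias_add to reach the conv2d
            if isinstance(producer, relay.Call) and isinstance(producer.op, Op) \
                    and producer.op.name == "nn.bias_add":
                producer = producer.args[0]
            if isinstance(producer, relay.Call) and isinstance(producer.op, Op) \
                    and producer.op.name == "nn.conv2d":
                self.bn_of_conv[producer] = tuple(call.args[1:5])  # gamma, beta, mean, var
        super().visit_call(call)

# usage sketch: collector = CollectBNParams(); collector.visit(mod["main"]); collector.bn_of_conv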
After FoldScaleAxis, the multiply op disappears from the mod. So I’m not sure what the concrete structure of the mapping should be.
If your goal is quantization, how about disabling SimplifyInference and FoldScaleAxis when you run quantization passes? I think there is no need to run AIMET-inspired passes on an optimized graph.
I created a Relay graph like conv2d + bias_add + batch_norm and ran the above passes on it; the result seems to be what you want, @aakaverm-quic.
Firstly, bias_add → add
Secondly, batch_norm → add
Consequently, the two consecutive add ops are fused by SimplifyExpr(), and the Relay graph is transformed into the form conv2d + add.
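A standalone sketch of that experiment (the shapes, channel count, and exact pass list are my own choices, not necessarily the poster's):

import numpy as np
import tvm
from tvm import relay

data = relay.var("data", shape=(1, 3, 32, 32), dtype="float32")
weight = relay.const(np.random.randn(8, 3, 3, 3).astype("float32"))
bias = relay.const(np.random.randn(8).astype("float32"))
gamma = relay.const(np.ones(8, dtype="float32"))
beta = relay.const(np.zeros(8, dtype="float32"))
mean = relay.const(np.zeros(8, dtype="float32"))
var = relay.const(np.ones(8, dtype="float32"))

out = relay.nn.conv2d(data, weight, kernel_size=(3, 3), channels=8, padding=(1, 1))
out = relay.nn.bias_add(out, bias)
out = relay.nn.batch_norm(out, gamma, beta, mean, var)[0]
mod = tvm.IRModule.from_expr(relay.Function([data], out))

seq = tvm.transform.Sequential([
    relay.transform.InferType(),
    relay.transform.SimplifyInference(),  # batch_norm -> multiply/add
    relay.transform.CanonicalizeOps(),    # bias_add -> add
    relay.transform.FoldScaleAxis(),      # multiply folded into the conv2d weights
    relay.transform.SimplifyExpr(),       # consecutive adds merged
    relay.transform.FoldConstant(),
])
with tvm.transform.PassContext(opt_level=3):
    mod = seq(mod)
print(mod)  # expect conv2d followed by a single add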