I am looking for a set of transformation passes in TVM that helps in fusing/folding the batch_norm ops into the previous or the next convolution-like layers.
My expectation:
before batchnorm fold: conv2d → bias_add → batch_norm
after batchnorm fold: conv2d → bias_add (with the batchnorm parameters folded into the conv2d weights and bias)
So far I have found the SimplifyInference pass in TVM, which is related to simplifying the batch_norm op.
However, from what I understand, this pass separates out the constant terms of the batchnorm operation and folds them, while the terms that involve the data remain in the Relay graph as two basic operations, “multiply” and “add”:
Add( Multiply( data, scale ), shift )
where:
scale is [gamma / sqrt(running_variance + epsilon)]
shift is [beta - (running_mean * gamma) / sqrt(running_variance + epsilon)]
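For reference, a quick numerical check of this decomposition (a standalone sketch with made-up shapes, not part of the original script):

import numpy as np

x = np.random.randn(1, 4, 8, 8).astype("float32")
gamma = np.random.rand(4).astype("float32")
beta = np.random.rand(4).astype("float32")
running_mean = np.random.rand(4).astype("float32")
running_var = np.random.rand(4).astype("float32")
epsilon = 1e-5

# reference batch_norm in inference mode
bn = gamma.reshape(1, 4, 1, 1) * (x - running_mean.reshape(1, 4, 1, 1)) \
    / np.sqrt(running_var.reshape(1, 4, 1, 1) + epsilon) + beta.reshape(1, 4, 1, 1)

# the multiply/add form produced by SimplifyInference
scale = gamma / np.sqrt(running_var + epsilon)
shift = beta - running_mean * scale
simplified = x * scale.reshape(1, 4, 1, 1) + shift.reshape(1, 4, 1, 1)

assert np.allclose(bn, simplified, atol=1e-5)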
I have applied the “FoldScaleAxis” and “FoldConstant” passes, in that order, after the SimplifyInference pass, but they do not achieve what I expect.
Can someone suggest whether TVM has any other set of transformation passes working at the Relay level that can give me the expected batchnorm fuse/fold transformation over the Relay graph?
Hi @masahi ! Thanks for the quick response. I tried the sequence of passes you suggested but still seeing the same effect, i.e., multiply and add ops in place of Batchnorm op.
I hadn’t run bind_params_by_name. I tried it now. I no longer see multiply ops; however, I still see add ops in place of the batch_norm ops. The script I am using is given below.
Thanks @masahi !
import onnx
import tvm
from tvm import relay
from tvm.relay.build_module import bind_params_by_name
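The rest of the script is not shown above; roughly, the part under discussion would look something like this (the model path, input name, and input shape below are placeholders, not the poster's actual values):

# load the ONNX model and import it into Relay
onnx_model = onnx.load("model.onnx")
mod, params = relay.frontend.from_onnx(
    onnx_model, shape={"input": (1, 3, 224, 224)}, freeze_params=False)

# bind the weights as constants so the folding passes can see them
mod["main"] = bind_params_by_name(mod["main"], params)

seq = tvm.transform.Sequential([
    relay.transform.SimplifyInference(),  # batch_norm -> multiply/add
    relay.transform.FoldScaleAxis(),      # fold the multiply into the conv2d weights
    relay.transform.FoldConstant(),       # fold the constant subexpressions
])
with tvm.transform.PassContext(opt_level=3):
    mod = seq(mod)
print(mod)  # the multiply is gone, but an add remains in place of each batch_norm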
Having an add there is expected, since batch norm has a shift by a constant. But the idea is that the new add can be folded into the conv2d bias add. The SimplifyExpr pass finds two such consecutive adds with constant rhs and folds them into one add.
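A minimal standalone illustration of that folding (shapes and values are made up):

import numpy as np
import tvm
from tvm import relay

x = relay.var("x", shape=(1, 8), dtype="float32")
c1 = relay.const(np.full((1, 8), 1.0, dtype="float32"))
c2 = relay.const(np.full((1, 8), 2.0, dtype="float32"))
mod = tvm.IRModule.from_expr(relay.Function([x], relay.add(relay.add(x, c1), c2)))

seq = tvm.transform.Sequential([
    relay.transform.InferType(),
    relay.transform.SimplifyExpr(),  # add(add(x, c1), c2) -> add(x, c1 + c2)
    relay.transform.FoldConstant(),
])
print(seq(mod))  # expect a single add against a folded constant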
I still get the “add” ops as-is; they are not getting folded into the preceding conv2d’s bias.
Also, suppose there is no bias_add after a conv2d, but a batchnorm is present after that conv2d. In this case, after folding the batchnorm, will a new bias_add op eventually be created to hold the shift, or will the shift remain as an add op?
Note that you are using SimplifyInference twice, but you want to replace the second one with SimplifyExpr.
But right, it seems bias_add and add are not folded. It seems relay.transform.CanonicalizeOps() converts bias_add to add, so you want to call it before SimplifyExpr. I tried your script but I didn’t get satisfying output after a brief attempt. Maybe you need to play around with it a bit more. If you believe there is a bug / missing functionality, you are welcome to open a GitHub issue.
Hi @masahi , I am not quite clear regarding this bias_add and add op folding that you mentioned.
Do you mean that after simplification, i.e., the CanonicalizeOps and SimplifyExpr passes, there will finally be a single add op? In other words, will the add op (the shift from batchnorm) and the bias_add effectively be rewritten as one “add” op?
That could be an issue for my use case. As I mentioned in my original question, I need to preserve the conv2d and bias_add ops after the batchnorm fold. It is more of a pattern-matching requirement than an optimization one.
Also, in the cases where a bias_add is not present, I would need the “shift” from batchnorm to be expressed as a bias_add op.
So, what I intend to achieve is as follows:
case 1:
before: conv2d → bias_add → add (shift from batchnorm)
after transform: conv2d → bias_add (bias values are changed by folding the add op into the bias_add op)
case 2:
before: conv2d → add (shift from batchnorm)
after transform: conv2d → bias_add (the add op expressed as a bias_add op)
Can you please confirm if this seems possible with existing TVM transformation passes?
Either case should be possible with TVM transformation passes.
If you need to preserve conv2d and bias_add, an alternative way to achieve this is to perform such a transform in the original model before exporting it to TVM.
For example, I used to use this script to fuse bn into conv.
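That script is not reproduced here; as a rough sketch of what such framework-level folding typically looks like in PyTorch (standard conv/BN algebra, no claim that this matches Lyken17's actual script):

import torch
import torch.nn as nn

def fuse_conv_bn(conv: nn.Conv2d, bn: nn.BatchNorm2d) -> nn.Conv2d:
    """Fold an inference-mode BatchNorm2d into the preceding Conv2d."""
    fused = nn.Conv2d(conv.in_channels, conv.out_channels, conv.kernel_size,
                      stride=conv.stride, padding=conv.padding,
                      dilation=conv.dilation, groups=conv.groups, bias=True)
    with torch.no_grad():
        scale = bn.weight / torch.sqrt(bn.running_var + bn.eps)  # gamma / sqrt(var + eps)
        fused.weight.copy_(conv.weight * scale.reshape(-1, 1, 1, 1))
        conv_bias = conv.bias if conv.bias is not None else torch.zeros_like(bn.running_mean)
        fused.bias.copy_(bn.bias + (conv_bias - bn.running_mean) * scale)
    return fused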
Thanks a lot @Lyken17. This is an interesting alternative! However, I am currently exploring possibilities at the Relay level so that all frontends can be covered by the same set of transformations.
Got it. I verified it by importing a Gemm operator through the ONNX frontend into Relay and saw the bias show up as an “add” op. So yeah, extending the pattern seems helpful here.
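For case 2, one way to prototype such a pattern-based rewrite is Relay's dataflow pattern API. A rough sketch, assuming NCHW layout and a per-channel constant of shape (1, C, 1, 1); the class name and usage are invented for illustration:

from tvm import relay
from tvm.relay.dataflow_pattern import DFPatternCallback, is_constant, is_op, rewrite, wildcard

class AddToBiasAdd(DFPatternCallback):
    """Rewrite add(conv2d_out, constant) into nn.bias_add(conv2d_out, bias)."""
    def __init__(self):
        super().__init__()
        self.data = is_op("nn.conv2d")(wildcard(), wildcard())
        self.bias = is_constant()
        self.pattern = is_op("add")(self.data, self.bias)

    def callback(self, pre, post, node_map):
        bias = node_map[self.bias][0]
        # flatten the (1, C, 1, 1) constant down to (C,) for bias_add on axis 1 (NCHW)
        flat = relay.const(bias.data.numpy().reshape(-1))
        return relay.nn.bias_add(node_map[self.data][0], flat, axis=1)

# usage sketch:
# mod["main"] = relay.Function(mod["main"].params, rewrite(AddToBiasAdd(), mod["main"].body))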
Thanks a lot @masahi for these clarifications and suggestions!
Just checking: after the BN params are folded into the conv-like layers, does TVM keep a copy of these params in some way? More specifically, does it maintain a mapping of which params got folded into which conv-like layer?
If this is not maintained, what would be a good place to maintain such a structure?
I am trying to enable support for an optimization library. The optimizations are focused on restoring the accuracy of quantized models.
These optimizations are originally available in the AIMET tool, which can be applied directly to TensorFlow and PyTorch models.
I am trying to enable the optimizations at the Relay level.
One of the optimization algorithms uses the batchnorm params in its heuristics.
The algorithm is called “High Bias Absorption” and is applied after “Cross Layer Scaling”. Cross Layer Scaling is used to normalize large variations in the weight tensor of a conv layer, so that when per-tensor quantization is applied to the optimized model, the scale and offset represent the whole tensor better.
But due to the mathematics of Cross Layer Scaling, the bias can sometimes become large. So “High Bias Absorption” is applied to reduce the bias and absorb the excess bias into the next layer, and so on. To decide whether a bias is large, the BN params (that were originally present in the model before the BN fold) corresponding to that conv layer are used.
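One possible way to keep such a mapping, sketched under the assumption that the BN parameters are collected from the Relay graph before SimplifyInference/FoldScaleAxis run (the class name and dictionary layout are invented for illustration):

from tvm import relay
from tvm.ir import Op
from tvm.relay.expr_functor import ExprVisitor

class CollectBNParams(ExprVisitor):
    """Record batch_norm params keyed by the conv2d call that feeds them."""
    def __init__(self):
        super().__init__()
        self.bn_of_conv = {}  # conv2d Call -> (gamma, beta, moving_mean, moving_var)

    def visit_call(self, call):
        if isinstance(call.op, Op) and call.op.name == "nn.batch_norm":
            producer = call.args[0]
            # look through an optional bias_add to reach the conv2d
            if isinstance(producer, relay.Call) and isinstance(producer.op, Op) \
                    and producer.op.name == "nn.bias_add":
                producer = producer.args[0]
            if isinstance(producer, relay.Call) and isinstance(producer.op, Op) \
                    and producer.op.name == "nn.conv2d":
                self.bn_of_conv[producer] = tuple(call.args[1:5])  # gamma, beta, mean, var
        super().visit_call(call)

# usage sketch: collector = CollectBNParams(); collector.visit(mod["main"]); collector.bn_of_conv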
After FoldScaleAxis, the multiply op disappears from the mod. So I’m not sure what the concrete structure of the mapping should be.
If your goal is quantization, how about disabling SimplifyInference and FoldScaleAxis when you run quantization passes? I think there is no need to run AIMET-inspired passes on an optimized graph.
I created a Relay graph like conv2d + bias_add + batch_norm and ran the above passes on it; the result seems to be what you want, @aakaverm-quic.
Firstly, bias_add → add
Secondly, batch_norm → add
Consequently, the two consecutive add ops are fused by SimplifyExpr(), and the Relay graph is transformed into the form conv2d + add.
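A standalone sketch of that experiment (the shapes, channel count, and exact pass list are my own choices, not necessarily the poster's):

import numpy as np
import tvm
from tvm import relay

data = relay.var("data", shape=(1, 3, 32, 32), dtype="float32")
weight = relay.const(np.random.randn(8, 3, 3, 3).astype("float32"))
bias = relay.const(np.random.randn(8).astype("float32"))
gamma = relay.const(np.ones(8, dtype="float32"))
beta = relay.const(np.zeros(8, dtype="float32"))
mean = relay.const(np.zeros(8, dtype="float32"))
var = relay.const(np.ones(8, dtype="float32"))

out = relay.nn.conv2d(data, weight, kernel_size=(3, 3), channels=8, padding=(1, 1))
out = relay.nn.bias_add(out, bias)
out = relay.nn.batch_norm(out, gamma, beta, mean, var)[0]
mod = tvm.IRModule.from_expr(relay.Function([data], out))

seq = tvm.transform.Sequential([
    relay.transform.InferType(),
    relay.transform.SimplifyInference(),  # batch_norm -> multiply/add
    relay.transform.CanonicalizeOps(),    # bias_add -> add
    relay.transform.FoldScaleAxis(),      # multiply folded into the conv2d weights
    relay.transform.SimplifyExpr(),       # consecutive adds merged
    relay.transform.FoldConstant(),
])
with tvm.transform.PassContext(opt_level=3):
    mod = seq(mod)
print(mod)  # expect conv2d followed by a single add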