Disable initialization in te.compute


Is there a way to disable the initialization of my tensor C in te.compute, or a way to initialize C myself in my program before the te.compute calculation runs?

Out = te.compute(
    (batch_size, out_channels, out_h, out_w),
    lambda batch, out_channels, yy, xx: te.sum(
        A[batch, axe_in_channels, yy * stride_h + axe_kernel_h * dilation_h, xx * stride_w + axe_kernel_w * dilation_w]
        * W[out_channels, axe_in_channels, axe_kernel_h, axe_kernel_w],
        axis=[axe_in_channels, axe_kernel_h, axe_kernel_w],
    ),
)

As you can see below, when I do print(tvm.lower(…)) there is an initialization. I would like to do a loop permutation, and this initialization prevents me from doing that.

Cannot find config for target=llvm -keys=cpu -link-params=0 -mcpu=core-avx2, workload=('conv2d_ttile_', 1, 56, 56, 128, 128, 3, 3, 1, 1, 1, 1, 1, 1). A fallback configuration is used, which may bring great performance regression.
primfn(A_1: handle, W_1: handle, compute_1: handle) -> ()
  attr = {"global_symbol": "main", "tir.noalias": True}
  buffers = {compute: Buffer(compute_2: Pointer(float32), float32, [1, 128, 56, 56], []),
             W: Buffer(W_2: Pointer(float32), float32, [128, 128, 3, 3], []),
             A: Buffer(A_2: Pointer(float32), float32, [1, 128, 58, 58], [])}
  buffer_map = {A_1: A, W_1: W, compute_1: compute} {
  for (out_channels: int32, 0, 128) {
    for (yy: int32, 0, 56) {
      for (xx: int32, 0, 56) {

        compute_2[(((out_channels*3136) + (yy*56)) + xx)] = 0f32

        for (axe_in_channels: int32, 0, 128) {
          for (axe_kernel_h: int32, 0, 3) {
            for (axe_kernel_w: int32, 0, 3) {
              compute_2[(((out_channels*3136) + (yy*56)) + xx)] = ((float32*)compute_2[(((out_channels*3136) + (yy*56)) + xx)] + ((float32*)A_2[(((((axe_in_channels*3364) + (yy*58)) + (axe_kernel_h*58)) + xx) + axe_kernel_w)]*(float32*)W_2[((((out_channels*1152) + (axe_in_channels*9)) + (axe_kernel_h*3)) + axe_kernel_w)]))

Thank you

@cali What is your goal in dropping the init part of the reduction? For a compute operation, you cannot do that. You can check MakeReduction in compute_op.cc.

Thank you for your answer. The goal is to avoid doing an initialization if I give as argument a tensor already initialized to zero.
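To make the goal concrete, here is a minimal pure-Python sketch (not TVM code) of the two loop nests being compared. It assumes a toy 1-D convolution; the function names and the data are made up for illustration. The first function mirrors the lowered IR above, with the init inside the output loop. The second assumes the caller passes a buffer that is already zero, so only the update remains and the reduction loop can be moved outermost.

```python
# Toy 1-D convolution in pure Python, standing in for the lowered loop nest.
# Function names and data are hypothetical; this is not TVM code.

def conv_with_init(a, w):
    """Reduction with the init inside the output loop, as in the lowered IR."""
    n_out = len(a) - len(w) + 1
    out = [None] * n_out             # stands in for an uninitialized buffer
    for x in range(n_out):
        out[x] = 0.0                 # init emitted for the reduction
        for k in range(len(w)):
            out[x] += a[x + k] * w[k]
    return out


def conv_prezeroed_permuted(a, w, out):
    """Update only: `out` must already be zero; reduction loop is outermost."""
    for k in range(len(w)):          # reduction axis moved outward
        for x in range(len(out)):
            out[x] += a[x + k] * w[k]
    return out


a = [1.0, 2.0, 3.0, 4.0, 5.0]
w = [1.0, 0.0, -1.0]
ref = conv_with_init(a, w)
out = conv_prezeroed_permuted(a, w, [0.0] * len(ref))
print(ref == out)  # True
```

Both versions give the same result only because the buffer handed to the second one was zeroed beforehand.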

@cali I am not sure if there is a better way to achieve it. Maybe you can add a bool member drop_init to CommReducerNode. When it is true, the init can safely be dropped in the MakeReduction function.
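The suggestion above can be sketched as a toy model in plain Python. This is not TVM's implementation: ToyReducer, make_reduction and emit are hypothetical names, drop_init is the flag proposed in this thread (it is an assumption, not an existing TVM member), and the output is just pseudo-IR text. It only illustrates the idea that the reduction builder returns an init and an update statement, and that the init is skipped when the flag is set.

```python
from dataclasses import dataclass


@dataclass
class ToyReducer:
    """Hypothetical stand-in for a commutative reducer (not a TVM class)."""
    identity: str = "0f32"
    drop_init: bool = False  # proposed flag from this thread, assumed here


def make_reduction(reducer, out_ref, update_expr):
    """Return (init, update) statements; init is None when it is dropped."""
    init = None if reducer.drop_init else f"{out_ref} = {reducer.identity}"
    update = f"{out_ref} = {out_ref} + {update_expr}"
    return init, update


def emit(reducer, out_axes, reduce_axes, out_ref, update_expr):
    """Print-style lowering: output loops, optional init, reduce loops, update."""
    init, update = make_reduction(reducer, out_ref, update_expr)
    lines, depth = [], 0
    for ax in out_axes:
        lines.append("  " * depth + f"for {ax}:")
        depth += 1
    if init is not None:
        lines.append("  " * depth + init)
    for ax in reduce_axes:
        lines.append("  " * depth + f"for {ax}:")
        depth += 1
    lines.append("  " * depth + update)
    return "\n".join(lines)


args = (["yy", "xx"], ["k"], "C[yy, xx]", "A[yy, k] * B[k, xx]")
with_init = emit(ToyReducer(), *args)
without_init = emit(ToyReducer(drop_init=True), *args)
print(without_init)
```

With the flag set, the emitted nest contains only loops and the update statement, so the caller becomes responsible for the initial contents of C.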

@cali Are you working on adding this drop_init option to CommReducerNode? This is something I’m interested in as well, so I might add this feature myself if you’re not working on it.

This isn’t quite what’s requested, but I discovered by accident that if you set debug_keep_trivial_loop=True for ScheduleOps, it will remove the init part of the reduction ops. You can see this is mentioned in the documentation (include/tvm/te/schedule_pass.h):

 * \param debug_keep_trivial_loop Whether keep trivial loops with extent of 1 during lowering.
 *                                This is a debug feature for dataflow/axis analysis.
 *                                Note: If this is true, The lowered IR may be incorrect,
 *                                because we will also delete the init part of reduction

While I still don’t entirely understand why this works in this way, you can follow it through the code to work out how to implement such a drop_init functionality. You can also just use it directly for prototyping.
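The warning in that comment ("the lowered IR may be incorrect") is easy to reproduce outside TVM. The following is a small pure-Python sketch, with a made-up reduce_sum helper and made-up data, of what happens when the init is dropped and the output buffer holds stale values instead of zeros.

```python
# Hypothetical helper, not TVM code: row-wise sum into a caller-provided buffer.
def reduce_sum(rows, out, drop_init=False):
    for i, row in enumerate(rows):
        if not drop_init:
            out[i] = 0.0      # init part of the reduction
        for v in row:
            out[i] += v       # update part of the reduction
    return out


rows = [[1.0, 2.0], [3.0, 4.0]]
correct = reduce_sum(rows, [7.0, 7.0])                  # init overwrites stale data
stale = reduce_sum(rows, [7.0, 7.0], drop_init=True)    # stale data leaks into the sum
zeroed = reduce_sum(rows, [0.0, 0.0], drop_init=True)   # safe: buffer was pre-zeroed
print(correct, stale, zeroed)
```

So dropping the init is only safe when the caller guarantees a zeroed output buffer, which matches the use case described earlier in the thread.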

Hi @matt-arm, thanks for the reply. I actually found this flag right before I posted the question above, and I tried enabling it by passing True to ScheduleOps in python/tvm/driver/build_module.py. I also found that it is what disables the initialization in the MakeComputeStmt function in src/te/operation/compute_op.cc. That is why I asked @cali whether they had started implementing this: I think I might be able to implement it if they have not started or don’t intend to work on it.


Hi @sanirudh,

I’m sorry, I didn’t implement this; I’m working on something else.

Thank you