# Loop partitioning, padding and tensorization

Hi all,

In my effort to accelerate AArch64 through tensorization, I ran into an issue.

Basically, I am padding my input tensor so that `tensorize` works (rows need to be a multiple of 4 and columns a multiple of 16).

However, bound inference removes the padding (since it is never used) and, when I tile the computation, `tir.likely` statements appear. This makes `tensorize` fail with:

```
TVMError: Tensorize failed, split condition tir.likely(((...)) relies on var defined inside tensorize scope
```
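For concreteness, here is a minimal sketch that reproduces the behaviour (the shapes and the dummy elementwise computation are made up, not my actual kernel):

```python
import tvm
from tvm import te

M, N = 18, 36                    # sizes that are not multiples of 4/16
M_pad, N_pad = 20, 48            # rounded up to multiples of 4 and 16

A = te.placeholder((M, N), name="A")
A_pad = te.compute(
    (M_pad, N_pad),
    lambda i, j: tvm.tir.if_then_else(
        tvm.tir.all(i < M, j < N), A[i, j], tvm.tir.const(0, A.dtype)
    ),
    name="A_pad",
)
C_pad = te.compute((M_pad, N_pad), lambda i, j: A_pad[i, j] * 2, name="C_pad")
C = te.compute((M, N), lambda i, j: C_pad[i, j], name="C")  # unpad

s = te.create_schedule(C.op)
# Bound inference sees that only C_pad[0:M, 0:N] is consumed, drops the
# padded rows/cols, and the tile below then carries tir.likely guards.
io, jo, ii, ji = s[C_pad].tile(C_pad.op.axis[0], C_pad.op.axis[1], 4, 16)
print(tvm.lower(s, [A, C], simple_mode=True))
```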

One solution is to add a (sufficiently complex) multiplication by zero involving a padding element, to trick bound inference into keeping the padded region (see for example here).
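Continuing the sketch above, the trick looks roughly like this: the unpad stage touches the last padded element, multiplied by zero, so bound inference keeps the full padded extent:

```python
# Reference a padding element (times zero) so bound inference cannot
# shrink C_pad back to the unpadded region.
C = te.compute(
    (M, N),
    lambda i, j: C_pad[i, j]
    + C_pad[M_pad - 1, N_pad - 1] * tvm.tir.const(0, C_pad.dtype),
    name="C",
)
```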

However, this is very hacky and is not guaranteed to last (as bound inference gets smarter, it might detect that the added term is zero).

The question is: should we try to come up with a “good” solution for this?

One idea might be to let tensorize accept `@tir.likely` statements and replace them with a “variable size” tensorization which will be provided by the developer.

For instance, we might add a `_intrin_func_variable` private function that gets called only when a variable tensorization (i.e., a tensorization over `@tir.likely`) is needed.
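To make the proposal concrete, here is a sketch of where that hook could live in an intrinsic declaration. Everything around `_intrin_func_variable` is hypothetical (the `extents` argument and the `gemm_4x16_update` extern are illustrations, not existing TVM interfaces); the fixed-size part is the usual `decl_tensor_intrin` pattern:

```python
import tvm
from tvm import te

def intrin_gemm_4x16(K):
    a = te.placeholder((4, K), name="a")
    b = te.placeholder((K, 16), name="b")
    k = te.reduce_axis((0, K), name="k")
    c = te.compute((4, 16), lambda i, j: te.sum(a[i, k] * b[k, j], axis=k))

    def _intrin_func(ins, outs):
        # Fixed-size body: full 4x16 tiles (this part works today).
        ib = tvm.tir.ir_builder.create()
        ib.emit(tvm.tir.call_extern("int32", "gemm_4x16_update",
                                    outs[0].access_ptr("w"),
                                    ins[0].access_ptr("r"),
                                    ins[1].access_ptr("r"), K))
        return ib.get()

    def _intrin_func_variable(ins, outs, extents):
        # Hypothetical: called by tensorize for tiles guarded by tir.likely,
        # with `extents` carrying the actual (rows, cols) of the border tile.
        ...

    Ab = tvm.tir.decl_buffer(a.shape, a.dtype, name="Ab",
                             offset_factor=1, strides=[te.var("sa"), 1])
    Bb = tvm.tir.decl_buffer(b.shape, b.dtype, name="Bb",
                             offset_factor=1, strides=[te.var("sb"), 1])
    Cb = tvm.tir.decl_buffer(c.shape, c.dtype, name="Cb",
                             offset_factor=1, strides=[te.var("sc"), 1])
    return te.decl_tensor_intrin(c.op, _intrin_func, binds={a: Ab, b: Bb, c: Cb})
```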

I have also read through this post, but it doesn't seem to arrive at a concrete solution.

Any ideas?

Hi Animesh, the problem is that I need the padding added in the middle of TIR, on my (transformed) data tensor.

I.e., something like:

```python
A1 = im2col(A)                       # transformed data tensor
k = te.reduce_axis((0, K), name="k")
C_padded = te.compute((M, N), lambda i, j: te.sum(A1[i, k] * B[k, j], axis=k))
```

Then I tile on `C` and tensorize the inner tile (which is where the problem started). Note that I cannot fuse the requantization into the main computation because of the `unpad`.
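For context, the downstream stages look roughly like this (`requantize` is a hypothetical stand-in for the actual requantization expression):

```python
C = te.compute((M, N), lambda i, j: C_padded[i, j], name="C")           # unpad
C_q = te.compute((M, N), lambda i, j: requantize(C[i, j]), name="C_q")  # requantize
```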

Also, it would be nice not to pad `A` at all, but instead to work on a solution that automatically detects the borders and invokes a different kind of tensorization (if provided) or falls back to scalar computation for the borders (if multiple tensorizations are not provided).

This way I don't need unpadding, and the computation could become:

```python
A1 = im2col(A)
k = te.reduce_axis((0, K), name="k")
C = te.compute((M, N), lambda i, j: te.sum(A1[i, k] * B[k, j], axis=k))
C_q = requantize(C)  # fusable now: tensorization handles the borders automatically
```
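On the automatic border handling: the existing LoopPartition pass already points in this direction, since it splits a likely-guarded loop into a guard-free main loop plus border loops; the missing piece would be letting tensorize match the main loop and a variable-size (or scalar) fallback match the borders. For reference, it can be enabled like this (reusing `s`, `A`, `C` from the sketch above):

```python
import tvm

# Partition loops with tir.likely conditions into full and partial parts.
with tvm.transform.PassContext(
    config={"tir.LoopPartition": {"partition_const_loop": True}}
):
    print(tvm.lower(s, [A, C], simple_mode=True))
```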

What do you think?


Bumping this up: I am currently facing the same issue… Have you found any solution?