[Tutorial] stride value in generated code

Hi, I would like to understand the code generated by this simple column reduce add operation (example from Introduction to TOPI — tvm 0.8.dev0 documentation):

B = te.compute((n,), lambda i: te.sum(A[i, k], axis=k), name="B")

The generated code contains “stride” values that I do not fully understand. The strides are declared in the buffers as [stride: int32, stride_1: int32], and the implementation is as follows:

for (i: int32, 0, n) {
  B[i] = 0f32
  for (k: int32, 0, m) {
    B[i] = ((float32*)B[i] + (float32*)A_2[((i*stride) + (k*stride_1))])
  }
}

So if I want to confirm whether this implementation is correct, “stride” must be equal to m and “stride_1” must be 1 to achieve a correct column reduce add. But I don’t see where stride/stride_1 is initialized. Did I miss anything? Or is there a way to inspect the stride values? Thanks
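To make the expectation above concrete, here is a minimal NumPy sketch (not TVM code) that mirrors the generated loop nest on a flat buffer. The helper name row_sums_strided and the sizes n = 3, m = 4 are made up for illustration; the only assumption is that A is stored row-major and compact, in which case the element strides come out as (m, 1).

```python
import numpy as np

def row_sums_strided(flat, n, m, stride, stride_1):
    """Mirror the generated loop nest: B[i] += A_2[i*stride + k*stride_1]."""
    out = np.zeros(n, dtype=flat.dtype)
    for i in range(n):
        for k in range(m):
            out[i] += flat[i * stride + k * stride_1]
    return out

n, m = 3, 4  # hypothetical sizes, chosen only for this example
A = np.arange(n * m, dtype=np.float32).reshape(n, m)  # C-contiguous (row-major)

# NumPy reports strides in bytes; divide by the item size to get element strides.
elem_strides = tuple(s // A.itemsize for s in A.strides)

# Run the strided loop nest over the flat storage of A.
B = row_sums_strided(A.ravel(), n, m, *elem_strides)
```

With a compact row-major array the element strides are (m, 1), and the loop nest then reproduces A.sum(axis=1).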

Hello, I would also like to know about the stride. Have you found anything, or can someone help us?

The stride values are normally created when the buffer is not declared in the code using tvm.tir.decl_buffer, because the type of buffer created by default is auto_broadcast.

You could create a buffer and bind it to the A tensor without specifying a buffer_type, which then defaults to kDefault. This enables the compact representation, so when you lower and print the IR, you get it without strides. You can do something like the code shown below:

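import tvm
from tvm import te
n, m = te.var("n"), te.var("m")  # symbolic shape, as in the TOPI tutorial
A = te.placeholder((n, m), name="A")
k = te.reduce_axis((0, m), "k")
B = te.compute((n,), lambda i: te.sum(A[i, k], axis=k), name="B")
Ab = tvm.tir.decl_buffer(A.shape, name="Ab", dtype=A.dtype)  # buffer_type left at its default
Bb = tvm.tir.decl_buffer(B.shape, name="Bb", dtype=B.dtype)
s = te.create_schedule(B.op)
print(tvm.lower(s, [A], simple_mode=True, binds={A: Ab, B: Bb}))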

This prints the below IR:

primfn(Ab_1: handle) -> ()
  attr = {"from_legacy_te_schedule": True, "global_symbol": "main", "tir.noalias": True}
  buffers = {Ab: Buffer(Ab_2: Pointer(float32), float32, [n: int32, m: int32], [])}
  buffer_map = {Ab_1: Ab} {
  allocate(Bb: Pointer(global float32), float32, [n]), storage_scope = global;
  for (i: int32, 0, n) {
    Bb[i] = 0f32
    for (k: int32, 0, m) {
      Bb[i] = ((float32*)Bb[i] + (float32*)Ab_2[((i*m) + k)])
    }
  }
}
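To illustrate the difference between the compact index (i*m) + k in this IR and the strided form (i*stride) + (k*stride_1) from the question, here is a small NumPy sketch (again not TVM code). It uses a hypothetical non-contiguous view, taking every second column of a wider array, purely as an example of a layout where the two index formulas disagree.

```python
import numpy as np

n, m = 3, 4  # hypothetical sizes, chosen only for this example
base = np.arange(n * 2 * m, dtype=np.float32).reshape(n, 2 * m)
view = base[:, ::2]   # shape (n, m), but not contiguous in memory
flat = base.ravel()   # the underlying flat storage

# Element strides of the view: (2*m, 2) rather than the compact (m, 1).
stride, stride_1 = (s // view.itemsize for s in view.strides)

# Strided indexing, as in the IR from the original question.
strided = np.array(
    [sum(flat[i * stride + k * stride_1] for k in range(m)) for i in range(n)]
)

# Compact indexing, as in the IR printed above; only valid for compact data.
compact = np.array(
    [sum(flat[i * m + k] for k in range(m)) for i in range(n)]
)
```

In this sketch only the strided form matches view.sum(axis=1). The compact form reads the wrong elements because it assumes each row starts m elements after the previous one.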

I know I probably did not answer the original question of where the stride values are assigned. I’m not sure about that yet, and I’ll update this answer when I find out.

Incidentally, I have one related question. When we do not bind to a buffer with decl_buffer, the buffer type created is kAutoBroadcast, but when we do bind with decl_buffer and do not specify a buffer_type, the buffer type created is kDefault, which seems a bit confusing to me. Maybe someone else can clarify this design decision.

Thanks, Anirudh
