I’d suggest enhancing For to support this pattern:
__global__
void saxpy(int n, float a, float *x, float *y)
{
    for (int i = blockIdx.x * blockDim.x + threadIdx.x;
         i < n;
         i += blockDim.x * gridDim.x)
    {
        y[i] = a * x[i] + y[i];
    }
}
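For context, the appeal of this pattern is that the launch configuration is decoupled from the problem size: each thread strides through the whole array, so any grid size covers all n elements. A minimal launch sketch (the block and grid sizes below, and the device pointers d_x/d_y, are arbitrary choices for illustration):

int blockSize = 256;
int numBlocks = 32;   // can be tuned independently of n
saxpy<<<numBlocks, blockSize>>>(n, 2.0f, d_x, d_y);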
@tqchen Do you think it is a good idea to support this kind of for expression in HalideIR? It would be helpful when writing kernels with the low-level API.
Is this really the case? codegen_c is only one of the many “backends” of TVM.
If I remember correctly, For loops are normalized in tvm.schedule.normalize(), and (AFAIK) this is a simplification that makes InferBound easier.
Most likely we can use the same normalized loop to represent the same program, and a low-level program optimizer would detect such a loop and rewrite it to the strided version:
for (int i = 0; i < extent; i++) {
    y[i * stride + min] = a * x[i * stride + min] + y[i * stride + min];
}
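To make the connection concrete, here is a sketch (plain CUDA, not TVM IR) of the normalized form of the same saxpy kernel: choosing min and stride per thread and deriving the extent from n recovers the behaviour of the grid-stride loop above. The kernel name and the extent formula are mine, for illustration only.

__global__
void saxpy_normalized(int n, float a, float *x, float *y)
{
    // Per-thread choice of min/stride; extent is the number of iterations
    // this thread performs: ceil((n - min) / stride), or 0 if min >= n.
    int min    = blockIdx.x * blockDim.x + threadIdx.x;
    int stride = blockDim.x * gridDim.x;
    int extent = (min < n) ? (n - min + stride - 1) / stride : 0;

    for (int i = 0; i < extent; i++) {
        y[i * stride + min] = a * x[i * stride + min] + y[i * stride + min];
    }
}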