For now, `For` has some limitations:
- the start must be 0, a restriction imposed by codegen_c
- the iterator step is always 1
I'd suggest enhancing `For` to support this pattern (a grid-stride loop, which needs both a nonzero start and a non-unit step):
```cuda
__global__
void saxpy(int n, float a, float *x, float *y)
{
    for (int i = blockIdx.x * blockDim.x + threadIdx.x;
         i < n;
         i += blockDim.x * gridDim.x)
    {
        y[i] = a * x[i] + y[i];
    }
}
```
@tqchen Do you think it is a good idea to support this kind of `for` expression in HalideIR? It would be helpful when writing kernels with the low-level API.
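For comparison, under the current limitations (start 0, step 1) the grid-stride loop can still be expressed by normalizing the loop variable: iterate `k` from 0 to `ceil((n - start) / stride)` and recompute `i = start + k * stride` inside the body. The sketch below is host-side C++ that simulates this; the names `normalized_extent`, `visit_strided`, `visit_normalized` and `saxpy_thread` are hypothetical and not part of HalideIR or TVM, and it assumes a positive stride and no integer overflow. The parameters `tid` and `total_threads` stand in for `blockIdx.x * blockDim.x + threadIdx.x` and `blockDim.x * gridDim.x`.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: number of iterations of
//   for (i = start; i < n; i += stride)
// assuming stride > 0. Equals ceil((n - start) / stride), or 0 if start >= n.
int normalized_extent(int start, int n, int stride) {
    if (start >= n) return 0;
    return (n - start + stride - 1) / stride;
}

// Original grid-stride form: nonzero start, non-unit step.
std::vector<int> visit_strided(int start, int n, int stride) {
    std::vector<int> out;
    for (int i = start; i < n; i += stride) out.push_back(i);
    return out;
}

// Normalized form: start 0, step 1, i.e. what the current For can express.
std::vector<int> visit_normalized(int start, int n, int stride) {
    std::vector<int> out;
    int extent = normalized_extent(start, n, stride);
    for (int k = 0; k < extent; ++k) out.push_back(start + k * stride);
    return out;
}

// Host-side simulation of one "thread" of saxpy using the normalized loop.
void saxpy_thread(int tid, int total_threads, int n, float a,
                  const float* x, float* y) {
    int extent = normalized_extent(tid, n, total_threads);
    for (int k = 0; k < extent; ++k) {
        int i = tid + k * total_threads;  // recover the original index
        y[i] = a * x[i] + y[i];
    }
}
```

The normalized form works, but it adds a multiply-add per iteration and needs the extent computed up front, so supporting a start and step directly in `For` would let the generated code stay closer to the hand-written kernel.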