How can I zero-initialize a Buffer instead of assigning each element in a loop?

Part of my TensorIR code looks like this:

A = T.decl_buffer([128], dtype="float16", scope="shared")

I want to initialize this buffer with 0, but the only way I have found is to assign each element in a for-loop:

for i in range(128): A[i] = T.float16(0)

My target is "cuda":

cuda_mod = tvm.build(sch.mod, target="cuda")

so the generated CUDA code is:

for(int i = 0; i < 128; ++i) *(float*)A[i] = 0.000e+00f;

But I want the CUDA code to be:

half A[128] = {0};

How can I do this in TensorIR? @tqchen

Is there much difference in terms of performance if we unroll the loop?

:grinning: I tried it; the performance is similar. Thanks.
