Hi,
I want to tensorize cache_read using the extern function "memcpy", but after this operation I can't bind a thread axis any more.
The original device code looks like this:
nram A_nram[128];
nram B_nram[128];
memcpy(A_nram, A, 64);
for (int32_t i = 0; i < 4; i++) {
  B_nram[i] = A_nram[i] * 2;
}
memcpy(B, B_nram, 64);
In the schedule I want to call s[B].bind(ni, threadx), but it raises:
Bind have a unmet assertion: (uint1)0, on argument Aa.shape[0]
Has anyone run into this problem before? If you have any ideas, please share a solution. Thank you!
Besides, I wrote the tensorize intrinsic as follows:
def mlu_cache_read(l, dtype):
    a = tvm.placeholder((l,), name='a', dtype=dtype)
    b = tvm.compute((l,), lambda i: a[i], name='b')
    Aa = tvm.decl_buffer(a.shape, a.dtype,
                         name="Aa",
                         offset_factor=1,
                         strides=[1])
    Bb = tvm.decl_buffer(b.shape, b.dtype,
                         name="Bb",
                         offset_factor=1,
                         strides=[1], scope='nram')

    def cache_read(in_, out):
        ib = tvm.ir_builder.create()
        aa = in_[0]
        bb = out[0]
        ib.emit(tvm.call_extern("", "__memcpy",
                                bb.access_ptr('w'), aa.access_ptr('r'),
                                l * 4, "enum(GDRAM2NRAM)"))
        return ib.get()

    with tvm.build_config(offset_factor=1):
        return tvm.decl_tensor_intrin(b.op, cache_read, binds={a: Aa, b: Bb})
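For context, the l*4 byte count passed to __memcpy assumes 4-byte elements (e.g. float32 or int32). Here is a minimal pure-Python sketch of the raw byte copy the intrinsic is meant to emit; memcpy_sim and the buffer names are hypothetical stand-ins for the MLU __memcpy and the GDRAM/NRAM buffers:

```python
import struct

def memcpy_sim(dst, src, nbytes):
    # Stand-in for __memcpy(dst, src, nbytes, GDRAM2NRAM):
    # copy nbytes raw bytes from src into dst.
    dst[:nbytes] = src[:nbytes]

l = 16
gdram = bytearray(struct.pack(f"{l}f", *range(l)))  # source buffer ("GDRAM")
nram = bytearray(l * 4)                             # destination buffer ("NRAM")

# l float32 elements -> l*4 bytes, matching the size argument above
memcpy_sim(nram, gdram, l * 4)
```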