Hi all,
I’m using TVM to generate C code for a DSP, and I want the generated code to keep the accumulator in a vector register variable instead of in an array in memory.
For example, when I use tvm.sum, I get code like this:
void* conv_NHWC_int8 = TVMMalloc(1024U);
...
(( int32x64*)(( int*)conv_NHWC_int8))[0] = ((int32x64)(broadcast64 0)); // store to memory
for (int rc_outer = 0; rc_outer < 8; ++rc_outer) {
    (( int32x64*)(( int*)conv_NHWC_int8))[0] = (( int32x64*)(( int*)conv_NHWC_int8))[0] + something[rc_outer]; // load + store
}
As you can see, the accumulator is written to memory when it is set to 0, and every loop iteration loads it from memory and stores it back. This causes unnecessary data transfers between memory and the vector registers.
What I want is something like:
int32x64 temp_variable = ((int32x64)(broadcast64 0)); // kept in a vector register
for (int rc_outer = 0; rc_outer < 8; ++rc_outer) {
    temp_variable = temp_variable + something[rc_outer];
}
(( int32x64*)(( int*)conv_NHWC_int8))[0] = temp_variable; // single store after the loop
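To make the difference concrete, here is a minimal, portable C sketch of the two patterns. It is a stand-in under stated assumptions, not TVM output: `int32x64` is emulated with the GCC/Clang `vector_size` extension rather than the DSP's native type, `something` is modeled as an array of 8 vectors, and the function names `reduce_in_memory` and `reduce_in_register` are hypothetical. Both compute the same sum; they differ only in where the accumulator lives.

```c
#include <stdint.h>
#include <string.h>

/* Assumption: emulate the DSP's int32x64 with the GCC/Clang vector extension
   (64 lanes * 4 bytes = 256 bytes, lane-wise + supported). */
typedef int32_t int32x64 __attribute__((vector_size(256)));

#define RC_OUTER 8

/* Hypothetical name. Mirrors the current generated code: the accumulator
   lives in the output buffer, so each iteration loads it and stores it back. */
void reduce_in_memory(int32_t *out, const int32x64 *something) {
    int32x64 tmp = {0};
    memcpy(out, &tmp, sizeof tmp);         /* store the zero-initialised value */
    for (int rc_outer = 0; rc_outer < RC_OUTER; ++rc_outer) {
        memcpy(&tmp, out, sizeof tmp);     /* load from memory */
        tmp = tmp + something[rc_outer];
        memcpy(out, &tmp, sizeof tmp);     /* store back to memory */
    }
}

/* Hypothetical name. The desired form: a local accumulator that the
   compiler can keep in a vector register, stored once after the loop. */
void reduce_in_register(int32_t *out, const int32x64 *something) {
    int32x64 acc = {0};
    for (int rc_outer = 0; rc_outer < RC_OUTER; ++rc_outer) {
        acc = acc + something[rc_outer];
    }
    memcpy(out, &acc, sizeof acc);         /* single store */
}
```

Because `out` may alias `something`, a C compiler generally cannot turn the first version into the second on its own unless it can prove the two do not overlap, so the local-accumulator form has to come from the code generator.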
Is there any way to do that? Thanks!