Currently, the vectorized source code generated for OpenCL looks like:
vstore2((vload2(0, ( half*)compute + (ff * 2)) + (vload2(0, pad_temp_shared_local_local + 0) * ((half2)(input1_shared_local_local[0], input1_shared_local_local[0])))), 0, ( half*)compute + (ff * 2));
But I want something like:
half2 compute;
compute = compute + pad_temp_shared_local_local * input1_shared_local_local;
Here, pad_temp_shared_local_local and input1_shared_local_local are half2 registers.
Could you share how to write a schedule that generates such code?