How to write a schedule for vectorized compute in OpenCL

sunzj · April 7, 2020, 8:53am

Currently, the OpenCL source code generated for a vectorized compute looks like this:

vstore2((vload2(0, ( half*)compute + (ff * 2)) + (vload2(0, pad_temp_shared_local_local + 0) * ((half2)(input1_shared_local_local[0], input1_shared_local_local[0])))), 0, ( half*)compute + (ff * 2));

But I want something like this:

half2 compute;
compute = compute + pad_temp_shared_local_local * input1_shared_local_local;

pad_temp_shared_local_local and input1_shared_local_local are half2 registers.

Could you share how to write a schedule that generates such code?