Optimizing Reduction Initialization in Generated CUDA Code

This issue has been solved.