Different codegen behaviour when vectorizing

Script to reproduce

I applied the same schedule to the two programs below, but the former fails in codegen because a RampNode with 8 lanes (more than 4) is not allowed. However, the latter is generated correctly and never calls CodeGenCUDA::VisitExpr_(tvm::tir::RampNode const*, std::ostream&). It's quite strange: why does the former generate a RampNode during lowering while the latter doesn't?

I added debug info to the vectorize_loop pass.

The program that fails to codegen does have % in its BufferStore expression, so the index stays a RampNode. The other one doesn't have %, so its indices look like [ax : ax + vector_size], which codegen handles successfully.
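To make the difference concrete, here is a minimal plain-Python model of the per-lane indices. It does not use any TVM API. The helper names `lane_indices` and `is_contiguous`, the base offset 12, and the modulus 16 are all made up for illustration. It only shows why an index without % can be treated as a contiguous slice, while one with % may wrap around and therefore cannot be assumed contiguous in general.

```python
# Plain-Python model of vector lane indices; NOT TVM code.
# All names and constants here are hypothetical, chosen for illustration.

def lane_indices(index_fn, base, lanes):
    """Evaluate the store index for each lane of the vectorized loop."""
    return [index_fn(base + lane) for lane in range(lanes)]


def is_contiguous(indices):
    """True when the indices are base, base + 1, ..., i.e. a plain slice."""
    return all(b - a == 1 for a, b in zip(indices, indices[1:]))


LANES = 8

# Index without %: every lane is exactly one element after the previous one.
plain = lane_indices(lambda i: i, base=12, lanes=LANES)

# Index with %: the value wraps around at the (assumed) modulus 16.
wrapped = lane_indices(lambda i: i % 16, base=12, lanes=LANES)

print(plain, is_contiguous(plain))
print(wrapped, is_contiguous(wrapped))
```

In this model the plain index maps to a single slice of 8 elements, which matches the [ax : ax + vector_size] form. The wrapped index needs each lane's value computed explicitly, which is presumably why the ramp itself has to be emitted as an expression in the failing program.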