[RFC] Stride caching in registers on the GPU

Hello! I ran into a problem when scheduling stride caching in registers on the GPU. I discussed it in the following post with @maplegu.

It looks like the "warp" option for cache_read is not fully working. I'm also wondering whether "local" + virtual threads could be used here instead. Currently that doesn't work either, because it infers the wrong array indices. I'm not sure whether this is a bug or whether I'm using the schedule primitives incorrectly.
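
For reference, the index mapping that stride caching implies can be modeled in plain Python. This is only a sketch, not TVM code and not the failing schedule: it assumes a warp of 32 threads, and the helper names `stride_cache` and `warp_read` are hypothetical. Each lane keeps every 32nd element in its own registers, and a read of logical element `i` goes to lane `i % 32`, register slot `i // 32`, which is roughly the role a warp shuffle would play on the GPU.

```python
WARP_SIZE = 32  # assumed warp width


def stride_cache(data, warp_size=WARP_SIZE):
    """Split data across lanes: lane t holds data[t], data[t + warp_size], ..."""
    return [data[lane::warp_size] for lane in range(warp_size)]


def warp_read(registers, index, warp_size=WARP_SIZE):
    """Fetch logical element `index` from the lane that owns it.

    This models a cross-lane read (what a warp shuffle would do on hardware).
    """
    lane = index % warp_size   # which thread holds the element
    slot = index // warp_size  # which of that thread's registers
    return registers[lane][slot]


# Cache 128 elements across the 32 lanes, then read them back in order.
data = list(range(128))
registers = stride_cache(data)
restored = [warp_read(registers, i) for i in range(len(data))]
```

If the indices inferred for the cached buffer do not reduce to this kind of (lane, slot) pair, reads will land in the wrong registers, which matches the symptom described above.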

Any advice on how to address this problem is appreciated!

@tqchen @Laurawly @FrozenGene @Huyuwei @YuanLin
