TVM has a warp memory abstraction. If you use allocate((128,), 'int32', 'warp'), TVM will put the data in thread-local registers and then use shuffle operations to make the data available to other threads in the warp. You can also use the shuffles directly if you want. I'm not sure exactly how to use warp shuffles in hybrid script, but you can grep the codebase for tvm_warp_shuffle.
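To make the idea concrete, here is a rough pure-Python model of what "registers plus shuffles" means. It is not TVM code and does not call any TVM API: the names shuffle and warp_load are made up for illustration, and the layout (element i lives in lane i % 32, register slot i // 32) is only an assumption, not necessarily what TVM generates.

```python
WARP_SIZE = 32
BUFFER_LEN = 128
SLOTS_PER_LANE = BUFFER_LEN // WARP_SIZE  # 4 registers per lane


def shuffle(registers, src_lanes):
    """Model of a warp shuffle: lane L reads the register held by src_lanes[L]."""
    return [registers[src_lanes[lane]] for lane in range(WARP_SIZE)]


def warp_load(regs, index_per_lane):
    """Each lane asks for one element of the 128-element buffer.

    Assumed layout (hypothetical): element i is stored in lane i % 32,
    register slot i // 32. One shuffle is issued per slot, and each lane
    keeps the value from the slot that holds its requested element.
    """
    src_lanes = [idx % WARP_SIZE for idx in index_per_lane]
    result = [None] * WARP_SIZE
    for slot in range(SLOTS_PER_LANE):
        shuffled = shuffle(regs[slot], src_lanes)
        for lane, idx in enumerate(index_per_lane):
            if idx // WARP_SIZE == slot:
                result[lane] = shuffled[lane]
    return result


# Fill the buffer with 0..127, distributed across the lanes' registers.
data = list(range(BUFFER_LEN))
regs = [[data[slot * WARP_SIZE + lane] for lane in range(WARP_SIZE)]
        for slot in range(SLOTS_PER_LANE)]

# Every lane reads the element owned by its right-hand neighbour.
requested = [(lane + 1) % BUFFER_LEN for lane in range(WARP_SIZE)]
loaded = warp_load(regs, requested)
```

In this model no lane ever touches shared or global memory: each value comes either from the lane's own register or from another lane's register via the shuffle, which is the effect the 'warp' scope is meant to give you.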