Is it possible to control the data layout of shared memory?

I came across this post, which describes how to use cache_write to control the layout of intermediate buffers.

Is it possible to achieve the same result for a shared memory buffer on CUDA targets? Applying the same trick does not work directly, because the shared memory buffer is itself created by a cache_write/cache_read, and I get an error when I try to inline in the same way.