[TIR][Schedule] Question about explicitly declaring the shared memory size of a cache_read block

Hi, I am using tir.schedule to generate a 4096 * 4096 * 16384 matmul GPU kernel, implementing warp tiling, thread tiling, cache_read, and cache_write.

However, in the generated TVMScript, the cache_read block for A allocates a shared memory buffer of size 4096 * 16384, i.e. the whole matrix A. I believe some tir.transform pass in the lowering pipeline then shrinks cache_A to the size that is actually used, e.g. 2048.
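For scale, here is the arithmetic. It assumes float32 elements and a hypothetical block tile of BM = 128 rows by BK = 16 columns of A (one tile shape that yields the 2048 elements mentioned above; the real tile sizes of my schedule are not shown here). The 48 KiB figure is the usual static shared memory limit per CUDA thread block.

```python
# Shared memory footprint of the A cache_read buffer, before vs. after shrinking.
# Assumptions (hypothetical): float32 data, block tile BM x BK = 128 x 16.
M, K = 4096, 16384           # shape of A in the 4096 * 4096 * 16384 matmul
BM, BK = 128, 16             # hypothetical block tile sizes
BYTES_PER_ELEM = 4           # float32
SMEM_LIMIT = 48 * 1024       # typical static shared memory limit per CUDA block

full_elems = M * K           # size declared by cache_read before the transform
tile_elems = BM * BK         # size one thread block actually reads per k-step

print(f"full buffer: {full_elems} elems = {full_elems * BYTES_PER_ELEM // 2**20} MiB")
print(f"tile buffer: {tile_elems} elems = {tile_elems * BYTES_PER_ELEM // 2**10} KiB")
print("tile fits in shared memory:", tile_elems * BYTES_PER_ELEM <= SMEM_LIMIT)
```

So the full-size declaration (256 MiB) could never be launched as-is; only the shrunk buffer (8 KiB under these assumptions) fits.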

So my questions are: can I declare the shared memory size explicitly? And can anyone tell me which tir.transform is used to prune the shared memory size?
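My guess at what such a pass has to do: compute the bounding box of the buffer region that is touched by the loops nested under the buffer's allocation point, and reallocate the buffer with that shape. The pure-Python sketch below only illustrates that idea; it is not TVM's implementation. `compacted_shape` is a hypothetical helper, the access pattern `A_shared[by * BM + ax0, ko * BK + ax1]` and the tile sizes 128 x 16 are assumptions, and index expressions are assumed to be affine in the loop variables.

```python
from math import prod
from typing import Dict, List


def compacted_shape(index_coeffs: List[Dict[str, int]],
                    inner_extents: Dict[str, int]) -> List[int]:
    """Bounding-box extent of each buffer dimension (hypothetical helper).

    index_coeffs[d] maps loop variable -> coefficient in the affine index of
    dimension d. Only loops listed in inner_extents (those nested under the
    allocation point) vary; outer loops just shift the base offset.
    """
    shape = []
    for coeffs in index_coeffs:
        # Each varying loop widens the range by |coeff| * (extent - 1).
        span = sum(abs(c) * (inner_extents[v] - 1)
                   for v, c in coeffs.items() if v in inner_extents)
        shape.append(span + 1)
    return shape


BM, BK = 128, 16  # hypothetical block tile sizes
# Assumed access pattern: A_shared[by * BM + ax0, ko * BK + ax1]
a_index = [{"by": BM, "ax0": 1}, {"ko": BK, "ax1": 1}]

# Allocation at the root block: every loop varies under it -> whole matrix.
root = compacted_shape(a_index, {"by": 4096 // BM, "ko": 16384 // BK,
                                 "ax0": BM, "ax1": BK})
# Allocation under the by/ko loops: only the tile-copy loops vary -> one tile.
tile = compacted_shape(a_index, {"ax0": BM, "ax1": BK})

print(root, prod(root))
print(tile, prod(tile))
```

Under these assumptions the root-level allocation comes out as 4096 x 16384 and the per-tile allocation as 128 x 16 (2048 elements), which matches the before/after sizes I am seeing.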

Before the transform:
[Screenshot: IR before the transform]

After the transform:
[Screenshot: IR after the transform]