Prevent tvm.lower from flattening buffers

Hi, I was experimenting with TE expressions and schedules. I found a similar [issue](https://discuss.tvm.apache.org/t/show-unflattened-tensor-in-tvm-lower/1728/3), but the suggestion there didn't work for me.

I have the following example:

```python
import tvm
from tvm import te

A = te.placeholder((128, 128), name="A")
B = te.compute((128, 128), lambda i, j: A[i, j] * 2, name="B")
C = te.compute((128, 128), lambda i, j: B[i, j] + 1, name="C")

sch = te.create_schedule(C.op)

# Attempt to disable flattening, following the linked thread.
with tvm.transform.PassContext(disabled_pass=["StorageFlatten"]):
    ir_module = tvm.lower(sch, [A, C], name="test", simple_mode=True)

print(ir_module)
```

The printed IRModule still contains flattened 1-D buffers: `B` is allocated as a length-16384 buffer, and `A` and `C` are accessed through 1-D aliases.

```python
# from tvm.script import ir as I
# from tvm.script import tir as T

@I.ir_module
class Module:
    @T.prim_func
    def test(A: T.Buffer((128, 128), "float32"), C: T.Buffer((128, 128), "float32")):
        T.func_attr({"from_legacy_te_schedule": T.bool(True), "tir.noalias": T.bool(True)})
        B = T.allocate([16384], "float32", "global")
        B_1 = T.Buffer((16384,), data=B)
        for i, j in T.grid(128, 128):
            cse_var_1: T.int32 = i * 128 + j
            A_1 = T.Buffer((16384,), data=A.data)
            B_1[cse_var_1] = A_1[cse_var_1] * T.float32(2.0)
        for i, j in T.grid(128, 128):
            cse_var_2: T.int32 = i * 128 + j
            C_1 = T.Buffer((16384,), data=C.data)
            C_1[cse_var_2] = B_1[cse_var_2] + T.float32(1.0)
```
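
One thing I wondered (unverified) is whether the pass has to be disabled under its fully qualified name, since TIR passes seem to be registered with a `tir.` prefix:

```python
# Unverified guess: TIR passes appear to be registered under namespaced
# names, so the disable may need the "tir." prefix to match.
with tvm.transform.PassContext(disabled_pass=["tir.StorageFlatten"]):
    ir_module = tvm.lower(sch, [A, C], name="test", simple_mode=True)
```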

How do I prevent tvm.lower from flattening the buffers?
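
For what it's worth, the only way I've found so far to keep the 2-D shapes is `te.create_prim_func`, but that builds a TensorIR PrimFunc directly and bypasses `te.create_schedule`, so it doesn't help with my existing schedule. A minimal sketch, assuming a recent TVM build:

```python
# Sketch, assuming a recent TVM build: build a TensorIR PrimFunc straight
# from the TE compute graph. This path keeps the (128, 128) buffer shapes,
# but it does not accept a te.create_schedule schedule.
func = te.create_prim_func([A, C])
mod = tvm.IRModule({"test": func})
print(mod)
```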

Thanks