[Hexagon] VTCM not functional

So, we want to use VTCM to acclerate the computation.

And we follow the test file here and everything is ok.

What we want is something similar to preloaded_vrmpy in the test file, as follows.

The TIR of preloaded_vrmpy is like this:

Right now, we have a model imported from tflite. And we have applied vectorization to it. However, although we have allocated buffers in the TIR to VTCM as shown above, in some ugly way, the running time does not change before and after using VTCM.

That’s weired. So I wonder if we have missed some other important thing.

If anyone could provide some help or suggestions, it would be greatly appreciated.