If I want to store some small size constant arrays (like a slice of conv weight) in the CPU’s cache and keep it as long as I need, do we have some primitive/function to do such things?
For instance, we have ir_builder.allocate to indicate the memory scope for GPUs, it could be either shared, global or local.
I see. Please correct me if I’m wrong: Instead of directly state it with some kind of primitive, basically we have to carefully design the load/store behaviour and order of the program so as to make the final runtime maximize the cache hit/reuse rate as much as possible.