[tir] How to explicitly control the memory scope/planning when defining an op with ir_builder on CPU?

mgeek · October 19, 2022, 8:16am

If I want to keep some small constant arrays (like a slice of conv weights) in the CPU’s cache for as long as I need them, is there a primitive or function for doing that?

For instance, on GPUs we can use ir_builder.allocate to set the memory scope, which can be shared, global or local:

 ker_buf = irb.allocate("float32", (KH*KW*CI,), name="kernel_buffer", scope="global")

I tried this on CPU, but changing the scope doesn’t seem to make any difference.

@ziheng @anijain2305 @lhutton1 @FrozenGene

FrozenGene · October 20, 2022, 1:10am

I think that is hard to do on CPU, because the CPU cache cannot be controlled explicitly by the programmer.
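Since the cache itself cannot be addressed, the usual workaround is to control what the hardware sees: copy the weight slice once into a small contiguous buffer (the role ker_buf plays in the question) and reuse it for every output position, so it tends to stay cached as a side effect. The pure-Python sketch below only illustrates that loop structure, assuming a nested-list H x W x CI input, CO x KH x KW x CI weights, stride 1 and no padding. `conv2d_naive` and `conv2d_packed` are hypothetical helpers, not TVM APIs, and Python lists will not show real cache effects; the benefit only appears in compiled code.

```python
# Hypothetical helpers (not TVM APIs). Layout assumptions: inp is H x W x CI,
# weight is CO x KH x KW x CI, stride 1, no padding.

def conv2d_naive(inp, weight):
    """Reference convolution: the co loop is innermost, so every output pixel
    walks through all CO weight slices."""
    H, W, CI = len(inp), len(inp[0]), len(inp[0][0])
    CO, KH, KW = len(weight), len(weight[0]), len(weight[0][0])
    OH, OW = H - KH + 1, W - KW + 1
    out = [[[0.0] * CO for _ in range(OW)] for _ in range(OH)]
    for oh in range(OH):
        for ow in range(OW):
            for co in range(CO):
                acc = 0.0
                for kh in range(KH):
                    for kw in range(KW):
                        for ci in range(CI):
                            acc += inp[oh + kh][ow + kw][ci] * weight[co][kh][kw][ci]
                out[oh][ow][co] = acc
    return out


def conv2d_packed(inp, weight):
    """Same result, but the co loop is outermost: one packed weight slice is
    reused across all OH * OW output positions before moving to the next."""
    H, W, CI = len(inp), len(inp[0]), len(inp[0][0])
    CO, KH, KW = len(weight), len(weight[0]), len(weight[0][0])
    OH, OW = H - KH + 1, W - KW + 1
    out = [[[0.0] * CO for _ in range(OW)] for _ in range(OH)]
    for co in range(CO):
        # Pack weight[co] once into a flat buffer of KH * KW * CI values,
        # in the same (kh, kw, ci) order the inner loops read it.
        ker_buf = [weight[co][kh][kw][ci]
                   for kh in range(KH) for kw in range(KW) for ci in range(CI)]
        for oh in range(OH):
            for ow in range(OW):
                acc = 0.0
                idx = 0
                for kh in range(KH):
                    row = inp[oh + kh]
                    for kw in range(KW):
                        pix = row[ow + kw]
                        for ci in range(CI):
                            acc += pix[ci] * ker_buf[idx]  # sequential reads of ker_buf
                            idx += 1
                out[oh][ow][co] = acc
    return out
```

Both versions accumulate in the same (kh, kw, ci) order, so they return identical results; only the order of loads changes.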

mgeek · October 20, 2022, 2:24am

Thanks for the reply!

I see. Please correct me if I’m wrong: instead of stating it directly with some kind of primitive, we basically have to design the program’s load/store behaviour and ordering carefully so that the generated code maximizes cache hits and data reuse.
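As an illustration of that kind of reordering, here is loop tiling (cache blocking) applied to a plain matrix multiply: the iteration space is split into tile x tile blocks so each block of A, B and C is reused many times while it is likely still in cache. This is a minimal pure-Python sketch; `matmul_naive`, `matmul_tiled` and the default tile size are hypothetical choices for illustration, and in compiled code the tile size would be tuned to the target’s cache sizes.

```python
# Hypothetical helpers illustrating loop tiling; not TVM APIs.

def matmul_naive(A, B):
    """Reference C = A @ B on nested lists (n x k times k x m)."""
    n, k_dim, m = len(A), len(B), len(B[0])
    C = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            acc = 0.0
            for k in range(k_dim):
                acc += A[i][k] * B[k][j]  # strided walk down a column of B
            C[i][j] = acc
    return C


def matmul_tiled(A, B, tile=4):
    """Same result, computed block by block so each tile of A, B and C is
    reused many times before the loops move on to the next tile."""
    n, k_dim, m = len(A), len(B), len(B[0])
    C = [[0.0] * m for _ in range(n)]
    for i0 in range(0, n, tile):
        for k0 in range(0, k_dim, tile):
            for j0 in range(0, m, tile):
                # i-k-j order inside the block: rows of B and C are read
                # contiguously; min() handles edge tiles when sizes don't divide.
                for i in range(i0, min(i0 + tile, n)):
                    a_row, c_row = A[i], C[i]
                    for k in range(k0, min(k0 + tile, k_dim)):
                        a_ik, b_row = a_row[k], B[k]
                        for j in range(j0, min(j0 + tile, m)):
                            c_row[j] += a_ik * b_row[j]
    return C
```

For each C[i][j], both functions add the products in the same k order, so the results match exactly; only the order of loads and stores differs, which is the knob that is actually available on CPU.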

FrozenGene · October 21, 2022, 2:30am

+1, you are correct.