I have been trying to understand the scheduling primitives in TVM. While I understand the basic ones like split, reorder, tile, and fuse, I am having difficulty with the more complicated ones. I list a couple of specific questions below.
Why does cache_write() also change the data layout? Is this a design decision (if so, why?), or something that is necessary for correctness (could you give me an example of this)?
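For reference, here is a minimal sketch of the situation I am asking about (the shapes, names, and split factors are made up): after splitting and reordering B's axes and then calling cache_write, the cache stage seems to be created with a layout that follows B's current loop order rather than B's original shape.

```python
import tvm
from tvm import te

# Illustrative shapes and split factors only.
A = te.placeholder((64, 64), name="A")
B = te.compute((64, 64), lambda i, j: A[i, j] + 1.0, name="B")

s = te.create_schedule(B.op)
i, j = B.op.axis
io, ii = s[B].split(i, factor=8)
jo, ji = s[B].split(j, factor=8)
s[B].reorder(io, jo, ii, ji)

# cache_write introduces a new stage that writes into "local" scope; the
# cache's layout appears to follow B's current leaf iteration order.
BL = s.cache_write(B, "local")
print(tvm.lower(s, [A, B], simple_mode=True))
```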
What exactly does env_threads() do? I see that it is used for persistence with the scan operator, but I don’t quite understand how it transforms the schedule to enable that persistence.
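In case a concrete schedule helps the discussion, here is a rough sketch of how I have seen env_threads used with scan, loosely modeled on the RNN recipes in the TVM repo (the shapes, split factors, and thread extents are all illustrative). What I would like to understand is what the env_threads call changes in the lowered loop structure compared to only binding inside s_init and s_update.

```python
import tvm
from tvm import te

# Illustrative cumulative-sum scan; names and extents are made up.
m = te.var("m")
n = 128
X = te.placeholder((m, n), name="X")
s_state = te.placeholder((m, n), name="s_state")
s_init = te.compute((1, n), lambda _, i: X[0, i], name="s_init")
s_update = te.compute((m, n), lambda t, i: s_state[t - 1, i] + X[t, i], name="s_update")
scan = te.scan(s_init, s_update, s_state, inputs=[X], name="scan")

s = te.create_schedule(scan.op)
block_x = te.thread_axis((0, 4), "blockIdx.x")
thread_x = te.thread_axis((0, 32), "threadIdx.x")

# Launch the thread environment once, at the scope of the whole scan op, so
# the same threads persist across the sequential scan iterations.
s[scan.op].env_threads([block_x, thread_x])

# The inner stages then reuse the environment threads via ordinary bind calls.
xo, xi = s[s_init].split(s_init.op.axis[1], factor=32)
s[s_init].bind(xo, block_x)
s[s_init].bind(xi, thread_x)
yo, yi = s[s_update].split(s_update.op.axis[1], factor=32)
s[s_update].bind(yo, block_x)
s[s_update].bind(yi, thread_x)

print(tvm.lower(s, [X, scan], simple_mode=True))
```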
I have been trying these operators out on my own and looking at the source code of the various lowering passes when I get errors. This is fairly time-consuming. Does anyone have suggestions for how I might go about understanding the exact specification of each scheduling primitive?
Okay. In the case of cache_write(), I was referring specifically to the function CacheWriteWithReLayout. Does it not perform some data relayout? Maybe I don’t understand the source code very well then.
In the post An operational model of schedule primitives in Tensor Expression, we are putting together a document describing some of the schedule primitives. For cache_write, we have an example showing how it changes the layout of the new stage. I understand it as sugar combining a layout change, layout restoration, and a data copy (a memory scope change). You could probably add a copy stage yourself and call set_scope to achieve the same thing.
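To make that last point concrete, here is a rough sketch of the comparison (names and shapes are illustrative): the first schedule uses cache_write, and the second builds what I mean by the manual version, an explicit intermediate stage plus set_scope, with the output reduced to a copy out of it.

```python
import tvm
from tvm import te

n, m = 64, 64
A = te.placeholder((n, m), name="A")

# Version 1: let cache_write introduce the intermediate stage.
B = te.compute((n, m), lambda i, j: A[i, j] * 2.0, name="B")
s1 = te.create_schedule(B.op)
BL = s1.cache_write(B, "local")  # new stage computes into "local"; B becomes a copy
print(tvm.lower(s1, [A, B], simple_mode=True))

# Version 2: roughly the manual equivalent -- write the intermediate stage
# yourself, make the output a copy of it, and set its scope explicitly.
C_local = te.compute((n, m), lambda i, j: A[i, j] * 2.0, name="C_local")
C = te.compute((n, m), lambda i, j: C_local[i, j], name="C")
s2 = te.create_schedule(C.op)
s2[C_local].set_scope("local")
print(tvm.lower(s2, [A, C], simple_mode=True))
```

The part cache_write adds on top of this is the re-layout: the cache stage's axes follow the original stage's current leaf iter vars, and the copy restores the original layout.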
The layout change happens in ReplaceOriginalOp(), the last line of CacheWriteWithReLayout().
Hi! Yep. I have been looking at the document. Thanks for working on it!
I asked another specific question regarding the semantics of compute_at and bind here. Could you take a look at that as well? Thanks a lot!