What is the difference between the
set_scope functions? Both are at least related, in that they both specify some memory block, but I don’t see the differences. What do they do differently, and when should I use which?
What is the difference between the
I assume you’re referring to the tir Schedule primitives
I agree that the naming is slightly confusing for a new user (since scope means something different in generic programming), but those 2 primitives don’t do the same thing.
compute_at moves a computation
tir.Block to a new loop depth, which I guess can be called the scope of computation of that block.
set_scope on the other hand, does not refer to “computation scope”, but actually to what TVM calls a storage scope, which refers to the location in physical memory where a buffer is to be stored.
Example storage scopes are
global (DDR memory) and
texture (special memory in GPU),
global.vtcm (special user managed cache in hexagon), etc. So
set_scope allows one to set the storage scope of a buffer and has nothing to do with movement of computation.
Thanks for this! There is a third operation that seems to play into this, which is
cache_read. Where would I use that? My use-case is that I want to integrate custom hardware into TVM. From what I gather, I can use
cache_read together with
compute_at to move the reading of the data into the appropriate level. Where would
set_scope come into play here?
cache_read allows you to copy data from a global memory to the mentioned storage_scope for a read buffer (a buffer that is read in the mentioned block). As shown in that documentation example, it inserts a new block to copy the data in to the
"local" memory. Since it works on a read buffer, the copying block is inserted somewhere before the mentioned block.
Similarly, if you would like to write the output of a buffer to a different memory scope, you can use
cache_write, in which case, the copy block is inserted after the mentioned buffer.
set_scope is useful for Intermediate buffers that are allocated inside the PrimFunc, to specify that the allocation has to directly happen on a different
storage_scope (as opposed to allocating on a global memory and then inserting a cache_read/write to copy to a different scope).
I see, thanks! So
cache_read is what I am after. There is just one more thing I find confusing. The VTA Tutorial has this example (from here:
A = te.placeholder((o, n, env.BATCH, env.BLOCK_IN), name="A", dtype=env.inp_dtype) A_buf = te.compute((o, n, env.BATCH, env.BLOCK_IN), lambda *i: A(*i), "A_buf")
This is also used to move data into the accelerator. Is there a difference to this approach vs
cache_read? They both seem to achieve the same thing, but I am not sure if that is actually the case or if there are some slight differences I don’t see.
I’m not familiar with VTA, but based on the example in that page, it seems like they’re just inserting those extra compute stages in te, and then later use
set_scope as shown below:
Then they seem to use
s[A_buf].compute_at(s[C_buf], ko) to move the computes to a different location. I’m not sure why they implement their own copy compute and then a
set_scope instead of
cache_read/write, but I’m guessing that’s because they’re working with
te computes and schedules instead of
TIR schedules (I’m not very comfortable with te scheduling internals, so maybe this is needed to apply the compute_at).
In TIR, I don’t think this would be necessary as a cache_read insert the new copy block automatically that can be scheduled on its own by getting that block with