One great feature of many compiler toolchains is incremental recompilation.
For example, imagine I compile a C++ project with 40 source files and then change a few lines in one of them. A clever toolchain will recompile only the parts of the program touched by that change.
This is essential in projects with high compilation times.
Now, I have been exploring auto-scheduling in TVM, and I am interested in the performance impact of different schedules. The auto-scheduler evaluates candidate schedules on standalone workloads (sub-graphs, measured in isolation from the whole model).
However, I am interested in the performance of these schedules in the full tensor program.
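(For context, the split into workloads looks roughly like this; a minimal sketch, assuming `mod`, `params`, and `target` are defined as in the standard TVM tuning tutorials.)

```python
from tvm import auto_scheduler

# Extract the tunable workloads (sub-graphs) from the model. Each task is
# tuned and benchmarked on its own, independently of the full program.
tasks, task_weights = auto_scheduler.extract_tasks(mod["main"], params, target)
for task in tasks:
    print(task.workload_key)  # one key per standalone workload (A, B, C, ...)
```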
If I have a logfile with schedules for three workloads (A1, B1, C1), I can compile with:
```python
import tvm
from tvm import auto_scheduler, relay

# Apply the best schedules recorded in the logfile while building the model.
with auto_scheduler.ApplyHistoryBest(log_file):
    with tvm.transform.PassContext(
        opt_level=3, config={"relay.backend.use_auto_scheduler": True}
    ):
        lib = relay.build(
            mod, target=target, target_host=target_host, params=params
        )
```
However, suppose I have an alternative schedule for B and thus generate a second logfile (A1, B2, C1). Right now, as I understand it, evaluating this schedule means recompiling the whole model from scratch, even though only one part of the model has changed.
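Concretely, my only option today (as far as I can tell) is to repeat the full build under the new log; `log_file_2` below stands for the second (A1, B2, C1) logfile:

```python
# Full rebuild: every workload is re-lowered and re-compiled, even though
# only B's schedule differs between log_file and log_file_2.
with auto_scheduler.ApplyHistoryBest(log_file_2):
    with tvm.transform.PassContext(
        opt_level=3, config={"relay.backend.use_auto_scheduler": True}
    ):
        lib2 = relay.build(
            mod, target=target, target_host=target_host, params=params
        )
```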
Thus, I am interested in how incremental compilation could be achieved in TVM - only recompile workload B, leaving A and C untouched.
If we know the parts of the graph we want to recompile, how complicated would it be to do this? What parts of TVM would need to be changed and extended? Any challenges or shortcuts you can foresee?
I would like it to be as conceptually simple as compiling a standalone module for the part that changed, then running `lib.changed_part = new_subgraph_lib`.
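To make that concrete, here is a purely hypothetical sketch of the workflow I have in mind. `subgraph_b` (a Relay module containing just workload B) and `replace_subgraph` are placeholders, not existing TVM APIs:

```python
# HYPOTHETICAL sketch: the subgraph-splicing call below does not exist in
# TVM today; it stands in for whatever the incremental mechanism would be.

# 1. Rebuild only workload B as a standalone module, applying its
#    alternative schedule B2 from the second logfile.
with auto_scheduler.ApplyHistoryBest(log_file_2):
    with tvm.transform.PassContext(
        opt_level=3, config={"relay.backend.use_auto_scheduler": True}
    ):
        new_subgraph_lib = relay.build(subgraph_b, target=target)

# 2. Splice the recompiled subgraph into the existing full-model library,
#    leaving the compiled code for A and C untouched.
lib.replace_subgraph("B", new_subgraph_lib)  # placeholder API
```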