This is a high-level meta-RFC about the general design principles for APIs that involve passes and optimizations in the TVM stack. As we add new passes and optimizations to the infrastructure, there is a growing tension between two styles of API design.
To take a concrete example, imagine that we want to add a preprocessing option that quantizes an f32 model into int8. There are two possible options:
Option 1: Add Options to the All-in-one Build API
```python
mod = relay.frontend.from_keras(model_name)
# quantize_model / quantize_start_layer are illustrative switches for this proposal
with relay.BuildConfig(quantize_model=True, quantize_start_layer=1):
    result = relay.build(mod)
```
In this case, relay.build serves as an all-in-one API that does everything, and the additional switches behave like `-Wall`-style compiler flags that turn individual features on and off.
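To make the trade-off concrete, here is a toy model of Option 1 in plain Python. It is not TVM's implementation; `ToyModule`, `ToyBuildConfig`, and `toy_build` are hypothetical names. The point it illustrates is that the pipeline and its order live inside `build`, so users can only toggle steps that `build` already knows about.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class ToyModule:
    """Hypothetical stand-in for an IR module; only tracks a few attributes."""
    dtype: str = "float32"
    layout: str = "NHWC"
    applied: Tuple[str, ...] = ()


@dataclass
class ToyBuildConfig:
    """Hypothetical config mirroring the switches in Option 1."""
    quantize_model: bool = False
    quantize_start_layer: int = 0


def toy_build(mod: ToyModule, config: ToyBuildConfig) -> ToyModule:
    # The pipeline and its order are hard-coded here; a flag can only switch a step on or off.
    if config.quantize_model:
        step = f"quantize(from_layer={config.quantize_start_layer})"
        mod = replace(mod, dtype="int8", applied=mod.applied + (step,))
    # Any new step (e.g. a custom layout conversion) would require editing this function.
    return replace(mod, applied=mod.applied + ("codegen",))


result = toy_build(ToyModule(), ToyBuildConfig(quantize_model=True, quantize_start_layer=1))
print(result.applied)  # ('quantize(from_layer=1)', 'codegen')
```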
Option 2: API Composition
```python
mod = relay.frontend.from_keras(model_name)
mod = relay.quantize.Quantize(from_layer=1)(mod)
result = relay.build(mod)
```
We call a separate quantize pass before we call build. The advantage of this approach is that it is more composable. Suppose I want to run an additional pass after quantization, for example, converting the layout to a customized layout that fits the accelerator. We can simply insert a pass to do so:
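```python
mod = relay.frontend.from_keras(model_name)
# `from` is a reserved word in Python, so the layout arguments use illustrative names here
mod = relay.transform.Sequential(
    [relay.quantize.Quantize(from_layer=1),
     relay.transform.ConvertLayout(src_layout="NHWC", dst_layout="NHWC4c")])(mod)
result = relay.build(mod)
```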
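The composition in Option 2 can be modeled with a few lines of plain Python, assuming a pass is simply a function from module to module. This is a toy sketch, not TVM's pass infrastructure: `ToyModule`, `quantize`, `convert_layout`, `sequential`, and `toy_build` are hypothetical names chosen to mirror the example above.

```python
from dataclasses import dataclass, replace
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class ToyModule:
    """Hypothetical stand-in for an IR module."""
    dtype: str = "float32"
    layout: str = "NHWC"
    applied: Tuple[str, ...] = ()


# A pass is just a function that maps a module to a new module.
Pass = Callable[[ToyModule], ToyModule]


def quantize(from_layer: int) -> Pass:
    def run(mod: ToyModule) -> ToyModule:
        return replace(mod, dtype="int8", applied=mod.applied + (f"quantize({from_layer})",))
    return run


def convert_layout(src: str, dst: str) -> Pass:
    def run(mod: ToyModule) -> ToyModule:
        if mod.layout != src:
            return mod  # layout does not match, leave the module unchanged
        return replace(mod, layout=dst, applied=mod.applied + (f"layout({src}->{dst})",))
    return run


def sequential(passes: List[Pass]) -> Pass:
    """Compose passes left to right into a single pass."""
    def run(mod: ToyModule) -> ToyModule:
        for p in passes:
            mod = p(mod)
        return mod
    return run


def toy_build(mod: ToyModule) -> ToyModule:
    return replace(mod, applied=mod.applied + ("codegen",))


mod = sequential([quantize(from_layer=1), convert_layout("NHWC", "NHWC4c")])(ToyModule())
result = toy_build(mod)
print(result.applied)  # ('quantize(1)', 'layout(NHWC->NHWC4c)', 'codegen')
```

Because `sequential` itself returns a `Pass`, pipelines nest: a composed pipeline can be dropped into another list without `toy_build` having to know about it.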
Summary of proposal in this RFC
This RFC advocates for Option 2. Note that once we have Option 2, we can also build a customized pipeline on top of it that exposes an API like Option 1. Many of our proposed APIs started out looking like Option 1, because that is the style of API that traditional compilers expose through their command-line interfaces.
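As a sketch of the layering argument, the following toy Python code implements an Option 1-style convenience function purely by assembling Option 2 passes. All names (`ToyModule`, `build_with_options`, and the pass constructors) are hypothetical and only model the idea; they are not TVM APIs.

```python
from dataclasses import dataclass, replace
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class ToyModule:
    """Hypothetical stand-in for an IR module."""
    dtype: str = "float32"
    layout: str = "NHWC"
    applied: Tuple[str, ...] = ()


Pass = Callable[[ToyModule], ToyModule]


def quantize(from_layer: int) -> Pass:
    return lambda m: replace(m, dtype="int8", applied=m.applied + (f"quantize({from_layer})",))


def convert_layout(dst: str) -> Pass:
    return lambda m: replace(m, layout=dst, applied=m.applied + (f"layout({m.layout}->{dst})",))


def sequential(passes: List[Pass]) -> Pass:
    def run(mod: ToyModule) -> ToyModule:
        for p in passes:
            mod = p(mod)
        return mod
    return run


def toy_build(mod: ToyModule) -> ToyModule:
    return replace(mod, applied=mod.applied + ("codegen",))


def build_with_options(mod: ToyModule, quantize_model: bool = False,
                       quantize_start_layer: int = 0,
                       target_layout: Optional[str] = None) -> ToyModule:
    """Option 1-style API: flags only decide which composable passes go into the pipeline."""
    passes: List[Pass] = []
    if quantize_model:
        passes.append(quantize(quantize_start_layer))
    if target_layout is not None:
        passes.append(convert_layout(target_layout))
    return toy_build(sequential(passes)(mod))


result = build_with_options(ToyModule(), quantize_model=True, quantize_start_layer=1)
print(result.applied)  # ('quantize(1)', 'codegen')
```

The reverse direction does not work as easily: starting from a flag-based `build`, users cannot insert a pass the flags do not anticipate.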
However, the large space of possible optimization choices means we need to explore different optimization pipeline patterns, just as we explore neural network architectures. Today, we are used to composable APIs that construct a ResNet layer by layer and then train it through a `fit` function. We can do the same for the pass API, with the analogy pass <-> layer and fit <-> build.
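One consequence of treating passes as first-class objects is that pipelines themselves can be searched, much like architectures. The toy sketch below tries every ordering of three hypothetical passes and keeps the cheapest one. The "module" is reduced to an op count and the per-pass cost rules are made-up assumptions chosen only so that order matters; a real search would measure compiled performance instead.

```python
import itertools
from typing import Callable, NamedTuple, Sequence, Tuple


class ToyPass(NamedTuple):
    """Hypothetical pass: a name plus a function on a toy module (here just an op count)."""
    name: str
    run: Callable[[int], int]


# Made-up cost rules, chosen only so that pass order changes the result.
quantize = ToyPass("quantize", lambda n: n + 2)               # inserts quantize/dequantize ops
fuse_ops = ToyPass("fuse_ops", lambda n: (n + 1) // 2)        # fuses pairs of adjacent ops
convert_layout = ToyPass("convert_layout", lambda n: n + 1)   # adds one layout transform


def run_pipeline(pipeline: Sequence[ToyPass], num_ops: int) -> int:
    for p in pipeline:
        num_ops = p.run(num_ops)
    return num_ops


def search_pipelines(passes: Sequence[ToyPass], num_ops: int) -> Tuple[Tuple[ToyPass, ...], int]:
    """Exhaustively try every ordering; min() keeps the first ordering among ties."""
    best = min(itertools.permutations(passes), key=lambda pl: run_pipeline(pl, num_ops))
    return best, run_pipeline(best, num_ops)


best, cost = search_pipelines([quantize, fuse_ops, convert_layout], num_ops=10)
print([p.name for p in best], cost)  # ['quantize', 'fuse_ops', 'convert_layout'] 7
```

With a flag-based API the ordering is fixed inside `build`, so this kind of exploration is not expressible without changing the build function itself.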
Please share your thoughts on this, so that we can collectively arrive at a meta-guideline that helps us with future API designs.