[RFC] Bifurcation of APIs across Python and C++: A Case Study and Recommendations for Prevention

This RFC is a case study on unifying the lower API, which had implementations in both Python and C++. I’ll call the duplication of APIs in Python and C++ “bifurcation” or “bifurcation of the API”. I’ll conclude by analyzing how this bifurcation happened in the first place, and then present options for changing open source policy to avoid future bifurcation. You can skip ahead to “Cause of the bifurcation” if you don’t want to read the details of this case.

Unifying the Lower API

The PR is here if you would like to take a look.

In the case of Lower, there were two Lower functions, one in C++ and one in Python, with different signatures. The Python version was introduced first and is much broader than the C++ API. Notably, it accepts Union types for inputs and args, which is difficult to replicate in C++.

Here are the two function signatures from before the refactor, for reference:

def lower(
    inputs: Union[schedule.Schedule, PrimFunc, IRModule],
    args: Optional[List[Union[Buffer, tensor.Tensor, Var]]] = None,
    name: str = "main",
    binds: Optional[Mapping[tensor.Tensor, Buffer]] = None,
    simple_mode: bool = False,
) -> IRModule

IRModule lower(te::Schedule sch, const Array<te::Tensor>& args, const std::string& name,
               const std::unordered_map<te::Tensor, tir::Buffer>& binds);

To preserve the behavior of the Python API, I split the C++ backend into three functions: LowerModule, LowerPrimFunc and LowerSchedule, and registered these through the FFI. Then, from the Python lower function, I dispatch to the appropriate C++ backend depending on the type passed in.
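As a rough illustration, a minimal sketch of the Python-side dispatch looks like the following. The module path and FFI handle names (_ffi_api.LowerModule and so on) are assumptions for illustration, not necessarily what the PR uses.

# Minimal sketch of the Python-side dispatch (module path and FFI handle
# names are assumed for illustration; the actual PR may differ).
from tvm import te, tir
from tvm.ir import IRModule
from tvm.driver import _ffi_api  # assumed location of the FFI-registered backends


def lower(inputs, args=None, name="main", binds=None, simple_mode=False):
    # Dispatch to the C++ backend that matches the type of `inputs`.
    if isinstance(inputs, IRModule):
        return _ffi_api.LowerModule(inputs, simple_mode)
    if isinstance(inputs, tir.PrimFunc):
        return _ffi_api.LowerPrimFunc(inputs, name, simple_mode)
    if isinstance(inputs, te.Schedule):
        return _ffi_api.LowerSchedule(inputs, args or [], name, binds or {}, simple_mode)
    raise ValueError(f"Expected Schedule, PrimFunc or IRModule, got {type(inputs)}")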

This approach is OK for a small number of types, but as you can see, if we had more types in the Union or multiple arguments with Union types, we’d have to make even more C++ functions.

The other really tricky part was that args is an Array[Union[Buffer, Var, Tensor]]. Unfortunately, the only common supertype of Buffer, Var and Tensor is ObjectRef. So, to replicate the Python behavior in C++, we needed to let args be an Array<ObjectRef>. But we also need to preserve the original Array<te::Tensor> signature for backwards compatibility.

Thus, we end up with four function signatures in C++ to duplicate the behavior of one Python function, which is not ideal.

IRModule LowerModule(IRModule mod, bool simple_mode = false);

IRModule LowerPrimFunc(tvm::tir::PrimFunc func, const std::string& name,
                               bool simple_mode = false);

IRModule LowerSchedule(te::Schedule sch, const Array<te::Tensor>& args,
                               const std::string& name,
                               const std::unordered_map<te::Tensor, tir::Buffer>& binds,
                               bool simple_mode = false);

IRModule LowerSchedule(te::Schedule sch, const Array<ObjectRef>& args,
                               const std::string& name,
                               const std::unordered_map<te::Tensor, tir::Buffer>& binds,
                               bool simple_mode = false);

Writing the second version of LowerSchedule, which takes an Array<ObjectRef> for args, also caused some problems because the helper functions it calls needed their signatures changed as well.

For this case study, we’ll just look at get_binds. In C++, there is a parallel helper function GetBinds, but again, its signature doesn’t match the signature of the Python get_binds. The Python version allows args to be Array[Union[Buffer, Var, Tensor]], while the C++ version requires args to be an Array<te::Tensor>.

def get_binds(args, compact=False, binds=None):

The type of args in the Python version is Array[Union[Buffer, Var, Tensor]].

void GetBinds(const Array<te::Tensor>& args, bool compact,
              const std::unordered_map<te::Tensor, tir::Buffer>& binds,
              Map<te::Tensor, tir::Buffer>* out_binds, Array<ObjectRef>* out_arg_list);

To replicate the behavior of the Python get_binds, I had to write a second version of GetBinds in C++ that takes args as an Array<ObjectRef>. Additionally, get_binds is used in a few places in the codebase, so I also had to register and expose GetBinds through the FFI to preserve backwards compatibility.
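For illustration, keeping the old Python entry point as a thin wrapper over the FFI-registered C++ function could look roughly like this. The registration name "driver.get_binds" and the return convention are assumptions, not the exact code from the PR.

# Rough sketch only: the global-function name and the return convention are
# assumed; the actual registration in the PR may differ.
import tvm

_get_binds = tvm.get_global_func("driver.get_binds")  # assumed FFI name


def get_binds(args, compact=False, binds=None):
    """Keep the old Python signature while delegating to the C++ GetBinds."""
    binds = binds if binds is not None else {}
    # The Array<ObjectRef> overload accepts Buffer, Var and Tensor entries,
    # matching the old duck-typed Python behavior.
    out_binds, arg_list = _get_binds(args, compact, binds)
    return out_binds, arg_list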

The final problem I encountered was related to dead code. The C++ version of lower is only called in one place. It calls SchedulePostProcRewriteForTensorCore, which causes failures in some of the calls that originate from Python but not in the calls that originate from C++. Fortunately, this function is dead code, so I just removed it. However, it could have caused further problems if it were actually required in some cases but not in others.

Cause of the bifurcation

  1. Python lower is introduced, and it relies on Python duck typing.

  2. Because the code is in Python, all of it is effectively a public API, so people start using lower, get_binds and form_irmodule in ways that they shouldn’t.

  3. Someone needs to introduce the C++ version, but it’s hard to duplicate the Python API in C++ because of the duck typing. So the new C++ version doesn’t match all the Python cases, and the Python is not removed, to preserve backwards compatibility.

  4. Because the Python code is the primary code path, it is updated and maintained more frequently than the C++ version. This resulted in the C++ version retaining a call to SchedulePostProcRewriteForTensorCore that should have been removed from the codebase. That was the only call to it in the entire codebase, so we were able to remove the body of SchedulePostProcRewriteForTensorCore, which was over 1000 lines of code.

Recommendations on OSS Policy Change

First, the current TVM “policy” is that code can be implemented in Python for prototyping and ease of implementation, and that eventually the code will be replaced with C++ and the Python deleted. I am not sure this is written down anywhere, but I have heard it from community members when asking why there are both Python and C++ versions of functions.

In my view, the main problem with this policy is that the Python code is usually never removed from the codebase, or is removed only after it has caused problems (as in the case of Lower).

I see two potential solutions to prevent this from happening, which have different overheads in terms of enforcement:

Option 1: Turn on static type checking in Python to make it easier to convert Python to C++, and require that Python code is removed from the codebase when the corresponding C++ is added. Currently we assume that the Python will be removed later, which I do not think is a good assumption to make.

Option 2: Require that all new compiler APIs (as well as most logic in general) be written in C++, and only allow Python to serve as a frontend and soft wrapper of the C++ APIs.

It would be great if people could weigh in on which solution they prefer, or provide alternate solutions. I personally prefer Option 2. Although it is a bit strict, I think it will be easier to enforce than Option 1, since enforcement of Option 1 requires reviewers to check if there is a Python API the C++ is duplicating. Additionally, turning on static type checking in Python will probably require lots of changes to existing code and take a long time.


Option 2 would be easier to police since it is a single-stage solution. There is no need to add the new API and then remove the prototyping API, which would take multiple PRs.

Option 1 would need a tracking ticket for the removal of the Python code, or the removal will be ignored and the problem will persist.

“Turn on static type checking in Python” would also greatly increase the readability of the code, something that is greatly needed.

Thanks @electriclilies for writing the summary. I would like to share some thoughts along these lines. This particular discussion relates to two design principles we have followed so far.

C0: Python as a First Class Citizen

One of the key design principles we have followed so far is to make Python a first-class citizen. The rationale is that we want to make sure that users and developers can easily compose optimization passes, do transformations and manipulate the IR (as a way to do quick prototyping) in Python.

As a result, developers can bring in new operator definitions, and craft importers and schedules at a much faster pace. This design principle also enables us to quickly interact with machine-learning-based cost models, most of which are based in Python. Finally, researchers are able to plug their own methods into the ecosystem much more easily. In short, Python as a first-class citizen helps make the framework accessible to more people.

So while such a design principle certainly brings restrictions, it is also deeply rooted in the framework’s design philosophy, which allows us to build an optimization framework that is modular, extensible and easy to use. We can see a similar philosophy adopted in frameworks like PyTorch.

First-class integration with the Python ecosystem has been a key point where we chose differently from some of the existing solutions.

C1: C++ as the Stable Core

While making things Python friendly is important, it is equally important to think about stable APIs and avoid implementation divergence. We have followed a principle that most of the stable implementations should be in C++ (while the object system keeps them language agnostic). This has been the case for compiler passes and most of the infrastructure.

Discussions

While it is easy to give absolute priority to either C0 or C1, the most interesting question comes when we start to consider both. Specifically, how can we create a modular API that follows C1 while still enabling C0? This is a harder but more interesting question we should ask ourselves.

For example, one extreme is to implement every step of compilation in C++ and only allow invocations through a CLI-style interface. This extreme would create a big barrier to customization.

So the main question we are facing right now is how to maintain both C0 and C1 as much as possible. To solve this problem from the root, C2: Modularization would be key. Right now, lower and build are monolithic APIs that do TIR optimization, lowering and compilation in a single shot, and it is really hard to inject things in between.

Imagine that instead we create an API that returns the pass pipeline (as a list of passes) to the user, which will then be invoked by build. If developers want to customize the pass pipeline further, they can manipulate the list of passes (e.g. by adding or deleting passes). Such a new API would allow customization without much divergence, because the default compilation pipeline is still defined in one place, while still enabling a good level of customization in Python.
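A hypothetical sketch of what this could look like is below. The function default_lowering_pipeline does not exist today, and the passes listed are only examples of existing TIR passes, not the actual default pipeline.

# Hypothetical sketch of the modular API described above.
import tvm
from tvm import tir


def default_lowering_pipeline():
    # Would return the default list of passes that build() runs today.
    return [
        tir.transform.InjectPrefetch(),
        tir.transform.StorageFlatten(64),
        tir.transform.Simplify(),
    ]


# A developer can customize the pipeline before handing it back to build():
passes = default_lowering_pipeline()
del passes[0]  # e.g. drop a pass the workload does not need
custom_pipeline = tvm.transform.Sequential(passes)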

Thinking about the current state of the codebase, there are components that are still implemented primarily in Python, including some of the frontend importers, connections to machine learning toolkits (which have to depend on Python), dataset ingestion, as well as some TOPI schedules (where the community made a choice to focus on Python). So it is unlikely that we are going to eliminate Python completely.

However, I think we tend to agree that once we have an idea of a stable API, C++ is preferred (while keeping C0 in mind and exposing a Pythonic interface). As a result, I think it is a good idea to require Option 1 for new Python APIs.

We should also start gradually adding typings to the Python APIs so that we can turn on mypy checks in the CI for future enforcement. While it could take some effort, doing it incrementally will likely give us a lot of benefit.

@tqchen I do agree that in this case, modularizing the lower and build APIs will go a long way toward improving the code. However, that doesn’t solve the general problem of bifurcation, which also impacts other APIs across the codebase.

My concern with Option 1 is really similar to @rkimball’s concern: it relies a lot on the community being aware of the policy and enforcing it. In my mind, the most reliable way to implement Option 1 is to require that PRs which duplicate Python in C++ remove the Python in the same PR. However, this is not foolproof; Python APIs often live in very different parts of the codebase, so this requires the reviewer to be aware of the policy, aware of the existing Python API, and on board with the policy enough to prevent code from being checked in until the Python is removed.

I’d like to suggest a meet-in-the-middle approach:

  1. Require all new Python code to have type annotations
  2. Require all new compiler APIs and passes to be written in C++
  3. Operator registration and TOPI can still be written in Python or C++; however, we should consider ways to prevent duplication/bifurcation of operators in C++ and Python, because with registration it is very unclear whether the Python or the C++ version is actually being run when both are in the codebase.

I think that requiring all new compiler APIs to be written in C++ is important. I agree with you that Python is good for prototyping. However, should prototyping code be checked into the repo? Committing prototyping code creates technical debt, since people start to use the prototyping APIs once they are checked in. Additionally, people can always do prototyping in Python on their own branches.

So this is the question I have now: Are we at a point in the project where the cost of this technical debt outweighs the benefits of committing prototyping code to the main branch?

We should require that all new Python code has type annotations and is checked via mypy. Other than that, I like your meet-in-the-middle approach.


Thanks @electriclilies. My previous message about Python was not advocating for checking in prototyping code. When looking at the use of Python, we need to consider the specific context. Let us consider a few specific scenarios:

  • S0: Most of the compiler infrastructure, such as Pass and PassContext, is already implemented in C++, for the reasons you mentioned.
  • S1: The “stitching APIs” that stitch the calls to passes together were originally in Python (lower) and should be fixed by modularizing them and moving them to C++.
  • S2: Certain model importers, such as Caffe2 and PyTorch, are much easier to implement in Python, because these frameworks provide easy ways to introspect their internals through Python APIs. Doing these importers in C++ would incur non-trivial link dependencies and is sometimes infeasible.
  • S3: When we are doing compilation and invoking cost models, we need to interact with ML toolkits. The majority of the ML ecosystem (such as PyTorch and XGBoost) is in Python, and interacting through Python is the most desirable way.
  • S4: When we want to introduce utilities such as external dataset loading, it is easiest to create adapters to Python packages such as PyTorch and TensorFlow.

As we can see, in the cases of S0 and S1, C++ is certainly something we should push for, while in the cases of S2, S3 and S4, allowing the use of Python helps us because of the considerations above.

Additionally, in almost all cases, when possible, an interface will be defined in C++ so that the corresponding Python implementation can be swapped out in a modular way later if needed.
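A minimal sketch of this pattern, using TVM’s global function registry, is below; the function name "example.lower_hook" is made up purely for illustration.

# Minimal sketch of the "C++-defined interface, Python implementation"
# pattern via the PackedFunc registry. The name "example.lower_hook" is
# made up for illustration only.
import tvm


@tvm.register_func("example.lower_hook")
def _lower_hook(mod):
    # A Python implementation that C++ code can look up and call through the
    # global registry; it can later be replaced by a C++ registration of the
    # same name without changing any callers.
    return mod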

Similar design principles are adopted by major ML frameworks such as PyTorch and TensorFlow: the code that represents the ML models and optimization rules is written in Python, while the core of the engine is written in C++.

In short, Python does not necessarily equal prototyping, and we need to look at the specific context.

Summary

Trying to summarize the thoughts so far, I think we are actually in agreement :slight_smile: Specifically, I agree with your suggestions:

  • A0: Compiler passes and APIs should be written in C++.
  • A1: Do not check prototyping code into the codebase; avoid technical debt.
  • A2: Ensure there is only one version of the code in the codebase and delete the other one.
  • A3: We should add typings to the Python API, possibly in an incremental way; see the related discussion in [RFC] Setting up Mypy type checking for TVM codebase.

In addition to that, my previous post was mainly to highlight the following additional principle (as part of the python first design principle):

  • A4: All the compiler passes and infrastructure written in C++ should be easy to access, manipulate and use through Python.

In short, if someone wants to build a customized compilation flow that involves specific transformations, passes, and scheduling, only a few lines of Python should be needed.

My previous post was mainly to advocate for A4 in addition to A0-A3.

Sounds good, glad we’re on the same page! I’ll update the recommendations and make a PR to the RFC repo.

Hi @electriclilies, @tqchen,

This effort is much appreciated, but I would like a few more clarifications around the following arguments:

While this may be a problem, I am not exactly sure the cause is the code being in “Python”, though. I guess we need a more explicit policy here.

Do we know why someone needed to introduce the C++ version? (This concern arises in my next question as well.)

Why is it important for this to be in C++? Given that most of our unit testing is in pytest, if the stitching is in Python, this might encourage better unit and partial integration tests. Moreover, it improves debuggability between passes.

In principle I agree we should not bifurcate the codebase, but should that not simply be a matter of being careful in reviews to check for duplicate APIs? (This is seen at the caller of the API; in the case of lower there is a registration check.)

Thanks @manupa-arm.

As mentioned in the last post, I agree with you that there is a tradeoff and that Python and C++ each come with their own benefits.

To follow up on the choice for S1 (specifically the build and lower APIs), the tradeoffs are:

  • T0: On one hand, putting things in Python offers some flexibility, ease of debugging, etc.
  • T1: On the other hand, we will want S1 as a C++ API in some form, because there are other C++ APIs (CompileEngine) that depend on it.

T1 is the main reason why moving lower/build to C++ is desirable in this case.

But we should also indeed address the point about the flexibility offered by Python. This is why in my last post I brought up A4 as a key point. Specifically, the reason we are talking about moving S1 is still that S1 contains a few hundred lines of monolithic code, which leads to the following problems:

  • P0: It is still relatively hard for someone else to reconstruct such a lowering pipeline using the Python API.
  • P1: lower/build make certain assumptions that are not necessarily composable with other transformations.

Keeping A4 in mind, the goal is to solve these problems by modularizing the APIs. For example, we can create a function that returns the default lowering pipeline as a list of passes that can be manipulated through Python.

To make an analogy, we can think of S1 like building model configurations (e.g. ResNet, MobileNet) in modern ML frameworks. Modern NN frameworks are modular enough that we can easily build those pipelines in Python. The “default pipeline” can be constructed through C++ and further manipulated in Python as well.

So the goal of Python-first is to come up with modular APIs that make S1 as easy as building an NN model in an ML framework.
