[RFC] Bifurcation of APIs across Python and C++: A Case Study and Recommendations for Prevention

Thanks @electriclilies. My previous message about Python was not advocating for checking in prototyping code. To judge where Python is appropriate, we need to look at the specific context. Let us consider a few scenarios:

  • S0: Most of the compiler infrastructure, such as passes and the pass context, is already implemented in C++, for the reasons you mentioned.
  • S1: The “stitching APIs” that stitch the calls to passes together were originally written in Python (lower); this should be fixed by modularizing them and moving them to C++.
  • S2: Certain model importers, such as those for caffe2 and pytorch, are much easier to implement in Python, because these frameworks provide easy ways to introspect their internals through a Python API. Writing these importers in C++ would incur non-trivial link dependencies and is sometimes infeasible.
  • S3: When we compile and invoke cost models, we need to interact with ML toolkits. The majority of the ML ecosystem (such as pytorch and xgboost) is in Python, so interacting through Python is the most desirable way.
  • S4: When we want to introduce utilities such as external dataset loading, it is easiest to create adaptors to Python packages such as pytorch and tensorflow.

As we can see, in the cases of S0 and S1, C++ is certainly something that we should push for, while in the cases of S2, S3, and S4, allowing the use of Python helps us because of the considerations above.

Additionally, in almost all cases, when possible, an interface will be defined in C++ so that the corresponding Python implementation can be swapped out in a modular way later if needed.
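To make the "interface in C++, swappable Python implementation" idea concrete, here is a minimal pure-Python sketch. It is only an illustration: `register_func`, `get_func`, `pick_best`, and the key `"cost_model.predict"` are hypothetical names, and the dictionary stands in for a function registry that the C++ core would own.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for a registry owned by the C++ core.
# The core only knows the *name* and the expected signature.
_REGISTRY: Dict[str, Callable[[List[float]], float]] = {}


def register_func(name: str, override: bool = False):
    """Register an implementation under a fixed interface name."""
    def decorator(func):
        if name in _REGISTRY and not override:
            raise ValueError(f"{name} is already registered")
        _REGISTRY[name] = func
        return func
    return decorator


def get_func(name: str) -> Callable[[List[float]], float]:
    """Look up whichever implementation is currently registered."""
    return _REGISTRY[name]


@register_func("cost_model.predict")
def predict_python(features: List[float]) -> float:
    # Toy Python-side cost model: lower is better.
    return sum(features) / len(features)


def pick_best(candidates: List[List[float]]) -> int:
    """Caller depends only on the interface name, not on the implementation."""
    predict = get_func("cost_model.predict")
    return min(range(len(candidates)), key=lambda i: predict(candidates[i]))
```

Because `pick_best` resolves the function by name at call time, re-registering `"cost_model.predict"` with `override=True` (for example with an implementation backed by compiled code) changes the behavior without touching the caller.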

Similar design principles are adopted by major ML frameworks such as pytorch and tensorflow. The code that represents the ML models and optimization rules is written in Python, while the core of the engine is written in C++.

In short, Python does not necessarily equal prototyping, and we need to look at the specific context.

Summary

Trying to summarize the thoughts so far, I think we are actually in agreement 🙂 Specifically, I agree with your suggestions:

  • A0: Compiler passes and APIs should be written in C++.
  • A1: Do not check prototyping code into the codebase, so that we avoid technical debt.
  • A2: Keep only one version of the code in the codebase and delete the other one.
  • A3: We should add typings to the Python API, possibly incrementally; see the related discussion in "[RFC] Setting up Mypy type checking for TVM codebase".

In addition to that, my previous post was mainly meant to highlight the following additional principle (as part of the Python-first design principle):

  • A4: All the compiler passes and infrastructure written in C++ should be easy to access, manipulate, and use through Python.

In short, if someone wants to build a customized compilation flow that involves specific transformations, passes, and scheduling, only a few lines of Python should be needed.
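As a rough illustration of what "a few lines of Python" could look like, here is a toy sketch in plain Python. It does not use the real pass infrastructure: the "module" is just a list of op names, and `remove_nops`, `fuse_conv_relu`, and `sequential` are hypothetical stand-ins for passes and a pass combinator that would actually be implemented in C++ and exposed to Python.

```python
from typing import Callable, List

Module = List[str]                 # toy IR: a flat list of op names
Pass = Callable[[Module], Module]  # a pass maps a module to a new module


def remove_nops(mod: Module) -> Module:
    """Drop every 'nop' op."""
    return [op for op in mod if op != "nop"]


def fuse_conv_relu(mod: Module) -> Module:
    """Fuse a 'relu' that directly follows a 'conv' into 'conv_relu'."""
    out: Module = []
    for op in mod:
        if op == "relu" and out and out[-1] == "conv":
            out[-1] = "conv_relu"
        else:
            out.append(op)
    return out


def sequential(passes: List[Pass]) -> Pass:
    """Compose passes so that they run in the given order."""
    def run(mod: Module) -> Module:
        for p in passes:
            mod = p(mod)
        return mod
    return run


# The user-facing part: a customized pipeline in a couple of lines.
pipeline = sequential([remove_nops, fuse_conv_relu])
optimized = pipeline(["conv", "nop", "relu", "nop"])
```

The point of the sketch is the last two lines: the heavy lifting lives in the passes, while choosing and ordering them stays a short, scriptable step. Reordering the list changes the result, which is exactly the kind of customization that should be cheap to express.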

My previous post was mainly advocating for A4 in addition to A0-A3.