[Process RFC] Empowering New Scoped Module to the Project

Let me be straight: this is a strange RFC to me.

Optional modules feel very natural to me; why do we even need an RFC to discuss them? And still some people are saying no! That's crazy! If you have any traditional compiler experience, you know that each pass has its own flag and can be disabled on demand. We also have different optimization levels (O0, O1, O2, O3, Os), and the higher the optimization level you use, the more likely you are to hit bugs. Many classical optimizations started out as optional modules or passes, such as SSA and polyhedral optimizations, and there was a process for them to become stable.
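To make the analogy concrete, here is a minimal sketch (illustrative only, not real compiler or TVM code; all names are invented) of how a pass-based compiler typically combines optimization levels with per-pass flags, in the spirit of GCC's `-O` levels and `-fno-<pass>` options:

```python
# Illustrative sketch: each pass has its own on/off switch, and an
# optimization level just selects a default set of passes.

# Hypothetical mapping from -O levels to default pass sets.
LEVELS = {
    "O0": [],
    "O1": ["constant_folding"],
    "O2": ["constant_folding", "inlining"],
    "O3": ["constant_folding", "inlining", "vectorization", "loop_unrolling"],
}

def select_passes(level, disabled=()):
    """Return the passes to run: the level's defaults minus any
    individually disabled passes (like gcc's -fno-<pass> flags)."""
    return [p for p in LEVELS[level] if p not in disabled]
```

For example, `select_passes("O3", disabled=["vectorization"])` keeps the rest of the O3 pipeline while turning off one risky pass, which is exactly the "optional by a flag" behavior the post describes.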

So can we just align on the concept of optional modules first? Then we can discuss whether module A or B should be optional when the need arises elsewhere. The high-level mechanism should not be conflated with those details here. @tqchen

2 Likes

I’m very sympathetic to the goals of this RFC and agree that in the ML space being able to move rapidly is essential. One point I think though that the RFC should provide clarity on is the definition of ‘optional’.

In a modular project, almost everything can be argued to be optional to a greater or lesser extent. For instance:

  • All the frontends are optional
  • All the optimization passes are optional
  • All the TIR backends are optional

This is simply a consequence of having a modular compiler infrastructure that has to support lots of different use cases. However, I think a stricter definition of ‘optional’ is appropriate when regarding such a process change.

I would suggest that a truly optional component should not introduce user-facing features for which there are no other paths in the code. Otherwise, for users and developers who care about those features the new component will not be optional, but essential to their participation in the community.

Taking frontends as an example, any individual frontend can be argued to be optional because there are alternative frontends (or you could just use Relay directly). However, for a user who cares about importing a TensorFlow model, the TF frontend is essential. We therefore have a group of users who depend upon a feature which if it were removed/deprecated/broken would alienate them from the community. Other examples which fit this pattern would be things such as the OpenCL TIR backend, the BYOC infrastructure, microTVM etc. In such cases, even though you can ‘toggle’ these features off, I don’t think it is correct to view them as optional.

A truly optional component would be, for example, another TF frontend implemented in a different way (maybe using MLIR for argument’s sake). That’s because there is an alternative path that a user can rely on to still make use of TVM with TensorFlow. It does not expand the user-facing functionality of TVM and can be totally disregarded by the majority of users and developers.

Personally, I still don’t love the concept of entirely optional components (like described above) exactly because of the maintenance reasons mentioned in other replies. I do, however, appreciate that they are sometimes a necessary evil where the alternatives (e.g. forked development) are worse.

2 Likes

To me, these reasons read as negatives here.

Many mentioned maintenance, but does that have anything to do with being 'optional'? I don't think so; everything has maintenance issues. Take NNVM: it was not optional at first, right? IMO, time and users will make the choice gradually. We should be open-minded first and let anything useful in. You may think A or B is useless or troublesome, but this is a big community, right?

Some mentioned clear longevity, which is interesting to me. Take NNVM as an example: could we have predicted the future at the time we landed NNVM? Being overly conservative will block innovation!

There are many optional frontends/languages in GCC/LLVM, but languages are user-facing features, right? Can't we do the same thing in TVM? If you don't like the new language, you can just ignore it; someone else wants it. This is also truly optional.

1 Like

As I understand, LLVM only has Clang as a ‘primary sub-project’ (i.e. part of the main LLVM repo). The other ‘optional’ languages (like frontends for Ruby, Rust, Python etc.) are handled as completely separate external projects rather than as optional modules within the LLVM project. I believe it’s much the same for GCC.

I think the ‘LLVM/MLIR solution’ here would be that were someone to offer a new frontend to TVM which was for some reason rejected, they would be welcome to instead implement it in an external repo (e.g. something like Torch-MLIR). I’m not necessarily advocating for that, just indicating that it’s an alternative to the proposal being made in this RFC.

Nope. GCC supports C/C++/Objective-C, Fortran, Pascal, Java, Ada, Go, and others. Clang originally supported only C/C++/Objective-C/Objective-C++, and now supports more with OpenCL, CUDA, and RenderScript; some people are working on closing the remaining gap with GCC. Anyway, we can do the same thing in TVM.

Thanks @Hzfengsy for the proposal and everyone for the meaningful discussion. Just my two cents on the life of an "optional module": back in around TVM 0.5 or 0.6, we only had Relay (with only the graph executor, not the VM), TE, AutoTVM, and TensorFlow/ONNX frontends (yes, we didn't even have a PyTorch frontend at that time). Back then we also only supported x86/Arm CPUs and NVIDIA GPUs.

Then we got the Relay VM, the PyTorch frontend, BYOC, AutoScheduler, etc. For backends, we added Arm Compute Library, TensorRT, oneDNN, and CUTLASS. For hardware, we supported Arm Ethos and other edge devices. To me, every one of them was an optional module when it was proposed, and I personally didn't even fully agree with some of them. However, I didn't see a big problem with accepting them into upstream TVM. They will either get stronger or weaker, depending on how many people care about them. For example, not many people used the Relay VM and AutoScheduler in their early stages; then more and more people started using them because they proved to be effective. On the other hand, some modules didn't go as smoothly. For example, we have the pipeline executor that executes multiple requests in a pipelined fashion. The module is still there, but very few people use it.

In summary, I just want to point out that the original goal of establishing the RFC process was to gather more contributors interested in the same module so that they can work together to move faster. Put more directly, we should think more about how to accept an RFC ("I would accept this RFC if ...") instead of why to reject it ("I think this RFC should NOT be accepted because ..."). That's why I support this process RFC.

4 Likes

Clang isn’t an optional component in the LLVM project - it’s a primary sub-project with the same status as LLVM core. If we want to follow the example of Clang then we should definitely not treat frontends specially. See the Clang contribution guidelines which set a very high bar for adding extensions exactly because of the maintenance issues identified in other replies: Clang - Get Involved.

However, extensions (particularly language extensions) have long-term maintenance costs for Clang. The benefits of the extension need to be evaluated against these costs.

Raising the LLVM project is fine here, but it argues for exactly the opposite of the process this RFC proposes.

I stand corrected on GCC, I’m not really familiar with it compared to LLVM. I’ll let GCC experts comment further on the OSS practices around contributions there. I’d be especially interested if they have similar alternative processes for ‘optional’ components as is proposed in this RFC.

You've listed a number of major features added to TVM in the past couple of years as optional modules. It would seem, then, that this process RFC would actually become the default process for new components in TVM. I'm not sure if that's the intent of the RFC; I would hope it isn't.

Correct me if I am wrong, based on what you posted here:

  1. You agree with the concept of optional modules,
  2. the languages in LLVM/GCC (other than Clang) are optional, and they are user-facing,
  3. we should set a proper bar for new optional modules, considering maintenance issues.

IMO, all of us can align on 1 first.

I don't get your point. How could we have known that these modules would become "major features" at the time?

When something is new and unstable, what do you want it to be: 1) left to die; 2) put through the default mainstream process; 3) made an optional module; or 4) some other short answer from you?

I agree that there are some components in TVM which genuinely have no larger design implications and don't fulfil an essential use case for the compiler, and which could be subject to the special process outlined in this RFC. Such components would never intend to reach the S1 state.

My concern is that because we’re not defining what optional means, it’s completely open to interpretation. Taken to the extreme, you can even say Relay is optional because you could write an alternative way to get to TIR from the frontends.

As a hypothetical, let’s say BYOC was merged in S0 state but one part of the community strongly objected to the design. Then loads of people started using BYOC in S0 state and came to rely on it. Now we want to move to S1, but the design was poor so part of the community still objects and blocks it (vetoes now binding). We are left in a weird state where we have S0 code in main which is widely used but can’t become S1 quality because S0->S1 requires a different process.

I think unless I’m missing something with my hypothetical example above, only option 2 is long-term stable. It will have to go through the default process to reach S1, and I think it’s desirable that all code either eventually reaches S1 or deprecation for maintenance purposes. The other option is that this becomes the default process for all RFCs so such vetoing couldn’t occur in the S0->S1 transition.

It is helpful to have a discussion, and that is what this thread is about. Let me try to address a few focused items.

Collective Shared Vision

It is important for all contributions to articulate their relation to the shared vision of the community. This particular proposal still requires the RFC to be published in the normal RFC format, where rationales and alternatives are discussed.

But what defines the collective shared vision, and how can it evolve to serve G0 and G1? This proposal addresses that question. What if the majority of the community agrees with a vision, but some don't, for relatively subjective reasons?

When many are talking about the "bar", I think we are not necessarily talking about code quality, but about our overall thought process in building consensus around S0, S1, and S2 modules.

We would certainly benefit from more open-minded inclusion of S0 modules when they get majority support from the community, while S1 and S2 changes would involve more deliberation overall.

No Duplication

I would like to strongly call out that while most OSS projects encourage discussion of a module's relation to existing modules (which we also do in the proposal), "no duplication" is a very strong term; it can be applied subjectively and is unwelcoming. The specific examples of "duplication" (FX and TorchScript, TOSA and other graph IRs, Spark GraphX and SparkGraph) all show that common OSS projects allow some perceived duplication (implementing similar functionality). I want to highlight that this is not "simple duplication of code", which of course should be avoided when possible; we encourage reuse.

Code Quality and the Bar

I would like to call out that some of our discussions carry less grounded implied messages, along the lines of:

“If you do not do X, then you produce low-quality code. (or other bad consequences)”

This kind of implication is unwelcoming to the community members.

The result of such implications is that community members go elsewhere, to places that welcome them and can foster a culture of inclusion while shepherding S0 modules faster. Sometimes, thanks to the agility to consider different architectural solutions, this brings higher-quality code overall through better architecture. We have seen this, for example, in some of the ease of use enabled by TorchFX.

Everyone strives to write high-quality code. As stated in the proposal, code patches and changes are subject to the same review process and standards applied by the committers. Code-change discussions are usually more grounded: we can say X has a bug or ask for test coverage, which most people agree with and will act on. Having grounded code-quality discussions helps us make real strides toward a better codebase.

S0-stage modules can be high quality in terms of robustness and the ability to handle the problems they are designed for, which are needed by some (but not all) community members. They provide value and can be maintained in the long term (assuming the contributors continue to do so; otherwise deprecation is triggered). To be clear, code patches follow the same process with grounded discussions, and the proposal does not suggest a lower standard in the code review process.

The proposal focuses on module establishment. Discussions around establishing an S0 module can be subjective and less grounded than suggestions on code changes (where we can simply focus on whether the code has bugs, or suggest a concrete alternative implementation that can be acted upon within days).

Things become less constructive for the community when we hold up S0 modules for subjective reasons. That alienates community members, and the users who need the S0 module do not get empowered. It is also a missed opportunity for the community to reach the S1 stage, since the enthusiastic contributors don't feel empowered (even if the majority agree).

So the proposal merely calls out the different levels of open-mindedness in evaluating S0, S1, and S2 changes, which reflects the practices we see in different open-source projects. The "bar" is not directly tied to code quality, but to the open-mindedness in establishing an S0 module that many welcome and need, but that we do not necessarily need ourselves.

On the contrary, we often see positive effects on code quality from such open-mindedness, thanks to the agility to consider different architectural solutions, which brings higher-quality architecture throughout the S0->S1 process (since nobody can predict everything at the beginning). For example, we have seen this in some of the ease of use (e.g. TorchDynamo) enabled by TorchFX that was not part of the initial proposal.

Module Maintenance and Longevity

The longevity of a module depends on the following factors:

  • Whether or not it clearly solves the needs of some of the community members.
  • The level of maintenance, which is signaled by the proposal's terms: a clear set of maintainers, where S0 modules can be triggered for deprecation once maintenance lapses.
  • The fact that a module is at the S0 stage only means that it is well scoped; it does not affect its longevity. It does affect maintenance: for an S0 module the cost is clearly scoped, with easier deprecation.

The above are reflected in the proposal. The topic of longevity is also subjective and evolving. As modules evolve, we gain more clarity on the opportunities, which helps us make more informed decisions along the way. This RFC also clarifies the boundary of the S0->S1 transition, giving us more avenues to make informed decisions at the appropriate time.

Again, the proposal aims to set up guidelines for considering S0 contributions with an open mind when there are subjective technical disagreements. In such cases, we need to acknowledge that each of our understandings may be imperfect, and it is possible that the S0 proposal actually is of "higher quality" and solves the problems in the right way, given that the majority of the community thinks so.

Notably, S0, S1, and S2 relate only to the scope of a change and, as a result, its impact on the rest of the modules and the different levels of due diligence (and open-mindedness) involved in establishing the module (which motivates this proposal). S0 modules can be high quality, be maintained for a long time to serve user needs, and be cohesive with many other modules. They simply benefit from more open-mindedness, because their well-scoped impact helps empower community members.

In such cases, we should strongly consider community over code, as it is always a win-win for the community to bring in new contributions in a contained manner.

1 Like

Thank you everyone for the great feedback so far. We would like to incorporate some of the suggestions.

Terminology Clarification

We agree that it is helpful to clarify the terminology and avoid confusion. Moving forward, we will use "S0 module" instead of "optional module", as the latter can be confusing and leads to different implications, since it means different things to different people.

An S0-stage module is a module that:

  • Is clearly isolated in its own namespace.
  • Is motivated by community use cases needed by some (but not all) users in the community.
  • Makes no disruptive changes to the rest of the codebase.
  • Can be easily deprecated by removing the related namespaces.
  • Can be turned off through a feature toggle, containing its dependencies away from the rest of the modules.
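As a concrete illustration of the last two points, here is a minimal sketch (hypothetical names only, not actual TVM API) of an S0 module that lives behind a single feature toggle, so the rest of the codebase carries no hard dependency on it and deprecation amounts to deleting its namespace:

```python
# Hypothetical sketch: an S0 module guarded by one feature toggle.
# All names here (FEATURE_TOGGLES, experimental_import, ...) are
# invented for illustration.

FEATURE_TOGGLES = {"s0_experimental_frontend": False}

class FeatureDisabledError(RuntimeError):
    """Raised when code behind a disabled toggle is invoked."""

def require_feature(name):
    # Single choke point: the rest of the codebase never reaches the
    # S0 module unless the toggle is explicitly turned on.
    if not FEATURE_TOGGLES.get(name, False):
        raise FeatureDisabledError(
            f"S0 module '{name}' is turned off; enable its toggle to use it")

def experimental_import(model_text):
    # Entry point of the hypothetical S0 module.
    require_feature("s0_experimental_frontend")
    return {"imported": model_text}
```

With the toggle off, any call into the module fails fast with a clear error; flipping the toggle on is the only change needed to use it, which keeps the dependency surface contained exactly as the definition above requires.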

Duplication

We agree it's good to reuse existing code, but it's also reasonable for modules at the S0 stage to have some necessary overlap. So we add the following line to the RFC:

  • We expect discussion of the proposed module's relation to existing ones, and reuse when possible, but we do not enforce hard no-overlap rules at the S0 stage, as most OSS projects do not require modules to have zero perceived overlap.

How S0 modules fit into TVM

We also clarify that we encourage discussing, in the RFC, the S1 and S2 opportunities that a proposed module can bring to the overall project, while acknowledging that not every decision can be made in an informed manner in a single shot, and that we benefit from open-mindedness for S0 modules.

  • We encourage discussions of possible paths to S1- and S2-level changes in the RFC to help set up the context for future evolution. We need to acknowledge that not all S1- and S2-level decisions can be made at the beginning, and it is not necessary to force an uninformed decision at the S0 stage. As a result, we encourage open-mindedness, acknowledging that we can all be subjective in making effective calls at the S0 moment, and choose community over code in this case. We therefore do not block an RFC on S1/S2 discussions, knowing that we can keep an open mind due to the limited scope, and that further evolution and development will help us make more informed decisions.

Finally, one benefit has been skipped in much of the discussion here, and we would like to call it out: we gain more community members who are eager to contribute to S0 modules and to other parts of the TVM ecosystem. In the meantime, S1-level stability addresses the needs of community members who want relatively stable, robust, and reliable code (G0). By empowering both, through S0 for G1 and S1 for G0, we will be in a much better position overall.

1 Like

To help facilitate future discussions.

I would suggest that we use the term “S0-module” for now instead of “optional module”.

I would also like to call out that S0, S1, and S2 in this proposal relate only to the scope of a change and, as a result, its impact on the rest of the modules and the level of due diligence (and open-mindedness) involved in establishing the module (which motivates this proposal).

S0 modules can be high quality, be maintained for a long time to serve user needs, and be cohesive with many other modules.

They simply benefit from more open-mindedness, due to their well-scoped impact, which helps us achieve G1. We also know that through development we will gather further information, so we can form an informed view about S1 and S2, which have their own RFCs to ensure G0.

3 Likes

As suggested earlier in this thread, S0 modules should be able to be turned off through a single toggle. Here I'd like to update the S0 module definition by adding:

  • Can be turned off through a feature toggle, containing its dependencies away from the rest of the modules.

The proposal is also updated. Thanks for everyone’s constructive discussion so far!

2 Likes

I advocate for this proposal, as it provides a way to progressively land and stabilize new features without breaking existing functionality and APIs.

I read the entire thread and feel the common concerns (from @areusch @leandron @tkonolige ) are:

  1. maintenance overhead,
  2. whether optional modules at the S0 stage will ever enter S1.

Regarding 1, as @Hzfengsy pointed out, PyTorch maintains several compiler infrastructures at the same time, and they share some functionality (see the slides presented by the PyTorch Compiler team at MLSys this year). I understand that as an infrastructure layer TVM wants to stay stable; however, machine learning compilation differs from all prior compiler development experience because workloads change too fast and no one can predict the future. That is why we sometimes need to redesign rather than improve existing modules. The new three-stage process looks like a natural way to address the potential compatibility issues. As for the maintenance overhead, it introduces some extra pain, but I think it is necessary: the MLC project will not survive without innovation.

Regarding 2, yes, this concerns me. Some modules never enter their S1 stage but remain in the codebase and may cause maintenance trouble (it is also noteworthy that there are already many obsolete modules in the TVM codebase). However, the S0->deprecation path seems to solve this elegantly (because the proposal requires modules to be self-contained in a standalone folder). My other suggestion is to only allow modules with "core" functionality into the mainline (e.g., officially maintained dialects such as TensorIR, and infrastructure).

For the many dialects whose authors have no motivation to sync with upstream, it would be better to make TVM more modular and standardize TVMScript (like MLIR) so that users can develop their own dialects without forking the TVM codebase, maintaining them at their own pace. Admittedly, this is beyond the scope of this RFC.

1 Like

This is a good summary. It is crucial to embrace new ideas and technologies; this proposal is more about the scope of impact and, accordingly, the level of open-mindedness in decisions around module establishment.

I also like the explicit definition of S0 modules. It doesn't conflict with high quality or longevity.

2 Likes

I would love to echo the point that Siyuan has made. There is no necessary correlation between the scope of a module, its positive impact on the stack, and the code quality or perfection of the module.

Pursuing perfection is of course ideal, but it is a subjective matter, and we all need to admit that we are more or less imperfect from the perspectives of different people. That is why we are discussing the scope, rather than trying to fully pin down all the S1/S2 questions at once.

In terms of making progress, as a matter of fact, there is a high possibility that the proposed S0 modules are better for the evolution of TVM, in many people's view, judging by the responses in this thread. On the other hand, when looking at cases of subjective S1 reasoning, whether due to a lack of early evidence or simply personal opinion, it is important to admit that we may be imperfect and technically wrong, especially in less grounded conversations.

Therefore, empowering S0 modules would be an excellent way to make us a more welcoming and evolving community, while retaining the promise of stability for existing modules. And as long as there is clear positive evidence in some aspects, in practice it should not be mandatory to immediately pin down all future decisions and couple them with S0 modules.

Couldn't agree more. This is the vision all of us should have in mind, especially core members like the PMC, committers, and reviewers.