You’ve listed a number of major features added to TVM in the past couple of years there as optional modules. It would seem to me then that this process RFC would actually become the default process for new components in TVM. I’m not sure if that’s the intent of the RFC - I would hope it isn’t.
Correct me if I am wrong based on what you posted here:
- You agree on optional module,
- languages of llvm/gcc are optional(not Clang), and they are user-facing,
- We should set a proper bar for new optional module, considering maintenance issues.
IMO, all of us can align on 1 first.
I don’t get your point. How would we know these modules will become “major features” at that time?
When something is new and unstable, what do you it want be: 1) just go die; 2) default process of mainstream, 3) optional moudule, 4) other short answer from you?
I agree with the idea that there are some components in TVM which genuinely have no larger design implications and don’t fulfil an essential use-case for the compiler which could be subject to this special process as outlined in the RFC. Such components would never intend to reach the S1 state.
My concern is that because we’re not defining what optional means, it’s completely open to interpretation. Taken to the extreme, you can even say Relay is optional because you could write an alternative way to get to TIR from the frontends.
As a hypothetical, let’s say BYOC was merged in S0 state but one part of the community strongly objected to the design. Then loads of people started using BYOC in S0 state and came to rely on it. Now we want to move to S1, but the design was poor so part of the community still objects and blocks it (vetoes now binding). We are left in a weird state where we have S0 code in main which is widely used but can’t become S1 quality because S0->S1 requires a different process.
I think unless I’m missing something with my hypothetical example above, only option 2 is long-term stable. It will have to go through the default process to reach S1, and I think it’s desirable that all code either eventually reaches S1 or deprecation for maintenance purposes. The other option is that this becomes the default process for all RFCs so such vetoing couldn’t occur in the S0->S1 transition.
It is helpful to have a discussion and that is what this thread is about. Trying to address a few focused items
Collective Shared Vision
It is important for all contributions to articulate their relations to the shared vision of the community. The particular proposal still requires the RFC to be published in normal RFC format, where the rationales and alternatives are being discussed.
But what defines collective shared vision? and how can that evolve to serve G0 and G1? This proposal addresses that question. What if the majority community agrees with a vision, but some don’t due to relatively subjective reasons?
When many are talking about the “bar”. I think we are not talking about code quality necessarily, but our overall thought process in bringing consensus around S0, S1, and S2 modules.
We would certainly benefit from more open-minded inclusions of S0 modules if they get majority support from the community. While S1 and S2 would bring more deliberation overall.
No Duplication
I would like to strongly call out that while most OSS projects encourage discussions about relations with existing modules (which we also do in the proposal), no duplication is a very strong term, can be used subjectively, and is unwelcoming. The specific example of “duplications” (of FX and TorchScript, TOSA and other graph IR, Spark/GraphX, and SparkGraph) all shows that common OSS projects allow some perceived duplication (of implementing similar functionalities) Want to highlight that it is not “simple duplication of code”, which of course should be avoided when possible and encourage reuse.
Code Quality and the Bar
I would like to call out that some of our discussions have less grounded implied messages in the discussion.
“If you do not do X, then you produce low-quality code. (or other bad consequences)”
This kind of implication is unwelcoming to the community members.
The result of such implication would mean that the community goes else-where to places that welcome them and can foster a culture of inclusion while shepherding that gets S0 modules faster, and sometimes due to the given agility to consider different architecture solutions, bring HIGHER quality code overall through better architecture. We have seen this, for example, in some of the ease of use enabled by TorchFX.
Everyone thrives on writing high-quality code. As stated in the proposal, the code patches and changes are subject to the same reviewing process and standard by the committers. Code changes are usually more grounded, as we can say X has a bug or write test coverage, which most people agree with and will act on. Having grounded code quality discussions helps us to make real strides toward a better codebase.
S0-stage modules can be high-quality in terms of robustness and the ability to handle the problems that they are designed for – which are needed by some(but not all) of the community members. They provide values and can be maintained in the long term (assuming the contributors continue to do so, and otherwise triggers deprecation). To be clear, the code patch follows the same process with grounded discussions and the proposal does not suggest a lower standard in the code reviewing process.
The proposal focuses on module establishment. Discussions around S0-module establishment can be subjective and less grounded than suggestions in code changes (where we can simply focus on whether a code has bugs, or suggest a concrete alternative implementation that can be acted upon in order of days).
Things become less constructive for the community when we hold S0-module due to subjective reasonings. That causes alienation of community members, and the set of users who need the S0-module do not get empowered. It is also a missed opportunity for the community to get to the S1 stage since the enthusiastic contributors don’t feel empowered (even if the majority agree).
So the proposal merely calls out the different levels of open-mindedness in the evaluation S0, S1 and S2 changes. Which reflects the practices we experience in different open source projects. The “bar” does not directly tie to code quality, but the open-mindedness in establishing an S0-module that many welcome and need, but do not necessarily need by ourselves.
On the contrary, we often see positive cases of code quality due to such open-mindedness thanks to the agility to be able to consider different architecture solutions, which brings HIGHER quality architecture throughout the process of S0->S1(since nobody can predict everything in the beginning). For example, we have seen this for example in some of the ease of use(e.g. TorchDynamo) enabled by TorchFX that were not part of the initial proposal.
Module Maintenance and Longevity
The longevity of a module depends on the following factors:
- Whether or not it clearly solves the needs of some of the community members.
- The level of maintenance, which is signaled by the proposal term: a clear set of maintenance and S0 modules can be triggered for deprecation once there is lack of maintenance.
- The fact that a module is at the S0 stage only means that it is well scoped and does not affect its longevity. It would affect maintenance, in the case of S1 such cost is clearly scoped and with easier deprecation.
The above are reflected in the proposals. The topic of longevity is also subjective and an evolving topic. As modules evolve, we get more clarity on the opportunities and help us to make more informed decisions along the way. This RFC also clarifies the boundary of such transitions S0->S1, which allows us to have more avenues to make informed decisions at the corresponding time point.
Again the proposal aims to set up guidelines for us to consider S0 contributions with an open mind when there are subjective technical disagreements. In such cases, we need to acknowledge that each of our understanding may not be perfect, and it could be possible that the S0 proposal actually is “higher quality” and solves problems in the right way, given the majority of the community thinks so.
Notably, the S0, S1, S2 only have things to do with the scope of the change, as a result, the impact on the rest of the modules, and different levels of due diligence (and open-mindedness) in establishing the module (that motivates this proposal). S0 modules can have high quality, be maintained for a long time to serve user needs, and be cohesive with many other modules. They just benefit from more open-mindedness due to their well-scoped impact helps to empower community members.
In such cases, we should strongly consider community over code, as it is always a win-win for the community to bring in new contributions in a contained manner.
Thank you everyone for great feedback so far. We would like to incorporate some of the suggestions
Terminology Clarification
We agree that it is helpful to clarify the term and avoid confusion. Moving forward, we will use S0-module instead of “optional module” as that can be confusing and leads to different implications as they mean different things for different people
An S0-stage module is a module that:
- Clearly isolated in its own namespace.
- Motivated by community use cases needed by some(but not all) users in the community.
- No disruptive change to the rest of the codebase
- Can be easily deprecated by removing the related namespaces.
- Can be turned off through a feature toggle to contain the overall dependency from the rest of the modules.
Duplication
We agree it’s good to reuse the existing codes, but it’s also reasonable for modules at S0 stage to have some necessary overlap. So, we add following line to the RFC:
- We expect discussions of the relation of the proposed module with existing ones and reuse when possible, but we do not enforce hard no-overlap rules at S0 stage, as most OSS projects do not require modules to have zero perceived overlap.
How S0-module fit into TVM
We also clarify that we do encourage discussing S1 and S2 opportunities that the proposed module can bring to the overall project in the RFC. But also acknowledging that not every decision can be made in an informed manner at a single shot and we would benefit from open-mindedness for S0 modules.
- We encourage discussions of possible paths of S1 and S2 level change in the RFC to help us set up the context for future evolutions. We need to acknowledge that not all S1 and S2 level decisions can be made at the beginning, and it is not necessary to force an uninformed decision at S0 stage. As a result, we would encourage open-mindedness, acknowledging that we all can be subjective in making effective calls at the S0 moment and bring community over code in this case. As a result, we do not block RFC for S1/S2 discussions, knowing that we can have an open mind due to limited scope and further evolution and developments can help us to make more informed decisions
Finally, it is skipped in many of the discussions here. We would like to call out the benefit brought to community here. We get more community members who are eager to contribute to S0 and also other parts of the TVM ecosystem. In the meantime, S1 level stabilities also address the need of community members who want to enjoy relatively stable, robust and reliable code (G0). By empowering both through S0 for G1 and S1 for G0, we will be in a much better position in better code.
To help facilitate future discussions.
I would suggest that we use the term “S0-module” for now instead of “optional module”.
I would also like to call out that the S0, S1, S2 in this proposal only have things to do with the scope of the change. as a result the impact to the rest of the modules, and different level of due-diligence (and open mindedness) in establishing the module (that motivates this proposal)
S0 modules can have high quality, being maintained for a long time to serve user needs, and be cohesive with many other modules.
They just benefit from more open-mindness due to their well-scoped impact to help us to achieve G1. Also knowing that through development we will gather further information so we can make informed view about S1, S2, which have their own RFC to ensure G0.
As been suggested in earlier part of the threads, S0 modules should be able to be turned off through a single toggle. Here I’d like to update the S0 module definition by adding:
- Can be turned off through a feature toggle to contain the overall dependency from the rest of the modules.
The proposal is also updated. Thanks for everyone’s constructive discussion so far!
I advocate this proposal as it provides a way to progressively land and stabilize new features, without breaking existing functionality and APIs.
I read the entire thread and feel the common concerns (from @areusch @leandron @tkonolige ) are:
- maintaining overhead
- whether those optional modules at the S0 stage could enter S1.
Regarding 1, as @Hzfengsy pointed out, PyTorch maintains several compiler infrastructures at the same time and they share some functionalities (you can check this slide presented by PyTorch Compiler team at MLSys this year). I understand as an infrastructure layer TVM wants to keep stable, however, Machine Learning Compilation is different from all existing compiler development experiences because the workload changes too fast, and no one can predict the future, that’s why we sometimes need to redesign rather than improve over existing modules. The new 3-stage looks natural to solve the possible compatibility issue. Regarding maintaining overhead, it introduces some extra pain but I do think them necessary. MLC project will not survive without innovations.
Regarding 2, yes this makes me concerned. Some modules never enter their S1 stage but they exist in the codebase and might cause trouble for maintaining (it’s also noteworthy that there are already many obsolete modules in TVM codebase). However, the S0->deprecation seems to solve this elegantly (because this proposal require the modules to be self-contained in a standalone folder). My another suggestion is to only allow the modules with “core” functionalities to enter the mainline (e.g. officially maintained dialects such as TensorIR and infrastructures).
For many dialects whose authors have no motivation to sync with upstream, it’s better to make TVM more modularized and standardize TVMScript (like MLIR) so that user can develop their own dialect without forking the TVM codebase and maintain them at their own pace. Actually, this is beyond the scope of this RFC.
This is a good summary. It is very crucial to embrace new ideas and technologies, this is more about scope of impact, as a result level of open mindedness in decision around module establishment.
I also like the explicit definition of S0 modules. It doesn’t conflict with high-quality and longevity.
I would love to echo the point that Siyuan has made. There is no necessary correlation between the scope of the module, its positive impact over the stack, and the code quality, or perfectness of the module.
Pursuing perfectness, of course, is ideal but a subjective matter, and we all need to admit that we are more or less imperfect from the perspectives of different people, and that’s the reason why we are discussing the scope, but not able to fully pin down questions on S1/S2 discussions all at once.
In terms of making progress though, as matter of fact, there is a high possibility that the proposed S0-modules is better for the evolution of TVM, in many people’s PoV according to the response in this thread. On the contrary, when looking at such cases about subjective S1 reasonings, due to lack of early evidence or simply out of personal view, it is important to admit that we are possibly imperfect and can be technically wrong, especially in light of less grounded conversations.
Therefore, empowering S0 module would be an excellent way to make us a more welcoming and evolving community, while retaining the promise of stability of existing modules. And as long as there is clear positive evidence in some aspects, in practice, it should not be mandatory to immediately pin down all future decisions and couple them with S0 modules.
Can’t agree anymore. This is the version all of us should have in mind, especially for core memebers, like PMC, committers, reviewers.
Thanks, everyone, for the discussions so far on this thread. Here are a few things that can help future discussions:
- I would encourage us to avoid making implications, “e.g. we have to do X; otherwise, it is not achieving goal Y and then bad consequences.” Instead, coming back to goals and being open-minded could go a long way.
- We all strive to build a welcoming, inclusive community. Through the discussions, let us be empathetic, welcoming, friendly, and patient. Let us focus on solving issues(of achieving both G0 and G1), and avoid blaming people or groups.
Finally, I would encourage us all to think about the community over the code principle and encourage everyone to put the community into the picture. Process serves the community this conversation is important for us to be viable as a community and as a project. thanks everyone’s inputs so far and thanks @Hzfengsy for driving the conversation.
How does the proposal fit into TVM and planning
Thanks, everyone for the discussion so far. One G0-related action is to discuss how an S0-module proposal fits into the project and overall planning. We agree that such discussions about fit and planning are important.
In the meantime, many OSS project modules also do not pin down all the detailed plans from the very beginning. Instead, they usually start with some clear positive signs in some area, then gradually expand to more as the project evolves. For example, TorchFX started with the ability to build quick rewrites(quantizer) and then moved on to support Dynamo compilation later.
As a result, would love to see what we think about the following text (also included F0 as a comparison). NOTE: It’s not a formal vote, but only to see feedback from the community
-
F0: There should be discussions about how the proposal fits into the project. We expect the discussion to bring pin down to all future S1 and S2 aspects.
-
F1: There should be discussions about how the proposal fits into the project to bring clarity. We also acknowledge that not all S1, S2 level decisions can be made at the beginning. Additionally, an S0-module should show a clear positive fit to some(but not all) aspects of the project and clear cohesion to some of the existing modules. As the development evolves, more discussions will happen in future RFCs with additional evidence to help us make informed decisions.
0 voters
To me, the most important metric of the quality of a code is whether it is useful and solves problems faced by some part of the community.
If it indeed solves problems and provides new features people want, then I will regard it as good code, even if they use _ and __ as variable names (forgive me if this sounds a bit extreme to you). You can always iterate over the code from 1 to 2 to 3 and so on, but the most important step comes from 0 to 1, which I value the most.
When I crafted and merged the initial version of TVMScript ([RFC] Hybrid Script Support for TIR, https://github.com/apache/tvm/pull/6227) into TVM, the code I wrote is hard to say to be in satisfactory industrial quality (at least from my perspective) when I was only a 3-year undergrad and has little experience of research and engineering, and it has been substantially refactored since then by many other community members.
I would say TVMScript is still an optional module now because users can choose to use TVMScript if they are interested in it, or they can simply just ignore it without modifying any line of their codes. And I believe this can be a typical workflow for other optional modules, which is principally reflected by this RFC.
Actually I am a bit surprised by this RFC because I thought it was a de-facto existing develop work flow in TVM. AutoTVM, Ansor, MetaSchedule, TVMScript are all optional modules IMO. Even for the compilation module which lies in the core of TVM, users can choose to use te or TensorIR which provide two indepedent APIs to describe computation and schedules.
Maintenance is a typical engineering problem faced by many projects, many of which are 10x larger than TVM. I would say as long as it’s an engineering problem, it can be solved with strong and broad community users and developers, given the fact that there are many much larger and longstanding projects like Linux that are still under active maintenance.
The key to such success is that we do have a persistently existing community along with the evolution of new technology, the emergence of new problems and new challenges, and the proposal of new and even unmature solutions. It is in some sense much more difficult than maintenance efforts because machines and codes are under control while people are not.
I would like to follow up on this since I was not straight on this, and I apologize for not calling it out directly. There was a post that singled out a particular group in the discussion. They are not helpful to our discussion at all, and we should stop bringing up conversations that single out a group. The related posts are flagged for deletion. As a community, we follow a common code of conduct and aim to have civil discussions.
There are many productive comments on this thread, and let us continue to have civil and constructive discussions.
Overall Process Compatibility
I would also like to report back here after doing some research on process compatibility. Since that was another concern being raised in this thread.
ASF allows projects to update their process according to the community needs, which was also the motivation behind the original process RFC
All code patches to S0-modules follow the same reviewing process, and it is reasonable as most comments on code changes can be quite grounded with clear technical reasoning (this code has bugs that can be fixed in this way).
S0-module establishment is closer to the establishment of a submodule. Such creations are usually not typical code changes since they are subjective and can not state grounded reasons “(opens a security exposure, negatively affects performance, etc. )”
For similar reasons, other Apache projects also define their process of creation of a new submodule that follows some kind of majority. For example, Apache Hadoop process and Apache Hive Bylaws specify that any submodule(subproject) creation follows 2 / 3 majority. Note that these previous cases can apply to both S0/S1 modules indifferently.
Thanks for the feedback from the community. As the majority of the community members agree on F1
, I’d like to update the proposal according to the community’s opinion. The following statements are added:
Thanks everyone for the great input in this thread. We all agree about the importance of G0 and G1. We have different thoughts on how to get to the Gs. Knowing that some of them can be subjective, a lot of our discussions come back to the level of open-mindedness when considering contributions wrt to their scope. I would also encourage all of us to come back and think about the community in such cases.
Thanks to everyone’s constructive inputs that help to improve the proposal, in some cases when we have different opinions, we choose the phrasing that most of us agree to.
Now, the RFC is post on the tvm-rfcs. Discussion in this thread and comments on the RFC repo are all welcomed.