[Process RFC] Empowering New Scoped Modules in the Project

Background

Machine Learning Compilation (MLC) is an emerging field under rapid development. With tremendous help from the whole community, it’s exciting to see that TVM delivers significant value to developers and responds to their needs, and has thus become widely popular in both academia and industry.

As a rapidly growing field, new needs inevitably keep emerging as new workloads and demands come in. For example, demand has evolved from static-shape compilation to dynamic-shape compilation, and from scalar code to tensor cores. As an early player in the field, we have led in some of the most important areas, thanks to our close collaboration and agile iteration on innovations.

Success comes from listening to the community’s demands. As one of the first movers in this field, aiming to build the project toward future success, it is important for us to keep listening and always have the following two goals in mind.

  • G0: Maintain stable solutions for existing use-cases
  • G1: Always be open-minded to new demands, land technical commitment timely, continue to reinvent ourselves, and welcome new members to the community.

G0 is important in the sense that we would like to keep making sure we do not create disruptions in existing code. In the meantime, enabling G1 in a timely manner helps us stay competitive and keep pushing the state of the art.

Definition: We categorize a new module as S0-module if it satisfies the following criteria:

  • Clearly isolated in its own namespace.
  • Clearly needed by some users in the community.
  • No disruptive changes to the rest of the codebase.
  • Can be easily deprecated by removing the related namespaces.
  • Can be turned off through a feature toggle to limit dependencies from the rest of the modules.
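To make the feature-toggle criterion concrete, here is a minimal Python sketch. The flag name `TVM_USE_<NAME>` and both helper functions are hypothetical, not part of any real TVM API; real projects usually gate optional modules at build time (e.g. via a CMake option), but the same default-off pattern applies:

```python
import os

def optional_module_enabled(name: str) -> bool:
    """Check a hypothetical TVM_USE_<NAME> environment toggle.

    Returns True only when the flag is explicitly set to an
    affirmative value, so the module stays off by default (G0:
    no disruption to existing use cases).
    """
    flag = os.environ.get("TVM_USE_" + name.upper(), "OFF")
    return flag.upper() in ("1", "ON", "TRUE")

def load_optional_module(name: str):
    """Load the scoped module only when its toggle is on."""
    if not optional_module_enabled(name):
        raise ImportError(
            "Optional module '%s' is disabled; set TVM_USE_%s=ON "
            "to enable it." % (name, name.upper())
        )
    # A real implementation would import the module's own
    # namespace here; a placeholder keeps the sketch
    # self-contained.
    return "<module %s>" % name
```

Because the toggle defaults to off and the module lives in its own namespace, deprecation reduces to deleting the namespace and the flag, which is exactly what the criteria above aim for.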

Common practice in most projects is to introduce improvements in different phases.

  • S0: as being defined in this proposal
  • S1: Evolving the overall solutions to make use of the new component.
  • S2: Deprecation of some existing solutions or evolving the solutions.

Notably, not all changes have to be scoped as S0-level changes. There are many features that involve S1-level changes, which can also be evaluated as part of the RFC process. Nevertheless, having clearly phased development helps us advance both goals.

Keeping both goals in mind, it is important to enable a mechanism for the community to welcome new scoped modules to the project. Enabling new modules is one way to quickly enable G1 while keeping the existing G0 part stable. This is a common practice established in Apache and non-Apache projects. For example, Apache Spark initially started with an optional module, GraphX, for graph processing, and follow-up improvements then came along the line of SparkGraph. MLIR enables different improvements as dialects, such as TOSA and Torch-MLIR. PyTorch enabled a new graph-exporting mechanism named TorchFX while also maintaining TorchScript for other existing use cases.

In those past practices, the new components were introduced as scoped modules with minimal changes to existing ones. Notably, there can be perceived overlap with some of the existing components, e.g. Torch-MLIR contains features around computational graphs similar to TOSA, but also brings orthogonal improvements to the overall system. As a related example, TorchFX certainly has overlapping features with TorchScript, but also brings in new capabilities. While not all of them are ASF projects, they are successful practices that have enabled open source projects to thrive in a field similar to ours.

As in the practices of other machine learning projects, there can be some level of duplication or missing features compared with existing components (TorchFX vs. TorchScript, TOSA vs. other MLIR graph IRs). Following the same practice as those related projects, one major principle in Apache is to empower communities by enabling optional components when they do not affect existing workflows. Empowering S0 through scoped modules is a win-win for the community: it brings in new aspirational members who are willing to collaborate and deliver their best for the community. This way, we keep ourselves up to date and grow stronger. Failure to do so, on the other hand, could discourage community members, and we would lose valuable contributions and opportunities to grow in this rapidly developing area.

The type of modules can include, but are not limited to:

  • IR dialects, such as MLIR’s TOSA (while there are other graph IRs) or TorchFX (while there is already TorchScript).
  • Vertical flows that leverage some of the dialects.
  • Backends/frontends.
  • Other types of modules introduced in a self-contained namespace.

S0 changes would be contained in their own namespaces, with possible integrations also built inside those namespaces. There can be follow-up steps (S1), such as making a dialect broadly accessible in existing compilation flows. Importantly, further S1/S2-level changes would require separate RFCs and longer deliberation for G0. S1 discussions would also give the community a floor to talk about where broader areas of the project are going, through the longer deliberation needed to maintain G0. Clearly identifying and empowering the S0 stage helps us enable improvements quickly, bring energy to the community, and empower a broader set of users, while not disrupting existing use cases.

Proposal: Empowering S0 Modules

In this process RFC, we’d like to propose a process to encourage S0 modules and set expectations about what we anticipate in such inclusions.

Note that this RFC focuses on the S0 stage. We propose the following guidelines to expedite the process while ensuring quality and community support:

  • More than three PMC members endorse the S0-level proposal to ensure that there are enough eyes on and shepherding of the module. The decision to establish an S0-level module needs majority support from the PMC.
  • The code changes of S0-level modules follow the normal code review process, as in all other modules in the codebase.
  • A clear set of community members is committed to maintaining the proposed module with technical support; quantitatively, more than three endorsing committers who can serve as the initial owners.
  • No implication that everybody has to immediately work on or switch to the new S0 module.
  • We expect discussions of the proposed module’s relation to existing ones, and reuse when possible, but we do not enforce hard no-overlap or zero-duplication rules at the S0 stage, as most OSS projects do not require modules to have zero perceived overlap.
  • Clean isolation of changes from existing modules; when a change touches existing modules, it should be discussed separately.
  • In discussions of an S0-level RFC, maintain a clear separation from S1- and S2-level decisions in later stages, so we can encourage S0 changes early while enabling informed decisions at the S1 and S2 levels in continued discussions as the modules evolve in the ecosystem.
  • There should be discussions about how the proposal fits into the project, to bring clarity. We also acknowledge that not all S1- and S2-level decisions can be made at the beginning. Additionally, an S0 module should show a clear positive fit to some (but not necessarily all) aspects of the project and clear cohesion with some of the existing modules. As development evolves, more discussions will happen in future RFCs, with additional evidence that helps us make informed decisions.

After the RFC discussion period, one of the PMC members would serve as a champion, provide a clear technical summary of the state and the pros and cons raised during discussions of the S0-level proposal, and suggest a path forward. The champion would also continue to drive the overall process of code upstreaming and follow-up discussions.

Transitions of S0-level module. After an S0-level module is established, it can undergo the following possible transitions:

  • S0 → deprecation: When an S0 module no longer has an active set of maintainers, the module will be deprecated. Removal of the module is easy, as it is contained in its own folders and no other modules depend on it.
  • S0 → S1: When developers propose to incorporate an S0 module broadly into existing flows/components. Each such composition would require its own RFC and would follow the existing RFC process.

Questions for Discussion

  1. Would love to see other suggestions on encouraging new contributions

+1. (Wearing the Apache hat) As an open source community, being open-minded about adding optional modules without affecting existing functionality is a natural thing. I don’t think it’s anything controversial.


Thanks for this proposal @Hzfengsy, I don’t think there’s anything contentious about providing a way for optional modules to be added into TVM and that it is often the best way forwards. I do have some concerns with the process as proposed though:

I think the existing RFC process allows us to have some degree of control in balancing G0 and G1. At the same time, can you clarify what you mean by “land technical commitment timely”? I’m questioning that because I don’t think we should make this perceived time pressure (implied in many points in the text) the main decision point in technical discussions.

This point interests me a lot, as I think we should be careful not to use the reasoning of “other projects do it this way”, especially when justifying things that can be perceived as gaps, if we can strive for a higher (yet healthy) quality bar in our project.

I don’t think we should accept duplication or missing features by default as a generally good thing; we should be asking ourselves why a contribution is coming into the code base, and what the outlook is for it to become a full feature in the stack.

I agree we should be as welcoming as possible to new features and contributors, but to make that fair and sustainable for a community-maintained project, contribution proposals should do the due diligence of seeking consensus on how they fit long term into TVM.

Can you expand on why it’s necessary to create a new process here on top of the Apache voting process? My concern is that if we create this new process, the overarching TVM project could become fractured as these trios can effectively operate as their own sub-project without any of the broader governance.

I think this is a standard piece of our existing RFC process and we don’t necessarily have to duplicate it here?

Can we add something that states the burden of proof is on the author of the optional module to show that it cannot be done within existing modules?

My fear here is that it is usually harder to rework existing code than start something without all the existing features, but for the long-term health of the TVM project we should encourage integration into existing modules and support people in discovering how best to rework their solutions to maximize the benefit it can have for our users.

Sorry but I disagree here, when proposing new modules we should be considering the longevity of the TVM project as a whole and how it impacts the diverse range of contributors; I would suggest we only land modules in the TVM source tree which have clear longevity.

Finally - again, I think this discussion is very relevant and I’m glad you posted this - but I’m feeling there is a missing analysis on what are other approaches that we could adopt as a community to deal with optional modules - was there any investigation on that regard?

Also, I’m equally interested in seeing this document slightly expanded with a section describing what you and others perceive as risks of the proposed approach, examples:

  • “a mid-S1 stage contribution doesn’t materialise as something the community wants to maintain, and therefore TVM is left with 2 incomplete alternative ways of doing something” - how should we proceed in that case?

  • How can this process be evolved to help us tackle an ever-growing puzzle of features that is hard to reason about, such as “you can only use tuner T if you use frontend F1 and F3, but because it uses optimisations provided by module X, it can only be executed on runtime R4”?

Thanks @Hzfengsy for the proposal! I’m broadly supportive of admitting new optional modules to the project.

My main question comes around maintenance, CI and releases. In general I think it is healthy for our project to welcome new contributions. At tension with that desire is the maintenance overhead of updating and fixing tests that are introduced in such new optional modules, plus the burden of supporting them in our official releases.

Most new modules tend to be a bit rough around the edges in terms of robustness and documentation–this is nothing bad and a natural part of how new components are developed. My concern arises when a piece of the underlying project is changing, and the person making that change doesn’t know how to update the optional module or its tests. I see in your proposal that you address this with

This means to me that at the time of adoption, the new module has a support community that could help with situations like the one I posed above. However, as time goes on, that situation may change–those folks who originally volunteered to support the new module may become busy with other things. What’s not clear to me, if we adopt as project policy that we welcome any new optional module of interest, is how we should treat optional modules that have tended to bitrot.

Because of those reasons above, if we’re going to adopt an official process for accepting new modules, I’d like to encourage us to round out this proposal by expanding it to consider the full life cycle of such new modules, e.g.

  • What should happen if the maintainers of an S0 module disappear, as noted above? What about an S1 module?
  • What is an S1 module allowed to do that an S0 module cannot? Same question between S2 and S1.
  • What is the process by which an S0 module moves along to S1? S1 to S2? What implications are made by those transitions?

Thanks @leandron for the reply. I’d like to answer or explain some of your questions here:

Note there is no time pressure factor in the proposed guideline and they should not be applied to making a particular decision.

We are talking about G1, which is the motivation and goal of the project’s development and of this process RFC, so we can improve G1 in general. We do need to take these motivations into consideration, realistically; otherwise, we won’t be able to survive in the competitive ML landscape.

Learning from existing lessons can be very valuable and they help us to ground our thoughts – which are usually subjective.

Acceptance comes with a set of criteria that are outlined by the proposal. S0-stage modules are already self-contained and are useful by some (but not necessarily all) of the user communities. As a result (by considering the community over code principle) we should find ways to empower these community members.

It is indeed helpful to discuss the outlook and future evolutions. They should be part of the S1 stage discussion. We, however, cannot capture all the possible futures in the S0 stage.

Notably, future evolutions also come as the project evolves, each with its own S1-stage RFCs. As an example, when TorchFX was first introduced, it had a limited scope of supporting the import of graph modules (S0). It was later expanded to support generic frontend compilation mechanisms such as TorchDynamo.

It is important for us as a community to refine the RFC process to empower the community to contribute to the project. Many motivations are discussed in the original RFC as well.

Importantly, most S0 modules usually reuse existing components (though not all of them). So there is indeed a shared community (different community members cover a subset of possibly overlapping sub-areas, but not all of them). It is also a common practice in existing projects to have code owners who take an interest in some of the submodules. The entire project is still governed by the same set of people. A healthy project containing a mixture of S0 and S1 modules can help us maintain the overall community.

We also need to avoid another risk: pushing out community members who need S0 modules and lack community empowerment overall.

It is important to discuss the relation of the S0-module with existing modules. Notably, some of the technical assessments can be subjective and different people have different opinions.

No duplication is too high a bar for S0-stage modules, and explicitly stating such a burden of proof can be unwelcoming to community contributions. Many assessments of whether X can be done in existing modules are subjective. They are important conversations to have at the S1 stage, where the bar is certainly higher. In cases where there is some subjective contention in the assessment, it is important to take a step back and consider the community impact (community over code):

  • Does the introduction of the module empower additional community members?
  • Would the introduction of the module disrupt existing flows?

If the answer is yes to the first question and no to the second, it is common practice to bring the module in from an empowerment point of view, of course with the rationale, pros, and cons clearly stated.

Taking existing practices as an example. TorchFX does not initially have all the functionality that TorchScript has. Similarly, SparkGraph’s functionality can be perceived as being similar to GraphX. Nevertheless, the respective communities welcomed those changes without asking for “burdens of proof non-overlapping”.

This is part of G0. It is an important aspect, and that is why we have S1-level evolution. This particular RFC takes BOTH G1 and G0 into consideration. The health of the project also boils down to the health of the community, which encourages a diverse set of community needs. S0-level changes already bring benefits to users. Without some of the S0-level empowerment, we run the risk of losing the corresponding community contributions and, as a result, have no chance of S1-level improvement at all.

Longevity is a subjective matter. S0-level changes usually already benefit a clear set of users, and S1-level changes would help enhance that further.

The proposed guideline ensures enough maintenance (PMC members and committers) to sustain the development. Think of it another way: if a module

  • Is clearly needed by some set of community members
  • Is maintained and supported by a clear set of committers
  • Does not disrupt other existing modules in the codebase

then the endorsement, commitment to maintenance, and community needs certainly justify its inclusion and sustained maintenance (and, as a result, longevity) at the S0 level.

S0-level contributions are already useful to the community, as stated. We should of course push hard for S1-stage changes, but note that there is usually a process. As in other projects, TorchFX and TorchScript still co-existed in the codebase and catered to different users. This is something we can discuss and improve as a separate topic.

Notably, without S0 we don’t even have a path to S1. Note that this proposal does not add or remove such complexity; it is the particular RFCs that do. In terms of complexity management, an S0-stage module’s complexity is contained within its own namespace and, as a result, won’t lead to such compositions.

Thanks, @areusch.

The Apache community is based on volunteers, and we cannot simply expect everybody to maintain modules full-time. The answer here, again, is to grow and empower the community, bringing in new community members who are interested in maintaining the modules. When a module no longer gets enough maintenance, the community should collectively come up with deprecation plans and alternatives. This problem applies to all modules in OSS projects. Notably, it is easy to deprecate S0 modules as they are contained in a separate namespace (just remove the corresponding namespaces).

If another module starts to depend on and use an S0-level module (e.g. tvmc started to make use of the new module), then that would be an S1-level change, which would bring separate RFCs taking both affected modules into context. S2 is the most serious one, as it implies the removal of existing components, and would require broader discussions and ample advance notice. The bars for S0, S1, and S2 should be set differently so we can achieve both G0 and G1.

@Hzfengsy I understand your points, but I find it challenging to commit to the criteria proposed in the current text as the official guidance for the project.

I agree that perhaps we need some sort of guidance for proposals to be brought into the project, but I don’t quite agree that we should do that by increasing the risk of fragmenting our code base or lowering the bar for new modules.

So, given that writing this sort of guidance is something new to me, I went to learn from the projects previously mentioned in this thread: Apache Spark and torch.

It seems these projects are even stricter than ours with regard to bringing in major changes. Here is my investigation.

Apache Spark

To bring changes to Apache Spark, there is a process called “Spark project improvement proposals (SPIP)”, described in Spark Project Improvement Proposals (SPIP) | Apache Spark.

In summary, it defines two main roles connected to the SPIP: SPIP Author and SPIP Shepherd. The process requires a proposal to be raised, and one PMC member to be the owner accountable for that change.

It does require a roadmap and time estimate for the change to be brought in, as well as a risk analysis, which seem like reasonable things to ask for.

Furthermore, it states a very interesting set of questions, which I definitely think we should explore when improving our own processes:

Q1. What are you trying to do? Articulate your objectives using absolutely no jargon.

Q2. What problem is this proposal NOT designed to solve?

Q3. How is it done today, and what are the limits of current practice?

Q4. What is new in your approach and why do you think it will be successful?

Q5. Who cares? If you are successful, what difference will it make?

Q6. What are the risks?

Q7. How long will it take?

Q8. What are the mid-term and final “exams” to check for success?

There is also a set of Appendixes, which talks about design choices. I would like to highlight Appendix C:

Appendix C. Optional Rejected Designs: What alternatives were considered? Why were they rejected? If no alternatives have been considered, the problem needs more thought.

Reading the document, I can’t see anything related to automatically accepting something just on the merits of having a subset of the community who cares about it, nor anything resembling the proposed S0, S1, S2 landmarks.

Torch

Torch has a slightly more summarised set of words to describe their process, described at PyTorch Governance | Mechanics — PyTorch 1.12 documentation.

It states an example with 4 steps, specifically highlighting authors to think about cost of maintenance when new changes are brought in, as well as asking for a plan for “maintainership, development and community”.

  1. Interview researchers / stakeholders, talk to community, gather issues;
  2. Read papers, attend conferences, build example pipelines based on experience;
  3. Create a state of the world - make sure this change is necessary, for example adding a new project or module is worth the maintenance cost; or removing a project or module will not remove too much value from PyTorch;
  4. Create a proposal; the proposal covers the maintainership, development and community plan once the proposal is approved.

The core maintainers take final decisions on the proposal, articulating the reasoning behind the decision publicly.

Apart from that, there is a very interesting remark, which connects to the shared-ownership nature of OSS, which is something I think we should also take into consideration when thinking about early-stage contributions.

It says:

  • There is a high bar for new functionality. Unlike in a corporate environment, where the person who wrote code implicitly “owns” it and can be expected to take care of it for the code’s lifetime, once a pull request is merged into an open source project, it immediately becomes the collective responsibility of all maintainers on the project. When we merge code, we are saying that we, the maintainers, can review subsequent changes and make a bugfix to the code. This naturally leads to a higher standard of contribution.

Source: PyTorch Contribution Guide — PyTorch 1.12 documentation.

So after reading this, I again don’t see these criteria reflected in the documentation linked by those communities. Do you have other communities/projects that are more aligned with the examples given?

Finally, I think I still stand by my suggestions from the previous post: we should keep a balance between early contributions and how said contributions fit into our long-term shared view of TVM as a project.


Thanks @Hzfengsy. I like the idea of encouraging optional module adoption. Setting aside the technical details (which of course are very important), I’d like to share some of my high-level thoughts from a community-building perspective.

We all agree it is very important for an open-source project to keep its community active. The way Apache does this is to keep bringing in new contributors from outside. For example, the “committer” role in an Apache project is not about how many lines of code one has put into the repository (though one of committers’ most important responsibilities is to merge code), but rather, in general, the fact that they have contributed and are willing to contribute. It’s not that you have delivered a lot of features and, as a reward, you’re granted the “committer” title. Instead, by inviting you to be a committer, we, as a community, acknowledge your contribution and encourage you to work more closely with us, in the hope that you will make more contributions and help more people join the community.

This system and philosophy have proven to be a huge success in keeping Apache projects active. There are indeed people who became committers and then faded away. But that’s not a big deal. If 1/5, or even 1/10, of the new committers eventually grow to be among the most active and important contributors in the project, it is a success.

The same goes for optional modules. I understand there is concern that these modules may decrease the project’s quality, or that the authors may leave. I agree we should keep the core modules at high quality. I also agree we should find a way to retire these optional modules if they become inactive, and merge them into the core only when they meet the high bar. But on the other hand, I believe there are more benefits than downsides to having a slightly different standard so as to easily adopt new, experimental modules and bring in new contributors.


I appreciate the spirit of this proposal, but I do want to echo some of the concerns voiced in this thread, such as by @areusch, that incorporating optional modules in mainline creates pressure to maintain them and keep them up to certain standards—it will send a message to the community if an experimental feature is in mainline. I think there should be a relatively high level of commitment if we want to have optional experimental modules in mainline (as opposed to developing them in a fork).

If we go with this proposal, I would favor having an explicit requirement that optional modules must be able to be turned off through a feature flag to ensure that they will not create new, (unintentionally) non-optional requirements.


+1 (Binding)

IMHO, it is great to lower the bar for new modules to encourage new contributors, as long as we can make them optional and they don’t affect existing functionality. Just like when we introduced meta-schedule/Ansor after AutoTVM, and Relay after NNVM: it didn’t mean we were abandoning/deprecating those previous modules (as recently as last month, new code was still being committed to AutoTVM). Instead, this is a natural way to keep the project thriving by bringing new technology and contributors in.

Even if a new module brings maintenance burden and no one is willing to take care of it, we can still vote to deprecate it. But it is not open-minded to assume from the very beginning that those contributors are untrustworthy and will disappear, which might block new developers from getting involved in the project.

2 Likes

Honestly speaking, we should be careful about:

  1. what kind of area is our project in?

  2. what other projects in this area are doing?

  3. what could we learn from them?

  4. what could we do more to attract more contribution?

The first question is very important. ML systems / ML compilers are a very active and competitive area. Many other projects are working very hard and quickly, like TorchFX/TorchScript in PyTorch or TOSA in MLIR. I think the key to TVM’s success (at least in some respects) is that we are very active and have core team members who work very hard; the other side is that we are very open: we encourage others to do fantasy research on TVM and then encourage them to contribute it back leveraging the open source and apache culture, for example ansor.

Now I think we are trying to answer the 4th question: what could we do more to attract more contribution? As we are in such a competitive area, we should create a more open and more formal process to help newcomers enter the TVM ecosystem easily. Core modules should be stable and high quality, I agree. However, for innovative research modules, we should still encourage them to come in; they could be optional modules, as this RFC proposes. Maybe their authors are not familiar with the TVM process; no worry, as this proposal suggests, the PMC could help them. Not every innovative project will be successful, but we should have the courage to encourage this kind of innovation. If it succeeds, bravo, we can make it into a core module and require it to be high quality; if it fails, no worry, we can vote to deprecate it.

innovation distinguishes between a leader and a follower – Steve Jobs


@yzhliu I think it is a fair assumption that people might move on in their careers and do something else. We, like any other community, should always be prepared to take care of what is in our repository, e.g. have tests to validate it and a clear idea of what we want to do with such contributions/modules. Therefore it requires a high bar to be part of any major project, agreeing here with what @slyubomirsky wrote above.

It seems the aforementioned projects (MLIR/TorchFX and others) don’t operate with the wide-open approach of “encourage others to do fantasy research on TVM and then encourage them to contribute it back leveraging the open source and apache culture, for example ansor.” Instead, they keep a high quality bar and a sense that code and discussions need to be sorted out before such a contribution is merged.

These seem like two different moments: first, when creating some work intended as research, and then, when it is ready to be contributed upstream to production-grade software, as TVM aims to be. We should encourage people to do their research on their own forks, not in our mainline branch.

IMHO, this shouldn’t be a discussion about “innovation versus quality” as if they were opposed to one another. I think the discussion should be similar to what other projects do in this sense, which is “innovation with quality”, and therefore, as demonstrated in my previous post, with high quality bar, similar to Spark and torch.

Another pointer to illustrate is a recent hundred-plus post discussion in MLIR about the level of quality for new dialects, which demonstrates that communities do care about quality before merges - [RFC] Proposal for a high-level ML dialect in MLIR - MLIR - LLVM Discussion Forums.

We agree that quality is important and is something that we would like to maintain. Quality is ensured through a clear initial vetting process during review. Note that once the RFC establishes an S0-level module, all future code changes to the module follow the normal review process with the same quality standard.

We observe that if more than three PMC members/committers who are familiar with the area have reviewed and endorsed a module, it usually already meets a quite high quality bar. We acknowledge that it indeed helps to clarify further. As a result, we add the following guidelines:

  • The decision to establish an S0-level module needs to get majority support from the PMC.
  • Code changes to S0-level modules follow the normal code review process, as in all other modules in the codebase.
  • The champion is expected to suggest the resolution by taking all the discussions and feedback into consideration.

The rationale is that discussions of alternatives during the establishment of a module can be subjective and less grounded than those on a code patch (e.g. “this code causes a bug and can instead be done in this way”). As a result, we encourage ourselves to keep open minds in our subjective technical assessments when establishing S0-level modules.

Another good suggestion is to clarify the transition between possible states of the module. We decided to add the following clarification:

  • S0 → deprecation: when an S0 module no longer has an active set of maintainers, the module is deprecated. Removal of the module is easy because it is contained in its own folders, with no other modules depending on it.
  • S0 → S1: when developers propose to incorporate an S0 module broadly into existing flows/components. Each such composition requires its own RFC and follows the existing RFC process.
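To make the containment property behind these transitions concrete, here is a minimal sketch (with hypothetical names, not actual TVM API) of how an S0 module could sit behind a feature toggle: the core only ever reaches the module through a guarded loader, so turning the toggle off, or deleting the module’s folder at deprecation, never breaks the rest of the codebase.

```python
import importlib
import os


def load_s0_module(module_name: str, toggle_env: str):
    """Load an optional S0 module only when its feature toggle is on.

    Returns the module object, or None when the toggle is off or the
    module's folder has been removed (deprecation). Callers must handle
    the None case, so the core flow never hard-depends on the module.
    """
    if os.environ.get(toggle_env, "0") != "1":
        return None  # toggle off: behave as if the module does not exist
    try:
        return importlib.import_module(module_name)
    except ImportError:
        return None  # module folder was deleted after deprecation


# Example usage, with a standard-library module standing in for an S0 module:
os.environ["TVM_ENABLE_DEMO_S0"] = "1"  # hypothetical toggle name
mod = load_s0_module("json", "TVM_ENABLE_DEMO_S0")
```

Because every call site already tolerates `None`, the S0 → deprecation transition reduces to flipping the toggle off and removing the folder.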

Note that things are also updated in the original RFC.

Thanks for the discussions so far. One thing that I would like to highlight again is that, when facing a topic as important as this one, we should take the community into account.

A lot of the discussion is around technical implications, which are important to have. In the meantime, we know that technical implications are usually best materialized with grounded examples (such as reviewing a single PR), and become more subjective in broader discussions.

We should recall the “community over code” principle: we can of course strive to write the best code, but there can be cases where we disagree on “what is the best way to write code”. In such cases, we should come back and think about the community. Remember that it is the community members who bring in the code; we should empower community members so they feel welcomed, and so we can have a diverse set of solutions to keep us competitive in the landscape.

Having a clear separation of S0, S1, and S2 helps us set boundaries and enables more people to collaborate, while acknowledging that we may sometimes disagree on how S0 should be done. Still, we want to empower people to bring things in as long as they do not disrupt the existing flow.

I would encourage all of us to answer the following questions:

  • How to empower the community throughout the process?
  • How to bring in new community members who are interested in contributing S0 modules?
  • How do we continue to empower others who need S0 modules, even if we disagree with some of the hows (which can be subjective), while the module is self-contained and does not disrupt existing work overall?

S0 doesn’t mean there is no code quality requirement; IMO such modules should still meet the same code bar as other components. We encourage innovative projects and hope they can come to TVM early, at the S0 level. However, this doesn’t mean they have to complete their incubation in the TVM main branch from 0 to 1: they are free to complete their prototype in their fork, reach the S0 level, and then send a pull request to TVM as an optional project. The key point is that we want to attract more and more S0-level people and projects to TVM. Currently we don’t have the concept of S0, S1, S2; everyone says that we should have a very mature and proven project before we can contribute it to TVM.


I’ll have limited time this week to keep replying here, so I’ll leave some suggestions. It seems @areusch and @slyubomirsky are also voicing some concerns with this proposal, so I feel this definitely needs more discussion.

Here are some suggestions, based on previous contributions in OSS projects.

  • How to empower the community throughout the process?

Do:

  • Give them visibility in community meetings to showcase their work, and welcome them into the community;
  • Advertise their work on the TVM website, with a section similar to this one in LLVM: The LLVM Compiler Infrastructure Project; perhaps we can have a page that showcases prominent research done with TVM;
  • Provide a clear set of rules on what they need to be mindful of when making the transition to mainline TVM;
  • Foster a culture where work can be contributed constructively to TVM, rather than always as a new module; this is very clearly highlighted more than once in torch’s and Apache Spark’s sets of rules.

Don’t:

  • Merge contributions into TVM main without any expectation of how they fit in the stack long term;
  • Intentionally merge something with a lower quality bar (as implied in many responses) for reasons other than technical ones. What we merge should be in the primary interest of the project, and that seems to be what all other OSS projects do.

  • How to bring in new community members who are interested in contributing S0 modules?

I think there is a balance here in terms of the perceived advantage TVM as a project gets from such an S0 module, and that needs to be reviewed on a case-by-case basis, rather than by a default rule.

I’d be very worried if we started bringing in modules when we are unsure where they fit in the long term.

  • How do we continue to empower others who need S0 modules, even if we disagree with some of the hows (which can be subjective), while the module is self-contained and does not disrupt existing work overall?

I’d argue that once something is merged into the codebase and enabled in CI, it has the potential to disrupt existing work; therefore it should meet quality and roadmap requirements at the same level as any mainstream OSS project.

I’m sure nobody here wants to work in a code base with lots of disjoint pieces of work for which we have no specific long-term plans. In fact, I think this might repel new contributors.

Can you expand on who “everyone” is? IMHO, contributions don’t need to be mature (we don’t even require this today), but they certainly need to be shown to contribute to TVM long term; that seems to be the common practice.

Finally, I’m not saying contributors need to do all the work to make their new features cover our whole existing stack (e.g. all runtimes / all frontends). However, I think we should be open to contributions that demonstrate a shared vision for where they fit, and that have at least a plan to gather the support needed for the community to be willing to expand them to cover the whole stack at some point.

I feel the community will be there to support contributions that invest not only in “getting into the code base”, but also in fitting in with the vision of the community they are contributing to. If potential contributors are not even willing to see where they fit into the stack, I think that is a warning sign for us as a community.

This is one concept being defined. Of course we don’t explicitly say you should prove your project is mature; however, when we contact many people using TVM (sorry, without permission I cannot list their company names, but I will try to get their approval later), they say “we want to contribute it, but we think our project is not mature and don’t think it is a good time to contribute it back.” I would conclude that the modules contributed via our existing pull requests are almost all mature and proven in industry (for example ansor, BYOC, etc.). The key point is that currently we have no S0 example showing that you could come in earlier. I think having the S0/S1/S2 concept and boundaries is very important, and attracting these modules as optional modules doesn’t affect existing functionality. This will make TVM’s ecosystem stronger and healthier, and we could also see more S1-level modules, which could make TVM the leader in this very competitive area.

S0-level projects are not something everyone has to care about, but they should still be related to TVM. Modularity is important: even today LLVM has many module projects, and they work well together; backend people don’t think much about the Clang analyzer module. The same goes for TVM: if we define S0, S1, S2, these modules are each part of the TVM ecosystem and can work well together.


Lots of great thoughts in this thread! Here is my 2p: it makes sense to define the processes around contributing new modules to TVM and, of course, to encourage as wide a pool of different contributors as possible. To do this successfully, we have to accept that the TVM community is quite diverse, so if we are talking about community over code, we have to take that diversity into account.

Besides TVM being very well suited for realizing the latest research in the fast-moving ML compilation world, there are also current (and hopefully future) users who value the usability, robustness, and reliability of TVM, alongside well-defined community procedures and a long-term view for the project. With that in mind, I mainly disagree with two points in the current proposal:

  • In discussions of S0-level RFC, maintain a clear separation from S1, and S2 level decisions in later stages so we can encourage S0 changes early while enabling informed decisions at S1, and S2 levels in continued discussions as the modules continue to evolve in the ecosystem.
  • We encourage discussions about S1, S2 level to set up context, but they should not be used as blocking factors to the RFC.

Since everything that is merged into main becomes a responsibility (with a maintenance cost) of the wider community, discussing the future of a module is important for the community to be able to decide whether it can be supported in TVM long term. A clear future roadmap for experimental features would also be a source of confidence for potential contributors who are worried about the production-grade usability of TVM. So I suggest we ask for an S1/S2-type roadmap to be included with the S0 proposal, similar to how other upstream communities do it, as @leandron mentioned above. Giving some thought to the implications of your work for the project you contribute to is not a huge ask, I think.

  • More than three PMC members endorse the S0-level proposal to ensure that there are enough eyes and shepherding in the module. The decision to establish a S0-level module needs to get majority support from PMC.

That sounds like overriding Apache procedures to me; I don’t think we should do this. For many contributors, TVM being an Apache project is one of the main attractions, since it gives some confidence that the community follows well-defined procedures, which allows different contributors to work with the community to achieve their objectives. Changing that has the potential to alienate current and future contributors. Also, I don’t think it is in any way necessary; surely we can innovate without overriding the Apache procedures and lowering the bar for new contributions solely because they are new. If Apache Spark and Torch can include new experimental modules without making it excessively easy to merge them into main, so can we :slight_smile:


Thank you for the proposal @Hzfengsy. I think it is important that we talk about how new modules are added to TVM.

There is a wide variety of people using TVM: academics, companies, programmers working on personal projects. Each has different uses and needs for TVM. I am going to share my perspective here.

I am someone who ends up spending more time maintaining and using features than contributing new ones. I value robustness and maintainability: robustness because my company uses TVM on a large variety of models and a wide range of device targets, and I need TVM to work across all of them; maintainability because I have to fix TVM when there is a bug in any of these many model-target combinations, and I have to understand code that is not mine and that I do not have much experience with.

From my point of view, adding optional modules (as proposed here) decreases maintainability and robustness. I see little benefit from keeping optional modules in tree unless they are in the process of becoming a regular module. I see the following issues with optional modules:

  • Optional modules need to have some integration into the codebase. If we leave the integration in the optional module then we essentially require the module contributor to duplicate a lot of code (with the issues mentioned below). If we put the integration into the main codebase then I see little point in having the module be optional. We’d have code in main that only exists because of said “optional” module.
  • If optional modules don’t require any integration code, then why even put them in tree? The module contributors have to do more work to ensure that the module works with CI, stays up to date, etc. And TVM maintainers have to do more work by managing this optional code.
  • You mention that duplicated code is allowed. Duplicated code greatly increases the cost of maintenance. Copies must be kept in sync: contributors must remember to change two copies of code that live in entirely different directories. How can we expect maintainers to even know about the duplicated code? As a small example, auto scheduler and metaschedule have almost identical (but duplicated) code for feature extraction. A fix was merged for auto scheduler’s feature extraction (#12079) in July. It wasn’t until two weeks ago that we realized the same issue affected meta scheduler.
  • Optional modules in tree imply that they are “blessed” in some way by the TVM community. This may imply support (in the form of issue tracking and discuss forum), or that the module is in a generally usable state. Any issues/bugs/crashes caused by usage of the module reflect poorly on the quality of TVM as a whole.

To be honest, I don’t think I’ve ever had a good experience with optional modules in any project. For example, I’ve tried to use OpenCV’s contrib (the rgbd module in particular), and it was incredibly buggy. In some cases it crashed outright. Trying to figure out what was wrong was impossible, as there was no documentation.
