[Process RFC] Clarify Community Strategy Decision Process

Dear community, there have been quite a few discussions on how we can collectively make strategic decisions going forward, including posts by @Hzfengsy and others. After reviewing our conversations over the past years and the existing practices of other ASF projects, we think it is helpful to bring clarity so that we can make strategic decisions for directions like [DISCUSS] TVM Community Strategy for Foundational Models - #6 by tqchen

As a result, we made the following process RFC.

Summary

Machine Learning Compilation (MLC) is an emerging field under fast development. With tremendous help from the whole community, it is exciting to see that TVM addresses significant needs of developers and has become widely popular in both academia and industry.

As the community pushes toward different goals that reinforce each other, there are naturally strategic decision points about overall directions and the adoption of new modules. These decisions are not fine-grained code-level changes, but they are important for the community's long-term viability. The process for bringing about such changes has not been clear to the community, and the hurdles can be high. We have made attempts in the past to introduce more verbose processes, but these have proven less successful. One observation is that it is hard for the broader set of volunteer developers and community members to follow complicated processes. Additionally, different members can have different interpretations of how to do things, leading to stagnation and a lack of participation from volunteer members.

We are now in a different world when it comes to the ML/AI ecosystem, and it is critical for the community to be able to make collective decisions together and to empower the community. Following the practices of existing ASF projects (e.g., Hadoop), we propose to use a simple process for strategic decisions.

Proposal: Strategy Decision Process

We propose the following clarification of the strategy decision process: it takes a lazy 2/3 majority of binding votes (at least 3 binding votes and twice as many +1 votes as -1 votes) to make the following strategic decisions in the TVM community:

  • Adoption of a guidance-level community strategy to enable new directions or overall project evolution.
  • Establishment of a new module in the project.
  • Adoption of a new codebase: when the codebase for an existing, released product is to be replaced with an alternative codebase. If such a vote fails to gain approval, the existing codebase will continue. This also covers the creation of new sub-projects within the project.

All these decisions are made after community conversations, which are captured as part of the summary.
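For concreteness, here is a minimal sketch of how the lazy 2/3 majority condition could be checked. It is in Python purely for illustration and is not part of the RFC text; the function name, the sample tallies, and the simplifying assumption that only binding +1/-1 votes count toward the 3-vote threshold are my own.

```python
def lazy_two_thirds_passes(plus_ones: int, minus_ones: int) -> bool:
    """Lazy 2/3 majority: at least 3 binding votes are cast, and there are
    at least twice as many binding +1 votes as binding -1 votes."""
    total_votes = plus_ones + minus_ones
    return total_votes >= 3 and plus_ones >= 2 * minus_ones

# 5 binding +1 votes vs. 2 binding -1 votes: passes (7 votes cast, 5 >= 2 * 2)
assert lazy_two_thirds_passes(5, 2)
# 3 binding +1 votes vs. 2 binding -1 votes: fails (3 < 2 * 2)
assert not lazy_two_thirds_passes(3, 2)
```

Under this reading, even a vote with no -1s still needs at least three binding +1 votes to pass.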

8 Likes

Thanks TQ. I proposed a similar one a year ago, and it’s a good time to bring it back. The outlined approach, including the lazy 2/3 majority rule, seems fair and transparent. It’s a positive step towards more democratic decision-making in our community, and I fully support its implementation.

1 Like

I’m always in favor of evolving our process in a way that keeps up to date with the latest trends. Thanks @tqchen for this proposal!

1 Like

Although this topic has been raised a few times in the past 1-2 years, it’s always good to clarify it again and make adjustments as needed.

Just to share my two cents: I fully agree with TQ that MLC is emerging, especially since late 2022 when ChatGPT and LLaMA came out. In this era, moving fast is the most important thing for staying at the state of the art and keeping the community growing. For example, HuggingFace text-generation-inference (TGI), vLLM, FastChat, and MLC-Chat are very popular solutions/applications for deploying open-source LLMs, and one common reason is that they support new models and techniques as soon as they can. For instance, all of these frameworks supported LLaMA 2 within one week of its release. Although I haven’t personally been developing TVM recently due to a shift in my focus, I’m still paying attention to TVM and its ecosystem. The reason is that I need to deliver an LLM serving flow ASAP, so I have to use existing solutions such as TGI, but ultimately I still need a compiler-based solution to achieve even better efficiency. In short, I would probably have just stayed with TVM and never considered TGI if MLC-Chat had come out earlier, and this illustrates the importance of fast decisions and adoption.

1 Like

Thanks for the proposal. I’m in favor of it. MLC is an emerging field, and it’s important to make strategy decisions together and empower the community.

1 Like

Thanks TQ for the proposal. I fully support the decision-making process. I actually shared my opinion last year that we should keep moving fast and not block ourselves from making TVM easier to use and/or adopting new technologies.

1 Like

Thanks TQ for the proposal.

I’ve read the proposed RFC and I am in complete support of this new strategy decision process. It provides a clear and unambiguous path to making strategic decisions that will have a substantial impact on the TVM community. I am looking forward to witnessing the decisions we make using this new process.

Thank you again for this thoughtful proposal. I am voting +1.

1 Like

Thanks for bringing this up. I totally agree with TQ’s proposal. Nowadays, quick iteration is the key to success, and the key to quick iteration is quick decisions. TVM has already had great success in several scenarios, but if the key features that may lead to TVM’s continued success in the future are blocked by lots of discussion and alignment, then I think we may lose the opportunity by the time we think we are ready to start.

1 Like

I am 100 percent in favor of this proposal and voting +1.

There have always been claims that deep learning compilers are going to converge, with no foreseeable innovations required in this field. However, the reality is that the rise of new hardware with new features, and of new models, motivates new compiler frameworks to be built. LLMs and their various supporting frameworks are examples of this.

In fact, Transformers were proposed around 6 years ago and have been popular for at least 4 years. Still, people hardly use DL compilers as the first choice when they craft systems to serve LLM inference, instead using vendor libraries and hand-crafted kernels. As a member of this open-source DL compiler community, and as someone who has been working on TVM for several years, I regard this as a reminder that ML engineering is still a problem far from being solved, at least for us.

For infrastructure software trying to accommodate the requirements of a fast-changing field, it either gets busy living or gets busy dying.

And to change the way people do computer science, we need to accept change and be able to change faster than others do.

1 Like

Thank you, everyone. I want to add an update about the current state. Right now, we have two kinds of opinions on the process. This is likely due to our different preferences for community operation, as well as our different interpretations of some of the same text (e.g., “what qualifies as a breaking change”, “whether it is important to listen to community members”).

I would categorize the overall rationales into two kinds of community operational process:

  • W0: A more restrictive and verbose process, e.g., limiting decisions to “a new development branch or an optional module” and requiring an additional set of RFC documents following the decision, among other elaborations.
  • W1: The proposal above, with a rationale of keeping things simple (e.g., the decision applies to the module, period) and clear, which also comes with clear ways to move forward when different interpretations happen.

Unfortunately, the approaches in W0 and W1 are mutually exclusive, so we cannot have a proposal text that covers both; e.g., we cannot have a process that is both simple and verbose. Stagnating without bringing clarity likely means no W0 either, even if many prefer that as the community operational process. So far, I have incorporated suggestions that help clarify the text while keeping the proposal aligned with W1.

At the moment, there is no actionable W0-style proposal (given that some suggestions are open-ended), but please check out the https://github.com/apache/tvm-rfcs/pull/102 thread, which I think gives ideas around W0.

Depending on the individual, there can be different interpretations of the consequences of W0 and W1, e.g., what qualifies as “supporting the long-term health of the community” and what consequences W1 will bring. I think it is fair to say that some hold the opinion that W1 can bring less stability, more breakage, and no-gos (a view that others disagree with).

We have had related conversations for about a year now, and I think it is the right time to bring clarity to the matter. Specifically, we need to decide on how we operate and on the approach to resolving different interpretations of the same text (which W1 offers to some extent due to its simplicity).

Every community is different and may need to adjust its operation depending on the state of the project, the state of the ecosystem (in our case the AI/ML ecosystem), and other factors.

ASF has an established way for the community to do that collectively, after we review the possible ways of doing so (in this case W1, or W0/no-op). Please do check out the original text and requests in https://github.com/apache/tvm-rfcs/pull/102 so you can form your own opinion about W0/W1, as this post could also be interpreted differently depending on the kind of approach each person prefers.

1 Like

Thank you everyone for your input so far. We have had many related conversations over the past year and gathered collective input on different views of how to approach this process. Based on that input, I have opened a vote to bring forward the version that most participants in the discussion thread prefer, so we can bring clarity to this process.