[DISCUSS] TVM Core Strategy for Emerging Needs

In my personal view, TVM's momentum peaked around 2019-2020 but declined afterwards. I can't speak for others, but I haven't been able to keep contributing to TVM since 2021 because my team switched gears to work on distributed training, which was popular at the time. We felt it would be too time-consuming to propose and upstream training support (which turned out to be true if you look at Relax), so we ended up working in our own codebase and gradually drifted away from TVM upstream.

On the other hand, I recently noticed that TVM's momentum is rising again under the "MLC-LLM" brand, simply because a few community members put in the effort to release a high-performance Llama-2 chatbot app powered by TVM Unity. To me this is a role model that shows the importance of keeping up with the workloads and applications most people care about in order to stay state of the art.

Consequently, I would be happy to see TVM Unity become the official main branch, which would likely further accelerate the development of LLM-related features. For example, serving an LLM with tensor parallelism and quantization will be extremely important over the next six months. NVIDIA is going to release TensorRT-LLM next week; it would be a big boost if MLC-LLM can stay competitive.

My two cents.

6 Likes

I support making TVM Unity the official main branch.

I would like to share some of my experiences with TVM. In the past, there were several pain points when we tried to adopt it. For instance, quickly experimenting with an optimization was not easy and often required hacking the entire compilation stack. Moreover, the lack of support for training, dynamic shapes, etc. limited its application. From my perspective, the current Unity approach has addressed these concerns and is crucial for increasing adoption.

3 Likes

I think we all agree that bringing in the feature development from unity is desirable, for the many reasons listed and stressed over and over in the posts above.

The strategy for bringing in and incorporating the unity changes, however, seems less than optimal and less inclusive, given that nobody in the community is expected to keep in sync with changes introduced in unity as a development branch.

Just taking the path that is most convenient for a subset of the community (such as moving unity to be the new main, or nominating unity as the main branch) would force unknown and potentially breaking changes into the codebase, while leaving developers and end users to their own devices to figure out what works and what doesn't. I don't think it is the onus of the development community to keep track of all development branches.

I would like to suggest that, together with T0, T1 and T2, we consider a T3: take "unity" as a regular contribution and merge it into main in organised chunks of features that would make more sense in the git history.

T3: Extract major features from unity and raise feature-level PRs against main. Fast-track them onto main once documentation and testing are in place, and current main CI passes.

This is also an opportunity for us as a community to understand the changes coming in, organise commit messages, describe features in an inclusive way, make sure documentation exists, etc.

A positive sign is that many of these changes are local to specific namespaces, so integrating them as patches shouldn't be a lot of work. From an engineering point of view, this looks more in line with common practice in upstream projects, and much more inclusive as well, compared to taking in a single blob of ~200k+ lines.

More importantly, I don't think it is right to impose such a bulk change on the project this way. At the same time, I understand the time pressure some might feel, but it should be weighed against keeping the project coherent and visible to the wider community (not only unity branch maintainers).

The main reasons for T3 (i.e., moving unity features to main in an orderly way) are:

1. Diff delta / opportunity for code review

Before unity became a branch in the TVM repo, there were hundreds of changes that the broader community had no chance to review (the main cause being that it was hosted somewhere else for a long time).

According to a rough comparison done on GitHub, the diff between unity and main today (including submodules) is something along the lines of “888 changed files with 211,567 additions and 8,684 deletions”.

2. Bundled features

In tandem with "unity" as a branch, quite a few features are being brought in, such as "MSC Graph", "Disco", etc., which are largely unknown to the wider community. Moving them into main in an orderly way is an opportunity to advertise them better.

Remember that all these features/tests become everyone's responsibility to maintain once they are integrated into the codebase.

3. Test coverage of the branch

Many tests and parts of CI validation are currently disabled in the unity branch, which will require current contributors to rework features and to figure out whether their support works at all in the new landscape; this seems hostile to contributors whose visibility is on "main".

Moving changes from unity to main in an orderly way has the advantage of at least guaranteeing minimal compatibility with the current test base, so that we can keep the "revolution with impact minimization in mind" commitment stated above.

The cost of the change has to land somewhere; I'm just advocating that we don't simply pull the plug on features people currently rely on in main for lack of testing, and replace them with a less tested/reviewed version.

1 Like

Thank you for your input. Different members of the community have different interests, as well as common shared goals. One thing to be mindful of is that there are different ways to proceed, and some may work better for certain modules, circumstances, and the community's current demands.

In the end, it is also about the community's real, practical execution. Let me first expand on some of the points here.

Responsibility of maintenance

The reality is that the responsibility for maintenance mostly falls onto the members who contributed and developed the modules and related features. Such merit is built through contributions, so naturally most communities lean towards empowering the decisions of these members as long as the contributions are scoped to modules.

As a community with diverse needs, nobody is required to be aware of all modules in order to contribute; in practice we pay attention to the modules we normally engage with. Based on the past year's contribution activity, the members who designed and maintained the core modules, such as arithmetic, the Relay IR design, TensorIR, AutoTVM, and some of the graph IR nodes, are responsible for and still actively maintain these modules in main.

Maintainers of core modules in the main branch are also now maintaining unity. They all went the extra mile to bring changes to the existing modules back to main to fulfill their responsibility, even though that incurred a good amount of overhead over the past year.

It is the same group of people who would continue to ensure that the features in main keep working, since changes are isolated to modules and do not go through them. We would lean towards their views when thinking about ways of incorporation.

Preserving main modules

As stated above, current development practice brings all changes to the existing modules to main, and changes to these modules are covered by tests in main. Other tests are turned off in unity due to cost: many modules are not structured around a unit-test approach, which led to undesirably long test times, and the changes in unity do not touch these modules. We will of course work together to ensure that the relevant modules don't break, and fix them when that happens. These are concrete conversations we can have to enable the community.

Bringing community awareness

Some of the suggestions boil down to the common goal of community awareness. There are many ways to achieve that goal; the community has been doing so over the past year and will continue to do so in the coming months:

Sending modules to main in chunks would have been a great evolutionary approach one year ago, and the community did some of that when the unity branch was established. At the current state, however, this approach is likely to be much less practical or desirable, given the energy and complexity involved while also maintaining the goal of bringing timely foundational model support.

In practice, we need pragmatic ways to bring foundational model support in a timely manner. Ideally such support would have landed months ago; timing matters, especially given the current ML/AI space, and there is a great risk of losing momentum and relevance. We would already have run into that risk in practice if we hadn't empowered unity development.

That being said, we would love to see the community spend energy on more awareness. Please ask questions about the modules and bring suggestions and discussions, as we have repeatedly encouraged over the past year. We would love to work together on concrete questions like "here is how we bring in BYOC"; that energy is worth spending, and there are likely months ahead in which we can continue to do so.

Action matters and speaks louder

Many of our discussions are about how to approach our goals, and naturally the community can have different opinions. Regardless of the conversation about how, we need concrete actions and execution to land, which in turn needs support from community members doing the groundwork. Having a seemingly perfect approach won't necessarily get us LLM support, or even get the community to collectively work toward it. We need real groundwork to make things happen, and we need to empower the community to do that groundwork.

Actions and results speak louder, and that is what users turn to the community for. We need to avoid the risk of becoming over-bureaucratic, as we do not command the community to take exactly one approach.

The reality is simple: insisting on one seemingly perfect engineering approach won't give us solutions for LLMs, nor will it resolve the real technical challenges at the core. The right architecture (and empowering the members who do the groundwork to bring it) is the real key.

Following one specific approach is neither a necessary nor a sufficient condition for a "good or right project". Approaches can only facilitate and empower the community towards the goal.

In the end, it is the community that writes the code, does the groundwork, maintains the existing modules, and brings real LLM support. This is real work that every member sees and that is taken into consideration when we make collective decisions on how to move forward. We all have to do real groundwork to earn merit and, as a natural consequence, alignment from community members, which in turn is reflected in community strategy and collective choices that naturally empower that groundwork.

It is good that we have brought up these possible alternatives for the community to consider. Let us also do the groundwork to show the viability of each path, e.g. "here is one approach, and here are some examples of enabling LLMs or other needs through this approach".

This extra information can help the community decide the path forward.

The last post was a bit long (mostly addressing the how). Admittedly there are many possible approaches, and some of us would prefer one over another. There is also a good amount of clarity about what the approaches are, and we have spent quite a bit of conversation on the how at a meta level over the past year. In the end it is the community that collectively decides which approach to take based on the specific context.

I would encourage us instead to spend more energy here talking about concrete groundwork, actions, and design conversations so we can realistically meet these needs. This would include examples like solutions for LLM serving, actions to maintain current modules (e.g. sending improvements to arith), and ideas on how to leverage the core strategy, or a better strategy, to execute on our goals.

Many in the community would welcome these on-the-ground conversations. We would love to bring Llama-2, diffusion models, and different codegens to our community. That would be much more fun and productive, and we trust the community to collectively figure out the how in each situation based on concrete, on-the-ground signals.

I am a contributor to both main and unity, as are most of the active contributors. More specifically, I initiated, contributed to, or maintained the following modules in main:

  • TVMScript
  • TensorIR
  • Relay
  • Runtime
  • MetaSchedule
  • TOPI
  • RPC
  • TE

So far, I love that development in unity is focused on the right things: concrete groundwork that pushes the boundary of the open-source world, empowers individual contributors to democratize the underlying techniques, makes sure they are not monopolized by just one or two powerful companies, and enables critical use cases like LLMs, Stable Diffusion, multi-GPU inference, KVCache, etc.

I have also continued to maintain and contribute to the main-branch modules shared above. I believe T1 is the best approach for working with the community to continue supporting modules both in main and in unity, and I'd love to continue contributing this way under the T1 solution.

Anyway, to conclude, we should always focus on concrete groundwork enabling use cases like LLM inference, diffusion models, etc., and I'd love to work together with all community members to make this process easier for all our contributors!

2 Likes

LLMs are one of the central topics in AI today, so it's great to see TVM Unity become the main branch, which means TVM can accelerate and deploy the most popular AI workloads.

2 Likes

T1 is the favored option for me. Transitioning to unity as the main branch will offer numerous advantages to our community. Users can readily explore distinct TVM Unity features such as distributed inference, LLM, and Stable Diffusion support, while current main-branch users and contributors can continue on that branch. In fact, the modules from the main branch are incorporated into unity, so the shift to unity as the main branch would give them access to more features while the existing use cases should still work. We can incrementally enable the skipped test cases in unity. However, time-intensive tests, like those for the MXNet/TensorFlow1/Caffe2 frontends, might be best left disabled in unity to expedite our CI, which currently averages over 4 hours per PR on main.

I have concerns about T3. The unity branch currently has 600+ additional commits (210k+ lines of code) compared to the current main. Landing these commits through PRs from unity to main would be a lengthy process. Given the CI duration, potential change requests, review times, and other factors, it wouldn’t be surprising if this took much more than six months.

1 Like

As an active participant in the TVM community, I began contributing in 2019, initially to TensorCores and subsequently to the TensorIR project.

I firmly believe that concrete groundwork is crucial for the community's collective growth. Consider TensorIR, for instance. The intricacies of its operation, including the details of ScheduleState and the primitive implementations, go largely unnoticed. Yet its popularity stems from its ease of use and its ability to meet emerging needs, such as tensorization.
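To make that ease of use concrete, here is a minimal sketch of the TensorIR scheduling flow (assuming a recent TVM build with `te` and `tir` available; the matmul workload, block name, and split factor are arbitrary illustrative choices, not anything specific to the discussion above):

```python
import tvm
from tvm import te, tir

# Define a simple 128x128 matmul with TE and lower it to a TensorIR PrimFunc.
A = te.placeholder((128, 128), name="A")
B = te.placeholder((128, 128), name="B")
k = te.reduce_axis((0, 128), name="k")
C = te.compute((128, 128), lambda i, j: te.sum(A[i, k] * B[k, j], axis=k), name="C")
func = te.create_prim_func([A, B, C])

# Scheduling happens through a handful of readable primitives.
sch = tir.Schedule(func)
block = sch.get_block("C")
i, j, r = sch.get_loops(block)
i_outer, i_inner = sch.split(i, factors=[None, 16])
sch.reorder(i_outer, j, i_inner, r)
print(sch.mod.script())  # inspect the transformed TensorIR
```

The point is that a user only touches a few self-describing primitives, while the ScheduleState bookkeeping underneath stays hidden.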

A similar phenomenon should occur with unity. It provides foundational support for traditional models, much as Relay does, and MLC-LLM has demonstrated unity's applicability to novel requirements. From a user perspective, unity is significantly more user-friendly than existing alternatives, which I anticipate will garner community approval.

In summary, our focus should be on making tangible contributions wherever possible.

Great to see this topic brought up again and people converging on the view that Unity should be merged into the main branch. As an active ex-contributor, I have still been closely following updates in the community and have always strongly supported Unity as part of the main branch.

The recent trends in LLM inference and deployment definitely make Unity even more appealing than before. In particular, there are already quite a few data points showing good performance compared to other open-source projects. I'd like to see this happen in the community ASAP (preferably landed as T1), and I volunteer to help review PRs if it would make the process smoother.

1 Like

Thanks everyone for the input so far. I would also encourage us to take a look at the strategy and share thoughts on how it can help accelerate our future development. The original post mainly proposes that, as a community, we adopt a different development strategy moving forward:

  • Before: use a build-centric approach for everything, which offers no solution for emerging needs.
  • Proposed: take a per-sub-area approach: keep build-centric for some existing use cases with an on-demand shift to abstraction-centric, use the abstraction-centric approach for emerging needs, and accelerate new solutions (LLMs) with it.

This is the key change that we as a community can make, and it goes beyond simply a set of features. It means that different modules/areas and their respective contributors can set the pace and technical approach while keeping things scoped.

The "before" approach has not empowered the members who would love to amplify thrusts in foundational models and emerging needs (which account for 90% per a recent poll). These members took great strides and did the groundwork to preserve the TVM community's chance in the LLM/genAI space, plus extra groundwork to maintain the main modules and keep the branches in sync. That is a lot of service to this community.

The "proposed" approach helps empower concrete groundwork to amplify foundational model thrusts. It offers a concrete solution for emerging needs and future growth, with complexity management and scope isolation. The tradeoff is that we need to be open-minded about different approaches at the meta level and look at concrete cases collectively, leaving the choice to the community and the groundwork.

2 Likes

Given that there is quite some interest in this direction, I just want to chime in about the timeline. First, it would be great for the community to review and collectively settle on a strategy (the proposed approach or others); this is the primary part of the initial post.

As for the transition timeline, we will leave at least one to two months so community members can ask questions and dive into the new modules if needed; these can be found in the unity category from the past year.

Then the groundwork will hopefully happen as we continue to develop with a strategy for foundational models.

1 Like

Will TVM integrate OpenAI Triton's approach?

  1. THE TRITON LANGUAGE | PHILIPPE TILLET - YouTube
  2. Some thoughts on understanding OpenAI Triton - Zhihu (zhihu.com)

We might be able to leverage library dispatch support to enable Triton and others. We will also incorporate some of Triton's insights to improve the tensor-level abstraction.

2 Likes

Sorry for the late participation. ARM China, as a commercial user of TVM, has done a lot of work to meet customer requirements, just as Tianqi pointed out.

1. The build-centric approach can't work for us

Early on, we tried our best to add customized passes and logic into Relay's build-centric flow, but we then found we had to spend a lot of time fixing official test failures. All these fixes were very specific to our business scenario rather than general solutions, so they shouldn't be contributed to the community. Finally, as business requirements grew, we chose to use our own build flow, which lets us control the passes flexibly, as sketched below.
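For illustration, a custom flow can be as simple as running a hand-picked pass sequence before handing off to the standard build. The sketch below is hypothetical, assuming a recent TVM with Relay available; the chosen passes are arbitrary examples, not ARM China's actual pipeline:

```python
import tvm
from tvm import relay

def custom_build(mod, params, target="llvm"):
    # Run only the passes we want, in the order we want, instead of relying
    # solely on the fixed pipeline inside relay.build.
    seq = tvm.transform.Sequential([
        relay.transform.InferType(),
        relay.transform.FoldConstant(),
        relay.transform.FuseOps(fuse_opt_level=2),
    ])
    with tvm.transform.PassContext(opt_level=3):
        mod = seq(mod)
        return relay.build(mod, target=target, params=params)
```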

2. TVM Script (TensorIR) is the key to our next-step DSL work

Besides the Relay/Relax graph-level work, our next step will focus on a DSL; in other words, we need a higher-level, more abstract programming method than the traditional OpenCL C way. The key to achieving this is TVM Script, because it resolves the expressive-power issues of TE. So we need, and look forward to, the newest updates to TVM Script and related work.
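For readers less familiar with it, here is a minimal TVM Script sketch (assuming a recent TVM build with `tvm.script` available; the ReLU kernel and names are purely illustrative, not part of our DSL work): kernels are written directly in Python syntax with explicit loops and blocks, rather than through TE compute expressions.

```python
import tvm
from tvm.script import tir as T

@tvm.script.ir_module
class Module:
    @T.prim_func
    def relu(A: T.Buffer((128, 128), "float32"),
             B: T.Buffer((128, 128), "float32")):
        # Loops and blocks are spelled out explicitly, which makes patterns
        # that are awkward to express in TE straightforward to write.
        T.func_attr({"global_symbol": "relu", "tir.noalias": True})
        for i, j in T.grid(128, 128):
            with T.block("B"):
                vi, vj = T.axis.remap("SS", [i, j])
                B[vi, vj] = T.max(A[vi, vj], T.float32(0))
```

Such a module can then be compiled, e.g. with `tvm.build(Module, target="llvm")`.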

Many TVM users may know that TVM Script and Relax are the two key parts of the Unity branch. Some community contributors like @Hzfengsy and @junrushao spend a lot of extra time keeping the TVM Script work synced between main and Unity; sending the same changes as two different PRs to two branches consumes a lot of contributors' precious energy and time. So we agree with using Unity as the main branch.

3. The work in Unity is key to everyone's next-step success.

Everyone knows that the work in Unity focuses on LLM/AIGC; if a company can't keep up with this technical revolution, it won't succeed in the future. Using Unity as the main branch will make it easier for downstream organizations in the TVM community, like us, to use the newest work such as dynamic shape, Relax, and so on.

4. The transition will be smooth for Relay users.

As previous discussion pointed out, Unity will still keep all the Relay work, so downstream users like us can transition smoothly. This is very important for our customers, so many thanks to the community contributors for keeping this commitment.

4 Likes

I've gotten all mainline tests to pass in the unity branch. I think it's a good time to consider the migration 🙂

5 Likes

Thanks everyone for chiming in, and thanks @Hzfengsy for the efforts to make sure all existing modules are supported. Now that we are in the new year and genAI is becoming ever more important, it is a good time to do the proposed transition. We will open a vote in the coming week.

5 Likes

Formal voting thread: [VOTE] Transition Main to Unity · Issue #16368 · apache/tvm · GitHub

Thanks to everyone, unity is now main. Here is a follow-up post to bring the documentation in line with the core strategies discussed in this thread: [DISCUSS] TVM Unity Transition Docs Refactor