[RFC] Remove CODEOWNERS

Summary

Move .github/CODEOWNERS to .github/CODEOWNERSHIP to avoid triggering GitHub’s automatic review requests. Add GitHub Actions automation to a) populate Review Requests based on PR thread traffic and b) ping languishing PRs.

Motivation

Languishing PRs are a relatively common occurrence in TVM today. In order to maintain a vibrant open source community, we should work to reduce or eliminate these occurrences. PRs languish for a variety of reasons, but a common problem new contributors have is finding a reviewer for their PR.

In June 2021, TVM attempted to solve this problem by introducing CODEOWNERS. The goal was to make it easy for contributors to find a reviewer for their PR. This approach was problematic because the layout of TVM code meant that a file-based approach to sharding code ownership was incompatible with most PRs. The average PR spans so many CODEOWNERS directories that the average PR triggered code-owner requests to half of the TVM committers, diluting the responsibility of each reviewer and in many cases generating spam for reviewers who won’t end up reviewing the PR at all.

It would be nice to rely on automation by tuning the CODEOWNERS file. However, the organization of TVM is such that the scope of a directory includes many different efforts. For instance, a change in src/relay/backend may affect the core compiler but may also affect automation and runtime. Tuning CODEOWNERS could well amount to adding file-level ownership, and maintaining that is intractable.

We also attempted to tune CODEOWNERS by switching to round-robin review style in tvm#9057, but this approach ran into problems, summarized by @areusch in this comment.

Guide-level explanation

Many reviewers find it difficult to sort through review traffic and determine which PRs they are on the hook for. GitHub provides solutions for this in the form of Review Requests and Assignee fields—reviewers can list the PRs which mention them there. Since CODEOWNERS worked by auto-populates the Review Requests field, removing CODEOWNERS from the repo in turn allows us to reuse these fields to better track who is on the hook for reviewing a PR. We should take this opportunity in spam reduction to attempt to make GitHub PR traffic more relevant to the community as a whole. A great way to do this is to develop a better system for populating Review Requests and Assignee fields to leverage the GitHub PR review system.

However, absent CODEOWNERS, these fields need to be manually filled. One could imagine that TVM Committers and Triagers could triage new PRs and populate those fields. However, there are some limitations on this system imposed by GitHub and the Apache Software Foundation which make this difficult. Specifically, anyone mentioned in a PR must either be a TVM committer or actively participating (e.g. by replying) in the PR in order to be placed in the Review Requests field. This means that a committer must continuously monitor a PR thread to keep those fields as accurate as possible.

Since manual processes often lead to inconsistencies, and the conditions above are somewhat adversarial, some automation is desirable here to attempt to standardize on one system for tagging PRs with assigned reviewers. To address that need, this RFC proposes two additional changes:

  1. Automatically assigning reviewers based on cc tags in PR messages
  2. Periodic automated ping messages for participants in PRs to prevent PRs from languishing

Committers are responsible for monitoring and triaging new PRs and issues to the relevant parties, and this RFC doesn’t change that. It assists by reducing notification spam so that each notification a committer gets is now something that needs to be addressed.

Reference-level explanation

Removing CODEOWNERS solves the spam issue by stopping the deluge of notifications to committers, but introduces a new issue in that PR authors still need to be able to assign reviewers. The combination of https://github.com/apache/tvm/pull/9934 and https://github.com/apache/tvm/pull/9973 are meant to address this. Since many TVM contributors don’t have permissions to add reviewers themselves, anyone who is a committer that is addressed in a PR body with cc @username will be added as a reviewer.

PRs should not stay open forever and should get reviewed in a timely manner. The second PR linked above addresses this by periodically (currently set to wait 7 days) pinging PRs that have not had recent activity.

These two tools should make it so the TVM community is still able to maintain a good velocity on PRs while avoiding spamming committers with notifications.

Drawbacks

It may make it more difficult for some PRs to get reviews. Instead of everyone being tagged, no one is tagged. We need to rely on active committers and triagers to triage new PRs without review requests to the relevant people.

Rationale and alternatives

  • Narrow CODEOWNERS to people that will commit to reviewing every request they receive. This is likely untenable due to the volume and cross cutting nature of many changes (i.e. a small change to one file as part of a larger PR will trigger reviewers for that file, even if they can’t review the entire PR).
  • Drastically lower the requirements to become a committer. This would remove the need for some of the automation above as we could rely on GitHub reviews instead of bespoke tools but we would still need to get rid of CODEOWNERS to avoid spam. Additionally, the set of reviewers will become broader, improving PR response latency but increasing the need for coordination amongst reviewers.
  • Use GitHub teams to assign reviews. This is difficult since the teams have to be created in the Apache organization which is hard for us to manage. Despite sharing responsibility, this still leads to lots of notifications for participants.

Future possibilities

  • There could be a rotation of triagers for new PRs and issues. When responsibility is shared, it is easy for someone to say they thought another committer would do the triaging and PRs/issues end up unaddressed. There could be a specific triager assigned each week to monitor PRs and issues. PyTorch has a similar process.

Note: This will be published as a process RFC after discussion

11 Likes

It was brought up that it would be nice to have the ability to not only cc specific people as reviewers but groups of people by labels. PyTorch does something similar which we could mostly copy. So labels could be automatically assigned based on matching “labels” in PR titles. For example, the PR: [microTVM] My microTVM change would look up the microTVM label and cc the relevant people to that label on the PR.

We could also try to automatically label PRs based on the content of PRs, but at that point we’re basically wrapping back around to CODEOWNERS.

1 Like

I support this RFC and will shepherd it through review process. It looks like we have some :heart: from the OctoML folks, but would be great to get some feedback from others in the community, particularly from folks who may be monitoring TVM PRs less frequently than those of us at Octo.

Let’s allow for comments through end of this week, and we can propose the formal RFC on Friday.

My personal opinion of this is that I think removing automatic assignment of reviewers from CODEOWNERS will help to reduce the spam and make it easier to respond to PRs. I’m not sure this is a full-fledged replacement for CODEOWNERS, and such a replacement may become necessary in the future. However, I think it’s a good start, and we can continue to iterate on this problem once we reduce the blast radius of each PR (in terms of review requests). However, I also helped author this RFC so I don’t feel comfortable proceeding without feedback from the rest of the community.

ccing a few folks from the community @Mousius @manupa-arm @leandron @ramana-arm @comaniac @yzhliu @ziheng @wrongtest @kparzysz @hjiang

It definitely makes sense for us to reduce the traffic from the github emails. Github teams is definitely a good idea. I’m supportive :+1:

Supportive. It’s tedious that Apache org creates many barriers for us to leverage a more systematic way, but the proposed solutions with a number of bots to assist the review process are definitely helpful.

Meanwhile, I recall that one important reason of adding code owners is to help improve the PR visibility from new contributors who don’t know anyone in the community and have no idea who to cc. In this case, it’s possible that no one from the community will participate the PR so the proposed solutions still cannot help. Thus, I would suggest adding one more change to this RFC (either of the following):

  1. Improve the PR template to guide the contributors about how to improve the PR visibility. We actually have this already but seems not effective. Pro: it is a simple change. Con: it may be ignored by most contributors.
  2. Improve the cc-bot to do more. For example, if the cc-bot finds no @ in the PR body, it replies the PR saying that it is encouraged to cc someone from the community for visibility and here is the code owner list. Pro: this will definitely get the attention by PR author. Con: it needs more changes to the cc-bot.
1 Like

I agree that the existing approach does not work well and support this RFC :+1:

@comaniac thanks for your comments!

I agree we have this, I’m not sure it scans very well (for example, the links are in Markdown so someone isn’t going to mentally parse those and render them in their head). Perhaps we can improve the template and make it a bit more organized.

I really like this idea. @driazati do you think this would be a big change? seems like it’s pretty related to the PR ping bot.

While our docs could use some consolidation, we already link to our contributing guide in the existing PR template. I don’t think we shouldn’t expect new contributors to read through a docs just to find who to tag in their PRs, we should have a constant rotation of committers monitoring new PRs and sending them to the relevant people. The process could look like:

  • Check https://github.com/apache/tvm/pulls for PRs with a triaged label (and auto-mark PRs from contributors triaged)
  • Briefly inspect the PR topic / changes and assign reviewers directly / tag people via @-ing them
  • Add the triaged label

It would be simple to have the PR ping bot tag some people if there has been no activity on a PR and the author is new after a quiet period of a day.

I filed an issue with ASF Infra to see if we can get them through GitHub teams but it looks like we will end up having to re-create it ourselves

1 Like

@driazati @areusch ,

This looks like a great suggestion!.

I think the proposal is about adding a mechanism to use the cc tag to attach people as reviewers, which seems good step.

I agree with @comaniac by helping the new authors finding people to tag for reviews rather than doing a mandatory review requests for many committers. Secondly, I think this is not really removing the CODEOWNERS but rather removing mandatory reviewer assignment from CODEOWNERS (maybe a text change ?). Otherwise, how would we populate the list to help new authors ?

cc : @leandron @Mousius , PTAL when you have some time.

1 Like

I think that’s what @driazati meant. “Removing” CODEOWNERS doesn’t mean to remove but rename that file: https://github.com/apache/tvm/pull/10192

1 Like

I see. All good then :smiley:

Thanks for @driazati @areusch to bring this for discussion, I think it make sense to reduce the notification spam by rename the codeowner file, and definitely that can help contributor to focus on more related message from community.

I agree with @comaniac @manupa-arm about helping the new authors to find people to tag for reviewer, and we may need some following up PR to address such concern.

Posted the https://github.com/apache/tvm-rfcs/pull/58 with some links to PRs that should address some of the concerns about getting reviews without codeowners

2 Likes

We discussed this at the TVM Community Meeting this morning.

  • It seems like some of the problems addressed here (unable to assign PRs to folks outside the Apache org) are better handled by adding more folks to Apache org. Are we exploring this?

    • Yes, but separately from this automation. This RFC is merely an attempt to improve our automation.
  • Is there any other way to work around the limitation of being in the Apache org?

    • I think Apache requires us to host our source code on their infrastructure. We could perhaps consider moving away from GitHub, but that’s more drastic. Within the limitations of GitHub, I believe we need to stay inside the Apache namespace.
  • The bar to be in Apache via TVM seems high.

    • Agreed, and this is something the project needs to work on.
  • Is it possible for folks to join Apache org without being a committer?

    • We do have a Triage role we can consider using for this, but there are a limited number of seats so we need to balance this request against other project-level needs.
    • The limit is somewhat artificial from Apache.
  • Are we exploring whether we can add membership roles in the ASF to address this? Getting write access to a repo seems like a fairly large step–can we take a smaller step here?

    • We haven’t explored this, but it would be a good thing to work on.
  • Does this RFC make anyone participating on a review thread into a committer?

    • No.
  • Reviewers are missing some functionality that would be particularly helpful:

    1. Can’t close issues
    2. Review approvals don’t show up on PRs (e.g. no green checkmark appears)
    • It would be great if we could find a way to improve the situation.
  • Remember this is just one step forward, so we will continue to monitor and improve from here.