Improve Code Review Guideline

tqchen · August 27, 2021, 10:50pm

Dear community:

As we continue to grow the codebase and community, it would be helpful for us to also update code review guideline to help us get common ground in the healthy collaborations.

This thread proposes to update the code review guideline to

@jroesch, myself and many others drafted a proposed update capture some of the best practices in open source and lessons we learnt in the past. The goal is of course to serve as living document for the lessons, we learnt and continue to update the process. Please share your thoughts.

Code Review Guideline

Open source code is maintained by a community with diverse backgrounds, interests, and goals. Hence it is important to provide clear, documented and maintainable code and processes. Code reviews are a shepherding process used to collectively spot potential problems, improve quality of the code, and educate both contributors and reviewers about the code base and its assumptions. It is also one mechanism to ensure there are multiple people who can maintain a related piece of code together. Contributors are encouraged to polish the code to a reviewable state before requesting reviews. This is especially important for committer candidates, as committers are expected to participate in not only writing code but also reviewing it.

This document is a living guideline for code review in open source.

Building Trust

First and foremost, we are building a community that based on trust, which takes time and effort to both build and maintain. We expect our community members to work together in a constructive way and work together with common sense. Although we all have different sets of backgrounds, interests and goals we must work together to find solutions that work for the larger community. Trust-based collaboration is also a key tenant of the Apache way and an important factor to consider in growing the community, and promoting members to official roles.

Community Participation

Everyone is welcomed to comment on PRs. We encourage committers to wait for some period of time(e.g. three days) before merging PR that contains a major architecture change. The goal is to give people time to speak up and express interest in reviewing and participate.

Remembering that we are all coming from different backgrounds is important here. For example some community members work in different time zones, only work on open source during work hours, or may be traveling or having other events going on in their lives. An important part of working in a large project is ensuring there is collective understanding, so no one person is a bottleneck. While it is important to allow time for participation in code review we also can not block all changes on all reviewers. Remember that helping people land PRs is a great way to encourage broader participation, especially for those who volunteer their time to contribute.

Part of this is trusting and communicating with fellow maintainers that if changes need to be applied in the future that PR authors will later follow through on their promises. It is the responsibility of committers to listen to all feedback whether from PMC members or new contributors and consider what actions need to be taken.

Read the code carefully

Sometimes we may quickly read through the code and only pick up on a selective aspects of the code. These type of comments are usually helpful and should be welcomed in the community. However, they are only part of performing code review and should be part of more comprehensive feedback. A good and careful code review is a large time investment and sometimes can be longer than writing the actual contribution.

For example receiving only highly critical feedback on minor aspects of your PR rarely feels good, and it can be discouraging if your time and effort was not reciprocated during review. Practicing empathy when acting both as a contributor and committer is important and can help make you a more effective code reviewer and contributor.

We expect that all committers carefully read and understand the code before signing off. There is a lot of trust involved when a committer hits the merge button. In the meantime, we acknowledge that sometimes problems slip through, in that case, the merger is responsible for ensuring the correct follow up actions are taken.

Be Respectful

To everyone who are making comments: making constructive comment will help new contributors to land their PRs timely and help us welcome new members to the community.
To authors: reviewers should spend significant time reading the code, and a careful review could be as time intensive as writing the code from scratch. Respectfully address review comments and reciprocate the review by helping review others changes in the future.

Most importantly focus on having a constructive conversation, and try to assume best intentions when interacting as a reviewer. If there is something in the process not working, consider getting some face time with the other contributors and discussing how to improve the process or communication.

Factors to Consider about Code Quality

High quality code is critical to the long term success of the project. There are many factors of code quality to consider during a code review:

F0: Overall architecture. This includes the definition of public modules, key data structures and public interfaces. Good architectural choices are critical to the success of the project in the long run.
F1: Architectural consistency. There are usually multiple ways to implement a new feature. We must ensure new features are consistent with previous overall architectural choices and interact well with the existing code. Every new feature increases the complexity of the project, and a consistent design ideally minimizes the increase in complexity bought by a new feature, making it easier to maintain code in the long run.
F2: Code robustness and test coverage. Ensure code runs correctly in all possible settings(platforms), ensure test coverage of the new feature. Clear error messages for user facing errors.
F3: User facing API documentation: documentation of public user facing APIs and key module interfaces are mandatory. This includes the API, data structures that appears in the public interface (i.e., include/tvm and user facing python APIs). We generally encourage well documented code and include some form of documentations for internal APIs that are used in multiple places, see also F4.
F4: Code readability. Readability involves multiple aspects: instructive and consistent function names, clear implementation of the overall flow, descriptive comments for complex code logic and internal functions. Readable code is easier to maintain.

Architectural design and consistency are the most important factors since they are likely to introduce the most long term technical debt. As a result, committers should most carefully consider these factors before merging the code.

Test coverage and API documentation are expected for code contributions.

Code readability is relatively a subjective matter compared to the other ones. Different people have different thoughts on how to best write code. Reviewers should make constructive and actionable comments. In the meantime, code review should not be used as a way to get others to write code exactly the way you would. Conversely you should also consider that what you may easily understand, or find acceptable might not work for the larger community or other members. Use your judgment on what is appropriate based on the content and the scope of the contribution and where the contributor is coming from.

We follow common code style guides when writing code. Style guides help ensure that code is readable and maintainable by others, long after the original author has moved on. Style guides are more than about code formatting — they also pertain to the correct way to document code, variable naming, and other conventions that are not enforced by automatic formatters.

Consensus Building

Disagreements can happen during code reviews. We encourage building consensus among the people involved. We are working together and building trust with each other in OSS. The nature of OSS means sometimes we make compromises on less significant issues to make steady progress and welcome broader participation in the community. Compromise unfortunately means sometimes the world will not be exactly as we would like, this true even for leaders of the community.

Be civil and build consensus through constructive technical-based conversations.
A committer who owns the area can serve as a shepherd to drive the discussion by taking all the conversations into consideration, and suggest a resolution with to move forward.
Because a lot of trust is involved on the committer(shepherd), they should read the PR carefully before sign off. Additionally, the merger should also take the responsibility to followup in case there are problems caused by the merge.

Consistency

A final remark is that we are all human and its hard to always be perfectly consistent. If contributors feel that you didn’t apply these guidelines in a consistent way it is important to listen and hear folks out. We will constantly have to iterate on processes and guidelines as we evolve as a community. Our goal is to strive to be consistent and objective but all of us are unfortunately human and imperfect and will need to adjust and learn.

Additional Recommendations

Scope the PRs

We recommend authors to send well scoped PRs that are easy to review and revert in case there is a problem. Authors avoid merging multiple unrelated changes into a single PR and split them into separate PRs.

Label the PRs with Area Prefix

When sending pull requests, it is helpful to prefix the PR title with the areas related PR(e.g. use [TIR] for TIR-related changes). This would help people recognize the related areas and find PRs they are interested in.

Append Other existing tips in code review docs

tqchen · September 3, 2021, 9:37pm

opened a voting thread here https://github.com/apache/tvm/issues/8928

electriclilies · September 4, 2021, 12:06am

Just curious, but is there a list of the area tags that are currently in use? I usually tag things with what I think would be a good descriptor or a descriptor I’ve seen used by someone else.

tqchen · September 4, 2021, 12:55am

right now indeed we just go with the good descriptor as long as it is informative

ramana-arm · September 6, 2021, 10:59am

Thanks for this - there is one thing with code quality

Commit messages.

Commit messages without enough context still continue to go into the repo and while it appears nitpicky it is probably one of the more important things to get right consistently for a longer term open source project.

We should be following and dealing with better commit messages than just the sequence of commit messages that appear as just a sequence of

Fix bug …
Fix linting error

If something like that happens its probably worthwhile to have a better set of commit messages and for the GitHub PR to be recommitted with a more suitable commit message that lives for a longer duration.

can we incorporate that into the guidelines for committers and make it common practice ?

See here for other Free software projects with some minimal guidelines. https://sourceware.org/glibc/wiki/GlibcGit#Commit_Messages

Finally , is this text going to live in the source base ?

regards,

Ramana

tqchen · September 6, 2021, 1:36pm

Thanks @ramana-arm! Some recommendation about commit message could be a good addition to the additional recommendation section(along the area tag and others). The glibc suggestion looks good.

We will update https://github.com/apache/tvm/blob/main/docs/contribute/code_review.rst to add these guidelines so yes they will be part of the source base and our documentation.

How about we bring the current update in, then send an additional PR to add the recommendation about commit messages.

Mousius · September 6, 2021, 3:47pm

Hi @tqchen, overall I’m supportive of codifying this and moving forwards, however I’m curious how this fits into the current PR model with code owners. I agree that having some guidance on commit messages is a good foundational thing to include whilst we’re going through this exercise, it’s been incredibly helpful in other teams. My question is, if I raise a PR adding commit guidelines, do we not need to repeat this process to ensure consensus?

I would also suggest that it’s not a recommendation, I’ve had experience of code reviews rejecting such commit messages, which has lead to a much easier to understand history. This was even in the case of squashed commits, as when providing a post-incident review we often needed to refer back to individual patches in a PR to pin point a specific problem. In the case of the existing recommendations, you may not know the correct area tag and you may not be able to scope the PR further in a sensible way, but you can always be descriptive of your changes .

ramana-arm · September 6, 2021, 4:16pm

@Mousius - I agree that this should be more than a recommendation.

See this one for another motivating example.

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=3fe617ccafd6f5bb33c2391d6f4eeb41c1fd0151

The change is about 5 lines relatively simple but the description clear for posterity and easy enough to analyse from a git log.

I think aspiring to clarity in the commit messages in the TVM project is certainly worth the effort for the future.

regards Ramana

ramana-arm · September 6, 2021, 4:17pm

@tqchen I’m fine with staging this but I think this is a point that needs to become part of the project development guidelines.

Ramana

gromero · September 7, 2021, 11:21am

Thanks a lot for putting it forward!

It’s awesome that 1) it’s encouraging committers to wait a bit more before merging a PR with major changes/additions so there is more time for people on different time zones to review the PR, and that 2) it suggests to avoid merging multiple unrelated changes into a single PR and split them into separate PRs.

I’d like to expanding a bit more on 2), because I think it’s really important and is related to other commit message issues.

Currently it’s not uncommon that fixes even somehow related to the main change but not directly related to it get added to a single commit associated to the main change, without going into a separate commit (in a patchset or even as a separate PR). This is not ideal and can cause confusion, usually making ‘bisect’ a hard task - or a nightmare. Moreover, sometimes the fix is not even mentioned in the commit message. That’s not a good practice too. Ideally a fix should land as separate commit in the tree.

On the other hand I’m aware that the current Github review mechanism has some caveats/limitations for dealing with patchset (like “not honoring patchsets” and squashing everything into a single commit).

Another issue with the current GH mechanism is that it’s good to keep the review process recorded, so a ‘git push --force’ is usually not used once the review starts, hence it’s hard to avoid additional commits like “Fix lint to please CI”, which will land in the tree as bullet comments in the body of a single commit message. I think there is no solution to that afaics.

Nonetheless, GH review mechanisms issues aside, I believe there is still room for improving the commit messages, specially regarding the a) title form and b) the message body quality; c) splitting huge changes and using a patchset plus d) adding a good commit message to the commits in the patchset - even if GH will squash them into a single commit the commit that lands will still make sense if the original commit messages are good; and e) avoid “bullet” commit message: “bullet” commit messages are not bad per se, but “bullet” commit messages without any description is likely as bad as commits without any description in the body.

In the same sense, even simple habits like using “classical” f) imperative mood, g) proper caps and period in not ending a commit title help. Currently there are even h) commits with no comments at all. I assume that the rational for this is that Github PRs keep all the info necessary to understand the commit, but that’s not always the case (information in PRs are usually dispersed), additionally it would be wonderful if one could simply understand a commit only using ‘git log’, without going back and forth between git cli to GH.

Specifically about how to split patches (and fixes), I think [1] from Linux docs contains good suggestions which might be useful for us, particularly sections “Describe your changes” [2] and "Separate your changes [3].

As Ramana said it can appear nitpicky, but it’s not, I agree and support his comments. Thus I think whatever we can do to improve the commit messages now we should do. Maybe starting to address some of the a-h points.

The preparation of that Guideline really turns the moment into a great one to discuss that topic and I’m happy to discuss more about it .

[1] https://github.com/torvalds/linux/blob/master/Documentation/process/submitting-patches.rst

[2] https://github.com/torvalds/linux/blob/master/Documentation/process/submitting-patches.rst#describe-your-changes

[3] https://github.com/torvalds/linux/blob/master/Documentation/process/submitting-patches.rst#separate-your-changes

tqchen · September 7, 2021, 12:41am

Thanks for great discussions. Because the current proposal itself contains a big set of changes as a result we followed a voting process. For followup improvements like a single section about code commit message structure, it is still great to have a broad set of discussion since they require buy-in from most community members but maybe we do not need to follow the same process.

Agree on what is being said about informative commit messages. As a concrete actionable item on the commit message guideline. How about we start a thread that brings forward a draft proposal. We should be able to bring that in once we reach consensus. The github mechanism allows us to edit the commit message when squashing the commits, so another option could be encourage committers to reorgnize the commit messages a bit(by deleting usless lint bullet pts)

gromero · September 7, 2021, 11:16am

Thanks for your suggestion on how to move ahead with the commit message guideline @tqchen I agree.

jroesch · September 10, 2021, 11:21pm

I think these kind of detailed recommendations would be a really good follow up, for bigger suggestions like this we can have a targeted conversation on a thread/PR and follow up from there. I think a related problem is just GH review tools are limited so some of these feel like symptoms of the tools more so then the desired state for most committers.

I think it would be good to discuss some of these problems wrt to code review process on GH today as well, but that can happen after these initial guidelines as well. I know there are conversations at Octo about improving the workfflow for large changes.

tqchen · September 22, 2021, 2:44pm

coming back to this. @ramana-arm @gromero would you be interested in sending a proposal that contains the sugestions on commit messages?

gromero · September 22, 2021, 2:58pm

Totally @tqchen I’ll prepare it working with @ramana-arm and the other folks interested and post a text to be reviewed by the community. Thank you!

ramana-arm · September 24, 2021, 11:09am

Yep that sounds good - I suspect @Mousius would also be interested in this.

Ramana