Commit Message Guideline

The proposal below is meant to be an initial draft for the Commit Message Guideline and hopefully it will help us to reach a consensus about the rules and standards that the Apache TVM Community wants to attain to ensure that only good commit messages get merged into the project, which is specially necessary as the codebase grows with more contributions from several developers and engineers around the world, with different backgrounds, and wanting to contribute changes to the manifold TVM components.

Because commit messages are an important aspect of the code quality for several reasons, it is essential for a long term open source project to ensure that they meet high standards. The importance of them conveying enough context and information about the code being changed (or added) will grow as the project grows and bad (poorly written) commit messages can affect negatively the code quality of future changes that would otherwise benefit from past good commit messages if they existed.

Beyond code itself, poorly written commits can also affect the community. For instance, by not providing a consistent/complete history and context for the changes to new people wanting to contribute to the project it can serve as a barrier for such new contributors as much more time will be spent when trying to understand what motivated a past critical change in the first place.

Another motivation to have good commit messages is to avoid depending on Github to understand the code history. Besides going back and forth between git CLI and Github being time-consuming (annoying), I guess nothing guarantees Github will hold the PR conversations / information forever.

The main users of the Commit Message Guideline would be committers and reviewers while reviewing PRs and also any contributors preparing a new commit/patchset/PR to be submitted for review. The idea is that these guidelines will become over time common practice as we get more and more used to them. I’m sure the initial effort will pay off. In that sense the Commit Message Guideline should not be considered merely as a recommendation.

Many (if not all) the proposals/items below for the Commit Message Guideline were already discussed to some extent by the time the Code Review Guideline was being prepared and voted and in the last Apache TVM Community Meeting in 2021 (November).

I’ll collect / consolidate the suggestions and comments as per the discussions and consensus that develop here and, as usual, I’ll prepare a proper RFC accordingly to be submitted after.

The proposal:

Commit message title:

  • Use of imperative mood
  • Proper use of caps at the beginning (uppercase for the first letter)
  • No period at the end
  • No Github username (with or without @ before it), names, or nicknames must be used
  • It’s recommended that a tag is present as a hint about what component(s) of the code the commits “touches”, but that’s optional. Examples of tags are: [CI], [microTVM], and [TVMC]. Case must be observed in the tags in order to reduce the number of variants like, for instance, [Tvmc] and [tvmc] for [TVMC]. Tags help reviewers to identify PRs they can review and also help the release folks a bit when generating the release notes.

Commit message body:

  • Commits without a body (empty body) are forbidden;
  • The message body must attain a certain quality before getting merged. Reviewers can require and suggest enhancements. For instance, fully explaining what the code change or the code being added does. I think we need to be explicit here through some examples about what qualities are expected. Maybe we could use [0, 1] as the references, they are complete in my view but I’m not sure if it’s sound to copy/paste them. Thoughts?
  • Try to avoid “bullet” commit message bodies: “bullet” commit messages are not bad per se, but “bullet” commit messages without any description is likely as bad as commits without any description in the body;
  • No Github username (with or without @ before it), names, or nicknames must be used.

PR organization:

  • Split the changes in a reasonable way, much in the sense as [2] explains, at least for the initial patchset used to create the PR.

For now I left the items regarding Github-specific issues out of this proposal, like the one about not squashing the patches in a PR before merging them because they seem to be challenging (Apache controlled iirc) as a first stab to resolve the commit message quality issues.

[0] GlibcGit - glibc wiki

[1] https://github.com/torvalds/linux/blob/master/Documentation/process/submitting-patches.rst#describe-your-changes

[2] https://github.com/torvalds/linux/blob/master/Documentation/process/submitting-patches.rst#separate-your-changes

3 Likes

cc @tqchen @jroesch @leandron @Mousius @manupa-arm @ramana-arm @tgall_foo @denise

What are the guidelines for messages in commits that come after the initial/main ones? For example, commits to fix lint issues.

Thanks @gromero it would be great to add a few examples to put some of the guideline items into context. Please also mark this into pre-rfc category so more people can notice it

These guidelines are definitely good to have and I think we should codify them in our docs! One big problem we have today is clicking the merge button on GitHub defaults to a bad commit message which [RFC] Allow merging via PR comments should fix but would also be a workflow change. We can enforce some rules via automation, but getting people to change their behavior regarding these kind things is tricky.

With prescriptive rules like this they’re not really enforceable. For one lots of reviewers won’t follow a workflow outside their usual one (i.e. reviewing for commit language), and a review comment can be pretty costly, especially if the reviewer/author aren’t in the same time zone. Is it really worth it to incur this delay for something simple like a change in punctuation in the PR title? Alternatively, the onus could be entirely on the reviewer to edit the message as necessary to conform to the guidelines (but this makes reviewing harder).

I think the ones we can do automatically via the mergebot RFC is ensure:

  • there is at least one ‘tag’ in the title ([ci], microtvm, etc)
  • that the body is not empty

The commits on PRs should not have any bearing on the merged commit message (so no rules apply). They can be helpful for review, but the final message should be based on the PR title + PR body since that is where people usually consolidate the actual information (and it’s what GitHub surfaces by default, whereas you have to go digging for commits).

Hi Tristan. Good question :wink:

In a sense that issue could be put into the class of the issues I mentioned as “Github-specif issues”, because to me ideally such a comments should never exist in the first place. They exist in my understanding primarily because we want to keep the PR conversations (specially regarding the review process) and a git push --force, which is necessary if we amend the initial commits instead of adding a new commit, will be necessary and that will lead to the conversations vanishing sometime after the git push --force (that’s because of git gc).

OTOH overt time I did realize that is not always the case that contributors void push force and eventually some will do it. BTW, we don’t have any rule / guideline to encourage (or dis-encourage it). Let me know if besides keeping the conversations in the PR (which is ultimately a GH limitation I think) there are other advantages on not force pushing in the review process we currently use on GH.

Hence although I do recognize that such additional commits squashed and merged into the final commit tremendously pollute the final commit (how about the CI re-triggers which are so common heh) I don’t have any suggestion for now on how to get rid of them. There options like the maintainer working out the final commit before merge to get rid of them (I think Tqchen suggested it once if I’m not mistaken), but that also can add a burden to the review process, so… I don’t know. We need to discuss more about it.

Hence, assuming we’ll keep the additional commits (commits that go after the initial/main ones), I think that for simple things like a CI retrigger and lint fixes a simple trivial message is ok. I think that enforcing more than that will be tedious for the reviewers and the authors.

However it might be the case that the initial commit message says something like "The buffer size here is set to X because … " and after one review it becomes clear that it should be larger. In cases like that I think that the additional commit message addressing the comment from the reviewer should mention why X doesn’t hold anymore and now Y is being used instead (If git push “allowed” then the initial commit message could be amend in the very first place).

So that’s my suggestion: add a guideline saying that non-trivial additional commit messages should try to capture the essence of the additional changes taking into account what/how it affects the initial commits.

What do you think?

@tqchen Thanks a lot for input and support. I’ve changed the category as pre-RFC and I’ll add a few examples to the initial test. That’s indeed necessary I agree.

@driazati Thanks for pointing that RFC you posted. I totally missed it. Pretty good discussion in there. Yeah I’m aware about the merge button on GH being heavily guilty for a couple of issues associated with the bad commit messages, like title concatenation, etc.

Right, it’s tricky. So although GH can be blamed for many aspects of the bad commit messages currently in the tree, there is a human/behavioral aspect that I firmly believe that needs to be improved and, above all, can be improved :slight_smile: I thought about this guideline as a first step for improving that aspect. That’s why I also said that I would at first “left the items regarding Github-specific issues out of this proposal”, because I wanted to focus on that tricky aspect in the end. In that sense I’m also happy that your RFC is addressing other GH and automation aspects more closely.

I see. Thanks for letting me know what we can achieve (possibly) by automation here, I’m aware you’re working hard on the front. That’s really good. I agree that enforcing these rules might be hard, so maybe we should put these 3 rules (only?) as recommendations?

Now, I do think that the reviewers/committers/maintainers must be aware of the code/commit guidelines and try to follow them at least when they are submitting code, and use the guidelines with parsimony when reviewing code submitted by other authors/new contributors. For instance, the code will need to get fixed anyways, so another CI test/review round is unavoidable, hence just let the user know that the title can be improved to match the guidelines on the next version to be pushed. That’s a quite simple ask that can help a lot a behavioral change. I think the onus should not be on the reviewers, and I think that an ask like that would avoid it: it’s a submitter’s duty to write a good commit message, just like a correct/good code.

Yeah I agree. When I replied to Tristan above I kind was saying that there would be no way to address that issue so went for a few suggestions, but then I was not aware of your RFC that will (hopefully!) address it. In that case, I would leave any recommendation about the additional commit out of this guideline @tkonolige

@driazati In that case are you considering that the submitter will be able to change the initial/original PR body, using a git push --force for instance, after an amend?

1 Like

I definitely agree we should try these suggestions and adding structural comments like these in existing reviews makes sense.

The PR body can be changed at any time after the PR is up by editing it in the UI, but this is fine. The PR body should reflect the current state of the PR, including all changes that have been made since it was originally uploaded, so the commit message ends up being the PR title + body whenever it was merged (with the RFC above or if committers take care to do this manually today).

OTOH overt time I did realize that is not always the case that contributors void push force and eventually some will do it. BTW, we don’t have any rule / guideline to encourage (or dis-encourage it). Let me know if besides keeping the conversations in the PR (which is ultimately a GH limitation I think) there are other advantages on not force pushing in the review process we currently use on GH.

I think some people prefer that new changes to a PR come as new commits so that they can review them on a per-commit basis instead of having to re-review the entire PR.

So that’s my suggestion: add a guideline saying that non-trivial additional commit messages should try to capture the essence of the additional changes taking into account what/how it affects the initial commits.

I think this is a reasonable thing to ask for, but I’m not sure when/how these additional changes should be merged into the final commit message.

Thanks @gromero for taking this initiative.

I would actually push us to take a pragmatic route to enforce these (kind of agreeing @driazati ) given the distributed nature of the TVM/OSS project, failing that we fallback to being at least a “guideline” – which we dont have at the minute :slight_smile: .

I’ve been spending sometime thinking about this how to automate this (like in a way a linter would perform) – In the past, we have seen majority of review cycles are spent fixing commas, spaces, etc – which got fixed using a linter in the CI. (For more info : [CI][LINT] Enabling clang-format based lint checks)

With that spirit, I would like to suggest something, though I admit I have no experience in using this (hoping someone here might have used this :slight_smile: ) , Gitlint. I think we can enable this in the CI, fairly easily depending on the collective opinion of the community.

I would like to second this. I personally worked on PRs with varying complexities and it is not practical to assume a PR to always contain a single commit. However, one thing we might use to solve this would be use “fixup!” or “squash!” commits that the above mentioned tool (Gitlint) can ignore and could be easily removed from the commit message using --autosquash. One thing I dont know is whether/how we can configure Github to use --autosquash.

Assuming if we can find out a way to use --autosquash with Github, we might be able to enable linter for commit messages, because the above tool can be made to ignore fixup! or squash! commits.

(NOTE : above recommendation of the tool is just based on some short research I did, thus, open for any other alternatives)

cc : @masahi @Lunderberg

Thanks @gromero for starting this RFC! I think sorting out commit messages will have a very good impact down the line in organising the project as a whole.

Adding automation to check/enforce this is a great and I support it very much, but I think the important thing at project level as a result of this RFC is that during review we can point authors of PRs to those guidelines so that we can politely ask for better commit messages.

I think there is a balance here to accommodate “ways people like to organise their own work” with the minimum we - as an open source project - need to guarantee some uniformity and quality to the contributions we merge. This is what this RFC is trying to achieve IMHO.

On the topic of “stacking new commits on top of a PR”, I feel it is important that such commits are organised in a way that they mean something, so that they can have a commit message that means something more than e.g. “fix lint”, “fix test” or “cleanup”. These changes should be merged into their relevant commits within a PR.

In terms of how to concretely implement the changes, I think we should make this part of the documentation, near the committer/reviewer guidelines, similar to what other projects e.g. LLVM do.

I think this is a great suggestion @manupa-arm, so that we can codify the bare minimum and save a lot of reviewer’s time in pointing out empty/not tagged commit messages.

@gromero thanks for initiating this conversation. I think this is an important topic to consider as the TVM contributions are growing. Having a guideline that the community has accepted is important so the reviewers could use as a reference to make decisions. But more importantly, I totally agree with @driazati that without some sort of tooling it is very challenging to enforce those guidelines and it would also put too much responsibilities on reviewers.

Looking briefly at @manupa-arm’s suggestion, I think Gitlint seems to be a good option to adopt in TVM.

Regarding the guideline/tooling, in my experience I see that people spend a good time on writing the PR description and usually PR description includes enough information to get a good understanding of the commit after merge. I wonder if there’s a way to add the PR description as part of commit message (or make it the commit message).

1 Like

Speaking from previous experience, having the PR title and description become the commit message was super helpful. Many developers would use the PR description markdown support to write detailed PR descriptions, complete with sections, formatting, and links where needed, as well as having references to related bugs or work items included in the commit message by the tool. I know Azure DevOps supports this and I believe other platforms do also, but it doesn’t appear that is an option on GitHub currently. From my experience it works very well, especially as projects get larger. Hopefully this is something that is added in the future.

@gromero Was there a formal RFC proposed?

It would be great to update the status of this work as to whether its still in initiative that you’d like to move forward.

1 Like

@manupa-arm Hi! Yeah, now that the merge bot code is in place this RFC is unblocked and I’d like to move forward :slight_smile: I’ll prepare a v2 and post it soon.

1 Like

v2 posted to [RFC][v2] Commit Message Guideline