[RFC][v2] Commit Message Guideline

This RFC replaces Commit Message Guideline

This RFC proposes the addition of the following commit message guidelines to pull_request.rst, under section Submit a Pull Request, below subsection Guidelines, as a subsection named “Commit Message Guideline”. The text in the second-last item in subsection Guidelines that mentions PR tags will also be extended to mentioned this guideline.


Commit Message Guideline

Apache TVM uses the Github (GH) platform for patch submission and code review via Pull Requests (PRs). The final commit (title and body) that is merged into the Apache TVM main tree is composed from the PR’s title and body and must be kept updated and reflecting the new changes in the code as per the reviews and discussions. The PR’s title and body are also checked automatically by a linter tool and some rules (see below) are enforced.

Hence, although these guidelines apply essentially to the PRs’ title and body messages, because GH auto-generates the PR’s title and body from the commits on a given branch, it’s recommended to follow these guidelines right from the beginning, when preparing commits in general to be submitted to the Apache TVM project. This will ease the creation of a new PR, avoiding rework, and also will help the review.

The guidelines in this document are very similar to the rules used by other open source projects, like gcc, LLVM, and Linux.

The rules below will help to achieve uniformity that has several benefits, both for review and for the code base maintenance as a whole, helping you to write commit messages with a good quality suitable for the Apache TVM project, allowing fast log searches, bisecting, and so on.

PR/commit title:

  • Guarantee a title exists (enforced);
  • Don’t use Github usernames in the title, like @username (enforced);
  • Check if a tag should be present as a hint about what component(s) of the code the commits “touch”. For example [BugFix], [CI], [microTVM], and [TVMC]. Tags go between square brackets and appear first in the title. Tags help reviewers to identify the PRs they can/want to review and also help the release folks when generating the release notes (enforced);
  • Use an imperative mood. Avoid titles like “Added operator X” and “Updated image Y in the CI”, instead use the forms “Add feature X” and “Update image Y in the CI” instead;
  • Observe proper use of caps at the beginning (uppercase for the first letter) and for acronyms, like, for instance, TVM, FVP, OpenCL. Hence instead of “fix tvm use of opencl library”, write it as “Fix TVM use of OpenCL library”;
  • No period at the end of the title is necessary.

PR/commit body:

  • Guarantee a body exists (enforced);
  • Don’t use Github usernames in body text, like @username (enforced);
  • Avoid “bullet” commit message bodies: “bullet” commit message bodies are not bad per se, but “bullet” commit messages without any description or explanation is likely as bad as commits without any description, rationale, or explanation in the body;

For minor deviations from these guidelines, the community will normally favor reminding the contributor of this policy over reverting or blocking a commit / PR.

Commits and PRs without a title and/or a body are not considered minor deviations from these guidelines and hence must be avoided.

Most importantly, the contents of the commit message, especially the body, should be written to convey the intention of the change, so it should avoid being vague. For example, commits with a title like “Fix”, “Cleanup”, and “Fix flaky test” and without any body text should be avoided. Also, for the review, it will leave the reviewer wondering about what exactly was fixed or changed and why the change is necessary, slowing the review.

Below is an example that can be used as a model:

[microTVM] Zephyr: Remove zephyr_board option from build, flash, and open_transport methods

Currently it’s necessary to pass the board type via ‘zephyr_board’ option to the Project API build, flash, and open_transport methods.

However, since the board type is already configured when the project is created (i.e. when the generate_project method is called), it’s possible to avoid this redundancy by obtaining the board type from the project configuration files.

This commit adds code to obtain the board type from the project CMake files, removing this option from build, flash, and open_transport methods, so it’s only necessary to specify the ‘zephyr_board’ option when calling generate_project.

This commit also moves the ‘verbose’ and ‘west_cmd’ options from ‘build’ method to ‘generate_project’, reducing further the number of required options when building a project, since the ‘build’ method is usually called more often than the ‘generate_project’.

If a PR is created from more than one commit (a patchset), then ideally the changes should be split into commits in a reasonable way. For instance, if it’s a fix and so it also adds a regression test, it’s a good practice to split the changes having one commit for the fix and another one for the test, applying the rules and recommendations in this guideline to each commit.

After a new PR is created and the review starts it’s common that reviewers will request changes. Usually the author will address the reviewers’ comments and push additional commits on top of the initial ones. For these additional commits there is no recommendation regarding the commit messages. The only recommendation is that if the additional commits render the PR title and/or body outdated then the PR title and body must reflect the new changes in the code and be updated accordingly (remember that the PR title and body will be used to compose the final commit message that will land in the main tree).


There are some prerequisites and itens to be discussed before accepting this Commit Message Guideline, which are:

  1. Committers agree on merging code to the main tree using the tvm-bot, because the bot will correctly take care of using the PR’s title and body (and parsing them) when preparing the final commit that will land into the TVM’s main tree, regardless if the PR is composed of one or more commits? Currently GH workflow is confusing because it has different behavior depending on the number of commits composing the PR: if there is only one commit GH will use the commit title and body to generate the final commit to get merged to the main tree; otherwise, if there is more than one commit, it will use the PR’s title as the final commit title and as for the final commit body it will be al concatenation of all the commit titles and bodies, sequentially, which can get quite messy since even the additional commits - the ones added as the outcome of the review process - will get concatenated.

  2. Will tvm-bot keep the commit message authorship?

  3. How is access control applied to the tvm-bot? Who is allowed exactly to merge the PR using the tvm-bot? Just committers?

  4. Should we block on a lint error against the PR (enforced rules) or just warn the user? I believe we need to block on enforced rules.

  5. Line breakage must be applied to the text in the PR’s body, enforcing 72 characters per line (tvm-bot will guarantee that)?

cc @tqchen @driazati @Mousius @manupa-arm @tkonolige @ramana-arm @leandron @areusch @alanmacd

tvm-bot is currently not working and blocked on a couple issues (permissions changes in the repo that are causing hard-to-debug errors and tvm-bot merged PRs don’t run GitHub Actions jobs), but there is an easy way to vastly improve commit messages in the meantime.

When committers go to “Squash and Merge” a PR, they should not just click the two buttons, but instead:

  1. Click “Squash and Merge”
  2. (the important step) Go to the PR body, click “Edit”, copy-paste the markdown text (excluding any @s)
  3. Paste this into the squash and merge box as the commit message
  4. Click the button to actually merge the PR

This takes only a couple seconds but I suspect lots of people feel uneasy about editing this box, but I think it should become common practice, and even while we discuss finer points from this RFC doing this would right now do away with messages like this one.

@driazati I totally second you on this point. I already do that myself for the PRs I merge and it works quite well. Thanks for citing this alternative that can be used meanwhile.

Regarding the tvm-bot issues themselves, although I don’t have a vast experience on this front I volunteer myself to try to help fixing these bot issues.

To the actual content of the RFC:

  1. I think we should stick with squash-and-merge, so the commits on the PR don’t actually matter. They’re not very visible on GitHub and new contributors might have trouble messing with git to amend their messages vs just clicking “Edit” on the PR box on GitHub.
  2. tvm-bot already (should) merge all co-authors that GitHub has detected for a PR
  3. Access right now is PR author + committers, anyone else gets rejected. There also must be an approval on the most recent commit for anything to work (but we could revisit this in the future if it’s cumbersome)
  4. I think we definitely should block, at least for the core rules, or else they become easy to ignore
  5. We could probably implement this (and other agreed upon rules) as an automatic pre-processing step so people don’t have to worry about it. I want to make sure whatever rules we use we don’t end up with human-linters going back and forth with PR authors on standards if we can avoid it.

The main questions going forward are the set of rules to enforce (the ones listed here seem pretty reasonable to me) and how we implement that. For tvm-bot to work as the sole way to merge PRs we need some way to block the merge button on GitHub (or to educate TVM committers to not use it) which isn’t clear to me how we could do that well ATM. We can address the GitHub Actions jobs on main once https://issues.apache.org/jira/browse/INFRA-23548 is looked at by ASF infra.

Hi David. Thanks a lot for your review and suggestions!

  1. I think we should stick with squash-and-merge, so the commits on the PR don’t actually matter. They’re not very visible on GitHub and new contributors might have trouble messing with git to amend their messages vs just clicking “Edit” on the PR box on GitHub.

I agree that the commit messages associated to the commit(s) on the PR should not actually matter (by this I understand that we’re saying they will be discarded and only the PR tile and body will be used to compose the final commit message that lands in to the main tree). But could you clarify what do you mean exactly in this case by “stick with squash-and-merge” ? Because isn’t “squash-and-merge” what we are currently doing?

  1. tvm-bot already (should) merge all co-authors that GitHub has detected for a PR

Cool. Thanks for clarifying it!

  1. Access right now is PR author + committers, anyone else gets rejected. There also must be an approval on the most recent commit for anything to work (but we could revisit this in the future if it’s cumbersome)

Got it. So does it mean that currently it’s already possible that a PR author merges its own PR?

  1. I think we definitely should block, at least for the core rules, or else they become easy to ignore

@leandron and I discussed it previously and we agree. Let’s get more inputs tomorrow on the community meeting.

The main questions going forward are the set of rules to enforce (the ones listed here seem pretty reasonable to me) and how we implement that. For tvm-bot to work as the sole way to merge PRs we need some way to block the merge button on GitHub (or to educate TVM committers to not use it) which isn’t clear to me how we could do that well ATM. We can address the GitHub Actions jobs on main once https://issues.apache.org/jira/browse/INFRA-23548 is looked at by ASF infra.

Could we (or is it reasonable) to block the merge – similarly to how it happens currently if a CI test fails – if the linter fails (on agreed enforced rules only) but if the linter passes it allows the committer to decide if the final squash-and-merge is done manually as you suggested ( [RFC][v2] Commit Message Guideline - #3 by driazati ) or using the tvm-bot? So we actually allow the merge via two options, at least while we perfect the tvm-bot?

We discussed this in the 10th August Meetup:

  • @driazati presented a demo of the manual workflow in copying a given PR message to be the body of the actual commit message going at “squash and merge” time;
  • @gromero presented an overview of the v2 proposal of the commit messages guideline
    • In previous discussions, we approached how automated tooling could help us to achieve and verify commit messages, mostly driven by @driazati
    • @leandron commented that commit messages sanity is fundamental for our project, so that we can track changes being pushed into the repo, generate release notes and make the reviewer work more productive by setting expectation on what should be on the patch being reviewed;
    • @areusch pointed the proposed guideline should be submitted as an official RFC in the tvm-rfcs repo triggering a [VOTE] thread;
    • Matthew Barrett points out there is a need to generate some guidance around what the project considers a self contained unit of work that make part of a PR - which is a related topic worth exploring but out of scope for the current “Commit Messages RFC”
    • Next steps: @gromero to send an official RFC in the tvm-rfcs repository, to trigger the change in process, that will materialise in a change in the documentation.
2 Likes

I think we covered most of this in the meeting but just for completeness to your questions @gromero:

I was thinking about the other types of merge options (e.g. rebase, merge commit) but I think we’re in agreement here (in that we should keep squashing).

“Yes” but it doesn’t work due to some bugs, my hope is that they will be mostly fixed after https://github.com/apache/tvm/pull/12361 is done and merged

This makes sense to me, a lint rule would be pretty simple to implement as part of CI.

2 Likes

Official RFC PR sent to: https://github.com/apache/tvm-rfcs/pull/88

1 Like