The proposal below is meant to be an initial draft of the Commit Message Guideline. Hopefully it will help us reach a consensus about the rules and standards the Apache TVM Community wants to adopt to ensure that only good commit messages get merged into the project. This is especially necessary as the codebase grows, with contributions from developers and engineers around the world who have different backgrounds and want to contribute changes to the many TVM components.
Commit messages are an important aspect of code quality, so it is essential for a long-term open source project to ensure that they meet high standards. The importance of conveying enough context and information about the code being changed (or added) will grow as the project grows. Poorly written commit messages can negatively affect the quality of future changes, which would otherwise benefit from good past commit messages if they existed.
Beyond the code itself, poorly written commits can also affect the community. For instance, a history that does not provide consistent and complete context for past changes is a barrier for new contributors, who will spend much more time trying to understand what motivated a critical change in the first place.
Another motivation for good commit messages is to avoid depending on GitHub to understand the code history. Going back and forth between the git CLI and GitHub is time-consuming (and annoying), and I guess nothing guarantees that GitHub will hold the PR conversations and information forever.
The main users of the Commit Message Guideline would be committers and reviewers reviewing PRs, as well as any contributor preparing a new commit/patchset/PR to be submitted for review. The idea is that these guidelines will become common practice over time as we get more and more used to them. I’m sure the initial effort will pay off. In that sense, the Commit Message Guideline should not be considered merely a recommendation.
Many (if not all) of the proposals/items below for the Commit Message Guideline were already discussed to some extent while the Code Review Guideline was being prepared and voted on, and in the last Apache TVM Community Meeting of 2021 (November).
I’ll collect and consolidate the suggestions and comments according to the discussions and consensus that develop here and, as usual, I’ll then prepare a proper RFC to be submitted afterwards.
The proposal:
Commit message title:
- Use of imperative mood
- Proper use of caps at the beginning (uppercase for the first letter)
- No period at the end
- Do not use GitHub usernames (with or without @ before them), names, or nicknames
- It’s recommended, but optional, that a tag is present as a hint about which component(s) of the code the commit “touches”. Examples of tags are: [CI], [microTVM], and [TVMC]. Case must be observed in the tags to reduce the number of variants such as [Tvmc] and [tvmc] for [TVMC]. Tags help reviewers identify PRs they can review and also help the release folks a bit when generating the release notes.
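To make the title rules above a bit more concrete, here is a minimal Python sketch of how some of them could be checked mechanically. It is only an illustration, not existing TVM tooling: the `check_title` helper and the `KNOWN_TAGS` list are hypothetical names, and the tag list simply reuses the three example tags from this post. Imperative mood and usernames written without @ are not checked, since I don't see a reliable way to detect them with simple rules.

```python
import re
from typing import List

# Assumption: the set of accepted tags is just the examples from this post.
KNOWN_TAGS = ("CI", "microTVM", "TVMC")


def check_title(title: str) -> List[str]:
    """Return the rule violations found in a commit message title (hypothetical helper)."""
    problems = []

    # Peel off any leading "[Tag]" groups; tags are optional.
    tags = []
    rest = title.strip()
    while True:
        match = re.match(r"\[([^\]]+)\]\s*", rest)
        if match is None:
            break
        tags.append(match.group(1))
        rest = rest[match.end():]

    # Flag tags that differ from a known tag only by case.
    for tag in tags:
        for known in KNOWN_TAGS:
            if tag.lower() == known.lower() and tag != known:
                problems.append(f"tag [{tag}] should be written [{known}]")

    if not rest:
        problems.append("title has no text after the tags")
        return problems

    if not rest[0].isupper():
        problems.append("first letter should be uppercase")
    if rest.endswith("."):
        problems.append("title should not end with a period")
    # Only catches usernames written with a leading @.
    if re.search(r"(?<!\w)@\w", rest):
        problems.append("title should not mention a username")
    return problems
```

For example, `check_title("[TVMC] Add support for foo")` returns an empty list, while `check_title("[tvmc] fix bug.")` reports the tag case, the lowercase first letter, and the trailing period.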
Commit message body:
- Commits without a body (empty body) are forbidden;
- The message body must reach a certain quality before getting merged. Reviewers can require and suggest enhancements, for instance a full explanation of what the code being changed or added does. I think we need to be explicit here, with some examples of the qualities that are expected. Maybe we could use [0, 1] as references. They are complete in my view, but I’m not sure if it’s sound to copy/paste them. Thoughts?
- Try to avoid “bullet” commit message bodies: “bullet” commit messages are not bad per se, but “bullet” commit messages without any description are likely as bad as commits without any description in the body;
- Do not use GitHub usernames (with or without @ before them), names, or nicknames.
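As a rough illustration of the body rules above, the sketch below flags an empty body, a body made only of bullets, and @-mentions. The `check_body` and `split_message` helpers are hypothetical, and treating lines that start with `-` or `*` as bullets is my assumption. Whether a body explains the change well enough still needs a human reviewer, so this covers only the mechanical part.

```python
import re
from typing import List, Tuple


def split_message(message: str) -> Tuple[str, str]:
    """Split a commit message into (title, body) at the first newline."""
    title, _, body = message.partition("\n")
    return title.strip(), body.strip()


def check_body(message: str) -> List[str]:
    """Return the rule violations found in a commit message body (hypothetical helper)."""
    problems = []
    _, body = split_message(message)

    if not body:
        problems.append("commit message body is empty")
        return problems

    # Assumption: a bullet is any non-blank line starting with "-" or "*".
    lines = [line.strip() for line in body.splitlines() if line.strip()]
    bullets = [line for line in lines if line.startswith(("-", "*"))]
    if len(bullets) == len(lines):
        problems.append("body contains only bullets, no description")

    # Only catches usernames written with a leading @.
    if re.search(r"(?<!\w)@\w", body):
        problems.append("body should not mention a username")
    return problems
```

A message with only a title, or with a body that is nothing but a bullet list, gets flagged, while a body with a descriptive paragraph followed by bullets passes.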
PR organization:
- Split the changes in a reasonable way, much in the sense that [2] explains, at least for the initial patchset used to create the PR.
For now I left the items regarding GitHub-specific issues out of this proposal, like the one about not squashing the patches in a PR before merging them. They seem challenging to address (Apache-controlled, iirc) in a first stab at resolving the commit message quality issues.