Nathan Hoad

On the subject of commit messages

June 7, 2014

Yesterday, I read an article about version control, describing some best practices. In general the article was more about Git and Github rather than version control in general, but in particular the article made a point of discussing what makes a good commit message. I think the article had good intentions, but the examples provided were still, very much, bad commit messages.

Try to keep them somewhat short and meaningful. Commit messages like “meh”, “fix”, “lol” and so on are useless. Your[sic] are writing to your future self and the other people working on the codebase. It’s important to be able to tell what a commit is all about from its message.

Sure, yeah, “meh” and “lol” are useless. No one would dispute that. I agree, you are writing to future you and future developers, and that it’s important to tell what a commit is all about from its message. That’s what makes the followup examples bad:

“Good”, “descriptive” commit messages

These are not good either. They’re better than “lol”, sure, but they can definitely be a lot better. How? Well, let’s go through a hypothetical solution to illustrate why commit messages like these are bad, and what an actual good commit message is.

The one wherein Bob writes a bad commit message

Say you have a program, and every 60 seconds, it uploads something to a server. This operation is quite slow, taking about 45 seconds:


import threading
import time


def upload_to_server():
    print "Uploading to server..."
    print time.sleep(45)


def main():
    while True:
        t = threading.Thread(target=upload_to_server)
        t.start()
        time.sleep(60)

if __name__ == '__main__':
    main()

Looks good, right? Right, sure, let’s pretend it is. You release it, it uploads things, whatever. A few months down the track, a manager or a customer, someone important, requests that we upload data a little more frequently. “Can we upload things every 30 seconds? It would make the reports feel a little more realtime”, something like that. Bob, one of the other developers on your team, goes ahead and makes that change:

changeset:   1:fd10e225e149
tag:         tip
user:        bob
date:        Sat Jun 07 19:43:14 2014 +1000
files:       foo.py
description:
Change upload to happen every 30 seconds.


diff --git a/foo.py b/foo.py
--- a/foo.py
+++ b/foo.py
@@ -11,7 +11,7 @@ def main():
     while True:
         t = threading.Thread(target=upload_to_server)
         t.start()
-        time.sleep(60)
+        time.sleep(30)
 
 if __name__ == '__main__':
     main()

Now, remember how I mentioned that this function takes 45 seconds to complete? By changing it to run every 30 seconds, Bob has created 15 seconds of overlap between two uploads, which may cause all sorts of issues with the data structures used internally in the process, as well as how the data ends up on the server. So it’s a bug, quite obviously.

Say no one notices this - it makes it onto the production system, and a month later, people start reporting the bug when they notice their data looks a little weird. You go through the changes since the last release, you see this, and you can tell that it’s a (very obvious) race condition introduced by Bob’s change.

“Why did Bob make this change?” you ask yourself, “I know, I’ll check what the commit message was, surely it will explain why he did this.”

Change upload to happen every 30 seconds.

“The code already told me that!”, you shriek at your computer. Not able to find anything else to explain why the change was made, you figure you could just ask Bob.

“Oh, gee… I don’t really remember.” Bob says, frowning slightly, “I know it was for a reason, but I can’t remember why.”

Well, shit. What do you do now? You have three options:

  1. Kill Bob and eat his brain to gain his knowledge.
  2. Revert the broken changeset.
  3. Add a bunch of safety checks such that only one thread uploads at a time.

So you do what anyone would do, and you eat his brain revert the broken changeset. Sure, you could add the safety checks, but in the interests of getting a fix out the door as soon as possible, you want to take the path of least resistance, so you just revert the broken changeset. But you know what else you do? You write a fucking good commit message for it.


changeset:   2:9dcbfffee9bc
tag:         tip
user:        nhoad
date:        Sat Aug 09 16:01:47 2014 +1000
files:       foo.py
description:
Revert fd10e225e149 as it introduced a race condition with uploading data
to the server.

The upload takes approximately 45 seconds, so performing the upload every
30 seconds creates some overlap between upload events.

The race condition resulted in duplicate records for a given minute, with
inconsistent numbers.


diff --git a/foo.py b/foo.py
--- a/foo.py
+++ b/foo.py
@@ -11,7 +11,7 @@ def main():
     while True:
         t = threading.Thread(target=upload_to_server)
         t.start()
-        time.sleep(30)
+        time.sleep(60)
 
 if __name__ == '__main__':
     main()

In another month’s time when that special someone comes around and asks Bob why reports are no longer as realtime, the lightbulb flashes. “That’s why I made that change!” and this time, Bob makes the change again, with appropriate safety checks and a commit message explaining why the frequency is increasing.

How would a good commit message have helped here, though?

Let’s say that in an alternate reality, Bob’s commit message instead read:

Change the upload to happen every 30 seconds because it makes reporting appear more realtime for users.

Then when you’re hunting down the cause of the bug and discover the changeset, you don’t immediately jump to reverting the changeset, because it’s a user-facing feature. You instead lend a little more weight to the option of adding safety checks. You can use this information to talk with the person who originally requested the change, asking how detrimental it would be for it to be reverted.

Imagine if our history books read “The USA bombed Japan in 1945” and nothing more. Wouldn’t you want to know why? This is the same thing. History explains the who, when, where, what, and, providing a good commit, the why.

If you’ve never been in the situation I described above, you fit in one of the following categories;

  1. Everyone you work with writes good commit messages 100% of the time.
  2. Everyone you work with has a perfect memory.
  3. You haven’t been working for long enough.
  4. You are full of it.

I’ve been in that situation a number of times, on both sides. Shit, I’ve been both people in the one situation! “Why did I do this?!?!?!” I cried, wishing it would all end. If you haven’t ever had this happen to you, then your memory is really bad or you’re really, really lucky.

History is important. Commit messages explain the reasoning behind a change you’ve made. They’re the historical explanation of why something happened.

That’s why Jean-Phillippe’s examples are still bad. Just look at them:

Of course, it’s not all black and white. There are situations where “I did X” messages are okay. You fixed some typos? “fixed a typo” is okay. You fixed an obvious syntax error? “syntax error, I’m an idiot” is okay. There’s also ugly or large changesets to consider. If you have a 5000 line changeset that you cannot break up for any reason, then sure, describe what it does at a high level. But still describe why. You can’t change 5000 lines without explaining why.

So, look, when you write a commit message, just stop and think. “Does this message fully convey the reason I have made this change? Does it fully explain the benefits? Would I hate to read this six months from now?”