Rewriting Git Commit Messages

Posted by Ross Burton on March 6, 2018

So this week I started submitting a seventy-odd commits long branch where every commit was machine generated (but hand reviewed) with the amazing commit message of "component: refresh patches". Whilst this was easy to automate the message isn't acceptable to merge and I was facing the prospect of copy/pasting the same commit message over and over during an interactive rebase. That did not sound like fun. I ended up writing a tiny tool to do this and thought I'd do my annual blog post about it, mainly so I can find it again when I need to do it again next year...

Wise readers will know that Git can rewrite all sorts of things in commits programatically using git-filter-branch and this has a --msg-filter argument which sounds like just what I need. But first a note: git-filter-branch can destroy your branches if you're not careful!

git filter-branch --msg-filter has a simple behaviour: give it a command to be executed by the shell, the old commit message is piped in via standard input, and whatever appears on standard output is the new commit message. Sounds simple but in a way it's too simple, as even the example in the documentation has a glaring problem.

Anyway, this should work. I have a commit message in a predictable format (: refresh patches) and a text editor containing a longer message suitable for submission. I could write a bundle of shell/sed/awk to munge from one to the other but I decided to simply glue a few pieces of Python together instead:

import sys, re

input_re = re.compile(open(sys.argv[1]).read())
template = open(sys.argv[2]).read()

original_message = sys.stdin.read()
match = input_re.match(original_message)
if match:
    print(template.format(**match.groupdict()))
else:
    print(original_message)

Invoke this with two filenames: a regular expression to match on the input, and a template for the new commit message. If the regular expression matches then any named groups are extracted and passed to the template which is output using the new-style format() operation. If it doesn't match then the input is simply output to preserve commit messages.

This is my input regular expression:

^(?P<recipe>.+): refresh patches

And this is my output template:

{recipe}: refresh patches

The patch tool will apply patches by default with "fuzz", which is where if the
hunk context isn't present but what is there is close enough, it will force the
patch in.

Whilst this is useful when there's just whitespace changes, when applied to
source it is possible for a patch applied with fuzz to produce broken code which
still compiles (see #10450).  This is obviously bad.

We'd like to eventually have do_patch() rejecting any fuzz on these grounds. For
that to be realistic the existing patches with fuzz need to be rebased and
reviewed.

Signed-off-by: Ross Burton <ross.burton@intel.com>

A quick run through filter-branch and I'm ready to send:

git filter-branch --msg-filter 'rewriter.py input output' origin/master...HEAD

tags: tech