Occluding git merges

How to lose track of your work without realising it

Summary:

Git makes merging easy, but most tools around git aren't very informative about merges. A merge that clobbers the history of one or more of its parents can be a source of confusion at best, and a liability at worst. Have no fear, there's a script to detect them.

How to reproduce

git merge origin/shared-branch shared-branch
# screw up
git reset --hard HEAD
# Hmm, this seems alright now
git commit
git push origin shared-branch

Yes, it is definitely a case of PEBKAC. And I'm sure you would never do this. But that's not the issue here--the issue is that, if you or someone on your team does this, most of git's logging tools won't report it, and it's quite possible that it will go unnoticed for a long time.
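For anyone who wants to try this in a sandbox, here is a minimal sketch that builds a throwaway repository containing one fully occluding merge. The function name make_occluding_merge, the branch name side and the file names are made up for illustration. One deliberate substitution: as far as I can tell, current git versions discard the pending merge when you run git reset --hard, so the sketch restores the files with git checkout HEAD -- . instead, which leaves the merge pending. Treat that as an assumption about how the accident plays out, not as a transcript of it.

```shell
# Sketch: build a throwaway repository with one fully occluding merge.
# All names (make_occluding_merge, side, a.txt, b.txt) are hypothetical.
make_occluding_merge() {
    repo=$1
    git init -q "$repo" || return 1
    (
        cd "$repo" || exit 1
        git config user.name "Example"
        git config user.email "example@example.invalid"

        # Common ancestor: two files.
        echo base > a.txt
        echo base > b.txt
        git add a.txt b.txt
        git commit -q -m "base"
        main=$(git symbolic-ref --short HEAD)

        # Parent B: a side branch that changes b.txt.
        git checkout -q -b side
        echo "fix from side" > b.txt
        git commit -q -a -m "side: change b.txt"

        # Parent A: the original branch changes a.txt.
        git checkout -q "$main"
        echo "work on main" > a.txt
        git commit -q -a -m "main: change a.txt"

        # Start the merge, but stop before committing.
        git merge -q --no-ff --no-commit side >/dev/null 2>&1

        # The "screw up": put every file back to its pre-merge state.
        # Assumption: a path checkout leaves the merge itself pending.
        git checkout -q HEAD -- .

        # Records a merge commit whose tree equals parent A's tree.
        git commit -q -m "Merge branch 'side'"
    )
}
```

After running make_occluding_merge /tmp/occluded, git diff HEAD^1 HEAD in that repository should print nothing, while git diff HEAD^2 HEAD shows, among other things, the change to b.txt being undone.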

Why git log and gitk won't report this (by default)

It helps to draw a set diagram of a two-way merge.

A diagram of a merge

A merge is a commit M that combines the changes present in two previous commits A and B. In doing so, it resolves the changes present in both A and B, includes the useful changes from both (iA, iB and the conflicted files c), and may also discard bits from both (eA and eB). Finally, a merge commit can introduce changes of its own (m).

If you want to visualise the difference history of a two-way merge, you run into the problem that there are two sets of differences. The difference between M and A is {eA, m, c, iB}. The difference between M and B is {eB, m, c, iA}. The git log command will show both if you pass it the -m flag along with a diff option such as -p.
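To look at the two sets of differences for a concrete merge, a small helper along these lines can be handy. It is only a sketch: merge_parent_diffs is a hypothetical name, and it lists changed file names per parent rather than full patches.

```shell
# Sketch: list the files that differ between a merge commit and each parent.
# merge_parent_diffs is a hypothetical helper name.
merge_parent_diffs() {
    merge=${1:-HEAD}
    n=1
    # "<rev>^@" expands to all parents of <rev>, one per line.
    for parent in $(git rev-parse "$merge^@"); do
        echo "== changed relative to parent $n ($parent)"
        git diff --name-status "$parent" "$merge"
        n=$((n + 1))
    done
}
```

Running merge_parent_diffs on a merge commit prints one section per parent. git log -m -p -1 on the same commit prints comparable information as full patches.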

But what will it show if commit B is completely occluded? Well, the difference between A and M will be {m, eA}. The difference between B and M will be {iA, m, iB, eB}. c is, of course, empty. A completely occluded merge commit will look very similar to a normal merge commit, containing many of the same files and differences. The biggest clue is that the difference between M and A is so small, and that clue becomes less pronounced when the merge occludes only a few files.

Why this is something to care about

Occlusions are invisible to most git tooling. git blame? Won't find it. The pickaxe (git log -S)? Similarly useless. git bisect won't work either, because as far as git is concerned, the occluded branch represents a discarded part of the commit history.
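One way to see this for a single file is to compare git log's default, simplified history with what it reports under --full-history. As far as I understand the history simplification rules, a path-limited git log follows only the parent that a merge agrees with, which is why the occluded commits drop out by default. The sketch below prints the commits that only show up with --full-history; hidden_commits is a hypothetical name.

```shell
# Sketch: show commits touching a file that the default git log leaves out.
# hidden_commits is a hypothetical helper name.
hidden_commits() {
    file=$1
    # Commits reported by the default (simplified) history for this file.
    simplified=$(git log --format=%H -- "$file")
    # Walk the full history and keep what the simplified history lacks.
    git log --full-history --format='%H %s' -- "$file" |
    while read -r hash subject; do
        case "$simplified" in
            *"$hash"*) ;;                  # visible by default
            *) echo "$hash $subject" ;;    # only visible with --full-history
        esac
    done
}
```

An empty result means the default history of that file hides nothing. Any output is a candidate for closer inspection, not proof of an occlusion.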

A clumsy co-worker can effortlessly 'undo' a lot of work, with almost nothing to alert the other developers. A malicious committer can easily revert security fixes—the logs will make it seem as if they never happened.

Of course, occlusion can be a perfectly valid choice in a merge. There's even an option to make git do it without prompting you: git merge -s ours. In practice, I've seen this happen accidentally several times, and I've found at least one similar tale of woe.

How to detect some occluding merges

The easiest way is to look for occluding merges at the ‘resolution’ of a commit: if there is a merge commit M that is completely equal to parent A, this means that parent B is occluded. However, if the committer makes additional changes to the merge commit, possibly because they're unaware that git sees the pending commit as a merge, this approach fails.
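The commit-level check can be sketched by comparing tree ids, since two commits with the same tree id have identical contents. This is an illustration under my own naming (find_full_occlusions is hypothetical), not the script mentioned further down.

```shell
# Sketch: report merges whose tree is identical to one of their parents.
# find_full_occlusions is a hypothetical helper name.
find_full_occlusions() {
    # Takes one revision argument, e.g. HEAD (the default) or --all.
    git rev-list --merges "${1:-HEAD}" |
    while read -r merge; do
        merge_tree=$(git rev-parse "$merge^{tree}")
        n=1
        for parent in $(git rev-parse "$merge^@"); do
            # Same tree id means the merge added nothing to this parent.
            if [ "$(git rev-parse "$parent^{tree}")" = "$merge_tree" ]; then
                echo "$merge has the same tree as parent $n ($parent)"
            fi
            n=$((n + 1))
        done
    done
}
```

Matches are candidates rather than verdicts: a --no-ff merge of a branch that was simply ahead also ends up with the same tree as one parent, without occluding anything.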

We can ‘zoom in’ by looking at the merge-base O. The merge-base is the most recent commit that is an ancestor of both A and B.

A merge including two parents and their merge-base

Now we can define occlusion at file resolution. If a file is modified between B and O it ought to be modified between M and O as well, unless it is occluded, intentionally or not.

If the file is modified between O and M, but that difference is equal to the one between O and A, it is also occluded.
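Both file-level tests can be sketched by comparing blob ids at O, A, B and M. The names occluded_files and blob_at are hypothetical, and the sketch only looks at the first two parents. It also adds one assumption of my own to the second test: a file only counts as occluded when M's version differs from B's, so that A and B making the identical change is not reported.

```shell
# Sketch: file-level occlusion tests for a two-parent merge.
# occluded_files and blob_at are hypothetical helper names.

# Print the blob id of a file in a commit, or "absent" if it is not there.
blob_at() {
    git rev-parse -q --verify "$1:$2" 2>/dev/null || echo absent
}

occluded_files() {
    merge=${1:-HEAD}
    a=$(git rev-parse "$merge^1") || return 1
    b=$(git rev-parse "$merge^2") || return 1
    base=$(git merge-base "$a" "$b") || return 1
    # Every file that parent B changed relative to the merge-base O.
    git diff --no-renames --name-only "$base" "$b" |
    while IFS= read -r file; do
        in_base=$(blob_at "$base" "$file")
        in_a=$(blob_at "$a" "$file")
        in_b=$(blob_at "$b" "$file")
        in_merge=$(blob_at "$merge" "$file")
        if [ "$in_merge" = "$in_base" ]; then
            # First test: B changed it, M is unchanged since O.
            echo "test 1: $file"
        elif [ "$in_merge" = "$in_a" ] && [ "$in_merge" != "$in_b" ]; then
            # Second test: M carries exactly A's version, not B's.
            echo "test 2: $file"
        fi
    done
}
```

A file reported by either test may still have been dropped on purpose, so the output is a list of things to review rather than a list of mistakes.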

A script that looks at all merge commits and performs the first test is available here. It stops at the first test because it is already fairly slow, particularly with msysGit. And just like the example on this page, it does not yet handle octopus merge commits correctly.