A recent wry tweet by @bcrypt really tickled my funny bone:
gitcoin: the author of the commit sha1 with the longest prefix of 0’s in your repository is now the project maintainer
The genius in the tweet is how it draws a comparison to Bitcoin’s approach to achieving distributed consensus with achieving consensus on choosing a project maintainer.
With Bitcoin, there’s a proof-of-work algorithm that relies on generating SHAs until you find one with a certain number of leading zeros. Git commit SHAs could perhaps serve a similar purpose.
Is this a good way to pick a project maintainer? Probably not. But then again, it’s not that far off from how I make most important life decisions. If your project wants to take a walk on the wild side, I’ve got just the command for you.
A simple solution
Run the following command in a Git repository and it’ll return the name of the author, the commit date, and the SHA of the commit that has the lowest SHA sorted lexicographically.
git log --pretty=format:"%H %ad %an" | sort | head -n1
Or, if you prefer a Git alias:
coin = !git log --pretty=format:'%H %ad %an' | sort | head -n1
The SHA that results will have the most leading zeros in the repository. There may be other commits with the same number of leading zeros, but for the sake of this thought exercise, I’ll just pick the one that’s sorted first.
For those not familiar with the
git log command, there are a gaggle of options. I’ll break down this specific invocation.
--pretty=format option takes a custom format string that specifies the contents of the output.
%H is the commit SHA.
%ad is the commit date and
%an is the author name.
We pipe that to the
sort command. Since no two SHAs can be the same, we don’t have to worry about sorting on just the first column. We can just sort using the entire line as the sort key. Then we use
head -n1 to pluck the first item.
It’s possible that there won’t be any commits with leading 0s, but I ignore that for now. I figure the commit with the lowest SHA sorted alphabetically fits with the spirit of the idea.
Since github.com runs on Ruby on Rails, I thought it’d be fun to try it out on https://github.com/rails/rails. I cloned the repository to my machine and ran
git coin on it. Here’s the output (SHA truncated for presentation purposes):
000121e8 Thu Nov 15 23:10:45 2012 +0200 Agis Anastasopoulos
Congratulations Agis! You are the new maintainer of Rails!
Not so fast!
I know what some of you are thinking, “You are ridiculous. This is a waste of time.” To those I say, hold my beer because I’m not done yet.
Others who are familiar with Bitcoin’s consensus protocol are thinking, “This is not how the protocol works. It’s not about choosing the lowest sorted SHA, it’s about reaching a target number of leading zeros.” To those I say, you’re taking this too seriously!
Even so, in anticipation of all the “Well, Actually” responses I’m sure to receive, I’ll address this fair point.
With Bitcoin, the first miner to generate a SHA with the target number of leading zeros is the one to add their block to the global blockchain.
I’m hand waving a bit here for the sake of brevity. The important point is that it’s not the block with the lowest SHA. It has nothing to do with the sorting of SHAs.
Over time, the protocol compensates for the global increase in computing power by increasing the number of leading zeros in the proof of work target. That way a block is added roughly every ten minutes no matter how fast computers get and no matter how many computers are mining.
If we translate this to the gitcoin idea, we probably want to look at the first commit to reach the most number of leading zeros.
For example, say that the current maintainer was chosen because of a commit with a SHA that has two leading zeros. The next maintainer is chosen by the commit that has three (or more) leading zeros. The next maintainer after that is chosen by the first commit with one more leading zero than the commit that chose the previous maintainer. And so on.
In other words, every time a new maintainer is chosen by this protocol, the target number of leading zeros increases by one. The implication is that over time, maintainers chosen by this project will spend more and more time maintaining the project. Not sure that’s necessarily a desirable trait or not.
The shell script to find the maintainer with these rules is considerably more complex than my previous script. This is why I originally wanted to stop with that script and call it a day. Also, I’m lazy.
Not to mention, my background is primarily with Windows so my Unix-fu is fairly weak. However, my time at GitHub working with Git has helped me exercise those muscles quite a bit more than I did before. So I thought it’d be fun to give it a shot. Here’s the script I came up with:
TZ=UTC git log --pretty=format:'%H%x09%ad%x09%an' --date=iso-local | grep ^0.* | sed -E 's/(0+)(.*)/\1\t\1\2/' | sort -k1,1 -k3,3r | tail -n1 | cut -f 2,3,4
And the award goes to…
When I run this against the Rails repository, it outputs (again, SHA truncated for presentation purposes):
00050dfe 2006-04-09 21:27:32 +0000 David Heinemeier Hansson
Sorry Agis, this David Heinemeier Hansson person is now the Rails maintainer! I hope David accepts this responsibility seriously.
Uh, still not there.
If you read an earlier version of this post, you’ll note I declared DHH the maintainer of Rails. But Jean-Jacques Lafay noted in the comments to this post that I need to look at the leading zeros of the SHA when written in binary form. Whoops!
This makes a lot of sense, when you think about it. Under my original implementation, every time we choose a new maintainer, we increased the difficulty in choosing the next maintainer by 16 times. When we look at leading zeros in binary form, we only double the time.
Fortunately, the correction to my script is pretty simple, I need to grab all the zeros (if any) and the first non-zero character when creating the sort key. Any characters after that won’t change the leading zeros.
001b have the same number of leading zeros when expressed as binary. But
00b do not have the same number of leading zeros.
So here’s the updated script:
TZ=UTC git log --pretty=format:'%H%x09%ad%x09%an' --date=iso-local | sed -E 's/^(0*[1-9a-f])(.*)/\1\t\1\2/' | sort -k1,1 -k3,3 | head -n1 | cut -f 2,3,4
And once again, Agis is the new maintainer for Rails!
The excrutiatingly detailed breakdown
Let’s break this down piece by piece for those of you like me who don’t eat and breath shell scripting.
The first thing we do is set the local timezone to UTC (
TZ=UTC) so we can sort by date and compare apples to apples.
git log --pretty=format:'%H%x09%ad%x09%an' --date=iso-local
Just like before, we’re running a git log command. It looks ugly, but all I’m doing here is using tab character
%x09 in place of spaces. That’ll come
in handy later. I also specify that the date format should be
iso-local. This provides a date that’s sortable lexicographically. We’ll need that later too.
sed -E 's/^(0*[1-9a-f])(.*)/\1\t\1\2/'
Sed is a powerful command used to perform text transformations on an input stream. In this case, we’re using the
s/ command which is a regex replacement. The
-E indicates that sed should use extended regular expressions. What I’m doing here is extracting the consecutive sequence of
0s that the SHA starts with as a new column in the output.
So if the
git log command we ran earlier returned something like this (SHAs and name truncated for presentation purposes):
005371e1 2004-12-01 13:59:16 +0000 David
0daa29ec 2004-12-01 13:18:51 +0000 David
08a2249e 2004-11-26 02:16:05 +0000 David
Piping this output to this sed expression results in (name truncated for brevity):
005 005371e1 2004-12-01 13:59:16 +0000 David
0d 0daa29ec 2004-12-01 13:18:51 +0000 David
08 08a2249e 2004-11-26 02:16:05 +0000 David
That format is pretty handy because we can sort this by the first column. This sorts commits from those with the most leading zeros to the least.
This will also group all SHAs with the same number of leading zeros together. Then we can sort by the date column to find the first commit in any such group.
sort -k1,1 -k3,3
Does exactly that. One thing that tripped me up when I first worked on this is I thought I should be able to
sort -k1 -k3. The
-k option specifies a sort key. By default, when you specify a column, it takes that column and all columns after it as the sort key. Thus
-k1 is pretty much equivalent to not specifying a sort key at all as it sorts by the whole line.
Fortunately, you can specify an end column for the sort key using the comma. So
-k1,1 sorts just by the first column. Whereas
-k1,3 would take the first three columns as a sort key.
Now that we have the proper sort in play, we just need to take the first entry. this is the oldest commit with the most leading zeros.
cut -f 2,3,4
And finally, we don’t need the leading zeros column in the final output so I run the
cut command and only keep columns 2, 3, and 4. This is where inserting the tabs before comes in handy. By default,
cut uses the tab character as a delimiter.