Hidden Code Mines
Code is unforgiving. As the reasonable human beings that we are, when we review code we both know what the author intends. But computers can’t wait to Well, Actually all over that code like a lonely Hacker News commenter:
Well Actually, Dave. I’m afraid I can’t do that.
Hal, paraphrased from 2001: A Space Odyssey
As an aside, imagine the post-mortem review of that code!
Code review is a tricky business. Code is full of hidden mines that lay dormant while you test just to explode in a debris of stack trace at the most inopportune time – when its in the hands of your users.
The many times I’ve run into such mines just reinforce how important it is to write code that is intention revealing and to make sure assumptions are documented via asserts.
Such devious code is often the most innocuous looking code. Let me give one example I ran into the other day. I was fortunate to defuse this mine while testing.
This example makes use of the Enumerable.ToDictionary
method that
turns a sequence into a dictionary. You supply an expression to produce
a key for each element. In this example, loosely based on the actual
code, I am using the CloneUrl
property of Repository
as the key of
the dictionary.
IEnumerable<Repository> repositories = GetRepositories();
repositories.ToDictionary(r => r.CloneUrl);
It’s so easy to gloss over this line during a code review and not think twice about it. But you probably see where this is going.
While I was testing I was lucky to run into the following exception:
System.ArgumentException:
An item with the same key has already been added.
Doh! There’s an implicit assumption in this code – that two repositories
cannot have the same CloneUrl
. In retrospect, it’s obvious that’s not
the case.
Let’s simplify this example.
var items = new[]
{
new {Id = 1},
new {Id = 2},
new {Id = 2},
new {Id = 3}
};
items.ToDictionary(item => item.Id);
This example attempts to create a dictionary of anonymous types using
the Id
property as a key, but we have a duplicate, so we get an
exception.
What are our options?
Well, it depends on what you need. Perhaps what you really want is a
dictionary that where the value contains every item with the given key.
The Enumerable.GroupBy
method comes in handy here.
Perhaps you only care about the first value for a given key and want to
ignore any others. The Enumerable.GroupBy
method comes in handy in
this case.
In the following example, we use this method to group the items by Id
.
This results in a sequence of IGrouping
elements, one for each Id
.
We can then take advantage of a second parameter of ToDictionary
and
simply grab the first item in the group.
items.GroupBy(item => item.Id)
.ToDictionary(group => group.Key, group => group.First());
This feels sloppy to me. There is too much potential for this to cover up a latent bug. Why should the other items be ignored? Perhaps, as in my original example, it’s fully normal to have more than one element for the key and you should handle that properly. Instead of grabbing the first item from the group, we retrieve an array.
items.GroupBy(item => item.Id)
.ToDictionary(group => group.Key, group => group.ToArray());
In this case, we end up with a dictionary of arrays.
UPDATE: Or, as Matt Ellis points out in the comments, you could use theEnumerable.ToLookupmethod. I should have known such a thing would exist. It’s exactly what I need for my particular situation here.
What if having more than one element with the same key is not expected
and should throw an exception. Well you could just use the normal
ToDictionary
method since it will throw an exception. But that
exception is unhelpful. It doesn’t have the information we probably
want. For example, you just might want to know, which key was already
added as the following demonstrates:
items.GroupBy(item => item.Id)
.ToDictionary(group => group.Key, group =>
{
try
{
return group.Single();
}
catch (InvalidOperationException)
{
throw new InvalidOperationException("Duplicate
item with the key '" + group.First().Id + "'");
}
});
In this example, if a key has more than one element associated with it, we throw a more helpful exception message.
System.InvalidOperationException: Duplicate item with the
key '2'
In fact, we can encapsulate this into our own better extension method.
public static Dictionary<TKey, TSource>
ToDictionaryBetter<TSource, TKey>(
this IEnumerable<TSource> source,
Func<TSource, TKey> keySelector)
{
return source.GroupBy(keySelector)
.ToDictionary(group => group.Key, group =>
{
try
{
return group.Single();
}
catch (InvalidOperationException)
{
throw new InvalidOperationException(
string.Format("Duplicate item with the key
'{0}'", keySelector(@group.First())));
}
});
}
Code mine mitigated!
This is just one example of a potential code mine that might go unnoticed during a code review if you’re not careful.
Now, when I review code and see a call to ToDictionary
, I make a
mental note to verify the assumption that the key selector must never
lead to duplicates.
When I write such code, I’ll use one of the techniques I mentioned above to make my intentions more clear. Or I’ll embed my assumptions into the code with a debug assert that proves that the items cannot have a duplicate key. This makes it clear to the next reviewer that this code will not break for this reason. This code still might not open the hatch, but at least it won’t have a duplicate key exception.
If I search through my code, I will find many other examples of potential code mines. What are some examples that you can think of? What mines do you look for when reviewing code?
Comments
26 responses