January 2012 Blog Posts
Back in November, someone asked a question on StackOverflow about converting arbitrary binary data (in the form of a byte array) to a string. I know this because I make it a habit to read randomly selected questions in StackOverflow written in November 2011. Questions about text encodings in particular really turn me on.
In this case, the person posing the question was encrypting data into a byte array and converting that data into a string. The conversion code he used was similar to the following:
string text = System.Text.Encoding.UTF8.GetString(data);
That isn’t exactly their code, but this is a pattern I’ve seen in the past. In fact, I have a story about this I want to tell you in a future blog post. But I digress.
The infamous Jon Skeet answers:
You should absolutely not use an Encoding to convert arbitrary binary data to text. Encoding is for when you've got binary data which genuinely is encoded text - this isn't.
Instead, use Convert.ToBase64String to encode the binary data as text, then decode using Convert.FromBase64String.
Yes! Absolutely. Totally agree. As a general rule of thumb, agreeing with Jon Skeet is a good bet.
Not to give you the impression that I’m stalking Skeet, but I did notice that this wasn’t the first time Skeet answered a question about using encodings to convert binary data to text. In response to an earlier question he states:
Basically, treating arbitrary binary data as if it were encoded text is a quick way to lose data. When you need to represent binary data in a string, you should use base64, hex or something similar.
This piqued my curiosity. I’ve always known that if you need to send binary data in text format, base64 encoding is the safe way to do so. But I didn’t really understand why the other encodings were unsafe. What are the cases in which you might lose data?
Round Tripping UTF-8 Encoded Strings
Well let’s look at one example. Imagine you’re receiving a stream of bytes and you store it as a UTF-8 string and pop it in the database. Later on, you need to relay that data so you take it out, encode it back to bytes, and send it on its merry way.
The following code simulates that scenario with a byte array containing a single byte, 128.
var data = new byte[] { 128 };
string text = Encoding.UTF8.GetString(data);
var bytes = Encoding.UTF8.GetBytes(text);
Console.WriteLine("Original:\t" + String.Join(", ", data));
Console.WriteLine("Round Tripped:\t" + String.Join(", ", bytes));
The first line of code creates a byte array with a single byte. The second line converts it to a UTF-8 string. The third line takes the string and converts it back to a byte array.
If you drop that code into the Main method of a Console app, you’ll get the following output.
Original: 128
Round Tripped: 239, 191, 189
WTF?! The data was changed and the original value is lost!
If you try it with 127 or less, it round trips just fine. What’s going on here?
UTF-8 Variable Width Encoding
To understand this, it’s helpful to understand what UTF-8 is in the first place. UTF-8 is a format that encodes each character in a string with one to four bytes. It can represent every Unicode character, but is also backwards compatible with ASCII.
ASCII is an encoding that represents each character with seven bits of a single byte, and thus consists of 128 possible characters. The high order bit in standard ASCII is always zero. Why only seven bits and not the full eight?
Because seven bits ought to be enough for anybody:
When you counted all possible alphanumeric characters (A to Z, lower and upper case, numeric digits 0 to 9, special characters like "% * / ?" etc.) you ended up a value of 90-something. It was therefore decided to use 7 bits to store the new ASCII code, with the eighth bit being used as a parity bit to detect transmission errors.
UTF-8 takes advantage of this decision to create a scheme that’s backwards compatible with the ASCII characters but also able to represent all Unicode characters by leveraging the high order bit that ASCII ignores. Going back to Wikipedia:
UTF-8 is a variable-width encoding, with each character represented by one to four bytes. If the character is encoded by just one byte, the high-order bit is 0 and the other bits give the code value (in the range 0..127).
This explains why bytes 0 through 127 all round trip correctly. Those are simply ASCII characters.
But why does 128 expand into multiple bytes when round tripped?
If the character is encoded by a sequence of more than one byte, the first byte has as many leading "1" bits as the total number of bytes in the sequence, followed by a "0" bit, and the succeeding bytes are all marked by a leading "10" bit pattern.
How do you represent 128 in binary? 10000000
Notice that it’s marked with a leading 10 bit pattern, which means it’s a continuation byte. Continuation of what?
…the first byte never has 10 as its two most-significant bits. As a result, it is immediately obvious whether any given byte anywhere in a (valid) UTF‑8 stream represents the first byte of a byte sequence corresponding to a single character, or a continuation byte of such a byte sequence.
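To make those bit patterns concrete, here is a minimal sketch that classifies a byte purely by its leading bits. The `Utf8ByteKind` class and `Classify` method are names made up for this illustration, and the check is deliberately shallow: it only looks at the prefix, not at everything a real UTF-8 validator would reject.

```csharp
using System;

static class Utf8ByteKind
{
    // Hypothetical helper: classifies a byte by its leading bits only.
    // It does not detect overlong forms or other invalid sequences.
    public static string Classify(byte b)
    {
        if ((b & 0x80) == 0x00) return "single byte (ASCII)";            // 0xxxxxxx
        if ((b & 0xC0) == 0x80) return "continuation byte";              // 10xxxxxx
        if ((b & 0xE0) == 0xC0) return "first byte of a 2-byte sequence"; // 110xxxxx
        if ((b & 0xF0) == 0xE0) return "first byte of a 3-byte sequence"; // 1110xxxx
        if ((b & 0xF8) == 0xF0) return "first byte of a 4-byte sequence"; // 11110xxx
        return "never appears in UTF-8";                                  // 11111xxx
    }

    static void Main()
    {
        // 65 is 'A'; 128 is the byte from the round trip example;
        // 239, 191, 189 are the three bytes that came back.
        foreach (byte b in new byte[] { 65, 128, 239, 191, 189 })
        {
            string bits = Convert.ToString(b, 2).PadLeft(8, '0');
            Console.WriteLine(b + " = " + bits + " -> " + Classify(b));
        }
    }
}
```

Run against the bytes from the earlier example, 128 shows up as a continuation byte with nothing before it to continue, while 239 is the start of a three byte sequence followed by two continuation bytes.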
So why does 128 expand into multiple bytes when round tripped? A single byte of 128 isn’t a valid UTF-8 character, so the decoder replaces it with the Unicode Replacement Character (U+FFFD), which is used for invalid data and which UTF-8 encodes as the three bytes 239, 191, 189 (Thanks to RichB for the answer in the comments!).
I’ve noticed a lot of invalid UTF-8 values expand into these three bytes. But that’s beside the point. The point is that using UTF-8 encoding to store binary data is a recipe for data loss and heartache.
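If you want to see the replacement character for yourself, the following sketch checks what the decoder produced and shows one way to make invalid input fail loudly instead of being silently replaced. The `ReplacementCharacterDemo` class and `IsValidUtf8` method are names I made up for this example. It relies on the `UTF8Encoding` constructor overload that takes a `throwOnInvalidBytes` flag.

```csharp
using System;
using System.Text;

static class ReplacementCharacterDemo
{
    // Hypothetical helper: returns false when the bytes are not valid UTF-8.
    public static bool IsValidUtf8(byte[] data)
    {
        // The second argument asks the decoder to throw on invalid bytes
        // rather than substituting U+FFFD.
        var strict = new UTF8Encoding(encoderShouldEmitUTF8Identifier: false, throwOnInvalidBytes: true);
        try
        {
            strict.GetString(data);
            return true;
        }
        catch (DecoderFallbackException)
        {
            return false;
        }
    }

    static void Main()
    {
        var data = new byte[] { 128 };
        string text = Encoding.UTF8.GetString(data);

        Console.WriteLine("Decoded to U+FFFD: " + (text == "\uFFFD"));   // True
        Console.WriteLine("U+FFFD as UTF-8: "
            + String.Join(", ", Encoding.UTF8.GetBytes("\uFFFD")));      // 239, 191, 189
        Console.WriteLine("Valid UTF-8: " + IsValidUtf8(data));          // False
    }
}
```

A strict decoder like this won’t make binary data safe to store as text, but it does turn silent corruption into an exception you can notice.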
What about Windows-1252?
Going back to the original question, you’ll note that the code didn’t use UTF-8 encoding. I took some liberties in describing his approach. What he did was use System.Text.Encoding.Default. This could be different things on different machines, but on my machine it’s the Windows-1252 character encoding also known as “Western European Latin”.
This is a single byte encoding and when I ran the same round trip code against this encoding, I could not find a data-loss scenario. Wait, could Jon be wrong?
To prove this to myself, I wrote a little program that cycles through every possible byte and round trips it.
using System;
using System.Linq;
using System.Text;
class Program
{
    static void Main(string[] args)
    {
        var encoding = Encoding.GetEncoding(1252);
        // Try every possible byte value, 0 through 255.
        for (int b = Byte.MinValue; b <= Byte.MaxValue; b++)
        {
            var data = new[] { (byte)b };
            string text = encoding.GetString(data);
            var roundTripped = encoding.GetBytes(text);
            // Stop at the first byte that doesn't survive the round trip.
            if (!roundTripped.SequenceEqual(data))
            {
                Console.WriteLine("Round Trip Failed At: " + b);
                return;
            }
        }
        Console.WriteLine("Round trip successful!");
        Console.ReadKey();
    }
}
The output of this program shows that you can encode every byte, then decode it, and get the same result every time.
So in theory, it could be safe to use Windows-1252 encoding of binary data, despite what Jon said.
But I still wouldn’t do it. Not just because I believe Jon more than my own eyes and code. If it were me, I’d still use Base64 encoding because it’s known to be safe.
There are five unmapped code points in Windows-1252. You never know if those might change in the future. Also, there’s just too much risk of corruption. If you were to store this string in a file that converted its encoding to Unicode or some other encoding, you’d lose data (as we saw earlier).
Or if you were to pass this string to some unmanaged API (perhaps inadvertently) that expected a null terminated string, it’s possible this string would include an embedded null character and be truncated.
In other words, the safest bet is to listen to Jon Skeet as I’ve said all along. The next time I see Jon, I’ll have to ask him if there are other reasons not to use Windows-1252 to store binary data other than the ones I mentioned.
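For comparison, here is a minimal sketch of the Base64 approach Jon recommends, run against every possible byte value. The `Base64RoundTrip` class and its `RoundTrip` method are names invented for this example. `Convert.ToBase64String` and `Convert.FromBase64String` are the calls from Jon’s answer.

```csharp
using System;
using System.Linq;

static class Base64RoundTrip
{
    // Hypothetical helper: encodes the bytes as Base64 text, then decodes them again.
    public static byte[] RoundTrip(byte[] data)
    {
        string text = Convert.ToBase64String(data);
        return Convert.FromBase64String(text);
    }

    static void Main()
    {
        // One array holding every byte value, 0 through 255.
        byte[] data = Enumerable.Range(0, 256).Select(i => (byte)i).ToArray();
        byte[] roundTripped = RoundTrip(data);

        Console.WriteLine("Round trip intact: " + roundTripped.SequenceEqual(data)); // True
    }
}
```

The resulting text only uses letters, digits, and the characters +, / and =, so there are no embedded nulls and nothing for a later encoding conversion to mangle.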
Birthdays are a funny thing, aren’t they? Let’s look at this tweet for example,
It's @haacked's birthday. Give him crap about getting old.
No gifts, please. Especially not what Charlie suggests.
Of course I’m getting older. We’re all getting older. Every second of every day and twice on Monday. Every femtosecond even. Perhaps the only time we’re not getting older is the moment within a Planck time interval. But once that interval is up, yep, you’re older.
Yet people apparently live their lives completely oblivious to this fact until their next birthday comes along. As the chronometer slides the next number into place, the realization dawns, “Damn! I’m older!” What? You didn’t know this?!
Feeling Older
The odd thing to me is that I don’t really feel older, mentally. I mean, I consciously know I’m older, but I feel like there’s this smooth continuum from my first memory to now. While the things I spend time thinking about have changed, the way I think about others and about myself feels like it hasn’t changed. I’m the same person then as I am now, and that kind of blows my mind.
For example, I still think fart jokes are funny.
In my mind, old people tell you how they used to walk miles uphill both ways to get to school. But I realize that these days, old people tell you about how they used to have to use their phone to connect online at 1200 baud. And there was no internet!!! OMG! What the hell were we connecting to?
Rather than feeling older, I am observing the evidence that I’m older. For example, I used the word “baud” in this blog post. Another example is how injuries now take much longer to heal. I have two kids, a four year old and a two year old and I’m pretty sure that if I were to slice them clean in half, that’d only put them out of commission for a week. They’d heal up and have no scars! Meanwhile, if I get a paper cut on a finger I can pretty much kiss that finger goodbye. Write it off as a loss and start practicing typing with two bloody stumps for hands.
Getting Experienced
But it’s not just physical. I do notice that while I don’t feel older, I do have the benefit of many more years of experience to draw upon. But more importantly, I’m finally actually paying attention to that. Go figure.
Last week, we had our GitHub summit and Friday was our field trip day to a distillery then a bar. This was the night set aside to party hard. Which is amazing to me because the night before I’m pretty sure we as a company consumed enough alcohol to bring elephants to extinction.
But I drew upon my experience and took it easy because I had a flight early the next morning and I did not want to be sick on an airplane. Contrast this to a few years before at Tech-Ed Hong Kong when I was out with some local friends and at 5:00 AM I had to leave the bar early to catch a flight. For the first time in my life, I contemplated suicide.
Some might call that getting wiser. I call it pain avoidance.
Knowing Less
The other evidence of my getting older is that I know a lot less now than I did when I was younger. Certainly that can’t be true in the absolute sense since I don’t have Alzheimer’s (that I’m aware of anyways). But I remember as a young programmer I knew everything!
I knew the right way to do all things in all situations with absolute conviction. But these days, I’m not so sure. About anything. All I have is the breadth of my experience and pattern matching at my disposal. Each new situation is simply a pattern matching exercise against my database of experience followed by an experiment to see if what I thought I knew produces good results.
The great thing about this approach is when you know everything, you have nothing to learn. But now, I’m constantly learning. Many of my experiments fail because many of my experiences are no longer relevant today. The world changes. Quickly. But each experiment is an opportunity to learn.
Staying Young
So yeah, I’m getting older, but I’ve found a loophole. Remember the kids I mentioned slicing in half? I’m not going to do that because I’m worried I’d end up with four of them then and two are already a handful.
These two do a great job of making me feel young because they will laugh at every fart joke I can come up with.

So thanks for all the birthday wishes on Twitter, Facebook, and elsewhere. Here’s to getting older!
Suppose you have a test that needs to compare strings. Most test frameworks do a fine job with their default equality assertion. But once in a while, you get a case like this:
[Fact]
public void SomeTest()
{
    Assert.Equal("Hard \tto\ncompare\r\n", "Hard to\r\ncompare\n");
}
Let’s pretend the first value in the above test is the expected value and the second value is the value you obtained by calling some method.
Clearly, this test fails. So you look at the output and this is what you see:

It’s pretty hard to compare those strings by looking at them. Especially if they are two huge strings.
This is why I typically write an extension method on string that produces more readable output when a string comparison fails. Here’s an example of a test using my helper.
[Fact]
public void Fact()
{
    "Hard to\rcompare\n".ShouldEqualWithDiff("Hard \tto\ncompare\r\n");
}
And here’s an example of the output.

At the very top, the assert message is the same as before. I deferred to the existing Assert.Equal method in xUnit (typically Assert.AreEqual in other test frameworks) to output the error message.
Underneath the existing message are headings for three columns: the character index, the expected character, and the actual character. For each character I print out the int value and the actual character.
Of course in some cases, I don’t print out the actual value. If I were to do that for new line characters and tab characters, it’d screw up the formatting. So instead, I special case those characters and print out the escape sequence in C# for those characters.
This makes it easy to compare two strings and see every difference when a test fails. Even the hidden ones.
This is a simple, quick and dirty implementation available in a Gist. For example, it doesn’t do any real diff comparison or try to line up similarities. That’d be a nice improvement to make at some point. If you can improve this, feel free to fork the gist and send me a pull request.
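To give a feel for the approach without opening the Gist, here is a rough sketch of how a helper like this could be written. This is not the code from the Gist. The `StringDiffExtensions` class, the `BuildDiff` and `Describe` methods, and the column layout are all assumptions made for illustration, and it throws a plain exception where the real helper defers to xUnit’s Assert.Equal, so the sketch runs without a test framework.

```csharp
using System;
using System.Text;

public static class StringDiffExtensions
{
    // Hypothetical stand-in for the helper described above.
    public static void ShouldEqualWithDiff(this string actual, string expected)
    {
        if (string.Equals(actual, expected, StringComparison.Ordinal)) return;
        throw new InvalidOperationException(
            "Strings differ:" + Environment.NewLine + BuildDiff(expected, actual));
    }

    // Builds one row per character index: index, expected, actual.
    public static string BuildDiff(string expected, string actual)
    {
        expected = expected ?? "";
        actual = actual ?? "";
        var sb = new StringBuilder();
        sb.AppendLine("  Idx  Expected   Actual");
        int max = Math.Max(expected.Length, actual.Length);
        for (int i = 0; i < max; i++)
        {
            char? e = i < expected.Length ? expected[i] : (char?)null;
            char? a = i < actual.Length ? actual[i] : (char?)null;
            string marker = e == a ? " " : "*"; // flag rows that differ
            sb.AppendLine(string.Format("{0} {1,-4} {2,-10} {3}",
                marker, i, Describe(e), Describe(a)));
        }
        return sb.ToString();
    }

    // Prints the int value plus the character, escaping the ones
    // that would otherwise wreck the column layout.
    private static string Describe(char? c)
    {
        if (c == null) return "";
        string printable;
        switch (c.Value)
        {
            case '\t': printable = "\\t"; break;
            case '\r': printable = "\\r"; break;
            case '\n': printable = "\\n"; break;
            default: printable = c.Value.ToString(); break;
        }
        return string.Format("{0,-3} {1}", (int)c.Value, printable);
    }
}

class Program
{
    static void Main()
    {
        Console.Write(StringDiffExtensions.BuildDiff(
            "Hard \tto\ncompare\r\n", "Hard to\rcompare\n"));
    }
}
```

Like the real thing, this compares position by position, so a single inserted character makes every following row look different.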
In the ASP.NET MVC 3 Uservoice site, one of the most voted up items is a suggestion to include an empty project template. No, a really empty project template.
You see, ASP.NET MVC 3 includes an “empty” project template, but it’s not empty enough for many people. So in this post, I’ll give you a much emptier one. It’s not completely empty. If you really wanted it completely empty, just choose the ASP.NET Empty Web Application template.
The Results
I’ll show you the results first, and then talk about how I made it. After installing my project template, every time you create a new ASP.NET MVC 3 project, you’ll see a new entry named “Really Empty”.

Select that and you end up with the following directory structure.

I removed just about everything. I kept the Views directory because the Web.config file that’s required is not obvious and there’s special logic related to the Views directory. I also kept the Controllers directory, since that’s where the tooling is going to put controllers anyways. I also kept the Global.asax and Web.config files which are typically necessary for an ASP.NET MVC project.
I debated removing the AssemblyInfo.cs file, but decided to trim it down and keep it.
Building Custom Project Templates
I wrote about building a custom ASP.NET MVC 3 project template a long time ago. However, I’ve improved on what I did quite a bit. Now, I have a single install.cmd file you can run and it’ll determine whether you’re on x64 or x86 and run the correct registry script. The install.cmd and uninstall.cmd batch files are there for convenience and call into a PowerShell script that does the real work.
UPDATE 1/12/2012: Thanks to Tim Heuer, we have an even better installation experience. He refactored the project to output a VSIX file. All you need to do is double click the extension file to install the project template. I’ve uploaded the extension file to GitHub here.
I tried uploading it to the gallery, but it wouldn’t let me. I’ll follow up on that.
History
If you’re wondering why the product team hasn’t included this all along, it’s for a lot of reasons. There was (at least when I was there) internal debate about how empty to make it. For example, when you create a new project with my empty template, and hit F5, you get an error. Not a great experience for most people.
Honestly, I’m all for it, but there are many other higher priority items for the team to work on. So I figured I’d do it myself and put it up on GitHub.
Installation
Installation is really simple. If you like to build things from source, grab the source from my GitHub repository and run the build.cmd batch file. Then double click the resulting VSIX file. Be sure to read the README for more details.
If you don’t yet know how to use Git to grab a repository, don’t worry, just navigate to the downloads page and download the VSIX file I’ve conveniently uploaded.
Contribute!
Hey, if you think you can help me make this better, please go fork it and send me a pull request. Let me know if I include too little or too much.
I’ve already posted a few things that could use improvement in the README. If you'd like to help make this better, consider one of the following. :)
- Make script auto-detect whether VS is running or not and do the right thing
- Test this on an x86 machine
- Write an installer for this
Let me know if you find this useful.
Mary Poppendieck writes the following in Unjust Deserts (pdf), a paper on compensation systems (emphasis mine),
There is no greater de-motivator than a reward system that is perceived to be unfair. It doesn’t matter if the system is fair or not. If there is a perception of unfairness, then those who think that they have been treated unfairly will rapidly lose their motivation.
Written over seven years ago, the paper is just as insightful and applicable today. For example, let’s apply it to the recent dust-up about the legitimacy and fairness of the Microsoft MVP Program.
I think the MVP program means well. It’s not trying to be a conspiracy or filch you of your just deserts. But if you think about the MVP program as a compensation system, it becomes very clear why people feel disillusioned.
What compensation am I talking about?
- An MSDN Subscription
- Privileged access to product teams and not yet public information (under NDA)
- A yearly summit which provides hotel rooms and access to product team members as well as a nice party.
Not only is it a compensation system, but the means by which compensation is doled out is perceived to be arbitrary and hidden. It’s a recipe for mistrust.
Intrinsic Motivations
Mary goes on to point out,
In the same way, once employees get used to receiving financial rewards for meeting goals, they begin to work for the rewards, not the intrinsic motivation that comes from doing a good job and helping their company be successful.
Someone asked me what I thought about the MVP program recently and I said I think Microsoft’s actually a great company, but I don’t think you should seek out recognition from Microsoft or any other corporation for your community contributions. I think that provides the wrong incentives to build community.
If you run an open source project, don’t do it to receive recognition from Microsoft. Or any other corporation for that matter (except maybe your own). Do it to scratch an itch! Do it because it’s fun. Do it to show cool stuff to your peers. Worry about their recognition more than some corporation.
If you answer questions about a technology on StackOverflow, do it because you enjoy sharing your knowledge with others (and you want the SO points!), not because it’s on a checklist to receive an MVP award.
Just as Mary points out, when you start to frame these activities as means to receive an extrinsic reward, you become disillusioned. So whether the program exists or not, we should strive on our part to not feel a sense of entitlement to the program and focus on our intrinsic motivations.
Fixing It
I covered what I think we should strive for. But what do I think Microsoft should do? Several things.
So far, I glossed over the fact that recognition from Microsoft isn’t the only reason people want the award. There are material benefits. MVPs are part of a privileged group that gets early access to what Microsoft is doing, which might provide a real competitive advantage. Why wouldn’t you seek that out?
Open Development
Let’s tackle the first thing first, privileged “early access”. Well there’s one easy solution to that. Do you know why NuGet doesn’t have an “early access” program? Drew Miller nails it on Twitter:
Know how you avoid the need for a privileged group of folks under NDA that inevitably is seen as special and superior? Develop in the open.
NuGet sidesteps the whole question of a recognition program by developing in the open. The same is true for the Azure SDK. When active development occurs in a public repository, the whole concept of “early access program” makes no sense.
Not only that, but recognition in an open source project doesn’t come from some corporation. It comes from the maintainers of a project and from the folks in the project’s community that you’ve helped. You can point to the reason people are recognizing you.
Better Free Tools
The other reason folks want an MVP award is to have access to the professional tools. Most companies will easily shell out the money for this, but if you’re a hobbyist or open source developer, it’s a lot of money to shell out.
In this regard, I think Microsoft should either make its free Express tools have more pro features such as allowing Visual Studio Extensions and multi-project support, or simply make Visual Studio Professional free, and focus on developing the ecosystem that gets a boost when everyone has better tools to build on your platform. Everyone wins.
Focused Recognition
I don’t think it’s inherently wrong for a company to recognize people’s contributions. But it has to be done in a way so that it’s seen as icing and not an entitlement or cronyism.
It’s darn near impossible to conceive of a recognition program that would be seen as universally fair and recognizes something so broad as “community contributions”. A better approach might be to have multiple smaller recognition programs. Focus on removing obstacles that get in the way of people inherently doing the things that are good for all of us. For example, it benefits Microsoft when:
- People are helping solve each other’s problems on the forums.
- People are giving talks about their products.
- People are building software (open and not) on their platforms.
- Probably some others I’m forgetting…
For what it’s worth, I think #1 is already solved by StackOverflow. Just move your forums there and be done with it. After all, nobody gets upset when they answer a question on Twitter and don’t get StackOverflow points.
Recognize!
Will Microsoft change the program? I have no idea, and I’m not all that concerned about it. In the meanwhile, we can recognize folks who make our lives better. We don’t need to wait for Microsoft to do so. I’ve used a huge swath of open source projects that have made my development smoother. I’ve found many great answers in forums, blog posts, and StackOverflow that unblocked me.
Moving forward, I’ll make an extra effort to thank the people responsible for those things. Maybe there’s some projects and folks you should recognize. Go for it! It’ll feel good.
Disclaimer: I was a Microsoft MVP for about three months before joining Microsoft as an employee. I’m now an employee of GitHub. My opinion here is simply my own opinion and does not necessarily represent the opinion of any employers past, present, and future. Nor does it represent the opinion of my dog, because I don’t have one, nor anyone in my neighborhood.
In the past, I’ve tried various schemes to structure my unit tests but never fell into a consistent approach. Pretty much the only rule I had (which I broke all the time) was to write a test class for each class I tested. I would then fill that class with a ton of haphazard test methods.
That was until I saw the approach that Drew Miller took with NuGet.org. The way he structured the unit tests struck me as odd at first, but quickly won me over. Drew tells me he can’t take all the credit for this approach. This approach came from when he worked at CodePlex, and builds upon practices he learned from Brad Wilson and Jim Newkirk. That’s the thing I like about Drew, he won’t take credit for other people’s work. Unlike me, of course.
The structure has a test class per class being tested. That’s not so unusual. But what was unusual to me was that he had a nested class for each method being tested.
I’ll provide a simple code example to illustrate this approach and then highlight some of the benefits. The following has two methods for embellishing names with more interesting titles. What it does isn’t really that important for this discussion.
using System;
public class Titleizer
{
    public string Titleize(string name)
    {
        if (String.IsNullOrEmpty(name))
            return "Your name is now Phil the Foolish";
        return name + " the awesome hearted";
    }
    public string Knightify(string name, bool male)
    {
        if (String.IsNullOrEmpty(name))
            return "Your name is now Sir Jester";
        return (male ? "Sir" : "Dame") + " " + name;
    }
}
Under Drew’s system, I’ll have a corresponding top level class, with two embedded classes, one for each method. In each class, I’ll have a series of tests for that method.
Let’s look at a set of potential tests for this class. I wrote xUnit.NET tests for this, but you could apply the same approach with NUnit, mbUnit, or whatever you use.
using Xunit;
public class TitleizerFacts
{
    public class TheTitleizeMethod
    {
        [Fact]
        public void ReturnsDefaultTitleForNullName()
        {
            // Test code
        }
        [Fact]
        public void AppendsTitleToName()
        {
            // Test code
        }
    }
    public class TheKnightifyMethod
    {
        [Fact]
        public void ReturnsDefaultTitleForNullName()
        {
            // Test code
        }
        [Fact]
        public void AppendsSirToMaleNames()
        {
            // Test code
        }
        [Fact]
        public void AppendsDameToFemaleNames()
        {
            // Test code
        }
    }
}
Pretty simple, right? If you want to see a real-world example, look at these tests of the user service within NuGet.org.
So why do this at all? Why not stick with the old way I’ve done in the past?
Well for one thing, it’s a nice way to keep tests organized. All the tests (or facts) for a method are grouped together. For example, if you use the CTRL+M, CTRL+O shortcut to collapse method bodies, you can easily scan your tests and read them like a spec for your code.

You also get the same effect if you run your tests in a test runner such as the xUnit test runner:

When the test class file is open in Visual Studio, the class drop down provides a quick way to see a list of the methods you have tests for.

This makes it easy to then see all the tests for a given method by using the drop down on the right.

It’s a minor change to my existing practices, but one that I’ve grown to like a lot and hope to apply in all my projects in the future.
Update: Several folks asked about how to have common setup code for all tests. ZenDeveloper has a simple solution in which the nested child classes simply inherit the outer parent class. Thus they’ll all share the same setup code.
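Here is a rough sketch of what that might look like, with a few of the test bodies filled in. It assumes the project references xUnit.NET, and it leans on the fact that xUnit creates a new instance of the test class for each test, so the constructor serves as the setup. The `Subject` field name is my own invention, and this isn’t necessarily how ZenDeveloper wrote it.

```csharp
using System;
using Xunit;

// Same class as in the post, repeated so the sketch is self-contained.
public class Titleizer
{
    public string Titleize(string name)
    {
        if (String.IsNullOrEmpty(name))
            return "Your name is now Phil the Foolish";
        return name + " the awesome hearted";
    }

    public string Knightify(string name, bool male)
    {
        if (String.IsNullOrEmpty(name))
            return "Your name is now Sir Jester";
        return (male ? "Sir" : "Dame") + " " + name;
    }
}

public class TitleizerFacts
{
    // Shared by every nested class that inherits from TitleizerFacts.
    protected readonly Titleizer Subject;

    // Runs before each test because xUnit news up the class per test.
    public TitleizerFacts()
    {
        Subject = new Titleizer();
    }

    public class TheTitleizeMethod : TitleizerFacts
    {
        [Fact]
        public void ReturnsDefaultTitleForNullName()
        {
            Assert.Equal("Your name is now Phil the Foolish", Subject.Titleize(null));
        }

        [Fact]
        public void AppendsTitleToName()
        {
            Assert.Equal("Phil the awesome hearted", Subject.Titleize("Phil"));
        }
    }

    public class TheKnightifyMethod : TitleizerFacts
    {
        [Fact]
        public void AppendsSirToMaleNames()
        {
            Assert.Equal("Sir Phil", Subject.Knightify("Phil", true));
        }

        [Fact]
        public void AppendsDameToFemaleNames()
        {
            Assert.Equal("Dame Judi", Subject.Knightify("Judi", false));
        }
    }
}
```

One thing to watch for with this arrangement: any test methods declared directly on the outer class are inherited by the nested classes too, so it’s probably best to keep the outer class to setup code only.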