September 2012 Blog Posts

Primitive Obsession, Custom String Types, and Self Referencing Generic Constraints

I was once accused of primitive obsession. Especially when it comes to strings. Guilty as charged!

There are a lot of reasons to be obsessed with string primitives. Many times, the data really is just a string and encapsulating it in some custom type is just software “designerbation.” Also, strings are special and the .NET Framework heavily optimizes strings through techniques like string interning and by providing classes like StringBuilder.

But in many cases, a strongly typed class that better represents the domain is the right thing to do. I think System.Uri and its corresponding UriBuilder is a prime example. When you work with URLs, there are security implications that very few people get right if you treat them just as strings.

But there’s a third scenario that I often run into now that I build client applications with the Model View ViewModel (MVVM) pattern. Many properties on a view model correspond to user input. Often these properties are populated via data binding with input controls on the view. As such, these properties often need to be able to hold invalid values that represent the input the user entered.

For example, suppose I want a user to type in a URL. I might have a property like:

public string YourBlogUrl {get; set;}

I can’t change the type of that property to a Uri because as the user types, the intermediate values bound to the property are not valid instances of Uri. For example, as the user types the first “h” in “http”, trying to bind the input value to the Uri property would fail.

But this sucks because suppose I want to display the host name on the screen as soon as one can (more or less) be determined based on the input. I’d love to be able to just bind another control to YourBlogUrl.Host. But alas, the string type does not have a Host property.

Ideally I would have some middle ground where the type of that property has structure, but also allows me to hold invalid values. Perhaps it has methods to convert it to a stricter type once we validate the value. In this case, a ToUri method would make sense.

But string is sealed, so you can’t derive from it. What’s a hapless coder to do?

Custom string types through implicit conversions

Well you could use the StringOr<T> class written as an April Fool’s joke. It was a joke, but it might be useful in cases like this! But that’s not the approach I’ll take.

Or you can follow the advice of Jimmy Bogard in his post on primitive obsession that I linked to at the beginning (I’m sure he’ll love that I dragged out a post he wrote five years ago) and write a custom class that’s implicitly convertible to string.

In his post, he shows a ZipCodeString example which I will include below, but with one change. The very last method is a conversion operator and I changed it from explicit to implicit.

public class ZipCodeString
{
    private readonly string _value;

    public ZipCodeString(string value)
    {
        // perform regex matching to verify XXXXX or XXXXX-XXXX format
        _value = value;
    }

    public string Value
    {
        get { return _value; }
    }

    public override string ToString()
    {
        return _value;
    }

    public static implicit operator string(ZipCodeString zipCode)
    {
        return zipCode.Value;
    }

    public static implicit operator ZipCodeString(string value)
    {
        return new ZipCodeString(value);
    }
}

This allows you to write code like:

ZipCodeString zip = "98008";

This provides the ease of a string to initialize a ZipCodeString type, while at the same time it provides access to the structure of a zip code.
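The same technique could be applied to the URL scenario from earlier. What follows is only a minimal sketch: UrlString is a hypothetical type (it is not from Jimmy's post or from the code discussed here), and it assumes the desired behavior is to accept any input and expose Host and ToUri on a best-effort basis.

```csharp
using System;

// Hypothetical type for illustration only.
public sealed class UrlString
{
    private readonly string _value;

    public UrlString(string value)
    {
        // Deliberately no validation: partial user input must be allowed.
        _value = value ?? string.Empty;
    }

    // Best-effort host name; null until the text parses as an absolute URL.
    public string Host
    {
        get
        {
            Uri uri = ToUri();
            return uri == null ? null : uri.Host;
        }
    }

    // Converts to the stricter type, or returns null if the text is not
    // (yet) a valid absolute URL.
    public Uri ToUri()
    {
        Uri uri;
        return Uri.TryCreate(_value, UriKind.Absolute, out uri) ? uri : null;
    }

    public override string ToString()
    {
        return _value;
    }

    public static implicit operator string(UrlString url)
    {
        // ReferenceEquals avoids "url == null", which could bind to string
        // equality through this very conversion.
        return ReferenceEquals(url, null) ? null : url._value;
    }

    public static implicit operator UrlString(string value)
    {
        return new UrlString(value);
    }
}
```

With this sketch, assigning "h" to a UrlString property succeeds and Host is simply null, while assigning "http://haacked.com/archive" yields a Host of "haacked.com".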

In the interest of full disclosure, many people have a strong feeling against implicit conversions. I asked Jon Skeet, Number one dude on StackOverflow and perhaps as well versed in C# as just about anybody in the world, to review a draft of this post as I didn’t want to propagate bad practices without due warning. Here’s what he said:

Personally I really dislike implicit conversions. I don't even like explicit conversions much - I prefer methods. So if I'm writing XML serialization code, I'll usually have a FromXElement static method, and a ToXElement instance method. It definitely makes the code longer, but I reckon it's ultimately clearer. (It also allows for several conversions from the same type with different meanings - e.g. Duration.FromHours, Duration.FromMinutes etc.)

I don’t think I’d ever expose an implicit conversion like this in a public API that’s meant to be shared with others. But within my own code, I like it so far. If I get bitten by it, maybe I’ll change my tune and Skeet can tell me, “I told you so!”

Taking it further

I like Jimmy’s approach, but it doesn’t go far enough for my needs. For example, this works great when you employ this approach from the start. But what if you already shipped version 1 of a property as a string, and now you want to change that property to a ZipCodeString? You have existing values serialized to disk. Or maybe you need to pass this ZipCodeString to a JSON endpoint. Is that going to serialize OK?

In my case, I often want these types to act as much like strings as possible. That way, if I change a property from string to one of these types, it’ll break as little of my code as possible (if any).

What this means is we need to write a lot more boilerplate code. For example, override the Equals method and overload the equality operators. In other cases, you may want to overload the addition operator. I did this with a PathString class that represents file paths so I could write code like this:

// The code I recommend writing.
PathString somePath = @"c:\fake\path";
somePath = somePath.Combine("subfolder");
// somePath == @"c:\fake\path\subfolder";

// But if you did this by accident:
PathString otherPath = @"c:\fake\path";
otherPath += "subfolder";
// otherPath == @"c:\fake\path\subfolder";

PathString has a proper Combine method, but I see code where people attempt to concatenate paths all the time. PathString overloads the addition operator, creating an idiom where concatenation is equivalent to path combination. This may end up being a bad idea, we’ll see. My feeling is that if you’re already concatenating paths, this can only make it better.
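To make the idiom concrete, here is a simplified sketch of how an addition operator can delegate to Combine. It is not the actual PathString implementation: it assumes Combine is a thin wrapper over System.IO.Path.Combine and leaves out equality and serialization entirely.

```csharp
using System;
using System.IO;

// Simplified illustration; the real class is assumed to do much more.
public sealed class PathString
{
    private readonly string _value;

    public PathString(string value)
    {
        _value = value ?? string.Empty;
    }

    public PathString Combine(string addition)
    {
        // Path.Combine inserts the platform's directory separator if needed.
        return new PathString(Path.Combine(_value, addition));
    }

    // "+" (and therefore "+=") delegates to Combine rather than
    // doing plain string concatenation.
    public static PathString operator +(PathString path, string addition)
    {
        return path.Combine(addition);
    }

    public override string ToString()
    {
        return _value;
    }

    public static implicit operator string(PathString path)
    {
        return ReferenceEquals(path, null) ? null : path._value;
    }

    public static implicit operator PathString(string value)
    {
        return new PathString(value);
    }
}
```

Because a user-defined operator exists on PathString, the compiler picks it over built-in string concatenation, so an accidental += still produces a properly separated path.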

I also implemented ISerializable and IXmlSerializable to make sure that, for example, the serialized representation of PathString looks exactly like a string.

Since I have multiple types like this, I tried to push as much of the boilerplate as possible into a base class. But it takes some tricky tricky tricks that might be a little bit evil.

Here’s the signature of the base class I wrote:

[Serializable]
public abstract class StringEquivalent<T> 
  : ISerializable, IXmlSerializable where T : StringEquivalent<T>
{
    protected StringEquivalent(string value);
    protected StringEquivalent();
    public abstract T Combine(string addition);
    public static T operator +(StringEquivalent<T> a, string b);
    public static bool operator ==(StringEquivalent<T> a, StringEquivalent<T> b);
    public static bool operator !=(StringEquivalent<T> a, StringEquivalent<T> b);
    public override bool Equals(Object obj);
    public bool Equals(T stringEquivalent);
    public virtual bool Equals(string other);
    public override int GetHashCode();
    public override string ToString();
    // Implementations of ISerializable and IXmlSerializable
}

The full implementation is available in my CodeHaacks repo on GitHub with full unit tests and examples.

Self Referencing Generic Constraints

There’s some stuff in here that just seemed crazy to me at first. For example, taking out the interfaces, did you notice the generic type declaration?

public abstract class StringEquivalent<T> where T : StringEquivalent<T>

Notice that the generic constraint is self-referencing. This is a pattern that Eric Lippert discourages:

Yes it is legal, and it does have some legitimate uses. I see this pattern rather a lot(**). However, I personally don't like it and I discourage its use.

This is a C# variation on what's called the Curiously Recurring Template Pattern in C++, and I will leave it to my betters to explain its uses in that language. Essentially the pattern in C# is an attempt to enforce the usage of the CRTP.

…snip…

So that's one good reason to avoid this pattern: because it doesn't actually enforce the constraint you think it does.

…snip…

The second reason to avoid this is simply because it bakes the noodle of anyone who reads the code.

Again, Jon Skeet provided an example of what Lippert warns about: the pattern cannot actually enforce the constraint I might wish to enforce.

While you're not fully enforcing a constraint, the constraint which you have got doesn't prevent some odd stuff. For example, it would be entirely legitimate to write:

public class ZipCodeString : StringEquivalent<ZipCodeString>

public class WeirdAndWacky : StringEquivalent<ZipCodeString>

That's legal, and we don't really want it to be. That's the kind of thing Eric was trying to avoid, I believe.

The reason I chose to go against the recommendation of someone much smarter than me in this case is that my goal isn’t to enforce these constraints at all. It’s to enable a scenario. This is the only way I know of to implement these various operator overloads in a base class. Without these constraints, I’d have to reimplement them for every class. If you know a better approach, I’m all ears.
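A stripped-down sketch may help show what the self-referencing constraint buys. The names StringLike<T> and CsvString are hypothetical stand-ins, not the StringEquivalent<T> code from the repo, and equality and serialization are omitted. The point is only that the operator is written once in the base class yet returns the derived type.

```csharp
using System;

// Hypothetical stand-in for the base class; illustration only.
public abstract class StringLike<T> where T : StringLike<T>
{
    private readonly string _value;

    protected StringLike(string value)
    {
        _value = value ?? string.Empty;
    }

    protected string Value
    {
        get { return _value; }
    }

    // Each derived type decides what "adding" a string means.
    public abstract T Combine(string addition);

    // Declared once here, but typed to return the derived class T.
    public static T operator +(StringLike<T> a, string b)
    {
        return a.Combine(b);
    }

    public override string ToString()
    {
        return _value;
    }
}

// Hypothetical derived type that joins values with commas.
public sealed class CsvString : StringLike<CsvString>
{
    public CsvString(string value) : base(value)
    {
    }

    public override CsvString Combine(string addition)
    {
        return new CsvString(Value.Length == 0 ? addition : Value + "," + addition);
    }
}
```

Given `CsvString csv = new CsvString("a");`, the statement `csv += "b";` compiles without a cast because the base operator returns T, which here is CsvString. Without the type parameter, the operator could only return the base type and every derived class would need its own overload.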

WPF Value Converter and Markup Extension Examples

As a bonus divergence, I thought I’d throw in one more example of a self-referencing generic constraint. In WPF, there’s a concept of a value converter, IValueConverter, used to convert values from XAML to your view model and vice versa. However, the mechanism to declare and use value converters is really clunky.

Josh Twist provides a nice example that cleans up the syntax with value converters that are also MarkupExtension. I decided to take it further and write a base class that does it.

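using System;
using System.Globalization;
using System.Windows;
using System.Windows.Data;
using System.Windows.Markup;

public abstract class ValueConverterMarkupExtension<T>
    : MarkupExtension, IValueConverter where T : class, IValueConverter, new()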
{
  static T converter;

  public override object ProvideValue(IServiceProvider serviceProvider)
  {
    return converter ?? (converter = new T());
  }

  public abstract object Convert(object value, Type targetType
    , object parameter, CultureInfo culture);

  // Only override this if this converter might be used with 2-way data binding.
  public virtual object ConvertBack(object value
    , Type targetType, object parameter, CultureInfo culture)
  {
    return DependencyProperty.UnsetValue;
  }
}

I’m sure I’m not the first to do something like this.

Now all my value converters inherit from this base class.

Back to Primitives

Back to the original topic, I used to supplement primitives with loads of extension methods. I have a set of extension methods on string that I use quite a bit. But more and more, I’m starting to prefer dialing that back a bit in cases like this where I need something to be a string with structure.

Git and GitHub Talk in Hawaii!

Next week my wife and I celebrate our tenth anniversary in Oahu with the kids. It’s been a great ten years and I’m just so lucky to have such a wonderful woman and partner in my life along with two deviously great kids.

And what better way to celebrate an anniversary than to give a talk on Git and GitHub for Windows Developers!

UPDATE: Immediately after the talk we’re going to have a drinkup!

[Image: git-github-logo]

Before I go further, I need you to soak in that logo for a minute. At first glance, it looks like it was drawn by a five year old phoning in a homework assignment. But let it wash over you and the awesomeness starts to make itself apparent.

It’s a commission from http://www.horriblelogos.com/ where you can spend…

$5 for a logo guaranteed to suck

You might even learn something from these logos. For example, maybe you’ve heard the term “Map Reduce” and you know it’s probably useful but don’t understand what it means. You can thank me later for the following:

[Image: horrible-logos-map-reduce]

But I digress.

This is my first time speaking in Hawaii and I’m excited. I hope this begins a trend of being invited to speak in lush tropical islands.

Update: I work with a great bunch of designers. Here’s a take on a logo for this talk by Jason Long based on the Windows 8 design style.

[Image: haacked-hawaii]

Quotas, What Are They Good For?

If you look hard enough at our industry (really at all industries), you’ll find many implicit quotas in play. For example, some companies demand a minimum set of hours worked per week.

This reminds me of an apocryphal story of the “know where man”. Here’s one variant of this famous legend as described on snopes:

Nikola Tesla visited Henry Ford at his factory, which was having some kind of difficulty. Ford asked Tesla if he could help identify the problem area. Tesla walked up to a wall of boilerplate and made a small X in chalk on one of the plates. Ford was thrilled, and told him to send an invoice.

The bill arrived, for $10,000. Ford asked for a breakdown. Tesla sent another invoice, indicating a $1 charge for marking the wall with an X, and $9,999 for knowing where to put it.

In this variant, Ford is surprised by the price because $10,000 is a lot to pay for a few minutes of work. But as Tesla points out, he’s not paying for Tesla’s time, he’s paying for a solution to an expensive problem.

Another example is the idea of measuring a developer’s productivity by lines of code. Unless you sell code by the line, this is also pointless, as Bill Gates once pointed out:

Measuring programming progress by lines of code is like measuring aircraft building progress by weight

Set working hours are another example of a poor quota. Developers aren’t paid for lines of code, number of hours in the office, or being in the office at certain hours. They’re paid to create value!

I got to thinking about this after reading an article completely unrelated to software - this heart wrenching and infuriating account of young offenders being enlisted as confidential informants and placed in extremely dangerous situations that far outweigh the gravity of their alleged crime.

One thing in particular caught my attention:

Mitchell McLean has come to see his son’s death as the result of an equally cynical and utilitarian calculation. “The cops, they get federal funding by the number of arrests they make—to get the money, you need the numbers,” he explained, alluding to, among other things, asset-forfeiture laws that allow police departments to keep a hefty portion of cash and other resources seized during drug busts.

Notice the incentive here. The focus is on the number of arrests, which is a symptom, not the actual desired outcome.

That’s the problem with quotas. They rarely lead to the actual outcome you want. They simply reward gaming the quota by any means necessary.

This is not to say that all quotas are useless. Perhaps there are cases where they are called for. But they have to overcome the dreaded law of unintended consequences.

The law of unintended consequences, often cited but rarely defined, is that actions of people—and especially of government—always have effects that are unanticipated or unintended.

I imagine a good quota would be one that brings the system closer to the desired outcome and manages to avoid unintended consequences that would leave the overall system in worse shape than before. For example, perhaps if the gulf between your current state and the desired outcome is huge, a quota might help make small gains.

If you have examples where you think quotas produce the desired outcome with negligible unintended consequences, please do comment.