Primitive Obsession, Custom String Types, and Self Referencing Generic Constraints
I was once accused of primitive obsession. Especially when it comes to strings. Guilty as charged!
There are a lot of reasons to be obsessed with string primitives. Many
times, the data really is just a string and encapsulating it in some
custom type is just software “designerbation.” Also, strings are special
and the .NET Framework heavily optimizes strings through techniques like
string interning and providing classes like the StringBuilder.
But in many cases, a strongly typed class that better represents the
domain is the right thing to do. I think System.Uri and its
corresponding UriBuilder is a prime example. When you work with URLs,
there are security implications that very few people get right if you
treat them just as strings.
But there’s a third scenario that I often run into now that I build client applications with the Model View ViewModel (MVVM) pattern. Many properties on a view model correspond to user input. Often these properties are populated via data binding with input controls on the view. As such, these properties often need to be able to hold invalid values that represent the input the user entered.
For example, suppose I want a user to type in a URL. I might have a property like:
public string YourBlogUrl {get; set;}
I can’t change the type of that property to a Uri because as the user
types, the intermediate values bound to the property are not valid
instances of Uri. For example, as the user types the first “h” in
“http”, trying to bind the input value to the Uri property would fail.
But this sucks because suppose I want to display the host name on the
screen as soon as one can (more or less) be determined based on
the input. I’d love to be able to just bind another control to
YourBlogUrl.Host. But alas, the string type does not have a Host
property.
Ideally I would have some middle ground where the type of that property
both has structure, but allows me to have invalid values. Perhaps it has
methods to convert it to a more strict type once we validate that the
value is valid. In this case, a ToUri method would make sense.
But string is sealed, so I can’t derive from it. What’s a hapless coder
to do?
Custom string types through implicit conversions
Well you could use the StringOr<T> class written as an April Fool’s
joke. It was a joke, but it might be useful in cases like this! But
that’s not the approach I’ll take.
Or you can follow the advice of Jimmy Bogard in his post on primitive
obsession that I linked to at the beginning (I’m sure he’ll love that I dragged
out a post he wrote five years ago) and write a custom class that’s
implicitly convertible to string.
In his post, he shows a ZipCodeString example which I will include
below, but with one change. The very last method is a conversion
overload and I changed it from explicit to implicit.
public class ZipCodeString
{
    private readonly string _value;

    public ZipCodeString(string value)
    {
        // perform regex matching to verify XXXXX or XXXXX-XXXX format
        _value = value;
    }

    public string Value
    {
        get { return _value; }
    }

    public override string ToString()
    {
        return _value;
    }

    public static implicit operator string(ZipCodeString zipCode)
    {
        return zipCode.Value;
    }

    public static implicit operator ZipCodeString(string value)
    {
        return new ZipCodeString(value);
    }
}
This allows you to write code like:
ZipCodeString zip = "98008";
This provides the ease of a string to initialize a ZipCodeString
type, while at the same time it provides access to the structure of a
zip code.
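The same trick can be applied to the URL scenario from earlier. What follows is only a minimal sketch under my own assumptions: UrlString, IsValid, Host, and ToUri are hypothetical names made up for illustration (they are not taken from any library), and the only framework call it leans on is Uri.TryCreate. The type holds whatever the user has typed so far, and exposes structure only once the input parses as an absolute URL.

```csharp
using System;

// Hypothetical type: stores raw user input, valid or not.
public sealed class UrlString
{
    private readonly string _value;

    public UrlString(string value)
    {
        _value = value ?? string.Empty;
    }

    // Returns null while the input is not (yet) an absolute URL.
    public Uri ToUri()
    {
        Uri uri;
        return Uri.TryCreate(_value, UriKind.Absolute, out uri) ? uri : null;
    }

    public bool IsValid
    {
        get { return ToUri() != null; }
    }

    // Empty until a host can be determined from the input.
    public string Host
    {
        get
        {
            Uri uri = ToUri();
            return uri == null ? string.Empty : uri.Host;
        }
    }

    public override string ToString()
    {
        return _value;
    }

    public static implicit operator string(UrlString url)
    {
        // ReferenceEquals avoids `url == null`, which would bind to the
        // string equality operator and re-enter this conversion.
        return ReferenceEquals(url, null) ? null : url._value;
    }

    public static implicit operator UrlString(string value)
    {
        return value == null ? null : new UrlString(value);
    }
}
```

With something like this, a view model property of type UrlString can accept the lone “h” without blowing up, and a second control can bind to its Host property, which stays empty until the input becomes a valid URL.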
In the interest of full disclosure, many people have a strong feeling against implicit conversions. I asked Jon Skeet, Number one dude on StackOverflow and perhaps as well versed in C# as just about anybody in the world, to review a draft of this post as I didn’t want to propagate bad practices without due warning. Here’s what he said:
Personally I really dislike implicit conversions. I don’t even like explicit conversions much - I prefer methods. So if I’m writing XML serialization code, I’ll usually have a FromXElement static method, and a ToXElement instance method. It definitely makes the code longer, but I reckon it’s ultimately clearer. (It also allows for several conversions from the same type with different meanings - e.g. Duration.FromHours, Duration.FromMinutes etc.)
I don’t think I’d ever expose an implicit conversion like this in a public API that’s meant to be shared with others. But within my own code, I like it so far. If I get bitten by it, maybe I’ll change my tune and Skeet can tell me, “I told you so!”
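For comparison, the method-based style Jon describes might look roughly like the sketch below. This is an illustration under stated assumptions, not code from his post or from Jimmy’s: the ZipCode type and its FromString factory are hypothetical names, and the regular expression is simply my reading of the “XXXXX or XXXXX-XXXX” comment in the example above.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical type: no conversion operators, only named methods.
public sealed class ZipCode
{
    // Five digits, optionally followed by a dash and four more digits.
    private static readonly Regex Format =
        new Regex(@"^[0-9]{5}(-[0-9]{4})?\z");

    private readonly string _value;

    private ZipCode(string value)
    {
        _value = value;
    }

    // Named factory: the call site states what is being converted.
    public static ZipCode FromString(string value)
    {
        if (value == null) throw new ArgumentNullException("value");
        if (!Format.IsMatch(value))
            throw new FormatException("Expected XXXXX or XXXXX-XXXX.");
        return new ZipCode(value);
    }

    public override string ToString()
    {
        return _value;
    }
}
```

The call site becomes ZipCode.FromString("98008") instead of a bare assignment. That is longer, but nothing is converted behind the reader’s back, and invalid input fails loudly at the point of conversion.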
Taking it further
I like Jimmy’s approach, but it doesn’t go far enough for my needs. For
example, this works great when you employ this approach from the start.
But what if you already shipped version 1 of a property as a string?
And now you want to change that property to a ZipCodeString. But you
have existing values serialized to disk. Or maybe you need to pass this
ZipCodeString to a JSON endpoint. Is that going to serialize ok?
In my case, I often want these types to act as much like strings as possible. That way, if I change a property from string to one of these types, it’ll break as little of my code as possible (if any).
What this means is we need to write a lot more boilerplate code. For
example, override the Equals method and overload the equality operators.
In other cases, you may want to overload the addition operator. I did
this with a PathString class that represents file paths so I could write
code like this:
// The code I recommend writing.
PathString somePath = @"c:\fake\path";
somePath = somePath.Combine("subfolder");
// somePath == @"c:\fake\path\subfolder";
// But if you did this by accident:
PathString somePath = @"c:\fake\path";
somePath += "subfolder";
// somePath == @"c:\fake\path\subfolder";
PathString has a proper Combine method, but I see code where people
attempt to concatenate paths all the time. PathString overloads the
addition operator, creating an idiom where concatenation is equivalent to
path combination. This may end up being a bad idea. We’ll see. My
feeling is that if you’re already concatenating paths, this can only
make it better.
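To make the idiom concrete, here is a stripped-down sketch of how such an operator could be wired up. It is an assumption-laden stand-in, not the real class from the repo: it has no base class, no equality members, and no serialization support, and it simply delegates to System.IO.Path.Combine.

```csharp
using System.IO;

// Hypothetical, minimal stand-in for the PathString described above.
public sealed class PathString
{
    private readonly string _value;

    public PathString(string value)
    {
        _value = value ?? string.Empty;
    }

    public PathString Combine(string addition)
    {
        // Path.Combine inserts the separator when needed. Note that a
        // rooted `addition` replaces the left-hand path entirely.
        return new PathString(Path.Combine(_value, addition));
    }

    // "+" means "combine", never raw string concatenation.
    public static PathString operator +(PathString path, string addition)
    {
        return path.Combine(addition);
    }

    public override string ToString()
    {
        return _value;
    }

    public static implicit operator PathString(string value)
    {
        return value == null ? null : new PathString(value);
    }

    public static implicit operator string(PathString path)
    {
        // ReferenceEquals sidesteps any equality operator lookup.
        return ReferenceEquals(path, null) ? null : path._value;
    }
}
```

Because the user-defined operator takes precedence over built-in string concatenation, somePath += "subfolder" goes through Combine and never produces a path with a missing separator.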
I also implemented ISerializable and IXmlSerializable to make sure
that, for example, the serialized representation of PathString looks
exactly like a string.
Since I have multiple types like this, I tried to push as much of the boilerplate as possible into a base class. But it takes some tricky tricky tricks that might be a little bit evil.
Here’s the signature of the base class I wrote:
[Serializable]
public abstract class StringEquivalent<T>
    : ISerializable, IXmlSerializable where T : StringEquivalent<T>
{
    protected StringEquivalent(string value);
    protected StringEquivalent();
    public abstract T Combine(string addition);
    public static T operator +(StringEquivalent<T> a, string b);
    public static bool operator ==(StringEquivalent<T> a, StringEquivalent<T> b);
    public static bool operator !=(StringEquivalent<T> a, StringEquivalent<T> b);
    public override bool Equals(Object obj);
    public bool Equals(T stringEquivalent);
    public virtual bool Equals(string other);
    public override int GetHashCode();
    public override string ToString();
    // Implementations of ISerializable and IXmlSerializable
}
The full implementation is available in my CodeHaacks repo on GitHub with full unit tests and examples.
Self Referencing Generic Constraints
There’s some stuff in here that just seemed crazy to me at first. For example, taking out the interfaces, did you notice the generic type declaration?
public abstract class StringEquivalent<T> where T : StringEquivalent<T>
Notice that the generic constraint is self-referencing. This is a pattern that Eric Lippert discourages:
Yes it is legal, and it does have some legitimate uses. I see this pattern rather a lot(**). However, I personally don’t like it and I discourage its use.
This is a C# variation on what’s called the Curiously Recurring Template Pattern in C++, and I will leave it to my betters to explain its uses in that language. Essentially the pattern in C# is an attempt to enforce the usage of the CRTP.
…snip…
So that’s one good reason to avoid this pattern: because it doesn’t actually enforce the constraint you think it does.
…snip…
The second reason to avoid this is simply because it bakes the noodle of anyone who reads the code.
Again, Jon Skeet provided an example of the warning that Lippert states in regards to the inability to actually enforce the constraint I might wish to enforce.
While you’re not fully enforcing a constraint, the constraint which you have got doesn’t prevent some odd stuff. For example, it would be entirely legitimate to write:
public class ZipCodeString : StringEquivalent<ZipCodeString>
public class WeirdAndWacky : StringEquivalent<ZipCodeString>
That’s legal, and we don’t really want it to be. That’s the kind of thing Eric was trying to avoid, I believe.
The reason I chose to go against the recommendation of someone much smarter than me in this case is because my goal isn’t to enforce these constraints at all. It’s to enable a scenario. This is the only way to implement these various operator overloads in a base class. Without these constraints, I’d have to reimplement them for every class. If you know a better approach, I’m all ears.
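To show the scenario being enabled, here is a trimmed-down sketch of the pattern. It is not the StringEquivalent<T> implementation from the repo: StringLike<T> is a hypothetical name, serialization is left out, and the PathString shown here is a bare-bones subclass that exists only to exercise the base class. The point is that the shared operator can hand back the derived type T instead of the base type.

```csharp
using System;
using System.IO;

// Hypothetical, trimmed-down stand-in for StringEquivalent<T>.
public abstract class StringLike<T> where T : StringLike<T>
{
    protected readonly string Value;

    protected StringLike(string value)
    {
        Value = value ?? string.Empty;
    }

    public abstract T Combine(string addition);

    // Written once here, yet typed as T, so callers keep the derived type.
    public static T operator +(StringLike<T> a, string b)
    {
        return a.Combine(b);
    }

    public static bool operator ==(StringLike<T> a, StringLike<T> b)
    {
        if (ReferenceEquals(a, b)) return true;
        if (ReferenceEquals(a, null) || ReferenceEquals(b, null)) return false;
        return string.Equals(a.Value, b.Value, StringComparison.Ordinal);
    }

    public static bool operator !=(StringLike<T> a, StringLike<T> b)
    {
        return !(a == b);
    }

    public override bool Equals(object obj)
    {
        return this == (obj as StringLike<T>);
    }

    public override int GetHashCode()
    {
        return Value.GetHashCode();
    }

    public override string ToString()
    {
        return Value;
    }
}

// A subclass only has to say how two pieces are combined.
public sealed class PathString : StringLike<PathString>
{
    public PathString(string value) : base(value) { }

    public override PathString Combine(string addition)
    {
        return new PathString(Path.Combine(Value, addition));
    }
}
```

Given PathString p = new PathString("a"), the expression p + "b" is typed as PathString, not StringLike<PathString>, so it can be assigned straight back to p without a cast. As Jon’s example shows, nothing here stops a WeirdAndWacky class from also deriving from StringLike<PathString>.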
WPF Value Converter and Markup Extension Examples
As a bonus divergence, I thought I’d throw in one more example of a
self-referencing generic constraint. In WPF, there’s a concept of a
value converter, IValueConverter, used to convert values from XAML to
your view model and vice versa. However, the mechanism to declare and
use value converters is really clunky.
Josh Twist provides a nice example that cleans up the syntax with value
converters that are also MarkupExtension. I decided to take it further
and write a base class that does it.
using System;
using System.Globalization;
using System.Windows;
using System.Windows.Data;
using System.Windows.Markup;

public abstract class ValueConverterMarkupExtension<T>
    : MarkupExtension, IValueConverter where T : class, IValueConverter, new()
{
    // One shared converter instance per closed generic type.
    static T converter;

    public override object ProvideValue(IServiceProvider serviceProvider)
    {
        return converter ?? (converter = new T());
    }

    public abstract object Convert(object value, Type targetType
        , object parameter, CultureInfo culture);

    // Only override this if this converter might be used with 2-way data binding.
    public virtual object ConvertBack(object value
        , Type targetType, object parameter, CultureInfo culture)
    {
        return DependencyProperty.UnsetValue;
    }
}
I’m sure I’m not the first to do something like this.
Now all my value converters inherit from this base class.
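A converter built on it might look something like the sketch below. UpperCaseConverter is a hypothetical example of mine rather than one of the converters mentioned above, and it is not self-contained: it assumes a WPF project that already contains the ValueConverterMarkupExtension<T> base class.

```csharp
using System;
using System.Globalization;

// Hypothetical converter; passes itself as T to the base class.
public class UpperCaseConverter
    : ValueConverterMarkupExtension<UpperCaseConverter>
{
    public override object Convert(object value, Type targetType
        , object parameter, CultureInfo culture)
    {
        // Leave anything that is not a string untouched.
        string text = value as string;
        if (text == null) return value;

        return text.ToUpper(culture ?? CultureInfo.CurrentCulture);
    }
}
```

In XAML, assuming an xmlns prefix named local that maps to the converter’s namespace, a binding could then reference it inline as Converter={local:UpperCaseConverter}, with no separate resource declaration for the converter.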
Back to Primitives
Back to the original topic, I used to supplement primitives with loads
of extension methods. I have a set of extension methods for string that
I use quite a bit. But more and more, I’m starting to prefer dialing that
back a bit in cases like this where I need something to be a string with
structure.