Primitive Obsession, Custom String Types, and Self Referencing Generic Constraints

Sep 30, 2012

I was once accused of primitive obsession. Especially when it comes to strings. Guilty as charged!

There are a lot of reasons to be obsessed with string primitives. Many times, the data really is just a string and encapsulating it in some custom type is just software “designerbation.” Also, strings are special and the .NET Framework heavily optimizes strings through techniques like string interning and providing classes like the StringBuilder.

But in many cases, a strongly typed class that better represents the domain is the right thing to do. I think System.Uri and its corresponding UriBuilder are a prime example. When you work with URLs, there are security implications that very few people get right if you treat them just as strings.

But there’s a third scenario that I often run into now that I build client applications with the Model View ViewModel (MVVM) pattern. Many properties on a view model correspond to user input. Often these properties are populated via data binding with input controls on the view. As such, these properties often need to be able to hold invalid values that represent the input the user entered.

For example, suppose I want a user to type in a URL. I might have a property like:

public string YourBlogUrl {get; set;}

I can’t change the type of that property to a Uri because as the user types, the intermediate values bound to the property are not valid instances of Uri. For example, as the user types the first “h” in “http”, trying to bind the input value to the Uri property would fail.

But this sucks because suppose I want to display the host name on the screen as soon as one can (more or less) be determined based on the input. I’d love to be able to just bind another control to YourBlogUrl.Host. But alas, the string type does not have a Host property.

Ideally I would have some middle ground where the type of that property both has structure and allows me to have invalid values. Perhaps it has methods to convert it to a more strict type once we validate that the value is valid. In this case, a ToUri method would make sense.

But string is sealed, so I can’t derive from it. What’s a hapless coder to do?

Custom string types through implicit conversions

Well you could use the StringOr<T> class written as an April Fool’s joke. It was a joke, but it might be useful in cases like this! But that’s not the approach I’ll take.

Or you can follow the advice of Jimmy Bogard in his post on primitive obsession that I linked to at the beginning (I’m sure he’ll love that I dragged out a post he wrote five years ago) and write a custom class that’s implicitly convertible to string.

In his post, he shows a ZipCodeString example which I will include below, but with one change. The very last method is a conversion operator and I changed it from explicit to implicit.

public class ZipCodeString
{
    private readonly string _value;

    public ZipCodeString(string value)
    {
        // perform regex matching to verify XXXXX or XXXXX-XXXX format
        _value = value;
    }

    public string Value
    {
        get { return _value; }
    }

    public override string ToString()
    {
        return _value;
    }

    public static implicit operator string(ZipCodeString zipCode)
    {
        return zipCode.Value;
    }

    public static implicit operator ZipCodeString(string value)
    {
        return new ZipCodeString(value);
    }
}

This allows you to write code like:

ZipCodeString zip = "98008";

This provides the ease of a string to initialize a ZipCodeString type, while at the same time it provides access to the structure of a zip code.
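As a rough sketch of how the same trick might apply to the URL example from earlier, here is a hypothetical UriString. The type name and its IsValid, Host, and ToUri members are assumptions made for illustration, not code from Jimmy’s post. It holds whatever the user has typed so far, exposes the host once one can be parsed, and only hands out a strict Uri when the value is valid.

```csharp
using System;

// Hypothetical sketch: a string-like type for URL input that tolerates invalid values.
public sealed class UriString
{
    private readonly string _value;

    public UriString(string value)
    {
        _value = value ?? string.Empty;
    }

    // True only when the current input parses as an absolute URI.
    public bool IsValid
    {
        get
        {
            Uri ignored;
            return Uri.TryCreate(_value, UriKind.Absolute, out ignored);
        }
    }

    // Best-effort host name; empty until the input parses as an absolute URI.
    public string Host
    {
        get
        {
            Uri uri;
            return Uri.TryCreate(_value, UriKind.Absolute, out uri) ? uri.Host : string.Empty;
        }
    }

    // Converts to the strict type; throws if the input is not valid yet.
    public Uri ToUri()
    {
        Uri uri;
        if (!Uri.TryCreate(_value, UriKind.Absolute, out uri))
        {
            throw new InvalidOperationException("The value is not a valid absolute URI.");
        }
        return uri;
    }

    public override string ToString()
    {
        return _value;
    }

    // ReferenceEquals avoids "uri == null", which could bind to the string
    // equality operator through this very conversion.
    public static implicit operator string(UriString uri)
    {
        return ReferenceEquals(uri, null) ? null : uri._value;
    }

    public static implicit operator UriString(string value)
    {
        return new UriString(value);
    }
}
```

With something like this, a view model property of type UriString can accept the lone “h” the user has typed so far, and a second control can bind to its Host property without the binding failing.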

In the interest of full disclosure, many people have a strong feeling against implicit conversions. I asked Jon Skeet, Number one dude on StackOverflow and perhaps as well versed in C# as just about anybody in the world, to review a draft of this post as I didn’t want to propagate bad practices without due warning. Here’s what he said:

Personally I really dislike implicit conversions. I don’t even like explicit conversions much - I prefer methods. So if I’m writing XML serialization code, I’ll usually have a FromXElement static method, and a ToXElement instance method. It definitely makes the code longer, but I reckon it’s ultimately clearer. (It also allows for several conversions from the same type with different meanings - e.g. Duration.FromHours, Duration.FromMinutes etc.)

I don’t think I’d ever expose an implicit conversion like this in a public API that’s meant to be shared with others. But within my own code, I like it so far. If I get bitten by it, maybe I’ll change my tune and Skeet can tell me, “I told you so!”

Taking it further

I like Jimmy’s approach, but it doesn’t go far enough for my needs. For example, this works great when you employ this approach from the start. But what if you already shipped version 1 of a property as a string? And now you want to change that property to a ZipCodeString. But you have existing values serialized to disk. Or maybe you need to pass this ZipCodeString to a JSON endpoint. Is that going to serialize ok?

In my case, I often want these types to act as much like strings as possible. That way, if I change a property from string to one of these types, it’ll break as little of my code as possible (if any).

What this means is we need to write a lot more boilerplate code. For example, override the Equals method and overload the equality operators. In other cases, you may want to overload the addition operator. I did this with a PathString class that represents file paths so I could write code like this:

// The code I recommend writing.
PathString somePath = @"c:\fake\path";
somePath = somePath.Combine("subfolder");
// somePath == @"c:\fake\path\subfolder";

// But if you did this by accident:
PathString somePath = @"c:\fake\path";
somePath += "subfolder";
// somePath == @"c:\fake\path\subfolder";

PathString has a proper Combine method, but I see code where people attempt to concatenate paths all the time. PathString overloads the addition operator, creating an idiom where concatenation is equivalent to path combination. This may end up being a bad idea; we’ll see. My feeling is that if you’re already concatenating paths, this can only make it better.
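To make the idiom concrete, here is a minimal, hypothetical PathString that shows only Combine and the addition operator. It is a sketch under the assumption that Combine simply delegates to System.IO.Path.Combine; the real class in the CodeHaacks repo does more.

```csharp
using System;
using System.IO;

// Hypothetical, simplified PathString: only Combine and the + operator are shown.
public sealed class PathString
{
    private readonly string _value;

    public PathString(string value)
    {
        _value = value ?? string.Empty;
    }

    // The recommended way to append a segment.
    public PathString Combine(string addition)
    {
        return new PathString(Path.Combine(_value, addition));
    }

    // Concatenation is defined as path combination, so "+=" inserts the
    // directory separator instead of gluing the two strings together.
    public static PathString operator +(PathString path, string addition)
    {
        return path.Combine(addition);
    }

    public override string ToString()
    {
        return _value;
    }

    // ReferenceEquals is used rather than "path == null" so the null check
    // cannot bind to string equality through this conversion.
    public static implicit operator string(PathString path)
    {
        return ReferenceEquals(path, null) ? null : path._value;
    }

    public static implicit operator PathString(string value)
    {
        return new PathString(value);
    }
}
```

Because the user-defined operator takes a PathString on the left, an expression such as somePath + "subfolder" picks it over plain string concatenation, and the result is still a PathString.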

I also implemented ISerializable and IXmlSerializable to make sure that, for example, the serialized representation of PathString looks exactly like a string.
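For the XML half of that, one possible shape is sketched below. This is a trimmed-down, hypothetical PathString that shows only IXmlSerializable (the ISerializable side is omitted), and it assumes the type is used with XmlSerializer, which needs the public parameterless constructor.

```csharp
using System;
using System.Xml;
using System.Xml.Schema;
using System.Xml.Serialization;

// Hypothetical sketch: only the XML serialization members are shown.
public class PathString : IXmlSerializable
{
    private string _value;

    // XmlSerializer creates the instance first, then calls ReadXml.
    public PathString()
    {
        _value = string.Empty;
    }

    public PathString(string value)
    {
        _value = value ?? string.Empty;
    }

    public override string ToString()
    {
        return _value;
    }

    // Returning null is the documented convention for this method.
    public XmlSchema GetSchema()
    {
        return null;
    }

    // The reader is positioned on the wrapping element; read its text content
    // and move past the end tag.
    public void ReadXml(XmlReader reader)
    {
        _value = reader.ReadElementContentAsString();
    }

    // The serializer writes the wrapping element; only the raw text goes inside.
    public void WriteXml(XmlWriter writer)
    {
        writer.WriteString(_value);
    }
}
```

Written this way, the element for a PathString contains nothing but the text of the path, which is the same content a plain string would produce.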

Since I have multiple types like this, I tried to push as much of the boilerplate into a base class. But it takes some tricky tricky tricks that might be a little bit evil.

Here’s the signature of the base class I wrote:

[Serializable]
public abstract class StringEquivalent<T> 
  : ISerializable, IXmlSerializable where T : StringEquivalent<T>
{
    protected StringEquivalent(string value);
    protected StringEquivalent();
    public abstract T Combine(string addition);
    public static T operator +(StringEquivalent<T> a, string b);
    public static bool operator ==(StringEquivalent<T> a, StringEquivalent<T> b);
    public static bool operator !=(StringEquivalent<T> a, StringEquivalent<T> b);
    public override bool Equals(Object obj);
    public bool Equals(T stringEquivalent);
    public virtual bool Equals(string other);
    public override int GetHashCode();
    public override string ToString();
    // Implementations of ISerializable and IXmlSerializable
}

The full implementation is available in my CodeHaacks repo on GitHub with full unit tests and examples.

Self Referencing Generic Constraints

There’s some stuff in here that just seemed crazy to me at first. For example, taking out the interfaces, did you notice the generic type declaration?

public abstract class StringEquivalent<T> where T : StringEquivalent<T>

Notice that the generic constraint is self-referencing. This is a pattern that Eric Lippert discourages:

Yes it is legal, and it does have some legitimate uses. I see this pattern rather a lot(**). However, I personally don’t like it and I discourage its use.

This is a C# variation on what’s called the Curiously Recurring Template Pattern in C++, and I will leave it to my betters to explain its uses in that language. Essentially the pattern in C# is an attempt to enforce the usage of the CRTP.

…snip…

So that’s one good reason to avoid this pattern: because it doesn’t actually enforce the constraint you think it does.

…snip…

The second reason to avoid this is simply because it bakes the noodle of anyone who reads the code.

Again, Jon Skeet provided an example of the warning that Lippert states in regards to the inability to actually enforce the constraint I might wish to enforce.

While you’re not fully enforcing a constraint, the constraint which you have got doesn’t prevent some odd stuff. For example, it would be entirely legitimate to write:

public class ZipCodeString : StringEquivalent<ZipCodeString>

public class WeirdAndWacky : StringEquivalent<ZipCodeString>

That’s legal, and we don’t really want it to be. That’s the kind of thing Eric was trying to avoid, I believe.

The reason I chose to go against the recommendation of someone much smarter than me in this case is because my goal isn’t to enforce these constraints at all. It’s to enable a scenario. This is the only way I’ve found to implement these various operator overloads in a base class. Without these constraints, I’d have to reimplement them for every class. If you know a better approach, I’m all ears.
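Here is a stripped-down sketch of what the constraint buys. StringEquivalentSketch is a hypothetical stand-in, not the real base class, and the case-insensitive ordinal comparison is an assumption based on a remark in the comments below. The point is that the operator and the typed Equals are written once in the base class, yet they return and accept the derived type.

```csharp
using System;
using System.IO;

// Simplified, hypothetical stand-in for the StringEquivalent<T> base class.
public abstract class StringEquivalentSketch<T> where T : StringEquivalentSketch<T>
{
    protected StringEquivalentSketch(string value)
    {
        Value = value ?? string.Empty;
    }

    protected string Value { get; private set; }

    // Each derived type decides what "adding" a string means and returns its own type.
    public abstract T Combine(string addition);

    // Written once here, yet the result is typed as T (for example PathString).
    public static T operator +(StringEquivalentSketch<T> a, string b)
    {
        return a.Combine(b);
    }

    // Typed equality against the derived type, also written once.
    public bool Equals(T other)
    {
        return !ReferenceEquals(other, null)
            && string.Equals(Value, other.Value, StringComparison.OrdinalIgnoreCase);
    }

    public override bool Equals(object obj)
    {
        return Equals(obj as T);
    }

    // Uses the same comparer as Equals so equal values hash alike.
    public override int GetHashCode()
    {
        return StringComparer.OrdinalIgnoreCase.GetHashCode(Value);
    }

    public override string ToString()
    {
        return Value;
    }
}

public sealed class PathString : StringEquivalentSketch<PathString>
{
    public PathString(string value) : base(value)
    {
    }

    public override PathString Combine(string addition)
    {
        return new PathString(Path.Combine(Value, addition));
    }
}
```

Without the type parameter, the base class operator could only return the base type, and every derived class would need its own overload or a cast at each call site.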

WPF Value Converter and Markup Extension Examples

As a bonus divergence, I thought I’d throw in one more example of a self-referencing generic constraint. In WPF, there’s a concept of a value converter, IValueConverter, used to convert values from XAML to your view model and vice versa. However, the mechanism to declare and use value converters is really clunky.

Josh Twist provides a nice example that cleans up the syntax with value converters that are also MarkupExtension. I decided to take it further and write a base class that does it.

public abstract class ValueConverterMarkupExtension<T> 
    : MarkupExtension, IValueConverter where T : class, IValueConverter, new()
{
  static T converter;

  public override object ProvideValue(IServiceProvider serviceProvider)
  {
    return converter ?? (converter = new T());
  }

  public abstract object Convert(object value, Type targetType
    , object parameter, CultureInfo culture);

  // Only override this if this converter might be used with 2-way data binding.
  public virtual object ConvertBack(object value
    , Type targetType, object parameter, CultureInfo culture)
  {
    return DependencyProperty.UnsetValue;
  }
}

I’m sure I’m not the first to do something like this.

Now all my value converters inherit from this base class.

Back to Primitives

Back to the original topic, I used to supplement primitives with loads of extension methods. I have a set of extension methods on string I use quite a bit. But more and more, I’m starting to prefer dialing that back a bit in cases like this where I need something to be a string with structure.


Comments

31 responses

Bertrand Le Roy • September 29th, 2012
Fun stuff, but I can't help wondering if this isn't going to create more bugs than it avoids. Would you even need to consider something like that in a dynamic language?
haacked • September 29th, 2012
@Bertrand how so?
Nikhil Kothari • September 29th, 2012
Nice post. I agree this can be a slippery slope, but there are a few key scenarios where this seems like an obvious option, both in terms of simplicity and readability. Dynamic languages tend to go this route, and while implicit conversions can be the source of errors, in more targeted scenarios, they could help.
I personally wish it would be possible to have some middle ground, where you could add what appear like language keywords. Example:
x.BlogUrl = url(http://...)
Similar to typeof() that already exists. They make it clear what is going on with minimal syntax overhead.
I guess I went further and also implied there isn't a need for quotes around that string.
Almost toying with the idea that perhaps a class that offers a ctor with a string parameter, or if there were an explicit conversion from string on the class, that the language would/could support such a syntax.
haacked • September 29th, 2012
Just to clarify, I only use these types on view models. I usually have an IsValid property on these types. All my API methods accept a strict type such as Uri. So if the value is valid, I just cast it to the strict type before passing it into the API.
Micah • September 29th, 2012
I think it's kinda disappointing that you have to jump through all these "bad practices hoops" just to do something this straightforward, especially in a typed language.
Bertrand Le Roy • September 29th, 2012
I don't know, it's a question. But here are some more reasons for the question... HtmlString has no implicit conversion, why did you decide against it at the time? In Orchard, we have LocalizableString that we use everywhere, and we briefly played with implicit conversion and went back because it was creating hard to debug problems for our users down the line. I wrote a full library for path manipulation (http://fluentpath.codeplex.com) that seems close to what you are doing here, but after playing with implicit conversions, I again decided against it but kept explicit conversions and type converters because well, they are quite nice and they are explicit. But implicit conversions always seemed to backfire in my experience. But I would gladly concede that maybe I was doing it wrong.
As for dynamic languages, it seems to me like you would simply mix some additional behavior into a regular string and be done with it, with a lot less fuss. More and more, when I'm doing something that looks both exciting and dangerous in C#, I wonder if I would even attempt something similar in JavaScript or if the problem is just a workaround for self-inflicted statically-typed rigidity. It often is in my experience.
That's my question in a few more words :)
Bertrand Le Roy • September 29th, 2012
One more: would it have helped you if string hadn't been sealed?
smartcaveman • September 29th, 2012
Isn't it a bit of a code smell to expose an IsValid property? It seems to me that if the regex fails to validate in the ctor, then the class should throw an exception.
haacked • September 29th, 2012
A dynamic language certainly removes some of the ceremony of what I'm doing, but I'm not sure a dynamic language solves the specific problem I'm solving. Perhaps I wasn't clear about my goal.
What I'm trying to do is to be able to fuzzily extract structure from an incomplete partially valid value.
Typically, when we think of user input, it's either valid or invalid. But sometimes values are partially valid. In other words, there's parts of the value that can be extracted and displayed while the user continues typing. Kind of like a password strength indicator.
For example, suppose I want the UI to display and validate the drive letter of a file path separately from the full path while a user is typing in the path.
When the user types the first letter, I might not display something. But as soon as they type a ":", as long as it's preceded by a letter, I can guess what the drive is.
Suppose the user ends up typing c:\foo|bar\.
At this point, this is not a valid file path. The bar character is not allowed in a file path. So even with a dynamic language using a proper Path type, you wouldn't be able to extract the drive letter because the type would just say, "This ain't a path. What the hell do I know."
It's important to note that I'm not making any decisions based on this data. I'm just displaying what I know. I won't make decisions until the value is valid.
To accomplish this, I have types which represent user input and the intent of that input. I can show in the UI that the value is invalid, but I can also show what parts are valid so far.
When entering a url, I might show the host name. So you have Uri which is strict and UriString which might be better named as UriInput which represents the input that's intended to be a Uri but might be incomplete or partially invalid. Consider that I'm binding the input control value to a view model property of this type on every keystroke.
So in a dynamic language, I'd still need these two types. But, I wouldn't need to write the code to coerce one to another. Duck typing would fill that need just fine. So in that case, a dynamic type would mean much less code to write. But the original problem I'm solving is still there.
haacked • September 29th, 2012
@Bertrand, ah, I see your point. You may be right and the implicit conversion may come back to byte me. To be honest, it's an experiment because I've always heeded the advice to avoid implicit conversions. But I feel it's time to experience the supposed pain firsthand. Maybe there are cases where it doesn't backfire. :)
Also, I think I address the dynamic language issue in my previous comment. I don't think dynamic languages remove the need for what I'm doing, but they do remove a lot of the ceremony it takes to do it. That is definitely true. :)
haacked • September 29th, 2012
@smartcaveman I think it would be in a type like Uri which is meant to be a URI and never allow invalid values. But the whole point of these types is to represent user input which might be invalid. So invalid input is not an exceptional case. It's quite common! So I have an IsValid property on these types.
Note that these types are just counterparts to strict types like a DirectoryInfo, FileInfo, or Uri.
smartcaveman • September 29th, 2012
I get what you're saying about invalid user input not being exceptional. I guess my problem with it is that I don't see anything explicitly requiring that the class only be used for user input. Validity is meaningless without context, and I wasn't clear on the context. Why did you choose a name like ZipCode instead of something more descriptive of your intention like ZipCodeInput or ZipCodeCandidate? If you're the only one using the ZipCode class, then it's probably not an issue, but the name seems to imply that it actually represents that type of data (and should enforce any implied contracts). As written, it seems likely to be misinterpreted by a client programmer who just needs a ZipCode class.
On another note, I know that I routinely have to work with constrained-string classes in both the business layer and the UI layer. With an approach like this one, it seems likely that the validation logic would end up being duplicated. How do you avoid that? Also, what is your design like for a strongly typed string in the domain/business layer?
Dusty Burwell • September 29th, 2012
I like what you end up with here. I won't pass judgement on the means it took to get there considering I've committed my fair share of sins in the name of WPF and MVVM.
That said, I do find it frustrating that so much of the burden is placed on you and I to solve these problems. Don't get me wrong. I think WPF is an amazing tool that can produce beautiful user interfaces, but I don't think enough time or attention was paid to advanced scenarios beyond Autonomous View. I mean, MVVM was adapted from MVP as a pattern to take particular advantage of features of WPF and it's great, but it was created outside the walls of MS. I guess what I'm saying is that some developer support for decent patterns in WPF from Redmond would be appreciated.
Rant aside, it would be interesting to see a library of these "StringEquivalents", perhaps as an inclusion to Caliburn.Micro or similar.
distantcam • September 30th, 2012
This sounds like a display issue, rather than a back-end one. If the end result is to show a host name the moment one is available why not have a HostName property on your ViewModel, and when the URL string property changes, try to set the HostName property then?
I think of the ViewModel as a facilitator for the View, rather than a holder of the business logic for a screen. That way having properties like SelectedFoo etc also make sense as they are very View specific. Business logic gets put into other classes away from the INotifyPropertyChanged gunk that is only UI relevant. Then the ViewModel becomes a coordinator of a bunch of business-logic calls.
More and more I think the ViewModel is becoming the new code-behind.
distantcam • September 30th, 2012
As an example of using property changed weaving to take away the grind of making ViewModels, here's an example I knocked up.
https://gist.github.com/3805842
I use Simon Cropp's Fody project for post-IL weaving. As you can see in the example it also finds the calculated properties so I don't have to worry about all the change notifications myself.
https://github.com/SimonCropp/Fody
Harry McIntyre • September 30th, 2012
My ImplicitOperatorModelBinder class gives me the shivers. Soooo blooming handy though...
github.com/.../ImplicitOperatorBinderProvider.cs
haacked • September 30th, 2012
@smartcaveman You are correct, the ZipCode class is poorly named. I changed it to be ZipCodeString to be consistent with my other classes. Perhaps ZipCodeInput is an even better name. I would pair this with a real strict ZipCode class.
@Cam I went that route initially. RxUI makes it really easy to have a readonly property that's dependent on the value of another property.
However, the thing I ran into is I had multiple view models that might have one or more of these types of inputs. So I ended up duplicating a lot of code. Why not just have a type that provides structure I can choose to bind to.
I'm not of the mind that a view model should only have primitive properties.
haacked • September 30th, 2012
Do you think this makes more sense if I simply call these "view models?" For example, the UriString is really a UriViewModel and I'm using it as a sub-viewmodel of my main view model?
Diego F. • September 30th, 2012
If you're really using these classes as building blocks for real UI elements then it'd be correct to name them with ViewModel suffix. If it's just input validation for a master (parent) view-model, then the Input suffix feels more appropriate.
distantcam • September 30th, 2012
As @Diego said, if these classes are to be used in the UI they should be called ViewModels.
If you're using these classes to then bind to a property off them, be aware there's a memory issue in WPF if they don't implement INotifyPropertyChanged or the property isn't a DependencyProperty.
http://support.microsoft.com/kb/938416
And you shouldn't be implementing INotifyPropertyChanged on things that aren't ViewModels. ;)
Diego F. • September 30th, 2012
Cam,
I think that KB article applies only on the v3.0 of WPF. Otherwise I'd have an elephant running on my desktop. :) That version was wayyyyy buggy and slow(er).
distantcam • September 30th, 2012
@Diego given it's by design, and is how binding works when you don't implement INotifyPropertyChanged or DependencyProperties I would have thought it would still be in later versions. But you might be right, they could have optimized the global table to not hold strong references anymore.
I still prefer to make sure anything I bind to has INPC or DependencyProperties though.
Diego F. • October 1st, 2012
@Cam Yes, it's a good practice to implement INPC when using WPF if you need notifications. But sometimes you have the perfect POCO in a read-only one time binding situation, so why not? Regarding the memory leak I can confirm you it's gone for good -- just tested for it.
Diego F. • October 1st, 2012
@Cam I was wrong! It does still leak -- WPF keeps rearing its ugly head. Just got a beautiful picture of this leak. Interestingly enough, I don't have any single situation like this in my codebase -- every property is a DP and POCO objects don't ever happen to be in this situation. http://yfrog.com/kgxpetp
nyelvoktatás • October 1st, 2012
@smartcaveman: no, just think in distributed or any non-one-layered architecture.
Steve • October 1st, 2012
A side note...
On a lot of apps I've worked on, the question of validation comes up. I realize there is a preponderance of assumptions that objects should have invariant rules. That is we should never be able to set a value to something bad, ever.
Yet in many cases, this often times gets in the way of the users using the app. If the user only has part of the data at this time, disallowing them from entering that part accomplishes what? Let's say we have city and zip but not street. What does that mean for an Address?
Well, without a street it means you can't actually send them a package or letter.
So the point really is do you need to validate this on user input? Or do you need to validate this when the data is being utilized?

I've done a lot of work with the mortgage industry and have handled a lot of addresses. We would run addresses through a Zip+4 database to make sure they are properly formatted for the us postal service. However, we'd usually store two versions of the address. The one the user entered, and the one that had been validated by the database.
Because sometimes the validation was wrong, and you needed to be able to go back and look at the difference. This was frequently the case with mortgages on new construction, because you'll have a new street that wasn't in the Zip+4 database until you get the next quarterly update. So it won't validate, but yet we know the address exists.
Anyway, it's just something else to think about. I think we spend a lot of time worried about invariants in domains, when the business reality is much more flexible.
haacked • October 1st, 2012
@Steve great point! I've written about that before in my post, Don't be a Validation Nazi.
Brian • October 2nd, 2012
This is an interesting approach for user input. I work on a website where we allow a lot of flexibility when entering data, even if it doesn't look correct at all, we typically allow it (until we get to the point of billing or something similarly critical to valid input).
I've got a little open source project that I've been working on for a while that has a pretty good conversion library if you are interested in using it (or just borrow the code - I don't care). The project is called BizArk and is available in NuGet as well.
I think you could change the ValueConverterMarkupExtension.Convert method to virtual and use the ConvertEx.To<T> method. The converter can handle many common ways of converting between values including type converters, ToXxx instance methods, static methods, constructors, etc.
Metro • October 5th, 2012
Just a nit, but shouldn't your examples be:
// somePath == @"c:\fake\path\subfolder";
rather than
// somePath == @"c:\fake\path\somePath";
haacked • October 5th, 2012
@Metro you're exactly right! Corrected.
Mario Pareja • April 4th, 2013
Given your base class uses OrdinalIgnoreCase for its equality comparisons, you should probably use the same for GetHashCode:

return StringComparer.OrdinalIgnoreCase.GetHashCode(Value ?? "");

Update: I should probably mention that not doing this prevents case insensitive access when working with hash table like collections (HashSet, Dictionary).