Named Formats Redux

UPDATE: Be sure to read Peli’s post in which he explores all of these implementations using PEX. Apparently I have a lot more unit tests to write in order to define the expected behavior of the code.

I recently wrote a post in which I examined some existing implementations of named format methods and then described the fun I had writing a routine for implementing named formats.

In response to that post, I received two new implementations of the method that are worth calling out.

The first was sent to me by Atif Aziz, who I’ve corresponded with via email regarding Jayrock. He noticed that the slowest format method by James had one obvious performance flaw and fixed it up. He then sent me another version that passed all my unit tests. Here it is in full, and it’s much faster than before.

public static class JamesFormatter
{
  public static string JamesFormat(this string format, object source)
  {
    return FormatWith(format, null, source);
  }

  public static string FormatWith(this string format
      , IFormatProvider provider, object source)
  {
    if (format == null)
      throw new ArgumentNullException("format");

    List<object> values = new List<object>();
    string rewrittenFormat = Regex.Replace(format,
      @"(?<start>\{)+(?<property>[\w\.\[\]]+)(?<format>:[^}]+)?(?<end>\})+",
      delegate(Match m)
      {
        Group startGroup = m.Groups["start"];
        Group propertyGroup = m.Groups["property"];
        Group formatGroup = m.Groups["format"];
        Group endGroup = m.Groups["end"];

        values.Add((propertyGroup.Value == "0")
          ? source
          : Eval(source, propertyGroup.Value));

        int openings = startGroup.Captures.Count;
        int closings = endGroup.Captures.Count;

        return openings > closings || openings % 2 == 0
           ? m.Value
           : new string('{', openings) + (values.Count - 1) 
             + formatGroup.Value
             + new string('}', closings);
      },
      RegexOptions.Compiled 
      | RegexOptions.CultureInvariant 
      | RegexOptions.IgnoreCase);

    return string.Format(provider, rewrittenFormat, values.ToArray());
  }

  private static object Eval(object source, string expression)
  {
    try
    {
      return DataBinder.Eval(source, expression);
    }
    catch (HttpException e)
    {
      throw new FormatException(null, e);
    }
  }
}

The other version I got was from another Microsoftie named Henri Wiechers (no blog) who pointed out that a state machine of the sort that a parser or scanner would use, is well suited for this task. It also makes the code easier to understand if you’re accustomed to state machines.

public static class HenriFormatter
{
  private static string OutExpression(object source, string expression)
  {
    string format = "";

    int colonIndex = expression.IndexOf(':');
    if (colonIndex > 0)
    {
      format = expression.Substring(colonIndex + 1);
      expression = expression.Substring(0, colonIndex);
    }

    try
    {
      if (String.IsNullOrEmpty(format))
      {
        return (DataBinder.Eval(source, expression) ?? "").ToString();
      }
      return DataBinder.Eval(source, expression, "{0:" + format + "}") 
          ?? "";
    }
    catch (HttpException)
    {
      throw new FormatException();
    }
  }

  public static string HenriFormat(this string format, object source)
  {
    if (format == null)
    {
      throw new ArgumentNullException("format");
    }

    StringBuilder result = new StringBuilder(format.Length * 2);      

    using (var reader = new StringReader(format))
    {
      StringBuilder expression = new StringBuilder();
      int @char = -1;

      State state = State.OutsideExpression;
      do
      {
        switch (state)
        {
          case State.OutsideExpression:
            @char = reader.Read();
            switch (@char)
            {
              case -1:
                state = State.End;
                break;
              case '{':
                state = State.OnOpenBracket;
                break;
              case '}':
                state = State.OnCloseBracket;
                break;
              default:
                result.Append((char)@char);
                break;
            }
            break;
          case State.OnOpenBracket:
            @char = reader.Read();
            switch (@char)
            {
              case -1:
                throw new FormatException();
              case '{':
                result.Append('{');
                state = State.OutsideExpression;
                break;
              default:
                expression.Append((char)@char);
                state = State.InsideExpression;
                break;
            }
            break;
          case State.InsideExpression:
            @char = reader.Read();
            switch (@char)
            {
              case -1:
                throw new FormatException();
              case '}':
                result.Append(OutExpression(source, expression.ToString()));
                expression.Length = 0;
                state = State.OutsideExpression;
                break;
              default:
                expression.Append((char)@char);
                break;
            }
            break;
          case State.OnCloseBracket:
            @char = reader.Read();
            switch (@char)
            {
              case '}':
                result.Append('}');
                state = State.OutsideExpression;
                break;
              default:
                throw new FormatException();
            }
            break;
          default:
            throw new InvalidOperationException("Invalid state.");
        }
      } while (state != State.End);
    }

    return result.ToString();
  }

  private enum State
  {
    OutsideExpression,
    OnOpenBracket,
    InsideExpression,
    OnCloseBracket,
    End
  }
}

As before, I wanted to emphasize that this was not a performance contest. I was interested in correctness with reasonable performance.

Now that the JamesFormatter is up to speed, I was able to crank up the number of iterations to 50,000 without testing my patience and tried it again. I also made the format string slightly longer. Here are the new results.

named format perf take 2As you can see, they are all very close, though Henri’s method is the fastest.

I went ahead and updated the solution I had uploaded last time so you can try it yourself by downloading it now.

What others have said

Requesting Gravatar... Joe Chung Jan 14, 2009 8:53 PM
# re: Named Formats Redux
I really like state machines, but it's one algorithm that is in desperate need of a cleaner syntax to capture its semantics.

Maybe one of these days, we'll get proper support for continuations in .NET. There's already tail call optimization support in the CLR, but no CLR language seems to support it yet (except maybe F#?).
Requesting Gravatar... brad Jan 14, 2009 10:51 PM
# re: Named Formats Redux
Doing stuff like this reminds you just how fast computers actually are. 50k formats in <1ms WORST case really makes it hurt that much more when you have to wait for VS to nuke your processor for 30 or 60 seconds doing some seemingly simple task.
Requesting Gravatar... Peli Jan 15, 2009 2:40 AM
# re: Named Formats Redux
An interresting thing to do when you have multiple implementations that share the same specification is to compare their outcome. This is really a great scenario to apply tools such as Pex.

For example, in this case, I wrote this simple parameterized unit test:

[PexMethod]
public void HenriVsHaacked(string format, object o)
{
// checks that return value or exception caught match
PexAssert.AreBehaviorsEqual(
() => HenriFormatter.HenriFormat(format, o),
() => HaackFormatter.HaackFormat(format, o)
);
}

The 5-th test case that Pex generates is very interresting: '}'. Sounds innocent enough but the two formatteres don't agree on the outcome for this format string and triggers the assert:

"left threw a System.FormatException exception and right did not."

In fact, the Henri formatter throws a format exception while Phil's happily survives it.
Requesting Gravatar... Peli Jan 15, 2009 4:50 PM
# re: Named Formats Redux
After more investigation, it turns out none of the implementations match... see blog.dotnetwiki.org/.../...matsPexTestimonium.aspx .
Requesting Gravatar... Atif Aziz Jan 16, 2009 2:03 AM
# re: Named Formats Redux
Implementing a table-based state machine by hand, as Henri has done it, is a bit unnecessary given yield in C# (as you've done it in your version). Code using yield is almost always (given certain complexity) easier to write, read, understand, maintain and extend. It is also more concurrent in nature because its easier to avoid side effects. The heavy-lifting of creating an optimal and table-based state machine out of such yielding code can be left as an exercise for the compiler folks. :) Fortunately, that's what already happens today.

I'm not sure if the number of states and complexity of the problem warrants modeling it in terms of an explicit state machine. A simpler implementation with the usual control suspects of the language will do. I think you'll find the following version shorter, simpler and easier to read, follow and debug:

http://gist.github.com/47888
Requesting Gravatar... Henri Jan 16, 2009 7:13 AM
# re: Named Formats Redux

I'm not sure if the number of states and complexity of the problem warrants modeling it in terms of an explicit state machine. A simpler implementation with the usual control suspects of the language will do. I think you'll find the following version shorter, simpler and easier to read, follow and debug:


To me, a simple state machine is easier to understand than a mishmash of loops and branching. These things are subjective, so I encourage people to compare them and decide for themselves.
Requesting Gravatar... Henri Jan 16, 2009 7:16 AM
# re: Named Formats Redux
The new version of JamesFormatter fails on "{{x}}".
Requesting Gravatar... Kelly Stuard Jan 16, 2009 1:02 PM
# re: Named Formats Redux
Notes:
* Only the HanselFormat and OskarFormat work with IDictionary passed as the object.
* HanselFormat is nice enough to not throw a format exception if{Title} has no corresponding Title property passed in. This is different from string.Format, but nice if i want to pass my string through multiple dictionaries / anonymous classes / classes.
Requesting Gravatar... Atif Aziz Jan 17, 2009 6:01 AM
# re: Named Formats Redux
@Henri:


The new version of JamesFormatter fails on "{{x}}".


Yep and there are probably other failing cases not covered by the tests. See my comment to Richard in the earlier post.

In the end, I think your implementation is the best in terms of speed and simplicity. And as you said, whether its simplicity lies in a table-based state machine approach or regular control flow constructs of the language can be left up to the eye and mind of the beholder.

Someone out to nudge the BCL team to add this functionality to String.Format in .NET Framework 4.0 timeframe. There are some great ideas in PEP 3101 that specifies advanced string formatting as it may become available with Python 3000. It would be very useful as dynamic languages and dynamic features in existing languages like C# become more common on the CLR.
Requesting Gravatar... Atif Aziz Jan 17, 2009 6:03 AM
# re: Named Formats Redux
Oops, the link behind "my comment to Richard in the earlier post" in the previous comment is incorrect. The right one is:

my comment to Richard in the earlier post
Requesting Gravatar... Jame sHart Jan 19, 2009 1:53 AM
# re: Named Formats Redux
I'd been mooching around with some ideas for a Linq-expression based format expression 'compiler' approach for a while; never got around to actually building it. So when you posted up this test suite it seemed like it might help me prototype something up.

Was surprised at the result - see my blog entry on the subject at blogs.ipona.com/james/archive/2009/01/16/8557.aspx.
Requesting Gravatar... roboto Jan 19, 2009 3:14 AM
# re: Named Formats Redux
As I had an implementation for this before, I adopted it to your StringLib to enter the 'competition' :)

http://geekswithblogs.net/NotImpossible/archive/2009/01/19/named-format-strings.aspx

What was interesting as well, is a comparison with old fashioned string.Format("{0} {1}...").

Requesting Gravatar... Atif Aziz Jan 21, 2009 6:36 AM
# re: Named Formats Redux
@Kelly Stuard:

Only the HanselFormat and OskarFormat work with IDictionary passed as the object.


Any implementation that uses DataBinder.Eval, as the other formatters do, can work with dictionaries (or any other type for that matter) provided that the supplied source has an ICustomTypeDescriptor implementation available. DataBinder.Eval internally relies on ICustomTypeDescriptor to resolve the property names and subsequently acquire their values. As a result, those formatters are quite open and don't need any further modification. In my projects, I have a keyed-collection implementation that also implements ICustomTypeDescriptor and which can be passed on to such formatters just fine. The code can be found below:

DictionaryObject.cs
Requesting Gravatar... Jon Jan 21, 2009 10:54 AM
# re: Named Formats Redux
<p>I think it's unfair to compare the implementations using DataBinder.Eval with those just doing basic reflection on a type's properties. Eval supports nested properties, indexers, etc, but will inevitably be somewhat slower.</p>
<p>For what it's worth, here's my attempt at an implementation: http://www.amtiskaw.co.uk/JonFormatter.txt</p>
<p>I tried to do it like I would have in 1st year CompSci. I convert the string to a char array and step through with a for loop, building up a StringFormatter result as I go. Turned out a couple of hundredths of a second faster than Henri's.</p>
Requesting Gravatar... Jon Jan 21, 2009 10:56 AM
# re: Named Formats Redux
Also: Your comments system seems to be lying about what tags it allows!
Requesting Gravatar... Craig Stuntz Jan 23, 2009 10:39 AM
# TempData
Sorry, this has nothing to do with Named Formats, but I couldn't find a post about TempData that was less than a year old.

After answering many questions about TempData in ASP.NET MVC, I have a proposal to rename it. I think that the name "RedirectData" will eliminate most of the confusion that people have about this property.

blogs.teamb.com/craigstuntz/2009/01/23/37947/

I know you're getting close to release, but what do you think?
Requesting Gravatar... Andy Scott Jan 25, 2009 10:27 PM
# re: Named Formats Redux
Hey Phil,

Great post - just wondering whether there is any news on the MVC Release Candidate ?

All news have been quite on this front and just trying to plan in some time for upgrade and reading over the new docs/changes etc for my site ?

Thanks!

Andy
Requesting Gravatar... Samen Trouwen Mar 04, 2009 7:29 AM
# re: Named Formats Redux
Craig, that clarification made it clear to me. It probably will be a good idea! :)

What do you have to say?

(will show your gravatar)
Please add 3 and 8 and type the answer here: