Named Formats Redux

code 0 comments suggest edit

UPDATE: Be sure to read Peli’s post in which he explores all of these implementations using PEX. Apparently I have a lot more unit tests to write in order to define the expected behavior of the code.

I recently wrote a post in which I examined some existing implementations of named format methods and then described the fun I had writing a routine for implementing named formats.

In response to that post, I received two new implementations of the method that are worth calling out.

The first was sent to me by Atif Aziz, who I’ve corresponded with via email regarding Jayrock. He noticed that the slowest format method by James had one obvious performance flaw and fixed it up. He then sent me another version that passed all my unit tests. Here it is in full, and it’s much faster than before.

public static class JamesFormatter
{
  public static string JamesFormat(this string format, object source)
  {
    return FormatWith(format, null, source);
  }

  public static string FormatWith(this string format
      , IFormatProvider provider, object source)
  {
    if (format == null)
      throw new ArgumentNullException("format");

    List<object> values = new List<object>();
    string rewrittenFormat = Regex.Replace(format,
      @"(?<start>\{)+(?<property>[\w\.\[\]]+)(?<format>:[^}]+)?(?<end>\})+",
      delegate(Match m)
      {
        Group startGroup = m.Groups["start"];
        Group propertyGroup = m.Groups["property"];
        Group formatGroup = m.Groups["format"];
        Group endGroup = m.Groups["end"];

        values.Add((propertyGroup.Value == "0")
          ? source
          : Eval(source, propertyGroup.Value));

        int openings = startGroup.Captures.Count;
        int closings = endGroup.Captures.Count;

        return openings > closings || openings % 2 == 0
           ? m.Value
           : new string('{', openings) + (values.Count - 1) 
             + formatGroup.Value
             + new string('}', closings);
      },
      RegexOptions.Compiled 
      | RegexOptions.CultureInvariant 
      | RegexOptions.IgnoreCase);

    return string.Format(provider, rewrittenFormat, values.ToArray());
  }

  private static object Eval(object source, string expression)
  {
    try
    {
      return DataBinder.Eval(source, expression);
    }
    catch (HttpException e)
    {
      throw new FormatException(null, e);
    }
  }
}

The other version I got was from another Microsoftie named Henri Wiechers (no blog) who pointed out that a state machine of the sort that a parser or scanner would use, is well suited for this task. It also makes the code easier to understand if you’re accustomed to state machines.

public static class HenriFormatter
{
  private static string OutExpression(object source, string expression)
  {
    string format = "";

    int colonIndex = expression.IndexOf(':');
    if (colonIndex > 0)
    {
      format = expression.Substring(colonIndex + 1);
      expression = expression.Substring(0, colonIndex);
    }

    try
    {
      if (String.IsNullOrEmpty(format))
      {
        return (DataBinder.Eval(source, expression) ?? "").ToString();
      }
      return DataBinder.Eval(source, expression, "{0:" + format + "}") 
          ?? "";
    }
    catch (HttpException)
    {
      throw new FormatException();
    }
  }

  public static string HenriFormat(this string format, object source)
  {
    if (format == null)
    {
      throw new ArgumentNullException("format");
    }

    StringBuilder result = new StringBuilder(format.Length * 2);      

    using (var reader = new StringReader(format))
    {
      StringBuilder expression = new StringBuilder();
      int @char = -1;

      State state = State.OutsideExpression;
      do
      {
        switch (state)
        {
          case State.OutsideExpression:
            @char = reader.Read();
            switch (@char)
            {
              case -1:
                state = State.End;
                break;
              case '{':
                state = State.OnOpenBracket;
                break;
              case '}':
                state = State.OnCloseBracket;
                break;
              default:
                result.Append((char)@char);
                break;
            }
            break;
          case State.OnOpenBracket:
            @char = reader.Read();
            switch (@char)
            {
              case -1:
                throw new FormatException();
              case '{':
                result.Append('{');
                state = State.OutsideExpression;
                break;
              default:
                expression.Append((char)@char);
                state = State.InsideExpression;
                break;
            }
            break;
          case State.InsideExpression:
            @char = reader.Read();
            switch (@char)
            {
              case -1:
                throw new FormatException();
              case '}':
                result.Append(OutExpression(source, expression.ToString()));
                expression.Length = 0;
                state = State.OutsideExpression;
                break;
              default:
                expression.Append((char)@char);
                break;
            }
            break;
          case State.OnCloseBracket:
            @char = reader.Read();
            switch (@char)
            {
              case '}':
                result.Append('}');
                state = State.OutsideExpression;
                break;
              default:
                throw new FormatException();
            }
            break;
          default:
            throw new InvalidOperationException("Invalid state.");
        }
      } while (state != State.End);
    }

    return result.ToString();
  }

  private enum State
  {
    OutsideExpression,
    OnOpenBracket,
    InsideExpression,
    OnCloseBracket,
    End
  }
}

As before, I wanted to emphasize that this was not a performance contest. I was interested in correctness with reasonable performance.

Now that the JamesFormatter is up to speed, I was able to crank up the number of iterations to 50,000 without testing my patience and tried it again. I also made the format string slightly longer. Here are the new results.

named format perf take
2As you can see, they are all very close, though Henri’s method is the fastest.

I went ahead and updated the solution I had uploaded last time so you can try it yourself by downloading it now.

Tags: format strings , named formats

Found a typo or error? Suggest an edit! If accepted, your contribution is listed automatically here.

Comments

avatar

23 responses

  1. Avatar for Joe Chung
    Joe Chung January 14th, 2009

    I really like state machines, but it's one algorithm that is in desperate need of a cleaner syntax to capture its semantics.
    Maybe one of these days, we'll get proper support for continuations in .NET. There's already tail call optimization support in the CLR, but no CLR language seems to support it yet (except maybe F#?).

  2. Avatar for brad
    brad January 14th, 2009

    Doing stuff like this reminds you just how fast computers actually are. 50k formats in <1ms WORST case really makes it hurt that much more when you have to wait for VS to nuke your processor for 30 or 60 seconds doing some seemingly simple task.

  3. Avatar for Peli
    Peli January 14th, 2009

    An interresting thing to do when you have multiple implementations that share the same specification is to compare their outcome. This is really a great scenario to apply tools such as Pex.
    For example, in this case, I wrote this simple parameterized unit test:
    [PexMethod]
    public void HenriVsHaacked(string format, object o)
    {
    // checks that return value or exception caught match
    PexAssert.AreBehaviorsEqual(
    () => HenriFormatter.HenriFormat(format, o),
    () => HaackFormatter.HaackFormat(format, o)
    );
    }
    The 5-th test case that Pex generates is very interresting: '}'. Sounds innocent enough but the two formatteres don't agree on the outcome for this format string and triggers the assert:
    "left threw a System.FormatException exception and right did not."
    In fact, the Henri formatter throws a format exception while Phil's happily survives it.

  4. Avatar for Peli
    Peli January 15th, 2009

    After more investigation, it turns out none of the implementations match... see blog.dotnetwiki.org/.../...matsPexTestimonium.aspx .

  5. Avatar for Atif Aziz
    Atif Aziz January 15th, 2009

    Implementing a table-based state machine by hand, as Henri has done it, is a bit unnecessary given yield in C# (as you've done it in your version). Code using yield is almost always (given certain complexity) easier to write, read, understand, maintain and extend. It is also more concurrent in nature because its easier to avoid side effects. The heavy-lifting of creating an optimal and table-based state machine out of such yielding code can be left as an exercise for the compiler folks. :) Fortunately, that's what already happens today.
    I'm not sure if the number of states and complexity of the problem warrants modeling it in terms of an explicit state machine. A simpler implementation with the usual control suspects of the language will do. I think you'll find the following version shorter, simpler and easier to read, follow and debug:
    http://gist.github.com/47888

  6. Avatar for Henri
    Henri January 15th, 2009

    I'm not sure if the number of states and complexity of the problem warrants modeling it in terms of an explicit state machine. A simpler implementation with the usual control suspects of the language will do. I think you'll find the following version shorter, simpler and easier to read, follow and debug:


    To me, a simple state machine is easier to understand than a mishmash of loops and branching. These things are subjective, so I encourage people to compare them and decide for themselves.

  7. Avatar for Henri
    Henri January 15th, 2009

    The new version of JamesFormatter fails on "{{x}}".

  8. Avatar for Kelly Stuard
    Kelly Stuard January 16th, 2009

    Notes:
    * Only the HanselFormat and OskarFormat work with IDictionary passed as the object.
    * HanselFormat is nice enough to not throw a format exception if{Title} has no corresponding Title property passed in. This is different from string.Format, but nice if i want to pass my string through multiple dictionaries / anonymous classes / classes.

  9. Avatar for Atif Aziz
    Atif Aziz January 16th, 2009

    @Henri:


    The new version of JamesFormatter fails on "{{x}}".


    Yep and there are probably other failing cases not covered by the tests. See my comment to Richard in the earlier post.
    In the end, I think your implementation is the best in terms of speed and simplicity. And as you said, whether its simplicity lies in a table-based state machine approach or regular control flow constructs of the language can be left up to the eye and mind of the beholder.
    Someone out to nudge the BCL team to add this functionality to String.Format in .NET Framework 4.0 timeframe. There are some great ideas in PEP 3101 that specifies advanced string formatting as it may become available with Python 3000. It would be very useful as dynamic languages and dynamic features in existing languages like C# become more common on the CLR.

  10. Avatar for Atif Aziz
    Atif Aziz January 16th, 2009

    Oops, the link behind "my comment to Richard in the earlier post" in the previous comment is incorrect. The right one is:
    my comment to Richard in the earlier post

  11. Avatar for Jame sHart
    Jame sHart January 18th, 2009

    I'd been mooching around with some ideas for a Linq-expression based format expression 'compiler' approach for a while; never got around to actually building it. So when you posted up this test suite it seemed like it might help me prototype something up.
    Was surprised at the result - see my blog entry on the subject at blogs.ipona.com/james/archive/2009/01/16/8557.aspx.

  12. Avatar for roboto
    roboto January 18th, 2009

    As I had an implementation for this before, I adopted it to your StringLib to enter the 'competition' :)
    http://geekswithblogs.net/NotImpossible/archive/2009/01/19/named-format-strings.aspx
    What was interesting as well, is a comparison with old fashioned string.Format("{0} {1}...").

  13. Avatar for Atif Aziz
    Atif Aziz January 20th, 2009

    @Kelly Stuard:

    Only the HanselFormat and OskarFormat work with IDictionary passed as the object.


    Any implementation that uses DataBinder.Eval, as the other formatters do, can work with dictionaries (or any other type for that matter) provided that the supplied source has an ICustomTypeDescriptor implementation available. DataBinder.Eval internally relies on ICustomTypeDescriptor to resolve the property names and subsequently acquire their values. As a result, those formatters are quite open and don't need any further modification. In my projects, I have a keyed-collection implementation that also implements ICustomTypeDescriptor and which can be passed on to such formatters just fine. The code can be found below:
    DictionaryObject.cs

  14. Avatar for Jon
    Jon January 20th, 2009

    <p>I think it's unfair to compare the implementations using DataBinder.Eval with those just doing basic reflection on a type's properties. Eval supports nested properties, indexers, etc, but will inevitably be somewhat slower.</p>
    <p>For what it's worth, here's my attempt at an implementation: http://www.amtiskaw.co.uk/JonFormatter.txt</p>
    <p>I tried to do it like I would have in 1st year CompSci. I convert the string to a char array and step through with a for loop, building up a StringFormatter result as I go. Turned out a couple of hundredths of a second faster than Henri's.</p>

  15. Avatar for Jon
    Jon January 20th, 2009

    Also: Your comments system seems to be lying about what tags it allows!

  16. Avatar for Craig Stuntz
    Craig Stuntz January 22nd, 2009

    Sorry, this has nothing to do with Named Formats, but I couldn't find a post about TempData that was less than a year old.
    After answering many questions about TempData in ASP.NET MVC, I have a proposal to rename it. I think that the name "RedirectData" will eliminate most of the confusion that people have about this property.
    blogs.teamb.com/craigstuntz/2009/01/23/37947/
    I know you're getting close to release, but what do you think?

  17. Avatar for Andy Scott
    Andy Scott January 25th, 2009

    Hey Phil,
    Great post - just wondering whether there is any news on the MVC Release Candidate ?
    All news have been quite on this front and just trying to plan in some time for upgrade and reading over the new docs/changes etc for my site ?
    Thanks!
    Andy

  18. Avatar for Samen Trouwen
    Samen Trouwen March 3rd, 2009

    Craig, that clarification made it clear to me. It probably will be a good idea! :)

  19. Avatar for dotnetchris
    dotnetchris March 23rd, 2012

    I created a fork of Atif Aziz's version of the Henriformatter: https://gist.github.com/2170953
    This adds that you can use anonymous object, IDictionary<string,object>, and RouteData as all valid sources for the Heniformat source.
    Hmmm I'm thinking maybe this should be uploaded as a nuget package.

  20. Avatar for Zaus
    Zaus September 21st, 2012

    Yet another attempt: https://gist.github.com/3763138
    I was curious, and set up my own tests using some randomly generated string-masks (https://gist.github.com/3763223) and I get different test results. Granted I'm using a homegrown benchmark framework (all it's doing is running the Action X times and using Stopwatch), but I thought the results were kinda surprising -- seems the performance somewhat depends on the size of the test set (tokens + string).

  21. Avatar for Mladen Mihajlovic
    Mladen Mihajlovic September 14th, 2013
  22. Avatar for FEBit
    FEBit March 27th, 2015

    Exception Details: System.MissingMethodException: Method 'System.String.ToString' not found.

    Source Error:

    Line 75: else //format info
    Line 76: {
    Line 77: result = retrievedType.InvokeMember("ToString",
    Line 78: BindingFlags.Public | BindingFlags.NonPublic |
    Line 79:

  23. Avatar for Ryan
    Ryan March 22nd, 2016

    Another one joins the fight: https://github.com/crozone/...

    With NuGet: https://www.nuget.org/packa...

    I wrote this as a state machine that tokenizes the input into a List of {Index, Length} with reference to the original format string. The tokens then get looped over, any tokens that are classified as parameters are handled, and everything is fed directly into a StringBuilder without any intermediate SubString() operations. The tokenizer can also take custom characters for the opening and closing braces (like '<' and '>'), and custom behaviour can be specified for how the parameters are handled.

    It also doesn't rely on DataBinder.Eval(), so it's compliant with .NET Standard/Core. Feedback would be appreciated if anyone wants to give it a spin!