Splitting Pascal/Camel Cased Strings


I found this post on RossCode, which mentions a blog post that discusses how to bind enumerations to drop downs, something I’ve done quite often.

RossCode takes issue with this approach because the display text of a drop down should typically have spaces between words, which an enum value does not allow. For example…

public enum UglinessFactor {
    ButtUgly,
    Fugly,
    NotSoBad,
}

In the preceding enumeration, you’d probably want the dropdown to display “Butt Ugly” and not “ButtUgly”.

Well, if you follow standard .NET naming conventions and Pascal Case your enum values, the following method, SplitUpperCaseToString, may be of service. It depends on another method, SplitUpperCase, which splits a camel or Pascal cased word into an array of component words.

As a refresher, a Pascal Cased string is one in which the first letter of each word is capitalized, for example ThisIsPascalCased. By contrast, a Camel Cased string is one in which the first letter of the string is lowercase but the first letter of each successive word is capitalized, for example thisIsCamelCased.

using System;
using System.Collections.Specialized; // StringCollection lives here.

// Extension methods must sit in a static class; the class name is arbitrary.
public static class StringExtensions {
  // Splits a Pascal/camel cased string and rejoins the words with spaces.
  public static string SplitUpperCaseToString(this string source) {
    return string.Join(" ", SplitUpperCase(source));
  }

  public static string[] SplitUpperCase(this string source) {
    if (source == null) {
      return new string[] {}; //Return empty array.
    }
    if (source.Length == 0) {
      return new string[] {""};
    }

    StringCollection words = new StringCollection();
    int wordStartIndex = 0;

    char[] letters = source.ToCharArray();
    // Seed with the first letter so that a leading space is respected.
    char previousChar = letters[0];
    // Skip the first letter. We don't care what case it is.
    for (int i = 1; i < letters.Length; i++) {
      if (char.IsUpper(letters[i]) && !char.IsWhiteSpace(previousChar)) {
        //Grab everything before the current character.
        words.Add(new String(letters, wordStartIndex, i - wordStartIndex));
        wordStartIndex = i;
      }
      previousChar = letters[i];
    }

    //We need to have the last word.
    words.Add(new String(letters, wordStartIndex, letters.Length - wordStartIndex));

    string[] wordArray = new string[words.Count];
    words.CopyTo(wordArray, 0);
    return wordArray;
  }
}
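
To tie this back to the drop down scenario, here is a rough sketch of how the display text might be generated for every value of an enum. The names EnumDisplayDemo, ToDisplayText, and GetDisplayItems are made up for this illustration. ToDisplayText is a condensed stand-in for SplitUpperCaseToString so that the sample compiles on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public enum UglinessFactor { ButtUgly, Fugly, NotSoBad }

// Hypothetical helper class, for illustration only.
public static class EnumDisplayDemo {
  // Condensed stand-in for SplitUpperCaseToString: inserts a space before
  // each upper case letter unless the previous character is whitespace.
  public static string ToDisplayText(string name) {
    if (string.IsNullOrEmpty(name)) {
      return string.Empty;
    }
    var text = new StringBuilder();
    text.Append(name[0]); // The case of the first letter doesn't matter.
    for (int i = 1; i < name.Length; i++) {
      if (char.IsUpper(name[i]) && !char.IsWhiteSpace(name[i - 1])) {
        text.Append(' ');
      }
      text.Append(name[i]);
    }
    return text.ToString();
  }

  // Pairs each enum value with its display text, ready to hand to a drop down.
  public static List<KeyValuePair<TEnum, string>> GetDisplayItems<TEnum>()
      where TEnum : struct {
    var items = new List<KeyValuePair<TEnum, string>>();
    foreach (TEnum value in Enum.GetValues(typeof(TEnum))) {
      items.Add(new KeyValuePair<TEnum, string>(value, ToDisplayText(value.ToString())));
    }
    return items;
  }
}
```

Calling EnumDisplayDemo.GetDisplayItems<UglinessFactor>() should pair ButtUgly with "Butt Ugly", Fugly with "Fugly", and NotSoBad with "Not So Bad". Keeping the enum value as the key means the selected item can be mapped back to the enum without parsing the display text.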

Try it out and let me know if it was useful for you.

UPDATE (8/3/2010): Fixed a bug so that this doesn’t affect strings that already have spaces in them.


Comments

11 responses

  1. Joel Ross September 24th, 2005

    Phil,

    Thanks for the feedback and advice. This is a great idea, and since I do follow that naming convention, this would definitely work. I'm not sure why it never occurred to me!

    And, how'd you know what enumeration I wanted to bind to? ;-)

  2. Haacked September 24th, 2005

    Everybody wants to bind the UglinessFactor enumeration. It is probably the most used enumeration in the framework.

  3. Kevin Dente September 24th, 2005

    Err...not a very localizable solution. Perhaps that's not an issue for them, but I would hesitate to recommend this as a general solution. Cool for quick-and-dirty stuff though. ;)

  4. Chris Martin September 24th, 2005

    I've never thought about doing it that way. Something about it smells funny though. It's not very .NETish. This is what I end up doing when binding to enums.

    using System;
    using System.Reflection;

    public enum UglinessFactor
    {
        [EnumDisplayText("Butt Ugly")]
        ButtUgly,

        [EnumDisplayText("Freakin Ugly")]
        Fugly,

        [EnumDisplayText("Not So Bad")]
        NotSoBad,
    }

    public class EnumDisplayTextAttribute : Attribute
    {
        private Enum _host;
        private string _text;

        public EnumDisplayTextAttribute(string text)
        {
            this._text = text;
        }

        // Looks up the attribute on the enum field matching the given value.
        public static EnumDisplayTextAttribute Parse(Enum value)
        {
            EnumDisplayTextAttribute toReturn = null;

            Type type = value.GetType();
            FieldInfo field = type.GetField(value.ToString());
            object[] attributes = field.GetCustomAttributes(typeof(EnumDisplayTextAttribute), false);

            if (attributes.Length > 0)
            {
                toReturn = (EnumDisplayTextAttribute)attributes[0];
                toReturn._host = value;
            }

            return toReturn;
        }

        public Enum Host
        {
            get { return _host; }
        }

        public string Text
        {
            get { return _text; }
        }
    }

    public class Application
    {
        private static void Main()
        {
            string[] names = Enum.GetNames(typeof(UglinessFactor));

            foreach (string name in names)
            {
                UglinessFactor uglyFactor = (UglinessFactor)Enum.Parse(typeof(UglinessFactor), name, true);
                EnumDisplayTextAttribute attribute = EnumDisplayTextAttribute.Parse(uglyFactor);

                Console.WriteLine("Host Enum: {0}\nDisplayText: {1}\n", attribute.Host.ToString(), attribute.Text);
            }

            Console.ReadLine();
        }
    }

  5. Haacked September 24th, 2005

    Chris, your solution is definitely a better approach if you have control over the enum. But if you were binding to an enum from a library, you might not have the benefit of adding attributes to it.

    Kevin, I assume you mean binding to the enum is not a localizable solution. Yes, I agree, that isn’t localizable. A localizable solution would be to use the enum value as a key into a resource file and not fiddle around with my method at all.

    But if you were binding an array of enum values that are culture invariant, say an enum of proper names, then this could work.

  6. Jon Galloway September 24th, 2005

    You can do this with a regex replace, too: (?<=[a-z])(?=[A-Z])

    Here's a post about a PHP code snippet that uses it. A lot less code and I'd bet it's faster:

    http://ad.hominem.org/log/2004/12/camelcase.php

  7. Haacked September 25th, 2005

    I'm not so sure it's going to be faster. At least not if we compare the methods that split camel case into an array (not the one that joins it back).

    Consider that char by char parsing is almost always faster than a generalized regex engine.

    Also consider that the use case for this method will, on average, split something into at most three words.

    But of course, I'd like to see some measurements. Measure, measure, measure.

  8. secretGeek September 28th, 2005

    Hey boss

    I've fixed this problem in the past through use of a function called 'DisPascalize' (i've blogged about it here: http://secretgeek.net/progr_purga.asp)

    it relies on a regular expression. initially i thought this was a criminally bad solution -- but a year and a half later and i still use it!

    cheers
    lb

  9. Tom January 7th, 2009

    Thanks, you saved me some time today!

  10. Manuel Castro April 26th, 2010

    Very useful, worth adding all this code to the project text utilities class!

  11. Kevin Day March 24th, 2016

    If you change the line:

    if (char.IsUpper(letters[i]) && !char.IsWhiteSpace(previousChar)) {

    to:

    if ((char.IsUpper(letters[i]) || char.IsDigit(letters[i])) && !char.IsWhiteSpace(previousChar)) {

    you will also get a space before each digit, so that Phone1 will become Phone 1.
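
For anyone who wants to try the regex route from the comments, here is a minimal sketch using the pattern Jon posted. The class and method names are made up for illustration. The second pattern is my own guess at how digits could be folded in, not something anyone in the thread proposed.

```csharp
using System.Text.RegularExpressions;

// Hypothetical class and method names, for illustration only.
public static class RegexSplitSketch {
  // Pattern from the comments: a zero-width position with a lower case
  // letter behind it and an upper case letter ahead of it.
  private static readonly Regex LowerToUpper =
      new Regex("(?<=[a-z])(?=[A-Z])");

  // Assumed extension of that pattern (not from the thread): also break
  // before a digit that directly follows a lower case letter.
  private static readonly Regex LowerToUpperOrDigit =
      new Regex("(?<=[a-z])(?=[A-Z0-9])");

  public static string SplitWithRegex(string source) {
    // Replacing the zero-width match inserts a space at each boundary.
    return source == null ? string.Empty : LowerToUpper.Replace(source, " ");
  }

  public static string SplitWithRegexAndDigits(string source) {
    return source == null ? string.Empty : LowerToUpperOrDigit.Replace(source, " ");
  }
}
```

Note that this does not behave identically to the char by char loop. The pattern only breaks where a lower case letter meets an upper case one, so a run of capitals such as HTMLParser is left alone, whereas the loop splits before every upper case letter. Likewise, the digit pattern keeps Phone12 together as "Phone 12", while the IsDigit change to the loop would put a space before each digit.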