comments edit

I think Miguel de Icaza nails it regarding some of the FUD being written about Microsoft’s latest move to make the source code to the .NET Framework available under the Microsoft Reference License (Ms-RL).

In fact, his post inspired me to try my hand at creating a comic. I have no comic art skills (nor comic writing skills), so please forgive me for my lack of talent (click for full size)…

Microsoft opens the code

I know some of the people involved who made this happen and I find it hard to believe that there were nefarious intentions involved. You have to understand that while Bill Gates and Steve Ballmer are known for playing hardball, they aren’t necessarily personally involved in every initiative at Microsoft (as far as I know).

Some things start from the grassroots with motives as simple as trying to give developers a better experience than they’ve had before.

Before: the original code, complete with helpful comments, original variable names, etc… was closed. You could use Reflector (and possibly violate EULAs in the process), but it wasn’t as nice as having the actual code.

After: The source is available to be seen. This is certainly not more closed than before. It is clearly better because you now have more choice. You can choose to view the code, or choose not to. Before, you only had one choice - no lookie lookie here!

But It’s Not Open Source!

Many pundits have pointed out that this is not Open Source. That is correct and as far as I can tell, nobody at Microsoft (at least in an official position) is claiming that.

The Ms-RL is not an open source license, so there is reason to be cautious should you be contributing to the Mono project, or plan to write a component that is similar to something within the framework. As Miguel wrote in his post, these precautions have been in place within the Open Source community for a very long time.

So yes, it’s not open source. But it’s a step in the right direction. As I’ve written before, we’re seeing steady progression within Microsoft regarding Open Source, albeit with the occasional setback.

My hope, when I start at Microsoft, is to be involved with that progress in one form or another as I see it as essential and beneficial to Microsoft. But I will be patient.

Should You Look At The Code?

So should you look at the source code? Frans Bouma says no!

Take for example the new ReaderWriterLockSlim class introduced in .NET 3.5. It’s in the System.Threading namespace which will be released in the pack of sourcecode-you-can-look-at. This class is a replacement for the flawed ReaderWriterLock in the current versions of .NET. This new lock is based on a patent, which (I’m told) is developed by Jeffrey Richter and sold to MS. This new class has its weaknesses as well (nothing is perfect). If you want to bend this class to meet your particular locking needs by writing a new one based on the ideas in that class’ sourcecode, you’re liable for a lawsuit as your code is a derivative work based on a patented class which is available in sourcecode form.

However I think the advice in Miguel’s post addresses this to some degree.

If you have a vague recollection of the internals of a Unix program, this does not absolutely mean you can’t write an imitation of it, but do try to organize the imitation internally along different lines, because this is likely to make the details of the Unix version irrelevant and dissimilar to your results.

My advice would be to use your head and not veer towards one extreme or another. If you’re planning to ship a ReaderWriterLockSlim class, then I probably wouldn’t look at their implementation.

But that shouldn’t stop you from looking at code that you have no plans to rewrite or copy.

And what do you do if you happen to look at the ReaderWriterLockSlim class by accident and were planning to write one for your internal data entry app? Either have another member of your team write it, or follow the above advice and implement it along different lines.

For example, Unix utilities were generally optimized to minimize memory use; if you go for speed instead, your program will be very different …

Or, on the contrary, emphasize simplicity instead of speed. For some applications, the speed of today’s computers makes simpler algorithms adequate.

Or go for generality. For example, Unix programs often have static tables or fixed-size strings, which make for arbitrary limits; use dynamic allocation instead.

Just don’t copy the existing implementation.

For many developers, their code is never distributed because it is completely internal, or runs on a web server. In that case, I think the risk is very low that anyone is going to prove you infringed on a patent because you happened to look at a piece of code, unless the code is a very visible UI element.

Please don’t misunderstand me on this point. I’m not recommending you violate any software patents (even though I think most if not all software patents are dubious), I’m just saying the risk of patent taint for many developers who look at the .NET source code is not as grave as many are making it out to be. When in doubt, you’d do well to follow the advice in Miguel’s post.

UPDATE: Upon further reflection, I realized there is one particular risk with what I’ve just said.

In the case of the ReaderWriterLockSlim, I believe the particular algorithm for high performance is patented. But what if the idea of a reader-writer lock in general (one that allows simultaneous reads unless blocking for a write) were patented?

Then you could get in trouble for implementing a reader-writer lock even if you never looked at the source code. Patent infringement is a whole different beast than copyright infringement. This scenario is not so far-fetched; it is something Bill Gates warned against in the past, and it has come to pass many times since.

Of course, this risk is present whether or not Microsoft makes the source available. By using Reflector, for example, you’d have the same risk of being exposed to patented techniques.

I should point out I’m not a lawyer so follow any of this advice at your own risk.

Having said that, a follow-up post on Frans’s blog proposes a solution that I think Microsoft should jump on to clear things up. It comes from the JRL (Java Research License).

The JRL is not a tainting license and includes an express ‘residual knowledge’ clause which says you’re not contaminated by things you happen to remember after examining the licensed technology. The JRL allows you to use the source code for the purpose of JRL-related activities but does not prohibit you from working on an independent implementation of the technology afterwards.

It’d be nice if Microsoft added a similar clause to the Ms-RL so much of this FUD can just go away. Or even better, take the next step and look at putting this code (at least some of it) under the Ms-PL.

Disclaimer: Starting on October 15, I will be a Microsoft Employee, but the opinions expressed in this post are mine and mine only. I do not speak for Microsoft on these matters.

I’m also the leader of a couple OSS projects, so I will be very careful about separating what I learn on the job vs. what I contribute to Subtext et al. But I’ll be a PM, so I hear I won’t be looking at much code anyways. ;)

comments edit

I just received a few advance copies of our new book, the ASP.NET 2.0 Anthology, and am giving away three of them to the first three people who leave a comment on this post.

But there’s a catch!

You have to have a blog and promise to write a review on your blog. This is on the honor system so I’ll send you the book and you can then review it.

In your comment, leave your email address in the email field (it’s not visible to anyone else) and I’ll follow up to get your mailing address. Also let me know if you want it signed or not. Not sure why you’d want that, but you never know.

comments edit

One weakness with many blog engines, Subtext included, is that it is difficult to change the tags and categories for multiple entries at a time. In general, most blog engines streamline the workflow for tagging and categorizing a single blog post.

Fortunately, Marco De Sanctis, a friend of Simo (a core Subtext Developer) wrote a nice application that you can use to bulk categorize and tag multiple posts. He developed it using Subtext as a test-bed so it handles the fact that we use the rel-tag microformat within the content as our tagging mechanism. Sweeeeet!

Many thanks to Simo for blogging about this and to Marco for writing this.

code, tdd comments edit

It is a sad fact of life that, in this day and age, arguments are not won with sound logic and reasoning. Instead, applying the principle of framing an argument is much more effective at swaying public opinion.

So the next time you try to make headway introducing Test Driven Development (or even simply introducing writing automated unit tests at all) into an organization and are rebuffed with…

Don’t bring your fancy schmancy flavor of the week agile manifesto infested “methodology” here kiddo. I’ve been writing software my way for a loooong time…

You can reply with…

I’m sorry, but I’m not a fan of Bug Driven Development. I think Test Driven Development is not without its challenges, but it’s a better alternative. Either you’re with us, or against us. Are you a bug lover? Bug Driven Development gives comfort to the bugs.

UPDATE: this is an example of my dry humor. I don’t believe that “Framing” is a good way to win an argument and I would never actually say or recommend saying anything similar to this. It’s meant as a bit of a joke, but with a point.

A team that is not focused on automated testing of some sort throughout the lifecycle of the project is effectively embracing Bug Driven Development. Bugs are going to drive the development cycle at the end of the project.

Don’t just take my word for it, though; look at the research done by others. In Facts and Fallacies of Software Engineering, Robert Glass points out…

Fact 31. Error removal is the most time-consuming phase of the life cycle.

In Rapid Development, Steve McConnell relates…

Shortcutting 1 day of QA activity early in the project is likely to cost you from 3 to 10 days of activity downstream.

In other words, if you don’t control the bugs, the bugs control your schedule.

code, tdd comments edit

This is a simple little demonstration of how to write unit tests to test out a specific role based permission issue using NUnit/MbUnit and Rhino Mocks.

In Subtext, we have a class named FileBrowserConnector that really should only ever be constructed by a member of the Admins role. Because this class can write to the file system, we want to take extra precautions other than simply restricting access to the URL in which this object is created.

Here are two tests I wrote to begin with.

[Test]
[ExpectedException(typeof(SecurityException))]
public void NonAdminCannotCreateFileConnector()
{
  new FileBrowserConnector();
}

[Test]
public void AdminCanCreateFileConnector()
{
  MockRepository mocks = new MockRepository();

  IPrincipal principal;
  using (mocks.Record())
  {
    IIdentity identity = mocks.CreateMock<IIdentity>();
    SetupResult.For(identity.IsAuthenticated).Return(true);
    principal = mocks.CreateMock<IPrincipal>();
    SetupResult.For(principal.Identity).Return(identity);
    SetupResult.For(principal.IsInRole("Admins")).Return(true);
  }

  using (mocks.Playback())
  {
    IPrincipal oldPrincipal = Thread.CurrentPrincipal;
    try
    {
      Thread.CurrentPrincipal = principal;
      FileBrowserConnector connector = new FileBrowserConnector();
      Assert.IsNotNull(connector, "Could not create the connector.");
    }
    finally
    {
      Thread.CurrentPrincipal = oldPrincipal;
    }
  }
}

The first test is really straightforward. It simply tries to instantiate the FileBrowserConnector class.

The second test is a bit more involved, but the concept is simple. I’m using the Rhino Mocks mocking framework to dynamically construct instances that implement the IIdentity and IPrincipal interfaces.

The following line…

SetupResult.For(principal.IsInRole("Admins")).Return(true);

Tells the dynamic principal mock to return true when the IsInRole method is called with the parameter “Admins”. We then set Thread.CurrentPrincipal to this constructed principal and try to create an instance of FileBrowserConnector.

Here are the results of my first test run, trimmed down a bit.

Found 2 tests
[failure] FileBrowserConnectorTests.NonAdminCannotCreateFileConnector
Exception of type 'MbUnit.Core.Exceptions.ExceptionNotThrownException' 
was thrown. 

[success] FileBrowserConnectorTests.AdminCanCreateFileConnector
[reports] generating HTML report
TestResults: file:///D:/AppData/MbUnit/Reports/UnitTests.Subtext.Tests.html

1 passed, 1 failed, 0 skipped, took 4.37 seconds.

As expected, one test passed and one failed. Now I can go ahead and enforce security on the FileBrowserConnector class.

[PrincipalPermission(SecurityAction.Demand, Role = "Admins")]
public class FileBrowserConnector: Page
{
  //... implementation ...
}

That’s all there is to it. You might be wondering if this test is even needed because all I’m really testing is that the PrincipalPermission attribute does indeed work.

This test is still important to prevent regressions. You don’t want someone to come along and remove that attribute, whether by accident or out of ignorance, without anyone noticing.

In codebases that I’ve worked with, I’ve seen a tendency to ignore or forget to write test cases for security requirements. This demo hopefully provides a starting point for me and others to make sure that security requirements get good test coverage.

I should probably write yet another test to make sure a principal in a different role cannot create an instance of this class.
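
Such a test might look something like the following (a sketch reusing the same Rhino Mocks pattern as above; only the IsInRole result and the test name change):

[Test]
[ExpectedException(typeof(SecurityException))]
public void NonAdminRoleCannotCreateFileConnector()
{
  MockRepository mocks = new MockRepository();

  IPrincipal principal;
  using (mocks.Record())
  {
    IIdentity identity = mocks.CreateMock<IIdentity>();
    SetupResult.For(identity.IsAuthenticated).Return(true);
    principal = mocks.CreateMock<IPrincipal>();
    SetupResult.For(principal.Identity).Return(identity);
    //Authenticated, but not a member of the Admins role.
    SetupResult.For(principal.IsInRole("Admins")).Return(false);
  }

  using (mocks.Playback())
  {
    IPrincipal oldPrincipal = Thread.CurrentPrincipal;
    try
    {
      Thread.CurrentPrincipal = principal;
      new FileBrowserConnector();
    }
    finally
    {
      Thread.CurrentPrincipal = oldPrincipal;
    }
  }
}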

code, tdd comments edit

This is a quick follow-up to my last post. That seemed like such a common test situation I figured I’d write a quick generic method for encapsulating those two tests.

I’ll start with usage.

[Test]
public void FileBrowserSecureCreationTests()
{
  AssertSecureCreation<FileBrowserConnector>(new string[] {"Admins"});
}

And here’s the method.

/// <summary>
/// Helper method. Makes sure an instance of a type can only
/// be created by a principal in one of the allowed roles.
/// </summary>
/// <typeparam name="T">The type to instantiate.</typeparam>
/// <param name="allowedRoles">Roles allowed to create the instance.</param>
/// <param name="constructorArguments">Arguments passed to the constructor.</param>
public static void AssertSecureCreation<T>(string[] allowedRoles
  , params object[] constructorArguments)
{
  try   
  {     
    Activator.CreateInstance(typeof (T), constructorArguments);
    Assert.Fail("Was able to create the instance with no security.");
  }
  catch(TargetInvocationException e)
  {
    Assert.IsInstanceOfType(typeof(SecurityException)
      , e.InnerException
      , "Expected a security exception, got something else.");
  }

  MockRepository mocks = new MockRepository();

  IPrincipal principal;
  using (mocks.Record())
  {
    IIdentity identity = mocks.CreateMock<IIdentity>();
    SetupResult.For(identity.IsAuthenticated).Return(true);
    principal = mocks.CreateMock<IPrincipal>();
    SetupResult.For(principal.Identity).Return(identity);
    Array.ForEach(allowedRoles, delegate(string role) 
    {
      SetupResult.For(principal.IsInRole(role)).Return(true);
    });
  }

  using (mocks.Playback())
  {
    IPrincipal oldPrincipal = Thread.CurrentPrincipal;
    try
    {       
      Thread.CurrentPrincipal = principal;       
      Activator.CreateInstance(typeof(T), constructorArguments);
      //Test passes if no exception is thrown.
    }     
    finally
    {       
      Thread.CurrentPrincipal = oldPrincipal;     
    }   
  } 
}

There are definite improvements we can make, but this is a nice quick way to test the basic permission level for a class.

personal comments edit

UPDATE: We released Subtext 2.0 which also includes the fix for this vulnerability among many other bug fixes.

A Subtext user reported a security vulnerability due to a flaw in our integration with the FCKEditor control which allows someone to upload files into the images directory without being authenticated.

As far as we know, nobody has been seriously affected, but please update your installation as soon as possible. Our apologies for the inconvenience.

The fix should be relatively quick and painless to apply.

The Fix

If you’re running Subtext 1.9.*, we have a fix available consisting of a single assembly, Subtext.Providers.BlogEntryEditor.FCKeditor.dll. After you download it (Subtext1.9.5-PATCH.zip, 7.72KB), unzip the assembly (I recommend backing up your old one just in case) and copy it into your bin directory.

Alternative Workaround

If you’re running a customized version and the above patch causes problems, you can workaround this issue by backing up and then temporarily removing the following directory in your installation.

Providers\BlogEntryEditor\FCKeditor\editor\filemanager

Notes

The Subtext team takes security very seriously and we regret that this flaw made it into our system. We appreciate that a user discreetly brought it to our attention, and we worked quickly to create and test a patch. I went ahead and updated the release on SourceForge (if you’ve downloaded Subtext-1.9.5b then you’re safe) so that no new downloads are affected.

The code also has been fixed in Subversion in case you’re running a custom built version of Subtext.

I will follow up with a post later describing the issue in more detail and what we plan to do to mitigate such risks in the future. I’ll also write a post outlining general guidelines for reporting and handling security issues in an open source project based on guidance provided by the Karl Fogel book, Producing Open Source Software.

Again, I am sorry for any troubles and inconvenience this may have caused. If you know any Subtext users, please let them know. I’ll be updating the website momentarily.

Download

Again, here is the patch location.

comments edit

In his book, Producing Open Source Software, Karl Fogel gives sage advice on running an open source project. The section on how to deal with a security vulnerability was particularly interesting to me last night.

Upon learning of a potential security hole, Karl recommends the following:

  1. Don’t talk about the bug publicly until a fix is available.
  2. Make sure to have a private mailing list setup with a small group of trusted committers where users can send security reports.
  3. Prepare the fix quickly. Time is of the essence.
  4. Don’t commit the fix into your source control lest someone scanning for such vulnerabilities find out about it. Wait till after the fix is released.
  5. Give well known administrators (and thus likely targets) using the software a heads up before announcing the flaw and the fix.
  6. Distribute the fix publicly.

There’s more elaboration in the book, but I think the above list distills the key points. Karl’s advice is born from his experience working on CVS and leading the Subversion project and makes a lot of sense.

But for a project built on Java, .NET, or a scripting language, there is an interesting dilemma. The security fix itself announces the vulnerability.

When the Subversion team releases a patch, it is generally compiled to native machine code, which is effectively opaque to the world. Sure with time and effort, a native executable can be decompiled, but the barrier is high to discover the actual exploit by examining the binary. It buys consumers time to patch their installations before exploits start becoming rampant.

With a language like C#, Java, or Ruby, the bar to looking at the code is extremely low. Such languages can raise the bar slightly by using obfuscators, but that is really not common for an Open Source project and creates very little delay for the determined attacker.

So no matter how well you keep the flaw private until you’re ready to announce the fix, the announcement and publication of the fix itself potentially points attackers to the flaw.

This is one situation in which the increased transparency of such languages can cause a problem. Consumers of projects built on these languages have to be extra vigilant about applying patches quickly, while developers of such code must be extra vigilant in threat modeling and code review to avoid security vulnerabilities in the first place. Then again, this doesn’t mean that developers of code compiled to a native binary should be any less vigilant about security.

If you have a better way of distributing security patches for VM-based/Scripting language projects than this, please do tell.

comments edit

Remember the book I mentioned that I was writing along with a few colleagues? Well it is finally available for pre-order on Amazon.com!

If you love me, you’ll buy five copies each. No. Ten copies!

Or, you could wait for the reviews and buy the book on its own merits, which I hope it warrants. But what’s the fun in that?

All kidding aside, this was a fun and tiring collaborative effort with Jeff “Coding Horror” Atwood, Jon Galloway, K. Scott Allen, and Wyatt Barnett. The book aggregates our collective wisdom on the topic of building web applications with ASP.NET 2.0 as a series of tips, tricks, and hacks.

The target audience for the book is the intermediate developer looking to raise his or her skills to the next level, so some of the material quickly rehashes the basics to set the stage for more interesting tips and tricks. The goal of the book is to be a survival guide for ASP.NET developers. We’ve been bitten and had the poison sucked out of our veins so you can avoid the vipers in the wild.

Technorati tags: Books, ASP.NET

personal, code, asp.net mvc comments edit

It was only two and a half months ago when I wrote about receiving my Microsoft MVP award. I was quite honored to receive this award.

In a follow-up comment to that post, rich with unintentional foreshadowing, I mentioned the following…

However, I would like to hit up that MVP conference in Redmond before doing anything to cause my MVP status to be dropped.

Unfortunately, I will not be retaining my MVP status long enough for the MVP conference. I have committed an action that has forced Microsoft’s hand in this matter and they must remove my MVP status.

To understand why this is the case, I must refer you to the Microsoft MVP FAQ which states the following in the fifth question…

Q5: Do MVPs represent Microsoft?

A5: No. MVPs are not Microsoft employees, nor do they speak on Microsoft’s behalf. MVPs are third-party individuals who have received an award from Microsoft that recognizes their exceptional achievements in technical communities.

Starting on October 15, 2007, I will join the ranks of Microsoft as an employee, thus putting myself in violation of this rule.

Don’t worry about me dear friend. I will cope well with this loss of status. I don’t hold Microsoft to blame.

Well, that’s not true. I do hold them to blame. While in Redmond recently, Scott Guthrie (aka ScottGu) showed me a rough prototype of a cool MVC framework they are working on for a future version of ASP.NET. When I saw it, I told Scott,

I want to work on that. How can I work on that?

So yes, I do blame Microsoft. I blame Microsoft for showing me something to which I absolutely could not resist contributing. I will be starting soon as a Senior Program Manager in the ASP.NET team.

I will continue to work from Los Angeles while we work on selling our house, which unfortunately is bad timing as housing prices have taken a bit of a dive around here. Once we have things settled over here, we’ll pack our things and move up to Seattle.

I’ll be in Seattle the week of October 15 for New Employee Orientation and to meet the rest of the team, so hopefully we can have another geek dinner/drink (I’m looking at you Brad, Scott, Peli, et al.).

On the other side of the coin, work has been really fun lately at Koders, especially with the release of Pro Edition and the Rails work I’ve been doing lately, so leaving is not easy, despite my short tenure. It’s a great company to work for and I wish them continued success.

My last day is this Wednesday and I will be taking a short break in between jobs to spend time with the family, travel, and get the house ready to sell.

As for Subtext, I will continue to contribute my spare moments leading the charge towards making it a fantastic blogging platform. When you think about it, joining the ASP.NET team is really just a clever ploy to make Subtext even better by being able to influence the underlying platform in a direction that makes it a joy to write code and tests for it. Yeah, I said tests. Of course, my goal would be to make every app built on ASP.NET, not just Subtext, better (and more testable as a contributing factor to being better) due to the work that we do.

Wish me luck in that endeavor.

asp.net comments edit

UPDATE: K. Scott Allen got to the root of the problem. It turns out it was an issue of precedence. Compiler options are not additive. Specifying options in @Page override those in web.config. Read his post to find out more.

Conditional compilation constants are pretty useful for targeting your application for a particular platform, environment, etc… For example, to have code that only executes in debug mode, you can define a conditional constant named DEBUG and then do this…

#if DEBUG
//This code only runs when the app is compiled for debug
Log.EverythingAboutTheMachine();
#endif

What may not be common knowledge is that these constants work equally well in ASPX and ASCX files. At least it wasn’t common knowledge for me. For example:

<!-- Note the space between % and # -->
<% #if DEBUG %>
<h1>DEBUG Mode!!!</h1>
<% #endif %>

The question is, where do you define these conditional constants for ASP.NET? The answer is, well, it depends on whether you’re using a Website project or a Web Application project.

For a Web Site project, one option is to define it at the Page level like so…

<%@ Page CompilerOptions="/d:QUUX" %>

The nice thing about this approach is that the conditional compilation works both in the ASPX file as well as in the CodeFile, for ASP.NET Website projects.
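
For example, a Website project page that defines a QUUX constant at the page level can use it in both the markup and the code file (a sketch; the file and class names here are hypothetical):

<%@ Page Language="C#" CompilerOptions="/d:QUUX"
    CodeFile="Demo.aspx.cs" Inherits="Demo" %>

<!-- Note the space between % and # -->
<% #if QUUX %>
<h1>QUUX is defined!</h1>
<% #endif %>

And in Demo.aspx.cs…

protected void Page_Load(object sender, EventArgs e)
{
#if QUUX
  //Compiles only because of the CompilerOptions in the @ Page directive.
  Response.Write("QUUX is defined in the code file too.");
#endif
}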

According to this post by K. Scott Allen, you can also define conditional compilation constants in the Web.config file using the <system.codedom /> element (a direct child of the <configuration /> element), but this didn’t work for me in either website projects or web application projects.

<system.codedom>
  <compilers>
    <compiler
      language="c#;cs;csharp" extension=".cs"
      compilerOptions="/d:MY_CONSTANT"
      type="Microsoft.CSharp.CSharpCodeProvider, 
        System, Version=2.0.0.0, Culture=neutral, 
        PublicKeyToken=b77a5c561934e089" />
    </compilers>
</system.codedom>

At heart, Web Application Projects are no different from Class Library projects, so you can set conditional compilation constants from the project properties dialog in Visual Studio.

Unfortunately, these only seem to work in the code behind and not within ASPX files.

Here’s a grid based on my experiments that shows when and where setting conditional compilation constants seems to work in ASP.NET.

                            Web.config   Project Properties   Page Directive
Website Code File           No           n/a                  Yes
Web Application Code File   No           Yes                  No
ASPX, ASCX File             No           No                   Yes

In order to create this grid, I created a solution that includes both a Web Application project and a Website project and ran through all nine permutations. You can download the solution here if you’re interested.

It’s a bit confusing, but hopefully the above table clears things up slightly. As for setting the conditional constants in Web.config, I’m quite surprised that it didn’t work for me (yes, I set it to full trust) and assume that I must’ve made a mistake somewhere. Hopefully someone will download this solution and show me why it doesn’t work.

comments edit

Here’s a little plug for something we’ve been working hard at over at Koders. Everyone knows that if you want to find open source code, you go to http://www.koders.com/ (it recently got a minor new facelift so check it out). That’s my area of responsibility here. However, after many many months of hard work, we released Koders Pro Edition 1.0 this week. I helped a bit with this, but it’s mostly due to the hard work of the rest of the team that this is out there, especially Ben, the product manager for Pro.

Pro Edition is the yin to the Koders.com yang. Pro Edition is great for searching and sharing your and your team’s internal code.

This should not be confused with desktop code search, although it can certainly be used in that manner. Rather, it’s more similar to the Google Search Appliance: something you can install on a server and point at your source control or file system, and then your whole team can quickly search and find your internal code.

While the focus of Pro Edition is on indexing your internal code, it doesn’t preclude you from indexing public open source code. After all, Pro Edition is cut from the same cloth (though scaled down) as the indexer we use for http://www.koders.com/, so you’re getting a lot of power under the hood.

Pro Edition allows private and public code to be intermingled if you so desire. For example, suppose your company has a limited set of open source projects you’d like to be able to search. Because Pro Edition supports indexing any CVS and Subversion repository (the two most widely used source control systems used by open source projects), there’s nothing stopping you from pointing your local Pro Edition at an open source code repository and start indexing that code along with your internal code.

Doing this would allow you to create a private searchable index of “approved” open source code. If this sounds interesting to you, try out the free trial.

code, tech, blogging comments edit

I was thinking about alternative ways to block comment spam the other day and it occurred to me that there’s potentially a simpler solution than the Invisible Captcha approach I wrote about.

The Invisible Captcha control plays upon the fact that most comment spam bots don’t evaluate javascript. However, there’s another behavioral trait that bots have that can be exploited, due to the bots’ inability to support another browser facility.

You see, comment spam bots love form fields. When they encounter a form field, they go into a berserker frenzy (+2 to strength, +2 hp per level, etc…) trying to fill out each and every field. It’s like watching someone toss meat to piranhas.

At the same time, spam bots tend to ignore CSS. For example, if you use CSS to hide a form field (especially via CSS in a separate file), they have a really hard time knowing that the field is not supposed to be visible.

To exploit this, you can create a honeypot form field that should be left blank and then use CSS to hide it from human users, but not bots. When the form is submitted, you check to make sure the value of that form field is blank. For example, I’ll use the form field named body as the honeypot. Assume that the actual body is in another form field named the-real-body or something like that:

<div id="honeypotsome-div">
If you see this, leave this form field blank 
and invest in CSS support.
<input type="text" name="body" value="" />
</div>

Now in your code, you can just check to make sure that the honeypot field is blank…

if(!String.IsNullOrEmpty(Request.Form["body"]))
  IgnoreComment();

I think the best thing to do in this case is to act like you’ve accepted the comment, but really just ignore it.
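
A sketch of what that might look like in the comment submission handler (ShowThanks and SaveComment are hypothetical methods standing in for your own logic):

if (!String.IsNullOrEmpty(Request.Form["body"]))
{
  //The honeypot was filled in, so this is almost certainly a bot.
  //Don't save the comment, but show the same confirmation a
  //legitimate commenter would see so the bot thinks it succeeded.
  ShowThanks();
  return;
}

SaveComment();
ShowThanks();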

I did a Google search and discovered I’m not the first to come up with this idea. It turns out that Ned Batchelder wrote about honeypots as a comment spam fighting vehicle a while ago. Fortunately I found that post after I wrote the following code.

For you ASP.NET junkies, I wrote a Validator control that encapsulates this honeypot behavior. Just add it to your page like this…

<sbk:HoneypotCaptcha ID="body" ErrorMessage="Doh! You are a bot!"
  runat="server"  />

This control renders a text box and when you call Page.Validate, validation fails if the textbox is not empty.

This control has no display by default by setting the style attribute to display:none. You can override this behavior by setting the UseInlineStyleToHide property to false, which makes you responsible for hiding the control in some other way (for example, by using CSS defined elsewhere). This also provides a handy way to test the validator.
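
For example, hiding the field yourself might look something like this (a sketch that assumes the control exposes the standard CssClass property every WebControl inherits; the class name is arbitrary):

<sbk:HoneypotCaptcha ID="body" ErrorMessage="Doh! You are a bot!"
  UseInlineStyleToHide="false" CssClass="honeypot" runat="server" />

And in a stylesheet defined elsewhere…

.honeypot { display: none; }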

To get your hands on this validator code and see a demo, download the latest Subkismet source from CodePlex. You’ll have to get the code from source control because this is not yet part of any release.

comments edit

Today my wife and I celebrate our fifth anniversary of being legally married. If you’ve read my blog long enough, you might have seen this post which suggests we were married June 14, not September 12.

It’s all pretty simple, you see. We had our wedding ceremony on June 14 2003, but were secretly legally married on September 12, 2002.

Ok perhaps the term secret marriage is a bit too strong. But it sounds cool, doesn’t it? The story is that at the time, my wife wanted to take a long trip back to Japan before our planned wedding. Unfortunately, with the tightening up of immigration following September 11, we were concerned she’d have trouble coming back. So we got legally married at the Beverly Hills Courthouse to make sure she could return.

Recently we decided to follow the DRY principle (Don’t Repeat Yourself) and only really celebrate our legal anniversary, as it keeps it simple for me.

Hard enough for a guy to remember one anniversary much less two!

So to Akumi (yes, she actually reads my blog), Happy Anniversary. I love you very much! And I was going to post that other silly pic of you we found, but I want to live to see the next five years of our life together.

comments edit

Last night I nearly lost a dear friend of mine. Now this is the sort of story most men, myself included, would understandably want to keep to themselves. Although this deviates from my normal content, I feel a duty to tell all in this age of transparency, because while I was in the middle of the ordeal, I turned to Google for help and didn’t find the information I needed. I write this in the hopes it helps some unfortunate guy in the future.

The story begins late last night around 1:15 AM as I turned in to bed for the night. Tossing and turning, I started to feel a pain in my lower abdomen and right testicle. I could feel that my right testicle was swollen and a bit harder than one would expect. It felt like an impossibly bad case of blue balls. The worst case imaginable.

Since I hate dealing with hospitals and such, I tried to sleep it off telling myself it would be fine in the morning, as if somehow the Nut-helper Fairy would come in the middle of the night and make it all better.

Suffice it to say, when your genital region is in pain, it’s pretty damn difficult to get a good night’s sleep. You shouldn’t screw around (forgive the pun) when you have a pain in that region. So I got up, Googled it, found nothing but scary stories about testicular cancer and painful hernias, and decided then I should go see a doctor. I told my wife I had to go and I proceeded to walk over to the neighborhood Emergency Room at 2:30 AM.

During triage I explained that the pain was around a 7 on a scale of 1 to 10, it was a dull intense pain, not sharp, and it was constant, not coming in waves, centered around my testicle and my lower abdomen area.

After I was moved to a gurney, the doctor began black box testing on me. It’s not unlike debugging a bug in code for which you don’t have any means to step through a debugger. He’d prod around narrowing down the possible diagnoses. However, unlike debugging code, this process was excruciatingly painful.

After manhandling my right nut for a while, the doctor diagnosed me with Testicular Torsion. Wikipedia defines it thusly…

In testicular torsion the spermatic cord that provides the blood supply to a testicle is twisted, cutting off the blood supply, often causing orchalgia. Prolonged testicular torsion will result in the death of the testicle and surrounding tissues.

I define it as ow! ow! that fucking hurts!

So the doctor leaves to order an ultrasound and returns not long after to “try one more thing.” He then proceeds to grab the nut and twist it around, asking me to let him know when the pain subsides.

Riiiiight.

A man with a latex glove is twisting my nut and asking me when it doesn’t hurt? It doesn’t hurt when you’re not twisting it! 

Exactly how I wanted to spend my Monday morning.

Amazingly enough though, the pain subsided quickly after he stopped. I didn’t realize at the time that he was twisting it back. I thought he was just being sadistic.

The male nurse on duty quipped afterwards…

Probably the first time that someone twisting your testicle made you feel better, eh?

No, twisting my testicle normally elicits feelings of euphoria and joy. Of course it’s the first time! And by Zeus’s eye I hope it’s the last.

Afterwards I was pushed on a gurney into the ultrasound room by a big burly Russian dude who proceeded to ultrasound my testicular nether regions. At this point, there’s really no point in having any shame or bashfulness. I just tried to make small talk as he showed me the screen displaying blood flowing nicely.

As I was being discharged, the doctor told me it was a good thing I went in. Left untreated for around six hours, I could have lost the testicle. I later looked it up and this is what Wikipedia has to say on the subject (emphasis mine).

Testicular torsion is a medical emergency that needs immediate treatment. If treated within 6 hours, there is nearly a 100% chance of saving the testicle. Within 12 hours this rate decreases to 70%, within 24 hours is 20%, and after 24 hours the rate approaches 0. (eMedicineHealth) Once the testicle is dead it must be removed to prevent gangrenous infection.

Yeah, I’m going to be having nightmares too. In any case, it seems that all is well. I still have a slight bit of discomfort not unlike the feeling in your gut long after someone kicks you in the groin and I’ve been walking around a bit gingerly, worried any sudden movement might cause a relapse.

The moral of this story is when you have an intense pain in the balls, don’t be a tough guy about it. Go to the emergency room and be safe about it. No use trying to be stoic and losing a nut over it.

My next step now is to make an appointment with a Urologist so I can have yet another doctor see me in all my glory and make sure it’s all good.

To the doctor at the local neighborhood Emergency Room, I owe you a big one. Because of him, the next time someone asks me, “Hey! How’s it hanging” I can answer, “Pointed in the right direction.”

comments edit

Not too long ago I wrote a blog post on some of the benefits of Duck Typing for C# developers. In that post I wrote up a simplified code sample demonstrating how you can cast the HttpContext to an interface you create called IHttpContext, for lack of a better name.

Well I couldn’t just sit still on that one, so I used Reflector and a lot of patience and created a set of interfaces to match the Http intrinsic classes. Here is a full list of interfaces I created along with the concrete existing class (all in the System.Web namespace except where otherwise stated) that can be cast to the interface (the ones in bold are the most commonly used).

  • ICache - Cache
  • IHttpApplication - HttpApplication
  • IHttpApplicationState - HttpApplicationState
  • IHttpCachePolicy - HttpCachePolicy
  • IHttpClientCertificate - HttpClientCertificate
  • IHttpContext - HttpContext
  • IHttpFileCollection - HttpFileCollection
  • IHttpModuleCollection - HttpModuleCollection
  • IHttpRequest - HttpRequest
  • IHttpResponse - HttpResponse
  • IHttpServerUtility - HttpServerUtility
  • IHttpSession - System.Web.SessionState.HttpSessionState
  • ITraceContext - TraceContext

As an aside, you might wonder why I chose the name IHttpSession instead of IHttpSessionState for the class HttpSessionState. It turns out that there already is an IHttpSessionState interface, but HttpSessionState doesn’t inherit from that interface. Go figure. Now that’s a juicy tidbit you can whip out at your next conference cocktail party.

Note that I focused on classes that don’t have public constructors and are sealed. I didn’t want to follow the entire object graph!

I also wrote a simple WebContext class with some helper methods. For example, to get the current HttpContext duck typed as IHttpContext, you simply call…

IHttpContext context = WebContext.Current;

I also added a bunch of Cast methods specifically for casting http intrinsic types. Here’s some demo code to show this in action. Assume this code is running in the code behind of your standard ASPX page.

public void HelloWorld(IHttpResponse response)
{
  response.Write("<p>Who's the baddest!</p>");
}

protected void Page_Load(object sender, EventArgs e)
{
  //Grab it from the http context.
  HelloWorld(WebContext.Current.Response);
  
  //Or cast the actual Response object to IHttpResponse
  HelloWorld(WebContext.Cast(Response));
}

The goal of this library is to make it very easy to refactor existing code to use these interfaces (should you so desire), which will make your code less tied to the System.Web classes and more mockable.

Why would you want such a thing? Making classes mockable makes them easier to test, that’s a worthy goal in its own right. Not only that, this gives control over dependencies to you, as a developer, rather than having your code tightly coupled to the System.Web classes. One situation I’ve run into is wanting to write a command line tool to administer Subtext on my machine. Being able to substitute my own implementation of IHttpContext will make that easier.
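
For example, here’s a sketch of a unit test for the HelloWorld method above that never touches a web server (DemoPage is a hypothetical class containing that method; the record/playback pattern is the same Rhino Mocks approach I’ve used before):

[Test]
public void HelloWorldWritesGreetingToResponse()
{
  MockRepository mocks = new MockRepository();
  IHttpResponse response = mocks.CreateMock<IHttpResponse>();

  using (mocks.Record())
  {
    //Expect HelloWorld to write this exact markup.
    response.Write("<p>Who's the baddest!</p>");
  }

  using (mocks.Playback())
  {
    new DemoPage().HelloWorld(response);
  }
}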

UPDATE: The stack overflow problem mentioned below has since been fixed within the Duck Typing library.

One other note as you look at the code. You might notice I’ve had to create extra interfaces (commented with a //Hack). This works around a bug I found with the Duck Typing library, reproduced with this code…

public class Foo
{
  public Foo ChildFoo
  {
    get { return new Foo();}
  }
}

public interface IFoo
{
  //Note this interface references itself
  IFoo ChildFoo { get;}
}

public static class FooTester
{
  public static void StackOverflowTest()
  {
    Foo foo = new Foo();
    IFoo fooMock = DuckTyping.Cast<IFoo>(foo);
    Console.WriteLine(fooMock);
  }
}

Calling FooTester.StackOverflowTest will cause a stack overflow exception. The fix is to do the following.

public interface IFoo2 : IFoo {}

public interface IFoo
{
  IFoo2 ChildFoo { get; }
}

In any case, I hope some of you find this useful. Let me know if you find any bugs or mistakes. No warranties are implied. Download the code from here which includes the HttpInterfaces class library with all the interfaces, a Web Project with a couple of tests, and a unit test library with more unit tests.

comments edit

Ayende recently wrote about Microsoft’s “annoying” tendency to duplicate the efforts of perfectly capable Open Source Software already in existence. In the post, he references this post by Scott Bellware which lists several cases in which Microsoft duplicated the efforts of OSS software.

Fear Factor

Ayende is not convinced by the fear factor argument around issues of software pedigree, patents, and legal challenges. Jon Galloway wrote about this argument a while ago in his post Why Microsoft can’t ship open source code.

In his post, Ayende dismisses this argument as “lawyer-paranoia”. While I agree to some extent that it is paranoia, not all paranoia is bad. I think this point bears more thoughtful responses than simply dismissing it as FUD.

Microsoft really is a huge fat target with a gigantic bullseye on its forehead in the form of lots and lots of money. At that size, the rules of engagement change when compared to smaller companies.

Nobody is going after small fries who release open source code such as Ayende or myself. But as soon as a big fry like Microsoft starts bundling open source code, watch out for the armies of patent trolls, lawyers in tow, coming out of the woodwork.

As an aside, some commenters mention the “commercial friendliness” of the licenses of the projects they would like bundled, such as the BSD and MIT licenses. However, as far as I know, none of these licenses have any patent protection in the same way that the GPL does. Perhaps Microsoft should require bundled OSS software to be licensed with the GPL. I kid! I kid! We’d probably see Steve Ballmer grooving to an iPod in a pink leotard before that happens.

Back to the point at hand. Having said all that, while I think this is a difficult challenge, I don’t think it is an insurmountable challenge. Microsoft can afford an army of lawyers and hopefully some of them are extremely bright and can come up with creative solutions that might allow Microsoft to leverage and even bundle Open Source software in a safe manner. After all, they already face the same risk by allowing any employee to write and ship code. Employees are not immune to lapses of judgement.

We already see progress happening in regards to Microsoft and Open Source. The IronRuby project will accept source code contributions, but most likely with some strict limitations and with required paperwork like the Free Software Foundation does. Progress can be made on this front, but it won’t happen overnight.

How Should They Choose?

For the sake of argument, suppose that Microsoft deals with all the legal issues and does decide to start bundling OSS software. How should they choose which software to bundle?

For mock object frameworks, Scott Bellware mentions Rhino Mocks, a mock framework I’ve written about a few times, and I would agree with this choice. But what about NMock, which, as far as I know, has been around longer? I think Scott and Ayende would both agree that popularity or seniority should not trump technical quality in choosing which project to bundle. I personally would choose Rhino Mocks over NMock any day of the week.

Bellware’s post also lists NUnit. While NUnit has been around longer than MbUnit, in my opinion I think it is pretty clear that MbUnit is technically a much better testing framework. Naturally, I’m sure there are many fans of NUnit who would disagree vehemently. Therein lies the conflict. No matter which framework Microsoft chooses, there will be many who are unhappy with the choice.

If Microsoft had chosen to not write its own test framework, I fear they would have chosen NUnit over MbUnit simply because it’s more well known or for political reasons. Such a choice would have the potential to hurt a project like MbUnit in the never ending competition for users and contributors.

The fact that MS Test sucks so much is, in a way, a boon to NUnit and MbUnit. Please understand I’m not saying that “Because choosing a project is hard, it shouldn’t or can’t be done”. I’m merely suggesting that if we’re clamoring for Microsoft to start bundling instead of duplicating, we ought to offer ideas on how that should happen and be prepared for the ramifications of such choices.

So what do I think they should do?

Let’s look at one situation in particular that appears to be an annoying duplication of efforts. A while back, Microsoft identified a business opportunity to create an integrated development suite which included code coverage, bug tracking, unit testing, etc… They came out with Team System, which included a unit testing framework that wasn’t anywhere near par with NUnit or MbUnit.

This is a situation in which many have argued that Microsoft should have bundled NUnit with Team System rather than writing their own.

While we can continue to argue the merits of whether Microsoft should or shouldn’t bundle Open Source software, the official stance currently appears to be that it is too much of a liability to do so. So rather than keep arguing that point, let’s take a step back and for the sake of argument, accept it as a given.

So given that Microsoft couldn’t bundle NUnit, what should they have done?

They should have given developers a choice.

What I would have liked to have seen is for Team System to provide extensibility points which make it extremely easy to swap out MS Test for another testing framework. MS Test isn’t the money maker for Microsoft, it’s the whole integrated suite that brings in the moolah, so being able to replace it doesn’t hurt the bottom line.

Given the inability to bundle NUnit, I can understand why Microsoft would write their own test framework. They wanted a complete integrated suite. It wouldn’t work to ship something without a test framework so they provided a barely adequate one. Fine. But why not allow me to switch that out with MbUnit and still have the full non-degraded integrated experience?

Microsoft could have then worked with the OSS communities to provide information and maybe even some assistance with integrating with Team System.

This is not unprecedented by any means. It’s very similar to how Microsoft cooperates with control vendors who build WinForms and ASP.NET widgets and controls.

Microsoft doesn’t provide a GridView, tells us developers that’s all we’ll ever need for displaying data, and then closes the door on other control vendors who might want to provide developers with an alternative grid control. Hell no.

Instead, they make it easy for control vendors to provide their own controls and have a first-class integrated experience (with design time support etc…) within the Visual Studio IDE because they recognize they don’t have the bandwidth to build everything top shelf. This sort of forward thinking should apply anytime they plan to ship a crappy stopgap implementation.

code, regex comments edit

In my last post, I wrote about how most email validation routines are too strict when compared against what is allowed by the RFC. Initially I dismissed this phenomenon as the result of ignorance of the RFC or an inability to understand it, as I had trouble understanding it myself.

However, I think there’s something more fundamental at work here when it comes to validating user data. It seems that many developers, myself included, choose to ignore Postel’s Law when it comes to field validation. Postel’s law states…

Be conservative in what you do; be liberal in what you accept from others.

Postel wrote that in an RFC that defined TCP, but it applies much more broadly. It’s natural that developers, used to the exacting nature of writing code for a compiler, where even the most minor of typos can bring a program screeching to a halt, have a tendency to apply such exactitude to their users.

Dare I say it, but developers can tend to be validation nazis.

User: (filling out form) user+nospam@example.com

Validation Nazi: Entering a plus sign is $2.00 extra.

User: But the RFC allows for a plus sign.

Validation Nazi: You want plus sign?

User: Yes please.

Validation Nazi: $3.00!

User: What?

Validation Nazi: No form submission for you!

This is a mistake. Users are not compilers so we need to cut them some slack.

A List Apart provides some great examples of mistakes in treating users like computers and ways to correct them in the article, Sensible Forms: A Form Usability Checklist. Here’s a snippet about dealing with phone numbers (emphasis mine).

Let the computer, not the user, handle information formatting

Few things confuse users as often as requiring that users provide information in a specific format. Format requirements for information like telephone number fields are particularly common. There are many ways these numbers can be represented:

    * (800) 555-1212
    * 800-555-1212
    * 800.555.1212
    * 800 555 1212

Ultimately, the format we likely need is the one that only contains numbers:

    * 8005551212

There are three ways to handle this. The first method tells the user that a specific format of input is required and returns them to the form with an error message if they fail to heed this instruction.

The second method is to split the telephone number input into three fields. This method presents the user with two possible striking usability hurdles to overcome. First, the user might try to type the numbers in all at once and get stuck because they’ve just typed their entire telephone number into a box which only accepts three digits. The “ingenious” solution to this problem was to use JavaScript to automatically shift the focus to the next field when the digit limit is achieved. Has anyone else here made a typo in one of these types of forms and gone through the ridiculous process of trying to return focus to what Javascript sees as a completed field? Raise your hands; don’t be shy! Yes, I see all of you.

Be reasonable; are we so afraid of regular expressions that we can’t strip extraneous characters from a single input field? Let the users type their telephone numbers in whatever they please. We can use a little quick programming to filter out what we don’t need.
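
In C#, that “little quick programming” can be a one-liner (a sketch; phoneInput is a hypothetical variable holding whatever the user typed, and Regex comes from System.Text.RegularExpressions):

//Strip everything that isn't a digit, whatever the input format.
string digitsOnly = Regex.Replace(phoneInput, @"[^0-9]", String.Empty);
//"(800) 555-1212", "800.555.1212", and "800 555 1212" all become "8005551212".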

The recommendation they give fits with Postel’s law by being liberal in what they accept from the user. The computer is really really good at text processing and cleaning up such data, so why not leverage that fast computation rather than throwing a minor annoyance at your users? No matter how small the annoyance, every little mental annoyance begins to add up. As Jakob Nielsen writes (emphasis his)…

Annoyances matter, because they compound. If the offending state-field drop-down were a site’s only usability violation, I’d happily award the site a gold star for great design. But sites invariably have a multitude of other annoyances, each of which delays users, causes small errors, or results in other unpleasant experiences.

A site that has many user-experience annoyances:

  • appears sloppy and unprofessional,
  • demands more user time to complete tasks than competing sites that are less annoying, and
  • feels somewhat jarring and unpleasant to use, because each annoyance disrupts the user’s flow.

Even if no single annoyance stops users in their tracks or makes them leave the site, the combined negative impact of the annoyances will make users feel less satisfied. Next time they have business to conduct, users are more likely to go to other sites that make them feel better.

However, in the case of the email validation, the problem is much worse. It violates the golden rule of field validation (I’m not sure if there is a golden rule already, but there is now)…

Never ever reject user input when it truly is valid.

In the comments of my last post, several users lamented the fact that they can’t use a clever Gmail hack for their email address because most sites (mine included at the time, though I’ve since fixed it) reject the email.

With Gmail you can append a tag to your email address. So let’s say you have “name@gmail.com” you can give someone an email address of “name++sometag@gmail.com” and it will faithfully arrive in your inbox. The use of this for me is that I can track who’s selling my email address or at least who I gave my email to that is now abusing it.

For fun, I wrote a crazy regular expression to attempt to validate an email address correctly according to the RFC, but in the end, this was a case of Regex abuse, not Regex use. But as one commenter pointed out…

THE ONLY WAY TO VALIDATE AN EMAIL ADDRESS IS TO DELIVER A MESSAGE TO IT!

This served to drive home the point that attempting to strictly validate an email address on the client is pointless. The type of validation you do should really depend on the importance of that email address.

For example, when leaving a comment on my form, entering an email address is optional. It’s never displayed, but it allows me to contact you directly if I have a personal response and it also causes your Gravatar to be displayed if you have one. For something like this, I stick with a really really simple email address validation purely for the purpose of avoiding typos…

^.+?@.+$
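
In code, that amounts to a single permissive check (assuming email holds the submitted value):

//Just a sanity check against typos; real validation is delivery.
bool looksLikeEmail = Regex.IsMatch(email, @"^.+?@.+$");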

However, for a site that requires registration (such as a banking site), having the correct email address to reach the user is critical. In that case it might make sense to have the user enter the email twice (to help avoid typos, though most users simply copy and paste so the efficacy of this is questionable) and then follow up with a verification email.

In the end, developers need to loosen up and let users be somewhat liberal about what they enter in a form. It takes more code to clean that up, but that code only needs to be written once, as compared to the many many users who have to suffer through stringent form requirements.
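
To make the “clean it up in code” point concrete, here’s a sketch of how a phone number field might be normalized (the method name is mine; keep whatever format your application actually needs):

using System.Text.RegularExpressions;

// Accepts "(425) 555-1212", "425.555.1212", "425 555 1212", and so on,
// by stripping everything that isn't a digit.
public static string NormalizePhoneNumber(string input)
{
  return Regex.Replace(input, @"\D", string.Empty);
}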

code, regex comments edit

Raise your hand if you know how to validate an email address. For those of you with your hand in the air, put it down quickly before someone sees you. It’s an odd sight to see someone sitting alone at the keyboard raising his or her hand. I was speaking metaphorically.

at-sign

Before yesterday I would have raised my hand (metaphorically) as well. I needed to validate an email address on the server. Something I’ve done a hundred thousand times (seriously, I counted) using a handy dandy regular expression in my personal library.

This time, for some reason, I decided to take a look at my underlying assumptions. I had never actually read (or even skimmed) the RFC for an email address. I simply based my implementation on my preconceived assumptions about what makes a valid email address. You know what they say about assuming.

What I found out was surprising. Nearly 100% of regular expressions on the web purporting to validate an email address are too strict.

It turns out that the local part of an email address, the part before the @ sign, allows a lot more characters than you’d expect. According to section 2.3.10 of RFC 2821, which defines SMTP, the part before the @ sign is called the local part (the part after it being the host domain) and it is only intended to be interpreted by the receiving host…

Consequently, and due to a long history of problems when intermediate hosts have attempted to optimize transport by modifying them, the local-part MUST be interpreted and assigned semantics only by the host specified in the domain part of the address.

Section 3.4.1 of RFC 2822 goes into more detail about the specification of an email address (emphasis mine).

An addr-spec is a specific Internet identifier that contains a locally interpreted string followed by the at-sign character (“@”, ASCII value 64) followed by an Internet domain.  The locally interpreted string is either a quoted-string or a dot-atom.

A dot-atom is a dot-delimited series of atoms. An atom is defined in section 3.2.4 as a series of alphanumeric characters that may also include any of the following characters (all the ones you need to swear in a comic strip)…

! $ & * - = ^ ` | ~ # % ' + / ? _ { }

Not only that, but it’s also valid (though not recommended and very uncommon) to have quoted local parts, which allow pretty much any character. Quoting can be done with the backslash character (commonly known as escaping) or by surrounding the local part in double quotes.

RFC 3696, Application Techniques for Checking and Transformation of Names, was written by the author of the SMTP protocol (RFC 2821) as a human-readable guide to SMTP. In section 3, he gives some examples of valid email addresses.

These are all valid email addresses!

  • Abc\@def@example.com
  • Fred\ Bloggs@example.com
  • Joe.\\Blow@example.com
  • "Abc@def"@example.com
  • "Fred Bloggs"@example.com
  • customer/department=shipping@example.com
  • $A12345@example.com
  • !def!xyz%abc@example.com
  • _somename@example.com

Note: Gotta love the author for using my favorite example person, Joe Blow.

Quick, run these through your favorite email validation method. Do they all pass?

For fun, I decided to try and write a regular expression (yes, I know I now have two problems. Thanks.) that would validate all of these. Here’s what I came up with. (Everything up to the @ sign handles the local part. I am not worrying about checking my assumptions for the domain part for now.)

^(?!\.)("([^"\r\\]|\\["\r\\])*"|([-a-z0-9!#$%&'*+/=?^_`{|}~]|(?<!\.)\.)*)(?<!\.)@[a-z0-9][\w\.-]*[a-z0-9]\.[a-z][a-z\.]*[a-z]$

Note that this expression assumes case insensitivity options are turned on (RegexOptions.IgnoreCase for .NET). Yeah, that’s a pretty ugly expression.

I wrote a unit test to demonstrate all the cases this test covers. Each row below is an email address and whether it should be valid or not.

[RowTest]
[Row(@"NotAnEmail", false)]
[Row(@"@NotAnEmail", false)]
[Row(@"""test\\blah""@example.com", true)]
[Row(@"""test\blah""@example.com", false)]
[Row("\"test\\\rblah\"@example.com", true)]
[Row("\"test\rblah\"@example.com", false)]
[Row(@"""test\""blah""@example.com", true)]
[Row(@"""test""blah""@example.com", false)]
[Row(@"customer/department@example.com", true)]
[Row(@"$A12345@example.com", true)]
[Row(@"!def!xyz%abc@example.com", true)]
[Row(@"_Yosemite.Sam@example.com", true)]
[Row(@"~@example.com", true)]
[Row(@".wooly@example.com", false)]
[Row(@"wo..oly@example.com", false)]
[Row(@"pootietang.@example.com", false)]
[Row(@".@example.com", false)]
[Row(@"""Austin@Powers""@example.com", true)]
[Row(@"Ima.Fool@example.com", true)]
[Row(@"""Ima.Fool""@example.com", true)]
[Row(@"""Ima Fool""@example.com", true)]
[Row(@"Ima Fool@example.com", false)]
public void EmailTests(string email, bool expected)
{
  string pattern = @"^(?!\.)(""([^""\r\\]|\\[""\r\\])*""|" 
    + @"([-a-z0-9!#$%&'*+/=?^_`{|}~]|(?<!\.)\.)*)(?<!\.)" 
    + @"@[a-z0-9][\w\.-]*[a-z0-9]\.[a-z][a-z\.]*[a-z]$";

  Regex regex = new Regex(pattern, RegexOptions.IgnoreCase);
  Assert.AreEqual(expected, regex.IsMatch(email)
    , "Problem with '" + email + "'. Expected "  
    + expected + " but was not that.");
}

Before you call me a completely anal nitpicky numnut (you might be right, but wait anyways), I don’t think this level of detail in email validation is absolutely necessary. Most email providers have stricter rules than are required for email addresses. For example, Yahoo requires that an email start with a letter. There seems to be a standard stricter set of rules most email providers follow, but as far as I can tell it is undocumented.

I think I’ll sign up for an email address like phil.h\@\@ck@haacked.com and start bitching at sites that require emails but don’t let me create an account with this new email address. Ooooooh I’m such a troublemaker.

The lesson here is that it is healthy to challenge your preconceptions and assumptions once in a while and to never let me near an RFC.

UPDATES: Corrected some mistakes I made in reading the RFC. See! Even after reading the RFC I still don’t know what the hell I’m doing! Just goes to show that programmers can’t read. I also updated the post to point to RFC 822, the original RFC.

comments edit

David Meyer recently published a .NET class library that enables duck typing (also sometimes incorrectly described as Latent Typing as Ian Griffiths explains in his campaign to disabuse that notion) for .NET languages.

The term duck typing is popularly explained by the phrase

If it walks like a duck and quacks like a duck, it must be a duck.

For most dynamic languages, this phrase is slightly inaccurate in describing duck typing. To understand why, let’s take a quick look at what duck typing is about.

Duck Typing Explained

duck-rabbit-phil

Duck typing allows an object to be passed in to a method that expects a certain type even if it doesn’t inherit from that type. All it has to do is support the methods and properties of the expected type in use by the method.

I emphasize that last phrase for a reason. Suppose we have a method that takes in a duck instance, and another method that takes in a rabbit instance. In a dynamically typed language that supports duck typing, I can pass in my object to the first method as long as my object supports the methods and properties of duck in use by that method. Likewise, I can pass my object into the second method as long as it supports the methods and properties of rabbit called by the second method. Is my object a duck or is it a rabbit? Like the above image, it’s neither and it’s both.

In many (if not most) dynamic languages, my object does not have to support all methods and properties of duck to be passed into a method that expects a duck. Same goes for a method that expects a rabbit. It only needs to support the methods and properties of the expected type that are actually called by the method.

The Static Typed Backlash

Naturally, static typing proponents have formed a backlash against dynamic typing, claiming that all hell will break loose when you give up static typing. A common reaction (and I paraphrase) to David’s duck typing project goes something like…

Give me static types or give me death!

Now I love compiler checking as much as the next guy, but I don’t understand this attitude of completely dismissing a style of programming that so many are fawning over.

Well, actually I do understand…kinda. So many programmers were burned during their days of programming C (among other languages), with its type unsafety causing stupid runtime errors, that it’s been drilled into their heads that static types are good, just, and the American way.

And for the most part, it’s true. But making this an absolute starts to smell like the monkey cage experiment: we ignore changes in software languages and tooling that might challenge the original reasons for using static types, simply because we’ve done it this way for so long.

I think Bruce Eckel’s thoughts on challenging preconceived notions surrounding dynamic languages are spot on (emphasis mine).

What I’m trying to get to is that in my experience there’s a balance between the value of strong static typing and the resulting impact that it makes on your productivity. The argument that “strong static is obviously better” is generally made by folks who haven’t had the experience of being dramatically more productive in an alternative language. When you have this experience, you see that the overhead of strong static typing isn’t always beneficial, because sometimes it slows you down enough that it ends up having a big impact on productivity.

The key point here is that static typing doesn’t come without a cost. And that cost has to be weighed on a case-by-case basis against the benefits of dynamic languages.

C# has used duck typing for a long time

Interestingly enough, certain features of C# already use duck typing. For example, to allow an object to be enumerated via the C# foreach operator, the object only needs to implement a set of methods, as Krzysztof Cwalina of Microsoft points out in this post…

Provide a public method GetEnumerator that takes no parameters and returns a type that has two members: a) a method MoveNext that takes no parameters and return a Boolean, and b) a property Current with a getter that returns an Object.

You don’t have to implement an interface to make your object enumerable via the foreach operator.
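
To see this in action, here’s a minimal sketch (the type names are mine): a class that never mentions IEnumerable yet works with foreach, because the compiler binds to the GetEnumerator/MoveNext/Current pattern by name.

// CountDown works in a foreach loop without implementing IEnumerable.
public class CountDown
{
  public CountDownEnumerator GetEnumerator()
  {
    return new CountDownEnumerator();
  }
}

public class CountDownEnumerator
{
  private int current = 4;

  // Advance and report whether another value is available.
  public bool MoveNext()
  {
    current--;
    return current >= 0;
  }

  public int Current
  {
    get { return current; }
  }
}

// Usage: foreach (int i in new CountDown()) { Console.WriteLine(i); }
// prints 3, 2, 1, 0.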

A Very Useful Use Case For When You Might Use Duck Typing

If you’ve followed my blog at all, you know that I’ve gone through all sorts of contortions to try and mock the HttpContext object via the HttpSimulator class. The problem is that I can’t use a mock framework because HttpContext is a sealed class and it doesn’t implement an interface that is useful to me.

Not only that, but the properties of HttpContext I’m interested in (such as Request and Response) are sealed classes (HttpRequest and HttpResponse respectively). This makes it awfully challenging to mock these objects for testing. More importantly, it makes it hard to switch to a different type of context should I want to reuse a class in a different environment, such as the command line. Code that uses these classes has a strong dependency on them, and I’d prefer looser coupling to the System.Web assembly.

The common approach to breaking this dependency is to create your own IContext interface and then create another class that implements that interface and forwards method calls to a private instance of the actual HttpContext, as sketched below. This is effectively a combination of composition and the adapter pattern.
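
To give you an idea of what that entails, here’s a rough sketch of such an adapter (the class names are mine, and it uses the IHttpContext and IHttpRequest interfaces defined just below; a real adapter would forward many more members):

using System;
using System.Web;

// Wraps the sealed HttpContext and forwards calls through our own
// interface. Every sealed type it exposes needs its own wrapper too,
// which is why this approach balloons into a lot of code.
public class HttpContextAdapter : IHttpContext
{
  private readonly HttpContext httpContext;

  public HttpContextAdapter(HttpContext httpContext)
  {
    this.httpContext = httpContext;
  }

  public IHttpRequest Request
  {
    get { return new HttpRequestAdapter(httpContext.Request); }
  }
}

public class HttpRequestAdapter : IHttpRequest
{
  private readonly HttpRequest httpRequest;

  public HttpRequestAdapter(HttpRequest httpRequest)
  {
    this.httpRequest = httpRequest;
  }

  public Uri Url
  {
    get { return httpRequest.Url; }
  }
}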

The problem for me is that this is a lot more code to maintain just to get around the constraints of static typing. Is all that additional code worth the headache?

With the .NET Duck Typing class, I can reduce the code a bit. Here’s some code that demonstrates. First I create interfaces with just the properties I’m interested in. To keep this sample short, I’m choosing two interfaces, each with one property…

public interface IHttpContext
{
  IHttpRequest Request { get; }
}

public interface IHttpRequest
{
  Uri Url { get; }
}

Now suppose my code has a method that expects an HttpContext to be passed in, tightly coupling our code to HttpContext. We can break that dependency by changing the method to take in an instance of the interface we created, IHttpContext, instead.

public void MyMethod(IHttpContext context)
{
  Console.WriteLine(context.Request.Url);
}

The caller of MyMethod can now pass in the real HttpContext like so…

IHttpContext context = DuckTyping.Cast<IHttpContext>(HttpContext.Current);
MyMethod(context);

What’s great about this is that the code that contains the MyMethod method is no longer tightly coupled to the System.Web code and does not need to reference that assembly. Also, I didn’t have to write a class that implements the IHttpContext interface and wraps and forwards calls to the private HttpContext instance, saving me a lot of typing (no pun intended).

Should I decide at a later point to pass in a custom implementation of IHttpContext rather than the one in System.Web, I now have that option.

Yet another benefit is that I can now test MyMethod using a mock framework such as RhinoMocks like so…

MockRepository mocks = new MockRepository();
IHttpContext mockContext;
using (mocks.Record())
{
  mockContext = mocks.DynamicMock<IHttpContext>();
  IHttpRequest request = mocks.DynamicMock<IHttpRequest>();
  SetupResult.For(mockContext.Request).Return(request);
  SetupResult.For(request.Url).Return(new Uri("http://haacked.com/"));
}
using (mocks.Playback())
{
  MyMethod(mockContext);
}

You might wonder if you can go in the opposite direction. Can I write my own version of HttpContext and use duck typing to cast it to HttpContext? I tried that and it didn’t work. I believe that’s because HttpContext is a sealed class, and the Duck Typing Project appears to generate a dynamic proxy that inherits from the type you pass in. Since we can’t inherit from a sealed class, we can’t simply cast a compatible type to HttpContext. The above examples work because we’re duck type casting to an interface.

With C#, if you need a class you’re writing to act like both a duck and a rabbit, it makes sense to implement those interfaces. But sometimes you need a class you didn’t write and cannot change (such as the Base Class Libraries) to act like a duck. In that case, this duck typing framework is a useful tool in your toolbox.

Technorati tags: Duck Typing, Dynamic Languages, C#, Dynamic Types