code comments edit

I don’t think it’s too much of a stretch to say that the hardest part of coding is not writing code, but reading it. As Eric Lippert points out, Reading code is hard.

First off, I agree with you that there are very few people who can read code who cannot write code themselves. It’s not like written or spoken natural languages, where understanding what someone else says does not require understanding why they said it that way.

Screenshot of code

Hmmm, now why did Eric say that in that particular way?

This in part is why reinventing the wheel is so common (apart from the need to prove you can build a better wheel). It’s easier to write new code than try and understand and use existing code.

It is crucial to try and make your code as easy to read as possible. Strive to be the Dr. Seuss of writing code. Making your code easy to read makes it easier to use.

The basics of readable code include the usual advice of following code conventions, formatting code properly, and choosing good names for methods and variables, among other things. This is all included within Code Complete, which should be your software development bible.

Aside from all that, a key tactic to improve code readability and usability is to make your code’s intentions crystal clear.

Oftentimes it’s paying attention to the little things that can really help your code along this path. Let’s look at a few examples.

out vs ref

A while ago I encountered some code that looked something like this contrived example:

int y = 7;
//...
bool success = TrySomething(someParam, ref y);

Ignore the terrible names and focus on the parameters. At a glance, what is your initial expectation of this code regarding its parameter?

When I encountered this code, I assumed that the y parameter value passed in to this method is important somehow and that the method probably changes the value.

I then took a look at the method (keep in mind this is all extremely simplified from the actual code).

public bool TrySomething(object something, ref int y)
{
  try
  {
    y = resultOfCalculation(something);
  }
  catch(SomeException)
  {
    return false;
  }
  return true;
}

Now this annoyed me. Sure, this method is perfectly valid and will compile. But notice that the value of y is never used. It is immediately assigned to something else.

The intention of this method is not clear. Its intent is not to ever use the value of y, but merely to set it. But since the method uses the ref keyword, you are required to set the value of the parameter before you call it. You can’t do this:

int y;
bool success = TrySomething(someParam, ref y);

In this case, using the out keyword expresses the intentions much better.

public bool TrySomething(object something, out int y)
{
  try
  {
    y = resultOfCalculation(something);
  }
  catch(SomeException)
  {
    return false;
  }
  return true;
}

It’s a teeny tiny thing, and you might accuse me of being nitpicky for even bringing it up. But anything you can do so that the reader of the code doesn’t have to interrupt her train of thought to figure out the meaning of the code will make your code more readable and the API more usable.

Boolean Arguments vs Enums

Brad Abrams touched upon this one a while ago. Let’s look at an example.

BlogPost p = CreatePost(post, true, false);

What exactly is this code doing? Well, it’s obvious it creates a blog post. But what does that true indicate? Hard to say. I’d better pause, look up the method, and then move on. What a pain!

BlogPost p = CreatePost(post
  , PostStatus.Published, CommentStatus.CommentsDisabled);

In the second case, the intentions of the code are much clearer and there is no interruption for the reader to figure out the context of the true or false as in the first example.
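To make the enum version concrete, here is a minimal sketch of what the declarations behind that call might look like. Only PostStatus.Published and CommentStatus.CommentsDisabled appear in the call above; the other enum members, the BlogPost fields, the PostFactory class, and the string title parameter are hypothetical stand-ins for illustration.

```csharp
using System;

// Hypothetical enums: only Published and CommentsDisabled come from the
// call shown above; Draft and CommentsEnabled are assumed counterparts.
public enum PostStatus { Draft, Published }
public enum CommentStatus { CommentsEnabled, CommentsDisabled }

// Hypothetical, stripped-down BlogPost type.
public class BlogPost
{
    public string Title;
    public PostStatus Status;
    public CommentStatus Comments;
}

public static class PostFactory
{
    // Each enum parameter names its own meaning at the call site,
    // so nobody has to look up what "true, false" stands for.
    public static BlogPost CreatePost(string title, PostStatus status, CommentStatus comments)
    {
        return new BlogPost { Title = title, Status = status, Comments = comments };
    }
}
```

A nice side effect of this design is that adding a third state later (say, a scheduled post) means adding an enum member rather than another boolean parameter.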

Assigning a Value You Don’t Use

Another common example I’ve seen is where the result of a method is assigned to the value of a variable, but the variable is never used. I think this often happens because some developers falsely believe that if a method returns a value, that value has to be assigned to something.

Let’s look at an example that uses the TrySomething method I wrote earlier.

int y;
bool success = TrySomething(something, out y);
/*success is never used again.*/

Fortunately, ReSharper makes this sort of thing stick out like a sore thumb. The problem here is that as a code reader, I’m left wondering if you meant to use the variable and forgot, or if this is an unnecessary declaration. Do this instead.

int y;
TrySomething(something, out y);

Again, these are very small things, but they make a big difference. Don’t worry about coming across as anal (you will) because the payout is worth it in the end.

What are some examples that you can think of to make code more readable and usable?

UPDATE: Lesson learned. If you oversimplify your code examples, your main point is lost, especially on the topic of code readability. Touché! I’ve updated the sample code to better illustrate my point. The comments may be out of sync with what you read here as a result.

UPDATE AGAIN: I found another great blog post about writing concise code that adds a lot to this discussion. It is part of the Fail Fast and Return Early school of thought. Short, concise and readable code - invert your logic and stop nesting already!

comments edit

According to FeedBurner, many of my readers are from London, so I thought you might enjoy this little tale.

Tonight, I met someone extremely famous, or so I was told. When I got home, I looked him up, and sure enough, he is huge in Europe. According to Wikipedia, “he has sold more albums in the UK than any other British solo artist in history”.

Have any of you heard of Robbie Williams?

Robbie Williams

My wife knew who he was immediately. Must be the fact that she’s a British citizen (she has dual Japanese citizenship as well). She played one of his songs from an Alice 97.3 compilation we have. I rather liked it.

It turns out that he runs (owns?) a soccer team in Los Angeles. We had a friendly scrimmage set up with them at UCLA. I fully expected we’d be playing on the intramural fields where everyone else plays, but instead we played on the immaculate UCLA Football team’s practice field.

This seems to be a trend I’m noticing among British music stars. They move to Los Angeles and start up soccer teams to manage. They also seem to have the means to absorb some of the best talent in Los Angeles in doing so.

As I’ve written before, Steve Jones of the Sex Pistols runs a team in my league. I have heard that Rod Stewart has a team in Los Angeles as well. I suppose if the day comes when I can’t run on the pitch, and if I had that sort of money, I could see running a soccer club (sorry, Football Club) as a fantastic hobby.

Santiago Cabrera

Not to be outdone, my team now has its own celebrity member. Santiago Cabrera from the TV show Heroes is now a member of our team.

Fortunately, he is a very talented soccer player, scoring a bicycle kick against us in our scrimmage tonight (he plays on the other team as well). Now if we could just get some former pros to join us to help solidify our midfield. Zidane, I’m looking at you buddy!

comments edit

Tim Heuer has been on a tear lately submitting some great new skins to the Subtext Skin Showcase, which is part of SubtextSkins.com.

The Showcase is the part of the site in which we display user submitted skins and allow others to download the skins. The other part of the site displays the default skins in Subtext.

Glossy Blue, Terrafirma, Dirtylicious, Informatif

It appears that Tim has been porting some of the nicer designs in the Open Designs website, a website devoted to open source web design.

Tim happens to also be the creator of Origami (which you can see in use on Rob Conery’s Blog), which many consider to be the nicest skin in Subtext.

If you are a Subtext user, try out some of these skins. They may find their way into future releases of Subtext.

comments edit

Simone Chiaretta, a member of the Subtext team (not to mention many other projects), just released a Vista Gadget which allows you to monitor a CruiseControl.NET build within your sidebar.

It looks spiffier than the system tray applet that comes with CCNET.

Here’s a screenshot of it docked.

CCNET Gadget Docked

And undocked.

CCNET Gadget Undocked

From the screenshots you can see the status of the projects he is monitoring. The good news is that the 1.9 build has been fixed since he took these screenshots.

Pretty nifty!

comments edit

I received a strange delinquency notice for a parking ticket. At first glance, it seemed normal enough. Yep, there’s my license plate number. Yep, the make of the car is correct. But look at this, the color of the car is wrong.

That’s strange since it’s not one of those cases where they indicated midnight blue when the car is black. No, they indicated red and my car is blue.

And one other minor detail was a bit off. The parking ticket was for Fillmore street in San Francisco and I live in Los Angeles.

Huh?!

I called the SF parking department and the nice woman on the phone looked into it and told me that the parking attendant made several errors in the citation and I can disregard the notice.

Several errors? I’ll say.

Like hallucinating a car that couldn’t possibly be in San Francisco at the time? Or, perhaps there just happens to be a red car with the same make as mine and the same license plate number, just with a “B” where mine has an “8”.

comments edit

It wasn’t till 1987 that I experienced my first (and worst) case of technolust ever. The object that inspired such raw feelings of lust, of course, was the Commodore Amiga.

As a lowly Commodore 128 owner, which was really just a glorified Commodore 64 in a beige case, I bought every issue of the Commodore magazines of the day.

Amiga 500

These magazines started showing off lush advertisements of the Commodore Amiga, boasting of its 4096 colors and 4-channel stereo sound.

I had to have it.

Looking back, I am shocked at how much my lust for the Amiga held sway over me. I purchased a copy of every Amiga magazine on the newsstand, talked about it incessantly to anyone who would listen, and had vivid dreams of the Amiga’s amazing graphics capabilities.

And when I finally got my hands on it, it was every bit as good as I had hoped.

For many Amiga users at the time, the Amiga was true to its name (Spanish for female friend) in that it was the closest thing to a girlfriend we had. Give me a break, I was only twelve at the time.

Like having a girlfriend, I spent countless hours with the computer, not to mention countless dollars on peripherals and upgrades. I remember hustling for tips at the local commissary in order to upgrade the beast from 512K to 1MB of RAM (cost: $99).

The reason I bring this up is I came across a recent article on the Wired website entitled Top 10 Most Influential Amiga Games, which filled me with a rush of nostalgia.

I only had the pleasure to play two of the games listed, Defender of the Crown, in which catapulting castles was pure fun, and SpeedBall 2, which probably was responsible for the pile of broken joysticks I accumulated.

Defender of the Crown Catapult Scene
Speedball 2 Screenshot

Personally though, I thought Lords of the Rising Sun (also made by Cinemaware) was even better than Defender of the Crown.

Lords of the Rising Sun Screenshot
Lords of the Rising Sun screenshot with a ninja

The game sequence in which you could snipe advancing siegers using a first-person bow and arrow with a little red laser-pointer dot was exhilarating (sadly, I could not find a screenshot).

Speedball 1 Screenshot

I also liked Speedball 1 (shown here) slightly better than 2 because the side scrolling in 2 always threw me off.

I still have my Amiga 500 gathering dust in a storage cabinet in the garage. I’ve been meaning to unpack it and see if it still works, but my home is small and there’s really no room to set it up. I figure there must be a better way to try out my old games.

Amiga Emulation!

Digging around, I discovered there’s an active project to create an Amiga emulator for *nix called UAE. There’s a Windows port called, not surprisingly, WinUAE (click for full size).

WinUAE screenshot

Unfortunately, these projects cannot distribute the Amiga ROM or its operating system due to copyright issues. However, their FAQ does provide instructions on how to transfer the ROM and operating system over to your PC.

Amiga Forever

An even easier approach is to simply purchase Amiga Forever for around forty bucks. This is an ISO image that contains a preconfigured WinUAE with the original ROM and operating system files. Amiga Forever is sold by Cloanto, which currently owns certain intellectual property rights to the Amiga.

Amiga Forever comes with several games for the Amiga as well that vary with the edition purchased. The site also has a games section in which they list places to download more games.

For example, the Cinemaware site has disk images for pretty much all of their games available for free, including Lords of the Rising Sun.

Play Defender of the Crown Immediately

All this talk of Amiga emulation sounds like fun and everything, but seriously, do I need yet another time sink? If you’re jonesing for some Amiga gaming now and don’t want to be bothered with emulation, head over to the Cinemaware website and satiate your Amiga gaming kick by playing the Flash version of Defender of the Crown. Now about that time sink…

Though I owned a couple computers prior to the Amiga, the Amiga is truly the computer that fueled my fire for computing.

comments edit

The GeeksWithBlogs.net website just switched its 1442 (and counting) blogs, containing 25,921 blog posts and 39,140 comments, over to Subtext. As Jeff Julian reports, it only took them six hours.

Jeff posted a pic of the crew at work to make it happen (click for larger).

GWB'ers burning the midnight oil

Not depicted in the picture are members of the Subtext team who have tried their best to be responsive and helpful to the GWB team during their early planning phases for the move.

Subtext should handle the load just fine considering that they were running on .TEXT prior, and though we’ve made a lot of changes, we haven’t changed the data access code drastically.

Tip of the hat to Scott Watermasysk for building the original .TEXT code in a scalable manner, laying a good foundation for this sort of installation.

Already, the large site may have sussed out a caching bug we’ve been trying to track down for ages, but haven’t been able to reproduce.

Anyways, congratulations to the GWB team for a successful migration.

Technorati tags: Subtext, Geeks With Blogs

comments edit

Maybe this is obvious, but it wasn’t obvious to me. I’m binding some data in a repeater that has the following output based on two numeric columns in my database. It doesn’t matter why or what the data represents. It’s just two pieces of data with some formatting:

42, (123)

Basically these are two measurements. Initially, I would databind this like so:

<%# Eval("First") %>, (<%# Eval("Second") %>)

The problem with this is that if the first field is null, I’m left with this output.

, (123)

Ok, easy enough to fix using a format string:

<%# Eval("First", "{0}, ") %>(<%# Eval("Second") %>)

But now I’ve learned that if the first value is null, the second one should be blank as well. Hmm… I started to do it the ugly way:

<%# Eval("First", "{0}, ") %> <%# Eval("First").GetType() == 
  typeof(DBNull) ? "" : Eval("Second", "({0})")%>

*Sniff* *Sniff*. You smell that too? Yeah, stinky and hard to read. Then it occurred to me to try this:

<%# Eval("First", "{0}, " + Eval("Second", "({0})")) %>

Now that code smells much much better! I put the second Eval statement as part of the format string for the first. Thus if the first value is null, the whole string is left blank. It’s all or nothing baby! Exactly what I needed.
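If the nesting is hard to picture, here is a small standalone sketch of the same logic outside of a page. EvalSketch and its methods are made-up names for illustration, and the sketch assumes what the trick above relies on: formatting a null or DBNull value produces an empty string no matter what the format string says.

```csharp
using System;

public static class EvalSketch
{
    // Stand-in for Eval(expression, format) in a data-binding expression.
    // Assumption: a null or DBNull value yields an empty string and the
    // format string is ignored.
    public static string Format(object value, string format)
    {
        if (value == null || value == DBNull.Value) return String.Empty;
        return String.Format(format, value);
    }

    // Mirrors the nested expression: the second field is formatted first,
    // then spliced into the format string used for the first field.
    public static string Both(object first, object second)
    {
        return Format(first, "{0}, " + Format(second, "({0})"));
    }
}
```

With these definitions, Both(42, 123) gives "42, (123)" and Both(DBNull.Value, 123) gives an empty string. One thing to watch: because the second value ends up inside a format string, a value containing a curly brace would confuse String.Format. That is not a concern for numeric columns like these.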

comments edit

UPDATE: Luke Wroblewski posted a link in my comments to his Best Practices for Form Design PDF. It is 100+ pages chock full of good usability information concerning forms. Thanks Luke!

James Avery writes about the Art of Label Placement in which he links to a few great articles on form design and label placement.

Web Application Form Design by Luke Wroblewski - This article covers the best ways to arrange labels and submission buttons.

Web Application Form Design Expanded by Luke Wroblewski - Another great article from Luke W. expanding on the same topics.

Label Placement in Forms by Matteo Penzo - Matteo takes Luke’s advice but applies eyetracking to evaluate how usable it is.

Eye Tracking Map

Based on these articles, James decides that non-bold labels above input fields are the best for usability. Interestingly enough, a non-bold label just above the form field just happens to be my personal preference as well.

And now, I know why.

Matteo Penzo’s research using Eye Tracking provides some empirical evidence that this arrangement is more usable.

comments edit

Rob Conery is soliciting our feedback for a panel on Open Source that he’ll be participating in at Mix07.

He’s joined by some big names in the world of Open Source Software including Miguel de Icaza. Hot Damn!

I won’t lie, I did want to be a part of the panel when I first heard about it (in part to get a free ticket, but also because I love hearing myself talk about Open Source) but did not make the cut. Now I see why and I’m kind of glad I’m not up there risking looking like a fool next to those guys.

Not to say that Rob is going to look foolish. He’s got a lot of smarts. You’ll do fine Rob! Trust me.

comments edit

How good are you at thinking on your feet?

Last night I watched the premiere of a new show called Thank God You’re Here. It’s a sketch improv comedy show starring various comedy television and movie stars, who have to bluff their way through a scene. They are given costumes and a set, but no script.

The set of Thank God You're Here

The title of the show derives from the fact that the first line of each skit is “Thank god you’re here!”

I love improv comedy and I thought Newman from Seinfeld was great as well as the dad from Malcolm in the Middle. You can watch the premiere online.

It’s reminiscent of one of my favorite improv shows ever, Drew Carey’s Whose Line Is It Anyway?, but with better sets and costumes. Though it remains to be seen if they will ever top the funniest Whose Line episode ever, the one with Richard Simmons.

Whose Line Is It Anyway?

code, sql comments edit

I’m not one to post a lot of quizzes on my blog. Let’s face it, while we may create altruistic reasons for posting quizzes such as:

  1. It’s an interesting problem I thought up
  2. It’s an interesting bug I ran into

we all know the real reasons for posting a quiz.

  1. It serves as blog filler.
  2. It’s a way to show off how smart the blogger is.

With that in mind, let me humbly present my latest SQL Quiz, which is something I ran into at work recently, and will not show off any smarts whatsoever.

The circumstances of this problem have been dramatically changed and simplified to both protect the guilty and save me from a lot of typing.

In this application, we have two tables. One contains a lookup list of various statistics. The second is a larger table of measurements for each of the statistics.

The following screenshot shows the data model.

Statistic table and Measurement Table

The following screenshot shows the list of contrived statistics.

Statistic Table Data

What we see above are the following:

  1. LOC per bug - Lines of code per bug.
  2. Simplicity Index - some magical number that purports to measure simplicity.
  3. Awe Factor - The awe factor for the source code.

For each of these statistics, the larger, the better.

The following is a view of the Measurement table.

Measurement Table Data

Each measurement has the previous score and current score (this is a denormalized version of the actual tables for the purposes of demonstration).

I needed to write a query that would show each of the stats for a given developer as well as a Trend Factor. The Trend Factor tells you whether or not the statistic is trending positive or negative, where positive is better and negative is worse.

Result of the query

Here is my first cut at the stored procedure. It’s pretty straightforward. In order to make the important part of the query as clear as possible, I used a Common Table Expression to make sure the count of measurements for each statistic can be referenced as if it were a column.

CREATE PROC [dbo].[Statistics_GetForDeveloper](
  @Developer nvarchar(64)
)
AS
WITH MeasurementCount(StatisticId, MeasurementCount) AS
(
  SELECT s.Id
    ,MeasurementCount = COUNT(1)
  FROM Statistic s
    LEFT OUTER JOIN Measurement m ON m.StatisticId = s.Id
  GROUP BY s.Id
)
SELECT 
  Statistic = s.Title
  , Developer
  , CurrentScore
  , PreviousScore
  , mc.MeasurementCount
  , TrendFactor = (CurrentScore - PreviousScore)/mc.MeasurementCount
FROM Statistic s
  INNER JOIN MeasurementCount mc ON mc.StatisticId = s.Id
  LEFT OUTER JOIN Measurement m ON m.StatisticID = s.Id
WHERE Developer = @Developer
GO

The relevant part of the query is the TrendFactor expression. We calculate the TrendFactor by taking the current score, subtracting the previous score, and then dividing the difference by the number of measurements for that particular statistic. This tells us how that statistic is trending.

In this application, I am going to present an up arrow for trend factors larger than 0.1, a down arrow for trend factors less than -0.1, and a flat line for anything in between. A trend factor going upward is always considered a “good thing”.
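For the presentation side, the thresholds above boil down to something like the following sketch. The Trend enum and the TrendSketch class are hypothetical names of my own, and the calculation simply restates the expression from the stored procedure using doubles.

```csharp
using System;

// Hypothetical names; only the formula and the +/-0.1 thresholds
// come from the description above.
public enum Trend { Down, Flat, Up }

public static class TrendSketch
{
    // Same calculation as the TrendFactor column in the query.
    public static double TrendFactor(double currentScore, double previousScore, int measurementCount)
    {
        return (currentScore - previousScore) / measurementCount;
    }

    // Up arrow above 0.1, down arrow below -0.1, flat line in between.
    public static Trend Classify(double trendFactor)
    {
        if (trendFactor > 0.1) return Trend.Up;
        if (trendFactor < -0.1) return Trend.Down;
        return Trend.Flat;
    }
}
```

For example, a score that moved from 10 to 12 across 4 measurements has a trend factor of 0.5, which gets an up arrow.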

The Challenge

This works for now because for each statistic, a larger value is considered better. But we need to add a new statistic, Deaths per LOC, which measures the number of deaths per line of code (gruesome, yes. But whoever said this industry is all roses and rainbows?). For this statistic, an upward trend is a “bad thing”.

Therefore, if the current score is larger than the previous score for this statistic, we would want the TrendFactor to be negative. Not only that, we may want to add more statistics in the future. Some for which larger values are better. And some for which smaller values are better.

So here is the quiz question. You are allowed to make a schema change to the Statistic table and to the stored procedure. What changes would you make to fulfill the requirements?

Bonus points, can you fulfill the requirements without using a CASE statement in the stored procedure?

Here is a SQL script that will set up the tables and an initial stab at the stored procedure for you. The script requires SQL Server 2005 or SQL Server Express 2005.

comments edit

Subsonic Logo

Rob Conery just announced that Beta 1 of SubSonic 2.0 is ready for your immediate data access needs. He’s looking for beta testers (open to anyone and everyone) to make sure this release is rock solid.

I may attempt to claim a significant contribution, but do not believe me. I only contributed a teeny-tiny amount of code to this release.

I am using a small bit of Subsonic in a current project (just using it to generate Stored Procedure wrappers since the existing database already has a legacy data model and stored procedures to work with).

While I’m talking about release dates for open source projects, I should mention that Subtext 1.9.5 will be released soon and afterwards we’ll turn our full focus to getting Subtext 2.0 out the door. I’ve made some progress on 2.0 while working on the 1.9 branch, so hopefully it will follow 1.9 shortly.

My cohorts and I finished our first draft of the book we’re working on, so I should hopefully have more time to work on Subtext. That is, till the kid arrives.

comments edit

I think it’s time to start a video collection of amazing talents people acquire when they have too much time on their hands. This one must surely qualify. It’s worth two minutes of your time to check it out.

Dice Stacking Video on
YouTube

Found via my doppelganger, the other Phil Haack.

blogging comments edit

Technorati recently released their latest State of The Blogosphere report (renamed to something about the Live Web to avoid confusion with the Dead Web) chock full of statistics and pretty graphs.

This would be interesting, if I were interested in anything other than myself. No, I don’t care about how other blogs are doing. I only care about Me Me ME!

How is MY Blog doing?

To find out I could check on some external sources. For example, Alexa.com shows that my site has experienced steady growth in the past three years (click on the chart to see the actual report page).

Alexa Graph of Haacked.com over 3 years

But lest I let that go to my head, let’s compare my site’s reach with my friend Jeff’s using Alexa’s comparison tool.

Hmmm, it may be high time I contrive my own crowd pleaser FizzBuzz post.

Moving on, let’s see what Technorati has to say.

Haacked.com on Technorati - Rank 6358 (1276 links from 473 blogs)

Wow. 6358 is a big number! That’s good right? Oh, maybe not. But we can see that 473 blogs have provided 1276 links to my blog. I should hit these suckers up for a loan!

Let’s swing over to see what Feedburner says:

Subscribers: 3,339. Site Visitors: 1,334

It’d be nice to have just one score to look at. Let’s swing over to the Website Grader.

Website Grade: 97/100 Page Rank: 6

I could have saved some time by just going here first. Hey Ma! I got an A! Can I leave the cage?

Looking Inward

Well, if there’s one thing I learned about happiness it’s to look for it inward, rather than relying on external validation. That way, you don’t have to let reality intrude on your carefully crafted world view. So let’s look at some internal statistics.

  • Posts - 1322
  • Comments - 2510
  • Spam Comments - 9818 (which is low because I periodically clean out the table)

Hmmm… I wonder what my five most popular posts are, based on Ayende’s formula.

Title                                        Web Views
Video: The Dave Chappelle Show               169,398
PHOTO: When Nerds Protest The RNC            81,353
Year of the Golden Pig                       60,316
Response.Redirect vs Server.Transfer         54,076
Using a Regular Expression to Match HTML     37,807

I won’t lie. It depresses me a bit to learn that my three most popular posts have nothing to do with technology. Not only that, the most popular post by a long shot is a skit about a family with an unfortunate last name. It’s a misspelling of a horrible racial epithet, which happens to bring in a lot of bad-spelling racists in search of god knows what.

What the Numbers Don’t Say

Well, all these numbers are fine and good, but they can’t measure the enjoyment I get out of blogging. Nor can they measure the satisfaction that some readers (any reader?) get from reading my posts. At least not until someone builds Satisfactorati or Satisfactorl.icio.us.

The numbers may not support my completely self-centered, egocentric view, but when have vanity and a self-inflated ego ever been subdued by so-called “facts”?

So what is the state of your blog?

This post is a refresh to my Blogging Is Pure Vanity post from way back when.

comments edit

Jeff Atwood writes a great summary of Open Source Licenses. As far as I’m concerned, there are really only four software licenses to worry about (open source or otherwise).

  1. Proprietary - The code is mine! You can’t look at it. You can’t reverse engineer it. Mine Mine Mine!
  2. GPL - You can do whatever you want with the code, but if you distribute the code or binaries, you must make your changes open via the GPL license.
  3. New BSD - Use at your own risk. Do whatever the hell you want with the code, just keep the license intact, credit me, and never sue me if the software blows your foot off. The MIT license is a notable alternative to the New BSD and is very very similar.
  4. Public Domain - Do whatever you want with the code. Period. No need to mention me ever again. You can forget I ever existed.

Yes, there are many more licenses, but I think you’ll do just fine if you just stick with these four. (Note, I am not a lawyer, take this advice at your own risk and never ever sue me. Ever.)

Of course, this really is focused on software, what about the content of your blog, or sample code in your blog?

For small code snippets in your blog, I recommend either explicitly releasing the samples to the Public Domain or picking the New BSD License.

UPDATE: I’ve updated this section based on feedback. Creative Commons is a poor choice for source code.

The tricky part in my mind is that there are two potential uses for source code snippets in a blog.

For example, you may just want to repost the same code in your own blog. In that case, I see the code as being content, for which CC might be appropriate. The other use is putting the code in an application. Then it really is source code, and CC is not appropriate.

In any case, my source code snippets are released to the Public Domain unless otherwise stated. I only ask that you do reference the blog post where you got the code from, but it is not required.

Note, except in the case of releasing content to the Public Domain, if you choose to license your code using an Open Source License or license your content using a Creative Commons license, it does not mean you give up your copyright to the material. You still own the copyright. The license just lets people know that they may make use of your content and what restrictions are in place. That is where the Some Rights Reserved phrase commonly associated with Creative Commons content comes from, as opposed to All Rights Reserved.

Also, keep in mind that you can choose to license code snippets in your blog differently from your blog’s content. Many people do not want to share their blog content, but do want to share code snippets. Just make it clear in your copyright notice.

If you want to know more about software licensing, check out my multi-part series on copyright law and software licensing for developers.

code, tech comments edit

Just something I noticed today. A lot of people (I may even be guilty of this) publish their emails on the web using the following format:

name at gmail dot com

Substitute gmail dot com with your favorite email domain.

The problem with this approach is that it is trivially easy to harvest email addresses in this format with Google.

Harvest

First, do a search for the following text (include the quotes):

"* at * dot com"

Now, all you need to do is run a regular expression over the results. For example, using your favorite regular expression tool, search for this:

(\w+)\s+at\s+(\w+)\s+dot\s+com

and replace with this:

$1@$2.com
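In .NET, for instance, the whole search-and-replace is a single call to Regex.Replace. This is only a sketch to show how little effort it takes; HarvestSketch and Deobfuscate are names I made up, while the pattern and replacement are exactly the ones shown above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class HarvestSketch
{
    // Rewrites every "name at domain dot com" in the text as "name@domain.com".
    public static string Deobfuscate(string text)
    {
        return Regex.Replace(text, @"(\w+)\s+at\s+(\w+)\s+dot\s+com", "$1@$2.com");
    }
}
```

Feeding it a string such as "Contact: name at gmail dot com" turns the trailing part into name@gmail.com.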

Now before you blame me for giving the spammers another tool in their arsenal, I would be very surprised if spammers aren’t already doing this. I highly doubt I’m the first to think of it.

So what is a better way to communicate your email address without making it susceptible to harvesting? You could try mish-mashing your email with HTML entity codes. For example, when viewed in a browser, the following looks exactly the same as name at gmail dot com.

name at gmail dot com

The key is to somewhat randomly replace characters with entity codes, so that we all don’t use the exact same sequence. If we all replaced every letter with its corresponding entity code, it would be trivially easy to farm.

But by introducing some randomness, it becomes a lot more difficult to farm these emails. It’s possible, but would take more technical chops and computing power than the technique I just demonstrated.
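Here is a rough sketch of the random replacement idea. EmailObfuscator and Encode are hypothetical names, and the coin flip per character is just one way to get the randomness; it assumes numeric character references of the form &#NNN; are what gets written into the page.

```csharp
using System;
using System.Text;

public static class EmailObfuscator
{
    // Replaces roughly half of the characters, chosen at random, with their
    // numeric HTML entity codes. A browser renders the result exactly like
    // the original text, but the page source differs on every run.
    public static string Encode(string text, Random random)
    {
        StringBuilder sb = new StringBuilder();
        foreach (char c in text)
        {
            if (random.Next(2) == 0)
                sb.Append("&#").Append((int)c).Append(';');
            else
                sb.Append(c);
        }
        return sb.ToString();
    }
}
```

Calling EmailObfuscator.Encode("name at gmail dot com", new Random()) gives a different mix of plain characters and entity codes each time, which is the whole point.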

comments edit

Code Complete

A while ago I read Steve McConnell’s latest book, Software Estimation: Demystifying the Black Art, which is a fantastic treatise on the “Black Art” of software estimation.

One of the key discoveries the book highlights is just how bad people are at estimation, especially single point estimation.

One of several techniques given in the book focuses on providing three estimation points for every line item.

  1. Best Case: If everything goes well, nobody gets sick, the sun shines on your face, how quickly could you get this feature complete?
  2. Worst Case: If your dog dies, your significant other leaves you, and your brain turns to mush, what is the absolute longest time it would take to get this done? In other words, there is no way on Earth it would take longer than this time, unless you were shot.
  3. Nominal Case: This is your best guess, based on your years of experience with building this type of widget. How long do you really think it will take?

The hope is that when development is complete, you’ll find that the actual time spent is between your best case and worst case. McConnell provides a quiz you can try out to discover that this is harder than it sounds.

Over time, as you reconcile your actual times against your past estimates, you’ll be able to figure out what I call your estimation batting average, a number that represents how accurate your estimates tend to be.
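The post doesn’t pin down exactly how that number is computed. One plausible reading, sketched below in Python, is the fraction of past items whose actual time landed between the best case and the worst case. The `PastItem` type and the sample history are invented for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PastItem:
    best: float    # best-case estimate, in days
    worst: float   # worst-case estimate, in days
    actual: float  # time the item really took, in days


def batting_average(history: List[PastItem]) -> float:
    """Fraction of past items whose actual time fell within [best, worst]."""
    if not history:
        raise ValueError("need at least one completed item")
    hits = sum(1 for item in history if item.best <= item.actual <= item.worst)
    return hits / len(history)


history = [
    PastItem(best=2, worst=5, actual=4),  # hit
    PastItem(best=1, worst=3, actual=6),  # miss: took longer than worst case
    PastItem(best=3, worst=8, actual=7),  # hit
    PastItem(best=2, worst=4, actual=5),  # miss
]
print(batting_average(history))  # 0.5
```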

Once you have these three points for a given estimate, you can apply some formulas and your estimation batting average to create a probability distribution of when you might complete the project. Here is a simple example of what that might look like (though in real life there may be more point values).

  • 20% 50 developer days
  • 50% 70 developer days
  • 80% 90 developer days

So the numbers above show that there’s only a 20% chance the project will be complete within 50 developer days and an 80% chance of completion if the development team is given 90 developer days.
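The book’s own formulas aren’t reproduced here, so the sketch below uses the standard PERT approximation as a stand-in: expected time is (best + 4 × nominal + worst) / 6 and standard deviation is (worst − best) / 6. It sums the line items (assuming they are independent) and reads percentiles off a normal distribution. The line items are invented, and the fixed divisor of 6 is an assumption; a fuller version would widen the spread for an estimator with a poor batting average.

```python
from math import sqrt
from statistics import NormalDist


def pert(best, nominal, worst):
    """Standard PERT approximation for one line item (not the book's exact formulas)."""
    expected = (best + 4 * nominal + worst) / 6
    std_dev = (worst - best) / 6
    return expected, std_dev


def project_distribution(items, confidences=(0.2, 0.5, 0.8)):
    """Map each confidence level to the number of days needed to reach it."""
    total_expected = 0.0
    total_variance = 0.0
    for best, nominal, worst in items:
        expected, std_dev = pert(best, nominal, worst)
        total_expected += expected
        total_variance += std_dev ** 2  # variances add for independent items
    dist = NormalDist(mu=total_expected, sigma=sqrt(total_variance))
    return {c: dist.inv_cdf(c) for c in confidences}


# Hypothetical line items: (best, nominal, worst) in developer days.
items = [(3, 5, 10), (2, 4, 9), (5, 8, 20)]
for confidence, days in project_distribution(items).items():
    print(f"{confidence:.0%}: {days:.1f} developer days")
```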

This technique showcases the uncertainty involved in creating estimates and focuses on the probability that estimates really represent.

After reading this book, I fired up Excel and built a nice spreadsheet with the formulas in the book and columns for these three estimation points. Now I can simply enter my line items, plug in my best, worst, and nominal cases, and out pops a probability distribution of when the project will be complete.

However, as I mentioned before, the crux of this technique relies on that estimation batting average. But when you’re just starting out, you have no idea what that average is, so you have to pull it out of the air (I recommend pulling conservatively).

The reason I bring this all up is that I watched an interesting interview today on the ScobleShow. Robert Scoble interviewed Fog Creek founder and well-known technology blogger, Joel Spolsky.

Joel let it be known that they are building a new scheduling feature for FogBugz 6 that reflects the reality of software estimation better than typical scheduling software.

One key observation he makes is that estimates are far more likely to fall short of the actual time than to exceed it.

For example, it’s quite common to estimate that a feature will take two days, only to have it take four days, or eight days. But it’s rare that the feature actually ends up taking one day. Obviously it’s impossible for that feature to take 0 days or -4 days.

This makes obvious sense when you think about it.

The amount by which you can finish a feature before an estimated time is constrained, but the amount of time that you can overshoot an estimate is boundless.

Yet many software scheduling tools completely ignore this fact, hoping that an underestimation on one item will be offset by an overestimation of another. They assume these over- and underestimates balance out, which they clearly do not.

This new feature will attempt to take that into account as well as your track record for estimates (your batting average if you will), and provide a probability of completion for various dates.
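The interview doesn’t spell out how the FogBugz feature works, so the following is only a sketch of one way to combine a track record with the lopsided nature of overruns: a small Monte Carlo simulation in Python that scales each estimate by a randomly drawn historical actual-to-estimate ratio. The function names, the estimates, and the ratios are all made up for illustration.

```python
import random


def simulate_totals(estimates, past_ratios, trials=10_000, seed=0):
    """Return sorted simulated project totals, in days.

    Each trial scales every estimate by a ratio (actual / estimated)
    drawn at random from the estimator's own history.
    """
    rng = random.Random(seed)
    return sorted(
        sum(estimate * rng.choice(past_ratios) for estimate in estimates)
        for _ in range(trials)
    )


def percentile(sorted_totals, p):
    """Value below which roughly a fraction p of the simulated totals fall."""
    index = min(len(sorted_totals) - 1, int(p * len(sorted_totals)))
    return sorted_totals[index]


# Hypothetical data. The history is lopsided: the best outcome took half
# the estimate, while the worst took four times as long.
estimates = [2, 3, 5, 1, 4]  # days, 15 in total
past_ratios = [0.5, 0.8, 1.0, 1.0, 1.5, 2.0, 4.0]

totals = simulate_totals(estimates, past_ratios)
for p in (0.2, 0.5, 0.8):
    print(f"{p:.0%} chance of finishing within {percentile(totals, p):.1f} days")
```

Since the overruns in the history are larger than the savings, the simulated totals average well above the 15 days you get by simply adding up the estimates.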

Sounds like a brilliant idea! If done well, that would be quite hot and allow me to chuck my hackish Excel spreadsheet.

code, tdd, open source, tech comments edit

Ayende just announced the release of Rhino Mocks 3.0. The downloads are located here. If you aren’t subscribed to Ayende’s blog, I highly recommend it. This guy never sleeps and churns out code like a tornado.

Ever since I discovered mocking frameworks in general, and especially Rhino Mocks, mocking has become an essential part of my unit testing toolkit.

A while ago I wrote a short intro demonstrating how to write unit tests for events defined by an interface. This small example shows the usefulness of something like Rhino Mocks.

If you’re wondering what the difference is between mocks, stubs, and fakes, be sure to read Jeff Atwood’s Taxonomy of Pretend Objects.