When searching for source code in a particular language, what do the words being searched on tell you about that language?

Koders.com publishes an interesting Open Source Zeitgeist which focuses on search trends and patterns within open source code. This is very similar to Google’s Zeitgeist, but grouped by programming language and specific to open source code. This might help us gain some insight into answering the above question.

For example, compare this screenshot of the top Java, C#, Ruby, and PHP searches.

Top Java Searches: 1. md5, 2. swing, 3. java
Top C# Searches: 1. system, 2. dataset, 3. openforecast
Top Ruby Searches: 1. proxy, 2. file, 3. socket
Top PHP Searches: 1. None, 2. excel, 3. mail

It’s hard to draw any firm conclusions based on this sample, but let me offer a few uninformed thoughts, and you can tell me how off-base I am.

Someone suggested that you sort of get a sense of the maturity of a language by the terms being searched. I can kind of see that if I define maturity in this case to mean how well the general developer community within this language understands the features of the particular language.

The idea is that if a language has been around for a long time, there might not be as many searches on basic language features and more searches that appear to be task focused, or at least on esoteric features of the language. I admit, I’m not exactly convinced. Is this true? Let’s take a look.
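To make that idea a bit more concrete, here is a rough Python sketch of how one might score it: take the top searches per language and compute what share of them are "basic" terms. The search terms are the top-three lists shown above. Everything else is my own assumption for illustration: the `BASIC_TERMS` set is hand-picked from the terms I call basic constructs in this post, and `basic_share` and `rank_by_basic_share` are hypothetical helper names, not anything Koders.com provides.

```python
# Top-three search terms per language, copied from the Zeitgeist listing above.
TOP_SEARCHES = {
    "Java": ["md5", "swing", "java"],
    "C#": ["system", "dataset", "openforecast"],
    "Ruby": ["proxy", "file", "socket"],
    "PHP": ["none", "excel", "mail"],
}

# ASSUMPTION: a hand-picked, debatable set of "basic construct" terms.
# Changing this set changes every score below.
BASIC_TERMS = {"proxy", "file", "socket", "thread", "string"}


def basic_share(terms, basic=BASIC_TERMS):
    """Return the fraction of search terms that fall in the 'basic' set."""
    if not terms:
        return 0.0
    hits = sum(1 for term in terms if term.lower() in basic)
    return hits / len(terms)


def rank_by_basic_share(searches):
    """Return (language, share) pairs, highest share of basic terms first."""
    scored = [(lang, basic_share(terms)) for lang, terms in searches.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    for lang, share in rank_by_basic_share(TOP_SEARCHES):
        print(f"{lang}: {share:.2f}")
```

With only three terms per language and a category list I made up, the scores say more about my choice of "basic" terms than about the languages, which is rather the point of my skepticism.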

Take Ruby for example. Even though it’s been around as long as Java, it is only recently (past few years) that it has had a huge surge in popularity. Thus, many of the top search terms seem focused on programming constructs such as proxy, file, socket, and thread. This might reflect the large number of people just learning their way around the language.

Then again, Ruby developers are also searching on terms such as rails, controller, activerecord. These are mature software development concepts.

With Java, which is arguably more mature and has a much larger community, the top terms are slightly more esoteric (md5, swing, tree) or just vain. Java developers search for “java” when searching Java code? How many search results does that produce? However, the terms string and file are also near the top. That makes sense: even though Java is mature, there are still lots of new Java developers.

What’s really interesting to me is the inclusion of “Hibernate” as number 10 in the Java results.

Contrast this with C#. It does not surprise me that dataset is number 2 for C#; it’s the workhorse for the RAD developer. It appears that in pure numbers, the DataSet is winning over OR/M and such. There are no search results for activerecord, NHibernate, Subsonic, OR/M, etc. Whether that is a sad thing or not I leave for a subsequent flame war.

What’s interesting to me is that PHP seems really focused on the domain. Being unfamiliar with PHP, I could totally be wrong, but with search terms like excel, mail, and forum, that’s the impression I get.

It sort of makes sense that an old, established, widely used scripting language would have its basic features already understood. Though I have no idea why the top search term would be None. Are PHP programmers nihilistic?

In any case, many of you are thinking I’m drawing too many conclusions from too little data. You are absolutely correct. This is mere idle speculation already colored by preconceived notions.

However, I do find it interesting to look at these results and ask, what do they say about these languages and their users?