Open Source Programming Language Zeitgeist

When searching for source code in a particular language, what do the words being searched on tell you about that language?

Koders.com publishes an interesting Open Source Zeitgeist which focuses on search trends and patterns within open source code. This is very similar to Google’s Zeitgeist, but grouped by programming language and specific to open source code. This might help us gain some insight into answering the above question.

For example, compare this screenshot of the top Java, C#, Ruby, and PHP searches.

Top Java Searches: 1. md5, 2. swing, 3. java
Top C# Searches: 1. system, 2. dataset, 3. openforecast
Top Ruby Searches: 1. proxy, 2. file, 3. socket
Top PHP Searches: 1. None, 2. excel, 3. mail

It’s hard to draw any firm conclusions from this sample, but let me offer a few uninformed thoughts, and you can tell me how off-base I am.

Someone suggested that you can get a sense of the maturity of a language from the terms being searched. I can kind of see that if I define maturity, in this case, to mean how well the general developer community for a language understands its features.

The idea is that if a language has been around for a long time, there might not be as many searches on basic language features and more searches that appear to be task focused, or at least on esoteric features of the language. I admit, I’m not exactly convinced. Is this true? Let’s take a look.

Take Ruby for example. Even though it’s been around as long as Java, it is only recently (past few years) that it has had a huge surge in popularity. Thus, many of the top search terms seem focused on programming constructs such as proxy, file, socket, and thread. This might reflect the large number of people just learning their way around the language.

Then again, Ruby developers are also searching on terms such as rails, controller, activerecord. These are mature software development concepts.

With Java, which is arguably more mature and has a much larger community, the top terms are slightly more esoteric (md5, swing, tree) or just vain. Java developers search for "java" when searching Java code? How many search results does that produce? However, the terms string and file are also near the top. That makes sense: even though Java is mature, there are still lots of new Java developers.

What’s really interesting to me is the inclusion of "Hibernate" as number 10 in the Java results.

Contrast this with C#, where it does not surprise me that dataset is number 2. It’s the workhorse for the RAD developer. It appears that, in pure numbers, the DataSet is winning over OR/M and such. None of the top searches are for activerecord, NHibernate, Subsonic, OR/M, etc. Whether that is a sad thing or not I leave for a subsequent flame war.

What’s interesting to me is that PHP seems really focused on the domain. Being unfamiliar with PHP, I could totally be wrong, but with search terms like excel, mail, and forum, that’s the impression I get.

It sort of makes sense that an old, established, widely used scripting language would have its basic features already understood. Though I have no idea why the top search term would be none. Are PHP programmers nihilistic?

In any case, many of you are thinking I’m drawing too many conclusions from too little data. You are absolutely correct. This is mere idle speculation already colored by preconceived notions.

However, I do find it interesting to look at these results and ask, what do they say about these languages and their users?


What others have said

DotNetKicks.com Mar 22, 2007 11:46 PM
# What does code search tell you about a programming language?
You've been kicked (a good thing) - Trackback from DotNetKicks.com
Csaba Ketszeri Mar 23, 2007 4:06 AM
# re: Open Source Programming Language Zeitgeist
In the PHP searches you can easily spot the features that changed in PHP 5 (OO, XML).
John Mar 23, 2007 5:21 AM
# Don't get it
I don't think the analysis holds much water. There are too many variables to just say which language is more mature and which is not.
Harald Korneliussen Mar 23, 2007 5:29 AM
# re: Open Source Programming Language Zeitgeist
Yes, but why are these terms searched for?

I suspect that a search for "md5" might be an attempt to find the standard way of making an md5 hash with Java's standard libraries. It may be much faster to learn by example than to find it in the javadoc and work out from there how to use it.

Searches for "String" or "java" though, might just be idle curiosity.
Mladen Mihajlovic Mar 23, 2007 6:24 AM
# re: Open Source Programming Language Zeitgeist
My thoughts on your findings:

1. The fact that each list contains the name of the language (Java searches include the term java, Ruby searches the term ruby) is most probably a leftover habit from Google, where you can't filter by language. Probably most people forget and enter the language they are searching for before they realise they can filter by it using another box below the search box.

2. Rails, ActiveRecord and Controller terms are mostly there because Ruby on Rails has become such a buzzword and most people are trying to learn how it works or to use it, and are learning Ruby through Rails instead of the other way around. I doubt they are searching for them because Ruby is such a mature product.

3. None is probably there because people would like to browse the included projects instead of searching for a specific term? I dunno - strange one that.
Ben Mar 23, 2007 7:09 AM
# re: Open Source Programming Language Zeitgeist
One factor that I think may not have been completely represented here is the idea of hype or growth. Even a mature language can undergo growth spurts. For example, if there were a mass case of deadly E. coli at the latest C++ conference, there might be a huge number of developers brushing up on or learning C++ so that they could fill the newly created void.

Ruby is newish, but it is also very hyped right now. If Java were to come out with the next "killer" technology, I am sure you would see the newbie search terms begin to bubble up to the top of the list.
Picacodigos Mar 23, 2007 7:41 AM
# re: Open Source Programming Language Zeitgeist
Also, I don't believe that "Subsonic" or "NHibernate" being rarely searched for means they are not widely used: you search for "subsonic" in Google at first, but once you start using it, you search its own forums when in doubt...

SirMo Mar 23, 2007 7:59 AM
# re: Open Source Programming Language Zeitgeist
PHP: none is there because it takes you to php.net. PHP has by far the best reference website in php.net. This is probably why you don't see many searches on the PHP language constructs. Although I would venture to say that the PHP user base has many novice programmers.

Personally I prefer Python.

So I guess that in order to determine the maturity of a language, we first have to know how well a particular book or popular website for that language addresses all the questions a new programmer may have.
Eddie Velasquez Mar 23, 2007 8:10 AM
# re: Open Source Programming Language Zeitgeist
Well, another factor is that the number of experienced developers that actually use Open Source Zeitgeist could be very limited.
Haacked Mar 23, 2007 9:20 AM
# re: Open Source Programming Language Zeitgeist
@John: I would tend to agree with you. :)

@Harald: Exactly right. We don't have any idea why these words were searched, which would give us more insight.

@Mladen: (nice avatar by the way ;) ) This Zeitgeist is produced by Koders which does allow language filtering. Though it's possible that they were using the "All Languages" filter.
Mladen Mihajlovic Mar 23, 2007 10:56 AM
# re: Open Source Programming Language Zeitgeist
Thanks, it's from one of my all time favourite games: Ultima VI (http://www.abandonia.com/games/en/95/Ultima6FalseProphet.htm) ;)

About the filter: I know about it - but people could miss the filter - so that is my suggestion why each block has a language term as well.

It's all Google's fault ;)
Haacked Mar 23, 2007 11:29 AM
# re: Open Source Programming Language Zeitgeist
@Mladen. My favorite is Ultima IV. But that's the last one I played. ;)

In this case, it's not Google's fault. These results are from Koders.com. So blame them! ;)
orcmid Mar 23, 2007 12:05 PM
# re: Open Source Programming Language Zeitgeist
I was wondering if the prevalence of "java" in Java and "system" in C# code searches is related to searches for class names and include/using statements.

Umm, as in java.security.MessageDigest.getInstance("MD5")
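
For anyone wondering what that looks like in full, here is a minimal sketch of hashing a string with `java.security.MessageDigest`. The class name `Md5Example` and the helper `md5Hex` are made up for illustration, and the choice of UTF-8 is an assumption; only `MessageDigest.getInstance("MD5")` comes from the comment above. Note how both "java" and "md5" show up in code like this, which fits the guess about where those search terms come from.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical example class; not taken from any project indexed by Koders.
class Md5Example {

    // Returns the MD5 digest of the input as a lowercase hex string.
    // Assumes the input should be encoded as UTF-8 before hashing.
    static String md5Hex(String input) {
        MessageDigest digest;
        try {
            digest = MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            // Every standard Java platform is required to provide MD5,
            // so this should not happen in practice.
            throw new IllegalStateException("MD5 not available", e);
        }
        byte[] hash = digest.digest(input.getBytes(StandardCharsets.UTF_8));

        StringBuilder hex = new StringBuilder(hash.length * 2);
        for (byte b : hash) {
            // Mask to an unsigned value so negative bytes format as two hex digits.
            hex.append(String.format("%02x", b & 0xff));
        }
        return hex.toString();
    }

    public static void main(String[] args) {
        System.out.println(md5Hex("hello"));
    }
}
```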
thomash Jul 03, 2008 10:22 PM
# re: Open Source Programming Language Zeitgeist
I believe those top 20 lists are totally inaccurate. It seemed very strange to me that, for example, md5 was much more commonly searched for than, e.g., string. So I did a quick Google Trends query:

www.google.com/trends

The keywords java and string are searched for over 32 times as much as the keywords java and md5. This seems intuitively correct.

I don't know how you got those lists exactly, but they seem way off.
