Wednesday, April 07, 2010

TIOBE Programming Community Index

I've talked with numerous colleagues and read papers about how to introduce programming and computer science to new CS students. We tend to agree on the pedagogic techniques and fundamental principles that should be conveyed, particularly in CS-1 courses. But there continue to be differing views on which language should be used. How do we choose what is best for our students? Again, there is very little agreement here. Some think that we should teach using the languages most widely used in industry. Others want to choose languages at the exact opposite end of the spectrum. If you go to the major job search websites, Java is still the clear winner. Are there other measures?

Introducing the TIOBE Programming Community Index. This group runs a monthly analysis of programming language popularity. (More on this metric shortly.) It seems that as of April 2010, the C programming language is "back at number 1 position!" This ends Java's run at the top, moving it to #2. Moreover, regarding Java, they state,

"So the main reason for C's number 1 position is not C's uprise, but the decline of its competitor Java. Java has a long-term downward trend. It is losing ground to other languages running on the JVM. An example of such a language is JavaFX script that is now approaching the top 20."

Before throwing out your Java books and hopping back on the C bandwagon of the 1980s, let us examine how they create this list. See the index definition. I'll summarize the most essential metric here. They go to all of the popular search engines, enter the query +"<language> programming" for each language, and tally the number of hits. Minor adjustments are made to normalize the numbers, but that pretty much summarizes their statistic.

Let's experiment using Google:
  • Java - about 2,700,000 hits
  • C - about 2,420,000 hits
  • C++ - about 1,530,000 hits
  • PHP - about 1,060,000 hits
  • C# - about 700,000 hits
  • Python - about 559,000 hits
  • Ruby - about 205,000 hits
Let's look at some of the lesser known languages out there, some of which have been around for a LONG time (and some of which should be more widely considered, at least in academia.)
  • Scheme - about 58,300 hits
  • Scala - about 22,800 hits
  • Smalltalk - about 16,300 hits
  • Go - about 59,600 hits (presented in honor of Google. After all, I used their search engine. :-)
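
As a rough sketch of the arithmetic behind a ranking like this, the hit counts above can be turned into percentage shares. This is not TIOBE's actual formula, which combines several search engines and applies its own normalization. Here each count is simply divided by the total for the languages I happened to search for. The function names (build_query, shares, ranking) are made up for illustration, and the query format is assumed to be the language name followed by the word "programming" in quotes.

```python
# Google hit counts quoted in this post (April 2010).
HITS = {
    "Java": 2_700_000,
    "C": 2_420_000,
    "C++": 1_530_000,
    "PHP": 1_060_000,
    "C#": 700_000,
    "Python": 559_000,
    "Ruby": 205_000,
    "Go": 59_600,
    "Scheme": 58_300,
    "Scala": 22_800,
    "Smalltalk": 16_300,
}


def build_query(language):
    """Build the search string for one language (assumed query format)."""
    return f'+"{language} programming"'


def shares(hit_counts):
    """Convert raw hit counts into percentages of the combined total.

    Assumption: a single search engine and only the languages listed,
    so the percentages are not comparable to TIOBE's published ratings.
    """
    total = sum(hit_counts.values())
    return {lang: 100.0 * count / total for lang, count in hit_counts.items()}


def ranking(hit_counts):
    """Return (language, share) pairs sorted from most to fewest hits."""
    return sorted(shares(hit_counts).items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for position, (language, share) in enumerate(ranking(HITS), start=1):
        print(f"{position:2d}. {language:<10} {share:5.1f}%  {build_query(language)}")
```

Because the shares are relative to whatever set of languages goes into the dictionary, adding or removing a language changes every percentage. That is one more reason to treat the absolute numbers with caution and to watch the ordering and the trends instead.
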
I'm always pleased when I can easily replicate published results. But what do these numbers mean? I'm not really sure. It essentially seems like a popularity contest. Do they directly relate to industry? How do they relate to what is taught in academia? And what kind of correlation is there with the length of time a language has been available? That would be interesting to know. I am inclined to believe that there are probably some loose correlations to be found among these and many other variables still hidden. Those other variables could be characterized by sifting through those millions of web page hits. This looks like a neat job for.... Data Mining!!! :-)

Anyway, these numbers should be viewed with a cautious eye and not read into too deeply. The paper states that they try to catch some false positives, but a 10-second glance through the C programming hits shows numerous C++ hits as well, and vice versa. It would be rare these days to find a C-only web page with no mention of C++, or the reverse. Regardless, the trends are interesting to observe.

Meanwhile, let the debate about the language to use for the CS-1 and Data Structures courses continue. I'm sure it'll make for interesting conversations for years to come. As soon as we figure it out, it'll probably need to be re-evaluated. :-)
