R is still popular with PhDs, but Python is the king of the moment.
Matt Asay (@MongoDB) November 25, 2013
R is clearly the language of choice for data scientists, but Python is steadily taking ground from it.
There are many reasons for this change, but perhaps the biggest one is that Python is more versatile and simpler than the complex programming environment of R, which is difficult to master.
In a world increasingly dependent on data, simplicity is sure to win out.
R: Not really a programming language
Part of the reason R is so hard to learn is that it is not really a programming language. As John Cook points out, R is better understood as an interactive environment for statistics. He suggests thinking of R as something that has programming language characteristics rather than as a programming language in its own right.
R also looks very little like a traditional programming language, which makes it hard for would-be R developers to master.
For analysts coming from tools like SAS and SPSS, however, R reduces complexity, as Bob Muenchen points out, because it integrates the macro and matrix languages that in tools such as SPSS have to be learned separately. Analysts who expect R to behave like Stata, on the other hand, will be disappointed.
To sum up, R… is different, and that makes things harder…
Python: Lowers the bar for data science
Python, by contrast, is very easy to learn. For one thing, most developers already know Python and use it for many kinds of programs. Unlike R, which is confined to data analysis, Python is a language a developer may first meet while scripting her website or some other program.
As companies struggle to put their data to work, they also go to great lengths to find qualified data scientists. Often, however, the people they need already work for them and are likely to know Python. Because sound analytics depends on understanding the company's own data, it is far more effective to train existing technical staff in big data techniques than to teach newly hired data scientists the company's complex data, a point that Svetlana Sicular of Gartner has also made.
One Python to rule them all
Setting aside the existing Python talent pool, one of the biggest benefits of Python is the efficiency gained by using the same programming language across different tasks. Tal Yarkoni, a researcher at the University of Texas at Austin, explains:
It turns out that doing all of your development and analysis in one language brings substantial benefits. For one thing, when you can do everything in a single language, you don't have to keep reminding yourself that Ruby uses blocks instead of comprehensions, or that in Python you call len(array) rather than array.length to get the length of an array. You also never have to worry about the interfaces between the different languages in your project. Nothing is more annoying than parsing some text data in Python, finally getting it into the format you want, and then realizing you have to write it to disk in a different format so that you can hand it over to R or Matlab for some other analysis. Taken individually, this is not a big problem: writing a CSV or JSON file from Python and reading it into R does not take long. But it adds up, and none of it would be necessary if only one language were used.
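A minimal sketch of the hand-off described above, using only the Python standard library. The sample text, the `parse_records` and `to_csv` helpers, and the column names are hypothetical, invented for illustration. The sketch contrasts analyzing parsed data directly in Python with the extra serialization step that passing it to another tool such as R would require.

```python
import csv
import io


def parse_records(raw_text):
    """Parse 'name: value' lines into a list of (name, float) tuples."""
    records = []
    for line in raw_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        name, value = line.split(":", 1)
        records.append((name.strip(), float(value)))
    return records


def to_csv(records):
    """Serialize records to CSV text, the extra step a hand-off would need."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["name", "value"])  # hypothetical column names
    writer.writerows(records)
    return buffer.getvalue()


# Hypothetical sample input.
raw = "alpha: 1.5\nbeta: 2.0\n\ngamma: 4.5\n"
records = parse_records(raw)

# Python uses the built-in len() rather than an array.length attribute.
count = len(records)

# Staying in one language: analyze the in-memory records directly.
mean_value = sum(value for _, value in records) / count

# Crossing languages: the same data must first be written to an interchange format.
csv_text = to_csv(records)
```

In a real project the CSV text would be written to disk and read back on the other side, which is the round trip the passage complains about.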
This is no exaggeration. It is a general truth that a technology wins when it earns praise for how well it solves our problems. As David Himrod, Director of Optimization and Analytics at AppNexus, puts it, “The biggest challenge at AppNexus is hiring a diverse workforce within a unified technology space. Python gives employees with diverse backgrounds, especially engineers, mathematicians, and analysts, a common, easy-to-understand language that the company can use to prototype new functionality.”
Mainstream data science using Python
Python still lacks some of R’s richness in data analysis, but the gap is closing fast. Keep in mind that the key to Python’s success is not that it handles arcane methods better than R or any other language, but that it is easy to learn and general-purpose. Data science has moved beyond the realm of the hardcore geek, as was evident at the O’Reilly Strata conference last month. The conference, once the haunt of PhDs, is now dominated by ordinary business analysts and people sent by their companies to make sense of big data.
These new attendees are more likely to use Python than R. Python is relatively easy to use, and they already use it in other projects. Put another way, people prefer tools they already know or can learn easily over powerful but complex ones, and they avoid the complex ones whenever they can.