Why Is Python Growing So Quickly?

This translation is published with the permission of the original author, David Robinson.

According to Stack Overflow’s recent survey, Python has become the fastest-growing mainstream programming language and the most visited tag on Stack Overflow among visits from high-income countries.

Why is Python growing so fast? Python is used for everything from web development to data science to DevOps, so it’s worth looking closely at where its recent growth has come from. As a data scientist who uses R, I’m particularly interested in how Python is evolving in my field. In this article, I take another look at Stack Overflow data to understand where Python usage is growing and in which companies and organizations it is used most.

The analysis leads to two conclusions. First, Python’s fastest growth is in data science, machine learning, and academic research. This is most evident in the growth of the Pandas package, which is the fastest-growing Python-related tag on the site. Second, in terms of industry, Python is used more heavily in electronics, manufacturing, software, government, and especially universities, although overall its growth has been fairly evenly distributed across industries. Taken together, this suggests that data science and machine learning have become common practice in many different types of companies, and that Python is a widely accepted choice for that work.

We analyzed data from countries that the World Bank classifies as high-income.

Types of Python development

Python is a general-purpose programming language used for many tasks, from web development to data science. So how do we untangle Python’s recent growth across these domains?

As a first step, we can look at the best-known Python packages in each domain and track the growth in visits to the tags that represent them. Compare the web development frameworks Django and Flask with the data science packages NumPy, Matplotlib, and Pandas. (You can also use Stack Overflow Trends to compare question rates, not just views.)

Looking at Stack Overflow visits from high-income countries, it’s clear that Pandas is the fastest-growing Python package: it only appeared in 2011, yet about 1% of Stack Overflow question views are now about it. Traffic to NumPy and Matplotlib has also grown considerably over the same period. In contrast, Django-related traffic has remained fairly flat, and Flask, while growing, still accounts for a relatively small share. This suggests that Python’s growth is driven largely by data science rather than web development.

But that’s not the whole story, because only the most widely used Python-specific packages are shown here. System administrators and DevOps engineers also use Python extensively, and their Python questions involve Linux, Bash, Docker, and so on. Likewise, many Python web development questions don’t mention Django or Flask at all and are instead tagged with “supporting” technologies like JavaScript, HTML, and CSS. But we can’t simply count all Linux, Bash, or JavaScript traffic and assume it is Python-related. Therefore, we only consider such tags when they are visited by Python developers.

We only considered visits in the summer of 2017 (July and August), which removes much of the effect of student traffic and avoids the heavy computation that a longer period would require. We only consider registered users who viewed at least 50 questions on Stack Overflow in that period. We count someone as a Python developer if they meet two requirements: 1) Python is the tag they visit most; 2) at least 20% of the pages they visit are Python-related.
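The two criteria above can be expressed as a small filter. This is a minimal sketch, not the article’s actual pipeline: it assumes a hypothetical input format (one set of tags per question a user viewed) and simplifies “Python-related” to “tagged python”.

```python
from collections import Counter

# Thresholds from the text: registered users with at least 50 question views
# in July-August 2017, at least 20% of them Python-related.
MIN_QUESTION_VIEWS = 50
MIN_PYTHON_SHARE = 0.20


def is_python_developer(viewed_questions: list[set[str]]) -> bool:
    """Apply both criteria to one user's question views.

    `viewed_questions` is a hypothetical input format: one set of tags per
    question the user viewed. "Python-related" is simplified here to
    "tagged python".
    """
    total = len(viewed_questions)
    if total < MIN_QUESTION_VIEWS:
        return False
    # Count how many viewed questions carry each tag.
    tag_counts = Counter(tag for tags in viewed_questions for tag in tags)
    if not tag_counts:
        return False
    python_views = tag_counts["python"]
    # Criterion 1: python is the most visited tag (ties count as a match).
    is_top_tag = python_views == max(tag_counts.values())
    # Criterion 2: at least 20% of viewed questions are Python-related.
    has_enough_share = python_views / total >= MIN_PYTHON_SHARE
    return is_top_tag and has_enough_share
```

For example, a user with 30 views tagged python and pandas plus 25 views tagged javascript passes both tests, while a user with only 10 views is excluded by the 50-view minimum.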

Which other tags do Python developers tend to visit?

It should come as no surprise that Pandas is the most visited tag among Python developers, apart from Python itself. The second most visited is JavaScript, which represents people who use Python for web development, as does Django, not far below. This confirms that we should look at the growth of the tags Python developers visit, not only at the packages directly associated with Python.

Further down the list you can see additional technology “clusters”. To find them, we look at which tags tend to be visited together by the same Python users. After keeping only the tag pairs with a high Pearson correlation, we get the following network graph. Many other visualizations like this are possible.
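The pairwise filtering step might look like the sketch below. The article does not describe its exact setup, so this assumes each user is represented by the share of their visits on each tag. It also uses a hypothetical correlation cutoff. Each surviving pair becomes an edge in the network graph.

```python
from itertools import combinations
from math import sqrt

# Hypothetical cutoff; the article does not state the threshold it used.
MIN_CORRELATION = 0.3


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # undefined for a constant series; treat as uncorrelated
    return cov / sqrt(var_x * var_y)


def correlated_tag_pairs(
    users: list[dict[str, float]], tags: list[str]
) -> list[tuple[str, str, float]]:
    """Return (tag_a, tag_b, r) for tag pairs whose correlation is at least the cutoff.

    Each element of `users` (a hypothetical input format) maps a tag to that
    user's share of visits on it; tags a user never visited count as 0.
    """
    # One column of per-user shares for each tag.
    columns = {tag: [u.get(tag, 0.0) for u in users] for tag in tags}
    edges = []
    for a, b in combinations(tags, 2):
        r = pearson(columns[a], columns[b])
        if r >= MIN_CORRELATION:
            edges.append((a, b, r))
    return edges
```

With a few users who visit Pandas and NumPy together but Django separately, only the Pandas and NumPy pair survives. That is the kind of link that forms the data science cluster.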

As the figure shows, a few large clusters of technologies roughly describe the kinds of problems Python is used to solve. The top-middle section shows the data science and machine learning cluster: Pandas, NumPy, and Matplotlib in the middle are closely tied to R, Keras, and TensorFlow. The cluster at the bottom represents web development, linked to JavaScript, HTML, CSS, Django, Flask, and jQuery. There are also two smaller clusters: one for system administration and DevOps, and one on the right for data engineering (Spark, Hadoop, and Scala).

Growth by subject

We’ve seen that Python-related Stack Overflow visits can be roughly divided into several topics. Next, we can analyze which of these topics are driving the large increase in Python traffic on Stack Overflow.

Suppose that when we look at a user’s browsing history, we find that Python is their most visited tag. How can we tell whether they are a web developer, a data scientist, a system administrator, or something else? We look at their second most visited tag, then the third, and so on, following their list of visits until we find a tag belonging to one of the clusters shown above.

This gives us a simple way to assign each user to a topic, based on nine of the most frequently visited tags:

  • Data scientists: Pandas, NumPy, or Matplotlib;
  • Web developers: JavaScript, Django, or HTML;
  • System administrators or DevOps engineers: Linux, Bash, or Windows;
  • Others: users not matched by any of the nine tags above (every other tag accounts for less than 5% of the traffic).
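The walk down a user’s tag list described above can be sketched as follows. This is an illustration, not the article’s code: the input format (tag name to visit count) is hypothetical. Breaking ties alphabetically is also an assumption, added only to make the result deterministic.

```python
# Category tags as listed above; users matching none of them fall into "Others".
CATEGORY_TAGS = {
    "Data scientist": {"pandas", "numpy", "matplotlib"},
    "Web developer": {"javascript", "django", "html"},
    "Sysadmin/DevOps": {"linux", "bash", "windows"},
}


def categorize(tag_visits: dict[str, int]) -> str:
    """Walk a user's tags from most to least visited; return the first category hit.

    `tag_visits` (a hypothetical input format) maps tag name -> number of visits.
    """
    # Most visited first; ties broken alphabetically (an assumption).
    for tag, _ in sorted(tag_visits.items(), key=lambda kv: (-kv[1], kv[0])):
        if tag == "python":
            continue  # Python itself is the user's top tag, so skip it
        for category, tags in CATEGORY_TAGS.items():
            if tag in tags:
                return category
    return "Others"
```

A user whose tags are python, regex, django, and pandas, in that order of visits, is classified as a web developer. The first category tag after skipping python and the unmatched regex is django.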

This approach is not fully rigorous, but it lets us quickly assess how much each type of Python usage contributes to growth. We also tried a more rigorous method, latent Dirichlet allocation (LDA), and got similar results.

Which kinds of Python developers are becoming more common? Note that we are categorizing users, not individual question views, and that each group is shown relative to all registered users on Stack Overflow (including those who never visit Python questions).

The chart above shows that Python traffic from web development and system administration users has grown relatively slowly and steadily over the past three years, whereas Python traffic related to data science has grown quickly. This suggests that Python’s widespread use in data science and machine learning is a major driver of its rapid growth.

We also compared the number of visits each tag received from Python developers in 2016 and 2017 to measure the growth of individual tags. This matters because, for example, JavaScript traffic could be relatively flat overall while its share of Python developers’ visits has actually declined. Once we have these tag-level growth rates, we can overlay them on the network graph to see which topics are growing and which are shrinking.
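The difference between raw visit counts and shares can be made concrete with a short sketch. The input format is hypothetical. The numbers in the example that follows are illustrative, not data from the analysis. The point is that a tag with a flat visit count can still lose share when other tags grow.

```python
def share_change(visits_2016: dict[str, int], visits_2017: dict[str, int]) -> dict[str, float]:
    """Change in each tag's share of Python developers' visits, 2016 -> 2017.

    Inputs (hypothetical format) map tag -> visits by Python developers in
    that year. Returns the change in share, in percentage points, for every
    tag seen in either year.
    """
    total_2016 = sum(visits_2016.values())
    total_2017 = sum(visits_2017.values())
    tags = visits_2016.keys() | visits_2017.keys()
    return {
        tag: 100 * (visits_2017.get(tag, 0) / total_2017 - visits_2016.get(tag, 0) / total_2016)
        for tag in tags
    }
```

Consider 300 JavaScript visits in both years while Pandas doubles from 200 to 400, with 500 other visits each year. JavaScript’s share then falls by about 5 points (from 30% to 25%), even though its visit count is unchanged.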

This confirms our suspicion that the vast majority of Python-related growth comes from data science and machine learning. Those clusters are shaded toward orange, indicating that their tags are becoming a larger part of the Python ecosystem.

Industry

Another way to understand the growth in Python usage is to look at what types of companies the traffic comes from. This perspective differs from classifying developers by what they browse, because a single company, such as a retailer or a media company, may employ both data scientists and web developers.

We focus on two countries where Python growth is especially high: the United States and the United Kingdom. In both, we can break down traffic by industry (as in an earlier comparison of AWS and Azure).

The academic world, meaning colleges and universities, tops the list. Is that because undergraduates now learn Python in their programming classes?

Possibly, but that’s not the whole explanation. As we noted in a previous article, Python traffic from colleges and universities stays high in the summer, not just during the spring and fall semesters. For example, Python and Java both receive heavy traffic from colleges and universities, but their seasonal patterns differ.

In percentage terms, Java’s share of university traffic drops off a cliff every summer, because Java is widely taught in college classrooms. Python, by contrast, keeps a high share of traffic through the summer. This suggests that much of the Python traffic from universities comes from academic researchers, who keep working year-round. That in turn supports the view that Python’s growth has come primarily from scientific computing and data analysis.
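One simple way to quantify this seasonal difference is to compare a tag’s average summer share of university traffic with its average share in the rest of the year. This is a sketch under assumptions. The article does not say it used this ratio. The input format is hypothetical, and the numbers in the example are illustrative only.

```python
from statistics import mean

SUMMER_MONTHS = {7, 8}  # July and August, the same window used earlier


def summer_ratio(monthly_share: dict[int, float]) -> float:
    """Average summer share divided by average share in the other months.

    `monthly_share` (hypothetical input) maps month number 1-12 to a tag's
    percentage of university traffic. A ratio well below 1 signals a summer drop.
    """
    summer = [s for m, s in monthly_share.items() if m in SUMMER_MONTHS]
    rest = [s for m, s in monthly_share.items() if m not in SUMMER_MONTHS]
    return mean(summer) / mean(rest)
```

Suppose a tag’s share halves in July and August. That tag gets a ratio of 0.5, the pattern described for Java. A tag with a flat share, as described for Python, gets a ratio near 1.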

Python is widely used and growing rapidly in government, as well as in electronics and manufacturing. I’m not familiar enough with these industries to explain why, and would be interested to learn. Python is less widely used in retail and insurance, where some surveys suggest Java is still dominant.

The main purpose of this article is to investigate the reasons for Python’s growth. Is there a particular industry in which Python traffic has increased significantly?

At least according to U.S. and U.K. data, Python has spread across many industries in the past year. In every industry, Python’s share of traffic rose by two to three percentage points in absolute terms. (Note that this implies even greater relative growth in industries such as insurance and retail, where Python was less widely used to begin with.)
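The note about relative growth is just arithmetic: the same absolute rise is a much larger relative rise when the starting share is small. The shares in this sketch are illustrative numbers, not figures from the analysis.

```python
def growth(share_before: float, share_after: float) -> tuple[float, float]:
    """Return (absolute change in percentage points, relative change in %).

    Both inputs are traffic shares expressed in percent.
    """
    points = share_after - share_before
    relative = 100 * points / share_before
    return points, relative
```

A rise from 10% to 12.5% is 2.5 points, or 25% relative growth. The same 2.5-point rise from a 2% starting share is 125% relative growth.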

In 2017 data so far, Java is still the most visited tag in most industries, but Python continues to gain ground. For example, in the financial industry (a big contributor to Stack Overflow traffic), the Python tag moved up from fourth place in 2016 to second place in 2017.

Conclusion

As a data scientist who used to work in Python and now uses R, should I switch back to Python after seeing this analysis?

I don’t think so. For one thing, R is also growing well: an earlier article showed it second only to Python among the fastest-growing programming languages. For another, I like using R for data analysis, and that has little to do with how widely it is used. I also plan to write another article about my experience switching from Python to R, which features I like about each language, and why I don’t plan to switch back.

Either way, data science is an exciting and fast-growing field with room for multiple languages. My main goal is to encourage new developers to consider building their skills in data science. There is no doubt that it is the fastest-growing part of software development, and it is widely practiced across many industries.

Thanks to Guo Lei for correcting this article.