The Top 100 Java Libraries in 2017, Based on 259,885 Source Files. Original author: Henn Idan

A year has passed, yet it feels as if we analyzed the top Java libraries of 2016 on GitHub only yesterday. This year, we used Google’s BigQuery for data retrieval to get more accurate results.

BigQuery is Google’s fully managed, petabyte-scale, low-cost enterprise data warehouse for analytics. The service lets developers run SQL queries against very large datasets on Google’s infrastructure. BigQuery can scan 1 terabyte of data in seconds and 1 petabyte in minutes.

First, we pulled the top 1,000 Java repositories from GitHub, ranked by number of stars, and then filtered out Android projects, leaving 477 pure Java projects.

Our analysis is based on these 477 pure Java projects. We counted all library imports after de-duplication. A more detailed description of the method is provided at the end of the article.

Without further ado, let’s take a look at the most popular Java libraries of 2017. Who holds the number one spot this year?

Top 20 Most popular Java class libraries

The number one library, as it was last year, is still JUnit. JUnit Runner, which builds on it, takes second place, and even the older junit.framework comes in third. In other words, JUnit holds all of the top three spots.

Mockito, the open source mock testing framework, ranks fourth.

Mockito is a powerful mocking framework for Java that lets you create and configure mock objects, which simplifies testing classes that have external dependencies.

SLF4J, the logging facade for Java, ranks fifth. This is a sign that today’s developers love logging. It also shows that Java developers make little use of the java.util.logging library. We’ve also looked at some of the logging habits and preferences of Java developers and collected them in an eBook.

The rise of the Hamcrest library shows that developers really need a better testing environment.

Hamcrest is a framework that assists in writing software tests in Java. It supports the creation of custom assertion matchers (the name “Hamcrest” is an anagram of “matchers”), so that matching rules can be defined declaratively. These matchers are useful in unit testing frameworks such as JUnit and jMock.

Analyzing the top libraries, we found that testing is very important for writing better code. Production problems are the last thing developers want, so we do everything we can to avoid them.

Google’s Guava library ranks seventh. The most popular JSON class library is Jackson. At number 20 on the list is a new class library: org.w3c.dom. It provides a series of interfaces for manipulating the DOM.

Other libraries that deserve our attention

Looking at the top 100, we found that Spring did very well. The following eight libraries make the top 100:

#57 - org.springframework.beans.factory.annotation
#60 - org.springframework.context
#65 - org.springframework.context.annotation
#66 - org.springframework.stereotype
#68 - org.springframework.util
#81 - org.springframework.test.context.junit4
#85 - org.springframework.beans.factory
#91 - org.springframework.web.bind.annotation

In addition to Spring, Apache libraries are widely used:

#16 - org.apache.commons.io
#22 - org.apache.http
#24 - org.apache.commons.lang
#25 - org.apache.http.impl.client
#30 - org.apache.http.client
#33 - org.apache.http.client.methods
#34 - org.apache.log4j
#35 - org.apache.commons.codec.binary
#45 - org.apache.commons.lang3
#53 - org.apache.http.entity
#61 - org.apache.http.util
#64 - org.apache.commons.logging
org.apache.http.message
#88 - org.apache.zookeeper
#95 - org.apache.hadoop.conf
#98 - org.apache.http.client.config
#100 - org.apache.http.client.utils

I’m glad to see the Apache libraries doing so well. I’m a big fan of not reinventing the wheel, and many of the methods we need in everyday development, such as processing IO streams or collections, are already well implemented in Apache’s libraries.

AssertJ, which provides fluent assertions for Java, improved significantly on last year’s ranking and climbed to number 50 this year.

We also found javax.script and org.apache.http.client.utils in the list.

The scripting API is used by application programmers who want to execute programs written in a scripting language from their Java applications.

The Top 100 Java libraries in 2017

Analysis method

As mentioned above, we used Google’s BigQuery to process the data this year. We pulled the top 1,000 repositories through the GitHub API. After filtering out Android, Arduino, and deprecated repositories, we were left with 259,885 Java source files. After de-duplicating the libraries used within the same repository, 25,788 libraries remained.

How do we actually do that?

First, we create a table to store the top-starred repositories, named java_top_repos_filtered:

SELECT
  full_name
FROM
  java_top_repos_1000
WHERE
  NOT ((LOWER(full_name) CONTAINS 'android') OR
       (LOWER(full_name) CONTAINS 'arduino')) AND
  ((description IS NULL) OR
   (NOT ((LOWER(description) CONTAINS 'android') OR
         (LOWER(description) CONTAINS 'arduino') OR
         (LOWER(description) CONTAINS 'deprecated'))));
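To make the filter easier to reason about, here is a minimal Python sketch of the same predicate. It is not part of the original analysis: the function name keep_repo and the sample repositories are hypothetical, and it only assumes that each repository has a full_name and an optional description, as in the query.

```python
from typing import Optional

# Terms checked by the query; 'deprecated' is only checked in the description.
EXCLUDED_NAME_TERMS = ("android", "arduino")
EXCLUDED_DESCRIPTION_TERMS = ("android", "arduino", "deprecated")


def keep_repo(full_name: str, description: Optional[str]) -> bool:
    """Return True if the repository would pass the WHERE clause."""
    name = full_name.lower()
    if any(term in name for term in EXCLUDED_NAME_TERMS):
        return False
    if description is None:
        # Corresponds to "description IS NULL" in the query.
        return True
    text = description.lower()
    return not any(term in text for term in EXCLUDED_DESCRIPTION_TERMS)


# Hypothetical sample rows, not taken from the real data set.
repos = [
    ("example/json-parser", "A fast JSON parser"),
    ("example/android-widgets", "UI widgets"),
    ("example/old-http", "DEPRECATED: use new-http instead"),
    ("example/no-description", None),
]
kept = [name for name, description in repos if keep_repo(name, description)]
print(kept)
```

Note that the match is a plain case-insensitive substring test, so a repository that merely mentions Android in its description is dropped as well.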

Now that we have the names of the top-ranked repositories, we pull the contents of all their Java files:

SELECT
  repo_name,
  content
FROM
  [bigquery-public-data:github_repos.contents] AS contents
INNER JOIN
(
  SELECT
    id,
    repo_name
  FROM
    [bigquery-public-data:github_repos.files] AS files
  INNER JOIN
    java_top_repos_filtered AS top_repos
  ON
    files.repo_name = top_repos.full_name
  WHERE
    path LIKE '%.java'
) AS files_filtered
ON
  contents.id = files_filtered.id;

Now that we have the source code for each project, we extract the import statements, reduce each one to its package name, and count each package once per repository.

SELECT
  package,
  COUNT(*) count
FROM
( //extract package name (exclude last point of data) and group with repo name (to count each package once per repo)
  SELECT
    REGEXP_EXTRACT(import_line, r' ([a-z0-9\._]*)\.') package,
    repo_name
  FROM
  ( //extract only 'import' code lines from *.java files
    SELECT
      SPLIT(content, '\n') import_line,
      repo_name
    FROM
      java_relevant_data
    HAVING
      LEFT(import_line, 6) = 'import'
  )
  GROUP BY
    package,
    repo_name
)
GROUP BY
  package
ORDER BY
  count DESC;
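The counting logic of this query can be tried locally on a few files. The following Python sketch is an approximation under stated assumptions rather than the code used for the article: count_packages and the sample files are hypothetical, and import lines where the pattern does not match are skipped instead of producing a NULL package.

```python
import re
from collections import Counter
from typing import Iterable, Tuple

# Same pattern as the REGEXP_EXTRACT call: a space, then the dotted package
# name, stopping before the last dot (which precedes the class name or '*').
PACKAGE_PATTERN = re.compile(r" ([a-z0-9._]*)\.")


def count_packages(files: Iterable[Tuple[str, str]]) -> Counter:
    """Count in how many repositories each imported package appears.

    files yields (repo_name, file_content) pairs.
    """
    seen = set()  # (package, repo_name) pairs, so each repo counts once
    for repo_name, content in files:
        for line in content.split("\n"):
            # Mirrors LEFT(import_line, 6) = 'import'
            if not line.startswith("import"):
                continue
            match = PACKAGE_PATTERN.search(line)
            if match:
                seen.add((match.group(1), repo_name))
    return Counter(package for package, _ in seen)


# Hypothetical sample files, not taken from the real data set.
sample_files = [
    ("example/repo-a", "package a;\nimport org.junit.Test;\nimport org.junit.Assert;\n"),
    ("example/repo-a", "import static org.junit.Assert.assertEquals;\nimport java.util.List;\n"),
    ("example/repo-b", "import org.junit.Test;\nimport org.slf4j.Logger;\n"),
]
counts = count_packages(sample_files)
print(counts.most_common())
```

Collecting (package, repo_name) pairs in a set plays the role of the inner GROUP BY: org.junit is imported three times in repo-a but contributes only one to its count.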

Finally, we filter again to make sure the results contain no Android, Arduino, or deprecated packages, and none of the packages that ship with Java itself.

SELECT
  *
FROM
  java_top_package_count
WHERE
  NOT ((LEFT(package, 5) = 'java.') OR
       (LOWER(package) CONTAINS 'android'))
ORDER BY
  count DESC;
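As an illustration of what this last filter keeps and drops, here is a short Python sketch. The function filter_and_rank, its limit parameter, and the sample counts are hypothetical; in particular, the query itself has no LIMIT, so cutting the list at 100 is an assumption about how the final top 100 is taken.

```python
from typing import Dict, List, Tuple


def filter_and_rank(package_counts: Dict[str, int], limit: int = 100) -> List[Tuple[str, int]]:
    """Drop 'java.' and Android packages, then sort by count, highest first."""
    kept = [
        (package, count)
        for package, count in package_counts.items()
        # Mirrors LEFT(package, 5) = 'java.' and LOWER(package) CONTAINS 'android'
        if not package.startswith("java.") and "android" not in package.lower()
    ]
    kept.sort(key=lambda item: item[1], reverse=True)
    return kept[:limit]  # the cut-off is an assumption, see above


# Hypothetical counts, not the real results.
sample_counts = {
    "org.junit": 400,
    "java.util": 470,
    "android.os": 50,
    "javax.script": 12,
    "org.slf4j": 300,
}
ranking = filter_and_rank(sample_counts)
print(ranking)
```

Because the prefix test is 'java.' with the trailing dot, javax packages such as javax.script pass the filter, which is consistent with their appearing in the list.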

At this point, you have a list of the Top 100 Java libraries of 2017.

One last thought

One main takeaway: the libraries that were popular in 2016 remained popular in 2017. This shows that the developers, teams, or companies behind these libraries are working hard to make them better.

It also means that if you’re about to start a new Java project, or are looking for libraries for your day-to-day development, our spreadsheet can offer some good suggestions. These top-ranked libraries are a good choice.