ClickHouse was originally developed to power web analytics at Yandex Metrica, a popular web traffic analytics service that is currently ranked second only to Google Analytics.

In 2008, Metrica engineer Alexey Milovidov was looking for a database that could produce reports on metrics such as daily page views, unique visitors, and bounce rates without having to aggregate the data up front. The idea was to collect a wide range of metrics and let users ask arbitrary questions about them, which is a classic data warehousing problem.
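To make "without aggregating up front" concrete, here is a minimal Python sketch in which the daily metrics are computed directly from raw page-view events at the moment a report is requested, instead of being read from totals maintained in advance. The event layout, the sample data, and the definition of a bounce as a session with exactly one page view are assumptions for illustration; this is not how Metrica or ClickHouse actually implements it.

```python
from collections import defaultdict

# Hypothetical raw page-view log: (date, visitor_id, session_id, url).
# No totals are stored ahead of time; every metric is derived from these rows on demand.
EVENTS = [
    ("2008-05-01", "v1", "s1", "/"),
    ("2008-05-01", "v1", "s1", "/news"),
    ("2008-05-01", "v2", "s2", "/"),
    ("2008-05-02", "v1", "s3", "/"),
    ("2008-05-02", "v3", "s4", "/shop"),
    ("2008-05-02", "v3", "s4", "/cart"),
    ("2008-05-02", "v4", "s5", "/"),
]


def daily_report(events):
    """Return {date: (page_views, unique_visitors, bounce_rate)} computed at query time."""
    views = defaultdict(int)
    visitors = defaultdict(set)
    session_hits = defaultdict(lambda: defaultdict(int))  # date -> session -> page views
    for date, visitor, session, _url in events:
        views[date] += 1
        visitors[date].add(visitor)
        session_hits[date][session] += 1

    report = {}
    for date in sorted(views):
        sessions = session_hits[date]
        # Simplifying assumption: a "bounce" is a session with exactly one page view.
        bounces = sum(1 for hits in sessions.values() if hits == 1)
        report[date] = (views[date], len(visitors[date]), bounces / len(sessions))
    return report


print(daily_report(EVENTS))
```

Because nothing is pre-aggregated, a new question (say, page views per URL) only needs a new function over the same raw rows; the trade-off is that every report scans the raw data, which is why scan speed matters so much.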

However, Alexey could not find an existing system that met Yandex’s requirements, especially support for large data sets, linear scaling, efficiency, and compatibility with SQL tools. In short, he wanted something like MySQL, but built for analytical (OLAP) applications.

So Alexey wrote his own prototype, which began as a prototype for executing GROUP BY queries. It evolved into a complete solution called ClickHouse, short for “Clickstream Data Warehouse”.

Alexey added further features, including SQL support and the MergeTree storage engine. The SQL dialect is superficially similar to that of MySQL, which Metrica also used but which could not handle Metrica’s query workloads without complex pre-aggregation.

By 2011, ClickHouse was running in production at Metrica. Over the next five years, Alexey and a growing team of developers extended ClickHouse to cover new use cases.

By 2016, ClickHouse was Metrica’s core back-end service. It had also become entrenched as a data warehouse across Yandex, expanding into use cases such as service monitoring, network flow logging, and event management.

ClickHouse had evolved from a one-person project into business-critical software maintained by a team of more than a dozen engineers led by Alexey.

By then, ClickHouse was eight years old and ready to become a major open-source project.

What is ClickHouse?

ClickHouse is a column-oriented database: internally, it stores the values of each column together rather than storing each row together. In practice, this makes it well suited to analytical computations over large data sets, because a query only has to read the columns it actually references.
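To illustrate why a columnar layout helps analytical queries, here is a minimal Python sketch (not ClickHouse’s actual storage format) that stores the same hypothetical page-view events once row by row and once column by column, then sums a single field. The four-field record layout and the field names are assumptions for illustration. The row layout has to scan every byte of every record, while the column layout reads only the one column the query needs.

```python
import struct
from array import array

# Hypothetical fixed-width event record: four signed 64-bit integers.
# Field names are illustrative only, not a real ClickHouse schema.
RECORD = struct.Struct("<4q")
FIELDS = ("timestamp", "visitor_id", "page_id", "duration_ms")


def build_row_store(events):
    """Row-oriented layout: all fields of one event are stored next to each other."""
    buf = bytearray()
    for event in events:
        buf += RECORD.pack(*event)
    return bytes(buf)


def build_column_store(events):
    """Column-oriented layout: all values of one field are stored next to each other."""
    columns = {name: array("q") for name in FIELDS}
    for event in events:
        for name, value in zip(FIELDS, event):
            columns[name].append(value)
    return columns


def sum_duration_rows(row_store):
    """Sum duration_ms by decoding whole records; returns (total, bytes_scanned)."""
    total = 0
    for record in RECORD.iter_unpack(row_store):
        total += record[3]  # only one of the four decoded fields is used
    return total, len(row_store)


def sum_duration_columns(column_store):
    """Sum duration_ms by reading only that column; returns (total, bytes_scanned)."""
    col = column_store["duration_ms"]
    return sum(col), len(col) * col.itemsize


events = [(1_465_000_000 + i, i % 7, i % 3, 100 * (i + 1)) for i in range(1000)]
rows_total, rows_bytes = sum_duration_rows(build_row_store(events))
cols_total, cols_bytes = sum_duration_columns(build_column_store(events))
print(rows_total == cols_total, rows_bytes, cols_bytes)
```

Both layouts return the same total, but the row scan touches 32 bytes per event while the column scan touches 8 (on platforms where a C `long long` is 8 bytes). Real column stores such as ClickHouse build further optimizations, such as per-column compression and vectorized execution, on top of this basic idea; the sketch only shows the reduction in data scanned.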

It also works well as a substitute for a time series database, even though it is not technically one. For example, one user reported a significant performance improvement after migrating data from InfluxDB to ClickHouse.

www.jdon.com/56312