The premise

With the development of current services, the amount of data in several business systems began to expand dramatically. Before, I used relational database MySQL to conduct a data warehouse modeling, and found that after the amount of data came up, a large number of JOIN operations were still a little unbearable after improving the configuration of cloud MySQL. In addition, I developed a label service based on relational database design. Daily full label data (unavoidable Cartesian product) single table exceeds 5000W. At present, the user ID-based segmentation with multi-process processing method has temporarily delayed the deterioration of performance, but considering the near future, we still need to build a small data platform. The system of Hadoop is too large, with too many components and high learning costs of hardware and software, which cannot be mastered overnight by all members of a small team. With all these factors in mind, ClickHouse is a dark technology that needs to be invoked to see if it can get out of the way.

Software version

ClickHouse is not covered here, but is fully documented at https://clickhouse.tech. In Windows10, you can install Docker directly and run ClickHouse images directly using hyper-V features. The software required for setting up the development environment is listed below:

software version note
Windows 10 Make sure you use itWindows10And turn it onHyper-VIn order to useDocker
Docker Desktop any DockertheWindowsThe desktop version
ClickHouse Server X 20.3. Direct pulllatestThe mirror image of
ClickHouse Client X 20.3. Direct pulllatestThe mirror image of
Cmder The latest version Optional, used to replace the unusable console that comes with it

In Windows10, you can enable the Hyper-V feature by running the following command: Control Panel -> Programs -> Enable or Disable Windows Features -> Hyper-V

Then the Docker official site https://www.docker.com/get-started child pages can find Docker Desktop download entry:

After installation, Docker Desktop will automatically start with the system, and the software interface is as follows:

Install and use ClickHouse

Note that you need to have a preliminary understanding of ClickHouse’s core directory before starting the container installation.

Mirror pull and core directory

Download the images of ClickHouse Server and ClickHouse Client:

docker pull yandex/clickhouse-server
docker pull yandex/clickhouse-client
Copy the code

After downloading, the following prompt will be displayed:

Check this out with Docker Images:

λ  docker images
REPOSITORY                 TAG                 IMAGE ID            CREATED             SIZE
yandex/clickhouse-server   latest              c85f84ea6550        10 days ago         515MB
yandex/clickhouse-client   latest              f94470cc9cd9        10 days ago         488MB
Copy the code

Both images are actually wrapped in a tiny Ubuntu system, so the container can be used as if it were a Linux system. The core directory portion of ClickHouse Server in the container is as follows:

  • /etc/clickhouse-serverThis is:ClickHouse ServerDefault configuration file directory, including global configurationconfig.xmlAnd user configurationusers.xmlAnd so on.
  • /var/lib/clickhouseThis is:ClickHouse ServerDefault data storage directory.
  • /var/log/clickhouse-serverThis is:ClickHouse ServerDefault log output directory.

In order to facilitate configuration management, data viewing and log searching, the above three directories can be directly mapped to the specific directory of the host computer. The author has done the following mapping in this development machine:

Docker container directory Host directory
/etc/clickhouse-server E:/Docker/images/clickhouse-server/single/conf
/var/lib/clickhouse E:/Docker/images/clickhouse-server/single/data
/var/log/clickhouse-server E:/Docker/images/clickhouse-server/single/log

A few points to note before starting ClickHouse Server:

  • ClickHouse ServerThe service itself depends on three ports, whose default value is9000(TCPProtocol),8123(HTTPAgreement) and9009(Cluster data replication), the mapping to the host should be one-to-one, so you need to ensure that the three ports on the host are not occupied and can be usedDockerThe parameters of the-pSpecifies the port mapping between the container and the host.
  • ClickHouse ServerThe number of file handles in the container system needs to be modifiedulimit nofile, you can useDockerparameter--ulimit nofile=262144:262144Specifies the number of file handles.
  • There’s a technique that you can use. UseDockerthe--rmParameter to create a temporary container, obtained first/etc/clickhouse-serverDirectory configuration file, passDocker cp container directory Host directoryCommand to copy the container configuration file to the host directory. After the container is stopped, it will be deleted directly, so that the host configuration file template can be retained.

Temporary container copy configuration

Docker run –rm -d –name=temp-clickhouse-server Yandex /clickhouse-server After success, copy the container’s config. XML and users. XML files to the host using the following command:

  • docker cp temp-clickhouse-server:/etc/clickhouse-server/config.xml E:/Docker/images/clickhouse-server/single/conf/config.xml
  • docker cp temp-clickhouse-server:/etc/clickhouse-server/users.xml E:/Docker/images/clickhouse-server/single/conf/users.xml

After these two commands are executed, you can see that config.xml and users.xml have been generated in the host’s disk directory. Then you need to do a few configurations:

  • createdefaultPassword of the account.
  • Create a new onerootAccount.
  • Open client listeningHost, avoid later useJDBCClient orClickHouse ClientCannot connect whenClickHouse Server.

Docker exec-it temp-clickhouse-server /bin/bash:

  • PASSWORD=$(base64 < /dev/urandom | head -c8); echo "default"; echo -n "default" | sha256sum | tr -d '-'
  • PASSWORD=$(base64 < /dev/urandom | head -c8); echo "root"; echo -n "root" | sha256sum | tr -d '-'
root@607c5abcc132:/# PASSWORD=$(base64 < /dev/urandom | head -c8); echo "default"; echo -n "default" | sha256sum | tr -d '-'
default
37a8eec1ce19687d132fe29051dca629d164e2c4958ba141d5f4133a33f0688f
root@607c5abcc132:/# PASSWORD=$(base64 < /dev/urandom | head -c8); echo "root"; echo -n "root" | sha256sum | tr -d '-'
root
4813494d137e1631bba301d5acab6e7bb7aa74ce1185d456565ef51d737677b2
Copy the code

This gives you the SHA256 digest of the passwords for the default:default and root:root accounts. Modify the users.xml file on the host:

Then modify the config.xml file on the host:

Finally, the temporary container is stopped and destroyed with the Docker Stop temp-clickhouse-server.

Run the ClickHouse service

Then create and run an instance of the ClickHouse Server container using the following command (make sure config.xml and users.xml already exist) :

Name and container name: docker run -d --name=single-clickhouse-server Port mapping: -p 8123:8123 -p 9000:9000 -p 9009:9009 Number of file handles: - the ulimit nofiles = 262144-262144 data directory mapping: - v E: / Docker/images/clickhouse - server/use/data: / var/lib/clickhouse: rw configuration directory mapping: - v E: / Docker/images/clickhouse - server/use/conf: / etc/clickhouse - server: rw log directory mapping: - v E: / Docker/images/clickhouse - server/use/log: / var/log/clickhouse - server: rw mirror: another dual/clickhouse - serverCopy the code

Docker run -d –name=single-clickhouse-server -p 8123:8123 -p 9000:9000 -p 9009:9009 –ulimit nofile=262144:262144 -v E:/Docker/images/clickhouse-server/single/data:/var/lib/clickhouse:rw -v E:/Docker/images/clickhouse-server/single/conf:/etc/clickhouse-server:rw -v E: / Docker/images/clickhouse – server/use/log: / var/log/clickhouse – server: rw another dual/clickhouse – server.

After executing the command above, the Docker Desktop will have several pop-ups confirming whether to share the host directory. Simply press the Share it button.

Finally, the native command line Client, ClickHouse Client, is used to connect. Use the docker run-it –rm –link single-clickhouse-server:clickhouse-server yandex/ clickhouse-client-uroot –password command Root – host clickhouse – server:

λ docker run-it --rm --link single-clickhouse-server:clickhouse-server yandex/ clickhouse-client-uroot --password root -- Host Clickhouse-server Clickhouse Client Version 20.10.3.30 (Official build). Connecting to Clickhouse-server :9000 AS Connected to ClickHouse server version 20.10.3 revision 54441.f5abc88ff7e4 :) select 1; SELECT 1 ┌ ─ ─ 1 ┐ │ │ 1 └ ─ ─ ─ ┘ 1 rows in the set. The Elapsed: 0.004 SEC.Copy the code

The next time your computer restarts the ClickHouse Server container and the container does not start, simply use the command docker (re)start single-clickhouse-server to pull up the container instance.

Connect to the ClickHouse service using JDBC

ClickHouse currently has three JDBC drivers:

  • clickhouse-jdbc(Official) : The address ishttps://github.com/ClickHouse/clickhouse-jdbc, the current version is based onApache Http ClientThe implementation.
  • ClickHouse-Native-JDBC(Third party) : The address ishttps://github.com/housepower/ClickHouse-Native-JDBCBased on theSocketThe implementation.
  • clickhouse4j(Third party) : The address ishttps://github.com/blynkkk/clickhouse4jLighter than the official driver.

To be honest, it is a little embarrassing that the official driver package does not connect with TCP private protocol stack, but uses HTTP protocol for interaction. I do not know how much performance will be reduced, but based on the thinking of “official is better”, I still choose the official driver package for Demo demonstration. Introducing clickHouse-JDBC dependencies:

<dependency>
    <groupId>ru.yandex.clickhouse</groupId>
    <artifactId>clickhouse-jdbc</artifactId>
    <version>0.2.4</version>
</dependency>
Copy the code

Write a test class:

public class ClickHouseTest {

    @Test
    public void testCh(a) throws Exception {
        ClickHouseProperties props = new ClickHouseProperties();
        props.setUser("root");
        props.setPassword("root");
        // There is a global default database when no database is created
        ClickHouseDataSource dataSource = new ClickHouseDataSource("jdbc:clickhouse://localhost:8123/default", props);
        ClickHouseConnection connection = dataSource.getConnection();
        ClickHouseStatement statement = connection.createStatement();
        // Create a table. The table engine is Memory. This type of table will be deleted automatically after the service restarts
        boolean execute = statement.execute("CREATE TABLE IF NOT EXISTS t_test(id UInt64,name String) ENGINE = Memory");
        if (execute) {
            System.out.println(Table default.t_test created successfully);
        } else {
            System.out.println(Table default.t_test already exists);
        }
        ResultSet rs = statement.executeQuery("SHOW TABLES");
        List<String> tables = Lists.newArrayList();
        while (rs.next()) {
            tables.add(rs.getString(1));
        }
        System.out.println("Table in default database :" + tables);
        PreparedStatement ps = connection.prepareStatement("INSERT INTO t_test(*) VALUES (? ,?) . (? ,?) ");
        ps.setLong(1.1L);
        ps.setString(2."throwable");
        ps.setLong(3.2L);
        ps.setString(4."doge");
        ps.execute();
        statement = connection.createStatement();
        rs = statement.executeQuery("SELECT * FROM t_test");
        while (rs.next()) {
            System.out.println(String.format("Result,id:%s,name:%s", rs.getLong("id"), rs.getString("name"))); }}}Copy the code

The result is as follows:

Table default.t_test already exists # < Default Table in the database :[t_test] Query result,id:1,name:throwable query result, ID :2,name:dogeCopy the code

summary

After the ClickHouse development environment is set up, you will learn the basic syntax of ClickHouse, the features and usage scenarios of the various engines, and clustering (sharding and multiple copies).

References:

  • https://clickhouse.tech

remind

After a direct power outage, I found that the ClickHouse service in Docker was successfully restarted, but the error log output File not found, causing all clients to fail to connect to the service. The initial judgment is that metadata and actual stored data caused by the “power failure” caused by inconsistency. So proposal in a development environment to turn it off before you enter the container calls the service clickhouse – stop server, and then in the host machine call docker stop container name | containers ID stop to shut down, Otherwise, you need to recursively delete all the files in the Store directory in the data directory to restart ClickHouse Server and use it properly (this is a very rude method and has a high chance of directly causing data loss, so be careful).

(C-2-D E-A-20201108)

Personal blog

  • Throwable’s Blog