From the internship, I started to contact ClickHouse, which can be regarded as the cognition of database before opening. “There is something different about the database.” For details, see the clickHouse website. Simply put clickhouse is a place for good design.

clickhouse

The column type storage

The feature of row storage is that a single data record is set together, which is closer to the relational model. The writing efficiency is high, and a complete data record can be obtained quickly when reading, showing the locality of data.

But row storage is not friendly to OLAP analysis queries:

  1. When the query will be the whole of the field query out, and returned to the user is generally a few fields, so actually a large number ofIOThe operation is invalid.
  2. Reading into memory is a block structure. Depending on the computer storage medium, there is actually a lot of invalid data filling the CPU cache line.

Clickhouse stores each field in a single file in storage and uses sparse indexes to greatly speed up queries on wide tables.

Table engine

Rich table engine, give developers more choices and adapt to a variety of scenarios. Here are some of the engines I use a lot:

  • Mysql: Direct mappingmysql tableCan be used by youclickhouseRich functions to operate onmysqlThe data.
  • MergeTree: You could say this isckThe largest branch of the table engine series,Replicated+Support for data copy,Replacing+You can delete duplicated data. (Further details can be discussed in separate articles)
  • Kafka: can be directly used asKafkaThe consumer side, at the same timeMaterialized ViewsData persistence can be achieved.

Of course, in addition to these, the official website there are many engines, see the specific

More

  • Batch insert, it is recommended to write in batches, which is also in line with the original design of these software database. To reduceIOTo improve write performance.
  • Rich functions with rich data types, really pureSQLDevelopment.

The above mentioned ck’s many excellent features, how to best use these features to operate CK in development?

Go exposes the top-level database/ SQL interface, and there are already drivers like Clickhouse-Go that can manipulate CK just like the native API. But not enough:

  1. How to usebatch insertImprove write performance?
  2. Distributed table or local table, which write should be selected?
  3. The amount of returned data is too large, how to usegoGood design returns data?

That’s why we designed CKGroup: to provide a more user-friendly library for ClickHouse.

ckgroup

Ckgroup is a clickhouse-Go wrapper that provides a more user-friendly and convenient API for developers to use.

features

  • pureGolangThe development of
  • The cluster is automatically done during the inserthash Sharding, avoidclickhouseInsertion performance loss and uneven data distribution due to internal fragmentation
  • Insert failure retry mechanism
  • The query result is automatically converted tostruct

The installation

  • Go to 1.13And above, support the three latest versions of Go
  • Clichouse (+ 19.16)
$ go get -u github.com/tal-tech/cds/tools/ckgroup
Copy the code

Rapid experience

Make sure you have docker, docker-compose installed. After downloading ckGroup, you can run./demo.sh directly.

As shown in the figure above, demo.sh does the following:

  • Created toReplicatedMergeTreeA table of engines. The data will be stored as a copy inclickhouseIn the.
  • ExecAuto()Insert will pre-insert each oneshardThis can be seen in the amount of data printed at each node.
  • QueryRow()The query results

use

Ckgroup is a wrapper around Clickhouse-Go. In use, developers only need to import ckGroup, do not need to import other driver, clickhouse can be operated.

In this case, we prepare two statements:

  1. One for inserting tuples (rows)
  2. The other is for queries.
import "github.com/tal-tech/cds/tools/ckgroup"

// Fill the config
var (
	ckgroupConfig = config.Config{
        ShardGroups: []config.ShardGroupConfig{
            {ShardNode: "tcp://localhost:9000", ReplicaNodes: []string{"tcp://localhost:9001"}},
            {ShardNode: "tcp://localhost:9002", ReplicaNodes: []string{"tcp://localhost:9003"}}},})func main(a)  {
    group := ckgroup.MustCKGroup(c)

    // Ready data
    var args [][]interface{}
    for _, item := range generateUsers() {
        args = append(args, []interface{}{item.Id, item.RealName, item.City})
    }

  	// Batch insert, ckgroup will help you to make the shard
    err := group.ExecAuto(`insert into user (id,real_name,city) values (? ,? ,?) `.0, args)
    iferr ! =nil {
        panic(err)	// Just for example purpose
    }
  
  	// Query multi rows of the user in Shanghai
  	datas := make([]user, 0)
    err := group.QueryRows(&datas, `select id,real_name,city from user where city=? `."Shanghai")
    iferr ! =nil {
        panic(err)
    }
    for i := range datas {
        fmt.Println(datas[i])
    }
}

func generateUsers(a) []user {
    var users []user
    for i := 0; i < 10000; i++ {
        item := user{
            Id:       i,
            RealName: fmt.Sprint("real_name_", i),
            City:     "test_city",
        }
        users = append(users, item)
    }
    return users
}
Copy the code

Feel free to contribute your own examples!

TODO

  • Change to an interface implementation to facilitate test mocks
  • Improve the ease of use of inserts
  • streaming
  • .

The project address

ckgroup