Abstract

Data storage design is a core part of system design for services with large data volumes and high QPS. Splitting into multiple databases relieves system bottlenecks such as CPU, memory, and disk I/O, while splitting into multiple tables relieves the performance problems caused by one oversized table. So, from a business perspective, what problems does sharding introduce? And what are the index method and the gene method?

Preface

For large-scale data storage, the common horizontal sharding algorithms are:

  • Range
  • Hash

These horizontal sharding algorithms are widely known; the short recap below only sets up what follows, so readers who already know them can skip ahead.

Range

Shard by range based on a Unique Key. Typical partitioning dimensions:

  • Fixed-size ranges, e.g. ten million rows per table: tb1 holds 0-10,000,000, tb2 holds 10,000,000-20,000,000 (see the sketch below).
  • Time-based ranges, e.g. one table per year or per month.
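As a quick illustration (a minimal sketch only; the tb table prefix and the ten-million-row range are assumptions, not the article's exact layout):

import "fmt"

// rangeShard maps a uid to its table using fixed ten-million-row ranges.
// tb1 holds 0-9,999,999, tb2 holds 10,000,000-19,999,999, and so on.
func rangeShard(uid int64) string {
	return fmt.Sprintf("tb%d", uid/10_000_000+1)
}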

Advantages:

  • The routing policy is simple; a shard can be located quickly from the key range.
  • Good scalability; new tables can simply be added as data grows.

Disadvantages:

  • Data distribution can be severely uneven.
  • Hotspot data can overheat a single table.
  • The Unique Key must be monotonically increasing.

Hash

Shard by taking the Unique Key modulo the number of shards, giving uniform shards.
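A minimal sketch of modulo routing, assuming 16 shards named tb_user0..tb_user15 (the names and shard count are illustrative):

import "fmt"

// hashShard maps a uid to its table by taking the uid modulo the shard count.
func hashShard(uid int64) string {
	return fmt.Sprintf("tb_user%d", uid%16)
}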

Advantages:

  • The routing policy is simple; hashing the Unique Key quickly locates a shard.
  • Storage and request volume are evenly distributed, as long as the Unique Key itself is uniform.

Disadvantages:

  • Poor scalability and high cost of capacity expansion.

That was the warm-up. Once the data is sharded, what problems does it bring to the business?

Example

A Passport service is a basic service that almost every company needs.

I thought hard about the user storage design when I was in charge of the Passport service on my previous team.

A Passport service mainly provides account and credential capabilities: login, registration, logout, login-state verification, profile updates, and so on. Its core storage is a user table.

For example:

CREATE TABLE `tb_user0` (
  `id` int(10) NOT NULL AUTO_INCREMENT COMMENT 'table auto-increment ID',
  `uid` bigint(20) NOT NULL DEFAULT '0' COMMENT 'user ID',
  `user_name` varchar(255) NOT NULL DEFAULT '' COMMENT 'username, case-insensitive; stored uppercase and encrypted',
  `pwd` char(40) NOT NULL DEFAULT '' COMMENT 'password',
  `...`
  `update_time` int(10) NOT NULL DEFAULT '0' COMMENT 'last update time',
  `create_time` int(10) NOT NULL DEFAULT '0' COMMENT 'creation time',
  PRIMARY KEY (`id`),
  KEY `idx_uid` (`uid`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COMMENT='User master table';

user_name is the login username, usually a mobile phone number, which must be stored in desensitized (encrypted) form.

OK, so how should this table be designed?

A single table is unreasonable here; given the business scenario, horizontal sharding is a must.

With a huge number of accounts and a heavy request volume, high availability and consistency are required. UID is the Unique Key and is queried at high frequency, so the table is sharded horizontally by UID.

Core foreground (user-facing) operations:

  • Login: implemented with user_name + pwd, accounting for <= 1% of traffic. The UID is the result of the query.
  • Login-state verification: the UID is resolved from the token, the login state is checked, and the user information is fetched; this accounts for >= 99% of traffic. The UID is the input of the query.

Background service (MIS / operations admin console) operations:

  • Multi-dimensional retrieval
  • Batch paging query

The Problem

With UID-based sharding, the problems are:

  1. With UID-based sharding, the foreground login flow has no UID, only a user_name, so it cannot tell which table holds the account. Scan the whole database? The performance would be terrible.

    How do we query efficiently by user_name when the table is sharded by UID?

    This is the subject of this article.

  2. Since the data is sharded, how does the background service perform multi-dimensional retrieval and cross-shard aggregated paging queries? Iterate over the whole database and aggregate in memory? Can MySQL handle that? Even if MySQL can, will background users wait? And even if they are patient, won't the client time out first?

    How do we retrieve and aggregate data efficiently across multiple shards?

    This is the subject of this article.

Solutions

Issue 1: How do we query efficiently by user_name when sharding by UID?

Scheme 1: Index method

A UID can locate a shard; a user_name cannot.

Solution:

  • Create an index table in MySQL that stores the mapping between user_name and UID.
  • During login, query the index table by user_name to get the UID, then route to the corresponding shard to fetch the user data.
  • The index table has few columns, so each row is small; tens of millions of rows in a single table is not a problem.
  • The overall model is: an Index table + sharded Meta (user data) tables.

Problem: each lookup needs an extra DB query, so performance degrades.
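A minimal sketch of the index-method lookup, assuming a MySQL index table idx_user (user_name -> uid), 16 UID-sharded tables tb_user0..tb_user15, and the standard database/sql package; the table names and the User struct are illustrative:

import (
	"database/sql"
	"fmt"
)

type User struct {
	UID      int64
	UserName string
}

func GetUserByName(db *sql.DB, userName string) (*User, error) {
	// Query 1: the index table maps user_name -> uid.
	var uid int64
	if err := db.QueryRow("SELECT uid FROM idx_user WHERE user_name = ?", userName).Scan(&uid); err != nil {
		return nil, err
	}
	// Query 2: route to the shard that owns this uid and read the full row.
	table := fmt.Sprintf("tb_user%d", uid%16)
	query := fmt.Sprintf("SELECT uid, user_name FROM %s WHERE uid = ?", table)
	u := &User{}
	if err := db.QueryRow(query, uid).Scan(&u.UID, &u.UserName); err != nil {
		return nil, err
	}
	return u, nil
}

The downside is visible right in the sketch: every login now costs two round trips to MySQL.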


Scheme 2: Persistent cache mapping

If the extra DB query is unacceptable, persist the mapping in a cache.

Solution:

  • Store the user_name -> UID mapping in a cache (Redis).
  • During login, query the cache by user_name to get the UID and route to the corresponding shard.
  • If the mapping is not in the cache, scan the database for it and persist it in the cache.

Problems:

  • One extra Redis query, which in practice is negligible.
  • Cache breakdown risk: if the mapping is missing from the cache, a burst of concurrent requests all falls through to the database, causing a sudden spike in DB pressure.
  • Cache penetration risk: if the UID exists in neither the cache nor the DB, repeated requests (or a deliberate attack) also put pressure on the database.

In common scenarios, index tables + caching solve most of the problems.
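A minimal cache-aside sketch, assuming the go-redis v9 client; lookupUIDFromDB is a hypothetical stub standing in for the fallback that scans the shards or queries the index table:

import (
	"context"
	"errors"
	"time"

	"github.com/redis/go-redis/v9"
)

// lookupUIDFromDB is a hypothetical stub for the DB fallback.
func lookupUIDFromDB(ctx context.Context, userName string) (string, error) {
	return "", errors.New("not found")
}

func GetUIDByName(ctx context.Context, rdb *redis.Client, userName string) (string, error) {
	key := "uname2uid:" + userName
	uid, err := rdb.Get(ctx, key).Result()
	if err == nil {
		return uid, nil // cache hit
	}
	if !errors.Is(err, redis.Nil) {
		return "", err // a real Redis error, not a miss
	}
	// Cache miss: fall back to the database, then persist the mapping.
	uid, err = lookupUIDFromDB(ctx, userName)
	if err != nil {
		return "", err
	}
	// A TTL (and caching "not found" markers) helps limit breakdown and penetration.
	return uid, rdb.Set(ctx, key, uid, 24*time.Hour).Err()
}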


Scheme 3: Generate the UID from the username

Use the username to generate the UID:

Hash (f(username) = uid)
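A minimal illustration of f(username) = uid, assuming an MD5-based hash (any stable hash would do; this is a sketch, not a production recipe):

import (
	"crypto/md5"
	"encoding/binary"
)

// UIDFromUsername derives a positive 63-bit integer directly from the username.
// Different usernames may collide, which is exactly the risk listed below.
func UIDFromUsername(username string) int64 {
	sum := md5.Sum([]byte(username))
	// Take the first 8 bytes and clear the sign bit so the uid stays positive.
	return int64(binary.BigEndian.Uint64(sum[:8]) &^ (1 << 63))
}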

Problems:

  • Risk of hash collisions, i.e. UID collisions.
  • The username cannot be updated.

Scheme 4: Gene method

Extract a gene from the username and fold it into the UID generation rule.

Solution:

This requires the ability to generate globally unique distributed IDs. The gene method depends on distributed ID generation; a separate article on distributed ID generators will follow. For now, here is a brief introduction to Snowflake, a common distributed ID generation algorithm.

A Snowflake ID is a 64-bit integer with the following structure:

  • 1 bit: unused. The highest bit is the sign bit (1 for negative, 0 for positive); generated IDs should be positive integers, so this bit is fixed at 0.
  • 41 bits: timestamp, recorded at millisecond precision.
  • 10 bits: worker ID, recording the cluster + machine ID.
  • 12 bits: sequence number, distinguishing IDs generated within the same millisecond. The step size is configurable.
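As a quick sketch of how these blocks combine into a single 64-bit integer (the epoch value and the standard 41/10/12 layout are illustrative assumptions):

// snowflakeID packs a millisecond timestamp, worker ID and sequence number
// into 41 + 10 + 12 bits below the unused sign bit.
func snowflakeID(nowMs, workerID, seq int64) int64 {
	const epochMs int64 = 1624258189000 // illustrative custom epoch, in milliseconds
	return ((nowMs - epochMs) << 22) | (workerID << 12) | seq
}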

The so-called gene method extracts a shard gene from the username and merges it into the globally unique ID to produce the final UID.

Straight to the demo source code:

The globally unique ID is generated with the Snowflake algorithm, with a simple adjustment to the number of bits in each block.

Basic configuration:

const (
	GENE_NODEIDID_BITS int64 = 10 // machine node: 10 bits
	GENE_SEQ_BITS      int64 = 13 // sequence number within the same node and second: 13 bits
	GENE_BITS          int64 = 12 // shard gene: 12 bits

	GENE_NODEID_MAX int64 = -1 ^ (-1 << GENE_NODEIDID_BITS) // at most 1024 machine nodes
	GENE_SEQ_MAX    int64 = -1 ^ (-1 << GENE_SEQ_BITS)      // at most 8192 sequence values per second

	// Time gives up bits to make room for the gene. The timestamp is in seconds;
	// 28 bits covers roughly 7-8 years.
	GENE_TIMESTEMP_SHIFT       = GENE_NODEIDID_BITS + GENE_SEQ_BITS + GENE_BITS
	GENE_NODEIDID_SHIFT  int64 = GENE_SEQ_BITS + GENE_BITS
	GENE_SEQ_SHIFT       int64 = GENE_BITS

	// Custom epoch, so the limited timestamp range is not wasted
	GENE_EPOCH int64 = 1624258189

	// Default step size
	GENE_DEFAULT_STEP_LONG = 1
)

In this demo, the size of GENE_BITS should be determined by the number of shards. For example, with 16 shards, 4 gene bits are enough.

The gene ID struct:

type GeneID struct {
	m         sync.Mutex // mutex protecting ID generation
	timestemp int64      // current timestamp
	nodeID    int64      // machine node
	seq       int64      // sequence number
	geneID    int64      // gene
	step      int64      // sequence number increment step
}

Gene extraction from a sample:

// ExtractGene derives the gene ID from a gene sample (e.g. the username)
func ExtractGene(geneSample []byte) int64 {
	gene := md5.Sum(geneSample)
	hashGeneValue := fmt.Sprintf("%x", gene)[29:32]
	geneID, _ := strconv.ParseInt(hashGeneValue, 16, 64)
	return geneID
}
  • Hash the sample with MD5 and format it as a 32-character hex string.
  • Take the last three characters.
  • Parse those three hex characters (000-fff) into an int64.
  • The result is the gene ID.
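The point of the gene is that both lookup paths land on the same shard. A minimal routing sketch, assuming 16 shards and reusing the ExtractGene function and GENE_BITS constant above (the shard count is illustrative):

const SHARD_COUNT int64 = 16 // illustrative shard count

// Route by username: recompute the gene from the username.
func ShardByUsername(username string) int64 {
	return ExtractGene([]byte(username)) % SHARD_COUNT
}

// Route by UID: the gene sits in the lowest GENE_BITS bits of the generated ID.
func ShardByUID(uid int64) int64 {
	gene := uid & (-1 ^ (-1 << GENE_BITS))
	return gene % SHARD_COUNT
}

Given a username, the gene is recomputed; given a UID, the gene is read from its low bits. Both paths return the same shard as long as the UID was generated with that user's gene.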

Complete ID generation based on the Snowflake algorithm:

func (g *GeneID) Generate() int64 {
	g.m.Lock()
	defer g.m.Unlock()

	now := time.Now().UnixNano() / 1e9 // nanoseconds to seconds
	if now == g.timestemp {
		g.seq = g.seq + g.step
		if g.seq > GENE_SEQ_MAX {
			for now <= g.timestemp {
				now = time.Now().UnixNano() / 1e9
			}
			g.seq = 0
		}
	} else {
		g.seq = 0
	}
	g.timestemp = now
	return g.timeBlock() | g.nodeBlock() | g.seqBlock() | g.geneBlock()
}
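The four block helpers called in Generate are not shown in this excerpt. A plausible sketch that is consistent with the shift constants defined earlier (not necessarily the repo's exact code):

func (g *GeneID) timeBlock() int64 {
	return (g.timestemp - GENE_EPOCH) << GENE_TIMESTEMP_SHIFT
}

func (g *GeneID) nodeBlock() int64 {
	return g.nodeID << GENE_NODEIDID_SHIFT
}

func (g *GeneID) seqBlock() int64 {
	return g.seq << GENE_SEQ_SHIFT
}

// The gene occupies the lowest GENE_BITS bits, so no shift is needed.
func (g *GeneID) geneBlock() int64 {
	return g.geneID
}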

The full demo source is available at: github.com/xiaoxuz/idg…

Disadvantage: the username cannot be updated, because its gene is baked into the UID.


Issue 2: How do we retrieve and aggregate data efficiently across multiple shards?

Idea 1: Decouple foreground and background data and isolate their storage resources, so inefficient background queries cannot cause jitter in the foreground.

Idea 2: Adopt a redundant-storage design: use a separate storage service as the background component and populate it by dual writes or asynchronous replication.

Idea 3: Choose the background storage component based on the required data timeliness:

  • High timeliness requirement: choose Elasticsearch. Its distributed inverted index supports near-real-time synchronization of foreground data and provides multi-dimensional retrieval and aggregation for background services.
  • Low timeliness requirement: choose big-data services, e.g. loading data into Hive hourly or daily.

A data-isolation design based on Elasticsearch looks like this:

Alibaba's open-source Canal subscribes to the foreground MySQL binlog in real time and publishes it to the Kafka message queue.

Downstream, the binlog events in Kafka are consumed either by a Flink real-time computing job or by a consumer written in Go, then parsed and written to Elasticsearch.

The background service then performs real-time search and aggregation through the Elasticsearch RESTful API.
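A minimal sketch of the Go consumer leg of this pipeline, assuming the segmentio/kafka-go client and Elasticsearch's plain HTTP document-indexing endpoint; the broker address, topic, group ID, and index name are all illustrative:

package main

import (
	"bytes"
	"context"
	"log"
	"net/http"

	"github.com/segmentio/kafka-go"
)

func main() {
	// Read Canal binlog events from Kafka (topic and group names are assumptions).
	r := kafka.NewReader(kafka.ReaderConfig{
		Brokers: []string{"127.0.0.1:9092"},
		Topic:   "passport_binlog",
		GroupID: "passport_es_sync",
	})
	defer r.Close()

	for {
		msg, err := r.ReadMessage(context.Background())
		if err != nil {
			log.Fatal(err)
		}
		// Index the payload into Elasticsearch via its HTTP API; here the JSON
		// event is forwarded as-is for brevity.
		resp, err := http.Post(
			"http://127.0.0.1:9200/tb_user/_doc",
			"application/json",
			bytes.NewReader(msg.Value),
		)
		if err != nil {
			log.Printf("index error: %v", err)
			continue
		}
		resp.Body.Close()
	}
}

In a real consumer, the Canal JSON event would be parsed and transformed into a proper ES document, with batching and error handling around the write.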

Conclusion

Issue 1: How do we query efficiently by user_name when sharding by UID?

  • Index method
  • Cache mapping
  • Generate the UID from the username
  • Gene method

Issue 2: How do we retrieve and aggregate data efficiently across multiple shards?

  • Isolate foreground and background resources with a redundant-storage design.
  • Store background data in Elasticsearch or big-data services to provide real-time and offline query capabilities.

Thoughts

Memory is like a cache, and notes are like a disk. The cache will eventually be evicted, so keep extra copies on disk.

Don't rely only on your head to learn; what you write down and consolidate is what truly becomes your own.

Call it a day

Thank you for reading!