Redis

Introduction and overview of NoSQL

Introductory overview

A great opportunity in the Internet era: why NoSQL

1. The good old days of standalone MySQL
  • In the 1990s, website traffic was generally small and a single database could easily cope. Pages were mostly static, and dynamic, interactive sites were rare
  • Under that architecture, where are the data-storage bottlenecks?
    1. The total amount of data no longer fits on one machine
    2. The data's index (B+ tree) no longer fits in one machine's memory
    3. The traffic (mixed reads and writes) exceeds what one instance can withstand
2. Memcached + MySQL + vertical split
  • Later, as traffic grew, almost every site on a MySQL architecture began to hit database performance problems; web applications were no longer just about features but also about performance. Programmers turned to caching to relieve database pressure and to optimizing database structure and indexes. File-based caching was popular at first, but as traffic kept growing, multiple web servers could not share a file cache, and huge numbers of small cache files created heavy I/O pressure. At that point Memcached naturally became a very fashionable product
  • Memcached, as an independent distributed cache server, provides a high-performance shared cache for multiple web servers. Memcached deployments were scaled out across multiple instances using a hash algorithm. Then came consistent hashing, which addressed the massive cache invalidation caused by rehashing whenever cache servers were added or removed
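The consistent-hashing idea above can be sketched in a few lines of Python. This is an illustrative toy, not any real Memcached client: the `ConsistentHashRing` class, its replica count, and the server names are all invented for the demonstration.

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Minimal consistent-hash ring: removing a cache server only remaps
    the keys that fell on that server's arc, instead of invalidating
    almost every key the way plain `hash(key) % n` does."""

    def __init__(self, nodes=(), replicas=100):
        self.replicas = replicas   # virtual nodes per server, smooths the distribution
        self._ring = []            # sorted hash positions
        self._nodes = {}           # position -> server name
        for node in nodes:
            self.add(node)

    def _hash(self, key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add(self, node):
        for i in range(self.replicas):
            pos = self._hash(f"{node}#{i}")
            bisect.insort(self._ring, pos)
            self._nodes[pos] = node

    def remove(self, node):
        for i in range(self.replicas):
            pos = self._hash(f"{node}#{i}")
            self._ring.remove(pos)
            del self._nodes[pos]

    def get(self, key):
        # first ring position clockwise from the key's hash owns the key
        idx = bisect.bisect(self._ring, self._hash(key)) % len(self._ring)
        return self._nodes[self._ring[idx]]

ring = ConsistentHashRing(["cache-a", "cache-b", "cache-c"])
before = {k: ring.get(k) for k in (f"user:{i}" for i in range(1000))}
ring.remove("cache-c")                   # one cache server goes away
after = {k: ring.get(k) for k in before}
moved = sum(1 for k in before if before[k] != after[k])
print(f"{moved} of 1000 keys remapped")  # roughly a third, not ~100%
```

With naive modulo hashing, removing one of three servers remaps about two thirds of all keys; here only the keys that lived on the removed server move.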
3. MySQL master-slave replication and read/write separation
  • Since Memcached only relieves read pressure, as database write pressure grew, most websites began using master-slave replication to separate reads from writes, improving read performance and the scalability of the read tier. MySQL's master-slave mode became standard for websites at this time
4. Table/database sharding + horizontal split + MySQL cluster
  • On top of Memcached caching and MySQL master-slave replication with read/write separation, the write pressure on the MySQL master began to bottleneck while data volumes kept surging. Because MyISAM uses table-level locks, it suffers serious lock contention under high concurrency, so high-concurrency MySQL applications started switching from MyISAM to the InnoDB engine

  • At the same time, splitting tables and databases became a popular way to relieve write pressure and accommodate data growth; sharding became a hot technology, a hot interview topic, and a hot discussion in the industry. Around this time MySQL shipped table partitioning, which was still unstable but gave hope to companies with modest engineering resources. MySQL also launched MySQL Cluster, whose performance could not really meet Internet-scale requirements, but which did provide very strong high-availability guarantees

5. MySQL scalability bottlenecks
  • MySQL databases also often store large text fields, making tables enormous, which makes database recovery very slow and hard to do quickly. For example, 10 million 4 KB texts come to nearly 40 GB. If this data could be kept out of MySQL, MySQL itself would become very small. Relational databases are powerful, but they cannot handle every application scenario well. MySQL's poor scalability (achievable only with complex techniques), heavy big-data I/O pressure, and difficult schema changes are the problems developers using MySQL face today
6. What the system looks like today
7. Why use NoSQL
  • Today we can easily access and capture data through third-party platforms (such as Google and Facebook). User profiles, social networks, geolocation, user-generated content, and user logs have multiplied. SQL databases are not well suited to mining this user data, whereas NoSQL databases, as they developed, handle this kind of big data very well

What is it

  • NoSQL = Not Only SQL

  • A generic term for non-relational databases. With the rise of Web 2.0 sites, traditional relational databases struggled to cope, especially with very large-scale, highly concurrent, purely dynamic SNS-type Web 2.0 sites, exposing many problems that were hard to overcome, while non-relational databases developed very rapidly thanks to their own characteristics. NoSQL databases were created to solve the challenges of large data sets and multiple data types, above all the problems of big-data applications, including the storage of very large data sets

  • (Google and Facebook, for example, collect trillions of bits of data about their users every day.) These data stores do not require fixed schemas and can scale horizontally without extra work

What it can do

Easy to scale
  • There are many kinds of NoSQL databases, but one common feature is that they drop the relational characteristics of relational databases. With no relationships between records, scaling out is easy, and scalability comes built into the architecture
High performance under large data volumes
  • NoSQL databases have very high read/write performance, especially under large data volumes, thanks to the absence of relationships and their simple structure
  • MySQL typically uses the Query Cache, which is invalidated every time the table is updated; that coarse-grained cache performs poorly in Web 2.0 applications with frequent interaction. NoSQL caching is record-level, a fine-grained cache, so NoSQL performs much better at this level
Diverse and flexible data models
  • NoSQL does not require fields to be defined before data is stored, and custom data formats can be stored at any time. In a relational database, adding and removing fields is very troublesome; on a very large table, adding a field is a nightmare
Traditional RDBMS VS NOSQL
  • RDBMS vs NoSQL

    RDBMS                                            | NoSQL
    Highly organized structured data                 | Stands for more than just SQL
    Structured Query Language (SQL)                  | No declarative query language
    Data and relationships stored in separate tables | No predefined schema
    Data manipulation and data definition languages  | Key-value, column, document, and graph stores
    Strict consistency                               | Eventual consistency rather than ACID
    Transaction-based                                | Unstructured and unpredictable data
                                                     | The CAP theorem
                                                     | High performance, high availability, scalability

Where to go next

  • Redis
  • Memcache
  • MongoDB

How to use it

  • KV
  • Cache
  • Persistence
  • etc.

3V + 3 High

The 3 Vs of the big-data era

  • Massive Volume
  • Variety
  • Real-time Velocity

The 3 Highs of Internet demand

  • High concurrency
  • High scalability
  • High performance

The current NoSQL classic application

Today's applications use SQL and NoSQL together

How Alibaba's China site stores product information

  • The multi-data-source, multi-data-type storage problems that concern us
1. Basic commodity information
  • Name, price, date of manufacture, manufacturer, etc
  • MySQL/Oracle database. Note that the MySQL used at Taobao was heavily modified in-house by its own experts
    • Why "de-IOE": In 2008, Wang Jian joined Alibaba as group chief architect (he is now chief technology officer). Formerly executive vice president of Microsoft Research Asia, he was positioned by Jack Ma to help Alibaba Group build a world-class technology team and to take charge of the group's technical architecture and infrastructure platform. After joining, Wang Jian, with his technical background and academic style, proposed "de-IOE" inside Alibaba Group (removing IBM minicomputers, Oracle databases, and EMC storage from IT construction) and began implanting the essence of cloud computing into Alibaba's IT genes. Wang Jian summarized the relationship between the "de-IOE" movement and Alibaba Cloud this way: "de-IOE" completely changed the foundation of Alibaba Group's IT architecture and is the basis on which Alibaba embraces cloud computing and offers computing services. The essence of "de-IOE" is distribution, making commodity PC architectures usable everywhere, which is the primary precondition for implementing cloud computing.
2. Product description, details and evaluation information (multi-text)
  • The I/O read and write performance deteriorates
  • Document database: MongoDB
3. Pictures of goods
  • Commodity picture display category
  • Distributed file system
    • Taobao’s own TFS
    • Google’s GFS
    • Hadoop's HDFS
4. Product keywords
  • Search engine, Taobao for internal use
  • ISearch
5. Hot, high-frequency product information of a bursty (banded) nature
  • In-memory database
  • Tair, Redis, Memcache
6. Commodity trading, price calculation and point accumulation
  • External system, external third party payment interface
  • Alipay
7. Summarize the difficulties and solutions of large-scale Internet applications (big data, high concurrency, diverse data types)
  • The difficulties
    • Diversity of data types
    • Diversity of data sources, and refactoring as they change
    • Changing a data source should not require a massive rebuild of the data service platform
  • The solution
    • EAI and a unified data platform service
    • Alibaba/Taobao: UDSL
      • What is it
      • What it provides
        1. Mapping
        2. APIs
        3. Hot caches
        4. etc.

Introduction to NoSQL data model

An e-commerce model of customers, orders, order items, and addresses, used to compare relational and non-relational databases

How would you design this in a traditional relational database?
  • ER diagram (1:1 / 1:N / N:N, shared primary and foreign keys, etc.)
How would you design it in NoSQL?
  • BSON (Binary JSON) is a binary form of JSON that supports embedded document objects and array objects

  • BSON data model

    • Example aggregate document:

      {
        "customer": {
          "id": 1136,
          "name": "Z3",
          "billingAddress": [{"city": "beijing"}],
          "orders": [
            {
              "id": 17,
              "customerId": 1136,
              "orderItems": [{"productId": 27, "price": 77.5, "productName": "thinking in Java"}],
              "shippingAddress": [{"city": "beijing"}],
              "orderPayment": [{"ccinfo": "111-222-333", "txnid": "asdfadcd334", "billingAddress": {"city": "beijing"}}]
            }
          ]
        }
      }
Contrast, problems, and difficulties
  • Why can the aggregate model handle the situation above?
    • Join queries are discouraged under high concurrency; Internet companies use redundant data to avoid joins
    • Distributed transactions cannot support much concurrency
  • Think about it: how would a relational database query this model? With the BSON design above, isn't the query much simpler?

The aggregation model

  • KV key-value pairs
  • BSON
  • Column family
    • As the name implies, data is stored by column. The biggest strengths are easy storage of structured and semi-structured data, easy data compression, and a very large I/O advantage for queries that touch one or a few columns
  • Graph

The four categories of NoSQL databases

KV key value: Typical introduction

  • Sina: BerkeleyDB + redis
  • Meituan: redis + tair
  • Alibaba, Baidu: memcache+ Redis

Document database (BSON format): Typical introduction

  • CouchDB
  • MongoDB
    • MongoDB is a database based on distributed file storage, written in C++, designed to provide scalable, high-performance data storage for web applications
    • MongoDB sits between relational and non-relational databases: among non-relational databases it is the most feature-rich and the most like a relational database

Column storage database

  • Cassandra, HBase
  • Distributed file system

Graph relational database

  • It is not about graphics but about relationships, such as friend circles, social networks, and ad or recommendation systems; the focus is on building relationship graphs
  • Neo4j,InfoGrid

Comparing the four categories

The CAP principle and BASE in distributed databases

What is conventional ACID

  • A (Atomicity) is easy to understand: every operation in a transaction either completes or does not happen at all; a transaction succeeds only if every operation in it succeeds. For example, a bank transfer of 100 yuan from account A to account B has two steps: 1) withdraw 100 yuan from A; 2) deposit 100 yuan into B. Either both steps complete or neither does; if only the first step completed, the money would inexplicably be 100 yuan short.
  • C (Consistency) is also easy to understand: the database must always remain in a consistent state, and running a transaction does not break the database's original consistency constraints.
  • I (Isolation) means that concurrent transactions do not affect each other. If the data one transaction is accessing is being modified by another transaction, then as long as that other transaction has not committed, the first transaction does not see its uncommitted changes. For example, while a transaction moving 100 yuan from account A to account B is still in progress, B querying his own account cannot yet see the extra 100 yuan
  • D (Durability) means that once a transaction commits, its modifications are kept in the database permanently and are not lost even if the server goes down.

CAP

  • C:Consistency
  • A:Availability
  • P:Partition tolerance

CAP is two out of three

  • The CAP theorem says that in a distributed storage system only two of the three can be achieved. Since today's network hardware inevitably suffers delays and packet loss, partition tolerance is something we must have, so we are left trading off consistency against availability; no NoSQL system can guarantee all three

  • C: strong consistency, A: high availability, P: distributed tolerance

    • CA Traditional Oracle database

    • AP’s choice of most website architectures

    • CP Redis, mongo

    • Note: trade-offs must be made in a distributed architecture, striking a balance between consistency and availability. Most web applications do not actually need strong consistency, so C is sacrificed in exchange for P; this is the current direction of distributed database products

Classic CAP figure

  • The core of the CAP theorem is that a distributed system cannot satisfy all three of consistency, availability, and partition tolerance at once; it can satisfy at most two at a time. By the CAP principle, NoSQL databases therefore fall into three categories, satisfying CA, CP, or AP:

  • CA – A single point cluster, a system that satisfies consistency, availability, and is generally not very powerful in scalability
  • CP – Meet the consistency, partition tolerance required system, usually not particularly high performance
  • AP – Systems that meet availability and partition tolerance may generally have lower requirements for consistency

BASE

  • BASE is a solution proposed to address the loss of availability caused by the strong consistency of relational databases.

  • BASE is an abbreviation of three terms:

    • Basically Available
    • Soft State
    • Eventually consistent
  • The idea is that by relaxing the system's demand for consistency at any single point in time, overall scalability and performance can be improved. Why? Because large systems, given their geographic distribution and extreme performance requirements, cannot use distributed transactions to reach these targets; another approach is needed, and BASE is that solution

Overview of Distributed + Cluster

  • Distributed system

    • Consists of multiple computers and communicating software components connected over a network (local or wide area). A distributed system is a software system built on top of a network; thanks to the nature of software, it has high cohesion and transparency. The difference between a network and a distributed system therefore lies more in the high-level software (especially the operating system) than in the hardware. Distributed systems can run on different platforms such as PCs, workstations, LANs, and WANs.
  • To put it simply:

    1. Distributed: Different service modules (projects) are deployed on different servers. They communicate and invoke each other through Rpc/Rmi to provide services externally and collaborate within the group

    2. Cluster: The same service modules are deployed on multiple servers. Distributed scheduling software is used to centrally schedule services and provide external access

Introduction to Redis

Introductory overview

What is it

  • Redis:REmote DIctionary Server

  • Redis is completely open source and free, written in C, complies with the BSD license, and is a high-performance key/value distributed in-memory database that runs in memory while also supporting persistence. It is currently one of the most popular NoSQL databases and is also known as a data structure server

  • Redis shares three characteristics with other key-value cache products (such as Memcached):

    1. Redis supports data persistence: it can keep in-memory data on disk and reload it for use after a restart
    2. Redis not only supports simple key-value type data, but also provides the storage of list, set, zset, hash and other data structures
    3. Redis supports data backup, namely, data backup in master-slave mode

What it can do

  • Memory storage and persistence: Redis supports asynchronous writing of data in memory to disk without affecting continued service
  • The operation of fetching the latest N data, for example, you can put the ids of the latest 10 comments in the Redis List
  • Emulating functions like HttpSession that require an expiration time
  • Publish and subscribe messaging systems
  • Timer, counter

Where to get it

  • Redis official website address

How to use it

  • Data types, basic operations, and configurations
  • Persistence and replication, RDB/AOF
  • Control of transactions
  • Replication
  • etc.

The installation of Redis

Download redis to the /opt directory

  • Third-party software is stored in the /opt directory

Run the make and make install commands

  • Note the GCC version

  • For details, see Upgrading GCC

Redis start

  • cd into the default installation directory /usr/local/bin
  • redis-server redis.conf (your customized conf file)
  • redis-cli -p 6379

Explanation of miscellaneous basic knowledge after Redis is started

Single process

  • Redis uses a single-process model to handle client requests. Responses to events such as reads and writes are handled by wrapping the epoll function, so Redis's actual processing speed depends entirely on the efficiency of the main process
  • epoll is an enhanced version of select/poll, the multiplexed I/O interface on Linux, improved in the Linux kernel for handling large numbers of file descriptors. It can significantly improve a program's CPU utilization when only a small fraction of a large number of concurrent connections is active
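The single-process, multiplexed event loop described above can be sketched with Python's standard `selectors` module, which uses epoll on Linux (falling back to the best mechanism elsewhere). This is a toy echo loop for illustration, not Redis's actual implementation:

```python
import selectors
import socket

# One process watches many sockets; the kernel tells us which are ready.
sel = selectors.DefaultSelector()   # epoll-backed on Linux
server = socket.socket()
server.bind(("127.0.0.1", 0))       # any free port
server.listen()
server.setblocking(False)
sel.register(server, selectors.EVENT_READ)

client = socket.create_connection(server.getsockname())
client.sendall(b"PING")

echoed = None
while echoed is None:
    for key, _ in sel.select(timeout=1):
        sock = key.fileobj
        if sock is server:                   # a new connection is ready to accept
            conn, _ = server.accept()
            conn.setblocking(False)
            sel.register(conn, selectors.EVENT_READ)
        else:                                # data is ready on a client socket
            data = sock.recv(1024)
            sock.sendall(b"+PONG" if data == b"PING" else data)
            echoed = client.recv(1024)

print(echoed)                                # b'+PONG'
client.close()
server.close()
sel.close()
```

The loop never blocks on any single connection; it only acts on sockets the kernel reports as ready, which is why one process can serve many clients.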

16 databases by default, indexed from 0 like an array; database 0 is used initially

  • The number of databases is configurable; the default database is 0, and the SELECT command switches databases on a connection

The Select command switches the database

  • select 0

dbsize: shows the number of keys in the current database

keys: lists the keys in the database

  • keys *
  • keys k? does pattern matching, like a fuzzy search

flushdb: empties the current database

flushall: empties all databases

Passwords are managed uniformly: all 16 databases share one password; either every connection succeeds or none can connect

Redis indexes all start from zero

Why is the default port 6379

Redis data type

Five data types of Redis

String(String)

  • String is the most basic Redis type; you can think of it as the Memcached model, where each key maps to one value
  • The string type is binary-safe: a Redis string can contain any data, such as a JPG image or a serialized object
  • A single string value can be at most 512 MB

Hash (similar to a Map in Java)

  • A Redis hash is a collection of field-value pairs
  • A Redis hash is a map of string fields to string values; hashes are especially suitable for storing objects
  • Think of it as Map<String, Object>

The List (List)

  • Redis lists are simple lists of strings, sorted by insertion order, and you can add an element to either the top (left) or bottom of the list
  • It’s actually a linked list underneath

Set (Set)

  • A Redis set is an unordered collection of unique strings, implemented with a hash table
  • Note: in Java, a new HashSet is likewise backed by a HashMap under the hood

Sorted set Zset(sorted set)

  • Redis zset, like set, is a set of elements of type string and does not allow duplicate members. Each element is associated with a score of type double
  • Redis uses scores to sort the members of a collection from smallest to largest. Members of a Zset are unique, but scores can be repeated.

Where do I get redis common data type manipulation commands

Redis command reference

  • Redis command

Redis key (key)

Commonly used commands

Examples

  • keys *
  • exists key: checks whether the given key exists
  • move key db: moves the key into database db; it is removed from the current database
  • expire key seconds: sets an expiration time on the given key
  • ttl key: checks how many seconds remain before expiry; -1 means the key never expires, -2 means it has already expired
  • type key: tells you what type of value your key holds
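The expire/ttl conventions above can be sketched in plain Python, with no Redis server required. `TinyKeyspace` and its methods are invented stand-ins for the real commands, but they follow the same return conventions (-1 = no expiry, -2 = key gone):

```python
import time

class TinyKeyspace:
    """Toy model of exists/expire/ttl semantics with lazy expiration."""

    def __init__(self):
        self._data = {}       # key -> value
        self._expire_at = {}  # key -> absolute deadline

    def _purge(self, key):
        # lazily drop a key whose deadline has passed
        deadline = self._expire_at.get(key)
        if deadline is not None and time.monotonic() >= deadline:
            self._data.pop(key, None)
            self._expire_at.pop(key, None)

    def set(self, key, value):
        self._data[key] = value
        self._expire_at.pop(key, None)   # SET clears any previous expiry

    def exists(self, key):
        self._purge(key)
        return key in self._data

    def expire(self, key, seconds):
        if key not in self._data:
            return False
        self._expire_at[key] = time.monotonic() + seconds
        return True

    def ttl(self, key):
        self._purge(key)
        if key not in self._data:
            return -2                    # key does not exist (or expired)
        if key not in self._expire_at:
            return -1                    # exists, no expiry set
        return max(0, round(self._expire_at[key] - time.monotonic()))

db = TinyKeyspace()
db.set("k1", "v1")
print(db.ttl("k1"))   # -1: exists, never expires
db.expire("k1", 10)
print(db.ttl("k1"))   # 10
db.expire("k1", 0)    # expire immediately
print(db.ttl("k1"))   # -2: gone
```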

Redis String (String)

Commonly used commands

One key, one value

Examples

set/get/del/append/strlen
incr/decr/incrby/decrby: the value must be a number to add or subtract
getrange/setrange
  • getrange: gets the value within the given range, like a between...and relationship
  • setrange: overwrites part of the value starting at an offset, in the format setrange key offset value
setex (set with expire) key seconds value / setnx (set if not exist)
  • setex: sets a key together with an expiration time, in a single step
  • setnx: sets the key's value only if the key does not already exist
mset/mget/msetnx
  • mset: sets one or more key-value pairs at once
  • mget: gets the values of all (one or more) given keys
  • msetnx: sets one or more key-value pairs, if and only if none of the given keys exist
getset (get then set)
  • getset: sets the key to the given value and returns the key's old value
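To make these string-command semantics concrete, here is a pure-Python sketch over a plain dict. The `TinyStrings` class is invented for illustration; real Redis stores bytes and has richer behavior:

```python
class TinyStrings:
    """Toy model of set/get/append/incrby/getrange/setrange/setnx/getset."""

    def __init__(self):
        self.db = {}

    def set(self, k, v):
        self.db[k] = str(v)

    def get(self, k):
        return self.db.get(k)

    def incrby(self, k, n=1):
        # like INCRBY: the stored value must parse as a number
        val = int(self.db.get(k, "0")) + n
        self.db[k] = str(val)
        return val

    def getrange(self, k, start, end):
        # inclusive end index, like between...and
        s = self.db.get(k, "")
        return s[start:] if end == -1 else s[start:end + 1]

    def setrange(self, k, offset, value):
        # overwrite part of the value starting at offset
        s = self.db.get(k, "").ljust(offset, "\x00")
        self.db[k] = s[:offset] + value + s[offset + len(value):]
        return len(self.db[k])

    def setnx(self, k, v):
        # set only if the key does not already exist
        if k in self.db:
            return False
        self.db[k] = str(v)
        return True

    def getset(self, k, v):
        # set the new value, return the old one
        old = self.db.get(k)
        self.db[k] = str(v)
        return old

s = TinyStrings()
s.set("k1", "hello")
print(s.getrange("k1", 0, 3))  # "hell"
s.setrange("k1", 1, "abc")
print(s.get("k1"))             # "habco"
print(s.incrby("counter", 5))  # 5
print(s.setnx("k1", "x"))      # False, key already exists
print(s.getset("k1", "new"))   # "habco"
```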

Redis List

Commonly used commands

One key, multiple values

Examples

lpush (push to the head) / rpush (push to the tail) / lrange
lpop/rpop
lindex: gets an element by index (counting from the top)
llen
lrem key N value: removes N occurrences of value
  • e.g., deleting 2 "v1" elements removes them from left to right and returns the number actually removed
  • lrem list3 0 value removes every occurrence of the given value (0 means all)
ltrim key start-index end-index: trims the value to the given range and stores the result back in the key
rpoplpush source-list destination-list
lset key index value
linsert key before/after value1 value2
Performance summary
  • It is a linked list of strings that can be pushed at either end. If the key does not exist, a new list is created; if it exists, new elements are added. If all values are removed, the key disappears too.

  • Linked lists are extremely efficient both at the beginning and the end, but are notoriously inefficient when operating on intermediate elements.
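These list semantics can be sketched with a Python `deque`, which, like Redis's underlying linked list, is fast at both ends and slow in the middle. `TinyList` is an illustrative toy, not a Redis client:

```python
from collections import deque

class TinyList:
    """Toy model of lpush/rpush/lpop/rpop/lrange/lrem."""

    def __init__(self):
        self.items = deque()

    def lpush(self, *vals):
        for v in vals:
            self.items.appendleft(v)   # insert at the head

    def rpush(self, *vals):
        self.items.extend(vals)        # append at the tail

    def lpop(self):
        return self.items.popleft() if self.items else None

    def rpop(self):
        return self.items.pop() if self.items else None

    def lrange(self, start, stop):
        lst = list(self.items)
        return lst[start:] if stop == -1 else lst[start:stop + 1]

    def lrem(self, count, value):
        # count > 0: remove up to count matches left-to-right;
        # count == 0: remove all matches. Returns how many were removed.
        removed, out = 0, []
        for v in self.items:
            if v == value and (count == 0 or removed < count):
                removed += 1
            else:
                out.append(v)
        self.items = deque(out)
        return removed

l = TinyList()
l.rpush("v1", "v2", "v1", "v3", "v1")
print(l.lrange(0, -1))  # ['v1', 'v2', 'v1', 'v3', 'v1']
print(l.lrem(2, "v1"))  # 2: two v1 removed, left to right
print(l.lrange(0, -1))  # ['v2', 'v3', 'v1']
```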

Redis Set (Set)

Commonly used commands

One key, multiple values

Examples

sadd/smembers/sismember
scard: gets the number of members in the set
srem key value: removes members from the set
srandmember key N: returns N random members
spop key: randomly pops a member off the set
smove key1 key2 value: moves the given value from key1 into key2
Set algebra
  1. Difference, sdiff: members in the first set but in none of the later sets
  2. Intersection: sinter
  3. Union: sunion
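The three set-algebra commands map directly onto Python's set operators, with the same left-to-right semantics for the difference:

```python
# Toy illustration of sdiff / sinter / sunion on two in-memory sets.
s1 = {"a", "b", "c", "d"}
s2 = {"c", "d", "e"}

sdiff = s1 - s2    # in the first set but not the later ones
sinter = s1 & s2   # intersection
sunion = s1 | s2   # union

print(sorted(sdiff))   # ['a', 'b']
print(sorted(sinter))  # ['c', 'd']
print(sorted(sunion))  # ['a', 'b', 'c', 'd', 'e']
```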

Redis Hash

Commonly used commands

The KV pattern stays the same, but V is itself a key-value pair

Examples

hset/hget/hmset/hmget/hgetall/hdel (*)
hlen
hexists key field: checks whether the given field exists in the hash
hkeys/hvals
hincrby/hincrbyfloat
hsetnx
  • sets the field only if it does not exist; if it already exists, nothing happens
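The "V is itself a key-value pair" idea can be sketched as a dict of dicts (key -> {field: value}). `TinyHashes` is invented for illustration only:

```python
class TinyHashes:
    """Toy model of hset/hget/hgetall/hdel/hexists/hlen/hincrby/hsetnx."""

    def __init__(self):
        self.db = {}   # key -> {field: value}

    def hset(self, key, field, value):
        self.db.setdefault(key, {})[field] = value

    def hget(self, key, field):
        return self.db.get(key, {}).get(field)

    def hgetall(self, key):
        return dict(self.db.get(key, {}))

    def hdel(self, key, field):
        return self.db.get(key, {}).pop(field, None) is not None

    def hexists(self, key, field):
        return field in self.db.get(key, {})

    def hlen(self, key):
        return len(self.db.get(key, {}))

    def hincrby(self, key, field, n):
        h = self.db.setdefault(key, {})
        h[field] = int(h.get(field, 0)) + n
        return h[field]

    def hsetnx(self, key, field, value):
        # set only if the field is absent; no-op if it exists
        h = self.db.setdefault(key, {})
        if field in h:
            return False
        h[field] = value
        return True

h = TinyHashes()
h.hset("user:1", "name", "Z3")
h.hset("user:1", "age", "25")
print(h.hgetall("user:1"))               # {'name': 'Z3', 'age': '25'}
print(h.hincrby("user:1", "visits", 1))  # 1
print(h.hsetnx("user:1", "name", "L4"))  # False: field already exists
```

This mirrors why hashes suit objects: all of a user's fields live under one key and can be read or updated individually.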

Redis ordered set Zset(sorted set)

Commonly used commands

Difference from set

  • Zset adds a score to set: where set was k1 v1 v2 v3, zset is k1 score1 v1 score2 v2

Examples

zadd/zrange
zrangebyscore key start-score end-score
  • withscores: also returns the scores
  • ( before a score makes that endpoint exclusive
  • limit offset count: paginates the results, starting at offset and returning count entries
zrem key value: removes the member (and its score); multiple values may be given
zcard / zcount key min max / zrank key value / zscore key value
  • zcard: gets the number of members in the set
  • zcount: counts members whose scores fall within the range, as zcount key start-score end-score
  • zrank: gets the member's rank (index) within the zset
  • zscore: gets the score associated with the given member
zrevrank key value: gets the member's rank in reverse order
zrevrange
zrevrangebyscore key end-score start-score
  • e.g., zrevrangebyscore zset1 90 60 withscores returns the range in reverse order
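The zset rules (unique members, repeatable scores, ordering by ascending score) can be sketched as follows. `TinyZSet` is an invented toy; real Redis uses a skip list plus hash table and takes the key as the first argument:

```python
class TinyZSet:
    """Toy model of zadd/zrange/zrangebyscore/zrank/zscore/zcard."""

    def __init__(self):
        self.scores = {}   # member -> score: members unique, scores may repeat

    def zadd(self, score, member):
        self.scores[member] = score   # re-adding a member updates its score

    def _sorted(self):
        # ascending score; ties broken by member name
        return sorted(self.scores.items(), key=lambda kv: (kv[1], kv[0]))

    def zrange(self, start, stop, withscores=False):
        items = self._sorted()
        items = items[start:] if stop == -1 else items[start:stop + 1]
        return items if withscores else [m for m, _ in items]

    def zrangebyscore(self, lo, hi):
        return [m for m, s in self._sorted() if lo <= s <= hi]

    def zrank(self, member):
        for i, (m, _) in enumerate(self._sorted()):
            if m == member:
                return i
        return None

    def zscore(self, member):
        return self.scores.get(member)

    def zcard(self):
        return len(self.scores)

z = TinyZSet()
z.zadd(60, "v1")
z.zadd(70, "v2")
z.zadd(90, "v3")
z.zadd(80, "v1")                  # updates v1's score: members stay unique
print(z.zrange(0, -1))            # ['v2', 'v1', 'v3']
print(z.zrangebyscore(60, 80))    # ['v2', 'v1']
print(z.zrank("v3"))              # 2
```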

Parse the redis.conf configuration file

Units

  • Configures size units; the start of the file defines some basic measurement units. Only bytes are supported, not bits
  • Units are case-insensitive

INCLUDES

  • redis.conf can act as the master file and include other conf files

GENERAL

Daemonize: runs Redis as a daemon process

Pidfile

Port

Tcp-backlog

  • Sets the TCP backlog. The backlog is in effect a connection queue: backlog total = incomplete three-way-handshake queue + completed three-way-handshake queue.
  • In high-concurrency environments you need a high backlog value to avoid slow-client connection issues. Note that the Linux kernel silently truncates this value to /proc/sys/net/core/somaxconn, so you need to raise both somaxconn and tcp_max_syn_backlog to get the desired effect

Timeout

Bind

Tcp-keepalive

  • The unit is second. If the value is 0, Keepalive detection is not performed. You are advised to set the value to 60

Loglevel

  • debug (development/testing)
  • verbose
  • notice (production probably)
  • warning

Logfile

Syslog-enabled

Syslog-ident

Syslog-facility

Databases

SNAPSHOTTING (*)

Save

  • Format: save <seconds> <number-of-writes>

    • RDB is a compressed snapshot of the whole memory; multiple compound snapshot trigger conditions can be configured.

    • The defaults are 1 change within 15 minutes, 10 changes within 5 minutes, or 10,000 changes within 1 minute

    • save 900 1

    • save 300 10

    • save 60 10000

  • Disabling

    • To disable RDB persistence, simply omit all save directives, or pass an empty string argument to save
    • save ""

Stop-writes-on-bgsave-error

  • Whether to stop accepting writes when a background save fails

  • If set to no, Redis keeps accepting writes even though snapshots are failing; you must then detect and handle any data inconsistency by other means

rdbcompression

  • rdbcompression: whether snapshots stored on disk are compressed. If enabled, Redis compresses with the LZF algorithm; if you don't want to spend CPU on compression, you can turn this feature off

rdbchecksum

  • rdbchecksum: after writing a snapshot, Redis can also verify the data with a CRC64 checksum, but this costs roughly 10% in performance. Turn it off if you want the maximum performance gain

dbfilename

dir

  • Obtain the directory config get dir

REPLICATION

SECURITY

  • View, set, and cancel access passwords

LIMITS

  • Maxclients
    • Set how many clients Redis can connect to at the same time. The default is 10000 clients. When you cannot set the process handle limit, Redis will set the current handle limit value minus 32 because Redis will save some handles for its own internal processing logic. If this limit is reached, Redis rejects new connection requests and responds by issuing a “Max number of clients reached” to those connection requesters.
  • Maxmemory
    • Sets how much memory Redis may use. Once the limit is reached, Redis tries to evict data according to the policy specified by maxmemory-policy. If Redis cannot evict anything under that policy, or the policy is "do not evict" (noeviction), Redis returns errors for commands that need to allocate memory, such as SET and LPUSH
    • Commands that request no extra memory, such as GET, still respond normally. If this Redis is a master, reserve some memory for the replication buffers when setting the limit; this factor can be ignored only when the policy is "do not evict"
  • Maxmemory-policy
    • LRU algorithm: Least Recently Used
    • Volatile- lRU: Removes keys using the LRU algorithm, only for keys with expiration dates
    • Allkeys-lru: Removes keys using the LRU algorithm
    • Volatile-random: Removes random keys from the expiration set, only for keys with expiration dates
    • Allkeys-random: removes a random key
    • Volatile- TTL: Removes the keys with the smallest TTL, that is, the keys that have recently expired
    • Noeviction: Does not remove, returns error messages for write operations
  • Maxmemory-samples
    • Sets the number of samples. The LRU and minimum-TTL algorithms are not exact but estimates, so you can set the sample size; Redis checks that many keys and evicts the least recently used among them
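The allkeys-lru idea can be sketched with an `OrderedDict`, using entry count as a stand-in for maxmemory. `LRUCache` is an illustrative toy; real Redis samples keys rather than tracking exact recency:

```python
from collections import OrderedDict

class LRUCache:
    """Toy allkeys-lru: when full, evict the least recently used key."""

    def __init__(self, max_entries):
        self.max_entries = max_entries
        self.data = OrderedDict()       # least recently used entry first

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)      # mark as most recently used
        return self.data[key]

    def set(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        elif len(self.data) >= self.max_entries:
            self.data.popitem(last=False)   # evict the LRU key
        self.data[key] = value

cache = LRUCache(max_entries=2)
cache.set("a", 1)
cache.set("b", 2)
cache.get("a")             # touch "a": "b" becomes least recently used
cache.set("c", 3)          # full, so "b" is evicted
print(sorted(cache.data))  # ['a', 'c']
```

volatile-lru follows the same logic but only considers keys that carry an expiration time.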

APPEND ONLY MODE (*)

appendonly

appendfilename

Appendfsync

  • Three strategies

    • always: synchronous persistence; every change is written to disk immediately. Poorer performance, but best data integrity
    • everysec: asynchronous; fsync once per second. If the system goes down within that second, up to one second of data is lost
    • no: never fsync; the operating system decides when to flush

no-appendfsync-on-rewrite: whether to skip appendfsync while an AOF rewrite is running; the default no favors data safety

auto-aof-rewrite-min-size: the minimum AOF size before a rewrite is triggered

auto-aof-rewrite-percentage: the growth percentage that triggers a rewrite

Redis.conf (*)

Parameter notes

  • Conf configuration items are described as follows:
    1. Redis does not run as a daemon by default; change this to yes to enable it: daemonize no
    2. When Redis runs as a daemon, its pid is written to /var/run/redis.pid by default; pidfile can point it elsewhere: pidfile /var/run/redis.pid
    3. The listening port, 6379 by default. 6379 is MERZ on a phone keypad, named after the showgirl Alessia Merz: port 6379
    4. The host address to bind: bind 127.0.0.1
    5. Close a connection after the client has been idle for this many seconds; 0 disables the timeout: timeout 300
    6. Redis supports four log levels: debug, verbose, notice, and warning; the default is verbose: loglevel verbose
    7. Logging goes to standard output by default; note that if Redis runs as a daemon while logging to standard output, the logs end up in /dev/null: logfile stdout
    8. Set the number of databases; the default database is DB 0, and a connection can pick another with SELECT <dbid>: databases 16
    9. Save the dataset to disk when the given number of changes occurs within the given number of seconds. The default file ships three conditions, meaning 1 change within 900 seconds (15 minutes), 10 changes within 300 seconds (5 minutes), or 10000 changes within 60 seconds: save 900 1 / save 300 10 / save 60 10000
    10. Whether to compress the dump with LZF. Turning this off saves CPU time but makes database files larger: rdbcompression yes
    11. The name of the local database file, dump.rdb by default: dbfilename dump.rdb
    12. The local database directory: dir ./
    13. For a slave, the IP address and port of its master; on startup it automatically syncs data from the master: slaveof <masterip> <masterport>
    14. If the master is password protected, the slave connects to it with this password: masterauth <master-password>
    15. The Redis connection password. If configured, clients must authenticate with AUTH <password> when connecting; disabled by default: requirepass foobared
    16. The maximum number of simultaneous client connections, i.e. the maximum number of file descriptors the Redis process may hold open at once; 0 means no limit. Once the limit is reached, Redis closes new connections and returns a max number of clients reached error: maxclients 128
    17. The maximum memory limit. Redis loads data into memory on startup; on reaching the limit it first tries to evict expired or expiring keys. Under the old VM mechanism, keys stay in memory while values go to swap: maxmemory <bytes>
    18. Whether to log every update operation (AOF). Redis writes to disk asynchronously by default, so without this a power failure can lose a window of data, since Redis only syncs the data file according to the save conditions above and some data lives only in memory for a while. The default is no: appendonly no
    19. The update-log file name, appendonly.aof by default: appendfilename appendonly.aof
    20. How often to sync the update log; three values: no (let the operating system cache and flush the data, fast), always (call fsync() after every update, slow but safe), everysec (sync once per second, the compromise and the default): appendfsync everysec
    21. Whether to enable the virtual memory mechanism, no by default. Briefly, the VM mechanism stores data in pages: Redis swaps cold, rarely visited pages out to disk, while frequently visited pages are swapped from disk back into memory: vm-enabled no
    22. The virtual memory swap file path, /tmp/redis.swap by default; it cannot be shared by multiple Redis instances: vm-swap-file /tmp/redis.swap
    23. Data beyond vm-max-memory is stored in virtual memory. However small the setting, all index data (the keys) stays in memory, so with vm-max-memory 0 all values actually live on disk. The default is 0: vm-max-memory 0
    24. The swap file is split into pages; an object may span multiple pages, but a page cannot be shared by multiple objects. Set vm-page-size to suit the stored data: 32 or 64 bytes works well for small objects, larger pages for large objects; when in doubt, use the default: vm-page-size 32
    25. The number of pages in the swap file. Since the page table (a bitmap marking pages free or used) lives in memory, every 8 pages on disk cost 1 byte of memory: vm-pages 134217728
    26. The number of threads allowed to access the swap file, best kept no higher than the machine's core count; 0 makes all swap-file operations serial, which may cause long delays. The default is 4: vm-max-threads 4
    27. Whether to combine smaller packets into one packet when replying to clients, enabled by default: glueoutputbuf yes
    28. Use a special (zipmap) hash encoding while the element count and the largest value stay below these thresholds: hash-max-zipmap-entries 64 / hash-max-zipmap-value 512
    29. Whether incremental rehashing is enabled, on by default: activerehashing yes
    30. Include additional configuration files; multiple Redis instances on one host can share a common configuration file while each instance keeps its own specific file: include /path/to/local.conf

Redis persistence (*)

General introduction

website

RDB (Redis DataBase)

What is it

  • Writes a snapshot of the in-memory dataset to disk at specified intervals; restoring reads the snapshot file straight back into memory
  • Redis forks a separate child process for persistence, which writes the data to a temporary file that replaces the previous persisted file once the process completes. The main process performs no I/O during the whole procedure, which ensures high performance. If large-scale data recovery is needed and some data loss is tolerable, RDB is more efficient than AOF. RDB's downside is that data written after the last snapshot can be lost

Fork

  • Fork creates a new process identical to the current one: all of the new process's data (variables, environment variables, program counter, and so on) have the same values as in the original process, but it is a brand-new process running as a child of the original
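The effect can be seen in a small Python sketch (requires a Unix-like OS for os.fork; the dictionary stands in for the dataset, and the names are illustrative):

```python
import os

def fork_snapshot_demo():
    data = {"k1": "v1"}            # parent's in-memory "dataset"
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:                   # child: got a copy of the parent's memory
        os.close(r)
        data["k1"] = "changed-in-child"
        os.write(w, data["k1"].encode())
        os._exit(0)
    os.close(w)                    # parent
    child_view = os.read(r, 100).decode()
    os.waitpid(pid, 0)
    return data["k1"], child_view  # parent's copy is untouched

parent_view, child_view = fork_snapshot_demo()
```

This is why the RDB child can write a consistent snapshot while the parent keeps serving writes: each process has its own view of the data after the fork.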

RDB saves the snapshot to the dump.rdb file

  • The save command is used to quickly back up RDB files

Location of profile (see resolving profile snapshot)

How do I trigger an RDB snapshot

Configuration file Default snapshot configuration
  • The defaults are 10,000 changes within 1 minute, 10 changes within 5 minutes, or 1 change within 15 minutes
  • save 900 1
  • save 300 10
  • save 60 10000
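The three conditions above amount to a simple predicate; a hedged Python sketch (the rule list and function name are illustrative, not Redis internals):

```python
# "save <seconds> <changes>": snapshot when any rule's window has elapsed
# with at least that many changes since the last save
SAVE_RULES = [(900, 1), (300, 10), (60, 10000)]

def should_snapshot(elapsed_seconds, changes, rules=SAVE_RULES):
    return any(elapsed_seconds >= s and changes >= c for s, c in rules)
```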
The commands save and bgsave
  • Either generates the dump.rdb file immediately
  • SAVE: just saves, ignoring everything else, blocking all other work
  • BGSAVE: Redis snapshots asynchronously in the background while still serving client requests; run lastsave to get the time of the last successful snapshot
FLUSHALL also produces a dump.rdb file, but an empty, meaningless one

How to restore

  • Move the backup file (dump.rdb) to the redis installation directory and start the service
  • CONFIG GET dir Obtains the directory

Advantages

Suitable for large-scale data recovery
Low requirements for data integrity and consistency

Disadvantages

Backups are made at regular intervals, so if Redis unexpectedly goes down, all changes since the last snapshot are lost
When forking, the in-memory data is cloned, so roughly double the memory footprint has to be planned for

How to stop

redis-cli config set save ""

A small summary

AOF (Append Only File)

website

What is it

  • Records every write operation in the form of a log: all write commands executed by Redis are recorded (reads are not). The file may only be appended to, never rewritten in place. On restart, Redis replays the write commands from front to back according to the log file to rebuild the data

AOF saves the appendonly.aof file

Configure the location

  • See parse configuration file -APPEND ONLY MODE APPEND

AOF startup/repair/recovery

Normal recovery
  • Start: change the default appendonly no to yes
  • Copy an AOF file that has data into the corresponding directory (config get dir)
  • Recovery: restart Redis to reload it
Recovery from a corrupted file
  • Start: change the default appendonly no to yes

  • Back up the broken AOF file

  • Fix: run redis-check-aof --fix against the AOF file

  • Recovery: restart Redis to reload it

Rewrite

What is it
  • When the AOF file grows beyond the configured threshold, Redis compacts its contents, keeping only the minimal set of commands that can rebuild the data. It can also be triggered manually with the command bgrewriteaof

Rewrite principle
  • When the AOF file keeps growing and becomes too large, a new process is forked to rewrite the file (writing a temporary file first and then renaming it, as with snapshotting). It traverses the new process's memory, emitting a set statement for each record. Rewriting does not read the old AOF file; it writes a new AOF file from the entire in-memory database contents as commands, much like a snapshot
Triggering
  • Redis records the AOF size at the last rewrite; by default a rewrite triggers when the AOF file has grown to double that size and is larger than 64M
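That default trigger can be written out as a small predicate (a sketch; the parameter names mirror auto-aof-rewrite-percentage and auto-aof-rewrite-min-size, the function name is made up):

```python
def should_rewrite(aof_size, last_rewrite_size,
                   percentage=100, min_size=64 * 1024 * 1024):
    """True when the AOF has grown by `percentage` percent since the last
    rewrite AND exceeds `min_size` bytes (defaults match the text above)."""
    grown_enough = aof_size >= last_rewrite_size * (1 + percentage / 100)
    return aof_size > min_size and grown_enough
```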

Advantages

  • Sync on every change: appendfsync always, synchronous persistence; every data change is immediately written to disk. Worse performance, better data integrity
  • Sync every second: appendfsync everysec, asynchronous; up to one second of data is lost if the service goes down within that second
  • No sync: appendfsync no, never synced by Redis itself

Disadvantages

  • For the data of the same data set, AOF files are much larger than RDB files, and the recovery speed is slower than RDB
  • AOF runs less efficiently than RDB: the per-second sync strategy performs reasonably well, and with sync disabled its efficiency matches RDB

A small summary

Conclusion (Which one)

Official site suggestions

Comparison:

  • RDB persistence allows you to take snapshots of your data at specified intervals

  • When the server restarts, it re-executes these commands to rebuild the original data. AOF appends each write operation to the end of the file using the Redis protocol, and Redis can also rewrite the AOF file in the background to keep it from growing too large

  • Cache only: If you only want your data to live while the server is running, you can also do without any persistence.

Enable both persistence methods

  • In this case, when Redis restarts, AOF files will be loaded first to recover the original data, because AOF files usually hold more complete data sets than RDB files
  • RDB data is not real-time, and when both are enabled the server looks only for the AOF file on restart. Should we then use AOF alone? The author advises against it: RDB is better suited to backing up the database (an AOF changes constantly and is hard to back up) and restarts faster, and keeping it avoids any potential AOF bugs, so retain RDB as a fallback measure.

Performance Suggestions

  • Since RDB files serve only as backups, it is suggested to persist RDB only on the slave, once every 15 minutes, keeping just the save 900 1 rule.
  • If you enable AOF, the benefit is that in the worst case you lose no more than two seconds of data, and the startup script simply loads the AOF file. The cost is the continuous I/O, plus the unavoidable blocking when the rewrite process writes the new data into the new file at the end. As long as hard drives permit, the frequency of AOF rewrites should be minimized: the default rewrite base size of 64M is too small and can be raised to 5G or more, and the default trigger of 100% growth over the base size can likewise be changed to an appropriate value.
  • If AOF is not enabled, high availability can be achieved with master-slave replication alone, saving a lot of I/O and avoiding the system swings a rewrite brings. The trade-off is that if the master and slave both go down at once, a dozen or so minutes of data is lost; the startup script then has to compare the RDB files on the two machines and load the newer one. Sina Weibo chose this architecture

Redis transactions

What is it

website

  • Redis official website address

Overview

  • Lets you execute several commands at once; a transaction is essentially a collection of commands. All commands in a transaction are serialized and executed sequentially, without other commands being inserted in between

What it can do

  • Executes a queue of commands in one batch, sequentially and exclusively

How to use it

Common commands

Case1: The operation is normal

Case2: The transaction is abandoned

Case3: Guilt by association (an error at queue time discards the whole transaction)

Case4: Each debt has its debtor (an error at runtime fails only that command)

  • Even though the command was wrong, it still returned QUEUED when enqueued; unlike Case3, where the error is reported at queue time and the command never enters the queue

Case5: Watch monitoring

Pessimistic lock/Optimistic lock /CAS(Check And Set)
  • Pessimistic locking
    • Pessimistic locks, as the name implies, are pessimistic: every time you fetch the data you assume someone else will modify it, so you lock it on every read, and anyone else wanting the data blocks until they obtain the lock. Traditional relational databases use many such locking mechanisms, such as row locks, table locks, read locks, and write locks, all taken before the operation
  • Optimistic locking
    • Optimistic locks, as the name implies, are optimistic: every time you fetch the data you assume others will not modify it, so you do not lock; but when updating, you check whether anyone else updated the data in the meantime, typically with a version-number mechanism. Optimistic locking suits read-heavy applications and improves throughput
    • Optimistic locking policy: the submitted version must be greater than the record's current version for the update to proceed
  • CAS
Initialize the available credit card balance and the debt
No tampering: watch the key first, then multi, ensuring the two amounts change within the same transaction
Tampering: a competing write sneaks in
  • The key is monitored, and if it has been changed, the execution of the subsequent transaction fails
unwatch
  • Cancel the monitoring

All the watch monitor locks taken before exec are released once it executes
summary
  • The Watch directive, similar to an optimistic lock, does not execute the entire transaction queue if the Key value has been changed by another client, such as a list that has been pushed/popped by another client

  • The WATCH command monitors one or more keys before the transaction executes. If any watched key's value changes after the WATCH, the EXEC command aborts the transaction and returns a null multi-bulk reply to tell the caller the transaction failed
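The version-number strategy behind this optimistic-lock behavior can be sketched in Python (illustrative class and function names; real Redis performs the equivalent check server-side at EXEC):

```python
class Record:
    """A value plus a version number, bumped on every successful write."""
    def __init__(self, value):
        self.value, self.version = value, 0

def optimistic_update(rec, read_version, new_value):
    # commit only if nobody bumped the version since we read it;
    # otherwise the caller re-reads and retries, like re-running a WATCHed tx
    if rec.version != read_version:
        return False
    rec.value, rec.version = new_value, rec.version + 1
    return True
```

A failed update corresponds to EXEC returning null: nothing is written, and the client simply retries.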

Three phases

Start

  • Start a transaction with MULTI

Enqueue

  • Queue multiple commands into the transaction; they are not executed immediately but placed in the transaction queue awaiting execution

Execute

  • Execution: the transaction is triggered by the EXEC command

Three features

  • Separate isolated operations: All commands in a transaction are serialized and executed sequentially. The transaction will not be interrupted by command requests from other clients during execution

  • No isolation levels: commands in the queue are not actually executed until the transaction is committed, so there is no headache of "queries inside the transaction seeing its updates while queries outside cannot"

  • No guaranteed atomicity: if one command fails within a transaction, the subsequent commands are still executed, with no rollback; Redis supports transactions only partially
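The "no rollback" behavior can be mimicked with a toy command queue (a sketch with made-up class and method names, not the Jedis or redis-py API):

```python
class MiniTx:
    """Queue commands, run them in order at exec(); failures don't roll back."""
    def __init__(self, store):
        self.store, self.queue = store, []

    def set(self, key, value):
        self.queue.append(("set", key, value))

    def incr(self, key):
        self.queue.append(("incr", key))

    def exec(self):
        results = []
        for op, key, *rest in self.queue:
            try:
                if op == "set":
                    self.store[key] = rest[0]
                    results.append("OK")
                else:  # incr fails on non-integer values, as in Redis
                    self.store[key] = int(self.store.get(key, 0)) + 1
                    results.append(self.store[key])
            except ValueError as err:
                results.append(err)  # error recorded; later commands still run
        self.queue = []
        return results
```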

Redis publishing and subscribing

What is it

Overview

  • A message communication pattern between processes: the sender (PUB) sends messages and subscribers (SUB) receive them

Subscribe/publish message graph

The command

case

Messages are received only if the subscription happens before the publish

operation

1.1 Subscribe: SUBSCRIBE c1 c2 c3
1.2 Publish: PUBLISH c2 hello-redis
2.1 Subscribe to several channels at once with a wildcard: PSUBSCRIBE new*
2.2 Receive messages: PUBLISH new1 redis2015
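The pattern-subscription flow above can be mimicked in a few lines of Python (MiniPubSub is a made-up toy using glob matching in the spirit of PSUBSCRIBE, not the Redis protocol):

```python
import fnmatch

class MiniPubSub:
    def __init__(self):
        self.subs = []                     # (pattern, inbox) pairs

    def psubscribe(self, pattern):
        inbox = []
        self.subs.append((pattern, inbox))
        return inbox                       # messages will accumulate here

    def publish(self, channel, message):
        receivers = 0
        for pattern, inbox in self.subs:
            if fnmatch.fnmatch(channel, pattern):
                inbox.append((channel, message))
                receivers += 1
        return receivers                   # like Redis: number of receivers
```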

Redis replication (Master/Slave)

What is it

website

  • Redis primary/secondary replication
  • Jargon: master/slave replication. After the master's data is updated, it is automatically synchronized to the slaves according to the configuration and policy. The master is mainly for writes, the slaves mainly for reads

What it can do

Reading and writing separation

Disaster recovery

How to use it

1. Configure the slave (library), not the master (library)

2. Slave configuration: slaveof <master-ip> <master-port>

  • Each time the slave disconnects from the master it must reconnect with this command, unless it is configured in the redis.conf file
  • Info replication

3. Modify the details of the configuration file

3.1 Copy multiple redis.conf files
3.2 Enable daemonize yes
3.3 Pid file name
3.4 Specify the port
3.5 Log file name
3.6 dump.rdb name

4. Three setups

One master, two slaves
  • Init

    • Info replication Views information

  • One Master and two slaves

  • The log view

    • The master's log
    • The slave's log
    • info replication
  • Master-slave problem demonstration

    • 1. Entry-point question: do slave1 and slave2 replicate from the beginning or only from the point where they joined? For example, joining at k4, can they still copy k1 through k3?

      A: Yes. A newly connected slave copies everything from the beginning, not just from the join point

    • 2. Can the slave machine write data? Can set?

      A: The slave machine is read-only and cannot be set

    • 3. What happens after the master is shut down? Do the slaves take over or stand by?

      A: They stand by in place; none is promoted

    • 4. After the host returns, the host adds new records to check whether the slave machine can successfully copy them

      A: It can be copied smoothly

    • 5. What happens when one slave machine goes down? Can it catch up with the rest afterwards?

      A: After restarting, the earlier slave setting is gone and it comes back as a master; run slaveof again and it will catch up

Passing the torch (chained replication)
  • The previous slave can be the master of the next slave; a slave can likewise accept connection and sync requests from other slaves, so it acts as the next master in the chain. This effectively relieves the master's write pressure (decentralization)

  • Changing masters midway wipes the previously replicated data and copies the new master's dataset afresh

  • slaveof <new-master-ip> <new-master-port>

Becoming the master
  • SLAVEOF no one
    • Makes the current database stop synchronizing with other databases and become a master itself

The principle

Replication process
  1. After connecting to the master successfully, the Slave sends the sync command
  2. The Master receives the command to start the background saving process and collects all the commands received to modify the data set. After the background process is complete, the Master sends the entire data file to the slave for a complete synchronization
  3. Full replication: After receiving the database file data, the slave service saves it and loads it to the memory
  4. Incremental replication: The Master sends all the new collected modification commands to the slaves in turn to complete the synchronization
  5. However, whenever a slave reconnects to the master, a full synchronization (full copy) is performed automatically
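Steps 3 and 4 above (full then incremental replication) can be sketched with toy classes (made-up names, not a real protocol implementation):

```python
class MiniSlave:
    def __init__(self):
        self.data = {}
    def load(self, snapshot):          # full replication: whole dataset
        self.data = dict(snapshot)
    def apply(self, key, value):       # incremental replication: one command
        self.data[key] = value

class MiniMaster:
    def __init__(self):
        self.data, self.slaves = {}, []
    def sync(self, slave):             # slave sent "sync": ship everything
        slave.load(self.data)
        self.slaves.append(slave)
    def set(self, key, value):         # each new write is forwarded in turn
        self.data[key] = value
        for s in self.slaves:
            s.apply(key, value)
```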

Sentinel mode (Sentinel)

What is it

  • The automated version of slave promotion: it monitors the master in the background, and if the master fails, it automatically promotes one of the slaves to master based on the votes cast

Usage steps

1. Adjust the structure: 6379 as master with 6380 and 6381 as slaves
2. Create the sentinel.conf file; the name must be exactly this
3. Configure the sentinel by filling in
  • sentinel monitor <master-name> 127.0.0.1 6379 1
  • The last number, 1, is the quorum: how many sentinels must agree that the master is down before the vote for a new master begins
4. Start the sentinel
  • redis-sentinel /opt/redis-6.0.6/sentinel.conf
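A minimal sentinel.conf matching the line above might read as follows (the master name mymaster and the timeout value are arbitrary examples):

```conf
# watch the master at 127.0.0.1:6379; a quorum of 1 sentinel marks it down
sentinel monitor mymaster 127.0.0.1 6379 1
# how long (ms) the master must be unreachable before it is considered down
sentinel down-after-milliseconds mymaster 30000
```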

5. Normal master-slave demonstration
6. The original master dies
7. Vote for new elections
  • The sentry monitors the failure of the original master node and automatically votes for a new master node
8. Restart the master and slave servers and verify with info replication
9. Question: If the previous master is restarted, will there be a conflict between the two masters?
  • No
  • The newly elected master is kept; the old master that went down comes back as a slave of the new master

A set of Sentinels can monitor multiple Masters simultaneously

Disadvantages of copying

Replication delay

  • Since all writes happen on the master and are then synchronized to the slaves, there is a delay between master and slave. The delay worsens when the system is busy, and also grows with the number of slave machines

Redis Java client Jedis

Jedis website

Basic usage

  • Jedis basic usage

Advanced usage

  • Jedis advanced usage

Jedis commonly used API

Testing connectivity

5 + 1

Key operations
Five data types

Transaction commit

Everyday usage
With locking
  • In layman’s terms, the watch command marks a key. If a key is marked, the transaction will fail if the key is changed by someone else before the transaction is committed. In this case, you can usually try again in the program

    • First, mark the key balance with watch, then check whether the balance is sufficient: if not, cancel the mark (unwatch) and make no deduction; if it is, start the transaction and do the update
    • If the key balance is modified by someone else in the meantime, an error will be reported when the transaction is committed (exec), and the program can usually catch such errors and execute again until it succeeds

A master-slave replication

  1. Start 6379 and 6380 first, each on its own

  2. Write on the master

  3. Read on the slave

JedisPool

To get a Jedis instance, you need to get it from JedisPool

Jedis instances need to be returned to JedisPool

Even a Jedis instance that hit an error must still be returned to the JedisPool

case

JedisPoolUtil
JedisPool.getResource()

Configuration summary

All JedisPool configuration parameters are assigned from the corresponding items of JedisPoolConfig.
  1. maxActive: controls how many Jedis instances the pool can allocate at most (obtained via pool.getResource()); -1 means no limit; once the pool has handed out maxActive instances, its state becomes exhausted

  2. maxIdle: controls the maximum number of Jedis instances sitting idle in the pool

  3. whenExhaustedAction: what to do once every Jedis instance in the pool has been allocated.

    There are three options:

    • WHEN_EXHAUSTED_FAIL: throw NoSuchElementException immediately when no Jedis instance is available
    • WHEN_EXHAUSTED_BLOCK: block until an instance is free, or throw JedisConnectionException once maxWait is reached
    • WHEN_EXHAUSTED_GROW: create a new Jedis instance, which makes maxActive meaningless
  4. maxWait: the maximum time to wait when borrowing a Jedis instance; if exceeded, JedisConnectionException is thrown directly.

  5. testOnBorrow: whether to check connection availability (ping()) when obtaining a Jedis instance; if true, every instance handed out is usable;

  6. testOnReturn: whether to check connection availability (ping()) when returning a Jedis instance to the pool

  7. testWhileIdle: if true, an idle-object evictor thread scans the idle objects, and any that fail validation are dropped from the pool; only meaningful when timeBetweenEvictionRunsMillis is greater than 0;

  8. timeBetweenEvictionRunsMillis: the number of milliseconds the idle-object evictor sleeps between two scans;

  9. numTestsPerEvictionRun: the maximum number of objects the idle-object evictor examines per scan;

  10. minEvictableIdleTimeMillis: the minimum time an object must stay idle before the evictor may scan and evict it; only meaningful when timeBetweenEvictionRunsMillis is greater than 0;

  11. softMinEvictableIdleTimeMillis: like minEvictableIdleTimeMillis, but additionally requires that at least minIdle objects remain in the pool. At -1, nothing is evicted based on idle time; if minEvictableIdleTimeMillis > 0 this setting is meaningless, and it too only matters when timeBetweenEvictionRunsMillis is greater than 0;

  12. lifo: whether borrowObject returns objects last-in-first-out (DEFAULT_LIFO = true, i.e. the most recently used instance first, like a cache); false means a FIFO queue.

  13. The JedisPoolConfig defaults for some parameters: testWhileIdle=true, minEvictableIdleTimeMillis=60000, timeBetweenEvictionRunsMillis=30000, numTestsPerEvictionRun=-1

Redis Cluster (brief)

What is it

  • Redis cluster achieves horizontal expansion of Redis, that is, start N Redis nodes, store the entire database in these N nodes, and each node stores 1/N of the total data
  • Redis clusters provide a degree of availability through partitions: the cluster can continue to process command requests even if some nodes in the cluster fail or fail to communicate

The cluster configuration

The Redis Cluster configuration is modified

  • cluster-enabled yes: enables cluster mode
  • cluster-config-file nodes-6379.conf: sets the node configuration file name
  • cluster-node-timeout 15000: after a node has been unreachable for this long (in milliseconds), the cluster automatically performs a master/slave switchover

Integrate redis instances

  • cd /opt/redis-6.0.6/src
  • ./redis-trib.rb create --replicas 1 xxx:6379 xxx:6380 xxx:6381

Cluster allocation rules

  • A cluster must have at least three nodes
  • The --replicas 1 option means we want one slave for each master node in the cluster
  • The allocation principle tries to place each master on a different IP address, and to keep each slave off the same IP address as its own master

What is the slots

  • A Redis cluster contains 16384 hash slots, and every key in the database belongs to one of them. The cluster uses the formula CRC16(key) % 16384 to compute which slot a key belongs to, where CRC16(key) is the CRC16 checksum of the key

  • Each node in the cluster handles a portion of the slots. For example, in a cluster with three master nodes:

    • Node A processes slots 0 to 5500
    • Node B processes slots 5501 to 11000
    • Node C processes slots 11001 to 16383
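The slot formula can be checked with a self-contained Python sketch (this implements the CRC16-XMODEM variant that Redis Cluster uses, and honors {hash tags} as in the cluster spec; the function names are made up):

```python
def crc16_xmodem(data: bytes) -> int:
    """CRC16 with polynomial 0x1021 and initial value 0 (XMODEM variant)."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021 if crc & 0x8000 else crc << 1) & 0xFFFF
    return crc

def key_slot(key: str) -> int:
    """CRC16(key) % 16384, hashing only the {tag} part when one is present."""
    start = key.find("{")
    if start != -1:
        end = key.find("}", start + 1)
        if end > start + 1:                 # non-empty tag
            key = key[start + 1:end]
    return crc16_xmodem(key.encode()) % 16384
```

Keys sharing a hash tag, such as {user1}:name and {user1}:age, land in the same slot, which is what makes multi-key operations on them possible in a cluster.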

Common Cluster Commands

  • Start the client in cluster mode: redis-cli -c -p 6379
  • View cluster information: CLUSTER NODES
  • Compute which slot a key maps to: CLUSTER KEYSLOT <key>
  • Count the key-value pairs currently in a slot: CLUSTER COUNTKEYSINSLOT <slot>
  • Return up to count keys from a slot: CLUSTER GETKEYSINSLOT <slot> <count>