The yanxi Palace of the database world is coming, and you will lose without looking at you

The idea that winners take all and losers take all seems to have lasted from ancient times to the present day. Must distinguish a win or lose, distinguish who is good, who is bad to stop.

The same can happen in the database world, where I’ve been working for years. MySQL database occupies the top position of open source database, and MongoDB occupies the first place of NoSQL database.

Let’s take a look at the overall ranking of the database:

Both are number one, so they are always compared. MongoDB 4.0 has come out and supports transactions. Will MongoDB be able to replace MySQL in the future?

MySQL or MongoDB which database to use? Today, I would like to pass this article, with a full interpretation of the difference between MySQL and MongoDB. Let confused old players understand that there is no substitute for each other, only which scene suits each other better.

Only with a better understanding of each other can we make better use of their functionality. Here are the differences between the two in four directions:

Database Overview
Daily operation and maintenance management dimensions
Cluster Architecture Level
Application Scenario Angle

Database Overview

Let’s take a look at the MySQL database.

MySQL database features:

After MySQL understanding, we will understand the introduction of MongoDB and its characteristics:

MongoDB features are described as follows:

According to the diagram above, we have a certain understanding of both databases. Next, we will examine the differences between the two databases from the perspective of operation and maintenance.

Daily operation and maintenance management dimensions

Differences in terms and concepts

As you can see from the figure above, “tables” in relational databases are called collections in MongoDB. Rows are called documents in MongoDB. So we call MongoDB a document database.

Differences in storage data structures

When designing tables in a relational database, some information requires multiple table records. In MongoDB, the above three tables become the following code can be implemented.

{ _id:"M416", name:"zhangsu", phone: [1234567],... }Copy the code

The characteristics of MongoDB table design are as follows:

Data aggregation
Nested data
Array structure

The startup configuration file format is different

MySQL database configuration is called my.cnf.

[client] port    = 3306 socket  = /data/mysql/mysql.sock  [mysql] prompt="\u@db \R:\m:\s [\d]> " no-auto-rehash  [mysqld] user    = mysql port    = 3306 basedir = /usr/local/mysql datadir = /data/mysql/ socket  = /data/mysql/mysql.sock pid-file = db.pid character-set-server = utf8mb4 skip_name_resolve = 1 open_files_limit    = 65535 back_log = 1024 max_connections = 512 max_connect_errors = 1000000 table_open_cache = 1024 table_definition_cache = 1024 table_open_cache_instances = 64 thread_stack = 512K external-locking = FALSE max_allowed_packet = 32M sort_buffer_size = 4M join_buffer_size = 4M thread_cache_size = 768 #query_cache_size = 0 #query_cache_type = 0 interactive_timeout = 600 wait_timeout = 600 tmp_table_size = 32M max_heap_table_size = 32M slow_query_log = 1 slow_query_log_file = /data/mysql/slow.log log-error = Log long_query_time = 0.1 server-id = 3306101 log-bin = /data/mysql/mybinlog sync_binlog = 1 binlog_cache_size = 4M max_binlog_cache_size = 1G max_binlog_size = 1G expire_logs_days = 7 master_info_repository = TABLE relay_log_info_repository = TABLE gtid_mode = on enforce_gtid_consistency = 1 log_slave_updates=1 binlog_format = row relay_log_recovery = 1 relay-log-purge = 1 key_buffer_size = 32M read_buffer_size = 8M read_rnd_buffer_size = 4M bulk_insert_buffer_size = 64M #myisam_sort_buffer_size = 128M #myisam_max_sort_file_size = 10G #myisam_repair_threads = 1 lock_wait_timeout = 3600 explicit_defaults_for_timestamp = 1 innodb_thread_concurrency = 0 innodb_sync_spin_loops = 100 innodb_spin_wait_delay = 30 secure_file_priv='' super_read_only=0 transaction_isolation = REPEATABLE-READ #innodb_additional_mem_pool_size = 16M innodb_buffer_pool_size = 1024M innodb_buffer_pool_instances = 8 innodb_buffer_pool_load_at_startup = 1 innodb_buffer_pool_dump_at_shutdown = 1 innodb_data_file_path = ibdata1:100M:autoextend innodb_flush_log_at_trx_commit = 1 innodb_log_buffer_size = 32M innodb_log_file_size = 2G innodb_log_files_in_group = 2 innodb_max_undo_log_size = 4G innodb_io_capacity = 4000 innodb_io_capacity_max = 8000 innodb_flush_neighbors = 0 innodb_write_io_threads = 8 innodb_read_io_threads = 8 innodb_purge_threads = 4 innodb_page_cleaners = 4 innodb_open_files = 65535 innodb_max_dirty_pages_pct = 50 innodb_flush_method = O_DIRECT innodb_lru_scan_depth = 4000 innodb_checksum_algorithm = crc32 #innodb_file_format = Barracuda #innodb_file_format_max =  Barracuda innodb_lock_wait_timeout = 10 innodb_rollback_on_timeout = 1 innodb_print_all_deadlocks = 1 innodb_file_per_table = 1 innodb_online_alter_log_max_size = 4G internal_tmp_disk_storage_engine = InnoDB innodb_stats_on_metadata = 0 innodb_status_file = 1 [mysqldump] quick max_allowed_packet = 32MCopy the code

The MongoDB configuration file is in Yaml format, as shown below:

Add, delete, change and check the difference between operations

Differences in transaction support

But with MongoDB 4.0, which will support multi-document transactions, MongoDB will be the only database that supports speed, flexibility, JSON document model, and ACID data integrity at the same time.

The so-called multi-document transaction can be understood as the multi-row transaction of a relational database. In relational transaction support, almost everyone supports atomicity of operations within the same transaction, that is, commit all or rollback all.

There can be multiple operations within the same transaction, on multiple tables, or on multiple rows of data within the same table.

Conclusion: With the increase of transaction support, MongoDB functions more like relational databases, but there are still essential differences.

MySQL is a relational model-based database, which is not as good as MongoDB for various data changing scenarios such as the Internet of Things or socialization.

The JSON model of MongoDB is dynamic and flexible, and the database can be upgraded without going offline. In this scenario, MongoDB is particularly suitable.

Differences in backups

MySQL backup mode, as shown below:

MongoDB backup mode (logical backup and restore) :

mongodump
mongorestore
mongoexport
mongoimport

Note: MongoDB does not yet have abackup tool as useful as Xtrabackup. Therefore, logical backup is generally used to perform operations.

After we have a deeper understanding of them from the perspective of operation and maintenance, we will explore deeper differences from the dimension of cluster architecture.

Cluster Architecture Level

Differences at the cluster architecture level

We will start with MySQL replication and then introduce the MySQL high availability cluster architecture.

MySQL master/slave replication schematic diagram

MySQL replication types summary

Asynchronous replication: Usually asynchronous replication is not specified. That is, after the primary database performs a Commit, the Binlog logs written by the primary database can be returned to the client successfully. The Binlog logs do not need to be sent to the secondary database.

Semi-synchronous replication: After MySQL 5.5, semi-synchronous replication is introduced. The semi-synchronous replication plug-in must be installed on both the primary and secondary servers to enable this function.

In this function, the waiting thread on the master library is notified of the completion of the operation only after ensuring that the contents of the Binlog from the master library have been written to its own Relay Log.

If the wait times out and exceeds the time set by rpl_semi_SYNC_master_timeout, the semi-synchronous replication is disabled and the asynchronous replication mode is automatically changed until at least one secondary library notifies the master library that it has received Binlog information.

Multi-source replication: The so-called multi-source replication is to synchronize data from multiple master libraries to a slave server. The slave libraries create pipes to each master library.

Prior to MySQL 5.7, only one master, one master, many slave, or many master, many slave replication architecture can be implemented. If you want to implement multiple master, one slave replication, you can only use MariaDB. MySQL 5.7 already supports multiple master and one slave replication.

Parallel replication: Use MySQL 5.7 parallel replication. The concept of parallelism was introduced in version 5.6, but parallel replication is done at the library level, namely slave_parallel_type=database. In this mode, just based on multiple libraries and few tables, is not suitable for a real production environment.

In MySQL 5.7, the parallel replication based on group commit is really realized. Simply speaking, the master library executes SQL statements in parallel, and the slave library can also execute transactions submitted by the master library in the Relay Log concurrently through multiple Workers threads.

To enable parallel replication for MySQL 5.7, set the slave library parameter slave_parallel_workers > 0.

And set the slave_parallel_type parameter newly added in version 5.7 to LOGICAL_CLOCK.

This parameter has two values: DATABASE and LOGICAL_CLOCK. MySQL 5.6 defaults to DATABASE.

MySQL high availability cluster architecture

MySQL ha cluster architecture classification diagram is as follows:

MHA

MHA cluster architecture diagram

The purpose of MHA is to maintain the high availability of the Master library in MySQL Replication. The main feature of MHA is that it can repair the difference log between multiple slaves. Finally, all slaves keep the data consistent, and then select one of them to act as the new Master. And point other slaves to it.

When the Master fails, you can read the Position number of the Binlog of the Master library by comparing the I/O threads among the slaves, and select the nearest Slave as the secondary Master library (Slave Slave). Other slave libraries can generate different relay logs by comparing them to the alternate master library.

Apply the Binlog saved from the original Master to the secondary Master, and promote the secondary Master to Master. Finally, the corresponding differential relay logs are applied to the other slaves and replicated from the new Master.

Double master + Keepalived

This structure is easiest for small to medium sized enterprises. The two nodes can be in a simple master slave mode, or dual master mode.

In addition, they are placed in the same VLAN, in the event of a Master node failure, Keepalived/Heartbeat is highly available mechanism to quickly switch to the Slave node.

PXC cluster

PXC is a high availability cluster architecture of MySQL based on Galera protocol. Galera products provide high availability Cluster solutions for MySQL in the form of Galera Clusters. Galera Cluster is a MySQL Cluster that integrates Galera plug-ins.

Galera Replication is a MySQL data synchronization solution from Codership that is highly available and easy to scale.

In addition, it can realize synchronous data replication and read and write between multiple MySQL nodes, ensuring high availability of database services and strong data consistency.

MGR architecture

MySQL Group Replication (MGR) is available in 5.7.17. Master1, Master2, Master3, all members complete their own transactions independently.

When a client initiates an update transaction, the transaction is executed locally and the commit operation is initiated after completion.

The resulting copy write set needs to be broadcast and copied to other members before it is actually committed. If the conflict detection succeeds, the group decides that the transaction can be committed and applied by other members, otherwise it is rolled back.

Ultimately, this means that all group members receive the same set of transactions in the same order. Therefore, the group members apply the same changes in the same order to ensure data consistency in the group.

MongoDB replication

MongoDB replication set, as shown below:

The three-copy architecture is the most basic replica set architecture, one master and two standby mode. The active node receives read/write requests and synchronizes data to the standby node. When the active node fails, the standby node is automatically switched over to the active node, which does not affect online services and prevents single points of failure.

MongoDB replication sets automatically switch, as shown below:

All members of the replica set can be read. By default, however, the application points its read operations to Primary.

Replica sets can have up to one Primary node, and when the Primary node goes down, the cluster triggers an election to select a new Primary node.

In the following three-member node replica set schema, when the Primary goes down, an election is triggered to elect a new Primary node from the remaining two Secondary nodes.

Read/write separation Settings of MongoDB replication set, as shown below:

Read Preference determines which node the MongoDB client reads data from. By default, the application points its read operations to the Primary node in the replica set.

Note when specifying the Read Preference option: Because asynchronous replication is used, replication delays can cause data on Secondary to be not up to date.

By default, all Read requests from a replicate set are sent to the Primary node. The Driver can route Read requests to other nodes by setting Read Preference:

Primary: The default rule. All read requests are sent to Primary.
PrimaryPreferred: Primary preferred. If the Primary is unreachable, request Secondary.
Secondary: All read requests are sent to Secondary.
SecondaryPreferred: SecondaryPreferred. When all Secondary is unreachable, the Primary is requested.
Nearest: Send the read request to the Nearest reachable node (Ping the Nearest node).

The MongoDB sharding architecture is shown as follows:

Sharding is a method of distributing data across multiple machines. MongoDB uses a sharding architecture to help you manage clusters with very large data sets and high throughput operations.

The business situation of large data volume and high throughput is very challenging for a single server. For example, high query rates may exhaust the CPU capacity of the server.

When the working set size exceeds the system memory, the stress is put on disk, which is not desirable for IO. MongoDB supports horizontal scaling by sharding.

Summary: MySQL has many types of replication, and cluster architecture is more selective. However, the horizontal expansion capability is not as strong as the sharding architecture of MongoDB.

Finally, we make a summary of the two databases through the different application scenarios of MySQL and MongoDB.

Application Scenario Angle

As mentioned in the introduction to MySQL’s features, MySQL has nearly 100% coverage.

Large BAT, e-commerce platforms, game companies, and even many traditional industries are all moving towards MySQL database without exception, reaching the trend of gradual monopoly.

The application of MongoDB has also spread to various fields, such as games, logistics, e-commerce, content management, social networking, Internet of Things, live video, etc. :

Game field: use MongoDB to store game user information, users’ equipment, points and other directly stored in the form of embedded documents, convenient query, update.

Logistics scenario: MongoDB is used to store order information. The order status is constantly updated during delivery and stored in the form of an embedded array in MongoDB. All changes of an order can be read out in a single query.

Social scenarios: use MongoDB to store user information and the information published by users in the circle of friends, and realize the functions of nearby people and places through the geographical location index.

Internet of Things scenario: Use MongoDB to store information about all connected smart devices and logs reported by the devices, and analyze the information in multiple dimensions.

I started working with MySQL in 2009 and MongoDB in the first version 2.1 in 2012, and it was really a palm of my hand for both databases.

When I am lonely and lonely, they always accompany me, thanks to technology to bring us simple happiness.

No matter what happens in the future, there is no such thing as who will replace who, just that each of them has its own characteristics that make it more appropriate for us to use them in different application scenarios.

There is no palace infighting, no intrigues, only the most simple heart to do technology, is the reality version of “Yanxi Palace”!

Zhang Su is an expert and well-known person in the field of database, author of the book “MySQL King promotion Road” and expert blogger 51CTO. Nearly 10 years of online processing and training experience in the Internet, focusing on MySQL database, NoSQL database such as MongoDB, Redis and Hadoop ecosystem related technology in-depth research, has very rich theoretical and practical experience.

Original: http://database.51cto.com/art/201808/582300.htm

The yanxi Palace of the database world is coming, and you will lose without looking at you

Related Posts

Summary of Python crawler simulation login methods

Have you fallen into any of these Spring “traps” (2)

When I look at Event Driven Architecture, what comes to mind (I)