This article comes from a real interview I had.

The company's real name is "Sanzang", which fits nicely with my name, "Wukong": Tang Sanzang interviewing Wukong seemed only reasonable, and a bit amusing too, so I went to the interview. Sanzang is a small company, and its technical lead interviewed me in person. The result of the interview is announced at the end of the article.

The main contents of this article are as follows:

The difference between MongoDB and MySQL

1. Interviewer: I see that you listed MongoDB on your resume. Tell me about the difference between MongoDB and MySQL.

In fact, I had prepared for this question in advance: with MongoDB on my resume, the interviewer was bound to ask how it differs from MySQL.

MongoDB is a non-relational (NoSQL) database, more specifically a document database. Document databases emerged to solve problems that relational databases bring. Their biggest feature is that they are schema-less: documents of arbitrary shape can be stored and read.

Data is stored as JSON-style documents (internally as BSON, a binary encoding of JSON). JSON is a familiar format: the response returned by a REST API request, for example, is usually JSON. Compared with XML, JSON is simpler and needs less markup to name its fields, while remaining self-describing. In addition, a document stored in MongoDB may simply omit a field, and querying that field does not raise an error the way referencing a nonexistent column does in SQL.

MongoDB advantages

Because document databases are schema-less, there are several obvious benefits.

(1) Adding a new field does not cause errors.

For example, adding a nickname field to the user collection does not require changing a table structure, as it would in a relational database. We can query the document directly and see the new field.

(2) Querying historical data does not cause errors.

As mentioned above, a nickname field has been added, but it does not exist in the historical data. When historical data is queried, the returned documents simply lack this field. The query does not report an error, but reading the field yields null, so business code that uses the nickname field must handle its absence.
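The compatibility handling mentioned above can be sketched in a few lines. This is a minimal illustration that models documents as plain Python dicts rather than calling a MongoDB driver; the field values and the `display_name` helper are assumptions made up for the example.

```python
# Two documents from the same hypothetical "users" collection, modelled as dicts.
old_user = {"_id": 1, "name": "Wukong"}                      # written before the field existed
new_user = {"_id": 2, "name": "Bajie", "nickname": "Piggy"}  # written after it was added

def display_name(user):
    """Fall back to the real name when the nickname field is absent."""
    return user.get("nickname") or user["name"]

# Reading the missing field yields None (null) instead of raising an error.
names = [display_name(u) for u in (old_user, new_user)]
print(names)
```

The fallback lives in application code, which is the price of not having a schema that guarantees the field exists.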

(3) Complex data is easy to store.

Data is stored as JSON, and JSON can represent complex data structures: a field can hold an array, a nested document, or an array of nested documents. In MySQL, the same data would have to be spread over several tables. MongoDB's storage structure is particularly well suited to e-commerce scenarios. For example, two different products may have very different attributes, yet both can be stored as JSON without difficulty.
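As a rough illustration of the e-commerce case, here are two products with very different attributes represented as JSON documents. The product fields are invented for the example, and the documents are only serialized with Python's standard `json` module, not written to a real database.

```python
import json

# Two products whose attribute sets have almost nothing in common.
phone = {
    "sku": "P-1",
    "name": "Phone",
    "specs": {"screen_inches": 6.1, "storage_gb": [128, 256]},  # nested document + array
}
shirt = {
    "sku": "S-1",
    "name": "T-shirt",
    "sizes": ["S", "M", "L"],  # array field that the phone does not have
    "material": "cotton",
}

products = [phone, shirt]
encoded = json.dumps(products)   # both fit the same JSON representation
decoded = json.loads(encoded)
print(decoded[0]["specs"]["storage_gb"])
```

In a relational design, the nested `specs` and the `sizes` array would usually end up in separate tables joined by the product ID.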

But what are the downsides of document databases?

MongoDB disadvantages

(1) Multi-document transactions are not supported before 4.0.

Because the MongoDB document model supports embedded arrays and embedded documents, single-document atomicity meets the needs of most developers. To make MongoDB suitable for more scenarios and easier to develop with, MongoDB 4.0 added multi-document transactions within a replica set, spanning one or more collections and guaranteeing that updates to multiple documents are atomic. Distributed transactions on sharded clusters followed in MongoDB 4.2.

[Figure: MongoDB transaction support in different versions]

MongoDB's transaction interface is very simple: the developer only needs to place the sequence of updates that must be atomic between the start-transaction and commit-transaction calls of a session.
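To make the start/commit idea concrete, here is a toy model of all-or-nothing updates over an in-memory dict. It is only a sketch of the behaviour, not the MongoDB driver API: the `transaction` context manager, the `transfer` function, and the account data are all hypothetical names invented for this example.

```python
import copy
from contextlib import contextmanager

@contextmanager
def transaction(store):
    """Toy transaction: snapshot at start, restore the snapshot on failure."""
    snapshot = copy.deepcopy(store)      # state at "start transaction"
    try:
        yield store                      # updates between start and commit
    except Exception:
        store.clear()
        store.update(snapshot)           # "abort": undo every update
        raise
    # "commit": nothing to do, the updates are already in the store

def transfer(store, src, dst, amount):
    with transaction(store) as tx:
        tx[src]["balance"] -= amount     # first document update
        if tx[src]["balance"] < 0:
            raise ValueError("insufficient funds")
        tx[dst]["balance"] += amount     # second document update

accounts = {"alice": {"balance": 100}, "bob": {"balance": 50}}
transfer(accounts, "alice", "bob", 30)   # succeeds: both updates are kept

try:
    transfer(accounts, "alice", "bob", 500)  # fails midway: both updates are undone
    failed = False
except ValueError:
    failed = True

print(accounts, failed)
```

A real multi-document transaction gives the same guarantee across documents and collections, with the database rather than the application doing the bookkeeping.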

(2) Join queries are not natively supported as in SQL.

As we all know, MySQL supports join queries. For example, suppose there are two tables, a user table and an order table. The order table holds the user ID, while gender exists only in the user table. To find male users who have bought a mobile phone, a relational database can do it in one step with a join. With MongoDB, you typically have to query twice: first find who bought a phone in the order collection, then find which of those users are male. (Since version 3.2, the $lookup aggregation stage offers a left outer join, but nothing as general as SQL joins.)
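The two-step lookup can be sketched as follows. The collections are modelled as lists of dicts instead of real MongoDB queries, and all names and records are made up for the illustration.

```python
# Hypothetical contents of the user and order collections.
users = [
    {"user_id": 1, "name": "Wukong", "gender": "male"},
    {"user_id": 2, "name": "Chang'e", "gender": "female"},
    {"user_id": 3, "name": "Bajie", "gender": "male"},
]
orders = [
    {"order_id": 101, "user_id": 1, "item": "mobile phone"},
    {"order_id": 102, "user_id": 2, "item": "mobile phone"},
    {"order_id": 103, "user_id": 3, "item": "rake"},
]

# Step 1: query the orders to find out who bought a mobile phone.
buyer_ids = {o["user_id"] for o in orders if o["item"] == "mobile phone"}

# Step 2: query the users with those IDs and keep the male ones.
male_buyers = [
    u["name"] for u in users
    if u["user_id"] in buyer_ids and u["gender"] == "male"
]
print(male_buyers)
```

In SQL the same result comes from a single statement that joins the two tables on the user ID.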

The disadvantages of relational databases

2. Interviewer: Why didn't you use a relational database for this project? What are the disadvantages of relational databases?

The interviewer is asking about the shortcomings of relational databases.

The shortcomings of relational databases

(1) Data is stored as row records.

Rows cannot directly hold structures such as arrays or nested fields.

(2) Extending the table structure is inconvenient.

Operating on a nonexistent column raises an error, and adding a column requires executing a DDL statement such as ALTER TABLE. Such changes need care: the table may be locked for a long time while it is altered, which can seriously affect the production environment.

(3) High memory usage.

A relational database can have a high memory footprint when performing operations such as statistics on a large table because it reads an entire row of data from a storage device into memory even if it is performing operations on a single column.

(4) Poor full-text search performance.

Relational databases such as MySQL typically do fuzzy matching with LIKE, and a pattern with a leading wildcard (for example LIKE '%keyword%') cannot use an ordinary index, so it scans the whole table, which is inefficient. Today, many scenarios need fuzzy matching with efficient lookups, for example querying log entries that contain certain keywords, or listing products that match a keyword.

To address the above shortcomings, we used two non-relational data storage solutions in this project: MongoDB and Elasticsearch.

NoSQL classification and characteristics

3. Interviewer: Which NoSQL databases do you know? What are their characteristics?

NoSQL = Not Only SQL.

I know Redis, MongoDB, HBase, and Elasticsearch. Each takes a different approach to non-relational storage.

Key-value (K-V) storage

Redis, for example, stores data as key-value pairs, and the value can take several formats, such as string, hash, list, set, and bitmap.

Document storage

MongoDB, for example, stores JSON-formatted documents. This removes the table-schema constraints of relational databases, where querying a nonexistent field reports an error. It also solves some storage-format problems, since JSON can represent arrays and nested fields.

Column storage

For example, HBase stores data in columns, which solves I/O problems in big data scenarios.

Relational databases store data in rows, so they are called row-oriented databases. Storing by row has the following advantages:

  • Reading several columns of a row is efficient: a single disk operation brings all of the row's columns into memory.
  • Several columns of a row can be written in a single operation, which ensures the atomicity and consistency of the row. With column storage, a multi-column write may succeed for some columns and fail for others, leaving the data inconsistent.
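The trade-off between the two layouts can be illustrated with a small in-memory model. This is only a sketch: real engines work with disk pages and files, and the table contents and the `to_columns` helper are invented for the example.

```python
# Row layout: each record keeps all of its columns together.
rows = [
    {"id": 1, "name": "a", "amount": 10},
    {"id": 2, "name": "b", "amount": 20},
    {"id": 3, "name": "c", "amount": 30},
]

def to_columns(rows):
    """Re-arrange row records into one list per column."""
    columns = {}
    for row in rows:
        for field, value in row.items():
            columns.setdefault(field, []).append(value)
    return columns

# Column layout: each column's values are stored together.
columns = to_columns(rows)

# Row layout: summing one column still walks over every whole record.
row_total = sum(row["amount"] for row in rows)

# Column layout: the same statistic touches only the "amount" list.
column_total = sum(columns["amount"])
print(columns["amount"], row_total, column_total)
```

Conversely, fetching or writing one complete record is a single access in the row layout but one access per column in the column layout, which is where the atomicity concern comes from.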

Full-text search engine

Full-text search is most widely used in logging systems, as well as in product search and similar scenarios, for example on an e-commerce site.

Log search in our project is built on ELK.

Elasticsearch is the E of ELK. Elasticsearch is a full-text search engine. Note that it is a NoSQL solution, not a NoSQL database.

Logstash is the L of ELK. It is one of the core products of the Elastic Stack and is used to aggregate and process data and send it to Elasticsearch. Logstash is an open-source, server-side data processing pipeline that lets you collect data from multiple sources at the same time, then enrich and transform it before indexing it into Elasticsearch.

Kibana is the K of ELK. Kibana is a data visualization and management tool for Elasticsearch. It provides real-time histograms, line charts, and so on.

As shown below:

Traditional relational databases rely on indexes for fast queries, but this does not work well for full-text search.

Let's take a look at why it is difficult for relational databases to do efficient full-text search:

  • In full-text search, the search conditions can be combined in any order. Three fields A, B, and C, for example, can be arranged in six different orders, so supporting fast queries with indexes would require creating many indexes. That is cumbersome, and every additional index also slows down data insertion.
  • Fuzzy matching can only be done with LIKE, and a LIKE query with a leading wildcard scans the whole table, which is very inefficient.

In a previous post, I wrote about how Elasticsearch implements full-text search with an inverted index. Here's an example of an inverted index:

Suppose the database has the following movie records:

1- A Chinese Odyssey

2- Legend of A Chinese Odyssey

3- Analysis of A Chinese Odyssey

4- Legend of Conquering the Demons on the Journey to the West

5- Exclusive Analysis of Fantasy Journey to the West

Tokenization: break each title into words. The titles are tokenized in their original language, where "A Chinese Odyssey" consists of the two words "big talk" and "journey west".

| No. | Word stored in ES | Matching record numbers |
|-----|-------------------|-------------------------|
| A   | journey west      | 1, 2, 3, 4, 5           |
| B   | big talk          | 1, 2, 3                 |
| C   | legend            | 2, 4                    |
| D   | analysis          | 3, 5                    |
| E   | conquering demons | 4                       |
| F   | fantasy           | 5                       |
| G   | exclusive         | 5                       |

Search: "Exclusive A Chinese Odyssey"

The query is split into three words: exclusive, big talk, and journey west.

Words A, B, and G in ES each match one of these three words, so records 1, 2, 3, 4, and 5 all contain at least one matching word.

Record 1 hits words A and B (2 hits) and contains 2 words. Relevance score: 2 hits / 2 words = 1

Record 2 hits words A and B (2 hits) and contains 3 words. Relevance score: 2 hits / 3 words = 0.67

Record 3 hits words A and B (2 hits) and contains 3 words. Relevance score: 2 hits / 3 words = 0.67

Record 4 hits word A (1 hit) and contains 3 words. Relevance score: 1 hit / 3 words = 0.33

Record 5 hits words A and G (2 hits) and contains 4 words. Relevance score: 2 hits / 4 words = 0.5

Therefore, the retrieved records are ordered as follows:

1- A Chinese Odyssey (relevance score: 1)

2- Legend of A Chinese Odyssey (relevance score: 0.67)

3- Analysis of A Chinese Odyssey (relevance score: 0.67)

5- Exclusive Analysis of Fantasy Journey to the West (relevance score: 0.5)

4- Legend of Conquering the Demons on the Journey to the West (relevance score: 0.33)
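The word table and the hits-divided-by-word-count score from this example can be reproduced with a small script. It is a sketch of the simplified scoring used in this article only; the word labels are my own English glosses for the tokens, the function names are invented, and Elasticsearch's real relevance scoring is considerably more elaborate.

```python
from collections import defaultdict

# Tokenized movie records (English glosses of the words, assumed tokenization).
records = {
    1: ["big talk", "journey west"],
    2: ["big talk", "journey west", "legend"],
    3: ["analysis", "big talk", "journey west"],
    4: ["journey west", "conquering demons", "legend"],
    5: ["fantasy", "journey west", "exclusive", "analysis"],
}

def build_inverted_index(docs):
    """Map each word to the set of record numbers that contain it."""
    index = defaultdict(set)
    for doc_id, words in docs.items():
        for word in words:
            index[word].add(doc_id)
    return index

def search(index, docs, query_words):
    """Score = number of query words hit / number of words in the record."""
    hits = defaultdict(int)
    for word in query_words:
        for doc_id in index.get(word, ()):
            hits[doc_id] += 1
    scored = [(doc_id, round(n / len(docs[doc_id]), 2)) for doc_id, n in hits.items()]
    # Highest score first; ties are broken by record number.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))

index = build_inverted_index(records)
ranking = search(index, records, ["exclusive", "big talk", "journey west"])
print(ranking)
```

The lookup never scans the records themselves: each query word goes straight to its list of record numbers, which is what makes the inverted index fast.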

Elasticsearch vs. MySQL

| No. | MySQL         | Elasticsearch          |
|-----|---------------|------------------------|
| 1   | MySQL service | ES cluster service     |
| 2   | Database      | Index                  |
| 3   | Table         | Type                   |
| 4   | Record (row)  | Document (JSON format) |

NoSQL also includes graph databases, which I won't go into here.

How to choose between relational and NoSQL?

4. Interviewer: How do you choose between relational and NoSQL databases?

When choosing between relational and NoSQL databases, consider several criteria: data volume, concurrency, real-time requirements, consistency requirements, read/write separation, security, operations and maintenance, and so on. Based on these criteria, software systems can be divided into several categories:

  • Management systems, such as back-office operations systems: a relational database is preferred.
  • High-traffic systems with many fields and fast data growth: NoSQL is preferred.
  • Logging systems: Elasticsearch is preferred.
  • Search systems (on-site search such as product search, as opposed to general web search): Elasticsearch is preferred.
  • Transactional systems, such as inventory, trading, and accounting: a relational database + cache + consistency protocol.
  • Offline computing, such as large-scale data analysis: a columnar database is preferred.
  • Real-time computing and real-time monitoring: a time-series database or a columnar database.

Interview result: The technical lead thought the interview went fine, but HR was not in that day, so I was told to wait for further notice from HR. The end.