Interviewer: Tell me what you know about NoSQL. Why NoSQL

However, when dealing with some requirements, relational databases are often unable to hold, and need to be supplemented by non-relational databases. What are the problems with relational databases?

Relational database

Row storage structure

For example, a relational database is a row storage structure, so when you only want to take a few columns in a row, the data from the hard disk to the memory will also be the entire row of data, when the data volume is very large, IO will be too much. Of course, it is possible to solve this problem by using vertical partition tables, but vertical partition tables also introduce complexity. For details, see this article I wrote about mysql database partition tables

And the attention of headline list, for example, if in a relational database, it is your id, you pay attention to the person’s id, kept such a line of data, and then focuses on the people there are 10 will have 10 such records, and then you see, when you pay attention to the list of needs to check 10 rows from a database, and then assemble it back to the front end.

Strong structure

Strong structure means that the table structure of a relational database has strong constraints and must be stored in such a format, which is not flexible enough. Therefore, when new requirements need to add fields, the table structure needs to be modified. If the table has a lot of data, modifying the table structure may lock the table for a long time, resulting in the table is not available.

Not as fast as an in-memory database

Redis, for example, can’t match the speed of relational databases in memory.

The full text retrieval ability is weak

If full-text retrieval is provided, general relational databases can only like full table scan, which is poor performance, although full-text indexes like mysql are also available. But I think there’s a specialty. Because these database vendors want to expand the breadth of their software, of course they want their software to support all the needs and then let everyone use it. If it works this well, things like ElasticSearch will still live. Mongodb4, for example, supports ACID transactions and so on.

A software-based design determines that it is good on the one hand and bad on the other. ** Generally speaking, performance or flexibility will be sacrificed to achieve something that is not its domain. ** So here are the strengths of each type of NoSQL, their strongest points.

NoSQL (not only SQL)

NoSQL is actually a supplement to the relational database, at present we are impossible to leave the relational database, so NoSQL to make up for the deficiency of the relational database in some cases. Common NoSQL is divided into four types: K-V, document, column storage, and full-text search

1, K – V type

Key-value, this is something we’re all familiar with like Map. The representative database is Redis. The value of Redis is divided into many structures, such as list, set, sorted set, hash, string, etc.

It is stored in memory, so fast is often used as a cache server. And because of its structure, some operations are easier than a relational database.

For example, the [LPUSHX key value] operation on List inserts a value into the head of an existing List, which is ordered. What would you do if you were in a relational database? Insert a piece of data, and set the field controlling the position, for example, index, to 1. I’m going to have to change the index of all the following rows, so that I can control the order, which data to delete, and maintain the index change. Operation is tricky.

However, ACID transactions only support I and C, i.e. isolation and consistency, not atomicity and persistence. So it is not suitable in some cases where transaction requirements are required.

2. Document type

This type has an unconstrained structure and can store any structure because it’s a document. For example, in a relational database, there are only two table fields, one id and one name. If you want to store the sex field you have to change the table structure. The document type is not used, because the data format of the document type is usually Json, AND I can fill in the fields inside Json, and it is free.

{
 "id":"1"."name":"aa"
}
Copy the code

Let me plug in the next one let me write it this way

{
 "id":"2"."name":"bb"."sex":"man"
}
Copy the code

And this type of database is easy to store complex structures, because Json is a powerful description language that can clearly describe complex data structures. If a complex data structure is placed in relational data it may score many tables. For example, a table for my user’s basic information, a table for the user’s favorite movies, a table for the user’s favorite music, a table for the user’s favorite games, blah, blah, blah, all of these Json can be done at one time without having to divide so many tables.

The database represented is MongoDB. Before 3.2, the join operation is not supported, and then a LOOKUP is used to implement the join operation. Transactions were not supported prior to version 4.0, and since then, although transactions are supported, the industry has rarely used them to guarantee transactions.

3, column storage type

That is, data is stored in columns, whereas relationships are stored in rows. The advantage of row storage is that services can easily obtain data in one row, that is, multiple columns. Because row storage is continuous, disks can read data in all columns in one operation.

But by column, because the storage of columns is discontinuous, disk reads are less efficient than rows

Write by row If the operation is also a row together, all columns are guaranteed to either be written successfully or fail. If you do it by column it’s possible that some columns will succeed and some columns will fail.

However, in big data statistics, data of a certain column or several columns are generally counted. If the storage is row by row, the entire row will be read each time from disk to memory, resulting in too much I/O and a waste of resources.

So save I/O by using column storage, so that you only need to take the desired column for statistics each time.

The HBase database is used for offline big data analysis and statistics. Why offline? As mentioned above, the operation of write may have problems, and the efficiency of whole line reading is low, so online data is generally copied to make a column database, specializing in user data analysis.

4. Full-text retrieval

This type of database is mainly used in the traditional relational database in the full text retrieval of the situation. Because the search conditions are many, such as looking for objects in the website search, female +170+ Hangzhou + love spicy + love fitness + love travel +28 years old. Let’s think about how a relational database would build this index… There are too many permutations and combinations of search criteria. So relational databases don’t work. Remember to introduce a full-text search database.

Full-text search engines use inverted indexing, meaning that each word is indexed, to build an index of words into documents, so that results that meet your search criteria are quickly displayed.

Represents Elasticsearch, distributed document storage. The way to use it is to export data from a relational database, convert it to Json format and then enter it into Elasticsearch for indexing and use.

For Elasticsearch, there will be no further analysis. You need to find relevant information by yourself. Elasticsearch is also document-oriented, but its focus is on full text retrieval, so it’s classified as such.

conclusion

Relational and non-relational databases complement each other, so we need to architecture them according to their strengths, weaknesses and needs. There is no best, only the most suitable.

If wrong welcome to correct! Personal public account: Yes training level guide