• How to choose the right database
  • Original article by Tzoof Avny Brosh
  • Translation from: The Gold Project
  • This article is permalink: github.com/xitu/gold-m…
  • Translator: Jessica
  • Proofreader: Ruby, Starry

How to choose the right database

We’ll discuss existing database types and database best practices for different project types.

Whether you’re an experienced software engineer or a student writing a university course, you’ll always have to choose a database for a project.

If you’ve ever worked with a database, you might say “I’ll just pick X, that’s the database I know and have used,” which is perfectly fine if performance isn’t an important requirement of the system. Otherwise, a faulty database can become a project roadblock as the project scales, and is sometimes difficult to fix. Even if you are working on a mature project that has been using a particular database for some time, it is important to understand its limitations and know when to add another type of database to the stack (it is common for multiple data combinations to work).

Another plus for knowing about different databases and their properties is that it’s a common question in an interview!

In this article, we will discuss two main database types:

  • Relational database (BASED on SQL).
  • NoSQL database.

We’ll discuss the different types of NoSQL databases and when to use them. Finally, we’ll discuss the pros and cons of relational versus NoSQL databases. This article will not cover comparisons between different products of the same type of database (such as MySQL and MS SQL Server).

Article summary: If you want a quick cheat sheet through this article, skip to the end of the article.


Relational database (BASED on SQL)

A relational database consists of a set of joined tables, such as CSV tables. Each row in the table represents a record.

Why is it called relational? What are the “relationships” that exist in such databases? Suppose you have a student information sheet and a course score sheet (course, grade, student ID), and each score row is associated with one record in the student information sheet. See the figure below, the value of the “Student ID” column in the course grade table points to the row in the “Students” table through the value of the “ID” column.

All relational databases query using SQL like languages that are common and come with JOIN operations (that is, JOIN operations that combine rows from two or more tables, such as the student information table and course grade table above). This database supports indexing of columns, enabling faster queries based on those columns.

Because of the structured nature of relational databases, the schema (schema refers to the organization and structure of the data in the database) is determined before the data is inserted.

Common relational databases include MySQL, PostgreSQL, Oracle, and MS SQL Server.


No database

While in a relational database everything is structured in rows and columns, in a NoSQL database there is no structured schema that is common to all records. Most NoSQL databases store JSON records, and different records can contain different fields.


In fact, NoSQL databases should be referred to as “Not mainly SQL” — many NoSQL databases support queries using SQL, but using them is Not a best practice.

There are four main types of NoSQL databases:

1. Document storage database

The atomic unit of a document storage database is a document. Each document is a JSON, and different documents can have different schemas with different fields. A document storage database allows certain fields in a document to be indexed, enabling faster queries based on those fields (which would force all documents to have this field).

When should you choose it? Data analysis – Since different records do not depend on each other (in terms of logic and structure), this database supports parallel computing. We can use it to easily do big data analysis on the data.

Common document storage databases: MongoDB, CouchDB, DocumentDB.

2. Column storage database

Column storage The atomic unit of a database is a column in a table, which means that data is stored by column. Its column storage features make column-based queries very efficient, and because the data on each column has nearly the same structure, you can better compress the data.

When should you choose it? When you tend to query for a column subset of the data (the data doesn’t need to be the same subset each time you query it!) . A column-based database can perform such queries very quickly because it only needs to read those specific columns (whereas a row-based database must read the entire data).

  • This is common in data science, where each column represents a feature. As a data scientist, I often train my models with feature subsets and often check for relationships between features and scores (correlation, variance, significance).
  • This is also common in logs — we typically store many fields in the log database, but use only a few fields per query.

A common column storage database: Cassandra.

3. Key-value stores the database

Queries are based on keys only — you ask for a key and get the corresponding value. Query across different record values is not supported, such as “select all records where city == New York”. A useful feature in such a database is the TTL field (Time to Live), which can be set to a different value for each record and state when the record is to be deleted from the database.

Advantage – it’s fast. First, because of the use of unique keys, and second, because most key-value storage databases store data in memory (RAM), where it can be accessed quickly. Disadvantages – You need to define unique keys, which are good identifiers and are created at query time from data you already know. Usually more expensive than other types of databases (because it runs in memory).

When should you choose it? It is mainly used for caching because it is fast and does not require complex queries, and the TTL feature is useful for caching. It can also be used for any other type of data that needs to be queried quickly and satisfies the key-value format.

Common key-value storage databases are Redis and Memcached.

4. Figure storage database

The graph storage database contains nodes representing entities and edges representing relationships between entities.

When should you choose it? When your data is graphs like knowledge graphs and social networks.

Common graph storage databases: Neo4j, InfiniteGraph.


Relational database vs. document storage database

As you probably know by now, there is no standard answer to this question, and no single database can solve all problems. We usually use the most common relational and document storage databases, so we will compare them.

Benefits of relational databases

  • Its data structure is simple enough to match most types of data that are common in projects.
  • It uses SQL. SQL is common and inherently supports join operations.
  • Allows for rapid update of data. All the databases are kept on one machine, and the relationships between the records are used as Pointers, which means you can update one record at a time, and all of its related records will be updated immediately.
  • Relational databases also support atomic transactions. What is an atomic transaction: Suppose I want to transfer X dollars from Alice to Bob. I want to do three things: reduce Alice’s account by X dollars, increase Bob’s account by X dollars, and finally record the transaction. I want to think of these actions as an atomic unit — either all of them happen or none of them happen.

Disadvantages of relational databases

  • Because each query is done on the table — the query execution time depends on the size of the table. This is an important limitation that requires us to keep the tables relatively small and optimize our database for scalability.
  • In scaling a relational database, you can scale by adding more computing power to the machine on which the database is running, an approach known as “scaling up.” Why is this a disadvantage? This is due to the limited computing power that the computer can provide, and the fact that extending resources to the computer may require some downtime.
  • Relational databases don’t support OOP, they don’t support object orientation, and even simple lists are very complex to represent.

Advantages of a document storage database

  • It allows you to keep objects with different structures.
  • You can use cute JSON to represent almost any data structure, including object-based OOP, lists, and dictionaries.
  • Although NoSQL is schema-free in nature (meaning that it does not need to describe predefined structures, called schemas, to databases as relational databases do), it generally supports schema validation, which means that you can pattern a data set that is not as simple as a table. It is a JSON Schema with specific fields. (In this case, a schema).
  • NoSQL queries are fast, each record is independent, so query times are independent of database size, and parallelism is supported.
  • In NoSQL, scaling the database by adding more machines and distributing data between them is called “scaling horizontally.” This allows us to automatically extend resources to the database as needed without causing any downtime.

Disadvantages of document storage databases

  • Updating the data in the document store database is a slow process because the data is divided and copied between different machines.
  • Atomic transactions themselves are not supported. You can add this to your code by using validation and recovery mechanisms, but since records are divided between machines, it cannot be an atomic process, and a race situation can occur. (Note: MongoDB 4.0 already provides native transaction operations)

Quick cheat sheet:

  • For caching — use key-value to store the database.
  • For graph-like data — Use the graph to store the database.
  • If you prefer to query column subsets and query characteristics — use columns to store the database.
  • For all other use cases — use a relational database or document store database.

If you find any errors in the translation or other areas that need improvement, you are welcome to revise and PR the translation in the Gold Translation program, and you can also get corresponding bonus points. The permanent link to this article at the beginning of this article is the MarkDown link to this article on GitHub.


Diggings translation project is a community for translating quality Internet technical articles from diggings English sharing articles. The content covers the fields of Android, iOS, front end, back end, blockchain, products, design, artificial intelligence and so on. For more high-quality translations, please keep paying attention to The Translation Project, official weibo and zhihu column.