This is the 7th day of my participation in Gwen Challenge

Paradigms and antiparadigms are often mentioned in database design specifications. Understanding the concepts and principles of paradigms is very helpful for us to design data sheets. However, paradigms are not perfect. In actual development, it is often based on paradigms design, and then anti-paradigms design is added according to the actual business situation, forming a hybrid pattern. That is, there is rarely, in fact, a complete paradigm design or a complete anti-paradigm design.

The difference between a paradigm and an antiparadigm

For the concept of paradigms, you can do your own Internet search, and in most cases, the first three paradigms are described:

  • First normal form: Each column is atomic, meaning that the meanings of each column are irresolvable and ambiguous. For example, the name field itself contains the first and last name. If you want to treat the two as separate entities, you need to split them into two fields. If you don’t need a separate field that’s fine.
  • Second normal form: Each column of the data table is associated with the primary key. This means that each table cannot hold multiple entity data, only data related to its own entity. The key here is whether fields of other entity attributes need to be redundant.
  • Third normal form: each column of the data table is only directly related to the primary key, not indirectly. That is, the columns of the table and the primary key of the table represent direct attributes of the entity, not associated attributes.

For the anti-paradigm, it allows information to be redundant or stored in multiple different data tables. Take the classic people, departments, and supervisors. The simplest design is to put all three directly into the same table (many traditional Excel records data this way).

CREATE TABLE t_employees (
  employee VARCHAR(32),
  department VARCHAR(32),
  head VARCHAR(32));Copy the code

This approach can lead to inconsistencies in the event of data modification. For example, If Zhang SAN, Li Si and Wang Wu are in the same department, the head of Zhang SAN’s department has changed and the data of Li Si and Wang Wu need to be updated at the same time. In addition, a department must rely on employee information to exist. Deleting all employees in a department will cause the department information to be lost. To avoid this problem, we need to create two entity tables:

CREATE TABLE t_employees (
  employee VARCHAR(32),
  department VARCHAR(32));CREATE TABLE t_department (
  department VARCHAR(32),
  head VARCHAR(32));Copy the code

This is only second normal, but it is much better than the previous approach.

Advantages and disadvantages of paradigm design

Normal design is usually recommended for performance problems. It has the following advantages:

  • Table updates are faster than anti-paradigm.
  • Because there is no redundant data, there is less data to change and less storage space for a single table.
  • The lack of redundant data means there is less need for queries that use DISTINCT and GROUP BY, which can be done BY directly querying the associated main table.

The disadvantage of a typical table is that it usually requires at least one join table query, or even a joint query of multiple tables. This is not only costly, but also can cause some index policies to fail.

Advantages and disadvantages of anti-paradigm design

The most important feature of anti-typical tables is that the same table contains all information, thus avoiding joint queries. Without the use of federated queries, the worst-case scenario for most queries is a full table scan (without using indexes). Even this is faster than a federated query that doesn’t hit the cache because it avoids random I/O access. A single-table index strategy for anti-canonical tables is more effective. Imagine an UGC application where some of the users are VIP users. Then, if you want to see the last 10 messages of a VIP user, if you use a paradigm data table and use an index on the release date, the query might look something like this:

SELECT content, user_name
FROM user_content
	INNER JOIN user on user_content.user_id=user.id
WHERE user.account_level='vip'
ORDER BY user_content.published DESC LIMIT 10;
Copy the code

When executing this query, MySQL needs to scan the published index and check each row from the user table to see if the user is a VIP user. If only a small number of users are VIPs, it will be very inefficient. With an anti-paradigm design, you can redundancy the user account type into the user content table and add the syndicated index (account_level, published), eliminating the need for a syntable query:

SELECT content, user_name
FROM user_content
WHERE account_level='vip'
ORDER BY published DESC LIMIT 10;
Copy the code

Of course, anti-paradigm design also has its disadvantages. On the one hand, the storage space will increase after the data table is redundant, and on the other hand, if the main table corresponding to the redundant column changes, a large number of data row updates may be required. For example, if a user’s level is reduced from VIP to normal, the user’s data in the corresponding user content table needs to be updated synchronously (of course, it also depends on whether the service needs to be updated synchronously).

Actual development and application

Both paradigm design and antiparadigm design have advantages and disadvantages, so how to choose? In fact, full paradigm and anti-paradigm design are only laboratory tests, not practical applications. In real development, it is usually a mixture of the two, usually partial normal form data tables, cache tables, or other techniques. The most common form of anti-paradigm data design is redundancy, caching columns from other data tables. For example, the user content table above only redundant user account levels, which avoids the synchronization problems of completely anti-paradigm inserts and deletions. It also keeps the user content table from being too large, but the efficiency improvement is obvious. The side effect, however, is that updating a user’s account level requires updating the user content table at the same time, depending on how often the user level is updated and how often queries are made. Another scenario where data is redundant is sorting. For example, user content that needs to be sorted by author name can be expensive to design on a paradigm basis, whereas redundant author names and indexes in user content tables can be very efficient. For example, if we want to know how many posts are made by each user, we can add a field to the user table to count the number of posts made by each user and update this field with each post. In this way, if you need to query the number of users’ content items or sort by the number of users’ content items, you do not need to perform the sum operation every time from the user content table.

conclusion

The concept of normal and anti-normal database design itself is worth referring to, through their ideas we can better understand the database design. In the actual development process, it depends on the actual business to decide which approach to follow. This usually does not follow a single paradigm for the entire database, but rather combines the strengths and weaknesses of the business at the data surface level.