Suggestions for database table design

1. Relationship between original documents and entities

Tables are designed with one-to-one, one-to-many, many-to-many relationships. In most cases, the design of the table is one-to-one: that is, one original document corresponds to only one entity.

In special cases, they may be one-to-many or many-to-one relationships, i.e. one original document for multiple entities, or multiple original documents for one entity. The entities here can be thought of as basic tables.

2. Primary and foreign keys

In general, an entity cannot have both primary and foreign keys. In the e-R diagram, an entity in the leaf position may or may not have a primary key (because it has no descendants), but must have a foreign key (because it has a father).

The design of primary key and foreign key plays an important role in the design of global database. A primary key is a high abstraction of an entity. The pairing of a primary key with a foreign key represents the connection between entities.

3. Properties of basic tables

Basic tables are different from intermediate and temporary tables because they have the following four features:

(1) atomicity. Fields in the base table are not decomposable.

(2) primitiveness. Records in a base table are records of raw data (underlying data).

(3) Deductibility. All output data can be derived from data in the base table and code table.

(4) Stability. The structure of the basic table is relatively stable, and the records in the table are to be preserved for a long time.

Once you understand the nature of basic tables, you can distinguish them from intermediate and temporary tables when designing your database.

4. Paradigm standards

The relationship between the base table and its fields should meet the third normal form as much as possible. However, database design that meets the third normal form is often not the best design. In order to improve the efficiency of database operation, it is often necessary to reduce the standard of paradigm: to increase the redundancy appropriately, to achieve the purpose of space for time.

“

[2] There is a basic table for goods, as shown in Table 1. The existence of the field “amount” indicates that the design of the table does not meet the third normal form, because “amount” can be obtained by multiplying “unit price” by “quantity”, indicating that “amount” is a redundant field. However, adding the “amount” of the redundant field can improve the speed of the query statistics, which is the practice of space for time.

In Rose 2002, there were two types of specified columns: data columns and computed columns. Columns such as “amount” are called “calculation columns”, while columns such as “unit price” and “quantity” are called “data columns”.

Table 1 Table structure Name Model Unit Price Quantity Amount TV 29 “2,500 40 100,000

5. Understand the three paradigms informally

A simple understanding of the three paradigms is of great benefit to database design. In database design, in order to better apply the three paradigms, it is necessary to understand the three paradigms popularly (popular understanding is sufficient understanding, not the most scientific and accurate understanding) :

The first normal form: 1NF is the atomicity constraint on attributes, which requires attributes to have atomicity and cannot be decomposed again.

Second normal form: 2NF is a constraint on the uniqueness of records, requiring records to have a unique identity, that is, the uniqueness of entities;

Third normal Form: 3NF is a constraint on field redundancy, that is, no field can be derived from any other field, which requires that the field have no redundancy.

No redundant database design can do that. However, a database without redundancy is not necessarily the best database, and sometimes in order to improve the efficiency of operation, it is necessary to lower the paradigm standards and retain redundant data appropriately. The specific approach is to follow the third normal form in the conceptual data model design, and reduce the standard of the normal form to the physical data model design. To reduce the normal form is to add fields, allowing redundancy.

6. Be good at identifying and correctly handling many-to-many relationships

If a many-to-many relationship exists between two entities, it should be eliminated. This was eliminated by adding a third entity in between. In this way, what was once a many-to-many relationship is now two one-to-many relationships. The attributes of the original two entities should be properly allocated among the three entities. The third entity here is essentially a more complex relationship that corresponds to a base table. In general, database design tools cannot recognize many-to-many relationships, but can handle many-to-many relationships.

“

In the library information system, the book is an entity, and so is the reader. The relationship between these two entities is a classic many-to-many relationship: a book can be borrowed by multiple readers at different times, and a reader can borrow multiple books. Therefore, to increase the third entity between them, the entity named “borrowed books”, its properties are: borrowed time, borrowed marks (0 borrow books, 1 book), in addition, it also should be two foreign keys (” book “of primary keys,” the reader “primary key), make it can connect with” book “and” the reader “.

7. Primary key PK value method

PK is a tool for programmers to use between tables, can be no physical meaning of the number string, by the program to automatically add 1 to achieve. It can also be a field name or a combination of field names that have physical meaning. But the former is better than the latter. When PK is the combination of field names, it is recommended that the number of field names be not too large. If the number of field names is too large, index space is occupied and the index speed is slow.

8. Recognize data redundancy

The repeated occurrence of primary and foreign keys in multiple tables is not data redundancy, and it must be clear that many people are not. The repetition of non-key fields is data redundancy! And it’s a low-level redundancy, which is repetitive redundancy. Advanced redundancy is not the repetition of fields, but the derivation of fields.

“

[example 4] : The three fields of “unit price, quantity and amount” in a commodity, “amount” is derived from “unit price” multiplied by “quantity”. It is redundancy, and it is a kind of advanced redundancy. The purpose of redundancy is to increase processing speed. Only low-level redundancy increases data inconsistencies, because the same data may be entered multiple times, from different locations, and roles. Therefore, we advocate high level redundancy (generative redundancy) and oppose low level redundancy (repetitive redundancy).

9. There is no standard answer for E–R

The E–R diagram of information system has no standard answer, because its design and drawing method is not unique, as long as it covers the business scope and functional content of the system requirements, it is feasible. Instead, modify the E–R diagram. Just because there is no single answer doesn’t mean it can be designed arbitrarily. The standard of a good E-R graph is: clear structure, concise association, moderate number of entities, reasonable attribute allocation, and no low-level redundancy.

10. Intermediate tables, reports and temporary tables

An intermediate table is a table that holds statistics and is designed for use in a data warehouse, output reports, or query results, sometimes without primary and foreign keys (except for data warehouses). Temporary tables are designed by individual programmers to store temporary records for personal use. Base and intermediate tables are maintained by the DBA, and temporary tables are automatically maintained by the programmer himself.

11. Integrity constraints are manifested in three aspects

Domain integrity: Use Check to implement constraints. In the database design tool, when defining the value range of a field, there is a Check button that defines the value city of the field. Referential integrity: PK, FK, table level triggers. User-defined integrity: It is a set of business rules implemented with stored procedures and triggers.

12. The way to prevent database design patching is the “three less principle”

(1) The fewer tables in a database, the better. Only when the number of tables is less, it can be explained that the e-R graph of the system is less and more precise, and the repeated redundant entities are removed, which forms a high abstraction of the objective world, carries on the data integration of the system, and prevents the patch-type design.

(2) The fewer fields in a table that combine primary keys, the better. Due to the primary key’s function, one is to build the primary key index, and the other is to serve as the foreign key of the child table, so the number of fields combined with the primary key is less, which not only saves the running time, but also saves the index storage space.

(3) The fewer fields in a table, the better. Only the number of fields is less, it can show that there is no data duplication in the system, and there is little data redundancy. More importantly, it urges readers to learn to “change columns to rows”, so as to prevent the fields in the child table from being pulled into the main table, leaving many empty fields in the main table.

The so-called “column change row” is to pull out part of the contents of the main table and create a separate child table. This method is very simple, some people are just not used to, do not adopt, do not implement.

The practical principle of database design is to find the right balance between data redundancy and processing speed. “Three less” is a whole concept, comprehensive point of view, can not isolate a certain principle.

The principle is relative, not absolute. The “three more” principle is definitely wrong. Consider this: if you cover the same function of the system, an E-R diagram with 100 entities (1000 attributes) is much better than an E-R diagram with 200 entities (2000 attributes).

Advocate “three little” principle, is to ask the reader to learn to use the database design technology to carry on the system data integration. The steps of data integration are to integrate file system into application database, application database into topic database, and topic database into global comprehensive database.

The higher the degree of integration, the stronger the data sharing, the less information island phenomenon, the number of entities, the number of primary keys, the number of attributes in the global E-R graph of the entire enterprise information system will be less.

Advocated the principle of “three little” purpose, is to prevent the reader using patching technology, constantly to increase bowdlerize database, make the enterprise database becomes a random “dump” design a database table, or database table “painting”, finally caused the base table in the database, code table, the intermediate table, temporary table desultorily, countless, As a result, the information system of enterprises and institutions cannot be maintained and paralyzed.

Anyone can do the “three more” principle, which is the “patching method” of database design. “Three less” principle is the principle of less and better, it requires a higher database design skills and art, not anyone can do, because the principle is to put an end to the use of “patch method” design database theoretical basis.

13. Methods to improve the efficiency of database operation

Under the given system hardware and software conditions, the methods to improve the operation efficiency of the database system are as follows:

(1) In the physical design of the database, reduce the paradigm, increase the redundancy, and use fewer triggers

(2) when the calculation is very complex and the number of records is very large (for example, 10 million), the complex calculation should be completed outside the database by using C++ language in the way of file system, and finally added to the database table.

(3) If it is found that there are too many records in a table, such as more than 10 million records, the table should be split horizontally. In horizontal split, records of the table are split horizontally into two tables based on a value of the primary key PK. If a table is found to have too many fields (for example, more than 80 fields), the table is split vertically. The original table is split into two tables.

(4) In the use of data-oriented SQL language programming, try to take the optimization algorithm.

In a word, to improve the operation efficiency of the database, we must from the database system level optimization, database design level optimization, program implementation level optimization, these three levels at the same time.

Suggestions for database table design

Related Posts

Backtracking is awesome!

Describe how data is managed in a distributed environment in a distributed system?

An interesting combination of PHP deserialization with some unexpected WordPress bugs