The original source
8 Reasons Why MySQL’s ENUM Data Type Is Evil

MySQL enumeration type “eight deadly SINS”

Wednesday, March 2, 2011

The MySQL enumeration (ENUM) type is a hot topic of discussion in the programmer community. At first glance, we can do a good job of limiting recorded values to what is allowed by enumerating types. A typical example would be a data sheet with a field called “continental plates” : each country sits on a continental plate that is unlikely to change very often. Sure, maybe one day the North American plate will collide with the Asian plate to form North American Asia, but even if your database lasts until then, at least you won’t need to discuss how to reconstruct your data sheet. That would be the job of the developers at the time.

Back to the point. If ENUM were the only choice that could represent which continent a country belongs to, we could move on to debating the merits of NoSQL, Git versus SVN, and the shortcomings of your preferred framework. But there is a general best practice for implementing enumerations:

Wikipedia describes relational tables like this:

… This is a table that separates known enumeration data. For example, in the warehouse data of a relational database, the “object” in the warehouse might have a “status” field that records the declared value, such as “sold, scheduled, sold out.” In the minimalist database design, these values are stored in separate relational tables called “states,” which satisfies the Database Normalization.

Therefore, relational tables can also satisfy the implementation of enumeration. Here’s what ENUM’s “Eight deadly SINS” are:

1. Data is being treated incorrectly

Male and female; Sir, Madam, Madam; Africa, Asia, and so on. These short words that people use as ENUM type fields are called data. When you use an ENUM type field, technically you are pulling the data out (corresponding to the actual table) and putting it in a separate position (a type of database metadata with precisely defined fields). This is not the same as constraining data types, as we usually do: numeric fields can only store integer data, or date fields can’t be null — both of which are fine and important. When we use ENUM type fields, we are actually saving part of the data as a characteristic of the data model. In short, ENUM type fields break normal form requirements. This may seem very “academic” or “pedantic”, but it is the source of the following crimes.

2. Changing an ENUM type field is expensive

One thing that never changes is that every time you create an ENUM field you say, “This field cannot change.” Human beings generally lack the ability to see the big picture, and the prediction is even worse, such as the r&d department’s new product line, your company’s new shipping scheme, the North American plate colliding with the Asian plate.

Using ALTER TABLE to ALTER an entire TABLE’s ENUM type fields is very resource-intensive. If ENUM(‘red’, ‘blue’, ‘black’) is changed to ENUM(‘red’, ‘blue’, ‘white’), MySQL needs to reconstruct the entire table and retrieve all data to check for the invalid value ‘black’. MySQL is really stupid, it does do this every time you add a new ENUM value! (It is rumored that the efficiency of ENUM type fields will be addressed in the future, but I doubt it will be taken seriously.)

Full table refactoring may not be as painful in small tables, but in the case of large amounts of data it can lead to resources being locked up for a long, long time. If you use relational tables instead of ENUM fields, changing the enumeration set is just INSERT, UPDATE, and DELETE, which is a funny comparison.

It is important to note that when changing an enumeration set of fields of type ENUM, MySQL converts any existing record that does not exist in the new enumeration set to a value of ” (an empty string). With relational tables, there is a lot of flexibility in changing and deleting enumerated collections (as discussed below).

3. It is almost impossible to add additional attributes to linked data

There hasn’t been a more sensible way to change an ENUM type field, and that’s our norm. In our “country, continental plate” example, what happens if you change the “land area”? We didn’t expect this property, but take it as it comes. Using relational table design, we can easily extend the continental plate table, adding data and fields to it in any way we want. ENUM? Stop it.

Another wonderful flexibility is the ease of expanding relational tables. A simple flag bit field indicates whether this “enumerated value” is available. So, when your company stops selling black accessories, you simply make a note of the “black” in the IS_Discontinued field. And you can still see the colors that have been sold, and your orders for black ornaments will still count. ENUM, would you like to try it?

4. Obtain all possible values of ENUM

A common requirement is to display data existing in a database in a drag-and-drop list, for example:

Select color:

Red blue black

If these values are stored in a table named ‘colors’, all you have to do is SELECT * FROM colors to dynamically display the data in a draggable list. You can add or change colors in the color relational table, and the color options for your cool order will update automatically, which is awesome. The example given here is equivalent to: “By managing in the background, you can limit the options available to the front-end user for certain types of data.” Such functions.)

Back to enums: How do you get all enumeration values? You can of course use ENUM values with DISTINCT values, but this will only return the ENUM values that are actually used and exist in the optional values of the ENUM field in the table, not all possible values. You could also query INFORMATION_SCHEMA and parse the returned data through code to find all the values of the ENUM you want, but this would be completely redundant. In fact, I still haven’t found any elegant and native SQL way to get all the values of an ENUM type field.

5. The optimization provided by ENUM type fields is limited

The usual justification for using enums is “optimization”, such as improved performance, simplified models and better readability.

So let’s look at performance. You can do a lot of crazy and outrageous things in an unoptimized database, but most of the time, performance doesn’t really matter until you get to a certain size, and often our products are nowhere near that size. It is important to note that because database developers are keen to make their designs fully normal, they will only consider antiparadigms when they encounter performance problems. If you’re worried that using relational tables will slow you down, test your performance against the same benchmark in different ways and think again. Do not assume that associated queries will be a bottleneck, and sometimes they are not. Evidence to support that ENUM isn’t always appreciably faster than alternatives.

Another argument for ENUM optimization is that enUms can effectively reduce table foreign keys in a database. No, using foreign keys is like connecting a lot of different boxes with lines, and in large systems, paradigm design has reduced the need for complex human understanding boundaries. But why do we design the model, why do we abstract it so that we can understand it. Try making a new data model diagram or ER diagram, and ignore the small details and relational tables. Sometimes using enUms is as simple as it seems, but in fact you need to have an implicit relational table in mind, so it’s not as simple as it seems.

6. The ENUM value cannot be directly reused in other data tables

When you create an ENUM field with a value, you cannot reuse the ENUM directly in other tables. With relational tables, the same form of application can be reused in other data tables. When you change one data in a relational table, multiple other tables get a response.

The separation of ENUM type fields will enable you to reuse the same ENUM value across multiple tables (consistency is required).

7. ENUM type fields have obvious pitfalls

If you set a field ‘color’ ENUM(‘blue’, ‘black’, ‘red’) and you want to INSERT a row of data, but the ‘color’ field is ‘purple’, MySQL will change the invalid value to ” (empty string). It works fine, but if we use relational tables with foreign keys, our data will be more robust.

Similarly, MySQL associates enumeration indexes with ENUM values and mistakenly calls indexes instead of ENUM values in use, and vice versa.

Imagine this:

CREATE TABLE test (foobar ENUM('0', '1', '2')); mysql> INSERT INTO test VALUES ('1'), (1); Query OK, 2 rows affected (0.00 SEC) Records: 2 Duplicates: 0 Warnings: 0 mysql> SELECT * FROM test; + -- -- -- -- -- -- -- -- + | foobar | + -- -- -- -- -- -- -- -- + | 1 | | 0 | + -- -- -- -- -- -- -- -- + 2 rows in the set (0.00 SEC)Copy the code

We inserted a ‘1’ (string) and accidentally inserted a 1 (no quotes, numeric). MySQL will treat our numeric data as if it were an enumerated index (not wrong, but confusing), and according to the index, the first value of the ENUM field is 0. Enumeration index starts at 1

8. Poor portability of ENUM

ENUM types are not SQL standards and belong to MySQL, and other DBMSS do not necessarily have native support. PostgreSQL, MariaDB, and Drizzle (the latter two branches of MySQL), I only know that these three are supported ENUM. If someone plans to migrate the database, they will “love you more” because they will have to take more steps to process your “delicate” ENUM fields. If it were you, you’d know you were smart enough. In general, database migration doesn’t happen very often, and it’s the “eighth sin” because everyone assumes that something must go wrong in the process.

When is it appropriate to use ENUM:

1. When you need to store accurate, unchanging values

The best example is the continental plate, which is well defined. Another common example is the address Mr, Mrs, Miss, or the suit in poker: diamonds, clubs, hearts, spades. However, even these examples sometimes need to be stretched (such as when someone needs you to address “Dr. Chen” instead of “Mr. Chen,” or when you use joker cards in your poker game).

2. You never need to store additional relational information

Take the poker example. Poker, played for all ages, relies on the rules of clubs and spades being black and diamonds and hearts being red (e.g., yukos). What if we needed to associate additional information for suits, such as colors? If we were using a relational table, we would simply add fields to the relational table, no big deal. If we use ENUM to represent suits, it will be difficult to accurately represent the relation between suits and colors, so we can only achieve this relation at the application layer.

3. The number of ENUM values is greater than or equal to2And less than20a

If you have only two ENUM values, you can simply replace the ENUM with the more efficient TINYINT(1) or the more efficient BIT(1) (MySQL5.0.3 and above). For example: gender ENUM(‘male’, ‘female’) can be changed to is_male BIT(1). When you only have two options, it is perfectly possible to distinguish them by Boolean values true/false, combined with the “is” keyword in the field name. As for 20 caps, yes, enums can actually hold up to 65535 values, but please don’t try. More than twenty values become cumbersome, and more than fifty are inevitably difficult to manage and use.

If you’re going to use ENUM anyway:

1. Do not use numeric values

ENUM is defined as character data for a reason. This is not to say that you are wrong to use numeric field types to store numbers, but there is ample evidence that MySQL internally uses numbers to reference indexes (see article 7 above). Don’t store numbers in enUms anyway, OK?

2. Consider using strict mode

Enable strict mode to at least report an error if you insert a nonexistent ENUM value. Otherwise, a warning simply appears and the value is set to an empty string “” (enumeration index 0). Copy notes: If you set IGNORE, errors will still be ignored.

conclusion

Do what makes sense from a development, maintenance perspective, and optimize when performance issues arise — generally, using relational tables versus using ENUM types is a matter of debate.

There is no denying that performance bottlenecks are abused. Developers waste a lot of time thinking about it, worrying about it, and (for example) running speed on non-critical code. These demands on efficiency have a great negative impact on debugging and maintenance. We should ignore the small percentage of efficiency, as premature optimization is the root of all evil in terms of 97%.

While we shouldn’t give up on optimizing the 3%, a good developer shouldn’t be complacent and should be aware of the need to be smart about key code. But this should also be done with an understanding of the code. – Donald Knuth


komlenic.com is the weblog/playground of Chris Komlenic, a full stack developer and generalist living in central Pennsylvania.