• Classes vs. Data Structures
  • By Robert C. Martin
  • The Nuggets translation Project
  • Permanent link to this article: github.com/xitu/gold-m…
  • Translator: EmilyQiRabbit
  • Proofreader: JalanJiang, suhanyujie

Classes and Data Structures

What is a class?

A class is a specification for a collection of similar objects.

What is an object?

An object is a set of functions that operate on encapsulated data elements.

More specifically, an object is a set of functions that operate on implicit data elements.

What do you mean by implicit data elements?

The functions of an object imply that some data elements exist, but that data is not directly accessible or visible from outside the object.

Isn’t the data inside the object?

It could be, but there is no rule that says it must be. From the user’s point of view, an object is just a set of functions. The data that these functions operate on must exist, but the location of the data is unknown to the user.
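A minimal Python sketch of this idea follows. The `Account` class and its method names are hypothetical, chosen only for illustration. Callers see two functions; whether the balance is kept in cents, in a file, or somewhere else is invisible to them.

```python
class Account:
    """An object in the article's sense: callers see only functions."""

    def __init__(self) -> None:
        # Hidden representation (an assumption of this sketch): whole cents.
        # The leading underscore marks it as private by Python convention.
        self._cents = 0

    def deposit(self, dollars: int) -> None:
        self._cents += dollars * 100

    def balance(self) -> int:
        # Callers get dollars back and never learn how the data is stored.
        return self._cents // 100


account = Account()
account.deposit(5)
account.deposit(7)
print(account.balance())
```

The representation could later move to a database or a remote service without any change to the code that calls `deposit` and `balance`.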

Well. Ok, I see.

Very good. So what is a data structure?

A data structure is a set of cohesive data elements.

Or, put another way, a data structure is a set of data elements that are operated on by implicit functions.

OK, I get it. The functions that operate on a data structure are not specified by the data structure itself, but the existence of the data structure implies that such functions must exist.
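By contrast, a data structure might be sketched in Python as a plain dataclass. The `Address` type and the `format_label` function are hypothetical names for illustration. The fields are fully exposed, and the function that operates on them lives outside the structure.

```python
from dataclasses import dataclass


@dataclass
class Address:
    """A data structure: exposed, cohesive data and no behavior."""
    street: str
    city: str
    postal_code: str


def format_label(address: Address) -> str:
    # One of the "implicit" functions: defined elsewhere, reads fields directly.
    return f"{address.street}, {address.city} {address.postal_code}"


print(format_label(Address("1 Main St", "Springfield", "12345")))
```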

Yes. Now, what do you notice about these two definitions?

They’re sort of in opposition to each other.

That’s true. They complement each other. They fit together like a hand in a glove.

An object is a set of functions that operate on implicit data elements. A data structure is a set of data elements that are operated on by implicit functions.

Wow, so objects are not data structures.

Correct. Objects and data structures are opposites.

So DTOs, data transfer objects, are not objects?

Correct. DTOs are data structures.

So database tables are not objects, right?

Correct. A database contains data structures, not objects.

Wait. Doesn’t an ORM, an object-relational mapper, map database tables to objects?

Of course not. There is no mapping between database tables and objects. Database tables are data structures, not objects.

So what do ORMs do?

They transfer data between data structures.

So they have nothing to do with objects?

Nothing at all. There is no such thing as object-relational mapping, because there is no mapping between database tables and objects.

But I thought ORMs built our business objects for us.

No. ORMs extract the data that our business objects operate on. That data is held in a data structure loaded by the ORM.

But then doesn’t the business object contain that data structure?

It might. It might not. Either way, that is not the ORM’s responsibility.
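The division of labor might be sketched like this in Python. No real ORM is used: `load_employee_row` is a hypothetical stand-in for whatever an ORM does, and `EmployeeRow` and `PayCalculator` are illustrative names, not part of any library.

```python
from dataclasses import dataclass


@dataclass
class EmployeeRow:
    """Data structure shaped like a database row (hypothetical schema)."""
    id: int
    name: str
    hourly_rate_cents: int


# Stand-in for an ORM: it only moves data from one structure into another.
_FAKE_TABLE = {1: ("Ada", 2500)}


def load_employee_row(employee_id: int) -> EmployeeRow:
    name, rate = _FAKE_TABLE[employee_id]
    return EmployeeRow(id=employee_id, name=name, hourly_rate_cents=rate)


class PayCalculator:
    """Business object: exposes behavior and hides the data it works on."""

    def __init__(self, row: EmployeeRow) -> None:
        self._row = row  # Holding the row is this object's choice, not the ORM's.

    def pay_for(self, hours: int) -> int:
        return self._row.hourly_rate_cents * hours


print(PayCalculator(load_employee_row(1)).pay_for(8))
```

Here the business object happens to hold the loaded row, but it could just as well copy the values or fetch them lazily; the loader neither knows nor cares.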

That seems like a minor semantic point.

Not at all. This distinction has important implications.

For example?

Such as the design of database schemas versus the design of business objects. Business objects define the structure of business behavior. The database schema defines the structure of the business data. The two structures are subject to very different constraints. The structure of the business data is not necessarily the best structure for the business behavior.

Hmm. That’s confusing.

One way to think about it is that the database schema is not tailored for just one application; it has to serve the entire enterprise. So the structure of the data is a compromise between many different application requirements.

Ok, I get the point.

Very good. Now consider each individual application. The object model of each application describes how the behavior of that application is structured. Each application has its own object model, tailored to its own behavior.

Oh, I see. Because the database schema is a compromise among various applications, it may not fit the object model of any particular application.

Right! Objects and data structures are subject to very different constraints. They rarely line up well. This is commonly referred to as the object/relational impedance mismatch.

I’ve heard of that before. But I thought the impedance mismatch was resolved by ORMs.

Now you know better. There is no impedance mismatch, because objects and data structures are complementary, not isomorphic.

What did you say?

They are opposites, not similar entities.

Opposites?

Yes, in a very interesting way. You see, objects and data structures imply opposite control structures.

Wait a minute. What did you say?

Imagine a set of classes that all conform to a common interface. For example, imagine classes that represent two-dimensional shapes, all of which have area and perimeter methods for calculating the area and perimeter of the shape.

Why does every software example always involve shapes?

Let’s consider two different shapes: a square and a circle. It should be clear that the area and perimeter functions of these two classes operate on different implicit data structures. It should also be clear that these operations are invoked through dynamic polymorphism.

Wait. Slow down. What did you say?

There are two different area functions: one for squares and one for circles. When a caller invokes the area function on a particular object, it is the object that determines which function gets called. We call this dynamic polymorphism.
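As a rough illustration, the object scenario could look like this in Python. The class and method names are assumptions of this sketch; the article names only the shapes and the two operations.

```python
import math
from abc import ABC, abstractmethod
from typing import Iterable


class Shape(ABC):
    """Common interface: callers see only these functions."""

    @abstractmethod
    def area(self) -> float: ...

    @abstractmethod
    def perimeter(self) -> float: ...


class Square(Shape):
    def __init__(self, side: float) -> None:
        self._side = side  # implicit data, hidden from callers

    def area(self) -> float:
        return self._side ** 2

    def perimeter(self) -> float:
        return 4 * self._side


class Circle(Shape):
    def __init__(self, radius: float) -> None:
        self._radius = radius

    def area(self) -> float:
        return math.pi * self._radius ** 2

    def perimeter(self) -> float:
        return 2 * math.pi * self._radius


def total_area(shapes: Iterable[Shape]) -> float:
    # Dynamic polymorphism: each object selects its own area implementation.
    return sum(shape.area() for shape in shapes)


print(total_area([Square(2), Circle(1)]))
```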

OK, sure. The object knows how its methods are implemented. Of course.

Now, let’s replace the objects with data structures. We will use discriminated unions.

What is a discriminated union?

A discriminated union, in this case, is just two different data structures: one for squares and one for circles. The circle data structure has a center point and a radius as its data elements. It also has a type code that identifies it as a circle.

You mean, like an enumerated type?

Yes. The square data structure contains the point at the upper left corner and the side length. It also has the type code, an enumerated type.

OK. Two data structures, each with a type code.

That’s right. Now let’s consider the area function. It is going to need a switch statement, isn’t it?

Um, yes, with two cases: one for squares and one for circles. The perimeter function will need a similar switch statement.

That’s right. Now think about the structure in these two scenarios. In the object scenario, the two implementations of the area function are independent of each other and belong, in a sense, to their types. The area function of the square belongs to the square, and the area function of the circle belongs to the circle.

Yeah, I see where you’re going. In the case of data structures, the two implementations of the area function are in the same function and do not “belong” to either type.
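The data structure scenario might be sketched as follows in Python. The field names (`top_left`, `type_code`, and so on) are assumptions of this sketch, and an `if` chain plays the role of the switch statement.

```python
import math
from dataclasses import dataclass
from enum import Enum, auto


class ShapeType(Enum):
    SQUARE = auto()
    CIRCLE = auto()


@dataclass
class Point:
    x: float
    y: float


@dataclass
class Square:
    top_left: Point
    side: float
    type_code: ShapeType = ShapeType.SQUARE  # the discriminator


@dataclass
class Circle:
    center: Point
    radius: float
    type_code: ShapeType = ShapeType.CIRCLE


def area(shape) -> float:
    # Both implementations live here; neither belongs to a type.
    if shape.type_code is ShapeType.SQUARE:
        return shape.side ** 2
    if shape.type_code is ShapeType.CIRCLE:
        return math.pi * shape.radius ** 2
    raise ValueError(f"unknown shape type: {shape.type_code}")


def perimeter(shape) -> float:
    # A second function needs its own, similar switch.
    if shape.type_code is ShapeType.SQUARE:
        return 4 * shape.side
    if shape.type_code is ShapeType.CIRCLE:
        return 2 * math.pi * shape.radius
    raise ValueError(f"unknown shape type: {shape.type_code}")


print(area(Square(Point(0, 0), 3)))
```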

Right. Now it gets interesting. In the object scenario, what code would you have to change if you wanted to add a triangle type?

No existing code needs to be modified. You just create a new triangle class. Although I suppose the code that creates instances needs to change.
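To make that concrete, here is a hedged Python sketch. `Shape`, `Square`, and `total_area` stand for code that already exists; `Triangle` is the only addition. All names are hypothetical, and the triangle is described by its three side lengths purely as a convenience of this example.

```python
import math
from typing import Iterable, Protocol


class Shape(Protocol):
    def area(self) -> float: ...
    def perimeter(self) -> float: ...


class Square:
    """Existing class: untouched by the change."""

    def __init__(self, side: float) -> None:
        self._side = side

    def area(self) -> float:
        return self._side ** 2

    def perimeter(self) -> float:
        return 4 * self._side


def total_area(shapes: Iterable[Shape]) -> float:
    """Existing caller: also untouched."""
    return sum(shape.area() for shape in shapes)


class Triangle:
    """The new type: the only code that had to be written."""

    def __init__(self, a: float, b: float, c: float) -> None:
        self._sides = (a, b, c)

    def perimeter(self) -> float:
        return sum(self._sides)

    def area(self) -> float:
        # Heron's formula, computed from the three side lengths.
        s = self.perimeter() / 2
        a, b, c = self._sides
        return math.sqrt(s * (s - a) * (s - b) * (s - c))


# Only the place where instances are created mentions the new class.
print(total_area([Square(2), Triangle(3, 4, 5)]))
```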

That’s right. So when you add a new type, you need to change very little. Now, say you want to add a new function — a function that computes the center point.

So now you have to add this function to all three types: circle, square and triangle.

Very good. So, adding a new function is more difficult, and you need to modify each class.

But with data structures, things are different. In order to add the triangle type, you have to change every function by adding a triangle case to its switch statement.

Yes. Adding new types is difficult; you need to modify each function.

But when you add the new center function, nothing else needs to be changed.

That’s right. Adding new functions is easy.
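A Python sketch of the same point follows. The data structures and the `area` function represent existing code; `center` is the new function. Field names are assumptions of this sketch, as is the convention that y grows downward from the square's top-left corner.

```python
import math
from dataclasses import dataclass
from enum import Enum, auto
from typing import Tuple


class ShapeType(Enum):
    SQUARE = auto()
    CIRCLE = auto()


@dataclass
class Square:
    left: float
    top: float
    side: float
    type_code: ShapeType = ShapeType.SQUARE


@dataclass
class Circle:
    center_x: float
    center_y: float
    radius: float
    type_code: ShapeType = ShapeType.CIRCLE


def area(shape) -> float:
    """Existing function: not touched when center() is added."""
    if shape.type_code is ShapeType.SQUARE:
        return shape.side ** 2
    if shape.type_code is ShapeType.CIRCLE:
        return math.pi * shape.radius ** 2
    raise ValueError(f"unknown shape type: {shape.type_code}")


def center(shape) -> Tuple[float, float]:
    """The new function: one addition, with its own switch over the type code."""
    if shape.type_code is ShapeType.SQUARE:
        half = shape.side / 2
        return (shape.left + half, shape.top + half)
    if shape.type_code is ShapeType.CIRCLE:
        return (shape.center_x, shape.center_y)
    raise ValueError(f"unknown shape type: {shape.type_code}")


print(center(Square(0.0, 0.0, 2.0)))
```

Adding a triangle here would mean revisiting both `area` and `center`, which is the cost the dialogue describes.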

Wow, that’s exactly the opposite of the class scenario.

It is. Let’s review:

  • Adding new functions to a set of classes is difficult; you need to modify each class.
  • Adding new functions to a set of data structures is easy; you just need to add functions and nothing else needs to change.
  • Adding a new type to a set of classes is easy; you just need to add a new class.
  • Adding new types to a set of data structures is difficult; you need to modify each function.

Yes, it is. They are opposites, and in a very interesting way. I mean, if you know you will be adding new functions to a set of types, you would want to use data structures. But if you know you will be adding new types, you would want to use classes.

You made a great point! But today we have one last thing to think about. There is one more way in which data structures and classes are opposites. It has to do with dependencies.

Dependencies?

Yes, the direction of source code dependencies.

OK, I’ll bite. What’s the difference?

Consider the data structure scenario first. Each function has a switch statement that selects the appropriate implementation based on the enumeration type code.

Yes, that’s true. But what does that matter?

Imagine a call to the area function. The caller depends on the area function, and the area function depends on each particular implementation.

What do you mean by “depends”?

Imagine that each implementation of area is written as its own function. So there is a circle area function, a square area function, and a triangle area function.

Ok, so the switch statement only calls these functions.

Imagine that these functions are in different source files.

Then the source file with the switch statement would need to import, use, or include all of those source files.

Correct. This is a source code dependency. One source file depends on another. So what is the direction of this dependency?

The source file with the switch statement depends on the source file that contains all the implementation functions.

What about the caller of the area function?

The caller of the area function depends on the source file with the switch statement, which in turn depends on the source file with all the implementations.

Correct. All source file dependencies point in the direction of the invocation, from the caller to the implementation. So, if you make a small change to one of these implementations…

OK, I see what you mean. A change to any one of the implementations will force the source file with the switch statement to be recompiled, which in turn will force everything that calls the function containing that switch statement, such as the callers of our area function, to be recompiled.
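The dependency chain can be sketched in a single Python listing, with comments marking where the hypothetical source files would begin. The file names, the `describe` caller, and the string type codes are all assumptions made for illustration. Python does not recompile in the way the dialogue describes, so the sketch shows only the direction of the imports.

```python
import math

# --- hypothetical file: circle_area.py ---
def circle_area(radius: float) -> float:
    return math.pi * radius ** 2

# --- hypothetical file: square_area.py ---
def square_area(side: float) -> float:
    return side ** 2

# --- hypothetical file: triangle_area.py ---
def triangle_area(base: float, height: float) -> float:
    return base * height / 2

# --- hypothetical file: area.py (would import all three files above) ---
def area(type_code: str, *dimensions: float) -> float:
    # The switch: it must know every implementation by name.
    if type_code == "circle":
        return circle_area(*dimensions)
    if type_code == "square":
        return square_area(*dimensions)
    if type_code == "triangle":
        return triangle_area(*dimensions)
    raise ValueError(f"unknown shape type: {type_code}")

# --- hypothetical file: report.py (would import area.py) ---
def describe(type_code: str, *dimensions: float) -> str:
    return f"{type_code}: {area(type_code, *dimensions):.2f}"


print(describe("square", 3))
```

Every import points the same way as the calls: `report.py` to `area.py` to the implementation files, so a change at the bottom ripples upward.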

Yes. At least that is true for language systems that rely on source file timestamps to determine which modules should be compiled.

That’s almost all of the statically typed ones, right?

Yes, and a few that are not statically typed.

That requires a lot of recompilation.

It also requires a lot of redeployment.

OK, but are these weaknesses avoided in the class scenario?

Yes, because the caller of the area function depends on an interface, and the implementing functions depend on that same interface.

I understand. The source file for the square class imports, uses, or includes the source file for the shape interface.

Yes. The source file dependencies of the implementations point in the opposite direction of the call. They point from the implementation toward the caller. At least that’s true for statically typed languages. In dynamically typed languages, the caller of the area function depends on nothing at all at the source level; the dependency is resolved only at run time.

Right. So if you change one of these implementations…

Only the modified file needs to be recompiled or redeployed.

This is because the dependencies between source files are in the opposite direction of the call.

Correct. We call this dependency inversion.
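The inverted arrangement might be sketched as follows, again as one Python listing with comments marking hypothetical file boundaries. The file names and the `describe` caller are assumptions of this sketch.

```python
import math
from abc import ABC, abstractmethod

# --- hypothetical file: shape.py (the interface; depends on nothing) ---
class Shape(ABC):
    @abstractmethod
    def area(self) -> float: ...

# --- hypothetical file: report.py (would import only shape.py) ---
def describe(shape: Shape) -> str:
    # The caller names the interface and never a concrete implementation.
    return f"{type(shape).__name__}: {shape.area():.2f}"

# --- hypothetical file: square.py (would import shape.py) ---
class Square(Shape):
    def __init__(self, side: float) -> None:
        self._side = side

    def area(self) -> float:
        return self._side ** 2

# --- hypothetical file: circle.py (would import shape.py) ---
class Circle(Shape):
    def __init__(self, radius: float) -> None:
        self._radius = radius

    def area(self) -> float:
        return math.pi * self._radius ** 2


print(describe(Square(3)))
print(describe(Circle(1)))
```

At run time, control flows from `describe` into `Square.area`, yet no import in `report.py` would mention `square.py`. Editing `square.py` leaves both `report.py` and `shape.py` untouched.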

OK, let me see if I can summarize this. Classes and data structures are opposites in at least three ways.

  • Classes expose functions and hide data. Data structures expose data but hide functions.
  • Classes make it easy to add types, but hard to add functions. Data structures make it easy to add functions, but hard to add types.
  • Data structures expose callers to recompilation and redeployment. Classes isolate callers from recompilation and redeployment.

You’ve got it. These are the things that every good software designer and architect needs to keep in mind.
