Write code that is easy to delete, not easy to extend

Chinese translation

Programming is terrible: lessons learned from a life wasted

“Every line of code is written without reason, maintained out of weakness, and deleted by chance.”

Every line of code you write comes at a price: maintenance. To avoid paying for a lot of code, we build reusable software. The problem with code reuse is that it gets in the way when you change your mind later.

The more users an API has, the more code needs to be rewritten to introduce changes. Similarly, the more you rely on third-party APIs, the more trouble you’ll have when anything changes. Managing how code fits together, or which modules depend on which, is an important problem in large systems, and it only gets harder the longer the project goes on.

“My point today is that, if we wish to count lines of code, we should not regard them as ‘lines produced’ but as ‘lines spent’.” EWD 1036

If we see “lines of code” as “lines spent”, then when we delete code, we lower the cost of maintenance. Instead of building reusable software, we should try to build disposable software.

I don’t need to tell you that deleting code is more fun than writing it.

To write code that is easy to delete: repeat yourself to avoid creating dependencies, but don’t manage the repetition. Layer your code too: build easy-to-use APIs out of parts that are easier to implement but clumsier to use. Split your code: isolate the parts that are hard to write and likely to change from the rest of the code, and from each other. Don’t hard-code every choice, and allow a few to be changed at run time. Don’t try to do all of these things at once, and maybe don’t write so much code in the first place.

Stage 0: No code

The number of lines of code doesn’t tell us much on its own, but the order of magnitude does: 50, 500, 5,000, 10,000, 25,000, and so on. A million-line monolith is going to be more painful than a 10,000-line one, and will take significantly more time, money, and effort to replace.

Although the more code you have, the harder it is to get rid of, saving a single line of code saves nothing on its own.

Even so, the easiest code to remove is the code you avoided writing in the first place.

Stage 1: Copy and paste code

Writing reusable code is something that is easier to do after the fact, with examples of use in the code base, than it is to anticipate beforehand. On the bright side, you’re probably already reusing a lot of code just by using the file system, so why worry? A little redundancy is healthy.

It’s perfectly fine to copy and paste code a few times, rather than write a library function just to give the usage a name. Once you make something a shared API, it becomes harder to change.

The code that calls your function will come to depend on both the intentional and the unintentional behavior of the implementation behind it. Programmers who use your function will not rely on what you documented, but on what they observe.
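A minimal sketch of how that dependence creeps in. The names list_plugins and first_plugin are hypothetical and not from any real library. The function only promises to return plugin names, but its current implementation happens to sort them, and a caller quietly starts relying on that.

```python
def list_plugins(registry: dict[str, object]) -> list[str]:
    """Return the names of all registered plugins (the order is NOT documented)."""
    # Implementation detail: sorted() is only here to keep debug output stable.
    return sorted(registry)


def first_plugin(registry: dict[str, object]) -> str:
    # The caller observed that results are alphabetical and now depends on it.
    return list_plugins(registry)[0]


registry = {"zip": object(), "audit": object(), "cache": object()}
print(first_plugin(registry))  # "audit" today; may change if sorted() is removed
```

Once first_plugin exists, removing sorted() is no longer a private change, even though no documentation ever promised an order.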

Deleting code within a function is easier than deleting a function.

Stage 2: Don’t copy and paste code

When you’ve copied and pasted something enough times, it’s probably time to pull it up into a function. This is the “save me from my standard library” kind of code: “open a configuration file and return a hash table”, “delete this folder”. It includes stateless functions, and functions that need only a little global knowledge, such as environment variables. These are the things that end up in a file called util.
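As a rough illustration, two such helpers might look like the sketch below. The function names and the choice of JSON as the configuration format are assumptions made for the example, not something the article specifies.

```python
import json
import shutil


def load_config(path: str) -> dict:
    """Open a JSON configuration file and return its contents as a dict."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def delete_folder(path: str) -> None:
    """Delete a folder and everything in it; do nothing if it is already gone."""
    shutil.rmtree(path, ignore_errors=True)
```

Neither function holds state or knows anything about the application, which is why code like this tends to be reused widely and deleted rarely.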

Aside: Create a util folder and keep different utilities in different files. A single util file always grows until it is too big, and yet too hard, to split apart. Using a single util file is unhygienic.

The less specific code is to your application or project, the easier it is to reuse and the less likely it is to be changed or deleted. This includes library code for logging, third-party APIs, file handles, or processes. Other code you won’t delete includes lists, hash tables, and other collections. This is not because their interfaces are usually simple, but because their scope does not grow over time.

Instead of making all code easy to delete, try to separate the hard-to-delete parts of code from the easy-to-delete parts as much as possible.

Stage 3: Write more boilerplate

Although we use libraries to avoid copy-and-paste, we often end up writing a lot more code through copy-and-paste in order to use them. We just give that code a different name: boilerplate. Boilerplate is a lot like copy-and-paste, except that you change a few things in different places each time, rather than repeating exactly the same thing over and over.

As with copy-and-paste, we duplicate parts of code to avoid introducing dependencies and to gain flexibility, and we pay for it in verbosity.

Libraries that need boilerplate are often things like network protocols, wire formats, or parsing kits: places where it is hard to interweave policy (what a program should do) and protocol (what a program can do) without limiting the options available. This kind of code is hard to delete: it is usually a requirement for talking to another computer or handling different files, and the last thing we want to do is litter it with business logic.

Writing boilerplate is not an exercise in code reuse: we are trying to keep the parts that change frequently away from the parts that are relatively stable. The dependencies and responsibilities of library code should be kept to a minimum, even if we have to write boilerplate to use it.

You write more lines of code, but you write them in the easy-to-delete parts.
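Python’s standard logging module is a small, familiar illustration of this trade (it is my example, not one from the article). The library provides mechanism and stays free of opinions; each program repeats a few lines of setup that carry its own policy. The logger name "billing" is hypothetical.

```python
import io
import logging

# Boilerplate: nearly the same lines appear in every program, with small
# differences (destination, level, format). The policy lives here, not in
# the library.
stream = io.StringIO()                      # this program logs to memory
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s %(name)s: %(message)s"))

log = logging.getLogger("billing")          # hypothetical component name
log.handlers.clear()                        # start from a clean logger
log.setLevel(logging.WARNING)               # this program's choice of level
log.addHandler(handler)
log.propagate = False                       # do not also log via the root logger

log.info("invoice created")                 # dropped: below WARNING
log.warning("card declined")                # kept
print(stream.getvalue(), end="")
```

Changing where the messages go, or which ones are kept, means editing this block and nothing inside the library.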

Stage 4: Don’t write boilerplate

Boilerplate works best when a library is expected to cater to all tastes, but sometimes there is just too much duplication. Then it’s time to wrap your flexible library with one that has opinions on policy, workflow, and state. Building an easy-to-use API is about turning your boilerplate into a library.

This is more common than you might think. One successful example is Requests, one of the most popular and beloved Python HTTP clients, which wraps a more verbose library, urllib3, to give users a simpler interface. Requests takes care of the common workflows of using HTTP and hides many of the practical details from the user. Meanwhile, urllib3 handles pipelining and connection management and hides nothing from the user.

When we wrap one library in another, it is not so much that we are hiding details as that we are separating concerns: Requests is about the popular HTTP adventures, while urllib3 gives you the tools to choose your own adventure.
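The same split can be sketched in a few lines. This is not how Requests or urllib3 are implemented; build_request and simple_get are hypothetical names, and the code only formats an HTTP/1.1 request without sending it. The lower layer leaves every choice to the caller, and the upper layer makes the common choices for them.

```python
def build_request(method: str, host: str, path: str,
                  headers: dict[str, str], body: bytes = b"") -> bytes:
    """Lower layer: flexible but clumsy. The caller decides everything."""
    lines = [f"{method} {path} HTTP/1.1", f"Host: {host}"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    head = "\r\n".join(lines) + "\r\n\r\n"
    return head.encode("ascii") + body


def simple_get(host: str, path: str = "/") -> bytes:
    """Upper layer: opinionated defaults for the common case."""
    headers = {"User-Agent": "example-client/0.1", "Connection": "close"}
    return build_request("GET", host, path, headers)


print(simple_get("example.com").decode("ascii"))
```

Callers with unusual needs can still reach for build_request directly; everyone else gets a one-argument call.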

I’m not advocating that you go and create a /protocol/ and a /policy/ folder, but you should try to keep util free of business logic, and build easier-to-use libraries on top of easier-to-implement ones. You don’t have to finish writing one library before you start writing another on top of it.

It is often good to wrap third-party libraries too, even if they are not protocol-like. You can build a library that suits your code, rather than locking one choice in across the whole project. Building a pleasant-to-use API and building an extensible API are often at odds with each other.
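A minimal sketch of such a wrapper, assuming a project that wants one place to decide how values are serialized. The function names encode and decode are hypothetical, and the standard json module stands in for whichever library is being wrapped.

```python
import json
from typing import Any


def encode(value: Any) -> bytes:
    """Project-wide way to serialize a value; callers never import json directly."""
    text = json.dumps(value, sort_keys=True, separators=(",", ":"))
    return text.encode("utf-8")


def decode(data: bytes) -> Any:
    """Inverse of encode()."""
    return json.loads(data.decode("utf-8"))


print(encode({"b": 1, "a": [1, 2]}))
```

If the project later switches to a different format or library, the change is confined to these two functions.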

Separating concerns like this allows us to make some users happy without making things impossible for others. Layering is easiest when you start with a good API, but writing a good API on top of a bad one is unpleasantly hard. Good APIs are designed with empathy for the programmers who will use them, and layering is the realization that we can’t please everyone at once.

Layering is more about making code that is hard to remove easy to use (without tainting it with business logic) than it is about writing code that can be removed later.

Stage 5: Write a chunk of code

You’ve copy-pasted, you’ve refactored, you’ve layered, you’ve built, but the code still needs to do something at the end. Sometimes the best thing to do is give up and write a big piece of junk code to hold the rest together.

Business logic is code characterized by a never-ending series of edge cases and quick and dirty hacks. That’s fine. I have nothing against it. Other styles, like “game code” or “founder code”, are the same thing: cutting corners to save a considerable amount of time.

The reason? Sometimes it’s easier to get rid of one big mistake than to get rid of 18 smaller, interleaved mistakes. A lot of programming is exploratory, and it’s faster to make a few mistakes and iterate than to try to get it right the first time.

This is especially true for more interesting or creative endeavors. If you’re writing your first game: Don’t write it as a game engine. Similarly, don’t write a framework before you’ve written an application. Feel free to write a bunch of messy code the first time. You don’t know how to break it down into modules unless you’re a prophet.

Monorepos involve a similar trade-off: you won’t know how to split up your code in advance, and one large mistake is easier to deploy than 20 tightly coupled ones.

When you know which code will soon be abandoned, deleted, or easily replaced, you can cut a lot more corners. This is especially true if you build one-off client sites or web pages for events: anything where you have a template and stamp out copies, or where you fill in the gaps left by a framework.

I’m not suggesting that you write the same ball of mud ten times over, perfecting your mistakes. To quote Perlis: “Everything should be built top-down, except the first time.” You should make new mistakes with every attempt, take new risks, and slowly build up through iteration.

Becoming a professional software developer is a process of accumulating a list of regrets and mistakes. You don’t learn anything from success. It’s not that you know what good code looks like, it’s that you remember what bad code looks like.

Projects eventually fail or become legacy code anyway. Failure is more frequent than success. It is quicker to write ten big balls of mud and see where they take you than to try to polish a single ball of dung.

It’s easier to delete all of the code in one go than to delete it piece by piece.

Stage 6: Break your code into small pieces

Big balls of mud are the easiest to write, but also the most expensive to maintain. What feels like a simple change ends up touching almost every part of the code base in an ad hoc way. What was once easy to delete as a whole is now impossible to delete piece by piece.

Just as we layered our code to separate responsibilities, from platform-specific code to domain-specific code, we also need to find a way to tease apart the logic on top.

“[Start] with a list of difficult design decisions or design decisions which are likely to change. Each module is then designed to hide such a decision from the others.” D. Parnas

Instead of breaking code into parts with common functionality, we break it apart by what it does not share with the rest. We isolate the parts that are most frustrating to write, maintain, or delete away from each other.

We build modules not for reuse, but for ease of modification.
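A small sketch of a module built around one hidden decision. SessionStore and its methods are hypothetical names chosen for the example. The decision it hides is where sessions are kept: callers only create, look up, and expire sessions, so replacing the in-memory dict with a file or a database would touch this class alone.

```python
import secrets
from typing import Optional


class SessionStore:
    """Hides one decision: where and how sessions are kept."""

    def __init__(self) -> None:
        # Private detail: token -> user name, held in memory for now.
        self._sessions: dict[str, str] = {}

    def create(self, user: str) -> str:
        token = secrets.token_hex(16)
        self._sessions[token] = user
        return token

    def lookup(self, token: str) -> Optional[str]:
        return self._sessions.get(token)

    def expire(self, token: str) -> None:
        self._sessions.pop(token, None)


store = SessionStore()
token = store.create("ada")
print(store.lookup(token))
```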

Unfortunately, some problems are more intertwined and harder to separate than others. Although the single responsibility principle suggests that “each module should only handle one hard problem”, it is more important that “each hard problem is only handled by one module”.

When a module does two things, it is usually because changing one requires changing the other. A poorly written component with a simple interface is usually easier to use than two components that need to coordinate with each other.

“I shall not today attempt further to define the kinds of material I understand to be embraced within that shorthand description [‘loose coupling’], and perhaps I could never succeed in intelligibly doing so. But I know it when I see it, and the code base involved in this case is not that.” SCOTUS Justice Stewart

A system is often said to be loosely coupled if you can remove one part of it without having to rewrite the others. But it’s a lot easier to explain what such a system looks like than how to build one in the first place.

Even hard-coding a variable in a single place can be loose coupling, and so can using a command-line flag instead of a variable. Loose coupling is about being able to change your mind without having to change too much code.
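Both ideas fit in a few lines. In this sketch the constant DEFAULT_TIMEOUT_SECONDS and the --timeout flag are hypothetical names: the value is written down exactly once, and the flag lets it be overridden without editing the code.

```python
import argparse

DEFAULT_TIMEOUT_SECONDS = 30   # hard-coded once, in one place


def parse_args(argv: list[str]) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="hypothetical worker")
    parser.add_argument("--timeout", type=int, default=DEFAULT_TIMEOUT_SECONDS,
                        help="seconds to wait before giving up")
    return parser.parse_args(argv)


print(parse_args([]).timeout)                  # uses the single default
print(parse_args(["--timeout", "5"]).timeout)  # overridden at start-up
```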

For example, Microsoft Windows has internal and external APIs for this very purpose. The external APIs are tied to the lifecycle of desktop programs, and the internal API is tied to the underlying kernel. Hiding the internal APIs away gives Microsoft flexibility without breaking too much software.

HTTP has examples of loose coupling too: putting a cache in front of your HTTP server, or moving your images to a CDN and just changing the links to them. Neither breaks the browser.

HTTP error codes are another example of loose coupling: problems that are common across web servers have their own unique codes. When you get a 400 error, trying again will get the same result. A 500 may change. As a result, HTTP clients can handle many errors on the programmer’s behalf.
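A client can act on that distinction without knowing anything about the server. The helper below is a deliberately simplified sketch with a hypothetical name; it ignores cases such as 408 or 429, where a later retry of a 4xx response can succeed.

```python
def should_retry(status: int) -> bool:
    """Decide from the status code alone whether repeating a request may help."""
    if 500 <= status <= 599:
        return True    # server-side failure: the outcome may differ next time
    return False       # 4xx: the same request fails the same way; 2xx/3xx need no retry


for status in (200, 400, 404, 500, 503):
    print(status, should_retry(status))
```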

When breaking a piece of software into smaller pieces, you must think about how to handle errors. This is easier said than done.

“I have decided, reluctantly, to use LaTeX.” Making reliable distributed systems in the presence of software errors. Armstrong, 2003

Erlang/OTP has a relatively unique way of handling failure: supervision trees. Roughly, each process in an Erlang system is started and watched by a supervisor. When a process encounters a problem, it exits. When a process exits, its supervisor restarts it.

(These supervisors are started by a bootstrap process, and when a supervisor encounters a fault, the bootstrap process restarts it.)

The key idea is that it is quicker to fail fast and restart than it is to handle errors. Error handling like this may seem counterintuitive: reliability is gained by giving up when an error occurs. But turning things off and on again has a knack for clearing transient faults.
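The restart idea can be imitated in ordinary Python, although this is only an analogy and not how Erlang/OTP supervisors are implemented. The names supervise and flaky_worker are hypothetical. The supervisor makes no attempt to repair anything; it just starts the worker again.

```python
from typing import Callable


def supervise(worker: Callable[[], str], max_restarts: int = 3) -> str:
    """Run worker; if it crashes, start it again, up to max_restarts times."""
    restarts = 0
    while True:
        try:
            return worker()
        except Exception:
            if restarts >= max_restarts:
                raise          # give up and let whoever started us decide
            restarts += 1      # no repair attempt: just start from scratch


attempts: list[int] = []


def flaky_worker() -> str:
    """Hypothetical worker that hits a transient fault on its first two runs."""
    attempts.append(1)
    if len(attempts) < 3:
        raise ConnectionError("transient fault")
    return "done"


result = supervise(flaky_worker)
print(result, "after", len(attempts), "starts")
```

A worker that never recovers exhausts max_restarts and the exception propagates upward, which is the point where an outer supervisor would take over.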

Error handling and recovery are best done in the outer layers of your code base. This is known as the end-to-end principle. The end-to-end principle argues that it is easier to handle failure at the far ends of a connection than anywhere in the middle. Even if you handle errors somewhere inside, you still have to do the final top-level check. If errors must be handled at the top level anyway, why bother handling them on the inside?

Error handling is one of the many ways in which a system can be tightly bound together. There are many other examples of tight coupling, and it would be a little unfair to single one out as badly designed. Except for IMAP.

In IMAP almost every operation is a snowflake, with its own unique options and handling. Error handling is painful: errors can arrive halfway through the result of another operation.

Instead of UUIDs, IMAP generates unique tokens to identify each message. These tokens can change halfway through the result of an operation too. Many operations are not atomic. It took more than 25 years to get a reliable way to move an email from one folder to another. There is also a special UTF-7 encoding, and a unique Base64 encoding.

I’m not making any of this up.

By comparison, file systems and databases are far better examples of remote storage. With a file system, you have a fixed set of operations, but a multitude of objects you can operate on.

Although SQL may seem like a much broader interface than a file system, it follows the same pattern: a handful of operations on sets, and a multitude of rows to operate on. You can’t always swap one SQL database for another, but it is easier to find something that works with SQL than with any home-grown query language.

Other examples of loose coupling are systems built from middleware, filters, and pipelines. For example, Twitter’s Finagle uses a common API for services, which allows generic timeout handling, retry mechanisms, and authentication checks to be added easily to both client and server code.
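The pattern can be sketched without any framework. This is not Finagle’s API; Service, Filter, with_retries, require_token, and echo are hypothetical names. Because every service has the same shape, a filter written once can wrap any of them.

```python
from typing import Callable

Service = Callable[[str], str]          # every service: request in, response out
Filter = Callable[[Service], Service]   # every filter: wraps one service in another


def with_retries(times: int) -> Filter:
    def apply(service: Service) -> Service:
        def wrapped(request: str) -> str:
            for attempt in range(times):
                try:
                    return service(request)
                except ConnectionError:
                    if attempt == times - 1:
                        raise          # out of attempts: pass the error on
            raise ValueError("times must be at least 1")
        return wrapped
    return apply


def require_token(token: str) -> Filter:
    def apply(service: Service) -> Service:
        def wrapped(request: str) -> str:
            if not request.startswith(token + " "):
                return "401 unauthorized"
            return service(request[len(token) + 1:])   # strip the token
        return wrapped
    return apply


calls: list[str] = []


def echo(request: str) -> str:
    """Hypothetical service whose very first call fails."""
    calls.append(request)
    if len(calls) == 1:
        raise ConnectionError("first call fails")
    return "200 " + request


service = require_token("s3cret")(with_retries(3)(echo))
print(service("s3cret hello"))   # retried once behind the scenes
print(service("hello"))          # rejected before reaching echo
```

The echo service knows nothing about retries or authentication, so either filter can be removed or replaced without touching it.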

(I’m pretty sure someone would complain to me if I didn’t mention UNIX pipes here)

First we layered our code, but now some of those layers share an interface: a set of identical behaviors and operations with different implementations. Good loose coupling usually means consistent interfaces.

A healthy code base does not have to be perfectly modular. The modular parts make it more fun to write code, in the same way that Lego bricks are fun because they all fit together. A healthy code base has some verbosity, some redundancy, and just enough distance between the moving parts that you won’t get your hands trapped inside.

Loosely-coupled code is not necessarily code that is easy to delete, but it is much easier to replace and modify.

Stage 7: Keep writing code

If you don’t have to deal with old code when writing new code, it’s much easier to experiment with new ideas. This is not to say that you should write microservices rather than monoliths, but that your system should be able to support one or two experiments on top while you work out what you are doing.

Feature flags are one way to change your mind later. Although feature flags are usually seen as a way to experiment with features, they also allow you to apply changes without redeploying your software.

Google Chrome is a good example of the benefits feature flags can bring. The Chrome team found that the hardest part of keeping a regular release cycle was the time it took to merge long-lived feature branches.

By being able to turn new code on and off without recompiling, they could break large changes down into smaller merges without affecting existing code. With new features appearing earlier in the same code base, it became more obvious when long-running feature development would affect other parts of the code.

A feature flag is not just a command-line switch. It is a way of decoupling feature releases from merging branches, and of decoupling feature releases from deploying code. When rolling out new software takes hours, days, or even weeks, being able to change your mind at run time becomes increasingly important. Ask any operations person: any system that can wake you up in the middle of the night is one worth being able to control while it’s running.
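A minimal sketch of the mechanism, with hypothetical names throughout. The FLAGS dict stands in for whatever source a real system would consult while it runs, such as a configuration service or a file, and new_rounding is an invented flag. The new code path is merged and deployed but stays off until the flag is flipped.

```python
FLAGS: dict[str, bool] = {"new_rounding": False}   # stand-in for a flag service


def flag_enabled(name: str) -> bool:
    """Look the flag up on every call, so a change takes effect immediately."""
    return FLAGS.get(name, False)


def checkout_total(prices: list[float]) -> float:
    if flag_enabled("new_rounding"):
        return round(sum(prices), 2)   # new code: shipped, but off by default
    return sum(prices)                 # existing behaviour stays the default


prices = [0.1, 0.2]
print(checkout_total(prices))          # old path
FLAGS["new_rounding"] = True           # flipped at run time, no redeploy
print(checkout_total(prices))          # new path
```

Deleting the feature later is equally contained: remove the flag and one branch of the if.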

It isn’t so much that you are iterating as that you have a feedback loop. It isn’t so much that you are building modules for reuse as that you are isolating components for change. Handling change is not just about developing new features, it’s also about getting rid of old ones. Writing extensible code is hoping that in three months’ time you will have got everything right. Writing code you can delete is working on the opposite assumption.

The strategies I talked about above (layering, isolation, common interfaces, composition) are not about writing good software, but about building software that can change over time.

“The management question, therefore, is not whether to build a pilot system and throw it away. You will do that. […] Hence plan to throw one away; you will, anyhow.” Fred Brooks

You don’t have to throw it all away, but you do need to remove some parts. Good code is not about getting things right the first time. Good code is legacy code that doesn’t get in the way.

Good code is always code that is easy to delete.

Translator: Zhang Yongfeng