Author: Zach Holman

This article was translated for Coding users. If you have suggestions for improving the translation, please submit a Pull Request.

Let’s talk about deployment

Whenever you make changes to your code base, there is always the risk that you will break something.

No one likes outages, no one likes grumpy users, and no one likes angry managers. So deploying new code to a living environment can be stressful.

It doesn’t have to be stressful, though, and I’m going to repeat this point over and over again:

Your deploys should be as boring, straightforward, and stress-free as possible.

Deploying major new features to production should be as easy as starting a flamewar on Hacker News about spaces versus tabs. It should be simple enough for new employees to understand, it should be designed to prevent errors, and it should be well tested before the first end user ever sees the new code.

This is a high-level article about deployment: collaboration, safety, speed, and so on. There is plenty to say at the lower, technical level too, but those details are hard to generalize across languages and, to be honest, they are much easier problems to solve than the human ones. I prefer to talk about how teams work together, and deployment is the most critical part of collaborating with other people. I think it’s worth taking the time to evaluate where your team stands on this from time to time.

A lot of this comes from my five years at GitHub and from the last year of advising and consulting with tech companies large and small, with a particular focus on improving their deployment workflows (which have ranged from “pretty respectable” to “I think this server is literally on fire”). One startup in particular, Dockbit, builds a product aimed squarely at collaboration around deployment. This article grew out of a number of conversations with their team, and I thought it would be helpful to write down the different pieces of the deployment puzzle.

I am grateful to friends at several different companies for helping with this article and offering their perspectives on deployment: Corey Donohoe (Heroku), Jesse Toth (GitHub), Aman Gupta (GitHub), and Paul Betts (Slack). I keep finding that different companies take interestingly different paths, but they all tend to focus on the same basics of collaboration, risk, and caution, and I think there’s something universal in that.

Anyway, sorry for the lengthy introduction; the article itself is going to be long anyway, so please bear with me.

Table of contents

  • Goals: Isn’t deployment a solved problem?

  • Prepare: Start getting ready to deploy: tests, Feature Flags, and how your development collaboration works.

  • Branch: How you branch your code is an essential part of deployment. Branches let you isolate any unintended consequences of the new code you’re shipping. Start thinking about deploying branches, auto-deploying the master branch, and blue/green deployments.

  • Control: The heart of deployment. How do you control the code that gets released? Deal with the different permission structures around deploying and merging, set up an audit trail of your deploys, and keep everything orderly with deploy locks and deploy queues.

  • Monitor: Cool, your code is out in production. Now you can worry about the different metrics to watch after a deploy, and ultimately decide whether or not to roll back your changes.

  • Conclusion: “What have we learned, Palmer?” “I don’t know, Sir.” “I don’t fucking know. I guess what we’ve learned is, don’t do it anymore.” “Yes, Sir.”

How to Deploy Software was originally published on March 1, 2016.

Isn’t deployment a solved problem?

If you’re talking about pulling code and pushing it to different servers, then things are in pretty good shape and frankly pretty boring. You’ve got Capistrano (a remote server automation tool) in Ruby, Fabric (a Python library and command-line tool) in Python, Shipit (a general-purpose automation and deployment tool written in JavaScript) in Node, all of Amazon’s cloud services, and even FTP, which seems to have been around for centuries. Tools really aren’t the problem right now.

So if we have pretty good tools at this point, why do deploys go wrong? Why do people keep shipping bugs? Why are there always outages? We’re all perfect programmers writing perfect code, dammit.

Obviously, things still go wrong unexpectedly, which is why I think deployment is an interesting area for small and medium-sized companies to focus on. Few other areas give you a better return on the time you invest. Do you have processes in place to catch and respond to problems early? Can you use better tooling to make deployment easier?

It’s not a problem with the tools; it’s a problem with the process.

Over the years I’ve talked to many, many startups, and I haven’t yet found one whose deployment workflow looked “good” from an organizational perspective.

You don’t necessarily need a dedicated person in charge of deployment, or specific deploy days, or all hands on deck for every deploy. You just need to do a few smart things.

Start on a good foundation.

You have to walk before you can run. I think one of the fashionable things about startups is that they all use the coolest, newest deployment tools, but when you dig into their process, they’re spending 80% of their time fumbling with the basics. If they had streamlined those basics from the start, everything else would have gone faster and more smoothly.

Tests

Tests are one of the easiest places to start. They aren’t strictly part of the deployment process itself, but they have a huge impact on it.

Many of these tips depend on your language, platform, or framework, but the general advice is to test your code and speed up your testing.

My favorite quote on this comes from Ryan Tomayko, in GitHub’s internal testing documentation:

We can make good tests faster but we can’t make fast tests better.

So start with a good foundation: have good tests, and don’t skimp on them, because they affect everything downstream.

Once you have a quality test suite you can rely on, it’s time to start spending money on it. If your team has any kind of revenue or funding behind it, pretty much the number one place to spend it is on whatever you run your tests on. If you use Travis CI or CircleCI, run parallel builds if you can, and double whatever you’re paying today. If you need to run on dedicated hardware, buy a huge server.

I’ve seen companies gain their single biggest productivity boost by moving to a faster test suite, and you keep earning that gain, because it tightens the feedback cycle of iteration, reduces context-switching time, increases developer happiness, and builds the habit. Throw money at the problem: servers are cheap, but programmer time isn’t.

I ran an informal Twitter poll asking my followers how fast their test suites were. Admittedly it’s hard to account for microservices, language differences, and the surprising number of people with no tests at all, and it gets even murkier comparing full-stack suites to faster unit tests, but it was still clear that most people wait at least five minutes after a push to see their build status.

How fast is fast? When I was at GitHub, tests typically ran in 2-3 minutes. We didn’t have a lot of integration tests, which allowed the suite to stay relatively fast, but in general, the faster your tests, the faster your developer feedback loop.

There are many projects aimed at helping you parallelize your builds. In Ruby there are parallel_tests and test-queue, for example. You may well need to write your tests differently if they aren’t fully independent of each other, but that’s something you should be doing anyway.

Feature Flags

The other aspect of all this is to start looking at your code and changing it so it can be deployed with multiple code paths.

Again, our goal is that your deploys should be as boring, straightforward, and stress-free as possible. The natural stress of deploying any new code comes from the problems you can’t foresee once the code is running, and from ending up affecting your users’ experience (i.e., the downtime and errors they see). Even with the best programmers in the universe, bad code will eventually get deployed. Whether that bad code affects 100% of your users or just one user who happens to be very important to you, it matters.

A simple way to deal with this is the Feature Flag. Feature flags have been around, technically speaking, since the invention of the if statement, but the first time I remember hearing about a company making real use of them was Flickr’s 2009 post, Flipping Out:

These allow us to turn on features we are actively developing without affecting the features other developers are working on. They also let us turn individual features on and off for testing.

Deploying code behind a flag that only you can see, or only your team can see, or all of your staff can see, gives you two things: you can test the code in the real world with real data and make sure everything works, and you can get real benchmarks on the performance and risk of the feature before it’s released to everyone.

The great part about all this is that when you’re ready to ship the new feature, all you have to do is flip one line of code to true, and everyone sees the new code paths. It makes the usually scary “new feature release” deploy boring, straightforward, and stress-free.
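To make the idea concrete, here’s a minimal sketch of a hand-rolled feature flag in Ruby. Everything in it (the FeatureFlags module, the :new_issues_ui flag, the staff? check) is hypothetical; the libraries mentioned later in this article give you the same idea with better ergonomics.

```ruby
# Hypothetical, minimal feature flag: a hash of flags plus a check that
# lets staff see the feature before everyone else does.
module FeatureFlags
  FLAGS = {
    new_issues_ui: false # flip to true when the feature ships to everyone
  }.freeze

  def self.enabled?(flag, user)
    FLAGS.fetch(flag, false) || user.staff?
  end
end

# In application code, branch on the flag:
#
#   if FeatureFlags.enabled?(:new_issues_ui, current_user)
#     render_new_issues_ui
#   else
#     render_old_issues_ui
#   end
```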

Provably correct deployments

As an additional step, Feature Flags provide a good way to prove that your upcoming code deployment will not adversely affect performance and reliability. Several new tools have been developed in recent years to help you do this.

I talked about this a few years ago in my talk “Move Fast and Break Nothing”. The gist is to run both code paths of the feature flag in production while only returning the results of the old code, so you can see how the new code you’re introducing compares to the code it’s about to replace. Once you have that data, you can be sure you won’t break anything. Deploys become boring, straightforward, and stress-free.

GitHub open-sourced a Ruby library called Scientist that abstracts a lot of this away. The library has been ported to most popular languages at this point, so if you’re interested, it may be worth your time to take a look.
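As a rough illustration, here’s what a Scientist experiment looks like, following the pattern in the library’s README; the experiment name and the two permission checks below are made up for this example.

```ruby
require "scientist"

class RepoPolicy
  include Scientist

  def allowed?(user, repo)
    science "repo-permissions" do |experiment|
      experiment.use { legacy_allowed?(user, repo) } # old code path: its result is returned
      experiment.try { new_allowed?(user, repo) }    # new code path: its result is only recorded
    end
  end

  # legacy_allowed? and new_allowed? are hypothetical stand-ins for the
  # code you're replacing and the code you're introducing.
end
```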

The other approach is a gradual, percentage-based rollout. Even once you’re confident the code you’re deploying is solid, it’s still prudent to expose it to only a small number of users at first, so you can double- and triple-check that nothing breaks. It’s better to break the experience of 5% of your users than 100% of them.

There are plenty of libraries designed to help with this, from Rollout in Ruby to Togglz in Java to fflip in JavaScript, and many more. There are also plenty of startups tackling this problem, like LaunchDarkly.
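As a sketch of what a percentage rollout looks like in practice, here’s roughly how the Rollout gem is used; the Redis connection and the :new_dashboard feature name are assumptions for the example.

```ruby
require "redis"
require "rollout"

$rollout = Rollout.new(Redis.new)

# Expose the new code path to 5% of users first, then ramp up.
$rollout.activate_percentage(:new_dashboard, 5)

# In application code, pick a code path per user:
#
#   if $rollout.active?(:new_dashboard, current_user)
#     render_new_dashboard
#   else
#     render_old_dashboard
#   end
```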

It’s also worth noting that this isn’t just for the web. Native applications can benefit too: take a quick look at GroundControl, a library for handling this kind of behavior in iOS.

Feeling good about how your code is built? Great, now let’s move on and talk about actually deploying it.

Branch management

Many of the organizational problems around deployment stem from a lack of communication between the person deploying and everyone else. You want everyone to know the full scope of the code you’re about to ship, and you want to avoid stepping on anyone else’s toes while you do it.

There are a few interesting behaviors that can help with this, all of which depend on the simplest unit of deployment: the branch.

Code branch

By branches, I mean branches in Git, Mercurial, or whatever version control system you use. Cut a branch, work on it, then push it to your favorite code-hosting platform (GitLab, Bitbucket, Coding, etc.).

You should also use Pull Requests, Merge Requests, or other code review tools to review the code as it’s written. Deployment has to be collaborative, and code review is a big part of that. We’ll talk more about this later.

Code review

The topic of code review is vast and depends heavily on your team and your risk profile, but I think there are a few important points every team should consider:

  • Your branch is your responsibility. Every successful company I’ve seen holds the idea that the ultimate responsibility for code that fails in production lies with the person who wrote it. They don’t dump deploy failures on whoever happened to be on call or awake at the time. Of course those people should be involved in code review, but ultimately you are responsible for your own code. If it breaks, you fix it... not your poor ops team. So don’t break it.

  • Start early and review often. You don’t need to finish a branch before asking for a review. It’s far better to ask for a review of an early proof of concept, spend 20 minutes on it, and be told, “No, we shouldn’t do it this way,” than to hear that after two more weeks of writing code.

  • Someone always needs to review the code. You can rely on your team for this, but a second set of eyes is incredibly helpful either way. In more structured companies, you may want to explicitly assign people to review the code and ask them to start before the branch is finished. In less structured companies, you can mention it to different teams and see who can help best. Either way, you’re setting the expectation that someone will lend you a hand before you slam the merge button or deploy the code on your own.

Branch and deploy rhythms

There’s an old joke about code review: open a review request for six lines of code and you’ll get plenty of nitpicks from your colleagues about those six lines; push a branch you’ve worked on for weeks and you’ll usually just get a quick “Looks good to me!”

Basically, programmers are usually a bunch of lazy trolls.

But you can use that to your advantage: keep branches small and open Pull Requests as early as possible. Keep your changes small enough to be read and reviewed easily. If you build a huge branch, it will take much longer for someone to review it, and that slows the whole development process down.

How do you keep changes small? This is where the feature flags mentioned earlier come in handy. When my team of three rebuilt GitHub Issues in 2014, we shipped hundreds of small Pull Requests to production behind feature flags, deploying lots of small units long before they were “perfect”. That makes code review easier, and it enables faster deploys and earlier visibility into how the product behaves in production.

You want to deploy quickly and frequently. A 10-person team can comfortably deploy at least 7-15 branches per day. Again, the smaller the diff, the more boring, straightforward, and stress-free your deploys become.

Deploying branches

When you’re ready to deploy your new code, you should always deploy your branch before merging it. Always.

Think of your entire repository as a record of fact. Your master branch (or whatever you’ve designated as the default branch) should be an absolute mirror of production. In other words, you can always be sure your master branch is “good”, meaning it has no known problems.

Branches are the question mark. If you merge your branch into master first and then deploy master, you no longer have a simple way to tell whether the new code is good. Keeping master “good” means never having to do a messy revert. None of this is rocket science, but if a deploy ever takes the site down, you’ll want the simplest possible way back out.

This is why it’s important that your deployment tooling supports deploying branches. Once you’ve confirmed that performance isn’t fluctuating, there are no stability issues, and the feature works as expected, you can merge it. The point isn’t just to confirm that things work; it’s to keep you covered when they don’t. When something goes wrong, the fix should be boring, straightforward, and stress-free: just redeploy master. That’s it. You’re back to a known-good state.

Automatic deployment

It’s important to have a clear definition of your “known state”, and the easiest way to get one is with a simple rule that no one can get wrong:

Unless a branch is being tested, whatever is deployed to production is always reflected by the master branch. The easiest way I’ve seen to handle this is to simply auto-deploy master whenever it changes. It’s a very simple rule to remember, and it encourages everyone to put all but the lowest-risk commits on a branch.

There are many tools that can handle this for you. Platforms like Heroku can automatically deploy the latest version of a branch. CI tools like Travis CI can also do auto-deploys for you. And open source tools like Heaven and hubot-deploy (which we’ll talk about shortly) can help as well.

Auto-deploys also help when you merge your working branch into master. Your tooling should pick up the new revision and redeploy the site. Even though the software itself is the same (you’re effectively deploying the same code), the new SHA-1 makes the known state of production clearer (again: master is the known state).
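If you’re curious what “auto-deploy master” can look like under the hood, here’s a minimal, hypothetical sketch of a push webhook handler. The endpoint, the payload fields, and the bin/deploy command are stand-ins for whatever your platform and deploy tooling actually provide, and a real handler should also verify the webhook signature.

```ruby
require "sinatra"
require "json"

post "/webhooks/push" do
  payload = JSON.parse(request.body.read)

  # Only react to pushes to the default branch.
  if payload["ref"] == "refs/heads/master"
    sha = payload["after"]
    # Hand the actual work off to your real deploy tooling
    # (Capistrano, Heaven, a chat bot, etc.).
    system("bin/deploy", "production", sha)
  end

  status 200
end
```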

Blue-green deployments

Martin Fowler has championed blue-green deployments since his 2010 article (well worth reading). In it, Fowler describes using two identical production environments, which he calls “blue” and “green”. Blue is the live production environment and green is the idle one. You deploy to the green cluster, verify that everything works, and then flip traffic over to it with a seamless switch (at the load balancer, for example). That way production receives new code with far less risk.

One of the challenges with automating deployment is the cut-over: taking software from the final stage of testing into live production.

It’s a powerful idea, made even more powerful by the growing prevalence of virtualization, containers, and environments that can easily be thrown away and forgotten. Instead of a simple blue/green pair, you can keep your production environments fluid, since everything is virtual.
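The cut-over itself is usually just flipping a single pointer at the router or load balancer. As a toy sketch of that idea (not a production setup), imagine the name of the live cluster stored in Redis:

```ruby
require "redis"

redis = Redis.new

def live_cluster(redis)
  redis.get("clusters:live") || "blue"
end

def cut_over!(redis)
  idle = live_cluster(redis) == "blue" ? "green" : "blue"
  # Deploy to and verify the idle cluster first; only then flip traffic.
  redis.set("clusters:live", idle)
  idle
end

puts "Traffic now routes to: #{cut_over!(redis)}"
```

Rolling back is the same one-line flip in the other direction.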

There are many uses for this, from disaster recovery to having extra time to test critical features before users ever see them, but my favorite is the added ability to play with new code.

Playing with new code is hugely important in the product development cycle. Sure, many problems should be caught ahead of time in code review or through automated tests, but if you’re doing real product work, it’s sometimes hard to predict how something will feel until you’ve used it with real data for a real amount of time. That’s why blue-green deploys in production matter more than a simple staging server whose data may be stale or entirely fabricated.

What’s more, if you can deploy your code to a specific environment, you can bring other stakeholders in early. Not everyone has the technical skills to pull your code onto their computer and run it locally, nor should they have to! If you can give your accounting department, say, a link to the environment showing off your new feature before the whole company sees it, they can give you realistic feedback on it, which helps you catch errors and problems early.

Heroku Pipelines

Whether you use Heroku or not, take a look at the concept of “Review Apps” in their ecosystem: apps deployed directly from a Pull Request and played with live, instead of screenshots or long-winded “here’s what it will look like when it ships” descriptions. Get more people involved early, rather than springing a finished product on them and trying to convince them afterwards.

Controlling the Deployment process

Look, when it comes to how a startup should be organized, I’m a total squishy hippie: I believe in developer autonomy and a bottom-up approach to development that trusts people rather than managing them. I think it makes people happier and the product better. But deployment is a big, all-or-nothing thing to get right, so I think it makes sense to add a bit more control here.

Luckily, deployment tooling is one area where adding constraints takes stress off everyone, so if you do it right it’s a huge benefit rather than a hindrance. In other words, your process should help get things done, not get in the way.

An audit trail

I’ve been surprised by how many startups don’t have a readily accessible audit trail of deploys. There may be some piecemeal logs buried in chat somewhere, but that isn’t something you want to have to dig for when you actually need it.

The benefit of an audit trail is exactly what you’d expect: you can find out who deployed what, where, and when. When you run into problems later, being able to trace them back to a specific deploy saves a lot of time.

Many services can build this kind of deploy log for you. Amazon CodeDeploy and Dockbit, for example, are broader deployment tools that also give you good audit trails. GitHub’s excellent Deployments API is also a great way to integrate Pull Request deployments with external systems.

GitHub Deployments API

If you’re playing on expert mode, pipe your deploys and deploy times into one of the many time-series databases and services like InfluxDB, Grafana, Librato, or Graphite. Being able to overlay deploys on top of your regular metrics is extremely powerful: an unexpected spike in a metric might puzzle you at first, but not once you see it lines up with a deploy.
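A sketch of what that plumbing can look like: emit an event when a deploy finishes so your dashboards can draw it as a marker. The statsd-ruby gem is real, but the host, port, and metric name below are made-up examples.

```ruby
require "statsd-ruby"

statsd = Statsd.new("metrics.internal", 8125)

# Count deploys per app and environment; dashboards can overlay these
# counters on top of error rates and latency graphs.
def record_deploy(statsd, app:, environment:)
  statsd.increment("deploys.#{app}.#{environment}")
end

record_deploy(statsd, app: "web", environment: "production")
```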

Deploy locks

Once you have more than a few people working in a codebase, you’ll naturally have multiple people wanting to deploy at some point. It’s certainly possible to deploy multiple branches to production at once, but I’d suggest that once you get there, you’ll want tooling to handle it. Deploy locking is the first thing to look at.

Deploy locking is basically what you’d expect: lock production so that only one person deploys at a time. There are many ways to do this, but the most important part is making the lock visible.

The easiest way to get that visibility is through chat. A common setup is deploy commands that lock production as they run, for example:

/deploy <app>/<branch> to <environment>

For example:

/deploy api/new-permissions to production

This makes it clear to everyone what you’re deploying. I’ve seen companies on Slack where people pop into a deploy channel and announce “I’m deploying...!”; I don’t think that’s necessary, and it mostly just distracts your colleagues. Having the command itself drop the information into the chat room is enough. It also helps to add another command that reports the current state of production, for when you forget what was deployed or locked later on.

There are many easy ways to plug this workflow into your chat room. Dockbit has a Slack integration. There is also an open source solution called SlashDeploy that integrates with GitHub and Slack. (Bearychat.com offers a similar service.)

I’ve also seen dedicated web-based tools for this step. Slack has a custom internal web app for visual deploys. Pinterest has open-sourced its web-based deployment system. You can extend the idea of locking in whatever direction makes your team work most effectively.

Once the deployment branch is merged into the master branch, the production environment should be automatically unlocked for the next person to operate.

Locking comes with a certain etiquette. You certainly don’t want people stuck waiting because a careless programmer forgot to unlock production. Automatic unlocking comes in handy, or you can set a timer that pings the deployer if production has been locked for more than, say, 10 minutes. The guiding idiom: shit or get off the pot.
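Here’s a minimal sketch of such a lock, assuming Redis as the shared store: the TTL doubles as the automatic unlock, so a forgotten lock releases itself. The key name and timeout are arbitrary.

```ruby
require "redis"

class DeployLock
  LOCK_KEY = "deploy:lock:production"
  TTL      = 10 * 60 # seconds; auto-expires forgotten locks

  def initialize(redis = Redis.new)
    @redis = redis
  end

  # Returns true if we took the lock, false if someone else holds it.
  def acquire(user)
    @redis.set(LOCK_KEY, user, nx: true, ex: TTL)
  end

  def holder
    @redis.get(LOCK_KEY)
  end

  def release
    @redis.del(LOCK_KEY)
  end
end

lock = DeployLock.new
puts lock.acquire("holman") ? "Production is yours." : "Locked by #{lock.holder}."
```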

Deploy queues

Once you have a busy deploy schedule and plenty of people ready to ship, you’ll obviously have some contention over who deploys next. For that, channel your inner British gentleman and form a deploy queue.

A deploy queue has a couple of parts: 1) if there’s a wait, add your name to the end, and 2) allow people to jump the queue (some deploys are urgent enough that they must go out immediately, and you need to allow for that).
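Here’s a toy, in-memory sketch of those two behaviors; a real queue would live in shared state (Redis, your chat bot’s brain, a database) rather than a single process.

```ruby
class DeployQueue
  def initialize
    @queue = []
  end

  # Normal case: wait your turn at the back.
  def join(user)
    @queue << user unless @queue.include?(user)
  end

  # Urgent deploys slot in right behind whoever is deploying now.
  def jump(user)
    @queue.delete(user)
    @queue.insert(@queue.empty? ? 0 : 1, user)
  end

  # Called when production unlocks: who's up next?
  def next!
    @queue.shift
  end

  def to_s
    @queue.join(" -> ")
  end
end

queue = DeployQueue.new
queue.join("alice")
queue.join("bob")
queue.jump("carol") # urgent security fix
puts queue          # => alice -> carol -> bob
```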

The real problem with deploy queues arises when too many people are waiting to deploy. GitHub has been dealing with this over the past year or so; come Monday, everyone wants to ship their changes, and the queue can stretch to an hour or more. I’m not a big microservices advocate, but I think deploy queues have the nice side effect of nudging you to break pieces off the majestic monolith.

Permissions

There are many ways to restrict who can use deployment permissions.

Two-factor authentication is one option. Hopefully your employees’ chat accounts can’t be compromised, and hopefully they have other security measures on their machines (full-disk encryption, strong passwords, and so on), but if you want peace of mind, it’s worth requiring them to turn on two-factor authentication.

Chat services like Campfire and Slack may already offer two-factor authentication. Coding.net also provides [two-step verification](coding.net/help/doc/mo…). If you want a second factor before deployment, you can add that verification step to your process.

Another possible approach is what I call “riding shotgun”: making sure the deployer isn’t acting alone. I’ve seen plenty of processes and tools, formal and informal, that ensure at least one senior developer is involved in every deploy. There’s no reason you couldn’t, for example, require both the deployer and the senior developer riding shotgun to confirm that the code can go out.

Appreciate and examine your work

Once you’ve deployed your code, it’s time to test whether you really did what you wanted to do.

Check your Playbook

Every deploy, whether it touches the front end, the back end, or anything else, should follow the same playbook. Is the site still up? Did performance suddenly get worse? Are error rates rising? Are more support issues coming in? It’s in your best interest to streamline that checklist.

If the answers to those questions live in multiple sources, try linking to each dashboard from, say, the final deploy confirmation message. That serves as a reminder, every single time, to look and verify that your changes haven’t negatively impacted any metrics.

Ideally, that information should come from a single source. That makes it easier to point, say, a new employee at the important metrics to watch during their first deploy. Pinterest’s Teletraan, for example, puts all of this information in one interface.

Metrics to measure

There are a number of metrics you can collect that will help you determine if you just deployed successfully.

The most obvious one, of course, is the error rate. If it suddenly spikes, that’s a sign you probably need to redeploy master and go fix the problem. This can be automated: set a threshold and redeploy automatically if the error rate exceeds it. If you’re confident that master is a known-good branch you can always fall back to, it becomes much easier to automatically roll back a deploy that triggers a flood of exceptions.
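A minimal sketch of that kind of automation, with the error-rate lookup and the deploy command as stand-ins for your real monitoring and deploy tooling:

```ruby
ERROR_RATE_THRESHOLD = 0.05 # roll back if more than 5% of requests error

def current_error_rate
  # In reality, query your metrics store (Graphite, Librato, etc.).
  fetch_metric("app.requests.error_rate") # hypothetical helper
end

# Watch production for a while after each deploy; redeploy master
# (the known-good state) if the error rate crosses the threshold.
def watch_deploy(duration: 10 * 60, interval: 30)
  (duration / interval).times do
    if current_error_rate > ERROR_RATE_THRESHOLD
      system("bin/deploy", "production", "master")
      return :rolled_back
    end
    sleep interval
  end
  :healthy
end
```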

Deploys themselves are an interesting metric to keep on hand. A good example is an overview of deploys over the past year, which helps you see whether your deploy pace is picking up, or understand why it’s slowing down. You can go a step further and track who is deploying and who is causing errors, as one way of gauging how reliably your team ships.

Clean up after deployment

The final chore to do is cleaning up.

“Feature Toggles Are One of the Worst Kinds of Technical Debt” is a bit of an aggressive title, but it discusses exactly this. If you build with feature flags and staff-only rollouts, you run the risk of complicating your codebase over the long run:

The plumbing and scaffolding logic needed to support branching in code becomes a nasty form of technical debt from the moment each feature toggle is introduced. Feature flags make code more fragile and brittle, harder to test, harder to understand and maintain, harder to support, and less secure.

You don’t have to clean up immediately after deploying; if your new feature or bug fix needs watching, spend the time watching your metrics rather than tearing code out right away. But you should do it relatively soon after the deploy. If it’s a big release, make a point of coming back a day or a week later to remove the code that’s no longer needed. One thing I like to do is prepare two Pull Requests: one that flips the feature flag (i.e., enables the feature for everyone), and one that removes all the stale code the change made obsolete. Once I’m confident nothing is broken and everything looks good, I can merge the second Pull Request without much additional thought or work.

Treat it as a small celebration, too: removing the code is the ultimate sign that you’ve finished the project. Everyone loves a nearly all-red diff; deleting code feels great.

Delete the branch

Old branches pile up if you don’t clean them up, so delete the branch when you’re done with it. If you use GitHub’s Pull Requests, deleted branches are usually still retrievable: they disappear from your branch list without any data actually being lost. This can also be automated: periodically run a script that finds old branches already merged into master and deletes them. Coding.net’s Merge Requests can also delete branches automatically after merging.
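A minimal sketch of such a cleanup script, shelling out to git; it only deletes branches git already considers merged into master, but you’d still want to review the list before running it against a shared remote.

```ruby
#!/usr/bin/env ruby

# Branches already merged into master are safe candidates for deletion.
merged = `git branch --merged master`.lines.map(&:strip)

# Never touch master itself or the currently checked-out branch ("* foo").
merged.reject! { |name| name == "master" || name.start_with?("*") }

merged.each do |name|
  puts "Deleting merged branch: #{name}"
  system("git", "branch", "-d", name)
end
```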

The whole game

I only get emotional about two things: a touching photo of a golden retriever leaning on her best friend on a mountaintop, gazing out at the sunset over the sea; and deployment workflows. I care this much because deployment is the most critical part of the whole game. At the end of the day, I only care about two things: how my coworkers feel, and how good the product I’m working on is. Everything else stems from those two things for me.

Deploys can cause stress and frustration, particularly if your company’s process is slow, and they can throttle how quickly you ship new features and fix bugs for your users.

I think it’s worth thinking hard about this and optimizing your workflow. It pays to take the time to make your deploys as boring, straightforward, and stress-free as possible.