How to become a Top DevOps Engineer

If you’re not familiar with the concept of DevOps, feel free to jump to Wikipedia and read the DevOps article first. With that rough idea in mind, let’s put aside all the hype surrounding DevOps and consider why the position has emerged in recent years.

In software development, one person can do everything: product design, development, testing, operations, and so on. That person controls the project schedule alone, without paying the communication cost of multi-person collaboration.

Unfortunately, such a beautiful scene only appears in small projects or in the early stage of a project. An excellent product is often composed of many sub-projects. It is a huge piece of systems engineering that requires the cooperation of many people to be delivered on time.

In a company’s R&D department, a project usually involves a development team, a test team, and an operations and maintenance team. After designing the architecture and deciding on the technical route, the project leader assigns development tasks to the development team by function and module. When the developers finish, the work is handed over to the testers. This cycle repeats until the integration tests show that the expected goal has been reached, and the project leader then hands the product to the operations and maintenance team to complete delivery or launch. Throughout, the project manager tracks progress. This is the most common scenario of software development in software companies and Internet companies.

Doesn’t this process look good? What’s the problem?

It’s a big problem. It’s the gap between the ideal and reality.

First of all, the architecture given by the technical director was not entirely reasonable, and the business logic was not fully decoupled and modularized. During development, it turned out that those seemingly independent pieces of work had strong dependencies on one another.

Then, the chosen technical route used some cool languages, development frameworks, and design patterns, but they hid many pitfalls that nobody had stepped in yet, leaving hidden dangers for operations and maintenance. Later, once the system was running in production, the development and operations staff found some very strange behavior and could only scratch their heads.

Then, the developers were of varying skill. While writing code that amazed everyone, they also threw in a bunch of known and unknown bugs for free, so whoever later took over the work or maintained it could barely read the magic symbols left by their predecessors, and then it was refactoring, refactoring, refactoring.

At the same time, the versioning of the code was haphazard, resulting in numerous problems at deployment time.

Later, the testers took the freshly delivered code, found bugs, and called the developers over to take the blame, but they still failed to catch everything, leaving behind unknown bugs that surfaced one by one as midnight calls on the ops staff’s phones.

At last, the day of integration came. Each team brought its subsystems, modules, and components (A, B, C, D, E) to be integrated. When they ran the integration tests, they found all kinds of unexpected problems.

Finally, the code reaches the operations staff. The baton has been passed to the last mile, which is also the most chaotic battlefield. Operations and maintenance personnel deploy according to the deployment documents provided by the developers. Unfortunately, some developers write poor documentation and most write none at all; they just run over to the operations staff, hand them a cigarette, and say, “You just need to execute the start.sh I prepared.” Then the operations staff compile and package the software. Sometimes, with the project manager glaring over their shoulder like a tiger, they are forced to abandon their principles: speed comes first, the KPI matters more, so they deploy straight from the source code. After a few quick tests, the software is delivered to the customer in a panic, or the service goes live.

So is this the end of the relay? In the days that followed, the operations staff were woken every night by dreaded alert messages. To get the business back to normal, the developers rushed out bug fixes and hotfixes without writing tests, and some even modified the production environment directly.

Then everyone went back to sleep and woke up having forgotten everything that happened the night before. Until one day the developers deployed a new upgrade package: an old bug reappeared, the new version introduced a new bug, and the service would not start properly. The operations staff needed to roll back, but nobody had planned a rollback policy in advance. They had to roll back the database by hand, only to find that the table schema had been changed.

On the other side of the world, the customer’s browser shows: 503 Service Unavailable. Oh, my God, what a stupid website.

Then, after listening to the sales manager’s report, the boss angrily summoned all the department heads in R&D. The heads of development, testing, and operations began a heated round of finger-pointing…

The end.

What do we do?

There is plenty of evidence that large organizations tend toward an “us versus them” tribalism that hampers productivity. These tribal cultures are not limited to “development vs. operations”; there is also “product vs. sales”, “marketing vs. development”, and “development vs. quality assurance”.

In the real world, development leads are always happy to introduce new technologies, new frameworks, and as many new features as possible into each release, to earn bragging rights at technical conferences. The head of the operations group, for the sake of stability, always prefers to change as little as possible. The project manager always wants progress to be as fast as possible and keeps pushing developers to cut tests for the sake of the schedule. And so on.

If, I mean if:

If the people in charge of the architecture had done a better job of decoupling.

If the person in charge of the technical route had started from the perspective of operations and maintenance and chosen more mature technology.

If the developers had worked harder on the quality of their code.

If the testers had had more coverage and caught a few more bugs.

If the operations staff had been more experienced, familiar with all the mainstream and emerging technologies on the market, and had used advanced automation tools for deployment and monitoring.

Unfortunately, there is no if.

Good architecture design takes long, intensive accumulation of experience; good intentions alone will not make it perfect.

New technologies can improve productivity, but they also bring potential dangers. A few days of technical research, without in-depth use, cannot tell you what kind of double-edged sword you are holding.

A developer’s skill level is affected by many factors; you can’t just press Ctrl+C, Ctrl+V on your best developer.

Testing only reduces the probability of failure; it cannot guarantee the absence of bugs.

Similarly, operations personnel cannot be familiar with every technical detail.

What should you do

These are things that are hard to change.

But then you show up and shout, “I’m DevOps!”

People stare at you like you’re an alien: “What is DevOps even for?”

“We exist for the harmony and happiness of the team.”

“We are all helpless and suffering right now. Can you help us out of this dilemma?”

“We will adopt unified standards and a comprehensive tool chain to resolve the current impasse.”

Unified code management

First of all, code version control must follow a unified standard, detailed down to the naming and classification of repositories. The branching conventions and the management of the software release cycle must be governed by the same standard.

Why does this need to be done by DevOps? First, development, testing, and operations are all involved in managing the code, so someone needs to unify code management across these departments. Second, with a shared specification for version control and branching, everyone only needs to read the same document to do their part. Otherwise, for example, one development team may use master as its development branch while other teams use master as the production branch, which confuses the operations staff at deployment time.
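A branching specification is easiest to follow when a tool can check it. The sketch below is only an illustration: the patterns (master for production, develop for shared development, plus feature/, hotfix/ and release/ prefixes) are a hypothetical convention, not one this article prescribes, and the function names are made up for the example.

```python
import re

# Hypothetical convention (an assumption, not taken from the article):
#   master             production branch
#   develop            shared development branch
#   feature/<name>     new work, lowercase letters, digits, and hyphens
#   hotfix/<name>      urgent production fixes
#   release/<x.y.z>    release preparation
BRANCH_RULES = [
    re.compile(r"master"),
    re.compile(r"develop"),
    re.compile(r"feature/[a-z0-9][a-z0-9-]*"),
    re.compile(r"hotfix/[a-z0-9][a-z0-9-]*"),
    re.compile(r"release/\d+\.\d+\.\d+"),
]


def is_valid_branch(name: str) -> bool:
    """Return True if the branch name fully matches one agreed pattern."""
    return any(rule.fullmatch(name) for rule in BRANCH_RULES)


def invalid_branches(names):
    """Return the branch names that break the convention, in input order."""
    return [name for name in names if not is_valid_branch(name)]
```

A check like this could run in a repository hook or a scheduled job, so that a branch outside the agreed scheme is flagged long before deployment day.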

Secondly, to improve code quality, a code review mechanism is introduced so that experienced developers review other people’s code, reducing bugs and raising quality.

Of course, manual review cannot guarantee that the code is flawless. Each code submission should come with corresponding test code, which automated testing tools run to ensure that the submitted code behaves as expected.

At the same time, only code that passes tests and human reviews is put into the repository and packaged.
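The rule that only tested and reviewed code may enter the repository can be written down as a small gate. This is a minimal sketch under assumptions: the Submission structure, the can_merge function, and the idea that an author’s own approval does not count are all illustrative choices, not the API of any particular review tool.

```python
from dataclasses import dataclass


@dataclass
class Submission:
    """A code submission waiting to be merged (hypothetical structure)."""
    author: str
    tests_passed: bool
    approved_by: frozenset = frozenset()  # names of reviewers who approved


def can_merge(sub: Submission, min_approvals: int = 1) -> bool:
    """Allow a merge only if tests passed and enough other people approved."""
    # An author approving their own change is not an independent review.
    independent = sub.approved_by - {sub.author}
    return sub.tests_passed and len(independent) >= min_approvals
```

For example, can_merge(Submission("alice", True, frozenset({"bob"}))) is True, while the same submission with failing tests, or approved only by alice herself, is rejected.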

Frequent integration builds

From the day the project can first be integrated, all sub-projects are integrated and built frequently, running unit tests, functional tests, manual tests, and so on. The error log of any failed build is sent to the relevant people, who identify the cause of the failure and must resolve it the same day.

Frequent integration builds spread out the integration risk that would otherwise be left to the final day, making the project’s development schedule manageable.
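In practice a CI server such as Jenkins does this work, but the loop itself is simple. The following is a rough sketch, assuming each build step is just a command line and that there is a mapping from step names to owners; run_build, notify, EXAMPLE_STEPS, and the "team-lead" fallback are hypothetical names for illustration.

```python
import subprocess
import sys


def run_build(steps):
    """Run every (name, command) step and return {name: log} for failures."""
    failures = {}
    for name, command in steps:
        result = subprocess.run(command, capture_output=True, text=True)
        if result.returncode != 0:
            # Keep both output streams so the owner sees the full error log.
            failures[name] = result.stdout + result.stderr
    return failures


def notify(failures, owners, send=print):
    """Send each failure log to the step's owner; return who was notified."""
    notified = []
    for name, log in failures.items():
        owner = owners.get(name, "team-lead")  # hypothetical fallback owner
        send(f"[build failed] {name} -> {owner}\n{log}")
        notified.append(owner)
    return notified


# Example steps: stand-ins for real unit and functional test commands.
EXAMPLE_STEPS = [
    ("unit", [sys.executable, "-c", "print('ok')"]),
    ("functional", [sys.executable, "-c", "import sys; print('boom'); sys.exit(1)"]),
]
```

The sketch deliberately runs all steps even after one fails, so a single build reports every broken area at once; stopping at the first failure is an equally reasonable choice when later steps depend on earlier ones.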

We will use lifecycle management and system configuration management tools to write deployment code. Before writing these scripts, you will need to communicate with development and operations again and again. Do not compromise on the specifications until the goal is achieved. Finally, the deployment code is delivered to the operations staff, and all software deployment is done automatically through the tools, without human intervention.
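Deployment code is also the natural place to decide the rollback policy in advance, rather than at midnight. Real deployments would use a configuration management tool such as Puppet; the sketch below only illustrates the idea in plain Python, assuming each step is a pair of hypothetical apply and undo callables.

```python
def deploy(steps):
    """Run (name, apply, undo) steps in order; undo completed ones on failure.

    Returns a list of log lines describing what happened.
    """
    log, done = [], []
    for name, apply, undo in steps:
        try:
            apply()
            log.append(f"applied {name}")
            done.append((name, undo))
        except Exception as exc:
            log.append(f"failed {name}: {exc}")
            # Roll back in reverse order so later steps are undone first.
            for prev_name, prev_undo in reversed(done):
                prev_undo()
                log.append(f"rolled back {prev_name}")
            break
    return log
```

If a database migration succeeds but the service then fails to start, the migration’s undo runs automatically, which is exactly the step nobody had prepared in the story above.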

Strengthen communication between teams

Throughout software development, it is very common for developers not to understand operations and for operations not to understand development. Because we want to strengthen communication between teams, we go and find out why the DBAs hate those few back-end developers, and why the operations staff complain about deploying a particular project.

I’ve left out a lot of details here and only given you a general idea of what DevOps is all about.

One of the things DevOps does is establish discipline, and discipline is a prerequisite for keeping teamwork orderly.

The second is the tool chain: using tools such as GitLab, Jenkins, Gerrit, Foreman, and Puppet in place of manual operations, automating work to reduce repetitive labor and avoid human error.

Another important task is communication, facilitating collaboration between various teams.

Do we need full-time DevOps people

If you find yourself already doing any of these things, you’re already doing DevOps-related work. So can we just have everyone do their part instead of creating a dedicated position?

My answer is no.

For every position, there is a responsibility.

If you’re a developer, do you care if the operations team’s code management is messed up, and do you take responsibility for it?

If you’re an operations person and the development team’s code gets pushed to the repository without anyone reviewing it or running tests, do you care, and do you take responsibility for it?

But if you’re a DevOps person, you have to fix the messy code management, and you have to stop code that has not been reviewed by a human or run through tests at the source, before it is ever submitted.

So, we need someone in charge of DevOps.

But I’m not a fan of having a dedicated DevOps person on a small team. In my opinion, a DevOps engineer must first and foremost be a competent developer, tester, or operations person; the DevOps title means they carry one more important responsibility.

Because DevOps people can’t solve real problems if they’re not doing hands-on work and don’t feel the team’s pain points themselves.

Therefore, a “part-time” DevOps is my ideal DevOps engineer.

On the Top

I’m glad you had the patience to read this far. If you can do the job above, congratulations, you’re a qualified DevOps engineer.

Wait, isn’t the title of the article “How to become a Top DevOps Engineer”?!

Well, I don’t think you can become a top DevOps engineer by reading a clichéd article. As a DevOps engineer, you have to be strong and you have to have a voice; otherwise, how will you deal with R&D, testing, and operations? You have to be familiar with, even proficient in, every area; otherwise, how can you create a set of rules that make sense? You have to know the continuous integration tools; otherwise, how can you pick the tool chain that fits your team’s actual needs? You have to be good at communicating; otherwise, you’ll never know what everyone is really thinking. Before you become a DevOps engineer, plan to put effort into Dev, Test, and Ops, put yourself in their shoes, and then come back to DevOps and rethink what should be done.

DevOps requires trial and error. Don’t be afraid of failure and frustration, which are invaluable sources of experience, but never fall into the same pit twice. I like the saying that an expert is someone who has made all the mistakes that can be made in a very narrow field.
