What exactly do operation and maintenance engineers do? Many of them would struggle to answer that themselves, and a search on Baidu rarely turns up a clear answer. After talking to a number of veteran operation and maintenance staff, here is a summary of what the job involves:

When people talk about operation and maintenance engineers, they generally mean those working at Internet companies. They usually belong to the technical department and, together with research and development, testing, and system management, make up the four major functions that provide technical support for Internet products. How the work is divided differs between domestic and foreign companies and between large and small ones. The main tasks are as follows:

1. Ensure the long-term and stable operation of the business system

Even a small error in the business system leads to user complaints, so the core job of operation and maintenance engineers is to keep the business system running stably.

The first step is knowing what the business runs on. A typical website is served by a web server such as Nginx or Apache, stores its data in a MySQL database, and uses PHP to generate its dynamic pages. Operation and maintenance engineers therefore need to know how to deploy environments such as LNMP (Linux, Nginx, MySQL, PHP) and LAMP (Linux, Apache, MySQL, PHP).
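As a rough illustration of checking that such a stack is up, here is a minimal Python sketch that tests whether each service accepts TCP connections. The service map is an assumption made for the example: the ports are common defaults, and PHP-FPM in particular is often configured to listen on a Unix socket instead of port 9000.

```python
import socket


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


# Hypothetical service map for a single-host LNMP deployment (assumed defaults).
LNMP_SERVICES = {"nginx": 80, "mysql": 3306, "php-fpm": 9000}


def check_stack(host, services):
    """Map each service name to whether its port accepts connections."""
    return {name: port_is_open(host, port) for name, port in services.items()}


if __name__ == "__main__":
    for name, is_up in check_stack("127.0.0.1", LNMP_SERVICES).items():
        print(f"{name}: {'up' if is_up else 'DOWN'}")
```

An open port only shows that something is listening. A fuller check would also request a page through Nginx and run a query against MySQL.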

2. Ensure data safety and reliability

Data security is what company leaders care about most, so operation and maintenance engineers also have to keep data safe and reliable. If even a small mistake is made, the leaders will call operation and maintenance in for a talk.

Sometimes you need to change the contents of the database by hand, so you must be comfortable with the basic MySQL operations: inserting, deleting, querying, and updating records.
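The four basic operations look roughly like this. To keep the sketch runnable without a server, it uses Python's built-in sqlite3 module as a stand-in for MySQL, and the `users` table is made up for the example. The SQL statements themselves carry over, although MySQL drivers for Python generally use `%s` rather than `?` as the parameter placeholder.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for a MySQL server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")

# Add: insert two rows, passing values as parameters instead of formatting SQL.
conn.execute("INSERT INTO users (name, email) VALUES (?, ?)", ("alice", "alice@example.com"))
conn.execute("INSERT INTO users (name, email) VALUES (?, ?)", ("bob", "bob@example.com"))

# Change: update one row. Always include a WHERE clause.
conn.execute("UPDATE users SET email = ? WHERE name = ?", ("alice@new.example.com", "alice"))

# Delete: remove one row, again restricted by WHERE.
conn.execute("DELETE FROM users WHERE name = ?", ("bob",))
conn.commit()

# Check: query what is left.
rows = conn.execute("SELECT name, email FROM users ORDER BY id").fetchall()
print(rows)  # [('alice', 'alice@new.example.com')]
```

When editing production data by hand, it helps to run the matching SELECT with the same WHERE clause first, to see exactly which rows an UPDATE or DELETE will touch.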

Sometimes the server hardware hosting the database fails, so you need to set up MySQL master/slave replication in advance as a safeguard.

Sometimes you need to restore the database, so you need to learn MySQL incremental backup and recovery, including restoring to a specified point in time.
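The idea behind point-in-time recovery can be sketched without MySQL at all: start from the last full backup, then replay the logged changes in order and stop just before the moment things went wrong. The Python below is a conceptual toy only; the data layout, function name, and sample values are invented for illustration. In MySQL the logged changes live in the binary log, and `mysqlbinlog` with its `--stop-datetime` option plays the role of the replay loop.

```python
from datetime import datetime


def restore_to_point_in_time(full_backup, change_log, stop_at):
    """Rebuild state from a full backup plus logged changes made before stop_at.

    full_backup: dict of key -> value captured by the last full backup.
    change_log:  iterable of (timestamp, op, key, value), op is "set" or "delete".
    stop_at:     replay stops at the first change at or after this time.
    """
    state = dict(full_backup)  # never mutate the backup itself
    for timestamp, op, key, value in sorted(change_log, key=lambda entry: entry[0]):
        if timestamp >= stop_at:
            break
        if op == "set":
            state[key] = value
        elif op == "delete":
            state.pop(key, None)
    return state


# Hypothetical example: order:1 was deleted by mistake at 11:00.
backup = {"order:1": "paid"}
log = [
    (datetime(2024, 1, 1, 10, 0), "set", "order:2", "paid"),
    (datetime(2024, 1, 1, 11, 0), "delete", "order:1", None),
    (datetime(2024, 1, 1, 12, 0), "set", "order:3", "paid"),
]
restored = restore_to_point_in_time(backup, log, stop_at=datetime(2024, 1, 1, 11, 0))
print(restored)  # {'order:1': 'paid', 'order:2': 'paid'}
```

Note that stopping at 11:00 also drops the legitimate change made at 12:00. Changes made after the bad one have to be recovered separately.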

Sometimes scheduled backups are not enough, so you need rsync combined with inotify for real-time backup.
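The pattern behind rsync plus inotify is "notice that a file changed, then copy only what changed". The sketch below imitates that pattern in pure Python so it can run anywhere. It is not rsync or inotify: it detects changes by polling modification time and size rather than receiving kernel events, it copies locally with `shutil`, and the function name is invented for the example.

```python
import shutil
from pathlib import Path


def sync_changed_files(src, dst, seen):
    """Copy files under src that are new or changed since the previous call.

    seen is a dict the caller keeps between calls; it maps each relative
    path to the (mtime_ns, size) signature recorded at the last copy.
    Returns the list of relative paths copied in this pass.
    """
    copied = []
    for path in sorted(Path(src).rglob("*")):
        if not path.is_file():
            continue
        stat = path.stat()
        signature = (stat.st_mtime_ns, stat.st_size)
        relative = path.relative_to(src)
        if seen.get(relative) != signature:
            target = Path(dst) / relative
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(path, target)  # copy2 also preserves timestamps
            seen[relative] = signature
            copied.append(str(relative))
    return copied
```

Calling this in a loop with a short sleep gives near-real-time backup at the cost of rescanning the tree on every pass. That rescanning is the overhead inotify avoids, because the kernel reports changes as they happen. This sketch also never removes files from the destination when they are deleted from the source.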

Sometimes, to harden a server, you use iptables to restrict access so that only the company's IP addresses or the jump server's IP can connect.
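An allowlist like this usually boils down to a few ACCEPT rules followed by a DROP rule for everyone else. The Python sketch below only generates the rule text; it does not run iptables. The function name, the addresses (taken from the documentation ranges), and the choice of SSH port 22 are assumptions for the example.

```python
import ipaddress


def build_allowlist_rules(allowed_sources, port=22):
    """Return iptables commands that allow TCP access to port only from
    the given IPv4 addresses or networks, and drop all other sources."""
    rules = []
    for source in allowed_sources:
        # Validates the input; raises ValueError for anything that is not IPv4.
        network = ipaddress.IPv4Network(source, strict=False)
        rules.append(f"iptables -A INPUT -p tcp -s {network} --dport {port} -j ACCEPT")
    # Rules are matched in order, so the catch-all DROP must come last.
    rules.append(f"iptables -A INPUT -p tcp --dport {port} -j DROP")
    return rules


# Hypothetical jump server address and office network.
for rule in build_allowlist_rules(["203.0.113.10", "198.51.100.0/24"], port=22):
    print(rule)
```

Review generated rules before applying them, and keep a console session open while testing: a wrong source address in an SSH allowlist locks you out of the server.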

3. Build monitoring and alarm system

Operation and maintenance engineers commonly use Zabbix and Nagios for monitoring and alerting. Without monitoring, operation and maintenance is working blind, so the monitoring and alerting system has to be built first. Only then can system failures be found and resolved.
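At its core, an alerting rule compares a collected metric against a threshold. The Python sketch below shows that idea in isolation. It is not how Zabbix or Nagios are configured, and the metric names and limits are made up for the example.

```python
def evaluate_alerts(metrics, thresholds):
    """Return one alert message per threshold that is exceeded or has no data.

    metrics:    dict of metric name -> latest collected value.
    thresholds: dict of metric name -> maximum acceptable value.
    """
    alerts = []
    for name, limit in thresholds.items():
        value = metrics.get(name)
        if value is None:
            # A metric that stopped reporting is itself worth an alert.
            alerts.append(f"{name}: no data")
        elif value > limit:
            alerts.append(f"{name}: {value} exceeds {limit}")
    return alerts


# Hypothetical sample: memory usage was not reported at all.
latest = {"cpu_percent": 95.0, "disk_percent": 40.0}
limits = {"cpu_percent": 90.0, "disk_percent": 85.0, "mem_percent": 90.0}
print(evaluate_alerts(latest, limits))
# ['cpu_percent: 95.0 exceeds 90.0', 'mem_percent: no data']
```

Treating missing data as an alert matters in practice, because a crashed host stops sending metrics and would otherwise look healthy.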

Common faults include application faults, database faults, network cable faults, and so on. Some are software faults and others are hardware faults. An experienced operation and maintenance engineer can locate the cause of a fault right away.

4. Deal with technical and business problems

There are two core kinds of problems: technical problems and business problems. Technical problems mainly call for network packet capture and analysis, for example with tcpdump, and an understanding of proxy mechanisms.

Business problems are more complex than technical ones. For example, data analysis at the business level requires not only collecting statistics on the business's various indicators, but also digging into the data to find out where the business problems lie.

5. Test and launch versions

This is another routine part of the job: operation and maintenance engineers are responsible for testing and launching versions. Before a version from the developers is released, operation and maintenance engineers need to run performance and functional tests on it. It is also better to launch in the evening, when business volume is low, so that the launch does not take place under heavy load.
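Rather than guessing when business volume is lowest, the launch window can be picked from traffic data. The Python sketch below finds the quietest run of consecutive hours in a day of hourly request counts. The function name and the sample numbers are invented for the example.

```python
def quietest_window(hourly_requests, window_hours=2):
    """Return (start_hour, total_requests) for the quietest window.

    hourly_requests is a list of 24 counts indexed by hour of day.
    Windows may wrap past midnight, e.g. 23:00 to 01:00.
    """
    if len(hourly_requests) != 24:
        raise ValueError("expected exactly 24 hourly counts")
    best_start, best_total = 0, None
    for start in range(24):
        total = sum(hourly_requests[(start + i) % 24] for i in range(window_hours))
        if best_total is None or total < best_total:
            best_start, best_total = start, total
    return best_start, best_total


# Hypothetical traffic: flat during the day, quiet from 22:00 to midnight.
traffic = [50] * 24
traffic[22] = 5
traffic[23] = 3
print(quietest_window(traffic, window_hours=2))  # (22, 8)
```

Averaging counts over several days or weeks before choosing gives a more reliable window than a single day's data.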

Conclusion

Operations and development are two very different career directions. That said, operation and maintenance work gives you a foundation in development, so moving into a development role is not out of the question.

Operation and maintenance engineers are responsible for running and maintaining specific product lines. They also need development skills: they have to go deep into the business, understand its pain points and problems, and develop or optimize platforms and tools for the product's business needs. Along the way they get exposure to many well-designed system architectures and learn to compare their advantages and disadvantages. Ultimately, how well operation and maintenance engineers understand the business determines the role they play in its development.