About the author

Lin Weihao is a SecDevOps practitioner who has worked on data networks, network security, and game operations at China Telecom and NetEase Games. He has done extensive research on Linux operations, virtualization, and network security protection, and currently focuses on building automated network security detection and defense systems.

As IT and business have developed and security vulnerabilities of every kind have emerged, operations and security have become increasingly intertwined, and people pay more and more attention to the security of operations and maintenance (O&M). A new cross-disciplinary field, “O&M security”, has gradually taken shape.

Hackers and white hats are busy digging for O&M security vulnerabilities, while enterprises are busy building O&M security systems.

Drawing on years of hands-on O&M security practice, the author would like to share a few thoughts as well.

Following a question-and-answer approach, this article first presents the author’s understanding of O&M security and explains why it deserves attention. Then, based on the bad working habits and common problems seen on the front line of enterprise O&M security, it sorts general O&M security problems into categories. The next installment will describe what good O&M security looks like: it depends not only on the security awareness of engineers but also on a reasonably complete O&M security system, built together from process to technology, across points, lines, and planes.

First, what is O&M security?

Let’s first look at a Venn diagram. In practice, business, O&M, and security are interrelated and interdependent:

From this diagram, three security-related sub-specialties can be derived: “O&M + security”, “security + O&M”, and “business + O&M + security”. In job postings from Internet companies we often see O&M security engineer and security O&M engineer; these two positions map onto the first two sub-specialties. “Business + O&M + security” is usually folded into the security engineer position, and the application O&M security engineer role that has appeared in recent years fits it more closely.

1. O&M security = O&M + security

O&M security studies how to discover, analyze, and block O&M-related security problems, such as operating system or application version vulnerabilities, access control vulnerabilities, and DDoS attacks. Clearly, O&M security is rooted in O&M. In terms of organizational structure, it usually belongs to the O&M department or the infrastructure department, and O&M security engineers generally sit on the O&M engineer career track.

2. Security O&M = security + O&M

Security O&M covers the operation and maintenance of security systems and equipment, such as maintaining firewalls and vulnerability scanners, as well as vulnerability discovery and emergency response. Just as clearly, security O&M belongs to the security department, and security O&M engineers sit on the security engineer career track.

3. Application O&M security = business + O&M + security

Application O&M security studies the O&M and security of business applications, mainly security risk assessment and the planning, design, and implementation of security solutions. Organizationally, the position belongs to the security department or a business department, and the corresponding career track is security engineer or development engineer.

By comparing the three sub-specialties “O&M + security”, “security + O&M”, and “business + O&M + security”, we have clarified the research scope and job responsibilities of O&M security. At this point you may wonder: why is O&M security so much in the limelight now?

Second, why does O&M security deserve attention?

It is fair to say that 2013-2014 was a watershed for O&M security. What made those two years special is that several major applications serving as Internet infrastructure were found vulnerable or attacked one after another: the Struts2 remote code execution vulnerabilities, OpenSSL Heartbleed, the Bash Shellshock vulnerability, and what was at the time “the largest DDoS attack in history”, which left a large number of .cn and .com domain names unresolvable. Since then, enterprises have rapidly increased their investment in O&M security, and O&M security issues of all kinds have attracted widespread attention. Today, O&M security has become a top priority in building enterprise security.

1. A leaky software supply chain

  • Struts2 remote code execution vulnerability

When the S2 vulnerabilities came out, the whole Internet was in an uproar. Here are some of the companies affected. Is there any you don’t recognize?

  • OpenSSL Heartbleed

Like S2, it was devastating.

  • iOS apps built with a tampered copy of Xcode were infected with a Trojan

Researchers found that 76 of the top 5,000 apps on the App Store were infected. The culprit turned out to be developers downloading the Xcode development environment from sources other than Apple’s official channels.

2. O&M security vulnerabilities account for a significant proportion

Since a certain “cloud” vulnerability platform left the scene, it has to be said that the sharing of security intelligence on the domestic Internet has gradually closed up, although the opportunity has also given rise to many commercial companies.

Even on the platforms that remain, the statistical data that can be queried is actually very limited, and both the user experience and the statistical analysis features leave much to be desired. Beyond that, the various studies never put O&M security problems into a statistical category of their own. So here we borrow the 2016 CNVD statistics: network device vulnerabilities and operating system vulnerabilities, which clearly belong to O&M security, together account for more than 20%. Adding the vulnerabilities in various applications that are counted under application vulnerabilities, the proportion attributable to O&M security is believed to be significant.

3. Exploiting O&M security vulnerabilities is highly cost-effective

An attack on an O&M security vulnerability is a typical case of “a small force moving a great weight”, with a very high ROI: the investment is small, the vulnerability is easy to discover and exploit, and the damage is great.

The risk of O&M security vulnerabilities, rated with Microsoft’s DREAD model, is as follows:
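For readers unfamiliar with DREAD, the model rates a threat on five dimensions: Damage, Reproducibility, Exploitability, Affected users, and Discoverability. The following is only a minimal sketch of how such a rating could be turned into a score. The 1-to-10 scale and the averaging are one common convention assumed here, and the example ratings are hypothetical illustrations, not the author’s figures.

```python
from dataclasses import dataclass, astuple


@dataclass
class DreadRating:
    damage: int            # how bad a successful attack is
    reproducibility: int   # how reliably the attack can be repeated
    exploitability: int    # how little effort or skill the attack takes
    affected_users: int    # how many users or systems are hit
    discoverability: int   # how easy the weakness is to find

    def score(self) -> float:
        """Average of the five ratings (assumed scale: 1 = low, 10 = high)."""
        values = astuple(self)
        if any(not 1 <= value <= 10 for value in values):
            raise ValueError("each DREAD rating must be between 1 and 10")
        return sum(values) / len(values)


# Hypothetical ratings for an unauthenticated cache exposed on a public IP.
exposed_cache = DreadRating(damage=9, reproducibility=10, exploitability=9,
                            affected_users=8, discoverability=9)
print(exposed_cache.score())
```

Under these assumed ratings the score lands near the top of the scale, which matches the point of this section: such vulnerabilities are cheap to find and exploit and do great damage.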

Third, common bad O&M habits

The frequent occurrence of O&M security incidents is due partly to missing or ineffective O&M and security policies, and partly to O&M staff lacking strong security awareness and carrying bad security habits into their daily work.

Here is a list of 14 pitfalls. See whether you have ever stepped into any of them.

1. iptables is not restored after being modified, or is even flushed and left disabled

Temporarily flushing iptables for testing is understandable, but many people forget to restore the rules afterwards and have no automatic restore mechanism in place.

iptables -F

2. Scripts do not check for “*”, spaces, or empty variables

If we accept that “not only user input, but our own input too, cannot be trusted”, we will fall into far fewer of these pits.

rm -rf /$var1/$var2
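If either variable in the command above is empty, the path collapses toward “/”. The following is a minimal sketch of the kind of check a deletion script could run first. The function name `resolve_delete_target` and the idea of a fixed base directory are assumptions made for illustration, not something the original script contains.

```python
from pathlib import Path


def resolve_delete_target(base: str, *parts: str) -> Path:
    """Return the absolute path that may be deleted, or raise ValueError."""
    base_path = Path(base).resolve()
    for part in parts:
        # Reject empty values, stray whitespace, and shell wildcards.
        if not part or part != part.strip() or any(c in part for c in "*?"):
            raise ValueError(f"unsafe path component: {part!r}")
    target = base_path.joinpath(*parts).resolve()
    # The target must sit strictly below the base directory.
    if target == base_path or base_path not in target.parents:
        raise ValueError(f"refusing to delete outside {base_path}: {target}")
    return target
```

A script would call something like `shutil.rmtree(resolve_delete_target("/data/app", var1, var2))`, where `/data/app` is a hypothetical base directory, so that an empty variable or a `..` component raises an error instead of deleting the wrong tree.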

3. Services listen on all addresses by default

This is the default configuration of most applications, and listening on all addresses without effective access control is only one step away from danger.

bind-address 0.0.0.0

4. File permissions are too open, so anyone can read and write

This is a bit like leaving phpinfo() exposed: it gives intruders a helping hand.

chmod 777 $dir || chmod 666 $script
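One way to find out where this habit has already left its mark is to scan for world-writable paths. The following is a minimal sketch using only the standard library. The function name `world_writable` is an assumption, and a real audit would also look at ownership and group permissions.

```python
import os
import stat


def world_writable(root):
    """Return files and directories under root that any user can write to."""
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            mode = os.lstat(path).st_mode
            if stat.S_ISLNK(mode):
                continue  # the mode bits of a symlink itself are not meaningful
            if mode & stat.S_IWOTH:
                hits.append(path)
    return sorted(hits)
```

Anything this returns inside an application or script directory is a candidate for tightening to something like 750 or 640.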

5. Start the service as root

Most operations people, once logged on to a machine, switch to root and start the service as root in one go.

#nohup ./server &

6. Skipping authentication and access control because they are a hassle

This is similar to listening on all addresses: it is usually the default configuration, and the user has no awareness of the need to harden it.

#requirepass test

7. Not checking iptables after installing Docker on a standalone machine, so that Docker modifies iptables and exposes services to the Internet

Needless to say, Docker brings us convenience, but it also brings quite a few security risks. By default, the Docker daemon is able to manipulate the host’s iptables. If the Docker daemon listens on a TCP socket, or a container is started with a port reachable from outside, the host itself may be compromised as well. For example, the container started below ends up with tcp/443 open to the public:

docker restart

*nat

:PREROUTING ACCEPT [8435539:534512144]

:INPUT ACCEPT [1599326:97042024]

:OUTPUT ACCEPT [4783949:343318408]

:POSTROUTING ACCEPT [4783949:343318408]

:DOCKER - [0:0]

-A PREROUTING -m addrtype --dst-type LOCAL -j DOCKER

-A OUTPUT ! -d 127.0.0.0/8 -m addrtype --dst-type LOCAL -j DOCKER

-A POSTROUTING -s 172.17.0.0/16 ! -o docker0 -j MASQUERADE

-A POSTROUTING -s 172.17.0.1/32 -d 172.17.0.1/32 -p tcp -m tcp --dport 443 -j MASQUERADE

-A FORWARD -o docker0 -j DOCKER

-A FORWARD -o docker0 -m conntrack --ctstate RELATED,ESTABLISHED -j ACCEPT

-A FORWARD -i docker0 ! -o docker0 -j ACCEPT

-A FORWARD -i docker0 -o docker0 -j ACCEPT

-A DOCKER -d 172.23.0.3/32 ! -i br-1bf61a2fa2e7 -o br-1bf61a2fa2e7 -p tcp -m tcp --dport 443 -j ACCEPT

*filter

:INPUT ACCEPT [1599326:97042024]

:OUTPUT ACCEPT [4783949:343318408]

-A INPUT -s 10.0.0.0/8 -j ACCEPT

-A INPUT -s 127.0.0.1 -j ACCEPT

-A INPUT -j DROP

# The last rule (INPUT DROP) is bypassed: traffic to the container goes through the FORWARD and DOCKER chains, not INPUT
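Because the ACCEPT rules that Docker adds live in its own DOCKER chain, a review of the INPUT chain alone will miss them. The following is a minimal sketch that pulls the published ports out of `iptables-save` output. The function name `docker_accept_rules` is an assumption, and the parsing only handles simple rules of the shape shown above.

```python
def docker_accept_rules(iptables_save_text):
    """Return (protocol, port) pairs that the DOCKER chain accepts."""
    findings = []
    for line in iptables_save_text.splitlines():
        fields = line.split()
        # Only rules appended to the DOCKER chain that end in "-j ACCEPT".
        if fields[:2] != ["-A", "DOCKER"] or fields[-2:] != ["-j", "ACCEPT"]:
            continue
        if "--dport" not in fields:
            continue
        proto = fields[fields.index("-p") + 1] if "-p" in fields else "any"
        port = fields[fields.index("--dport") + 1]
        findings.append((proto, port))
    return findings


sample = """\
-A INPUT -s 10.0.0.0/8 -j ACCEPT
-A INPUT -j DROP
-A DOCKER -d 172.23.0.3/32 ! -i br-1bf61a2fa2e7 -o br-1bf61a2fa2e7 -p tcp -m tcp --dport 443 -j ACCEPT
"""
print(docker_accept_rules(sample))
```

Running a check like this after installing Docker or starting a new container makes it harder for a published port to go unnoticed behind a seemingly strict INPUT policy.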

8. sudo authorization is too broad, allowing privilege escalation through custom scripts

If an attacker can modify the content of the script, escalating privileges is easy.

sudo script.sh
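Before a script is whitelisted in sudoers, it is worth checking who can modify it. The following is a minimal sketch of such a check. Both function names are assumptions, and treating any non-root owner or any group-writable bit as risky is a deliberately conservative assumption too.

```python
import os
import stat


def is_tamperable(owner_uid: int, mode: int) -> bool:
    """True if someone other than root could modify a file with this owner and mode."""
    writable_by_others = bool(mode & (stat.S_IWGRP | stat.S_IWOTH))
    return owner_uid != 0 or writable_by_others


def sudo_target_is_tamperable(path: str) -> bool:
    """Apply the check to a real file, for example a script listed in sudoers."""
    info = os.stat(path)
    return is_tamperable(info.st_uid, info.st_mode)
```

The same check should be applied to every parent directory of the script, since write access to a directory is enough to replace a file inside it.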

9. Granting root to developers or QA: they make the mess, and you take the blame?

We have been stressing RBAC for a long time, but when O&M staff are too busy and developers and testers have too many requests, many O&M engineers simply grant them root. Developers and testers often lack a good understanding of system-level access control, so this opens up a significant hole.

dev@pro-app-01:/home/dev$ su

root@pro-app-01:/home/dev# whoami

root

10. Keys, tokens, and SSH private keys are saved in TXT files, and personal SSH private keys are left on servers

op@pro-app-01:/home/op$ ls ~/.ssh

id_rsa id_rsa.pub
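Private keys saved in PEM or OpenSSH format start with a recognizable “BEGIN ... PRIVATE KEY” line, which makes stray copies easy to hunt for. The following is a minimal sketch of such a scan. The function name `files_with_private_keys` and the 64 KB read limit are assumptions, and the scan will not catch keys stored in other encodings.

```python
import os
import re

# Matches headers such as "-----BEGIN RSA PRIVATE KEY-----" or
# "-----BEGIN OPENSSH PRIVATE KEY-----".
PRIVATE_KEY_MARKER = re.compile(rb"-----BEGIN (?:[A-Z0-9]+ )*PRIVATE KEY-----")


def files_with_private_keys(root, max_bytes=65536):
    """Return files under root whose first max_bytes contain a private key header."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as handle:
                    head = handle.read(max_bytes)
            except OSError:
                continue  # unreadable file: skip rather than abort the scan
            if PRIVATE_KEY_MARKER.search(head):
                hits.append(path)
    return sorted(hits)
```

Running this over home directories on a server gives a quick list of keys that should live on the owner’s workstation or in a secrets manager instead.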

11. Publishing work code externally

I once met an intern who pushed project code to GitHub and, when asked, said that git had been misconfigured. Whether or not that was true, it at least shows a lack of security awareness.

git remote add origin https://github.com/secondwatchCH/EFS.git

git push origin master

12. Home directories are highly sensitive, yet some people serve them directly over HTTP; at the very least, a leaked .bash_history is unavoidable

dev@pro-app-01:/home/dev$ python -m SimpleHTTPServer

13. Security risks are not considered when selecting applications

Apache Struts version: Struts 2.5 - Struts 2.5.12 # an online service running an S2 version affected by S2-052

14. No concept of software supply chain security

From the Xcode incident to the malicious SSH library discovered on the official pip package index, it is clear that the security risk in the software supply chain is extremely high. The most common problems among O&M staff today are:

  • Downloading SSH clients or development IDEs from Baidu netdisk

  • Blindly pulling applications, libraries, and images from GitHub, PyPI, or Docker Hub straight into the production environment

  • Not cleaning up default passwords or default configurations

Fourth, common O&M security problems

Above we covered bad habits in how O&M staff work and think, which come down to a lack of security awareness. Based on vulnerability analysis and incident response, common O&M security problems fall into the following categories:

1. Sensitive ports are exposed to the outside

Databases and caches are sensitive applications and are usually deployed on the internal network. However, if the host has both an internal and a public IP address and the default listening address is 0.0.0.0, the sensitive port is exposed to the outside. Examples include MySQL, MongoDB, Redis, rsync, and the Docker daemon API being reachable from the Internet.
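The condition described above (a sensitive port bound to 0.0.0.0 or to a public address) is easy to test for once the list of listening sockets is available. The following is a minimal sketch. The function name `exposed_listeners` and the port table are assumptions for illustration, built from the well-known default ports of the services named in this section.

```python
import ipaddress

# Default ports of the services mentioned above (assumed, not exhaustive).
SENSITIVE_PORTS = {
    3306: "MySQL",
    27017: "MongoDB",
    6379: "Redis",
    873: "rsync",
    2375: "Docker daemon API",
}


def exposed_listeners(listeners):
    """Return (address, port, service) for sensitive ports reachable from outside.

    listeners is an iterable of (address, port) pairs, for example parsed
    from the output of a tool such as ss or netstat.
    """
    findings = []
    for address, port in listeners:
        if port not in SENSITIVE_PORTS:
            continue
        ip = ipaddress.ip_address(address)
        # 0.0.0.0 and :: mean "every interface", including any public one.
        if ip.is_unspecified or not (ip.is_private or ip.is_loopback):
            findings.append((address, port, SENSITIVE_PORTS[port]))
    return findings


print(exposed_listeners([("0.0.0.0", 6379), ("127.0.0.1", 3306), ("10.0.0.5", 873)]))
```

A wildcard bind is only harmless when the host has no public interface or a firewall really does block the port, so every finding deserves a second look.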

2. Sensitive applications have no authentication, an empty password, or a weak password

Likewise, sensitive applications running with the default configuration often have no authentication enabled: MySQL, MongoDB, Redis, rsync, supervisord RPC, and Memcache have all been found running without authentication. Sometimes a weak or empty password is configured for the convenience of testing, which makes authentication meaningless.
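Taking Redis as an example, whether authentication is missing or weak can be read straight from the configuration file. The following is a minimal sketch of such an audit. The function name `audit_redis_conf`, the weak-password list, and the 12-character minimum are all assumptions chosen for illustration.

```python
# A tiny sample of weak passwords; a real check would use a larger list.
WEAK_PASSWORDS = {"", "test", "123456", "password", "redis", "admin"}


def audit_redis_conf(text, weak=WEAK_PASSWORDS, min_length=12):
    """Classify the requirepass setting as 'no-auth', 'weak-password', or 'ok'."""
    password = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and commented-out directives do not count
        fields = line.split(None, 1)
        if fields[0].lower() == "requirepass":
            password = fields[1].strip().strip('"') if len(fields) > 1 else ""
    if password is None:
        return "no-auth"
    if password in weak or len(password) < min_length:
        return "weak-password"
    return "ok"


print(audit_redis_conf("#requirepass test\nport 6379\n"))
```

A commented-out directive such as `#requirepass test` counts as no authentication at all, and uncommenting it would still leave a weak password in place.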

3. Sensitive information leakage, such as code backups, version control metadata, and credentials

Leaks of web.tar.gz, backup.bak, .svn, .git, config.inc.php, test.sql, and the like can be seen everywhere. Everyone knows the danger, yet from time to time someone still falls into the pit.
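A pre-deployment check can catch these files before they reach a public web root. The following is a minimal sketch. The function name `find_risky_paths` is an assumption, and the pattern list is simply generalized from the file names mentioned above, so it should be adapted to each project.

```python
import fnmatch
import os

# Patterns derived from the examples above; extend as needed.
RISKY_PATTERNS = ["*.tar.gz", "*.bak", "*.sql", ".svn", ".git", "config.inc.php"]


def find_risky_paths(webroot, patterns=RISKY_PATTERNS):
    """Return paths (relative to webroot) whose names match a risky pattern."""
    hits = []
    for dirpath, dirnames, filenames in os.walk(webroot):
        for name in dirnames + filenames:
            if any(fnmatch.fnmatch(name, pattern) for pattern in patterns):
                full_path = os.path.join(dirpath, name)
                hits.append(os.path.relpath(full_path, webroot))
        # Report version control directories once, without descending into them.
        dirnames[:] = [d for d in dirnames if d not in (".svn", ".git")]
    return sorted(hits)
```

Failing the deployment when this list is not empty is a cheap way to stop backups and version control metadata from being served to the outside.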

4. Default application configuration is not cleaned up

Default features such as the Jenkins script console and Apache server-status are not cleaned up. An exposed Jenkins script console, for example, lets commands be executed directly.

5. Debug mode is enabled on application systems

Django debug mode exposes URI paths, and phpinfo() exposes server information and even the webroot, which attackers can then use for further penetration. Many white hats will feel the same way: finding a phpinfo() page is the happiest discovery of all.

6. Application vulnerabilities are not patched in time

The more widely an application is used, the more vulnerabilities are exposed in it. As the saying goes, the world is not insecure because of hackers; hackers exist because the world is insecure. Once hackers lift the veil of illusion, we discover how much insecurity there is. That is why CVEs for Struts2, OpenSSL, Apache, Nginx, Flash, and others keep arriving one after another.

7. Lax permission management

The principle of least privilege is not followed: root is granted to developers, or admin privileges are granted to service accounts.

8. DDoS attacks

DDoS attacks are a security problem that O&M staff know all too well. As we all know, they exhaust bandwidth and resources until the server can no longer respond to legitimate requests; at bottom, this is an attack based on a contest of resources. Relying on the server’s own resources alone to absorb and filter the traffic will, under heavy traffic and high concurrency, only lead to an avalanche.

On top of this, DDoS attack platforms are readily available and cheap, which has made DDoS the preferred tool for suppressing competitors, retaliation, extortion, and other malicious purposes.

9. Traffic hijacking

Do you remember the 2015 report that six companies, including Xiaomi, Tencent, Weibo, and Toutiao, issued a joint statement calling on telecom operators to crack down on traffic hijacking? Even so, undercurrents still run strong on the Internet today. Below are three common traffic hijacking methods that have plagued O&M security staff for years:

  • ARP hijacking: the basic function of ARP is to look up the MAC address of a target device from its IP address so that communication can proceed. Exploiting the way ARP works, an attacker keeps sending forged ARP packets to the victim’s machine, answering ARP requests on behalf of the target IP address and thereby mounting a man-in-the-middle attack.

  • Domain name hijacking: the attacker hijacks the DNS resolution result for a domain name and directs HTTP requests to a specific IP address, so that the client establishes its TCP connection with the attacker’s server instead of connecting directly to the target server.

  • HTTP hijacking / direct traffic modification: fixed content, such as ad pop-ups, is inserted into pages somewhere along the data path.
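For ARP hijacking in particular, one visible symptom is that the forged replies make several IP addresses (often including the gateway) resolve to the attacker’s MAC address. The following is a minimal sketch of a check over ARP table entries. The function name `duplicate_macs` and the input format are assumptions, and the result is only a heuristic, since routers and proxy ARP can legitimately answer for several addresses.

```python
from collections import defaultdict


def duplicate_macs(arp_entries):
    """Return {mac: [ips]} for every MAC address that claims more than one IP.

    arp_entries is an iterable of (ip, mac) pairs, for example parsed from
    the system ARP table.
    """
    ips_by_mac = defaultdict(set)
    for ip, mac in arp_entries:
        ips_by_mac[mac.lower()].add(ip)
    return {mac: sorted(ips) for mac, ips in ips_by_mac.items() if len(ips) > 1}


table = [
    ("10.0.0.1", "AA:BB:CC:00:00:01"),   # gateway
    ("10.0.0.7", "aa:bb:cc:00:00:01"),   # another host sharing the gateway's MAC
    ("10.0.0.9", "aa:bb:cc:00:00:09"),
]
print(duplicate_macs(table))
```

Comparing the gateway’s MAC address against a known-good value recorded in advance is a useful complement to this check.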

Fifth, case studies

So far we have discussed many bad practices and a classification of O&M security problems. Below are some familiar cases that show just how “cost-effective” O&M security vulnerabilities are to exploit:

svn

  • The .svn directory was mistakenly uploaded when the web code was deployed.

  • The .svn directory was not excluded when the code was uploaded with rsync, and svn propedit svn:ignore was not used in the SVN repository to ignore files or directories that should not be uploaded.

  • The attacker used svn-tool or svn-extractor to restore the code from the leaked SVN information.
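The second point above can be enforced in the deployment script itself by always passing exclude patterns to rsync. The following is a minimal sketch that only builds the command line. The function name `build_rsync_command` and the source and destination values are hypothetical; `-a` and `--exclude` are standard rsync options.

```python
def build_rsync_command(source, destination, excludes=(".svn", ".git")):
    """Build an rsync argument list that never uploads version control metadata."""
    command = ["rsync", "-a"]
    for pattern in excludes:
        command.append(f"--exclude={pattern}")
    command += [source, destination]
    return command


# Hypothetical source directory and deployment target.
print(build_rsync_command("./web/", "deploy@web-01:/srv/web/"))
```

The resulting list can be handed to `subprocess.run(command, check=True)`. Keeping the excludes in one function means nobody has to remember them at upload time.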

rsync

  • rsync was started as the root user, the module was not configured with authentication, and the default port 873 was open to the public.

  • The attacker used rsync to write a crontab task, obtained a reverse shell, and planted a mining Trojan.

Redis

  • Redis was started as the root user, authentication was not configured, and the default port 6379 was open to the public.

  • The attacker used Redis to write an SSH public key into the .ssh directory of the root user.

  • Redis machines generally have internal IP addresses as well, which attackers can use to move laterally through the internal network.

Kubernetes

  • The K8S API was open to the outside, and authentication was not enabled.

  • The attacker called the API to create a container with the host’s root directory mounted into the container’s file system, then obtained a reverse shell by writing a crontab task and planted a mining Trojan on the host.

  • Sometimes the containers also run uncompiled source code, or the compromised machine can pull any image from the private Docker image registry, and the consequences are unimaginable. Calling the K8S API is very simple.

It is now time to consider a question: how do we do O&M security well? Chinese medicine has a saying about prescribing the right remedy for the symptoms. We have spent a lot of time analyzing the problems because the way out starts from them: correcting bad habits and cultivating good O&M security habits, combined with a complete O&M security technology system. How to actually solve these problems will be explained in detail in the next article.
