1. Background

Hello everyone, I’m very glad to share with you the topic of “Zero Trust Security Construction of Office Network”.

Before I start, let me briefly introduce our company. Qunjia is a game company that mainly targets overseas markets, so many of you may not have heard of us. But those of you who like playing games may have heard of a team called FPX, which is actually our company's team.

2. Agenda

I want to talk about three things today.

First, why we are doing zero trust security construction. Earlier speakers have already covered the concept of zero trust and some of its application scenarios, so I will only briefly explain why we at Qunjia decided to build it. Second, how we designed the zero trust architecture. Zero trust has to be built around the business, so I will use Qunjia as an example to share our thinking. Third, how we carried it out in practice: what we have done and some of the detailed problems we ran into.

3. Why do it

First of all, why do we want to do zero trust construction? Many companies pay attention to security, and the demand generally comes from one of two sources. In our case, the work is driven internally: we chose to do this security work ourselves.

Why do I say that? First, we are a game company, and game companies take security very seriously, so we have real security requirements. Second, the whole team attaches great importance to security, because the security of a game can determine its life cycle, and the team's cooperation matters a great deal when promoting zero trust. Third, after we reported the zero trust security construction plan to our leaders, they strongly supported it.

3.1 Security requirements

I just said that we have security requirements. Why do we have them? Let me give an example.

3.1.1 Network architecture

Our company's network is mainly divided into two parts: the internal office network and the public network.

Our security system was built on these two networks, and the internal office network was treated as trusted by default. In other words, once you connected to the office network, your access to internal services was considered trusted, and only some simple authentication was needed before you could operate. This caused problems. For example, during the epidemic in 2020, many colleagues worked from home, and to reach the internal office systems they had to connect to the internal network through VPN. This network architecture has some weaknesses.

VPN can only guarantee the credibility of the identity; it cannot guarantee the security of the device. In addition, once the VPN is connected, traffic that has no need to go through it, such as WeChat or non-office web pages, often still does, which wastes VPN resources. While we were thinking about how to access internal services on the office network securely without a VPN, the concept of zero trust turned out to fit very well.

3.1.2 Zero trust concept

Here I have summarized four core ideas of zero trust.

The first is distrust by default: users, devices, and the network are all untrusted by default. Our previous office network architecture trusted the network and every device on it by default, so this idea makes up for some of our shortcomings.

The second is dynamic access rights. Previously, once you were connected to the network and your identity had been verified a single time, you could go on using your permissions. Dynamic access rights strengthen authentication, because verification continues after login. For example, if I decide a user is illegitimate, I can remove that user at any time, and their subsequent access is cut off immediately.

The third is reducing the exposure of resources to narrow the attack surface. Previously, to access an internal service you only needed to connect to the office network; you could then reach the service directly, and access control was left to the service itself. Some services are actually very weak on security, with weak passwords for example, so the attack surface was relatively wide.

The fourth is continuous assessment and security response, which uses multiple dimensions to determine whether a request is secure enough.

3.1.3 Construction objectives

Combining the current state of our network with these zero trust ideas, we set three construction objectives.

The first is to let employees access the company's internal services more safely and conveniently. The second is to make sure the visitor's identity and network environment are safe before allowing access. The third is to solve the problem that access logs are scattered and user behavior cannot be traced. These three points are the main purpose of our construction.

3.2 Attaching importance to security

As mentioned earlier, as a game company we attach great importance to security, mainly because it directly affects the company's revenue. Here are two examples from the game industry.

3.2.1 Source code leak scenario

Let's start with a source code leak case. What happens if the source code is leaked?



Many of you may have heard of Legend, or played on Legend private servers. In September 2002, the Legend server source code leaked through an Italian server and soon spread to China. Within half a year, more than 500 private Legend servers had been set up, and many players moved from the official servers to private ones. The Legend operator was badly affected and stopped paying agency fees to the Legend developer, which left the developer at risk of bankruptcy; it was later even acquired. This example shows that the security of a game can determine its life cycle.

3.2.2 High-risk vulnerability scenarios

There is also a high-risk vulnerability scenario. At the DEF CON 2017 conference in the US, a hacker revealed to the media some of the ways he had used online game vulnerabilities to make money over the past two decades.

He also gave a live demonstration: he entered commands in a debugger and added a large amount of gold coins to his game account. Different games require different methods to add currency, but what they have in common is that the added coins or items are mainly sold through third-party markets for money. This case shows that the security of a game can directly affect the company's bottom line. That is why our team attaches great importance to security, and our leaders' emphasis on security also gives us strong support, so we can build zero trust security with confidence.

4. Architecture design

Having decided to build zero trust, we mainly did three things.

First, we defined an ideal goal. Second, we familiarized ourselves with the existing network architecture. Zero trust is not a product that is finished once development is done; it is tightly coupled to the business, so you have to know the current network structure well. Third, we combined the goal with the current state to arrive at a target scheme that can actually be implemented.

4.1 Ideal goals

This diagram shows our ideal target. As you can see, we want all users to access internal services through a security gateway proxy.

Before the gateway proxies a request, we verify whether the visitor's identity is legitimate and whether the device is an internal device. We also check whether the request itself is illegal, for example whether its parameters are abnormal, and whether the user's behavior is abnormal. For example, if someone normally accesses a service during working hours and then one day suddenly accesses it at two o'clock in the morning, we lower their security level and require a second verification to raise their security score.
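The off-hours example can be sketched roughly as follows. This is only an illustration under assumed values: the function name, the score numbers, and the 9:00 to 19:00 working window are hypothetical and do not come from the talk.

```python
from datetime import datetime

# Hypothetical values chosen for illustration only.
BASE_SCORE = 80          # score of a normally behaving, authenticated user
OFF_HOURS_PENALTY = 30   # deducted when access falls outside usual hours
STEP_UP_THRESHOLD = 60   # below this, a second verification is required


def assess_access_time(when: datetime, work_start: int = 9, work_end: int = 19) -> dict:
    """Lower the score for off-hours access and flag a second verification."""
    off_hours = not (work_start <= when.hour < work_end)
    score = BASE_SCORE - (OFF_HOURS_PENALTY if off_hours else 0)
    return {
        "off_hours": off_hours,
        "score": score,
        "needs_second_verification": score < STEP_UP_THRESHOLD,
    }


print(assess_access_time(datetime(2021, 5, 14, 14, 0)))  # afternoon access
print(assess_access_time(datetime(2021, 5, 14, 2, 0)))   # 2 a.m. access
```

A real system would learn each user's usual hours from history instead of using a fixed window; the fixed window just keeps the sketch short.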

To achieve these goals, we need to do the items listed on the left of the diagram, such as unified resource management and unified control of external access. That is, we need to manage users in a unified way, control devices in a unified way, and route all access to internal services through the security gateway, which means configuring firewall restrictions on these services so that only the security gateway can reach them. The other two items are dynamically adjusting access control policies and reducing the use of VPN.

4.2 Familiarize yourself with the existing architecture

After defining the ideal goal, we had to become familiar with the existing network structure. The current network relies mainly on two mechanisms, network trust and identity authentication, so I focused on understanding these two.

4.2.1 Network access

First, PC devices running Windows, Mac, or Linux can access the internal network in three ways. When working from home, they connect through VPN, which requires logging in with an account. When working in the office, they usually connect through WiFi, and after connecting they must enter an account and password to confirm their identity. The last way is wired access to the office network. Mobile devices such as phones connect in basically the same way as computers, and dumb devices such as printers and cameras mainly connect to the internal network by cable.

The current network has a few problems.



First, after a user connects to our internal network, access to any given service is controlled entirely by that service's own permission checks; there is no unified identity that decides whether the user may access it. From a security perspective this is a problem, because some systems have weak passwords or weak accounts.

Second, VPN stability cannot be guaranteed. For example, some employees need to access internal office services while on a high-speed train, and they first have to connect to the internal network through VPN. In addition, some traffic that does not need to go through VPN still does, which directly increases network cost.

Third, users, devices, and applications all sit in the same network, which is not reasonable. Different departments should be in separate, isolated network zones: for example, a development network for the development department and an administrative network for the administrative department.

4.2.2 Identity authentication

Identity authentication is mainly account verification. Our accounts fall into two parts.

One part is the company's unified accounts, and the other is self-built accounts. Current employees all have a unified account. But when outsourced staff need to access a particular business system, a self-built account is created for them in that system. Some of these systems are built on open source software and are troublesome to modify, so they still rely on self-built accounts.

4.3 Adjusted scheme

After getting familiar with the existing architecture and combining it with the ideal goal, we needed to design an architecture that could actually be deployed. It is shown in this diagram: every request from a terminal to an office network application has to pass through a security gateway.

The security gateway mainly acts as a proxy. Before proxying a request, it calls the security policy center for an evaluation and uses the result to judge whether the request is legitimate; if it is not, the request is dropped. The security policy center bases its decision mainly on data from the device management center and the identity authentication center.

The device management center mainly stores the security baseline data of terminals and issues certificates to devices. The terminal reports security data, such as whether its processes are safe, whether its network is safe, and whether a lock screen is enabled. Certificate management is mainly used to verify that a device is a legitimate company device.

For example, we do not allow private devices to access the company's office network, so the device management center gives such a device a relatively low score. The other input is the identity authentication center, which mainly checks whether the user's identity is legitimate. This information is combined into a security rating. When the rating is low, we may use face recognition or another form of multi-factor authentication to raise it, and then make the final decision on whether the request may access our office network applications.
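To make the device side of this concrete, here is a minimal sketch of how a device score could be derived from the reported baseline data. The field names, weights, and the fixed low score for uncertified devices are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class DeviceReport:
    """Security data a terminal might report (field names are hypothetical)."""
    has_company_certificate: bool
    processes_safe: bool
    network_safe: bool
    screen_lock_enabled: bool


def device_score(report: DeviceReport) -> int:
    """Return a 0-100 score; the weights are illustrative assumptions."""
    if not report.has_company_certificate:
        # Private devices are not allowed, so they get a low fixed score.
        return 10
    score = 40  # base score for a device holding a company certificate
    score += 20 if report.processes_safe else 0
    score += 20 if report.network_safe else 0
    score += 20 if report.screen_lock_enabled else 0
    return score


print(device_score(DeviceReport(True, True, True, True)))
print(device_score(DeviceReport(False, True, True, True)))
```

Treating the certificate as a gate, and not as one more weighted factor, reflects the rule that a private device should score low no matter how healthy it looks.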

4.4 Construction modules

Looking at the architecture diagram, we can break it down into five modules. Let's take a closer look at each.

4.4.1 Identity authentication center

The main role of the identity authentication center is to provide identity authentication. Besides conventional authentication, it should offer stronger secondary authentication methods so that identity is verified along multiple dimensions. Here we mainly use third-party identity authentication services: Alibaba Cloud IDaaS and Google Authenticator.

4.4.2 Security Policy Center



As just mentioned, the security policy center mainly decides whether a request is allowed. The security policy is generated dynamically from identity authentication and device risk, and it must be updated in real time.

In addition, some applications need a higher level of security. A financial system, for example, may require a second, multi-factor authentication to raise the security score.
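One way to picture such a decision is the sketch below. It is a simplified assumption, not the actual policy engine: the application names, thresholds, the score bonus for multi-factor authentication, and the choice to take the weaker of the two input scores are all hypothetical.

```python
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"
    REQUIRE_MFA = "require_mfa"
    DENY = "deny"


# Hypothetical per-application minimum scores; a finance system demands more.
REQUIRED_SCORE = {"wiki": 60, "finance": 90}
DEFAULT_REQUIRED_SCORE = 60
MFA_BONUS = 20   # assumed score increase after a successful second factor
DENY_BELOW = 30  # assumed floor under which no step-up is offered


def decide(app: str, identity_score: int, device_score: int,
           mfa_passed: bool = False) -> Decision:
    """Combine identity and device risk into a decision for one request."""
    # Use the weaker signal so a risky device cannot be offset by a
    # strong identity (a design assumption of this sketch).
    score = min(identity_score, device_score)
    if score < DENY_BELOW:
        return Decision.DENY
    if mfa_passed:
        score = min(100, score + MFA_BONUS)
    required = REQUIRED_SCORE.get(app, DEFAULT_REQUIRED_SCORE)
    if score >= required:
        return Decision.ALLOW
    # Offer a second factor only if passing it could reach the requirement.
    if not mfa_passed and score + MFA_BONUS >= required:
        return Decision.REQUIRE_MFA
    return Decision.DENY


print(decide("wiki", 80, 80))
print(decide("finance", 80, 80))
print(decide("finance", 80, 80, mfa_passed=True))
```

Because the function is evaluated per request, a change in either input score takes effect on the next request, which is one simple reading of "updated in real time".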

4.4.3 Security gateway



The main role of the security gateway is to proxy external network traffic to the internal network. Before forwarding, the gateway checks whether the user has logged in; if not, it redirects the request to the IDaaS system so that the user logs in first.

When the request comes in again, the gateway asks the security policy center for the security score. If the score is low, the request is blocked. If the request is valid, the gateway can apply traffic hooks to it, for example adding a watermark when the user accesses a wiki system.

Once traffic passes through the security gateway, logs can be stored and fed into a log analysis platform for statistical analysis, supporting later tracing and auditing.
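The request flow just described can be sketched as a single function. The login URL, the score threshold, the "/wiki" path prefix, and the shape of the returned dictionary are placeholders invented for this example.

```python
from typing import Callable, Optional

IDAAS_LOGIN_URL = "https://idaas.example.com/login"  # placeholder URL
MIN_SCORE = 60  # assumed blocking threshold


def handle_request(
    path: str,
    user: Optional[str],
    get_score: Callable[[str], int],
    access_log: list,
) -> dict:
    """Decide what the gateway does with one request and record it."""
    if user is None:
        # Not logged in: send the user to the identity provider first.
        result = {"action": "redirect", "location": IDAAS_LOGIN_URL}
    elif get_score(user) < MIN_SCORE:
        result = {"action": "block"}
    else:
        result = {"action": "forward", "upstream_path": path}
        if path.startswith("/wiki"):
            # Traffic hook: mark the response to be watermarked with the user.
            result["watermark"] = user
    # Every request is logged, whatever the outcome, for later auditing.
    access_log.append({"user": user, "path": path, "action": result["action"]})
    return result


log = []
scores = {"alice": 85, "mallory": 20}
print(handle_request("/wiki/home", "alice", lambda u: scores.get(u, 0), log))
print(log)
```

Logging blocked and redirected requests as well as forwarded ones keeps the audit trail in one place, which is the point of routing everything through the gateway.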

4.4.4 Device agent

The device agent mainly collects security information from the terminal, such as system information, network connection information, and whether antivirus software is installed, and reports it to the server of our device management center.
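As a rough illustration of the collection step, the sketch below gathers a few fields with the Python standard library and serializes them for reporting. The field names are hypothetical, antivirus detection is passed in as an argument because it is platform specific, and network connection details and the actual upload are left out.

```python
import json
import platform
import socket
from typing import Optional


def collect_device_info(antivirus_installed: Optional[bool] = None) -> dict:
    """Gather basic terminal information using only the standard library."""
    return {
        "hostname": socket.gethostname(),
        "os": platform.system(),
        "os_release": platform.release(),
        "machine": platform.machine(),
        # Detecting antivirus software is platform specific, so this sketch
        # takes the answer as an argument; None means "not checked".
        "antivirus_installed": antivirus_installed,
    }


def build_report(info: dict) -> bytes:
    """Serialize the collected data as the JSON body of a report request."""
    return json.dumps(info, sort_keys=True).encode("utf-8")


payload = build_report(collect_device_info(antivirus_installed=True))
print(payload.decode("utf-8"))
```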

4.4.5 Device management center

The device management center mainly stores the security information reported by devices, authenticates devices (for example, judging whether a device is one of our company's devices), and provides data support to the security policy center.

5. Construction experience

Zero trust cannot be built in a short time. Google's zero trust construction took ten years, and Perfect World, as He Yi described in the earlier talk, spent four years. So zero trust is not something you finish quickly. That said, zero trust now has a relatively established basic architecture, so building it today is much faster, but it still has to be done in phases.

5.1 Implementation in phases

There are roughly six things that need to be done to build zero trust.

We implemented zero trust in stages, mainly four. The first stage, in Q1, was to have a security gateway that supports authentication, which is the most basic function, and then to onboard some business applications to verify that the model is feasible. We collected a lot of requirements in this process, improved the gateway accordingly, and kept promoting business onboarding. The first stage has been delivered. The second stage, currently under way, is mainly about onboarding all business applications and evaluating third-party terminal software for purchase. Business onboarding is complete, but the terminal software is still under evaluation.

5.1.1 Stage 1

The first stage is mainly to build a security gateway that meets the most basic needs, such as traffic forwarding, login, and identity authentication, and to onboard some office applications to verify whether this model is feasible.

5.1.2 Stage 2

The second stage is mainly to onboard the full range of applications. At the same time, we need to evaluate terminal products, which must support baseline detection, certificate issuance, and mutual (two-way) TLS authentication.
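For readers unfamiliar with mutual TLS, the sketch below shows the server side of the idea using Python's standard ssl module: the server refuses the handshake unless the client presents a certificate signed by a trusted CA. The function name and the file path parameters are hypothetical, and this says nothing about how any particular terminal product implements it.

```python
import ssl
from typing import Optional


def make_mtls_server_context(
    cert_file: Optional[str] = None,
    key_file: Optional[str] = None,
    client_ca_file: Optional[str] = None,
) -> ssl.SSLContext:
    """Build a server-side TLS context that demands a client certificate."""
    context = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
    # CERT_REQUIRED makes the handshake fail unless the client presents a
    # certificate that chains to a trusted CA.
    context.verify_mode = ssl.CERT_REQUIRED
    if cert_file and key_file:
        # The server's own certificate and private key.
        context.load_cert_chain(certfile=cert_file, keyfile=key_file)
    if client_ca_file:
        # Trust the CA that issues device certificates.
        context.load_verify_locations(cafile=client_ca_file)
    return context


context = make_mtls_server_context()
print(context.verify_mode)
```

In a deployment, `client_ca_file` would point at the CA used to issue device certificates, so that only enrolled devices can complete the handshake.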

5.1.3 Stage 3



So far, the third and fourth stages are only planned and have not been implemented, so I won't go into them here.

5.1.4 Long-term implementation

Some things can be done at any stage, because they have no dependencies.

The first is to integrate business requirements and support finer-grained access control. The second is to enrich the capabilities of the proxy gateway, including traffic interception, behavior analysis and statistics, and content injection such as watermarking. The third is to enrich user behavior auditing, combining business access behavior with terminal security logs to monitor security risks comprehensively.

5.2 Detailed construction

Here I will talk about some details of our construction, mainly recommending a few open source tools.

5.2.1 OpenResty

We said earlier that the security gateway mainly plays a forwarding role. Many of you will think of NGINX here, and I use NGINX as well. But there is a distribution built on NGINX called OpenResty, which makes it much more convenient to run Lua scripts.

5.2.2 NginxWebUi

If you use NGINX for forwarding, you will inevitably deal with NGINX configuration files. If you edit them by hand in Vim, mistakes are inevitable, so it is better to have an interface where you can work and have the configuration files generated for you.

Here I recommend an open source project called NginxWebUi, which lets you complete a reverse proxy configuration in a graphical interface.

5.2.3 Configuration distribution

We have more than 20 nodes around the world, and each node has a server. If we put the proxy gateway on only one node, proxying would be very slow, so we cannot deploy just a single gateway; we deploy one at each node. With multiple nodes, we need a solution for synchronizing configuration files.

In our setup, the configuration is first generated by NginxWebUi. A program watches the folder for changes and, when something changes, commits the configuration to the GitLab repository. It also notifies each node and passes along the version number. Each node then pulls the latest configuration from the GitLab server, checks the configuration file for errors, restarts NGINX, and reports the version it is running back to the center.
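The node-side half of this flow could look roughly like the sketch below. It is an assumption-laden illustration: it treats the version as a Git commit or tag, assumes NGINX's main configuration includes files from the checked-out repository, and uses `nginx -s reload` where the talk says restart. The function names and the returned dictionary are hypothetical.

```python
import subprocess
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], int]


def run_command(args: Sequence[str]) -> int:
    """Run a command and return its exit code."""
    return subprocess.run(list(args), check=False).returncode


def apply_config(version: str, repo_dir: str, run: Runner = run_command) -> dict:
    """Fetch the given config version, validate it, then reload NGINX."""
    steps = [
        ("fetch", ["git", "-C", repo_dir, "fetch", "origin"]),
        ("checkout", ["git", "-C", repo_dir, "checkout", "--force", version]),
        ("test", ["nginx", "-t"]),  # syntax check before touching the live server
        ("reload", ["nginx", "-s", "reload"]),
    ]
    for name, command in steps:
        if run(command) != 0:
            # Stop at the first failure so NGINX is never reloaded with a
            # configuration that failed the check.
            return {"version": version, "ok": False, "failed_step": name}
    # This result is what the node would report back to the center.
    return {"version": version, "ok": True, "failed_step": None}
```

Passing the command runner as a parameter makes the flow easy to dry-run, for example `apply_config("v42", "/etc/nginx/conf-repo", run=lambda cmd: 0)`, where both the version and the path are made-up values.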


Author: Tang Qingsong

Date: 2021-5-14