How to ensure the security of API interfaces?

Introduction

Some time ago, our company ran a security scan of our business systems using IBM's AppScan.

As the saying goes, you don't know until you scan, and one scan gave us quite a scare: it turned up the following problem.

AppScan flagged a potential security risk in the login of one of our old, poorly maintained internal systems. I can't remember the exact name of the finding, but basically, the login request contained a field named password, and AppScan considered this insecure.

My first reaction was to rename the field; if the fix was that easy, so much the better. I was quickly proven wrong.

Whether I renamed it to AAA or BBB, the same problem was reported again. The only difference was the field name shown in the security report.

Now it got interesting. After some digging (don't ask me how; it was mostly guesswork), I found the cause.

Because this is an internal system, whoever wrote the login page took a shortcut and simply submitted a plain HTML form from the page.

I wrote code like this for a big course project in university. I never expected to see it again years later; it felt like running into someone from my hometown.

The specific cause is that AppScan looks for input boxes with type='password' on the page and then checks whether the request contains the corresponding field. I know this because when I changed the box to type='text', the warning went away. The only drawback is that the password box then displays the password in plain sight.

I could use JS to dynamically replace the displayed value with dots, asterisks, or other characters, but that still feels a bit like cheating.

Having found the problem, how do I fix it?

This brings me to today's topic: how to ensure the security of API interfaces.

First of all, the problem has two parts: the client side and the server side.

The server side

Since I'm a server developer, I'll start with the server side.

In my view, security measures cover two aspects: how to keep data secure while it is in transit, and how to verify data once it reaches the server so that the service is not attacked.

Let’s talk about them one by one:

1. Source identification in HTTP requests

Source identification means the server determining that a request was initiated by its own client rather than forged by a third party.

Let’s take a look at what would be in the header of a normal HTTP request:

I opened Baidu's home page, picked a random request in the browser's Network panel, and looked at its request headers. The fields I want to discuss are these:

Origin: indicates which site the current request comes from.
Referer: indicates the page from which the current request was linked.
User-Agent: identifies the browser or system that sent the current request.

We generally check the Origin and Referer headers against a whitelist of domains to determine whether the request comes from our own site, and then check the User-Agent to make sure the request was sent by a browser rather than by some crude script or emulator.

Of course, since the front end can never be fully trusted, all of these fields can be tampered with or forged (I covered this in an earlier crawler article). Still, do as much verification as you can: we can't plug every hole at once, but we can plug at least some of them.
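To make the whitelist idea concrete, here is a minimal Python sketch, assuming the headers arrive as a plain dict with canonical names. The ALLOWED_HOSTS set and the "Mozilla/" prefix test are illustrative assumptions, not rules from any particular framework, and as noted above every one of these headers can be forged.

```python
from urllib.parse import urlparse

# Hypothetical whitelist; replace with your own domains.
ALLOWED_HOSTS = {"example.com", "www.example.com"}

def host_of(url: str) -> str:
    """Lowercase host part of a URL, or '' if it has none."""
    return (urlparse(url).hostname or "").lower()

def looks_like_our_client(headers: dict) -> bool:
    """Best-effort source check on Origin, Referer and User-Agent.

    Assumes header names are already in canonical case. All three headers
    are client-supplied and forgeable, so this only filters careless callers.
    """
    origin = headers.get("Origin", "")
    referer = headers.get("Referer", "")
    user_agent = headers.get("User-Agent", "")

    # At least one of Origin/Referer must be present ...
    if not origin and not referer:
        return False
    # ... and every one that is present must point at a whitelisted host.
    for value in (origin, referer):
        if value and host_of(value) not in ALLOWED_HOSTS:
            return False
    # Crude browser check: mainstream browsers send a "Mozilla/5.0 ..." User-Agent.
    return user_agent.startswith("Mozilla/")

print(looks_like_our_client({"Origin": "https://www.example.com",
                             "User-Agent": "Mozilla/5.0 (Windows NT 10.0)"}))  # True
```

A request from a script that forgets to set these headers is rejected, while one that copies a real browser's headers passes. That is exactly why this layer is only a first filter.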

2. Encrypt data

Data is easily captured in transit. If it is sent in plain text, for example over HTTP, anyone can read what users send. So the data must be encrypted.

A simple practice is to protect key fields, for example by hashing the user's password with MD5 before sending it. The current mainstream approach is HTTPS, which adds an encryption layer (SSL/TLS) between HTTP and TCP that is responsible for encrypting and decrypting data.

3. Data signature

A signature is a string that cannot be forged, added to the HTTP request to prevent the data from being tampered with in transit.

The most common algorithm for data signatures is MD5: the data to be submitted is combined into a string in an agreed way, and the signature is generated from that string.

I’ll use the previous login interface as a simple example:

str = name={parameter 1}&password={parameter 2}&key={user key}
sign = md5(str)

Here key is a secret shared by the client and the server. The JSON finally submitted by the login request looks like this:

{ "name": "test", "password": "123", "sign": "098f6bcd4621d373cade4e832627b4f6"}

The key itself is never sent with the request; otherwise, anyone who intercepted the request could use it to generate valid signatures of their own. And if plain MD5 is not secure enough, you can add a salt when hashing, or use a keyed hash such as HMAC-SHA256, to further reduce the risk of forged requests after a request is intercepted.
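The signing scheme above can be sketched in Python as follows. It assumes the parameters are joined in alphabetical order of their names (which happens to give name, then password, as in the example) with the shared key appended last; the real ordering and format are whatever client and server agree on. The hmac_sign variant shows the HMAC-SHA256 alternative, and all function names and the "user-key" value are hypothetical.

```python
import hashlib
import hmac

def build_sign_string(params: dict, key: str) -> str:
    """name=...&password=...&key=...: every param except 'sign', sorted by name, key last."""
    parts = [f"{k}={params[k]}" for k in sorted(params) if k != "sign"]
    parts.append(f"key={key}")
    return "&".join(parts)

def md5_sign(params: dict, key: str) -> str:
    """MD5 over the string above, as in the example."""
    return hashlib.md5(build_sign_string(params, key).encode("utf-8")).hexdigest()

def hmac_sign(params: dict, key: str) -> str:
    """Alternative: HMAC-SHA256 keyed with the shared secret instead of appending it."""
    message = "&".join(f"{k}={params[k]}" for k in sorted(params) if k != "sign")
    return hmac.new(key.encode("utf-8"), message.encode("utf-8"), hashlib.sha256).hexdigest()

def verify(params: dict, key: str, sign_fn=md5_sign) -> bool:
    """Server side: recompute the signature and compare in constant time."""
    expected = sign_fn(params, key).encode("ascii")
    received = str(params.get("sign", "")).encode("utf-8")
    return hmac.compare_digest(expected, received)

# Client side: sign the login parameters with the shared key (the key itself is not sent).
request = {"name": "test", "password": "123"}
request["sign"] = md5_sign(request, "user-key")
print(build_sign_string(request, "user-key"))  # name=test&password=123&key=user-key
```

If anyone changes password in transit, verify recomputes a different digest and rejects the request. Without the key, an attacker cannot produce a matching sign.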

4. Timestamp

The timestamp mechanism mainly defends against replay attacks, including DoS attacks built on them. Once requests are encrypted and signed, they are hard to reverse, but some attackers simply resend captured packets without caring about the data inside.

We can add the time of the request to the parameters. On receiving the request, the server compares its current time with the time in the request. For example, requests within 5 minutes are passed on to business processing, and requests older than 5 minutes get an error code immediately. The timestamp must also be part of the signed string; otherwise an attacker could simply replace it with a fresh value.

Note that client and server clocks are almost never exactly in sync, and requests take time to travel over the network, so the time window must not be set too small, or legitimate requests will be rejected.

At this point, our request data will look something like this:

{ "name": "test", "password": "123", "timestamp": 1590334946000, "sign": "098f6bcd4621d373cade4e832627b4f6" }
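Here is a minimal sketch of the server-side check, assuming millisecond timestamps as in the example above and the 5-minute window from the text; timestamp_ok is a hypothetical helper name. Because it uses the absolute difference, it also tolerates a client clock that runs slightly ahead of the server's.

```python
import time
from typing import Optional

MAX_SKEW_MS = 5 * 60 * 1000  # the 5-minute window from the text, in milliseconds

def timestamp_ok(request_ts_ms: int, now_ms: Optional[int] = None) -> bool:
    """Accept a request only if its timestamp is within 5 minutes of server time."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)  # server's current time in ms
    return abs(now_ms - request_ts_ms) <= MAX_SKEW_MS

# Using the timestamp from the request above:
print(timestamp_ok(1590334946000, now_ms=1590334946000 + 60_000))      # True (1 minute old)
print(timestamp_ok(1590334946000, now_ms=1590334946000 + 6 * 60_000))  # False (6 minutes old)
```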

5. AppID

In many cases an API is called not by a single client but by many callers. To verify that a caller is legitimate, the server can issue AppIDs.

Anyone who wants to call our API must first apply to us for an AppID offline, and can access the interface only after the AppID has been activated. When calling the interface, the caller adds the AppID to the request parameters and submits it along with the other data.

At this point, the parameter passed to our login interface above should look like this:

{ "appid": "geekdigging", "name": "test", "password": "123", "timestamp": 1590334946000, "sign": "098f6bcd4621d373cade4e832627b4f6" }
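Putting AppID, timestamp, and signature together, a server-side check might look like the sketch below. APP_SECRETS is a hypothetical stand-in for wherever issued AppIDs and their per-app keys are stored. The error strings are made up, and HMAC-SHA256 is used for signing. Because every parameter except sign is signed, the AppID and the timestamp are protected too. Timestamps are assumed to be in milliseconds.

```python
import hashlib
import hmac
import time
from typing import Optional

# Hypothetical store of AppIDs issued offline, each with its own secret key.
APP_SECRETS = {"geekdigging": "secret-for-geekdigging"}
MAX_SKEW_MS = 5 * 60 * 1000  # 5-minute timestamp window

def sign(params: dict, secret: str) -> str:
    """HMAC-SHA256 over all params except 'sign', sorted by name."""
    message = "&".join(f"{k}={params[k]}" for k in sorted(params) if k != "sign")
    return hmac.new(secret.encode("utf-8"), message.encode("utf-8"), hashlib.sha256).hexdigest()

def check_request(params: dict, now_ms: Optional[int] = None) -> str:
    """Return "ok", or a made-up error code naming the first failed check."""
    # 1. The AppID must be one we issued.
    secret = APP_SECRETS.get(str(params.get("appid", "")))
    if secret is None:
        return "unknown_appid"
    # 2. The timestamp must fall inside the window.
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    try:
        ts = int(params["timestamp"])
    except (KeyError, TypeError, ValueError):
        return "bad_timestamp"
    if abs(now_ms - ts) > MAX_SKEW_MS:
        return "expired"
    # 3. The signature must match; it covers appid and timestamp as well.
    expected = sign(params, secret).encode("ascii")
    if not hmac.compare_digest(expected, str(params.get("sign", "")).encode("utf-8")):
        return "bad_sign"
    return "ok"

# Client side: build and sign the request from the example above.
req = {"appid": "geekdigging", "name": "test", "password": "123", "timestamp": 1590334946000}
req["sign"] = sign(req, APP_SECRETS["geekdigging"])
print(check_request(req, now_ms=1590334946000 + 1000))  # ok
```

The cheap checks (AppID lookup, timestamp) run before the hash is computed, so a flood of junk requests costs the server less.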

6. Encrypt parameters as a whole

All of the processing above is meant to stop a third party from capturing and cracking requests. But what if the attacker is not an outside third party? Anyone can open the browser's Network panel and see the request data clearly. An attacker can first use the site normally, work out what each request does, and then forge requests to attack it. At that point, all of our previous efforts seem wasted.

Don't assume nobody would bother. Take Alipay's web login as an example: if an attacker could work out the rules by which its requests are sent, they could buy a leaked user database and run credential stuffing against it in bulk, using the requests and responses to verify which Alipay account and password pairs are valid (of course, it is not that simple in practice; this is just an example).

So we can encrypt the whole request on top of all this. The mainstream encryption methods are symmetric encryption and asymmetric encryption.

Symmetric encryption: the same key is used for encryption and decryption. Common symmetric algorithms include DES, AES, RC4, Rabbit, and Triple DES. The advantage is fast computation. The disadvantage is that the sender and receiver must agree on the key before any data is transmitted, and both must keep it secret; if either party leaks the key, the encrypted information is no longer safe.

Asymmetric encryption: the server generates a key pair. The private key stays on the server, while the public key can be distributed to anyone. The advantage is that it is more secure than symmetric encryption, because the private key never has to be shared; the disadvantage is that encryption and decryption are much slower. The most widely used algorithm is RSA.
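The asymmetric idea shows up clearly in a toy version of RSA built from tiny primes, the classic textbook numbers p = 61 and q = 53. This is for illustration only. Real RSA uses keys of 2048 bits or more plus padding such as OAEP, and should come from a vetted library rather than hand-written code.

```python
# Textbook RSA with tiny primes: no padding, trivially breakable, illustration only.
p, q = 61, 53
n = p * q                 # public modulus: 3233
phi = (p - 1) * (q - 1)   # 3120
e = 17                    # public exponent, coprime with phi
d = pow(e, -1, phi)       # private exponent via modular inverse (Python 3.8+): 2753

def encrypt(m: int) -> int:
    """Anyone holding the public key (e, n) can encrypt."""
    return pow(m, e, n)

def decrypt(c: int) -> int:
    """Only the holder of the private exponent d can decrypt."""
    return pow(c, d, n)

c = encrypt(65)
print(c, decrypt(c))  # 2790 65
```

Here (e, n) plays the role of the public key that can be handed to every client, while d never leaves the server.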

Encrypting the data submitted above with DES, using the key 123456, gives a result like this:

U2FsdGVkX18D+FiHsounFbttTFV8EToywxEHZcAEPkQpfwJqaMC5ssOZvf3JJQdB/b6M/zSJdAwNg6Jr8NGUGuaSyJrJx7G4KXlGBaIXIbkTn2RT2GL4NPrd8oPJDCMky0yktsIWxVQP2hHbIckweEAdzRlcHvDn/0qa7zr0e1NfqY5IDDxWlSUKdwIbVC0omIaD/dpTBm0=

We then put this string into the request in place of the plain parameters, so the login request carries only the ciphertext.

I'd guess that more than 99% of attackers will give up when they see a request like this in the Network panel, but the remaining 1% will open Chrome's developer tools and debug line by line. We will deal with them in the client-side section below.

7. Rate limiting

In some high-concurrency scenarios, the request rate must be limited to protect the service, so that it does not collapse when traffic is too high.

This matters especially for external interfaces offered to customers or suppliers, because you have no control over what the callers' code will do.

I once saw a supplier use a data-modification interface we provided as if it were a batch interface, and our service went down every night. When we finally asked them, they told us they used this interface to synchronize tens of millions of records every night. I was speechless.

So for safety, the server needs to limit traffic.

Common server-side rate-limiting algorithms are the token bucket, the leaky bucket, and the counter.

Token bucket: the system adds tokens to the bucket at a fixed rate and discards new tokens once the bucket is full. Each request first takes a token from the bucket; if it gets one, it proceeds, otherwise it waits or is rejected. A token bucket allows a certain amount of burst traffic, since requests can be processed as long as tokens remain, and a request may take several tokens at once.
Leaky bucket: requests flow out of the bucket at a constant rate, whatever the inflow rate. When the number of pending requests exceeds the bucket's capacity, new requests wait or are rejected. The leaky bucket therefore strictly caps the processing rate.
Counter: a simple, blunt algorithm, mainly used to limit total concurrency, for example in database connection pools, thread pools, and flash-sale concurrency. Counter-based limiting kicks in as soon as the total number of requests within a given period exceeds the set threshold.
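The token bucket and counter described above can be sketched in Python as follows. This is a single-process, single-threaded illustration with an injectable clock for testing; a real service would need locking or a shared store, and the class names are hypothetical. The leaky bucket is left out, since a faithful version needs a queue drained by a worker at a constant rate.

```python
import time
from typing import Callable

Clock = Callable[[], float]

class TokenBucket:
    """Tokens refill at `rate` per second up to `capacity`; each request takes one."""

    def __init__(self, rate: float, capacity: int, clock: Clock = time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = float(capacity)  # start full, so an initial burst is allowed
        self.last = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Add tokens for the elapsed time; anything above capacity is discarded.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # no token: the caller should wait or reject the request


class FixedWindowCounter:
    """Counter: at most `limit` requests per `window` seconds, reset each window."""

    def __init__(self, limit: int, window: float, clock: Clock = time.monotonic):
        self.limit = limit
        self.window = window
        self.clock = clock
        self.window_start = clock()
        self.count = 0

    def allow(self) -> bool:
        now = self.clock()
        if now - self.window_start >= self.window:
            # A new window has started: reset the counter.
            self.window_start = now
            self.count = 0
        if self.count < self.limit:
            self.count += 1
            return True
        return False  # threshold exceeded for this window
```

For example, TokenBucket(rate=1, capacity=3) lets three requests through at once and then roughly one per second afterwards. FixedWindowCounter(limit=100, window=60) allows at most 100 requests per minute, but it can let up to twice that through around a window boundary, which is the usual price of its simplicity.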

For implementation, Guava provides the RateLimiter utility class, based on the token bucket algorithm; look it up if you need it.

8. Blacklist

Blacklisting already edges into risk control: we define which operations count as illegal and block the callers that perform them.

For example, record how often each AppID accesses the interface. If, within 30 minutes, an AppID exceeds its rate limit 5 or more times, with more than 10 requests each time, it can be put on the blacklist. It is removed only after 24 hours, or after the caller contacts us offline.

Another example: record how many timed-out requests each AppID sends, meaning requests whose timestamp falls outside the allowed window. Timed-out requests normally do not happen often, so if a large number occur within a certain period, something is definitely wrong with that AppID.
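Here is a minimal sketch of the first rule, assuming a simplified reading of it: 5 or more violations (rate-limit hits or expired timestamps) from one AppID within 30 minutes trigger a 24-hour ban, and the per-episode "more than 10 requests" condition is left out. The Blacklist class and its methods are hypothetical, and the state lives only in memory.

```python
import time
from collections import defaultdict, deque
from typing import Callable

VIOLATION_WINDOW = 30 * 60    # look back 30 minutes
VIOLATION_THRESHOLD = 5       # 5 or more violations in that window ...
BAN_SECONDS = 24 * 60 * 60    # ... earn a 24-hour ban

class Blacklist:
    def __init__(self, clock: Callable[[], float] = time.time):
        self.clock = clock
        self.violations = defaultdict(deque)  # appid -> times of recent violations
        self.banned_until = {}                # appid -> time the ban expires

    def record_violation(self, appid: str) -> None:
        """Call when an AppID is rate-limited or sends an expired timestamp."""
        now = self.clock()
        events = self.violations[appid]
        events.append(now)
        # Drop violations that have fallen out of the 30-minute window.
        while events and now - events[0] > VIOLATION_WINDOW:
            events.popleft()
        if len(events) >= VIOLATION_THRESHOLD:
            self.banned_until[appid] = now + BAN_SECONDS
            events.clear()

    def is_banned(self, appid: str) -> bool:
        until = self.banned_until.get(appid)
        if until is None:
            return False
        if self.clock() >= until:
            del self.banned_until[appid]  # ban has expired
            return False
        return True

    def unban(self, appid: str) -> None:
        """Manual removal, e.g. after the caller contacts us offline."""
        self.banned_until.pop(appid, None)
```

The API entry point would call is_banned before doing any other work, and call record_violation whenever the rate limiter or the timestamp check rejects a request.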

In practice, blacklists are applied more at the business level. For example, you may have run into Pinduoduo's risk control, which puts an account straight onto a blacklist and prohibits it from ordering certain subsidized goods.

The client side

In today's Internet era, web pages and apps have become the mainstream carriers of information.

Apps can be hardened with various reinforcement techniques to keep others from cracking them by brute force.

Web pages are harder. Their dynamic behavior and logic are implemented in JavaScript, which has the following characteristics:

JavaScript code runs on the client side, meaning it must be loaded and executed in the user's browser.
JavaScript code is transparent, meaning the browser has direct access to the source code of the JavaScript it runs.

Because of these two points, JavaScript code is not secure: anyone can read, analyze, copy, or tamper with it.

So unless the JavaScript is processed in some way, no matter how sophisticated the encryption and decryption scheme, it will inevitably be emulated or copied once someone works out its logic.

Common reinforcement schemes for front-end JavaScript include compression, obfuscation, and encryption.

1. Compression

Code compression removes unnecessary whitespace, line breaks, and similar content from JavaScript code and factors out code that can be shared. The output is compressed into one or a few lines, which makes the code very hard to read and also speeds up page loading. The JavaScript files served on Baidu's pages, for example, are compressed this way.

Compression that only removes whitespace offers almost no protection, because it merely reduces the code's direct readability.

You can format code through a variety of tools, including the Chrome browser itself.

Mainstream front-end projects are now bundled with Webpack, which compiles and compresses the source code and outputs several bundled JavaScript files. The output file names contain irregular strings, the files may be only a few lines long, and the variable names are single letters.

This involves JavaScript compression: common libraries are output as bundles, and call logic is compressed and transformed into a few lines of code. It also involves some very basic JavaScript obfuscation, such as replacing variable and method names with short characters to reduce readability.

On the whole, JavaScript compression works only to a small degree of protection, and real protection depends on JavaScript obfuscation and encryption.

2. Obfuscation

JavaScript obfuscation is done entirely on top of JavaScript. Its purpose is to make JavaScript difficult to read and analyze, greatly reducing the readability of code, and it is a very practical JavaScript protection scheme.

There are two general types of JavaScript obfuscators: those implemented with regular-expression replacement, and those implemented by transforming the syntax tree.

The first is cheap to implement but only moderately effective, so it suits scenarios with modest obfuscation requirements. The second is more expensive to implement but more flexible and more secure, which suits adversarial scenarios.

Obfuscation by syntax tree transformation is fairly complex to implement, so I won't go into it here.

Jscrambler is a good commercial tool that obfuscates code by modifying its syntax tree.

All in all, the above schemes are JavaScript obfuscation implementations that protect JavaScript code to varying degrees.

In general, the first kind of obfuscation is enough for our needs. The mainstream JavaScript obfuscator today is the javascript-obfuscator library, which makes page obfuscation very easy. Combined with Webpack, it outputs compressed and obfuscated JavaScript that is far less readable and much harder to reverse.

3. Encryption

JavaScript encryption goes a step beyond obfuscation. Its basic idea is to write core logic in a language such as C/C++ and call it from JavaScript, which provides binary-level protection.

The current options are Emscripten and WebAssembly, with the latter becoming increasingly mainstream.

If you are interested, look them up on Baidu.

Summary

All of the above serves one purpose: making our programs run more securely and stably, and reducing the losses (and overtime) caused by attacks.

Much of this content was compiled from material found on Baidu; I hope you make good use of search engines too.
