background

In fact, I have seen this problem before, just a few days ago, Professor Hong in a group shared a “some interesting attack method.pdf”, I think this topic should still be a lot of people are not clear, today I am ready to “actual combat”, also please see the official pat.

Qiangning Hong (Professor Hong), founder and CTO of Ein Interactive, was the chief architect of Douban and one of the founders of Python User Group in China (CPUG).

Hash conflict

What is Hash conflict? Let’s start with a Hash table (or Hash table). We know that the expected time to find an element in a Hash table is order (1). How can we do that? That’s what the hash() function is doing.

At first glance, the internal storage of a hash table is similar to that of an array. It uses contiguous memory space to store elements. As long as the elements to be stored are mapped to the subscripts of the array in some way, the corresponding elements can be read by subscripts just like an array.

Hash sample

For example, suppose it is a hash function designed by me that satisfies the following conditions:

  • Hash (“hello”)=0: string “hello” stores array subscript 0;
  • Hash (“world”)=2: “world” stores array subscript 2;
  • Hash (“tangleithu”)=5: “tangleithu” stores array subscript 5;

So far, it looks pretty good, but that’s an assumption, and I can’t assume that this hash is going to map different strings to different indices perfectly.

Hash (” stone “) = 2. This is called “Hash conflict”, and the most common solutions to Hash conflicts are “open chain” methods, as well as linear, square, and so on.

There are dozens of articles on hashmaps that I won’t go into here. Let me give you a simple example to help you understand.

Hash collision open chain method

The open chain method is shown in the figure above. When we store elements, we store them as a linked list. When conflicts occur, we add the conflicting elements to the end of the list. The example above happens to be unlucky, with the strings shitou and stone both calculated with an index of 2.

That’s a big problem. The time that we expected O(1) to find elements, now it’s a linear time to find elements in a linked list, and if you insert something at this point, it’s the worst-case time. (We will not discuss the tree-to-tree case here.)

Bad guys can get in

That leaves room for the bad guys to imagine. As long as the bad guy carefully designs a set of strings to put into the hash table and makes them all have the same Hashcode, this will result in hash collisions, which will result in the CPU spending a lot of time processing hash collisions, resulting in a DoS attack.

Hash tables are all too common. In Web services, the processing of forms is usually stored in a hash table (the back end often needs to know that the corresponding parameter value is obtained by a specific parameter key).

In actual combat

This article will take Java SpringBoot as an example to attempt an attack.

But don’t think this “Hash conflict with DoS” is unique to Java, Python, Apache, Tomcat/Jetty, PHP, etc. In fact, as early as the end of 2011 was a large number of exposure, some of the framework has been gradually improved and repaired. Ocert-2011-003 Multiple Implementations denial-of-service via Hash Algorithm Collision [1]

Here, let’s list one Apatch Tomcat from CVE-2011-4858[2].

Apache Tomcat before 5.5.35, 6.x before 6.0.35, and 7.x before 7.0.23 computes hash values for form parameters without restricting the ability to trigger hash collisions predictably, which allows remote attackers to cause a denial of service (CPU consumption) by sending many crafted parameters.

The following screenshots are from Professor Hong’s PPT, but the specific source of the content is unknown (I tried to find it, but failed to find it), please refer to it.

Bandwidth required to implement hash conflicting DoS attacks

The left side shows the bandwidth required to implement this attack in different languages (frameworks), and the right side shows the CPU target of the attack. As you can see, this attack is actually quite cheap to carry out (as evidenced by the subsequent stone experiment).

Have to say “PHP is the best programming language in the world” (don’t fight), there is some truth to it, hahahahahahahahahaha (not enough for one image, add one more)

The above language sort, not necessarily right, we can refer to it, do not tangle with the specific accuracy.

In fact, the way to verify, of course, is relatively simple, just find out the different strings causing the conflict, the specific language may be different.

talk is cheap

Now follow me to try an attack, I use my own laptop test (configuration: MBP 13-inch, 2.5ghz Intel Core I7, 16 GB 2133 MHz LPDDR3).

The code below is an example of a pair of hashed string pairs, which can actually be generated by the previous permutation and combination.

System.out.println("Aa".hashCode());
System.out.println("BB".hashCode());
System.out.println("BBBBBBBBBBBBBBBBBBBBBBBBAaBBBBAa".hashCode());
System.out.println("BBBBBBBBBBBBBBBBBBBBBBBBAaBBBBBB".hashCode()); // Output 2112 2112 2067858432 2067858432Copy the code

Application vulnerability Due to Non Random Hash Functions[3] Or refer to this Hash Collision DoS problem [4].

Then I started a SpringBoot (2.2.2.release) Web service, JDK 1.8 (actually 1.7 works better).

@RequestMapping("/hash")
public String hash(HttpServletRequest Request) {// simply return the size of the argument and its correspondinghashCode
    int size = request.getParameterMap().size();
    String key = (String)(request.getParameterMap().keySet().toArray())[0];
    return String.format("size=%s, hashCode=%s", size, key.hashCode());
}Copy the code

Curl curl curl curl Curl Curl Curl Curl Curl curl curl curl curl curl curl

Curl experimental results

If the generated string is not enough, you can also add concurrent requests. You can borrow tools like “Apache Benchmarking” to send requests. I have also introduced this command performance test tool – AB simple application, interested in the reference.

Conflicts like hashcode

Click a breakpoint to see the effect, as shown above. Indeed, all hash values are the same. But a single request didn’t seem to affect a significant change in my computer’s CPU.

I tested 29859 strings and was trying to generate more conflicting strings when I took a closer look and found that the request was truncated and the parameter size returned by the request was 10000. Tomcat is rigged in SpringBoot, as shown below, because the default request size is limited to 10000.

More than the maximum number of request parameters (GET plus POST) for a single request ([10,000]) were detected. Any parameters beyond this limit have been ignored. To change this limit, set the maxParameterCount attribute on the Connector.

The number of POST parameters is limited

One way, of course, is to modify the limit on the number of request parameters. In addition, you can try JDK 1.7 to verify, should be better (reason, you know, smart reader?). . Don’t bother here, try to win by volume, submit requests concurrently with ab, and see what happens.

Here are the results of the pressure test I ran with the following parameters:

ab -c 200 -n 100000 -p req.txt 'localhost:8080/hash'Copy the code

The results of pressure measurement are shown in the figure below:

Ab Test result of hash conflict

Then we take a look at the CPU changes, specially recorded screen to do a GIF, you can see that is relatively obvious. It went from almost no CPU usage to 39.6%, and then all of a sudden it went to 158%.

Hash collision – demo figure

This process in practical test not lasted (one is caught in the process of retry above), on the one hand, because of I use JDK 1.8, originally post-conflict lookup process has been optimized, may effect is not obvious, also suspect there may be some cache optimization, such as in the second place is not enough for the amount of 10000? Specific I also did not delve into, interested readers can try to play.

So far the experiment has been successful.

Success is drag

I this or single machine, if more than a few clients, every minute the Web service to death ah.

defense

That was a success. What about the defense? In essence:

  • Change hash algorithm to calculate a kind of; For example, in cases where random algorithms are used as hash functions, different random seeds can be tried to generate; But there is no perfect hash algorithm.
  • The experiment in this article also encountered this, which is to limit the number of parameters in the request, as well as the request length. Keep restrictions as small as possible without affecting business;
  • On the Web Application Firewall (WAF), use a professional Firewall to clean traffic.

The last

This article is for learning and communication only, please do not try online service easily! Don’t try online services easily! Don’t try online services easily!

If there is something wrong, I hope you can point it out.


Github standard star 81.6K Java interview shock manual, a comprehensive detailed standard P7 post

Ali P7 knowledge points collection collate notes