background

In fact, I have seen this problem before, just a few days ago, Professor Hong in a group shared a “some interesting attack method.pdf”, I think this topic should still be a lot of people are not clear, today I am ready to “actual combat”, also please see the official pat.

Qiangning Hong (Professor Hong), founder and CTO of Ein Interactive, was the chief architect of Douban and one of the founders of Python User Group in China (CPUG).

This is really big guy, the original hong professor in the appropriate letter of the time, have shared this content, but did not know did not attend. After reading it, I realized that the Timing Attack mentioned in my last article is also one of the contents. Haha, I will study it later and continue to talk about other content.

Hash conflict

What is Hash conflict? Let’s start with a Hash table (or Hash table). We know that the expected time to find an element in a Hash table is order (1). How can we do that? That’s what the hash() function is doing.

Roughly speaking, the internal storage of a hash table is similar to that of an array. Elements are stored in contiguous memory space. As long as the elements to be stored are mapped to the subscripts of the array in some way, the corresponding elements can be read by subscripts just like an array.

Hash sample

For example, suppose it is a hash function designed by me that satisfies the following conditions:

  • Hash (“hello”)=0: string “hello” stores array subscript 0;
  • Hash (“world”)=2: “world” stores array subscript 2;
  • Hash (“tangleithu”)=5: “tangleithu” stores array subscript 5;

So far, it looks pretty good, but that’s an assumption, and I can’t assume that this hash is going to map different strings to different indices perfectly.

Hash (” stone “) = 2. This is called “Hash conflict”, and the most common solutions to Hash conflicts are “open chain” methods, as well as linear, square, and so on.

There are dozens of articles on hashmaps that I won’t go into here. Let me give you a simple example to help you understand.

Hash collision open chain method

The open chain method is shown in the figure above. When we store elements, we store them as a linked list. When conflicts occur, we add the conflicting elements to the end of the list. The example above happens to be unlucky, with the strings shitou and stone both calculated with an index of 2.

That’s a big problem. The time that we expected O(1) to find elements, now it’s a linear time to find elements in a linked list, and if you insert something at this point, it’s the worst-case time. (We will not discuss the tree-to-tree case here.)

Bad guys can get in

That leaves room for the bad guys to imagine. As long as the bad guy carefully designs a set of strings to put into the hash table and makes them all have the same Hashcode, this will result in hash collisions, which will result in the CPU spending a lot of time processing hash collisions, resulting in a DoS attack.

Hash tables are all too common. In Web services, the processing of forms is usually stored in a hash table (the back end often needs to know that the corresponding parameter value is obtained by a specific parameter key).

In actual combat

In this article, Stone brother will take Java SpringBoot as an example to try to carry out an attack.

But don’t think this “Hash conflict DoS” is unique to Java, Python, Apache, Tomcat/Jetty, PHP, etc. In fact, as early as the end of 2011 was a large number of exposure, some of the framework has been gradually improved and repaired. Ocert-2011-003 Multiple Implementations denial-of-service via Hash Algorithm Collision [1]

Here, let’s list one Apatch Tomcat from CVE-2011-4858[2].

Apache Tomcat before 5.5.35, 6.x before 6.0.35, and 7.x before 7.0.23 computes hash values for form parameters without restricting the ability to trigger hash collisions predictably, which allows remote attackers to cause a denial of service (CPU consumption) by sending many crafted parameters.

The following screenshots are from Professor Hong’s PPT, but the specific source of the content is unknown (I tried to find it, but failed to find it), please refer to it.

Bandwidth required to implement hash conflicting DoS attacks

The left side shows the bandwidth required to implement this attack in different languages (frameworks), and the right side shows the CPU target of the attack. As you can see, this attack is actually quite cheap to carry out (as evidenced by the subsequent stone experiment).

Have to say “PHP is the best programming language in the world” (don’t fight), there is some truth to it, hahahahahahahahahaha (not enough for one image, add one more)

The above language sort, not necessarily right, we can refer to it, do not tangle with the specific accuracy.

In fact, the way to verify, of course, is relatively simple, just find out the different strings causing the conflict, the specific language may be different.

talk is cheap

Now follow me to try an attack, I use my own laptop test (configuration: MBP 13-inch, 2.5ghz Intel Core I7, 16 GB 2133 MHz LPDDR3).

The code below is an example of a pair of hashed string pairs, which can actually be generated by the previous permutation and combination.

System.out.println("Aa".hashCode()); System.out.println("BB".hashCode()); System.out.println("BBBBBBBBBBBBBBBBBBBBBBBBAaBBBBAa".hashCode()); System.out.println("BBBBBBBBBBBBBBBBBBBBBBBBAaBBBBBB".hashCode()); / / output 2112211220678584322067858432Copy the code

Application vulnerability Due to Non Random Hash Functions[3] Or refer to this Hash Collision DoS problem [4].

Then I started a SpringBoot (2.2.2.release) Web service, JDK 1.8 (actually 1.7 works better).

@requestMapping ("/hash")public String Hash (HttpServletRequest Request) {// Demo, HashCode int size = request.getParameterMap().size(); String key = (String)(request.getParameterMap().keySet().toArray())[0]; return String.format("size=%s, hashCode=%s", size, key.hashCode()); }Copy the code

Curl curl curl curl Curl Curl Curl Curl Curl curl curl curl curl curl curl

Curl experimental results

If the generated string is not enough, you can also add concurrent requests. You can borrow tools like “Apache Benchmarking” to send requests. I have also introduced this command performance test tool – AB simple application, interested in the reference.

Conflicts like hashcode

Click a breakpoint to see the effect, as shown above. Indeed, all hash values are the same. But a single request didn’t seem to affect a significant change in my computer’s CPU.

I tested 29859 strings and was trying to generate more conflicting strings when I took a closer look and found that the request was truncated and the parameter size returned by the request was 10000. Tomcat is rigged in SpringBoot, as shown below, because the default request size is limited to 10000.

More than the maximum number of request parameters (GET plus POST) for a single request ([10,000]) were detected. Any parameters beyond this limit have been ignored. To change this limit, set the maxParameterCount attribute on the Connector.

The number of POST parameters is limited

One way, of course, is to modify the limit on the number of request parameters. In addition, you can try JDK 1.7 to verify, should be better (reason, you know, smart reader?). . Don’t bother here, try to win by volume, submit requests concurrently with ab, and see what happens.

Here are the results of the pressure test I ran with the following parameters:

ab -c 200 -n 100000 -p req.txt 'localhost:8080/hash'
Copy the code

The results of pressure measurement are shown in the figure below:

Ab Test result of hash conflict

Then we take a look at the CPU changes, specially recorded screen to do a GIF, you can see that is relatively obvious. It went from almost no CPU usage to 39.6%, and then all of a sudden it went to 158%.

Hash collision – demo figure

This process in practical test not lasted (one is caught in the process of retry above), on the one hand, because of I use JDK 1.8, originally post-conflict lookup process has been optimized, may effect is not obvious, also suspect there may be some cache optimization, such as in the second place is not enough for the amount of 10000? Specific I also did not delve into, interested readers can try to play.