This started when a classmate asked me to help look at a problem, described as: the server stops responding once a large number of requests come in. The environment is:

  • OS: CentOS Linux Release 7.6.1810 (Core)
  • Web server: Undertow 2.0.20.Final
  • Spring Boot: 2.1.5.RELEASE
  • FRP, an intranet penetration (reverse proxy) tool

The network topology is client -> FRP (intranet penetration) -> HTTP server. Those are the preconditions.

Looking at network state and thread state

netstat

First I ran netstat -na | grep <port> to look at the network state, and used awk to count the connections in each state:

CLOSE_WAIT 613
ESTABLISHED 53
TIME_WAIT 17
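The per-state count above can also be reproduced outside the shell. The following is a minimal sketch in Java, assuming the usual Linux netstat column layout (proto, Recv-Q, Send-Q, local address, foreign address, state); the class name NetstatStateCounter and its method are hypothetical and not part of the original troubleshooting.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class NetstatStateCounter {

    /**
     * Counts connection states (the last column) for TCP lines whose
     * local address ends with ":<port>".
     */
    static Map<String, Integer> countStates(List<String> netstatLines, int port) {
        Map<String, Integer> counts = new TreeMap<>();
        String suffix = ":" + port;
        for (String line : netstatLines) {
            String[] cols = line.trim().split("\\s+");
            // Assumed columns: proto, recv-q, send-q, local, foreign, state
            if (cols.length < 6 || !cols[0].startsWith("tcp")) {
                continue;
            }
            if (!cols[3].endsWith(suffix)) {
                continue;
            }
            counts.merge(cols[5], 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
                "tcp6 0 0 :::8077 :::* LISTEN",
                "tcp6 1 174696 192.168.2.195:8077 172.16.1.10:49588 CLOSE_WAIT",
                "tcp6 1 188280 192.168.2.195:8077 172.16.1.10:49576 CLOSE_WAIT");
        System.out.println(countStates(sample, 8077));
    }
}
```

A large CLOSE_WAIT count next to a small ESTABLISHED count, as in the numbers above, is the signal worth chasing.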

The netstat output looked like this:

[root@localhost backend]# netstat -nalt | grep 8077
tcp6       0      0 :::8077                 :::*                    LISTEN
tcp6       1 174696 192.168.2.195:8077      172.16.1.10:49588       CLOSE_WAIT
tcp6       1 188280 192.168.2.195:8077      172.16.1.10:49576       CLOSE_WAIT
...

jstack

Undertow has 32 HTTP worker threads in total, and all of them were in the following state:

"XNIO-1 task-32" #69 prio=5 os_prio=0 tid=0x0000000001c8e800 nid=0x1e10 runnable [0x00007f0a20dc1000]
   java.lang.Thread.State: RUNNABLE
    at sun.nio.ch.PollArrayWrapper.poll0(Native Method)
    at sun.nio.ch.PollArrayWrapper.poll(PollArrayWrapper.java:115)
    at sun.nio.ch.PollSelectorImpl.doSelect(PollSelectorImpl.java:87)
    at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
    - locked <0x00000006cf91e858> (a sun.nio.ch.Util$3)
    - locked <0x00000006cf91e848> (a java.util.Collections$UnmodifiableSet)
    - locked <0x00000006cf91e5f0> (a sun.nio.ch.PollSelectorImpl)
    at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
    at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:101)
    at org.xnio.nio.SelectorUtils.await(SelectorUtils.java:46)
    at org.xnio.nio.NioSocketConduit.awaitWritable(NioSocketConduit.java:263)
    at org.xnio.conduits.AbstractSinkConduit.awaitWritable(AbstractSinkConduit.java:66)
    at io.undertow.conduits.ChunkedStreamSinkConduit.awaitWritable(ChunkedStreamSinkConduit.java:379)
    at org.xnio.conduits.ConduitStreamSinkChannel.awaitWritable(ConduitStreamSinkChannel.java:134)
    at io.undertow.channels.DetachableStreamSinkChannel.awaitWritable(DetachableStreamSinkChannel.java:87)
    at io.undertow.server.HttpServerExchange$WriteDispatchChannel.awaitWritable(HttpServerExchange.java:2039)
    at io.undertow.servlet.spec.ServletOutputStreamImpl.writeBufferBlocking(ServletOutputStreamImpl.java:577)
    at io.undertow.servlet.spec.ServletOutputStreamImpl.write(ServletOutputStreamImpl.java:150)
    at org.springframework.security.web.util.OnCommittedResponseWrapper$SaveContextServletOutputStream.write(OnCommittedResponseWrapper.java:639)
    at org.springframework.util.StreamUtils.copy(StreamUtils.java:143)
    at com.berry.oss.service.impl.ObjectServiceImpl.handlerResponse(ObjectServiceImpl.java:732)
...

At this point there are two suspicious things:

  1. Why are there so many connections in the CLOSE_WAIT state?
  2. Why are all worker threads blocked in write?

CLOSE_WAIT

I had already seen plenty of TIME_WAIT and CLOSE_WAIT pile-ups in CJH production issues, so I reviewed the TCP handshake and teardown again, as shown in the figure:

CLOSE_WAIT is the state of the side that has received the peer's FIN but has not yet closed its own end. Here the client has finished writing and only reads the data sent by the server. When the server finishes sending its data, it sends its own FIN packet. The code that sends the data is essentially this:


	@GetMapping("/hello")
	public void hello(HttpServletResponse response) throws IOException {
		String path = "C:\\Users\\Administrator\\Desktop\\over.mp4";
		response.setContentType("video/mp4");

		// try-with-resources closes the file even if the transfer fails midway
		try (FileInputStream fileInputStream = new FileInputStream(path)) {
			OutputStream outputStream = response.getOutputStream();
			byte[] buffer = new byte[4096];
			int bytesRead;
			// write() blocks whenever the socket cannot accept more data
			while ((bytesRead = fileInputStream.read(buffer)) != -1) {
				outputStream.write(buffer, 0, bytesRead);
			}
			outputStream.flush();
		}
	}

It is just an HTTP service that sends large files and videos, with no time-consuming operations, so I was at a loss.

Capturing packets with tcpdump

Since the problem was at the TCP level, we decided to capture packets with tcpdump to see what was happening on the network. The captured traffic looked roughly like this:

After the client sends its FIN, two things stand out:

  1. The server and the client keep sending ACK packets to each other.
  2. win is always 0 in the client's ACK packets.

So what does win stand for?

TCP window

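Looking at the TCP header format:

The window size field tells the peer how many more bytes the sender of the segment is currently willing to receive. A window of 0 means "stop sending until I advertise a larger window".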
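To make the field concrete, here is a minimal sketch in Java that reads the ports, flags, and window out of the fixed 20-byte TCP header. The offsets follow the standard header layout (flags in byte 13, window in bytes 14 and 15); the class name TcpHeaderSketch is hypothetical, and the sketch ignores options such as window scaling, which multiplies a non-zero window but leaves a zero window at zero.

```java
import java.nio.ByteBuffer;

final class TcpHeaderSketch {
    final int sourcePort;
    final int destinationPort;
    final int flags;
    final int window;

    private TcpHeaderSketch(int sourcePort, int destinationPort, int flags, int window) {
        this.sourcePort = sourcePort;
        this.destinationPort = destinationPort;
        this.flags = flags;
        this.window = window;
    }

    /** Parses the fixed part of a TCP header (network byte order). */
    static TcpHeaderSketch parse(byte[] header) {
        if (header.length < 20) {
            throw new IllegalArgumentException("TCP header needs at least 20 bytes");
        }
        ByteBuffer buf = ByteBuffer.wrap(header); // ByteBuffer is big-endian by default
        int src = buf.getShort(0) & 0xFFFF;
        int dst = buf.getShort(2) & 0xFFFF;
        int flags = buf.get(13) & 0xFF;           // CWR ECE URG ACK PSH RST SYN FIN
        int window = buf.getShort(14) & 0xFFFF;   // unscaled receive window
        return new TcpHeaderSketch(src, dst, flags, window);
    }

    boolean isFin()  { return (flags & 0x01) != 0; }
    boolean isPush() { return (flags & 0x08) != 0; }
    boolean isAck()  { return (flags & 0x10) != 0; }

    /** True when the sender of this segment cannot accept any more data. */
    boolean isZeroWindow() { return window == 0; }
}
```

An ACK with isZeroWindow() true corresponds to the "win 0" seen in the capture: the receiver is acknowledging data but refusing more.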

Problem summary

So the main problem is either a bug in the FRP proxy or an underpowered machine in the deployment: the client side keeps advertising a zero window, so the server stays blocked in write and cannot continue. After removing the proxy, everything returned to normal.

TCP behavior when closing a Java socket

I think this part is very important, and I had never paid attention to it before (admittedly, I rarely get the chance to do socket programming). After some reading, my summary is:

  1. close() closes both reading and writing. The kernel tries to send everything still in the send queue (Send-Q) and then sends a FIN packet. After the peer receives the FIN, its read returns -1. If setSoLinger() is configured and sending the queued data times out, an RST packet is sent to tear the connection down. If the peer keeps writing, the side that has already called close() answers with an RST, and the peer gets a SocketException on its second write.
  2. shutdownOutput() sends a FIN packet and stops writing, but the socket is still readable.
  3. shutdownInput() stops reading; packets from the peer are still ACKed, and the data is discarded.
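The half-close in point 2 can be tried on the loopback interface. The sketch below is a minimal illustration, not code from the incident; the class name HalfCloseDemo and the "got N" reply are made up for the example. The client writes, calls shutdownOutput(), and can still read the reply; on the server side, read() returns -1 once the client's FIN arrives, and between that moment and its own close() the server socket sits in CLOSE_WAIT.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

final class HalfCloseDemo {

    /** Runs one client/server exchange on loopback and returns the server's reply. */
    static String run() {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            Thread serverThread = new Thread(() -> serveOnce(server));
            serverThread.start();

            try (Socket client = new Socket(server.getInetAddress(), server.getLocalPort())) {
                client.setSoTimeout(5000);
                client.getOutputStream().write("hello".getBytes(StandardCharsets.US_ASCII));
                client.shutdownOutput(); // sends FIN; the read side stays open

                StringBuilder reply = new StringBuilder();
                InputStream in = client.getInputStream();
                byte[] buf = new byte[64];
                int n;
                while ((n = in.read(buf)) != -1) { // -1 once the server closes its side
                    reply.append(new String(buf, 0, n, StandardCharsets.US_ASCII));
                }
                serverThread.join(5000);
                return reply.toString();
            }
        } catch (IOException | InterruptedException e) {
            throw new IllegalStateException(e);
        }
    }

    private static void serveOnce(ServerSocket server) {
        try (Socket peer = server.accept()) {
            peer.setSoTimeout(5000);
            InputStream in = peer.getInputStream();
            int count = 0;
            while (in.read() != -1) { // returns -1 after the client's FIN
                count++;
            }
            // From here until close(), this socket is in CLOSE_WAIT.
            peer.getOutputStream().write(("got " + count).getBytes(StandardCharsets.US_ASCII));
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

If the server thread blocked in write() at the marked point instead of returning, the socket would stay in CLOSE_WAIT indefinitely, which is the pattern seen in the netstat output.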

Reference:

Blog.csdn.net/zlfing/arti…

HTTP/2

At this point the CLOSE_WAIT problem had been located, but while testing I noticed something very strange. I ran tcpdump on the server and used my browser as the client. After starting the video I closed the page right away, yet I never captured a FIN packet and the connection stayed ESTABLISHED. The server, however, threw java.nio.channels.ClosedChannelException and stopped sending data. That confused me: if the connection still exists, what stopped the transfer on the server?

After countless attempts, a clue finally emerged:

00:52:57.745946 IP 116.237.229.239.64575 > izuf6buyhgwtrvp2bv981yz.8077: Flags [P.], seq 1400:1442, ack 2399686, win 513, length 42

Every time I closed the page, I captured a 42-byte segment sent by the client. The P (PSH) flag asks the receiver to hand the data to the application immediately instead of buffering it. So it looked as if the server stopped transmitting because of application-layer behavior, not because of anything TCP did. After a long time of aimless testing, I finally noticed the protocol h2 in the browser console, which stands for HTTP/2. For background on HTTP/2, see:

blog.wangriyu.wang/2018/05-HTTP…

There is already a lot of information about HTTP/2 on the web, so I will only note the key points:

HTTP/2 multiplexing

HTTP/1.1 already supports TCP connection reuse (keep-alive), but only one HTTP request/response can be in flight on a TCP connection at a time. Even with pipelining, where many requests are sent at once, the server still processes them and responds in FIFO order. In HTTP/2, the client and server need only one TCP connection. Each HTTP request/response exchange is called a stream, and the frame is the unit of transmission. Each frame identifies the stream it belongs to and has a type that controls the state of the stream:

  • HEADERS: HEADERS frame (type=0x1), used to open a stream or carry a header block fragment
  • DATA: DATA frame (type=0x0), which fills the body information. One or more DATA frames can be used to return the response body of a request
  • PRIORITY: priority frame (type=0x2), which specifies the stream priority suggested by the sender. PRIORITY frames can be sent in any stream state, including idle and closed streams
  • RST_STREAM: stream termination frame (type=0x3), used to request the cancellation of a stream or to indicate that an error has occurred. The payload is a 32-bit unsigned error code. RST_STREAM frames cannot be sent on streams in the idle state
  • SETTINGS: SETTINGS frame (type=0x4), which sets the parameters for this connection and applies to the entire connection
  • PUSH_PROMISE: a server push frame (type=0x5). The client can return a RST_STREAM frame to reject a push stream
  • PING: PING frame (type=0x6) to determine whether an idle connection is still available, and also to measure minimum round trip time (RTT)
  • GOAWAY: GOAWAY frame (type=0x7), used to initiate closing a connection or to warn of a serious error. After GOAWAY the sender stops accepting new streams and finishes processing previously established streams before closing the connection
  • WINDOW_UPDATE: window update frame (type=0x8), used for flow control. It can apply to a single stream (by specifying its stream identifier) or to the entire connection (stream identifier 0x0). Only DATA frames are subject to flow control. After the flow-control window is initialized, it shrinks by the size of every payload sent. When the window is too small to send more, a WINDOW_UPDATE frame from the receiver enlarges it
  • CONTINUATION: a CONTINUATION frame (type=0x9) used to continue the transmission of a sequence of header block fragments, see header compression and decompression
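Every frame type in the list shares the same 9-byte header: a 24-bit payload length, an 8-bit type, 8 bits of flags, and one reserved bit followed by a 31-bit stream identifier. As a minimal sketch (the class name Http2FrameSketch is hypothetical, and only the RST_STREAM payload is decoded), the header can be parsed in Java like this:

```java
import java.nio.ByteBuffer;

final class Http2FrameSketch {
    private static final String[] TYPE_NAMES = {
        "DATA", "HEADERS", "PRIORITY", "RST_STREAM", "SETTINGS",
        "PUSH_PROMISE", "PING", "GOAWAY", "WINDOW_UPDATE", "CONTINUATION"
    };

    final int length;
    final int type;
    final int flags;
    final int streamId;
    private final byte[] raw;

    private Http2FrameSketch(int length, int type, int flags, int streamId, byte[] raw) {
        this.length = length;
        this.type = type;
        this.flags = flags;
        this.streamId = streamId;
        this.raw = raw;
    }

    /** Parses the 9-byte frame header; the payload starts at offset 9. */
    static Http2FrameSketch parse(byte[] frame) {
        if (frame.length < 9) {
            throw new IllegalArgumentException("HTTP/2 frame header needs 9 bytes");
        }
        int length = ((frame[0] & 0xFF) << 16) | ((frame[1] & 0xFF) << 8) | (frame[2] & 0xFF);
        int type = frame[3] & 0xFF;
        int flags = frame[4] & 0xFF;
        // Mask off the reserved high bit to get the 31-bit stream identifier.
        int streamId = ByteBuffer.wrap(frame, 5, 4).getInt() & 0x7FFFFFFF;
        if (frame.length < 9 + length) {
            throw new IllegalArgumentException("truncated payload");
        }
        return new Http2FrameSketch(length, type, flags, streamId, frame);
    }

    String typeName() {
        return type < TYPE_NAMES.length ? TYPE_NAMES[type] : "UNKNOWN";
    }

    /** For RST_STREAM frames only: the 32-bit unsigned error code in the payload. */
    long rstStreamErrorCode() {
        if (type != 0x3 || length != 4) {
            throw new IllegalStateException("not an RST_STREAM frame");
        }
        return ByteBuffer.wrap(raw, 9, 4).getInt() & 0xFFFFFFFFL;
    }
}
```

For example, the 13 bytes 00 00 04 03 00 00 00 00 03 00 00 00 08 parse as an RST_STREAM frame on stream 3 with error code 8, which HTTP/2 defines as CANCEL.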

How HTTP is upgraded to HTTP/2

This is what I was curious about: where exactly do the client and server negotiate the upgrade to HTTP/2? Reference:

Imququ.com/post/protoc…

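For SPDY, Google developed a TLS extension named NPN (Next Protocol Negotiation), which lets the two sides select the application protocol during the TLS handshake. Its standardized successor, ALPN (Application-Layer Protocol Negotiation), is what HTTP/2 uses. The client's handshake contains: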

Extension: application_layer_protocol_negotiation (len=14)
    Type: application_layer_protocol_negotiation (16)
    Length: 14
    ALPN Extension Length: 12
    ALPN Protocol
        ALPN string length: 2
        ALPN Next Protocol: h2
        ALPN string length: 8
        ALPN Next Protocol: http/1.1

You can see that the client offers h2 and http/1.1 to the server, and the server's handshake returns:

Extension: application_layer_protocol_negotiation (len=5)
    Type: application_layer_protocol_negotiation (16)
    Length: 5
    ALPN Extension Length: 3
    ALPN Protocol
        ALPN string length: 2
        ALPN Next Protocol: h2


This is how HTTP/2 is negotiated during the handshake.
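The lengths in the two captures add up once the encoding is spelled out: the ALPN extension body is a 2-byte list length followed by entries of one length byte plus the protocol name. A minimal Java sketch of decoding that list follows; the class name AlpnListSketch is hypothetical, and the input is assumed to be the extension body only, without the TLS extension type and length fields.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class AlpnListSketch {

    /** Decodes an ALPN protocol name list: 2-byte list length, then (1-byte length, name) entries. */
    static List<String> parseProtocolList(byte[] body) {
        if (body.length < 2) {
            throw new IllegalArgumentException("missing list length");
        }
        int listLength = ((body[0] & 0xFF) << 8) | (body[1] & 0xFF);
        if (listLength != body.length - 2) {
            throw new IllegalArgumentException("list length does not match body size");
        }
        List<String> protocols = new ArrayList<>();
        int pos = 2;
        while (pos < body.length) {
            int nameLength = body[pos++] & 0xFF;
            if (nameLength == 0 || pos + nameLength > body.length) {
                throw new IllegalArgumentException("bad protocol name length");
            }
            protocols.add(new String(body, pos, nameLength, StandardCharsets.US_ASCII));
            pos += nameLength;
        }
        return protocols;
    }

    public static void main(String[] args) {
        // The client's offer: 14 bytes = 2 (list length) + 3 ("h2") + 9 ("http/1.1")
        byte[] clientOffer = {0, 12, 2, 'h', '2', 8, 'h', 't', 't', 'p', '/', '1', '.', '1'};
        System.out.println(parseProtocolList(clientOffer));
    }
}
```

The server's 5-byte reply decodes the same way to a single entry, h2, which is the protocol both sides then speak.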

Capturing HTTP/2 packets with Wireshark

Browsers only use HTTP/2 over HTTPS, so by default Wireshark cannot see the contents of the packets.

Following zhuanlan.zhihu.com/p/36669377, I was eventually able to see what the 42-byte packet contained:

HyperText Transfer Protocol 2
    Stream: RST_STREAM, Stream ID: 3, Length 4
        Length: 4
        Type: RST_STREAM (3)
        Flags: 0x00
        0... .... .... .... .... .... .... .... = Reserved: 0x0
        .000 0000 0000 0000 0000 0000 0000 0011 = Stream Identifier: 3
        Error: CANCEL (8)

The type of this frame is RST_STREAM, which tells the application layer on the server to stop the stream. At this point all doubts were cleared up.

Final summary

Working through this made me realize how much I still have to learn.