This article is sponsored by Yu Gang Shuo Writing Platform [1]

Crystal Shrimp Dumplings [2]

Copyright: this article belongs to the WeChat official account Yu Gang Shuo and may not be reproduced in any form without permission.

We’ll take a brief look at TCP/IP and then learn the Java Socket API by implementing an echo server. Finally, we’ll move on to two more advanced topics: long-lived socket connections and protocol design.

Introduction to TCP/IP

IP

First, let’s look at the Internet Protocol (IP). IP provides communication between hosts.

To communicate between different hosts, we need some way to uniquely identify a host; this identifier is known as an IP address. Using the IP address, IP can deliver a packet to the host we want to reach.

TCP

As we said earlier, IP provides communication between hosts. TCP builds on the communication provided by IP to implement process-to-process communication between two hosts.

With IP, different hosts can exchange data. However, when a host receives data, it does not know which process the data belongs to (put simply, a process is a running application). The purpose of TCP is to tell us which process the data belongs to, thus completing communication between processes.

To identify which process the data belongs to, we assign a unique number to each process that needs to communicate over TCP. This number is referred to as the port number.

TCP stands for Transmission Control Protocol, and the feature most people mention first is that it is connection-oriented. It is called connection-oriented because, before communicating, the two parties must go through a three-way handshake. Once the handshake completes, the connection is established and we can start sending and receiving data. (By contrast, UDP sends data directly without a handshake.)

Let’s take a quick look at the three-way handshake process.

  1. First, the client sends a SYN to the server; assume its sequence number is x. This x is generated by the operating system according to certain rules; think of it as a random number.
  2. After receiving the SYN, the server sends a SYN of its own to the client, with seq number = y. At the same time, it sends ACK x+1, telling the client “I received your SYN; you may send data.”
  3. After receiving the server’s SYN, the client replies with ACK y+1. This ACK tells the server “I received your SYN; you may send data.”

After these three steps, the TCP connection is established. There are three things to note here:

  1. The connection is initiated by the client
  2. In step 3, the TCP protocol allows the ACK that the client sends to the server to carry data. In practice this does not happen because of the limitations of the API.
  3. The TCP protocol also allows a “four-way handshake” to occur, but again, due to the limitations of the API, this extreme case does not occur.
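
To make the sequence and acknowledgment numbers concrete, here is a toy model of the three segments. It only does the arithmetic and opens no connection. HandshakeSketch, ackFor and trace are names invented for this illustration, and the initial sequence numbers in main are arbitrary.

```java
final class HandshakeSketch {

    /** ACK number that answers a SYN with the given sequence number (TCP sequence numbers are 32-bit). */
    static long ackFor(long seq) {
        return (seq + 1) & 0xFFFFFFFFL;
    }

    /** The three segments of the handshake for initial sequence numbers x (client) and y (server). */
    static String[] trace(long x, long y) {
        return new String[] {
            "client -> server: SYN seq=" + x,
            "server -> client: SYN seq=" + y + ", ACK " + ackFor(x),
            "client -> server: ACK " + ackFor(y)
        };
    }

    public static void main(String[] args) {
        for (String segment : trace(100, 300)) {
            System.out.println(segment);
        }
    }
}
```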

That covers the TCP/IP theory we need. TCP has other very interesting features, such as reliability, flow control, and congestion control. I highly recommend W. Richard Stevens’ classic TCP/IP Illustrated, Volume 1 (note that I mean the first edition, not the second).

Now let’s look at something a little bit more practical.

Basic Socket Usage

A socket is an encapsulation of the TCP layer; through a socket, we can carry out TCP communication.

In the Java SDK, there are two classes for sockets: ServerSocket for listening for client connections and Socket for communication. The procedure for using sockets is as follows:

  1. Create a ServerSocket and listen for client connections
  2. Use a Socket to connect to the server
  3. Get the input and output streams from the Socket to communicate

Next, let’s learn how to use sockets by implementing a simple echo service. In an echo service, the client writes arbitrary data to the server, and the server writes the same data back to the client unchanged.

1. Create ServerSocket and listen for client connections

import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;

public class EchoServer {

    private final ServerSocket mServerSocket;

    public EchoServer(int port) throws IOException {
        // 1. Create a ServerSocket and listen on the given port
        mServerSocket = new ServerSocket(port);
    }

    public void run() throws IOException {
        // 2. Start accepting client connections
        Socket client = mServerSocket.accept();
        handleClient(client);
    }

    private void handleClient(Socket socket) {
        // 3. Use the socket to communicate...
    }

    public static void main(String[] argv) {
        try {
            EchoServer server = new EchoServer(9877);
            server.run();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

2. Use the Socket to connect to the server

import java.io.IOException;
import java.net.Socket;

public class EchoClient {

    private final Socket mSocket;

    public EchoClient(String host, int port) throws IOException {
        // Create the socket and connect to the server
        mSocket = new Socket(host, port);
    }

    public void run() {
        // Communicate with the server
    }

    public static void main(String[] argv) {
        try {
            // Since the server runs on the same host, we use localhost here
            EchoClient client = new EchoClient("localhost", 9877);
            client.run();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

3. Obtain the input/output streams through socket.getInputStream()/getOutputStream() for communication

First, let’s implement the server:

public class EchoServer {
    // ...

    private void handleClient(Socket socket) throws IOException {
        InputStream in = socket.getInputStream();
        OutputStream out = socket.getOutputStream();
        byte[] buffer = new byte[1024];
        int n;
        // read() returns -1 once the client closes the connection
        while ((n = in.read(buffer)) > 0) {
            out.write(buffer, 0, n);
        }
    }
}

As you can see, the server-side implementation is quite simple: we keep reading the input data and writing it back to the client.

Now let’s look at the client.

public class EchoClient {
    // ...

    public void run() throws IOException {
        // Read the server's responses on a separate thread
        Thread readerThread = new Thread(this::readResponse);
        readerThread.start();

        // Forward everything the user types to the server
        OutputStream out = mSocket.getOutputStream();
        byte[] buffer = new byte[1024];
        int n;
        while ((n = System.in.read(buffer)) > 0) {
            out.write(buffer, 0, n);
        }
    }

    private void readResponse() {
        try {
            InputStream in = mSocket.getInputStream();
            byte[] buffer = new byte[1024];
            int n;
            while ((n = in.read(buffer)) > 0) {
                System.out.write(buffer, 0, n);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

The client is a little more complicated: we need to read the user’s input and, at the same time, read the server’s response. So we create a separate thread to read the server’s response.

For those unfamiliar with method references and lambdas, Thread readerThread = new Thread(this::readResponse) can be replaced with the following code:

Thread readerThread = new Thread(new Runnable() {
    @Override
    public void run() {
        readResponse();
    }
});

Open two terminals. In the first, compile and start the server:

$ javac EchoServer.java
$ java EchoServer

In the second, compile and run the client, then type some text:

$ javac EchoClient.java
$ java EchoClient
hello Server
hello Server
foo
foo

On the client side, we’ll see that everything we type is printed back to us.

A few final points to note:

  1. In the code above, we do not really handle any exceptions. In a real application, when an exception occurs, you need to close the socket and handle the error according to your business logic.
  2. On the client side, we never stop the reader thread (readerThread). In practice, we can make the thread return from a blocking read by closing the socket. The book Java Concurrency in Practice is recommended reading on this topic.
  3. Our server handles only one client connection. If you need to serve more than one client at a time, you can create a thread per client to process requests. This is left to the reader as an exercise.
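
For the third point, one possible shape of a multi-client server is sketched below. It is a minimal sketch rather than the article’s implementation: ThreadedEchoServer, echo() and serve() are names made up for this example, it starts one thread per client with no upper bound, and it reuses port 9877 from the example above. The try-with-resources block also addresses the first point by closing the socket when an exception occurs.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.ServerSocket;
import java.net.Socket;

final class ThreadedEchoServer {

    /** Copies everything from in to out until the peer closes its side of the connection. */
    static void echo(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[1024];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        out.flush();
    }

    /** Accepts clients forever and serves each one on its own thread. */
    static void serve(ServerSocket serverSocket) throws IOException {
        while (true) {
            Socket client = serverSocket.accept();
            new Thread(() -> {
                // try-with-resources closes the socket even if echo() throws
                try (Socket s = client) {
                    echo(s.getInputStream(), s.getOutputStream());
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }).start();
        }
    }

    public static void main(String[] args) throws IOException {
        try (ServerSocket serverSocket = new ServerSocket(9877)) {
            serve(serverSocket);
        }
    }
}
```

A server that expects many clients would more likely hand the work to a bounded thread pool instead of creating threads without limit.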

Telling Socket and ServerSocket apart

Before we get into the topic of this section, consider the following question: in the example from the previous section, once the client has connected to the running echo service, how many sockets exist?

The answer is three: one on the client and two on the server. The answer to this question leads directly to the topic of this section: what is the difference between Socket and ServerSocket?

Sharp-eyed readers may have noticed that in the last section I described them this way:

In the Java SDK, there are two classes for sockets: ServerSocket for listening for client connections and Socket for communication.

Note that I only said ServerSocket is used to listen for client connections; I did not say that it can be used for communication. Let’s take a closer look at the differences.

Note: the following description uses the UNIX/Linux system APIs.

First, when we create a ServerSocket, the kernel creates a socket. At this point the socket could be used either to listen for client connections or to connect to a remote service. Since ServerSocket is meant for listening, it then calls listen on the socket created by the kernel. The socket thereby becomes a so-called listening socket and starts listening for client connections.

Next, our client creates a Socket, and the kernel again creates a socket instance. Initially, this socket is no different from the one created for the ServerSocket. The difference is that Socket then calls connect on it, initiating a connection to the server. As mentioned earlier, the Socket API encapsulates the TCP layer, so after connect is called, the kernel sends a SYN to the server.

Now, let’s switch to the server’s point of view. After receiving the SYN, the server host’s kernel creates a new socket, which carries on the three-way handshake with the client.

After the three-way handshake completes, our call to ServerSocket.accept() returns a Socket instance that wraps the socket the kernel automatically created for us in the previous step.

So, in the case of one client connection, there are actually three sockets.

There’s another interesting thing about the socket that the kernel creates automatically: its port number is the same as the ServerSocket’s. Wait, isn’t a port supposed to be bound to only one socket? That’s not exactly true.

What I said earlier about TCP using the port number to identify which process the data belongs to needs to be refined when it comes to the socket implementation. The kernel does not tell socket instances apart by port number alone; it uses the four-tuple <remote address:remote port, local address:local port>.

In the example above, our ServerSocket looks like this: <*:*, *:9877>. That means it accepts connections from any client, on any local IP.

The Socket returned by accept() looks like this: <127.0.0.1:xxxx, 127.0.0.1:9877>, where xxxx is the client’s port number.

When data arrives for an established connection, the kernel finds the socket that matches exactly, so the data is delivered to the right connected socket.

When a client initiates a new connection, only <*:*, *:9877> matches, so the SYN is delivered to the listening socket.

So much for the distinction between Socket and ServerSocket. Readers who want more can refer to TCP/IP Illustrated, Volumes 1 and 2.
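
The shared port number can be observed from Java. The following sketch assumes the loopback interface is available and uses a made-up class name, SocketPairDemo. It opens a listening socket on a free port, connects a client, and collects the port numbers of the three sockets involved: the socket returned by accept() should report the same local port as the listening socket, and its remote port should be the client’s local port.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

final class SocketPairDemo {

    /**
     * Returns { listening socket's port, accepted socket's local port,
     *           accepted socket's remote port, client socket's local port }.
     */
    static int[] ports() throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // Port 0 asks the OS for any free port; a backlog of 1 is enough for one client.
        try (ServerSocket listening = new ServerSocket(0, 1, loopback);
             Socket client = new Socket(loopback, listening.getLocalPort());
             Socket accepted = listening.accept()) {
            return new int[] {
                listening.getLocalPort(),
                accepted.getLocalPort(),
                accepted.getPort(),
                client.getLocalPort()
            };
        }
    }

    public static void main(String[] args) throws IOException {
        int[] p = ports();
        System.out.println("listening socket local port: " + p[0]);
        System.out.println("accepted socket local port:  " + p[1]);
        System.out.println("accepted socket remote port: " + p[2]);
        System.out.println("client socket local port:    " + p[3]);
    }
}
```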

Implementing a long-lived socket connection

Background knowledge

A long-lived socket connection (a “long connection”) is a socket connection between the client and the server that is kept open for a long time.

Those of you who are familiar with sockets may know that there is an API for this:

socket.setKeepAlive(true);

Hmm… “keep alive” sounds like it keeps the TCP connection open. So to implement a long-lived socket connection, it seems we need only this one call.

Unfortunately, life isn’t always so beautiful. In the 4.4BSD implementation, once the socket’s keep-alive option is turned on, the kernel sends a probe to check whether the peer is still alive only after two hours without any traffic.

Mind you, that is once every two hours. In other words, if I unplug the network cable while no application data is being exchanged, your application will not find out for two hours.

Before explaining how to implement a long-lived connection, let’s look at the problems we face. Given a pair of connected sockets, the connection becomes unusable in the following cases:

  1. One end closes the socket. The side that actively closes sends a FIN to tell the other side to close the TCP connection. If the other end then tries to read from the socket, it reads EOF (end of file), so it knows that the peer has closed the socket.
  2. The application crashes. The kernel then closes the socket, and the rest is the same as case 1.
  3. The system crashes. The system has no chance to send a FIN because it is already down, so the peer has no way of learning what happened. When the peer tries to read, the read eventually times out; when it tries to write, it gets an error such as host unreachable.
  4. A cable is cut or the network cable is unplugged. This is similar to case 3: if nobody reads from or writes to the socket, neither side knows that anything has gone wrong. Unlike case 3, if we plug the cable back in, the socket can still work normally.

In all of these cases, we can tell whether the connection is broken by reading from and writing to the socket. Based on this, to implement a long-lived socket connection, we keep writing data to the peer and reading the peer’s replies; this is the so-called heartbeat. As long as the heart is beating, the socket is alive. The interval between writes depends on the application’s requirements.

Heartbeat packets are not real business data, and how they are handled depends on the communication protocol.

For example, if we use JSON for communication, we can add a type field that indicates whether the JSON message is a heartbeat or business data.

{
    "type": 0,  // 0 indicates a heartbeat
    // ...
}

The situation is similar with binary protocols. The only requirement is that we can distinguish heartbeat packets from real data. With that, we have implemented a long-lived socket connection.
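
For a binary protocol, one possible layout is a one-byte type tag followed by a length-prefixed payload. The sketch below is an assumption made for illustration only: TypedFrame, its type codes, and the wire layout are all hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

final class TypedFrame {
    // Hypothetical type codes, chosen for this sketch only.
    static final byte TYPE_HEARTBEAT = 0;
    static final byte TYPE_DATA = 1;

    final byte type;
    final byte[] payload;

    TypedFrame(byte type, byte[] payload) {
        this.type = type;
        this.payload = payload;
    }

    boolean isHeartbeat() {
        return type == TYPE_HEARTBEAT;
    }

    /** Wire layout: 1-byte type, 4-byte big-endian payload length, payload bytes. */
    void writeTo(DataOutputStream out) throws IOException {
        out.writeByte(type);
        out.writeInt(payload.length);
        out.write(payload);
    }

    static TypedFrame readFrom(DataInputStream in) throws IOException {
        byte type = in.readByte();
        int len = in.readInt();
        if (len < 0) {
            throw new IOException("negative payload length: " + len);
        }
        byte[] payload = new byte[len];
        in.readFully(payload);  // blocks until the whole payload has arrived
        return new TypedFrame(type, payload);
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        new TypedFrame(TYPE_HEARTBEAT, new byte[0]).writeTo(out);
        new TypedFrame(TYPE_DATA, "hello".getBytes("UTF-8")).writeTo(out);

        DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()));
        for (int i = 0; i < 2; i++) {
            TypedFrame frame = readFrom(in);
            System.out.println(frame.isHeartbeat() ? "heartbeat" : "data, " + frame.payload.length + " bytes");
        }
    }
}
```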

Implementation examples

In this section, let’s implement an Android Echo client with a long connection.

First, the interface part:

public final class LongLiveSocket {

    /** Error callback */
    public interface ErrorCallback {
        /** Return true if reconnection is required */
        boolean onError();
    }

    /** Read data callback */
    public interface DataCallback {
        void onData(byte[] data, int offset, int len);
    }

    /** Write callback */
    public interface WritingCallback {
        void onSuccess();
        void onFail(byte[] data, int offset, int len);
    }

    public LongLiveSocket(String host, int port,
                          DataCallback dataCallback, ErrorCallback errorCallback) {}

    public void write(byte[] data, WritingCallback callback) {}

    public void write(byte[] data, int offset, int len, WritingCallback callback) {}

    public void close() {}
}

Our class that supports long-lived connections is called LongLiveSocket. If the socket needs to be reconnected after a disconnection, the caller just returns true from the corresponding callback (in a real scenario, we would also let the caller configure the reconnection delay, the read and write timeouts, the connection timeout, and so on; for simplicity, they are not supported here).

Another thing to note is that a complete library would need to provide both blocking and callback-based APIs. Again, to save space, I’ll skip that here.

First let’s look at the write() method:

public void write(byte[] data, int offset, int len, WritingCallback callback) {
    mWriterHandler.post(() -> {
        Socket socket = getSocket();
        if (socket == null) {
            // initSocket() failed and the caller declined to reconnect,
            // yet the caller is asking us to send data again
            throw new IllegalStateException("Socket not initialized");
        }
        try {
            OutputStream outputStream = socket.getOutputStream();
            DataOutputStream out = new DataOutputStream(outputStream);
            out.writeInt(len);
            out.write(data, offset, len);
            callback.onSuccess();
        } catch (IOException e) {
            Log.e(TAG, "write: ", e);
            // Close the socket to avoid leaking resources
            closeSocket();
            // Hand the failed data back so the caller can resend it easily
            callback.onFail(data, offset, len);
            if (!closed() && mErrorCallback.onError()) {
                // Reconnect
                initSocket();
            }
        }
    });
}

Since we need to write heartbeats on a timer, we use a HandlerThread to handle write requests. The protocol used for communication simply prefixes the user data with a len field that gives the length of the message.

Let’s look at the heartbeat transmission:

private final Runnable mHeartBeatTask = new Runnable() {
    private byte[] mHeartBeat = new byte[0];

    @Override
    public void run() {
        // We use a zero-length message as the heartbeat
        write(mHeartBeat, new WritingCallback() {
            @Override
            public void onSuccess() {
                // Send the next heartbeat after HEART_BEAT_INTERVAL_MILLIS
                mWriterHandler.postDelayed(mHeartBeatTask, HEART_BEAT_INTERVAL_MILLIS);
                mUIHandler.postDelayed(mHeartBeatTimeoutTask, HEART_BEAT_TIMEOUT_MILLIS);
            }

            @Override
            public void onFail(byte[] data, int offset, int len) {
                // nop
                // The write() method handles the failure
            }
        });
    }
};

private final Runnable mHeartBeatTimeoutTask = () -> {
    Log.e(TAG, "mHeartBeatTimeoutTask#run: heart beat timeout");
    closeSocket();
};

The heartbeat is sent using the write() method we implemented above. After a heartbeat is sent successfully, we post a delayed timeout task. If no response has arrived from the server by the time that task runs, we consider the socket broken and close it directly. Finally, here is how a received heartbeat is handled:

int nbyte = in.readInt();
if (nbyte == 0) {
    Log.i(TAG, "readResponse: heart beat received");
    mUIHandler.removeCallbacks(mHeartBeatTimeoutTask);
}

Since the length of user data is always at least 1, we use len == 0 to mark a heartbeat. When a heartbeat is received, we remove mHeartBeatTimeoutTask.

The rest of the code is not closely related to our topic; the reader can either find the complete code here [3] or complete the example themselves.

Finally, if you want to save resources, you can skip the heartbeat while the caller is sending data.

The way we handle read errors may be debatable. After a read error, we just close the socket; the socket is not reconnected until the next write. If that is a problem in practice, you can start reconnecting right after the read error. In that case, some additional synchronization is required to avoid creating the socket twice. The same goes for the heartbeat timeout.

Learning protocol design from TCP/IP

If we just want to use sockets, we can ignore the protocol details. I nevertheless recommend reading TCP/IP Illustrated because there is so much to learn from it. Many of the problems we encounter in our work have answers there.

Each of the following subsections has a small question as its title; readers are encouraged to think about it themselves before reading on. If you find your answer is better than mine, be sure to send me an email at ljtong64 AT gmail DOT com.

How do I upgrade the protocol version?

As the popular saying goes, the only constant in the world is change. When we upgrade a protocol, correctly identifying the different protocol versions is essential for software compatibility. So how do we design a protocol to prepare for future version upgrades?

The answer can be found in the IP protocol.

The first field of the IP header is version. Its value is currently 4 or 6, indicating IPv4 and IPv6 respectively. Since this field is at the very beginning of the header, the receiver can tell whether a packet is IPv4 or IPv6 from the value of this first field.

To stress the point: this field is the first field in both versions of IP. For compatibility, the version field must stay in the same position across versions. The same applies to text protocols (e.g. JSON, HTML).
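
As a sketch of the same idea in an application protocol, the parser below reads the version from a fixed position before looking at anything else. It assumes a one-byte version at offset 0 (IP itself uses a 4-bit field); VersionedHeader and the version constants are hypothetical names for this example.

```java
final class VersionedHeader {
    // Hypothetical version numbers for an application protocol.
    static final int VERSION_1 = 1;
    static final int VERSION_2 = 2;

    /** Reads the version from the first byte, which every version keeps in the same place. */
    static int versionOf(byte[] message) {
        if (message.length == 0) {
            throw new IllegalArgumentException("empty message");
        }
        return message[0] & 0xFF;
    }

    /** Chooses a parser based on the version; the rest of the layout may differ per version. */
    static String describe(byte[] message) {
        int version = versionOf(message);
        switch (version) {
            case VERSION_1:
                return "v1 message, " + (message.length - 1) + " body bytes";
            case VERSION_2:
                return "v2 message, " + (message.length - 1) + " body bytes";
            default:
                return "unsupported version " + version;
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(new byte[] {1, 10, 20}));
        System.out.println(describe(new byte[] {9}));
    }
}
```

Because an unknown version is detected before any other field is interpreted, an old receiver can reject or ignore a newer message instead of misreading it.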

How do I send packets with variable-length data

For example, suppose we use WeChat to send a message. The length of a message is not fixed, and each message has its own boundary. How do we deal with this boundary?

Again, look at IP. The IP header has two length fields: Header Length and Total Length. By adding a similar len field to our own protocol, we can split the data into messages according to the application logic.

An alternative is to place a terminator at the end of the data. For example, as with C strings, we put a \0 at the end of each piece of data to mark the end of a message. The problem with this approach is that the user’s data may itself contain \0, so we need to escape the user’s data. For example, change every \0 in the user data to \0\0. When reading a message, \0\0 then stands for a literal \0, and a single \0 marks the end of the message.

The advantage of the len field is that we don’t need to escape the data. Reading all the data in one go according to the len field is also much more efficient.

The terminator scheme requires us to scan the data. However, if we need to be able to start reading at an arbitrary position in the stream, we need the terminator to determine where a message begins.

Of course, the two methods are not mutually exclusive and can be used together.
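
A sketch of the len-field approach follows. It assumes a 4-byte big-endian length prefix, which is the format DataOutputStream.writeInt() produces in the write() method shown earlier; LengthPrefixedFraming, frame and extract are made-up names. Because TCP is a byte stream, one read may contain several messages or only part of one, so extract() returns the complete messages and leaves an incomplete tail in the buffer.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

final class LengthPrefixedFraming {

    /** Prefixes the payload with its length as a 4-byte big-endian integer. */
    static byte[] frame(byte[] payload) {
        return ByteBuffer.allocate(4 + payload.length)
                .putInt(payload.length)
                .put(payload)
                .array();
    }

    /**
     * Extracts every complete message from buf (which must be ready for reading)
     * and leaves the bytes of an incomplete trailing message unread.
     */
    static List<byte[]> extract(ByteBuffer buf) {
        List<byte[]> messages = new ArrayList<>();
        while (buf.remaining() >= 4) {
            buf.mark();
            int len = buf.getInt();
            if (len < 0) {
                throw new IllegalArgumentException("negative length: " + len);
            }
            if (buf.remaining() < len) {
                buf.reset();  // rewind to the length field and wait for more bytes
                break;
            }
            byte[] payload = new byte[len];
            buf.get(payload);
            messages.add(payload);
        }
        return messages;
    }
}
```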

Uploading multiple files succeeds only when every file is uploaded

Now we have a requirement: upload multiple files to the server in one go, and count the upload as successful only if all of the files are uploaded successfully. How do we do that?

If a datagram is too large, IP splits it into multiple fragments and sets the More Fragments (MF) bit to indicate that a packet is only one part of the original datagram.

OK, let’s learn from IP here too. We can number the files starting from 0 and upload each file together with its number and an additional MF flag. The MF flag is set for every file except the one with the largest number. Since MF is not set on the last file, the server can use it to work out how many files there are.

Another way, without the MF flag, is to tell the server how many files there are before we start uploading.
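
On the server side, the MF scheme could be tracked roughly as follows. This is a sketch under assumptions: UploadTracker is a hypothetical class, files may arrive in any order, and storing the file contents is left out.

```java
import java.util.BitSet;

final class UploadTracker {
    private final BitSet received = new BitSet();
    // Number of the file that arrived without the MF flag; -1 until it has been seen.
    private int lastIndex = -1;

    /** Records one uploaded file; moreFiles is the MF flag that came with it. */
    synchronized void onFile(int index, boolean moreFiles) {
        received.set(index);
        if (!moreFiles) {
            lastIndex = index;
        }
    }

    /** True once the last file is known and every file numbered 0..lastIndex has arrived. */
    synchronized boolean isComplete() {
        return lastIndex >= 0 && received.nextClearBit(0) > lastIndex;
    }

    public static void main(String[] args) {
        UploadTracker tracker = new UploadTracker();
        tracker.onFile(2, false);  // last file arrives first
        tracker.onFile(0, true);
        System.out.println(tracker.isComplete());
        tracker.onFile(1, true);
        System.out.println(tracker.isComplete());
    }
}
```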

Readers who are familiar with databases can also borrow from the way databases handle transactions. I won’t discuss it here.

How to ensure the order of the data

Here’s an interview question I once came across. There is a task queue from which multiple worker threads take tasks and execute them, and the results are placed in a result queue. The requirement is that results enter the result queue in the same order in which their tasks were taken from the task queue (that is, if a task is taken first, its result must be queued first).

Let’s see how TCP/IP handles this. When IP sends data, the time at which different datagrams reach the peer is not deterministic; data sent later may arrive earlier. TCP solves this problem by assigning a sequence number to each byte of the data it sends. The sequence numbers allow TCP to reassemble the data in its original order.

Similarly, we assign each task a number that increases in the order in which tasks enter the work queue. After completing a task, and before putting the result in the result queue, the worker thread checks whether the sequence number expected next by the result queue is the same as that of its own task. If not, the result cannot be put in yet. The simplest thing to do at this point is to wait until the next result allowed into the queue is the thread’s own. However, while waiting, the thread cannot process other tasks.

A better approach is to maintain an extra buffer in front of the result queue, with its data sorted by sequence number in ascending order. When a worker thread wants to put in a result, there are two possibilities:

  1. The task that just finished is the next one expected, so its result goes straight into the result queue. Then, starting at the head of the buffer, all buffered results that can now go into the result queue are moved there.
  2. The result of the finished task cannot go into the result queue yet, in which case it is inserted into the buffer. Then, as in the previous case, the buffer needs to be checked.

If testing shows that the buffer never holds much data, an ordinary linked list will do. If it holds a lot of data, you can use a min-heap.
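
The buffer described above might look like the following sketch, which uses java.util.PriorityQueue as the min-heap. OrderedResults is a hypothetical name, a plain list stands in for the result queue, and sequence numbers are assumed to start at 0. For brevity, both cases are handled the same way: every result first goes into the buffer, and the buffer is then drained for as long as its head is the next expected result.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

final class OrderedResults<T> {

    private static final class Entry<V> {
        final long seq;
        final V value;

        Entry(long seq, V value) {
            this.seq = seq;
            this.value = value;
        }
    }

    // Min-heap ordered by sequence number: the head is the smallest buffered sequence number.
    private final PriorityQueue<Entry<T>> buffer =
            new PriorityQueue<>(Comparator.comparingLong((Entry<T> e) -> e.seq));
    private final List<T> results = new ArrayList<>();  // stands in for the result queue
    private long nextSeq = 0;

    /** Called by a worker thread when the task with sequence number seq has finished. */
    synchronized void put(long seq, T value) {
        if (seq < nextSeq) {
            throw new IllegalArgumentException("result " + seq + " was already delivered");
        }
        buffer.add(new Entry<>(seq, value));
        // Move every buffered result that is now next in line into the result queue.
        while (!buffer.isEmpty() && buffer.peek().seq == nextSeq) {
            results.add(buffer.poll().value);
            nextSeq++;
        }
    }

    /** Returns a copy of the results delivered so far, in delivery order. */
    synchronized List<T> snapshot() {
        return new ArrayList<>(results);
    }
}
```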

How do I ensure that the message is received

We say TCP provides reliable transport. Doesn’t that guarantee the other party got the message?

Unfortunately, no. When we write data to the socket, the peer’s kernel receives the data and returns an ACK. At that point, the socket considers the data successfully written. Note, however, that this only means the kernel of the peer’s operating system has received the data; it does not mean that the application has successfully processed it.

The solution is the same as before: we imitate TCP and add an application-layer APP ACK. After receiving a message and processing it successfully, the application sends an APP ACK back to the sender.

With APP ACK in place, another question arises: what should we do if the other party really did not receive the message?

Messages can also be lost when TCP sends data. If TCP does not receive an ACK from the peer for a long time after sending data, it assumes that the data is lost and resends it.

We do the same. If we do not receive an APP ACK for a long time, we assume that the data is lost and send it again.
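
The bookkeeping on the sender side could be sketched as below. AckTracker and its methods are hypothetical, each message is assumed to carry a unique id that the peer echoes back in its APP ACK, and time is passed in explicitly so that the logic does not depend on a real clock.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class AckTracker {

    private static final class Pending {
        final byte[] payload;
        long deadlineMillis;

        Pending(byte[] payload, long deadlineMillis) {
            this.payload = payload;
            this.deadlineMillis = deadlineMillis;
        }
    }

    private final Map<Long, Pending> pending = new LinkedHashMap<>();
    private final long timeoutMillis;

    AckTracker(long timeoutMillis) {
        this.timeoutMillis = timeoutMillis;
    }

    /** Call right after message id has been written to the socket. */
    synchronized void onSent(long id, byte[] payload, long nowMillis) {
        pending.put(id, new Pending(payload, nowMillis + timeoutMillis));
    }

    /** Call when the peer's APP ACK for message id arrives. */
    synchronized void onAck(long id) {
        pending.remove(id);
    }

    /** Returns the ids whose APP ACK is overdue and restarts their timers; the caller resends them. */
    synchronized List<Long> dueForResend(long nowMillis) {
        List<Long> due = new ArrayList<>();
        for (Map.Entry<Long, Pending> entry : pending.entrySet()) {
            Pending p = entry.getValue();
            if (nowMillis >= p.deadlineMillis) {
                p.deadlineMillis = nowMillis + timeoutMillis;
                due.add(entry.getKey());
            }
        }
        return due;
    }

    /** Payload to resend for id, or null if the message has already been acknowledged. */
    synchronized byte[] payloadOf(long id) {
        Pending p = pending.get(id);
        return p == null ? null : p.payload;
    }
}
```

Note that a resend can deliver the same message twice when only the APP ACK was lost, so the receiver would need to recognize duplicate ids.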

References:

[1] renyugang.io/post/75

[2] jekton.github.io

[3] github.com/Jekton/Echo
