
With the rapid growth of the Internet industry, demand for programmers is booming, especially for Java programmers, and plenty of Java programmers have no formal computer science training. I am one of them. Because I lack an understanding of the lower layers of the computer, there is a lot I don't understand about how frameworks are implemented underneath. But as someone who wants to be an excellent Java programmer, how could I tolerate that? Whatever I don't understand, I will do everything I can to understand.

A couple of days ago I started learning Redis. While trying to understand why Redis is so fast, a strange term appeared in front of me: "IO multiplexing". Like most people with a question, I went to Baidu, only to find that to understand IO multiplexing you must first understand the IO models of Linux, socket connections, file descriptors, and other things I had never been exposed to. At this point some people might think: I just need to know how to use it, I don't need anything this complicated to do my job. But I want to say: knowing how to use something does not make you a good programmer. So I went through various blogs and forums until I finally got a feel for these unfamiliar concepts. In this article I will take you through sockets and file descriptors from the perspective of a programmer without formal training, as preparation for looking at the IO models of Linux.

1. What is a socket

Before we look at what a socket is, let's take a common example from daily life: making a phone call. I want to call my girlfriend. First, I dial her phone number. The signal travels over the radio network, and her phone receives it and rings. Once she picks up, I can talk to her and she can talk to me. This process is very much like a socket connection.

If computer A wants to communicate with computer B over the network, computer A must have a socket, and computer B must also have a socket. Once the two sockets are connected, computer A can send data to and receive data from computer B, and computer B can do the same with computer A. In Java, when computer A sends data to computer B, it writes to its socket's OutputStream; when computer A receives data from computer B, it reads from its socket's InputStream.
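The two-way nature of a connected pair of sockets can be sketched in a few lines of Python, whose socket module stays close to the operating system's own socket calls. This is only an illustration under some assumptions: both "computers" live in one process, `socket.socketpair()` stands in for a real network connection, and the function name `two_way_exchange` is made up for the example. `sendall` and `recv` play the roles that writing to the OutputStream and reading from the InputStream play in Java.

```python
import socket


def two_way_exchange():
    """Send one message in each direction over a connected socket pair."""
    # socketpair() returns two sockets that are already connected to each
    # other; they stand in for "computer A" and "computer B" on one machine.
    sock_a, sock_b = socket.socketpair()
    with sock_a, sock_b:
        sock_a.sendall(b"hello from A")    # A writes ...
        received_by_b = sock_b.recv(1024)  # ... and B reads it
        sock_b.sendall(b"hello from B")    # B writes ...
        received_by_a = sock_a.recv(1024)  # ... and A reads it
    return received_by_b, received_by_a


if __name__ == "__main__":
    print(two_way_exchange())
```

Each side uses the same socket both to send and to receive, which is the point of the phone-call analogy: one connection, two directions.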







The official explanation

The official Oracle documentation explains sockets in detail. I believe that after reading my explanation above, you will find it easier to understand:

A socket is one endpoint of a two-way communication link between two programs running on the network. A socket is bound to a port number so that the TCP layer can identify the application that data is destined to be sent to.

In other words: two programs running on the network set up a link so that they can communicate with each other, and a socket is one end of that link. A socket is bound to a port so that the TCP layer knows which application the data is meant for.

2. Socket connection process

First, the server has a socket bound to port 80 (the standard port for HTTP, the Hypertext Transfer Protocol). The server waits, listening on that socket, until a client sends it a connection request.

The client tells the server its IP address and port number as part of the connection request. When the server accepts the connection, it creates a new socket bound to the same local port (80), with its remote endpoint set to the client's address and port. The new socket is needed so that the original socket on port 80 can keep listening for other connection requests.

In this way a pair of sockets is established, and the client and server can send and read data through them.
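The connection process can be tried out on a single machine. The following Python sketch makes a few assumptions: it uses the loopback address and lets the operating system pick a free port instead of port 80 (which normally requires root), and the function name `accept_one_connection` is made up for the example. It shows that `accept()` hands back a new socket while the listening socket stays as it was.

```python
import socket


def accept_one_connection():
    """Accept one loopback connection and report how the sockets relate."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        # Port 0 asks the OS for any free port (the article uses port 80).
        listener.bind(("127.0.0.1", 0))
        listener.listen()
        listen_port = listener.getsockname()[1]

        # The kernel completes the handshake and queues the connection,
        # so connect() returns before accept() is called.
        with socket.create_connection(("127.0.0.1", listen_port)) as client:
            conn, peer_addr = listener.accept()  # a NEW socket for this client
            with conn:
                return {
                    "listen_port": listen_port,
                    "accepted_local_port": conn.getsockname()[1],
                    "peer_is_client": peer_addr == client.getsockname(),
                    "distinct_sockets": conn.fileno() != listener.fileno(),
                }


if __name__ == "__main__":
    print(accept_one_connection())
```

The accepted socket reports the same local port as the listening socket, and its remote endpoint is the client's address and port. The listening socket is a separate object and remains free to accept further clients.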








3. The relationship between TCP/IP and Socket

TCP/IP, or the Transmission Control Protocol/Internet Protocol, is a suite of communication protocols used to interconnect network devices on the internet.

TCP/IP, short for Transmission Control Protocol/Internet Protocol, is a set of protocols used to connect network devices over the Internet. So what is a protocol? In layman's terms, a protocol is like traffic rules: a set of rules that tell drivers how to use the road. The TCP/IP protocols are a set of rules for how data is transmitted across the Internet.



Having said that, where does the socket fit in? As can be seen from the figure, the socket is a software abstraction layer that sits between the application layer and the TCP/IP protocol family. It is a set of interfaces (APIs). The TCP/IP stack is implemented in the operating system kernel, and the socket API is what the kernel offers to the application layer. The socket encapsulates TCP/IP: to send data over TCP/IP, you write to the socket's OutputStream; to receive data over TCP/IP, you read from the socket's InputStream.








4. Read/write buffers of a socket

Now computer A has established a socket connection with computer B. Can computer A send its data straight to computer B? The answer is no: the data first passes through the socket's buffers.

First of all, we must be clear about how data is sent. To send data that lives in user space out to the Internet, we must first copy it into kernel space, and the kernel then sends it for us. For this reason, every time a socket is created, the kernel allocates a pair of buffers in memory for it: one for reading and one for writing. Both buffers live in kernel space, and their size is limited: a buffer does not grow to fit however much data you write.

When computer A wants to send data to computer B, computer A first copies the data from user space into the output buffer in kernel space. The data in the output buffer is then sent over the Internet to computer B's input buffer. Finally, computer B copies the data from its input buffer into user space. That completes one round of sending and receiving.
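The two buffers can be inspected from a program. The sketch below is a minimal illustration, with the function name `buffer_sizes` made up for the example; it asks the kernel for the sizes through the standard `SO_SNDBUF` and `SO_RCVBUF` socket options. The actual numbers depend on the operating system and its configuration, so none are shown here.

```python
import socket


def buffer_sizes():
    """Return the kernel send and receive buffer sizes (bytes) of a new TCP socket."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        send_size = sock.getsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF)
        recv_size = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
    return send_size, recv_size


if __name__ == "__main__":
    send_size, recv_size = buffer_sizes()
    print(f"send buffer: {send_size} bytes, receive buffer: {recv_size} bytes")
```

Note that the socket has these buffers as soon as it is created, before any connection exists.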

Because the buffers are limited in size, problems can arise. If the output buffer still holds data that has not been sent and the program in user space has more data to send, there may not be enough room left in the buffer. There is a matching problem on the receiving side: if computer B wants to read data but computer A has not sent any, computer B's input buffer stays empty and there is nothing to read.
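Both situations can be provoked on purpose. The following Python sketch assumes Linux and again uses `socket.socketpair()` in place of two machines; the function name `fill_send_buffer` is made up for the example. The sockets are switched to non-blocking mode so that, instead of waiting, each call fails at once with `BlockingIOError` when the buffer cannot serve it.

```python
import socket


def fill_send_buffer():
    """Read from an empty buffer, then write until the kernel refuses more data."""
    sock_a, sock_b = socket.socketpair()
    with sock_a, sock_b:
        sock_a.setblocking(False)  # fail immediately instead of waiting
        sock_b.setblocking(False)

        # B's input buffer is empty: there is nothing to read yet.
        try:
            sock_b.recv(1024)
            read_would_block = False
        except BlockingIOError:
            read_would_block = True

        # A keeps writing while B never reads, so the buffers fill up.
        chunk = b"x" * 4096
        total_sent = 0
        try:
            while True:
                total_sent += sock_a.send(chunk)
        except BlockingIOError:
            write_would_block = True
    return read_would_block, write_would_block, total_sent


if __name__ == "__main__":
    print(fill_send_buffer())
```

In the default blocking mode the same two calls would simply wait, one for data to arrive and the other for buffer space to free up. What a program does while it waits is the question the IO models answer.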

Linux offers five ways of dealing with these problems. These are the five IO models of Linux.

Before we look at the IO models, we also need to know what a file descriptor is.










5. What is a file descriptor

Reference on zhihu: zhuanlan.zhihu.com/p/105086274

File descriptors are indexes that the operating system kernel creates so that it can manage open files efficiently. A program uses a file descriptor to refer to an open file.

In Linux, each process has a file descriptor table, which is an array of pointers. By default, the first three entries of the array are already set up when the process starts: entry 0 points to standard input (typically the keyboard), entry 1 to standard output (typically the screen), and entry 2 to standard error (typically also the screen).

Now suppose a process has opened only one file, hello.txt. Entry 3 of that process's file descriptor table will then be the pointer to hello.txt. If the process then creates a socket, entry 4 of the table will be the pointer to the socket, because in Linux everything is a file, and a socket is a file too. A file descriptor is simply the index into this per-process array, which is why it is also described as an index.
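That a file and a socket both end up as plain integers in the same table can be checked from a program. In the Python sketch below, a temporary file stands in for hello.txt and the function name `descriptors_demo` is made up for the example. The exact numbers depend on what the process already has open, so they will not necessarily be 3 and 4.

```python
import os
import socket
import stat
import tempfile


def descriptors_demo():
    """Open a file and a socket and report their file descriptor numbers."""
    with tempfile.NamedTemporaryFile() as f, \
            socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        file_fd = f.fileno()
        sock_fd = sock.fileno()
        return {
            "file_fd": file_fd,
            "sock_fd": sock_fd,
            # fstat() works on any descriptor; the mode tells files and sockets apart.
            "file_is_regular": stat.S_ISREG(os.fstat(file_fd).st_mode),
            "sock_is_socket": stat.S_ISSOCK(os.fstat(sock_fd).st_mode),
        }


if __name__ == "__main__":
    print(descriptors_demo())
```

The same `os.fstat` call accepts both descriptors, which is the practical meaning of "a socket is a file": the kernel exposes both through the same kind of handle.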

6. Hands-on practice in Linux

The sockets and file descriptors described above are concepts, but in Linux you can actually see them with commands (they really are files, since everything in Linux is a file).

1. Check the file descriptors of the current bash process

ls -l /proc/$$/fd

$$ expands to the process ID of the current bash process. File descriptors 0, 1, and 2 are standard input, standard output, and standard error. 255 is specific to bash and can be ignored.

2. Set up a socket connection with www.baidu.com in Linux

exec 6<> /dev/tcp/www.baidu.com/80
  • 6 is the file descriptor. The number is arbitrary, as long as it does not duplicate one that is already in use (0, 1, 2, 255).
  • <> is a redirection: < stands for input and > for output, so <> opens /dev/tcp/www.baidu.com/80 for both reading and writing and attaches it to file descriptor 6. /dev/tcp/host/port is not an ordinary file: bash treats the path specially and opens a TCP connection to that host and port. From then on, file descriptor 6 is the input/output stream of that connection.




3. Check socket connections

ls -l /proc/$$/fd

Look at the file descriptors of the process again, because a socket is also a file. If file descriptor 6 points to a socket, the socket connection to Baidu has been established successfully.
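The same check can be made from inside a program by reading the link that the listing displays. This Python sketch assumes Linux with /proc mounted; the function name `describe_fd` is made up for the example, and it returns None where /proc is not available.

```python
import os
import socket


def describe_fd(fd):
    """Return what /proc/self/fd/<fd> points to, or None where /proc is unavailable."""
    link = f"/proc/self/fd/{fd}"
    if not os.path.islink(link):
        return None  # not Linux, or /proc is not mounted
    return os.readlink(link)


if __name__ == "__main__":
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        # On Linux the target has the form "socket:[<inode>]".
        print(sock.fileno(), "->", describe_fd(sock.fileno()))
```

A descriptor that refers to an ordinary file shows a path instead, so the `socket:[...]` form is how a socket can be recognised in the listing.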




4. Send a request header to the socket to get the HTML of Baidu's home page

echo -e 'GET / HTTP/1.0\n' 1>&6
  • echo prints its arguments to standard output, and -e makes it interpret \n as a newline character
  • 1>&6 redirects standard output (file descriptor 1) to file descriptor 6, which means the text is sent to Baidu
  • Because the target after > is a file descriptor rather than a file name, & must be written after >; without it, bash would write to a file named 6




5. View the returned data

cat 0<&6
  • cat is the command for viewing the contents of a file
  • cat 0<&6 makes file descriptor 6 (that is, the contents of the socket's read buffer) the standard input of cat, so that the data sent back by Baidu is displayed

However, no data came back. We took too long between steps: the connection sat idle without sending anything, so the server timed out and closed it. If we run the steps above quickly enough, we get Baidu's data back.
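One way to avoid the timeout is to let a program perform the connect, send, and read steps back to back. The Python sketch below is an illustration under some assumptions: the function names `fetch` and `start_stub_server` are made up, and so that it runs without Internet access, the demo talks to a small stand-in server on the loopback address rather than to Baidu. Calling `fetch("www.baidu.com", 80)` would send the same request to the real site.

```python
import socket
import threading


def fetch(host, port, timeout=5.0):
    """Send a minimal HTTP/1.0 GET and return everything the server sends back."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        # Request line, then an empty line to end the (empty) header section.
        sock.sendall(b"GET / HTTP/1.0\r\n\r\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # empty read: the server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)


def start_stub_server():
    """Hypothetical stand-in for a web server: answers one request, then closes."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    port = listener.getsockname()[1]

    def serve_once():
        with listener:
            conn, _ = listener.accept()
            with conn:
                conn.recv(4096)  # read (and ignore) the request
                conn.sendall(b"HTTP/1.0 200 OK\r\n\r\nhello")

    threading.Thread(target=serve_once, daemon=True).start()
    return port


if __name__ == "__main__":
    port = start_stub_server()
    print(fetch("127.0.0.1", port).decode())
```

The read loop stops when `recv` returns an empty result, which is how the program learns that the server has closed the connection.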










To the reader

I believe that after reading this article you have some understanding of sockets and file descriptors. As I said at the beginning, I wrote it to record what I have learned, because I am a programmer without formal training who has never systematically studied how the operating system works underneath. I think studying the low-level material systematically would be very dull and aimless. My way of learning is to dig deeply into the low-level knowledge I run into along the way. For example, to learn Redis I need to understand IO multiplexing; to understand IO multiplexing I need to understand the five IO models of Linux; and to understand those models I need to know what sockets and file descriptors are. So those are the things I study in depth. Learning this way has a clear purpose, and what I learn is not immediately forgotten.

In the next article, I will use this knowledge to explain the five IO models of Linux, including IO multiplexing. Stay tuned.

Please let me know in the comments section if you have any problems with this article. If you think it’s a good article, give me a thumbs up