Original: https://blog.csdn.net/ThinkWo…

Computer network architecture

Of all the basic concepts in computer networking, the layered architecture is the most fundamental. It involves many abstract ideas, so it is worth thinking them through carefully while studying; they are very helpful for everything that comes later.

What is a network protocol?

In order to exchange data methodically on a computer network, there are some agreed rules that must be followed, such as the format of the data to be exchanged and whether a reply message needs to be sent. These rules are called network protocols.

Why layer network protocols?

  • Reduces the difficulty and complexity of the problem. Because the layers are independent of one another, a big problem can be divided into smaller ones.
  • Good flexibility. When the technology of one layer changes, the other layers are not affected as long as the interface between the layers remains unchanged.
  • Easy to implement and maintain.
  • Promotes standardization. Once separated, the functionality of each layer can be described relatively simply.

Disadvantage of network protocol layering: some functions may be duplicated in multiple layers, incurring additional overhead.

To allow computer networks with different architectures to interconnect, the International Organization for Standardization (ISO) proposed in 1977 a standard framework intended to let all kinds of computers around the world be interconnected into one network: the famous Open Systems Interconnection Basic Reference Model (OSI/RM), or OSI for short.

OSI’s seven-layer protocol architecture is clear in concept and complete in theory, but it is complex and impractical, unlike the TCP/IP architecture, which is now very widely used. TCP/IP is a four-layer architecture consisting of the application layer, the transport layer, the internet layer, and the network interface layer (the name "internet layer" emphasizes that this layer solves the problem of interconnecting different networks). In essence, however, TCP/IP only has the top three layers, because the network interface layer at the bottom has no specific content. Therefore, when studying the principles of computer networks, a compromise is often adopted: the strengths of OSI and TCP/IP are combined into an architecture with only five layers, which is concise and explains the concepts clearly. For convenience, the bottom two layers are sometimes referred to together as the network interface layer.

The relationship between the four-layer, five-layer, and seven-layer protocol models is as follows:

  • TCP/IP is a four-layer architecture: application layer, transport layer, internet layer, and network interface layer.
  • The five-layer protocol architecture: application layer, transport layer, network layer, data link layer, and physical layer.
  • The OSI seven-layer protocol model: application layer, presentation layer, session layer, transport layer, network layer, data link layer, and physical layer.

Note: the five-layer protocol architecture is used only to explain network principles; what is actually deployed is still the four-layer TCP/IP architecture.

TCP/IP protocol family

The application layer

The task of the application layer is to complete a specific network application through the interaction between application processes. Application layer protocols define the rules for communication and interaction between application processes (a process is a running program on a host).

Different network applications require different application layer protocols. There are many application layer protocols on the Internet, such as the Domain Name System (DNS), HTTP to support World Wide Web applications, SMTP to support e-mail, and so on.

Transport layer

The main task of the transport layer is to provide a general-purpose data transfer service for communication between processes on two hosts. Application processes use this service to transmit application-layer messages.

The transport layer mainly uses the following two protocols:

  1. Transmission Control Protocol (TCP): provides a connection-oriented, reliable data transfer service.
  2. User Datagram Protocol (UDP): provides a connectionless, best-effort data transfer service (it does not guarantee reliable delivery).

Each application layer (the top layer of the TCP/IP reference model) protocol typically uses one of two transport layer protocols:

Protocols running over TCP:

  • Hypertext Transfer Protocol (HTTP): used for ordinary web browsing.
  • HTTPS (HTTP over SSL/TLS): the secure version of HTTP.
  • File Transfer Protocol (FTP): used for file transfer.
  • Post Office Protocol, Version 3 (POP3): used to receive e-mail.
  • Simple Mail Transfer Protocol (SMTP): used to send e-mail.
  • TELNET (Teletype over the Network): used to log in to a remote host through a terminal.
  • Secure Shell (SSH): used for encrypted, secure remote login.

Protocols running over UDP:

  • Bootstrap Protocol (BOOTP): used by diskless devices.
  • Network Time Protocol (NTP): used for clock synchronization over the network.
  • Dynamic Host Configuration Protocol (DHCP): used to configure IP addresses dynamically.

Protocols running over both TCP and UDP:

  • Domain Name System (DNS): used for address lookup, mail routing, and so on.

The network layer

The task of the network layer is to select appropriate routes and switching nodes so that data is delivered between computers in a timely manner. When sending data, the network layer encapsulates the segments or user datagrams generated by the transport layer into packets for transmission. In the TCP/IP architecture, packets are also called IP datagrams, or datagrams for short, because the network layer uses the IP protocol.

The Internet is composed of a large number of heterogeneous networks connected to each other through routers. The network layer protocols used by the Internet are the connectionless Internet Protocol (IP) and a number of routing protocols, so the network layer of the Internet is also called the internet layer or IP layer.

Data link layer

The data link layer is often referred to as the link layer for short. Data sent between two hosts always travels over one link segment after another, and each segment requires a dedicated link-layer protocol.

When transmitting data between two adjacent nodes, the data link layer assembles the IP datagrams handed down by the network layer into frames and transmits the frames on the link between the two adjacent nodes. Each frame contains data and the necessary control information (such as synchronization information, address information, error control, etc.).

When receiving data, the control information lets the receiver know at which bit a frame begins and at which bit it ends.

A typical Web application’s traffic flow looks like this:

As data passes down from layer to layer on the sending side, each layer prepends its own header information. Conversely, as data passes up from layer to layer on the receiving side, each layer removes the corresponding header information.
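The header handling described above can be sketched in a few lines of Python. This is only an illustration: the bracketed text tags such as `[TCP]` are hypothetical stand-ins for the real binary headers, and the function names are made up for this example.

```python
# Hypothetical, simplified header tags; real headers are binary structures.
LAYERS = ["TCP", "IP", "ETH"]  # transport, network, link (top to bottom)

def encapsulate(payload: bytes) -> bytes:
    """Sender side: each layer prepends its own header on the way down."""
    data = payload
    for name in LAYERS:
        data = f"[{name}]".encode() + data
    return data

def decapsulate(frame: bytes) -> bytes:
    """Receiver side: each layer strips its own header on the way up."""
    data = frame
    for name in reversed(LAYERS):
        header = f"[{name}]".encode()
        if not data.startswith(header):
            raise ValueError(f"missing {name} header")
        data = data[len(header):]
    return data

frame = encapsulate(b"GET / HTTP/1.1")
print(frame)               # b'[ETH][IP][TCP]GET / HTTP/1.1'
print(decapsulate(frame))  # b'GET / HTTP/1.1'
```

The outermost header belongs to the lowest layer, which is why the receiver removes headers in the reverse order in which the sender added them.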

The physical layer

The units of data transmitted at the physical layer are bits. The role of the physical layer is to realize the transparent transmission of bitstreams between adjacent computer nodes, shielding the differences between specific transmission media and physical devices as much as possible. The data link layer above it does not have to consider what the specific transmission medium of the network is. “Transparently transmitted bitstream” means that the bitstream transmitted by the actual circuit does not change, and the circuit appears to be invisible to the transmitted bitstream.

TCP/IP protocol family

Among the various protocols used on the Internet, the most important and the most famous are TCP and IP. Nowadays, "TCP/IP" often refers not just to the two specific protocols TCP and IP, but to the entire TCP/IP protocol family used by the Internet.

The Internet Protocol Suite (IPS) is a network communication model and a family of network transport protocols that serve as the basic communication architecture of the Internet. It is commonly referred to as the TCP/IP protocol suite, or simply TCP/IP, because the two core protocols of the family, TCP (Transmission Control Protocol) and IP (Internet Protocol), were the earliest standards adopted in it.

To highlight:

TCP (Transmission Control Protocol) and IP (Internet Protocol) were the first two core protocols to be defined, so the whole family is collectively called the TCP/IP protocol family.

TCP three-way handshake and four-way wave

TCP is a connection-oriented, reliable, byte-stream-based transport layer protocol. Before sending data, the two communicating parties must establish a connection with each other. The so-called “connection” is really just information about the other side, such as its IP address and port number, kept by both the client and the server.

TCP can be thought of as a byte stream that handles packet loss, duplication, and errors at the IP layer and below. While a connection is being established, the two parties need to exchange some connection parameters. These parameters can be placed in the TCP header.

A TCP connection is identified by two IP addresses and two port numbers. A TCP connection usually goes through three phases: establishment, data transfer, and teardown (close). A connection is established with a three-way handshake and closed with a four-way wave.

When a connection is established or terminated, the segment exchanged contains only the TCP header and no data.

The header structure of a TCP packet

Before learning about TCP connections, take a look at the structure of the TCP segment header.

There are a few fields in the figure above that need to be highlighted:

(1) Sequence number (seq): 32 bits. It identifies the position in the byte stream sent from the TCP source to the destination and is set by the sender when it sends data.

(2) Acknowledgment number (ack): 32 bits. The acknowledgment number field is valid only when the ACK flag bit is 1. During the handshake, ack = seq + 1, where seq is the sequence number of the segment being acknowledged.

(3) Flag bits: six in total (URG, ACK, PSH, RST, SYN, FIN), with the following meanings:

  • ACK: the acknowledgment number field is valid.
  • FIN: releases a connection.
  • PSH: the receiver should deliver the data to the application layer as soon as possible.
  • RST: resets the connection.
  • SYN: initiates a new connection.
  • URG: the urgent pointer is valid.

Note that:

  • Do not confuse the acknowledgment number (ack) with the ACK flag bit.
  • ack of the acknowledging party = seq of the initiating party + 1.
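To make the fields above concrete, here is a minimal Python sketch that decodes the fixed 20-byte part of a TCP header with the standard `struct` module. The function name `parse_tcp_header` and the sample port and window values are assumptions made for illustration; options and checksum verification are left out.

```python
import struct

# Bit masks of the six flag bits inside the flags byte of the header.
FLAG_BITS = {"FIN": 0x01, "SYN": 0x02, "RST": 0x04,
             "PSH": 0x08, "ACK": 0x10, "URG": 0x20}

def parse_tcp_header(header: bytes) -> dict:
    """Decode the fixed 20-byte part of a TCP header (network byte order)."""
    (src_port, dst_port, seq, ack, offset_reserved,
     flags, window, checksum, urgent) = struct.unpack("!HHIIBBHHH", header[:20])
    return {
        "src_port": src_port,
        "dst_port": dst_port,
        "seq": seq,                                # 32-bit sequence number
        "ack": ack,                                # 32-bit acknowledgment number
        "header_len": (offset_reserved >> 4) * 4,  # data offset is in 32-bit words
        "flags": [name for name, bit in FLAG_BITS.items() if flags & bit],
    }

# A hand-built SYN+ACK header: seq=300, ack=101, flags = SYN | ACK.
sample = struct.pack("!HHIIBBHHH", 80, 50000, 300, 101, 5 << 4, 0x12, 65535, 0, 0)
parsed = parse_tcp_header(sample)
print(parsed["seq"], parsed["ack"], parsed["flags"])  # 300 101 ['SYN', 'ACK']
```

Note how the ACK flag (one bit) and the acknowledgment number (a 32-bit field) are decoded separately, which is the distinction the note above warns about.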

Three-way handshake

The essence of the three-way handshake is to confirm that both communicating parties are able to send and receive data.

First, I ask a courier to take a letter to the other party. When the other party receives it, he knows that my sending ability and his receiving ability are OK.

So he writes me a letter back. When I receive it, I know that my sending ability and his receiving ability are OK, and that his sending ability and my receiving ability are OK too.

However, he still does not know whether his sending ability and my receiving ability are OK, so I send him one last reply. When he receives it, he knows that his sending ability and my receiving ability are OK.

That’s the three-way handshake. Does that make sense?

  • First handshake: To initiate a connection request to the server, the client randomly generates an initial sequence number (ISN), for example 100. The segment sent from the client to the server carries the SYN flag (SYN=1) and the sequence number seq=100.
  • Second handshake: After receiving the segment from the client, the server sees SYN=1 and knows that this is a connection request. The server stores the client’s initial sequence number 100 and randomly generates its own initial sequence number (for example, 300). It then sends a reply segment to the client carrying the SYN and ACK flags (SYN=1, ACK=1), sequence number seq=300, and acknowledgment number ack=101 (the client’s sequence number + 1).
  • Third handshake: After receiving the reply from the server, the client sees ACK=1 and ack=101, and knows that the server has received the segment with sequence number 100. It also sees SYN=1, knows that the server has agreed to the connection, and stores the server’s initial sequence number 300. The client then replies with a segment carrying the ACK flag (ACK=1), ack=301 (the server’s sequence number + 1), and seq=101 (the SYN sent in the first handshake used up sequence number 100, so seq now starts from 101). Note that an ACK segment without data does not consume a sequence number, so seq is still 101 when data is sent for the first time. When the server receives this segment and sees ACK=1 and ack=301, it knows that the client has received the segment with sequence number 300. At this point the client and the server have established a TCP connection.
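The numbers in the three steps above can be reproduced with a small Python sketch. It only models the flags and the seq/ack arithmetic; the function name `three_way_handshake` and the dict layout are assumptions for illustration, not part of any real TCP API.

```python
def three_way_handshake(client_isn: int, server_isn: int) -> list:
    """Return the three handshake segments as dicts of flags, seq and ack."""
    # 1. Client -> server: SYN carrying the client's initial sequence number.
    syn = {"flags": ("SYN",), "seq": client_isn}
    # 2. Server -> client: SYN+ACK; ack is the client's seq + 1.
    syn_ack = {"flags": ("SYN", "ACK"), "seq": server_isn, "ack": syn["seq"] + 1}
    # 3. Client -> server: ACK; the SYN consumed one sequence number.
    ack = {"flags": ("ACK",), "seq": client_isn + 1, "ack": syn_ack["seq"] + 1}
    return [syn, syn_ack, ack]

segments = three_way_handshake(client_isn=100, server_isn=300)
for segment in segments:
    print(segment)
# The last line printed is:
# {'flags': ('ACK',), 'seq': 101, 'ack': 301}
```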

Four-way wave

The purpose of the four-way wave is to close a connection.

For example, suppose the client’s initial sequence number is ISN=100 and the server’s is ISN=300. After the TCP connection is established, the client sends 1000 bytes of data, and the server replies with 2000 bytes of data before the client sends its FIN segment.

  • First wave: After all its data has been transmitted, the client sends a connection release segment to the server. The segment carries the FIN flag (FIN=1) and sequence number seq=1101 (100+1+1000, where the 1 is the sequence number consumed by the SYN when the connection was established). Note that after sending the FIN, the client can no longer send data, but it can still receive data normally. In addition, a FIN segment consumes one sequence number even if it carries no data.
  • Second wave: After receiving the FIN from the client, the server replies with an acknowledgment segment carrying the ACK flag (ACK=1), acknowledgment number ack=1102, and sequence number seq=2301 (300+1+2000, where the 1 is again the sequence number consumed by the server’s SYN). At this point the server does not send its own FIN immediately but enters the CLOSE-WAIT state. This state lasts for a while because the server may not have finished sending its data.
  • Third wave: After sending its last data (for example, 50 bytes), the server sends a connection release segment to the client. The segment carries the FIN and ACK flags (FIN=1, ACK=1), ack=1102, and seq=2351 (2301+50).
  • Fourth wave: After receiving the FIN from the server, the client sends an acknowledgment segment carrying the ACK flag (ACK=1), ack=2352, and seq=1102 to the server. Note that the client does not release the TCP connection immediately after sending this acknowledgment; it releases the connection only after waiting 2MSL (twice the maximum segment lifetime). The server releases the TCP connection as soon as it receives the acknowledgment from the client. Therefore, the server closes the TCP connection earlier than the client.
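The sequence and acknowledgment numbers of the four segments can be checked with a short Python sketch. It assumes that the SYN of each side consumed one sequence number during the handshake and that each FIN consumes one as well; the function name `four_way_close` and its parameters are hypothetical names chosen for this example.

```python
def four_way_close(client_isn, server_isn, client_bytes, server_bytes, server_tail_bytes):
    """Return the four closing segments as dicts of flags, seq and ack."""
    # Next sequence numbers after the handshake and the data already sent
    # (assumption: each side's SYN consumed one sequence number).
    client_next = client_isn + 1 + client_bytes
    server_next = server_isn + 1 + server_bytes

    fin1 = {"flags": ("FIN",), "seq": client_next}                          # wave 1
    ack1 = {"flags": ("ACK",), "seq": server_next, "ack": fin1["seq"] + 1}  # wave 2
    server_next += server_tail_bytes  # server sends its remaining data
    fin2 = {"flags": ("FIN", "ACK"), "seq": server_next, "ack": fin1["seq"] + 1}  # wave 3
    ack2 = {"flags": ("ACK",), "seq": fin1["seq"] + 1, "ack": fin2["seq"] + 1}    # wave 4
    return [fin1, ack1, fin2, ack2]

waves = four_way_close(client_isn=100, server_isn=300,
                       client_bytes=1000, server_bytes=2000, server_tail_bytes=50)
for wave in waves:
    print(wave)
```

With these inputs the client's FIN carries seq=1101, the server's FIN carries seq=2351, and the final acknowledgment carries ack=2352.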

Frequently asked exam questions

Why does TCP need three handshakes to establish a connection? Wouldn’t two be enough?

Packet loss during connection setup has to be considered. With only two handshakes, if the acknowledgment segment sent by the server in the second handshake is lost, the server believes the connection is established and is ready to exchange data, but the client has never received the acknowledgment and therefore does not know whether the server is ready. In this case, the client will not send data to the server and will ignore any data the server sends.

With three handshakes, by contrast, if the ACK sent by the client in the third handshake is lost and the server does not receive it within a certain time, the server retransmits its SYN+ACK segment. After receiving the retransmitted segment, the client sends the ACK to the server again.

Why does TCP take three handshakes to connect but four waves to close?

This is because a TCP connection can only be closed once neither the client nor the server has any more data to send. When the client sends its FIN, it only guarantees that the client itself has no more data to send; whether the server still has data for the client is unknown. When the server receives the client’s FIN, it can only reply with an acknowledgment first, telling the client: "I have received your FIN, but I still have some data that has not been sent." Only after that data has been sent does the server send its own FIN to the client. So the acknowledgment and the FIN cannot be sent to the client together in one segment, and that is where the extra step comes from.

Why does the client wait 2MSL before releasing the TCP connection after sending the fourth-wave acknowledgment?

Packet loss is again the concern here. If the fourth-wave acknowledgment is lost, the server, not having received it, retransmits its third-wave FIN. The lost acknowledgment and the retransmitted FIN together take at most 2MSL to travel, so the client has to wait that long to be sure the server has received the acknowledgment.

What if the connection has been established, but the client suddenly fails?

TCP has a keepalive timer. If the client fails, the server obviously cannot wait forever and waste resources. The server resets this timer every time it receives data from the client; the timer is usually set to two hours. If the server receives no data from the client within two hours, it sends a probe segment, and then another one every 75 seconds. If there is still no response after 10 probe segments have been sent, the server assumes that the client has failed and closes the connection.
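As a quick worked example, the time from the client's last data to the server giving up can be estimated from the three figures above. This is a rough sketch under one assumption that the text does not state: that the server waits one more 75-second interval after the last unanswered probe before closing. The constant and function names are made up for illustration.

```python
IDLE_SECONDS = 2 * 60 * 60    # keepalive timer: two hours of silence
PROBE_INTERVAL_SECONDS = 75   # gap between probe segments
PROBE_COUNT = 10              # unanswered probes before giving up

def seconds_until_close(idle=IDLE_SECONDS,
                        interval=PROBE_INTERVAL_SECONDS,
                        probes=PROBE_COUNT) -> int:
    """Idle time before the first probe, plus one interval per unanswered probe."""
    return idle + probes * interval

total = seconds_until_close()
print(total, "seconds =", total // 60, "minutes", total % 60, "seconds")
# 7950 seconds = 132 minutes 30 seconds
```

In practice these values are configurable per operating system and per socket, so the result should be read as an order of magnitude rather than a fixed number.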

What is HTTP, and what is the difference between HTTP and HTTPS

HTTP is a convention and specification for transferring hypertext data, such as text, pictures, audio, and video, between two points in the computer world.

Common HTTP status codes

The HTTP status code indicates the result of the HTTP request sent by the client: whether the server processed the request normally, or what kind of error occurred.

Categories of status codes:
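The category of a status code is determined by its first digit: 1xx informational, 2xx success, 3xx redirection, 4xx client error, 5xx server error. As a minimal sketch, the Python standard library's `http.HTTPStatus` enum can be used to look up the reason phrase, while the `categorize` helper below is a hypothetical function written just for this example.

```python
from http import HTTPStatus

# Categories keyed by the first digit of the status code.
CATEGORIES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client error",
    5: "Server error",
}

def categorize(code: int) -> str:
    """Map a status code to its category using its first digit."""
    return CATEGORIES.get(code // 100, "Unknown")

for code in (200, 301, 404, 500):
    print(code, HTTPStatus(code).phrase, "->", categorize(code))
# 200 OK -> Success
# 301 Moved Permanently -> Redirection
# 404 Not Found -> Client error
# 500 Internal Server Error -> Server error
```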

GET and POST

When it comes to GET and POST, we have to mention HTTP, because the interaction between the browser and the server is carried out over HTTP, and GET and POST are two of the methods in the HTTP protocol.

The Hypertext Transfer Protocol (HTTP) enables communication between browsers and servers. HTTP works as a request-response protocol between clients and servers.

The HTTP protocol defines different methods for the browser and server to interact. There are four basic methods: GET, POST, PUT, and DELETE. These four methods can be understood as reading, updating, adding, and deleting server resources, respectively.

  • GET: obtains data from the server. It only retrieves resources from the server and does not modify them.
  • POST: submits data to the server, which involves updating data, that is, changing data on the server.
  • PUT: as the English word suggests, puts (adds) new data onto the server.
  • DELETE: deletes data from the server.

Differences between GET and POST

  1. GET is less secure because the data is placed in the request URL during transmission, whereas the data of a POST request is not visible to the user. This is a convention rather than an absolute rule: most people follow it, but you can also add a request body to a GET request and put parameters in the URL of a POST request.
  2. The URL submitted by a GET request can contain at most about 2048 bytes of data. This limit is imposed by the browser or the server; HTTP itself does not limit the length of a URL. POST requests have no size limit.
  3. GET restricts the form data to ASCII characters; POST supports the entire ISO 10646 character set.
  4. GET performs better than POST. GET is the default method for form submission.
  5. GET generates one TCP packet; with some clients, POST generates two TCP packets.

For a GET request, the browser sends the HTTP headers and data together, and the server responds with 200 (returning the data). For a POST request, some clients first send the headers, the server responds with 100 Continue, the client then sends the data, and the server responds with 200 OK (returning the data). This two-step behavior depends on the client (it happens when the client sends an Expect: 100-continue header) and is not required by HTTP itself.
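The first difference, where the data travels, can be seen without sending anything over the network. The sketch below builds (but does not send) one GET and one POST request with Python's standard `urllib`; the URL `http://example.com/search` and the parameter names are placeholders chosen for illustration.

```python
from urllib.parse import urlencode
from urllib.request import Request

params = {"user": "alice", "q": "tcp ip"}
encoded = urlencode(params)  # 'user=alice&q=tcp+ip'

# GET: the data is appended to the URL as a query string, there is no body.
get_req = Request("http://example.com/search?" + encoded)

# POST: the URL stays clean and the data goes into the request body.
post_req = Request("http://example.com/search", data=encoded.encode("ascii"))

print(get_req.get_method(), get_req.full_url, get_req.data)
print(post_req.get_method(), post_req.full_url, post_req.data)
```

`Request` picks the method from the presence of a body: with `data` set it defaults to POST, otherwise to GET. Keep in mind that a POST body is only hidden from the address bar; without HTTPS it is still sent in plain text.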

What are symmetric and asymmetric encryption

Symmetric encryption means that encryption and decryption use the same key. The biggest problem with this method is key distribution, that is, how to send the key to the other party securely.

Asymmetric encryption uses a pair of different keys, namely a public key and a private key. The public key can be distributed freely, but the private key is known only to its owner. The party sending the ciphertext encrypts it with the other party’s public key. After receiving the encrypted information, the other party decrypts it with its own private key. Asymmetric encryption is secure because the private key used for decryption never needs to be sent. However, it is very slow compared to symmetric encryption.
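The contrast between the two schemes can be illustrated with toy code. The sketch below is deliberately insecure and is not how real systems encrypt: the symmetric part is a repeating-key XOR, and the asymmetric part is textbook RSA with tiny primes (61 and 53) and no padding. All names are made up for this example, and the modular inverse via `pow` needs Python 3.8 or later.

```python
def xor_cipher(data: bytes, key: bytes) -> bytes:
    """Toy symmetric cipher: the same key both encrypts and decrypts."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

# Toy RSA key pair built from tiny primes (insecure, illustration only).
p, q, e = 61, 53, 17
n = p * q                          # modulus, part of both keys
d = pow(e, -1, (p - 1) * (q - 1))  # private exponent

def rsa_encrypt(message: int) -> int:
    """Anyone can encrypt with the public key (e, n)."""
    return pow(message, e, n)

def rsa_decrypt(cipher: int) -> int:
    """Only the holder of the private key (d, n) can decrypt."""
    return pow(cipher, d, n)

secret = xor_cipher(b"hello", b"shared-key")
print(xor_cipher(secret, b"shared-key"))  # b'hello'

cipher = rsa_encrypt(65)
print(rsa_decrypt(cipher))                # 65
```

In the symmetric case both sides must already hold `b"shared-key"`, which is exactly the distribution problem described above; in the asymmetric case only `(e, n)` is published and `d` never leaves its owner.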

What is HTTP/2

HTTP/2 can improve the performance of web pages.

In HTTP/1.x, the browser limits the number of concurrent requests to the same domain (typically six in Chrome). When a page requests many resources, the requests beyond this limit have to wait until earlier ones have completed.

HTTP/2 introduces multiplexing, which allows all requested data to be transmitted over a single TCP connection. Multiplexing improves web page performance by getting around the browser’s limit on the number of requests to the same domain name.

The main differences between sessions, cookies, and tokens

HTTP is stateless. What does stateless mean? The server cannot tell which user a request comes from.

What is a cookie

Cookies are small pieces of data (in key-value format) that the web server saves in the user’s browser and that contain user-related information. The client makes a request to the server, and if the server needs to record the user’s state, it issues a cookie to the client browser in the response. The client browser saves the cookie. When the browser requests the site again, it submits the cookie to the server along with the requested URL. The server checks the cookie to identify the user.
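The round trip described above comes down to two HTTP headers: `Set-Cookie` in the response and `Cookie` in later requests. Here is a minimal sketch using Python's standard `http.cookies` module; the cookie names `user_id` and `theme` and their values are placeholders for illustration.

```python
from http.cookies import SimpleCookie

# Server side, first response: issue a cookie to the browser.
response_cookie = SimpleCookie()
response_cookie["user_id"] = "42"
response_cookie["user_id"]["path"] = "/"
set_cookie_header = response_cookie.output()
print(set_cookie_header)  # a "Set-Cookie: user_id=42; ..." header line

# Server side, later request: parse the Cookie header the browser sent back.
request_cookie = SimpleCookie()
request_cookie.load("user_id=42; theme=dark")
print(request_cookie["user_id"].value)  # 42
```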

What is a session

A session is implemented on top of cookies. A session is a server-side object.

A session is the storage space that the server allocates for the duration of a conversation between the browser and the server. By default, the server puts the session ID in a cookie. The browser sends the cookie containing the session ID with each request to the server. The server looks up the information stored in the session by the session ID and determines the identity associated with the session.
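The lookup described above can be sketched with an in-memory dictionary. This is only an illustration under simplifying assumptions: a real server would add expiry and persistent storage, and the cookie name `sessionid` and the function names are hypothetical.

```python
import secrets

SESSIONS = {}  # session ID -> per-user data, kept only on the server

def create_session(user: str) -> str:
    """Allocate server-side storage and return the ID to put in a cookie."""
    session_id = secrets.token_hex(16)  # random, hard-to-guess identifier
    SESSIONS[session_id] = {"user": user}
    return session_id

def lookup_session(cookie_header: str):
    """Find the session named by a 'sessionid=<id>' cookie, or None."""
    for part in cookie_header.split(";"):
        name, _, value = part.strip().partition("=")
        if name == "sessionid":
            return SESSIONS.get(value)
    return None

sid = create_session("alice")
print(lookup_session(f"theme=dark; sessionid={sid}"))  # {'user': 'alice'}
print(lookup_session("sessionid=unknown"))             # None
```

Only the opaque ID travels to the browser; the user data itself never leaves the server.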

Differences between cookies and sessions

  • Storage location and security: cookie data is stored on the client and is less secure, while session data is stored on the server and is more secure.
  • Storage space: a single cookie can store no more than 4 KB of data, and most browsers limit the number of cookies stored for a site to 20. Sessions do not have this limit.
  • Server resource usage: a session is kept on the server for a certain period of time. As the number of visits grows, server performance is affected. If server performance is a concern, cookies should be used.

What is a Token

Why tokens were introduced: when the client frequently requests data from the server, the server has to query the database just as frequently to check whether the user name and password are correct. Tokens came into being to solve this problem.

Definition of a token: a token is a string generated by the server that serves as a credential for the client’s requests. After the first login, the server generates a token and returns it to the client. From then on, the client only needs to carry the token when requesting data, without sending the user name and password again.

The purpose of using a token: to reduce the load on the server, reduce frequent database queries, and make the server more robust.

Tokens are generated on the server side. When the front end uses the user name and password to request authentication from the server, the server returns a token to the front end. The front end can then prove its legitimacy by presenting the token with every request.
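One common way to build such a token is to sign the user name with a secret that only the server knows, so the server can verify the token later without a database query. The sketch below shows this idea with Python's standard `hmac` module. It is a simplified assumption, not the scheme of any particular framework: the `payload.signature` format, the `SECRET_KEY` value, and the function names are all made up, and real tokens would also carry an expiry time.

```python
import base64
import hashlib
import hmac

SECRET_KEY = b"server-side-secret"  # hypothetical; never sent to the client

def issue_token(username: str) -> str:
    """After a successful login, return '<payload>.<signature>'."""
    payload = base64.urlsafe_b64encode(username.encode()).decode()
    signature = hmac.new(SECRET_KEY, payload.encode(), hashlib.sha256).hexdigest()
    return f"{payload}.{signature}"

def verify_token(token: str):
    """Return the user name if the signature is valid, otherwise None."""
    payload, _, signature = token.partition(".")
    expected = hmac.new(SECRET_KEY, payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(signature, expected):
        return None
    return base64.urlsafe_b64decode(payload.encode()).decode()

token = issue_token("alice")
print(verify_token(token))        # alice
print(verify_token(token + "0"))  # None (signature no longer matches)
```

Because the signature depends on the secret key, a client cannot forge or alter a token without the server noticing.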

The difference between session and token

  • The session mechanism has some problems, such as increased load on the server, exposure to cross-site request forgery (CSRF) attacks, and poor scalability.
  • A session is stored on the server, while a token is stored on the client.
  • A token provides both authentication and authorization. For identity authentication, a token is more secure than a session.
  • Sessions are suited to projects where the client code and the server code run on the same server. Tokens are suited to projects where the front end and back end are separated (running on different servers).

Are servlets thread-safe

Servlets are not thread-safe: concurrent reads and writes from multiple threads can cause data synchronization problems.

The solution is to avoid defining shared instance fields (for example, a name field) and instead declare them as local variables inside the doGet() and doPost() methods. A synchronized(name){} block can also solve the problem, but it makes threads wait, which is not ideal.

Note: concurrent reads and writes of a servlet’s instance fields by multiple threads can leave the data inconsistent. If the fields are only read concurrently and never written, there is no inconsistency problem. Therefore, it is best to declare read-only fields in servlets as final.

What methods are available in the Servlet interface and the Servlet lifecycle

In Java web applications, a servlet is responsible for receiving the user’s request as an HttpServletRequest, processing it in doGet() or doPost(), and returning the result to the user through the HttpServletResponse. A servlet can also be given initialization parameters for its own internal use.

The Servlet interface defines five methods, the first three of which are related to the Servlet lifecycle:

  • void init(ServletConfig config) throws ServletException
  • void service(ServletRequest req, ServletResponse resp) throws ServletException, java.io.IOException
  • void destroy()
  • java.lang.String getServletInfo()
  • ServletConfig getServletConfig()

Life cycle:

Once the web container has loaded and instantiated the servlet, the servlet’s life cycle begins, and the container runs its init() method to initialize the servlet.

When a request arrives, the servlet’s service() method is called. The service() method calls the corresponding doGet() or doPost() method as needed.

When the server shuts down or the project is undeployed, the server destroys the servlet instance, calling the servlet’s destroy() method.

The init() and destroy() methods are executed only once, while the service() method is executed every time a client requests the servlet. Servlets sometimes use resources that need to be initialized and released, so you can put the code that initializes a resource in init() and the code that releases it in destroy(); that way you do not need to initialize and release the resource every time you handle a client request.

Do sessions still work if the client disables cookies?

Cookies and sessions are generally regarded as two independent things. A session keeps state on the server side, while a cookie keeps state on the client side.

So why can’t you get the session once cookies are disabled? Because the session ID is what identifies the server-side session that belongs to the current conversation, and the session ID is transmitted through a cookie. If cookies are disabled, the session ID is lost and the session cannot be found.

If the user has disabled cookies but sessions are still needed, this can be achieved in the following ways:

  1. Pass the session ID manually through the URL or a hidden form field.
  2. Save the session ID in a file, database, etc., and retrieve it manually when moving from page to page.
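The first approach, often called URL rewriting, can be sketched with Python's standard `urllib.parse`. The query parameter name `sid` and the two helper functions are assumptions chosen for illustration; frameworks that support this technique use their own parameter names.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

def add_session_id(url: str, session_id: str, param: str = "sid") -> str:
    """Return the URL with the session ID added as a query parameter."""
    parts = urlsplit(url)
    # Drop any stale session parameter, then append the current one.
    query = [(k, v) for k, v in parse_qsl(parts.query) if k != param]
    query.append((param, session_id))
    return urlunsplit(parts._replace(query=urlencode(query)))

def read_session_id(url: str, param: str = "sid"):
    """Server side: recover the session ID from the request URL, or None."""
    return dict(parse_qsl(urlsplit(url).query)).get(param)

link = add_session_id("http://example.com/cart?item=3", "abc123")
print(link)                   # http://example.com/cart?item=3&sid=abc123
print(read_session_id(link))  # abc123
```

Every link the server renders has to be rewritten this way, and the session ID becomes visible in the address bar and in logs, which is why cookies are normally preferred when they are available.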
