1 Communication Protocol

1.1 Definition

A communication protocol is the set of rules and conventions that two entities must follow to communicate or to provide a service. The protocol defines the format of the data units, the information each unit must carry and what that information means, the connection mode, and the timing of sending and receiving, so that data reaches its intended destination in the network reliably.

1.2 The Three Elements

A protocol mainly consists of the following three elements:

  • Syntax: “how to speak”, that is, the data format, the encoding, and the signal levels (high and low).
  • Semantics: “what to say”, that is, the data content, its meaning, and the control information.
  • Timing: the order of communication, rate matching, and sequencing.

1.3 Characteristics

Communication protocols are hierarchical, reliable, and efficient.

1.4 Architecture

Layering a network system decomposes the complex problem of coordinating network communication into smaller problems that are handled separately. This simplifies the problem and makes the network easier to understand and each part easier to design and implement. The schematic diagram of the layered structure is shown in the figure. Each layer implements relatively independent functions. The lower layer provides services to the upper layer, and the upper layer is the user of the lower layer. Layering has the following properties: it aids communication, understanding, and standardization; each protocol belongs to a single layer and governs communication between peer entities; it is easy to implement and maintain; and it is flexible, because the structure can be separated into parts.

2 The OSI Model

2.1 OSI/RM

The Open System Interconnection Reference Model (OSI/RM) is the basic reference model for interconnecting open systems. “Open” means non-proprietary. “System” refers to the parts of a real system that are relevant to interconnection.

2.2 Seven-Layer Structure

2.2.1 Physical Layer

The physical layer specifies the mechanical, electrical, functional, and procedural characteristics of the communication equipment used to establish, maintain, and tear down physical link connections. Specifically, the mechanical characteristics specify the connector dimensions, the number of pins, and their arrangement. The electrical characteristics specify the signal levels, impedance matching, and the limits on transmission rate and distance when a bit stream is sent over the physical connection. The functional characteristics define the function of each line between the DTE (data terminal equipment) and the DCE (data communication equipment). The procedural characteristics define the set of operating procedures for transmitting a bit stream over the signal lines, that is, the sequence of actions the DTE and DCE perform on each circuit while a physical connection is established, maintained, and used to exchange information.

Typical specifications defined at the physical layer include EIA/TIA RS-232, EIA/TIA RS-449, V.35, and RJ-45.

Main functions of the physical layer:

The physical layer provides a data path over which data terminal equipment transmits data. The path can consist of one physical medium or of several. A complete data transfer includes activating the physical connection, transmitting the data, and terminating the physical connection. Activation means that, however many physical media are involved, the two communicating data terminal devices must be connected so that a path is formed between them.

Main equipment of physical layer: repeater, hub.

2.2.2 Data Link Layer

Building on the bit stream service provided by the physical layer, the data link layer establishes a data link between adjacent nodes, uses error control to deliver data frames over the channel without errors, and coordinates the sequence of actions on each circuit.

The data link layer provides reliable transmission over an unreliable physical medium. Its functions include physical addressing, framing, flow control, error detection, and retransmission.

At this level, the units of data are called frames.

Data link layer protocols include SDLC, HDLC, PPP, STP, frame relay, etc.

Main functions of the link layer:

The link layer provides data transmission services to the network layer, and these services rely on the functions of the link layer itself. The link layer provides the following functions:

Link establishment, release, and separation. Frame delimiting and frame synchronization. The data transmission unit of the link layer is the frame. The length and format of a frame differ between protocols, but in every case the frame must be delimited.
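
Frame delimiting can be illustrated with byte stuffing. The sketch below is only one possible scheme: the flag and escape values (0x7E and 0x7D) and the XOR-with-0x20 escaping are assumptions borrowed from HDLC/PPP-style framing, and the function names are hypothetical.

```python
FLAG = 0x7E  # assumed frame delimiter
ESC = 0x7D   # assumed escape byte


def frame(payload: bytes) -> bytes:
    """Wrap a payload in FLAG bytes, escaping any FLAG or ESC inside it."""
    body = bytearray()
    for b in payload:
        if b in (FLAG, ESC):
            body += bytes([ESC, b ^ 0x20])
        else:
            body.append(b)
    return bytes([FLAG]) + bytes(body) + bytes([FLAG])


def deframe(stream: bytes) -> list:
    """Split a byte stream back into payloads using the FLAG delimiters."""
    frames, current, in_frame, escaped = [], bytearray(), False, False
    for b in stream:
        if escaped:
            current.append(b ^ 0x20)  # undo the escaping
            escaped = False
        elif b == ESC:
            escaped = True
        elif b == FLAG:
            if in_frame and current:
                frames.append(bytes(current))
            current, in_frame = bytearray(), True
        elif in_frame:
            current.append(b)
    return frames
```

Because the delimiter never appears unescaped inside a frame, the receiver can find frame boundaries even when the payload happens to contain the flag value.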

Sequence control means controlling the order in which frames are sent and received. The layer also provides error detection and recovery, link identification, and flow control. Bit errors are detected with block (matrix) parity checks and cyclic redundancy checks, while lost frames are detected through sequence numbers. Recovery from errors is usually achieved by retransmission on feedback.
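
The error detection described above can be sketched with a cyclic redundancy check and a frame sequence number. The frame layout here (a 2-byte sequence number, the payload, then a 4-byte CRC-32) is an assumption made for illustration and does not correspond to any particular link protocol.

```python
import struct
import zlib


def make_frame(seq: int, payload: bytes) -> bytes:
    """Build a frame: 2-byte sequence number, payload, 4-byte CRC-32."""
    body = struct.pack("!H", seq) + payload
    return body + struct.pack("!I", zlib.crc32(body))


def check_frame(frame: bytes, expected_seq: int) -> str:
    """Classify a received frame as 'ok', 'corrupt', or 'out-of-sequence'."""
    body = frame[:-4]
    (crc,) = struct.unpack("!I", frame[-4:])
    if zlib.crc32(body) != crc:
        return "corrupt"            # bit error detected by the CRC
    (seq,) = struct.unpack("!H", body[:2])
    if seq != expected_seq:
        return "out-of-sequence"    # a frame was lost or reordered
    return "ok"


good = make_frame(7, b"hello")
damaged = good[:3] + bytes([good[3] ^ 0x01]) + good[4:]  # flip one bit
```

A receiver that sees "corrupt" or "out-of-sequence" would normally ask the sender to retransmit.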

Main devices at the data link layer: Layer 2 switches and bridges.

2.2.3 Network Layer

Two computers communicating in a computer network may be separated by many data links and possibly by many communication subnets. The task of the network layer is to select suitable routes and switching nodes across networks so that data is delivered in a timely manner. The frame received by the data link layer is decapsulated to extract the packet. The packet carries a network layer header containing logical address information: the network addresses of the source and destination sites.

If you are talking about an IP address, you are dealing with a layer 3 issue, and the unit is the “packet”, not the layer 2 “frame”. IP belongs to layer 3, along with several routing protocols and the Address Resolution Protocol (ARP). Everything about routing is handled at layer 3. Address resolution and routing are the key purposes of layer 3. The network layer can also provide congestion control, internetworking, and other functions.

At this level, the units of data are called packets.

Network layer protocols include IP, IPX, and OSPF.

Main functions of the network layer:

To establish network connections and serve the upper layers, the network layer provides the following functions: routing and relaying; activating and terminating network connections; multiplexing several network connections over one data link using time-division multiplexing; error detection and recovery; sequencing and flow control; service selection; and network management.

Main device at the network layer: router

2.2.4 Transport Layer

This layer is responsible for making sure that all the information arrives, so it must keep track of data fragments, packets that arrive out of order, and other hazards that may occur in transit. Layer 4 provides the upper layers with a transparent, reliable end-to-end (end user to end user) data transfer service. Transparent transfer means that, during communication, the transport layer hides the details of the underlying transmission system from the upper layers.

Transport layer protocols include TCP, UDP, and SPX.

2.2.5 Session Layer

This layer is also called the dialogue layer. At the session layer and the layers above it, the unit of data is collectively called a message and is not given a separate name at each layer. The session layer does not take part in the actual transport; it provides mechanisms for establishing and maintaining communication between applications, including access validation and session management. For example, a server's authentication of a user login is handled by the session layer.

The session layer provides services that enable applications to establish and maintain sessions and synchronize sessions.

2.2.6 Presentation Layer

This layer mainly deals with the syntactic representation of user information. It converts the data to be exchanged from an abstract syntax suited to the user into a transfer syntax suited to use within the OSI system. In other words, it provides data formatting and conversion services. The presentation layer is also responsible for data compression and decompression and for encryption and decryption. The display of image formats, for example, is supported by protocols in the presentation layer.

2.2.7 Application Layer

The application layer provides the interface for the operating system or network applications to access network services. Application layer protocols include Telnet, FTP, HTTP, SNMP, and DNS.

The seven OSI layers use various kinds of control information to communicate with their peer layers in other computer systems. This control information contains specific requests and instructions that are exchanged between peer OSI layers. The header and the trailer that each layer attaches to the data are the two basic forms in which control information is carried.
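
How headers and a trailer accumulate on the way down the stack can be sketched as follows. The bracketed text tags stand in for real binary headers, and the three layer names and the "[fcs]" trailer are assumptions chosen only to keep the example short.

```python
LAYERS = ["transport", "network", "link"]  # order in which headers are added
TRAILER = b"[fcs]"                         # stand-in for a link-layer trailer


def encapsulate(data: bytes) -> bytes:
    """Sender side: each layer prepends its header; the link layer adds a trailer."""
    for name in LAYERS:
        data = f"[{name}]".encode() + data
    return data + TRAILER


def decapsulate(frame: bytes) -> bytes:
    """Receiver side: strip the trailer, then the headers from the outside in."""
    if not frame.endswith(TRAILER):
        raise ValueError("missing trailer")
    data = frame[: -len(TRAILER)]
    for name in reversed(LAYERS):
        header = f"[{name}]".encode()
        if not data.startswith(header):
            raise ValueError(f"missing {name} header")
        data = data[len(header):]
    return data


wire = encapsulate(b"GET /")
```

Each receiving layer reads only the header written by its peer layer, which is what allows peers to exchange control information without knowing about the other layers.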

One OSI layer communicates with another by using the services provided by the adjacent layer. The services of adjacent layers help an OSI layer communicate with its peer layer in another computer system. A given layer of the OSI model usually interacts with three other layers: the layer directly above it, the layer directly below it, and the peer layer in the target networked computer system. For example, the data link layer of computer A communicates with its own network layer and physical layer and with the data link layer of computer B.

2.3 Advantages of Layering

  • (1) People can easily discuss and learn the specification details of the protocol.
  • (2) Standard interfaces between layers facilitate engineering modularization.
  • (3) Create a better interconnection environment.
  • (4) Complexity is reduced, programs are easier to modify, and products can be developed faster.
  • (5) Each layer uses the services of the adjacent lower layer, which makes the functions of each layer easier to remember.

OSI is a well-defined set of protocol specifications and has many optional parts that accomplish similar tasks.

It defines the hierarchy of an open system, the relationships between the layers, and the possible tasks that each layer contains. It serves as a framework to coordinate and organize the services provided by the various layers.

3 TCP/IP Protocol

TCP/IP is the abbreviation of Transmission Control Protocol/Internet Protocol. Also known as the network communication protocol, it is the most basic protocol of the Internet and its foundation. Broadly speaking, TCP is responsible for detecting transmission problems, signaling them, and requesting retransmission until all data has been delivered safely and correctly to its destination. IP assigns an address to every computer on the Internet.

3.1 Layer Composition

TCP/IP does not refer only to the two protocols TCP and IP; it refers to the entire TCP/IP protocol family used on the Internet.

In terms of the layered protocol model, TCP/IP consists of four layers: the network interface layer, the network layer, the transport layer, and the application layer.

Note that TCP does not correct errors introduced by noise during transmission. A segment that fails the checksum is discarded, and recovery relies on retransmission after a timeout.

3.1.1 Network Interface Layer

The physical layer defines the properties of the physical medium:

  1. Mechanical characteristics;
  2. Electrical characteristics;
  3. Functional characteristics;
  4. Procedural characteristics.

The data link layer is responsible for receiving IP packets and sending them over the network, or for receiving physical frames from the network, extracting the IP packets, and delivering them to the IP layer. ARP is the forward address resolution protocol: it finds the MAC address of a host from a known IP address. RARP is the reverse address resolution protocol: it determines an IP address from a MAC address and is used, for example, by diskless workstations, a role also served by DHCP. Common interface layer protocols include Ethernet 802.3, Token Ring 802.5, X.25, Frame Relay, HDLC, PPP, and ATM.

3.1.2 Network Layer

Responsible for communication between machines. Its functions cover three aspects.

  1. Handling send requests from the transport layer. On receiving a request, it places the packet in an IP datagram, fills in the header, selects the path to the destination host, and sends the datagram to the appropriate network interface.
  2. Handling incoming datagrams. It first checks their validity and then routes them: if a datagram has reached its destination machine, the header is removed and the rest is handed to the appropriate transport protocol; if it has not yet reached its destination, the datagram is forwarded.
  3. Handling path selection, flow control, congestion, and similar problems.
  • The network layer includes the Internet Protocol (IP), the Internet Control Message Protocol (ICMP), the Address Resolution Protocol (ARP), and Reverse ARP (RARP).
  • IP is the core of the network layer. It uses routing to choose the next hop, encapsulates the data, and hands it to the interface layer. IP datagrams provide a connectionless service.
  • ICMP supplements the network layer and can send messages back to the sender. It is used to check whether the network is working normally.
  • The Ping command sends an ICMP echo request and waits for the echo reply, which makes it useful for testing the network.
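
Filling in the header and checking the validity of an incoming datagram both involve the IPv4 header checksum. The sketch below builds a minimal 20-byte IPv4 header without options; the function names, the default TTL of 64, and the choice of protocol number 1 (ICMP) are assumptions made for the example.

```python
import socket
import struct

HEADER_FORMAT = "!BBHHHBBH4s4s"  # 20-byte IPv4 header without options


def internet_checksum(data: bytes) -> int:
    """One's-complement sum of 16-bit words, as used by the IPv4 header."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:                      # fold carries back into 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_ipv4_header(src: str, dst: str, payload_len: int, proto: int = 1) -> bytes:
    """Populate the header fields, then compute and insert the checksum."""
    fields = [0x45, 0, 20 + payload_len, 0, 0, 64, proto, 0,
              socket.inet_aton(src), socket.inet_aton(dst)]
    fields[7] = internet_checksum(struct.pack(HEADER_FORMAT, *fields))
    return struct.pack(HEADER_FORMAT, *fields)


def header_is_valid(header: bytes) -> bool:
    """A header with a correct checksum sums to zero when re-checked."""
    return internet_checksum(header) == 0


header = build_ipv4_header("192.168.0.112", "192.168.0.200", payload_len=8)
```

A receiver would run the equivalent of `header_is_valid` before deciding whether to deliver the datagram locally or forward it.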

3.1.3 Transport Layer

Provides communication between applications. Its functions include, first, formatting the information flow and, second, providing reliable transmission. To achieve the latter, the transport layer protocol requires the receiver to send back an acknowledgment and requires the sender to resend any packet that is lost. This acknowledgment-and-retransmission mechanism is what provides reliable data transmission. (The “three-way handshake” is the separate step that establishes the connection beforehand.)
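
The confirm-and-resend behavior can be sketched as a stop-and-wait simulation. This is a deliberately simplified model and not how TCP is implemented: there is no real network or timer, and the `lost_attempts` parameter is a hypothetical way of scripting which transmissions are dropped.

```python
def send_reliably(packets, lost_attempts, max_tries=5):
    """Send packets one at a time, resending until each is acknowledged.

    lost_attempts is a set of (packet_index, attempt_number) pairs that the
    simulated network drops; every other transmission is acknowledged.
    """
    log = []
    for index, _packet in enumerate(packets):
        for attempt in range(1, max_tries + 1):
            if (index, attempt) in lost_attempts:
                log.append(f"packet {index} attempt {attempt}: timeout, resend")
                continue
            log.append(f"packet {index} attempt {attempt}: ACK {index}")
            break
        else:
            raise RuntimeError(f"packet {index} could not be delivered")
    return log


# The first transmission of packet 1 is lost, so it is sent twice.
log = send_reliably(["a", "b"], lost_attempts={(1, 1)})
```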

Transport layer protocols include the Transmission Control Protocol (TCP) and the User Datagram Protocol (UDP).

3.1.4 Application Layer

Provides users with a set of common applications, such as e-mail, file transfer and access, and remote login. Remote login uses the TELNET protocol, which provides an interface for logging in to other hosts on the network; a TELNET session provides a character-based virtual terminal. File transfer and access uses the FTP protocol to copy files between machines on the network.

Application layer protocols include FTP, TELNET, DNS, SMTP, NFS, and HTTP.

  • The File Transfer Protocol (FTP) is generally used for uploading and downloading files. The data port is 20 and the control port is 21.
  • Telnet is a remote login service. It uses port 23 and transmits in plaintext, so it is simple and convenient but has poor security.
  • The Domain Name Service (DNS) resolves domain names to IP addresses and uses port 53.
  • The Simple Mail Transfer Protocol (SMTP) controls the sending and relaying of mail. It uses port 25.
  • The Network File System (NFS) is used to share files between hosts on the network.
  • The Hypertext Transfer Protocol (HTTP) is used to implement WWW services on the Internet. It uses port 80.

3.2 Main Features

  1. TCP/IP is an open protocol standard that does not depend on any specific computer hardware or operating system. Even leaving the Internet aside, TCP/IP is widely supported, which makes it a practical way to combine all kinds of hardware and software.
  2. TCP/IP does not depend on specific network transmission hardware, so it can integrate a wide variety of networks. Users can run it over Ethernet, Token Ring, dial-up lines, X.25, and other network transmission hardware.
  3. A unified network address assignment scheme gives every TCP/IP device a unique address on the network.
  4. Standardized high-level protocols provide a variety of reliable user services.

3.3 Protocol Advantages

In the long run, IP is gradually replacing other networks. The explanation is simple: IP carries general-purpose data. That data can be used for any purpose and can easily replace data previously carried by proprietary data networks.

3.4 Main Drawbacks

First, the model does not clearly distinguish between services, interfaces, and protocols. Good software engineering separates functionality from implementation, and because TCP/IP does not, the TCP/IP reference model is a poor guide for designing with new technologies. The TCP/IP reference model is also unsuitable for describing non-TCP/IP protocol suites.

Second, the host-to-network layer is not really a layer; it only defines the interface between the network layer and the data link layer. The division between the physical layer and the data link layer is necessary and reasonable, and a good reference model should separate them, which the TCP/IP reference model does not.

3.5 Summary

The main protocols at the network layer are IP, ICMP, and IGMP. Because it contains the IP module, the network layer is the core of every network based on TCP/IP. Within the network layer, the IP module performs most of the functions. ICMP, IGMP, and the other protocols that support IP help it carry out specific tasks, such as exchanging error control information and control messages between hosts and routers. The network layer controls the transmission of information between hosts in the network.

The main protocols at the transport layer are TCP and UDP. Just as the network layer controls the flow of data between hosts, the transport layer controls the data that is about to enter the network layer. The two protocols are its two ways of managing this data: TCP is a connection-based protocol, and UDP provides a connectionless service.

4 The Transport Layer in TCP/IP

The transport layer of TCP/IP has two different protocols:

(1) User Datagram Protocol (UDP)

(2) Transmission Control Protocol (TCP)

4.1 Service Types

The services provided by the transport layer can be divided into transport connection services and data transmission services.

  • Transport connection service: Generally, for each transport connection requested by the session layer, the transport layer establishes a corresponding connection at the network layer.
  • Data transmission service: emphasizes connection-oriented, reliable service (OSI began to develop standards for connectionless services only very late) and provides flow control, error control, and sequence control so that messages travel between two end systems without errors, loss, duplication, or misordering.

• Connection-oriented services:

  • Establish a connection and then transfer data, then disconnect
  • During data transmission, packets do not need to carry destination addresses
  • Ensure the reliability of data transmission

• Connectionless services:

  • Send data directly without establishing a connection beforehand
  • Each packet carries a complete destination address
  • The reliability of packet transmission is not guaranteed

4.2 Establishing a TCP Connection: the Three-Way Handshake

TCP is a connection-based protocol. Before data is sent or received, a reliable connection must be established with the peer. Setting up a TCP connection takes three “conversations”, and the full process is quite involved. Here we give only a simple, intuitive introduction that is enough to understand the process. The three conversations go as follows:

Host A sends a connection request packet to host B: “I want to send you data, OK?” This is the first conversation.

Host B sends host A a packet that agrees to connect and asks for synchronization (synchronization means that the two hosts coordinate, one sending and one receiving): “Yes, when will you send?” This is the second conversation.

Host A sends a packet confirming host B's synchronization request: “I am sending now, so get ready to receive!” This is the third conversation.

The purpose of the three conversations is to synchronize the sending and receiving of data packets. After the three-way handshake, host A starts sending data to host B.
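
The three exchanges above correspond to the SYN, SYN+ACK, and ACK segments. The sketch below models only the sequence and acknowledgment numbers; the dictionary layout and the function name are assumptions, and real initial sequence numbers are chosen by the operating system.

```python
def three_way_handshake(client_isn: int, server_isn: int) -> list:
    """Return the three segments exchanged when host A connects to host B."""
    # 1. A -> B: "I want to send you data", carrying A's initial sequence number.
    syn = {"from": "A", "flags": "SYN", "seq": client_isn}
    # 2. B -> A: B agrees, acknowledges A's SYN, and sends its own sequence number.
    syn_ack = {"from": "B", "flags": "SYN+ACK", "seq": server_isn,
               "ack": syn["seq"] + 1}
    # 3. A -> B: A acknowledges B's SYN; both sides are now synchronized.
    ack = {"from": "A", "flags": "ACK", "seq": syn_ack["ack"],
           "ack": syn_ack["seq"] + 1}
    return [syn, syn_ack, ack]


segments = three_way_handshake(client_isn=100, server_isn=300)
```

Each acknowledgment number is the peer's sequence number plus one because a SYN consumes one sequence number.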

4.3 Closing a TCP Connection: the Four-Way Wave

Because a TCP connection is full-duplex, each direction must be closed separately. The principle is that a party sends a FIN to terminate its direction of the connection once it has finished sending its data. Receiving a FIN only means that no more data will flow in that direction; the side that received the FIN can still send data. The party that closes first performs an active close, and the other party performs a passive close.

(1) Client A sends a FIN to close data transmission from client A to server B.

(2) When server B receives the FIN, it sends back an ACK whose acknowledgment number is the received sequence number plus 1. Like a SYN, a FIN consumes one sequence number.

(3) Server B closes its connection to client A and sends a FIN to client A.

(4) Client A sends back an ACK to confirm, setting the acknowledgment number to the received sequence number plus 1.
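
The four steps above can be sketched in the same style, tracking only the sequence and acknowledgment numbers. This is a simplified model: in real TCP the FIN segments usually carry an ACK flag as well, and the function name and dictionary layout are assumptions.

```python
def four_way_close(client_seq: int, server_seq: int) -> list:
    """Return the four segments exchanged when client A closes first."""
    fin_a = {"from": "A", "flags": "FIN", "seq": client_seq}      # step (1)
    ack_b = {"from": "B", "flags": "ACK", "ack": fin_a["seq"] + 1}  # step (2)
    fin_b = {"from": "B", "flags": "FIN", "seq": server_seq}      # step (3)
    ack_a = {"from": "A", "flags": "ACK", "ack": fin_b["seq"] + 1}  # step (4)
    return [fin_a, ack_b, fin_b, ack_a]


closing = four_way_close(client_seq=500, server_seq=900)
```

Between steps (2) and (3) server B may still send data, which is why the two directions are closed by separate FIN/ACK pairs.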

4.4 Ports

  • Ports are transport layer service access points (TSAPs).
  • A port lets the application processes of the application layer hand their data down to the transport layer, and lets the transport layer know that the data in a segment should be delivered up through that port to the corresponding application layer process.
  • In this sense, ports are used to identify processes at the application layer.

The port field in the TCP segment structure is 16 bits, so port numbers range from 0 to 65535. The following rules apply to these 65536 port numbers:

  1. Port numbers smaller than 256 are common (well-known) ports. Servers are generally identified by well-known port numbers. Every TCP/IP implementation provides its services on port numbers between 1 and 1023, which are managed by IANA.
  2. A client only needs to ensure that its port number is unique on the local machine. A client port number is also called a temporary port number because it exists only for a short time.
  3. Most TCP/IP implementations assign temporary port numbers in the range 1024 to 5000. Port numbers larger than 5000 are reserved for other servers.
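
The rules above can be expressed as a small classifier. The category names are hypothetical labels, the ranges follow the rules exactly as stated in this section, and treating port 0 as reserved is an added assumption.

```python
def classify_port(port: int) -> str:
    """Classify a 16-bit port number according to the rules in this section."""
    if not 0 <= port <= 65535:
        raise ValueError("a port number must fit in 16 bits")
    if port == 0:
        return "reserved"        # assumption: port 0 is not used for services
    if port <= 1023:
        return "well-known"      # managed by IANA, used by servers
    if port <= 5000:
        return "temporary"       # typical client-side range per rule 3
    return "other servers"


examples = {p: classify_port(p) for p in (80, 443, 1024, 5000, 8080)}
```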

Some common port numbers and their uses are as follows:

  • Port 21: FTP file transfer service
  • Port 22: SSH remote connection service
  • Port 23: TELNET terminal emulation service
  • Port 25: SMTP simple mail transfer service
  • Port 53: DNS domain name resolution service
  • Port 80: HTTP hypertext transfer service
  • Port 443: HTTPS encrypted hypertext transfer service
  • Port 3306: MySQL database port
  • Port 5432: PostgreSQL database port
  • Port 6379: Redis database port
  • Port 8080: default port of the TCP server
  • Port 8888: Nginx server port
  • Port 9200: Elasticsearch server port
  • Port 27017: default port of the MongoDB database
  • Port 22122: default port of the FastDFS server

5 Domain Name System (DNS)

The Domain Name System (DNS) is a core service of the Internet. As a distributed database that maps domain names and IP addresses to each other, it lets people access the Internet more conveniently, without having to remember the numeric IP (Internet Protocol) address strings that machines read directly.

There are two types of DNS servers: authoritative name servers and caching name servers. Authoritative name servers are further divided into primary servers, secondary servers, and other subtypes.

Primary and secondary servers: each zone has at least one authoritative name server that resolves names in the zone. To guard against server and network failures, a zone is usually served by two or more DNS servers. One of them is designated the primary server and the others are secondary servers. The complete configuration of the zone is stored on the primary server, and each secondary server periodically copies the zone data from the primary server to its local storage.

Caching DNS servers: to save query time and improve performance, DNS servers that support recursive queries usually cache query results for a certain period. DNS servers that work this way are called caching DNS servers.
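
Caching query results for a limited time can be sketched as follows. The class and function names are hypothetical, the upstream lookup and the clock are injected so that the example runs without a network, and the address comes from the 192.0.2.0/24 documentation range.

```python
class CachingResolver:
    """Keep each answer until its time-to-live (TTL) has expired."""

    def __init__(self, upstream, clock):
        self._upstream = upstream  # function: name -> (address, ttl_seconds)
        self._clock = clock        # function returning the current time
        self._cache = {}           # name -> (address, expiry_time)

    def resolve(self, name):
        now = self._clock()
        cached = self._cache.get(name)
        if cached is not None and cached[1] > now:
            return cached[0]       # still fresh: answer from the cache
        address, ttl = self._upstream(name)
        self._cache[name] = (address, now + ttl)
        return address


upstream_calls = []

def fake_upstream(name):
    upstream_calls.append(name)
    return "192.0.2.10", 60        # made-up answer with a 60-second TTL

fake_time = [0]
resolver = CachingResolver(fake_upstream, lambda: fake_time[0])
resolver.resolve("www.example.com")   # asks upstream
resolver.resolve("www.example.com")   # served from the cache
fake_time[0] = 61
resolver.resolve("www.example.com")   # entry expired, asks upstream again
```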

5.1 Functions and Principles of DNS

The Domain Name System (DNS) is a naming system for computers and network services, organized into a hierarchy of domains. It is used on TCP/IP networks.

The role of DNS

There are usually two ways to identify a host: by host name or by IP address. People prefer host names, which are easy to remember, while routers prefer fixed-length, hierarchical IP addresses. To satisfy both preferences, we need a directory service that translates host names into IP addresses. The Domain Name System (DNS), as a distributed database that maps domain names and IP addresses to each other, makes it easier for people to access the Internet.

Principles of DNS domain name resolution

DNS adopts a distributed design, and its domain name space adopts a tree hierarchy:

The figure above shows part of the DNS server hierarchy, from top to bottom: the root DNS servers, the top-level domain (TLD) DNS servers, and the authoritative DNS servers. There are 13 root DNS servers on the Internet, mostly in North America. The second layer consists of the top-level domain servers, which are responsible for top-level domains such as com, org, net, and edu and for all country top-level domains such as uk, fr, ca, and jp. The third layer consists of the authoritative DNS servers. Every organization on the Internet with publicly accessible hosts (such as Web servers and mail servers) must provide publicly accessible DNS records, which are kept by the organization's authoritative DNS server. These records map the names of those hosts to IP addresses.

In addition, there is an important class of DNS servers called local DNS servers. The local DNS server is not technically part of the DNS server hierarchy, but it is important to the DNS hierarchy.

Let’s take an example to see how DNS works. Suppose host A (host name abc.xyz.edu) wants to know the IP address of host B (def.mn.edu), as shown in the following figure:

  • Host A first sends a DNS query message to its local DNS server. The query message contains the host name to be translated, def.mn.edu.
  • The local DNS server forwards the message to a root DNS server. Noticing that the queried name ends in edu, the root DNS server returns to the local DNS server a list of IP addresses of the top-level domain DNS servers responsible for edu.
  • The local DNS server then sends the query to one of these TLD servers. The TLD server notices the suffix mn.edu and responds with the IP address of the authoritative DNS server. In general, a TLD server does not always know the authoritative DNS server for every host directly; it may know only an intermediate server, which in turn can locate the authoritative server for the host. Let’s assume that the query passes through authoritative servers ① and ② along the way and finally reaches authoritative DNS server ③, which is in charge of def.mn.edu. The local DNS server then sends a query directly to that server and obtains the IP address of host B.

In the figure above, the IP address lookup actually uses two query methods: recursive query and iterative query.

Extension: Two ways of domain name resolution query

  • Recursive query: if the local domain name server that the host asked does not know the IP address of the requested domain name, the local server acts as a DNS client itself and continues the lookup by sending query messages to other servers, starting with a root name server. It queries on behalf of the host instead of making the host carry out the next step itself, as in steps (1) and (10) above.
  • Iterative query: when a root name server receives an iterative query from the local name server, it either returns the requested IP address or tells the local server which name server to query next. The local server then carries out the subsequent queries itself, as in steps (2) to (9) above.
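
The iterative part of the lookup can be sketched with an in-memory model of the hierarchy. The server names, the table layout, and the address 192.0.2.53 (from the documentation range) are all made up for illustration; no real DNS messages are sent.

```python
# Each simulated server either holds records or refers the query onward.
SERVERS = {
    "root": {"referrals": {"edu": "tld-edu"}},
    "tld-edu": {"referrals": {"mn.edu": "auth-mn"}},
    "auth-mn": {"records": {"def.mn.edu": "192.0.2.53"}},
}


def query(server: str, name: str):
    """Ask one server: it answers, refers to another server, or fails."""
    data = SERVERS[server]
    if name in data.get("records", {}):
        return "answer", data["records"][name]
    for suffix, next_server in data.get("referrals", {}).items():
        if name == suffix or name.endswith("." + suffix):
            return "referral", next_server
    return "unknown", None


def resolve_iteratively(name: str, start: str = "root", max_hops: int = 10):
    """Local DNS server behavior: follow referrals until an answer arrives."""
    server, path = start, []
    for _ in range(max_hops):
        path.append(server)
        kind, value = query(server, name)
        if kind == "answer":
            return value, path
        if kind != "referral":
            break
        server = value
    raise LookupError(f"could not resolve {name}")


address, path = resolve_iteratively("def.mn.edu")
```

In a recursive query the host would see only the final `address`; the list in `path` is the work the local DNS server does on the host's behalf.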

6 MAC (Media Access Control Layer)

A Media Access Control (MAC) address, also called a physical address or hardware address, identifies a network device. In the OSI model, the layer 3 network layer is responsible for IP addresses and the layer 2 data link layer is responsible for MAC addresses. So each network interface has a MAC address, and each network location has its own IP address.

6.1 Overview

Media Access Control (MAC) address: identifies each station on a network. It is written in hexadecimal and is six bytes (48 bits) long. The first three bytes (the high 24 bits) are a code assigned to each manufacturer by the IEEE Registration Authority (RA), known as the Organizationally Unique Identifier (OUI). The last three bytes (the low 24 bits) are assigned by the manufacturer to each adapter interface it produces and are called the extended identifier, which must be unique. One address block can therefore produce 2^24 different addresses. The MAC address is in effect the adapter address, or adapter identifier, EUI-48.
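
The two halves of a MAC address can be separated with a few lines of code. The function name is hypothetical, and the sample address is the one shown later in this document's `ipconfig` output.

```python
def split_mac(mac: str):
    """Split a MAC address into its manufacturer part and its adapter part."""
    octets = bytes(int(part, 16) for part in mac.replace("-", ":").split(":"))
    if len(octets) != 6:
        raise ValueError("a MAC address has exactly six bytes (48 bits)")
    oui = octets[:3].hex("-").upper()       # high 24 bits: assigned by the IEEE
    extension = octets[3:].hex("-").upper() # low 24 bits: assigned by the maker
    return oui, extension


oui, extension = split_mac("00-23-5A-15-99-42")
addresses_per_block = 2 ** 24  # distinct adapters one manufacturer code allows
```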

6.2 Role

An IP address is like a job position, and a MAC address is like the person hired to fill it. The position can be filled by person A or just as well by person B. In the same way, a node's IP address places no requirement on the network card, and a card from essentially any manufacturer can be used; in other words, there is no fixed binding between an IP address and a MAC address. Some computers move around a lot, just as a person can work for different employers. The relationship between positions and people is therefore a little like the relationship between IP addresses and MAC addresses.

6.3 Differences between MAC Addresses and IP Addresses

The differences between IP addresses and MAC addresses are as follows:

  1. For a device on the network, such as a computer or a router, the IP address can be changed (but must be unique) and the MAC address normally cannot. We can assign any IP address to a host as needed. For example, we can assign the IP address 192.168.0.112 to a computer on the LAN or change it to 192.168.0.200. The MAC address of a network device (such as a network card or router) is fixed when the device is manufactured, is unique, and is not meant to be changed by users.
  2. The lengths are different. An IPv4 address is 32 bits, and a MAC address is 48 bits.
  3. They are assigned on different bases. IP addresses are assigned according to the network topology, and MAC addresses are assigned according to the manufacturer.
  4. They are used for addressing at different protocol layers. IP addresses apply to OSI layer 3, the network layer, while MAC addresses apply to OSI layer 2, the data link layer. The data link layer protocol moves data from one node to another node on the same link, using the MAC address. The network layer protocol moves data from one network to another: ARP uses the destination IP address to find the MAC address of the intermediate node, and the data is passed from one intermediate node to the next until it reaches the destination network.
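
The division of labor in point 4 can be sketched as the decision a host makes before sending: whose MAC address should it look up with ARP? The function name and the gateway address are assumptions; the LAN addresses are the ones used in point 1, and 203.0.113.5 comes from a documentation range.

```python
import ipaddress


def arp_target(local_interface: str, destination: str, gateway: str) -> str:
    """Return the IP address whose MAC address must be resolved with ARP."""
    interface = ipaddress.ip_interface(local_interface)
    dest = ipaddress.ip_address(destination)
    if dest in interface.network:
        return str(dest)  # same link: deliver directly to the destination
    return gateway        # different network: hand the frame to the router


same_link = arp_target("192.168.0.112/24", "192.168.0.200", "192.168.0.1")
remote = arp_target("192.168.0.112/24", "203.0.113.5", "192.168.0.1")
```

The IP destination stays the same along the whole path, while the MAC destination changes at every hop.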

6.4 Obtaining a MAC Address

1. In Windows 2000/XP/Vista/7

Click Start, click Run, enter cmd, and then enter ipconfig /all (or ipconfig -all). The MAC address appears in a line such as:

Physical Address. . . . . . . . . : 00-23-5A-15-99-42

2. In Linux

Enter ifconfig on the command line to view the MAC address.
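
A program can also read the address without calling either command. The sketch below uses `uuid.getnode()` from the Python standard library, which returns the hardware address as a 48-bit integer; note that when no hardware address can be determined it falls back to a random 48-bit number, so the result is not guaranteed to be a real MAC address. The helper name is hypothetical.

```python
import uuid


def local_mac() -> str:
    """Format the 48-bit value from uuid.getnode() as six hex byte pairs."""
    node = uuid.getnode()
    # Extract the six bytes from the most significant to the least significant.
    return "-".join(f"{(node >> shift) & 0xFF:02X}" for shift in range(40, -8, -8))


mac_text = local_mac()
```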

7 The Difference Between HTTP and HTTPS

7.1 Basic Concepts

The HyperText Transfer Protocol (HTTP) is an application layer protocol for distributed, collaborative, hypermedia information systems. Simply put, it is a method of publishing and receiving HTML pages and is used to transmit information between a Web browser and a Web server.

HTTP works on TCP port 80 by default. The addresses users visit over the standard HTTP service start with http://.

HTTP sends content in plaintext and provides no data encryption. An attacker who intercepts a packet travelling between the Web browser and the Web server can read it directly. HTTP is therefore unsuitable for transmitting sensitive information, such as credit card numbers, passwords, and other payment details.

The Hypertext Transfer Protocol Secure (HTTPS) is a transfer protocol for secure communication over computer networks. HTTPS communicates over HTTP but uses SSL/TLS to encrypt the packets. HTTPS was developed to authenticate Web servers and to protect the privacy and integrity of the data exchanged.

HTTPS uses TCP port 443 by default. Its workflow is generally as follows:

  1. The TCP three-way handshake is performed.
  2. The client verifies the server's digital certificate.
  3. A key exchange algorithm such as DH (Diffie-Hellman) is used to agree on the keys for the symmetric encryption algorithm and the hash algorithm.
  4. Negotiation of the SSL/TLS encrypted tunnel is complete.
  5. The web page is transmitted in encrypted form, using the negotiated symmetric encryption algorithm and key to keep the data confidential. The hash algorithm protects data integrity by detecting tampering.
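From a client program's point of view, the five steps above are mostly handled by the TLS library. The following is a minimal sketch using Python's standard ssl module; the function name fetch_over_tls is made up, and it needs network access, so it is only defined here and not called.

```python
import socket
import ssl

context = ssl.create_default_context()   # client-side defaults
# The default context verifies the server certificate (step 2)
# and checks that it matches the requested host name.
verifies_certificate = context.verify_mode == ssl.CERT_REQUIRED
checks_hostname = context.check_hostname

def fetch_over_tls(host: str, path: str = "/") -> bytes:
    """Run steps 1-5 against a real server and return the raw response."""
    with socket.create_connection((host, 443), timeout=5) as tcp:        # step 1
        with context.wrap_socket(tcp, server_hostname=host) as tls:     # steps 2-4
            request = f"GET {path} HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n"
            tls.sendall(request.encode("ascii"))                         # step 5
            chunks = []
            while data := tls.recv(4096):
                chunks.append(data)
            return b"".join(chunks)
```

Which key exchange and cipher are actually negotiated depends on what both sides support; the library chooses them during wrap_socket.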

7.2 Differences between HTTP and HTTPS

  1. Security: HTTP data is transmitted in plaintext without encryption, so security is poor. HTTPS (HTTP over SSL/TLS) data is encrypted, so security is much higher.
  2. Certificate: To use HTTPS, you need to obtain a certificate from a Certificate Authority (CA) such as Symantec, Comodo, GoDaddy or GlobalSign. Many certificates are paid, although free ones exist (for example from Let's Encrypt).
  3. Response speed: HTTP pages generally respond faster than HTTPS pages. HTTP only needs the TCP three-way handshake, in which the client and server exchange three packets, to establish a connection. HTTPS needs those three TCP packets plus roughly nine packets for the SSL handshake, about 12 packets in total.
  4. Port: HTTP and HTTPS use different connection methods and different default ports, 80 and 443 respectively.
  5. Resources: HTTPS is HTTP layered on top of SSL/TLS, so it consumes more server resources than HTTP.

7.3 The HTTP Protocol

One HTTP request operation is called a transaction. Its working process is divided into four steps:

7.3.1 Establishing a TCP Connection

Because HTTP uses TCP at the transport layer, the client and server must first establish a TCP connection before HTTP can do any work. This happens, for example, when the user clicks a hyperlink or submits registration information.

7.3.2 Sending HTTP Request Packets

After the TCP connection is established, the client sends an HTTP request packet to the server. The format of the HTTP request packet is as follows:
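As an illustration, an HTTP/1.1 request consists of a request line, header lines, a blank line and an optional body. The Python sketch below assembles one; the function name build_request, the host, the path and the headers are made-up values.

```python
def build_request(method: str, path: str, headers: dict[str, str], body: bytes = b"") -> bytes:
    """Assemble request line, header lines, blank line and optional body."""
    lines = [f"{method} {path} HTTP/1.1"]
    if body:
        # A body needs a Content-Length so the server knows where it ends.
        headers = {**headers, "Content-Length": str(len(body))}
    lines += [f"{name}: {value}" for name, value in headers.items()]
    head = "\r\n".join(lines) + "\r\n\r\n"
    return head.encode("ascii") + body

request = build_request("GET", "/index.html",
                        {"Host": "www.example.com", "Accept": "text/html"})
```

Lines are separated by CRLF, and the empty line marks the end of the headers.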

7.3.3 Responding

After the server receives the HTTP request sent by the client, the server program processes the request and finally gives the corresponding response information. The HTTP response packet is shown as follows:
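As an illustration, a response has the mirror-image layout: a status line, header lines, a blank line and the body. The Python sketch below splits a hand-written sample response into those parts; parse_response is a made-up helper and the sample bytes are not taken from a real server.

```python
def parse_response(raw: bytes):
    """Split a raw HTTP/1.x response into status code, reason, headers and body."""
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return int(code), reason, headers, body

raw = (b"HTTP/1.1 200 OK\r\n"
       b"Content-Type: text/html\r\n"
       b"Content-Length: 14\r\n"
       b"\r\n"
       b"<h1>Hello</h1>")
code, reason, headers, body = parse_response(raw)
```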

7.3.4 Closing the Connection

After the client receives the HTTP response from the server, the browser automatically parses the information in the response and displays it. The client then closes its connection to the server. The HTTP response usually carries one of the following kinds of data:

  1. Flash animation, such as QQ Farm and QQ Pasture
  2. Streaming data, such as file downloads
  3. An HTML page. This is the most common result: when we go to Sina.com to read the news, the main page is an HTML page

7.4 Get And Post Requests

HTTP defines several methods for interacting with the server. The most basic are GET, POST, PUT and DELETE. URL stands for Uniform Resource Locator: a URL is an address that describes a resource on the network. The GET, POST, PUT and DELETE methods correspond to the basic operations on that resource: GET retrieves it, POST and PUT submit data to create or update it, and DELETE removes it. In practice the most commonly used are GET and POST, referred to as GET requests and POST requests. For example, entering www.sina.com in the browser to open Sina issues a GET request, and submitting mailbox registration information issues a POST request.

  1. Purpose: GET is usually used to fetch data from the server, and POST is used to submit data to the server.
  2. Request URL: In a GET request, the request data is appended to the URL, so it is displayed in the browser address bar. A ? separates the URL from the data, and & separates the parameters, for example login.action?name=hyddd&password=idontknow&verify=%E4%BD%A0%E5%A5%BD. English letters and digits are sent as is, a space is converted to +, and Chinese or other characters are percent-encoded (this is an encoding, not encryption), for example %E4%BD%A0%E5%A5%BD, where XX in %XX is the hexadecimal value of one byte of the character. In a POST request, the request data is placed in the HTTP request body, so it is not displayed in the browser address bar. This is also called an implicit submission. Therefore, when submitting important data, such as logging in to online banking or submitting registration information, POST is strongly recommended.
  3. Amount of data transferred: GET can carry only a small amount of data, mainly because it is limited by the URL length. POST can carry a large amount of data; in theory there is no limit. Therefore, file uploads can only use POST.
  4. Security: POST is more secure than GET in the sense that the data is kept out of the URL. With GET, the user name and password appear in clear text in the URL, so (1) the login page, together with its URL, may be cached by the browser, and (2) other people who view the browser history can obtain the account and password. In addition, submitting data with GET can open the door to cross-site request forgery (CSRF) attacks. Neither method encrypts the data by itself; that is the job of HTTPS.

Example:
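A minimal Python sketch of the encoding described in item 2, using the standard urllib.parse module. The parameter names and values follow the login.action example above; 你好 is the text that the percent-encoded verify value decodes to.

```python
from urllib.parse import parse_qs, quote, urlencode

params = {"name": "hyddd", "password": "idontknow", "verify": "你好"}

query = urlencode(params)            # non-ASCII text is percent-encoded as UTF-8 bytes
get_url = "login.action?" + query    # GET: the data is visible in the URL
post_body = query.encode("ascii")    # POST: the same encoding, sent in the request body

encoded_space = urlencode({"q": "a b"})   # a space becomes '+'
```

The same key=value pairs travel in both cases; only their position in the request differs.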

7.5 Cookie and Session

  • HTTP (HyperText Transfer Protocol) is stateless: the protocol itself keeps no record of earlier interactions between client and server.
  • Take the most common case, logging in. Many websites require users to log in before they can do anything. Suppose the user logs in successfully during operation A and then performs operation B. Because HTTP is stateless, the earlier login is not remembered, so the user would have to log in again before continuing, and again for every later operation, which would be unbearable.
  • Cookie and session technologies were created to solve the problems caused by HTTP's statelessness. They maintain the state of the conversation between the user and the server.

7.5.1 Cookies

  1. Cookies live on the browser (client) side. The browser stores data in them automatically when the server asks it to, through the Set-Cookie header of an HTTP response.
  2. The browser automatically sends cookie data back to the server in the background, according to rules such as the cookie's domain and path. The data travels in the Cookie header of the HTTP request.
  3. Cookie size is limited. Browsers typically allow about 4 KB per cookie and also cap the number of cookies per site, so a site cannot store large amounts of data this way.
  4. Data in cookies is not secure. Important data such as bank card numbers and passwords must not be stored in cookies.
  5. Examples of cookie use: ad targeting (based on search data kept in the local browser's cookies) and shopping sites (shopping cart: the server saves the cart data in the browser's cookies, so if cookies are disabled, products cannot be added to the cart).
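Items 1 and 2 can be sketched with Python's standard http.cookies module. The cookie names and values (cart_id, theme) are hypothetical.

```python
from http.cookies import SimpleCookie

# Server side: build the Set-Cookie header for the HTTP response.
response_cookie = SimpleCookie()
response_cookie["cart_id"] = "abc123"
response_cookie["cart_id"]["path"] = "/"
response_cookie["cart_id"]["max-age"] = 3600      # lifetime in seconds
response_cookie["cart_id"]["httponly"] = True     # hide the cookie from page scripts
set_cookie_header = response_cookie.output()

# Browser side: later requests carry the stored values in the Cookie header.
request_cookie = SimpleCookie()
request_cookie.load("cart_id=abc123; theme=dark")
cart_id = request_cookie["cart_id"].value
```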

7.5.2 Session

A session is the series of requests and responses exchanged between one browser and the server. Each session has a unique identifier (its uniqueness is guaranteed by the server), called the session ID, which the server uses to tell different clients apart. When a client sends its first request, the server assigns it a unique sessionId.

Session data is stored on the server. When a large number of users access the server, this puts the server under considerable pressure. To avoid wasting resources and to make sure server resources are used reasonably and effectively, a mechanism is needed to reclaim sessions that are no longer in use. A typical default session timeout (for example in Java servlet containers such as Tomcat) is 30 minutes: if the client sends no HTTP request within 30 minutes, the server treats the session as expired and reclaims the memory it occupies.

7.5.3 Differences and connections between Cookies and Session

  1. A cookie is a client-side mechanism and a session is a server-side mechanism. Data stored in a session is safer than data stored in cookies.
  2. When a client sends its first HTTP request, the server assigns it a sessionId and usually stores it in a cookie on the client (although the sessionId does not have to be stored in a cookie). Later HTTP requests from the client must carry the sessionId. When the server receives a request and finds a valid sessionId in it, it treats the request as part of the same session as the earlier ones.
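The server-side half of this mechanism, including the timeout-based reclamation described in 7.5.2, can be sketched as a small in-memory store. This is an illustration under simple assumptions (one process, no persistence); the class name SessionStore and its methods are made up.

```python
import secrets
import time

SESSION_TIMEOUT = 30 * 60   # seconds; the 30-minute figure from the text

class SessionStore:
    """In-memory sessions keyed by a random session ID (illustrative only)."""

    def __init__(self, timeout=SESSION_TIMEOUT, clock=time.monotonic):
        self._timeout = timeout
        self._clock = clock
        self._sessions = {}   # session_id -> (last_seen, data)

    def create(self) -> str:
        session_id = secrets.token_hex(16)   # hard-to-guess identifier
        self._sessions[session_id] = (self._clock(), {})
        return session_id

    def get(self, session_id):
        """Return the session data, or None if the ID is unknown or expired."""
        entry = self._sessions.get(session_id)
        if entry is None:
            return None
        last_seen, data = entry
        if self._clock() - last_seen > self._timeout:
            del self._sessions[session_id]   # reclaim the expired session
            return None
        self._sessions[session_id] = (self._clock(), data)   # refresh on access
        return data
```

The ID returned by create() is what would be sent to the browser in a cookie; get() is what the server would call on each later request.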

7.6 TCP three-way handshake

In TCP/IP, TCP establishes a reliable connection through a three-way handshake:

  • First handshake: The client attempts to connect and sends a SYN (Synchronize Sequence Numbers) packet with sequence number seq = j to the server. The client then enters the SYN_SENT state and waits for confirmation from the server.
  • Second handshake: The server receives the SYN packet, acknowledges it (ack = j + 1) and sends its own SYN (seq = k) in the same packet, that is, a SYN+ACK packet. The server then enters the SYN_RECV state.
  • Third handshake: The client receives the SYN+ACK packet and sends an ACK packet (ack = k + 1) to the server. Once this packet has been sent, the client and server enter the ESTABLISHED state and the three-way handshake is complete.

Simplify:
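In condensed form, the exchange is three segments. The Python sketch below only simulates the numbers and states; it opens no real connection, and j and k stand for arbitrary initial sequence numbers as in the text. The function name three_way_handshake is made up.

```python
def three_way_handshake(j: int, k: int):
    """Return the three segments and the state reached after each one."""
    trace = []
    # 1) client -> server: SYN, seq = j
    trace.append(({"flags": "SYN", "seq": j}, "client: SYN_SENT"))
    # 2) server -> client: SYN+ACK, seq = k, ack = j + 1
    trace.append(({"flags": "SYN+ACK", "seq": k, "ack": j + 1}, "server: SYN_RECV"))
    # 3) client -> server: ACK, ack = k + 1
    trace.append(({"flags": "ACK", "seq": j + 1, "ack": k + 1}, "both: ESTABLISHED"))
    return trace

trace = three_way_handshake(j=100, k=300)
```

Each acknowledgement number is the other side's sequence number plus one, which is how both ends confirm that the initial sequence numbers were received.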

7.7 Working Principles of HTTPS

It is well known that HTTPS encrypts information to prevent sensitive information from being accessed by third parties, which is why it is used for many banking websites and email services with high security levels.

1. The client initiates an HTTPS request

The user enters an HTTPS URL in the browser, which then connects to port 443 on the server.

2. Configure the server

An HTTPS server must have a digital certificate, which can be self-signed or obtained from a certificate authority. The difference is that a self-signed certificate has to be explicitly accepted by the user before the browser will continue, whereas a certificate issued by a trusted organization does not trigger a warning page (free certificates are available, for example from Let's Encrypt).

The certificate goes together with a pair of keys, one public and one private. If this is unfamiliar, think of the public key as a padlock and the private key as its key. You are the only person in the world who holds the key, but you can hand out padlocks to anyone. Other people can lock important things with the padlock and send them to you, and because only you hold the key, only you can see what has been locked.

3. Send certificates

The certificate is essentially the public key, accompanied by a lot of other information such as the issuing authority and the expiration time.

4. The client parses the certificate

This work is done by the client's TLS implementation. First it checks whether the certificate is valid, for example the issuing authority and the expiration time. If it finds a problem, a warning is displayed saying that something is wrong with the certificate.

If the certificate is fine, the client generates a random value and encrypts it with the public key from the certificate. As described above, the random value is now locked, and nobody can see it without the private key.

5. Transmit encrypted information

The client sends the random value, encrypted with the certificate's public key. The purpose is to let the server obtain this random value, so that from then on the communication between client and server can be encrypted and decrypted with it.

6. The server decrypts the information

The server uses its private key to decrypt the message and recover the random value sent by the client, and then uses that value as the key to encrypt content symmetrically. In symmetric encryption, the information and a secret key are combined by an algorithm so that the content cannot be recovered without the key. Only the client and the server know this key, so as long as the encryption algorithm is strong enough and the key is complex enough, the data is secure.

7. Transmit encrypted information

The server sends the content encrypted with the shared random value; it can be restored on the client.

8. The client decrypts the information

The client decrypts the message from the server with the random value it generated earlier and obtains the content. Throughout the process, even if a third party intercepts the data, there is nothing it can do with it.
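The padlock idea in steps 4 to 8 can be traced with toy numbers. The Python sketch below is deliberately insecure: it uses textbook RSA with tiny primes in place of a real certificate key, and an XOR keystream in place of a real symmetric cipher, purely to show who encrypts what with which key. All names and values are assumptions made for illustration; real TLS works differently in detail.

```python
import hashlib
import secrets

# Server key pair (textbook RSA with tiny primes: insecure, illustration only).
p, q, e = 61, 53, 17
n = p * q                             # public key is (n, e)
d = pow(e, -1, (p - 1) * (q - 1))     # private exponent, known only to the server

def xor_stream(key: int, data: bytes) -> bytes:
    """Toy symmetric cipher: XOR the data with a SHA-256-derived keystream."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(f"{key}:{counter}".encode()).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))

# Steps 4-5: the client picks a random value and locks it with the public key.
random_value = secrets.randbelow(n - 2) + 2
encrypted_value = pow(random_value, e, n)

# Step 6: the server unlocks the random value with its private key.
recovered = pow(encrypted_value, d, n)

# Steps 7-8: both sides now use the shared value as the symmetric key.
ciphertext = xor_stream(recovered, b"<html>secret page</html>")
plaintext = xor_stream(random_value, ciphertext)
```

The asymmetric key pair is used only once, to transport the random value; everything afterwards uses the cheaper symmetric operation in both directions.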

7.8 Simplified Working Principles of HTTPS

(1) Request: port 443, supported algorithms, key length

Understanding: Two online friends agree to go out tomorrow. What shall we do? Mountain climbing, swimming or the amusement park: pick one.

(2) Response: select one and send its key component to the client

Understanding: Let's go to the amusement park.

(3) Response: digital certificate

Understanding: To prove that I am a legitimate citizen, let me show you my ID card. It was issued by the XX Public Security Bureau, the ID number is XX, and it expires in 10 years.

(4) Response: negotiation completed

Understanding: OK, I agree to go to the amusement park.

(5) Generate a random password string and encrypt it with the certificate's public key

Understanding: To prove to each other that it is really us talking online, let's add a "meow" at the end of every sentence.

(6) Request: try encrypting with the password string

Understanding: Let's try it. I'm the client, meow.

(7) Response: send finished

Understanding: I'm done, now you try.

(8) Response: the server encrypts and sends in the same way as the client

Understanding: I'm the server, meow.

The SSL connection is then established.

7.9 HTTP Does Not Save User State. How Can User State Be Saved?

If a particular client requests the same object twice within a short period, the server does not refuse to respond on the grounds that it has just served that object. It simply sends the object again, as if it had completely forgotten what it did a moment ago. Because an HTTP server stores no information about the client, HTTP is said to be a stateless protocol.

There are usually two solutions:

① Session persistence based on Session

After the client sends the HTTP request to the server for the first time, the server creates a Session object and stores the client’s identity as a key-value pair. Then, a Session ID (SessionId) is assigned to the client. This Session ID is stored in the client Cookie. After that, every time the browser sends an HTTP request, it will bring the SessionId in the Cookie to the server. The server can associate the previous status information with the session according to the SessionId, so as to realize the session persistence.

Advantages: High security, because status information is stored on the server side.

Disadvantages: Because large websites usually use distributed servers, HTTP requests sent by browsers usually pass through load balancers before reaching specific background servers. If two HTTP requests from the same browser land on different servers, the session-based method cannot achieve Session persistence.

[Solution: Use middleware, such as Redis. We store Session information in Redis, so that each server can access the previous state information.]

② Session persistence based on Cookie

When the server sends a response message, it sets the Set-Cookie field in the HTTP response header to store the client's state information. The client parses this field and creates the cookie, whose handling depends on its lifetime. From then on, every HTTP request the browser sends carries the Cookie field, which is how the state is preserved. The main difference between cookie-based and session-based session persistence is that the former stores the session state entirely in the browser's cookies.

Advantages: The server does not need to store status information, reducing the storage pressure on the server, and facilitating the horizontal expansion of the server.

Disadvantages: This approach is not secure because the state information is stored on the client side, which means that confidential data cannot be saved in the session. In addition, the browser needs to send additional cookies to the server each time it initiates an HTTP request, which consumes more bandwidth.

Extension: What if cookies are disabled?

If cookies are disabled, the URL can be rewritten so that the session ID is carried in the URL's parameters. Session persistence can be achieved this way as well.
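URL rewriting can be sketched with Python's standard urllib.parse module. The parameter name sessionid and the function name add_session_id are assumptions; frameworks that support this technique use their own names.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

def add_session_id(url: str, session_id: str, param: str = "sessionid") -> str:
    """Return url with the session ID added (or replaced) as a query parameter."""
    parts = urlsplit(url)
    # Drop any stale session parameter, keep the rest of the query untouched.
    query = [(k, v) for k, v in parse_qsl(parts.query) if k != param]
    query.append((param, session_id))
    return urlunsplit(parts._replace(query=urlencode(query)))

rewritten = add_session_id("http://example.com/cart?item=42", "abc123")
```

Every link the server generates has to be rewritten this way, and the ID becomes visible in the address bar and browser history, which is the main drawback compared with cookies.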

7.10 HTTP Status Code

An HTTP status code consists of three decimal digits. The first digit defines the class of the status code; the last two digits have no classification role. There are five classes of HTTP status codes:

Class | Description
1XX | Informational: the request has been received and is being processed
2XX | Success: the request has been successfully processed
3XX | Redirection: additional action is required to complete the request
4XX | Client error: the request has a syntax error or cannot be fulfilled, so the server cannot process it
5XX | Server error: the server encountered an error while processing the request

List of HTTP status codes:

Status code | Name | Description
100 | Continue | The client should continue with its request
101 | Switching Protocols | The server switches to the protocol the client asked for in its Upgrade header
200 | OK | The request succeeded. The response headers or body the request asked for are returned with this response
201 | Created | The request succeeded and a new resource has been created as a result
202 | Accepted | The request has been accepted, but processing is not complete
203 | Non-Authoritative Information | The request succeeded, but the returned meta-information comes from a copy rather than the origin server
204 | No Content | The server successfully processed the request and has no body to return
205 | Reset Content | Similar to 204, except that the response asks the requester to reset the document view
206 | Partial Content | The server successfully processed a partial (range) GET request
300 | Multiple Choices | The requested resource has several alternatives, and the user or browser can choose a preferred address for redirection
301 | Moved Permanently | The requested resource has been permanently moved to a new URI. The response contains the new URI, and the browser is automatically directed to it
302 | Found | Temporarily moved. Similar to 301, but the resource is only temporarily moved and the client should continue to use the original URI
303 | See Other | The response to the request can be found at another URI and should be retrieved with a GET request
304 | Not Modified | The client sent a conditional GET request and access is allowed, but the document has not changed (since the last access or according to the conditions of the request)
305 | Use Proxy | The requested resource must be accessed through the specified proxy
306 | (Unused) | No longer used in the latest version of the specification
307 | Temporary Redirect | The requested resource temporarily resides at a different URI. Similar to 302, but the request method must not change
400 | Bad Request | The syntax of the client request is incorrect and the server cannot understand it, or the request parameters are wrong
401 | Unauthorized | The current request requires user authentication
402 | Payment Required | Reserved for possible future use
403 | Forbidden | The server understands the request but refuses to execute it
404 | Not Found | The requested resource was not found on the server
405 | Method Not Allowed | The method used in the client request is not allowed for this resource
406 | Not Acceptable | The content characteristics of the requested resource do not satisfy the conditions in the request headers, so no response entity can be generated
407 | Proxy Authentication Required | Similar to 401, except that the client must authenticate with the proxy server
408 | Request Timeout | The server waited too long for the client to send its request and timed out
409 | Conflict | The request could not be completed because of a conflict with the current state of the requested resource
410 | Gone | The requested resource is no longer available on the server and no forwarding address is known
411 | Length Required | The server refuses to accept a request that does not define the Content-Length header
412 | Precondition Failed | A precondition given in the request headers evaluated to false on the server
413 | Request Entity Too Large | The server refused to process the request because the submitted entity is larger than the server is willing or able to process
414 | Request-URI Too Long | The request URI is longer than the server can interpret, so the server refused to service the request
415 | Unsupported Media Type | The server cannot process the media format attached to the request
416 | Requested Range Not Satisfiable | The range requested by the client is invalid
417 | Expectation Failed | The server cannot meet the requirement of the Expect request header
500 | Internal Server Error | The server encountered an unexpected condition that prevented it from completing the request
501 | Not Implemented | The server does not support a feature required by the current request
502 | Bad Gateway | A server acting as a gateway or proxy received an invalid response from the upstream server while trying to fulfil the request
503 | Service Unavailable | The server is currently unable to process requests because of temporary maintenance or overload, and may recover after some time
504 | Gateway Timeout | A server acting as a gateway or proxy did not receive a timely response from the upstream server
505 | HTTP Version Not Supported | The server does not support, or refuses to support, the HTTP version used in the request
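The first-digit rule can be expressed in a few lines of Python. The sketch below combines it with the reason phrases from the standard http.HTTPStatus enum; the names CLASSES, classify and describe are made up for illustration.

```python
from http import HTTPStatus

CLASSES = {1: "Informational", 2: "Success", 3: "Redirection",
           4: "Client error", 5: "Server error"}

def classify(code: int) -> str:
    """Classify a status code by its first digit."""
    if not 100 <= code <= 599:
        raise ValueError(f"not an HTTP status code: {code}")
    return CLASSES[code // 100]

def describe(code: int) -> str:
    """Combine the class with the standard reason phrase known to Python."""
    return f"{code} {HTTPStatus(code).phrase} ({classify(code)})"
```

classify works for any code in range, including ones the table does not list, because a client that does not recognize a code can still treat it according to its class.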

Common interview questions about status codes:

① What is the difference between status codes 301 and 302?

301: Permanent move. The requested resource has been permanently moved to a new URI and the old address is no longer valid. The response includes the new URI, and the browser automatically redirects to it. All future requests should use the new URI.

302: Temporary move. As with 301, the client redirects to a new URL after receiving the server's response. But the resource has only been moved temporarily, the old address remains valid, and the client should continue to use the original URI.

② Which HTTP error status codes do you know?

For this question it is usually enough to name some common status codes beginning with 3, 4 and 5.

7.11 How to Troubleshoot a Website That Opens Slowly

There are many reasons why web pages open slowly. Here are some common checks:

① First, the most direct step is to check whether the local network is normal. You can measure the connection speed with speed-test software (for example, the one in a PC manager tool). If the speed is normal, check whether something else is using up the bandwidth: downloading a movie without a speed limit, for instance, will slow down page loading. A machine with a slow processor or little memory often shows the same symptom.

② If the local speed test is normal, test the speed of the website's server. Use the ping command to check the round-trip time and packet loss to the server. For a data center with good connectivity, the packet loss rate should be no more than 1%, the ping time should be low, and the ping time should be stable; a large gap between the maximum and minimum values indicates unstable routing. You can also check how quickly other websites on the same server open, to see whether they are slow as well.

③ If the page is sometimes fast and sometimes slow, or occasionally cannot be opened at all, the hosting may be unstable. Once you have confirmed that this is the problem, ask your hosting provider to fix it or switch providers; when buying hosting, you can choose dual-line or multi-line hosting. If the site is fast from some places and slow from others, the cause is probably the network line: a China Telecom user visiting a site hosted on a China Unicom server, or a Unicom user visiting a site on a Telecom server, will see relatively slow loading.

④ Look for the cause in the website itself. Website problems fall mainly into three areas: program design, page structure and page content.

  • Program design: code that runs while a page is being loaded and slows it down affects how fast the page opens. Statistics code in the page is an example, and it is best placed at the end of the page. Check whether the structure of the web program is reasonable.
  • Page structure: if the site uses a table layout, check whether the tables are nested too deeply or whether one large table has been split into many smaller ones. In such cases the page can be rebuilt with a DIV layout and optimized CSS.
  • Page content: check whether the page contains many large images or large Flash files. Reducing image quality, shrinking image dimensions and using fewer large Flash files will help. In addition, some pages pull in too much content from other sites. If some of those sites are slow, or some of the referenced pages no longer exist, the page will open slowly. An immediate fix is to remove unnecessary embedded content.
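The ping check described above (loss rate, latency and stability) can be summarized from a list of measurements. The Python sketch below takes the round-trip times as input rather than running ping itself; the function name summarize_ping and the sample values are made up for illustration.

```python
def summarize_ping(rtts_ms):
    """rtts_ms: one entry per probe, round-trip time in ms, or None if the probe was lost."""
    if not rtts_ms:
        raise ValueError("no probes")
    replies = [r for r in rtts_ms if r is not None]
    loss = 1 - len(replies) / len(rtts_ms)
    return {
        "loss_rate": loss,
        "avg_ms": sum(replies) / len(replies) if replies else None,
        # A large max-min spread suggests unstable routing.
        "spread_ms": max(replies) - min(replies) if replies else None,
        "loss_ok": loss <= 0.01,   # the 1% guideline from the text
    }

summary = summarize_ping([20, 22, None, 21])
```

What counts as a "low" or "stable" ping time depends on the distance to the server, so the sketch reports the numbers and leaves that judgment to the reader.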

7.12 Whole process of Web page parsing [The whole process of the user entering the URL and displaying the corresponding page]

① DNS resolution: when the user enters a web address and presses Enter, the browser has a domain name, but actual communication requires an IP address, so the domain name must first be converted into the corresponding IP address. Resolution proceeds as follows. The local hosts file is searched first; if it contains a mapping for the domain name, that IP address is returned directly. Otherwise the system sends a DNS request to the local DNS server, which checks its cache and returns the result if it has one. If not, the local DNS server queries the root DNS server and then the domain servers it is referred to, until the corresponding IP address is found.
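The lookup order can be sketched as a chain of fallbacks. The Python code below simulates it with dictionaries instead of contacting real DNS servers; the function name resolve, the domain names and the address 192.0.2.10 (from a range reserved for documentation) are all made up for illustration.

```python
def resolve(domain, hosts_file, resolver_cache, query_upstream):
    """Try each source in the order described above and report which one answered."""
    if domain in hosts_file:
        return hosts_file[domain], "hosts file"
    if domain in resolver_cache:
        return resolver_cache[domain], "local DNS cache"
    address = query_upstream(domain)       # root and domain servers, simulated here
    if address is not None:
        resolver_cache[domain] = address   # remember the answer for the next lookup
    return address, "upstream DNS"

hosts = {"localhost": "127.0.0.1"}
cache = {}
upstream = {"www.example.com": "192.0.2.10"}.get

first = resolve("www.example.com", hosts, cache, upstream)
second = resolve("www.example.com", hosts, cache, upstream)
```

The second lookup is answered from the cache, which is why repeated visits to the same site do not repeat the full resolution.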

② TCP connection: after the browser obtains the IP address of the web server through DNS, it initiates a TCP connection request to the server. Once the connection has been established through the TCP three-way handshake, the browser can send HTTP request data to the server. The three-way handshake takes place at the transport layer.

③ Sending the HTTP request: the browser sends an HTTP request to the web server. HTTP is an application-layer protocol built on top of TCP; in essence, the browser sends a request for a web page over the established TCP connection, following the HTTP standard. Operations such as load balancing may be involved in this step.

Extension: What is load balancing?

Load balancing means spreading load (work tasks) evenly across multiple operating units, such as FTP servers, web servers, enterprise core servers and other key servers, so that they complete the work cooperatively. Load balancing is built on top of the existing network and provides a transparent, inexpensive and effective way to expand the bandwidth of servers and network devices, increase throughput, strengthen network processing capacity, and improve network flexibility and availability.

Load balancing is one of the factors that must be considered when designing a distributed system architecture. Sites with very large user bases, such as Tmall and JD.com (Jingdong), usually use distributed servers to handle the huge number of user requests. By introducing a reverse proxy, user requests are distributed evenly across the servers, and this process is load balancing.
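One of the simplest distribution strategies is round robin, where requests are handed to the servers in a fixed rotation. The Python sketch below illustrates it; the class name RoundRobinBalancer and the server names are hypothetical, and real reverse proxies offer several other strategies as well.

```python
import itertools
from collections import Counter

class RoundRobinBalancer:
    """Hand out backend servers in a fixed rotation."""

    def __init__(self, servers):
        if not servers:
            raise ValueError("at least one server is required")
        self._cycle = itertools.cycle(list(servers))

    def pick(self) -> str:
        return next(self._cycle)

balancer = RoundRobinBalancer(["web-1", "web-2", "web-3"])
assignments = Counter(balancer.pick() for _ in range(9))
```

After nine requests each of the three servers has received three, which is the "evenly distributed" behaviour described above. Round robin ignores how busy each server is, so it works best when requests cost roughly the same.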

④ Processing the request and returning the response: after receiving the HTTP request from the client, the server decides, based on the content of the request, how to obtain the corresponding file, and sends the file to the browser.

⑤ Browser rendering: the browser starts to display the page according to the response. It first parses the HTML to build a DOM tree, then parses the CSS and combines it with the DOM to build a render tree. When the render tree is complete, the browser lays it out and paints it to the screen.

⑥ Disconnection: the client and server terminate the TCP connection with the four-way handshake (the "four waves"). [The details are explained in the transport layer]

7.13 B/S Architecture and C/S Architecture

C/S (Client/Server) refers to the client and the server. Dedicated client software must be installed on the client machine to access the server, such as QQ and Fetion, as shown in the following figure:

B/S (Browser/Server) refers to the browser and the server. No special software needs to be installed on the client side; a browser is enough. B/S is a special case of the C/S architecture in which the browser is the client, and B/S systems use the HTTP protocol, as shown in the following figure:

7.13.1 Comparison between B/S Architecture and C/S Architecture

  1. B/S and C/S each have their strengths, and both are important computing architectures today.
  2. B/S is much better than C/S in terms of Internet reach, maintenance workload and so on.
  3. A B/S system has to pay attention to compatibility across browsers, such as Internet Explorer (IE6/7/8/9), Firefox and Google Chrome.
  4. A C/S system has to consider installation and uninstallation (it must not behave like rogue software) and which platforms to support (such as Win32, Win64 and Linux32).
  5. With B/S, the client needs no maintenance, which suits large user bases or situations where customer requirements change often. C/S, by doing work on the client, relieves pressure on the server.
  6. C/S can make full use of the processing power of the client PC, so in terms of performance C/S is much stronger than B/S. For example, many large games cannot be built as B/S systems and can only be built as C/S systems.

7.14 Processes and Threads

  1. Process: A process can run independently in the operating system and is the basic unit of resource allocation. It represents a program running in memory.
  2. Thread: A thread is an execution unit inside a process and is the basic unit of system scheduling and dispatching. It is one sequential flow of control within a process that carries out some function of that process.
  3. The difference between a process and a thread
  • (1) One process can contain multiple threads and contains at least one thread. A thread can exist in only one process.
  • (2) All threads in the same process share the resources of that process. (At runtime, the system allocates a separate area of memory to each process, but not to each thread. A thread can only share the resources of the process it belongs to.)
  • (3) When a process ends, all threads under it are destroyed. The termination of one thread does not affect the other threads in the same process.
  • (4) A thread is a lightweight process. Creating and destroying a thread takes far less time than creating and destroying a process. The operating system carries out all execution by creating threads.
  • (5) Because threads share the resources of the same process, they need synchronization and mutual exclusion during execution.
  • (6) In the operating system, a process is an independent unit that owns system resources. In general, a thread does not own resources itself, but it can access the resources of the process it belongs to.
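Points (2) and (5) can be seen in a short Python example: several threads of one process update the same variable, and a lock provides the mutual exclusion. This is a minimal sketch; the names counter and worker are made up for illustration.

```python
import threading

counter = 0                 # shared by every thread in this process
lock = threading.Lock()

def worker(iterations: int) -> None:
    global counter
    for _ in range(iterations):
        with lock:          # mutual exclusion around the shared variable
            counter += 1

threads = [threading.Thread(target=worker, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()                # wait until every thread has finished
```

Separate processes would each get their own copy of counter, which is why sharing data between processes needs explicit mechanisms, whereas threads share it by default and need locking instead.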