HTTP and HTTPS details

Principles of computer communication

The key technology of the Internet is TCP/IP. Two computers communicate over the Internet using TCP/IP, which is actually two protocols:

TCP: Transmission Control Protocol
IP: Internet Protocol

The TCP/IP protocol family is both a network communication model and an entire family of network transport protocols, and it provides the basic communication architecture of the Internet. Its two core protocols, TCP (Transmission Control Protocol) and IP (Internet Protocol), were the earliest standards in the family. The family is maintained by the Internet Engineering Task Force (IETF).

TCP: Communication between applications

TCP ensures that data arrives in the correct order and checks that its contents have not been altered in transit. On top of IP addresses, TCP adds port numbers, which let one computer provide many different services over the network. Some port numbers are reserved for particular services, and these port numbers are well known.

Service or daemon: on the machine providing a service, a program listens for traffic on a particular port. For example, most e-mail (SMTP) traffic uses port 25, and HTTP traffic for the World Wide Web uses port 80.

When an application wants to communicate with another application over TCP, it sends a connection request to an exact address (an IP address plus a port). After the handshake, TCP establishes full-duplex communication between the two applications, and the connection stays in place until either side closes it. TCP controls the transmission of data from an application to the network: it splits the data into segments carried in IP packets before transmission, and reassembles them when they arrive.

TCP/IP means that TCP and IP protocols work together and have a hierarchical relationship.

TCP is responsible for communication between application software (such as your browser) and network software; IP is responsible for communication between computers. TCP splits the data and loads it into IP packets. Routers forward these packets to the receiver, choosing routes based on traffic, network errors, or other parameters, and TCP on the receiving side reassembles them when they arrive.
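The split-and-reassemble behaviour described above can be sketched in a few lines of Python. This is a simplified model, not real TCP: the segment size (`mss=4`), the function names `segment` and `reassemble`, and the absence of retransmission, duplicate handling, and flow control are all assumptions made for illustration. As in TCP, each segment is labelled with the byte offset of its payload, which is what lets the receiver restore the original order.

```python
import random


def segment(data: bytes, mss: int = 4) -> list[tuple[int, bytes]]:
    """Split data into (sequence_number, payload) pairs.

    As in TCP, the sequence number is the byte offset of the payload.
    """
    return [(offset, data[offset:offset + mss]) for offset in range(0, len(data), mss)]


def reassemble(segments: list[tuple[int, bytes]]) -> bytes:
    """Rebuild the byte stream from segments that may arrive out of order."""
    stream = bytearray()
    for seq, payload in sorted(segments, key=lambda seg: seg[0]):
        if seq != len(stream):
            # A real TCP receiver would wait for (or request) the missing bytes.
            raise ValueError(f"gap before byte {seq}: a segment is missing")
        stream.extend(payload)
    return bytes(stream)


if __name__ == "__main__":
    message = b"GET /index.htm HTTP/1.1"
    segs = segment(message)
    random.shuffle(segs)  # the network may deliver packets in any order
    print(len(segs), reassemble(segs))
```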

IP: communication between computers

The IP protocol is the mechanism computers use to communicate with each other. Each computer has an IP address that identifies it on the Internet. IP is responsible for sending and receiving data packets over the Internet: messages (or other data) are split into small, independent packets and sent between computers, and IP routes each packet to its destination.

IP simply lets computers send messages to each other; it does not check whether the messages arrive in the order they were sent or arrive uncorrupted (only critical header data is checked). To provide this verification, TCP is built directly on top of IP.
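To make the "only the header is checked" point concrete: IPv4 protects its header with a 16-bit one's-complement checksum, and TCP computes the same kind of checksum over its own header and data (plus a pseudo-header, omitted here). The sketch below follows the algorithm described in RFC 1071; the function name `internet_checksum` is our own choice.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum in the style of RFC 1071."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length data with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]      # add the next 16-bit big-endian word
        total = (total & 0xFFFF) + (total >> 16)   # fold any carry back into the low 16 bits
    return ~total & 0xFFFF                         # one's complement of the sum


if __name__ == "__main__":
    payload = bytes.fromhex("0001f203f4f5f6f7")    # sample data from RFC 1071
    checksum = internet_checksum(payload)
    print(hex(checksum))                            # 0x220d
    # The receiver checksums data + checksum; a result of 0 means no error was detected.
    print(internet_checksum(payload + checksum.to_bytes(2, "big")))  # 0
    corrupted = bytes.fromhex("0001f203f4f5f6f6")
    print(internet_checksum(corrupted) == checksum)  # False
```

The checksum catches accidental corruption but not deliberate tampering, since anyone who changes the data can recompute it; that limitation comes back in the HTTPS part of this article.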

HTTP

HTTP concept

From Wikipedia: The HyperText Transfer Protocol (HTTP) is an application-layer protocol for distributed, collaborative, hypermedia information systems. HTTP is the foundation of data communication on the World Wide Web. It was originally designed as a way to publish and receive HTML pages. Resources requested over HTTP or HTTPS are identified by Uniform Resource Identifiers (URIs).

The HTTP protocol layer

The HyperText Transfer Protocol (HTTP) is an application-layer protocol built on top of TCP.

HTTP request response model

HTTP consists of requests and responses and follows the standard client-server model (B/S, browser/server). In HTTP, the client always makes the request and the server sends back the response. See below:

HTTP is a stateless protocol: the server keeps no information about earlier requests, so each request is handled independently. In the basic model there is also no need to keep a persistent connection between the client (web browser) and the server: the client makes a request, the server returns a response, the connection is closed, and the server retains no information about it. HTTP follows the request/response model: the client (browser) sends a request to the server, which processes it and returns the appropriate response. Every HTTP exchange is made up of such requests and replies.

HTTP Working Process

An HTTP operation is called a transaction, and it works as follows:

Address resolution

Suppose the client browser requests this page: localhost.com:8080/index.htm

The protocol name, host name, port, object path and other parts are decomposed from it. For our address, the result obtained by parsing is as follows:

Protocol name: HTTP
Host name: localhost.com
Port: 8080
Object path: /index.htm

In this step, the domain name system (DNS) resolves the domain name localhost.com to obtain the IP address of the host.

Encapsulate the HTTP request packet

Combine the parts above with information about the local machine and encapsulate them into an HTTP request packet.

Encapsulate it into a TCP packet and establish a TCP connection (TCP three-way handshake)

Before HTTP work begins, the client (web browser) first establishes a connection with the server over the network. This connection is made through TCP, which together with IP forms the foundation of the Internet: the well-known TCP/IP protocol family, which is why the Internet is also called a TCP/IP network. HTTP is an application-layer protocol at a higher level than TCP, and a higher-level protocol can connect only after the lower-level connection has been established. Therefore, a TCP connection must be established first, here to port 8080.

The client sends the request command

After the connection is established, the client sends a request to the server. The request line contains the request method, the Uniform Resource Identifier (URI), and the protocol version number; it is followed by MIME-style header information containing request modifiers and client information, and possibly a body.

Server response

After receiving the request, the server responds. The response starts with a status line containing the message's protocol version number and a success or error code, followed by MIME-style information including server information, entity information, and possibly a body.

Entity message: after sending the headers to the browser, the server sends a blank line to mark the end of the headers, then sends the actual data the user requested, in the format described by the Content-Type response header.

The server closes the TCP connection

Normally, once the web server has sent the requested data to the browser, it closes the TCP connection. However, if the browser or server includes the following line in its headers:

Connection:keep-alive

The TCP connection remains open after the response is sent, so the browser can send further requests over the same connection. Keeping the connection open saves the time needed to set up a new connection for each request and saves network bandwidth.
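The effect of keep-alive can be observed with Python's standard library alone. In this sketch (the handler class `EchoPortHandler` and the helper `two_requests_one_connection` are hypothetical names), a local test server reports the client's TCP port in each response; if both requests report the same port, they travelled over the same TCP connection. Setting `protocol_version = "HTTP/1.1"` on the handler is what makes `http.server` keep connections open, and the client must read each response completely before it can reuse the connection.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class EchoPortHandler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # HTTP/1.1 connections stay open unless "Connection: close"

    def do_GET(self):
        body = f"client port {self.client_address[1]}".encode()
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))  # lets the client find the end of the body
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):  # silence the default request logging
        pass


def two_requests_one_connection() -> list[bytes]:
    server = ThreadingHTTPServer(("127.0.0.1", 0), EchoPortHandler)  # port 0: pick any free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", server.server_address[1], timeout=5)
        bodies = []
        for _ in range(2):
            conn.request("GET", "/index.htm", headers={"Connection": "keep-alive"})
            bodies.append(conn.getresponse().read())  # read fully before reusing the connection
        conn.close()
        return bodies
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    first, second = two_requests_one_connection()
    print(first, second, first == second)  # same client port => same TCP connection
```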

Concepts used in the HTTP working process

Message format

HTTP messages come in two types: requests and responses. Their formats are as follows:

Request Message Format

Request-method URL HTTP/version
Request header fields (optional)
(empty line)
Body (typically used by requests such as POST)

For example:

GET http://m.baidu.com/ HTTP/1.1
Host: m.baidu.com
Connection: keep-alive
... // other headers

key=iOS
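The request format above can be produced by a small helper. This is a sketch built around a hypothetical `build_request` function: HTTP/1.1 separates lines with CRLF (`\r\n`) and ends the header section with an empty line, and when a body is present the sketch adds a `Content-Length` header so the server knows where the body ends. It uses the short target form (`/`) that a client sends when talking directly to a server; the absolute URL in the example above is the form used when sending the request to a proxy.

```python
def build_request(method: str, target: str, headers: dict[str, str], body: bytes = b"") -> bytes:
    """Serialize an HTTP/1.1 request message."""
    lines = [f"{method} {target} HTTP/1.1"]            # request line
    fields = dict(headers)
    if body:
        fields["Content-Length"] = str(len(body))      # tells the server where the body ends
    lines += [f"{name}: {value}" for name, value in fields.items()]
    head = "\r\n".join(lines) + "\r\n\r\n"             # the empty line ends the header section
    return head.encode("ascii") + body


if __name__ == "__main__":
    print(build_request("GET", "/", {"Host": "m.baidu.com", "Connection": "keep-alive"}))
    print(build_request("POST", "/", {"Host": "m.baidu.com"}, b"key=iOS"))
```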

Response Message Format

HTTP/version Status-code Status-description
Response header fields (optional)
(empty line)
Body

For example:

HTTP/1.1 200 OK
Content-Type: text/html; charset=UTF-8
... // other headers

<html>...
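On the receiving side, the same structure can be taken apart by splitting at the first empty line. A minimal sketch with a hypothetical `parse_response` function; it assumes the whole response is already in memory, ignores `Content-Length` and chunked bodies, and keeps only the last value when a header repeats.

```python
def parse_response(raw: bytes) -> tuple[str, int, str, dict[str, str], bytes]:
    """Split a raw HTTP/1.x response into status-line parts, headers, and body."""
    head, _, body = raw.partition(b"\r\n\r\n")          # the blank line separates headers from body
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    parts = status_line.split(" ", 2)                   # e.g. ["HTTP/1.1", "200", "OK"]
    version, code = parts[0], int(parts[1])
    reason = parts[2] if len(parts) > 2 else ""
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()   # field names are case-insensitive
    return version, code, reason, headers, body


if __name__ == "__main__":
    raw = (b"HTTP/1.1 200 OK\r\n"
           b"Content-Type: text/html; charset=UTF-8\r\n"
           b"Content-Length: 13\r\n"
           b"\r\n"
           b"<html></html>")
    print(parse_response(raw))
```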

The structure of the URL

HTTP accesses resources through Uniform Resource Locators (URLs). The URL format is as follows:

scheme://host:port/path?query
Scheme: the protocol, such as http, https, or ftp.
Host: the name of the host where the resource lives, for example www.baidu.com.
Port: the port number. The default is 80 for http (443 for https).
Path: the location of the resource on the destination host.
Query: the query conditions.
For example: http://www.baidu.com/search?words=Baidu
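Python's `urllib.parse` splits a URL into exactly these parts. The sketch below wraps it in a hypothetical `describe_url` helper that also fills in the default port when the URL omits one (the `DEFAULT_PORTS` table is our own assumption covering only http and https). It reproduces the address resolution step from earlier, for `http://localhost.com:8080/index.htm`, as well as the Baidu example.

```python
from urllib.parse import parse_qs, urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}  # assumption: only these two schemes are handled


def describe_url(url: str) -> dict:
    """Break a URL into scheme, host, port, path and parsed query parameters."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "host": parts.hostname,
        "port": parts.port or DEFAULT_PORTS.get(parts.scheme),  # fall back to the default port
        "path": parts.path or "/",
        "query": parse_qs(parts.query),                          # values are lists: ?a=1&a=2
    }


if __name__ == "__main__":
    print(describe_url("http://www.baidu.com/search?words=Baidu"))
    print(describe_url("http://localhost.com:8080/index.htm"))
```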

HTTP request method

GET: retrieves the resource specified by the URL.
DELETE: deletes a file.
HEAD: retrieves the message headers; unlike GET, the message body is not returned.
CONNECT: asks a proxy server to establish a tunnel, which is then used for TCP communication. Typically, data is encrypted with SSL/TLS and then transmitted through the tunnel.

Message field

An HTTP header field consists of a field name and a field value separated by a colon (:), for example Content-Type: text/html. One field name can carry multiple field values, separated by commas.
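A minimal sketch of reading one header line, using a hypothetical `parse_header_line` function. It splits at the first colon only (so values such as `localhost.com:8080` survive) and lowercases the name because field names are case-insensitive. Splitting the value on commas is a simplification: it is wrong for fields whose values themselves contain commas, such as dates in `Expires` or `Set-Cookie`.

```python
def parse_header_line(line: str) -> tuple[str, list[str]]:
    """Split 'Name: v1, v2' into a lowercase field name and its list of values."""
    name, sep, value = line.partition(":")   # split at the first colon only
    if not sep:
        raise ValueError(f"not a header field: {line!r}")
    values = [v.strip() for v in value.split(",") if v.strip()]
    return name.strip().lower(), values


if __name__ == "__main__":
    print(parse_header_line("Accept-Encoding: gzip, deflate"))
    print(parse_header_line("Host: localhost.com:8080"))
```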

There are five types of HTTP packet fields:

Request packet field
Reply packet field
Entity header field
General message field
Other Message Fields

Request packet field

Packet fields supported in HTTP requests.

Accept: the media types the client can process. For example, text/html means the client wants the server to return HTML data, or other text data if HTML is not available. Media types use the format type/subType: data of the subType is requested first, and if it is not available, other data of the same type is also acceptable. Common media types:
Text files: text/html, text/plain, text/css, application/xml
Image files: image/jpeg, image/gif, image/png
Video files: video/mpeg
Binary files used by applications: application/octet-stream, application/zip
The Accept field can carry multiple values; the server matches them in order and returns the first media type that matches. You can also assign a weight to each media type with the q parameter: the higher the weight, the higher the priority. q ranges from 0 to 1 with at most three decimal places and defaults to 1.0. For example: Accept: text/html, application/xml;q=0.9, */*
Accept-Charset: the character sets the client supports. For example: Accept-Charset: GB2312, ISO-8859-1
Accept-Encoding: the content encodings the client supports. For example: Accept-Encoding: gzip. Common content encodings:
gzip: the encoding produced by the file compression program gzip;
compress: the encoding produced by the Unix file compression program compress;
deflate: the zlib format combined with the deflate compression algorithm;
identity: the default encoding; no compression is performed.
Accept-Language: the languages the client supports. For example: Accept-Language: zh-CN, en
Authorization: the client's authentication information. If the client requests a protected resource without an Authorization field, the server returns 401; the client then puts its credentials in the Authorization field and sends the request again. If authentication succeeds, the server returns 200. The FTP server of the Linux community site at ftp://ftp1.linuxidc.com works this way.
Host: the name of the host where the requested resource lives, that is, the domain name in the URL. For example: m.baidu.com
If-Match: compared with the ETag (entity tag) of the requested resource. The server processes the request only if the If-Match value matches the resource's current ETag; otherwise it returns 412 Precondition Failed.
If-Modified-Since: used to check whether the client's local copy of a resource is still fresh. The server processes the request only if the resource has changed after the specified time; otherwise it returns 304 Not Modified. For example, with If-Modified-Since: Mon, 09 Jul 2018 00:00:00 GMT, the server processes the request only if the resource changed after 00:00 on July 9, 2018. This field helps with interfaces that return large amounts of data and must stay current: using it when refreshing avoids downloading unchanged data again and wasting traffic.
If-None-Match: the opposite of If-Match. The server processes the request only if the If-None-Match value does not match the ETag of the requested resource.
If-Range: if the If-Range value (an ETag or a time) matches the ETag or modification time of the resource, the server treats the request as a range request and returns the part specified in the Range field; otherwise it returns the whole resource. If-Range is effectively a more forgiving If-Match: when its value does not match, data is still returned, whereas when If-Match does not match, the request is not processed and the data must be requested again.
If-Unmodified-Since: the opposite of If-Modified-Since. The request is processed only if the resource has not changed since the specified time; otherwise 412 is returned.
Max-Forwards: the maximum number of servers the request may pass through. Each time the request is forwarded, Max-Forwards is decreased by 1; when it reaches 0, the server stops forwarding and responds directly. This helps locate communication problems: for example, when Alipay's fiber-optic cable was cut, setting Max-Forwards could narrow down roughly where the failure was.
Proxy-Authorization: when the client receives an authentication challenge from a proxy server, it puts its credentials in Proxy-Authorization to complete authentication. It works like Authorization, except that Authorization is used between the client and the server.
Range: requests part of a resource. For example, Range: bytes=500-1000 asks for the bytes at offsets 500 through 1000 of the resource. If the server can handle the range request, it responds with 206 and returns only that part; if it cannot, it responds with 200 and returns the full data.
...: tells the server which page the request came from. For example, after searching a keyword on the Baidu home page, the request for the results page carries this field with the value https://www.baidu.com/. This field can be used to count ad clicks.
User-Agent: sends the name of the browser or agent making the request, and similar information, to the server. For example: User-Agent: Mozilla/5.0 (Linux; Android 5.0; SM-G900P Build/LRX21T) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/63.0.3239.84 Mobile Safari/537.36
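The Range behaviour described above (206 with the requested slice, or 200 with everything) can be sketched on the server side. This hypothetical `serve_range` function handles only the simplest single-range forms, `bytes=start-end` and `bytes=start-`, and treats the end offset as inclusive, so `bytes=500-1000` yields 501 bytes. For anything it cannot satisfy, it falls back to 200 with the full resource, matching the text; real servers may instead answer 416 Range Not Satisfiable when the start lies beyond the end of the resource.

```python
import re
from typing import Optional

# Only one range of the form "bytes=start-end" or "bytes=start-" is handled.
_RANGE = re.compile(r"bytes=(\d+)-(\d*)")


def serve_range(resource: bytes, range_header: Optional[str]) -> tuple[int, bytes]:
    """Return (status, body): 206 with the slice, or 200 with the full resource."""
    if range_header:
        match = _RANGE.fullmatch(range_header.strip())
        if match:
            start = int(match.group(1))
            end = int(match.group(2)) if match.group(2) else len(resource) - 1
            end = min(end, len(resource) - 1)       # clamp to the last byte
            if start <= end:
                return 206, resource[start:end + 1]  # both offsets are inclusive
    return 200, resource


if __name__ == "__main__":
    data = bytes(range(256)) * 8                     # a 2048-byte resource
    status, part = serve_range(data, "bytes=500-1000")
    print(status, len(part))                         # 206 501
```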

Reply packet field

Packet fields supported in the HTTP reply.

... indicates that it cannot be processed.
Age: how long ago, in seconds, the origin server (not a cache server) created the response.
ETag: an identifier for the entity resource, which can be used to request a specific version of the resource.
Location: the new location of the requested resource.
Proxy-Authenticate: the proxy server sends the client the authentication information it requires.
Retry-After: tells the client how long to wait before retrying. It is typically used with 503 and 3xx redirection responses.
Server: tells the client which HTTP server application is in use.
WWW-Authenticate: tells the client the authentication scheme, such as Basic or Digest, that applies to the requested resource. A 401 response must include the WWW-Authenticate field.
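ETag and If-None-Match together allow a conditional GET: the server sends a tag with the resource, the client sends it back, and the server answers 304 Not Modified with no body if the resource is unchanged. A minimal sketch; deriving the tag from a truncated SHA-256 of the content is a hypothetical choice (servers may use any opaque string), and `make_etag` / `handle_get` are illustrative names.

```python
import hashlib
from typing import Optional


def make_etag(body: bytes) -> str:
    # Hypothetical scheme: a quoted, truncated SHA-256 of the content.
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def handle_get(body: bytes, if_none_match: Optional[str] = None) -> tuple[int, dict[str, str], bytes]:
    """Return (status, headers, body) for a GET, honouring If-None-Match."""
    etag = make_etag(body)
    headers = {"ETag": etag}
    if if_none_match is not None:
        tags = [tag.strip() for tag in if_none_match.split(",")]
        if "*" in tags or etag in tags:
            return 304, headers, b""  # Not Modified: the client's cached copy is still valid
    return 200, headers, body


if __name__ == "__main__":
    status, headers, _ = handle_get(b"<html>v1</html>")
    print(status, headers["ETag"])
    print(handle_get(b"<html>v1</html>", headers["ETag"])[0])  # unchanged resource
    print(handle_get(b"<html>v2</html>", headers["ETag"])[0])  # changed resource
```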

Entity header field

Allow: tells the client which request methods the server supports. When the server receives an unsupported method, it responds with 405 Method Not Allowed.
Content-Encoding: tells the client the content encoding the server applied to the resource.
Content-Language: the natural language used by the resource.
Content-Length: the length of the resource (the body) in bytes.
Content-Location: the location of the resource.
Content-Type: the media type of the resource. Its values are the same media types used in the Accept request header.
Expires: the date after which the resource expires; it can be used for caching.
Last-Modified: the time the resource was last modified.
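Because Content-Type draws on the same media types as Accept, a server can pick the Content-Type of its reply by matching its available formats against the client's Accept header. The sketch below is a simplified content negotiation under our own assumptions: the function names are hypothetical, only the q parameter is read, and for each available type the most specific matching range (exact, then type/*, then */*) decides its weight; a weight of 0 means "not acceptable".

```python
from typing import Optional


def parse_accept(header: str) -> list[tuple[str, float]]:
    """Turn 'text/html, application/xml;q=0.9' into [(media_range, q), ...]."""
    prefs = []
    for item in header.split(","):
        media, *params = [part.strip() for part in item.split(";")]
        q = 1.0  # default weight
        for param in params:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                q = float(value)
        prefs.append((media.lower(), q))
    return prefs


def _specificity(media_range: str, media_type: str) -> int:
    """-1 if the range does not match; otherwise 0 for */*, 1 for type/*, 2 for an exact match."""
    if media_range == "*/*":
        return 0
    rtype, _, rsub = media_range.partition("/")
    mtype, _, msub = media_type.partition("/")
    if rtype != mtype:
        return -1
    if rsub == "*":
        return 1
    return 2 if rsub == msub else -1


def choose_content_type(accept: str, available: list[str]) -> Optional[str]:
    """Pick the available media type with the highest weight, or None if none is acceptable."""
    prefs = parse_accept(accept)
    best, best_q = None, 0.0
    for media_type in available:
        matches = [(_specificity(r, media_type), q) for r, q in prefs]
        matches = [m for m in matches if m[0] >= 0]
        if not matches:
            continue
        q = max(matches)[1]  # the most specific matching range decides the weight
        if q > best_q:
            best, best_q = media_type, q
    return best


if __name__ == "__main__":
    accept = "text/html, application/xml;q=0.9, */*"
    print(choose_content_type(accept, ["application/xml", "text/html"]))  # text/html
```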

General message field

Packet fields that can be used in HTTP requests or HTTP responses.

Cache-Control: controls caching behavior.
Connection: manages persistent connections; set the value to keep-alive to keep the connection open.
Date: the date and time the HTTP message was created.
Pragma: a field from before HTTP/1.1, kept only for backward compatibility with HTTP/1.0. It is a general field but is typically used in client requests; for example, Pragma: no-cache asks servers and caches along the way not to return cached content.
Transfer-Encoding: the transfer encoding used to transmit the message body, for example Transfer-Encoding: chunked.
Upgrade: checks whether a newer version of HTTP or another protocol can be used.
Via: records the path the message takes between client and server and helps prevent loops, so proxies must add it when forwarding a message.
Warning: an HTTP/1.1 field that evolved from the HTTP/1.0 Retry-After field. It informs users of cache-related warnings.
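With Transfer-Encoding: chunked, the body is sent as a series of chunks, each prefixed by its size in hexadecimal, and ends with a zero-size chunk; this lets a server start sending before it knows the total length. A minimal encoder and decoder under stated simplifications: the names `encode_chunked` / `decode_chunked` are our own, chunk extensions are skipped, and trailer fields are not supported.

```python
def encode_chunked(chunks: list[bytes]) -> bytes:
    """Frame each chunk as '<size in hex>\\r\\n<data>\\r\\n', then add the final '0' chunk."""
    out = bytearray()
    for chunk in chunks:
        if chunk:  # an empty chunk would be read as the end of the body
            out += f"{len(chunk):X}\r\n".encode("ascii") + chunk + b"\r\n"
    out += b"0\r\n\r\n"  # last chunk (size 0) followed by an empty trailer section
    return bytes(out)


def decode_chunked(data: bytes) -> bytes:
    """Reassemble the body from a chunked message (trailers are not supported)."""
    body = bytearray()
    pos = 0
    while True:
        line_end = data.index(b"\r\n", pos)
        size = int(data[pos:line_end].split(b";")[0], 16)  # ignore chunk extensions
        pos = line_end + 2
        if size == 0:
            return bytes(body)
        body += data[pos:pos + size]
        pos += size + 2  # skip the chunk data and its trailing CRLF


if __name__ == "__main__":
    wire = encode_chunked([b"Hello, ", b"world"])
    print(wire)                   # b'7\r\nHello, \r\n5\r\nworld\r\n0\r\n\r\n'
    print(decode_chunked(wire))   # b'Hello, world'
```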

Other Message Fields

These fields are not defined in the HTTP protocol, but are widely used in HTTP requests.

Cookie: a request header field. The client adds cookies to its requests so that HTTP can keep track of state.
Set-Cookie: a response header field, used when the server passes cookie information to the client.

Set-Cookie attributes:

NAME=VALUE: the name and value of the cookie;
expires=DATE: the cookie's expiration date;
path=PATH: the server directory the cookie applies to; if omitted, it defaults to the directory of the document that set it;
domain=DOMAIN: the domain the cookie applies to; if omitted, it defaults to the domain of the server that created the cookie;
Secure: the cookie is sent only over HTTPS;
HttpOnly: the cookie cannot be accessed by JavaScript.
For example: Set-Cookie: BDSVRBFE=Go; max-age=10; domain=m.baidu.com; path=/
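Python's standard `http.cookies.SimpleCookie` can both produce a Set-Cookie line with these attributes and read one back. The two helpers below (`make_set_cookie`, `parse_set_cookie`) are hypothetical wrappers written for illustration; note that the library emits attribute names in its own canonical spelling and order (for example `Domain=` and `Max-Age=`), so the output will not match the example above character for character.

```python
from http.cookies import SimpleCookie
from typing import Optional


def make_set_cookie(name: str, value: str, *, max_age: Optional[int] = None,
                    domain: Optional[str] = None, path: Optional[str] = None,
                    secure: bool = False, httponly: bool = False) -> str:
    """Server side: build one Set-Cookie header line."""
    jar = SimpleCookie()
    jar[name] = value
    morsel = jar[name]
    if max_age is not None:
        morsel["max-age"] = max_age
    if domain:
        morsel["domain"] = domain
    if path:
        morsel["path"] = path
    if secure:
        morsel["secure"] = True    # only sent over HTTPS
    if httponly:
        morsel["httponly"] = True  # not readable from JavaScript
    return morsel.output(header="Set-Cookie:")


def parse_set_cookie(header_value: str) -> dict[str, str]:
    """Client side: read the name, value and attributes of one cookie."""
    jar = SimpleCookie()
    jar.load(header_value)
    name, morsel = next(iter(jar.items()))
    attributes = {key: str(val) for key, val in morsel.items() if val}  # skip unset attributes
    return {"name": name, "value": morsel.value, **attributes}


if __name__ == "__main__":
    print(make_set_cookie("BDSVRBFE", "Go", max_age=10, domain="m.baidu.com", path="/", httponly=True))
    print(parse_set_cookie("BDSVRBFE=Go; max-age=10; domain=m.baidu.com; path=/"))
```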

HTTP reply status code

Status code	Category	Meaning
1xx	Informational	The request is being processed
2xx	Success	The request was processed successfully
3xx	Redirection	Redirection is required
4xx	Client Error	The server cannot process the request
5xx	Server Error	The server failed to process the request

Common response status codes:

Understanding what the response status codes mean helps locate problems during development. For example, when a 4xx code occurs, first check whether something is wrong with the request; when a 5xx code occurs, the server side should be investigated.
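The troubleshooting rule above depends only on the first digit of the code. A small sketch using the standard `http.HTTPStatus` enumeration for the reason phrase; the `describe_status` name and the hint wording are our own.

```python
from http import HTTPStatus

CLASSES = {1: "Informational", 2: "Success", 3: "Redirection", 4: "Client Error", 5: "Server Error"}
HINTS = {3: "follow the Location header", 4: "check the request", 5: "check the server"}


def describe_status(code: int) -> str:
    """Name the status code, its class, and where to look first when debugging."""
    if not 100 <= code <= 599:
        raise ValueError(f"not an HTTP status code: {code}")
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:          # valid range but not a registered code
        phrase = "Unknown"
    hint = HINTS.get(code // 100, "no action needed")
    return f"{code} {phrase} ({CLASSES[code // 100]}): {hint}"


if __name__ == "__main__":
    for code in (200, 304, 404, 503):
        print(describe_status(code))
```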

HTTP shortcomings

Communication is in plaintext, so it may be eavesdropped on
The identity of the communicating party is not verified, so it may be impersonated
The integrity of the message cannot be proven, so it may be tampered with

These drawbacks of HTTP are serious problems for the security of enterprise network communication. Can HTTPS solve them? Let's look at HTTPS.

HTTPS

HTTP+ encryption + authentication + integrity protection = HTTPS

HTTPS concept

From Wikipedia: Hypertext Transfer Protocol Secure (HTTPS), often called HTTP over TLS, HTTP over SSL, or HTTP Secure, is a transport protocol for secure communication over computer networks. HTTPS communicates over HTTP but uses SSL/TLS to encrypt the packets. Its main purpose is to authenticate web servers and to protect the privacy and integrity of the exchanged data. The protocol was first proposed by Netscape in 1994 and later spread across the Internet. Historically, HTTPS connections were used for payment transactions on the World Wide Web and for transferring sensitive information in enterprise information systems. In the late 2000s and early 2010s, HTTPS came into wide use on all kinds of websites to protect the authenticity of web pages, secure user accounts, and keep users' communications, identity, and web browsing private.

HTTP transmits information in plaintext, which may cause information eavesdropping, information tampering, and information hijacking. However, TLS/SSL provides the functions of identity authentication, information encryption, and integrity check to avoid such problems.

TLS/SSL is a transport-layer security protocol that sits between TCP and HTTP. It does not change TCP or HTTP themselves, so HTTP pages do not need to be modified to use HTTPS.

HTTPS is the secure version of HTTP: it adds an SSL/TLS encryption layer beneath HTTP and encrypts the transmitted data. HTTPS provides the following functions:

Encrypts the data and establishes a secure channel to protect it during transmission
Authenticates the real identity of the web server
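Both functions show up directly in how a TLS client is configured. A sketch using Python's standard `ssl` module: `ssl.create_default_context()` loads the system's trusted CA certificates and already enables certificate and hostname checking (the sketch sets them explicitly for clarity), and setting a minimum protocol version is our own illustrative choice. The helper `fetch_status_line` is hypothetical, needs network access, and reads only the first chunk the server sends, so it is not run here; you could call it as `fetch_status_line("www.example.com")`.

```python
import socket
import ssl


def make_client_context() -> ssl.SSLContext:
    """A TLS client context that encrypts traffic and authenticates the server."""
    context = ssl.create_default_context()            # loads the system's trusted CA certificates
    context.check_hostname = True                     # certificate must match the requested host name
    context.verify_mode = ssl.CERT_REQUIRED           # server must present a valid certificate chain
    context.minimum_version = ssl.TLSVersion.TLSv1_2  # illustrative choice: refuse older versions
    return context


def fetch_status_line(host: str, port: int = 443) -> str:
    """Send a HEAD request over TLS and return the status line (needs network access)."""
    context = make_client_context()
    with socket.create_connection((host, port), timeout=10) as raw_sock:
        # The TLS handshake (and certificate verification) happens inside wrap_socket.
        with context.wrap_socket(raw_sock, server_hostname=host) as tls_sock:
            request = f"HEAD / HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n"
            tls_sock.sendall(request.encode("ascii"))
            return tls_sock.recv(4096).split(b"\r\n", 1)[0].decode("iso-8859-1")
```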

The difference between HTTPS and HTTP

As you can see, HTTPS has one more layer, TLS/SSL, than HTTP. What does this layer do? The following explains how TLS/SSL works.

How TLS/SSL works

The main functions of HTTPS depend on TLS/SSL, and TLS/SSL in turn relies on three basic kinds of algorithm: hash functions, symmetric encryption, and asymmetric encryption. Asymmetric encryption is used for identity authentication and key negotiation, symmetric encryption encrypts the data with the negotiated key, and hash functions verify information integrity.

Hash functions

Common hash functions include MD5, SHA-1, and SHA-256. They are one-way (irreversible), highly sensitive to their input, and produce fixed-length output. Any change to the data changes the hash result, so hashes are used to detect tampering and verify data integrity. During transmission, however, a hash alone cannot prevent tampering: if the information is sent in plaintext, a man in the middle can modify it and recompute the digest. Therefore, both the information and its digest must be protected by encryption.
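The properties above, and why a bare digest is not enough, can be shown with Python's `hashlib` and `hmac` modules. One concrete way to protect the digest is to key it with a secret that only the two ends share; the sketch swaps in HMAC for this, which is a keyed-hash construction rather than the "encrypt the digest" approach described above. The key and message strings are made up for illustration.

```python
import hashlib
import hmac


def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def tag(key: bytes, data: bytes) -> str:
    return hmac.new(key, data, hashlib.sha256).hexdigest()


def verify(key: bytes, data: bytes, received_tag: str) -> bool:
    return hmac.compare_digest(tag(key, data), received_tag)  # constant-time comparison


if __name__ == "__main__":
    original = b"transfer 100 to Alice"
    tampered = b"transfer 900 to Alice"
    # Fixed-length output and sensitivity to the input:
    print(len(digest(b"a")), len(digest(b"a" * 10_000)))  # 64 64
    print(digest(original) == digest(tampered))          # False
    # A bare digest does not stop a man in the middle: he simply recomputes it,
    # and the receiver's check digest(message) == received_digest still passes.
    forged_digest = digest(tampered)
    print(digest(tampered) == forged_digest)             # True
    # With a shared secret key, the attacker cannot produce a valid tag.
    key = b"hypothetical shared secret"
    good_tag = tag(key, original)
    print(verify(key, original, good_tag), verify(key, tampered, good_tag))  # True False
```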

Symmetric encryption

Common algorithms include AES-CBC, DES, 3DES, and AES-GCM. The same key is used for both encryption and decryption, so only those who hold the key can read the information, which prevents eavesdropping. Symmetric encryption is one-to-one: both parties must share the same key, and keeping that key secret is the foundation of information security. A server that communicates with N clients must maintain N keys, and there is no built-in mechanism for changing keys.
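Python's standard library has no AES, so the sketch below swaps in a toy stream cipher: a keystream built from SHA-256 in counter mode, XORed with the data. It is not AES and must not be used for real protection; it only illustrates the symmetric property described above, that one shared key both encrypts and decrypts and that a different key yields garbage. The function names and the nonce parameter are our own; as with any stream cipher, a key/nonce pair must never be reused.

```python
import hashlib


def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy keystream: SHA-256(key || nonce || counter) blocks, concatenated."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def toy_cipher(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """XOR with the keystream; the same call both encrypts and decrypts."""
    return bytes(b ^ k for b, k in zip(data, keystream(key, nonce, len(data))))


if __name__ == "__main__":
    shared_key, nonce = b"k" * 32, b"n" * 12       # made-up values for illustration
    plaintext = b"hello over a shared key"
    ciphertext = toy_cipher(shared_key, nonce, plaintext)
    print(ciphertext.hex())
    print(toy_cipher(shared_key, nonce, ciphertext))              # original plaintext
    print(toy_cipher(b"x" * 32, nonce, ciphertext) == plaintext)  # False: wrong key
```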

Asymmetric encryption

Common algorithms include RSA, ECC, and DH. Their keys come in pairs, usually called the public key and the private key. Information encrypted with the public key can be decrypted only with the private key, and information encrypted with the private key can be decrypted only with the public key. Therefore, different clients holding the public key cannot decrypt each other's information; each can communicate securely only with the server that holds the private key. The server can thus communicate with many clients, and each client can authenticate the server that holds the private key. Asymmetric encryption is one-to-many: the server needs to maintain only one private key to communicate securely with many clients. However, information the server sends can be decrypted by every client, and the algorithms are computationally complex, so encryption is slow.
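The key-pair property can be checked with textbook RSA on the classic tiny example (p = 61, q = 53, e = 17). This is an illustration only: the numbers are far too small to be secure, and real RSA also pads messages (for example OAEP for encryption, PSS for signatures), which is omitted here. The `apply_key` helper is our own name.

```python
# Textbook RSA with tiny primes: insecure, for illustration only.
p, q = 61, 53
n = p * q                  # modulus, part of both keys: 3233
phi = (p - 1) * (q - 1)    # 3120
e = 17                     # public exponent, coprime with phi
d = pow(e, -1, phi)        # private exponent (modular inverse, Python 3.8+): 2753

public_key, private_key = (e, n), (d, n)


def apply_key(key: tuple[int, int], m: int) -> int:
    """Raise m to the key's exponent modulo n (works for either key)."""
    exponent, modulus = key
    return pow(m, exponent, modulus)


if __name__ == "__main__":
    c = apply_key(public_key, 65)           # anyone can encrypt with the public key
    print(c, apply_key(private_key, c))     # 2790 65: only the private key decrypts
    s = apply_key(private_key, 65)          # "encrypting" with the private key...
    print(apply_key(public_key, s))         # 65: ...is undone by anyone with the public key
```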

Combining the characteristics of the three kinds of algorithm, TLS works roughly as follows: the client uses asymmetric encryption with the server to authenticate the server and to negotiate a symmetric key; the symmetric algorithm then encrypts the communication with the negotiated key, and hashes protect its integrity. Different pairs of nodes use different symmetric keys, so only the two communicating parties can read the information.
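One way to "negotiate a symmetric key" is Diffie-Hellman (DH), listed among the asymmetric algorithms above; RSA key transport is another. The toy sketch below uses the textbook parameters p = 23, g = 5, which are hopelessly small (real TLS uses far larger groups or elliptic curves), and derives the session key with a single SHA-256 call, which is a simplification of real key derivation. Note that unauthenticated DH is itself open to a man-in-the-middle attack, which is exactly the problem the next section addresses.

```python
import hashlib
import secrets

# Toy public parameters (tiny prime, insecure): real TLS uses far larger groups or curves.
P, G = 23, 5


def public_value(secret: int) -> int:
    return pow(G, secret, P)


def shared_secret(their_public: int, my_secret: int) -> int:
    return pow(their_public, my_secret, P)


def session_key(shared: int) -> bytes:
    # Simplified key derivation: hash the shared number into a 32-byte symmetric key.
    return hashlib.sha256(shared.to_bytes(32, "big")).digest()


if __name__ == "__main__":
    client_secret = secrets.randbelow(P - 3) + 2   # in [2, P-2], never sent
    server_secret = secrets.randbelow(P - 3) + 2
    client_pub, server_pub = public_value(client_secret), public_value(server_secret)  # sent in the clear
    k_client = session_key(shared_secret(server_pub, client_secret))
    k_server = session_key(shared_secret(client_pub, server_secret))
    print(k_client == k_server)   # True: both ends derive the same symmetric key
```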

PKI system

Risks of RSA authentication

Authentication and key negotiation are the basic functions of TLS, and the prerequisite is that a valid server holds the corresponding private key. However, the RSA algorithm cannot ensure the validity of the server identity, because the public key does not contain information about the server, causing security risks.

Client C communicates with server S, and an intermediate node M intercepts their communication.
Node M generates its own key pair: public key pub_M and private key pri_M.
When C requests the public key from S, M sends its own public key pub_M to C.
Data that C encrypts with pub_M can be decrypted by M, because M holds the matching private key pri_M. C cannot determine the server's identity from the public key alone, so a "trusted" encrypted connection is established between C and M.
M also establishes a legitimate connection with server S, so M fully controls the communication between C and S and can eavesdrop on and tamper with the information.
In addition, the server can deny information it has sent, refusing to admit that the information is its own.

Therefore, there are at least two kinds of problems under this scheme: man-in-the-middle attack and information denial.

CA authentication and certificates

The key to solving this authentication problem is to ensure that the public key the client obtains is legitimate and can authenticate the server's identity. To this end, an authoritative third party, a certificate authority (CA) such as Votong CA, is introduced. The CA verifies information about public key owners, issues certificates, and provides certificate verification services to users; together this forms the PKI (public key infrastructure) system.

The basic principle: the CA audits the applicant's information, then "signs" the key information with its private key and publishes the corresponding public key, which clients use to verify the signature. The CA can also revoke certificates it has issued, through two mechanisms: CRL files and OCSP. The CA process is as follows:

A. Server S submits information such as public key, organization information, and personal information (domain name) to the third-party organization CA and applies for authentication;

B. The CA verifies the authenticity of the information provided by the applicant through online and offline means, such as whether the organization exists, whether the business is legitimate, and whether the applicant owns the domain name.

C. If the information is approved, the CA issues a certification document, the certificate, to the applicant. A certificate contains the applicant's public key, the applicant's organizational and personal information, information about the issuing authority (CA), the validity period, the certificate serial number, and a signature. The signature is generated as follows: first a hash function computes a digest of the plaintext information, then the CA's private key encrypts the digest; the resulting ciphertext is the signature.

D. When client C sends a request to server S, server S returns a certificate file.

E. Client C reads the plaintext information in the certificate and computes its digest with the same hash function. It then uses the CA's public key to decrypt the signature and compares the result with the digest it computed.

F. The client checks the domain name and validity period in the certificate.

G. The client has built-in information about trusted CA certificates (including their public keys). If the issuing CA is not trusted, its certificate cannot be found, and the certificate is judged invalid.
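Steps C, E, F, and G can be sketched with textbook RSA signatures. Everything here is a toy under our own assumptions: the CA key is built from two publicly known Mersenne primes (2^31 - 1 and 2^61 - 1), so anyone could forge signatures; the certificate is a plain dict with hypothetical fields; the digest is reduced modulo n; and there is no padding. Real certificates use the X.509 format and properly sized keys.

```python
import hashlib
from datetime import date

# Toy CA key pair from two publicly known Mersenne primes: insecure, illustration only.
_P, _Q = 2**31 - 1, 2**61 - 1
CA_N, CA_E = _P * _Q, 65537
CA_D = pow(CA_E, -1, (_P - 1) * (_Q - 1))   # CA private key, never leaves the CA

TRUSTED_CAS = {"Toy CA": (CA_E, CA_N)}       # step G: CA public keys built into the client


def _digest(cert: dict, n: int) -> int:
    """Hash of the certificate's plaintext fields (everything except the signature)."""
    tbs = f"{cert['subject']}|{cert['public_key']}|{cert['issuer']}|{cert['not_after'].isoformat()}"
    return int.from_bytes(hashlib.sha256(tbs.encode()).digest(), "big") % n


def issue(subject: str, public_key: str, not_after: date) -> dict:
    """Step C: the CA signs the digest of the plaintext information with its private key."""
    cert = {"subject": subject, "public_key": public_key, "issuer": "Toy CA", "not_after": not_after}
    cert["signature"] = pow(_digest(cert, CA_N), CA_D, CA_N)
    return cert


def verify(cert: dict, hostname: str, today: date) -> bool:
    ca_key = TRUSTED_CAS.get(cert["issuer"])
    if ca_key is None:
        return False                                           # step G: issuer not trusted
    e, n = ca_key
    if pow(cert["signature"], e, n) != _digest(cert, n):
        return False                                           # step E: digest mismatch
    return cert["subject"] == hostname and today <= cert["not_after"]  # step F


if __name__ == "__main__":
    cert = issue("www.example.com", "pub_S", date(2030, 1, 1))
    print(verify(cert, "www.example.com", date(2025, 6, 1)))                              # True
    print(verify(dict(cert, subject="evil.example"), "evil.example", date(2025, 6, 1)))   # False
```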

Note several points in this process:

A. Applying for a certificate does not require submitting the private key, which ensures that only the server ever holds it.

B. The validity of the certificate still rests on the asymmetric encryption algorithm; the certificate mainly adds server information and a signature.

C. A certificate belonging to a CA built into the client is called a root certificate. Its issuer and subject are the same, so it signs itself: it is a self-signed certificate.

D. Certificate = public key + information of applicant and issuer + signature;

The certificate chain

For example, if an intermediate certificate authority is added between the CA root certificate and the server certificate, certificates are still generated and verified in the same way, with one more layer of verification: a certificate is valid as long as it can ultimately be verified by a trusted CA root certificate.

A. The server certificate server.pem is issued by an intermediate certificate authority (inter); inter.pem is used to verify server.pem.

B. The intermediate certificate inter.pem is issued by the root CA (root); root.pem is used to verify inter.pem.

C. The client trusts the CA root.pem certificate. Therefore, the server certificate server.pem is trusted.
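The bottom-up trust transfer in steps A to C can be sketched as a loop that checks each certificate's signature with the public key of the next certificate up, and finally requires the top certificate to be one the client already trusts. As in any toy RSA example, the keys here are built from publicly known Mersenne primes and are insecure; the certificate dicts, key names, and `verify_chain` are hypothetical illustrations, not the X.509 path-validation algorithm.

```python
import hashlib


def make_key(p: int, q: int, e: int = 65537) -> dict:
    """Toy RSA key pair from two primes (publicly known Mersenne primes here: insecure)."""
    n = p * q
    return {"public": (e, n), "private": (pow(e, -1, (p - 1) * (q - 1)), n)}


ROOT_KEY = make_key(2**31 - 1, 2**61 - 1)
INTER_KEY = make_key(2**61 - 1, 2**89 - 1)
SERVER_KEY = make_key(2**31 - 1, 2**89 - 1)


def digest(cert: dict, n: int) -> int:
    tbs = f"{cert['subject']}|{cert['issuer']}|{cert['public_key']}".encode()  # the signed fields
    return int.from_bytes(hashlib.sha256(tbs).digest(), "big") % n


def issue(subject: str, subject_key: dict, issuer: str, issuer_key: dict) -> dict:
    cert = {"subject": subject, "issuer": issuer, "public_key": subject_key["public"]}
    d, n = issuer_key["private"]
    cert["signature"] = pow(digest(cert, n), d, n)   # signed with the issuer's private key
    return cert


root_pem = issue("Root CA", ROOT_KEY, "Root CA", ROOT_KEY)               # self-signed
inter_pem = issue("Inter CA", INTER_KEY, "Root CA", ROOT_KEY)
server_pem = issue("www.example.com", SERVER_KEY, "Inter CA", INTER_KEY)

TRUSTED_ROOTS = {("Root CA", ROOT_KEY["public"])}   # built into the client


def verify_chain(chain: list[dict]) -> bool:
    """chain = [server, intermediate..., root]; trust is checked from the bottom up."""
    for cert, issuer in zip(chain, chain[1:]):
        if cert["issuer"] != issuer["subject"]:
            return False
        e, n = issuer["public_key"]                  # verify with the issuer's public key
        if pow(cert["signature"], e, n) != digest(cert, n):
            return False
    top = chain[-1]
    return (top["subject"], top["public_key"]) in TRUSTED_ROOTS


if __name__ == "__main__":
    print(verify_chain([server_pem, inter_pem, root_pem]))   # True
    print(verify_chain([server_pem, inter_pem]))             # False: chain does not reach a trusted root
```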

The server certificate, the intermediate certificate, and the root certificate together form a valid certificate chain. Verifying the chain is a bottom-up transfer of trust. Advantages of a two-level structure with an intermediate certificate authority:

A. It reduces the management workload of the root certificate authority, so certificates are reviewed and issued more efficiently.

B. The root certificate is usually built into the client, and its private key is stored offline. If that private key were disclosed, revoking the root certificate would be very difficult and could not be done in time.

C. If the private key of an intermediate certificate authority is disclosed, its certificate can be revoked online quickly and new certificates issued to users.

D. The HTTPS performance is not significantly affected if the certificate chain is within four levels.

The certificate chain has the following features:

A. The same server certificate may have multiple valid certificate chains. Certificates are generated and verified using public/private key pairs. If the same key pair is used to generate different intermediate certificates, each issued by a legitimate CA, the server certificate can be verified through any of them; the chains differ only in which authority issued the intermediate certificate.

B. The levels of different certificate chains may not be the same. They may be level 2, level 3, or level 4 certificate chains. The issuing authority of the intermediate certificate may be the root certificate Authority or another intermediate certificate authority, so the hierarchy of the certificate chain may not be the same.

Certificate revocation

A CA can issue certificates, but there must also be a mechanism for invalidating certificates it issued earlier. For example, if the certificate's holder is no longer legitimate, the CA must revoke the certificate; or if the private key is lost, the holder applies to have the certificate invalidated. There are two mechanisms: CRL and OCSP.

CRL

A Certificate Revocation List (CRL) is a separate file listing the serial numbers (unique within the issuing CA) of the certificates the CA has revoked, together with each revocation date. The file also records when it was issued and when the next update is due, and it must be signed with the CA's private key so that its validity can be verified. A certificate usually contains a CRL Distribution Point URL that tells the client where to download the corresponding CRL to check whether the certificate has been revoked. The advantage of this approach is that the CRL does not need to be updated frequently. The drawback is that revocation is not timely: a CRL may be updated only every few days, and a great deal of damage can be done in the meantime.
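A CRL check reduces to two steps: make sure the list is current, then look up the serial number. The Python sketch below assumes the CRL has already been downloaded and its CA signature verified; the `CRL` class, `is_revoked`, `StaleCRLError`, and all dates and serials are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class CRL:
    issuer: str
    this_update: datetime   # when this CRL was published
    next_update: datetime   # when the CA promises the next CRL
    revoked: dict[int, datetime] = field(default_factory=dict)  # serial -> revocation date

class StaleCRLError(Exception):
    pass

def is_revoked(serial: int, crl: CRL, now: datetime) -> bool:
    """Look up a serial in a CRL whose CA signature is assumed already verified."""
    if not (crl.this_update <= now <= crl.next_update):
        raise StaleCRLError("CRL not currently valid; fetch a fresh one from the distribution point")
    return serial in crl.revoked

utc = timezone.utc
crl = CRL(
    issuer="Example CA",
    this_update=datetime(2024, 1, 1, tzinfo=utc),
    next_update=datetime(2024, 1, 8, tzinfo=utc),
    revoked={0x1A2B: datetime(2024, 1, 1, tzinfo=utc)},
)
print(is_revoked(0x1A2B, crl, datetime(2024, 1, 3, tzinfo=utc)))  # True
print(is_revoked(0x3C4D, crl, datetime(2024, 1, 3, tzinfo=utc)))  # False
```

The sketch also shows the timeliness problem: a certificate the CA revokes on 2 January is not in this list and keeps passing the check until a newer CRL is downloaded.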

OCSP

OCSP (Online Certificate Status Protocol) queries whether a certificate has been revoked in real time. The client sends information identifying the certificate, and the responder returns one of three statuses: good, revoked, or unknown. The certificate contains the URL of the OCSP responder; because clients query it on demand, the responder must perform well. Some, perhaps most, self-signed CAs (root certificates) provide neither a CRL nor an OCSP address, which makes revoking their certificates very troublesome.
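The three-state answer can be modelled in a few lines. This Python sketch is a hypothetical in-memory responder, not the real protocol (RFC 6960 exchanges signed DER-encoded messages, usually over HTTP); the class names and the hard-fail acceptance policy are assumptions for illustration.

```python
from enum import Enum

class CertStatus(Enum):
    GOOD = "good"
    REVOKED = "revoked"
    UNKNOWN = "unknown"

class ToyOCSPResponder:
    """Hypothetical in-memory responder for a single CA."""

    def __init__(self, issued: set[int], revoked: set[int]):
        self.issued = issued
        self.revoked = revoked

    def revoke(self, serial: int) -> None:
        self.revoked.add(serial)  # visible to the very next query, unlike a CRL

    def query(self, serial: int) -> CertStatus:
        if serial in self.revoked:
            return CertStatus.REVOKED
        if serial in self.issued:
            return CertStatus.GOOD
        return CertStatus.UNKNOWN  # responder knows nothing about this serial

def accept(serial: int, responder: ToyOCSPResponder) -> bool:
    # Assumed hard-fail policy: anything other than GOOD is rejected.
    return responder.query(serial) is CertStatus.GOOD

responder = ToyOCSPResponder(issued={1, 2, 3}, revoked=set())
print(accept(2, responder))  # True
responder.revoke(2)
print(accept(2, responder))  # False
print(responder.query(99))   # CertStatus.UNKNOWN
```

Real clients differ in how they treat an unreachable responder or an "unknown" answer; many soft-fail and accept the certificate anyway.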

HTTPS performance and optimization

HTTPS performance loss

The previous sections described the principles and advantages of HTTPS, such as authentication, encryption, and integrity verification. HTTPS does not modify TCP or HTTP; instead, it adds a new protocol layer (TLS/SSL) to make communication more secure. The performance cost of HTTPS shows up mainly in the following ways:

Increased latency

As the earlier handshake analysis showed, a full handshake requires at least two round trips between the two ends, adding at least 2 × RTT of latency. When the session cache is used to resume a previous session, at least 1 × RTT is still added.
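The round-trip counts above translate directly into added delay. A minimal Python sketch, assuming the 2-RTT full and 1-RTT resumed figures from the text; the function name is hypothetical and the RTT values are example inputs.

```python
def tls_handshake_delay_ms(rtt_ms: float, resumed: bool) -> float:
    """Minimum delay the TLS handshake adds, using the round-trip counts above:
    2 RTT for a full handshake, 1 RTT when a cached session is resumed.
    TCP's own handshake and server compute time are deliberately ignored."""
    round_trips = 1 if resumed else 2
    return round_trips * rtt_ms

for rtt in (20.0, 100.0):
    print(rtt, tls_handshake_delay_ms(rtt, resumed=False), tls_handshake_delay_ms(rtt, resumed=True))
# 20.0 40.0 20.0
# 100.0 200.0 100.0
```

Because the cost scales linearly with RTT, the same handshake is far more expensive for a distant user, which motivates the CDN optimization below.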

High CPU consumption

Besides transferring data, HTTPS spends CPU on symmetric encryption and decryption and on asymmetric encryption and decryption (on the server, mainly RSA private-key decryption). On a single core of a TS8 server, the symmetric cipher AES-CBC-256 achieves a throughput of about 600 Mbps, while RSA private-key decryption runs about 200 times per second. Ignoring the overhead of other software layers, symmetrically encrypting a fully loaded 10G NIC takes about 17 cores (10,000 / 600 ≈ 16.7), and a 24-core CPU can accept at most 4,800 new HTTPS connections per second (24 × 200). By comparison, a single static-content TS8 node with a 10G NIC handles about 100,000 HTTP requests per second. If all HTTP connections were switched to HTTPS, RSA decryption would clearly be the first bottleneck, which is why RSA decryption capacity is currently the main problem for HTTPS access.
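The arithmetic behind these figures can be reproduced with a short Python sketch. The per-core rates (600 Mbps AES, 200 RSA decryptions/s) and the 100,000 requests/s HTTP figure are the TS8 numbers quoted above; the helper names are hypothetical, and the model ignores every cost except the cryptography itself.

```python
import math

def cores_for_symmetric(link_mbps: float, aes_mbps_per_core: float) -> int:
    """Cores needed to encrypt a fully loaded link with the symmetric cipher."""
    return math.ceil(link_mbps / aes_mbps_per_core)

def max_full_handshakes_per_sec(cores: int, rsa_decrypts_per_core: float) -> float:
    """Upper bound if every core did nothing but RSA private-key decryption."""
    return cores * rsa_decrypts_per_core

print(cores_for_symmetric(10_000, 600))   # 17 (10G NIC at 600 Mbps per core)
https_cap = max_full_handshakes_per_sec(24, 200)
print(https_cap)                          # 4800
print(round(100_000 / https_cap, 1))      # 20.8: HTTP capacity vs new-HTTPS capacity
```

The roughly 20× gap between plain HTTP and new HTTPS connections is what makes RSA decryption, rather than symmetric encryption, the first bottleneck.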

HTTPS access optimization

CDN access

The latency HTTPS adds is mainly round-trip transmission delay (RTT), and RTT shrinks as the serving node gets closer to the user. CDN nodes are, by design, the servers closest to users. A CDN node terminates the TLS handshake near the user, keeps persistent connections to the origin server, reuses sessions, and optimizes link quality, which greatly reduces the latency HTTPS introduces.
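A rough latency model shows where the CDN's benefit comes from. This Python sketch assumes a 2-RTT full TLS handshake, one extra RTT for the request itself, and ignores TCP setup; the function name and all RTT values are hypothetical example inputs.

```python
def time_to_first_byte_ms(rtt_to_tls_server: float,
                          rtt_edge_to_origin: float = 0.0,
                          origin_conn_reused: bool = True) -> float:
    """Rough model of one new HTTPS request (TCP setup ignored).
    rtt_to_tls_server: RTT between the user and whichever server terminates TLS.
    rtt_edge_to_origin: extra hop when a CDN edge forwards to the origin (0 = no CDN)."""
    delay = 2 * rtt_to_tls_server        # full TLS handshake, 2 RTT
    delay += rtt_to_tls_server           # the HTTP request/response itself
    delay += rtt_edge_to_origin          # edge -> origin forwarding, if any
    if rtt_edge_to_origin and not origin_conn_reused:
        delay += 2 * rtt_edge_to_origin  # edge must first handshake with the origin
    return delay

direct = time_to_first_byte_ms(100.0)                           # user talks to the origin directly
via_cdn = time_to_first_byte_ms(10.0, rtt_edge_to_origin=90.0)  # nearby edge, persistent origin link
cold_cdn = time_to_first_byte_ms(10.0, 90.0, origin_conn_reused=False)
print(direct, via_cdn, cold_cdn)  # 300.0 120.0 300.0
```

Under these assumptions, a nearby edge only helps when its connection to the origin is persistent; a cold edge-to-origin handshake gives back the entire saving.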

Session cache

As mentioned earlier, even with the HTTPS session cache the handshake still adds at least 1 × RTT, but that is half the latency of a full handshake, which is a significant improvement. In addition, an HTTPS connection resumed from the session cache does not require the server to decrypt the pre-master secret with its RSA private key, which saves CPU. If service connections are concentrated and the cache hit ratio is high, HTTPS access capacity improves significantly. On the TRP platform, the cache hit ratio currently exceeds 30% during peak periods, so resources sized for 10K connections/s can actually carry 13K connections/s, a substantial gain.
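A server-side session cache is essentially a map from session ID to the negotiated master secret, with an expiry time. The Python sketch below is hypothetical (class name, TTL, and injectable clock are illustrative assumptions) and also includes an idealized capacity estimate that assumes resumed handshakes cost no RSA work at all.

```python
import time
from typing import Optional

class SessionCache:
    """Hypothetical server-side TLS session cache: session ID -> master secret, with a TTL."""

    def __init__(self, ttl_seconds: float = 300.0, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._store = {}  # session_id -> (master_secret, stored_at)
        self.hits = 0
        self.misses = 0

    def store(self, session_id: bytes, master_secret: bytes) -> None:
        self._store[session_id] = (master_secret, self.clock())

    def lookup(self, session_id: bytes) -> Optional[bytes]:
        entry = self._store.get(session_id)
        if entry is not None and self.clock() - entry[1] < self.ttl:
            self.hits += 1
            return entry[0]                # resume: no RSA decryption needed
        self._store.pop(session_id, None)  # drop the expired entry, if any
        self.misses += 1
        return None                        # full handshake: RSA-decrypt the pre-master secret

    @property
    def hit_ratio(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0

def effective_capacity(full_handshakes_per_sec: float, hit_ratio: float) -> float:
    """Idealized upper bound on connections/s, assuming resumed handshakes cost no RSA work."""
    return full_handshakes_per_sec / (1.0 - hit_ratio)

print(round(effective_capacity(10_000, 0.30)))  # 14286
```

This idealized bound (about 14.3K) sits above the 13K reported for TRP; one plausible reason is that resumed handshakes and the rest of the request path still consume some CPU.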

Hardware acceleration

A dedicated SSL hardware accelerator card can be installed in the access server. Much like a GPU, it takes work off the CPU and raises HTTPS access capacity without affecting service programs. One accelerator card provides about 35K RSA decryptions per second, equivalent to 175 CPU cores, or more than seven 24-core servers. Taking into account the overhead of other programs on the access server, one card delivers the access capacity of nearly 10 servers.
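The core-equivalence figures follow from dividing the card's rate by the per-core RSA rate. A minimal Python check, assuming the 200 decryptions per core per second quoted earlier for TS8; the helper name is hypothetical.

```python
def equivalent_cores(card_ops_per_sec: float, ops_per_core: float) -> float:
    """How many CPU cores a card replaces at the same RSA decryption rate."""
    return card_ops_per_sec / ops_per_core

cores = equivalent_cores(35_000, 200)  # per-core rate from the TS8 figure above
servers = cores / 24                   # expressed in 24-core servers
print(cores, f"{servers:.2f}")         # 175.0 7.29
```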

Remote decryption

When local access processing consumes too much CPU while NIC and disk resources sit underused, the RSA decryption task, which consumes the most CPU, can be moved to another server. The access server can then make full use of its access capacity, bandwidth, and NIC resources. The remote decryption server can be a machine with a low CPU load, so that idle resources are reused, or a server optimized for high computing performance. This is currently one of the approaches CDNs use for large-scale HTTPS access.
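The division of labour can be sketched in Python with a thread pool standing in for the remote machine. Everything here is a toy under stated assumptions: the class names are hypothetical, a real deployment would use an RPC over the network, and the textbook RSA key (p = 61, q = 53) is far too small for real use. The point is that only `KeyServer` ever holds the private exponent.

```python
from concurrent.futures import ThreadPoolExecutor

# Textbook toy RSA key (p=61, q=53); far too small for real use.
N, E, D = 3233, 17, 2753

class KeyServer:
    """Stands in for the remote decryption server: the only holder of the private key."""

    def __init__(self, n: int, d: int, workers: int = 4):
        self._n, self._d = n, d
        # A thread pool simulates the remote machine's cores; a real setup would use RPC.
        self._pool = ThreadPoolExecutor(max_workers=workers)

    def decrypt(self, ciphertext: int):
        return self._pool.submit(pow, ciphertext, self._d, self._n)  # returns a Future

    def close(self) -> None:
        self._pool.shutdown()

class AccessServer:
    """Terminates client connections and serves traffic, but never holds the private key."""

    def __init__(self, key_server: KeyServer):
        self.key_server = key_server

    def recover_premaster(self, encrypted_premaster: int) -> int:
        # Offload the CPU-heavy private-key operation and wait for the answer.
        return self.key_server.decrypt(encrypted_premaster).result(timeout=5)

key_server = KeyServer(N, D)
access = AccessServer(key_server)
ciphertext = pow(65, E, N)  # client encrypts its pre-master secret (toy value 65)
print(ciphertext, access.recover_premaster(ciphertext))  # 2790 65
key_server.close()
```

A side benefit of this split is that the private key can stay on a separate, better-protected machine.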

SPDY/HTTP2

The methods above improve HTTPS access performance by reducing transmission latency or per-machine load, and all of them leave the HTTP protocol unchanged. SPDY/HTTP2 instead builds on TLS/SSL and improves HTTPS performance by changing the protocol itself, for example by multiplexing many requests over a single connection so that fewer handshakes are needed, which improves download speed.
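One effect of multiplexing is easy to quantify: how many full TLS handshakes a page load to one host triggers. This Python sketch is a simplified model under stated assumptions: no session reuse, and an HTTP/1.1 client that opens up to 6 connections per host (a common browser default, used here as an example value). The function name is hypothetical.

```python
def full_tls_handshakes(resources: int, max_conns_per_host: int, multiplexed: bool) -> int:
    """Full TLS handshakes needed to fetch `resources` from one host, with no session reuse."""
    if resources <= 0:
        return 0
    if multiplexed:
        return 1                               # SPDY/HTTP2: one connection carries every request
    return min(resources, max_conns_per_host)  # HTTP/1.1: parallelism needs more connections

print(full_tls_handshakes(30, 6, multiplexed=False))  # 6
print(full_tls_handshakes(30, 6, multiplexed=True))   # 1
```

Fewer handshakes means fewer RTTs spent on setup and fewer RSA decryptions on the server, which ties back to both performance costs discussed above.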

Follow me

Welcome to follow the WeChat public account Jackyshan, where new technical articles are published first.