Illustrated HTTP: Reading Notes

This little book on HTTP is quite good, and I have already read it twice 😜

This time I took notes, which makes it easier for me and for others to read and review. Because the book's first edition dates from 2014, the notes omit some techniques that are no longer in common use and record only the commonly used ones.


1. Understanding Web and network basics

1.1 Network basics: TCP/IP

Commonly used networks, including the Internet, run on the TCP/IP protocol family, and HTTP is one member of that family.

The TCP/IP protocol family is divided into four layers: the application layer, the transport layer, the network layer, and the data link layer.

Application layer: Determines the communication activities involved in providing application services to users. FTP, DNS, and HTTP belong to this layer.
Transport layer: Provides the application layer above it with data transmission between two computers connected over the network. TCP and UDP belong to this layer.
Network layer: Handles the packets that flow across the network. This layer determines the path (the transmission route) a packet takes to reach the other computer. When the two computers communicate through multiple computers or network devices, the role of the network layer is to select one route among the many options.
Data link layer: Handles the hardware side of the connection to the network.

When the TCP/IP protocol family is used for network communication, data passes through the layers in order. The sender works down from the application layer, and the receiver works up toward the application layer.

Taking HTTP as an example, a communication proceeds as follows:

First, the client, as the sender, issues an HTTP request for a Web page at the application layer (HTTP).
Next, to make transmission easier, the transport layer (TCP) splits the data received from the application layer (the HTTP request message) into segments, marks each segment with a sequence number and port number, and forwards it to the network layer.
At the network layer (IP), the MAC address of the communication destination is added and the data is forwarded to the link layer. At this point the request is ready to be sent over the network.
The server at the receiving end receives the data at the link layer and passes it up through the layers in order until it reaches the application layer. Only when the data reaches the application layer has the HTTP request sent by the client actually been received.

As data passes from layer to layer, the sender adds that layer's header information each time it passes through a layer. The receiver, conversely, strips the corresponding header at each layer on the way up.

This practice of wrapping data is called encapsulation.
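
The wrapping and unwrapping described above can be sketched with a toy model. The bracketed strings are placeholders standing in for real binary headers, and the function names are made up for this illustration.

```python
# Layers below the application layer, in the order the sender passes through them.
LOWER_LAYERS = ["TCP", "IP", "Ethernet"]

def encapsulate(http_message: str) -> str:
    """Sender side: each lower layer prepends its own (placeholder) header."""
    data = http_message
    for layer in LOWER_LAYERS:
        data = f"[{layer} header]" + data
    return data

def decapsulate(frame: str) -> str:
    """Receiver side: strip the headers in the opposite order."""
    data = frame
    for layer in reversed(LOWER_LAYERS):
        prefix = f"[{layer} header]"
        if not data.startswith(prefix):
            raise ValueError(f"missing {layer} header")
        data = data[len(prefix):]
    return data

frame = encapsulate("GET / HTTP/1.1")
print(frame)
print(decapsulate(frame))
```

The outermost header is the one added last, which is why the receiver removes the link-layer header first.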

1.2 The three-way handshake

To deliver data reliably to the destination, TCP uses a three-way handshake. The handshake uses the TCP flags SYN (synchronize) and ACK (acknowledgement).

The sender first sends a packet with the SYN flag set to the other end. On receiving it, the receiver replies with a packet carrying the SYN and ACK flags to confirm receipt. Finally, the sender sends back a packet with the ACK flag set, which completes the handshake.
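
As a rough illustration, the three segments can be modelled as plain tuples. This is a simulation only, not real networking. The sequence and acknowledgement numbers are a detail of TCP that the notes do not cover (each side acknowledges the other's initial sequence number plus one), and the function name and the initial sequence numbers are arbitrary choices for the sketch.

```python
def three_way_handshake(client_isn: int, server_isn: int):
    """Return the three segments as (sender, flags, seq, ack) tuples."""
    syn = ("client", {"SYN"}, client_isn, None)
    # The server acknowledges the client's SYN and sends its own SYN.
    syn_ack = ("server", {"SYN", "ACK"}, server_isn, client_isn + 1)
    # The client acknowledges the server's SYN; the connection is now open.
    ack = ("client", {"ACK"}, client_isn + 1, server_isn + 1)
    return [syn, syn_ack, ack]

for sender, flags, seq, ack in three_way_handshake(client_isn=100, server_isn=300):
    print(sender, sorted(flags), "seq =", seq, "ack =", ack)
```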

1.3 DNS, the service for domain name resolution

Like HTTP, the Domain Name System (DNS) is an application-layer protocol. It resolves domain names to IP addresses.

1.4 How these protocols relate to HTTP

[Figure: the roles played by IP, TCP, and DNS in communication over HTTP]

1.5 URI and URL

The parts of a URI are as follows:

Scheme name: Use a scheme name such as http: or https: to specify the protocol used to access the resource. Scheme names are case insensitive and end with a colon. data: and javascript: can also be used, to specify data or a script.
Login information (authentication): Specifies the user name and password needed as login information (authentication) to obtain the resource from the server. This item is optional.
Server address: An absolute URI must specify the address of the server to access. The address can be a DNS-resolvable name such as hackr.jp, an IPv4 address such as 192.168.1.1, or an IPv6 address enclosed in square brackets such as [0:0:0:0:0:0:0:1].
Server port number: Specifies the network port number on which to connect to the server. This item is optional. If omitted, the default port number is used.
Hierarchical file path: Specifies the file path on the server that locates the resource. It is similar to the directory structure on UNIX systems.
Query string: Passes arbitrary parameters to the resource at the specified file path. This item is optional.
Fragment identifier: Usually marks a sub-resource within the retrieved resource (a location within a document). The RFC does not, however, specify exactly how it is to be used. This item is also optional.
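
To see these parts in practice, Python's standard `urllib.parse.urlsplit` can take a URI apart. The URI below is a made-up example chosen so that every optional part is present.

```python
from urllib.parse import urlsplit

# Hypothetical URI that uses every component described above.
uri = "http://user:pass@www.example.jp:80/dir/index.htm?uid=1#ch1"
parts = urlsplit(uri)

print("scheme:  ", parts.scheme)      # scheme name
print("login:   ", parts.username, parts.password)
print("server:  ", parts.hostname)    # server address
print("port:    ", parts.port)        # server port number
print("path:    ", parts.path)        # hierarchical file path
print("query:   ", parts.query)       # query string
print("fragment:", parts.fragment)    # fragment identifier
```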

2. The simple HTTP protocol

2.1 HTTP is a protocol that does not keep state

HTTP is a stateless protocol: the protocol itself does not retain the state of earlier requests and responses. It was deliberately designed to be this simple so that it can process large numbers of transactions quickly and remain scalable.

HTTP/1.1 is still stateless, but cookies were introduced to provide the state-keeping that applications want.

2.2 HTTP methods that tell the server what the client intends

GET (get a resource): Requests access to the resource identified by the request URI. The server parses the request and returns the specified resource in the response. If the resource is text, it is returned as is; if it is a program such as a CGI (Common Gateway Interface) program, the output of its execution is returned.
POST (transmit an entity body): Used to send an entity body. GET can also carry an entity body, but it is generally not used for that purpose; POST is used instead. Although POST works much like GET, its main purpose is not to obtain the body of the response.
PUT (transfer a file): The request message body contains the content of the file, which is saved at the location specified by the request URI. The HTTP/1.1 PUT method has no authentication mechanism of its own, so by itself it lets anyone upload a file.
HEAD (get the message headers): Identical to GET except that the message body is not returned. It is used to check the validity of a URI and the date and time a resource was last updated.
DELETE (delete a file): The opposite of PUT. DELETE removes the resource specified by the request URI. Like PUT, the HTTP/1.1 DELETE method has no authentication mechanism of its own, so ordinary websites generally do not use it.
OPTIONS (ask for supported methods): Queries the methods supported for the resource specified by the request URI.
TRACE (trace the path): Makes the Web server loop the request it received back to the client. When sending the request, the client fills the Max-Forwards header field with a number. Each server the request passes through decrements the number by one, and when it reaches zero the request is forwarded no further. TRACE is rarely used, and it is prone to XST (Cross-Site Tracing) attacks.
CONNECT (ask a proxy to open a tunnel): Asks for a tunnel to be established when communicating with a proxy server, so that TCP communication can be carried through the tunnel. It is mainly used with SSL (Secure Sockets Layer) and TLS (Transport Layer Security) to encrypt the communication and send it through the network tunnel.

Method	Description	Supported HTTP versions
GET	Get a resource	1.0, 1.1
POST	Transmit an entity body	1.0, 1.1
PUT	Transfer a file	1.0, 1.1
HEAD	Get the message headers	1.0, 1.1
DELETE	Delete a file	1.0, 1.1
OPTIONS	Ask for supported methods	1.1
TRACE	Trace the path	1.1
CONNECT	Ask a proxy to open a tunnel	1.1
LINK	Establish a relationship with a resource	1.0
UNLINK	Remove a relationship	1.0

LINK and UNLINK were dropped in HTTP/1.1 and are no longer supported.

2.3 Persistent connections save traffic

In the original version of HTTP, the TCP connection was closed after every HTTP exchange.

For example, when a browser displays an HTML page that contains several images, it requests the HTML page and then requests each of the other resources the page contains. Every one of those requests needlessly sets up and tears down a TCP connection, which increases traffic overhead.

Persistent connections (keep-alive)

To solve this problem, HTTP/1.1 and some HTTP/1.0 implementations introduced persistent connections (HTTP keep-alive). With a persistent connection, the TCP connection stays open as long as neither end explicitly closes it.

Persistent connections reduce the overhead of repeatedly establishing and closing TCP connections and lighten the load on the server. The time saved also lets HTTP requests and responses finish sooner, so Web pages display faster.

In HTTP/1.1 all connections are persistent by default, whereas HTTP/1.0 did not standardize them.

Pipelining

Persistent connections make it possible to send requests in a pipeline. Previously, the client had to wait for the response to one request before sending the next. With pipelining, the next request can be sent immediately, without waiting for the response.

This allows several requests to be sent at once instead of waiting for one response after another.

Persistent connections make requests finish faster than separate connections do, and pipelining is faster still. The more requests there are, the more noticeable the difference becomes.
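
A back-of-the-envelope model makes the difference concrete. It is deliberately idealized and rests on assumptions that are not from the book: opening a TCP connection costs one round trip, each request/response exchange costs one round trip, and transfer time, server processing time, and connection teardown are ignored. The function name is hypothetical.

```python
def fetch_time_ms(n_requests: int, rtt_ms: int, mode: str) -> int:
    """Idealized total time to fetch n_requests resources."""
    handshake = rtt_ms  # assumed cost of opening one TCP connection
    if mode == "non_persistent":
        # A new connection for every request.
        return n_requests * (handshake + rtt_ms)
    if mode == "persistent":
        # One connection; each request waits for the previous response.
        return handshake + n_requests * rtt_ms
    if mode == "pipelined":
        # One connection; all requests are sent without waiting.
        return handshake + rtt_ms
    raise ValueError(f"unknown mode: {mode}")

for mode in ("non_persistent", "persistent", "pipelined"):
    print(mode, fetch_time_ms(n_requests=10, rtt_ms=100, mode=mode), "ms")
```

Under these assumptions the gap between the three modes grows linearly with the number of requests, which matches the observation above.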

2.4 State management using cookies

HTTP is a stateless protocol: it does not manage the state of earlier requests and responses, so a request cannot be processed on the basis of what happened before.

Suppose a Web page requires login authentication. Because the login state cannot be managed (it is not recorded), the user would have to log in again on every page visited, or extra parameters would have to be attached to every request to carry the login state.

Cookies control client state by writing cookie information into request and response messages. The server sends a Set-Cookie header field in its response to tell the client to save a cookie. The next time the client sends a request to that server, it automatically adds the cookie value to the request.
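
The round trip can be sketched with Python's standard `http.cookies` module. The cookie name `sid` and its value are invented for the example, and a real browser applies many more rules (domain, path, expiry) before sending a cookie back.

```python
from http.cookies import SimpleCookie

# Hypothetical value of a Set-Cookie header field received from a server.
set_cookie_value = "sid=1342077140226724; Path=/; HttpOnly"

# Client side: store the cookie carried by the response.
jar = SimpleCookie()
jar.load(set_cookie_value)

# Client side, next request: send back only the name=value pairs.
cookie_header = "; ".join(f"{name}={morsel.value}" for name, morsel in jar.items())
print("Cookie:", cookie_header)
```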

3. HTTP information in HTTP messages

3.1 Encoding to improve the transfer rate

HTTP can encode data during transfer to increase the transfer rate.

3.2 Content codings for compressed transfer

A content coding specifies the encoding format applied to the entity content; the entity is compressed while its information is preserved intact.

Common content codings include gzip (GNU zip), deflate (zlib), and identity (no encoding).
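
A short sketch with Python's standard `gzip` and `zlib` modules shows what these codings do to an entity body. The body is an arbitrary, highly repetitive sample, so the savings here are larger than on typical pages.

```python
import gzip
import zlib

# Arbitrary sample entity body (repetitive, so it compresses well).
body = b"<html>" + b"hello " * 200 + b"</html>"

gzip_body = gzip.compress(body)     # what Content-Encoding: gzip carries
deflate_body = zlib.compress(body)  # what Content-Encoding: deflate carries (zlib format)

print("identity:", len(body), "bytes")
print("gzip:    ", len(gzip_body), "bytes")
print("deflate: ", len(deflate_body), "bytes")

# The receiver decodes and gets the original entity back, byte for byte.
restored_from_gzip = gzip.decompress(gzip_body)
restored_from_deflate = zlib.decompress(deflate_body)
```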

3.3 Chunked transfer coding: sending the body in pieces

During HTTP communication, the browser cannot display the requested page until the whole encoded entity has been transferred. When large amounts of data are transferred, dividing the data into multiple chunks lets the browser display the page progressively. This ability to split the entity body into chunks is called chunked transfer coding.

Chunked transfer coding divides the entity body into parts (chunks). Each chunk is marked with its size in hexadecimal, and the end of the entity body is marked with a final chunk of "0(CR+LF)".

The receiving client decodes an entity body sent with chunked transfer coding and restores it to the entity body as it was before encoding.
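
The framing can be sketched in a few lines. This is a minimal illustration that ignores chunk extensions and trailer fields; the function names and the default chunk size are arbitrary.

```python
def encode_chunked(body: bytes, chunk_size: int = 8) -> bytes:
    """Frame body as chunks: hex size, CRLF, data, CRLF, then a zero-size chunk."""
    out = bytearray()
    for start in range(0, len(body), chunk_size):
        chunk = body[start:start + chunk_size]
        out += f"{len(chunk):X}".encode("ascii") + b"\r\n" + chunk + b"\r\n"
    out += b"0\r\n\r\n"  # last chunk: size 0, followed by the closing CRLF
    return bytes(out)

def decode_chunked(data: bytes) -> bytes:
    """Reassemble the entity body from chunked framing."""
    body = bytearray()
    pos = 0
    while True:
        line_end = data.index(b"\r\n", pos)
        size = int(data[pos:line_end], 16)
        if size == 0:
            return bytes(body)
        start = line_end + 2
        body += data[start:start + size]
        pos = start + size + 2  # skip the CRLF that ends this chunk

encoded = encode_chunked(b"hello chunked world")
print(encoded)
print(decode_chunked(encoded))
```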

3.4 Multipart object collections for sending several kinds of data

When a multipart object collection is used in an HTTP message, the Content-Type header field must be set accordingly:

multipart/form-data: Used when uploading files from a Web form.
multipart/byteranges: Used when a response with status code 206 (Partial Content) contains content for multiple ranges.

3.5 Range requests for partial content

If the network is interrupted during a download, the download has to start over from the beginning. Solving this requires a resume mechanism, that is, the ability to continue a download from the point where it was interrupted.

To implement this, the client specifies the range of the entity it wants to download. A request that specifies a range is called a range request. The response to a range request carries status code 206.

For example:

Bytes 5001 to 10000: Range: bytes=5001-10000
Everything from byte 5001 onward: Range: bytes=5001-
Multiple ranges, from the start through byte 3000 and from byte 5000 to byte 7000: Range: bytes=0-3000, 5000-7000
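
A server has to turn such a header value into concrete byte offsets. The sketch below handles only the `bytes` unit and silently drops ranges that fall outside the resource; the function name is hypothetical. Note that under the HTTP specification a range written with no first position, such as `-3000`, selects the last 3000 bytes of the resource.

```python
def parse_range(field_value: str, size: int):
    """Return inclusive (first, last) byte offsets for a Range field value."""
    unit, _, spec = field_value.partition("=")
    if unit.strip() != "bytes":
        raise ValueError("only the bytes unit is handled in this sketch")
    ranges = []
    for part in spec.split(","):
        first, _, last = part.strip().partition("-")
        if first == "":
            # Suffix form "-N": the last N bytes of the resource.
            start, end = max(size - int(last), 0), size - 1
        else:
            start = int(first)
            end = int(last) if last else size - 1  # open-ended "N-"
        end = min(end, size - 1)
        if start <= end:
            ranges.append((start, end))
    return ranges

print(parse_range("bytes=5001-10000", size=20000))
print(parse_range("bytes=5001-", size=20000))
print(parse_range("bytes=0-3000, 5000-7000", size=20000))
```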

3.6 Content negotiation returns the most suitable content

Content negotiation means that the client and the server negotiate over the content of the response, and the server then provides the client with the most suitable resource. Content negotiation judges the resource to return by criteria such as language, character set, and encoding.

Certain header fields contained in the request message (such as Accept, Accept-Charset, Accept-Encoding, Accept-Language, and Content-Language) serve as the basis for the decision.

There are three types of content negotiation:

Server-driven negotiation: The server performs the negotiation, processing it automatically with the request header fields as a reference. However, because the decision is based on what the browser sends, it does not necessarily pick the content that is best for the user.
Agent-driven negotiation: The client performs the negotiation. The user chooses manually from a list of options displayed in the browser, or the choice is made automatically on the Web page with JavaScript, for example switching between the PC version and the mobile version of a page according to the OS type or browser type.
Transparent negotiation: A combination of server-driven and agent-driven negotiation, in which the server and the client each perform part of the content negotiation.

4. HTTP status codes that report the result

When a client sends a request to a server, the status code describes the result that is returned. With the status code, the user can tell whether the server processed the request normally or an error occurred.

4.1 Status codes report the result of the request returned by the server

The first digit of the status code specifies the class of the response; the last two digits have no classification role.

Status code	Category	Reason phrase
1XX	Informational (informational status code)	The received request is being processed
2XX	Success (success status code)	The request was processed normally
3XX	Redirection (redirection status code)	Additional action is required to complete the request
4XX	Client Error (client error status code)	The server cannot process the request
5XX	Server Error (server error status code)	The server failed while processing the request
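
The classification by first digit is easy to express in code. The sketch below uses Python's standard `http.HTTPStatus` for the reason phrases; the `describe` helper and the `CATEGORIES` table are made up for the example.

```python
from http import HTTPStatus

# Response class, keyed by the first digit of the status code.
CATEGORIES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}

def describe(code: int) -> str:
    """Return the code, its standard reason phrase, and its class."""
    category = CATEGORIES[code // 100]
    return f"{code} {HTTPStatus(code).phrase} ({category})"

for code in (200, 204, 301, 404, 503):
    print(describe(code))
```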

4.2 2XX Success

The request was processed normally.

200 OK: The request from the client was processed normally by the server.
204 No Content: The server received and successfully processed the request, but the response contains no entity body, and none is allowed. For example, when a browser receives a 204 response after sending a request, the page it is displaying does not change. This status code is generally used when the client only needs to send information to the server and the server has no new content to send back.
206 Partial Content: The client made a range request and the server successfully executed that partial GET request. The response contains the entity content for the range specified by Content-Range.

4.3 3XX Redirection

The browser needs to perform some additional processing to complete the request correctly.

301 Moved Permanently: Permanent redirect. The requested resource has been assigned a new URI, and that new URI should be used from now on. For example, if the old URI has been saved as a bookmark, the bookmark should be updated to the URI given in the Location header field.
302 Found: Temporary redirect. The requested resource has been assigned a new URI, and the user is expected to use the new URI for this request. The move is not permanent, and the URI may change again in the future.
303 See Other: Another URI exists for the requested resource, and the GET method should be used to retrieve it. 303 does the same job as 302, but 303 explicitly states that the client should use GET to obtain the resource.
304 Not Modified: Returned when the client sends a conditional request (a GET request whose headers include any of If-Match, If-Modified-Since, If-None-Match, If-Range, or If-Unmodified-Since) and the server allows access to the resource, but the condition is not met. A 304 response contains no response body. Although 304 is grouped with the 3XX codes, it has nothing to do with redirection.
307 Temporary Redirect: Temporary redirect, with the same meaning as 302 Found.

4.4 4XX Client Error

The client is the cause of the error.

400 Bad Request: The request message contains a syntax error. The content of the request must be corrected and the request sent again. Browsers treat this status code the same way they treat 200 OK.
401 Unauthorized: The request requires HTTP authentication information. If the request has already been made once before, it means that user authentication failed. A response with status 401 must include a WWW-Authenticate header field that applies to the requested resource and challenges the user for credentials. When a browser receives a 401 response for the first time, it displays an authentication dialog.
403 Forbidden: The server refused access to the requested resource. The server does not have to give a detailed reason for the refusal, but if it wants to explain, it can describe the reason in the entity body so that the user can see it. Typical causes are missing file system permissions or a problem with access rights.
404 Not Found: The requested resource could not be found on the server. It can also be used when the server refuses a request and does not want to say why.

4.5 5XX Server Error

The server itself is the cause of the error.

500 Internal Server Error: An error occurred on the server while it was executing the request. The cause may be a bug in the Web application or some temporary fault.
503 Service Unavailable: The server is temporarily overloaded or down for maintenance and cannot process requests right now. If the time needed to resolve the situation is known in advance, it is best to write it into the Retry-After header field and return it to the client.

5. Web servers that work together with HTTP

A single Web server can host websites for several independent domain names, and it can also act as a relay on the communication path to improve transfer efficiency.

5.1 Serving multiple domain names with virtual hosts on one server

The HTTP/1.1 specification allows a single HTTP server to host multiple websites by using virtual hosts (also known as virtual servers).

If one server hosts two domain names, both resolve to the same server IP address, so when a request arrives the server has to work out which domain name it is for. Because virtual hosts allow websites with different host names and domain names to run on the same server, every HTTP request must specify the complete host name or domain name in the Host header field.
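
In code, the server's side of this amounts to a lookup keyed by the Host header field. The host names, site labels, and the `select_site` helper below are invented for the illustration, and the sketch does not handle bracketed IPv6 literals in Host.

```python
# Hypothetical mapping from host name to the site that should answer.
VIRTUAL_HOSTS = {
    "www.example.com": "site-a",
    "www.example.org": "site-b",
}
DEFAULT_SITE = "default-site"  # assumed fallback for unknown host names

def select_site(headers: dict) -> str:
    """Pick the website for a request arriving at the shared IP address."""
    host = headers.get("Host")
    if not host:
        raise ValueError("request has no Host header field")
    hostname = host.split(":", 1)[0].strip().lower()  # drop the optional port
    return VIRTUAL_HOSTS.get(hostname, DEFAULT_SITE)

print(select_site({"Host": "www.example.com"}))
print(select_site({"Host": "www.example.org:8080"}))
```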

5.2 Programs that forward communication data: proxies, gateways, and tunnels

These applications and servers forward requests to the next server on the communication path, receive the response from that server, and forward it back to the client.

Proxy: Receives requests sent by the client and forwards them to other servers. A proxy does not change the request URI; it sends the request straight on to the source server that holds the resource.
1. Caching proxy: Keeps a copy (cache) of resources on the proxy server. When the proxy receives another request for the same resource, it can return the previously cached resource as the response.
2. Transparent proxy: A proxy that forwards requests and responses without modifying the messages is called a transparent proxy; one that modifies the message content is a non-transparent proxy.
Gateway: A server that forwards communication data for other servers. When it receives a request from a client, it handles the request as if it were the source server that owns the resource, and sometimes the client does not even realize it is talking to a gateway. A gateway can improve security by encrypting the line between the client and the gateway.
Tunnel: Establishes a communication line with another server on request and communicates over it using encryption such as SSL. The tunnel is closed when the communication ends. Its purpose is to ensure that the client and the server can communicate securely.

5.3 Caches that store resources

A cache is a copy of a resource stored on a proxy server or on the client's local disk.

The advantage of a cache is that the resource does not have to be forwarded from the source server again and again. The client can obtain the resource from the nearest cache, and the source server does not have to process the same request multiple times.

Client-side caching: If the browser cache is still valid, the resource is read from the local disk instead of being requested from the server. When the cache is judged to have expired, the browser checks the validity of the resource with the source server; if the cached copy is no longer valid, the browser requests the resource again.

6. HTTP headers

HTTP request and response messages must contain HTTP headers. Request messages and response messages have a similar structure.

6.1 HTTP header fields

A header field consists of a field name and a field value, separated by a colon (:).
A single HTTP header field can carry multiple values.

HTTP header fields are divided into two types according to how caching and non-caching proxies treat them.

End-to-end headers: Headers in this category are forwarded to the final recipient of the request or response. They must be stored in responses generated by a cache, and they must be forwarded.
Hop-by-hop headers: Headers in this category are valid for a single hop only and are not forwarded by caches or proxies. In HTTP/1.1 and later, a hop-by-hop header must be listed in the Connection header field.

The hop-by-hop header fields in HTTP/1.1 are listed below. All header fields other than these eight are end-to-end headers.

Connection
Keep-Alive
Proxy-Authenticate
Proxy-Authorization
Trailer
TE
Transfer-Encoding
Upgrade
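
A proxy can apply this rule by dropping the eight fields above, plus any field named in Connection, before forwarding a message. The following is a minimal sketch that represents headers as a plain dict; the helper name and the `X-Trace` field in the usage example are hypothetical.

```python
HOP_BY_HOP = {
    "connection", "keep-alive", "proxy-authenticate", "proxy-authorization",
    "trailer", "te", "transfer-encoding", "upgrade",
}

def strip_hop_by_hop(headers: dict) -> dict:
    """Return only the headers a proxy should forward to the next hop."""
    # Field names listed in Connection are hop-by-hop for this hop as well.
    connection_value = next(
        (value for name, value in headers.items() if name.lower() == "connection"), ""
    )
    listed = {token.strip().lower() for token in connection_value.split(",") if token.strip()}
    drop = HOP_BY_HOP | listed
    return {name: value for name, value in headers.items() if name.lower() not in drop}

incoming = {
    "Host": "example.com",
    "Connection": "keep-alive, X-Trace",
    "Keep-Alive": "timeout=5",
    "X-Trace": "abc",
    "Accept": "*/*",
}
print(strip_hop_by_hop(incoming))
```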

HTTP header fields are classified into four types according to their use. The header fields used in HTTP communication are not limited to the 47 defined in RFC 2616; Cookie, Set-Cookie, Content-Disposition, and others are defined in other RFCs and are also used very frequently.

General header fields

Headers used by both request and response messages.

Header field name	Description
Cache-Control	Controls caching behavior
Connection	Hop-by-hop header control and connection management
Date	Date and time the message was created
Pragma	Message directives
Trailer	List of the header fields at the end of the message
Transfer-Encoding	Transfer coding applied to the message body
Upgrade	Upgrade to another protocol
Via	Information about proxy servers
Warning	Error notification

Request header fields

Headers used in request messages sent from the client to the server. They supplement the request with additional information, information about the client, and priorities for the response content.

Header field name	Description
Accept	Media types the user agent can handle
Accept-Charset	Preferred character sets
Accept-Encoding	Preferred content codings
Accept-Language	Preferred (natural) languages
Authorization	Web authentication information
Expect	Expects specific behavior from the server
From	Email address of the user
Host	Server on which the requested resource resides
If-Match	Compares the entity tag (ETag)
If-Modified-Since	Compares the update time of the resource
If-None-Match	Compares the entity tag (the opposite of If-Match)
If-Range	Sends a byte range request for the entity if the resource has not been updated
If-Unmodified-Since	Compares the update time of the resource (the opposite of If-Modified-Since)
Max-Forwards	Maximum number of forwarding hops
Proxy-Authorization	Client authentication information required by the proxy server
Range	Byte range request for the entity
Referer	URI of the page from which the request URI was obtained
TE	Preferred transfer codings
User-Agent	Information about the HTTP client program

Response header fields

Headers used in response messages returned from the server to the client. They supplement the response with additional information and can also ask the client to supply additional information.

Header field name	Description
Accept-Ranges	Whether byte range requests are accepted
Age	Estimated time elapsed since the resource was created
ETag	Matching information for the resource
Location	Redirects the client to the specified URI
Proxy-Authenticate	Authentication information the proxy server requires from the client
Retry-After	When the client should make the request again
Server	Information about the HTTP server software
Vary	Cache management information for proxy servers
WWW-Authenticate	Authentication information the server requires from the client

Entity header fields

Headers used for the entity part of request and response messages. They supplement it with entity-related information such as when the resource content was last updated.

Header field name	Description
Allow	HTTP methods supported by the resource
Content-Encoding	Content coding applied to the entity body
Content-Language	Natural language of the entity body
Content-Length	Size of the entity body in bytes
Content-Location	Alternate URI for the resource
Content-MD5	Message digest of the entity body
Content-Range	Position range of the entity body
Content-Type	Media type of the entity body
Expires	Date and time at which the entity body expires
Last-Modified	Date and time the resource was last modified

6.2 HTTP/1.1 general header fields

General header fields are headers used by both request and response messages.

Cache-Control

no-cache: Prevents expired resources from being returned from a cache. If the client's request contains no-cache, the client will not accept a cached response, and the cache server must forward the request to the source server. If the server's response contains no-cache, a cache must confirm the validity of the resource with the source server before reusing the stored response.
no-store: The cache must not store any part of the request or the response locally.

It is easy to misread no-cache as "do not cache", but no-cache actually means that expired resources are not served from the cache: the cache uses the resource only after confirming with the source server that it is still valid. It is no-store that really means "do not cache".
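
The difference between the two directives can be sketched as a small policy function. It treats no-cache as "may be stored, but must be revalidated before use" and no-store as "must never be stored". The function names are hypothetical, and the parser is simplified: it does not handle quoted values that contain commas.

```python
def parse_cache_control(field_value: str) -> dict:
    """Split a Cache-Control value into {directive: argument or None}."""
    directives = {}
    for item in field_value.split(","):
        name, _, arg = item.strip().partition("=")
        if name:
            directives[name.lower()] = arg.strip('"') or None
    return directives

def cache_policy(field_value: str):
    """Return (may_store, must_revalidate_before_use) for a cache."""
    directives = parse_cache_control(field_value)
    may_store = "no-store" not in directives
    must_revalidate_before_use = "no-cache" in directives
    return may_store, must_revalidate_before_use

print(cache_policy("no-cache"))
print(cache_policy("no-store"))
print(parse_cache_control("max-age=3600, public"))
```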

Connection

Controlling header fields that are no longer forwarded to proxies: In a client request or server response, the Connection header field names the header fields (hop-by-hop headers) that must not be forwarded beyond the next proxy.
Managing persistent connections: HTTP/1.1 connections are persistent by default, so the client keeps sending requests over the same connection. When the server wants to close the connection explicitly, it sets the Connection header field to close. Before HTTP/1.1 the default was a non-persistent connection, so to keep the connection open over an older HTTP version the Connection header field must be set to Keep-Alive.

Date

Indicates the date and time at which the HTTP message was created.

Upgrade

Used to check whether HTTP or another protocol can communicate using a newer version. The field value can also specify a completely different communication protocol.

6.3 Request header fields

Request header fields are used in request messages sent from the client to the server. They supplement the request with additional information, information about the client, and priorities for the response content.

Accept

Tells the server which media types the user agent can handle and the relative priority of each. Multiple media types can be specified at once, each in type/subtype form.

Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
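
Priorities are expressed with a q value between 0 and 1 that follows the media type after a semicolon; a media type without a q value counts as 1.0. A server could rank the entries roughly as follows. This is a simplified sketch with a hypothetical function name: it ignores media type parameters other than q and does not rank by specificity.

```python
def parse_accept(field_value: str):
    """Return (media_type, q) pairs, most preferred first."""
    entries = []
    for item in field_value.split(","):
        media_type, *params = [piece.strip() for piece in item.split(";")]
        q = 1.0  # default weight when no q parameter is given
        for param in params:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                q = float(value)
        entries.append((media_type, q))
    # sorted() is stable, so entries with equal q keep their header order.
    return sorted(entries, key=lambda entry: entry[1], reverse=True)

accept = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"
for media_type, q in parse_accept(accept):
    print(media_type, q)
```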

Host

Tells the server the Internet host name and port number of the requested resource. Host is the only header field that the HTTP/1.1 specification requires in every request.

When a request is sent to a server, the host name has already been resolved to an IP address. If several domain names are deployed on that one IP address, the server cannot tell which domain the request is for, so the Host header field is used to state the host name explicitly. If the server has no host name, Host is sent with an empty value.

If-Match

Request header fields of the form If-xxx make a request conditional. A server that receives a conditional request executes it only if it judges the specified condition to be true.

If-Match is one of these conditional header fields. It tells the server the entity tag (ETag) value that the resource must match. Weak ETag values cannot be used here. The server compares the If-Match field value with the ETag value of the resource and executes the request only when the two agree.

An asterisk (*) can be used as the field value. In that case the server ignores the ETag value and processes the request as long as the resource exists.

If-None-Match

The opposite of If-Match. It tells the server to process the request only when the entity tag given in If-None-Match does not match the ETag of the requested resource.
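
The two checks can be sketched as a pair of predicates. Several things here are assumptions for the illustration: the function names, the convention that `current_etag` is None when the resource does not exist, and the simplification that entity tags contain no commas. The weak comparison used for If-None-Match follows the HTTP specification rather than these notes.

```python
def _tags(field_value: str) -> list:
    """Split a comma-separated list of entity tags."""
    return [tag.strip() for tag in field_value.split(",") if tag.strip()]

def _opaque(tag: str) -> str:
    """Drop the weak prefix W/ so only the quoted part is compared."""
    return tag[2:] if tag.startswith("W/") else tag

def if_match_passes(field_value: str, current_etag) -> bool:
    """True when a request carrying If-Match should be executed."""
    if current_etag is None:
        return False
    if field_value.strip() == "*":
        return True
    if current_etag.startswith("W/"):
        return False  # strong comparison: weak tags never match
    return current_etag in _tags(field_value)

def if_none_match_passes(field_value: str, current_etag) -> bool:
    """True when a request carrying If-None-Match should be executed."""
    if current_etag is None:
        return True
    if field_value.strip() == "*":
        return False
    return all(_opaque(tag) != _opaque(current_etag) for tag in _tags(field_value))

print(if_match_passes('"v1"', '"v1"'), if_none_match_passes('"v1"', '"v1"'))
print(if_match_passes('"v1"', '"v2"'), if_none_match_passes('"v1"', '"v2"'))
```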

If-Modified-Since

Tells the server to process the request only if the resource has been updated after the date and time given in the field value.

If the requested resource has been modified after that date and time, the server returns it with status code 200. If it has not been updated since then, the server returns status code 304 Not Modified.
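
The server-side decision can be sketched with the standard library's HTTP date parser. The function name and the sample dates are chosen for the example, and the resource's modification time is assumed to be a timezone-aware datetime.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

def conditional_get_status(if_modified_since: str, last_modified: datetime) -> int:
    """Return 200 if the resource changed after the given date, else 304."""
    since = parsedate_to_datetime(if_modified_since)
    return 200 if last_modified > since else 304

header_value = "Thu, 15 Apr 2004 00:00:00 GMT"

# Resource updated after the date in the header: send it again.
print(conditional_get_status(header_value, datetime(2004, 4, 20, tzinfo=timezone.utc)))
# Resource unchanged since before that date: the cached copy is still good.
print(conditional_get_status(header_value, datetime(2004, 4, 1, tzinfo=timezone.utc)))
```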

If-Unmodified-Since

The opposite of If-Modified-Since. It tells the server to process the request only if the resource has not been updated after the specified date and time. If the resource has been updated after that time, the server responds with status code 412 Precondition Failed.

If-Range

If the If-Range field value (an ETag value or a date and time) matches the ETag value or last-update time of the resource, the request is processed as a range request. Otherwise, the entire resource is returned.

6.4 Response header fields

Response header fields are used in response messages sent from the server to the client. They supplement the response with additional information, information about the server, and additional requirements placed on the client.

ETag

The entity tag is a string that uniquely identifies a resource. The server assigns an ETag value to each resource, and the ETag value must change whenever the resource is updated.

When a download is interrupted and the client reconnects to resume it, the ETag value is used to identify the resource.

6.5 Entity header fields

Entity header fields are the headers used in the entity part of request and response messages. They supply information about the entity, such as the time its content was last updated.

Allow

Allow notifies the client of all HTTP methods the server supports for the resource. When the server receives an unsupported HTTP method, it returns a response with status code 405 Method Not Allowed and, at the same time, writes all supported HTTP methods into the Allow header field.
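As a rough illustration of the 405 behaviour, here is a minimal sketch. The `SUPPORTED_METHODS` tuple and the `check_method` helper are hypothetical names, and the return value is a simplified (status, headers) pair rather than a real response object.

```python
SUPPORTED_METHODS = ("GET", "HEAD", "POST")  # hypothetical set for one resource


def check_method(method):
    """Return (status, headers) for a request method."""
    if method in SUPPORTED_METHODS:
        return 200, {}
    # Unsupported method: answer 405 and list what is supported in Allow.
    return 405, {"Allow": ", ".join(SUPPORTED_METHODS)}
```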

Content-Encoding

Content-Encoding informs the client of the content encoding that the server applied to the entity body. Content encoding refers to compression performed without losing any information in the entity.

The main content encodings are gzip, compress, deflate, and identity.

Content-Length

Indicates the size of the entity body in bytes. When a transfer encoding is applied to the entity body, the Content-Length header field can no longer be used.

Content-Type

Specifies the media type of the object in the entity body, written in type/subtype form.

Content-Type: text/html; charset=UTF-8

Expires

Expires tells the client the date and time at which the resource expires. When a cache server receives a response containing Expires, it answers requests from its cached copy and keeps that copy until the time given in the Expires field value. Once that time has passed, the cache server turns to the source server for the resource when a request arrives.

When the source server does not want the cache server to cache the resource, it is best to write the same time value as the Date header field in the Expires field.

However, when the max-age directive is specified in the Cache-Control header field, max-age takes precedence over Expires.
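The precedence of max-age over Expires can be sketched as follows. This is a simplified illustration, not a complete cache: the function name `fresh_until` and its arguments are assumptions, and directives such as no-cache or the Age header are ignored.

```python
import re
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime


def fresh_until(headers, received_at):
    """Return the datetime until which a cached copy may be reused, or None."""
    cache_control = headers.get("Cache-Control", "")
    match = re.search(r"max-age=(\d+)", cache_control)
    if match:
        # max-age takes precedence over Expires.
        return received_at + timedelta(seconds=int(match.group(1)))
    expires = headers.get("Expires")
    if expires:
        return parsedate_to_datetime(expires)
    return None


received = datetime(2011, 7, 5, 7, 0, 0, tzinfo=timezone.utc)
both = {
    "Cache-Control": "max-age=60",
    "Expires": "Tue, 05 Jul 2011 07:26:31 GMT",
}
limit = fresh_until(both, received)  # based on max-age, not on Expires
```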

Last-Modified

Indicates the date and time at which the resource was last modified.

6.6 Header fields for the Cookie service

Header field name	Description	Header type
Set-Cookie	Cookie information used to start state management	Response header field
Cookie	Cookie information received by the server	Request header field

Set-Cookie

Set-Cookie: status=enable; expires=Tue, 05 Jul 2011 07:26:31 GMT; path=/; domain=.hackr.jp;	

Attribute	Description
NAME=VALUE	The name and value assigned to the Cookie (required)
expires=DATE	The Cookie's validity period (if not specified, defaults to until the browser is closed)
path=PATH	The directory on the server to which the Cookie applies (if not specified, defaults to the directory containing the document)
domain=DOMAIN	The domain name to which the Cookie applies (if not specified, defaults to the domain name of the server that created the Cookie)
Secure	The Cookie is sent only over secure HTTPS connections
HttpOnly	Restricts the Cookie so that it cannot be accessed by JavaScript

Once a Cookie has been sent from the server to the client, the server has no way to delete it explicitly. In practice, however, a client-side Cookie can be deleted by overwriting it with a Cookie that has already expired.

The HttpOnly attribute is an extension to Cookies that makes them inaccessible to JavaScript. Its main purpose is to prevent Cookie theft through cross-site scripting (XSS).
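The attributes above, and deletion by overwriting with an expired Cookie, can be sketched with Python's standard `http.cookies` module. The cookie name and domain are taken from the example above; the replacement value `deleted` and the 1970 date are arbitrary choices for illustration.

```python
from http.cookies import SimpleCookie

# Build a Set-Cookie value with the Secure and HttpOnly attributes.
cookie = SimpleCookie()
cookie["status"] = "enable"
cookie["status"]["path"] = "/"
cookie["status"]["domain"] = ".hackr.jp"
cookie["status"]["secure"] = True
cookie["status"]["httponly"] = True
header_value = cookie["status"].OutputString()

# "Delete" the cookie by overwriting it with one that expired in the past.
expired = SimpleCookie()
expired["status"] = "deleted"
expired["status"]["path"] = "/"
expired["status"]["domain"] = ".hackr.jp"
expired["status"]["expires"] = "Thu, 01 Jan 1970 00:00:00 GMT"
delete_header = expired["status"].OutputString()
```

Both strings would be sent as the value of a Set-Cookie response header; the path and domain of the overwriting Cookie are kept identical to the original so that the browser treats it as the same Cookie.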

Cookie

Cookie: status=enable

When a client wants HTTP state management support, it includes a Cookie received from the server in the request.

6.7 Other header Fields

X-XSS-Protection

The X-XSS-Protection header field is a countermeasure against cross-site scripting (XSS) attacks. It switches the browser's XSS protection mechanism on or off. The field value can be specified as follows:

0: disables XSS filtering
1: enables XSS filtering

7. Ensure Web security through HTTPS

HTTPS can effectively prevent security problems such as information eavesdropping and identity disguise.

7.1 Disadvantages of HTTP

Communication is in plain text (unencrypted), so the content can be eavesdropped on
The identity of the communicating party is not verified, so impersonation is possible
The integrity of messages cannot be verified, so they may have been tampered with

7.2 HTTP + encryption + authentication + integrity protection = HTTPS

HTTP with encryption and authentication mechanisms added is called HTTP Secure (HTTPS). HTTPS is not a new application-layer protocol. Instead, the communication interface part of HTTP is replaced by Secure Sockets Layer (SSL) and Transport Layer Security (TLS). SSL is independent of HTTP, so other application-layer protocols such as SMTP and Telnet can also be used with SSL.

HTTPS uses a hybrid of shared-key encryption and public-key encryption. If the key can be exchanged securely, one might consider using only public-key encryption for communication. However, public-key encryption is slower than shared-key encryption.

Therefore, public-key encryption is used for the key exchange, and shared-key encryption is used in the subsequent stage in which the messages themselves are exchanged.

A public key certificate issued by a digital certificate authority (CA) or one of its related organizations proves that a public key can be trusted. The server sends the public key certificate issued by the CA to the client so that the two can communicate using public-key encryption. Public key certificates are also called digital certificates, or simply certificates.

Steps for HTTPS communication:

1. The client starts SSL communication by sending a Client Hello message. The message contains the SSL version the client supports and a list of cipher suites (the encryption algorithms to use, key lengths, and so on).
2. If the server can carry out SSL communication, it responds with a Server Hello message. Like the client's message, it contains the SSL version and a cipher suite. The server's cipher suite is selected from the list received from the client.
3. The server then sends a Certificate message, which contains the public key certificate.
4. Finally, the server sends a Server Hello Done message to tell the client that the initial stage of the SSL handshake negotiation is complete.
5. After this first part of the SSL handshake, the client responds with a Client Key Exchange message. The message contains a random string called the pre-master secret, which is used for encrypting the communication. The message is encrypted with the public key from step 3.
6. The client then sends a Change Cipher Spec message. It tells the server that communication after this message will be encrypted with the key derived from the pre-master secret.
7. The client sends a Finished message. The message contains a checksum over all messages exchanged on the connection so far. Whether the handshake negotiation succeeds depends on whether the server can decrypt this message correctly.
8. The server likewise sends a Change Cipher Spec message.
9. The server likewise sends a Finished message.
10. Once the server and client have exchanged Finished messages, the SSL connection is established, and communication from here on is protected by SSL. Application-layer communication starts at this point: the client sends an HTTP request.
11. Application-layer communication continues: the server sends the HTTP response.
12. Finally, the client disconnects. To do so it sends a close_notify message, after which a TCP FIN is sent to close the TCP connection.

Throughout this process, when the application layer sends data, a digest called the Message Authentication Code (MAC) is attached to it. The MAC makes it possible to detect whether a message has been tampered with, which protects message integrity.
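The idea of a MAC can be illustrated with Python's standard `hmac` module. This is only a sketch of the general principle: the function names and the shared key are hypothetical, and the MAC construction actually used inside SSL/TLS records depends on the negotiated cipher suite.

```python
import hashlib
import hmac


def attach_mac(key: bytes, message: bytes) -> bytes:
    """Compute a MAC over the message with a key shared by both sides."""
    return hmac.new(key, message, hashlib.sha256).digest()


def verify_mac(key: bytes, message: bytes, mac: bytes) -> bool:
    """Recompute the MAC and compare in constant time."""
    return hmac.compare_digest(attach_mac(key, message), mac)


key = b"shared-secret"             # assumed to be agreed during the handshake
message = b"GET / HTTP/1.1"
mac = attach_mac(key, message)

untouched = verify_mac(key, message, mac)                 # True
tampered = verify_mac(key, b"GET /admin HTTP/1.1", mac)   # False
```

Because the key is known only to the two endpoints, a third party who alters the message cannot produce a matching MAC.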

Is SSL slow?

The drawback of HTTPS is that using SSL makes processing slower. There are two kinds of slowdown. One is slower communication. The other is slower processing, because encryption consumes a large amount of CPU and memory.

According to the book, HTTPS can be 2 to 100 times slower over the network than HTTP. In addition to establishing the TCP connection and sending HTTP requests and responses, SSL communication must also be performed, so the overall amount of communication to be processed inevitably increases. SSL also requires encryption and decryption on both the server and the client, so it consumes more hardware resources than HTTP and increases the load.

There is no fundamental solution to the slowdown; dedicated hardware such as an SSL accelerator is commonly used. Such hardware can perform SSL computation several times faster, and it is used only for SSL processing in order to share the load.

Why HTTPS is not always used

Compared with plain text communication, encrypted communication consumes more CPU and memory. If every communication is encrypted, a considerable amount of resources is consumed, and the number of requests a single machine can handle is bound to decrease.

For this reason, HTTP is used for non-sensitive information, and HTTPS is used only for sensitive data such as personal information. Encrypting only the information that needs to be hidden saves resources.

The desire to save on the cost of purchasing certificates is another reason.

8. Authentication to verify the identity of the accessing user

Some pages should be viewable only by certain people, which is why authentication is needed.

Authentication methods commonly used with HTTP/1.1:

BASIC authentication
DIGEST authentication
SSL client authentication
Form-based authentication

9. Protocols that add functionality on top of HTTP

9.1 WebSocket for Full-duplex Communication

The connection is still initiated by the client. Once the WebSocket connection is established, however, either the server or the client can send messages to the other side.

Push: the server can push data to the client, sending it directly without waiting for a client request.
Reduced traffic: compared with HTTP, there is less overhead per connection, and because the header information is small, the amount of traffic is also reduced.

Establishing communication:

To establish WebSocket communication, the client first uses the HTTP Upgrade header field to tell the server that the communication protocol is changing. This request serves as the handshake.
```
GET /chat HTTP/1.1
Host: server.example.com
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==
Origin: http://example.com
Sec-WebSocket-Protocol: chat, superchat
Sec-WebSocket-Version: 13
```
The Sec-WebSocket-Key field holds the key value required during the handshake. The Sec-WebSocket-Protocol field lists the subprotocols to be used.
The server answers this request with a 101 Switching Protocols response.
```
HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Accept: s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
Sec-WebSocket-Protocol: chat
```
The value of the Sec-WebSocket-Accept field is generated from the value of the Sec-WebSocket-Key field in the handshake request.
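How the Sec-WebSocket-Accept value is derived can be sketched as below. The derivation (append a fixed GUID to the key, take the SHA-1 digest, Base64-encode it) comes from the WebSocket specification, RFC 6455, rather than from the text above; the function name `websocket_accept` is a hypothetical helper.

```python
import base64
import hashlib

# Fixed GUID defined by RFC 6455 for the opening handshake.
WEBSOCKET_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def websocket_accept(sec_websocket_key: str) -> str:
    """Derive the Sec-WebSocket-Accept value from the client's key."""
    digest = hashlib.sha1(
        (sec_websocket_key + WEBSOCKET_GUID).encode("ascii")
    ).digest()
    return base64.b64encode(digest).decode("ascii")


accept = websocket_accept("dGhlIHNhbXBsZSBub25jZQ==")
# accept == "s3pPLMBiTxaQ9kYGzzhZRbK+xOo=", matching the response above
```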

After a successful handshake establishes the WebSocket connection, communication uses WebSocket's own data frames rather than HTTP data frames.

9.2 HTTP/2.0

The goal of HTTP/2.0 is to improve the speed users experience when using the Web. Features:

HTTP/2.0 uses a binary format rather than a text format
HTTP/2.0 is fully multiplexed rather than ordered and blocking, so a single connection is enough to achieve parallelism
HTTP/2.0 uses header compression to reduce overhead
HTTP/2.0 allows the server to proactively “push” responses into the client cache
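To give a feel for what "binary format" means, here is a minimal sketch that decodes the fixed 9-byte header that starts every HTTP/2 frame. The layout (24-bit length, 8-bit type, 8-bit flags, 1 reserved bit plus a 31-bit stream identifier) is taken from the HTTP/2 specification, not from the notes above, and `parse_frame_header` is a hypothetical helper name.

```python
import struct


def parse_frame_header(data: bytes):
    """Decode the 9-byte HTTP/2 frame header into its four fields."""
    if len(data) < 9:
        raise ValueError("an HTTP/2 frame header is 9 bytes long")
    # Big-endian: 1 + 2 bytes of length, 1 byte type, 1 byte flags, 4 bytes stream.
    length_hi, length_lo, frame_type, flags, stream = struct.unpack(
        ">BHBBI", data[:9]
    )
    length = (length_hi << 16) | length_lo
    stream_id = stream & 0x7FFFFFFF  # drop the reserved top bit
    return length, frame_type, flags, stream_id


# A DATA frame (type 0) with a 5-byte payload, flag 0x1, on stream 1.
header = bytes([0x00, 0x00, 0x05, 0x00, 0x01, 0x00, 0x00, 0x00, 0x01])
fields = parse_frame_header(header)
```

Because each frame carries its stream identifier, frames belonging to different requests can be interleaved on one connection, which is what makes multiplexing possible.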

10. Techniques for building Web content

10.1 HTML

HTML (HyperText Markup Language) is a markup language developed for publishing hypertext on the Web.

Hypertext is a document system in which information anywhere in a document can be linked to other information (text, images, and so on). Text linked in this way is known as hyperlinked text.

10.2 Web application

A Web application is an application provided through the functionality of the Web.

CGI starts a program every time a request is received, so once traffic becomes heavy the Web server bears a considerable load. Servlets, by contrast, run in the same process as the Web server, so the load is lighter.

The environment in which servlets run is called a Web container or Servlet container. As CGI became widespread, its mechanism of launching a new program for every request became a performance bottleneck. A servlet stays resident in memory, so each request can be handled by a servlet, which is more lightweight than starting a process, and the program runs more efficiently.

11. Attack techniques on the Web

11.1 Web Attack Technologies

HTTP itself, being a simple protocol, is rarely the target of attacks. The targets are servers, clients, and the Web applications running on servers.

HTTP is a general-purpose, simple protocol mechanism, so developers need to design and develop authentication and session management functions themselves to meet the security requirements of Web applications. As a result, running applications may hide various security vulnerabilities that attackers can easily exploit.

Requests can be tampered with on the client side

Everything in an HTTP request received from a browser can be freely altered on the client side. The content a Web application receives may therefore differ from the data it expects.

Attacks on Web applications are carried out by loading attack code into HTTP request messages. Attack code can be passed in through URL query fields, forms, HTTP headers, cookies, and so on. If the Web application has a security vulnerability, internal information may be stolen or the attacker may obtain administrative privileges.

Attack modes against Web applications

There are two modes of attack against Web applications: active attacks and passive attacks.

Active attacks

An active attack is a mode in which the attacker accesses the Web application directly and passes in attack code. Because this mode attacks resources on the server directly, the attacker needs to be able to access those resources.

Representative examples are SQL injection attacks and OS command injection attacks.

Passive attacks

A passive attack is a mode that uses a trap to get the attack code executed. The attacker does not access the target Web application directly.

Typical examples are cross-site scripting attacks and cross-site request forgery.

General steps of a passive attack

The attacker lures the user into triggering a trap that has been set up in advance. The trap sends an HTTP request with embedded attack code.
When the user falls into the trap without noticing, the user's browser or mail client triggers it.
The user's browser then sends the HTTP request containing the attack code to the target Web application, which runs the attack code.
Once the attack code has run, the vulnerable Web application becomes a springboard for the attacker, which may lead to the theft of personal information such as cookies held by the user and the abuse of the logged-in user's privileges.

11.2 Security vulnerabilities caused by incomplete escaping of output values

Validation is carried out mainly in two places:

Client-side validation
Web application (server-side) validation, which includes input value validation and output value escaping

Client-side validation is mostly done with JavaScript, but client-side scripts can be tampered with, so it is not suitable as a security countermeasure.

When the Web application outputs processed data to a database, file system, HTML, mail, and so on, escaping the output values is a critical security measure. If output values are not fully escaped, attack code passed in by the attacker may be triggered.

XSS cross-site scripting attacks

Cross-site scripting (XSS) is an attack that runs illegal HTML tags or JavaScript in the browser of a user of a Web site that has a security vulnerability. Dynamically generated HTML is where such vulnerabilities may be hidden.

For example, if text that a user enters in a form is inserted into the page and parsed directly as HTML, an attacker can inject script tags that the browser will then execute.
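Output escaping for the HTML case can be sketched with Python's standard `html.escape`. The `render_comment` function and the surrounding `<p>` markup are hypothetical; the point is only that user input is escaped before it is placed into the page.

```python
import html


def render_comment(user_input: str) -> str:
    """Embed user input in HTML after escaping markup characters."""
    return "<p>" + html.escape(user_input, quote=True) + "</p>"


rendered = render_comment("<script>alert(1)</script>")
# rendered == "<p>&lt;script&gt;alert(1)&lt;/script&gt;</p>"
```

After escaping, the browser shows the script tag as text instead of executing it. Other output contexts (attributes, JavaScript, URLs) need their own escaping rules.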

SQL injection attack

SQL injection is an attack in which illegal SQL is run against the database used by a Web application.

If the way the application builds and calls its SQL statements is flawed, illegal SQL statements may be maliciously injected.

For example, if the fields of a search form are concatenated directly into an SQL statement, the following can happen.

Suppose the search value Ueno Xuan is replaced with Ueno Xuan'--. The first statement below is the intended query, and the second is the query that actually runs:

SELECT * FROM bookTbl WHERE author = 'Ueno Xuan' and flag = 1;
SELECT * FROM bookTbl WHERE author = 'Ueno Xuan'--' and flag = 1;

In SQL, everything after -- is treated as a comment, so and flag = 1 is ignored. The filter condition is lost and content that was not meant to be shown is displayed.
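The difference between string concatenation and a parameterized query can be tried out with Python's built-in `sqlite3` module. The table and column names follow the example above; the `title` column, the sample rows, and the two search functions are assumptions made for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bookTbl (title TEXT, author TEXT, flag INTEGER)")
conn.executemany(
    "INSERT INTO bookTbl VALUES (?, ?, ?)",
    [("Public book", "Ueno Xuan", 1), ("Hidden book", "Ueno Xuan", 0)],
)


def search_unsafe(author):
    # Vulnerable: the input becomes part of the SQL text.
    query = "SELECT title FROM bookTbl WHERE author = '" + author + "' and flag = 1"
    return [row[0] for row in conn.execute(query)]


def search_safe(author):
    # Parameterized: the input is passed as data, never parsed as SQL.
    query = "SELECT title FROM bookTbl WHERE author = ? and flag = 1"
    return [row[0] for row in conn.execute(query, (author,))]


payload = "Ueno Xuan'--"
leaked = search_unsafe(payload)   # the flag filter is commented out
blocked = search_safe(payload)    # no author has this literal name
```

With the placeholder, the quote and the -- in the payload are just characters of the author name, so the filter on flag stays in effect.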

OS command injection attack

An OS command injection attack is one in which illegal operating system commands are executed through a Web application. The risk exists wherever the application can call shell functions.
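One common way to reduce this risk is to avoid the shell altogether and pass arguments as a list, or to quote untrusted values when a shell string is unavoidable. The sketch below uses `cat` purely as a stand-in command, and the function names are hypothetical.

```python
import shlex
import subprocess


def argv_for(filename: str):
    """Build an argument list; the filename stays a single argument."""
    return ["cat", "--", filename]


def shell_command_quoted(filename: str) -> str:
    """If a shell string is unavoidable, quote the untrusted part."""
    return "cat " + shlex.quote(filename)


def run_cat(filename: str):
    # No shell=True: the list is passed to the program without shell parsing,
    # so characters such as ';' are not interpreted as command separators.
    return subprocess.run(argv_for(filename), capture_output=True, check=False)


payload = "notes.txt; rm -rf /tmp/x"
argv = argv_for(payload)
quoted = shell_command_quoted(payload)
```

In both forms the whole payload reaches the program as one file name rather than as a second command.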

HTTP header injection attack

HTTP header injection is an attack in which the attacker inserts line breaks into a response header field in order to add arbitrary response header fields or a body. It is a passive attack.

An attack that adds content to the response body in this way is called an HTTP response splitting attack.

For example, a Web application sometimes assigns values received from outside to response header fields such as Location and Set-Cookie. If an attacker sends 101%0D%0ASet-Cookie:+SID=123456789 as such a value, where %0D%0A represents a line break, the response will contain:

Location: http://example.com/?cat=101 (%0D%0A: line break)
Set-Cookie: SID=123456789

In this way the attacker can set any Cookie information.

In this example, the %0D%0A entered by the attacker becomes a line break, which inserts a new header field. In fact, the attacker can insert any header field into the response.
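A simple defence is to refuse any externally supplied value that contains a line break before it is written into a header. The following is a minimal sketch; `safe_header_value` is a hypothetical helper, and many frameworks perform a similar check on their own.

```python
from urllib.parse import unquote


def safe_header_value(raw: str) -> str:
    """Decode a percent-encoded value and reject embedded line breaks."""
    value = unquote(raw)
    if "\r" in value or "\n" in value:
        raise ValueError("line break in header value")
    return value


category = safe_header_value("101")  # accepted unchanged

try:
    safe_header_value("101%0D%0ASet-Cookie:+SID=123456789")
    rejected = False
except ValueError:
    rejected = True  # the injected CR LF is detected
```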

HTTP response splitting attack

The HTTP response splitting attack is an attack that builds on HTTP header injection.

The attack proceeds in the same way, except that two line breaks, %0D%0A%0D%0A, are inserted back to back. The two consecutive line breaks produce the empty line that separates the HTTP headers from the body, so a forged body can be displayed, which is the goal of the attack.

Set-Cookie: UID= (%0D%0A: line break)
(%0D%0A: line break)
<HTML><HEAD><TITLE> <!-- (the header fields and body of the original page are all treated as a comment)

With this attack, the browser of a user who has triggered the trap displays a forged Web page that asks the user to enter personal information, achieving the same effect as a cross-site scripting attack.

Mail header injection attack

Mail header injection is an attack on the mail-sending function of a Web application in which the attacker adds arbitrary illegal content to mail header fields such as To or Subject. By exploiting a vulnerable Web site, the attacker can send advertising mail or virus mail to any address.

Directory traversal attack

A directory traversal attack is an attack in which file directories that were never meant to be public are accessed by illegally cutting across their directory paths. It is sometimes called a path traversal attack.

When a Web application handles files and its processing of externally specified file names has gaps, users can use relative paths such as ../ to reach absolute paths such as /etc/passwd. As a result, any file or directory on the server may become accessible, making it possible to illegally browse, tamper with, or delete files on the Web server.

http://example.com/read.php?log=0401.log
http://example.com/read.php?log=../../etc/passwd
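One way to guard against this is to resolve the requested name against a fixed base directory and reject anything that ends up outside it. The sketch below assumes a POSIX-style file system; `resolve_inside` is a hypothetical helper, and a temporary directory stands in for the real log directory.

```python
import os
import tempfile


def resolve_inside(base_dir: str, requested: str) -> str:
    """Return the absolute path of `requested`, refusing paths outside base_dir."""
    base = os.path.realpath(base_dir)
    target = os.path.realpath(os.path.join(base, requested))
    # After resolving "..", symlinks and absolute names, the result must
    # still be located under the base directory.
    if os.path.commonpath([base, target]) != base:
        raise PermissionError("path escapes the base directory")
    return target


log_dir = tempfile.mkdtemp()  # stand-in for the directory holding the logs
allowed = resolve_inside(log_dir, "0401.log")

try:
    resolve_inside(log_dir, "../../etc/passwd")
    escaped = True
except PermissionError:
    escaped = False  # the traversal attempt is refused
```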

Remote file inclusion vulnerability

Remote file inclusion is an attack that is possible when part of a script's content is read from another file: the attacker specifies a URL on an external server as the file to include, and once the script has read it, arbitrary script code can be run.

11.3 Security vulnerabilities caused by configuration or design flaws

Forced browsing

Files that are not meant to be public generally have their URLs hidden for security reasons. But anyone who learns those URLs can browse the corresponding files. For example:

http://www.example.com/entry/entry_081202.log

From a path like this it is easy to guess that the next file is entry_081203.log and to access it.

Improper error message handling

An error handling vulnerability is one in which a Web application's error messages contain information that is useful to an attacker. The main error messages related to Web applications are as follows:

Error messages thrown by the Web application: a Web application does not need to show detailed error messages on the user's screen. To an attacker, a detailed error message can be a hint for the next attack.
Error messages thrown by systems such as the database: from some error messages an attacker can learn which database is in use, and sometimes see fragments of SQL statements. This may give the attacker ideas for SQL injection attacks. The system should suppress detailed error messages or use custom error messages.

Open redirect

An open redirect is a feature that redirects to any URL given as a parameter. The associated vulnerability is that if the specified redirect URL points to a malicious Web site, the user is led to that site. For example, an attacker can change the first URL below into the second:

http://example.com/?redirect=http://www.tricorder.jp
http://example.com/?redirect=http://hackr.jp

If a highly trusted Web site offers an open redirect, attackers may use it as a springboard for phishing attacks.
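A typical mitigation is to redirect only to hosts on an explicit allowlist. The sketch below uses the host names from the example above; the `ALLOWED_REDIRECT_HOSTS` set and the `is_safe_redirect` function are hypothetical.

```python
from urllib.parse import urlsplit

ALLOWED_REDIRECT_HOSTS = {"www.tricorder.jp"}  # hypothetical allowlist


def is_safe_redirect(url: str) -> bool:
    """Accept only http(s) URLs whose host is on the allowlist."""
    parts = urlsplit(url)
    return (
        parts.scheme in ("http", "https")
        and parts.hostname in ALLOWED_REDIRECT_HOSTS
    )


trusted = is_safe_redirect("http://www.tricorder.jp")
malicious = is_safe_redirect("http://hackr.jp")
```

When the check fails, the application can fall back to a fixed default page instead of following the supplied URL.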

11.4 Security vulnerabilities caused by careless session management

Session hijacking

Session hijacking means that an attacker obtains a user's session ID by some means and uses it illegitimately to impersonate the user.

For example, having learned that a Web site has a cross-site scripting (XSS) vulnerability, the attacker sets a trap with JavaScript that reads document.cookie to steal the Cookie containing the session ID.

After obtaining the user's session ID, the attacker sets it in the Cookie of their own browser and can then access the Web site as the user whose session ID was stolen.

Session fixation attack

Whereas session hijacking steals the target's session ID, a session fixation attack forces the user to use a session ID specified by the attacker. It is a passive attack.

The attacker sets a trap that forces the user to use the attacker's session ID and waits for the user to authenticate with it. Once the user triggers the trap and completes authentication, the server records the session as authenticated (user A is authenticated), and the attacker can then access the site using that same session ID.

CSRF cross-site request forgery

A cross-site request forgery (CSRF) attack is a passive attack in which the attacker sets a trap that forces an authenticated user to make unintended updates to personal information, settings, or other state.

Take a message board that only authenticated, logged-in users can post to. The user's Cookie holds the authenticated session ID. If the user follows a malicious link left by the attacker, the request is sent with the user's credentials and the operation is carried out in the user's name.

11.5 Other security vulnerabilities

Password cracking

Password cracking is an attack that works out a password in order to get past authentication.

There are two main methods of trying passwords over the network:

Brute-force attack: systematically trying every candidate in the keyspace, the set of all possible keys.
Dictionary attack: trying candidate passwords collected in advance (stored in a dictionary after being combined in various ways) in order to get past authentication.

One type of dictionary attack uses lists of IDs and passwords leaked from other Web sites, since many users habitually reuse the same ID and password across multiple Web sites.

Encrypted passwords are usually cracked in one of the following ways:

Inferring the password through brute-force or dictionary attacks
Rainbow tables
Obtaining the key
Exploiting vulnerabilities in the encryption algorithm

Clickjacking

Clickjacking uses transparent buttons or links as traps laid over a Web page, inducing the user to click a link and access content without realizing it. This technique is also known as UI redressing.

A Web page that appears harmless has been rigged with the link the attacker wants the user to click. When the user clicks that button, they are actually clicking on an iframe page that has been made transparent.

DoS attack

A denial-of-service (DoS) attack is an attack that brings a running service to a halt. It is sometimes also called a service stop attack. The targets of DoS attacks are not limited to Web sites; they also include network devices and servers.

There are two main forms:

Overloading resources with a flood of access requests, for example by sending a large number of legitimate requests, until the service stops
Stopping the service by attacking a security vulnerability

Backdoor

A backdoor is a hidden entry point set up during development that allows restricted functionality to be used without following the normal steps. Through a backdoor, functionality that is normally restricted can be used.

Embedded backdoors can be detected by monitoring the status of processes and communications. However, backdoors built into Web applications are often hard to find because their use is not very different from normal use.

PS: You are welcome to follow my official account [front-end afternoon tea]. Let's keep at it together.

You can also join the “front-end afternoon tea exchange group” WeChat group: long-press the QR code below to add me as a friend with the note "add group", and I will invite you to the group ~