As the Internet has developed, our study, work, and daily life have become inseparable from it. Smart homes, online shopping, and everyday travel all depend on the Internet, and it has brought a great deal of convenience to our lives.

Have you ever run into this situation? You are browsing on a phone or computer, or you click a result in a search engine, and instead of the page you expected the browser shows a nearly blank page that says 404 Not Found.

This error code indicates that the server did not find the file, that the page to be visited has been changed or removed, or that the wrong address has been entered.

So why is 404, rather than some other number, used to indicate that the requested resource does not exist? There is an Internet “legend” about the birth of 404. It is said that before the third technological revolution, the entire Internet was like one large central database, housed in a room numbered 404. At that time all requests were handled by hand, and if the requested file could not be found in Room 404, or the file number had been written incorrectly, the staff would return a message saying “Room 404: File Not Found.”

Of course, research has found that the Room 404 of the legend never existed. The real origin of 404 lies in the protocol that underpins the Web: HTTP.

The origin of the status code

As we all know, the Internet broke down geographical barriers: communication between browsers and servers lets us learn about the world without leaving home. The browser communicates with the server via the HTTP protocol.

HTTP (Hypertext Transfer Protocol) is an application-layer protocol. Because it is simple and fast, it is well suited to distributed, collaborative hypermedia information systems. It has been used by the World Wide Web (WWW) global information service since 1990.

When a user browses the Web, the browser sends a request to the server over HTTP, and the content returned by the server is then displayed locally.

Underneath HTTP is the TCP/IP protocol stack, which does the work of moving the underlying data. In this respect, the Hypertext Transfer Protocol does not handle transport itself, so the name is a bit of a misnomer. So why is HTTP still called a transfer protocol? Because it defines the content of the messages being carried.

The HTTP specification defines the message format in detail, including its components, parsing rules, and processing strategies, so HTTP can offer richer and more flexible functionality on top of the data transmission that TCP/IP provides.

A TCP segment places a header of at least 20 bytes in front of the data actually being transmitted. The header stores the additional information TCP needs, such as the sender’s port number, the receiver’s port number, the sequence number, and the flag bits. With this header the segment can be delivered correctly, and when it reaches its destination the header is stripped off and the real data is recovered.
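To make the header layout concrete, here is a minimal sketch that unpacks the fixed 20-byte part of a TCP header with Python’s `struct` module. The function name `parse_tcp_header` and the sample segment are illustrative assumptions, not something from a real capture, and TCP options are not decoded.

```python
import struct

# Fixed part of a TCP header: 20 bytes in network (big-endian) byte order.
# Fields: source port, destination port, sequence number, acknowledgment
# number, data offset + flags, window size, checksum, urgent pointer.
TCP_HEADER = struct.Struct("!HHIIHHHH")


def parse_tcp_header(segment: bytes) -> dict:
    """Split a TCP segment into a few header fields and the payload."""
    src, dst, seq, ack, off_flags, _window, _checksum, _urgent = (
        TCP_HEADER.unpack_from(segment)
    )
    # The top 4 bits give the header length in 32-bit words.
    header_len = (off_flags >> 12) * 4
    return {
        "src_port": src,
        "dst_port": dst,
        "seq": seq,
        "ack": ack,
        "flags": off_flags & 0x01FF,  # low bits hold FIN, SYN, PSH, ACK, ...
        "payload": segment[header_len:],
    }


# Hypothetical segment: port 50000 -> 80, sequence number 1, ACK + PSH set.
sample = TCP_HEADER.pack(50000, 80, 1, 0, (5 << 12) | 0x018, 65535, 0, 0)
sample += b"GET / HTTP/1.1\r\n\r\n"
```

Calling `parse_tcp_header(sample)` strips the 20-byte header and leaves the HTTP request text as the payload.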

HTTP also places such a header in front of the data actually being transmitted, but unlike TCP it is a “plain text” protocol (at least in HTTP/1.x). The header is ASCII text that can be read with the naked eye and understood without the help of a parser.

Request and response messages in HTTP have essentially the same structure, made up of three parts. For a response, they are:

  • Status line: describes the basic information of the response, that is, the status of the server’s response;
  • A collection of header fields (headers): a more detailed description of the message in key-value form;
  • Message body (entity): the actual response data, which is not necessarily plain text but can be binary data such as images or video.

The status line and the header fields are often collectively called the “response header”. The message body is called the “entity”, or, as the counterpart of the “header”, simply the “body”.

The HTTP protocol stipulates that a message must have a header but may omit the body, and that the header must be followed by a “blank line”, i.e. “CRLF”, hexadecimal “0D0A”.
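The three-part structure and the blank-line rule can be sketched in a few lines of Python. This is a simplified illustration for HTTP/1.x text messages only; `split_message` and the sample response are made up for the example, and real parsers handle many more edge cases (repeated fields, chunked bodies, and so on).

```python
def split_message(raw: bytes) -> tuple[str, dict[str, str], bytes]:
    """Split a raw HTTP/1.x message into start line, header fields, and body."""
    # The first blank line (CRLF CRLF, hex 0D0A 0D0A) ends the header.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("ascii").split("\r\n")
    start_line, field_lines = lines[0], lines[1:]
    headers = {}
    for line in field_lines:
        name, _, value = line.partition(":")
        # Field names are case-insensitive, so normalize them to lowercase.
        headers[name.strip().lower()] = value.strip()
    return start_line, headers, body


# Hypothetical response used only for illustration.
raw_response = (
    b"HTTP/1.1 404 Not Found\r\n"
    b"Content-Type: text/html\r\n"
    b"Content-Length: 9\r\n"
    b"\r\n"
    b"Not here."
)
```

For a message that has no body, the third element returned is simply an empty byte string.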

Take the response header returned after uploading a file through the Youpai Cloud storage API as an example. The first line, “HTTP/2 200 OK”, is the status line, which is made up of three parts:

  • Version number: the HTTP protocol version used by the message. In this example, the version is HTTP/2.
  • Status code: a three-digit code that indicates the result of processing, such as 200 (success) or 404 (resource does not exist).
  • Reason phrase: a short text description that supplements the numeric status code, such as “OK” or “Not Found.”

The lines that follow, such as “Content-Type” and “Connection”, belong to the header, and the message ends with a blank line and no body.
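As a rough sketch, the three parts of a status line can be pulled apart like this. `StatusLine` and `parse_status_line` are names invented for the example, and the parsing is deliberately lenient about a missing reason phrase.

```python
from typing import NamedTuple


class StatusLine(NamedTuple):
    version: str
    code: int
    reason: str


def parse_status_line(line: str) -> StatusLine:
    """Parse a status line such as 'HTTP/2 200 OK' into its three parts."""
    # Split on the first two spaces only: the reason phrase may contain spaces.
    parts = line.split(" ", 2)
    if len(parts) < 2:
        raise ValueError(f"malformed status line: {line!r}")
    version, code = parts[0], parts[1]
    reason = parts[2] if len(parts) == 3 else ""
    if len(code) != 3 or not (code.isascii() and code.isdigit()):
        raise ValueError(f"status code must be three digits: {code!r}")
    return StatusLine(version, int(code), reason)
```

With this sketch, `parse_status_line("HTTP/1.1 404 Not Found")` yields the version `HTTP/1.1`, the code `404`, and the reason phrase `Not Found`.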

In many cases, HTTP messages have only a header and no body. Although the HTTP protocol does not limit the size of the header, a very large header can consume a lot of server resources and hurt efficiency, so Web servers generally reject oversized request headers. Even so, plenty of “big heads” still travel around the Internet.

To keep the overhead of these “big heads” down and to report problems such as a wrong address quickly, the result of a request is conveyed by a status code, because a number takes up less space in the HTTP header than a description in words.

Through the status code in the response message, the client can quickly tell whether its request was processed correctly, and the server can pick the most appropriate status with which to answer the request. The range of status codes lets the server tell the client clearly what happened, and lets the client know what to do next.

The RFC standard currently defines 41 status codes and allows for extension. Apache, Nginx, and other Web servers define their own proprietary status codes. When developing Web applications, we can also define our own status codes, as long as they do not conflict with existing ones.

Common status codes

Next, let’s talk about what the common status codes represent.

The purpose of a status code is to express the “state” of HTTP data processing, so that the client can switch to the appropriate handling at the right time. The RFC standard specifies that the status code is a three-digit decimal number, which in principle could range from 000 to 999. Status codes follow a fixed design and are divided into five classes, with the first digit indicating the class; values 0~99 are not used. As a result, the usable range is much smaller: 100~599 rather than 000~999.
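Because the first digit carries the class, classifying a code takes a single integer division. The sketch below is illustrative; the labels in `STATUS_CLASSES` paraphrase the five classes described in the following sections, and `status_class` is a made-up helper name.

```python
# Labels paraphrase the five classes; the names are chosen for this example.
STATUS_CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def status_class(code: int) -> str:
    """Return the class of a status code based on its first digit."""
    if not 100 <= code <= 599:
        raise ValueError(f"status code outside the usable range: {code}")
    return STATUS_CLASSES[code // 100]
```

This is also why a client that meets an unfamiliar code such as 499 can still treat it as some kind of client error.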

1xx

The 1×× class of status codes is informational: it marks an intermediate state in protocol processing and is rarely seen in practice.

One that we occasionally see is “101 Switching Protocols.” The client uses the Upgrade header field to ask the server to continue communicating over another protocol, such as WebSocket, instead of HTTP. If the server agrees to switch, it sends status code 101, and HTTP is no longer used for the data transfer that follows.

There is also “100 Continue”, which indicates that everything is fine so far and that the client should continue with the request, or ignore the response if the request has already been completed. It usually appears during file uploads.

2xx

The 2×× class status code indicates that the server received and successfully processed the client’s request, which is the status code that the client is most willing to see.

“200 OK” is the most common success status code, indicating that all is well and that the server returned processing results as expected by the client.

“204 No Content” is another very common success status code. It means essentially the same as “200 OK”, but there is no body data after the response header.

“206 Partial Content” is the basis for chunked or resumable downloads. It appears when a client sends a “range request” asking for part of a resource. As with 200, the server processed the request successfully, but the body contains only part of the resource rather than all of it. Status code 206 is usually accompanied by the header field “Content-Range”, which tells the client exactly which range the body data covers. For example, “Content-Range: bytes 0-66/888” means that bytes 0 through 66, the first 67 bytes of a total of 888, were returned.
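The arithmetic behind “Content-Range” is easy to get wrong by one, because both ends of the range are inclusive: “bytes 0-66/888” covers 67 bytes. A minimal sketch of parsing the header value, assuming only the common `bytes first-last/total` form (the helper names are made up for this example):

```python
import re
from typing import Optional

# Matches values such as "bytes 0-66/888"; the total may be "*" when unknown.
CONTENT_RANGE = re.compile(r"bytes (\d+)-(\d+)/(\d+|\*)")


def parse_content_range(value: str) -> tuple[int, int, Optional[int]]:
    """Return (first_byte, last_byte, total_length) from a Content-Range value."""
    match = CONTENT_RANGE.fullmatch(value.strip())
    if match is None:
        raise ValueError(f"unsupported Content-Range: {value!r}")
    first, last = int(match[1]), int(match[2])
    total = None if match[3] == "*" else int(match[3])
    return first, last, total


def range_length(first: int, last: int) -> int:
    """Number of bytes in an inclusive byte range."""
    return last - first + 1
```

A download client can use `last + 1` as the starting offset of its next range request when resuming.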

3xx

The 3×× class of status codes indicates that the resource requested by the client has moved, and the client must resend the request with the new URI to obtain it. This is commonly known as a “redirect”, and it includes the “famous” 301 and 302 jumps.

“301 Moved Permanently”, commonly known as a “permanent redirect”, means that the requested resource no longer exists at this address and must be requested again at a new URI. “302 Found” is similar; its reason phrase used to be “Moved Temporarily”, and it is commonly known as a “temporary redirect”. It means that the requested resource still exists but for the time being must be requested at another URI.

“304 Not Modified” is an interesting status code. It is used with conditional requests such as If-Modified-Since to indicate that the resource has not been modified, and it serves cache control. It does not have the usual meaning of a jump, but can be understood as a “redirect to a file that has been cached” (that is, a “cache redirect”).
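The decision a server makes for an If-Modified-Since request can be sketched as follows. This is a simplified illustration, not a full implementation of conditional requests: `conditional_status` is a hypothetical helper, only the If-Modified-Since condition is considered, and an unparseable date is simply ignored.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def conditional_status(last_modified: datetime,
                       if_modified_since: Optional[str]) -> int:
    """Pick 304 or 200 for a request, given the resource's modification time.

    `last_modified` must be timezone-aware.
    """
    if if_modified_since is None:
        return 200  # unconditional request: send the full resource
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # unparseable date: ignore the condition
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)
    # HTTP dates have one-second resolution, so drop the microseconds.
    if last_modified.replace(microsecond=0) <= since:
        return 304  # the client's cached copy is still current
    return 200
```

When the result is 304, the server sends the header only and the browser reuses the copy it already has in its cache.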

4xx

The status code of class 4×× indicates that the request message sent by the client is wrong and the server cannot process it. It is a status code with the real meaning of “error code”.

“400 Bad Request” is a general error code. It indicates that there is an error in the request message, but not whether it is a data format error, a missing request header, or something else. For that reason, Web developers generally avoid returning 400 to the client and use other status codes with a more explicit meaning.

“403 Forbidden” is not actually an error in the client’s request; it indicates that the server has forbidden access to the resource. The reasons vary, such as sensitive information or legal prohibition.

“404 Not Found” means that the server could not find the requested resource, so it cannot provide it to the client.

The rest of the codes in the 4×× class state the cause of the error fairly clearly and are easy to understand. Some that are commonly used in development are:

  • 405 Method Not Allowed: the request method is not allowed for the resource;
  • 406 Not Acceptable: the resource cannot meet the requirements stated by the client, for example the client requests Chinese but only English is available;
  • 408 Request Timeout: the server waited too long for the request;
  • 409 Conflict: multiple requests conflict with each other, which can be thought of as a race between concurrently running threads;
  • 413 Request Entity Too Large: the body of the request is too large;
  • 414 Request-URI Too Long: the URI in the request line is too long;
  • 429 Too Many Requests: the client sent too many requests, triggering the server’s rate limit;
  • 431 Request Header Fields Too Large: a single field or the header as a whole is too large.
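There is no need to memorize codes like these: Python’s standard library ships a table of them in `http.HTTPStatus`. The small helper below (`describe` is a made-up name) looks up the reason phrase for a code. The phrases come from Python’s own table, so for a few codes they may differ slightly from the wording used in this article.

```python
from http import HTTPStatus


def describe(code: int) -> str:
    """Return 'code phrase' using the table in Python's standard library."""
    try:
        return f"{code} {HTTPStatus(code).phrase}"
    except ValueError:
        # Proprietary or unknown codes are not in the standard table.
        return f"{code} (not in Python's HTTPStatus table)"
```

For instance, `describe(404)` gives `404 Not Found`, while a made-up code such as 799 falls through to the fallback message.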

5xx

The 5×× class of status codes means that the client’s request message was correct but an internal error occurred on the server while processing it, so the server cannot return the expected response data. These are the server-side “error codes”.

“500 Internal Server Error”, like 400, is a general-purpose error code. Unlike with 400, however, developers usually do not return the details of the server’s internal error to the visitor. That makes debugging harder, but it prevents attackers from snooping on or analyzing the server.
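One common way to get both benefits is to write the details to a server-side log and send the visitor only a generic message. The sketch below assumes handlers that return a `(status, body)` tuple; `safe_call` and that convention are inventions for this example rather than any particular framework’s API.

```python
import logging

logger = logging.getLogger("app")


def safe_call(handler, *args):
    """Run a request handler; on failure, log details and return a generic 500."""
    try:
        return handler(*args)
    except Exception:
        # The full traceback goes to the server log for debugging...
        logger.exception("unhandled error in %s",
                         getattr(handler, "__name__", handler))
        # ...while the visitor sees nothing about the server's internals.
        return 500, "Internal Server Error"
```

The developer can still debug from the log, and the response body reveals nothing that would help someone probing the server.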

“501 Not Implemented” means that the functionality requested by the client is not yet supported, something like “Coming soon, stay tuned.”

“502 Bad Gateway” is usually returned by a server acting as a gateway or proxy. It indicates that the server itself is working properly but that an error occurred while it was accessing the backend server. The exact cause of the error is unknown.

“503 Service Unavailable” means that the server is currently busy and temporarily cannot provide the service. The message “the network service is busy, please try again later” that we sometimes see when browsing corresponds to status code 503.

How to handle 404

So let’s go back to the 404 question from the beginning. In real-world operation it is inevitable that users will enter a wrong link and request a resource that does not exist, or that a server will become unreachable because of a sudden failure. However, the default error pages provided by Web servers, whether Nginx, Apache, or IIS, are not very attractive. They are plain, mechanical, and unfriendly, and they do not give users clear, intuitive information, which hurts the user experience.

As a result, many developers use custom error pages to enhance the user experience and avoid losing users. Take 404 for example. A common way to customize a 404 page is to place quick navigation links, search boxes, and other features on the page to help users access the site and get the information they need.
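As a rough sketch of the idea, here is a tiny WSGI application that answers unknown paths with a custom 404 page containing a search box and quick navigation links. The site content in `PAGES`, the `/search` URL, and the function names are all hypothetical.

```python
from html import escape

# Hypothetical site content, for illustration only.
PAGES = {"/": "<h1>Home</h1>", "/about": "<h1>About</h1>"}


def render_404(path: str) -> str:
    """Build a friendlier 404 page with a search box and navigation links."""
    links = "".join(
        f'<li><a href="{escape(url)}">{escape(url)}</a></li>' for url in PAGES
    )
    return (
        "<h1>404 Not Found</h1>"
        f"<p>No page exists at {escape(path)}.</p>"
        '<form action="/search"><input name="q" placeholder="Search"></form>'
        f"<ul>{links}</ul>"
    )


def app(environ, start_response):
    """WSGI entry point: serve known pages, otherwise the custom 404 page."""
    path = environ.get("PATH_INFO", "/")
    if path in PAGES:
        status, html = "200 OK", PAGES[path]
    else:
        # Keep the 404 status code so browsers and crawlers see the error.
        status, html = "404 Not Found", render_404(path)
    body = html.encode("utf-8")
    start_response(status, [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

To try it locally, the standard library’s `wsgiref.simple_server.make_server("", 8000, app).serve_forever()` is enough. Note that the custom page is still sent with status 404 rather than 200, so the friendlier page does not hide the fact that the resource is missing.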

For example, many developers use the “Baby Come Home” public-welfare 404 page offered by Tencent Public Welfare. By referencing a snippet of code in their custom 404 page, developers make the page show information about missing children whenever a user requests a resource that does not exist. The information spreads across the Internet and improves the chances of finding a missing child. This kind of effort gives technology warmth and reflects humanistic care; it is the romance of technology.

If you would love to have a custom error response page but don’t know how to build one, you can look at the custom page feature offered by a cloud CDN or cloud storage service. It helps you quickly configure response pages for 4XX and 5XX errors: just open the console and configure the error response and the error page to suit your needs.

In addition, edge rules let you map different error codes to different guidance actions, such as URL redirects and URL rewrites.
