Android Web Programming 1: The HTTP Protocol

1. Introduction to HTTP

HTTP stands for Hypertext Transfer Protocol. It is a data transfer protocol that specifies the rules for communication between a browser and a World Wide Web (WWW) server, and it is used to transmit World Wide Web documents over the Internet.
HTTP is an application-layer protocol in the TCP/IP model. It is usually carried directly on top of TCP; when it is carried on top of TLS (or its predecessor, SSL), it is known as HTTPS.

HTTP is an application-layer protocol built from requests and responses, and it follows the standard client-server model. HTTP is also a stateless protocol.
The default port for HTTP is 80, and the default port for HTTPS is 443.
Web browsing is the primary application of HTTP, but HTTP is not limited to web browsing. Any two parties can use it as long as both sides of the communication adhere to the protocol. For example, commonly used software such as QQ and Thunder (Xunlei) uses HTTP alongside other protocols.
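
The default ports mentioned above can be checked from Java. The following is a minimal sketch, not a definitive implementation; the class name DefaultPorts, the helper effectivePort, and the example.com addresses are assumptions made for illustration only.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;

public class DefaultPorts {

    /** Returns the port a client would connect to for the given absolute URL. */
    static int effectivePort(String url) {
        try {
            URL u = URI.create(url).toURL();
            // getPort() is -1 when the URL does not name a port explicitly,
            // in which case the protocol's default port applies.
            return u.getPort() != -1 ? u.getPort() : u.getDefaultPort();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(effectivePort("http://example.com/"));      // 80
        System.out.println(effectivePort("https://example.com/"));     // 443
        System.out.println(effectivePort("http://example.com:8080/")); // 8080
    }
}
```

No network traffic is involved here; the URL object is only parsed.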

2. Features of HTTP

Client/server mode: HTTP supports the client/server model, and it supports both basic authentication and secure authentication.
Simple and fast: when a client requests a service from the server, it only needs to send the request method and the path. GET, HEAD, and POST are the most common request methods, and each method defines a different kind of interaction between the client and the server. Because the protocol is simple, an HTTP server program can be small, which keeps communication fast.
Flexible: HTTP allows any type of data object to be transferred. The type being transferred is indicated by the Content-Type header.
Connectionless: connectionless means that each connection handles only one request. The server closes the connection after it has processed the client's request and the client has acknowledged the response. This saves transmission time. Note that HTTP/1.1 keeps the connection open by default unless the Connection header says otherwise.
Stateless: HTTP is a stateless protocol. Stateless means that the protocol keeps no memory of earlier transactions. If later processing needs earlier information, that information must be retransmitted, which can increase the amount of data sent per connection. On the other hand, the server responds faster when it does not need the earlier information.

3. Workflow of the HTTP protocol

HTTP runs on top of TCP at the transport layer, and TCP is an end-to-end, connection-oriented protocol. End-to-end here can be understood as process-to-process communication. Before HTTP can start transmitting, a TCP connection must be established, and establishing it requires the so-called “three-way handshake” (the client sends SYN, the server replies with SYN-ACK, and the client answers with ACK). Once the handshake is complete, the TCP connection exists and HTTP messages can be transmitted. An important related concept is the persistent connection: HTTP does not close the TCP connection after each transfer completes. This is the default behavior in HTTP/1.1, and it can be changed through the Connection header.

The client establishes a connection to the server.
After the connection is established, the client sends a request to the server.
When the server receives the request, it sends a response back to the client.
After receiving the response, the client performs further processing.
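
The four steps above can be sketched with plain sockets. This is an illustrative sketch under stated assumptions rather than production code: the class HttpWorkflowSketch, the path /hello, and the throwaway loopback server are all made up for the example, and a real Android app would normally use an HTTP client library and run network code off the main thread.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

public class HttpWorkflowSketch {

    /** Runs one request/response exchange against a local server and returns the status line. */
    static String exchangeOnce() {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback)) {
            Thread serverThread = new Thread(() -> serveOne(server));
            serverThread.start();

            // Step 1: establish the TCP connection (the handshake happens while connecting).
            try (Socket socket = new Socket(loopback, server.getLocalPort())) {
                // Step 2: the client sends its request.
                String request = "GET /hello HTTP/1.1\r\n"
                        + "Host: localhost\r\n"
                        + "Connection: close\r\n"
                        + "\r\n";
                OutputStream out = socket.getOutputStream();
                out.write(request.getBytes(StandardCharsets.US_ASCII));
                out.flush();

                // Step 3: the server responds. Step 4: the client processes the response
                // (here it only reads the status line).
                BufferedReader in = new BufferedReader(
                        new InputStreamReader(socket.getInputStream(), StandardCharsets.US_ASCII));
                String statusLine = in.readLine();
                serverThread.join();
                return statusLine;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    /** Accepts a single connection, reads the request headers, and writes a fixed response. */
    private static void serveOne(ServerSocket server) {
        try (Socket client = server.accept()) {
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(client.getInputStream(), StandardCharsets.US_ASCII));
            String line;
            while ((line = in.readLine()) != null && !line.isEmpty()) {
                // Skip the request line and headers up to the blank line.
            }
            String body = "hello";
            String response = "HTTP/1.1 200 OK\r\n"
                    + "Content-Type: text/plain\r\n"
                    + "Content-Length: " + body.length() + "\r\n"
                    + "Connection: close\r\n"
                    + "\r\n"
                    + body;
            OutputStream out = client.getOutputStream();
            out.write(response.getBytes(StandardCharsets.US_ASCII));
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(exchangeOnce()); // HTTP/1.1 200 OK
    }
}
```

Because the request carries Connection: close, each side closes the socket after one exchange, which matches the connectionless behavior described earlier.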

4. HTTP URL

HTTP uses Uniform Resource Identifiers (URIs) to transfer data and establish connections. A URL is a special type of URI that contains enough information to locate a resource.

A URL is the address used to identify a resource on the Internet. Take the following URL as an example to describe the components of a common URL:

www.tencent.com:8080/news/index….

As you can see from the above URL, a complete URL consists of the following parts:

1. Protocol part: The protocol part of this URL is “http:”, which indicates that the web page uses HTTP. Many protocols can be used on the Internet, such as HTTP and FTP. The “//” after “http:” is a delimiter.

2. Domain name: The domain name of this URL is www.tencent.com. An IP address can also be used in place of a domain name.

3. Port: The domain name is followed by the port, and the two are separated by a colon (:). The port is not a required part of a URL; if it is omitted, the default port is used.

4. Virtual directory: The virtual directory runs from the first slash (/) after the domain name to the last slash (/). The virtual directory is also not a required part of a URL. The virtual directory in this case is “/news/”.

5. File name: The file name part runs from the last slash (/) after the domain name to the “?”. If there is no “?”, it runs to the “#”; if there is neither “?” nor “#”, it runs to the end of the URL. In this case, the file name is index.asp. The file name is also not a required part of a URL; if it is omitted, the default file name is used.

6. Anchor part: From the “#” to the end of the URL is the anchor part. The anchor part in this case is “name”. The anchor part is also not a required part of a URL.

7. Parameter part: The part between the “?” and the “#” is the parameter part, also known as the search part or the query part. The parameter part in this example is “boardID=5&ID=24618&page=1”. There can be multiple parameters, separated by ampersands (&).
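
The components listed above can be extracted with java.net.URI. In the sketch below, the full URL string is assembled from the components named in the list (this is an assumption, since the printed example is truncated), and the class UrlParts with its helpers directoryOf, fileNameOf, and parseQuery is hypothetical. The query parser is deliberately simple and does not percent-decode names or values.

```java
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.Map;

public class UrlParts {

    /** Returns the path up to and including the last slash, e.g. "/news/". */
    static String directoryOf(String path) {
        return path.substring(0, path.lastIndexOf('/') + 1);
    }

    /** Returns whatever follows the last slash of the path, e.g. "index.asp". */
    static String fileNameOf(String path) {
        return path.substring(path.lastIndexOf('/') + 1);
    }

    /** Splits a query string such as "a=1&b=2" into name/value pairs, in order. */
    static Map<String, String> parseQuery(String query) {
        Map<String, String> params = new LinkedHashMap<>();
        if (query == null || query.isEmpty()) {
            return params;
        }
        for (String pair : query.split("&")) {
            int eq = pair.indexOf('=');
            String name = eq < 0 ? pair : pair.substring(0, eq);
            String value = eq < 0 ? "" : pair.substring(eq + 1);
            params.put(name, value);
        }
        return params;
    }

    public static void main(String[] args) {
        // Assumed full form of the example, rebuilt from the parts described in the text.
        URI uri = URI.create(
                "http://www.tencent.com:8080/news/index.asp?boardID=5&ID=24618&page=1#name");

        System.out.println(uri.getScheme());              // http
        System.out.println(uri.getHost());                // www.tencent.com
        System.out.println(uri.getPort());                // 8080
        System.out.println(directoryOf(uri.getPath()));   // /news/
        System.out.println(fileNameOf(uri.getPath()));    // index.asp
        System.out.println(parseQuery(uri.getQuery()));   // {boardID=5, ID=24618, page=1}
        System.out.println(uri.getFragment());            // name
    }
}
```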

(the original: blog.csdn.net/ergouge/art…).

5. Differences between URIs and URLs

5.1. A URI is a Uniform Resource Identifier, which uniquely identifies a resource.

Every resource available on the Web, such as an HTML document, image, video clip, or program, is located by a URI. A URI generally consists of three parts:

The naming mechanism used to access the resource
The name of the host where the resource is stored
The name of the resource itself, expressed as a path

5.2. A URL is a Uniform Resource Locator: a specific kind of URI that not only identifies a resource but also specifies how to locate it.

A URL is a string of characters used to describe information resources on the Internet. It is used in various WWW client and server programs, notably the well-known Mosaic browser. URLs provide a unified format for describing various information resources, including files, server addresses, and directories. A URL consists of three parts: (1) the protocol (or service mode), (2) the host name or IP address of the host where the resource resides (sometimes including a port number), and (3) the specific address of the resource on that host, such as a directory and file name.

A URN is a Uniform Resource Name, which identifies a resource by its name, e.g. mailto:[email protected].

URI is an abstract, high-level concept that defines uniform resource identification, while URLs and URNs are concrete ways of identifying resources. URLs and URNs are both URIs. Broadly speaking, every URL is a URI, but not every URI is a URL, because URIs also include another subclass, the Uniform Resource Name (URN), which names a resource but does not specify how to locate it. The mailto URI above, along with news and ISBN URIs, is an example of a URN.

In Java, an instance of the URI class can represent either an absolute or a relative URI, as long as it follows the URI syntax rules. The URL class, on the other hand, must not only conform to the syntax but also contain the information needed to locate the resource, so it cannot be relative. In the Java class library, the URI class has no methods for accessing the resource; its only job is parsing. In contrast, the URL class can open a stream to the resource.
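
The difference between the two Java classes can be seen in a few lines. This is a small sketch, assuming a standard Java runtime; the class UriVersusUrl and the helper convertibleToUrl are hypothetical names, and the example addresses reuse the host from the URL section.

```java
import java.net.MalformedURLException;
import java.net.URI;

public class UriVersusUrl {

    /** Returns true if the given URI text can be turned into a java.net.URL. */
    static boolean convertibleToUrl(String text) {
        try {
            URI.create(text).toURL();
            return true;
        } catch (IllegalArgumentException | MalformedURLException e) {
            // toURL() rejects relative URIs (IllegalArgumentException) and
            // schemes without a protocol handler (MalformedURLException).
            return false;
        }
    }

    public static void main(String[] args) {
        URI base = URI.create("http://www.tencent.com:8080/news/");
        URI relative = URI.create("index.asp"); // a relative URI is perfectly legal

        System.out.println(relative.isAbsolute());               // false
        System.out.println(base.resolve(relative));              // http://www.tencent.com:8080/news/index.asp
        System.out.println(convertibleToUrl("index.asp"));       // false
        System.out.println(convertibleToUrl(base.toString()));   // true
    }
}
```

Only the URL obtained from an absolute URI could then be used to open a stream; the URI objects themselves are never more than parsed text.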

6. HTTP request methods

Method	Description
GET	Requests the resource identified by the Request-URI.
POST	Submits new data to the resource identified by the Request-URI.
HEAD	Requests only the response headers for the resource identified by the Request-URI.
PUT	Asks the server to store a resource and identify it with the Request-URI.
DELETE	Asks the server to remove the resource identified by the Request-URI.
TRACE	Asks the server to send back the request it received, mainly for testing or diagnostics.
CONNECT	Reserved in HTTP/1.1 for proxy servers that can switch the connection to a tunnel.
OPTIONS	Queries the server's capabilities or the options related to a resource.
PATCH	Applies partial modifications to a resource; added by RFC 5789.
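
On the client side, the method is chosen before the request is sent. The sketch below uses java.net.HttpURLConnection and only configures the connection object, so no request actually leaves the machine. The class RequestMethods, the helper accepts, and the example.com address are assumptions for illustration. Which method names are accepted depends on the runtime's HttpURLConnection implementation, so the program prints the result instead of assuming it.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.ProtocolException;
import java.net.URI;

public class RequestMethods {

    /** Returns true if HttpURLConnection accepts the given request method name. */
    static boolean accepts(String method) {
        try {
            // openConnection() only creates the connection object; nothing is sent yet.
            HttpURLConnection conn = (HttpURLConnection)
                    URI.create("http://example.com/").toURL().openConnection();
            conn.setRequestMethod(method);
            return method.equals(conn.getRequestMethod());
        } catch (ProtocolException e) {
            // Thrown when the method name is not supported by this class.
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String[] methods = {"GET", "POST", "HEAD", "PUT", "DELETE", "TRACE", "OPTIONS", "PATCH"};
        for (String method : methods) {
            System.out.println(method + " accepted: " + accepts(method));
        }
    }
}
```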

7. HTTP request headers and response headers

7.1. HTTP request headers

Header	Description	Example
Accept	Content types the client can receive	Accept: text/plain, text/html
Accept-Charset	Character sets acceptable to the browser	Accept-Charset: iso-8859-5
Accept-Encoding	Content compression encodings the browser supports	Accept-Encoding: compress, gzip
Accept-Language	Languages acceptable to the browser	Accept-Language: en,zh
Accept-Ranges	Allows one or more sub-ranges of a web page entity to be requested	Accept-Ranges: bytes
Authorization	Credentials for HTTP authorization	Authorization: Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==
Cache-Control	Caching directives that requests and responses follow	Cache-Control: no-cache
Connection	Indicates whether a persistent connection is required (HTTP/1.1 uses persistent connections by default)	Connection: close
Cookie	When an HTTP request is sent, all cookie values stored for the request's domain are sent to the web server	Cookie: $Version=1; Skin=new;
Content-Length	Length of the request body	Content-Length: 348
Content-Type	MIME type of the request body	Content-Type: application/x-www-form-urlencoded
Date	Date and time the request was sent	Date: Tue, 15 Nov 2010 08:12:31 GMT
Expect	Specific server behavior the client expects	Expect: 100-continue
From	Email address of the user making the request	From: [email protected]
Host	Domain name and port number of the requested server	Host: www.zcmhi.com
If-Match	The request takes effect only if the entity matches the given ETag	If-Match: "737060cd8c284d8af7ad3082f209582d"
If-Modified-Since	If the requested resource has been modified after the specified time, the request succeeds; if it has not been modified, 304 is returned	If-Modified-Since: Sat, 29 Oct 2010 19:43:31 GMT
If-None-Match	If the content has not changed, 304 is returned. The parameter is the ETag previously sent by the server, which is compared with the server's current ETag to determine whether the content has changed	If-None-Match: "737060cd8c284d8af7ad3082f209582d"
If-Range	If the entity has not changed, the server sends only the part the client is missing; otherwise it sends the whole entity. The parameter is also an ETag	If-Range: "737060cd8c284d8af7ad3082f209582d"
If-Unmodified-Since	The request succeeds only if the entity has not been modified after the specified time	If-Unmodified-Since: Sat, 29 Oct 2010 19:43:31 GMT
Max-Forwards	Limits the number of times the message can be forwarded through proxies and gateways	Max-Forwards: 10
Pragma	Carries implementation-specific directives	Pragma: no-cache
Proxy-Authorization	Credentials for authenticating with a proxy	Proxy-Authorization: Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==
Range	Requests only part of the entity, specified as a range	Range: bytes=500-999
Referer	Address of the previous web page, from which the currently requested page was reached	Referer: www.zcmhi.com/archives/71…
TE	Transfer encodings the client is willing to accept; also tells the server whether trailer fields are accepted	TE: trailers,deflate;q=0.5
Upgrade	Asks the server to switch to another protocol (if the server supports it)	Upgrade: HTTP/2.0, SHTTP/1.3, IRC/6.9, RTA/x11
User-Agent	Information about the client software (user agent) sending the request	User-Agent: Mozilla/5.0 (Linux; X11)
Via	Intermediate gateways or proxies the message passed through, with their addresses and protocols	Via: 1.0 fred, 1.1 nowhere.com (Apache/1.1)
Warning	Warning information about the message entity	Warning: 199 Miscellaneous warning
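
As an illustration of how one of these header values is built, the Basic value in the Authorization sample is simply "user:password" encoded with Base64. The sketch below reproduces the sample value; the credentials Aladdin / open sesame are what that sample decodes to, and the class BasicAuthHeader and its helper are hypothetical names. It assumes java.util.Base64 is available (Java 8+, or Android API level 26+).

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class BasicAuthHeader {

    /** Builds the value of an Authorization header for HTTP Basic authentication. */
    static String basicAuthorization(String user, String password) {
        String credentials = user + ":" + password;
        String encoded = Base64.getEncoder()
                .encodeToString(credentials.getBytes(StandardCharsets.UTF_8));
        return "Basic " + encoded;
    }

    public static void main(String[] args) {
        // Prints: Authorization: Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==
        System.out.println("Authorization: " + basicAuthorization("Aladdin", "open sesame"));
    }
}
```

Base64 is an encoding, not encryption, so Basic credentials are only protected when the request travels over HTTPS.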

7.2. HTTP response headers

Header	Description	Example
Accept-Ranges	Indicates whether the server supports range requests and which range unit it supports	Accept-Ranges: bytes
Age	Estimated time since the response was generated at the origin server, as seen by the proxy cache (in seconds, non-negative)	Age: 12
Allow	Request methods that are valid for the resource; a request with a method that is not allowed gets 405	Allow: GET, HEAD
Cache-Control	Tells all caching mechanisms whether they may cache the response and how	Cache-Control: no-cache
Content-Encoding	Compression encoding applied to the returned content	Content-Encoding: gzip
Content-Language	Language of the response body	Content-Language: en,zh
Content-Length	Length of the response body	Content-Length: 348
Content-Location	An alternate address for the returned resource	Content-Location: /index.htm
Content-MD5	MD5 checksum of the returned resource	Content-MD5: Q2hlY2sgSW50ZWdyaXR5IQ==
Content-Range	Byte position of this part within the entire response body	Content-Range: bytes 21010-47021/47022
Content-Type	MIME type of the returned content	Content-Type: text/html; charset=utf-8
Date	Time at which the origin server sent the message	Date: Tue, 15 Nov 2010 08:12:31 GMT
ETag	Current value of the entity tag for the requested resource	ETag: "737060cd8c284d8af7ad3082f209582d"
Expires	Date and time at which the response expires	Expires: Thu, 01 Dec 2010 16:00:00 GMT
Last-Modified	Last modification time of the requested resource	Last-Modified: Tue, 15 Nov 2010 12:45:26 GMT
Location	Redirects the recipient to a location other than the requested URL to complete the request, or identifies a new resource	Location: www.zcmhi.com/archives/94…
Pragma	Carries implementation-specific directives that can apply to any recipient on the response chain	Pragma: no-cache
Proxy-Authenticate	Indicates the authentication scheme and the parameters that apply to the proxy for this URL	Proxy-Authenticate: Basic
Refresh	Used for redirection or when a new resource has been created; this example redirects after 5 seconds (proposed by Netscape and supported by most browsers)	Refresh: 5; url=www.zcmhi.com/archives/94…
Retry-After	If the entity is temporarily unavailable, tells the client to try again after the specified time	Retry-After: 120
Server	Name of the web server software	Server: Apache/1.3.27 (Unix) (Red Hat/Linux)
Set-Cookie	Sets an HTTP cookie	Set-Cookie: UserID=JohnDoe; Max-Age=3600; Version=1
Trailer	Indicates that the named header fields are present in the trailer of a chunked transfer encoding	Trailer: Max-Forwards
Transfer-Encoding	Transfer encoding applied to the message body	Transfer-Encoding: chunked
Vary	Tells downstream proxies whether to use a cached response or to request a fresh one from the origin server	Vary: *
Via	Tells the client which proxies the response passed through	Via: 1.0 fred, 1.1 nowhere.com (Apache/1.1)
Warning	Warns about possible problems with the entity	Warning: 199 Miscellaneous warning
WWW-Authenticate	Indicates the authentication scheme the client should use for the requested entity	WWW-Authenticate: Basic
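
A client that downloads a file in pieces has to interpret headers such as Content-Range itself. The following is a minimal sketch of parsing the sample value from the table; the class ContentRangeParser and its parse method are hypothetical, and only the common "bytes first-last/total" form is handled.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ContentRangeParser {

    // Matches values like "bytes 21010-47021/47022" or "bytes 0-99/*".
    private static final Pattern BYTES =
            Pattern.compile("bytes (\\d+)-(\\d+)/(\\d+|\\*)");

    /** Returns {first, last, total}; total is -1 when the server reports "*" (unknown). */
    static long[] parse(String headerValue) {
        Matcher m = BYTES.matcher(headerValue.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Unsupported Content-Range: " + headerValue);
        }
        long first = Long.parseLong(m.group(1));
        long last = Long.parseLong(m.group(2));
        long total = m.group(3).equals("*") ? -1 : Long.parseLong(m.group(3));
        return new long[] {first, last, total};
    }

    public static void main(String[] args) {
        long[] range = parse("bytes 21010-47021/47022");
        long partLength = range[1] - range[0] + 1; // both ends are inclusive
        System.out.println(partLength);            // 26012
        System.out.println(range[2]);              // 47022
    }
}
```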

8. HTTP status codes

8.1. Classification of HTTP status codes

An HTTP status code consists of three decimal digits. The first digit defines the class of the status code; the last two digits play no part in the classification.

HTTP status codes are classified into five types:

Class	Description
1xx	Informational: the server has received the request and the requester should continue with the operation
2xx	Success: the operation was received and processed successfully
3xx	Redirection: further action is required to complete the request
4xx	Client error: the request contains a syntax error or cannot be completed
5xx	Server error: the server encountered an error while processing the request
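
Since only the first digit matters for classification, a client can branch on the status code divided by 100. This is a small sketch; the class StatusClass and the labels it returns are assumptions chosen to mirror the table above.

```java
public class StatusClass {

    /** Maps a three-digit HTTP status code to the name of its class. */
    static String describe(int code) {
        switch (code / 100) { // integer division keeps only the first digit
            case 1: return "Informational";
            case 2: return "Success";
            case 3: return "Redirection";
            case 4: return "Client error";
            case 5: return "Server error";
            default:
                throw new IllegalArgumentException("Not an HTTP status code: " + code);
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(200)); // Success
        System.out.println(describe(304)); // Redirection
        System.out.println(describe(404)); // Client error
        System.out.println(describe(503)); // Server error
    }
}
```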

8.2. HTTP status code details

Status code	Name	Description
100	Continue	Continue. The client should continue with its request
101	Switching Protocols	Switching protocols. The server switches protocols at the client's request. It can only switch to a more advanced protocol, for example a newer version of HTTP
200	OK	The request succeeded. Typically used for GET and POST requests
201	Created	Created. The request succeeded and a new resource was created
202	Accepted	Accepted. The request has been accepted, but processing is not complete
203	Non-Authoritative Information	Non-authoritative information. The request succeeded, but the meta information returned comes from a copy and not from the origin server
204	No Content	No content. The server processed the request successfully but returned no content. This lets the browser keep displaying the current document without updating the page
205	Reset Content	Reset content. The server processed the request successfully, and the user agent (for example, a browser) should reset the document view. This code can be used to clear the browser's form fields
206	Partial Content	Partial content. The server successfully processed a partial GET request
300	Multiple Choices	Multiple choices. The requested resource is available at multiple locations, and a list of resource characteristics and addresses can be returned for the user agent (for example, a browser) to choose from
301	Moved Permanently	Moved permanently. The requested resource has been permanently moved to a new URI. The response includes the new URI, and the browser redirects to it automatically. All future requests should use the new URI
302	Found	Moved temporarily. Similar to 301, but the resource is moved only temporarily. The client should continue to use the original URI
303	See Other	See another address. Similar to 302. The client should retrieve the response from the other URI using GET
304	Not Modified	Not modified. The requested resource has not been modified, so the server returns no resource body with this status code. Clients typically cache the resources they have accessed and send a header indicating that they only want the resource returned if it has been modified after a specified date
305	Use Proxy	Use a proxy. The requested resource must be accessed through a proxy
306	Unused	A status code that is no longer used
307	Temporary Redirect	Temporary redirect. Similar to 302, but the client must repeat the request to the new URI with the same method
400	Bad Request	The client's request has a syntax error that the server cannot understand
401	Unauthorized	The request requires user authentication
402	Payment Required	Reserved for future use
403	Forbidden	The server understands the client's request but refuses to execute it
404	Not Found	The server could not find the resource (web page) the client requested. A web designer can use this code to serve a personalized page saying “the resource you requested could not be found”
405	Method Not Allowed	The method used in the client's request is not allowed
406	Not Acceptable	The server cannot produce a response that matches the content characteristics the client asked for
407	Proxy Authentication Required	The request requires authentication with the proxy. Similar to 401, but the requester must authorize itself with the proxy
408	Request Time-out	The server waited too long for the client to send the request and timed out
409	Conflict	The server may return this code when handling a PUT request from the client. A conflict occurred while the server was processing the request
410	Gone	The resource requested by the client no longer exists. 410 differs from 404 in that the resource used to exist and has now been permanently removed. The site designer can specify a new location for the resource with a 301 code
411	Length Required	The server cannot process the request because the client did not send a Content-Length header
412	Precondition Failed	A precondition in the client's request failed
413	Request Entity Too Large	The request was rejected because the request entity is too large for the server to process. To stop the client from continuing to send the request, the server may close the connection. If the server is only temporarily unable to process the request, it includes a Retry-After header in the response
414	Request-URI Too Large	The request URI (usually a URL) is too long for the server to process
415	Unsupported Media Type	The server cannot process the media format attached to the request
416	Requested range not satisfiable	The range requested by the client is invalid
417	Expectation Failed	The server cannot satisfy the Expect request header
500	Internal Server Error	The server had an internal error and could not complete the request
501	Not Implemented	The server does not support the requested functionality and cannot complete the request
502	Bad Gateway	A server acting as a gateway or proxy received an invalid response from the remote server
503	Service Unavailable	The server is temporarily unable to process client requests because of overload or system maintenance. The length of the delay can be given in the server's Retry-After header
504	Gateway Time-out	A server acting as a gateway or proxy did not receive a response from the remote server in time
505	HTTP Version not supported	The server does not support the HTTP version used in the request and cannot complete the processing