This article covers HTTP headers, caching mechanisms, character set types, URL encoding, and related topics concerning HTML and its meta information tags.


Contents

  • 1 Summary of key/value pairs in HTTP headers
    • 1.1 HTTP Request Header (Usually set by clients and transparent to users)
    • 1.2 HTTP Response Headers
  • 2 MIME brief description of message content types
  • 3 Brief description of compression formats
  • 4 Browser, server caching mechanism
    • 4.1 Description of cache-control values in Request Headers
    • 4.2 Description of cache-control values in Response Headers
  • 5 HTTP status message
  • 6 HTTP Methods (GET, POST, etc.)
    • 6.1 Comparison between GET and POST
    • 6.2 Other HTTP Request Methods
  • 7 GMT time string (and method of converting to custom time format)
  • 8 Uniform Resource Locator URL
    • 8.1 Whether the WWW is configured for the domain host
  • 9 Character set type
    • 9.1 ASCII Character Set (1 byte, 128 characters, entity number, entity name) (excluding Chinese)
    • 9.2 ANSI Character Set (2 bytes, 65536 characters, incompatible by country)
    • 9.3 GB2312, GBK encoding (ANSI Chinese version, 2 bytes, 65536 characters)
    • 9.4 ISO Character Set (ISO-8859-1, etc., 1 byte, 256 characters, entity number, entity name)
    • 9.5 Unicode encoding (2 or 4 bytes; UTF-8, UTF-16, 1-4 bytes) (including Chinese), encoding representation method
  • 10 Default encoding format of URL characters (including GET and POST)
    • 10.1 Encoding Format of URL Characters
    • 10.2 Default encoding of pathInfo (non-parameter part) and queryString (parameter part) in URLs in Different Browsers
    • 10.3 URL Manual Encoding Solution
    • 10.4 The Server Configures the URL decoding mode and controls the URL encoding mode of the browser
    • 10.5 Summary: Encoding and decoding process between server and browser
  • 11 HTTP Referer hotlink protection and bypassing hotlink protection
  • 12 Website host selection
    • 12.1 Issues to Be Considered when Setting up a Server by Yourself
    • 12.2 Using an Internet Service Provider (ISP)

1 Summary of key/value pairs in HTTP headers

  • HTTP (HyperText Transfer Protocol) is a request-response protocol between a client and a server. Each message consists of a header and a body.
  • HTTP headers comprise the request headers sent by the client and the response headers returned by the server.
    • Each header is a key/value pair separated by a colon. Header names are case-insensitive.
  • The http-equiv/content attribute pair in a meta information tag carries a key/value
    that the server would otherwise set in the response headers.
  • The key/value pairs found in request headers and response headers are described below.
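
As a rough illustration of the colon-separated, case-insensitive key/value format described above, the sketch below parses raw header lines into a map. The helper name parseHeaders is hypothetical, and the sketch simply keeps the last value when a header is repeated, which real HTTP libraries handle more carefully.

```javascript
// Parse raw header lines into a Map keyed by lowercase header name.
// parseHeaders is a hypothetical helper written for this illustration.
function parseHeaders(rawText) {
  const headers = new Map();
  for (const line of rawText.split(/\r?\n/)) {
    const colon = line.indexOf(':');
    if (colon === -1) continue; // skip blank or malformed lines
    const name = line.slice(0, colon).trim().toLowerCase();
    const value = line.slice(colon + 1).trim();
    headers.set(name, value); // a repeated header overwrites the earlier value
  }
  return headers;
}

const parsed = parseHeaders('Content-Type: text/html; charset=UTF-8\r\nHOST: www.baidu.com');
console.log(parsed.get('content-type')); // text/html; charset=UTF-8
console.log(parsed.get('host'));         // www.baidu.com
```

Lowercasing the name on the way in means lookups work no matter how the sender capitalized the header.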

1.1 HTTP Request Header (Usually set by clients and transparent to users)

Each entry below gives the request header name, its description, and an example.
accept

The MIME content types that the client can accept for the response body.

Corresponds to the content-type field in the response headers.

Generally set by the client and transparent to the user.

This field only expresses a preference; the server decides which content type it actually returns. The client receives the response whatever content type the server sends back. It cannot refuse a response packet merely because the content type differs, since that would not comply with the HTTP specification.

When we make a GET or POST request from the browser, the browser adds this field automatically, and the server usually does not parse its value. With Ajax requests or other means we can set this field ourselves, but we usually don't.

A possible application scenario: there are two terminals, a plain-text reader such as a Kindle (which cannot display pictures) and a mobile device (which can display pictures and play video), and both request information about "zebra" from the server.

The server then needs to decide what to return to each terminal, and it can base that decision on accept.

If the parsed value of accept is "text/plain", the client supports only text.

If it looks like the example below, the client can handle text, pictures, and video.

If the server does not check and returns a picture to the text reader, the reader may not display it correctly.

Accept field in the Baidu search request header:

accept:
text/html,
application/xhtml+xml,
application/xml;q=0.9,
image/webp,
image/apng,
*/*;q=0.8,
application/signed-exchange;v=b3

The MIME message content types are summarized below.
content-type

The MIME type of the request body sent by the client.

In a POST request this describes how the submitted data is encoded (that is, whether special characters and spaces are encoded).

Which character encoding is used for that encoding depends on the charset given in the content-type field of the response headers.

If the server does not set a charset in content-type, the browser falls back to its default encoding, as follows:

For GET requests: Chrome and Firefox encode both the path and the parameters as UTF-8. IE also encodes the path as UTF-8, but encodes the parameters in the local environment's encoding, such as GB2312 on Chinese systems.

For POST requests: Chrome, IE, and Firefox all encode the path and the parameters as UTF-8.

Example: content-type: application/x-www-form-urlencoded

How browsers encode URL characters is summarized below.
accept-charset

The character encodings that the client can accept.

Corresponds to the charset value in the response headers.

This value is typically not set unless the user asks the browser to use a specific encoding.

Even so, it is better to state the page encoding in the response headers so the browser knows how to parse the page, rather than relying on the browser's default encoding.

When the server returns a packet, it converts the characters into a sequence of bytes according to some encoding and sends the bytes to the client.

The server can use any encoding it likes, and the client still has to receive the complete response message. Today almost all clients support the common encodings.

So when the server returns data, it only needs to encode with its chosen encoding and then name that encoding in the response message. The client decodes the received packet accordingly, which avoids garbled characters.

However, if the client has already decided on a particular decoding, the server cannot be arbitrary. It needs to parse the accept-charset field and set the encoding based on that value, as follows:

1. If an HTML page is returned, set the encoding in the <meta /> tag.

2. If raw content is returned through the response output stream, specify the encoding explicitly in the content-type field of the response headers.

3. If a JSP page is returned, specify pageEncoding.

So, to make sure the content is never garbled, the server must tell the client which encoding it used.

Example: accept-charset: gbk,utf-8;q=0.8
accept-encoding

The compression formats that the client supports.

Corresponds to the content-encoding field in the response headers.

Generally the client sets it, the server compresses accordingly, and the browser decompresses. Transparent to the user.

Network transfer is bandwidth-intensive, and compressing file data reduces the data volume and therefore the transfer time. For this reason the server usually compresses data before returning it to the client (transparently to the user, usually done by the server or a proxy). Several compression methods exist, and which one is used depends on what the client can decompress, which is what the accept-encoding value in the request headers states.

Compression of files or data is done by the server or a proxy, usually without programmer intervention. When the client receives the data, decompression is usually done automatically by the browser and is transparent to the user.

For Ajax requests that we initiate ourselves, the data volume is usually small and this field is not required.

Accept-encoding field in the Baidu search request header:

accept-encoding: gzip, deflate, br

The compression formats are summarized below.
accept-language

The list of languages that the client can accept for the response body.

Corresponds to the content-language field in the response headers.

Generally set by the client and transparent to the user.

When the browser makes a request directly, it fills this field from the locale (the default language).

Generally, the server ignores this field when parsing packets.

A possible usage scenario: a file exists in several language versions, and when different requests come in, the accept-language value determines which language version is returned to the client.

(In practice this scenario is uncommon. Deciding by the accept-language field is not recommended because it is unreliable; the language version can be represented directly in the URL instead.)

Accept-language field in the Baidu search request header:

accept-language: zh-cn,zh;q=0.9
origin

Sent when the client initiates a cross-origin resource sharing request; its value is the current origin (host).

Corresponds to the access-control-allow-origin field in the response headers.

The request asks the server to add an access-control-allow-origin field to the response headers, stating which origins the server allows to make cross-origin requests.

Example: origin: http://www.itbilu.com
cookie

Sends the client's cookie information to the server.

Corresponds to the set-cookie field in the response headers.

Generally set by the client and transparent to the user.

Each key and value are joined with "=", and different key/value pairs are separated with ";".

Cookie field in the Baidu search request header (excerpt):

Cookie:
BAIDUID=2B0B46FB4D7624852C26884029FB5E4A:FG=1;
BIDUPSID=2B0B46FB4D7624852C26884029FB5E4A;
PSTM=1567362808;
BD_UPN=12314753
cache-control

Specifies whether the current request may use a cached file held by a proxy.

Corresponds to the cache-control field in the response headers.

Cache-control field in the Baidu search request header:

cache-control: max-age=0

The browser caching mechanism is summarized below.
if-modified-since

The last modification time of the resource in the client's cache.

Corresponds to the last-modified field in the response headers.

Generally set by the client (the browser) and transparent to the user.

How the browser sets the value: when sending a request, the browser automatically fills in this time from the last-modified field of the previous response (the time the file was last modified on the server).

The server then checks whether the requested resource was modified later than if-modified-since. If it was not, the cached resource is still the latest version, and the server returns the HTTP status message 304 Not Modified.

This tells the client that it can use its local cache directly, which saves bandwidth.

Browsers generally cache only static resources such as HTML, JPG, CSS, and JS. They do not cache the dynamic results of JSP pages and Ajax requests.

In addition, static resources should be served through a CDN and hosted on static servers, because server bandwidth is precious.

Example: if-modified-since: Thu, 22 Jun 2017 11:07:30 GMT

The value is a string in GMT format.

HTTP status messages, HTTP methods, and the GMT time format are summarized below.
if-none-match

(Higher priority)

The hash value of the resource in the client's cache.

Corresponds to the etag field in the response headers.

It serves the same purpose as if-modified-since.

Generally set by the client (the browser) and transparent to the user.

How the browser sets the value: when sending a request, the browser automatically fills in this value from the etag field of the previous response (the hash value of the file on the server).

The server then checks whether the hash values match and decides whether to return 304 Not Modified or the file.

etag / if-none-match takes priority over last-modified / if-modified-since.

Example: if-none-match: "9jd00cdj34pss9ejqiw39d82f20d0ikd"

referer

The page from which the current page was reached.

Generally set by the client and transparent to the user.

Often used for website access statistics. For example, if I have placed advertisement links to my home page in many places, I can use the referer to see which places send the most visitors, and therefore where advertising works well.

In addition, referer is often used for hotlink protection, with interception rules configured on the server.

"Referer" should have been spelled "referrer", but the misspelling made it into the RFC standard and has been used ever since.

Referer field in the Baidu search request header:

referer: https://www.baidu.com/
connection

Controls whether the connection between client and server is kept open.

Generally set by the client and transparent to the user.

HTTP is a stateless protocol that runs over connection-oriented TCP. It has no memory of earlier transactions, meaning the server does not know the state of the browser. For example, even if you log in and visit different pages on the same site, the server does not know who you are. To record login information, user operations, user behavior, and similar data, you must use cookies or sessions.

Since HTTP/1.1, all browsers enable connection: keep-alive by default to keep the connection open. For example, after a web page is opened, the TCP connection used to carry HTTP data between client and server is not closed. If the client requests another page from the server, the existing TCP connection is reused.

connection: keep-alive does not keep the connection open forever.

Both the client and the server can choose to close the connection at any time:

The client sets connection: close in the request headers.

The server sets how long connections are held, configured according to the server type (for example, Apache).

Connection field in the Baidu search request header:

connection: keep-alive
host

The domain name or IP address of the HTTP server that the client wants to access. A port number may be appended (if omitted, the default HTTP port 80 is used).

Generally set by the client and transparent to the user.

Host field in the Baidu search request header:

host: www.baidu.com

Uniform Resource Locators (URLs) and URL character encoding are summarized below.
user-agent

Describes the client's software environment.

Generally set by the client and transparent to the user.

The server can assess the client's environment from this field and respond differently (for example, returning different versions of a page depending on whether the request came from a mobile device or a computer).

In Chrome, the user-agent field in the Baidu search request header is:

User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/76.0.3809.132 Safari/537.36

Why browser UA strings are so confusing: early on, servers recognized only Mozilla, which was developed first, and gave preferential treatment to browsers with the better-supported Gecko engine. To be recognized and served properly, newer browsers began to imitate Mozilla's UA string.

IE masquerades as Mozilla.

KHTML masquerades as Gecko.

WebKit masquerades as KHTML.

Finally, Opera can masquerade as any of the browsers above and lets the user decide which browser it claims to be.

The end result is that everyone pretends to be someone else, and the UA string no longer says plainly who is who.
from

The email address of the user who initiated this request.

Example: from: [email protected]
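
To make the cookie format from the table above concrete (key and value joined with "=", pairs separated by ";"), here is a minimal sketch that splits a Cookie header value into an object. The function name parseCookieHeader is hypothetical, and the sketch does no decoding or validation of values.

```javascript
// Split a Cookie request header value into name/value pairs.
// parseCookieHeader is a hypothetical helper written for this illustration.
function parseCookieHeader(cookieHeader) {
  const cookies = {};
  for (const pair of cookieHeader.split(';')) {
    const eq = pair.indexOf('='); // split at the first "=" only; values may contain "="
    if (eq === -1) continue;
    const name = pair.slice(0, eq).trim();
    const value = pair.slice(eq + 1).trim();
    if (name !== '') cookies[name] = value;
  }
  return cookies;
}

const cookies = parseCookieHeader(
  'BAIDUID=2B0B46FB4D7624852C26884029FB5E4A:FG=1; PSTM=1567362808; BD_UPN=12314753'
);
console.log(cookies.BAIDUID); // 2B0B46FB4D7624852C26884029FB5E4A:FG=1
console.log(cookies.PSTM);    // 1567362808
```

Splitting at the first "=" matters here, because the BAIDUID value itself contains an "=".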

1.2 HTTP Response Headers

Response header | Description | Example
content-type | Tells the browser the MIME type of the current content | content-type: text/html; charset=UTF-8
charset | Tells the browser which character encoding to use to decode the current content (sent as a parameter of content-type) | charset=UTF-8
content-encoding | Tells the browser which compression format the current resource uses | content-encoding: gzip
content-language | Declares the language used by the content | content-language: zh-cn
access-control-allow-origin | Tells the browser which websites may share the resource across origins | access-control-allow-origin: *
set-cookie | Sets an HTTP cookie | set-cookie: ctoken=O5kWnZU24hNA4eJq; domain=.mayibank.net; expires=Wednesday, 20-Jun-2007 22:33:00 GMT; path=/; Max-Age=3600; Version=1
cache-control | Tells every caching mechanism between the server and the client whether it may cache the object and for how long, in seconds | cache-control: max-age=3600
last-modified | The last modification date of the requested resource. The server automatically adds a last-modified field to responses for static files; the browser uses it to set if-modified-since on later requests | last-modified: Sat, 26 Dec 2015 17:30:00 GMT
etag | The hash value of the requested resource. It serves the same purpose as last-modified. The server adds it to the response automatically, and the browser uses it to set if-none-match | etag: "737060cd8c284d8af7ad3082f209582d"
age | How long, in seconds, the response object has been in the proxy cache | age: 12
expires | A date/time after which this response is considered expired | expires: Thu, 01 Dec 1994 16:00:00 GMT
refresh | Used for redirection or when a new resource is created. In this example the page redirects after 5 seconds | refresh: 5; url=http://itbilu.com
location | Used for redirection or when a new resource is created | location: http://www.itbilu.com/nodejs
status | A Common Gateway Interface (CGI) response header field that describes the response status of the current HTTP connection | status: 200 OK
server | The server name | server: nginx/1.6.3
warning | A general warning that there may be an error in the entity body | warning: 199 Miscellaneous warning
allow | The valid methods for a particular resource | allow: GET, HEAD
content-length | The length of the response message body, expressed as a decimal number of bytes | content-length: 348
content-location | An alternative location for the returned data | content-location: /index.htm
proxy-authenticate | Requests authentication information when accessing the proxy | proxy-authenticate: Basic
public-key-pins | Used to prevent man-in-the-middle attacks by declaring the hash of the Transport Layer Security certificate used to authenticate the website | public-key-pins: max-age=2592000; pin-sha256="…"
vary | Tells downstream proxy servers how to match future request headers when deciding whether a cached response can be used instead of requesting new content from the origin server | vary: *
via | Tells the client which proxy servers the current response passed through | via: 1.0 fred, 1.1 itbilu.com (nginx/1.6.3)
www-authenticate | The authentication scheme that should be used when requesting this entity | www-authenticate: Basic
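
The interplay between the origin request header and the access-control-allow-origin response header can be sketched as follows. This is only an illustration under stated assumptions: corsHeadersFor and the allow list are hypothetical, and a real server would also deal with preflight requests and credentials.

```javascript
// Decide which CORS response headers to send for a given Origin request header.
// corsHeadersFor and allowedOrigins are hypothetical names used for illustration.
function corsHeadersFor(requestOrigin, allowedOrigins) {
  if (requestOrigin !== undefined && allowedOrigins.includes(requestOrigin)) {
    return {
      'access-control-allow-origin': requestOrigin, // echo the allowed origin back
      vary: 'Origin', // the response differs per origin, so caches must key on it
    };
  }
  return {}; // no CORS headers: the browser will block the cross-origin read
}

const allowedOrigins = ['http://www.itbilu.com'];
console.log(corsHeadersFor('http://www.itbilu.com', allowedOrigins));
console.log(corsHeadersFor('http://other.example', allowedOrigins));
```

Echoing back a specific origin from an allow list is more restrictive than the wildcard value * shown in the table.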

2 MIME brief description of message content types

See MIME types for details.

Type/subtype | File extension or description
text/plain | txt
text/html | html, htm
text/css | css
text/javascript | RFC 4329 declared all text/* JavaScript types obsolete (the later RFC 9239 made text/javascript the recommended type again).
application/javascript | js
application/ecmascript | es
image/webp |
image/apng |
application/xml;q=0.9 | The parameter q is the weight, which sets the priority among content types. It is a real number between 0 and 1; the default is 1, the minimum is 0.001, and the maximum is 1. (A value of 0 means this content type is not accepted.)
application/xhtml+xml |
application/x-www-form-urlencoded | The default encoding when the browser submits a request (special characters are encoded and spaces are replaced by + signs).
multipart/form-data | The second encoding for browser submissions, which does not encode special characters or spaces. A boundary string is used as the separator instead of &; the boundary value looks like ----Web…AJv3. This form is usually used for binary data; for example, file uploads must use multipart/form-data.
application/json | The third encoding for browser submissions.
*/*;q=0.8 | All types
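
To show how the q weight orders content types, the following sketch parses an accept value and sorts the entries by weight. The function name parseAccept is hypothetical, and the sketch ignores finer points of real content negotiation such as preferring more specific types.

```javascript
// Parse an Accept header value into { type, q } entries, highest weight first.
// parseAccept is a hypothetical helper written for this illustration.
function parseAccept(headerValue) {
  return headerValue
    .split(',')
    .map((item) => item.trim())
    .filter((item) => item !== '')
    .map((item) => {
      const [type, ...params] = item.split(';').map((s) => s.trim());
      let q = 1; // the default weight is 1
      for (const param of params) {
        const [key, val] = param.split('=').map((s) => s.trim());
        if (key.toLowerCase() === 'q') q = Number(val);
      }
      return { type, q };
    })
    .sort((a, b) => b.q - a.q); // stable sort keeps header order for equal weights
}

const accepted = parseAccept(
  'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8'
);
console.log(accepted.map((entry) => entry.type + ' ' + entry.q));
```

Entries without an explicit q keep the default weight of 1 and therefore sort ahead of application/xml (0.9) and */* (0.8).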

3 Brief description of compression formats

Compression format | Description
deflate | A patent-free lossless compression algorithm with many open source implementations. Deflate compresses faster and uses less CPU. As a web compression format, deflate is outdated and not well supported by browsers.
gzip | The Apache 1.x series has no built-in web compression, so it uses the third-party mod_gzip module. Apache 2.x has mod_deflate built in to replace mod_gzip. Both use the gzip compression algorithm and work similarly. Gzip has a slightly higher compression ratio and CPU usage.

4 Browser, server caching mechanism

The cache-control field in request headers and response headers takes a fixed set of values; what differs is the party each value addresses.

Cache validation between server and browser uses last-modified/if-modified-since or etag/if-none-match, and the latter pair takes precedence.
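
The validation rule above can be sketched as a small decision function on the server side. This is a simplified sketch, not how any particular server implements it: the function name, the lowercase-keyed requestHeaders object, and the resource shape ({ etag, lastModified }) are all assumptions, and real servers also handle lists of etags and weak validators.

```javascript
// Decide between 304 (use the cache) and 200 (send the file) for a conditional request.
// evaluateConditionalRequest and both argument shapes are hypothetical.
function evaluateConditionalRequest(requestHeaders, resource) {
  const ifNoneMatch = requestHeaders['if-none-match'];
  if (ifNoneMatch !== undefined) {
    // etag / if-none-match takes precedence when it is present
    return ifNoneMatch === resource.etag ? 304 : 200;
  }
  const ifModifiedSince = requestHeaders['if-modified-since'];
  if (ifModifiedSince !== undefined) {
    const since = Date.parse(ifModifiedSince);
    const modified = Date.parse(resource.lastModified);
    // Not modified after the client's copy: the cached copy is still valid
    if (!Number.isNaN(since) && modified <= since) return 304;
  }
  return 200;
}

const resource = { etag: '"abc"', lastModified: 'Sat, 26 Dec 2015 17:30:00 GMT' };
console.log(evaluateConditionalRequest({ 'if-none-match': '"abc"' }, resource)); // 304
console.log(evaluateConditionalRequest({ 'if-modified-since': 'Fri, 25 Dec 2015 17:30:00 GMT' }, resource)); // 200
```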

4.1 Description of cache-Control values in Request Headers

When a client sends a request to the server, it may pass through many layers of proxies, which may cache the desired file for the request. Cache-control in the request header controls whether to use the cached file in the proxy.

Value | Description
no-store | Do not use a cached file from the proxy; request the server directly.
no-cache | The browser can cache the response file, but before using the cache it must confirm with the server, via a token (etag), that the cache is still valid.
max-age=xxx | The proxy is free to use the cached content for the next xxx seconds without the browser having to send the same request again. When the time expires, the cache becomes invalid.

4.2 Description of cache-Control values in Response Headers

Value | Description
no-store | Do not cache the response content (even if etag and last-modified fields are present in the response headers).
no-cache | Before a proxy returns a cached file to the browser (or before the browser uses its own cache), it must check with the server that the cache is up to date.
max-age=xxx | The proxy or browser is free to use the cached content for the next xxx seconds without the browser having to send the same request again. This option exists only in HTTP 1.1 and has the higher priority when used together with last-modified. When the time expires, the cache becomes invalid.
must-revalidate/proxy-revalidate | Once the cached content expires, the request must be sent to the server/proxy for revalidation.
public | All content may be cached (by both clients and proxies).
private | Content may be cached only in private caches (that is, only clients can cache it, not proxy servers).
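
A cache-control value is a comma-separated list of directives, some of which carry a value such as max-age=3600. The sketch below turns such a value into an object; parseCacheControl is a hypothetical helper, and quoted directive values are not handled.

```javascript
// Parse a Cache-Control header value into an object of directives.
// parseCacheControl is a hypothetical helper written for this illustration.
function parseCacheControl(value) {
  const directives = {};
  for (const part of value.split(',')) {
    const trimmed = part.trim();
    if (trimmed === '') continue;
    const eq = trimmed.indexOf('=');
    if (eq === -1) {
      directives[trimmed.toLowerCase()] = true; // flag directive such as no-store
    } else {
      const name = trimmed.slice(0, eq).trim().toLowerCase();
      const raw = trimmed.slice(eq + 1).trim();
      // Numeric values such as max-age are stored as numbers (seconds)
      directives[name] = /^\d+$/.test(raw) ? Number(raw) : raw;
    }
  }
  return directives;
}

const cacheControl = parseCacheControl('public, max-age=3600, must-revalidate');
console.log(cacheControl['max-age']); // 3600
console.log(cacheControl.public);     // true
```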

5 HTTP status message

An error may occur when a browser requests a service from a Web server. The following is a summary of the status code.

  • 1xx: Information
Message | Description
100 Continue | The server has received only part of the request; as long as the server has not rejected it, the client should continue sending the rest.
101 Switching Protocols | The server complies with the client's request to switch to another protocol.
  • 2xx: Success
Message | Description
200 OK | The request succeeded (the reply document for GET and POST requests follows).
201 Created | The request succeeded and a new resource was created.
202 Accepted | The request was accepted for processing, but processing has not completed.
203 Non-authoritative Information | The document was returned normally, but some of the response headers may be incorrect because a copy of the document is being used.
204 No Content | No new document. The browser should continue to display the original document. This is useful when the user refreshes a page periodically and the Servlet can determine that the user's document is already up to date.
205 Reset Content | No new document, but the browser should reset what it displays. Used to force the browser to clear form input.
206 Partial Content | The client sent a GET request with a Range header, and the server fulfilled it.
  • 3xx: Redirect
Message | Description
300 Multiple Choices | A list of links. The user can select a link to reach the destination. A maximum of five addresses is allowed.
301 Moved Permanently | The requested page has moved to a new URL.
302 Found | The requested page has temporarily moved to a new URL.
303 See Other | The requested page can be found at a different URL.
304 Not Modified | The document has not been modified. The client has a cached document and makes a conditional request (typically with an if-modified-since header, meaning it only wants the document if it is newer than the given date). The server tells the client that the cache has not expired and the cached document can still be used.
305 Use Proxy | The requested document should be retrieved through the proxy server given in the Location header.
306 Unused | This code was used in a previous version. It is no longer in use, but the code remains reserved.
307 Temporary Redirect | The requested page has temporarily moved to a new URL.
  • 4xx: Client error
Message | Description
400 Bad Request | The server could not understand the request.
401 Unauthorized | The requested page requires a username and password.
402 Payment Required | This code is not yet in use.
403 Forbidden | Access to the requested page is forbidden.
404 Not Found | The server could not find the requested page.
405 Method Not Allowed | The method specified in the request is not allowed.
406 Not Acceptable | The server can only generate a response that the client does not accept.
407 Proxy Authentication Required | The user must authenticate with a proxy server before the request can be processed.
408 Request Timeout | The request took longer than the server was prepared to wait.
409 Conflict | The request could not be completed because of a conflict.
410 Gone | The requested page is no longer available.
411 Length Required | The request did not define content-length; without it, the server will not accept the request.
412 Precondition Failed | The server evaluated the preconditions in the request as failed.
413 Request Entity Too Large | The server will not accept the request because the request entity is too large.
414 Request-URI Too Long | The server will not accept the request because the URL is too long. This happens when a POST request is converted into a GET request with long query information.
415 Unsupported Media Type | The server will not accept the request because the media type is not supported.
416 Requested Range Not Satisfiable | The server could not satisfy the Range header specified by the client in the request.
417 Expectation Failed | The server could not meet the expectation given in the Expect request header.
  • 5xx: Server error
Message | Description
500 Internal Server Error | The request was not completed. The server encountered an unexpected condition.
501 Not Implemented | The request was not completed. The server does not support the requested functionality.
502 Bad Gateway | The request was not completed. The server received an invalid response from the upstream server.
503 Service Unavailable | The request was not completed. The server is temporarily overloaded or down.
504 Gateway Timeout | The gateway timed out.
505 HTTP Version Not Supported | The server does not support the HTTP protocol version specified in the request.
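
Because the first digit of a status code determines its class, client code often only needs that digit to decide how to react. A minimal sketch, with the hypothetical helper name statusClass:

```javascript
// Map an HTTP status code to the class names used in the tables above.
// statusClass is a hypothetical helper written for this illustration.
function statusClass(code) {
  const classes = {
    1: 'information',
    2: 'success',
    3: 'redirect',
    4: 'client error',
    5: 'server error',
  };
  return classes[Math.floor(code / 100)] || 'unknown';
}

console.log(statusClass(200)); // success
console.log(statusClass(304)); // redirect
console.log(statusClass(404)); // client error
console.log(statusClass(503)); // server error
```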

6 HTTP Methods (GET, POST, etc.)

The two most common HTTP methods are GET and POST.

  • GET: Requests data from the specified resource.
  • POST: Submits data to be processed to a specified resource.

6.1 Comparison between GET and POST

Aspect | GET | POST
Data submission method | Sent in the request URL | Sent in the HTTP message body
History | Recorded | Not recorded
Bookmarks | Can be bookmarked | Cannot be bookmarked
Cache | Can be cached | Cannot be cached
Data length limit | The maximum URL length is 2048 characters | Unlimited
Back button/refresh | Uses the cache | The data is resubmitted (browsers should warn users that the data will be resubmitted)
Encoding | application/x-www-form-urlencoded | application/x-www-form-urlencoded or multipart/form-data
Restrictions on data types | Only ASCII characters are allowed; non-ASCII characters must be URL-encoded | No restrictions; binary data is also allowed
Visibility | The data is visible to everyone in the URL | The data is not displayed in the URL
Security | GET is less secure than POST because the data sent is part of the URL. Never use GET to send passwords or other sensitive information. | POST is more secure than GET because the parameters are not saved in browser history or web server logs.
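
Both GET and POST can carry application/x-www-form-urlencoded data; the difference is whether the encoded string goes into the URL or into the message body. The sketch below builds such a string. The helper names and the /search path are hypothetical, and the sketch is an approximation: encodeURIComponent leaves a few characters (such as ! ' ( ) *) unencoded that a browser's form encoder would escape.

```javascript
// Encode one key or value: percent-encode as UTF-8, then use "+" for spaces.
function encodeFormComponent(text) {
  return encodeURIComponent(text).replace(/%20/g, '+');
}

// Build an application/x-www-form-urlencoded string from a plain object.
// formUrlEncode is a hypothetical helper written for this illustration.
function formUrlEncode(fields) {
  return Object.entries(fields)
    .map(([key, value]) => encodeFormComponent(key) + '=' + encodeFormComponent(String(value)))
    .join('&');
}

const encoded = formUrlEncode({ q: 'zebra photo', lang: '中文' });
console.log(encoded); // q=zebra+photo&lang=%E4%B8%AD%E6%96%87

// GET: the data travels in the URL (hypothetical path)
const getUrl = '/search?' + encoded;
// POST: the same string travels in the body, with
// content-type: application/x-www-form-urlencoded
const postBody = encoded;
```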

6.2 Other HTTP Request Methods

Method | Description
HEAD | Same as GET, but only the HTTP headers are returned, not the body of the document.
PUT | Uploads a representation of the specified URI.
DELETE | Deletes the specified resource.
OPTIONS | Returns the HTTP methods supported by the server.
CONNECT | Converts the request connection into a transparent TCP/IP tunnel.

7 GMT time string (and method of converting to custom time format)

GMT Time format: Wed, 20 Jun 2007 22:33:00 GMT

Note:

  • When a Date object is converted to a string directly, the result carries a GMT offset, for example Thu Jun 22 2017 19:07:30 GMT+0800.
  • To customize the string format, assemble the desired format manually from the fields of the Date object.
  • The new Date() constructor accepts a date string (including the GMT format) and builds a Date object for that date.
GMT time format conversion example code:

// Build a Date object from a GMT-format string, then assemble a custom output format
function GMTToStr(gmtStr) {
    const date = new Date(gmtStr);
    // getMonth() is zero-based, so add 1; all fields are read in the local time zone
    return date.getFullYear() + '-' +
        (date.getMonth() + 1) + '-' +
        date.getDate() + ' ' +
        date.getHours() + ':' +
        date.getMinutes() + ':' +
        date.getSeconds();
}

// Build a Date object from a custom time string; converting it to a string gives the GMT-offset format.
// Note: '2017-6-22 19:7:30' is not an ISO 8601 string, so parsing it depends on the JavaScript engine.
function StrToGMT(timeStr) {
    return new Date(timeStr);
}

// Test

// GMT string to custom format
console.log(GMTToStr('Thu Jun 22 2017 19:07:30 GMT+0800'));
// Output (when the local time zone is UTC+8): 2017-6-22 19:7:30

// Custom time string to GMT string
console.log(StrToGMT('2017-6-22 19:7:30').toString());
// Output (when the local time zone is UTC+8) begins with: Thu Jun 22 2017 19:07:30 GMT+0800
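
HTTP headers such as last-modified and expires use the GMT form shown at the start of this section (Wed, 20 Jun 2007 22:33:00 GMT), which does not depend on the local time zone. As a small complement to the code above, this sketch converts between a Date object and that form using the standard toUTCString() and Date.parse(); the two wrapper function names are hypothetical.

```javascript
// Format a Date as an HTTP-style GMT string, independent of the local time zone.
function toHttpDate(date) {
  return date.toUTCString();
}

// Parse an HTTP-style GMT string; returns null when the string is not a valid date.
function parseHttpDate(text) {
  const ms = Date.parse(text);
  return Number.isNaN(ms) ? null : new Date(ms);
}

const moment = new Date(Date.UTC(2007, 5, 20, 22, 33, 0)); // months are zero-based: 5 = June
console.log(toHttpDate(moment)); // Wed, 20 Jun 2007 22:33:00 GMT

const roundTrip = parseHttpDate('Wed, 20 Jun 2007 22:33:00 GMT');
console.log(roundTrip.getTime() === moment.getTime()); // true
```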

8 Uniform Resource Locator URL

URL: Uniform Resource Locator, also called a web address. It consists of protocol + domain name + port number + path. (An Internet Protocol (IP) address, for example 192.168.1.253, can be used instead of a domain name.) When surfing the Web, most people type the domain name of a web address because names are easier to remember than IP numbers.

Example: http://www.w3school.com.cn/html/index.asp

Grammar rules: scheme: / / host. Domain: port/path/filename

  • scheme: Defines the type of Internet protocol. The most common types are HTTP (Hypertext Transfer Protocol), HTTPS (Secure Hypertext Transfer Protocol), FTP (File Transfer Protocol), file (local resource protocol), etc.
  • host: Defines the domain host. The default host for HTTP is www, which is used to mark the primary domain name.
  • domain: Defines the Internet domain name, such as w3school.com.cn (including the top-level domain, second-level domain, etc.).
  • :port: Defines the port number on the host. (The default HTTP port number is 80; the default port of a given server depends on the server type.)
  • path: Defines the path on the server. (If omitted, the document must be located in the root directory of the website.)
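To make the syntax rule concrete, here is a small sketch that splits a URL into the parts listed above using the standard URL class. The function name splitUrl and the port 8080 are assumptions for illustration, and treating the first label of the host name as the "host" is a simplification that only holds for names like www.example.com.

```javascript
// Split a URL into the parts named in the syntax rule.
// Simplification: the first dot-separated label is treated as the host,
// and the rest as the domain.
function splitUrl(urlString) {
    const url = new URL(urlString);
    const labels = url.hostname.split('.');
    return {
        scheme: url.protocol.replace(':', ''),
        host: labels[0],
        domain: labels.slice(1).join('.'),
        port: url.port,          // empty string when the scheme's default port is used
        path: url.pathname
    };
}

// The port 8080 is a made-up value so that the port field is visible.
const parts = splitUrl('http://www.w3school.com.cn:8080/html/index.asp');
console.log(parts.scheme);  // http
console.log(parts.host);    // www
console.log(parts.domain);  // w3school.com.cn
console.log(parts.port);    // 8080
console.log(parts.path);    // /html/index.asp
```

When the port is omitted, or equals the default for the scheme (80 for HTTP), the URL class reports an empty port string.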

8.1 Whether the WWW is configured for the domain host

  • If you have one top-level domain name with second-level domain names under it, it is best to put the domain host www in front of the domain name to make the primary site explicit. (This mainly applies to small companies with few websites that share one top-level domain name.)
  • If you use multiple top-level domain names to manage different sites, the domain host www is generally not configured, which makes the address easier for users. (Large companies such as Huawei use different top-level domains to manage different websites, which is convenient to manage.)

How to configure this is set on the server; details to be added.

9 Character set type

Overview:

  • American standard (ANSI): ASCII code -> ANSI code (supports multiple languages, for example GB2312 for Chinese; different ANSI encodings are incompatible with each other).
  • International standard (ISO): ISO codes (support multiple languages, but each value is limited to 1 byte and cannot contain Chinese characters; the character set varies by locale, for example ISO-8859-1 is used in North America, Western Europe, Latin America, the Caribbean, Canada, Africa and so on; different ISO codes are incompatible with each other).
  • Unicode Consortium: Unicode (to solve the compatibility problems described above, every symbol in the world is given a unique code, but each character is represented by two or four bytes, which wastes resources) -> UTF-8 encoding, etc. (variable-length, to solve that waste of resources: it uses 1 to 4 bytes to represent a symbol, and the number of bytes depends on the symbol).

Variable-length encoding for UTF-8:

  1. Characters in the ASCII range are represented by 1 byte. UTF-8 keeps the one-byte ASCII encoding as part of its own code, so ASCII characters are always one byte in UTF-8. Think of UTF-8 as an extension of ASCII.
  2. Other characters, such as Chinese characters, are represented by multiple bytes.
  3. It is worth noting that a Chinese character takes 2 bytes in the original Unicode encoding but 3 bytes in UTF-8. The reason is that the Unicode encoding only assigns code values, while UTF-8 also has to consider storage (such as embedding the 1-byte ASCII characters).
  4. Unicode to UTF-8 is not a direct correspondence; the conversion follows an algorithm and rules.
  5. In computer memory, Unicode encoding is used uniformly. When data is saved to disk or transferred, it is converted to UTF-8.
  6. For example, when editing with Notepad, the UTF-8 characters read from the file are converted to Unicode characters in memory. After editing, Unicode is converted back to UTF-8 and saved to the file.
  7. The default browser encoding is ISO-8859-1.
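The variable-length behaviour described above can be checked directly. The following is a minimal sketch using the standard TextEncoder (which always produces UTF-8); the helper names utf8Length and utf8Hex are made up for this example.

```javascript
// Number of bytes a string occupies when encoded as UTF-8.
function utf8Length(text) {
    return new TextEncoder().encode(text).length;
}

// The UTF-8 bytes of a string as lowercase hex, separated by spaces.
function utf8Hex(text) {
    return Array.from(new TextEncoder().encode(text),
        b => b.toString(16).padStart(2, '0')).join(' ');
}

console.log(utf8Length('A'));   // 1  (ASCII)
console.log(utf8Length('é'));   // 2  (Latin letter with accent)
console.log(utf8Length('中'));  // 3  (Chinese character)
console.log(utf8Length('😀'));  // 4  (character outside the basic plane)
console.log(utf8Hex('中'));     // e4 b8 ad
```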

History:

  1. The early World Wide Web used the ASCII character set. (A character set includes both letters and symbols, collectively called characters.)
  2. Since many countries use characters that do not belong to ASCII, the default character set of modern browsers is an ISO international standard, such as ISO-8859-1.
  3. Therefore, if a web page uses a character set other than ISO-8859-1, it should be specified in the <meta> tag to tell the browser how to decode it.

9.1 ASCII Character Set (1 byte, 128 characters, entity number, entity Name) (excluding Chinese)

  1. HTML and XHTML use standard 7-bit ASCII code to transfer data over the network. It supports the digits 0 to 9, uppercase and lowercase letters, and some special characters.
  2. 7-bit ASCII code provides 128 different character values: 2^7 = 128.
  3. ASCII code uses only the low seven bits, and the high bit (the sign bit) is always 0. The reason: a full byte could hold 256 values, but that is still not enough to represent Chinese and Japanese characters, so the high bit is kept as a reserved flag bit.
  4. The ASCII extended character set, that is, the ANSI character set, can represent the characters of other languages, such as Chinese, by setting the reserved high bit to 1 and using two bytes. This is covered in the next section.

ASCII characters can be represented by entity numbers if they are inconvenient to type directly, for example (part) :

ASCII character    Entity number
Space              &#32;
!                  &#33;
"                  &#34;
#                  &#35;
$                  &#36;

For other symbols, refer to the ASCII reference manual.

Note: Distinguish between entity numbers and names: all ASCII characters have entity numbers, but only some have entity names. Common ones are as follows:

ASCII character    Entity number    Entity name
"                  &#34;            &quot;
&                  &#38;            &amp;
'                  &#39;            &apos;
<                  &#60;            &lt;
>                  &#62;            &gt;
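As an illustration of how the two tables above are used in practice, here is a small sketch that replaces the five characters with their entity names and builds an entity number for any single character. The names ENTITY_NAMES, escapeHtml and toEntityNumber are made up for this example; it is not a complete HTML sanitizer.

```javascript
// The five ASCII characters from the table above and their entity names.
const ENTITY_NAMES = {
    '"': '&quot;',
    '&': '&amp;',
    "'": '&apos;',
    '<': '&lt;',
    '>': '&gt;'
};

// Replace each of the five characters with its entity name.
function escapeHtml(text) {
    return text.replace(/["&'<>]/g, ch => ENTITY_NAMES[ch]);
}

// Build the entity number of a single character: &# + decimal code + ;
function toEntityNumber(ch) {
    return '&#' + ch.charCodeAt(0) + ';';
}

console.log(escapeHtml('<a href="x">Tom & Jerry</a>'));
// &lt;a href=&quot;x&quot;&gt;Tom &amp; Jerry&lt;/a&gt;
console.log(toEntityNumber('!'));  // &#33;
console.log(toEntityNumber('<'));  // &#60;
```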

9.2 ANSI Character Set (2 bytes, 65536 characters, incompatible by country)

  1. The ANSI character set is an extension of ASCII.
  2. ANSI encoding uses 1 byte in the range 0x00~0x7f (0 to 127 in decimal) to represent one English character, and uses the range 0x80 to 0xFFFF to represent characters of other languages.
  3. Different countries have different ANSI character set standards. For example, China developed GB2312 to encode Chinese characters, Japan encodes Japanese with Shift_JIS, and Korea encodes Hangul with EUC-KR. ANSI codes for different languages cannot be converted into each other, which results in garbled text when languages are mixed.
  4. Similar to the ANSI character set, the international standard ISO character set also supports multiple national languages, but it does not support Chinese characters and the like. It is covered in a later section.
  5. Unicode was created to solve the problem of ANSI encoding conflicts between different countries. It is covered in a later section.

9.3 GB2312, GBK encoding (ANSI Chinese version, 2 bytes, 65536 characters)

  • GB2312: the Chinese national standard Simplified Chinese character set. GB 2312 cannot handle rare characters such as those found in personal names and classical Chinese, which later led to GBK.
  • GBK: the Chinese character code extension specification. It uses 2 bytes per Chinese character, so it takes less storage for Chinese text than UTF-8 (3 bytes).

GBK is backward compatible with GB 2312 and supports the ISO 10646 international standard going forward, so it bridges the transition from the former to the latter.

9.4 ISO Character Set (ISO-8859-1, etc., 1 byte, 256 characters, entity number, entity name)

  1. The ISO character sets are standard character sets defined by the International Organization for Standardization (ISO) for different alphabets/languages.
  2. Different regions use different ISO character sets, and they are incompatible with each other. ISO-8859-1 is used in North America, Western Europe, Latin America, the Caribbean, Canada and Africa.
  3. However, because it is a single-byte encoding, which matches the most basic unit a computer works with, many protocols use ISO-8859-1 by default.
  4. The default encoding of browser pages is ISO-8859-1.
  5. The default encoding of URLs in most browsers is UTF-8.
  6. The default character encoding of HTML5 is UTF-8.
  • The lower part of ISO-8859-1 (codes 1 to 127) is the original 7-bit ASCII. Some of these characters have entity names; see the ASCII character entity table above.
  • The higher part of ISO-8859-1 (codes 160 to 255) all have entity names; see the following table (partial):

ISO-8859-1 higher character    Entity number    Entity name
Non-breaking space             &#160;           &nbsp;
¥ (yen / RMB)                  &#165;           &yen;
© (copyright)                  &#169;           &copy;
® (registered trademark)       &#174;           &reg;
× (multiply)                   &#215;           &times;
÷ (divide)                     &#247;           &divide;
Unicode characters:
™ (trademark)                  &#8482;          &trade;

For more information about ISO-8859-1, see the ISO-8859-1 reference manual.

9.5 Unicode encoding (2 or 4 bytes. Utf-8, UTF-16, 1-4 bytes) (including Chinese), encoding representation method

  1. Because ANSI codes and ISO codes each exist in multiple, mutually incompatible versions, the Unicode Consortium developed the Unicode standard, which aims to replace all existing character sets with standard Unicode Transformation Formats. In the original Unicode encoding, every character is represented by 2 or 4 bytes.
  2. Unicode covers all characters in the world and is cross-platform.
  3. Unicode has been implemented in XML, Java, ECMAScript (JavaScript), LDAP, CORBA 3.0 and WML. It is also supported by many operating systems and by all modern browsers.
  4. The Unicode Consortium works with leading standards development organizations such as ISO, W3C and ECMA. Unicode can be compatible with different character sets. The most common encoding methods are UTF-8 and UTF-16.
  5. The first 256 Unicode characters correspond to the 256 ISO-8859-1 characters.
  6. Unicode to UTF-8 is not a direct correspondence; the conversion follows an algorithm and rules.
  7. The implementation of Unicode is different from its encoding. The Unicode code of a character is fixed. In actual transmission, however, platforms are designed differently and space needs to be saved, so Unicode is implemented in different ways. An implementation of Unicode is called a Unicode Transformation Format (UTF for short).
  8. If a Unicode file contains only basic 7-bit ASCII characters and every character is transferred in the original 2-byte Unicode encoding, the first byte of each character is always 0, which is quite wasteful. In this case UTF-8 can be used. It is a variable-length encoding that still represents the basic 7-bit ASCII characters with 7 bits in a single byte (the high bit is padded with 0). When ASCII is mixed with other Unicode characters, those are converted by an algorithm, with each character in the basic plane encoded in 1-3 bytes and the length identified by the leading 0 or 1 bits of the first byte. This greatly reduces the encoded length of Western documents dominated by 7-bit ASCII characters (see UTF-8 for details). Similarly, the 2-byte UTF-16 encoding needs an algorithm to convert the 4-byte supplementary plane characters and other UCS-4 extension characters that may appear in the future.

Representation of Unicode encodings: (where another encoding is compatible, the Unicode code value of a character is the same as its value in that encoding)

Environment    Unicode encoding representation
HTML           &# + decimal Unicode code (or other code value) + ;
JS             \u + four-digit hexadecimal Unicode code
               \x + up to 2 hexadecimal digits of another code value
               \0 + up to 2 octal digits of another code value
               and other JS conversion methods
CSS            \ + four-digit hexadecimal Unicode code

To get Unicode encoding:

// Get the Unicode code in base 10, then parse it back
'安'.charCodeAt(); // Returns 23433, the Unicode code of the Chinese character; note that it is base 10
String.fromCharCode(23433); // Returns '安'

// Convert to hexadecimal and parse it back
var unicode = '\\u'+'茗'.charCodeAt().toString(16); // String: "\u8317"
JSON.parse('"'+unicode+'"'); // Returns the Chinese character: "茗"
eval('"'+unicode+'"'); // eval can also be used for parsing
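The snippet above only works when the hexadecimal code already has four digits. A slightly more general sketch, assuming the \uXXXX form is wanted for every UTF-16 code unit of a string, pads each code to four digits. The function names toUnicodeEscapes and fromUnicodeEscapes are made up for this example.

```javascript
// Write every UTF-16 code unit of a string as \uXXXX (four hex digits).
function toUnicodeEscapes(text) {
    let out = '';
    for (let i = 0; i < text.length; i++) {
        out += '\\u' + text.charCodeAt(i).toString(16).padStart(4, '0');
    }
    return out;
}

// Turn \uXXXX sequences back into characters.
function fromUnicodeEscapes(escaped) {
    return escaped.replace(/\\u([0-9a-fA-F]{4})/g,
        (match, hex) => String.fromCharCode(parseInt(hex, 16)));
}

console.log(toUnicodeEscapes('安'));           // \u5b89
console.log(toUnicodeEscapes('A'));            // \u0041
console.log(toUnicodeEscapes('😀'));           // \ud83d\ude00 (a surrogate pair)
console.log(fromUnicodeEscapes('\\u5b89'));    // 安
console.log('&#' + '安'.codePointAt(0) + ';'); // &#23433; (the HTML form)
```

A character outside the basic plane produces two escapes, because JavaScript strings store it as two UTF-16 code units.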
  • UTF-8
  1. Characters in UTF-8 are variable-length, 1-4 bytes. Chinese characters take up three bytes.
  2. UTF-8 can represent any character in the Unicode standard.
  3. UTF-8 is the preferred encoding for web pages and e-mail.
  4. UTF-8 encodes English and similar text with a single byte per character, so it takes less storage than the original Unicode encoding.
  5. For Chinese text, GBK takes less storage than UTF-8.
  • UTF-16
  1. UTF-16 is similar to UTF-8, but its encoding is exactly the same as the Unicode encoding. It is mainly used in operating system environments such as Microsoft Windows 2000/XP/2003/Vista/CE, and in the Java and .NET bytecode environments.
  2. Because byte order is interpreted differently by different machine environments, the same byte stream may be interpreted differently. Therefore UTF-16 implementations use the concepts of big-endian and little-endian together with the Byte Order Mark (BOM). (For details, see UTF-16.)

In the Notepad that comes with Microsoft Windows, there are four encoding options in the Save As dialog. Besides the non-Unicode ANSI encoding, the other three (Unicode, Unicode big endian and UTF-8) are Unicode encodings: the first two are UTF-16 in little-endian and big-endian byte order, and the third is UTF-8.
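The byte order issue can be seen by writing the same UTF-16 code units in both orders. This is a minimal sketch using the standard DataView; the function name utf16Hex is made up for this example.

```javascript
// Encode a string as UTF-16 bytes in the requested byte order
// and return the bytes as lowercase hex, separated by spaces.
function utf16Hex(text, littleEndian) {
    const buffer = new ArrayBuffer(text.length * 2);
    const view = new DataView(buffer);
    for (let i = 0; i < text.length; i++) {
        view.setUint16(i * 2, text.charCodeAt(i), littleEndian);
    }
    return Array.from(new Uint8Array(buffer),
        b => b.toString(16).padStart(2, '0')).join(' ');
}

// '中' is U+4E2D; the two bytes swap places between the byte orders.
console.log(utf16Hex('中', false));      // 4e 2d  (big-endian)
console.log(utf16Hex('中', true));       // 2d 4e  (little-endian)

// The BOM is the character U+FEFF placed at the start of the stream;
// the order of its two bytes tells the reader which byte order follows.
console.log(utf16Hex('\uFEFF', false));  // fe ff
console.log(utf16Hex('\uFEFF', true));   // ff fe
```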

10 Default encoding format of URL characters (including GET and POST)

  • The content-type field in the request header determines whether special characters in the URL are encoded.

The following is the encoding format used when encoding is required:

  • Analysis: URLs can only be sent over the Internet using part of the ASCII character set, which means that most ASCII characters can be passed in a URL without encoding. The ASCII characters that can appear directly in a URL include:

0-9, a-z, A-Z, [, ], (, ), -, _, ., +, *, ', $, ! and so on.

  • However, all characters other than these must be encoded before they can be passed in a URL.

10.1 ENCODING Format of URL Characters

Almost all browsers use UTF-8 encoding for URLs and encode each byte individually. URLs cannot contain spaces; a space is usually replaced with + (or with %20).
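Both space replacements exist in standard JavaScript APIs, which is a common source of confusion. As a small illustration (the parameter name q is made up), encodeURIComponent() writes a space as %20, while URLSearchParams uses the form-style + convention:

```javascript
// encodeURIComponent writes a space as %20.
const viaComponent = encodeURIComponent('a b');
console.log(viaComponent);  // a%20b

// URLSearchParams follows the form-encoding convention and writes a space as +.
const viaForm = new URLSearchParams({ q: 'a b' }).toString();
console.log(viaForm);       // q=a+b

// Decoding has the same split: decodeURIComponent leaves + alone,
// while URLSearchParams turns + back into a space.
console.log(decodeURIComponent('a+b'));               // a+b
console.log(new URLSearchParams('q=a+b').get('q'));   // a b
```

The practical consequence is that the decoder has to match the encoder: a + produced by form encoding is only turned back into a space by a form-style decoder.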

URL encoding format    Expression                                    Notes
UTF-8                  % + two-digit hexadecimal UTF-8 code          1. Each unit represents 1 byte.
                                                                     2. The default URL encoding mode of the browser.
Unicode                %u + four-digit hexadecimal Unicode code      1. Each unit represents 1 character.
                                                                     2. This is the encoding used by the JS escape() method, which is not recommended.

Common URL character encoding table:

Character          ASCII                                     UTF-8 encoding of the URL    Unicode encoding of the URL
Carriage return    13                                        %0D                          %u000D
Line feed          10                                        %0A                          %u000A
中文               Chinese characters have no ASCII code     %e4%b8%ad%e6%96%87           %u4e2d%u6587
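The UTF-8 column of the table can be reproduced by hand: encode the text as UTF-8 and write each byte as % plus two hexadecimal digits. The following is a minimal sketch; the function name utf8PercentEncode is made up, and unlike the built-in functions it encodes every byte, including plain ASCII letters.

```javascript
// Percent-encode every UTF-8 byte of a string as % + two hex digits.
function utf8PercentEncode(text) {
    const bytes = new TextEncoder().encode(text);
    return Array.from(bytes,
        b => '%' + b.toString(16).toUpperCase().padStart(2, '0')).join('');
}

console.log(utf8PercentEncode('中文'));   // %E4%B8%AD%E6%96%87
console.log(utf8PercentEncode('\r\n'));   // %0D%0A

// For these inputs the built-in function produces the same result.
console.log(encodeURIComponent('中文') === utf8PercentEncode('中文'));  // true
```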

10.2 Default encoding of pathInfo (non-parameter part) and queryString (parameter part) in URLs in different browsers

  1. The page encoding set by the server affects not only how the page is displayed, but also the encoding of the queryString (parameter) part of the URL in GET requests.
  2. Since mainstream browsers encode the pathInfo (non-parameter) part as UTF-8 by default, we only need to care about the encoding of the queryString in GET request URLs.

Solution:

  1. Browser URL encoding: set the encoding of the page returned by the server to UTF-8.
  2. Server URL decoding: change the server configuration so that the URL and GET request parameters are decoded as UTF-8.

For example, the default encoding mode of the Tomcat server is UTF-8 for POST requests, but ISO8859-1 for GET requests.

Here are the results of a browser test on the default URL encoding without setting the page encoding:

Browser    pathInfo encoding    queryString encoding
GET requests:
IE         UTF-8                GB2312 (depends on the local environment)
Chrome     UTF-8                UTF-8
Firefox    UTF-8                UTF-8
POST requests:
Chrome     UTF-8                UTF-8
Firefox    UTF-8                UTF-8

10.3 URL Manual Encoding Solution

JavaScript functions are commonly used to encode URLs manually. The common ones are escape(), encodeURI() and encodeURIComponent().

JS URL encoding methods (encoding, unencoded characters, example, note):

escape()
  Encoding: Unicode
  Unencoded characters (69): * + - . / @ _ 0-9 a-z A-Z
  Example: var url = escape("http://www.baidu.com/春节");
  Result: http%3A//www.baidu.com/%u6625%u8282
  Note: legacy function; not recommended.

encodeURI()
  Encoding: UTF-8
  Unencoded characters (82): ! ' ( ) * - . _ ~ 0-9 a-z A-Z and # $ & + , / : ; = ? @
  Example: var url = encodeURI("http://www.baidu.com/春节");
  Result: http://www.baidu.com/%E6%98%A5%E8%8A%82
  Note: recommended.

encodeURIComponent() (encodes more characters)
  Encoding: UTF-8
  Unencoded characters (71): ! ' ( ) * - . _ ~ 0-9 a-z A-Z
  Compared with encodeURI(), the 11 additional encoded characters are: # $ & + , / : ; = ? @
  Example: var url = window.encodeURIComponent("http://www.baidu.com/春节");
  Result: http%3A%2F%2Fwww.baidu.com%2F%E6%98%A5%E8%8A%82
  Note: recommended.
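A common way to combine the two recommended functions is to use encodeURI() for the address as a whole and encodeURIComponent() for each parameter name and value, so that characters such as & and = inside a value cannot break the query string. The following is a sketch under that assumption; the function name buildUrl and the parameter names wd and tag are made up for this example.

```javascript
// Build a URL from a base address and an object of query parameters.
// encodeURI keeps the URL structure (: / ?) intact;
// encodeURIComponent also encodes & = ? and similar characters inside values.
function buildUrl(base, params) {
    const query = Object.keys(params)
        .map(key => encodeURIComponent(key) + '=' + encodeURIComponent(params[key]))
        .join('&');
    return query ? encodeURI(base) + '?' + query : encodeURI(base);
}

console.log(buildUrl('http://www.baidu.com/春节', { wd: '春节', tag: 'a&b=c' }));
// http://www.baidu.com/%E6%98%A5%E8%8A%82?wd=%E6%98%A5%E8%8A%82&tag=a%26b%3Dc
```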

10.4 The Server Configures the URL decoding mode and controls the URL encoding mode of the browser

By default, the Tomcat server decodes the parameters of a GET request URL as ISO8859-1. This needs to match the encoding used by the front end for the URL (the front end usually encodes manually as UTF-8), so it should be set to UTF-8. (The ISO8859-1 default applies to Tomcat 7 and earlier; from Tomcat 8 on, the URIEncoding attribute defaults to UTF-8.) Tomcat decodes POST requests using UTF-8 by default.

Related instructions:

  1. The data submitted by GET is in the URL, so it has already been encoded by the browser and decoded by the server by the time your code sees it. To change the decoding mode, you can only modify the server configuration or manually convert each parameter.
  2. The data submitted by POST is encoded by the browser. After it arrives at the server, you can use request.setCharacterEncoding("UTF-8"); to set the decoding format separately. If it is not set, the server uses its default decoding format.

Decoding (GET requests):
  1. Manually convert parameters:
     String queryStr = new String(request.getParameter("queryStr").getBytes("ISO8859-1"), "UTF-8");
  2. Alternatively, change the server's default from ISO8859-1 to UTF-8. Using Tomcat as an example, modify the server.xml file:
     <Connector port="8080" protocol="HTTP/1.1"
                connectionTimeout="20000"
                redirectPort="8443"
                URIEncoding="UTF-8"/>

Decoding (POST requests):
  Generally, no configuration is required. To be safe, you can set the decoding format for the parameters on the server:
     request.setCharacterEncoding("UTF-8");
  If it is not set, the server uses its default decoding format, UTF-8 or something else.
  Then use request.getParameter() to get the parameters.

Encoding (all requests):
  1. When the server sends data, it encodes it. The default encoding of the server is ISO-8859-1. To set it:
     response.setCharacterEncoding("UTF-8");
     In particular, for JSP pages, specify pageEncoding at the top of the page to set the server encoding. How it works: the translation of .jsp -> .java files is performed by the middleware container, the Tomcat server, which encodes the data as ISO-8859-1 by default, so pageEncoding needs to be set to change the encoding:
     <%@ page pageEncoding="utf-8" %>
  2. Then the browser needs to be told the encoding format of the response content.
     The first way is to set the page encoding format directly in the <meta /> tag:
     <meta http-equiv="content-type" content="text/html; charset=UTF-8" />
     The second way is to set the content-type response header directly:
     response.setHeader("content-type", "text/html; charset=UTF-8");
  In particular, you can specify how the server encodes the data and how the browser decodes it at the same time, which replaces the two steps above and is more convenient:
     response.setContentType("text/html; charset=utf-8");
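The manual parameter conversion for GET requests works because decoding bytes as ISO-8859-1 never loses information: every byte becomes exactly one character, so the original bytes can be recovered and decoded again as UTF-8. The sketch below reproduces that round trip in JavaScript rather than Java, assuming a Node.js environment (Buffer is a Node.js API, not available in browsers). The function names misdecodeAsLatin1 and recoverUtf8 are made up for this example.

```javascript
// Simulate a server that decodes UTF-8 bytes as ISO-8859-1 (latin1):
// each byte becomes one character, which produces garbled text.
function misdecodeAsLatin1(text) {
    return Buffer.from(text, 'utf8').toString('latin1');
}

// Undo the mistake: turn the garbled characters back into the original
// bytes, then decode those bytes as UTF-8.
function recoverUtf8(garbled) {
    return Buffer.from(garbled, 'latin1').toString('utf8');
}

const garbled = misdecodeAsLatin1('春节');
console.log(garbled.length);        // 6 (one character per UTF-8 byte)
console.log(recoverUtf8(garbled));  // 春节
```

This mirrors the Java expression new String(value.getBytes("ISO8859-1"), "UTF-8") used in the manual conversion above.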

10.5 Summary: Codec process of the server <-> browser

Server -> Browser

  1. response.setContentType("text/html; charset=utf-8");: the server sets how the data is encoded and tells the browser how it is encoded. (The default server encoding format is ISO-8859-1.) This step can be broken down into two steps: response.setCharacterEncoding("utf-8"); and response.setHeader("content-type", "text/html; charset=UTF-8");
  2. The browser decodes the page according to the encoding mode specified by the server. (The default browser decoding format is also ISO-8859-1; some browsers use UTF-8.)

Browser -> Server

  1. The browser encodes the request according to the encoding mode specified by the server and sends it to the server. (The default URL encoding mode of browsers is UTF-8; some use the encoding of the local environment, such as GB2312.)
  2. The server receives the browser request. (By default, the Tomcat server decodes the URL of a GET request as ISO-8859-1 and a POST request as UTF-8.)
  3. For GET requests: by the time the request reaches your code, the server has already decoded it with its default decoding mode, ISO-8859-1.
     a. You can re-decode each parameter manually: String queryStr = new String(request.getParameter("queryStr").getBytes("ISO8859-1"), "UTF-8");
     b. You can also modify the configuration to change the default server decoding mode to UTF-8.
  4. For POST requests: the server has not yet decoded the data when it arrives.
     a. You can set the decoding mode with request.setCharacterEncoding("utf-8");. The Tomcat server defaults to UTF-8 for POST requests, but setting it explicitly is safer.
  5. Finally, the server uses request.getParameter() to get the parameters sent by the browser.

11 HTTP Referer hotlink protection and bypassing hotlink protection

To be summarized after hands-on use.

12 Website host selection

12.1 Issues to Be Considered when Setting up a Server by Yourself

  • Hardware costs: To run a "real" website, you have to buy powerful server hardware. Don't count on a low-priced PC to do the job. You also need a stable (24 hours a day) high-speed connection.
  • Software costs: Remember that server licenses are usually more expensive than client licenses. Also note that server licenses may have user limits.
  • Labor costs: Don't count on low labor costs. You must install your own hardware and software. You also have to deal with bugs and viruses, and keep your server running properly at all times in an "anything can happen" environment.

12.2 Using an Internet Service Provider (ISP)

Most small businesses host their websites on servers provided by ISPs.

Advantages of ISP:

  • Powerful hardware: ISP web servers are usually powerful enough for their resources to be shared by several websites. You should also check whether your ISP provides efficient load balancing and the necessary backup servers.
  • Connection speed: Most ISPs have high-speed connections to the Internet.
  • Safety and reliability: ISPs are experts in web hosting. They should provide more than 99% uptime, the latest software patches, and the best virus protection.

Considerations for choosing an ISP:

  • Traffic: Check your ISP's traffic restrictions. If there is an unexpected spike in traffic because your site becomes popular, make sure you don't have to pay extra.
  • Bandwidth or content restrictions: Check your ISP's bandwidth and content restrictions. If you plan to publish pictures or broadcast video or audio, make sure you are allowed to.
  • Database access: If you plan to use a database for your website, make sure your ISP supports the database access you need.
  • Daily backup: Make sure your ISP performs daily backups, or you risk losing valuable data.
  • E-mail: Make sure your ISP supports the e-mail functions you need.
  • 24-hour support: Make sure your ISP provides 24-hour support. Don't put yourself in the awkward position of being unable to solve a serious problem while having to wait for the next working day. A toll-free phone line is also necessary if you do not wish to pay for long distance calls.

For others, see Introduction to Network hosts and various servers.