Front-End Advanced Series: Browsers and Networking (local storage, caching, networking, protocols, security)

Preface

JS, Vue, React, Node?

No!

The obvious answer is that the front end has become really powerful and really complex...

Learn HTML, CSS, and JS
Then come frameworks like jQuery, Bootstrap, and AngularJS 1.x / Angular 2.x
Then, of course, Vue and React went mainstream
Mini programs were quietly born
Want cross-platform? React Native, Flutter
Want to do without a separate server-side language? Node.js pops up
...

Add to that the various UI frameworks and build tools, and the front end is heading in every direction at once. My god, the pain!

But there is one area of knowledge that every front-end developer needs: the browser. Browser knowledge is especially important in today's front end and comes up frequently in interviews: local storage, caching, networking, HTTP, TCP/IP, security, and so on. Today, let's learn the browser-related essentials together.

Browser kernel and rendering engine

What are the common browser kernels

Trident kernel: IE, 360, Sogou browser, etc. [also known as MSHTML]
Presto kernel: Opera 7 and above. [Opera's kernel was originally Presto and is now Blink]
WebKit kernel: Safari, Chrome, etc. [Chrome now uses Blink, a fork of WebKit]

Why do we need to know the basics of browser kernels? Because every web browser, email client, and other application that displays web content needs one.

Browser rendering engine

Some people divide the kernel into a CSS engine and a JS engine. It is more accurate to say rendering engine and JS engine, because a browser generally goes through three steps when rendering a web page:

The browser parses the HTML into a DOM tree with the HTML parser.
The browser parses the CSS into a CSS rule tree (commonly known as the CSSOM tree) with the CSS parser.
The browser parses and runs the JS code with the JavaScript engine; the code can change the DOM and CSSOM through the DOM API and CSSOM API. The browser then builds the render tree from the DOM tree and the CSSOM tree and uses it for layout.

The final render tree is an abstract representation of the document structure of the entire page, and the browser lays out and paints the page from it. Given these three steps, the name "CSS engine" cannot cover the first two, which is why "rendering engine" is the better term.

Rendering engine
Js engine

Rendering engine: it fetches the content of the page (HTML, XML, images, etc.), works out how the page should be displayed, and outputs the result to the screen. Different browser kernels interpret the same markup slightly differently, so the rendered result can differ between browsers.

JS engine: it parses and executes JavaScript code to make web pages dynamic.

Local storage

cookie
sessionStorage
localStorage

Let's compare them in a table

Feature	cookie	localStorage	sessionStorage
Life cycle	Usually generated by the server; an expiration time can be set	Persistent until explicitly cleared	Lasts for the current session; cleared when the tab or window is closed
Storage size	About 4 KB	About 5 MB	About 5 MB
Cross-domain	Carried in same-origin HTTP requests. Not sent cross-domain by default: the client must set `withCredentials = true` and the server must allow it	Not allowed by default; can be worked around with postMessage	Not allowed; limited to the same origin and the same tab
Storage location	Stored in the browser; sent to the server in a request header on each request	Hard disk	Memory

A few cookie attributes matter for security:

Value: should be encrypted if it is used to store the user's login state
HttpOnly: the cookie cannot be read from JS, which reduces the damage of XSS attacks
Secure: the cookie is only sent on HTTPS requests
SameSite: restricts the browser from sending the cookie on cross-site requests, which reduces CSRF attacks
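
To make the attributes above concrete, here is a minimal sketch of how a server might assemble a Set-Cookie header value. The function name buildSetCookie and the option names are hypothetical, made up for illustration; only the attribute names written into the header (Max-Age, HttpOnly, Secure, SameSite) are standard.

```typescript
// Hypothetical option names; only the attribute names emitted into the
// header string (Max-Age, HttpOnly, Secure, SameSite) are standard.
interface CookieOptions {
  maxAgeSeconds?: number;
  httpOnly?: boolean;
  secure?: boolean;
  sameSite?: "Strict" | "Lax" | "None";
}

// Build the value of a Set-Cookie response header.
function buildSetCookie(name: string, value: string, options: CookieOptions = {}): string {
  const parts: string[] = [`${name}=${encodeURIComponent(value)}`];
  if (options.maxAgeSeconds !== undefined) {
    parts.push(`Max-Age=${options.maxAgeSeconds}`);
  }
  if (options.httpOnly) parts.push("HttpOnly"); // not readable from JS
  if (options.secure) parts.push("Secure"); // only sent over HTTPS
  if (options.sameSite) parts.push(`SameSite=${options.sameSite}`);
  return parts.join("; ");
}

const sessionCookie = buildSetCookie("sid", "opaque-token", {
  maxAgeSeconds: 3600,
  httpOnly: true,
  secure: true,
  sameSite: "Strict",
});
// sessionCookie is "sid=opaque-token; Max-Age=3600; HttpOnly; Secure; SameSite=Strict"
```

A login cookie would normally carry all three flags; the value itself should be an opaque or encrypted token rather than readable user data.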

Browser cache

Simply put, browser caching means the browser stores resources it has fetched over HTTP locally so that later requests can reuse them

Cache priority

Look in memory first
If the resource is not in memory, look on the hard disk
If it is not on the hard disk either, make a network request
The fetched resource is then cached on disk and in memory

Classification of cache

Strong cache
Negotiated cache

Let's start with the basic logic

When a client requests a resource, it first checks, based on the resource's HTTP headers, whether the strong cache is hit. If it is, the client takes the resource directly from the local cache and sends no request to the server
When the strong cache is missed, the client sends a request to the server. The server uses the request headers to check whether the negotiated cache is hit. If it is, the server returns 304 and tells the client to use its cached copy
When the negotiated cache is missed as well, the server returns the resource to the client

Ctrl + F5 forces a refresh: the page is loaded directly from the server, skipping both the strong cache and the negotiated cache
F5 refreshes the page: the strong cache is skipped, but the negotiated cache is still checked
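
The three-step logic above can be sketched as a small decision function. This is a simplified model, not how any particular browser implements it: the CachedEntry shape and the function name resolveFromCache are hypothetical, freshness is reduced to a single max-age value, and validation is reduced to comparing one ETag.

```typescript
type CacheOutcome = "strong-cache" | "negotiated-cache-304" | "full-response-200";

// Hypothetical shape of a cached entry, for illustration only.
interface CachedEntry {
  storedAtMs: number; // when the response was stored
  maxAgeSeconds: number; // lifetime taken from Cache-Control: max-age
  etag: string; // validator returned by the server
}

function resolveFromCache(
  entry: CachedEntry | undefined,
  nowMs: number,
  currentServerEtag: string
): CacheOutcome {
  // Nothing cached: the server has to send the full resource.
  if (entry === undefined) return "full-response-200";

  // Step 1: strong cache. While the entry is fresh, no request is sent.
  const ageSeconds = (nowMs - entry.storedAtMs) / 1000;
  if (ageSeconds < entry.maxAgeSeconds) return "strong-cache";

  // Step 2: negotiated cache. The entry is stale, so the client asks the
  // server, which answers 304 when the validator still matches.
  if (entry.etag === currentServerEtag) return "negotiated-cache-304";

  // Step 3: both caches missed, so the server returns the resource.
  return "full-response-200";
}
```

Note that in the strong-cache branch the server is never consulted, so a changed resource goes unnoticed until the entry expires.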

Strong cache

Expires (from the HTTP/1.0 specification: an absolute time string in GMT format giving the moment the cached resource expires)
Cache-Control: max-age (from the HTTP/1.1 specification: the strong cache uses the max-age value, in seconds, as the maximum lifetime of the cached resource)

Cache-Control has several other commonly used directives:

no-cache: the cached copy must be revalidated with the server (negotiated cache) before each use.
no-store: disables caching entirely; the data is requested again every time.
public: the response can be cached by anyone, including the end user's browser and intermediate proxies such as CDNs.
private: the response can only be cached by the end user's browser, not by intermediate proxies such as CDNs.

Cache-Control and Expires can both be set by the server at the same time. When both are present, Cache-Control takes priority
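
As a rough illustration of that priority rule, the sketch below computes how long a response stays fresh. The function name freshnessLifetimeSeconds is hypothetical, and the parsing is deliberately simplified: it only looks at max-age and Expires and ignores directives such as no-cache and no-store.

```typescript
// Returns the freshness lifetime in seconds, or undefined when neither
// header provides one. Simplified: other directives are ignored.
function freshnessLifetimeSeconds(
  cacheControl: string | undefined,
  expires: string | undefined,
  responseDateMs: number
): number | undefined {
  // Cache-Control: max-age wins whenever it is present.
  if (cacheControl !== undefined) {
    const match = /(?:^|,)\s*max-age=(\d+)/i.exec(cacheControl);
    if (match) return Number(match[1]);
  }
  // Otherwise fall back to the absolute Expires time.
  if (expires !== undefined) {
    const expiresMs = Date.parse(expires);
    if (!isNaN(expiresMs)) {
      return Math.max(0, (expiresMs - responseDateMs) / 1000);
    }
  }
  return undefined;
}
```

For example, a response sent at 07:28:00 GMT with "Cache-Control: public, max-age=600" and an Expires value one minute later is fresh for 600 seconds, not 60.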

Disadvantages of the strong cache

Once the cache expires, the resource is requested again in full whether or not it has changed. What we would like is that, when the file has not changed, the old copy keeps being used even after it expires instead of being downloaded again. That is what the negotiated cache is for

Negotiated cache

Last-Modified / If-Modified-Since

The Last-Modified value is the time the resource was last updated, in GMT format, and is returned with the server's response. When the browser requests the resource again, the request header contains If-Modified-Since, whose value is the Last-Modified value returned earlier. After receiving If-Modified-Since, the server decides whether the negotiated cache is hit based on the resource's last modified time

ETag / If-None-Match

ETag is a string that uniquely identifies the content of the resource and is returned with the server's response. When the browser requests the resource again, it sends that value back in If-None-Match. The server compares If-None-Match in the request header with the ETag of the current resource to determine whether the resource has been modified between the two requests. If it has not changed, the negotiated cache is hit

Why do you need ETag / If-None-Match when you already have Last-Modified / If-Modified-Since?

If a file is opened and saved again without its content changing, or is changed and then changed back within some period, its Last-Modified time still changes and the server cannot hit the cache, even though the content is identical. ETag identifies the content itself, so it does not have this problem
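
The server-side check for both pairs of headers can be sketched as follows. The names ResourceState, ConditionalHeaders, and conditionalStatus are hypothetical, and the sketch assumes a single ETag compared by exact string match (real servers also handle lists of ETags and weak validators).

```typescript
interface ResourceState {
  etag: string; // for example a hash of the current content
  lastModifiedMs: number; // last modification time of the resource
}

interface ConditionalHeaders {
  ifNoneMatch?: string;
  ifModifiedSince?: string;
}

// Returns 304 when the client's cached copy is still valid, otherwise 200.
function conditionalStatus(resource: ResourceState, headers: ConditionalHeaders): 200 | 304 {
  // When both validators are sent, If-None-Match takes precedence.
  if (headers.ifNoneMatch !== undefined) {
    return headers.ifNoneMatch === resource.etag ? 304 : 200;
  }
  if (headers.ifModifiedSince !== undefined) {
    const sinceMs = Date.parse(headers.ifModifiedSince);
    // HTTP dates have one-second resolution, so compare whole seconds.
    const modifiedMs = Math.floor(resource.lastModifiedMs / 1000) * 1000;
    if (!isNaN(sinceMs) && modifiedMs <= sinceMs) return 304;
  }
  return 200;
}
```

With an ETag derived from the content, re-saving an unchanged file keeps the same ETag and still yields 304, which is the case Last-Modified alone gets wrong.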

Conclusion

The strong cache has a higher priority than the negotiated cache
Whenever either cache is hit, the server does not send the resource body again
The strong cache does not send a request to the server
The negotiated cache does send a request to the server

HTTP request methods

GET: requests data from the server
POST: submits data to the resource specified by the URL
PUT: submits data to the server to modify a resource
HEAD: requests only the headers of a resource, to obtain its meta information
DELETE: deletes a resource on the server
CONNECT: establishes a connection tunnel through a proxy server
OPTIONS: lists the request methods a resource supports; often used in cross-domain requests

Common differences between GET and POST

GET puts its parameters in the URL, joined with the & symbol, while POST passes its parameters in the request body
GET requests are cached by the browser by default; POST requests are not unless caching is set up manually
POST is somewhat safer than GET because its parameters are not exposed in the URL (without HTTPS, both are still sent in plain text). Going back in the browser is harmless for GET, while POST submits the request again
GET parameters are kept in full in the browsing history; POST parameters are not
GET parameters are limited in length, because browsers and servers limit URL length; POST parameters are not
GET parameters can only be URL-encoded, while POST supports other encodings
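
The first difference, where the parameters travel, can be shown with two small helpers. This is only an illustrative sketch: buildGetUrl, buildPostRequest, and the example.com URL are hypothetical, and JSON is just one of the encodings a POST body can use.

```typescript
type QueryParams = Array<[string, string]>;

// GET: parameters are URL-encoded and appended to the URL, joined with "&".
function buildGetUrl(baseUrl: string, params: QueryParams): string {
  const query = params
    .map(([key, value]) => `${encodeURIComponent(key)}=${encodeURIComponent(value)}`)
    .join("&");
  return query ? `${baseUrl}?${query}` : baseUrl;
}

interface PostRequest {
  url: string;
  headers: Record<string, string>;
  body: string;
}

// POST: the URL stays clean and the parameters travel in the request body.
function buildPostRequest(url: string, params: QueryParams): PostRequest {
  const payload: Record<string, string> = {};
  params.forEach(([key, value]) => {
    payload[key] = value;
  });
  return {
    url,
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(payload),
  };
}

const exampleParams: QueryParams = [["q", "local storage"], ["page", "2"]];
const searchUrl = buildGetUrl("https://example.com/search", exampleParams);
// searchUrl is "https://example.com/search?q=local%20storage&page=2"
const postRequest = buildPostRequest("https://example.com/search", exampleParams);
// postRequest.body is '{"q":"local storage","page":"2"}'
```

Because the GET parameters are part of the URL, they end up in the browsing history and server logs, which is why sensitive data belongs in a POST body over HTTPS.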

HTTP status codes

There are five basic categories:

1XX (informational) The request has been received and is being processed
2XX (success) The request was processed properly
3XX (redirection) Additional action is required to complete the request
4XX (client error) The request is faulty and the server cannot process it
5XX (server error) The server failed while processing the request
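
Since the category is just the first digit of the code, the mapping can be written in a few lines. The function name statusCategory is made up for this sketch.

```typescript
type StatusCategory =
  | "informational"
  | "success"
  | "redirection"
  | "client error"
  | "server error";

// Map a status code to its category using the leading digit.
function statusCategory(code: number): StatusCategory | undefined {
  switch (Math.floor(code / 100)) {
    case 1:
      return "informational";
    case 2:
      return "success";
    case 3:
      return "redirection";
    case 4:
      return "client error";
    case 5:
      return "server error";
    default:
      return undefined; // outside the 100-599 range
  }
}
```

Client code often branches on the category first (for example, retrying only on server errors) before looking at the specific code.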

Common status codes:

200 The request succeeded and the response is returned
301 Permanent redirect; cacheable; the requested URL has changed permanently and the new URL should be used from now on
302 Temporary redirect; not cached by default; the requested URL has changed temporarily
304 Not modified: a GET request hit the negotiated cache, so the server returns this status instead of the resource body
400 Bad request: the request is malformed
401 Authentication required; generally means no permission, common when a token is needed
403 Forbidden: the server refuses access
404 No resource was found matching the requested URL
500 Generic server-side error
503 The server is overloaded and cannot process requests for now

Differences between HTTP/1.0, HTTP/1.1, and HTTP/2

HTTP/1.0

HTTP/1.0 specifies that the browser keeps only a short-lived connection with the server: every request opens a new TCP connection, and the server closes it once the request has been handled. A persistent connection can also be requested explicitly by setting the Connection: keep-alive header field

HTTP/1.1

Pipelining: the client can send multiple requests within the same TCP connection without waiting for each response from the server (HTTP pipelining is a technique for submitting several HTTP requests in a batch; only request methods such as GET and HEAD can be pipelined)
Cache handling was extended with headers such as Cache-Control and ETag / If-None-Match
Some new error status codes were added

HTTP/2

Multiplexing: within a single connection, both the client and the server can send multiple requests or responses at the same time, and they do not have to be matched up in order.
Server push: the server is allowed to actively push resources to the client

HTTP and HTTPS

HTTP is the Hypertext Transfer Protocol; it transfers data on top of the TCP/IP protocols

Requests are transmitted in plain text, so they are easy to eavesdrop on
Data integrity is not verified, so data is easy to tamper with
There is no identity authentication, which is a security risk

HTTPS can be understood as HTTP + SSL (Secure Sockets Layer). The SSL certificate authenticates the identity of the server, and the data transmitted between the browser and the server is encrypted (with a combination of symmetric and asymmetric encryption).

So what's the difference?

Data encryption: HTTP transmits plaintext, HTTPS transmits ciphertext
Default port: HTTP uses port 80 by default, HTTPS uses port 443
Resource consumption: HTTPS consumes more CPU and memory than HTTP because of the encryption and decryption work
Security: HTTP is not secure; HTTPS is relatively secure

What does the HTTPS process look like? SSL encrypts and decrypts the data, and HTTP transfers it (as ciphertext)

When the user enters an HTTPS URL in the browser, the browser connects to port 443 of the server by default
The server must have a digital certificate, also known as an SSL certificate, which is essentially backed by a pair of public and private keys (it is generally obtained from a certificate authority)
The server returns its digital certificate (including the public key) to the client
After receiving the certificate, the client verifies it. If the certificate is valid, the client generates a key (for symmetric encryption) and encrypts it with the public key from the certificate
The client sends a second request over HTTPS, carrying the encrypted client key to the server
After receiving the ciphertext, the server decrypts it with its own private key (asymmetric decryption) and obtains the client key. It then uses the client key to symmetrically encrypt the data it returns, so that the data becomes ciphertext
The server returns the encrypted data to the client
After receiving the ciphertext from the server, the client decrypts it symmetrically with its own key (the client key) and obtains the data returned by the server

TCP three-way handshake and four-way wave

Before we talk about the TCP protocol, let's first understand the TCP segment. I found a picture on the Internet; please look at it in detail

In the diagram I have marked the fields needed by the TCP three-way handshake and four-way wave, which only need to be understood briefly. Then I will use two more diagrams to explain the three-way handshake and the four-way wave

Three-way handshake

The first step of the three-way handshake is initiated by the client, which asks the server to create a TCP connection. It sets the SYN flag to 1 and carries a sequence number seq = x. The sequence number x is 32 bits and randomly generated (by the client)
When the server receives the client's request to create a connection, it must respond: it sets the ACK flag to 1 and generates the acknowledgment number ack = x + 1 (the received sequence number plus 1). At the same time, the server sends its own connection request to the client: it sets the SYN flag to 1 and carries a sequence number seq = y. The sequence number y is 32 bits and randomly generated (by the server)
At this point, the client knows that the server has received the request and agreed to create the connection, so it sends one more segment in response, setting the ACK flag to 1 with ack = y + 1

OK, those three steps complete the TCP three-way handshake. Why not two? The reason is simple: after the first two steps the client knows that the connection can be created, but the server does not yet know whether its own reply arrived. The last step gives the server that confirmation, telling it that the client is ready to create the connection
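
The sequence and acknowledgment numbers in the three steps above can be traced with a small simulation. It only models the numbers, not real networking; the names TcpSegment, nextSeq, and threeWayHandshake are hypothetical, and the initial sequence numbers are passed in rather than generated randomly so the result is reproducible.

```typescript
interface TcpSegment {
  from: "client" | "server";
  flags: string[]; // flag bits that are set to 1
  seq: number;
  ack?: number;
}

// Sequence numbers are 32 bits wide, so adding 1 wraps around at 2^32.
function nextSeq(n: number): number {
  return (n + 1) % 4294967296;
}

// x is the client's initial sequence number, y is the server's.
function threeWayHandshake(x: number, y: number): TcpSegment[] {
  return [
    // Step 1: the client asks to open a connection.
    { from: "client", flags: ["SYN"], seq: x },
    // Step 2: the server acknowledges x and sends its own SYN.
    { from: "server", flags: ["SYN", "ACK"], seq: y, ack: nextSeq(x) },
    // Step 3: the client acknowledges y, so both sides are confirmed.
    { from: "client", flags: ["ACK"], seq: nextSeq(x), ack: nextSeq(y) },
  ];
}
```

Calling threeWayHandshake(100, 300) yields acknowledgment numbers 101 in step 2 and 301 in step 3, matching ack = x + 1 and ack = y + 1 in the text.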

Four-way wave

Once you have learned the three-way handshake, the four-way wave is a little easier. The principle is basically the same

Similarly, the four-way wave is initiated by the client, which sends a disconnect request to the server with the FIN flag set to 1 and a sequence number seq = x
When the server receives the client's disconnect request, it responds by setting the ACK flag to 1 with ack = x + 1
At this point, the client has received the acknowledgment (the server knows about the disconnect request), so it sits and waits for the disconnection. The server, however, may still have things to do, such as returning data. When the server has finished the task at hand, it sends its own disconnect request to the client: it sets the FIN flag to 1 and carries a sequence number seq = y
The client receives the server's disconnect request and replies with an ACK (ack = y + 1): OK, you have finished processing, so let's disconnect

OK, those four steps complete the TCP four-way wave. The principle is very simple: a question and an answer in each direction
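
The four steps can be traced the same way as the handshake. Again this is only a model of the numbers, with hypothetical names (CloseSegment, wrap32, fourWayWave); it lists just the flags mentioned in the text, whereas real FIN segments usually carry the ACK flag as well.

```typescript
interface CloseSegment {
  from: "client" | "server";
  flags: string[]; // flag bits that are set to 1
  seq?: number;
  ack?: number;
}

// Keep values inside the 32-bit sequence number space.
function wrap32(n: number): number {
  return n % 4294967296;
}

// x is the client's current sequence number, y is the server's.
function fourWayWave(x: number, y: number): CloseSegment[] {
  return [
    // Step 1: the client asks to close its side.
    { from: "client", flags: ["FIN"], seq: x },
    // Step 2: the server acknowledges, but may still send data.
    { from: "server", flags: ["ACK"], ack: wrap32(x + 1) },
    // Step 3: the server is done and asks to close its side.
    { from: "server", flags: ["FIN"], seq: y },
    // Step 4: the client acknowledges and the connection is closed.
    { from: "client", flags: ["ACK"], ack: wrap32(y + 1) },
  ];
}
```

Steps 2 and 3 are separate segments because the server may still be sending data in between, which is why closing takes four steps while opening takes three.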

Differences between TCP and UDP

TCP is a connection-oriented protocol: a reliable connection must be established (the legendary three-way handshake) before data is sent or received. UDP is connectionless and only needs the destination address and port number to send data
TCP supports only point-to-point communication, while UDP supports one-to-one, one-to-many, many-to-many, and many-to-one
TCP is relatively inefficient because connections have to be created and torn down; UDP has no such overhead, so it is relatively fast
TCP is byte-stream oriented and UDP is datagram oriented
TCP guarantees that data arrives correctly, while UDP may lose packets

Common front-end attacks and defenses

XSS

CSRF

SQL injection