Profile: This article goes beyond the usual answer to this classic question and tries to be as specific as possible without skipping any details.
It follows the original English version of “What-happens-when”; parts of the translation have been revised and some redundant section titles have been removed.



Note: The website used in the original English version is “non-existent” for some readers, so the translation changed it to baidu.com. Don’t diss me, love and peace 🙂

Contents

  • Press the “G” key
  • Press enter
  • Generate interrupts (non-USB keyboard)
  • (Windows) A WM_KEYDOWN message is sent to the application
  • (Mac OS X) A KeyDown NSEvent is issued to the application
  • (GNU/Linux) The Xorg server listens for key codes
  • Parsing the URL
  • Did you enter a URL or a search keyword?
  • Convert non-ASCII Unicode characters
  • Check the HSTS list
  • The DNS query
  • The ARP process
  • Use a socket
  • The TLS handshake
  • The HTTP protocol
  • HTTP server request processing
  • The story behind the browser
  • The browser
  • HTML parsing
  • CSS parsing
  • Page rendering
  • GPU rendering
  • Window Server
  • Post-rendering and user-triggered processing

Press the “G” key

The following sections describe what happens at the physical keyboard and how operating system interrupts work, although some parts are left uncovered.

When you press the “G” key, the browser receives the event and triggers its auto-complete mechanism. Depending on the browser’s algorithm and on whether you are in private browsing mode, it shows suggestions in a dropdown below the address bar. Most of these algorithms prioritize results based on your search history, bookmarks, and similar data. You intend to type “http://google.com”, so the first suggestions may not match. Still, a lot of code runs in the background while you type, and every keystroke refines the suggestions. The browser may even suggest “http://google.com” before you finish typing it.

Press enter

To pick a starting point, we choose the moment the Enter key on the keyboard reaches the bottom of its travel. At this point, an electrical circuit specific to the Enter key is closed, either directly or capacitively, allowing a small amount of current to flow into the keyboard’s logic circuitry.

That circuitry scans the state of each key switch, debounces the electrical noise caused by the rapid intermittent closure of the switch, and converts the result into a keycode integer. In this case, the keycode for Enter is 13. The keyboard controller then encodes the keycode for transmission to the computer. This transfer is now almost always done over USB or Bluetooth, rather than over PS/2 or ADB connections.

USB keyboard:

  • The keyboard’s USB circuitry is connected to the USB controller through the computer’s USB port and is powered by the 5V supply provided over pin 1 of that port
  • The generated keycode is stored in a register called an “endpoint” in the keyboard’s internal circuitry
  • The host USB controller polls that “endpoint” roughly every 10ms (the minimum interval declared by the keyboard) to retrieve the stored keycode
  • The keycode is converted by the USB serial interface engine into one or more USB packets that follow the low-level USB protocol
  • These packets are sent as a differential electrical signal over the D+ and D- pins (the two middle pins) at a maximum speed of 1.5Mb/s, because an HID (Human Interface Device) is always declared to be a “low-speed device” (USB 2.0 compliance)
  • This serial signal is decoded at the computer’s USB host controller and interpreted by the computer’s universal HID keyboard driver. The keycode is then passed to the operating system’s hardware abstraction layer

Virtual keyboard (touch device):

  • On modern capacitive touch screens, when the user places a finger on the screen, a tiny amount of current is transferred from the electrostatic field of the conductive layer to the finger. This completes a circuit and creates a voltage drop at that point on the screen, and the screen controller raises an interrupt reporting the coordinates of the “click”
  • The mobile operating system then notifies the currently focused application of a click event on one of its GUI elements, which here is a button of the virtual keyboard
  • The virtual keyboard raises a software interrupt to send a “key pressed” message back to the OS
  • The OS then notifies the currently focused application of a “key pressed” event

Generate interrupts (non-USB keyboard)

The keyboard sends a signal on its interrupt request line (IRQ), which the interrupt controller maps to an interrupt vector, essentially an integer. The CPU uses the interrupt descriptor table (IDT) to map interrupt vectors to functions, called interrupt handlers, which are supplied by the operating system kernel. When an interrupt arrives, the CPU uses the interrupt vector as an index into the IDT and runs the corresponding handler, and the kernel is thereby entered.

(Windows) A WM_KEYDOWN message is sent to the application

The HID transport passes the key-down event to the KBDHID.sys driver, which converts the HID usage into a scan code; for the Enter key this maps to the virtual-key code VK_RETURN (0x0D). The KBDHID.sys driver interfaces with KBDCLASS.sys (the keyboard class driver), which is responsible for handling all keyboard and keypad input in a secure manner. It then calls into win32k.sys, possibly after passing the message through any installed third-party keyboard filters. This all happens in kernel mode.

Win32k.sys uses the GetForegroundWindow() API to determine which window is currently active. This API provides a handle to the browser’s address bar. The Windows “message pump” then calls SendMessage(hWnd, WM_KEYDOWN, VK_RETURN, lParam). lParam is a bitmask that carries further information about the keypress: the repeat count (0 in this case), the actual scan code (which can be OEM dependent, but generally is not for VK_RETURN), whether extended keys (e.g. Alt, Shift, Ctrl) were also pressed (they were not), and some other state.

The Windows SendMessage API adds the message to a queue for the particular window handle (hWnd). Later, the main message processing function assigned to that hWnd, called a WindowProc, is invoked to process each message in the queue.

The active window (hWnd) is actually an Edit control, and its WindowProc has a message handler for WM_KEYDOWN messages. This handler looks at the third parameter that SendMessage passed in (wParam) and, because it is VK_RETURN, knows that the user has pressed the Enter key.
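
The lParam bitmask described above can be unpacked with a few shifts and masks. The following is a minimal sketch in Python rather than a real Windows message handler; the function name and the example value (repeat count 1, scan code 0x1C) are assumptions made for illustration.

```python
VK_RETURN = 0x0D  # virtual-key code for Enter, as named in the text


def decode_keydown_lparam(lparam: int) -> dict:
    """Split a WM_KEYDOWN lParam into its bit fields."""
    return {
        "repeat_count": lparam & 0xFFFF,               # bits 0-15
        "scan_code": (lparam >> 16) & 0xFF,            # bits 16-23
        "extended_key": bool((lparam >> 24) & 1),      # bit 24
        "context_code": bool((lparam >> 29) & 1),      # bit 29
        "previous_state": bool((lparam >> 30) & 1),    # bit 30
        "transition_state": bool((lparam >> 31) & 1),  # bit 31
    }


# Hypothetical lParam: repeat count 1, scan code 0x1C, no flags set.
example_lparam = (0x1C << 16) | 1
print(decode_keydown_lparam(example_lparam))
```

A WindowProc would receive wParam (here VK_RETURN) alongside this value; the sketch only shows how the extra state is packed into one integer.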

(Mac OS X) A KeyDown NSEvent is issued to the application

The interrupt signal triggers an interrupt event in the I/O Kit kext keyboard driver. The driver translates the signal into a key code, which is passed to the OS X WindowServer process. The WindowServer then dispatches an event to the appropriate (active or listening) applications through their Mach port, where it is placed into an event queue. Threads with sufficient privileges can read events from this queue by calling the mach_ipc_dispatch function. This most commonly happens in, and is handled by, the NSApplication main event loop, via an NSEvent of NSEventType KeyDown.

(GNU/Linux) The Xorg server listens for key codes

When a graphical X server is used, the X server remaps the key codes to scan codes using its own keymaps and rules. Once the mapping is done, the X server sends the character to the window manager (DWM, metacity, i3, etc.), which in turn sends the character to the focused window. The graphical API of the window that receives the character then draws the text in the focused input field.

Parsing the URL

The browser knows the following from the URL:

  • Protocol “http”: use the Hypertext Transfer Protocol
  • Resource “/”: retrieve the main (index) page
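
As a rough illustration of this breakdown, the sketch below uses Python’s standard urllib.parse module to split the URL. The helper name describe_url is hypothetical, and a real browser’s URL parser handles many more cases.

```python
from urllib.parse import urlsplit


def describe_url(url: str) -> dict:
    """Return the protocol, host, and requested resource of a URL."""
    parts = urlsplit(url)
    return {
        "protocol": parts.scheme,
        "host": parts.hostname,
        "resource": parts.path or "/",  # an empty path means the root resource
    }


print(describe_url("http://google.com"))
```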

Did you enter a URL or a search keyword?

When no protocol or valid domain name is given, the browser feeds the text typed into the address bar to its default search engine. In most cases the URL carries a special piece of text that tells the search engine the query came from that particular browser.

Convert non-ASCII Unicode characters

  • The browser checks the hostname for characters that are not in a-z, A-Z, 0-9, -, or .
  • Here the hostname is google.com, so there are none. If there were, the browser would apply Punycode encoding to the hostname portion of the URL
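
To make the Punycode step concrete, here is a small sketch using Python’s built-in "idna" codec, which implements the older IDNA 2003 rules and may differ from what a given browser does. The helper name and the non-ASCII example hostname are assumptions for illustration.

```python
def to_ascii_hostname(hostname: str) -> str:
    """Encode a hostname to its ASCII (Punycode) form, label by label."""
    return hostname.encode("idna").decode("ascii")


print(to_ascii_hostname("google.com"))      # ASCII-only names pass through unchanged
print(to_ascii_hostname("bücher.example"))  # a non-ASCII label gets an "xn--" prefix
```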

Check the HSTS list

  • The browser checks its “preloaded HSTS (HTTP Strict Transport Security)” list, a list of websites that have requested to be contacted via HTTPS only
  • If the website is on the list, the browser sends its request via HTTPS instead of HTTP. Otherwise, the initial request is sent via HTTP
  • Note that a website can still use the HSTS policy without being on the HSTS list. The first HTTP request the browser makes to the website receives a response asking the browser to send all further requests via HTTPS only. However, that single HTTP request could leave the user vulnerable to a downgrade attack, which is why modern browsers ship with a preloaded HSTS list
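
The preload check can be pictured as a lookup followed by a scheme rewrite. This is a simplified sketch: the set PRELOADED_HSTS_HOSTS and the host names in it are hypothetical stand-ins for the browser’s built-in list, and real implementations also handle subdomain rules and policies learned from earlier responses.

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical stand-in for the browser's preloaded HSTS list.
PRELOADED_HSTS_HOSTS = frozenset({"example-bank.test"})


def apply_hsts(url: str, preload=PRELOADED_HSTS_HOSTS) -> str:
    """Upgrade http:// to https:// when the host is on the preload list."""
    parts = urlsplit(url)
    if parts.scheme == "http" and parts.hostname in preload:
        return urlunsplit(parts._replace(scheme="https"))
    return url


print(apply_hsts("http://example-bank.test/login"))
print(apply_hsts("http://other.test/"))
```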

The DNS query

  • The browser checks whether the domain is in its cache (to see the DNS cache in Chrome, go to chrome://net-internals/#dns)
  • If it is not found, the browser calls the gethostbyname library function (which varies by operating system) to do the lookup
  • gethostbyname checks whether the hostname can be resolved from the local hosts file (whose location varies by operating system) before trying to resolve it through DNS
  • If gethostbyname has no cached record and cannot find the hostname in the hosts file, it sends a request to the DNS server configured in the network stack. This is typically the local router or the ISP’s caching DNS server
  • Query the local DNS server:
  • If the DNS server is on the same subnet as our host, the network library follows the ARP process below for the DNS server
  • If the DNS server is on a different subnet, the network library follows the ARP process below for the default gateway
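
The lookup order described above (browser cache first, then the operating system resolver) might be sketched as follows. Note the swap: the text names gethostbyname, while this sketch uses socket.getaddrinfo, its modern replacement in Python’s standard library. The cache dictionary and the function name resolve are assumptions for illustration.

```python
import socket


def resolve(hostname: str, cache: dict) -> str:
    """Return an IPv4 address, consulting a browser-style cache first."""
    if hostname in cache:
        return cache[hostname]
    # The OS resolver consults the hosts file and then DNS on our behalf.
    infos = socket.getaddrinfo(
        hostname, None, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    address = infos[0][4][0]  # sockaddr is the fifth field; its first item is the IP
    cache[hostname] = address
    return address
```

A call such as resolve("google.com", {}) would trigger a real lookup, whereas a second call with the same cache dictionary is answered locally.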

The ARP process

To send an ARP (Address Resolution Protocol) broadcast, we need the target IP address to look up and the MAC address of the interface that will send the ARP broadcast.

  • First, we check the ARP cache for an entry for the target IP. If there is a hit, we return the result: target IP = MAC

If the cache is not hit:

  • Check the routing table to see if the destination IP address is in a subnet in the local routing table. If yes, use the interface connected to that subnet, otherwise use the interface connected to the default gateway.
  • Query the MAC address of the selected network interface
  • We send a Layer 2 (data link layer in the OSI model) ARP request:

ARP Request:

Sender MAC: interface:mac:address:here
Sender IP: interface.ip.goes.here
Target MAC: FF:FF:FF:FF:FF:FF (Broadcast)
Target IP: target.ip.goes.here
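
For readers who want to see the request as bytes on the wire, the sketch below packs the same four fields into the 28-byte ARP payload for IPv4 over Ethernet using Python’s struct module. The function name and the addresses in the usage example are made up for illustration, and the target MAC follows the listing above. Actually sending the frame would require a raw socket and elevated privileges, which this sketch does not attempt.

```python
import socket
import struct


def build_arp_request(sender_mac: bytes, sender_ip: str, target_ip: str) -> bytes:
    """Pack an ARP request payload for IPv4 over Ethernet (28 bytes)."""
    return struct.pack(
        "!HHBBH6s4s6s4s",
        1,                            # hardware type: Ethernet
        0x0800,                       # protocol type: IPv4
        6,                            # hardware address length in bytes
        4,                            # protocol address length in bytes
        1,                            # opcode: 1 = request, 2 = reply
        sender_mac,                   # Sender MAC
        socket.inet_aton(sender_ip),  # Sender IP
        b"\xff" * 6,                  # Target MAC, as in the listing above
        socket.inet_aton(target_ip),  # Target IP
    )


# Hypothetical addresses, for illustration only.
packet = build_arp_request(b"\x02\x00\x00\x00\x00\x01", "192.168.1.10", "192.168.1.1")
print(len(packet), packet.hex())
```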

Depending on what type of hardware sits between the host and the router:

Direct:

  • If we are directly connected to the router, it will return an ARP Reply (see below).

Hub:

  • If we connect to a hub, the hub will broadcast ARP requests to all other ports, and if the router is also “connected” to it, it will return an ARP Reply.

Switch:

  • If we are connected to a switch, the switch checks the local CAM/MAC table to see which port has the MAC address we are looking for. If not, the switch broadcasts the ARP request to all other ports.
  • If there is an entry in the switch’s MAC/CAM table, the switch will send an ARP request to the port that has the MAC address we want to query
  • If the router is also “connected” to it, it will return an ARP Reply

ARP Reply:

Sender MAC: target:mac:address:here
Sender IP: target.ip.goes.here
Target MAC: interface:mac:address:here
Target IP: interface.ip.goes.here

Now that we have the MAC address of the DNS server or of the default gateway, we can resume the DNS process:

  • A UDP request packet is sent to port 53 on the DNS server. If the response is too large, TCP is used instead
  • If the local or ISP DNS server does not have the answer, it performs a recursive query, working up the chain of DNS servers until it reaches the authoritative server for the domain, and returns the answer if one is found
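
The UDP request mentioned above has a compact binary layout: a 12-byte header followed by the question. The following sketch builds a query for an A record with Python’s struct module. The function name and the fixed query ID are assumptions for illustration, and the packet is only constructed here, not sent.

```python
import struct


def build_dns_query(hostname: str, query_id: int = 0x1234) -> bytes:
    """Build a DNS query asking for the A record of hostname."""
    # Header: ID, flags (0x0100 = recursion desired), 1 question, 0 other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label is prefixed by its length; a zero byte ends the name.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.rstrip(".").split(".")
    ) + b"\x00"
    question = qname + struct.pack("!HH", 1, 1)  # QTYPE = A, QCLASS = IN
    return header + question


query = build_dns_query("google.com")
print(len(query), query.hex())
```

Sending it would amount to a sendto() call on a UDP socket addressed to port 53 of the configured DNS server.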

Use a socket

When the browser gets the IP address of the destination server and the port number given in the URL (the default HTTP port number is 80, HTTPS port number is 443), it calls the system library function socket to request a TCP stream socket. The corresponding parameters are AF_INET/AF_INET6 and SOCK_STREAM.

  • This request is first passed to the transport layer, where a TCP segment is crafted. The destination port is added to the header, and a source port is chosen from the kernel’s dynamic port range (ip_local_port_range on Linux)
  • The segment is passed to the network layer, which wraps it in an IP header containing the IP address of the destination server and that of the local host, forming a packet
  • The packet then arrives at the link layer, which adds a frame header containing the MAC address of the local machine’s network card and the MAC address of the gateway (local router). As mentioned earlier, if the kernel does not know the MAC address of the gateway, it must broadcast an ARP query to find it

At this point, the packet is ready to be transmitted over one of the following:
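
In Python, the socket call with the parameters named above looks like the sketch below. The helper names are hypothetical. The second function binds to port 0 on the loopback interface only to show the kernel choosing a port from its dynamic range; a browser would normally let connect() pick the source port implicitly.

```python
import socket


def open_stream_socket(family: int = socket.AF_INET) -> socket.socket:
    """Request a TCP stream socket from the operating system."""
    return socket.socket(family, socket.SOCK_STREAM)


def kernel_chosen_port() -> int:
    """Bind to port 0 so the kernel selects a free port, then report it."""
    with open_stream_socket() as sock:
        sock.bind(("127.0.0.1", 0))
        return sock.getsockname()[1]
```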

  • Ethernet
  • WiFi
  • Cellular data network

For most home and small-business Internet connections, the packet passes from your computer, possibly through a local network, and then through a modem, which converts the digital signal into an analog signal suitable for transmission over telephone, cable, or wireless telephony connections. At the other end of the connection is another modem, which converts the analog signal back into digital data to be processed by the next network node, where the source and destination addresses are analyzed further.

Large businesses and some newer homes use fiber or direct Ethernet connections, in which case the data stays digital and is passed directly to the next network node for processing.

Eventually the packet reaches the router that manages the local subnet. From there, it continues to the border routers of the autonomous system (AS), through other autonomous systems, and finally to the destination server. Each router along the way extracts the destination address from the IP header and routes the packet to the appropriate next hop. The time to live (TTL) field in the IP header is decremented by one for each router the packet passes through. The packet is dropped if the TTL reaches zero or if the router’s queue is full because of network congestion.

This send and receive happens multiple times, following the TCP connection flow:

  • The client chooses an initial sequence number (ISN) and sends a packet to the server with the SYN bit set, indicating that it wants to open a connection and is setting the ISN
  • The server receives the SYN and, if it is willing to accept the connection:
    • The server chooses its own initial sequence number
    • The server sets the SYN bit to indicate that it is choosing its ISN
    • The server copies (client ISN + 1) into its ACK field and sets the ACK bit to indicate that it received the first packet from the client
  • The client acknowledges the connection by sending a packet that:
    • Increases its own sequence number
    • Increases the receiver acknowledgment number
    • Sets the ACK bit
  • Data is transferred as follows:
    • When one side sends N data bytes, it increases its SEQ by N
    • When the other side acknowledges receipt of that packet (or a series of packets), it sends an ACK packet whose ACK value equals the last sequence number received from the other side
  • To close the connection:
    • The side that wants to close sends a FIN packet
    • The other side acknowledges the FIN packet and sends its own FIN
    • The closing side acknowledges the other side’s FIN with an ACK
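
The sequence and acknowledgment arithmetic of the opening handshake can be traced with a small simulation. This sketch only models the numbers exchanged, not real packets; the function name and the sample ISNs are assumptions for illustration.

```python
SEQ_SPACE = 2 ** 32  # TCP sequence numbers are 32 bits wide and wrap around


def three_way_handshake(client_isn: int, server_isn: int) -> list:
    """Return (flags, seq, ack) for the three segments that open a connection."""
    syn = ("SYN", client_isn, None)
    syn_ack = ("SYN+ACK", server_isn, (client_isn + 1) % SEQ_SPACE)
    ack = ("ACK", (client_isn + 1) % SEQ_SPACE, (server_isn + 1) % SEQ_SPACE)
    return [syn, syn_ack, ack]


for segment in three_way_handshake(client_isn=100, server_isn=300):
    print(segment)
```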

The TLS handshake

  • The client sends a ClientHello message to the server with its Transport Layer Security (TLS) version and the list of cipher algorithms and compression methods it supports
  • The server replies with a ServerHello message containing the TLS version, the cipher and compression methods it selected, and the server’s public certificate signed by a CA (Certificate Authority). The certificate contains a public key that the client uses to encrypt the rest of the handshake until a symmetric key has been agreed upon
  • The client verifies the server’s certificate against its list of trusted CAs. If trust can be established, the client generates a string of pseudo-random bytes and encrypts it with the server’s public key. These random bytes are used to determine the symmetric key
  • The server decrypts the random bytes with its private key and uses them to generate its own copy of the symmetric master key
  • The client sends a Finished message to the server, encrypting a hash of the transmission up to this point with the symmetric key
  • The server generates its own hash and then decrypts the hash sent by the client to verify that the two match. If they do, it sends its own Finished message to the client, also encrypted with the symmetric key
  • From now on, the TLS session transmits the application layer (HTTP) data encrypted with the agreed symmetric key
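
From an application’s point of view, most of this handshake is hidden behind a TLS library. The sketch below uses Python’s standard ssl module; the helper names are hypothetical, and the exact messages and key exchange depend on the TLS version and cipher suite that the two sides negotiate, so treat it as an illustration of certificate verification against trusted CAs rather than of the steps listed above.

```python
import socket
import ssl


def make_client_context() -> ssl.SSLContext:
    """Create a client context that verifies the server against trusted CAs."""
    context = ssl.create_default_context()  # loads the system's trusted CA list
    context.minimum_version = ssl.TLSVersion.TLSv1_2
    return context


def negotiated_parameters(host: str, port: int = 443) -> tuple:
    """Complete a handshake and return (TLS version, cipher suite name)."""
    context = make_client_context()
    with socket.create_connection((host, port), timeout=5) as raw:
        # wrap_socket performs the handshake, including certificate checks.
        with context.wrap_socket(raw, server_hostname=host) as tls:
            return tls.version(), tls.cipher()[0]
```

Calling negotiated_parameters("google.com") needs network access and returns whatever version and cipher suite the server agrees to.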

The HTTP protocol

If the web browser was written by Google, then instead of sending a plain HTTP request to retrieve the page, it sends a request to try to negotiate the use of the SPDY protocol with the server.

If the browser uses HTTP and does not support SPDY, it sends a request to the server of the form:

GET / HTTP/1.1
Host: google.com
Connection: close
[other headers]

Here [other headers] refers to a series of colon-separated key-value pairs formatted according to the HTTP specification and separated by single newlines. (This assumes that the browser has no bugs that violate the HTTP specification and that it is using HTTP/1.1. Otherwise it may not include the Host header in the request, and the version given in the GET request will be either HTTP/1.0 or HTTP/0.9.)

HTTP/1.1 defines the “close” connection option, which the sender uses to signal that the connection will be closed after the response completes. For example:

Connection: close

HTTP/1.1 applications that do not support persistent connections must include the “close” connection option in every message.

After sending the request line and headers, the browser sends a single blank line to the server to indicate that the request is complete.
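
Put together, the request is just a few text lines joined by CRLF and terminated by a blank line. A minimal sketch of assembling it in Python follows; the function name and the extra_headers parameter are hypothetical.

```python
def build_get_request(host: str, path: str = "/", extra_headers=None) -> bytes:
    """Assemble an HTTP/1.1 GET request that asks the server to close afterwards."""
    lines = [f"GET {path} HTTP/1.1", f"Host: {host}", "Connection: close"]
    for name, value in (extra_headers or {}).items():
        lines.append(f"{name}: {value}")
    # Header lines end with CRLF; one more CRLF (a blank line) ends the request.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


print(build_get_request("google.com").decode("ascii"))
```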

The server returns a response code indicating the status of the request. The response looks like this:

200 OK
[response headers]

This is followed by a single blank line and then the payload, which is the HTML content of http://www.google.com. The server may then either close the connection or, if the headers sent by the client requested it, keep the connection open so it can be reused for further requests.

If the HTTP headers sent by the browser included enough information (such as the ETag header) for the server to determine that the version of the file cached by the browser has not been modified since it was last retrieved, the server may instead respond with:

304 Not Modified
[response headers]

The response has no payload, and the browser retrieves the desired content from its cache.
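
The server-side decision between 200 and 304 can be sketched as a comparison of entity tags. This assumes the browser sends the cached tag back in an If-None-Match request header, which is how ETag validation normally works; the function name and the simplified header handling are assumptions for illustration.

```python
def respond(request_headers: dict, current_etag: str, body: bytes) -> tuple:
    """Return (status, headers, payload) for a possibly conditional GET."""
    if request_headers.get("If-None-Match") == current_etag:
        # The cached copy is still current, so no payload is sent.
        return 304, {"ETag": current_etag}, b""
    headers = {"ETag": current_etag, "Content-Length": str(len(body))}
    return 200, headers, body


print(respond({"If-None-Match": '"abc"'}, '"abc"', b"<html></html>"))
print(respond({}, '"abc"', b"<html></html>"))
```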

After parsing the HTML, the browser and the server repeat this process for every resource (images, CSS, favicon.ico, etc.) referenced by the HTML page, except that the request line changes from GET / HTTP/1.1 to GET /$(URL relative to www.google.com) HTTP/1.1.

If the HTML references a resource on a domain other than www.google.com, the browser goes back to the steps for resolving that domain and follows all the steps up to this point for it. The Host header in the request is set to that domain name instead.

HTTP server request processing

The HTTPD (HTTP Daemon) handles requests and responses on the server side. The most common HTTPD servers are Apache and nginx on Linux and IIS on Windows.

  • The HTTPD receives the request
  • The server breaks the request down into the following parameters:
    • HTTP request method (GET, POST, HEAD, PUT, DELETE, CONNECT, OPTIONS, or TRACE). For a URL entered directly into the address bar, this is GET
    • Domain: in this case, google.com
    • Requested path/page: in this case, / (no specific page under google.com was requested, so / is the default path)
  • The server verifies that it has a virtual host configured for google.com
  • The server verifies that google.com accepts GET requests
  • The server verifies that the client is allowed to use this method (based on IP address, authentication, etc.)
  • If the server has a URL rewriting module installed (such as mod_rewrite for Apache or URL Rewrite for IIS), it tries to match the request against the configured rules. If a matching rule is found, the server rewrites the request according to that rule
  • The server pulls the content that corresponds to the request. In this case it falls back to the index file, because the path is “/” (this rule can be overridden, but it is the most common behavior)
  • The server parses the file with the configured handler. If Google is running on PHP, the server uses PHP to interpret the index file, captures the output, and returns it to the client
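
The first steps of this list, splitting the request into method, domain, and path, can be sketched in a few lines. This is a deliberately small parser for illustration only, with a hypothetical function name; production servers validate far more of the input.

```python
def parse_request(raw: bytes) -> dict:
    """Extract the method, path, version, and Host header from a raw HTTP request."""
    head = raw.split(b"\r\n\r\n", 1)[0].decode("iso-8859-1")
    request_line, *header_lines = head.split("\r\n")
    method, path, version = request_line.split(" ")
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # header names are case-insensitive
    return {
        "method": method,
        "path": path,
        "version": version,
        "host": headers.get("host"),
    }


raw = b"GET / HTTP/1.1\r\nHost: google.com\r\nConnection: close\r\n\r\n"
print(parse_request(raw))
```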

The story behind the browser

When the server provides the resources (HTML, CSS, JS, images, etc.), the browser does the following:

  • Parsing – HTML, CSS, JS
  • Rendering – Build the DOM tree -> Build the render tree -> Layout -> Paint

The browser

The browser’s job is to present the web resource you choose by requesting it from the server and displaying it in the browser window. The resource is usually an HTML document, but it may also be a PDF, an image, or some other type of content. The location of the resource is specified by the user with a URI (Uniform Resource Identifier).

The way browsers interpret and display HTML files is detailed in the HTML and CSS standards. These standards are maintained by the World Wide Web Consortium (W3C), a Web standards organization.

The user interfaces of different browsers are often very similar, with many common UI elements:

  • An address bar
  • Back and forward buttons
  • Bookmark option
  • Refresh and stop buttons
  • Home button

Browser High-level Architecture

The components that make up the browser are:

  • User interface: includes the address bar, back and forward buttons, bookmarks menu, etc. It covers every part of the browser display except the window in which you see the requested page
  • Browser engine: marshals actions between the UI and the rendering engine
  • Rendering engine: responsible for displaying the requested content. If the requested content is HTML, the rendering engine parses the HTML and CSS and displays the parsed content on the screen
  • Networking: handles network calls such as HTTP requests, using a platform-independent interface with platform-specific implementations underneath
  • UI backend: used for drawing basic widgets such as combo boxes and windows. It exposes a generic, platform-independent interface and uses the operating system’s user interface methods underneath
  • JavaScript engine: parses and executes JavaScript code
  • Data storage: a persistence layer. The browser may need to save all sorts of data locally, such as cookies. Browsers also support storage mechanisms such as localStorage, IndexedDB, WebSQL, and FileSystem

HTML parsing

The rendering engine starts getting the contents of the requested document from the networking layer, usually in 8kB chunks.

The primary job of the HTML parser is to parse the HTML document into a parse tree.

The parse tree is a tree of DOM element and attribute nodes. DOM is short for Document Object Model. It is the object representation of the HTML document and the interface of HTML elements to the outside world, such as JavaScript. The root of the tree is the “Document” object. The DOM has an almost one-to-one relation to the markup.

Parsing algorithm

HTML cannot be parsed using regular top-down or bottom-up parsers. The main reasons are:

  • The forgiving nature of the language
  • Browsers have traditionally tolerated errors in order to support well-known cases of invalid HTML
  • The parsing process is reentrant. In other languages the source does not change during parsing, but in HTML, dynamic code (such as a script element containing document.write() calls) can add content to the source, so the parsing process actually modifies the input

Unable to use regular parsing techniques, browsers use custom parsers for HTML. The parsing algorithm is described in detail in the HTML5 specification and consists of two stages: tokenization and tree construction.
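
To give a feel for the two stages, the sketch below lets Python’s standard html.parser module do the tokenization and builds a small tree from the resulting events. This is not the HTML5 algorithm: the class name, the dictionary-based node shape, and the short list of void elements are assumptions for illustration, and the error recovery is far simpler than a browser’s.

```python
from html.parser import HTMLParser

# A few elements that never have children or end tags (illustrative subset).
VOID_ELEMENTS = {"br", "hr", "img", "input", "link", "meta"}


class TreeBuilder(HTMLParser):
    """Build a nested dict tree from the tokens HTMLParser emits."""

    def __init__(self):
        super().__init__()
        self.root = {"tag": "#document", "children": []}
        self._stack = [self.root]  # currently open elements, innermost last

    def handle_starttag(self, tag, attrs):
        node = {"tag": tag, "attrs": dict(attrs), "children": []}
        self._stack[-1]["children"].append(node)
        if tag not in VOID_ELEMENTS:
            self._stack.append(node)

    def handle_endtag(self, tag):
        # Close back to the nearest matching open element; ignore stray end tags.
        for index in range(len(self._stack) - 1, 0, -1):
            if self._stack[index]["tag"] == tag:
                del self._stack[index:]
                break

    def handle_data(self, data):
        if data.strip():
            self._stack[-1]["children"].append({"tag": "#text", "text": data})


def parse_html(source: str) -> dict:
    builder = TreeBuilder()
    builder.feed(source)
    builder.close()  # flush any buffered text
    return builder.root
```

For example, parse_html("<p>Hi") returns a tree with a p node even though the element is never closed, which mirrors the forgiving behavior described above.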

After parsing

The browser begins fetching the external resources linked to the page (CSS, images, JavaScript files, etc.).

At this stage the browser marks the document as interactive and starts running the scripts that are in “deferred” mode, that is, those that should be executed after the document has been parsed. The document state is then set to “complete” and the browser fires a “load” event.

Note that there is never an “Invalid Syntax” error on an HTML page. The browser fixes any invalid content and carries on parsing.

CSS parsing

  • Parse CSS files, the contents of <style> tags, and the values of style attributes according to the CSS lexical and syntax grammar
  • Each CSS file is parsed into a StyleSheet object, which contains CSS rules with selectors and objects corresponding to the CSS grammar
  • A CSS parser can be either a top-down or a bottom-up parser produced by a parser generator

Page rendering

  • Create a “frame tree” or “render tree” by traversing the DOM nodes and calculating the CSS style values for each node
  • Calculate the preferred width of each node in the frame tree bottom-up by summing the preferred widths of its child nodes and the node’s horizontal margins, borders, and padding
  • Calculate the actual width of each node top-down by allocating each node’s available width to its children
  • Calculate the height of each node bottom-up by applying text wrapping and summing the heights of its child nodes and the node’s margins, borders, and padding
  • Calculate the coordinates of each node using the information calculated above
  • More complicated calculations are needed when elements are floated or positioned absolutely or relatively. See http://dev.w3.org/csswg/css2/ and http://www.w3.org/Style/CSS/current-work for more details
  • Create layers to describe which parts of the page can be animated as a group without having to be re-rasterized. Each frame object is assigned to a layer
  • A texture is allocated for each layer of the page
  • The frame objects of each layer are traversed and drawing commands are executed for their respective layer. The layer may be rasterized by the CPU or drawn on the GPU directly using D2D/SkiaGL
  • All of the above steps may reuse values calculated the last time the page was rendered, which saves a lot of computation
  • The final positions of the layers are calculated and a set of commands is issued via Direct3D/OpenGL. The GPU command buffers are flushed to the GPU for asynchronous rendering, and the frame is sent to the Window Server

GPU rendering

  • During the rendering process, the graphical computing layers can use either the general-purpose CPU or the graphics processor (GPU)
  • When the GPU is used for graphical rendering, the graphics software layers split the task into multiple pieces so they can take advantage of the GPU’s massive parallelism for the floating-point calculations that rendering requires

Window Server

Post-rendering and user-triggered processing

After rendering has completed, the browser executes JavaScript code as a result of some timing mechanism (such as a Google Doodle animation) or of user interaction (such as typing a keyword into the search box and receiving suggestions). Plugins such as Flash or Java may run as well, although not on the Google homepage. Scripts can trigger additional network requests, and they can also modify the page or its layout, causing another round of rendering and painting.

GitHub:
alex/what-happens-when






Copyright notice: This article is a collaborative translation and is released under the Creative Commons Zero license.