Some notes from a Bubble Education course, organized for study

First of all, we need to know that the application layer is the seventh layer of the OSI seven-layer network model. Different types of network applications follow different communication rules, so application-layer protocols are diverse: DNS, FTP, Telnet, SMTP, HTTP, and others each solve their own class of problems.

Basic principles of the HTTP communication protocol

HTTP is widely used in remote communication scenarios, including communication within today's mainstream microservice architectures. Because we use it so often, most of us already understand HTTP fairly well, so here I will go straight through its basic principles.

The communication flow of a single HTTP request

Let’s start with a question: when we type in a web address, how does the browser display the content of the target web address? Where does the content come from?

Let's sketch the process, starting with DNS. DNS (Domain Name System), like HTTP, is an application-layer protocol. It provides resolution from domain names to IP addresses. Users usually reach another computer by host name or domain name rather than directly by IP address, because a name made of letters and digits is easier to remember than the pure numbers of an IP address. Computers, however, find names relatively hard to work with and are better at processing long strings of numbers. DNS came into being to bridge this gap: it provides lookup of IP addresses by domain name, and reverse lookup of domain names by IP address.
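The name-to-address lookup described above can be tried from code. The following is a minimal sketch using Python's standard `socket` module, which asks the operating system's resolver; the helper name `resolve` is made up for this illustration, and `localhost` is used so that the example does not depend on Internet access.

```python
import socket


def resolve(host: str) -> list[str]:
    """Return the distinct IPv4 addresses the system resolver reports for host."""
    infos = socket.getaddrinfo(host, None, family=socket.AF_INET, type=socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


if __name__ == "__main__":
    # "localhost" normally resolves locally, without contacting a DNS server.
    print(resolve("localhost"))
```

Passing a real domain name instead of `localhost` triggers an actual DNS query, which is the step the browser performs before it can open a connection to the server.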

Composition of the HTTP communication protocol

Now that we have seen how HTTP works, we should also know that HTTP is an application-layer protocol that relies on TCP, a reliable protocol, at the transport layer. Since it is a protocol, it should fit the definition of one: a protocol is an agreement between two programs that need to communicate over a network, defining how messages are exchanged and what they mean. With that in mind, we will analyze the principles and composition of HTTP in more depth.

Locating resources with the request URI

When we enter an address in the browser, how does the browser find the resource corresponding to that address and return it to the user? And what useful information does the address contain?

This requires us to understand Uniform Resource Locator (URL), which is used to describe resources on a network.

A URI identifies an Internet resource with a string, while a URL gives the resource's location (where it lives on the Internet). URLs are therefore a subset of URIs. Example: www.baidu.com:80/java/index….

scheme://host[:port#]/path/…/[;url-params][?query-string][#fragment]

  • scheme: Specifies the protocol used at the application layer (for example: http, https, ftp)
  • host: IP address or domain name of the HTTP server
  • port#: The default port of an HTTP server is 80, in which case the port number can be omitted. If another port is used it must be specified, for example www.cnblogs.com:8080/
  • path: Path for accessing the resource
  • query-string: Query string
  • fragment: Fragment identifier, usually used to mark a sub-resource (a location within the document) inside the retrieved resource

From such a URL we can read that the user is using HTTP to access a resource served by the process listening on the given port of the specified server, and that the request carries parameters.

MIME Type

After the server locates the file for the resource the user requested, it returns the resource to the client browser, which parses and renders it. But a server holds many kinds of resources: images, video, JS, CSS, text, and so on. How does the browser recognize the type so that it can render each one appropriately? The answer is the MIME type, an Internet standard that describes the type of message content.

  • Text files: text/html, text/plain, text/css, application/xhtml+xml, application/xml
  • Image files: image/jpeg, image/gif, image/png
  • Video files: video/mpeg, video/quicktime

Two headers are involved in deciding how content is rendered:

  • The first is Accept
    • Accept: indicates the types of data the client is willing to accept. In other words, it tells the server which media types the client wants, and the server produces data of the specified media type according to the Accept request header.
  • The second is Content-Type
    • Content-Type: indicates the type of the entity data being sent. For example, we have probably all written code like response.setContentType("application/json; charset=utf-8"), which declares that the data returned by the server is JSON.

If Accept and Content-Type do not match, for example the server returns text/html while the client accepts only image/gif, the browser will not be able to parse the response.
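The matching between Accept and Content-Type can be sketched in a few lines. This is a simplified illustration, not how any particular browser or server implements it: the function name `acceptable` is invented here, and quality values such as `q=0.8` are deliberately ignored.

```python
def acceptable(accept_header: str, content_type: str) -> bool:
    """Return True if content_type matches any media range in an Accept header.

    Simplification: quality values (q=...) and other parameters are ignored.
    """
    media_type = content_type.split(";")[0].strip().lower()
    main, _, sub = media_type.partition("/")
    for item in accept_header.split(","):
        media_range = item.split(";")[0].strip().lower()
        if not media_range:
            continue
        range_main, _, range_sub = media_range.partition("/")
        # "*" is a wildcard for either half of the media type.
        if range_main in ("*", main) and range_sub in ("*", sub):
            return True
    return False


if __name__ == "__main__":
    print(acceptable("image/gif", "text/html; charset=utf-8"))
    print(acceptable("text/html,application/xhtml+xml,*/*;q=0.8", "application/json"))
```

The first call models the mismatch described above (client accepts only image/gif, server sends text/html); the second shows how a `*/*` wildcard lets any type through.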

What if the user visits an address that doesn’t exist?

If the user accesses a correct address and the server parses and processes the request properly, the correct content is returned to the client. But what if there is a problem with the address, or the server runs into trouble parsing the request or executing its logic? How should the server tell the browser, and the browser tell the user, that the request failed?

This is where the concept of the status code comes in.

The job of a status code is to describe the result of the server's processing when a client sends it a request. From the status code, the browser knows whether the server handled the request normally or an error occurred. Common status codes:

  • 200: Everything is fine
  • 301: Permanent redirect
  • 404: Requested resource does not exist
  • 500: The server has an internal error

With status codes, the browser can give the user friendly feedback when a visit to a site does not go normally.
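Python's standard library ships the registered status codes in `http.HTTPStatus`, which makes it easy to see the code, its reason phrase, and the class it belongs to. The `describe` helper below is an illustrative name, not part of any library.

```python
from http import HTTPStatus

# The first digit of a status code determines its class.
STATUS_CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def describe(code: int) -> str:
    """Return a one-line description such as '404 Not Found (client error)'."""
    status = HTTPStatus(code)
    return f"{status.value} {status.phrase} ({STATUS_CLASSES[code // 100]})"


if __name__ == "__main__":
    for code in (200, 301, 404, 500):
        print(describe(code))
```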

Telling the server the intent of the current request

With URLs, MIME types, and status codes, users' basic needs can be met. In many cases, however, a website does not just fetch resources from the server and render them; it also needs to submit data, delete data, and so on. So HTTP defines eight methods that express different kinds of operations, with GET and POST being the most common.

  • GET: Usually used by the client to send a URI and obtain a resource from the server (generally for query operations). The amount of data GET can carry is limited, and the limit is determined by the browser
  • POST: Generally used by the client to transfer an entity to the server for the server to save (usually for create operations)
  • PUT: Sends data to the server, generally used to update data
  • DELETE: The client sends a DELETE request asking the server to delete some data (usually for delete operations)
  • HEAD: Obtains only the message headers
  • OPTIONS: Asks which methods are supported
  • TRACE: Traces the request path
  • CONNECT: Asks a proxy to establish a tunnel

In the REST architectural style, there are strict rules about how to set up appropriate request methods for different request types. It is also to avoid confusion caused by misuse.

Why the REST architectural style emerged

Personally, I think the reasons are:

  1. With the popularity of service-oriented architectures, HTTP is being used more and more frequently
  2. Many people define interfaces over HTTP incorrectly: all sorts of ad hoc names such as getUserInfoById and deleteById, and a mix of stateful and stateless requests
  3. The rules that HTTP itself provides are not put to good use

To address these problems, a set of rules was defined. These rules introduce nothing new; they simply place some constraints on how HTTP itself is used, for example:

  1. REST is resource-oriented, with each URI representing a resource
  2. It emphasizes statelessness: the server must not store information from one client request and use it in that client's other requests
  3. It emphasizes that a URL exposes a resource, so verbs should not appear in the URI
  4. It makes sensible use of HTTP status codes and request methods

So when you adopt the REST style by following this standard, understand what you are following and which problems it is meant to solve.

The full composition of the HTTP protocol

Having worked through the above, we now basically understand what makes up the HTTP protocol. To summarize briefly: HTTP has two kinds of messages, the request message and the response message.

The request message

The request message consists of three parts (start line, header fields, body).

The response message

The response message has the same layout and is also divided into three parts (status line, header fields, body).
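The three-part layout is easy to see when a message is assembled or taken apart by hand. The sketch below builds a raw request and parses a raw response; both helper names are invented for illustration, the host name is hypothetical, and real parsers handle many edge cases (repeated headers, folded lines, encodings) that are skipped here.

```python
def build_request(method: str, path: str, headers: dict[str, str], body: bytes = b"") -> bytes:
    """Assemble start line + header fields + blank line + body."""
    lines = [f"{method} {path} HTTP/1.1"]
    lines += [f"{name}: {value}" for name, value in headers.items()]
    head = "\r\n".join(lines) + "\r\n\r\n"  # an empty line separates headers from body
    return head.encode("ascii") + body


def parse_response(raw: bytes) -> tuple[int, str, dict[str, str], bytes]:
    """Split a raw response into status code, reason phrase, headers, and body."""
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    _version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return int(code), reason, headers, body


if __name__ == "__main__":
    print(build_request("GET", "/index.html", {"Host": "www.example.com"}))
    print(parse_response(b"HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\nhello"))
```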

Extensions in the HTTP protocol

In addition to these two basic components, HTTP has many other commonly used features and settings; I will briefly list some of them.

What if the transferred files are too large

Resource files returned by the server can be fairly large; some JS files, for example, are several megabytes. Files that are too large hurt transmission efficiency and consume bandwidth. What can be done?

  1. The common approach is to compress the file to reduce its size. How do compression and decompression work together? First, the server must support compression. Second, the browser must be able to decompress what it receives. The browser uses the Accept-Encoding request header to tell the server which encodings it supports, for example Accept-Encoding: gzip, deflate. The server then picks a suitable encoding from that list and compresses the response with it. Common encodings are gzip and deflate.

  2. Chunked transfer: When transferring large amounts of data, the data can be divided into multiple chunks so that the browser can display the page progressively. This ability to split the entity body into chunks is called Chunked Transfer Coding.
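Both techniques can be sketched with the standard library. The first part round-trips a payload through `gzip`; the second encodes and decodes a body in the chunked format (hexadecimal size line, chunk data, and a zero-size chunk to finish). The function names and the sample payload are assumptions made for this illustration, and chunk extensions and trailers are not handled.

```python
import gzip


def encode_chunked(body: bytes, chunk_size: int = 16) -> bytes:
    """Encode body using chunked transfer coding with fixed-size chunks."""
    out = bytearray()
    for start in range(0, len(body), chunk_size):
        chunk = body[start:start + chunk_size]
        out += f"{len(chunk):X}\r\n".encode("ascii") + chunk + b"\r\n"
    out += b"0\r\n\r\n"  # a zero-length chunk terminates the body
    return bytes(out)


def decode_chunked(data: bytes) -> bytes:
    """Reassemble a chunked body (no chunk extensions or trailers)."""
    body = bytearray()
    pos = 0
    while True:
        line_end = data.index(b"\r\n", pos)
        size = int(data[pos:line_end].decode("ascii"), 16)
        if size == 0:
            return bytes(body)
        start = line_end + 2
        body += data[start:start + size]
        pos = start + size + 2  # skip the CRLF that follows the chunk data


if __name__ == "__main__":
    payload = b"function f() { return 1; }\n" * 1000  # made-up, highly repetitive JS
    compressed = gzip.compress(payload)
    print(len(payload), "->", len(compressed), "bytes after gzip")
    print(encode_chunked(b"hello world", 5))
```

A server would announce these choices with the `Content-Encoding: gzip` and `Transfer-Encoding: chunked` response headers respectively.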

Should a connection be established on each request?

In the earliest versions of HTTP, every HTTP exchange required establishing a new TCP connection. Each connection needs a three-way handshake, which adds communication overhead. HTTP/1.1 therefore switched to persistent connections: once a connection is established, the TCP connection stays open as long as neither the client nor the server explicitly asks to close it.

One of the biggest benefits of persistent connections is that they greatly reduce connection establishment and closure latency.

Among the transport-related headers of an HTTP/1.1 message is Connection: keep-alive, which indicates that the sender wants the connection to be persistent. In HTTP/1.1, persistent connections are enabled by default: unless otherwise specified, HTTP/1.1 assumes that all connections are persistent. To close a connection after a transaction ends, an HTTP/1.1 application must explicitly add a Connection: close header to the message.

When an HTTP/1.1 client receives a response, the connection remains open unless the response contains a Connection: close header. However, clients and servers can still close idle connections at any time. Not sending Connection: close does not mean the server promises to keep the connection open forever.

Pipelining: HTTP/1.1 allows request pipelining on persistent connections. Previously, after sending a request you had to wait for its response before sending the next one. With pipelining, the next request can be sent without waiting for the previous response, so several requests can be in flight at once instead of being handled strictly one response after another.
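The rules for whether a connection stays open can be written down as a small decision function. This is a sketch of the logic only (the name `is_persistent` is invented here); the HTTP/1.0 branch goes slightly beyond the text above and assumes the common convention that an HTTP/1.0 connection is kept open only when keep-alive is requested explicitly.

```python
def is_persistent(version: str, headers: dict[str, str]) -> bool:
    """Decide whether the connection should stay open after this message."""
    lowered = {name.lower(): value for name, value in headers.items()}
    tokens = {token.strip().lower() for token in lowered.get("connection", "").split(",")}
    if "close" in tokens:
        return False            # an explicit close always wins
    if version == "HTTP/1.1":
        return True             # persistent by default
    # Assumption for HTTP/1.0: persistent only when keep-alive is requested.
    return "keep-alive" in tokens


if __name__ == "__main__":
    print(is_persistent("HTTP/1.1", {}))
    print(is_persistent("HTTP/1.1", {"Connection": "close"}))
    print(is_persistent("HTTP/1.0", {"Connection": "keep-alive"}))
```

Even when this function says a connection is persistent, either side may still close an idle connection, as noted above.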

HTTP protocol features

HTTP is a stateless protocol

The HTTP protocol is stateless. What is stateless? This means that the HTTP protocol itself does not store the communication state between the request and the response.

But today's applications are all stateful. If HTTP were purely stateless, these applications would be almost unusable. Imagine visiting an e-commerce site: you log in, then go shopping, and when you click to add an item to the cart you are prompted to log in again. Nobody would put up with that user experience. So how do we add state on top of a stateless protocol?

Cookies, supported by the client

Cookie technology was introduced into HTTP to work around its statelessness. Client state is managed by writing cookie information into request and response messages. The server tells the client to store a cookie through the Set-Cookie header field in its response. The next time the client sends a request to that server, it automatically adds the cookie value to the request message.

Sessions, supported by the server
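The two directions of the cookie exchange can be sketched with Python's `http.cookies` module: the server side builds the value of a Set-Cookie header, and on a later request it parses the Cookie header the client sends back. The helper names and the cookie values are made up for this example.

```python
from http.cookies import SimpleCookie


def make_set_cookie(name: str, value: str) -> str:
    """Build the value of a Set-Cookie response header."""
    cookie = SimpleCookie()
    cookie[name] = value
    cookie[name]["path"] = "/"
    cookie[name]["httponly"] = True  # hide the cookie from page scripts
    return cookie[name].OutputString()


def parse_cookie_header(header: str) -> dict[str, str]:
    """Parse the value of a Cookie request header into a name -> value dict."""
    cookie = SimpleCookie()
    cookie.load(header)
    return {key: morsel.value for key, morsel in cookie.items()}


if __name__ == "__main__":
    print("Set-Cookie:", make_set_cookie("JSESSIONID", "abc123"))
    print(parse_cookie_header("JSESSIONID=abc123; theme=dark"))
```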

In what way does the server store state?

In jsp/servlet containers such as Tomcat, the container provides sessions, which the server stores in a hash-table-like structure. When a program needs to create a session for a client request, the server first checks whether the request already carries a session identifier, the session id.

If the request already contains a session id, a session has previously been created for this client, so the server looks up that session by its session id and uses it (if it cannot be found, a new one is created).

If the client request does not contain a session id, the server creates a session for this client and generates a session id associated with it. The session id is a string that does not repeat and has no easily discovered pattern, so it is hard to forge. The session id is returned to the client to be saved.

Analysis of Tomcat's session implementation

We take HttpServletRequest#getSession() as the entry point for analyzing how a Session is created. The HttpServletRequest our application receives is an org.apache.catalina.connector.RequestFacade (unless some Filter has wrapped it specially), which is a facade over org.apache.catalina.connector.Request.

First, check whether the Request object already has a Session; if one exists and has not been invalidated, return it. If there is no Session, try to find one by requestedSessionId. If one is found, return it; if not, create a new Session and add its sessionId to a Cookie. Subsequent requests carry this Cookie, so the originally created Session can be found from the sessionId in the Cookie.

Basic analysis of the HTTPS protocol

Because HTTP communicates in plaintext over TCP/IP, and given how the TCP/IP protocol family works, the content of a communication can be intercepted anywhere along the communication path. Wireshark, for example, can be used to capture the content of requests and responses.

HTTPS, the secure transport protocol

Because HTTP communication is not secure, people came up with HTTPS, which encrypts the transmission channel so that information is not leaked or tampered with in transit. HTTPS is an encrypted hypertext transfer protocol; the difference from HTTP is that HTTPS fully encrypts the data while it is being transferred. Network protocols are layered, and both HTTP and HTTPS sit on top of the TCP transport layer. HTTPS inserts a security layer on top of TCP, SSL (Secure Sockets Layer) or TLS (Transport Layer Security), and is used in combination with it to build the encrypted channel.

SSL (Secure Sockets Layer) was designed by Netscape. Later the Internet standards organization ISOC took over from Netscape and released TLS, the upgraded version of SSL. TLS has since been revised several times. In fact, today's HTTPS uses the TLS protocol, but because SSL appeared first, the name SSL is still widely used as a synonym for HTTPS encryption.

Deriving the design of HTTPS in reverse

Let's not dig into the implementation of SSL yet. Instead, let's think from the designer's point of view about how to build a secure transmission channel.

Start with the first message

Client A sends a message to server B, and the message may be intercepted and tampered with in transit. How can we make sure that even if the message from client A to server B is intercepted, its content cannot be read?

Use symmetric encryption

To ensure that a message cannot be viewed or tampered with by a third party, the first idea is to encrypt the content. The server also needs to be able to decrypt the message, so we can use a symmetric encryption algorithm, in which the key S is used for both encryption and decryption. With a secret key S, is security guaranteed?
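The defining property of symmetric encryption, one key S for both directions, can be shown with a toy cipher. The sketch below XORs the message with a keystream derived from the key using SHA-256; it is an assumption-laden teaching example written for this note, not a vetted algorithm (no nonce, no integrity protection), and real systems use ciphers such as AES from a cryptography library.

```python
import hashlib


def keystream(key: bytes, length: int) -> bytes:
    """Derive `length` pseudo-random bytes from the key (toy construction)."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def xor_cipher(key: bytes, data: bytes) -> bytes:
    """Encrypt or decrypt: XOR with the keystream is its own inverse."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))


if __name__ == "__main__":
    key_s = b"shared secret S"               # hypothetical shared key
    ciphertext = xor_cipher(key_s, b"hello server B")
    print(ciphertext.hex())
    print(xor_cipher(key_s, ciphertext))     # the same key recovers the message
```

Anyone holding S can both read and forge messages, which is exactly the weakness the following paragraphs explore.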

It’s not that simple

In the Internet world, communication is never quite that simple.

Many clients connect to the server, and one of them may be an attacker lurking among them. If he also holds the symmetric key S, the scheme above no longer works. What if the server uses a different encryption algorithm for each client? That seems to solve the problem perfectly, but then how are the keys distributed? How does the server tell each client which symmetric encryption algorithm to use? The answer seems to be to negotiate after the session is established.

The negotiation process is not secure

Negotiation means that the key is distributed dynamically over the network, but the negotiation itself is not secure. How do we solve that?

Asymmetric encryption

Asymmetric encryption algorithms have the following characteristics: The ciphertext encrypted with the private key can be decrypted as long as there is a public key, but the ciphertext encrypted with the public key can be decrypted only with the private key. A private key can be owned by only one person, while a public key can be distributed to everyone.
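These two properties can be checked with textbook RSA on deliberately tiny numbers. The primes and exponent below are the usual classroom values, chosen here only for illustration; keys this small offer no security, and real RSA also requires padding that this sketch omits.

```python
# Textbook RSA with tiny primes, for illustration only.
P, Q = 61, 53
N = P * Q                      # modulus, part of both keys
PHI = (P - 1) * (Q - 1)
E = 17                         # public exponent: (E, N) is the public key
D = pow(E, -1, PHI)            # private exponent: (D, N) is the private key


def apply_public(value: int) -> int:
    """Encrypt for the key owner, or check something the owner signed."""
    return pow(value, E, N)


def apply_private(value: int) -> int:
    """Decrypt a message sent to the owner, or sign on the owner's behalf."""
    return pow(value, D, N)


if __name__ == "__main__":
    ciphertext = apply_public(65)      # anyone can do this with the public key
    print(ciphertext, apply_private(ciphertext))
    signature = apply_private(123)     # only the private key holder can do this
    print(apply_public(signature))     # anyone can undo it with the public key
```

The first pair of calls is the "public key encrypts, only the private key decrypts" direction; the second pair is the "private key encrypts, any public key holder decrypts" direction, which is the basis of digital signatures.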

In this way, messages sent by clients A and B to the server are secure. It seems we have solved the key negotiation problem with asymmetric encryption. But here's the next problem.

How do I get the public key?

With asymmetric encryption, how do clients A and B obtain the public key securely? Thinking it through step by step, there are two options:

  1. The server sends the public key to each client
  2. The server puts the public key on a remote server, and the client can request it (one more request, and the public key placement problem).

Option one doesn’t seem feasible because, again, the transmission process is not secure, right? The public key may be switched.

Bring in a third party

At this point the key question is: how does the client know whether whoever handed it the public key is Huang Rong or Xiaolongnu? Must the sender prove its own identity? Or could a third party, one that is absolutely impartial, do the verifying? Bringing in a trusted third party is a good idea.

The server's public key is encrypted with the private key of a third-party organization and then sent to the client. This content, the server's public key encrypted with the third party's private key, is a crude version of a digital certificate, and the certificate contains the server's public key. Because the certificate was encrypted with the third party's private key, the client needs the third party's public key to decrypt it. How is the third party's public key delivered? (Assume for now that it is built into the system.) There is another problem as well: the third party issues certificates to everyone, not just to one party. What if an attacker applies for a certificate too?

What if the wrongdoer got a certificate?

If the attacker also obtains a certificate, he can swap it in for yours. The client then cannot tell whether it is receiving your certificate or a middleman's, because both can be decrypted with the third party's public key.

Verify the validity of the certificate

At this point the question becomes: how does the client check that a certificate is genuine? In real life, the authenticity of something is usually verified by its number (a university diploma, for instance, or checking whether a digital product is counterfeit). As I have said before, solutions in computing tend to mirror real life, so the same idea applies here: what if we add a certificate number to the digital certificate? Would that solve the problem?

The certificate states how the certificate number is generated from the certificate's content. After obtaining the certificate, the client generates a number using the method stated on the certificate. If the number it generates equals the number on the certificate, the certificate is genuine. This is similar to MD5 verification: when we download a software package, the site provides an MD5 value, and after downloading we can use a third-party tool to compute the MD5 value ourselves and compare.
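The "recompute the number and compare" check can be sketched with `hashlib`. One substitution is made on purpose: the text mentions MD5, but this example uses SHA-256, since MD5 is no longer considered safe against deliberate collisions. The function names and the sample certificate content are invented for the illustration.

```python
import hashlib
import hmac


def certificate_number(content: bytes) -> str:
    """Derive the 'certificate number' as a digest of the certificate content."""
    return hashlib.sha256(content).hexdigest()


def is_genuine(content: bytes, claimed_number: str) -> bool:
    """Recompute the number from the content and compare it with the claimed one."""
    return hmac.compare_digest(certificate_number(content), claimed_number)


if __name__ == "__main__":
    content = b"subject=www.example.com; public-key=..."  # made-up content
    number = certificate_number(content)
    print(is_genuine(content, number))
    print(is_genuine(content.replace(b"example", b"evil"), number))
```

On its own this only detects accidental or naive changes; an attacker could recompute the number too, which is why the number is additionally encrypted with the third party's private key.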



On the issuing side, the certificate data is run through the MD5 algorithm to produce an MD5 value, which becomes the certificate number. The certificate number is encrypted with the third-party organization's private key, and the algorithm used to generate the number is recorded in the certificate.

On the client side, the CA public key built into the browser decrypts the certificate that was encrypted with the CA's private key, and the browser then applies the certificate-number algorithm named in the certificate to check the certificate number returned by the server.

Where is the public key certificate of the third party?

Browsers and operating systems maintain a list of trusted third-party authorities (including their public keys). Because the certificate the client receives names its issuing authority, the client looks up that authority's public key locally. Having come this far, you will have recognized that the certificate is the HTTPS digital certificate, the certificate number is the digital signature, and the third party is the certificate authority (CA) that issues digital certificates.

Analysis of how HTTPS works

HTTPS certificate application process

  1. Generate a CSR file on the server (the certificate request file, containing the certificate's public key, the hash and signature algorithm to be used, the domain name applied for, company name, job title, and so on).
  2. Upload the CSR file and any other required credentials to the CA. On receiving the request, the CA applies the hash algorithm named in the request to produce a digest of part of the content, then signs the digest with the CA's own private key (the result is equivalent to the certificate's unique number).
  3. The CA sends the signed certificate to the applicant by email.
  4. After receiving the certificate, the applicant deploys it to their own web server.

Client request interaction process

  1. The client initiates a request (Client Hello)
    • A) Three-way handshake to establish the TCP connection
    • B) Supported protocol versions (TLS/SSL)
    • C) A random number generated by the client, client.random, later used to generate the "session key"
    • D) Encryption algorithms supported by the client
    • E) The session id, used to resume the same session (after the client and server have gone to the trouble of establishing an HTTPS connection, it would be a pity to lose it as soon as it is built)
  2. The server receives the request and responds (Server Hello)
    • A) Confirms the protocol version of the encrypted channel
    • B) A random number generated by the server, server.random, later used to generate the "session key"
    • C) Confirms the encryption algorithm to use (also used to sign subsequent handshake messages to prevent tampering)
    • D) The server certificate (the certificate issued to the server by the CA)
  3. The client receives the certificate and verifies it
    • A) Verifies whether the certificate was issued by a trusted parent CA. When verifying a certificate, the browser calls the system's certificate manager interface to verify every certificate in the certification path, level by level. Only if all certificates in the path are trusted is the overall result trusted
    • B) The certificate returned by the server contains its validity period, so the expiry date can be checked to see whether the certificate has expired
    • C) Verifies whether the certificate has been revoked
    • D) We know that when a CA issues a certificate, it signs the certificate with its private key. The signature algorithm field in the certificate, sha256RSA, says that the CA used sha256 to digest the certificate and then used the RSA algorithm to sign the digest with its private key. We also know that in RSA, a signature made with the private key can only be verified with the matching public key.
    • E) The browser uses the CA's public key built into the operating system to verify the signature on the server's certificate, determining whether the certificate was issued by a legitimate authority. Verification also tells the client that the CA digested the certificate with sha256, so the client digests the certificate content with sha256 itself. If the value it obtains equals the one returned by the server, the certificate has not been modified
    • F) After verification passes, the browser displays the word "Secure" in green
    • G) The client generates a random number. After verification passes, the client generates a random number called the pre-master secret. From client.random + server.random + pre-master it generates a symmetric key, and it encrypts the pre-master secret with the public key in the certificate. Using the previously negotiated HASH algorithm, it computes the HASH of the handshake messages, encrypts "handshake messages + HASH of handshake messages (signature)" with the symmetric key, and sends everything to the server (the HASH of the handshake messages serves as a signature to verify that the handshake was not tampered with in transit).
  4. The server receives the random number
    • A) After receiving the encrypted data from the client, the server decrypts it with its own private key and obtains the pre-master secret; together with client.random and server.random it derives the same symmetric key. It then computes the HASH of the handshake messages and compares it with the HASH the client sent to make sure they match.
    • B) It then encrypts a handshake message (handshake messages + HASH of handshake messages) with the symmetric key and sends it to the client
  5. The client receives the message
    • A) The client decrypts it with the symmetric key and computes the HASH of the handshake messages. If it matches the HASH sent by the server, the handshake is complete
    • B) All subsequent communication is encrypted with the session key, which is derived by an algorithm from the pre-master secret, client.random, and server.random produced during the earlier exchange and is used as the symmetric key from then on
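The key point of the handshake is that both sides end up with the same session key without ever sending it over the network. The sketch below illustrates that idea with HMAC-SHA256 as a stand-in derivation function. It is explicitly not the real TLS key derivation (TLS defines its own PRF and key schedule), and the function names, label string, and sample handshake bytes are assumptions made for this example.

```python
import hashlib
import hmac
import secrets


def derive_session_key(pre_master: bytes, client_random: bytes, server_random: bytes) -> bytes:
    """Stand-in derivation: NOT the real TLS PRF, just the same three inputs."""
    return hmac.new(pre_master, b"session key" + client_random + server_random, hashlib.sha256).digest()


def handshake_hash(session_key: bytes, handshake_messages: bytes) -> bytes:
    """Keyed hash of the handshake transcript, used to detect tampering."""
    return hmac.new(session_key, handshake_messages, hashlib.sha256).digest()


if __name__ == "__main__":
    client_random = secrets.token_bytes(32)   # sent in Client Hello
    server_random = secrets.token_bytes(32)   # sent in Server Hello
    pre_master = secrets.token_bytes(48)      # sent encrypted with the server's public key

    client_key = derive_session_key(pre_master, client_random, server_random)
    server_key = derive_session_key(pre_master, client_random, server_random)
    print(client_key == server_key)

    transcript = b"ClientHello|ServerHello|Certificate"  # placeholder bytes
    print(hmac.compare_digest(handshake_hash(client_key, transcript),
                              handshake_hash(server_key, transcript)))
```

An eavesdropper sees both random numbers in the clear, but without the pre-master secret it cannot compute the same key.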

HTTPS in practice

Next, to better understand how HTTPS works, we configure an HTTPS certificate on Nginx. In a production environment, SSL certificates must be purchased from a third-party certificate authority. They come as professional OV certificates (the enterprise name is not displayed in the browser address bar) and advanced EV certificates (the enterprise name can be displayed). The number of domain names a certificate protects also affects the price (for example, a certificate for www only and a wildcard * certificate are priced differently), and third-level domain names are not supported. In the browser, a warning mark indicates that the certificate is expired or invalid, while yellow indicates that some connections on the site still use HTTP. If you have bought your own domain name, you can apply for a free certificate on Aliyun.

In order to demonstrate the certificate application process, we directly use OpenSSL as the certificate authority to make the certificate. However, the certificate is not trusted, so the browser will prompt the error that the certificate is not trusted.

Certificate application process

Generate the server certificate request file and private key file

    openssl req -nodes -newkey rsa:2048 -out myreq.csr -keyout privatekey.key

  • req: requests a digital certificate
  • rsa:2048: the encryption algorithm and key length
  • out: the request file to write
  • keyout: the private key file to write
  • myreq.csr: the certificate signing request. This is not a certificate; it is what you submit to the authority to obtain a signed certificate. Its main content is a public key
  • privatekey.key: the private key that matches the public key

The CSR (certificate request file) is what you submit to the CA, and its file name usually ends in csr. It contains all the information needed to apply for a certificate. The most important item is the domain name, which must be the domain name you want to serve over HTTPS. The KEY file is the server's private key. It is critically important: if the KEY file is not kept safe, it cannot be recovered.

Running this command prompts for a small amount of information.


Simulating a CA and creating the CA's certificate

The CA uses its own key pair to sign (encrypt with its private key) the public key submitted by the certificate applicant. So to simulate a CA's workflow, we first need to create a CA certificate. The OpenSSL configuration file is /etc/pki/tls/openssl.cnf

With the default openssl CA configuration, setting up a CA requires creating the corresponding files in the specified directories

  1. Create the required files

    touch /etc/pki/CA/index.txt
    Creates the certificate index database file

    echo 01 > /etc/pki/CA/serial
    Sets the serial number of the first certificate to be issued. It must be a two-digit hexadecimal number; for example, 99 is followed by 9A
  2. Generate the CA's private key

    cd /etc/pki/CA/
    openssl genrsa -out /etc/pki/CA/private/cakey.pem 2048
  3. Generate a self-signed certificate

    openssl req -new -x509 -key /etc/pki/CA/private/cakey.pem -days 365 -out /etc/pki/CA/cacert.pem

    You are prompted for country, province, city, company name, department name, and the CA host name (issuer name)
  4. Issue the certificate

    openssl ca -policy policy_anything -in myreq.csr -out mycert.crt -days 365

    policy: the policy_anything policy allows the signing CA and the site certificate to have different countries, place names, and so on

    out: the certificate file issued by the CA

    days: the validity period of the certificate

Configuring HTTPS in nginx

Configure the server block in nginx.conf, and place the certificate mycert.crt and the private key privatekey.key in the directory the configuration points to

    server {
        listen 443 ssl;
        ssl on;    # used by older nginx such as 1.11; obsolete since nginx 1.15, where "listen ... ssl" is enough
        ssl_certificate cert/mycert.crt;
        ssl_certificate_key cert/privatekey.key;
        ssl_session_cache shared:SSL:1m;
        ssl_session_timeout 5m;
        ssl_ciphers HIGH:!aNULL:!MD5;
        ssl_prefer_server_ciphers on;
        location / {
            root html;
            index index.html index.htm;
        }
    }

Small tips

Many students ran into questions while trying this out, asking why certificate files end in pem, crt, or key. In fact, X.509 certificates have two encoding formats, PEM and DER. But the extension used when creating a certificate or private key does not have to be pem or der: a certificate file may use the extension pem, der, crt, or cer, and a private or public key file may use pem, der, or key. Only the encoding inside differs.
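The relationship between the two encodings is simple: DER is the binary form, and PEM is the same bytes Base64-encoded between header and footer lines. Python's `ssl` module has helpers for converting between them, shown below. The bytes used here are placeholder data, not a real certificate; the helpers only re-encode and do not validate the content.

```python
import ssl

# Placeholder bytes standing in for a DER-encoded certificate (not a real one).
der_bytes = bytes(range(48))

# DER -> PEM: Base64 text wrapped in BEGIN/END CERTIFICATE lines.
pem_text = ssl.DER_cert_to_PEM_cert(der_bytes)

# PEM -> DER: strip the wrapper and decode the Base64 body.
round_trip = ssl.PEM_cert_to_DER_cert(pem_text)

if __name__ == "__main__":
    print(pem_text)
    print(round_trip == der_bytes)
```

This is why the extension alone says little: a file named mycert.crt may contain either form, and opening it in a text editor quickly shows which one it is.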

Nginx adds HTTPS support

To add certificate support to an already installed Nginx, follow these steps

  1. /data/program/nginx/sbin/nginx -V
     Check which modules the existing nginx was compiled with, so that none of them is left out and causes problems
  2. cd /data/program/nginx-1.11
     Enter the directory of the nginx source package downloaded earlier
  3. ./configure --prefix=/data/program/nginx --with-http_stub_status_module --with-http_ssl_module
     Reconfigure the build, adding ssl module support
  4. make
     Run make only. Do not run make install, otherwise the previously installed nginx will be overwritten
  5. cp /data/program/nginx/sbin/nginx /data/program/nginx/sbin/nginx.bak
     Back up the original nginx binary
  6. cp objs/nginx /data/program/nginx/sbin/
     Replace the nginx binary with the newly built one
  7. /data/program/nginx/sbin/nginx -V
     Verify again that the required modules have been compiled in