The domain name of a third-party service we depend on recently changed. Both the old and the new domain are accessed over HTTPS.

After a routine backend release was rolled out to multiple containers, we observed the following:

  • Some containers could not reach the new domain (and it stayed unreachable for as long as the service ran).

  • Other containers could reach it normally.

  • Even among containers on the same physical machine, some succeeded and some failed.

The exception message had two parts:

  • PKIX path building failed: PKI (Public Key Infrastructure) is an infrastructure built on public/private key pairs that enables reliable message transmission and identity verification, and PKIX is the PKI based on X.509, the standard format for digital certificates. The SSL/TLS certificates used by HTTPS follow this standard.

  • unable to find valid certification path to request target: no valid certification path to the requested target could be found, i.e. the certificate cannot be trusted.

This exception is thrown during SSL/TLS certificate verification. To understand it, we need to answer two questions:

  1. What does an HTTPS request look like, and why does it need certificate verification?

  2. Where and how does SSL/TLS certificate verification take place?

Let's start with HTTPS. HTTPS is more secure than HTTP because:

  • HTTP transmits data in clear text.

  • HTTPS encrypts data in transit:

      • The two parties hold the same session key and use it to encrypt and decrypt data with symmetric encryption.

      • To keep third parties from stealing the session key or tampering with the data, SSL/TLS is introduced (TLS is the successor to SSL 3.0). It sits between the HTTP layer and the TCP layer and secures the communication.

In short, HTTPS = HTTP + SSL/TLS. After the TCP connection is established, a TLS handshake establishes the session key (the key itself is never sent; both sides derive it from three random numbers exchanged during the handshake), and application data is then encrypted with that session key.

The session key is produced by the TLS handshake (commonly described as a four-step exchange), and our exception occurred in the certificate-verification step of that handshake.

The four-step TLS handshake

A quick review of the TLS handshake process:

[Figure: TLS handshake process]

The whole handshake is essentially a session-key exchange, during which three random numbers are generated:

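  1. The client random, generated by the client and sent in ClientHello.

  2. The server random, generated by the server and sent in ServerHello.

  3. The pre-master secret (PreMaster Key), a random number generated by the client and sent in ClientKeyExchange. Note that it is encrypted with the server's public key before transmission. (This describes RSA key exchange; with (EC)DHE cipher suites, both sides instead compute the pre-master secret via Diffie-Hellman.)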

After the handshake, the client and the server both hold these three random numbers. Each side independently derives the symmetric key for this session (the session key mentioned above) using the algorithm of the negotiated cipher suite, and from then on the session key is used to encrypt the communication.
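To make the derivation concrete, here is a minimal sketch of the TLS 1.2 pseudo-random function from RFC 5246 (P_SHA256, built on HMAC-SHA256). It turns the pre-master secret and the two hello randoms into the 48-byte master secret, from which the session keys are then expanded with the label "key expansion". The random inputs are made up, and the class and method names are hypothetical; this is an illustration of the computation, not JSSE's implementation.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

public class Tls12MasterSecret {

    // P_SHA256(secret, seed) from RFC 5246 section 5, truncated to `length` bytes.
    static byte[] pSha256(byte[] secret, byte[] seed, int length) throws GeneralSecurityException {
        Mac mac = Mac.getInstance("HmacSHA256");
        mac.init(new SecretKeySpec(secret, "HmacSHA256"));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] a = seed;                          // A(0) = seed
        while (out.size() < length) {
            a = mac.doFinal(a);                   // A(i) = HMAC(secret, A(i-1))
            mac.update(a);
            byte[] block = mac.doFinal(seed);     // HMAC(secret, A(i) + seed)
            out.write(block, 0, block.length);
        }
        return Arrays.copyOf(out.toByteArray(), length);
    }

    // master_secret = PRF(pre_master_secret, "master secret",
    //                     ClientHello.random + ServerHello.random)[0..47]
    static byte[] masterSecret(byte[] preMaster, byte[] clientRandom, byte[] serverRandom)
            throws GeneralSecurityException {
        byte[] label = "master secret".getBytes(StandardCharsets.US_ASCII);
        byte[] seed = new byte[label.length + clientRandom.length + serverRandom.length];
        System.arraycopy(label, 0, seed, 0, label.length);
        System.arraycopy(clientRandom, 0, seed, label.length, clientRandom.length);
        System.arraycopy(serverRandom, 0, seed, label.length + clientRandom.length, serverRandom.length);
        return pSha256(preMaster, seed, 48);
    }

    public static void main(String[] args) throws GeneralSecurityException {
        SecureRandom rnd = new SecureRandom();
        byte[] clientRandom = new byte[32];   // random #1, sent in ClientHello
        byte[] serverRandom = new byte[32];   // random #2, sent in ServerHello
        byte[] preMaster = new byte[48];      // random #3, the pre-master secret
        rnd.nextBytes(clientRandom);
        rnd.nextBytes(serverRandom);
        rnd.nextBytes(preMaster);
        // Each side runs the same derivation on the same three inputs.
        byte[] onClient = masterSecret(preMaster, clientRandom, serverRandom);
        byte[] onServer = masterSecret(preMaster, clientRandom, serverRandom);
        System.out.println("master secrets match: " + Arrays.equals(onClient, onServer));
    }
}
```

Because both sides feed identical inputs into a deterministic function, they arrive at the same secret without ever sending it.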

Note that the handshake messages exchanged before the switch to encryption (ChangeCipherSpec in TLS 1.2) are sent in clear text.

So, with the handshake in clear text, how can the client be sure that the party it is exchanging session keys with is the server it requested, and not a third party impersonating it? This is the job of certificate verification.

After ServerHello, the client receives the server's certificate and verifies its authenticity to confirm that the response really comes from the requested server. This is where the exception from the beginning of the article occurred, causing the handshake to fail.

Certificate authentication

An X.509 V3 compliant digital certificate contains the following basic contents:

[Figure: fields of an X.509 v3 certificate]

Here we focus on two of them: the subject's public key and the signature.

HTTPS relies on asymmetric (public-key) cryptography to verify digital certificates:

  • It uses a key pair: a private key and a public key.

  • Ciphertext encrypted with the private key can be decrypted only with the corresponding public key; conversely, ciphertext encrypted with the public key can be decrypted only with the corresponding private key.

  • The private key is kept secret by the certificate holder, while the public key is published.

As mentioned in the handshake section, the third random number, the pre-master secret, is encrypted with the server's public key, and that is exactly the public key in the server's certificate.
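The pre-master step can be sketched with the standard javax.crypto.Cipher API. This assumes RSA key exchange with PKCS #1 v1.5 padding ("RSA/ECB/PKCS1Padding"), which is the padding TLS RSA key exchange uses; the throwaway key pair stands in for the server certificate's keys, and the class name is hypothetical. JSSE does this internally with its own code.

```java
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.Cipher;

public class PreMasterExchange {

    // Client side: encrypt the pre-master secret with the public key from the server certificate.
    static byte[] encrypt(PublicKey serverPublicKey, byte[] preMaster) throws GeneralSecurityException {
        Cipher cipher = Cipher.getInstance("RSA/ECB/PKCS1Padding");
        cipher.init(Cipher.ENCRYPT_MODE, serverPublicKey);
        return cipher.doFinal(preMaster);
    }

    // Server side: only the holder of the matching private key can recover it.
    static byte[] decrypt(PrivateKey serverPrivateKey, byte[] ciphertext) throws GeneralSecurityException {
        Cipher cipher = Cipher.getInstance("RSA/ECB/PKCS1Padding");
        cipher.init(Cipher.DECRYPT_MODE, serverPrivateKey);
        return cipher.doFinal(ciphertext);
    }

    // Generates a throwaway server key pair and checks that the secret survives the round trip.
    static boolean roundTrip(byte[] preMaster) throws GeneralSecurityException {
        KeyPairGenerator gen = KeyPairGenerator.getInstance("RSA");
        gen.initialize(2048);
        KeyPair server = gen.generateKeyPair();
        byte[] onTheWire = encrypt(server.getPublic(), preMaster);
        return Arrays.equals(preMaster, decrypt(server.getPrivate(), onTheWire));
    }

    public static void main(String[] args) throws GeneralSecurityException {
        byte[] preMaster = new byte[48];
        new SecureRandom().nextBytes(preMaster);
        System.out.println("server recovered the pre-master secret: " + roundTrip(preMaster));
    }
}
```

An eavesdropper sees only `onTheWire`; without the server's private key it cannot recover the third random number, and so cannot derive the session key.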

Signature: proves that the certificate content has not been tampered with. When issuing the certificate, the authority encrypts the certificate's fingerprint (together with the fingerprint algorithm) with its own private key; the result is the signature.

  • Fingerprint: before issuing the certificate, the authority computes a hash of the certificate's original content with an algorithm such as SHA-1 or SHA-256. The hash is irreversible, and its purpose is to guarantee that the certificate content is complete and unchanged.
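A fingerprint is just a message digest, so it can be illustrated with java.security.MessageDigest. In this sketch the input strings stand in for certificate content (for a real certificate you would hash its encoded bytes, e.g. from X509Certificate#getEncoded()); the class name is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class Fingerprint {

    // Hex-encoded digest of `content` using an algorithm such as "SHA-1" or "SHA-256".
    static String hex(String algorithm, byte[] content) throws NoSuchAlgorithmException {
        byte[] digest = MessageDigest.getInstance(algorithm).digest(content);
        StringBuilder sb = new StringBuilder(digest.length * 2);
        for (byte b : digest) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        // Stand-ins for certificate content.
        byte[] original = "CN=xxx.com, valid until 2030".getBytes(StandardCharsets.UTF_8);
        byte[] tampered = "CN=xxx.com, valid until 2031".getBytes(StandardCharsets.UTF_8);
        System.out.println(hex("SHA-256", original));
        System.out.println(hex("SHA-256", tampered)); // one changed character, unrelated hash
    }
}
```

Changing a single character produces a completely different hash, which is what makes the fingerprint useful as an integrity check.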

When the client receives the server's certificate:

  1. First, it uses the issuer's public key (taken from the issuing CA's certificate) to decrypt the signature, obtaining the fingerprint and the fingerprint algorithm.

  2. Then it hashes the certificate's original content with that algorithm and compares the result with the fingerprint. If they match, the certificate has not been tampered with.
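In Java, java.security.Signature bundles both steps: with "SHA256withRSA" it hashes the content and checks the result against the signature using the issuer's public key. The sketch below assumes a freshly generated key pair playing the role of the issuing authority and uses short strings in place of certificate bodies; the class name is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.Signature;

public class IssuerSignature {

    // Issuing authority: hash the certificate body and sign it with the issuer's private key.
    static byte[] sign(PrivateKey issuerPrivateKey, byte[] body) throws GeneralSecurityException {
        Signature signer = Signature.getInstance("SHA256withRSA");
        signer.initSign(issuerPrivateKey);
        signer.update(body);
        return signer.sign();
    }

    // Client: re-hash the body and check it against the signature with the issuer's public key.
    static boolean verify(PublicKey issuerPublicKey, byte[] body, byte[] signature)
            throws GeneralSecurityException {
        Signature verifier = Signature.getInstance("SHA256withRSA");
        verifier.initVerify(issuerPublicKey);
        verifier.update(body);
        return verifier.verify(signature);
    }

    // Signs `issued`, then verifies that signature against the `presented` body.
    static boolean demo(String issued, String presented) throws GeneralSecurityException {
        KeyPairGenerator gen = KeyPairGenerator.getInstance("RSA");
        gen.initialize(2048);
        KeyPair issuer = gen.generateKeyPair();
        byte[] signature = sign(issuer.getPrivate(), issued.getBytes(StandardCharsets.UTF_8));
        return verify(issuer.getPublic(), presented.getBytes(StandardCharsets.UTF_8), signature);
    }

    public static void main(String[] args) throws GeneralSecurityException {
        System.out.println("untouched: " + demo("CN=xxx.com", "CN=xxx.com"));  // true
        System.out.println("tampered:  " + demo("CN=xxx.com", "CN=evil.com")); // false
    }
}
```

Note that verification uses the issuer's public key, not the public key inside the certificate being checked.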

Verifying the fingerprint ensures that the certificate content is intact and has not been tampered with, but it does not guarantee that the connection has not been intercepted by a man in the middle who holds his own certificate issued by the same CA. That certificate is also signed with the issuer's private key, so it passes the integrity check too.

To catch this, the client also compares the host name it requested with the domain names in the certificate (the subject CN or Subject Alternative Name entries) to confirm that the certificate belongs to the server it asked for. (If the man in the middle changed the domain name in his certificate, the certificate would fail the tamper check.)
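In Java this comparison is normally done for you, for example by the HostnameVerifier of HttpsURLConnection or by calling SSLParameters#setEndpointIdentificationAlgorithm("HTTPS"). The hypothetical helper below only illustrates the idea of matching one certificate name against the requested host; it is a simplified sketch that ignores IP addresses, internationalized names and the public-suffix restrictions real verifiers apply.

```java
import java.util.Locale;

public class HostnameMatch {

    // Simplified check of one certificate name (CN or SAN entry) against the requested host.
    // A leading "*." wildcard matches exactly one label, in the spirit of RFC 6125.
    static boolean matches(String certName, String requestedHost) {
        String name = certName.toLowerCase(Locale.ROOT);
        String host = requestedHost.toLowerCase(Locale.ROOT);
        if (!name.startsWith("*.")) {
            return name.equals(host);
        }
        String suffix = name.substring(1);                 // "*.example.com" -> ".example.com"
        if (!host.endsWith(suffix)) {
            return false;
        }
        String label = host.substring(0, host.length() - suffix.length());
        return !label.isEmpty() && label.indexOf('.') < 0; // exactly one extra label
    }

    public static void main(String[] args) {
        System.out.println(matches("*.example.com", "api.example.com")); // true
        System.out.println(matches("*.example.com", "a.b.example.com")); // false
        System.out.println(matches("xxx.com", "evil.com"));              // false
    }
}
```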

In total, three things are verified:

  1. Whether the certificate was issued by a trusted authority.

  2. Whether the certificate has been tampered with.

  3. Whether the certificate belongs to the server we requested rather than to a third party.

In practice, certificates are verified as a certificate chain, which usually has three levels:

  1. Root certificate: issued by a certificate authority (CA) to itself.

  2. Intermediate certificate: used by the CA to issue certificates to certificate holders. There can be multiple levels of intermediate certificates.

  3. End-entity certificate: the certificate holder's own certificate, i.e. the one the server presents for HTTPS.

[Figure: certificate chain verification]

Each level in the chain vouches for the level below it, so verification walks up the chain, checking each signature, until it reaches the root. Root certificates are pre-installed in operating systems and browsers. A certificate is trusted only if every certificate in its chain is trusted.
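The upward walk can be modelled with a deliberately simplified, hypothetical certificate type that records only a subject and an issuer. This sketch covers path building only (signature, validity and host-name checks are omitted) and shows where "unable to find valid certification path" comes from: the walk runs out of certificates before reaching a root in the local trust store. All names are made up.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class ChainWalk {

    // Hypothetical, stripped-down certificate: only the subject and the issuer that signed it.
    static final class Cert {
        final String subject;
        final String issuer;
        Cert(String subject, String issuer) { this.subject = subject; this.issuer = issuer; }
    }

    // Walks issuer links upward from the end-entity certificate until an issuer is a trusted root.
    static boolean buildsTrustedPath(Cert leaf, Map<String, Cert> sentByServer, Set<String> trustedRoots) {
        Cert current = leaf;
        for (int depth = 0; depth < 8; depth++) {     // bound the walk to avoid loops
            if (trustedRoots.contains(current.issuer)) {
                return true;                          // reached a pre-installed root
            }
            Cert parent = sentByServer.get(current.issuer);
            if (parent == null || parent == current) {
                return false;                         // no path to any trusted root
            }
            current = parent;
        }
        return false;
    }

    // The server sends leaf + intermediate; the root must already be in the local trust store.
    static boolean demo(Set<String> trustedRoots) {
        Cert leaf = new Cert("xxx.com", "Example Intermediate CA");
        Cert intermediate = new Cert("Example Intermediate CA", "Example Root CA");
        Map<String, Cert> sent = new HashMap<>();
        sent.put(leaf.subject, leaf);
        sent.put(intermediate.subject, intermediate);
        return buildsTrustedPath(leaf, sent, trustedRoots);
    }

    public static void main(String[] args) {
        Set<String> roots = new HashSet<>(Arrays.asList("Example Root CA", "Other Root CA"));
        System.out.println(demo(roots));                                  // true
        System.out.println(demo(Collections.singleton("Other Root CA"))); // false
    }
}
```

A self-signed certificate whose issuer is not in the trust store fails the same way: its only "parent" is itself.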

Back to the exception at the beginning of this article: why would the handshake succeed in some containers and fail in others? All containers request the same domain, so in theory they receive the same certificate, and the handshakes should either all succeed or all fail.

Could the failing containers have a different built-in certificate list, or read a different certificate store, so that they happen to lack the root certificate of this third-party domain's certificate chain? To check, we start with our service.

Our backend is written in Java. The JDK includes JSSE (Java Secure Socket Extension), Java's implementation of SSL/TLS, which provides the APIs and the implementation for SSL/TLS support. Certificates are handled by the TrustManager interface, which decides whether to trust the peer's certificate and loads trusted certificates from the following locations, in order of priority:

  1. The trust store (a repository of trusted certificates) specified by the javax.net.ssl.trustStore system property

  2. ${java.home}/lib/security/jssecacerts

  3. ${java.home}/lib/security/cacerts
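The priority order above can be expressed as a small function. This is a hypothetical sketch that mirrors the documented order only; it is not JSSE's code and ignores special values such as "NONE" as well as the trust store type and password properties. The existence check is passed in so the logic can be exercised without touching the file system.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.function.Predicate;

public class TrustStoreLookup {

    // Mirrors the documented priority order for the default trust store.
    static Path resolve(String trustStoreProperty, String javaHome, Predicate<Path> exists) {
        if (trustStoreProperty != null && !trustStoreProperty.isEmpty()) {
            return Paths.get(trustStoreProperty);        // 1. javax.net.ssl.trustStore
        }
        Path security = Paths.get(javaHome, "lib", "security");
        Path jssecacerts = security.resolve("jssecacerts");
        if (exists.test(jssecacerts)) {
            return jssecacerts;                          // 2. jssecacerts, if present
        }
        return security.resolve("cacerts");              // 3. cacerts
    }

    public static void main(String[] args) {
        Path store = resolve(System.getProperty("javax.net.ssl.trustStore"),
                             System.getProperty("java.home"),
                             p -> Files.exists(p));
        System.out.println("trust store candidate: " + store);
    }
}
```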

[Figure: trust store lookup in TrustStoreManager]

The JVM does not use the operating system's built-in certificates. Instead, by default it uses the root certificates bundled with the JDK in its cacerts trust store, and this list varies with the JDK version. You can view it with the keytool -list -v -keystore cacerts command.
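A programmatic counterpart to keytool is to ask JSSE itself which CAs it trusts: initializing a TrustManagerFactory with a null KeyStore makes it load the default trust store, and X509TrustManager#getAcceptedIssuers() returns the trusted CA certificates. A minimal sketch (the class name is hypothetical); diffing its output between a working and a failing container is one way to rule the trust store in or out.

```java
import java.security.KeyStore;
import java.security.cert.X509Certificate;
import javax.net.ssl.TrustManager;
import javax.net.ssl.TrustManagerFactory;
import javax.net.ssl.X509TrustManager;

public class DefaultTrustedRoots {

    // A null KeyStore makes the factory load the JDK's default trust store.
    static X509Certificate[] acceptedIssuers() throws Exception {
        TrustManagerFactory tmf =
                TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
        tmf.init((KeyStore) null);
        for (TrustManager tm : tmf.getTrustManagers()) {
            if (tm instanceof X509TrustManager) {
                return ((X509TrustManager) tm).getAcceptedIssuers();
            }
        }
        return new X509Certificate[0];
    }

    public static void main(String[] args) throws Exception {
        X509Certificate[] roots = acceptedIssuers();
        System.out.println(roots.length + " trusted CA certificates");
        for (X509Certificate root : roots) {
            System.out.println(root.getSubjectX500Principal().getName());
        }
    }
}
```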

On investigation, all our containers use the same JDK version with the same built-in certificate list, and none of them specifies a custom trust store, so differing built-in certificates can be ruled out.

Returning to the TLS handshake, we seem to have missed a common deployment pattern today: binding domain names to services.

A server rarely provides only one service. With nginx in particular, we often bind multiple domain names to multiple services on one server, and multiple domain names mean multiple certificates.

In this case, how does the server know during the TLS handshake which service's certificate the client wants?

TLS provides the Server Name Indication (SNI) extension for exactly this, allowing a single server to serve multiple domain names. The client includes the requested domain name in its first handshake message, ClientHello, so the server knows which certificate to return in its response.
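On the Java side, the names a client will offer in SNI are held in SSLParameters#getServerNames(). When a socket is created with a host name the JDK fills this in for you, but it can also be set explicitly with SNIHostName. A minimal sketch, assuming JDK 8 or later; xxx.com is the article's placeholder domain and the class name is hypothetical.

```java
import java.util.Collections;
import java.util.List;
import javax.net.ssl.SNIHostName;
import javax.net.ssl.SNIServerName;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLParameters;
import javax.net.ssl.SSLSocket;

public class SniParams {

    // Put `host` into the SNI server-name list carried by these parameters.
    static SSLParameters withSni(SSLParameters params, String host) {
        params.setServerNames(Collections.<SNIServerName>singletonList(new SNIHostName(host)));
        return params;
    }

    // First host name in the SNI list, or null when no SNI name is configured.
    static String firstServerName(SSLParameters params) {
        List<SNIServerName> names = params.getServerNames();
        if (names == null || names.isEmpty()) {
            return null;
        }
        SNIServerName first = names.get(0);
        return (first instanceof SNIHostName) ? ((SNIHostName) first).getAsciiName() : null;
    }

    public static void main(String[] args) throws Exception {
        // An unconnected socket: nothing is sent until connect() and the handshake.
        try (SSLSocket socket = (SSLSocket) SSLContext.getDefault().getSocketFactory().createSocket()) {
            socket.setSSLParameters(withSni(socket.getSSLParameters(), "xxx.com"));
            System.out.println("SNI host name: " + firstServerName(socket.getSSLParameters()));
        }
    }
}
```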

We can observe this with openssl. A request without SNI (openssl s_client -showcerts -connect xxx.com:443) returns a certificate the client does not expect, and the correct certificate is returned only when SNI is supplied (openssl s_client -showcerts -connect xxx.com:443 -servername xxx.com). Note that OpenSSL 1.1.1 and later send SNI by default when -connect is given a host name; add -noservername there to reproduce the request without SNI.

[Figure: openssl s_client output with and without SNI]

This gives a basic explanation of the anomaly described at the beginning: the server behind the third party's new domain hosts multiple certificates. For some reason, the SNI extension failed in some of our containers, so their handshake carried no domain name; the server answered with Cloud Shield's self-signed certificate, and certificate verification failed.

As mentioned above, our backend is written in Java and accesses the third-party domain through HttpClient. Next, let's look at what in the code can cause the SNI extension to fail.

JDK and HttpClient support for SNI extensions

JDK

JDK 1.7 was the first version to support SNI, but early JDK 1.8 releases (before 1.8 (u14)) have a bug, JDK-8144566 (bugs.openjdk.java.net/browse/JDK-…), in which a custom HostnameVerifier disables the SNI extension. Other things to watch for:

  • The system property jsse.enableSNIExtension defaults to true, which enables the SNI extension; see the JSSE documentation (docs.oracle.com/javase/8/do…).

  • Setting the system property javax.net.debug=ssl prints the handshake details.

  • javax.net.ssl.SSLSocketFactory offers the following ways to construct a socket:

[Figure: javax.net.ssl.SSLSocketFactory socket construction methods]
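Since the outcome hinges on these system properties, a service can log them at startup. A minimal sketch (the class name is hypothetical) that only reads the properties and does not change JSSE's behavior; it treats an unset jsse.enableSNIExtension as enabled, matching the default described above, and the value false (any letter case) as disabled.

```java
public class SniPropertyCheck {

    static final String SNI_PROPERTY = "jsse.enableSNIExtension";

    // Unset counts as enabled (the documented default); "false" counts as disabled.
    static boolean sniEnabled() {
        String value = System.getProperty(SNI_PROPERTY);
        return value == null || !value.equalsIgnoreCase("false");
    }

    public static void main(String[] args) {
        System.out.println(SNI_PROPERTY + " = " + System.getProperty(SNI_PROPERTY)
                + " -> SNI " + (sniEnabled() ? "enabled" : "DISABLED"));
        System.out.println("javax.net.debug = " + System.getProperty("javax.net.debug"));
        if (!sniEnabled()) {
            System.err.println("WARNING: TLS handshakes will not carry the requested host name (SNI)");
        }
    }
}
```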

HttpClient

HttpClient has supported SNI since version 4.3.2; see HTTPCLIENT-1119 (issues.apache.org/jira/browse…).

  • In older versions of HttpClient, sockets are created by default with the JDK's SSLSocketFactory#createSocket(), so the handshake does not carry SNI.

  • Newer versions use SSLConnectionSocketFactory#createLayeredSocket(Socket socket, String host, int port, HttpContext context). Internally it layers an SSLSocket over a plain socket by calling the JDK's SSLSocketFactory#createSocket(Socket s, String host, int port, boolean autoClose), so the host name is known and the handshake supports SNI.
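The difference between the two bullets comes down to whether the host name reaches the JDK when the SSLSocket is created. The sketch below calls the JDK's layered SSLSocketFactory#createSocket(Socket, String, int, boolean), the call HttpClient's createLayeredSocket makes, and reads back the names the socket is configured to offer in SNI. To stay offline, it layers over a loopback connection to a local ServerSocket and never starts the handshake; the class and host names are placeholders. With jsse.enableSNIExtension=false, the configured names may still not be sent.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;
import javax.net.ssl.SNIHostName;
import javax.net.ssl.SNIServerName;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLSocket;
import javax.net.ssl.SSLSocketFactory;

public class LayeredSni {

    // Layers TLS over an already-connected plain socket; `host` becomes the SNI name.
    static SSLSocket layer(Socket plain, String host, int port)
            throws IOException, NoSuchAlgorithmException {
        SSLSocketFactory factory = SSLContext.getDefault().getSocketFactory();
        // autoClose = true: closing the SSLSocket also closes the plain socket.
        return (SSLSocket) factory.createSocket(plain, host, port, true);
    }

    // Host names the socket is configured to offer in the ClientHello SNI extension.
    static List<String> sniNames(SSLSocket socket) {
        List<String> out = new ArrayList<>();
        List<SNIServerName> names = socket.getSSLParameters().getServerNames();
        if (names != null) {
            for (SNIServerName name : names) {
                if (name instanceof SNIHostName) {
                    out.add(((SNIHostName) name).getAsciiName());
                }
            }
        }
        return out;
    }

    // Offline demo: the loopback "server" never speaks TLS, and no handshake is started.
    static List<String> demo(String host) throws IOException, NoSuchAlgorithmException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback);
             Socket plain = new Socket(loopback, server.getLocalPort());
             SSLSocket tls = layer(plain, host, 443)) {
            return sniNames(tls);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("SNI names: " + demo("example.com"));
    }
}
```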

Conclusion

Finally, suspecting an SNI failure, we reviewed how our code uses these APIs and found the cause of the problem described at the beginning: in certain special cases, our code dynamically sets jsse.enableSNIExtension=false. As a result, some containers sent no SNI when requesting the third-party domain, received the wrong (Cloud Shield) certificate, and failed the handshake.

If you hit a similar certificate verification error and you are sure the certificate itself is fine, check whether SNI is the problem.