Introduction

A single-instance service has to become a clustered service before it can handle high concurrency. Clustering also lays a solid foundation for high availability. Most popular services and middleware, RabbitMQ and Redis among them, provide clustering capabilities.

WildFly, an industrial-strength application server, is no exception. While studying Keycloak clustering recently, we found that Keycloak also runs on WildFly underneath. This article discusses the architecture of a Keycloak cluster.

Clusters in Keycloak

Keycloak can run in two modes: standalone and domain.

The difference between them is whether the deployment files are managed centrally. If the deployment files have to be copied by hand to each server, that is standalone mode. If they are managed and rolled out automatically from one place, that is domain mode.

In standalone mode, the cluster is configured in the file standalone/configuration/standalone-ha.xml.

In domain mode, the configuration lives on the machine that runs the domain controller, in the file domain/configuration/domain.xml.

Let's look at the cluster-specific components that the HA configuration uses:

<profile name="full-ha">
    ...
    <subsystem xmlns="urn:jboss:domain:modcluster:5.0">
        <proxy name="default" advertise-socket="modcluster" listener="ajp">
            <dynamic-load-provider>
                <load-metric type="cpu"/>
            </dynamic-load-provider>
        </proxy>
    </subsystem>

    <subsystem xmlns="urn:jboss:domain:infinispan:11.0">
        ...
    </subsystem>

    <subsystem xmlns="urn:jboss:domain:jgroups:8.0">
        <channels default="ee">
            <channel name="ee" stack="udp" cluster="ejb"/>
        </channels>
        <stacks>
            <stack name="udp">
                ...
            </stack>
            <stack name="tcp">
                ...
            </stack>
        </stacks>
    </subsystem>
    ...
</profile>

The main components are mod_cluster, Infinispan, and JGroups.

In addition, Keycloak offers a form of clustering called cross-data center mode.

This mode is mainly used when the service spans several data centers, for example geographically separate sites, which gives it strong disaster-recovery capability.

Having looked at the basic cluster setup for Keycloak, let's go through some of the key concepts behind a Keycloak cluster and how they are used.

Load balancing

A cluster has multiple servers at the back end, so when a user reaches our service through a client, which server should handle the request?

This is where load balancing comes in.

Generally speaking, there are three load balancing methods:

The first is client-side load balancing. The client already knows the addresses of all the servers and picks one each time it sends a request.

This pattern typically relies on a capable client library that implements the routing, as Memcached clients do.

The magic of Memcached comes from two-stage hashing. Memcached is like a huge hash table with lots of <key,value> pairs. With keys, arbitrary data can be stored or queried.

Clients can store data on multiple memcached nodes. When querying data, the client first computes a hash of the key (the first-stage hash) and uses it to select a node from the node list. The client then sends the request to the selected node, and that node finds the actual item through its internal hash table (the second-stage hash).
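The two stages can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any particular memcached client is implemented: the node names, the `pick_node` helper, and the `FakeNode` and `Client` classes are hypothetical, and a plain modulo is used for node selection where real clients often use consistent hashing.

```python
import hashlib

# Hypothetical node list; a real client would read this from configuration.
NODES = ["cache-a:11211", "cache-b:11211", "cache-c:11211"]


def pick_node(key: str, nodes: list) -> str:
    """Stage one: the client hashes the key to choose a node."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:4], "big") % len(nodes)
    return nodes[index]


class FakeNode:
    """Stands in for one cache server; its dict plays the stage-two hash table."""

    def __init__(self) -> None:
        self.items = {}


class Client:
    """Toy client that routes every key to exactly one node."""

    def __init__(self, nodes: list) -> None:
        self.nodes = nodes
        self.servers = {name: FakeNode() for name in nodes}

    def set(self, key: str, value: str) -> None:
        # Stage one picks the node, stage two is the dict insert on that node.
        self.servers[pick_node(key, self.nodes)].items[key] = value

    def get(self, key: str):
        # The same key always hashes to the same node, so the lookup finds it.
        return self.servers[pick_node(key, self.nodes)].items.get(key)
```

Because the node is derived from the key alone, any client holding the same node list reaches the same server without the servers coordinating with each other.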

The second type is proxy service load balancing. In this mode, a proxy server connects to multiple back-end services. The client interacts with the proxy server, and the proxy server selects which service to route to on behalf of the client.

Proxy software of this kind includes the familiar nginx and Apache HTTPD, WildFly with mod_cluster, and HAProxy, as well as hardware load balancers.
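As a rough sketch of what the proxy does when it chooses a back end, here is the simplest possible policy, round robin. The `RoundRobinProxy` class and the node names are made up for illustration; real proxies usually apply richer policies, such as the CPU load metric configured for mod_cluster earlier in this article.

```python
import itertools


class RoundRobinProxy:
    """Toy proxy: hands each incoming request to the next back end in turn."""

    def __init__(self, backends):
        backends = list(backends)
        if not backends:
            raise ValueError("at least one back end is required")
        self._cycle = itertools.cycle(backends)

    def route(self, request: str) -> str:
        # The client never sees this choice; it only talks to the proxy.
        return next(self._cycle)
```

For example, with two back ends `node1` and `node2`, four consecutive calls to `route` return `node1`, `node2`, `node1`, `node2`.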

The third is routing load balancing. In this mode, the client connects to a randomly chosen back-end server, and that server then routes the request internally to the other servers as needed.

In this mode, the load-balancing logic has to be implemented on the servers themselves.
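A minimal sketch of server-side routing, assuming a hash-based ownership rule that the original text does not specify: whichever node receives a request works out which node owns the key and forwards the request there. `ClusterNode`, `owner_of`, and `receive` are hypothetical names used only for this example.

```python
import hashlib


class ClusterNode:
    """Toy server that forwards a request to the node owning its key."""

    def __init__(self, name: str, peers: dict) -> None:
        self.name = name
        self.peers = peers  # shared dict: node name -> ClusterNode (includes self)
        self.handled = []

    def owner_of(self, key: str) -> str:
        # Assumed policy: every node computes the same owner from the key.
        names = sorted(self.peers)
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return names[int.from_bytes(digest[:4], "big") % len(names)]

    def receive(self, key: str) -> str:
        owner = self.owner_of(key)
        if owner == self.name:
            self.handled.append(key)
            return self.name
        # Internal hop: the client is unaware of this forwarding step.
        return self.peers[owner].receive(key)
```

Whichever node the client happens to contact, the request ends up on the same owner, which is why the client can choose a server at random.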

Exposing the client IP address

Whichever load-balancing mode is used, our business logic may need the IP address the client is connecting from.

Some features depend on the user's IP address, for example recording user operations in an audit log; if the real address cannot be obtained, the wrong one is recorded. IP-based authentication and abuse prevention depend on it as well.

This becomes a problem when a reverse proxy sits in front of the service, because the server sees the proxy's address rather than the client's. The reverse proxy therefore has to be configured to set the two HTTP headers X-Forwarded-For and X-Forwarded-Proto correctly.

The server then fetches the client’s real IP address from X-Forwarded-For.
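How a server might read that header can be sketched as follows. This is an illustration rather than what Undertow or Keycloak does internally: the `client_ip` function, the plain-dict header lookup, and the idea of a configured `trusted_proxies` set are assumptions made for the example.

```python
def client_ip(headers: dict, peer_addr: str, trusted_proxies: set) -> str:
    """Return the client address, honouring X-Forwarded-For only from trusted proxies."""
    # A direct connection from an unknown peer: ignore the header, it may be forged.
    if peer_addr not in trusted_proxies:
        return peer_addr
    forwarded = headers.get("X-Forwarded-For", "")
    hops = [h.strip() for h in forwarded.split(",") if h.strip()]
    # Each proxy appends the address it saw, so walk from the right
    # and return the first hop that is not one of our own proxies.
    for hop in reversed(hops):
        if hop not in trusted_proxies:
            return hop
    return peer_addr
```

The header is trusted only when the connection comes from a known proxy, since a client that connects directly can put anything it likes in X-Forwarded-For.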

In Keycloak, if requests are forwarded over HTTP, you can configure it like this:

<subsystem xmlns="urn:jboss:domain:undertow:10.0">
   <buffer-cache name="default"/>
   <server name="default-server">
      <ajp-listener name="ajp" socket-binding="ajp"/>
      <http-listener name="default" socket-binding="http" redirect-socket="https"
          proxy-address-forwarding="true"/>
      ...
   </server>
   ...
</subsystem>

If requests are forwarded over AJP, for example with Apache HTTPD and mod_cluster, configure it like this:

<subsystem xmlns="urn:jboss:domain:undertow:10.0">
     <buffer-cache name="default"/>
     <server name="default-server">
         <ajp-listener name="ajp" socket-binding="ajp"/>
         <http-listener name="default" socket-binding="http" redirect-socket="https"/>
         <host name="default-host" alias="localhost">
             ...
             <filter-ref name="proxy-peer"/>
         </host>
     </server>
     ...
     <filters>
         ...
         <filter name="proxy-peer"
                 class-name="io.undertow.server.handlers.ProxyPeerAddressHandler"
                 module="io.undertow.core" />
     </filters>
 </subsystem>

Sticky sessions and non-sticky sessions

In a session-based environment such as a web application, you also need to think about session sharing once the back end is a cluster.

Sessions are normally kept locally on each server, so how can multiple servers share them?

One way is for all servers to store sessions in the same external caching system, such as Redis. In this way, no matter which server the user accesses, the same session data can be read.

The cache system can be a single node or a cluster, and if several data centers are involved it can even be synchronized across them.

Cache synchronization is of course a good idea, but synchronization naturally has overhead. Is there a simpler and more convenient way to deal with it? For example, if one user only accesses the same server, does that solve the cache synchronization problem?

This pattern, in which a given user is always routed to the same server, is called sticky sessions. In this mode, real-time session synchronization is not a concern. However, if a server goes down, the sessions of its users are lost, so some session synchronization is still needed; it just does not have to happen in real time.
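One simple way to picture sticky routing is to derive the back end from the session ID. This is a sketch under assumptions, not the mechanism Keycloak or mod_cluster actually uses: the `sticky_backend` function, the node names, and the hash-based choice are all hypothetical.

```python
import hashlib


def sticky_backend(session_id: str, backends: list, down: frozenset = frozenset()) -> str:
    """Map a session ID to the same live back end on every request."""
    alive = [b for b in backends if b not in down]
    if not alive:
        raise RuntimeError("no back end available")
    digest = hashlib.sha256(session_id.encode("utf-8")).digest()
    preferred = backends[int.from_bytes(digest[:4], "big") % len(backends)]
    if preferred in alive:
        return preferred
    # Failover: the user lands on another node, and the session survives
    # only if its data was replicated there beforehand.
    return alive[int.from_bytes(digest[4:8], "big") % len(alive)]
```

As long as the preferred node is up, every request for a session goes to it; the failover branch is exactly the case in which the non-real-time session synchronization mentioned above pays off.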

Sticky sessions have another drawback: a request that arrives over a back channel (a server-to-server request rather than one from the user's browser) carries no session information, so it cannot be routed stickily. In that case the data must be replicated so that the request sees consistent data no matter which server receives it.

Shared databases

All applications need to save data. Generally speaking, we have two kinds of data:

One is database data, which stores user information permanently.

The other is cached data, which acts as a buffer between the database and the application.

Either kind of data can live in a cluster, with multiple servers reading and writing it at the same time. Shared data therefore raises the question of how updates are propagated within the cluster.

There are two update modes for cluster data:

One is the reliability-first, Active/Active mode, where data updated on one node is immediately synchronized to the other nodes.

The other is the performance-first, Active/Passive mode, where data updated on one node is not immediately synchronized to the other nodes.

In the reliability-first mode, an update request succeeds only once every node in the cluster has reported that the update succeeded. In the performance-first mode, the request succeeds as soon as the primary copy is updated, and the other nodes synchronize with the primary asynchronously.
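The difference between the two update modes can be shown with a small in-memory model. It is a sketch only: the `Replica` and `Cluster` classes are hypothetical, and `flush` stands in for the background replication that a real system would run on its own.

```python
from collections import deque


class Replica:
    """One node's copy of the data."""

    def __init__(self) -> None:
        self.data = {}

    def apply(self, key, value) -> None:
        self.data[key] = value


class Cluster:
    def __init__(self, replica_count: int) -> None:
        self.primary = Replica()
        self.replicas = [Replica() for _ in range(replica_count)]
        self._pending = deque()

    def write_sync(self, key, value) -> None:
        """Reliability first: return only after every node has applied the update."""
        self.primary.apply(key, value)
        for replica in self.replicas:
            replica.apply(key, value)

    def write_async(self, key, value) -> None:
        """Performance first: return once the primary is updated."""
        self.primary.apply(key, value)
        self._pending.append((key, value))

    def flush(self) -> None:
        """Stand-in for the asynchronous replication step."""
        while self._pending:
            key, value = self._pending.popleft()
            for replica in self.replicas:
                replica.apply(key, value)
```

After `write_async` returns, a read from a replica can still miss the new value until `flush` has run; that window is the price paid for the faster write.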

Keycloak uses Infinispan as its cache and defines several session-related caches, each with its own synchronization strategy:

  • authenticationSessions: this cache stores data about a user who is in the process of logging in. If sticky sessions are used, the data does not need to be synchronized.

  • actionTokens: this cache is needed when users have to confirm an action asynchronously by email, for example in the forgot-password flow. Because the token in such an operation can be used only once, the data has to be synchronized.

  • Other (non-authentication) session information: replication is required because sticky sessions cannot be guaranteed.

  • loginFailures: tracks failed login attempts by users and does not need to be replicated.

When data is stored in a cache, stale entries have to be invalidated after the underlying data is updated.

Keycloak uses a separate work cache for this. The work cache is synchronized across all data centers and does not store the actual data, only notifications about which data is to be invalidated. Each node reads the list of invalidated data from the work cache and invalidates the corresponding entries in its own data cache.
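The invalidation flow just described can be modelled in a few lines. This is a simplified sketch, not Keycloak's or Infinispan's implementation: `WorkCache`, `Node`, and the shared `database` dict are hypothetical, and nodes here poll for notices on each read instead of receiving them as events.

```python
class WorkCache:
    """Shared channel that carries only invalidation notices, never the data itself."""

    def __init__(self) -> None:
        self._notices = []

    def publish(self, key) -> None:
        self._notices.append(key)

    def read_from(self, offset: int) -> list:
        return self._notices[offset:]


class Node:
    """One server with a local data cache in front of a shared database."""

    def __init__(self, work_cache: WorkCache, database: dict) -> None:
        self.work = work_cache
        self.db = database
        self.local = {}
        self._seen = 0  # how many notices this node has already processed

    def _apply_invalidations(self) -> None:
        notices = self.work.read_from(self._seen)
        self._seen += len(notices)
        for key in notices:
            self.local.pop(key, None)  # evict; the next read reloads from the database

    def get(self, key):
        self._apply_invalidations()
        if key not in self.local:
            self.local[key] = self.db[key]
        return self.local[key]

    def update(self, key, value) -> None:
        self.db[key] = value
        self.local.pop(key, None)
        self.work.publish(key)  # only the key travels, not the value
```

Only keys pass through the work cache, so the traffic between nodes stays small even when the cached values themselves are large.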

Multicasting

Finally, IP multicast is required if the cluster needs to discover and manage nodes dynamically. JGroups, for example, can be used for this.

Conclusion

Keycloak is built on WildFly, which already ships with many powerful industrial-grade components. WildFly's design philosophy is to separate business logic from general production-level concerns (high availability, load balancing, cache clustering, message queues, and so on), so that developers can focus on writing business logic and leave everything else to the server.

Studying well-designed server frameworks like these is a good way to pick up new ideas.

Author: Flydean program stuff

Link to this article: www.flydean.com/keycloak-cl…

Source: Flydean’s blog
