This is my understanding of O'Reilly's book Reactive Microservices Architecture, which describes the design principles of microservices and distributed systems. Book address: info.lightbend.com/COLL-20XX-R… .

Microservices can be more responsive than traditional monolithic applications, allowing small services to have a big impact. With faster networks, cheaper disks and RAM, multi-core processors, and the explosion of cloud infrastructure, microservices are no longer constrained by these physical limits and have begun to be adopted at scale.

Microservices share the same intent as SOA: decoupling, isolation, composition, integration, decentralization, and autonomy. But SOA has often been misunderstood and misused, especially when an ESB is used to bridge (complex, inefficient, unstable) protocol calls between multiple monolithic systems, making those systems very complex. As hardware and software architecture have evolved over the years, essentially all systems have become distributed, bringing many new challenges with them. New ideas and concepts are needed to address these issues, and the reactive principles described in this book are exactly such ideas for distributed systems. Reactive principles are not new either; the Actor model, as found in Erlang, is one form of reactive design. Microservices are an architectural application of the reactive principles that borrows from SOA and builds on modern infrastructure (cloud services, automation tools, and so on).

Definition of reactive microservices

One of the key principles of microservices architecture is to divide the system into isolated, independent subsystems that communicate through well-defined protocols. Isolation is the prerequisite for resilient, elastic systems, and it requires asynchronous communication boundaries between services. Decoupling should therefore happen along two dimensions:

  • Time: allowing concurrency.
  • Space: allowing distribution and mobility, i.e., services that can be moved at any time.

In addition, microservices need to eliminate shared state to minimize the cost of coordinating and communicating with each other, aiming for a “share nothing” architecture.

Isolate all the things

Isolation is the most important feature of microservices architecture. Not only is it the foundation of many of the benefits microservices offer, it is also the aspect with the greatest impact on design and architecture. It even shapes organizational structure, as Conway's Law observes:

The structure of the system is a reflection of the team's organizational structure.

Failure isolation is a design pattern modeled on “bulkheads” (the watertight partitions in a ship's hull): errors and failures are contained so that they cannot spread to other services and cause a wider outage.

“Bulkheads” have been used on ships for centuries to create watertight compartments that contain hull damage and other leaks. The compartments are completely isolated from each other, so even if one floods, the water does not spill into the others and the ship can keep functioning as a whole.

Resilience (the ability to recover from failure) depends on this bulkhead-style failure isolation, and it requires breaking away from synchronous communication. As a result, microservices typically use asynchronous message passing across their boundaries, which keeps error capture and error handling out of the normal business logic.

Furthermore, isolation between services makes “continuous delivery” straightforward: a service can be deployed at any time without worrying about disrupting normal business. Isolated services are also easier to monitor, debug, test, and deploy, and easier to scale.

Act autonomously

The isolation described above is a prerequisite for autonomy. Only when services are completely isolated from each other is complete autonomy possible, including independent decisions, independent actions, and coordination with other services to solve problems.

An autonomous service only guarantees the correctness of its published protocols/APIs. This not only makes it easier to understand and model collaborating systems, it also allows conflicts and failures to be located and repaired within a single service.

Autonomous services bring great flexibility in service orchestration, workflow management, and collaboration, as well as scalability, availability, and runtime manageability. The cost is that each service's API must be well defined and composable, which can be challenging to design.

Do one thing, and do it well

As the Unix philosophy goes: write programs that do one thing and do it well, then let them work together to complete the task. This echoes the Single Responsibility Principle (SRP) in object-oriented design.

A big question in microservices is how to size a service correctly. For example, what granularity counts as “micro”? How many lines of code can still be considered a microservice? But “micro” really refers to the scope of responsibility, following the Unix philosophy and SRP: do one thing and do it well.

Each service should have a single reason to exist and provide a set of related functions, without mixing unrelated business concerns and responsibilities. Services built this way are collectively scalable, resilient, easy to understand, and easy to maintain.

Own your state exclusively

A key part of microservices is state. Many microservices are stateful entities that encapsulate both state and behavior. Under the “stateless” design philosophy, many services push their state down into a large shared database, which is what many traditional web frameworks do. This makes scalability, availability, and data integrity hard to control: a microservice architecture built on a shared database is essentially still a monolithic application.

The logical approach is for a service with a single responsibility to own its state and persistence mechanism, modeled as a Bounded Context with its own domain model and domain language. These ideas come from Domain-Driven Design (DDD); microservices are heavily influenced by DDD, and many of the concepts used to describe them, such as Bounded Context, come from DDD.

When accessing a service, you can only politely ask for its state; you cannot reach in and take it. Each service is therefore free to use techniques such as Event Sourcing and Command Query Responsibility Segregation (CQRS) and to choose its own state representation and storage (RDBMS, NoSQL, time-series database, event log).

Decentralized data management and persistence (polyglot persistence) offer many advantages. The storage medium can be chosen according to each service's own needs, and a service together with its data can be treated as a single unit. At the same time, one service must not access another service's database directly, only its API (enforced through specifications, policies, and code review).

An event log is a way of storing messages. We can store the messages sent to a service (its Commands) as they arrive, which is Command Sourcing. Alternatively, we can ignore the command itself and let it execute and affect the service first; if it triggers a state change, we capture that change as an Event and store it in the event log, which is Event Sourcing.

Storing the messages provides a full history of all interactions with the service. The stored messages also form the service's transaction history, which can be queried, audited, and replayed for elastic scaling, debugging, and redundancy.

Command Sourcing and Event Sourcing differ semantically: replaying commands replays their side effects, while replaying events only re-applies the state changes. Which sourcing technique to use depends on the specific scenario.

Using an event log also avoids the object-relational impedance mismatch that often plagues ORMs. In most cases the event log is the best persistence model for microservices because it fits asynchronous messaging naturally.
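
A minimal event-sourcing sketch in plain Java, under assumed names (Account, Deposited, Withdrawn are illustrative, not from the book): state changes are captured as events appended to a log, applying an event only mutates state, and current state can be rebuilt at any time by replaying the log.

```java
import java.util.ArrayList;
import java.util.List;

// Events are immutable facts about something that has already happened.
sealed interface AccountEvent permits Deposited, Withdrawn {}
record Deposited(long amount) implements AccountEvent {}
record Withdrawn(long amount) implements AccountEvent {}

class Account {
    private final List<AccountEvent> eventLog = new ArrayList<>(); // append-only event log
    private long balance;                                          // current state, derived from events

    // Command handlers validate intent, then append and apply the resulting event.
    void deposit(long amount) { append(new Deposited(amount)); }

    void withdraw(long amount) {
        if (amount > balance) throw new IllegalStateException("insufficient funds");
        append(new Withdrawn(amount));
    }

    private void append(AccountEvent event) {
        eventLog.add(event);   // persist the event (append-only; here just in memory)
        apply(event);          // update the in-memory state
    }

    // Applying an event only changes state and has no other side effects, so replay is safe.
    private void apply(AccountEvent event) {
        if (event instanceof Deposited d) balance += d.amount();
        else if (event instanceof Withdrawn w) balance -= w.amount();
    }

    // Rebuild state at any point by replaying the log from the beginning.
    static Account replay(List<AccountEvent> log) {
        Account account = new Account();
        log.forEach(account::append);
        return account;
    }

    long balance() { return balance; }
    List<AccountEvent> log() { return List.copyOf(eventLog); }
}
```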

Embrace asynchronous messaging

The best mechanism for communication between microservices is asynchronous message passing. As mentioned above, asynchronous boundaries between services decouple them in both space and time, improving overall system performance.

Asynchronous, non-blocking execution and I/O make the most efficient use of resources by minimizing the time spent blocked on shared resources (contention and blocking are the biggest barriers to scalability, low latency, and high throughput). A simple example: if you need to call 10 services and each takes 100 ms, then calling them synchronously takes 10 × 100 = 1000 ms to complete all requests, whereas issuing the 10 calls asynchronously at the same time completes the whole batch in roughly 100 ms.
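
A minimal sketch of that timing difference, using plain Java CompletableFuture to fan out ten simulated calls; the callService method and its 100 ms delay are stand-ins for real remote calls, not part of the book.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.stream.IntStream;

public class AsyncFanOut {

    // Stand-in for a remote call that takes ~100 ms to respond.
    static CompletableFuture<String> callService(int id, ExecutorService pool) {
        return CompletableFuture.supplyAsync(() -> {
            try { Thread.sleep(100); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
            return "response-" + id;
        }, pool);
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(10);
        long start = System.currentTimeMillis();

        // Fan out: issue all 10 requests at once; none of them blocks the caller.
        List<CompletableFuture<String>> calls = IntStream.rangeClosed(1, 10)
                .mapToObj(id -> callService(id, pool))
                .toList();

        // Wait for the slowest response; the whole batch finishes in ~100 ms, not 10 x 100 ms.
        CompletableFuture.allOf(calls.toArray(new CompletableFuture[0])).join();
        System.out.printf("10 calls completed in ~%d ms%n", System.currentTimeMillis() - start);

        pool.shutdown();
    }
}
```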

Asynchronous messaging also forces us to confront the constraints of network programming rather than pretend they do not exist, especially in failure scenarios. It also lets us focus on workflows and on how data, protocols, and interactions flow between services.

Today, however, the default communication protocol between microservices is mostly REST, which is essentially a synchronous mechanism and is better suited to controlled or tightly coupled service invocations.

Another consequence of asynchronous message transport is continuous (possibly unbounded) stream processing of messages. It also marks a shift in mindset from “data at rest” to “data in motion”: where data used to be processed offline, it is now processed online. Applications must react to data changes in real time: when changes occur, they need to be continuously queried, aggregated, and fed back to the consuming application. This shift has gone through three main stages:

  1. Data at rest: large volumes of data are stored in a medium such as HDFS and processed with offline batch technology, with latency typically measured in hours.

  2. Recognizing that “data in motion” is becoming increasingly important: data is captured, processed, and fed back to the running system within seconds. The Lambda Architecture emerged at this stage, with a speed layer for real-time online computation and a batch layer for complex offline processing, the speed layer's results being merged with the batch layer's afterwards. This model handles scenarios that need immediate responses to data, but its architectural complexity makes it hard to maintain.

  3. “Data in motion”: fully embracing moving data. Traditional batch-oriented architectures are giving way to stream-only architectures. This model also brings “data in motion” and stream processing to microservices, serving both as a communication protocol and as a persistence scheme (via the event log).

Stay mobile, but addressable

As mentioned above, asynchronous messaging decouples services in space and time. Decoupling in space is also known as “location transparency”: the ability of a microservice to run on multiple cores or multiple nodes and to be scaled dynamically at runtime without callers having to care where it is. This is what gives the system its elasticity and mobility, and it relies on capabilities that cloud computing and its “on-demand” model provide.

Addressable means that a service's address must be stable so that it can be referred to indefinitely, regardless of where the service is currently located. The address should remain usable whether the service is running, stopped, suspended, being upgraded, or has crashed, and any client should be able to send a message to it at any time. In practice such messages may be queued, resubmitted, proxied, or logged. In addition, the address needs to be virtual, so that it can represent a whole group of service instances.

  • Load balancing between stateless instances: if the service is stateless, it does not matter which instance handles a request. Many routing algorithms are available, such as round-robin, broadcast, or metric-based routing (see the routing sketch after this list).
  • Active-passive failover between stateful instances: if the service is stateful, sticky routing can be used (requests from the same client are always sent to the same instance). A passive instance is kept as a standby to take over when the primary instance fails, so every state change on the primary must be replicated to the passive instance.
  • Relocating stateful instances: moving a service instance from one location to another can improve locality of reference (keeping data and computation close together) and resource utilization.
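
A minimal sketch, in plain Java, of the first two routing strategies above; the instance addresses and client ids are illustrative. Stateless requests are spread round-robin, while stateful requests are routed stickily by hashing the client id so the same client always reaches the same instance.

```java
import java.util.List;

// Hypothetical router standing in front of a set of service instances (addresses as plain strings).
class Router {
    private final List<String> instances;
    private int next = 0;

    Router(List<String> instances) { this.instances = List.copyOf(instances); }

    // Stateless services: any instance will do, so spread the load round-robin.
    synchronized String roundRobin() {
        String instance = instances.get(next);
        next = (next + 1) % instances.size();
        return instance;
    }

    // Stateful services: hash the client id so the same client always lands on the same instance.
    String sticky(String clientId) {
        return instances.get(Math.floorMod(clientId.hashCode(), instances.size()));
    }
}

class RouterDemo {
    public static void main(String[] args) {
        Router router = new Router(List.of("10.0.0.1:8080", "10.0.0.2:8080", "10.0.0.3:8080"));
        System.out.println(router.roundRobin());        // first instance
        System.out.println(router.roundRobin());        // second instance
        System.out.println(router.sticky("client-42")); // always the same instance for client-42
        System.out.println(router.sticky("client-42")); // same as the line above
    }
}
```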

Virtual addresses let service consumers use a single stable address without caring how the service is currently deployed or where it runs.

Implementing a system of microservices

A single microservice is of little use on its own: only by communicating and collaborating can a set of microservices solve problems and form a complete business system. Implementing an individual service is relatively simple; what is hard is everything around it: service discovery, coordination, security, redundancy, data consistency, fault tolerance, deployment, and integration with other systems.

Systems need to take advantage of reality

A big advantage of microservices architecture is that it provides a set of tools to build systems that leverage reality and mimic the real world, including real-world limitations and opportunities.

First, according to “Conway’s Law”, the deployment of microservices is adapted to how engineering organizations/departments work in the real world. It is also important to note that reality is not uniform and everything is relative, even the concept of time and the present.

Information cannot travel faster than the speed of light, and in practice it travels much more slowly, which means communication always has latency. Information is always from the past: what we observe had, by the time we observe it, already happened at least a moment earlier. We are always looking into the past, and the “present” is in the eye of the beholder.

Each microservice can be seen as an island of safety, providing certainty and consistency, where time and the “present” are absolute. But once you cross a microservice's boundary you enter a sea of uncertainty: the world of distributed systems. As many have said, building distributed systems is hard. Yet the real world also offers ideas for solving distributed problems such as resilience, scalability, and isolation. So even though building distributed systems is difficult, we should not retreat to monolithic applications; we should learn to manage the distributed world with a set of design principles, abstractions, and tools.

As Pat Helland puts it in “Data on the Outside Versus Data on the Inside”: data on the inside is our local “present,” data on the outside (events) is information from the past, and commands sent between services are “hopes for the future.”

Service discovery

Service discovery is the problem of how callers locate, by address, the set of services they need to invoke. One of the simplest approaches is to hardcode address and port information in every service, or to externalize it into configuration files. The problem is that this is a static deployment model, which contradicts the whole intent of microservices.

Services need to remain decoupled and mobile, while the system as a whole needs to stay resilient and dynamic. This can be solved by introducing a level of indirection, applying the Inversion of Control pattern: each service reports its own information (its location and how to reach it) to a common facility. This facility is called service discovery and is a fundamental part of a microservices platform. Once the information is stored, a service can use the “service registry” to look up the services it wants to call, a pattern known as “client-side service discovery.” Another strategy is to store and maintain the information in a load balancer (such as AWS ELB) or in the service addresses themselves, known as “server-side service discovery.”
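
A minimal sketch of client-side discovery, assuming a hypothetical in-memory registry; a real system would back this with something like a CP store or the gossip-based approach discussed next, and the names here are illustrative.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ThreadLocalRandom;

// A service instance is just an address that the registry hands out.
record ServiceInstance(String host, int port) {}

// Hypothetical in-memory registry; stands in for a real service discovery backend.
class ServiceRegistry {
    private final Map<String, List<ServiceInstance>> services = new ConcurrentHashMap<>();

    // Each instance registers itself when it starts (the inversion of control mentioned above).
    void register(String serviceName, ServiceInstance instance) {
        services.computeIfAbsent(serviceName, name -> new CopyOnWriteArrayList<>()).add(instance);
    }

    List<ServiceInstance> lookup(String serviceName) {
        return services.getOrDefault(serviceName, List.of());
    }
}

// Client-side discovery: the caller queries the registry and picks an instance itself.
class DiscoveringClient {
    private final ServiceRegistry registry;

    DiscoveringClient(ServiceRegistry registry) { this.registry = registry; }

    ServiceInstance resolve(String serviceName) {
        List<ServiceInstance> instances = registry.lookup(serviceName);
        if (instances.isEmpty()) throw new IllegalStateException("no instances of " + serviceName);
        return instances.get(ThreadLocalRandom.current().nextInt(instances.size()));
    }
}
```

In server-side discovery the same lookup-and-pick step moves behind a load balancer or the platform, and clients only ever see one virtual address.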

A CP database can be chosen to store service information in order to guarantee its consistency. But such databases sacrifice a degree of availability to achieve consistency and rely on extra infrastructure, which is often unnecessary. A better choice is therefore to store the information using AP-style peer-to-peer techniques such as CRDTs (Conflict-free Replicated Data Types) and epidemic gossip protocols, which disseminate the information with eventual consistency, are more resilient, and require no additional infrastructure.

API management

API management addresses how to manage service protocols and APIs in a unified way so that services are easy to invoke, including upgrading and rolling back protocol and data versions. This can be handled by introducing a layer responsible for serialization, protocol maintenance, and data transformation, or even by versioning the services themselves. In DDD this is called an Anti-Corruption Layer, and it can be added to the service itself or implemented in an API gateway.

If a client has to call 10 services, each with a different API, just to complete one task, the client's job becomes very tedious. Rather than having the client invoke the services directly, it is better to route the calls through an API gateway. The API gateway accepts the client's request, routes it to the appropriate services (converting protocols if necessary), assembles the responses, and returns the result to the client. As a layer between client and services it simplifies the client-facing protocol. A centralized gateway, however, is hard to make highly available and scalable, so implementing the API gateway with decentralized techniques such as service discovery is the better choice.

Note, however, that the API gateway, like the other core supporting services, does not have to be built in-house; ideally it is provided by the underlying platform.

Managing communication patterns

In a system composed of several microservices, the communication between services can be completed by using point-to-point communication. But as the number of services grows, if you let them call each other directly, the whole system can quickly become chaotic. Solving this problem requires a mechanism that decouples the sender and receiver and routes the data according to some well-defined principle.

A publish-subscribe mechanism is one solution: publishers publish messages to a topic, and subscribers listen on that topic for messages. This can be implemented with a scalable messaging system (Apache Kafka, Amazon Kinesis) or a NoSQL database with AP characteristics (such as Cassandra or Riak).
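
A minimal in-memory publish-subscribe sketch in plain Java; in practice this role would be played by Kafka, Kinesis, or a similar system as mentioned above, and the topic name and payload type here are illustrative. The point is that publishers and subscribers only share a topic name and never reference each other directly.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

// Decouples senders and receivers: messages are routed by topic, not by destination address.
class MessageBus {
    private final Map<String, List<Consumer<String>>> subscribers = new ConcurrentHashMap<>();

    void subscribe(String topic, Consumer<String> handler) {
        subscribers.computeIfAbsent(topic, t -> new CopyOnWriteArrayList<>()).add(handler);
    }

    void publish(String topic, String message) {
        subscribers.getOrDefault(topic, List.of()).forEach(handler -> handler.accept(message));
    }
}

class PubSubDemo {
    public static void main(String[] args) {
        MessageBus bus = new MessageBus();

        // Two independent services listen on the same topic without knowing the publisher.
        bus.subscribe("orders", msg -> System.out.println("billing service saw: " + msg));
        bus.subscribe("orders", msg -> System.out.println("shipping service saw: " + msg));

        // The order service publishes an event; the bus routes it to every subscriber.
        bus.publish("orders", "order-1001 created");
    }
}
```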

In SOA, the ESB played this role. In microservices we certainly will not use it the way an ESB was used to bridge monolithic applications, but a messaging backbone can still serve as a publishing system for broadcasting tasks and data, or as a communication bus between systems (for example, feeding data into Spark via Spark Streaming).

The publish-subscribe protocol alone is sometimes insufficient. For example, it offers no advanced routing features that let developers define custom routing rules or transform, enrich, split, and merge data (tools such as Akka Streams or Apache Camel can be used for that).

Integration

A system must also communicate with the outside world and with other systems. When communicating with an external system, especially one you do not control, the risk of failure is high (system overload, business errors). So no matter how good the negotiated contract is, external services cannot be trusted, and every precaution should be taken to protect your own services.

The first step is to reach a good agreement that minimizes the risk of a sudden overload making a service unavailable, for example by never issuing more requests than the provider can handle. You also want to avoid synchronous communication, otherwise the availability of your own service ends up in the hands of the third-party services it depends on.

Avoiding cascading failures requires sufficient decoupling and isolation between services, and asynchronous communication is the best way to achieve it. In addition, flow rates need to be kept consistent through back-pressure (the receiver adjusts how fast it accepts data and, through its responses, controls how fast the sender may send), so that a fast producer cannot overwhelm a slower consumer. A growing number of tools and libraries implement the Reactive Streams specification (Akka Streams, RxJava, Spark Streaming, Cassandra drivers), which bridges systems with asynchronous, back-pressured real-time streams and improves overall reliability, performance, and interoperability.
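
A minimal back-pressure sketch using the JDK's java.util.concurrent.Flow API, which mirrors the Reactive Streams interfaces mentioned above; the item values and delays are illustrative. The subscriber requests one element at a time, so the publisher can never outrun it: once its small buffer is full, submit() blocks until demand catches up.

```java
import java.util.concurrent.Flow;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.SubmissionPublisher;
import java.util.concurrent.TimeUnit;

public class BackpressureDemo {
    public static void main(String[] args) throws InterruptedException {
        // Small buffer (4) so the fast publisher is forced to wait for the slow subscriber.
        try (SubmissionPublisher<Integer> publisher =
                     new SubmissionPublisher<>(ForkJoinPool.commonPool(), 4)) {

            publisher.subscribe(new Flow.Subscriber<>() {
                private Flow.Subscription subscription;

                @Override public void onSubscribe(Flow.Subscription subscription) {
                    this.subscription = subscription;
                    subscription.request(1);          // pull only the first element
                }

                @Override public void onNext(Integer item) {
                    process(item);                    // slow consumer
                    subscription.request(1);          // ask for the next element only when ready
                }

                @Override public void onError(Throwable t) { t.printStackTrace(); }
                @Override public void onComplete()         { System.out.println("done"); }

                private void process(Integer item) {
                    try { TimeUnit.MILLISECONDS.sleep(50); } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
                    System.out.println("processed " + item);
                }
            });

            // submit() blocks whenever the subscriber's demand falls behind: back-pressure in action.
            for (int i = 1; i <= 20; i++) publisher.submit(i);
        }
        TimeUnit.SECONDS.sleep(2);                     // give the async subscriber time to drain
    }
}
```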

How to handle failed service invocations is another key issue in microservices. When an error is caught, retry; if the error persists, isolate the failing service for a while until it recovers. This is the “circuit breaker” pattern (implemented, for example, in Netflix's Hystrix library and in Akka).
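
A minimal circuit-breaker sketch in plain Java; production implementations such as Netflix's Hystrix or Akka's CircuitBreaker add timeouts, metrics, and richer half-open handling, so treat this only as an illustration of the state machine. After a threshold of consecutive failures the breaker opens and calls fail fast until a cool-down period has passed.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.function.Supplier;

// Minimal circuit breaker: closed -> open after N consecutive failures, trial call after a cool-down.
class CircuitBreaker {
    private final int failureThreshold;
    private final Duration resetTimeout;
    private int consecutiveFailures = 0;
    private Instant openedAt = null;

    CircuitBreaker(int failureThreshold, Duration resetTimeout) {
        this.failureThreshold = failureThreshold;
        this.resetTimeout = resetTimeout;
    }

    synchronized <T> T call(Supplier<T> action) {
        if (openedAt != null) {
            if (Instant.now().isBefore(openedAt.plus(resetTimeout))) {
                throw new IllegalStateException("circuit open: failing fast"); // isolate the failing service
            }
            // Cool-down elapsed: allow a trial call; a single failure will re-open the circuit.
            openedAt = null;
            consecutiveFailures = failureThreshold - 1;
        }
        try {
            T result = action.get();
            consecutiveFailures = 0;       // success closes the circuit again
            return result;
        } catch (RuntimeException e) {
            if (++consecutiveFailures >= failureThreshold) {
                openedAt = Instant.now();  // trip the breaker
            }
            throw e;
        }
    }
}
```

A caller would wrap each remote invocation, e.g. `breaker.call(() -> client.fetchOrders())`, retrying only while the circuit is closed.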

Driven by scalability, throughput, and availability requirements, system integration has shifted from reliance on centralized services such as an RDBMS or ESB to decentralized strategies (HTTP REST, ZeroMQ) and publish-subscribe systems (Kafka, Amazon Kinesis). More recently, event streaming platforms, whose ideas come from Fast Data and real-time data management, are becoming the default choice for system integration.

As mentioned above, asynchronous communication between services has many benefits, while for communication between a client (browser, app) and a service, REST is often the better choice. Still, synchronous communication is not mandatory everywhere and should be evaluated case by case; in many situations developers choose synchronous solutions out of habit rather than because they genuinely simplify the system or improve operability. Here are a few examples of interactions that are naturally asynchronous but are usually modeled synchronously:

  • Querying the availability of an item and notifying the user when it sells out.
  • A restaurant's specials menu changes and users want to know immediately.
  • User comments on a site behave like a live conversation.
  • An ad system serves different responses based on the user's behavior on the page.

For each of the examples above, we need to work out the most natural way for client and service to communicate. It is also often worth looking for opportunities to weaken consistency guarantees (such as ordering) within the data's integrity constraints, finding the minimum coordination that still gives users intuitive semantics: in other words, the best strategy for exploiting reality.

Security management

Security management is mainly about authentication and authorization between services: some services may only be accessed by certain other services.

  • TLS client certificates, also known as mutual (two-way) authentication: each service is assigned its own private key and certificate so that services authenticate each other; not only does the service authenticate the client, the client also authenticates the service. This not only prevents eavesdropping but also guards against data being intercepted and relayed, even on insecure networks. Communication over TLS is not only secure, it is an open, well-understood standard, but certificate management is complex enough that it really needs support from the underlying platform. HTTPS Basic Authentication is similarly well understood, but managing the SSL certificates is cumbersome and requests cannot be cached by reverse proxies.
  • Asymmetric request signing: each service signs the requests it sends with its private key, and each service's public key is published to the service discovery facility. The drawback is that on an untrusted network this alone does not prevent eavesdropping or request replay attacks.
  • Hash-based Message Authentication Code (HMAC): requests are signed with a signature derived from a shared secret key. The scheme is relatively simple, but because every pair of services that needs to communicate requires its own shared key, the number of keys grows with the number of service pairs and becomes hard to manage (a minimal signing sketch follows this list).
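
A minimal HMAC request-signing sketch using the JDK's javax.crypto.Mac with HmacSHA256; the shared key and the canonical request string are placeholders, and a real scheme would also sign a timestamp or nonce to resist replay.

```java
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.HexFormat;

public class RequestSigner {

    // Sign the canonical request string with the key shared by the two services.
    static String sign(String sharedKey, String canonicalRequest) throws Exception {
        Mac mac = Mac.getInstance("HmacSHA256");
        mac.init(new SecretKeySpec(sharedKey.getBytes(StandardCharsets.UTF_8), "HmacSHA256"));
        return HexFormat.of().formatHex(mac.doFinal(canonicalRequest.getBytes(StandardCharsets.UTF_8)));
    }

    // The receiver recomputes the signature and compares in constant time.
    static boolean verify(String sharedKey, String canonicalRequest, String signature) throws Exception {
        String expected = sign(sharedKey, canonicalRequest);
        return MessageDigest.isEqual(
                expected.getBytes(StandardCharsets.UTF_8),
                signature.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws Exception {
        // Placeholder key and request; a real request would include method, path, body hash, and a timestamp.
        String key = "orders-billing-shared-key";
        String request = "POST /invoices body-sha256=... nonce=...";

        String signature = sign(key, request);
        System.out.println("X-Signature: " + signature);
        System.out.println("verified: " + verify(key, request, signature));
    }
}
```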

Minimize data coupling

In a microservice architecture, completing a task requires multiple services to cooperate. Minimizing the cost of coordinating state between services therefore helps improve the overall performance of the system.

What needs to be done is to analyze the data from the business perspective to understand the relationships, guarantees, and integrity constraints between data. The data is then denormalized and consistency boundaries are defined within the system, so that strong consistency is needed only inside each boundary. These boundaries should then drive the design and scope of the microservices. If a service design ends up with many data dependencies and relationships, that data coupling should be reduced or even eliminated to avoid having to coordinate state across services.

Minimize collaboration costs

As described in the previous section, data coupling has been minimized, but some business scenarios will still require multiple services to collaborate. That is unavoidable; the point is to add collaboration gradually, as it is actually needed, rather than to start tightly coupled and try to remove the coupling later (which is tedious and difficult).

There are several scalable, resilient ways to accommodate changes to data and achieve composability (changing data does not require stopping the owning service or waiting for particular conditions):

  • Apology-oriented programming: based on the idea that it is easier to ask forgiveness than permission. If you cannot get an answer, make a reasonable guess that a condition has been met, and if the guess later turns out to be wrong, apologize and compensate. This is very close to how the real world works: if a flight is overbooked and there are no seats at departure time, the airline compensates the passenger.
  • Event-driven architecture: based on asynchronous messaging and event sourcing. Commands and events must be distinguished: a command expresses an intent to act and will have side effects (a hope for the future), while an event records a fact that has already happened. CQRS is used on the query side, separating the writer (which persists the event log) from the reader (which stores data in an RDBMS or NoSQL database). Using the event log for state management and persistence brings many benefits: it simplifies auditing, debugging, redundancy, and fault tolerance, and allows the event stream to be replayed from any point in the past.
  • ACID 2.0: coined by Pat Helland, it defines a set of principles for scalable, resilient protocol and API design. A stands for Associative: messages can be grouped and processed in batches without affecting the result. C stands for Commutative: the order of messages does not matter. I stands for Idempotent: duplicate messages have no additional effect. D stands for Distributed.

CRDTs (Conflict-free Replicated Data Types) combine all of the above properties to achieve eventual consistency with rich data structures (counters, sets, maps, graphs) that converge without coordination: the order in which update operations arrive does not affect the final merged result, and replicas can be merged automatically and safely. Although CRDTs have only recently drawn wide industry attention, they have been used in production for years, and several production-grade libraries are available for direct use (Akka, Riak).
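
A minimal sketch of one of the simplest CRDTs, a grow-only counter (G-Counter), in plain Java with illustrative node names; libraries such as Akka Distributed Data or Riak provide production-grade versions. Each node only increments its own slot, the merge takes the per-node maximum, and because merging is commutative, associative, and idempotent, replicas converge without coordination.

```java
import java.util.HashMap;
import java.util.Map;

// Grow-only counter CRDT: per-node counts, merged by taking the maximum for each node.
class GCounter {
    private final String nodeId;
    private final Map<String, Long> counts = new HashMap<>();

    GCounter(String nodeId) { this.nodeId = nodeId; }

    // Each node only ever increments its own entry, so concurrent updates never conflict.
    void increment() { counts.merge(nodeId, 1L, Long::sum); }

    // The value is the sum of every node's contribution.
    long value() { return counts.values().stream().mapToLong(Long::longValue).sum(); }

    // Merge is commutative, associative, and idempotent: take the max per node.
    void merge(GCounter other) {
        other.counts.forEach((node, count) -> counts.merge(node, count, Long::max));
    }
}

class GCounterDemo {
    public static void main(String[] args) {
        GCounter replicaA = new GCounter("node-a");
        GCounter replicaB = new GCounter("node-b");

        // Both replicas are updated independently, e.g. while partitioned.
        replicaA.increment();
        replicaA.increment();
        replicaB.increment();

        // Merging in either order converges on the same value.
        replicaA.merge(replicaB);
        replicaB.merge(replicaA);
        System.out.println(replicaA.value() + " == " + replicaB.value()); // 3 == 3
    }
}
```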

However, many business scenarios cannot accept eventual consistency. Causal consistency can be used instead: causality is easy to reason about, and the causal consistency model still allows scalability and availability. It is generally implemented with logical time and is available in many NoSQL databases, event logs, and distributed event streaming products.

Distributed transactions are a common way to coordinate changes in a distributed system, but by their nature they constrain concurrent execution so that effectively only one participant operates at a time. They are therefore very expensive and make the system slow and unscalable. The Saga pattern is an alternative that remains scalable and elastic. Its premise is that a long-running business transaction usually consists of multiple transactional steps whose overall consistency can be achieved by grouping them into one logical, distributed transaction. Each step's transaction is paired with a compensating action, and if any step fails, the compensations are executed (in reverse order) to roll back the saga as a whole.
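
A minimal saga sketch in plain Java, with hypothetical step names: each step pairs a forward action with a compensating action, and if any step fails, the compensations already recorded are run in reverse order. Real saga implementations also persist the saga's progress so a coordinator can resume after a crash.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// One step of a long-running business transaction: a forward action plus its compensation.
record SagaStep(String name, Runnable action, Runnable compensation) {}

class Saga {
    private final List<SagaStep> steps;

    Saga(List<SagaStep> steps) { this.steps = steps; }

    void run() {
        Deque<SagaStep> completed = new ArrayDeque<>();
        for (SagaStep step : steps) {
            try {
                step.action().run();
                completed.push(step);                     // remember what will need undoing
            } catch (RuntimeException failure) {
                System.out.println(step.name() + " failed, compensating...");
                while (!completed.isEmpty()) {
                    completed.pop().compensation().run(); // undo in reverse order
                }
                throw failure;
            }
        }
    }
}

class SagaDemo {
    public static void main(String[] args) {
        Saga bookTrip = new Saga(List.of(
                new SagaStep("reserve flight",
                        () -> System.out.println("flight reserved"),
                        () -> System.out.println("flight reservation cancelled")),
                new SagaStep("reserve hotel",
                        () -> System.out.println("hotel reserved"),
                        () -> System.out.println("hotel reservation cancelled")),
                new SagaStep("charge card",
                        () -> { throw new RuntimeException("payment declined"); },
                        () -> System.out.println("charge refunded"))));

        try {
            bookTrip.run();
        } catch (RuntimeException e) {
            System.out.println("saga rolled back: " + e.getMessage());
        }
    }
}
```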

Conclusion

When designing reactive microservices, insist on isolation, single responsibility, autonomy, exclusive state ownership, asynchronous message passing, and mobility. Microservices only deliver value when they collaborate as a system, so a capable microservices platform providing these foundational services, guided by the reactive principles and patterns, is essential.