Domain-Driven Design (DDD) has become a hot topic in the era of microservices, and in DDD we often encounter the Command Query Responsibility Segregation (CQRS) architecture. Because CQRS is only an idea, the separation of reads from writes, it can be implemented in many ways. For example, the data storage may not be separated at all, with reads and writes separated only at the code level; that is still CQRS. Alternatively, the Command side is responsible for storing data and the Query side for querying it, with the Query side's data synchronized through events generated by the Command side; that is another implementation of CQRS. This article describes how we applied the CQRS architecture in developing a real-time answering PK (player-versus-player) system. We also introduce the Event Sourcing pattern, and argue that combining CQRS with Event Sourcing lets the architecture deliver more of its value.

At the beginning of 2018, quiz products that reward players with cash or coins for answering questions became popular, and they turned into major traffic entry points. Several answer PK products on the market at the time were Mind King, Dazhong Dianping Answer PK, and The Strongest Answer.

The core gameplay of real-time answer PK is as follows: players are randomly matched with an opponent within a competition area, or invite friends to start a game. Each game has five questions. Without props, a correct answer earns up to 200 points; the slower the answer, the fewer the points, and a wrong answer scores nothing. Each question has a 10-second time limit: a correct answer within the first second earns 200 points, and one given after one second has elapsed (9 seconds left) earns 180 points. At the end, the winner is decided by each side's total score.
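The scoring rule can be sketched as a small function. This is a minimal sketch assuming the score drops by 20 points for every full second elapsed (200 within the first second, 180 after one second, and so on) and that an answer at or after the 10-second limit scores nothing; the exact rounding used by the real products is not specified, and the name `round_score` is hypothetical.

```python
def round_score(elapsed_seconds: float, correct: bool,
                time_limit: int = 10, max_points: int = 200) -> int:
    """Points for one answer without props (hypothetical helper).

    Assumes the score drops by max_points // time_limit (20) for every
    full second elapsed, so any answer within the first second earns 200.
    """
    if not correct or not 0 <= elapsed_seconds < time_limit:
        return 0  # wrong, timed-out, or invalid answers score nothing
    points_per_second = max_points // time_limit  # 200 // 10 == 20
    # int() truncates toward zero, which equals floor for non-negative values
    return max_points - points_per_second * int(elapsed_seconds)


print(round_score(1.5, True))  # 180
```

Flooring the elapsed time matches both data points in the text (200 in the first second, 180 after one second); a product could equally round differently, which would change only the last line.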

During a PK, a large amount of battle data is generated for both sides, such as the score of each round, the current number of consecutive correct answers (often called a combo in games), and the highest number of consecutive correct answers in a game.
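The combo statistics can be derived from a player's answers in round order. A minimal sketch, assuming each answer has already been reduced to a correct/incorrect flag; `combo_stats` is a hypothetical name:

```python
from typing import Iterable, Tuple


def combo_stats(results: Iterable[bool]) -> Tuple[int, int]:
    """Return (current_combo, max_combo) for answers given in round order."""
    current = best = 0
    for correct in results:
        # A correct answer extends the streak; a wrong one resets it to zero.
        current = current + 1 if correct else 0
        best = max(best, current)
    return current, best


print(combo_stats([True, True, False, True]))  # (1, 2)
```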

To build a real-time answering PK system, we first need to choose how the front end and back end communicate.

HTTP is an application-layer protocol built on TCP, so before any information can be sent, a TCP connection must be established with a three-way handshake. HTTP connections are either short-lived or persistent (keep-alive). With short connections, every request needs its own three-way handshake before its data is sent, and each connection carries exactly one request and its response. A persistent connection keeps the TCP connection open for a period of time so that several requests can reuse it.

In both cases, every exchange must be initiated by the client, and the server can only return a result: the client is active and the server is passive. The main drawbacks of this approach are therefore high consumption of communication resources and poor timeliness, because the client only learns about new information the next time it asks.

WebSocket is an application-layer protocol, introduced alongside HTML5, for full-duplex communication between a browser and a server. It runs over TCP and reuses the HTTP handshake (an HTTP Upgrade request) to establish the connection, after which both sides can send messages at the same time. In other words, once the WebSocket connection is established, the server no longer has to wait for a browser request before sending data; it can actively push messages to the client, which keeps the client's information up to date.

Compared with persistent HTTP connections, WebSocket also reduces server load and the redundant protocol overhead (such as repeated HTTP headers) carried in each message, improving the utilization of communication resources. WebSocket is therefore the better choice for game scenarios with frequent message exchange.

However, to be compatible with different mobile device environments and to improve availability, the server generally needs to support both HTTP polling and WebSocket connections. The system prefers WebSocket for the connection between front end and back end; if the front end fails to establish one (for example, because its environment does not support WebSocket), the system falls back to HTTP polling.
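The fallback rule can be sketched as follows. This is a minimal sketch in which `connect_websocket` and `open_polling` are hypothetical placeholders for whatever client library is actually used; only the "prefer WebSocket, else poll" decision is shown.

```python
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


def open_channel(connect_websocket: Callable[[], T],
                 open_polling: Callable[[], T]) -> Tuple[str, T]:
    """Prefer a WebSocket connection and fall back to HTTP polling."""
    try:
        return "websocket", connect_websocket()
    except Exception:
        # Any failure (unsupported environment, failed handshake, ...)
        # triggers the fallback instead of failing the session.
        return "http-polling", open_polling()


def no_websocket() -> str:
    raise ConnectionError("WebSocket not supported")


print(open_channel(no_websocket, lambda: "poller"))  # ('http-polling', 'poller')
```

Catching a broad `Exception` is deliberate here, since the real library's failure types are unknown; a concrete client would narrow it.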

Having decided to support both HTTP and WebSocket, we need to define a communication protocol for each.

With HTTP, communication consists mainly of calls to APIs provided by the server. The front end and back end communicate over HTTP in the following steps:

1) Submit a match request;

2) Poll for the match result;

3) After a successful match is returned, report the ready state;

4) Poll for game information (host and guest player information, etc.);

5) Get the question for the next round; if there is none, go to Step 7;

6) Report the player's answer for the round and obtain the round result; poll until the round ends, then go to Step 5;

7) Obtain the result of the game; the game ends.
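The HTTP steps can be sketched as a client loop. A minimal sketch, assuming a hypothetical client object `api` whose methods (`report_match_request`, `poll_match_result`, `report_ready`, `poll_game_info`, `next_question`, `report_answer`, `poll_round_result`, `get_game_result`) stand in for the server's HTTP endpoints and return `None` while a polled result is not yet available. The attempt limit and interval are illustrative values, not the system's.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def poll(fetch: Callable[[], Optional[T]], max_attempts: int = 20,
         interval_s: float = 0.5) -> T:
    """Call fetch() until it returns a non-None value or attempts run out."""
    for _ in range(max_attempts):
        result = fetch()
        if result is not None:
            return result
        time.sleep(interval_s)
    raise TimeoutError("polling gave up")  # caller may retry or abort


def play_over_http(api, choose_answer, interval_s: float = 0.5):
    """Drive steps 1-7 of the HTTP protocol against a hypothetical client."""
    api.report_match_request()                                   # step 1
    match = poll(api.poll_match_result, interval_s=interval_s)   # step 2
    api.report_ready(match)                                      # step 3
    game = poll(api.poll_game_info, interval_s=interval_s)       # step 4
    while (question := api.next_question()) is not None:         # step 5
        api.report_answer(question, choose_answer(question))     # step 6
        poll(api.poll_round_result, interval_s=interval_s)       # until round ends
    return game, api.get_game_result()                           # step 7
```

Bounding every poll turns "poll until something happens" into a loop that either succeeds or raises, so a stalled match or round cannot hang the client forever.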

As shown in the figure below, when a WebSocket connection is used, the front end and back end communicate as follows:

1) The client submits the player's match request;

2) The server pushes a match-success message;

3) The client replies with a ready message, ready to start the game;

4) The server pushes information about the game and the first round's question;

5) The client reports the player's answer;

6) The server pushes the result of the answer for the round;

7) If the round is over (both players have answered), the server pushes the result of the round and the next round's question, and the flow returns to Step 5. If all rounds have ended, go to Step 8;

8) The server pushes the result of the game, and the game ends.
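The WebSocket steps can be sketched as a server-side session that turns incoming client messages into pushes. A minimal sketch, assuming a 1v1 game, in-memory state, and hypothetical message names (`GAME_START`, `ANSWER_RESULT`, `ROUND_RESULT`, `GAME_RESULT`). The score is assumed to be computed elsewhere, and the real transport, timeouts, and persistence are omitted.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

Push = Tuple[str, dict]  # (message type, payload) pushed to both players


@dataclass
class WsGameSession:
    """Hypothetical server-side handler for WebSocket steps 3-8 of a 1v1 game."""
    players: Tuple[str, str]
    questions: List[str]
    ready: Set[str] = field(default_factory=set)
    round_index: int = 0
    round_answers: Dict[str, int] = field(default_factory=dict)
    scores: Dict[str, int] = field(default_factory=dict)

    def on_ready(self, player: str) -> List[Push]:
        # Step 3; once both players are ready, push game info and question 1 (step 4).
        self.ready.add(player)
        if len(self.ready) < len(self.players):
            return []
        return [("GAME_START", {"players": self.players,
                                "question": self.questions[0]})]

    def on_answer(self, player: str, score: int) -> List[Push]:
        # Steps 5 and 6: record the answer once and push its result.
        if player in self.round_answers or self.round_index >= len(self.questions):
            return []  # ignore duplicate or late answers
        self.round_answers[player] = score
        self.scores[player] = self.scores.get(player, 0) + score
        pushes: List[Push] = [("ANSWER_RESULT", {"player": player, "score": score})]
        if len(self.round_answers) < len(self.players):
            return pushes  # step 7 waits until both players have answered
        self.round_answers.clear()
        self.round_index += 1
        if self.round_index < len(self.questions):
            pushes.append(("ROUND_RESULT", {"scores": dict(self.scores),
                                            "next_question": self.questions[self.round_index]}))
        else:
            pushes.append(("GAME_RESULT", {"scores": dict(self.scores)}))  # step 8
        return pushes
```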

In most cases, we can think of an information system as a data store whose records can be created, read, updated, and deleted, so that every function of the system translates into create, read, update, and delete (CRUD) operations on the object model of the storage structure. The CRUD pattern falls short, however, when requirements become complex. We may want to view records in special ways, for example by combining several records into one, or by combining records from different places into a single virtual record. In practice this has limits: often only some of the data can be merged, and the information we need to return may differ in shape from what was stored. When the read model and the write model diverge like this, an architecture that separates reads from writes is a good choice.

CQRS (Command Query Responsibility Segregation), first proposed by Greg Young, is essentially a mechanism for separating reads from writes. Its architecture is shown in the figure below:

Separating commands from queries gives us finer control over the details of each object and a clearer view of which operations change the system's state, which makes the system more scalable and more available. As an architectural idea, CQRS can be implemented in several ways:

  1. The most common form is read/write separation at the database level;

  2. The underlying storage is not separated, but the upper-layer logic code is;

  3. The underlying storage is separated: the Command side uses Event Sourcing to store events in an EventStore, and the Query side stores the latest state of each object to serve queries.
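A minimal sketch of the second form, in which storage is shared but the code is split into a command side that only changes state and a query side that only reads it. All class and method names are hypothetical, and a plain dict stands in for the shared database.

```python
from typing import Dict, List, Tuple


class PlayerScoreStore:
    """Shared storage: both sides use the same underlying data."""

    def __init__(self) -> None:
        self.rows: Dict[str, int] = {}


class ScoreCommandService:
    """Command side: changes state and returns nothing to query."""

    def __init__(self, store: PlayerScoreStore) -> None:
        self._store = store

    def add_points(self, player_id: str, points: int) -> None:
        if points < 0:
            raise ValueError("points must be non-negative")
        self._store.rows[player_id] = self._store.rows.get(player_id, 0) + points


class ScoreQueryService:
    """Query side: reads state and never modifies it."""

    def __init__(self, store: PlayerScoreStore) -> None:
        self._store = store

    def total(self, player_id: str) -> int:
        return self._store.rows.get(player_id, 0)

    def leaderboard(self, limit: int = 10) -> List[Tuple[str, int]]:
        return sorted(self._store.rows.items(), key=lambda kv: -kv[1])[:limit]
```

Even without separate databases, the split makes it obvious which calls can change state, and either side can later be moved to its own storage without touching the other's callers.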

CQRS is applicable in the following scenarios:

  1. When the application's write model and read model differ significantly;

  2. When query performance and write performance need to be optimized separately, which is necessary especially in systems with a very high read/write ratio;

  3. When the system must handle highly concurrent writes and highly concurrent reads at the same time: a CQRS system can maximize write throughput on the Command side while easily providing an extensible read model on the Query side;

  4. When practicing DDD, because CQRS lets us implement the domain model without the object-relational impedance mismatch that ORM frameworks introduce.

The defining feature of Event Sourcing (ES) is that it does not store an object's latest state; instead it stores every event the object has produced and derives the latest state from those events. Conventionally, we persist an object's latest state to the database after each business operation it takes part in, so the data in the database reflects the object's current state.

Event Sourcing, by contrast, saves every event the object has gone through rather than its latest state. All events produced by the object are stored in the database in chronological order. When we need the object's latest state, we create an empty object and replay all of its events in the order they occurred. This process is called event backtracking (replay).

An event in ES represents a fact that has already happened. Facts cannot be erased or modified, so events in ES cannot be modified (updated or deleted) either, and every change must be recorded as a new, separate event.
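A minimal sketch of an append-only event store and event backtracking, assuming an in-memory list in place of a real EventStore and a hypothetical `AnswerEvent` with only three fields. Frozen dataclasses model the rule that recorded events cannot be modified.

```python
from dataclasses import dataclass
from typing import Iterator, List


@dataclass(frozen=True)  # an event is a fact: frozen instances reject mutation
class AnswerEvent:
    player_id: str
    round_index: int
    score: int


@dataclass
class PlayerState:
    total_score: int = 0
    rounds_answered: int = 0

    def apply(self, event: AnswerEvent) -> None:
        self.total_score += event.score
        self.rounds_answered += 1


class EventStore:
    """Append-only log: it offers no update or delete operation."""

    def __init__(self) -> None:
        self._events: List[AnswerEvent] = []

    def append(self, event: AnswerEvent) -> None:
        self._events.append(event)  # stored in chronological order

    def events_for(self, player_id: str) -> Iterator[AnswerEvent]:
        return (e for e in self._events if e.player_id == player_id)


def rebuild(store: EventStore, player_id: str) -> PlayerState:
    state = PlayerState()                      # start from an empty object
    for event in store.events_for(player_id):  # replay in order of occurrence
        state.apply(event)
    return state
```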

Practicing ES does pose significant storage and query challenges. Because what is recorded is every event that has affected the object's state, obtaining the latest state requires replaying the object's events, and the time this takes is proportional to the number of events, so reading an object's current state gets slower as the object accumulates updates. In addition, many business operations must first "read" in order to validate before they "write". Systems built on ES therefore usually cache the latest state of each object, "warming up" the cache at startup by reading and replaying all of the object's events.
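The caching and warm-up idea can be sketched as a repository that remembers how many events each cached state already includes, so a later read replays only the newer ones. A minimal sketch, assuming events are reduced to score integers in a hypothetical in-memory log keyed by gameId; names are illustrative.

```python
from dataclasses import dataclass, replace
from typing import Dict, List


@dataclass(frozen=True)
class GameState:
    version: int = 0      # number of events already applied
    total_score: int = 0


def apply(state: GameState, score: int) -> GameState:
    return replace(state, version=state.version + 1,
                   total_score=state.total_score + score)


class CachedRepository:
    """Keeps the latest state per game so reads need not replay everything."""

    def __init__(self, events: Dict[str, List[int]]) -> None:
        self._events = events                  # hypothetical log: gameId -> scores
        self._cache: Dict[str, GameState] = {}
        self.replayed = 0                      # counts replayed events, for illustration

    def warm_up(self) -> None:
        # At startup, backtrack every game's events once to fill the cache.
        for game_id in self._events:
            self.load(game_id)

    def load(self, game_id: str) -> GameState:
        state = self._cache.get(game_id, GameState())
        # Replay only the events newer than the cached version.
        for score in self._events.get(game_id, [])[state.version:]:
            state = apply(state, score)
            self.replayed += 1
        self._cache[game_id] = state
        return state
```

After warm-up, a read costs time proportional to the number of new events rather than the whole history.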

This way, reading an object (that is, loading the latest state of an aggregate root entity) does not compromise system availability through poor query performance. Scenarios that "read" before they "write" inevitably run into concurrency problems, so the event backtracking operation must be atomic. In practice, ES is often combined with message queues to guarantee ACID properties (atomicity, consistency, isolation, durability) for the snapshots produced by event backtracking.

Why CQRS and Event Sourcing suit the real-time answering PK system:

1. Read/write model

The read and write models of the answering PK system differ considerably: the write model mainly handles a single player's answer data, while the read model fetches the data of all players in a game and must filter out sensitive data (for example, the correct answer must not be queryable before one's own team has finished answering). The main read and write requests of the PK system are shown in the figure below. These read requests can tolerate asynchronous responses, so the business can accept eventual consistency.
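The filtering rule can be sketched as a read-model query that decides what a given viewer may see. A minimal sketch, assuming a 1v1 game where each side is a single player and a hypothetical `RoundView` that records who has answered.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class RoundView:
    question: str
    correct_answer: str
    answered: Dict[str, bool]  # player_id -> has answered this round


def round_view_for(view: RoundView, viewer_id: str) -> Dict[str, Optional[str]]:
    """Hide the correct answer until the viewer's side has answered."""
    revealed = view.answered.get(viewer_id, False)
    return {
        "question": view.question,
        "correct_answer": view.correct_answer if revealed else None,
    }
```

Keeping this rule on the read side means the write model never needs to know who is allowed to see what.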

Because both HTTP and WebSocket must be supported, separating reads from writes also makes the system more scalable and maintainable.

2. Read/write ratio

With HTTP polling, the read/write ratio of the real-time answering PK system is high: a five-round game writes at most 10 answer events (one per player per round), while clients poll the game data continuously.
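A rough worked estimate makes the ratio concrete. It assumes R = 5 rounds, P = 2 players, at most 10 s of answering per round, and a hypothetical polling interval of Δ = 0.5 s; the real interval is not given here, and time between rounds is ignored, so T (and therefore Q) is an underestimate.

```latex
\begin{aligned}
W &= R \times P = 5 \times 2 = 10 && \text{(answer events written per game)}\\
T &\approx R \times 10\,\mathrm{s} = 50\,\mathrm{s} && \text{(answering time only)}\\
Q &\approx P \times \frac{T}{\Delta} = 2 \times \frac{50\,\mathrm{s}}{0.5\,\mathrm{s}} = 200 && \text{(poll reads per game)}\\
\frac{Q}{W} &\approx \frac{200}{10} = 20
\end{aligned}
```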

3. Query side extensibility

The real-time answering PK system needs a highly extensible Query side. Gameplay keeps changing, and the Query side must keep adding new query capabilities, such as the number of consecutive wins, the number of consecutive rounds answered correctly, whether a game ended in a "kill" (overtaking the opponent's score in the last round), replaying the answering process (letting a player PK against the recorded answers of an invited friend or of themselves), and statistics on players' game data. Separating reads from writes and using Event Sourcing therefore makes it easier for the system to offer an extensible read model.
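Because the full event history is kept, a new read-side feature can be computed from existing events without touching the write side. For example, a sketch of the "kill" check, assuming it means trailing after the second-to-last round and leading after the last one (the product's exact definition may differ) and that per-round scores have already been extracted from the answer events; `is_final_kill` is hypothetical.

```python
from itertools import accumulate
from typing import List


def is_final_kill(mine: List[int], theirs: List[int]) -> bool:
    """True if `mine` trailed before the last round but leads after it."""
    if not mine or len(mine) != len(theirs):
        return False
    my_totals = list(accumulate(mine))       # running totals per round
    their_totals = list(accumulate(theirs))
    before_last = my_totals[-2] - their_totals[-2] if len(mine) > 1 else 0
    after_last = my_totals[-1] - their_totals[-1]
    return before_last < 0 and after_last > 0
```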

Following a microservice architecture and DDD, we divide the back-end domain into a PK context, a matching context, an NPC context, a player account context, and a question bank context. The PK context is the core context of the real-time answering PK back end, and this article focuses on its design and practice. The following figure shows how the PK context is modeled:

In the PK context, the PK process is controlled through the Game aggregate. As shown in the figure above, a Game contains multiple Rounds (usually five), the main attribute of a Round is its Question, and each Round consists of two PlayRound entities (one per player).
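A minimal sketch of this aggregate structure using Python dataclasses. Fields beyond the entity names Game, Round, Question, and PlayRound are hypothetical, and persistence and events are omitted; the point is that every change to a round goes through the Game root.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Question:
    question_id: str
    text: str


@dataclass
class PlayRound:
    """One player's play in one round."""
    player_id: str
    score: int = 0
    answered: bool = False


@dataclass
class Round:
    index: int
    question: Question
    plays: List[PlayRound]  # two PlayRound entities in a 1v1 game

    def is_complete(self) -> bool:
        return all(p.answered for p in self.plays)


@dataclass
class Game:
    """Aggregate root: all changes to rounds go through the Game."""
    game_id: str
    rounds: List[Round] = field(default_factory=list)

    def current_round(self) -> Optional[Round]:
        return next((r for r in self.rounds if not r.is_complete()), None)

    def record_answer(self, player_id: str, score: int) -> None:
        rnd = self.current_round()
        if rnd is None:
            raise ValueError("game is over")
        play = next((p for p in rnd.plays if p.player_id == player_id), None)
        if play is None:
            raise ValueError(f"unknown player {player_id}")
        if play.answered:
            raise ValueError("player already answered this round")
        play.answered, play.score = True, score
```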

Within the Game, each player's PK record serves as the game's outcome information and as the voucher and record used to settle the player's assets.

To raise the actual success rate of PK matching (many players enter matching and then quit) and to improve the game experience, we modeled the 1v1 matching protocol on the two-phase commit protocol. After two players submit match requests, the matching context pairs them asynchronously. At that point, neither the PK domain server nor the two players know whether both are still ready to play. To guarantee the ACID properties of the match, the server splits matching into a prepare phase and a commit phase.

As shown in the figure below, after player A submits a match request, the server uses its matching strategy to pair player A with player B, who is in the same competition area and of a similar rank. The server then sends both players a prepare notification and waits for each of them to confirm the match, that is, to commit it to the server. The game officially starts once both players have confirmed. If a player does not confirm the prepare before the timeout (from the client's side, it does not receive the Game Start event from the server within the specified time), the match is void: the client restarts the matching process and submits a new match request, repeating until a match succeeds.
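The prepare/commit flow can be sketched as a small state machine per proposed match. A minimal sketch, assuming a hypothetical 10-second confirmation timeout and passing the current time in explicitly as `now` so the logic is easy to test; resubmitting match requests after an abort is left to the client and not shown.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class MatchProposal:
    """Prepare/commit for a 1v1 match, modeled on two-phase commit."""
    players: Tuple[str, str]
    created_at: float
    timeout_s: float = 10.0
    confirmed: Dict[str, bool] = field(default_factory=dict)
    status: str = "PREPARED"  # PREPARED -> COMMITTED or ABORTED

    def confirm(self, player_id: str, now: float) -> str:
        if self.status != "PREPARED":
            return self.status  # decisions are final once made
        if now - self.created_at > self.timeout_s:
            self.status = "ABORTED"  # too late: the match is void
            return self.status
        if player_id in self.players:
            self.confirmed[player_id] = True
        if all(self.confirmed.get(p) for p in self.players):
            self.status = "COMMITTED"  # both confirmed: send Game Start
        return self.status

    def expire(self, now: float) -> str:
        """Called by a timer; aborts if not everyone confirmed in time."""
        if self.status == "PREPARED" and now - self.created_at > self.timeout_s:
            self.status = "ABORTED"
        return self.status
```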

Once both players confirm the match, the server initializes the game data (deducting the entry fee and storing the game context) and sends the Game Start event to both players. From then on, if a player quits or disconnects, the PK process handles it, for example by letting an NPC take over for the offline player so that the game continues smoothly.

Over WebSocket, the whole matching process is carried out by the client and server exchanging messages. In HTTP polling mode, the client must actively fetch this information from the server and set a maximum number of polls or a timeout after which it retries or aborts the process.

When processing the Game domain object, we use CQRS to separate read requests from write requests in the domain services. Taking a player's answer as an example, the main flow, shown in the figure below, is:

1) The client reports the player's answer and its context;

2) The command processor parses the request, converts it into a command to process the player's answer, and sends the command to the Game aggregate;

3) The Game aggregate converts the answer into an answer event (mainly containing gameId, playerId, roundIndex, answerTime, Score, etc.), synchronously returns the player's score for the current round, and persists the event;

4) After a valid game event is persisted, an Event Added message is sent to the message queue (to avoid contention over shared data, messages are routed to partitions of the message bus by consistent hashing on gameId), where the subscribing game view processor consumes it;

5) Based on the events it receives, the game view processor backtracks the events of the corresponding game and generates the latest game view data, including the game's progress and each player's performance (current score, combo count, longest streak of consecutive correct answers so far, etc.);

6) The latest view is persisted to the view database, where the query processor uses it to serve the API;

7) The game view processor publishes the latest game progress to the message queue, from which it is pushed to WebSocket-connected clients or consumed by other domain services (for example, when a game ends, the account domain service settles the assets of each player in the game);

8) For WebSocket-connected clients, the server actively pushes game progress, including the current round status, the next round's question, the game result, and other data;

9) Clients using HTTP polling query the game information through the polling API.
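The command side of this flow (steps 2 to 4) can be sketched as a handler that turns a command into an immutable event, persists it, publishes it, and returns the score synchronously. A minimal sketch, assuming in-memory lists in place of the event database and the message queue and a hypothetical scoring rule of 20 points lost per full second; partitioning and the view processor are left out.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class AnswerCommand:
    game_id: str
    player_id: str
    round_index: int
    answer_time: float  # seconds elapsed when the player answered
    correct: bool


@dataclass(frozen=True)
class AnswerEvent:
    game_id: str
    player_id: str
    round_index: int
    answer_time: float
    score: int


class AnswerCommandHandler:
    """Steps 2-4: turn the command into an event, persist it, then publish it."""

    def __init__(self) -> None:
        self.event_store: List[AnswerEvent] = []  # stand-in for the event database
        self.published: List[AnswerEvent] = []    # stand-in for the message queue

    def handle(self, cmd: AnswerCommand) -> int:
        # Hypothetical scoring rule: -20 points per full second, 0 if wrong or late.
        in_time = 0 <= cmd.answer_time < 10
        score = 200 - 20 * int(cmd.answer_time) if cmd.correct and in_time else 0
        event = AnswerEvent(cmd.game_id, cmd.player_id, cmd.round_index,
                            cmd.answer_time, score)
        self.event_store.append(event)  # persist the event first...
        self.published.append(event)    # ...then emit "Event Added" for the view side
        return score                    # the round score is returned synchronously
```

Publishing only after the event is persisted means subscribers never see an event that the store does not contain.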

A common way to support efficient queries over Event Sourcing data is to generate views in advance that present the data in a format suited to the required result sets. These materialized views contain only the data needed for the query, so applications can get the information they need quickly. Besides joining tables or merging data entities, materialized views can contain the current values of computed columns or data items, the results of merging or transforming data items, and values specified as part of a query. A materialized view can even be optimized for a single query.

In PK, the data displayed by the front end includes, but is not limited to:

1) Current round score

2) Total score of the current game

3) Current round combo

4) The other player's game data

5) Whether the game ended in a final kill

6) The longest run of consecutive correct answers in the game

All of this data can be derived from a player's list of answer events through event backtracking. To improve query efficiency, however, we subscribe to answer events and generate view snapshots of this data, which are stored in RDBMS and NoSQL databases.
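A minimal sketch of such a subscriber, assuming an in-memory dict instead of the RDBMS and NoSQL stores and assuming a positive score means a correct answer. Unlike the full backtracking described above, this sketch updates the view incrementally on each event, a swapped-in simplification that gives the same result when events arrive in order.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class PlayerView:
    total_score: int = 0
    last_round_score: int = 0
    combo: int = 0
    max_combo: int = 0


@dataclass
class GameViewProjector:
    """Subscriber that keeps a materialized view current per answer event."""
    views: Dict[str, Dict[str, PlayerView]] = field(default_factory=dict)

    def on_answer_event(self, game_id: str, player_id: str, score: int) -> PlayerView:
        view = self.views.setdefault(game_id, {}).setdefault(player_id, PlayerView())
        view.total_score += score
        view.last_round_score = score
        view.combo = view.combo + 1 if score > 0 else 0  # assumes score > 0 means correct
        view.max_combo = max(view.max_combo, view.combo)
        return view
```

Queries then read `views[game_id]` directly instead of replaying the player's answer list on every poll.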

With CQRS there is a delay between a write and its visibility to reads, so the system's consistency model is eventual consistency. In the answering PK game, when trading off data consistency against availability, we prioritize service availability and accept eventual consistency of the data (with the TP999 delay kept within 200 ms).

To guarantee eventual consistency, the following must hold:

A. The MQ does not lose messages;

B. Every consumer thread finishes processing a message before sending an ACK to the MQ;

C. Every consumer thread processes each message idempotently.
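Rules B and C can be sketched together in one consumer. A minimal sketch, assuming a hypothetical `Message` with a unique `message_id` and an in-memory set of processed IDs; in a real system that set would be durable and updated atomically with the handler's effect.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Set


@dataclass
class Message:
    message_id: str
    payload: dict


@dataclass
class Consumer:
    """ACKs only after processing (rule B) and skips duplicates (rule C)."""
    handler: Callable[[dict], None]
    processed_ids: Set[str] = field(default_factory=set)  # durable in practice
    acked: List[str] = field(default_factory=list)

    def consume(self, msg: Message) -> None:
        if msg.message_id not in self.processed_ids:
            self.handler(msg.payload)  # if this raises, no ACK is sent and the MQ redelivers
            self.processed_ids.add(msg.message_id)
        # A redelivered duplicate is still ACKed so the MQ stops resending it.
        self.acked.append(msg.message_id)
```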

Sometimes, after a Command request is sent, the requester fails to receive a correct response for some reason even though the receiving end has performed the operation correctly. If the request is resent at that point, the operation is repeated. In our CQRS/ES architecture, a Command is identified by the aggregate root entity ID, the version number, and the CommandId, and Command requests are processed idempotently. In practice, we implement the idempotency check mainly with a unique-key constraint in the database.
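The unique-key approach can be sketched with Python's built-in `sqlite3` module standing in for the production database. A minimal sketch; the table `processed_command` and the helpers `create_dedup_table` and `handle_once` are hypothetical names. The dedup insert and the command's effect share one transaction, so a failed command leaves no dedup record behind.

```python
import sqlite3
from typing import Callable


def create_dedup_table(conn: sqlite3.Connection) -> None:
    conn.execute(
        "CREATE TABLE IF NOT EXISTS processed_command ("
        " aggregate_id TEXT NOT NULL,"
        " version INTEGER NOT NULL,"
        " command_id TEXT NOT NULL,"
        " UNIQUE (aggregate_id, version, command_id))"
    )


def handle_once(conn: sqlite3.Connection, aggregate_id: str, version: int,
                command_id: str,
                apply: Callable[[sqlite3.Connection], object]) -> bool:
    """Apply a command at most once; returns False if it is a duplicate."""
    try:
        with conn:  # one transaction: commit on success, roll back on error
            conn.execute(
                "INSERT INTO processed_command VALUES (?, ?, ?)",
                (aggregate_id, version, command_id),
            )
            apply(conn)  # the command's effect shares the same transaction
        return True
    except sqlite3.IntegrityError:
        # The unique key already exists: this command was handled before.
        # (An IntegrityError raised inside apply would also land here.)
        return False
```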

Concurrency matters mainly in business scenarios that read and then write, such as processing the PK view data (the contestants' answer data). To prevent consistency problems caused by concurrent event processing, our CQRS/ES implementation strictly processes events in arrival order (first come, first served), and the state of an aggregate root is changed atomically by a single thread. To achieve this, we can use a message queue such as Kafka: all events for the same entity are produced to the same partition, and each partition is consumed in order, which guarantees that changes to the state of a given aggregate root entity are processed in order.
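The partitioning idea can be sketched with in-memory lists standing in for Kafka partitions. A minimal sketch: it uses CRC32 as a stable hash of gameId purely for illustration (Kafka's default partitioner hashes the message key with murmur2, so with Kafka one would simply use gameId as the message key). Function names and the event shape are hypothetical.

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Tuple

Event = Tuple[str, int]  # (game_id, sequence number); hypothetical shape


def partition_for(game_id: str, num_partitions: int) -> int:
    # A stable hash (CRC32), unlike Python's per-process randomized hash(),
    # so the same gameId always maps to the same partition.
    return zlib.crc32(game_id.encode("utf-8")) % num_partitions


def produce(events: List[Event], num_partitions: int) -> Dict[int, List[Event]]:
    partitions: Dict[int, List[Event]] = defaultdict(list)
    for event in events:  # appending preserves per-partition order
        partitions[partition_for(event[0], num_partitions)].append(event)
    return partitions


def consume_partition(partition: List[Event]) -> Dict[str, List[int]]:
    """One single-threaded consumer per partition applies events in order."""
    applied: Dict[str, List[int]] = defaultdict(list)
    for game_id, seq in partition:
        applied[game_id].append(seq)
    return applied
```

Different games may share a partition, but events of one game never split across partitions, so a single consumer thread sees them in the order they were produced.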

Discussing system architecture in isolation from business scenarios is pointless. Projects built on CQRS/ES can have a far more scalable, maintainable, and performant code base, but that gain comes not from a flawless application of any one technology but from careful analysis of the details of the business requirements. The case in this article is one scenario well suited to CQRS/ES.
