Recently, while playing Animal Crossing, I kept running into mysterious online problems. Searching the Internet, I found that most people's answers fall back on superstition to explain them, so I became interested in understanding the root causes of these problems in principle, and the technical reasons behind some of the game's online design decisions.

For the record, I do not work in the game industry, nor am I a long-term player of console games. If there is any mistake, or another angle to add, please let me know in the comments section.

How is the game synchronized

Let’s start by looking at how games in general do syncing.

Imagine two separate rooms with players A and B playing a game of chess remotely. They need to know the status of the current board before each move, and there are two ways to do this.

The first is called lockstep synchronization. The principle is that each player notifies the other of his or her moves, so that both sides hold the same ordered sequence of moves. By replaying these ordered moves, each side can compute the current state of the game. But this approach cannot tolerate losing any message along the way; otherwise the computed state will diverge badly.

The second is called state synchronization, which, as the name suggests, synchronizes the state of the entire board with every move the player makes. This way, you can tolerate the loss of some intermediate states and still end up with the same state.

In practice, lockstep synchronization is used more often in games where players act at high frequency, such as Honor of Kings. Card games tend to use state synchronization directly.
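The difference between the two approaches can be illustrated with a minimal sketch. Everything here is hypothetical and only for illustration: the board is a plain dict, and `apply_move`, `lockstep_replay`, and `state_sync` are made-up helper names, not part of any real game.

```python
def apply_move(board, move):
    """Apply one move (src, dst) to a board and return the new board."""
    src, dst = move
    new_board = dict(board)
    new_board[dst] = new_board.pop(src)
    return new_board


def lockstep_replay(initial_board, moves):
    """Lockstep: peers exchange only moves and replay the whole sequence."""
    board = initial_board
    for move in moves:
        board = apply_move(board, move)
    return board


def state_sync(received_states):
    """State sync: peers exchange whole boards; only the newest one matters."""
    return received_states[-1]


initial = {"e2": "white pawn", "e7": "black pawn"}
moves = [("e2", "e4"), ("e7", "e5")]

# Lockstep with every move delivered reproduces the true state.
truth = lockstep_replay(initial, moves)

# Lockstep with the first move lost ends up in a different state.
diverged = lockstep_replay(initial, moves[1:])

# State sync with the intermediate state lost still converges.
all_states = [apply_move(initial, moves[0]), truth]
recovered = state_sync(all_states[1:])
```

Here `diverged` still has the white pawn on e2, while `recovered` matches `truth` even though one message was dropped.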

How do games go online

Communication architecture

Either way, we need to exchange data across a network between multiple consoles. Let’s change the scene to A, B and C playing checkers together. To ensure that all three players see a consistent game state, we usually need one console to act as the authoritative Host; the other consoles may only update their local game data from the data (state or operation sequence) sent by the authoritative Host.

Here, we assume that A is the “Host”. Every operation by B or C must be sent to A for confirmation, and A then sends the confirmed operation back to B and C. Only then can B and C consider the operation successful and update the screen to the latest state. A's console holds the real state of the current game at all times; the other consoles simply follow A's state to update their own screens.

In this mode, A's console is both a game console and the state-synchronization host, so when many players are online it can become overwhelmed. This is the so-called “island blow-up” phenomenon. In addition, this mode requires A's console to stay alive the whole time, so it only works as a short-lived server solution. That is why some games, such as Minecraft, introduce an external custom or official server to take over state synchronization. The principle is exactly the same.
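The confirm-then-broadcast flow described above can be sketched roughly as follows. This is an illustrative toy, not how any real game implements it: `AuthoritativeHost`, `Client`, `play`, and the "players take turns in order" rule are all assumptions made up for the example.

```python
class AuthoritativeHost:
    """Hypothetical Host: owns the real state and confirms every operation."""

    def __init__(self, players):
        self.players = list(players)
        self.log = []  # confirmed operations, in order

    def submit(self, player, operation):
        """Validate an operation; return the confirmed entry, or None."""
        expected = self.players[len(self.log) % len(self.players)]
        if player != expected:  # toy rule: players take turns in order
            return None
        entry = (len(self.log), player, operation)
        self.log.append(entry)
        return entry


class Client:
    """Hypothetical non-host console: applies only what the Host confirmed."""

    def __init__(self):
        self.log = []

    def on_confirmed(self, entry):
        self.log.append(entry)


host = AuthoritativeHost(["A", "B", "C"])
clients = {"B": Client(), "C": Client()}


def play(player, operation):
    """Send an operation to the Host and broadcast it if it is confirmed."""
    entry = host.submit(player, operation)
    if entry is not None:
        for client in clients.values():
            client.on_confirmed(entry)
    return entry is not None


# C tries to move out of turn and is rejected by the Host.
results = [play("A", "move 1"), play("C", "move 2"), play("B", "move 3")]
```

Because B and C never apply an operation on their own, their logs can only ever be a copy of the Host's log.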

NAT penetration

After understanding the basics above, we can see that, without considering external servers, we have the following requirements for the network between player consoles:

  1. A can send data to B and C
  2. B and C can send data to A
  3. B and C do not need to be able to reach each other directly

Although these requirements look simple, establishing bidirectional communication between two home consoles is not easy, because network operators nowadays use NAT to varying degrees.

Home networks generally have four different NAT types:

Full-cone NAT:

  • Once an internal address (iAddr:iPort) is mapped to an external address (eAddr:ePort), any packets from iAddr:iPort are sent through eAddr:ePort.
  • Any external host can send packets to iAddr:iPort by sending packets to eAddr:ePort.

(Address-)restricted-cone NAT:

  • Once an internal address (iAddr:iPort) is mapped to an external address (eAddr:ePort), any packets from iAddr:iPort are sent through eAddr:ePort.
  • An external host (hAddr:any) can send packets to iAddr:iPort by sending packets to eAddr:ePort only if iAddr:iPort has previously sent a packet to hAddr:any. “Any” means the port number doesn’t matter.

Port-restricted-cone NAT:

  • Once an internal address (iAddr:iPort) is mapped to an external address (eAddr:ePort), any packets from iAddr:iPort are sent through eAddr:ePort.
  • An external host (hAddr:hPort) can send packets to iAddr:iPort by sending packets to eAddr:ePort only if iAddr:iPort has previously sent a packet to hAddr:hPort.

Symmetric NAT:

  • Each request from the same internal IP address and port to a specific destination IP address and port is mapped to a unique external source IP address and port; if the same internal host sends a packet even with the same source address and port but to a different destination, a different mapping is used.
  • Only an external host that has received a packet from the internal host can send a packet back.

The four types of NAT can be summarized as follows:

  1. The first three types of NAT map an internal iAddr:iPort to one fixed external eAddr:ePort. Only the Symmetric type maps the same iAddr:iPort to different eAddr:ePort pairs depending on the destination IP address and port.
  2. Address-restricted-cone means that only an address the internal host has previously sent to can send packets back to it. Port-restricted-cone means that only the exact address:port it has previously sent to can send packets back to it.
  3. The restriction that the fourth type, Symmetric, places on the source of returned packets is the same as that of Port-restricted-cone.
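The mapping and filtering rules of the four types can be modeled in a few lines. This is a simplified sketch under my own assumptions: the `Nat` class, its `send` and `accepts` methods, and the port counter starting at 40000 are hypothetical, and real routers have timeouts and many other behaviors that are ignored here.

```python
from enum import Enum


class NatType(Enum):
    FULL_CONE = "full-cone"
    ADDRESS_RESTRICTED = "address-restricted-cone"
    PORT_RESTRICTED = "port-restricted-cone"
    SYMMETRIC = "symmetric"


class Nat:
    """Toy NAT: tracks mappings and which destinations were contacted."""

    def __init__(self, nat_type, external_addr):
        self.nat_type = nat_type
        self.external_addr = external_addr
        self.next_port = 40000  # arbitrary starting port for the sketch
        self.mappings = {}      # mapping key -> external port
        self.sent_to = {}       # external port -> set of (hAddr, hPort)

    def send(self, internal, dest):
        """Record an outbound packet; return the external (addr, port) used."""
        # Symmetric NAT keys the mapping on the destination as well.
        symmetric = self.nat_type is NatType.SYMMETRIC
        key = (internal, dest) if symmetric else internal
        if key not in self.mappings:
            self.mappings[key] = self.next_port
            self.next_port += 1
        port = self.mappings[key]
        self.sent_to.setdefault(port, set()).add(dest)
        return (self.external_addr, port)

    def accepts(self, external_port, source):
        """Would a packet from source to eAddr:external_port be let in?"""
        if external_port not in self.sent_to:
            return False  # no mapping exists for this port
        contacted = self.sent_to[external_port]
        if self.nat_type is NatType.FULL_CONE:
            return True
        if self.nat_type is NatType.ADDRESS_RESTRICTED:
            return any(addr == source[0] for addr, _ in contacted)
        # Port-restricted and symmetric: exact address:port must match.
        return source in contacted


internal = ("192.168.1.2", 5000)
server = ("3.3.3.3", 3000)
peer = ("2.2.2.2", 2000)

cone = Nat(NatType.PORT_RESTRICTED, "1.1.1.1")
cone_to_server = cone.send(internal, server)
cone_to_peer = cone.send(internal, peer)      # same external port reused

sym = Nat(NatType.SYMMETRIC, "1.1.1.1")
sym_to_server = sym.send(internal, server)
sym_to_peer = sym.send(internal, peer)        # a new external port
```

The key point for what follows is the last two lines: behind a symmetric NAT, the port that a third-party server sees is not the port a peer needs to reach.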

The network test function on the console can tell us the NAT type of the current network. On the Switch, types A, B, C, and D map to the preceding four types respectively. On the PS4, types 1 (direct connection, no NAT), 2 (non-symmetric NAT), and 3 (symmetric NAT) are used. For the purposes of this discussion, we use the Switch’s A/B/C/D to refer to the four types above.

Knowing the limitations of each of the four NAT types, we can work out for ourselves which pairs of NAT types can never establish bidirectional communication. Below we walk through the different combinations. A serves as the Host, and we assume an external online service that can obtain the external IP information of A and B. This online service is a third-party server with which both A and B register their own external IP address and port number, and through which each can look up the other’s. Therefore, A and B know each other’s address information before they communicate.

If the NAT type of A is A:

  • Whether B's type is A, B, C, or D, B can send data directly to A’s eAddr:port. Once A has received data from B, A also learns B’s eAddr2:port2 and is allowed to send data to B, thus establishing bidirectional communication.

If the NAT type of A is B:

  • If B's NAT type is B/C/D, A first sends data along the path 192.168.1.1:10001 => 1.1.1.1:10002 (A's external egress) => 2.2.2.2:20002 (B's external ingress). B's router rejects this packet, but the attempt leaves a record on A's router, which allows B to send data to A. Once B has sent data to A, A can send data to B, thus establishing two-way communication.
  • If B's NAT type is A, the logic is the same as in the case above where A's type is A, with the roles swapped.

If the NAT type of A is C:

  • If B's NAT type is D: B’s attempt to connect to A is rejected, and A cannot learn the port that B's NAT mapped for this connection, so A cannot connect to B either. Communication cannot be established in either direction.
  • If B's NAT type is B: the C-B case works the same way as the B-C case.
  • If B's NAT type is C: the difference between C-C and C-B/B-C is that each party's egress port must match the port the other party is sending to, but bidirectional communication can still be established.
  • If B's NAT type is A: bidirectional communication can also be established.

If the NAT type of A is D:

  • If B's NAT type is D, communication cannot be established in either direction.
  • If B's NAT type is C, this is the same as the C-D case: communication cannot be established in either direction.
  • If B's NAT type is A/B, bidirectional communication can be established, as in the A-D and B-D cases.

To sum up:

  1. Only the C-D, D-C, and D-D combinations have no chance of establishing bidirectional communication. All other combinations are capable of bidirectional communication at the NAT level.
  2. Players of type A/B should theoretically have no NAT problems with any other type of player.
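The conclusions above can be written down as a tiny lookup and checked against every combination. This is only a restatement of the theoretical result in code; `can_connect` is a hypothetical helper name, and the letters are the Switch's A/B/C/D labels.

```python
from itertools import product

TYPES = "ABCD"


def can_connect(host_type, guest_type):
    """True if the two NAT types can, in theory, talk in both directions."""
    pair = {host_type, guest_type}
    # Only D with D, or C with D (in either order), is hopeless.
    return not (pair == {"D"} or pair == {"C", "D"})


# Enumerate all 16 host-guest combinations and collect the blocked ones.
blocked = [
    host + "-" + guest
    for host, guest in product(TYPES, TYPES)
    if not can_connect(host, guest)
]
```

Enumerating all 16 combinations leaves exactly the three blocked pairs listed above, and any pair involving type A or B passes.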

Of course, this is all theory. Whether you can actually connect also depends on other network conditions and even on the game's programming logic.

How does Animal Crossing do online play

Many console games behave in strange or even bizarre ways when they are online, and these behaviors are related to the synchronization mechanisms and networking issues described above.

Animal Crossing’s online mode also has a number of interesting (and annoying) behaviors, such as:

  • Island decorations cannot be changed while online
  • When a player arrives on the island, everyone has to pause for a long time while they join
  • When a player leaves, everyone also has to watch them leave, and the current progress is saved at that moment
  • When a player drops or forcibly quits, everyone’s data is rolled back to the state of the island at the last time a player arrived or left normally
  • While a player has a dialog window open on the island, nobody can leave or enter the island

The following analysis is just my guess at how Animal Crossing is implemented, based on my software engineering experience and 200 hours of playing the game. Nobody can guarantee that it is correct without looking at the code, and I care more about the fun of guessing than about being right, so read it as an engineering exercise rather than as a description of what was actually done.

We can break the online play process into the following stages and discuss each in turn:

1. A enables online access (the so-called “opening the gate”) and uses their own console as the Host

In this step, A registers their external IP address and port number (for example, 1.1.1.1:10001) with the Switch's online service.

2. B searches for A and tries to join A

  • B registers their external IP address and port number (for example, 2.2.2.2:10002) with the online service. (This is the part of the game where you are asked whether you want to go online.)

  • After that, B obtains A's 1.1.1.1:10001 (this is the step of searching for a friend's island in Animal Crossing) and tries to connect.

    Note that at this point, the Host A also learns in the background, through the online service, that B is trying to connect. Depending on the NAT types involved, A connects to B's 2.2.2.2:10002 to try to open up two-way communication.
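Steps 1 and 2 amount to a small rendezvous protocol, which might look roughly like this. To be clear, this is my own guess at the shape of it: `RendezvousService` and its `register`, `request_join`, and `poll` methods are hypothetical names, not Nintendo's actual API.

```python
class RendezvousService:
    """Hypothetical stand-in for the online service both consoles talk to."""

    def __init__(self):
        self.endpoints = {}  # player -> (external address, external port)
        self.pending = {}    # host -> list of guests waiting to join

    def register(self, player, external_endpoint):
        """Store the external endpoint the service observed for a player."""
        self.endpoints[player] = external_endpoint

    def request_join(self, guest, host):
        """Guest asks to join: queue a notice and return the host endpoint."""
        if host not in self.endpoints or guest not in self.endpoints:
            raise KeyError("both players must register first")
        self.pending.setdefault(host, []).append(guest)
        return self.endpoints[host]

    def poll(self, host):
        """Host fetches the endpoints of guests that are trying to join."""
        guests = self.pending.pop(host, [])
        return [(guest, self.endpoints[guest]) for guest in guests]


service = RendezvousService()
service.register("A", ("1.1.1.1", 10001))   # step 1: A opens the gate
service.register("B", ("2.2.2.2", 10002))   # step 2: B goes online

target_for_b = service.request_join("B", "A")  # B learns where A is
targets_for_a = service.poll("A")              # A learns B is coming
```

After this exchange each side knows the other's external endpoint, and both can start sending packets to punch through their NATs.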

3. Establish a connection and go to the island

Once the previous step has confirmed that two-way communication can be established, the trip to the island can begin. It consists of the following steps:

3.1 The Host packages the entire current game state

Before the new player arrives, Host A takes a snapshot of all the game data at the current moment.

The snapshot data here includes island data and player data.

3.2 Downloading the snapshot of the other island

Animal Crossing shows a progress animation while you travel to an island. During this animation, the game is downloading the snapshot data of the target island. You will notice that this takes a very long time when players in China connect to the United States, which is caused by slow cross-border Internet speeds.

3.3 Others wait until the new player comes to the island

Other players have to wait for the new player because the snapshot taken in the previous step must remain the latest state. This means the other players cannot perform any incremental actions in the meantime; otherwise the new player's state would be inconsistent when they land on the island.
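Steps 3.1 to 3.3 can be sketched as a "freeze, snapshot, unfreeze" sequence. This is a guess expressed in code, not the game's real logic: `IslandHost`, `begin_join`, `finish_join`, and `dig_hole` are hypothetical names, and the state is a toy dict.

```python
import copy


class IslandHost:
    """Hypothetical Host that freezes the island while a player joins."""

    def __init__(self):
        self.state = {"island": {"holes": 0}, "players": {"A": {"bells": 100}}}
        self.frozen = False

    def begin_join(self):
        """Freeze all changes and hand out a deep copy of the full state."""
        self.frozen = True
        return copy.deepcopy(self.state)

    def finish_join(self):
        """The new player has loaded the snapshot; resume the game."""
        self.frozen = False

    def dig_hole(self):
        """A data-changing action; refused while a join is in progress."""
        if self.frozen:
            return False
        self.state["island"]["holes"] += 1
        return True


host = IslandHost()
snapshot = host.begin_join()              # 3.1: package the current state
dig_during_join = host.dig_hole()         # 3.3: everyone else must wait
still_consistent = snapshot == host.state
host.finish_join()                        # 3.2 done: snapshot downloaded
dig_after_join = host.dig_hole()
```

If `dig_hole` were allowed during the join, the snapshot the new player downloaded would already be stale by the time they landed.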

4. Start the game

Once the arrival is complete, the game can proceed normally. This is where the problem arises of how to synchronize the players' actions with each other.

Here we divide player actions into two types:

  1. Actions that affect game data (low frequency, ordering matters, must be persisted to disk)
  2. Actions that affect the graphics but not the game data (high frequency, ordering does not matter, no need to persist to disk)

If we pay close attention, we will find that operations that directly affect the global game data, such as digging holes, talking to NPCs, and dropping items, occasionally lag. This is because operations that affect the global state must request permission from the Host before the screen is rendered, so any network jitter causes a stutter or a failed action. Running, on the other hand, rarely stalls, but the character sometimes snaps back to an earlier position. Running only affects the player’s current position, not the game data, so an occasional snap-back is acceptable and strict ordering is not required. Besides, if every step of running had to be confirmed by the Host, the whole game would feel very sluggish. But for things like dropping items, if the data or the ordering is wrong, the whole state becomes inconsistent, which can be very serious.

So the synchronization mode here is actually lockstep synchronization. It just uses different strategies for low-frequency and high-frequency actions.
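The two strategies can be contrasted in a short sketch. Again, this is my own illustration of the guess above: `Host`, `Guest`, `drop_item`, `run_to`, and the `network_up` flag are hypothetical, and a real implementation would send packets rather than call methods directly.

```python
class Host:
    """Hypothetical Host holding authoritative data and last known positions."""

    def __init__(self):
        self.items_on_ground = []  # ordered, authoritative game data
        self.positions = {}        # cosmetic data, latest value wins

    def confirm_drop(self, player, item):
        """Data-affecting action: the Host orders and records it."""
        self.items_on_ground.append((player, item))
        return len(self.items_on_ground) - 1  # sequence number

    def report_position(self, player, position):
        """Cosmetic action: no confirmation, just overwrite."""
        self.positions[player] = position


class Guest:
    """Hypothetical guest console."""

    def __init__(self, name, host):
        self.name = name
        self.host = host
        self.position = (0, 0)
        self.rendered_drops = []

    def drop_item(self, item, network_up=True):
        """Wait for the Host before rendering; stall if it is unreachable."""
        if not network_up:
            return False
        seq = self.host.confirm_drop(self.name, item)
        self.rendered_drops.append((seq, item))
        return True

    def run_to(self, position, network_up=True):
        """Render immediately; tell the Host only if the packet gets through."""
        self.position = position
        if network_up:
            self.host.report_position(self.name, position)


host = Host()
guest = Guest("B", host)

guest.run_to((1, 0))
guest.run_to((2, 0), network_up=False)              # update is simply lost
drop_failed = guest.drop_item("shell", network_up=False)
drop_ok = guest.drop_item("shell")
```

After the lost update, the guest believes it is at (2, 0) while the Host still has (1, 0); reconciling that difference is what shows up as a snap-back, while the failed drop shows up as a stall.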

5. Player exits/forcibly exits/disconnects

If a player exits the game normally, a “Save data” screen is triggered. To understand what is being saved here, we need to divide the data in the game into two categories:

  1. State of the island
  2. Every move of each player

For a console game, the only data that really counts is the data that ends up in the console’s local storage. However, if every update triggered a write to each player’s local storage, rolling back would be very difficult, not to mention that disk I/O is very slow. It is more likely that the Host keeps the authoritative data about the current game in memory, including the state of the island and the players’ actions. In addition, regardless of how a player exits, the game must guarantee that once the session is over, all the players’ local saves together with the Host's save describe the true state of the game at one single point in time. The correctness of game data takes precedence over the user experience. The cases below illustrate why this matters.

When “Save data” is triggered by a player’s normal exit, the Host first takes a snapshot of the current moment, and then each player’s console downloads its own operation data from the Host and saves it locally.

But when a player exits abnormally, that player never downloads their own data, so their local game data is stuck at the last save point, time T1. The other players, who did not drop, still hold the latest and correct state. To satisfy the data-correctness requirement described above, however, everyone else’s data has to be rolled back to time T1 so that it is consistent with the dropped player’s data. That is why rollbacks happen in Animal Crossing.
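The save and rollback behavior can be modeled as a checkpoint that only advances on a normal exit. This is a toy model of my guess, not the game's actual save logic: `Session`, `act`, `normal_leave`, and `abnormal_leave` are hypothetical names.

```python
import copy


class Session:
    """Hypothetical online session with one shared checkpoint (time T1)."""

    def __init__(self, island):
        self.live = island                       # in-memory state on the Host
        self.checkpoint = copy.deepcopy(island)  # last state saved by everyone

    def act(self, key, value):
        """Any confirmed change only touches the in-memory state."""
        self.live[key] = value

    def normal_leave(self):
        """Everyone saves, so the checkpoint advances to the current state."""
        self.checkpoint = copy.deepcopy(self.live)

    def abnormal_leave(self):
        """Someone dropped without saving: roll everyone back to T1."""
        self.live = copy.deepcopy(self.checkpoint)


session = Session({"trees": 3})
session.act("trees", 4)
session.normal_leave()      # T1: everyone has saved "trees": 4
session.act("trees", 5)     # progress made after T1
session.abnormal_leave()    # a player disconnects
final_state = session.live
```

The work done after T1 is lost for everyone, which is the price of keeping all saves consistent with the one console that could not save.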

Q&A

Is Nintendo’s online service crap?

From the explanation above, we can see that there are real technical problems behind these seemingly strange online experiences. After all, Nintendo is a game company, not a professional distributed database company. Although many parts of the current implementation could be improved, the current solution also reflects ROI considerations, so it is not garbage.

Why don’t game manufacturers build their own servers to improve the experience?

Players come from all over the world, and if you want to build your own servers to improve the experience, you have to build them all over the world, which is very expensive and difficult to implement. Yes, there are global solutions out there, but generally only companies that make their money from the Internet are able and willing to adopt them. The business model of console games means they do not have to spend a lot of money to improve the online experience. Of course, if the console manufacturer ran its own global servers, that would be another story.

References

  • Wikipedia: Network address translation
  • Peer-to-Peer (P2P) communication across middleboxes
  • Network game synchronization technology overview

Author: Joway