[TOC]

Instant messaging: advanced topics in IM technology

Instant messaging: IM technology fundamentals

How to upgrade the access layer server program

For the current Access long-connection service specifically

My experience in XXX project:

  1. The Access layer service holds TCP long connections. If it needs to be updated, doesn't that force clients to log in again?

    • Yes, but it can be retrofitted: strip the connection-holding part out of Access so the long connections are maintained separately
  2. Access is split into a connection layer and an Access (business) layer. The former carries no business logic, so it is not expected to restart; the latter carries the business, and updating or restarting it has no impact on existing connections. Consider folding push into Access later

  3. The connection layer and Access share connection information through shared memory.

For the generic access layer

  1. Keep the access layer stateless and separated from the logic layer.

  2. A stateless access layer can be restarted at any time

  3. At the logic service layer, use etcd for service discovery and registration to ease upgrades and scaling (a minimal sketch follows)
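As a concrete illustration, here is a minimal sketch of registering a logic-layer instance in etcd under a leased key, assuming the go.etcd.io/etcd/client/v3 package; the endpoint, key layout, and TTL are illustrative, not from the original project.

```go
package main

import (
	"context"
	"log"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
)

// register writes this instance's address under a leased key, so the
// entry disappears automatically if the process dies.
func register(endpoints []string, key, addr string) error {
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   endpoints,
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		return err
	}
	lease, err := cli.Grant(context.Background(), 10) // 10s TTL (illustrative)
	if err != nil {
		return err
	}
	if _, err = cli.Put(context.Background(), key, addr, clientv3.WithLease(lease.ID)); err != nil {
		return err
	}
	ch, err := cli.KeepAlive(context.Background(), lease.ID) // refresh the lease
	if err != nil {
		return err
	}
	go func() {
		for range ch { // drain keepalive acks
		}
	}()
	return nil
}

func main() {
	// Hypothetical endpoint and key, for illustration only.
	if err := register([]string{"127.0.0.1:2379"}, "/services/logic/instance-1", "10.0.0.1:8000"); err != nil {
		log.Fatal(err)
	}
	select {} // keep the process alive in this sketch
}
```

Watchers on the `/services/logic/` prefix then see instances come and go, which is what makes rolling upgrades and scale-out safe.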

Number of TCP long connections maintained by a single server

  1. Operating systems impose Max Open Files limits, both system-wide and per-process

    • fs.file-max
    • soft nofile / hard nofile
  2. Each TCP socket connection occupies a certain amount of memory

    • Testing and published data both show that an idle TCP connection that merely stays connected takes roughly 4 KB of memory, so for a million connections the OS needs more than 4 GB of memory.
    • net.ipv4.tcp_rmem / net.ipv4.tcp_wmem
  3. Network limitations

    • Assuming 20% of a million connections are active and each active connection transmits 1 KB of data per second, the required bandwidth is 0.2M x 1 KB/s x 8 = 1.6 Gbps, so the server needs at least a 10 Gbps network.
  4. Some common basic sysctl changes:

    • net.ipv4.tcp_mem = 78643200 104857600 157286400
    • net.ipv4.tcp_rmem = 4096 87380 16777216
    • net.ipv4.tcp_wmem = 4096 87380 16777216
    • net.ipv4.ip_local_port_range = 1024 65535
    • net.ipv4.tcp_tw_recycle = 1 (note: tcp_tw_recycle misbehaves behind NAT and was removed in Linux 4.12)
    • net.ipv4.tcp_tw_reuse = 1
    • fs.file-max = 1048576
    • net.ipv4.ip_conntrack_max = 1048576

    The kernel derives the default fs.file-max roughly as:

    n = (mempages * (PAGE_SIZE / 1024)) / 10;   /* PAGE_SIZE is typically 4096 on x86_64 */
    files_stat.max_files = n;

  5. Does the epoll mechanism hurt performance when there are many long connections? Internally it manages its data with a red-black tree and a linked list.

    • No matter how many events epoll monitors, it does not affect TCP connections or performance
    • Besides creating a file node in the epoll filesystem and a red-black tree in the kernel cache to store the sockets registered by future epoll_ctl calls, the kernel also maintains a list of ready events. When epoll_wait is called, it just checks whether that list has any entries.
  6. What should be considered in practice?

    • NIC multi-queue support: check whether the network card supports it, otherwise a single CPU core cannot keep up with the network data; this requires a good NIC and also costs CPU

    • Managing the nodes of all the TCP long connections consumes CPU and requires an appropriate data structure

    • In practice, also think about how many connections per second you can accept: a million connections are not made all at once, so after a restart and mass reconnect, what is the connection setup rate?

    • If this node fails, how are its requests redistributed?

    • What is the application layer's capacity per connection? How fast can the server parse protocol packets?

    • The TCP mem problem: memory is not allocated until used, but it may not be reclaimed immediately either.

  7. Other considerations for long connections:

    • With stable connections, the connection-count metric by itself, without real network throughput, often means little: keeping a connection alive costs little CPU, and each connection's TCP stack takes about 4 KB of memory. With the system parameters tuned, our single-instance test peaked at 3 million long connections, but I personally see little point in pushing the test higher.

    • In a real network environment, a single instance holding 3 million long connections is under heavy pressure. In real weak-network conditions the disconnect rate of mobile clients is high. Assume 1/1000 of users disconnect and reconnect every second: that is 30k new connections per second, and those 30k users must register and load offline storage at the same time, all triggering internal RPC calls, while heartbeats for the 3 million connected users must still be maintained. With a 300 s heartbeat period, heartbeat packets alone require 10k TPS. Forwarding unicast, multicast, and broadcast data also drives internal RPC calls. At 3 million connections, GC pressure and whether internal-interface latency stays stable are all concentrated in one instance, so availability becomes a challenge. Therefore a single production instance cannot hold that many long connections; the practical number depends on clients' network conditions.

  8. Watch for too many connections stuck in CLOSE_WAIT: clients frequently drop because of network instability, and if the server fails to close the socket in time, connections accumulate in the CLOSE_WAIT state (see the first sketch after this list).

    • A connection in CLOSE_WAIT does not release its handle, memory, or other resources. If "Too many open files" occurs, the system may run out of handles: new clients cannot connect, and any operation that creates or opens a handle fails.
  9. Consider users on different network operators in different regions: some may be unable to connect to the service, or connect slowly, because of network restrictions.

    • In practice we found that some operators block certain ports, cutting some users off from the service. To solve this, expose multiple IP addresses and ports, so that a client that is slow to connect to one IP can poll and switch to a faster one.
  10. TCP_NODELAY

    • On this topic, Thompson argues that many people considering a microservices architecture do not fully understand TCP. In certain scenarios you can hit delayed ACKs, which limit the link to 2-5 packets per second. This is a deadlock between two TCP algorithms: Nagle and TCP delayed acknowledgement. After a 200-500 ms timeout the deadlock breaks, but microservice-to-microservice communication suffers in the meantime. The recommended fix is TCP_NODELAY, which disables Nagle's algorithm so that multiple smaller packets can be sent back to back. According to Thompson, the difference is between 5 and 500 req/sec.

    • tcp_nodelay tells Nginx not to buffer data but to send segments immediately – set this when data must be delivered promptly, so that small chunks of data are returned without delay.

    • We found that gRPC's synchronous calls conflict with Nagle's algorithm. Although gRPC sets the TCP_NODELAY socket option in its code, it had no effect on OS X; on Linux we set net.ipv4.tcp_low_latency = 1, after which a synchronous call over 100M bandwidth takes under 500 us. Moreover, in practice we avoided retransmitting the same data through repeated synchronous calls by using streaming calls instead, bringing a write down to about 5 us. Batch writes have always been a good way to solve performance problems (see the second sketch below).
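Two small sketches to make items 8 and 10 concrete. First, one way to avoid piling up CLOSE_WAIT connections is to guarantee the server closes the socket on any read error or idle timeout; a minimal Go sketch, with an arbitrary timeout value:

```go
package access

import (
	"net"
	"time"
)

// handleConn always releases the fd when the peer disappears, so the
// connection cannot linger in CLOSE_WAIT after the client drops.
func handleConn(conn net.Conn) {
	defer conn.Close() // runs on every exit path

	buf := make([]byte, 4096)
	for {
		// Bound how long a dead or half-closed peer can hold the fd.
		conn.SetReadDeadline(time.Now().Add(5 * time.Minute))
		n, err := conn.Read(buf)
		if err != nil { // io.EOF (peer closed) or timeout: close and return
			return
		}
		_ = buf[:n] // hand the bytes to the protocol layer here
	}
}
```

Second, disabling Nagle's algorithm from Go. Note that Go already enables TCP_NODELAY by default on TCP connections; the call below just makes the intent explicit (the address is illustrative):

```go
package main

import (
	"log"
	"net"
)

func main() {
	conn, err := net.Dial("tcp", "127.0.0.1:9000") // illustrative address
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	// SetNoDelay(true) disables Nagle; Go's default is already true,
	// so this matters mainly when someone has turned it off.
	if err := conn.(*net.TCPConn).SetNoDelay(true); err != nil {
		log.Fatal(err)
	}
}
```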


Heartbeat related processing

  1. The heartbeat actually does two things

    • It confirms the connection between client and server, letting the server check whether the client is still online
    • It also keeps the mobile network's GGSN (NAT mapping) alive.
  2. The most common approach is a heartbeat every four and a half minutes, but that is not smart.

    • Too short a heartbeat interval wastes traffic and battery and increases server load.
    • Too long an interval risks disconnection, because carrier policies evict idle entries from the NAT table
  3. Heartbeat algorithm (see Android WeChat's intelligent heartbeat strategy)

    • Keeping the mobile network GGSN (Gateway GPRS Support Node) alive
      • When a link carries no data for a while, most mobile operators evict the corresponding entry from the NAT table, breaking the link. NAT timeout is an important factor in the lifetime of TCP connections (especially in China), so having the client automatically estimate the NAT timeout and dynamically adjust the heartbeat interval is an important optimization.
    • An adaptive heartbeat scheme, following WeChat:
      • To keep message delivery timely, use a fixed heartbeat while the app is active in the foreground.
      • When the app goes to the background (or the screen turns off in the foreground), first use the minimum heartbeat to keep the long link alive, then enter adaptive background heartbeat calculation. The goal is to probe during periods when the user is inactive, limiting the delayed message delivery that heartbeat probing can cause.
  4. Keep heartbeat packets minimal (under 10 bytes) and adjust the heartbeat interval according to app state (mainly on Android); a sketch follows
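A minimal sketch of such a state-aware heartbeat loop, assuming hypothetical `inBackground` and `sendHeartbeat` hooks; the intervals and the adaptive step are illustrative, not WeChat's actual parameters:

```go
package heartbeat

import "time"

const (
	foregroundInterval = 3 * time.Minute  // fixed while active (illustrative)
	maxInterval        = 10 * time.Minute // never probe beyond this
)

// Run sends heartbeats, slowly stretching the interval while in the
// background to approach the NAT timeout from below, as described above.
func Run(inBackground func() bool, sendHeartbeat func() error) {
	interval := foregroundInterval
	for {
		time.Sleep(interval)
		if err := sendHeartbeat(); err != nil {
			return // connection broken: hand over to the reconnect logic
		}
		if inBackground() {
			interval += 30 * time.Second // adaptive probe step
			if interval > maxInterval {
				interval = maxInterval
			}
		} else {
			interval = foregroundInterval
		}
	}
}
```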


Handling weak network environments

  1. Network acceleration via CDN

    • Includes signaling acceleration points and an image CDN network
  2. Protocol simplification and compression

    • Use a compression algorithm to compress data packets
  3. After the first TCP connection is made via the domain name, cache the resolved IP and connect directly by IP next time; if the direct-IP connection fails, fall back to the domain name

  4. For large files, images, and so on, use resumable (breakpoint) and segmented uploads

  5. Balance the impact of network latency against bandwidth

    • If packets are smaller than 1500 bytes, merge request packets as much as possible to reduce the number of requests
  6. Nearby IP access

    • Direct IP connection (domain name already resolved to an IP)
    • Domain name resolution (via an IP address library) – DNS resolution is particularly slow on mobile networks
    • Compute the access point of the user's own carrier closest to the user's location

Disconnection and reconnection policy

After going offline, choose the reconnection interval according to the current state. If the local network is down, there is no need to retry on a timer; just monitor the network status and reconnect once it recovers. If the network changes very frequently, especially while the app is running in the background, apply some frequency control to reconnection, balancing message timeliness against power consumption.

  1. The reconnection interval follows the sequence 4, 8, 16 ... (max 30) seconds to avoid frequent disconnect/reconnect cycles and reduce server load. The policy resets once the server receives a valid packet

  2. If the network is up but the connection fails, keep retrying at intervals of 2, 2, 4, 4, 8, 8, 16, 16 ... (max 120) seconds

  3. Policy after a successful reconnection

    • Consolidate some requests to save unnecessary network round trips
    • Simplify the post-login synchronization; some sync requests, such as refreshing group member info, can be deferred until the relevant UI is opened.
  4. When arming the reconnect timer, to prevent an avalanche effect, do not reconnect immediately upon detecting a socket failure (server down); instead let the client sleep a random period (or apply another jitter strategy) before connecting to the server again. That way, different clients do not all connect at the same moment when the server restarts, which would otherwise create an avalanche (see the sketch below).
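A minimal sketch combining the interval sequence from item 1 with the random jitter from item 4; the sequence values mirror the text, while the jitter size is an assumption:

```go
package reconnect

import (
	"math/rand"
	"time"
)

// Backoff returns how long to wait before reconnect attempt n (0-based):
// 4, 8, 16 ... seconds capped at 30, plus up to 2s of random jitter so
// clients do not stampede a freshly restarted server.
func Backoff(attempt int) time.Duration {
	secs := 4 << attempt // 4, 8, 16, 32...
	if secs > 30 || secs <= 0 {
		secs = 30 // cap; also guards against shift overflow
	}
	jitter := time.Duration(rand.Int63n(int64(2 * time.Second)))
	return time.Duration(secs)*time.Second + jitter
}
```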


How to handle a network switch? Reconnect or log in again?

  1. Generally, a network switch (3G -> 4G -> WiFi -> 4G) triggers a reconnect, going through the whole process again

  2. Ideally the app re-registers with the server using as little communication as possible – for example, not re-fetching configuration from the server but reading the previously pulled server configuration from local cache (and if the configuration changes, the server should push a notification to the app to update it).

  3. For example, when switching from WiFi to 4G, or sitting at the edge of WiFi coverage or on the subway, apply a slightly delayed reconnection policy to avoid a reconnect storm (frequent reconnection requests caused by network instability)


How to scale the server application out and in? Horizontal scaling?

  1. Use a common distributed service-discovery scheme for configuration – for example, etcd for service discovery and registration.

  2. Each module should be independently deployable and designed stateless (the so-called microservice style), so that services can be upgraded and scaled well, with no single point of failure, and grayscale releases and updates stay easy

  3. Dynamic configuration


Group messaging

  1. Are messages write-diffused or read-diffused: is the same message written once per group member, or does every member read it from the same place? (See the sketch after this list.)

    • Write diffusion: simple, but the message is written once per group member in the cache. That is a lot of duplicated data and extra network traffic (e.g. when writing to Redis).

    • Read diffusion: write only one copy to the cache and have members pull messages from the group cache. This needs a bit more logic; the cached data can be deleted (or expired) once all group members have pulled it

  2. Delivery approach

    • Traverse the group members and send to each online member in turn; but when the group is large and active, this can add a lot of pressure.

    • While traversing group members and online users, can the intra-service (RPC) sends be batched?

  3. Storage approach for group messages

    • If the recipient is online, the msg is written only once to the db, and the index is written to both cache and db.
    • If offline: cache with write diffusion (msg and index); if the cache misses, fall through to the db.
  4. Each group message needs the online status of every group member. If that lives in Redis, the pulls become too frequent: the connection count explodes and concurrency gets too high. So add a local-cache layer and keep connection info in the local cache (trading memory for fewer network connections and requests)
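A minimal sketch of read diffusion from item 1: the group stores one shared copy of each message, and each member only keeps a read cursor into it. Types and locking are illustrative:

```go
package group

import "sync"

// Message is one group message with its per-group sequence.
type Message struct {
	Seq  int64
	Body []byte
}

// GroupTimeline holds the single shared copy of a group's messages.
type GroupTimeline struct {
	mu   sync.RWMutex
	msgs []Message // written once per message (read diffusion)
}

func (g *GroupTimeline) Append(m Message) {
	g.mu.Lock()
	defer g.mu.Unlock()
	g.msgs = append(g.msgs, m)
}

// PullSince returns everything after a member's last-read seq; every
// member reads from the same shared slice instead of a per-member copy.
func (g *GroupTimeline) PullSince(lastRead int64) []Message {
	g.mu.RLock()
	defer g.mu.RUnlock()
	var out []Message
	for _, m := range g.msgs {
		if m.Seq > lastRead {
			out = append(out, m)
		}
	}
	return out
}
```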


Client power consumption reduction policy

  1. Use AlarmManager to trigger heartbeat packets (on Android)

  2. Minimize network requests; preferably consolidate them (send several requests' worth of data at once). Batch and merge data requests/sends

  3. On mobile networks download is faster than upload; keep packets small on 2G, while 3G/4G is more power-efficient.


How are messages guaranteed to be reachable (not lost), unique, and ordered?

  1. The header contains a DUP field; if a message is a repeat delivery, the field is set so the receiver can detect duplicates

  2. The server caches the list of msgids per client. The client reports the maximum msgid it has received, and the server treats every message with a smaller msgid as received. That way nothing is lost.

  3. The server keeps the msgid generator highly available, and monotonically increasing msgids guarantee message order

The message-loss prevention mechanism in detail

To ensure no message is lost, the simplest scheme is for the mobile client to ACK every message it receives. But that scheme involves too many interactions between client and server, and under weak networks the ACKs themselves can be lost. Hence the sequence mechanism:

  1. Each user has a sequence space of 4.2 billion values (1 to UINT_MAX), allocated contiguously from small to large
  2. Every message for a user is assigned a sequence
  3. The server stores the maximum sequence already allocated to each user
  4. The phone stores the maximum sequence among the messages it has received

**Scheme advantages**

  1. From the sequence difference between server and phone, messages the phone has not yet received can easily be delivered incrementally

  2. In weak-network conditions, packet loss is relatively likely and the server's reply often fails to reach the phone. Since the phone only updates its local sequence after actually receiving messages, even if the reply is lost, the phone times out and fetches again, and the undelivered messages are still received correctly.

  3. Since the sequence stored by the phone is the maximum sequence of confirmed received messages, every fetch from the server also acts as a confirmation of the previous batch. If an account logs in from several phones in turn, the server only needs to store the confirmed sequences to guarantee that confirmed messages are never redelivered: a phone logging in later will not receive messages already received and confirmed on another.
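A minimal in-memory sketch of this sequence scheme; a real system would persist the allocations, but the contiguous allocation and cursor-based sync are the point here:

```go
package seqsync

import "sync"

// StoredMsg is a message with its per-user sequence.
type StoredMsg struct {
	Seq  uint32
	Body []byte
}

// Server tracks, per user, the max allocated sequence and the messages.
type Server struct {
	mu       sync.Mutex
	maxSeq   map[string]uint32      // max sequence allocated per user
	messages map[string][]StoredMsg // per-user store (illustrative)
}

func NewServer() *Server {
	return &Server{
		maxSeq:   make(map[string]uint32),
		messages: make(map[string][]StoredMsg),
	}
}

// Assign allocates the next contiguous sequence for uid and stores the message.
func (s *Server) Assign(uid string, body []byte) uint32 {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.maxSeq[uid]++
	seq := s.maxSeq[uid]
	s.messages[uid] = append(s.messages[uid], StoredMsg{Seq: seq, Body: body})
	return seq
}

// Sync returns every message above the client's confirmed sequence. A lost
// reply is harmless: the client simply asks again with the same cursor.
func (s *Server) Sync(uid string, clientMax uint32) []StoredMsg {
	s.mu.Lock()
	defer s.mu.Unlock()
	var out []StoredMsg
	for _, m := range s.messages[uid] {
		if m.Seq > clientMax {
			out = append(out, m)
		}
	}
	return out
}
```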


Communication modes (TCP, UDP, or HTTP): both TCP and HTTP are used.

  1. An IM system's requirements include accounts, relationship chains, online-status display, message interaction (text, picture, voice), and real-time audio/video

  2. HTTP (short connections) and TCP (long connections) handle the state protocol and the data-transfer protocol respectively

  3. Use TCP for the long connection, because messages must be receivable at any time; choose TCP, not UDP, to maintain the long connection

  4. Use short HTTP connections to fetch resources that are not time-critical. Why not use TCP for everything? What does HTTP buy us?

    • Most functions can currently be implemented over TCP.
    • File upload and download go over HTTP
      • It supports resumable and segmented uploads.
    • Offline messages use pull mode, so TCP channel pressure does not hurt instant-message delivery
    • Large doodles and files are uploaded via the storage service, again to avoid excessive pressure on the TCP channel
  5. Should IM use UDP or TCP?

    • UDP and TCP each have their scenarios. In early IM, UDP was often chosen because server resources (hardware, network bandwidth) were expensive and there was no good way to spread the load; early QQ is the main example.

    • Server-side load for TCP has since been well solved, and with falling server costs, many IM and push solutions now use TCP as the transport protocol. But UDP is not excluded from IM and push either: for weak-network communication (including high-latency transnational links), IoT communication, real-time audio/video inside IM, and so on, UDP remains the first choice.

    • Whether IM should use UDP or TCP is a matter of judgment; there is no need to get too hung up on it. Weigh your overall IM application scenario, development cost, and deployment and operation cost, and you will find your answer.


Selecting the communication protocol between server and client

  1. Common IM protocols are compared below. An IM protocol should be easy to extend, cover the various business logics, and save traffic; the last requirement matters especially for mobile IM.

    • XMPP: an open-source, extensible protocol with implementations in many languages on every end (including the server), easy for developers to adopt. But it has many drawbacks: XML carries a lot of redundant information, costs a lot of traffic, and has plenty of pitfalls in practice.

    • MQTT: a simple, low-traffic protocol. However, it was not designed for IM and is mostly used for push; groups, friends, and so on must be implemented in your business layer. Suitable for push services and live-streaming IM scenarios.

    • SIP: a text protocol, used mainly in VoIP-related modules. SIP signaling control is complex

    • Proprietary protocol: roll your own. Most mainstream IM apps use a private protocol. A well-designed private protocol is generally efficient, traffic-saving (usually binary), secure, and hard to crack.

  2. Considerations for protocol design:

    • Network data size – bandwidth and transmission efficiency: for a single user the transferred volume is tiny, but the server must withstand massive concurrent transfers, so think about bandwidth and avoid redundant data, in order to occupy less bandwidth, consume fewer resources, do less network IO, and improve transfer efficiency;

    • Network data security – sensitive data: some business data in transit is sensitive, so certain transmitted data must be encrypted

    • Encoding complexity – serialization and deserialization complexity and efficiency, and the extensibility of the data structures

    • Protocol generality – a general specification: data types must be cross-platform and the data format universal

  3. Comparison of common serialization protocols

    • Open-source protocols with serialization/deserialization libraries: Protocol Buffers, Thrift. Extension is quite handy, and serialization/deserialization is convenient

    • Text protocols: XML, JSON. Easy to serialize and deserialize, but bulky on the wire.

  4. Considerations when defining the protocol

    • Packet data can be compressed to reduce packet size

    • Packet data should be encrypted to keep it secure

    • Some protocol fields can be tightened to uint32 to reduce the packet-header size

    • The protocol header should contain a seq_num

      • This is what enables asynchrony. The crucial thing for a message channel is that processing cannot be synchronous; it must be asynchronous: you send packets A, B, C and receive replies X, Y, Z – how do you know which reply matches which request? By adding an ID (the seq_num) that correlates them (see the sketch below)
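A minimal sketch of such a private binary header in Go, with fixed-width integers and a seq_num for matching asynchronous responses to requests; the exact field layout is illustrative:

```go
package protocol

import (
	"bytes"
	"encoding/binary"
)

// Header is a fixed-size private protocol header.
type Header struct {
	Length uint32 // total packet length
	Cmd    uint32 // command / message type
	Seq    uint32 // seq_num correlating request and response
	Flags  uint16 // e.g. compression / encryption bits
}

// Encode serializes the header big-endian; all fields are fixed-size,
// so binary.Write can handle the struct directly.
func (h *Header) Encode() []byte {
	buf := new(bytes.Buffer)
	_ = binary.Write(buf, binary.BigEndian, h) // bytes.Buffer writes cannot fail
	return buf.Bytes()
}

// Decode parses a header back out of the wire bytes.
func Decode(b []byte) (Header, error) {
	var h Header
	err := binary.Read(bytes.NewReader(b), binary.BigEndian, &h)
	return h, err
}
```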

Key considerations in IM system architecture design

  1. Coding angle: use an efficient network model, threading model, and I/O processing model, with sound database design and optimized statements;

  2. Vertical scaling: improve performance by adding hardware or network resources to a single server;

  3. Horizontal scaling: implement load balancing through sound architecture design and O&M load-balancing policies to effectively improve performance; later, even consider adding a data-cache layer to break the IO bottleneck.

  4. System high availability: prevent single points of failure;

  5. The architecture separates business processing from data, relying on distributed deployment to keep the system available when a single point fails.

  6. Key independent nodes can use dual-system hot standby for failover

  7. Database data safety can be addressed with redundant disk arrays and active/standby databases.


TCP congestion control

TCP congestion control consists of four core algorithms: Slow Start, Congestion Avoidance, Fast Retransmit, and Fast Recovery.


How do I tell whether Kafka queues are lagging?

Kafka queues have no concept of "full", only of consumer lag/backlog

  1. Use an offset monitor to watch Kafka in real time

  2. Regarding Kafka

    • It is a distributed system that scales linearly, so it does not face a "queue full" problem.

    • Can data be written and never consumed? That shows up as ever-growing lag.


A scheme for coordinating the cache and the database

  1. Write: write the database, then update the cache

  2. Read: read the cache first; on a miss, fall through to the db.

But what if the database write succeeds and the cache update fails? The next read returns dirty (inconsistent) data. How do we handle that?

Solution:

  1. Evict the cache first, then write the database. But under concurrency this can still be inconsistent: after the cache is evicted but before the db write lands, an incoming read goes straight to the db and fetches the old data.

    • Therefore, operations on the same data must be strictly serialized.
  2. Database/cache inconsistency caused by read/write concurrency at the database level (essentially, a later read request returning first) can be solved with two small changes (sketched below):

    • Modify the service connection pool: pick the service connection by id modulo, so reads and writes of the same data land on the same backend service

    • Modify the database's DB connection pool: pick the DB connection by id modulo, so reads and writes of the same data are serialized at the database level
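A minimal sketch of the read path and of scheme 1 ("evict cache first, then write db"); `Cache` and `DB` are hypothetical interfaces standing in for Redis and MySQL, not a specific library:

```go
package cacheaside

import "context"

// Cache and DB are illustrative abstractions over Redis / MySQL.
type Cache interface {
	Get(ctx context.Context, key string) ([]byte, error)
	Set(ctx context.Context, key string, val []byte) error
	Del(ctx context.Context, key string) error
}

type DB interface {
	Read(ctx context.Context, key string) ([]byte, error)
	Write(ctx context.Context, key string, val []byte) error
}

// Read tries the cache first and falls through to the DB on a miss.
func Read(ctx context.Context, c Cache, db DB, key string) ([]byte, error) {
	if v, err := c.Get(ctx, key); err == nil {
		return v, nil // cache hit
	}
	v, err := db.Read(ctx, key) // miss: penetrate to the DB
	if err != nil {
		return nil, err
	}
	_ = c.Set(ctx, key, v) // repopulate; a failure only costs a future miss
	return v, nil
}

// Write evicts the cache first, then writes the DB, so a failed DB write
// cannot leave stale data in the cache.
func Write(ctx context.Context, c Cache, db DB, key string, val []byte) error {
	if err := c.Del(ctx, key); err != nil {
		return err
	}
	// NOTE: still racy under concurrency (see text); same-key operations
	// must be serialized, e.g. by hashing the id onto one worker/connection.
	return db.Write(ctx, key, val)
}
```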


Database sharding: splitting databases and tables

Why split the database into multiple databases and tables? Under what circumstances?

  1. It works around the maximum file size limits of the disk/file system

  2. It reduces the impact of locks on queries while incremental data is written: fewer table locks caused by long queries, and less lock contention on writes (table locks and row locks). Avoiding contention on a single table saves queueing time and increases throughput

  3. With fewer rows per table, common queries scan fewer records, need fewer rows per query, do less disk I/O, and finish with lower latency

  4. A single server's resources (CPU, disk, memory, IO, etc.) are finite; the data volume and processing capacity a database can bear will eventually hit a bottleneck. The purpose of splitting databases is to reduce the load on a single server, and the split should follow business affinity. The drawback is that cross-database joins are no longer possible

  5. When the data volume is very large, the B-tree index stops being so effective: huge volumes generate lots of random I/O, and database response times grow unacceptably long.

    • As data grows, the B-tree gets deeper, and the number of I/Os needed from root to leaf increases. Beyond about four levels of I/O it gets really slow – though four levels can already address terabytes of data, unless you are storing tiny values like ints. It is not that the B-tree stops working, just that it stops being fast enough.
    • Does huge data volume necessarily mean random I/O? Not always: a billion records can still come back quickly if all queries go through the primary key. Queries via secondary indexes do become random IO and respond more slowly, depending on the data distribution. The storage medium also matters: on SSD, random IO is around 100x better than on SAS disks, so performance can still be good

Goroutines in Golang

  1. Goroutines are scheduled entirely in user mode; a coroutine switch just swaps the executing function's stack and involves no kernel/user mode transition, so context-switch overhead is relatively small.

  2. Creating a goroutine takes about 2 KB (v1.4) of stack space.

  3. Go has preemptive scheduling: a goroutine that has not been scheduled for a long time will be preempted by the runtime

Does this mean that, with enough memory, creating a large number of goroutines (say 300k or 500k) will not cause severe performance degradation from CPU scheduling? (A minimal accept-loop sketch follows this list.)

  1. If the system has too many goroutines, one possible cause is that each goroutine takes too long to finish its work, and you need to investigate why processing is so slow.

  2. Some rough numbers: on a 24-core, 64 GB server, with at-least-once QoS, pure push, and message bodies of 256 B to 1 KB, a single instance served 1M real users (2M+ goroutines) at a peak of 20k-50k QPS... memory held steady around 25 GB, with GC pauses of roughly 200-800 ms (with room for optimization). (From the 360 messaging-system talk.)
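Those numbers rest on the standard goroutine-per-connection pattern; a minimal sketch with an illustrative port:

```go
package main

import (
	"log"
	"net"
)

// A goroutine-per-connection accept loop: with ~2 KB initial stacks,
// hundreds of thousands of connections are mostly a memory question.
func main() {
	ln, err := net.Listen("tcp", ":8000") // port is illustrative
	if err != nil {
		log.Fatal(err)
	}
	for {
		conn, err := ln.Accept()
		if err != nil {
			continue // transient accept error: keep serving
		}
		go func(c net.Conn) { // one goroutine per connection
			defer c.Close()
			buf := make([]byte, 1024)
			for {
				n, err := c.Read(buf)
				if err != nil {
					return
				}
				_ = buf[:n] // hand off to protocol handling here
			}
		}(conn)
	}
}
```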

Net connection management at the long-connection access layer

The long-connection access layer holds a great many net connections; a single server commonly carries hundreds of thousands, even millions. How are these connections managed? How do we quickly find the connection behind a request? How do connections map to users?

Managing TCP long connections

  1. A connection struct holds the TCP connection info, last communication time, encryption/decryption sharekey, and clientaddr. It also references a user struct

    • The user struct holds uid, deviceid, name, version.... and references the connection above, one-to-one.

    • The TCP connection info and user info are tied together one-to-one through these struct references rather than looked up through a map; with millions of entries, map lookups could be slow.

    • During the login request, the user info can be reached from the TCP connection info, but at that point the user struct is mostly unpopulated; it is filled in from the login data. The key point: within the current Access service there is a useMap matching uid to user info, which can be used to determine whether this uid is already logged in on this instance

    • When data comes back, the user struct is looked up by uid, the corresponding TCP connection info is reached through that struct, and the data is sent on it.

  2. In addition, on login and logout, another copy of the connection info (uid/topic/protoType/addr... ) is added to or removed from the user center

    • Successful login: UseAddConn
    • Logout/offline: UserDelConn
    • This connection info is available to other remote service callers, such as Oracle.
  3. If there are multiple Access instances, each has its own useMap structure.

    • If one account logs in from multiple terminals landing on different Access instances, they cannot kick each other via useMap; that must be managed through the user center described above
    • Multiple Access instances mean multiple useMaps, so a request entering through one Access must be answered through that same Access. How is that guaranteed? The current Access instance's IP:addr is passed all the way down, and on the way back the response is routed to the Access matching that passed-down IP:addr.
    • Then the user struct and TCP connection struct for the uid are looked up within that Access.

Data structures: map/hash (red-black tree); a sketch of these structures follows
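A minimal sketch of the structures just described: a connection struct and a user struct pointing at each other one-to-one, plus the uid-indexed useMap; field names are illustrative:

```go
package access

import (
	"net"
	"sync"
	"time"
)

// Connection holds the transport-level state for one TCP long connection.
type Connection struct {
	conn       net.Conn
	lastSeen   time.Time // last communication time
	shareKey   []byte    // per-session encryption/decryption key
	clientAddr string
	user       *User // one-to-one back pointer, filled in at login
}

// User holds the account-level state, populated from the login request.
type User struct {
	UID      string
	DeviceID string
	Name     string
	Version  string
	Conn     *Connection // one-to-one: user -> connection
}

// UserMap is the per-Access "useMap": uid -> user, used to answer
// "is this uid already logged in on this instance?".
type UserMap struct {
	mu    sync.RWMutex
	users map[string]*User
}

func NewUserMap() *UserMap {
	return &UserMap{users: make(map[string]*User)}
}

func (m *UserMap) OnLogin(u *User) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.users[u.UID] = u
}

func (m *UserMap) ByUID(uid string) (*User, bool) {
	m.mu.RLock()
	defer m.mu.RUnlock()
	u, ok := m.users[uid]
	return u, ok
}
```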

Exception management for sending/receiving: request ack and timeout

  1. Use a map: right after a message is sent, add the message body to the map keyed by msgid and uid.

  2. When the ack response arrives, delete the corresponding map entry.

  3. On timeout, resubmit via OfflineDeliver, then delete the corresponding map entry (sketched below).
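A minimal sketch of this pending-ack table, with a hypothetical `deliverOffline` hook standing in for OfflineDeliver; the timeout value is illustrative:

```go
package pending

import (
	"sync"
	"time"
)

type key struct{ uid, msgid string }

// Table parks sent-but-unacked messages; an ack deletes the entry, a
// timeout re-routes the body to offline delivery.
type Table struct {
	mu      sync.Mutex
	pending map[key]*time.Timer
}

func NewTable() *Table {
	return &Table{pending: make(map[key]*time.Timer)}
}

// Sent records a just-sent message and arms its timeout.
func (t *Table) Sent(uid, msgid string, body []byte, deliverOffline func([]byte)) {
	t.mu.Lock()
	defer t.mu.Unlock()
	k := key{uid, msgid}
	t.pending[k] = time.AfterFunc(10*time.Second, func() { // illustrative timeout
		t.Ack(uid, msgid)    // drop the entry...
		deliverOffline(body) // ...and fall back to offline delivery
	})
}

// Ack removes the entry (and stops the timer) once the client confirms.
func (t *Table) Ack(uid, msgid string) {
	t.mu.Lock()
	defer t.mu.Unlock()
	k := key{uid, msgid}
	if timer, ok := t.pending[k]; ok {
		timer.Stop()
		delete(t.pending, k)
	}
}
```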

Asynchronous and concurrent: how does the RPC framework know which response belongs to which request?

  1. Each time a client thread calls the remote interface over the socket, it generates a unique requestID (unique within that socket connection). Commonly, an atomic counter (e.g. AtomicLong) accumulating from 0 generates the unique ids, or timestamps are used.

  2. gRPC also needs service discovery. A gRPC service may have one instance, two, or more, and an instance may hang or go down; ZooKeeper can be used to manage this.

  3. A synchronous RPC call blocks until the server's reply arrives, which is the abstraction closest to what RPC wants to be. On the other hand, the network is asynchronous underneath, and being able to start an RPC without blocking the current thread is very useful in many scenarios. In most languages, the gRPC programming interface supports both synchronous and asynchronous styles.

  4. gRPC lets the client specify a deadline before calling a remote method. It bounds how long the client will wait for the server's reply; once it passes, the RPC ends with a DEADLINE_EXCEEDED error. On the server side, the deadline can be queried to learn whether a particular method has already expired, or how much time remains to complete it. Languages differ in how the deadline is specified (see the sketch below)
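In Go, the deadline is carried by the context; a minimal sketch (the target address is a placeholder, and the generated stub call is elided since it depends on your .proto):

```go
package main

import (
	"context"
	"log"
	"time"

	"google.golang.org/grpc"
)

func main() {
	// grpc.WithInsecure is deprecated in newer gRPC versions but still
	// works; production code would configure transport credentials.
	conn, err := grpc.Dial("127.0.0.1:50051", grpc.WithInsecure())
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	// Every RPC made with this ctx fails with DEADLINE_EXCEEDED after 1s.
	ctx, cancel := context.WithTimeout(context.Background(), time.Second)
	defer cancel()
	_ = ctx // pass ctx as the first argument to the generated client call
}
```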



Server bottleneck analysis

Load testing showed gRPC to be one of the bottleneck factors. Why gRPC? Why does it consume CPU? How do we fix it? The network is certainly not what limits throughput.

  1. Unary mode is used today; consider streaming and batched delivery to improve efficiency

    • Unary mode is one request per response, so as concurrency rises, the number of connections rises

    • Streaming, by contrast, merges multiple requests (batched request/response packages), reducing network interactions and connections

    • In my load test of streaming, performance is reportedly more than twice that of unary

  2. Servers generally follow a parabola: as concurrency increases, some resource (CPU/memory/network bandwidth/disk IO) is progressively consumed and exhausted, response times get slower (latency grows), and QPS/throughput stops increasing.

  3. For gRPC, as concurrency grows the observable effect is growing latency; some requests take around 5 s to respond (Access/Push), which is far too long. QPS/throughput = concurrency / response time, so an overlong response time kills throughput.

    • Why is the response time so long? Is the CPU saturated?

    • Another cause of slow responses: the request ultimately reaches the Oracle service, which requests data resources (cache/db). In Oracle's design, more concurrent resource requests (and hence more connections) increase the latency of each resource request, so the latency seen by the calling gRPC layer above also grows.

    • So it all comes back to CPU/memory/network bandwidth/disk IO

  4. For RPC, a growing connection count means:

    • As with TCP long connections, memory must be allocated for each connection

    • Handling so many connections at once, each with its own work in flight, demands more of the CPU

  5. Later investigation found that gRPC's synchronous calls conflict with Nagle's algorithm. Although gRPC sets the TCP_NODELAY socket option in its code, it had no effect on OS X; on Linux we set net.ipv4.tcp_low_latency = 1, after which a synchronous call over 100M bandwidth takes under 500 us. Moreover, in practice we avoided retransmitting the same data through repeated synchronous calls by using streaming calls, bringing a write down to about 5 us. Batch writes have always been a good solution to performance problems


How can clients reach the server's access layer quickly?

If the server is in Beijing and the client is in Guangzhou, how can access be fast? Besides a CDN, is there any other way?

  1. With only one data center, there is no real alternative to CDN acceleration for now.

  2. With two data centers, route clients to the nearer one, but the data of the two centers must be synchronized

  3. Nearest access: use DNS to find the nearest machine and serve along the shortest path

How to improve your IM skills?

  1. Be able to estimate, without load testing, the QPS a system can support; be able to roughly estimate the time of a DB request, a Redis request, and an RPC call.

    • What are the time-consuming and CPU-consuming tasks in the system?
  2. Every system has several layers; from top to bottom, what requests does each step make? How much time is spent at each layer?

    • Whether the system pulls in other resources

    • The performance bottleneck is, in the end, nothing but CPU or IO.

    • If a db query is slow, why is it slow? There must be a reason.

      • Querying one SQL statement takes about 0.2-0.5 ms (with small tables; whether the query goes through an indexed id makes little difference.)
    • Single-machine QPS of 8k is relatively small (1/8 ms = 0.125 ms per request); 5 such machines give 40k QPS. With 100k users online simultaneously, if a send and a receive each count as a request, effective QPS halves to 20k; and if each of those 100k users sends a message every 3-4 s, you need about 30k QPS.

    • When load-testing Redis earlier, we saw that under excessive concurrency a Redis pull can take very long, over 3 s.

    • Normally it takes a person at least 5 seconds to send a message (6-8 characters).

  3. To push IM skills further, you must be able to analyze performance, identify performance bottlenecks, and resolve them.

    • Also study others' practices, such as some of WeChat's
  4. Architecture evolves gradually; each stage has its own architecture, and the usual starting point is a three/four-tier architecture. The first stage of evolution is to split services along logic and business lines, merge resource requests, and reduce concurrency and connection counts.

  5. Keep an eye on the big numbers: registered users, daily actives, monthly actives, retention. Be sensitive to the data: why is it flat, why did it suddenly rise, what is the peak? How much can the system hold so far?

  6. Watch system performance indicators such as CPU, memory, network, and disk, and check them often for anomalies. Detect problems ahead of time rather than waiting for them to bite, and when problems do occur, keep the resolution time within minutes.

    • Fully understand the low-level system tools – sar, iostat, dstat, vmstat, top, etc. – and look at their data frequently

    • Ensure every part the system touches is a white box

      • For the services you depend on: who is responsible for them, how are they deployed, in which machine room

      • The resources used: how much memory does Redis have? How many MySQL databases? How many tables? How are master and slave distributed? For messages: one master, two slaves, 32 databases, 32 tables. For friend data: one master, one slave, 128 tables. For Moments, tables are partitioned by month.

Summary & how to think and improve your own ability

  1. If you never question things, everything that exists looks reasonable – and that clearly won't do.

    • Every time you look at something, ask: is this reasonable? Can it be optimized? What are the alternatives? What if...?

    • For example, when I first came to the XXX project I thought the architecture was fine and needed no optimization; but once large-scale promotion was needed, XXX raised optimization points, and the growing volume exposed problems

      • MySQL slow-query problems under high concurrency

      • Under high concurrency, the number of concurrent requests for resources is too large, so resource requests must be merged

      • The Access layer service was hard to upgrade, so the connection-holding part had to be split off from Access to improve stability and make service upgrades convenient.

  2. Beyond being familiar with the code framework, be sure to dive into the details, such as Golang's low-level optimizations and system-level optimizations.

  3. Think about problems and scale in terms of the IM domain: what is the current scale, and what needs to be done at the next order of magnitude?

    • At the 10k level you must already be considering what to do at the 100k level; past 100k, think about millions. You don't have to build it immediately, but you have to think it through first: how would you refactor?
  4. Understand the industry's technical solutions and the pits others have stepped into. As volume grows, this lets you supply better techniques and architectures for advanced IM development, not limited to the XXX project. Be able to think across the whole IM domain

    • Start by understanding the industry's architectures, technical solutions, and technology choices.

Welcome to follow my WeChat official account, "Linux server system development"; quality articles will be published there regularly.