In the era of big data and exploding traffic, ensuring the security and stability of our services is a problem that every enterprise and every developer needs to pay attention to. Hence the appearance of lofty-sounding terms such as microservices, distributed systems, and big data. But no matter how complex a system or framework is, it is always built up from smaller pieces.

Distributed

Distribution emerged to solve the problem that a single service cannot withstand high pressure, and at the same time to prevent the failure of an individual service from making the whole service unavailable. A distributed service is composed of multiple nodes, each sharing part of the load: if one machine can support 10,000 QPS, then 10 machines can support roughly 100,000 QPS. Moreover, by deploying nodes in different machine rooms, we reduce the risk of a collective outage caused by concentrating all nodes in one place. If we have only one machine, the entire service is unavailable no matter where we deploy it. If we have 10 machines distributed between city A and city B, then even if the nodes in city A become unavailable because of weather or other reasons, we only need to cut off the traffic originally routed to city A and redistribute it to the remaining nodes; the whole service is still available, achieving high availability. Since there are multiple nodes, how do we distribute traffic evenly, or route it to a particular node according to our own policy?

Load balancing

For distributed services, common load balancing algorithms are as follows:

Round-robin method

Requests are assigned to each node in turn: the first request goes to node A, the second to node B, and so on. Round-robin is perfectly fair to every node and does not care about a node's current state (connection count, load, etc.). Therefore, when round-robin is used, the configuration of each node should be as similar as possible (same number of CPU cores, same memory, and so on). If a node's configuration is too low, the same traffic puts much more pressure on it, which is unfair under round-robin, and the traffic allocated to the low-configuration node may be more than it can bear.

Random method

Instead of assigning requests in turn, the random method uses a random algorithm to pick one of the nodes. For example, with 10 nodes you can use rand(1,10) to randomly select one. By the law of large numbers, the random algorithm tends toward an even distribution as the number of calls grows, so its final effect is similar to that of round-robin.

IP hash method

Sometimes we want requests from a given client to always land on the same node; for example, every request from client A should be assigned to node A. We can hash the client's IP into a number and then take it modulo the number of nodes; the result is the index of the node that should handle the request. For example, if the client's IP is 192.168.10.1 and hashIndex(192.168.10.1) % 10 = 5, then the node with index 5 should be chosen.
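A minimal IP-hash sketch in Go; the FNV hash and the node names are illustrative assumptions, not a prescribed implementation:

package main

import (
	"fmt"
	"hash/fnv"
)

// pickNode hashes the client IP and maps it onto one of the nodes, so the
// same IP is always routed to the same node (as long as the node list
// does not change).
func pickNode(clientIP string, nodes []string) string {
	h := fnv.New32a()
	h.Write([]byte(clientIP))
	return nodes[h.Sum32()%uint32(len(nodes))]
}

func main() {
	nodes := []string{"node-0", "node-1", "node-2", "node-3", "node-4"}
	fmt.Println(pickNode("192.168.10.1", nodes))
}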

Weighted round-robin method

In reality node configurations differ, so traffic can be distributed more reasonably by weight. For example, if node A has a 2-core CPU while the other nodes have 4-core CPUs, node A can obviously bear less load. In this case traffic can be allocated by weight: the low-configuration node gets a lower weight, and traffic is distributed to the nodes in weight order. In plain round-robin every node is assigned in turn; after weighting, the other nodes are still assigned in turn, while node A may only be assigned again after the others have gone through a couple of rounds.

Weighted random method

Like weighted round-robin, the weighted random method allocates traffic according to the weight configured for each node. The difference is that it picks a backend server randomly according to the weights rather than sequentially. For example, we can assign probabilities based on the number of cores: if node A has 2 cores and the remaining 9 nodes have 4 cores each, the probability of node A being selected is 2/(4*9+2).
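A minimal weighted-random sketch in Go; the node names and weights are made-up values for illustration:

package main

import (
	"fmt"
	"math/rand"
)

type node struct {
	name   string
	weight int // e.g. number of CPU cores
}

// pickWeighted draws a random number in [0, totalWeight) and walks the
// list until the accumulated weight covers it, so each node is chosen
// with probability weight/totalWeight.
func pickWeighted(nodes []node) node {
	total := 0
	for _, n := range nodes {
		total += n.weight
	}
	r := rand.Intn(total)
	for _, n := range nodes {
		if r < n.weight {
			return n
		}
		r -= n.weight
	}
	return nodes[len(nodes)-1] // not reached with positive weights
}

func main() {
	nodes := []node{{"A", 2}, {"B", 4}, {"C", 4}}
	fmt.Println(pickWeighted(nodes).name)
}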

Least connections method

The least connections method is a more dynamic load balancing algorithm. When node configurations differ, besides manually configuring weights, traffic can also be allocated according to each node's connection count: each time, the node with the fewest connections is selected based on the nodes' current connection status. This uses each node as efficiently as possible and allocates traffic more reasonably.

Service registration and discovery

IP + port

Once we have multiple nodes and have chosen a load balancing algorithm, the next problem is how a request finds the corresponding node. For example, if a request should be allocated to node A, how does it actually reach node A? The simplest method is IP + port: tell the caller each node's IP and port, and the caller selects the corresponding node according to the load balancing policy. This is commonly done by configuring an nginx upstream and letting nginx act as the proxy layer in front of our servers, doing service discovery for us. The drawback of this approach is that it requires manual intervention: whenever we need to add or remove a node we have to modify the nginx configuration, and if a node fails, updating the configuration by hand is slow.

Registration and Discovery

You might think the IP + port approach is fine; after all, with only a few nodes, adding or removing a line of configuration is quick. However, in a large microservice system we break services down much further: the whole application is split into microservices by business domain, each microservice must be highly available, and so each microservice itself consists of multiple nodes. With that many nodes, managing them by IP + port costs a lot of manpower and is unreliable. That is why service registration and discovery emerged: nodes are managed automatically without human intervention, which greatly improves operational efficiency.

Service registration: a node tells the registry its IP and port, and can also set an identifying name.

Discover->register('192.168.0.1', 8000, 'auth')

Service discovery: Find the corresponding node by the registered name.

Discover->find('auth')

Service registration and discovery checks node liveness through heartbeats; if a node's heartbeat fails, the node is removed automatically. Of course, the registry itself should also be clustered and highly available. Consul, etcd, and ZooKeeper are the common implementations.
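A toy in-memory sketch of the register/find/heartbeat idea described above; a real system would use Consul, etcd, or ZooKeeper, and all names, the TTL, and the data layout here are illustrative assumptions:

package main

import (
	"fmt"
	"sync"
	"time"
)

type instance struct {
	addr     string
	lastBeat time.Time
}

// registry keeps service name -> instances; register adds a node,
// heartbeat refreshes it, and find returns only nodes whose heartbeat
// is still fresh, dropping expired ones.
type registry struct {
	mu       sync.Mutex
	ttl      time.Duration
	services map[string]map[string]*instance
}

func newRegistry(ttl time.Duration) *registry {
	return &registry{ttl: ttl, services: map[string]map[string]*instance{}}
}

func (r *registry) register(name, addr string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if r.services[name] == nil {
		r.services[name] = map[string]*instance{}
	}
	r.services[name][addr] = &instance{addr: addr, lastBeat: time.Now()}
}

func (r *registry) heartbeat(name, addr string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if inst, ok := r.services[name][addr]; ok {
		inst.lastBeat = time.Now()
	}
}

func (r *registry) find(name string) []string {
	r.mu.Lock()
	defer r.mu.Unlock()
	var alive []string
	for addr, inst := range r.services[name] {
		if time.Since(inst.lastBeat) <= r.ttl {
			alive = append(alive, addr)
		} else {
			delete(r.services[name], addr) // heartbeat expired: remove the node
		}
	}
	return alive
}

func main() {
	r := newRegistry(10 * time.Second)
	r.register("auth", "192.168.0.1:8000")
	r.heartbeat("auth", "192.168.0.1:8000")
	fmt.Println(r.find("auth"))
}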

Caching

As information grows, data volumes and user counts keep increasing, and reading and writing the database directly can no longer meet the need for fast responses, because I/O is relatively time-consuming. So we usually put a cache in front of the DB: read the cache first, and when the cache misses or expires, read from the database and write the result back to the cache, forming a closed loop. With a cache, most requests are served from it, which both improves response speed and protects the DB. But once a cache is used, data consistency problems are unavoidable: when we update a piece of data, should we update the cache first or the database first?
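A minimal cache-aside read sketch of the loop just described; the in-memory maps stand in for a real cache (e.g. Redis) and database, and all names are illustrative:

package main

import (
	"errors"
	"fmt"
)

var errCacheMiss = errors.New("cache miss")

// Stand-ins for a real cache client and database layer.
var cache = map[string]string{}
var db = map[string]string{"user:1": "alice"}

func cacheGet(key string) (string, error) {
	v, ok := cache[key]
	if !ok {
		return "", errCacheMiss
	}
	return v, nil
}

// getUser reads the cache first; on a miss it falls back to the DB and
// writes the value back to the cache, closing the loop.
func getUser(key string) (string, error) {
	if v, err := cacheGet(key); err == nil {
		return v, nil
	}
	v, ok := db[key]
	if !ok {
		return "", errors.New("not found")
	}
	cache[key] = v // write back so the next read hits the cache
	return v, nil
}

func main() {
	fmt.Println(getUser("user:1"))
}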

Update the cache first, then the database

  • Suppose updating the cache succeeds but updating the database fails: the data is now inconsistent, and after the cache expires, stale data will be read from the database.
  • Suppose two requests A and B operate on the same data simultaneously
    1. A updates the cache
    2. B updates the cache
    3. B updates the database
    4. A updates the database

    Because A is delayed by network problems, A's database write lands after B's, so B's database update is overwritten by A while the cache still holds B's value: the cache and the database are inconsistent.

Update the database first, then the cache

  • Suppose updating the database succeeds but updating the cache fails: the data is still inconsistent, but after the cache expires the correct data will be read, which is an improvement over the update-cache-first scheme.
  • Suppose two requests A and B operate on the same data simultaneously
    1. A updates the database
    2. B updates the database
    3. B updates the cache
    4. A updates the cache

    Because A is delayed by network problems, A's cache write lands after B's, so B's cache update is overwritten by A while the database holds B's value: again the cache and the database are inconsistent.

Quite apart from the double-write inconsistency problem, updating the cache on write is itself not recommended, because cached data is sometimes complex: it may be an aggregation of several tables, and if it is stored as a JSON string, every update requires deserializing, modifying, and serializing again, which is expensive. And if at some point you update the value 10 times without reading it once, those cache updates are simply wasted.

Delete the cache first, then update the database

Suppose request A deletes the cache while request B reads it:

  1. A deletes the cache
  2. B requests the cache and finds that it does not exist
  3. B reads the old value from the database and writes it back to the cache
  4. A writes the new value to the database

The usual solution to this problem is delayed double delete: after A updates the database, the cache is deleted again after a short delay.
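A minimal sketch of delayed double delete, assuming hypothetical cacheDel and dbUpdate helpers; the 500ms delay is just an illustrative value and should be tuned to cover a read-plus-write-back round trip:

package main

import (
	"fmt"
	"time"
)

// Stand-ins for a real cache client and database layer.
func cacheDel(key string) { fmt.Println("del cache:", key) }

func dbUpdate(key, value string) error {
	fmt.Println("update db:", key, "=", value)
	return nil
}

// updateWithDoubleDelete deletes the cache, updates the database, then
// deletes the cache again after a short delay so that any stale value
// written back by a concurrent reader is also removed.
func updateWithDoubleDelete(key, value string) error {
	cacheDel(key)
	if err := dbUpdate(key, value); err != nil {
		return err
	}
	time.AfterFunc(500*time.Millisecond, func() {
		cacheDel(key) // delayed second delete
	})
	return nil
}

func main() {
	_ = updateWithDoubleDelete("user:1", "bob")
	time.Sleep(time.Second) // only so the demo waits for the delayed delete
}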

Update the database first, then delete the cache

Similarly, if updating the database succeeds but deleting the cache fails, data inconsistency will also occur. Or, if your cache is a master-slave architecture and the delete succeeds on the master but has not yet been synchronized to the slave, reading from the slave will also cause temporary inconsistency.

Neither approach guarantees database-cache double-write consistency, let alone in a distributed system. Choose the approach that is less risky or better suits your business scenario, and back it up with monitoring and alerting: in scenarios with strict data consistency requirements, it is a good choice to detect abnormal data promptly through monitoring and alerting and then repair it.

Interface rate limiting

First of all, every interface does something different: some only return static configuration, some require heavy computation, and some involve a lot of I/O. Without rate limiting, a sudden traffic spike on a time-consuming interface may bring down our machines or the DB behind them. How do we estimate the limit? Generally we use load testing to estimate the QPS our interface can withstand, and reject anything beyond that estimate. A good rate limiting algorithm is also very important. Common ones are as follows:

Counter rate limiting

Every request increments a counter, and when the total within a unit of time reaches a set threshold, service is denied. The drawback of this scheme is that it does not handle burst traffic well: for example, if the limit is 100 per second, the quota may be used up in the first 100ms, leaving nothing to serve in the remaining 900ms.

count++
count--

Fixed window rate limiting

Compared with plain counter limiting, the fixed window adds the concept of a window. The main idea is to treat a period of time as a window with a counter that counts the requests inside it; when the count exceeds the threshold, requests are limited, and when the next window starts, the counter is reset. However, a fixed window has a drawback: suppose the limit is 100 QPS and the window is one second. If the last 0.5s of the first window consumes 100 requests and the first 0.5s of the second window consumes another 100, then the one second straddling the two windows has handled 200 requests, which clearly violates our limiting rule.
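A minimal fixed-window counter sketch in Go; the limit and window size are illustrative values:

package main

import (
	"fmt"
	"sync"
	"time"
)

// fixedWindow allows at most `limit` requests per `window`; the counter
// is reset whenever a new window starts.
type fixedWindow struct {
	mu          sync.Mutex
	limit       int
	window      time.Duration
	windowStart time.Time
	count       int
}

func (f *fixedWindow) Allow() bool {
	f.mu.Lock()
	defer f.mu.Unlock()
	now := time.Now()
	if now.Sub(f.windowStart) >= f.window {
		f.windowStart = now // a new window starts: recount
		f.count = 0
	}
	if f.count >= f.limit {
		return false
	}
	f.count++
	return true
}

func main() {
	l := &fixedWindow{limit: 100, window: time.Second}
	fmt.Println(l.Allow())
}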

Sliding window rate limiting

To solve the problem of fixed windows, the window can be made non-fixed: each time a request arrives, look back over the last second; if the limit within that sliding 1s window has not been reached, serve the request, otherwise reject it. However, the sliding window still cannot solve the burst-traffic problem: for example, if the limit is used up in the first 1ms, nothing can be served in the remaining 999ms.

Leaky bucket rate limiting

The idea of the leaky bucket algorithm is that the inflow rate is variable (requests arrive in bursts and lulls) while the bucket's capacity is fixed and water always flows out at a constant rate. When the bucket is full, the excess requests are rejected (the bucket overflows). The leaky bucket method suits scenarios that require processing at a constant rate.

Token bucket

The idea of the token bucket is to put tokens into the bucket at a constant rate; when the bucket is full, no more tokens are added, which means requests are arriving more slowly than tokens are produced. This way the bucket accumulates tokens while the service is idle, and when burst traffic arrives there are enough tokens to absorb it to a certain extent. The token bucket allows the outflow rate to vary: each request must take a token from the bucket; if it gets one it can be processed, otherwise it is rate-limited.

The action of putting tokens in does not need a separate thread: each time a token is taken, the number of tokens to add can be computed from the elapsed time and the fill rate.

speed := 1.0 / n           // fill rate: with n = 10, one token is added every 10 seconds
diffTime := now - lastTime // time elapsed since the last token was taken
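A minimal lazy-refill token bucket sketch in Go built around that calculation; the capacity and rate values are illustrative:

package main

import (
	"fmt"
	"sync"
	"time"
)

// tokenBucket refills lazily: each call to Allow computes how many tokens
// should have been added since the last call, instead of running a
// background goroutine.
type tokenBucket struct {
	mu       sync.Mutex
	capacity float64   // maximum tokens the bucket can hold
	rate     float64   // tokens added per second
	tokens   float64   // current tokens
	lastTime time.Time // last refill time
}

func newTokenBucket(capacity, rate float64) *tokenBucket {
	return &tokenBucket{capacity: capacity, rate: rate, tokens: capacity, lastTime: time.Now()}
}

func (b *tokenBucket) Allow() bool {
	b.mu.Lock()
	defer b.mu.Unlock()
	now := time.Now()
	// lazy refill based on the time difference and the fill rate
	b.tokens += now.Sub(b.lastTime).Seconds() * b.rate
	if b.tokens > b.capacity {
		b.tokens = b.capacity
	}
	b.lastTime = now
	if b.tokens < 1 {
		return false // no token available: rate-limit this request
	}
	b.tokens--
	return true
}

func main() {
	b := newTokenBucket(100, 50) // burst of 100, refill 50 tokens/s
	fmt.Println(b.Allow())
}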

Circuit breaking

Why are circuit breakers needed? In microservices, services are split finely and call each other frequently: service A -> service B -> service C. If service C runs slow queries because of some lengthy operation, the RT of its interface grows steadily, and then all of service B's requests to C start timing out. As more and more requests time out, B's TCP connections are exhausted and service B also breaks down; once B breaks down, A's timeouts grow and service A becomes unavailable too. A single problem in service C has made every related service unavailable, causing an avalanche. When service C is continuously unavailable, we should cut off the calls from service B to service C: this is a circuit breaker. Of course, we cannot stay broken forever; if service C recovers we still want to call it, and that is where the circuit breaking algorithm comes in.

Circuit breaking strategy

  1. Count the total number of requests over a period of time

  2. Calculate the request failure rate during this period

  3. If both the request count and the failure rate exceed their thresholds, trip the circuit breaker

  4. After the breaker has been open for a period of time, it becomes half-open and lets a small amount of traffic through

  5. If there are still many failures while half-open, trip the breaker again; otherwise close it.

When the total number of requests reaches the configured count and the failure rate reaches the configured threshold, the circuit breaker trips. While it is open, the dependency is treated as unavailable, so subsequent requests return immediately. After a configurable period of time, on the assumption that the dependent service may have recovered, the breaker becomes half-open and lets part of the traffic through as a probe. If there are still many errors, the dependency has not recovered; the breaker trips again and waits for the next half-open window. If the half-open traffic looks fine, the breaker closes and normal service resumes.
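A minimal sketch of that closed/open/half-open state machine; the thresholds and cooldown are illustrative, and in practice an existing circuit breaker library would usually be used instead:

package main

import (
	"errors"
	"sync"
	"time"
)

type state int

const (
	closed state = iota
	open
	halfOpen
)

var errOpen = errors.New("circuit breaker is open")

// breaker counts requests and failures while closed, rejects calls while
// open, and lets a probe through when half-open.
type breaker struct {
	mu           sync.Mutex
	st           state
	failures     int
	total        int
	failureRatio float64       // trip when failures/total reaches this ratio...
	minRequests  int           // ...and at least this many requests were seen
	cooldown     time.Duration // how long to stay open before probing
	openUntil    time.Time
}

func (b *breaker) Call(fn func() error) error {
	b.mu.Lock()
	if b.st == open {
		if time.Now().Before(b.openUntil) {
			b.mu.Unlock()
			return errOpen
		}
		b.st = halfOpen // cooldown elapsed: allow a probe request
	}
	b.mu.Unlock()

	err := fn()

	b.mu.Lock()
	defer b.mu.Unlock()
	b.total++
	if err != nil {
		b.failures++
	}
	switch {
	case b.st == halfOpen && err != nil:
		b.trip() // probe failed: open again
	case b.st == halfOpen && err == nil:
		b.reset() // probe succeeded: close the breaker
	case b.total >= b.minRequests && float64(b.failures)/float64(b.total) >= b.failureRatio:
		b.trip()
	}
	return err
}

func (b *breaker) trip() {
	b.st = open
	b.openUntil = time.Now().Add(b.cooldown)
	b.failures, b.total = 0, 0
}

func (b *breaker) reset() {
	b.st = closed
	b.failures, b.total = 0, 0
}

func main() {
	b := &breaker{failureRatio: 0.5, minRequests: 10, cooldown: 5 * time.Second}
	_ = b.Call(func() error { return nil })
}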

Service degradation

Degradation is another important tool in service governance. At traffic peaks, failures in unimportant services may trigger a chain reaction and affect the main business. At such times, unimportant features (such as marketing activities, comments, and so on) can simply be switched off.

Methods

  • Shut down the whole feature
  • Have the related interfaces return empty results
  • Cut off the interface's dependency on DB resources and read from local memory instead

Of course, degradation can be used together with circuit breaking.

if isBreak {
	return localCache // degraded: fall back to the local cache
}

Daemon + smooth restart

If our service exits abnormally while running and there is no automatic recovery mechanism, that would be very awkward.

Daemon approaches

  1. If the service runs on k8s, k8s will automatically pull the pod back up when it exits.
  2. Manage our process through a third-party tool such as supervisor
  3. Manage our process through the system's systemd

Smooth restart principle

It is important to restart smoothly: a restart must not cause in-flight requests to fail with 504. The usual approach is for the main process to fork a child process; the main process keeps handling the old requests and exits automatically once they are finished, while all new requests are handled by the child process, which becomes the new main process after the old one exits.
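A minimal sketch of the "finish the old requests before exiting" half of this, using Go's standard net/http graceful shutdown; the fork-and-socket-handoff part is omitted here and is usually handled by a dedicated library or by the platform:

package main

import (
	"context"
	"log"
	"net/http"
	"os"
	"os/signal"
	"syscall"
	"time"
)

func main() {
	srv := &http.Server{Addr: ":8080"}

	go func() {
		if err := srv.ListenAndServe(); err != nil && err != http.ErrServerClosed {
			log.Fatal(err)
		}
	}()

	// Wait for the termination signal sent by the supervisor / new process.
	quit := make(chan os.Signal, 1)
	signal.Notify(quit, syscall.SIGTERM, syscall.SIGINT)
	<-quit

	// Stop accepting new connections and let in-flight requests finish.
	ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
	defer cancel()
	if err := srv.Shutdown(ctx); err != nil {
		log.Printf("forced shutdown: %v", err)
	}
}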

Distributed link tracking

In distributed systems, because services are split apart, a single gateway interface may have many microservice calls nested behind it, which undoubtedly makes troubleshooting harder. For example, with A->B->C->D, A blames the error on B, B says the error was caused by C, and C says it was caused by D. If there were a way to link the calls scattered across different places and finally display each node's latency, IP address, and error information in one view, troubleshooting efficiency would improve greatly.

The emergence of distributed tracing (trace) solves this problem. Link tracing generally consists of three core steps: data collection, data storage, and data presentation. Data collection requires instrumenting the code, which inevitably leads to divergent implementations; this is why OpenTracing appeared. OpenTracing is a language-independent distributed tracing specification with a unified interface, which makes it easy to plug into different tracing systems. The data model defined in the OpenTracing semantic specification includes Trace, Span, and Reference.

Span

Span represents a single unit of work: an RPC call, a function call, or even any block you decide to treat as a span. The parts we care about are delineated by spans, and a span generally includes:

  • The span name, such as the name of the service being called
  • A start timestamp
  • An end timestamp

Trace

Trace represents a complete Trace link, for example, the entire life of a request. A Trace is a directed acyclic graph (DAG) consisting of one or more spans.

        [Span A]  ←←←(the root span)
            |
     +------+------+
     |             |
 [Span B]      [Span C] ←←←(Span C is a `ChildOf` Span A)
     |             |
 [Span D]      +---+-------+
               |           |
           [Span E]    [Span F] >>> [Span G] >>> [Span H]
                                       ↑
                                       ↑
                                       ↑
                         (Span G `FollowsFrom` Span F)

The timeline is shown as follows:

––|–––––––|–––––––|–––––––|–––––––|–––––––|–––––––|–––––––|–> time

 [Span A···················································]
   [Span B··············································]
      [Span D··········································]
    [Span C········································]
         [Span E·······]        [Span F··] [Span G··] [Span H··]

Reference

A Span can have a causal relationship with one or more spans, which is called Reference. OpenTracing currently defines two types of relationships: ChildOf and FollowsFrom.

  • ChildOf: the execution of the parent Span depends on the execution result of the child Span. For example, for an RPC call, the parent Span synchronously waits for the child Span's return value, so the child Span is a ChildOf the parent Span.
  • FollowsFrom: the execution of the parent Span does not depend on the result of the child Span, for example in some asynchronous flows.

Each request's link has a unique traceID. Using the traceID and spanIDs, all spans on a link can be associated, so the entire request path can be seen at a glance. Common distributed tracing systems include Uber's Jaeger, Twitter's Zipkin, and Alibaba's EagleEye.
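A minimal opentracing-go sketch showing the ChildOf and FollowsFrom references; with no real tracer registered the global tracer is a no-op, and in practice a concrete tracer (e.g. a Jaeger client) would be installed via opentracing.SetGlobalTracer:

package main

import (
	"github.com/opentracing/opentracing-go"
)

func main() {
	// Root span of the trace, e.g. the gateway request.
	root := opentracing.StartSpan("http.request")
	defer root.Finish()

	// Synchronous RPC call: the new span is a ChildOf the root.
	rpc := opentracing.StartSpan("rpc.getUser", opentracing.ChildOf(root.Context()))
	rpc.SetTag("peer.service", "user-service")
	rpc.Finish()

	// Asynchronous work the root does not wait for: FollowsFrom.
	async := opentracing.StartSpan("async.sendMail", opentracing.FollowsFrom(root.Context()))
	async.Finish()
}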

Logging

Troubleshooting through logs is also a common method. According to different scenarios, logs are classified into the following levels:

  • DEBUG: Debug level logs, the type often used when debugging programs.
  • TRACE: Trace level logs, which contain detailed information about how the program is running.
  • INFO: Information level logs, which record the routine information we need.
  • WARN: Warning level logs, indicating a potential error; the severity needs to be judged further.
  • ERROR: Error level logs, indicating that something has gone wrong and normal functionality is affected.
  • FATAL: Critical level logs, such as program crashes, abnormal exits, and other severe errors.

Monitoring

The robustness of a service depends on monitoring. Through monitoring we can see the service's QPS, HTTP error status, latency, CPU load, and whatever business metrics we want to watch ourselves. A monitoring system consists of metric collection, data storage, and dashboards; the most widely used combination is Prometheus + Grafana. Prometheus supports four metric types:

Counter

Counters are simple metrics that can only increase or be reset to 0, such as the number of requests to an interface or the number of HTTP errors.

httpReqs := prometheus.NewCounterVec(
	prometheus.CounterOpts{
		Name: "http_requests_total",
		Help: "How many HTTP requests processed, partitioned by status code and HTTP method.",
	},
	[]string{"code", "method"},
)
prometheus.MustRegister(httpReqs)
httpReqs.WithLabelValues("404", "POST").Add(42)

m := httpReqs.WithLabelValues("200", "GET")
m.Inc()

Gauges

A gauge is used for values that change dynamically over time, such as memory usage, CPU load, and so on.

opsQueued := prometheus.NewGauge(prometheus.GaugeOpts{
		Namespace: "our_company",
		Subsystem: "blob_storage",
		Name:      "ops_queued",
		Help:      "Number of blob storage operations waiting to be processed.",
	})
	prometheus.MustRegister(opsQueued)

	// 10 operations queued by the goroutine managing incoming requests.
	opsQueued.Add(10)
	// A worker goroutine has picked up a waiting operation.
	opsQueued.Dec()

Histogram

A single histogram generates three kinds of series: _count, _sum, and _bucket.

  • _bucket: each sample is counted into its corresponding bucket
  • _sum: the cumulative sum of all sample values
  • _count: the cumulative number of samples

For example, suppose there are 100 cakes, each weighing less than 100g, and we want to sort them into boxes by weight. There are 5 boxes, holding 0-20g, 20-40g, 40-60g, 60-80g, and 80-100g cakes respectively; the _bucket series are these five boxes. If the 0-20g box contains 5g, 10g, and 15g cakes, then its _count is 3 and its _sum is 5+10+15=30g.

CakeHistogram := prometheus.NewHistogram(prometheus.HistogramOpts{
	Name:    "cakeHistogram",
	Help:    "cake histogram",
	Buckets: prometheus.LinearBuckets(20, 20, 5), // 20, 40, 60, 80, 100
})
prometheus.MustRegister(CakeHistogram)

cases := []float64{1, 2, 3, 4}
for i := 0; i < len(cases); i++ {
	CakeHistogram.Observe(cases[i])
}

Summary

A summary is similar to a histogram and also generates three kinds of series: _count, _sum, and quantiles. _count and _sum have the same meaning as in a histogram. A quantile is a percentile that can be specified when the summary is defined, such as 0.5, 0.9, 0.99, 0.999; the 0.999 quantile means that 99.9% of samples are smaller than that value. Given N samples, sort them from smallest to largest and take the value at position φ*N. A summary reflects the overall shape of a metric through these percentiles. Usually an error tolerance is also attached to each φ, such as 0.9:0.01, so the reported 0.9 quantile actually lies somewhere between 0.89 and 0.91.

GradeSummary := prometheus.NewSummary(prometheus.SummaryOpts{
	Name:       "man_grade",
	Help:       "man grade",
	Objectives: map[float64]float64{0.5: 0.01, 0.9: 0.001, 0.99: 0.01, 0.999: 0.01},
})
prometheus.MustRegister(GradeSummary)

var salary = [10]float64{90, 87, 88, 84, 99, 100, 91}
for i := 0; i < len(salary); i++ {
	GradeSummary.Observe(salary[i])
}

To report metrics, an exporter component needs to run on the machine; Prometheus Server either pulls the collector's data proactively, or the collector pushes data to the Push Gateway and Prometheus Server then pulls it from there. Metrics are generally held temporarily in the machine's memory, which is why the curves often jitter around release time.

Alerting

Alerting is how we learn about a problem first: a service error can trigger an alert, QPS jitter over a period of time can trigger an alert, and so on. Common alert channels are as follows:

  1. Enterprise WeChat
  2. Email
  3. SMS
  4. Phone call

CI/CD

Without CI/CD, getting code online is a tedious task that requires multiple people to cooperate.

You have to compile it yourself, then run the unit tests; if you find bugs along the way you change the code and then compile and test manually again… This series of repetitive actions lowers efficiency all the way from development to deployment.

CI

Continuous Integration: developers frequently merge their code into a branch of a shared repository. CI automatically detects source changes, then pulls, builds, and runs the unit tests. The goal of CI is to quickly make sure that newly submitted changes are sound and suitable for further use in the code base.

CD

Continuous Delivery: on top of continuous integration, the integrated code is deployed into a staging (pre-production) environment.

Continuous Deployment: on top of continuous delivery, deployment to production is automated.

Through CI/CD we free ourselves from repetitive work, fix problems faster, deliver results earlier, bring development and operations together, and greatly improve delivery efficiency.

Configuration center

To reduce hard-coding in our projects, a mature configuration center brings many benefits for configuration that may change frequently: we no longer have to modify code for a small configuration change (for example, a product priced at 9.9 yuan today should become 6.9 yuan for tomorrow's promotion). Depending on the scenario, turning static constants into dynamically read configuration can improve delivery efficiency.

Implementation

Generally, when the project is released (or started), it pulls the corresponding configuration file from the remote configuration center to the local machine; the program maps the configuration into its own variables according to the file, and the rest of the code reads those variables.
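A minimal sketch of mapping a pulled configuration file into variables and reloading it periodically; the file path, field names, and polling interval are illustrative assumptions, and a real configuration center would usually push changes instead of being polled:

package main

import (
	"encoding/json"
	"fmt"
	"os"
	"sync/atomic"
	"time"
)

type Config struct {
	GoodsPrice float64 `json:"goods_price"`
}

var current atomic.Value // holds *Config

// load reads the local copy of the configuration (previously pulled from
// the remote configuration center) and swaps it in atomically.
func load(path string) error {
	data, err := os.ReadFile(path)
	if err != nil {
		return err
	}
	var c Config
	if err := json.Unmarshal(data, &c); err != nil {
		return err
	}
	current.Store(&c)
	return nil
}

func GetConfig() *Config { return current.Load().(*Config) }

func main() {
	if err := load("config.json"); err != nil {
		panic(err)
	}
	go func() {
		for range time.Tick(30 * time.Second) {
			_ = load("config.json") // refresh the local copy
		}
	}()
	fmt.Println(GetConfig().GoodsPrice)
}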

Features

  • A visual configuration console that is convenient to operate
  • Version history, recording every configuration change so that problems are easy to trace back
  • Rollback: if the new configuration causes problems, you can roll back to the previous one.

Failure drills

A failure drill is like a military exercise: even though production is stable, how would we troubleshoot and fix a problem if something broke? The purpose is to train programmers' ability to troubleshoot and quickly locate problems. So how do we set up a drill environment? Using the test environment directly is clearly not good: first, it would interfere with normal testing; second, the test environment still differs from the real one. Staging and production are even worse, since their internal connections point at real online resources and must not be affected.

Environment setup

Therefore, we need to build a drill environment that is close to the real one. The drill environment's resources (DB, cache, and so on) must be isolated from production, for example by copying online resources such as the DB into new instances. Next comes traffic, which should be close to real traffic, ideally real traffic itself: HTTP traffic replay tools such as GoReplay can copy traffic at the LB layer and then replay it in the drill environment, achieving the effect of real traffic playback.

What to rehearse

  • Cut off the DB
  • Cut off the cache
  • Cut off a service
  • .

Simulate the failure of some resource or dependency; when the failure occurs, the alert should fire first, and then we check the monitoring and logs to locate which part of the service went wrong.

Stress testing

The stress test environment should also be close to the real environment. In theory it can reuse the drill environment; if resources allow, a separate stress test environment can also be built.

Stress test objects

  • It can be an interface
  • It can be a business
  • .

Stress test evaluation

  • RT: Observe the response time
  • ERROR: Displays ERROR information
  • CPU: Check the CPU load
  • MEM: Observe the memory load
  • DB: check the database load
  • Connection number: Indicates the number of network connections
  • IO: View disk I/O
  • QPS: Observe the QPS curve

Finally, evaluate the QPS the service can sustain under load, where the bottleneck lies, and where to optimize if traffic grows in the future. If the service is CPU-bound, add machine resources; if it is IO-bound, DB-related resources may need to be added.

Robust coding

That was robustness at the service level; now let's look at robustness at the application level. Writing code elegantly is also very important for a programmer.

Prefer early return over else

// bad case
if a {
  do_a()
} else if b {
  do_b()
} 
// good case
if a {
  do_a()
  return
}
if b {
 do_b()
 return
}


Retry

When we make RPC calls, we can give several retry opportunities for some very important business scenarios.

for i := 0; i < 2; i++ {
	if _, err := rpc.getUserinfo(); err == nil {
		break
	}
}

Connection pools

Reuse connections through connection pooling to avoid frequent new connections through three-way handshakes.

redisPool := newRedisPool()
conn := redisPool.Get()

Timeout control

A timeout period should be set for any RPC request to avoid avalanche.

conn, err := net.DialTimeout(netw, addr, time.Second*2) 

Avoid repeated resource calls

In a business flow there is often func A() -> func B() -> func C(). If every func needs the user information, do not fetch it separately in each func (unless the real-time requirement is very high); pass it along as a parameter instead to reduce unnecessary queries.

func A() {
	user := DB.getUser()
	B(user)
}
func B(user *User) {
	C(user)
}
func C(user *User) {
}

Don’t have too much function body code

When a function has too many lines of code, it becomes unreadable.

func doSomething() {
	// 1000 lines omitted here
}

Function names should make sense

For example, get the most recently logged in user.

//bad case
func getLoginUser()
//good case
func getRecentLoginUser()

Catching panics

Don't let a panic cause your program to exit.

defer func() {
	if err := recover(); err != nil {
		log.Error("something panic (%+v)", err)
	}
}()

Welcome to follow the public account of the same name to receive e-books on computer networks, data structures, Redis, MySQL, Java, Go, Python, and more.