Where we left off, programmer Zach had made it through the second interview and finally arrived at the last one.

The last interviewer is, of course, the technical director. The director quietly looked over programmer Zach's resume and said:

Director: I see your previous work was payment-related, so let me ask: what should you do if you receive two callback notifications from WeChat or Alipay for the same payment?

Zach: We make the handler idempotent. A payment callback usually carries information like an order ID, so we just check whether that order ID has already been processed; if it has, we simply ignore the duplicate.

Director: What if, during payment processing, the inventory deduction succeeds but the order update fails?

Zach: We usually use a transaction to guarantee data consistency.

Director: So you're using MySQL.

Zach: Yes.

Director: Do you run MySQL in master-slave mode?

Zach: Yes, the master serves writes and the slaves serve reads.

Director: Here's a question, then. Suppose you need to check an order's status before placing an order, and you read it from a slave. Could replication lag cause you to miss the latest data?

Zach: If lag has to be considered, then in scenarios with strict real-time requirements we read the master directly, which avoids reading stale data caused by slave lag.

Director: I've seen architectures where one of the nodes is deliberately delayed by an hour. What do you think the benefit of that is?

Zach: I understand that one. If we accidentally delete or update something and regret it, delayed replication lets us recover the mistakenly updated or deleted data from that node within the hour.

Director: Which log does master-slave replication rely on?

Zach: The binlog. Normally, once the binlog is enabled, the SQL that changed the data is shipped to the slave nodes, and each slave executes the same SQL from the binlog to stay consistent.

Director: You said the binlog records the SQL that made the change. Then what if I execute SQL like this:

```sql
UPDATE user SET money = money + 0.1 WHERE money = rand();
```

rand() generates a different random number on each execution, so the master and the slave would update different rows.

Zach: The binlog supports not only statement mode but also row mode. In row mode, instead of recording the SQL statement that made the change, it records the changed row data itself, so even non-deterministic functions like rand() cannot cause master-slave inconsistency.

Director: Row mode is impressive, but it doesn't seem perfect either. Do you know its drawbacks?

Zach: Row mode consumes a lot of storage space. But we can configure mixed mode instead, and MySQL will decide for each statement: statement mode for SQL that can't cause master-slave inconsistency, row mode for SQL that can. That guarantees data consistency without the storage cost of running row mode all the time.

Director: OK, point made. Let's get back to orders. Suppose the boss runs a promotion one day: 1,000 iPad mini 7 units at 90% off, and word is 100,000 people will try to grab them. How do you make your interface hold up?

Zach: So, a flash sale, as you say (and what on earth is a mini 7? It hasn't even launched, this company is clearly made up; and what's 90% off supposed to mean, might as well call it a bone-fracture price). Normally you split this into front-end rate limiting and back-end rate limiting. On the front end, once the user taps the order button, gray it out so a second tap does nothing, which avoids piles of useless requests from frantic clicking. On the back end, that many requests obviously can't hit the database directly, so you push them into a Redis queue; Redis's performance is strong enough to take it.

Director: You said the front end grays out the button so the user can't click it twice. What if the user simply refreshes the page?

Zach: On a refresh, the page can ask the server. For each incoming request, besides putting it into the queue to be processed, you can also record it in a hash; a hash lookup is O(1), so when a refreshed page sends its request, you can check the Redis hash to see whether this user's order is already queued and tell them whether they may place another order.
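A minimal sketch of that flow, assuming the go-redis client; the address and the key names (`seckill:orders`, `seckill:queue`) are made up for illustration:

```go
package main

import (
	"context"
	"fmt"

	"github.com/redis/go-redis/v9"
)

var (
	ctx = context.Background()
	rdb = redis.NewClient(&redis.Options{Addr: "localhost:6379"})
)

// hasOrdered is the O(1) hash lookup: has this user's order
// already been queued?
func hasOrdered(userID string) (bool, error) {
	return rdb.HExists(ctx, "seckill:orders", userID).Result()
}

// enqueueOrder marks the user in the hash and pushes the order
// into the Redis queue for asynchronous processing.
func enqueueOrder(userID string) error {
	if err := rdb.HSet(ctx, "seckill:orders", userID, 1).Err(); err != nil {
		return err
	}
	return rdb.LPush(ctx, "seckill:queue", userID).Err()
}

func main() {
	_ = enqueueOrder("100")
	ok, _ := hasOrdered("100")
	fmt.Println("user 100 queued:", ok) // true
}
```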

Director: Is there anything else you can do to reduce the requests caused by page refreshes?

Zach: Yes. For a user who has already placed an order, save a flag in the browser's localStorage; on the next refresh, the page first checks for that local flag, and if it's there, the order is known to be queued and there's no need to request the server at all. For users who haven't ordered and refresh maliciously, the front end can intercept: even if they refresh like crazy, don't send a request every time, only once per second, say.

Director: So the server uses a hash to tell whether a user has placed an order; for example, a user with user_id = 100 who has ordered gets hash[100] = 1. The speed of the hash table is fine, but is there any other way to save space?

Zach: You can store it in a bitmap instead: each user_id takes up a single bit, which is very memory-efficient.
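To make the saving concrete, here is a small self-contained bitmap sketch in Go; a production version would more likely live in Redis via SETBIT/GETBIT, which is the same idea:

```go
package main

import "fmt"

// Bitmap stores one bit per user ID, so 100 million users
// fit in roughly 12 MB instead of one hash entry per user.
type Bitmap []uint64

func NewBitmap(maxID uint64) Bitmap { return make(Bitmap, maxID/64+1) }

func (b Bitmap) Set(id uint64)      { b[id/64] |= 1 << (id % 64) }
func (b Bitmap) Has(id uint64) bool { return b[id/64]&(1<<(id%64)) != 0 }

func main() {
	ordered := NewBitmap(1_000_000)
	ordered.Set(100)                                // user 100 placed an order
	fmt.Println(ordered.Has(100), ordered.Has(101)) // true false
}
```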

Director: Earlier you said every successful request enters the queue. There's a problem: there are only 1,000 items, so if 10,000 users are queued, doesn't that mean 9,000 of them can never get one?

Zach: (Earlier? Honestly, I've only ever handled payments at a small company doing ten-odd orders a day, what would I need flash sales for? If my old company hadn't been driven under by a certain company S, you wouldn't be asking me about flash sales today~) True. But since it's a promotion with a limited quantity, we can preload the total stock into Redis. Each time a request arrives, decrement the total by one; once it reaches zero, tell the user directly that stock is gone and don't let the request into the queue at all. When the client receives the out-of-stock response it sets localStorage again, so that whether or not you have ordered, subsequent refreshes return immediately without requesting the server. That alone blocks a huge number of useless requests.
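A sketch of that stock check, again assuming go-redis and a made-up key name; what makes it safe under 100,000 concurrent requests is that Redis's DECR is atomic:

```go
package main

import (
	"context"
	"fmt"

	"github.com/redis/go-redis/v9"
)

// tryReserveStock atomically takes one unit of stock. DECR is
// atomic in Redis, so concurrent requests cannot oversell; a
// result below zero means the stock is already gone.
func tryReserveStock(ctx context.Context, rdb *redis.Client) (bool, error) {
	left, err := rdb.Decr(ctx, "seckill:stock").Result()
	if err != nil {
		return false, err
	}
	if left < 0 {
		// Went below zero: add the unit back so the counter
		// stays pinned near 0 for later requests.
		rdb.Incr(ctx, "seckill:stock")
		return false, nil
	}
	return true, nil
}

func main() {
	ctx := context.Background()
	rdb := redis.NewClient(&redis.Options{Addr: "localhost:6379"})
	rdb.Set(ctx, "seckill:stock", 1000, 0) // preload the 1,000 units
	ok, _ := tryReserveStock(ctx, rdb)
	fmt.Println("reserved one:", ok)
}
```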

Director: This looks good, but it doesn't seem to guarantee data consistency. For example, the bitmap is set but the stock decrement fails, or the stock is decremented but enqueuing the order fails, and so on.

Zach: (You really are a 🐶.) Redis does support transactions, but it doesn't support rollback, so you have to implement rollback yourself. For example, when the stock decrement fails, we can restore the bitmap. Or, if we'd rather not roll back, we can use compensation to push the order through: when an error occurs, the failed order is placed into a separate verification-and-compensation queue, and a compensation script performs the corresponding repair action.
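A sketch of that compensation path, with made-up queue and function names; the actual repair action (restore the bitmap, replay the order, and so on) is left as a callback:

```go
package main

import (
	"context"

	"github.com/redis/go-redis/v9"
)

// recordFailure parks a failed order in the compensation queue.
func recordFailure(ctx context.Context, rdb *redis.Client, orderID string) error {
	return rdb.LPush(ctx, "seckill:compensate", orderID).Err()
}

// compensateLoop plays the compensation script: it blocks on the
// queue and retries each failed order via the supplied repair func.
func compensateLoop(ctx context.Context, rdb *redis.Client, repair func(string) error) {
	for {
		vals, err := rdb.BRPop(ctx, 0, "seckill:compensate").Result()
		if err != nil {
			return
		}
		orderID := vals[1] // vals[0] is the queue key
		if err := repair(orderID); err != nil {
			recordFailure(ctx, rdb, orderID) // re-queue for a later retry
		}
	}
}

func main() {
	ctx := context.Background()
	rdb := redis.NewClient(&redis.Options{Addr: "localhost:6379"})
	go compensateLoop(ctx, rdb, func(orderID string) error {
		// repair action goes here: restore the bitmap, replay the order, etc.
		return nil
	})
	select {} // keep the process alive for the demo
}
```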

Director: Do you use Redis much in your production environment?

Zach: Quite a lot.

Director: So you use the usual pattern of falling back to a database read when the cache misses?

Zach: Not really. What I work on is the consumer-facing side, where traffic is fairly heavy, on the order of 100,000 QPS, and as it happens most of the business I'm responsible for is read-only. To guarantee fast responses, we serve directly from Redis; there is no read-the-database-on-miss path at all.

Director: Then if a cache entry expires, there's no data to read at all.

Zach: Scripts refresh the cache periodically, which is to say the cache never actually expires; and if the underlying data changes, the scripts refresh it automatically.
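A sketch of that periodic refresher; the interval and the refresh callback are stand-ins for whatever rebuild-from-MySQL logic the real scripts run:

```go
package main

import (
	"context"
	"log"
	"time"
)

// warmCache rebuilds the hot keys on a schedule so reads never
// hit an expired entry; a change hook can also call refresh
// directly when the underlying data is updated.
func warmCache(ctx context.Context, interval time.Duration, refresh func() error) {
	t := time.NewTicker(interval)
	defer t.Stop()
	for {
		select {
		case <-ctx.Done():
			return
		case <-t.C:
			if err := refresh(); err != nil {
				log.Println("cache refresh failed:", err)
			}
		}
	}
}

func main() {
	warmCache(context.Background(), 5*time.Minute, func() error {
		// read from MySQL, write into Redis here
		return nil
	})
}
```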

Director: You said your QPS is around 100,000. If traffic bursts one day, will your service have a problem?

Zach: Bursts do happen, but our service isn't provisioned for exactly 100,000 QPS. We generally over-provision the resources somewhat, for example to handle twice the current QPS, so even a sudden burst doesn't scare us. At the same time, our interfaces support rate limiting: if an interface's traffic exceeds expectations, we simply don't process the excess requests, which keeps the service from being overwhelmed.

Director: Which rate-limiting algorithm do you use?

Zach: Token bucket.

Director: Can you explain how a token bucket works?

Zach: (Here it comes: ask me to build rockets in the interview, then have me tightening screws on the job.) How to put it: the bucket is what permits flow, and tokens are put into the bucket. Every time we want to handle a request, we must first take a token from the bucket; if we can't get one, the request is most likely part of traffic arriving faster than the token rate, so we rate-limit it.

Director: Do you know how to implement that?

Zach: The bucket could be a queue: a separate process puts tokens in at one end at a fixed rate, and our program takes tokens out at the other end.
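A sketch of that queue-style version in Go, where a buffered channel plays the bucket and a goroutine plays the token-dripping process; the capacity and rate are placeholders:

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	const capacity = 100 // m: bucket size
	const n = 50         // tokens added per second
	tokens := make(chan struct{}, capacity)

	// The "separate process": a goroutine drips tokens in at a fixed rate.
	go func() {
		for range time.Tick(time.Second / n) {
			select {
			case tokens <- struct{}{}: // room in the bucket
			default: // bucket full: drop the token
			}
		}
	}()

	time.Sleep(50 * time.Millisecond) // let a couple of tokens accumulate

	// Per request: take a token without blocking.
	select {
	case <-tokens:
		fmt.Println("got a token: handle the request")
	default:
		fmt.Println("no token: rate-limit the request")
	}
}
```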

Director: Yes, that works, but you need a separate process just to put in tokens, which wastes resources. Think about whether there's another way, and write it out; pseudocode is fine.

Zach: (writing) OK, let me meditate for a moment; put on a song for me meanwhile.

Deep in thought, Zach constructed a clear logical framework in his head:

  1. First, the capacity of the bucket has to be configured by us; call it m. Second, how many tokens to put in per second corresponds to the target QPS and is also configured; call it n, so n is also the rate at which tokens are added.
  2. The interviewer said a dedicated token-filling process wastes resources, so don't add tokens from another process; instead, top the bucket up lazily each time a token is taken, which needs no extra process. Suppose a new request arrives at time now, and the previous request arrived at time lasttime; since tokens are only added on requests, lasttime is also the last time tokens were added. The difference between now and lasttime, multiplied by the fill rate n, is exactly how many tokens should be added now.
  3. So on each request it's enough to compute tokens to add = (now − lasttime) × n, pour them into the bucket (capping at the capacity m), and then try to take one out. If a token comes out, the request passes; if not, it gets rate-limited.

Zach: How about this:

```go
// Pseudo-code fleshed out in Go (needs "time" imported; and, as
// the director points out next, a lock for concurrent callers).
var (
	leftToken float64   // remaining tokens in the bucket
	m         float64   // bucket capacity, configured
	n         float64   // tokens added per second, configured
	last      time.Time // time of the previous request
)

// isLimit reports whether the current request should be rate-limited.
func isLimit() bool {
	now := time.Now()
	diffTime := now.Sub(last).Seconds() // time since the last request
	inToken := diffTime * n             // tokens that should be added now
	last = now

	if inToken+leftToken <= m { // fits in the bucket
		leftToken = inToken + leftToken
	} else { // overflow: cap at the bucket capacity
		leftToken = m
	}
	if leftToken >= 1 {
		leftToken-- // take a token
		return false // got a token: not limited
	}
	return true // no token left: limit the request
}
```

Director: (He actually wrote it~ not bad.) The core is there, but you need to consider concurrency; this is unlocked.

Zach: Exactly.

Director: OK, let's continue. Rate limiting restrains traffic that exceeds expectations. Now consider another problem: traffic is normal, but a service we depend on has trouble, say the dependency keeps returning 500 and can't recover for a while. If we keep calling it anyway, isn't that a waste of time?

Zach: You can add a circuit breaker.

Director: Oh, tell me more.

Zach: A circuit breaker means that if the dependency's error count reaches a configured threshold within a time window, we cut it off and stop calling it entirely. The threshold is configurable too, for example 100% errors within 10 seconds.

Director: And after the breaker trips, if the dependency recovers, do we keep the circuit broken?

Zach: No, of course not. The breaker tripped because the dependency was failing, but we can't stay open forever. After a configured interval we let a little probe traffic through, for example 1% of requests one minute after tripping. If the dependency looks healthy, we keep enlarging the share until it's back to 100%; if it's still unhealthy, we keep the breaker open and probe again later. With this constant retry, no manual intervention is needed once the dependency recovers, and while it stays broken we don't keep hammering it mindlessly.

Director: From a business point of view, users get no data while the circuit is open. Any solution for that?

Zach: Yes, though it depends on the business scenario. We can configure some static fallback data in advance; while the circuit is open we can't fetch the real data from the dependency, but we can return the pre-configured static data for that scenario. Of course, some businesses can't accept that.
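A minimal sketch of such a breaker, under simplifying assumptions: it trips on consecutive failures rather than an error-rate window, and after the cooldown it probes with whichever request arrives next instead of the gradual 1% ramp Zach describes. Production code would more likely use a library such as sony/gobreaker or hystrix-go:

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

// ErrOpen is returned while the breaker is open and calls are
// being cut off without reaching the dependency.
var ErrOpen = errors.New("circuit open: dependency not called")

type Breaker struct {
	mu        sync.Mutex
	failures  int           // consecutive failures seen so far
	threshold int           // failures that trip the breaker
	cooldown  time.Duration // how long to stay open before probing
	openUntil time.Time     // while now is before this, fail fast
}

// Call runs dep through the breaker.
func (b *Breaker) Call(dep func() error) error {
	b.mu.Lock()
	if time.Now().Before(b.openUntil) {
		b.mu.Unlock()
		return ErrOpen // open: fail fast instead of wasting the call
	}
	b.mu.Unlock()

	err := dep() // closed (or probing after cooldown): call through
	b.mu.Lock()
	defer b.mu.Unlock()
	if err != nil {
		b.failures++
		if b.failures >= b.threshold {
			b.openUntil = time.Now().Add(b.cooldown) // trip the breaker
			b.failures = 0
		}
		return err
	}
	b.failures = 0 // success: stay closed / close again
	return nil
}

func main() {
	b := &Breaker{threshold: 5, cooldown: time.Minute}
	err := b.Call(func() error { return errors.New("dependency returned 500") })
	fmt.Println(err)
}
```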
Director: Do you use message queues at work?

Zach: Yes, mainly Kafka.

Director: What problems do you solve with Kafka?

Zach: Traffic peak shaving and asynchronous tasks.

Director: Tell me about the benefits of consumer groups.

Zach: A topic can have multiple partitions, and a group can have multiple consumers; together those consumers form a consumer group. The consumers in the group consume different partitions of the topic, which gives a load-balancing effect.

Director: What is a broker in Kafka?

Zach: Producers write messages to a specific topic on the brokers, and consumers pull messages of a specific topic from the brokers for business processing. The broker acts as a relay station that stores the messages.

Director: What about AR, ISR, and OSR?

Zach: AR is the set of all replicas; ISR is the set of replicas eligible for election; OSR is the set of replicas that have fallen out of sync; AR = ISR + OSR. Under normal circumstances AR is the same as ISR, but when a follower replica falls too far behind or fails, it is removed from the ISR and placed into the OSR. When the leader fails, the first replica in the ISR is elected as the new leader; for example, if AR = [1,2,3] and 1 fails, then ISR = [2,3] and 2 becomes the new leader.

Director: What does ZooKeeper do in Kafka? Can Kafka do without it?

Zach: ZooKeeper is a distributed coordination component. Earlier versions of Kafka used ZK to store metadata, consumer state, group membership, and offsets. Considering some weaknesses of ZK itself and the single-point risk it introduces into the architecture, newer versions have gradually weakened ZooKeeper's role: the new consumer uses Kafka's internal group coordination protocol and depends less on ZooKeeper, but the brokers still rely on ZK. Kafka also uses ZooKeeper to elect the controller, detect broker liveness, and so on.

Director: Why is Kafka so fast?

Zach: Sequential writes: since modern operating systems provide read-ahead and write-behind techniques, sequential writes to disk are in most cases faster than random writes to memory. Zero copy: zero-copy technology reduces the number of copies. Batching of messages: small requests are merged and streamed to the other side, reducing network overhead.

Director: When the consumer commits the consumed offset, does it commit the offset of the latest message it has consumed, or offset + 1?

Zach: offset + 1.

Director: Is there anything you'd like to ask me?

Zach: May I ask what the main work of this position is?

Director: CRUD.

Zach: OK, no more questions.

Director: All right, just wait here a moment.

Zach: Sure. (HR should be next.)

Finally, programmer Zach succeeded in getting the offer from Bitbeat, with a 50% salary increase.

In closing

Before I knew it, I've written three installments of programmer Zach's interviews. Although the story is made up, the content is my own, and the three interviews contain almost no repeated knowledge points. Creating this isn't easy; your likes are the biggest support for this author and my greatest motivation to keep writing. See you next time.

Follow the official account [pretend to understand programming]; this programmer is waiting for you there.

  • Bitbeat: the first interview
  • Bitbeat: the second interview
  • How network IO achieved zero copy