preface

Hello everyone, I am "a little boy picking up snails". A reader of mine (four and a half years of experience) recently interviewed at ByteDance and shared the real interview questions. Let's work through them together.

  1. Why is Redis fast
  2. Redis has several data structures. How are the underlying data structures stored
  3. Redis has several ways of persisting
  4. How to ensure thread safety in multi-threaded situation?
  5. Ever used volatile? What’s the underlying principle?
  6. MySQL index structure, clustered index and non-clustered index difference
  7. MySQL has several high availability solutions, which one are you using
  8. Tell me about the most challenging project you have ever worked on
  9. What scheme does seckill adopt
  10. Talk about sharding databases and tables. Does sharding require downtime?
  11. What if Redis dies?
  12. How do you keep coupons from being swiped more than once?
  13. How to design douyin comment system
  14. How to design a short chain address
  15. Given an array of distinct elements that first increases and then decreases, find the maximum value
  • WeChat official account: a little boy picking up snails

1. Why is Redis fast

1.1 Implementation based on memory storage

Memory reads and writes are much faster than disk reads and writes. Redis is a memory-based database, so compared with MySQL, which keeps its data on disk, it saves the cost of disk I/O.

1.2 Efficient data structures

MySQL chooses the B+ tree for its indexes for efficiency; in general, a well-chosen data structure is what makes an application fast. Let's look at a diagram of Redis's data structures and internal encodings:

1.2.1 SDS Simple dynamic string

  • String length: Redis obtains the string length in O(1), while a C string must be traversed from the start, which is O(n).
  • Space preallocation: the more often a string is modified, the more often memory would have to be allocated, which costs performance; when an SDS grows, it allocates extra unused space, reducing that cost.
  • Lazy space release: when an SDS is shortened, the surplus memory is not reclaimed immediately; the free field records it, and a later modification can reuse the recorded space instead of allocating again.
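The ideas above can be sketched in a few lines of Java (an illustration of the preallocation and lazy-free bookkeeping, not Redis's actual C implementation; the class and field names are made up):

```java
// Minimal sketch of SDS-style preallocation and lazy space release.
class SimpleSds {
    private byte[] buf;
    private int len;   // used bytes, so length() is O(1)
    private int free;  // preallocated-but-unused bytes

    SimpleSds(String s) {
        buf = s.getBytes();
        len = buf.length;
        free = 0;
    }

    int length() { return len; }      // O(1), no traversal as in C strings
    int freeSpace() { return free; }

    // Space preallocation: when growing, reserve extra room so that
    // repeated appends do not reallocate every time.
    void append(String s) {
        byte[] extra = s.getBytes();
        if (extra.length > free) {
            int newLen = len + extra.length;
            byte[] bigger = new byte[newLen * 2];  // simply double as a sketch
            System.arraycopy(buf, 0, bigger, 0, len);
            buf = bigger;
            free = buf.length - len;
        }
        System.arraycopy(extra, 0, buf, len, extra.length);
        len += extra.length;
        free -= extra.length;
    }

    // Lazy space release: shrinking only moves bytes into `free`;
    // it does not reallocate a smaller buffer.
    void shorten(int newLen) {
        if (newLen < len) {
            free += len - newLen;
            len = newLen;
        }
    }

    @Override public String toString() { return new String(buf, 0, len); }
}
```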

1.2.2 dictionary

Redis is a K-V in-memory database, and all key-value pairs are stored in a dictionary. A dictionary is a hash table, like Java's HashMap, where a value can be obtained directly from its key. The property of a hash table is that a value can be obtained in O(1) time.

1.2.3 Skip list

  • The skip list is one of Redis's distinctive data structures; it adds multi-level indexes on top of a linked list to improve search efficiency.
  • A skip list supports node lookup in O(log N) on average and O(N) in the worst case, and supports batch processing of nodes through sequential operations.
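Java's own `ConcurrentSkipListMap` is also built on a skip list, so it is a convenient way to observe the same ordered O(log N) lookups the sorted set relies on (this is only an analogy; Redis's zset additionally pairs the skip list with a dict):

```java
import java.util.concurrent.ConcurrentSkipListMap;

// A leaderboard sketch on Java's skip-list map: score -> member.
class SkipListDemo {
    public static void main(String[] args) {
        ConcurrentSkipListMap<Double, String> leaderboard = new ConcurrentSkipListMap<>();
        leaderboard.put(95.0, "alice");
        leaderboard.put(87.5, "bob");
        leaderboard.put(99.0, "carol");

        // Ordered queries a plain linked list cannot do cheaply:
        System.out.println(leaderboard.firstEntry());        // entry with the lowest score
        System.out.println(leaderboard.ceilingEntry(90.0));  // first entry with score >= 90
        System.out.println(leaderboard.headMap(95.0, true)); // range query: scores <= 95
    }
}
```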

1.3 Reasonable data encoding

Redis supports multiple data types, and each basic type may be backed by several data structures. Which data structure and which encoding to use in each situation is the result of the Redis designers' analysis and optimization.

  • String: if a number is stored, the encoding is int; a non-numeric string of 39 bytes or less is embstr; a value larger than 39 bytes uses raw encoding.
  • List: if the list has fewer than 512 elements and every element is smaller than 64 bytes (the defaults), use ziplist encoding; otherwise use linkedlist encoding.
  • Hash: if the hash has fewer than 512 elements and all values are smaller than 64 bytes, use ziplist; otherwise use hashtable.
  • Set: intset encoding is used if all elements in the set are integers and there are fewer than 512 of them; otherwise hashtable encoding is used.
  • Zset: ziplist encoding is used when the sorted set has fewer than 128 elements and every element is smaller than 64 bytes; otherwise skiplist encoding is used.
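The string rules above can be expressed as a tiny decision function. This is only a sketch of the thresholds as stated (the 39-byte embstr limit); real Redis applies a few extra checks, e.g. rejecting leading zeros for int encoding:

```java
// Mirrors the string-encoding rules listed above (Redis 3.x thresholds).
class StringEncodingRule {
    static String encodingFor(String value) {
        // A stored number that fits in a long -> int encoding
        try {
            Long.parseLong(value);
            return "int";
        } catch (NumberFormatException ignored) { }
        // A short non-numeric string -> embstr, otherwise raw
        return value.getBytes().length <= 39 ? "embstr" : "raw";
    }

    public static void main(String[] args) {
        System.out.println(encodingFor("12345"));  // int
        System.out.println(encodingFor("hello"));  // embstr
    }
}
```

You can check the real behaviour on a live server with `OBJECT ENCODING key`.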

1.4 Reasonable threading model

I/O multiplexing

Redis uses epoll as its I/O multiplexing implementation, which lets a single thread efficiently handle many connection requests. In addition, Redis's own event model turns epoll connections, reads, writes, and closes into events, so little time is wasted on network I/O.

2. Redis has several data structures. How are the underlying data structures stored

There are five basic types of Redis:

  • String (String)
  • Hash
  • List (List)
  • Set
  • Zset (Ordered set)

It also has three special types of data structures

  • Geospatial
  • Hyperloglog
  • Bitmap

2.1 Five basic data types of Redis

String (String)

  • Summary: String is the most basic Redis data type. It is binary safe and can store images or serialized objects; a single value can be up to 512 MB.
  • Simple usage examples: set key value, get key
  • Application scenarios: shared sessions, distributed locks, counters, rate limiting.
  • Internal encoding: int (8-byte long integer) / embstr (39 bytes or less) / raw (more than 39 bytes)

Hash

  • Summary: in Redis, the hash type means that the v (value) is itself a key-value (k-v) structure
  • Simple usage examples: hset key field value, hget key field
  • Internal encoding: ziplist (compressed list), hashtable (hashtable)
  • Application scenario: Cache user information.

List (List)

  • The list type is used to store multiple ordered strings. A list can hold up to 2^32-1 elements.
  • Simple usage examples: lpush key value [value ...], lrange key start end
  • Internal encoding: ziplist (compressed list), linkedlist (linked list)
  • Application scenario: message queue, article list

Set

  • Summary: The set type is also used to hold multiple string elements, but does not allow duplicate elements
  • Simple usage examples: sadd key element [element ...], smembers key
  • Internal encoding: intset (integer set), hashtable (hashtable)
  • Application scenarios: user tags, generate random number lottery, social needs.

Ordered set (Zset)

  • Summary: a sorted set of string elements; elements cannot repeat, but scores can
  • Simple usage examples: zadd key score member [score member ...], zrank key member
  • Internal encoding: ziplist (compressed list), skiplist (skip list)
  • Application scenario: leaderboards, social needs (such as user likes).

2.2 Three special data types of Redis

  • Geo: introduced in Redis 3.2 for geolocation; it stores geographic position information and supports operations on it.
  • HyperLogLog: a data structure for cardinality estimation, such as counting a site's UV (unique visitors).
  • Bitmaps: map an element's state with a single bit. In Redis the underlying implementation is the string type, so a bitmap can be viewed as an array of bits.
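The bitmap idea is easy to reproduce with `java.util.BitSet` — an in-process analogy for SETBIT/GETBIT/BITCOUNT, not Redis itself:

```java
import java.util.BitSet;

// One bit per user marks "seen today"; counting set bits gives the UV.
class UvBitmap {
    private final BitSet bits = new BitSet();

    void markSeen(int userId)  { bits.set(userId); }        // like SETBIT key userId 1
    boolean seen(int userId)   { return bits.get(userId); } // like GETBIT key userId
    int uniqueVisitors()       { return bits.cardinality(); } // like BITCOUNT key
}
```

Marking the same user twice does not change the count, which is exactly why bitmaps suit UV-style statistics.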

3. Redis has several persistence methods

Redis is a memory-based non-relational K-V database, and since it is memory-based, if the Redis server fails, the data will be lost. To avoid data loss, Redis provides persistence, that is, saving data to disk.

Redis provides two persistence mechanisms, RDB and AOF. The process of persistent file loading is as follows:

3.1 RDB

RDB is to save memory data to disk in the form of snapshot.

What is a snapshot? One way to think about it is, take a picture of the data at the current moment and save it.

RDB persistence refers to writing snapshots of the in-memory data set to disk within a specified time interval after a specified number of write operations. It is the default Redis persistence mode. When it completes, a dump.rdb file is generated in the configured directory, and when Redis restarts, the dump.rdb file is loaded to restore the data. The RDB trigger mechanisms are as follows:

The advantages of RDB

Suitable for large-scale data recovery scenarios, such as backup and full replication

RDB shortcomings

  • There is no way to do real-time persistence/second persistence.
  • The RDB format is incompatible with the old version

3.2 AOF

AOF (Append Only File) persistence records every write operation as a log entry appended to a file. When Redis restarts, the commands in the AOF file are replayed to recover the data. It mainly addresses the real-time aspect of persistence and is disabled by default.

AOF’s workflow is as follows:

The advantages of AOF

Higher data consistency and integrity

The disadvantage of AOF

The more AOF records, the larger the file, and the slower data recovery.

4. How to ensure thread safety in the case of multi-threading?

Add locks: pessimistic locks such as select ... for update and synchronized; optimistic locks such as CAS; plus distributed locks such as a Redis distributed lock, and so on.
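A minimal side-by-side sketch of the two styles: synchronized is pessimistic (block everyone else out), while AtomicInteger is optimistic (retry with CAS under the hood):

```java
import java.util.concurrent.atomic.AtomicInteger;

// Pessimistic vs optimistic locking in miniature.
class Counters {
    private int plain = 0;
    private final AtomicInteger cas = new AtomicInteger();

    synchronized void incPessimistic() { plain++; }   // take the monitor lock first
    void incOptimistic() { cas.incrementAndGet(); }   // lock-free CAS loop

    int pessimisticValue() { return plain; }
    int optimisticValue()  { return cas.get(); }

    public static void main(String[] args) throws InterruptedException {
        Counters c = new Counters();
        Runnable task = () -> {
            for (int i = 0; i < 10_000; i++) { c.incPessimistic(); c.incOptimistic(); }
        };
        Thread t1 = new Thread(task), t2 = new Thread(task);
        t1.start(); t2.start();
        t1.join(); t2.join();
        // Both counters are thread-safe, so both end at 20000.
        System.out.println(c.pessimisticValue() + " " + c.optimisticValue());
    }
}
```

Without `synchronized`/`AtomicInteger`, the plain `++` would lose updates under contention.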

5. Did you use volatile? How does it guarantee visibility, and how does it work

The volatile keyword is the lightest synchronization mechanism provided by the Java Virtual machine and is used as a modifier to modify variables. It guarantees variable visibility to all threads and disallows instruction reordering, but does not guarantee atomicity.

Let’s start with the Java Memory Model (JMM) :

  • The Java Virtual Machine specification attempts to define a Java memory model that screens out differences in memory access across hardware and operating systems, so that Java programs can achieve consistent memory access across platforms.
  • The Java memory model specifies that all variables are stored in main memory and that each thread has its own working memory. Variables here include instance and static variables, but not local variables, which are thread private.
  • The working memory of a thread holds a primary memory copy of variables used by the thread. All operations on variables must be done in the working memory of the thread rather than in main memory. And each thread cannot access the working memory of other threads.

Volatile guarantees that new values are immediately synchronized back to main memory and flushed from main memory immediately before each use, so we say that volatile guarantees visibility for multithreaded variables.

Volatile guarantees visibility related to memory barriers. Let’s take a look at the demo code for volatile:

public class Singleton {
    private volatile static Singleton instance;

    private Singleton() {}

    public static Singleton getInstance() {
        if (instance == null) {
            synchronized (Singleton.class) {
                if (instance == null) {
                    instance = new Singleton();
                }
            }
        }
        return instance;
    }
}

Comparing the assembly generated with and without volatile shows that volatile adds an extra lock addl $0x0,(%esp) instruction.

The LOCK instruction acts as a memory barrier that guarantees the following:

  1. Subsequent instructions cannot be reordered to the location in front of the memory barrier
  2. Writes this processor’s cache to memory
  3. If it is a write action, the corresponding cache in other processors will be invalidated.

Points 2 and 3 are what give volatile variables their visibility guarantee.
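A small demo of the visibility guarantee (a sketch: without volatile, the worker thread may spin forever on a stale cached value, since nothing forces it to re-read from main memory):

```java
// Visibility sketch: the worker spins until the write to `stop` becomes visible.
class VolatileStopDemo {
    private volatile boolean stop = false;  // try removing volatile to see the difference

    Thread startWorker() {
        Thread t = new Thread(() -> {
            while (!stop) { }  // busy-waits on the flag
        });
        t.start();
        return t;
    }

    void requestStop() { stop = true; }  // volatile write is flushed to main memory

    public static void main(String[] args) throws InterruptedException {
        VolatileStopDemo demo = new VolatileStopDemo();
        Thread worker = demo.startWorker();
        Thread.sleep(100);
        demo.requestStop();
        worker.join(2000);  // with volatile this returns promptly
        System.out.println("worker alive: " + worker.isAlive());
    }
}
```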

6. MySQL index structure, and the difference between clustered and non-clustered indexes

  • A table can have only one clustered index, but it can have multiple non-clustered indexes.
  • An index is described by a tree data structure (a B+ tree in InnoDB). In a clustered index, the leaf nodes are the data pages themselves; in a non-clustered index, the leaf nodes are still index nodes, holding a pointer (or the primary key) that locates the corresponding data.
  • Clustered index: the physical storage order follows the index; non-clustered index: the physical storage order does not follow the index.

7. MySQL has several high availability solutions, which one do you use

  • Master/slave or master/master semi-synchronous replication
  • Semi-synchronous replication optimization
  • High availability architecture optimization
  • Shared storage
  • Distributed protocol

7.1 Master/Slave or Master/Master Semi-synchronous Replication

Set up unidirectional or bidirectional semi-synchronous replication with two-node database. The structure is as follows:

It is usually combined with third-party software such as HAProxy and Keepalived to monitor database health and run a series of management commands. If the primary database fails, service continues after switching to the standby database.

The advantages of this scheme are a simple architecture and deployment, with direct failover when the host goes down. The disadvantage is that data consistency cannot be guaranteed once semi-synchronous replication degrades to asynchronous replication; the high availability of HAProxy and Keepalived themselves also has to be considered.

7.2 Semi-Synchronous Replication Optimization

The semi-synchronous replication mechanism is reliable and ensures data consistency. However, under network fluctuation, semi-synchronous replication switches to asynchronous replication after a timeout, and asynchronous replication cannot guarantee consistency. Semi-synchronous replication can therefore be optimized to stay semi-synchronous as much as possible, for example with a dual-channel replication scheme.

  • Advantages: This solution is simple in architecture and deployment, and can be directly switched when the host is down. Compared with semi-synchronous replication in solution 1, this method ensures data consistency.
  • Disadvantages: the kernel source code must be modified or the MySQL communication protocol used, and it does not fundamentally solve the data consistency problem.

7.3 High Availability Architecture Optimization

To ensure high availability, a primary/secondary two-node database can be expanded into a database cluster, managed by ZooKeeper. ZooKeeper uses distributed algorithms to keep cluster data consistent and avoids the split-brain that a network partition can cause.

  • Advantages: The high availability of the whole system is guaranteed, and the scalability is good, which can be extended to large-scale clusters.
  • Disadvantages: Data consistency still depends on native mysql semi-synchronous replication; The introduction of Zookeeper complicates the system logic.

7.4 Shared Storage

Shared storage decouples the database server from the storage device. Data synchronization between different databases does not rely on MySQL’s native replication function, but uses disk data synchronization to ensure data consistency.

DRBD Disk replication

DRBD is a software-based, shared-nothing storage replication solution for mirroring block device content between servers. It mirrors data on disks, partitions, and logical volumes. When a user writes data to the local disk, the data is also sent over the network to another host's disk, keeping the local host (active node) and the remote host (standby node) in real-time sync. The common architecture is as follows:

When the local host is faulty, the remote host retains the same data, which ensures data security.

  • Advantages: Simple deployment, reasonable price, and strong data consistency
  • Disadvantages: the standby node does not serve read operations

7.5 Distributed Protocol

Distributed protocol can solve the data consistency problem well. A common deployment solution is MySQL Cluster, which is an official cluster deployment solution. It uses the NDB storage engine to back up redundant data in real time to achieve high database availability and data consistency. As follows:

  • Advantages: Does not rely on third-party software to achieve strong data consistency;
  • Disadvantages: complex configuration; the NDB storage engine must be used; at least three nodes are required.

8. Tell me about the most challenging projects you have worked on, which modules you worked on, which were the most challenging, and what optimizations you made

Answer this one from your own project experience. You can also add my WeChat and talk it through. Keep going!

9. What scheme does seckill adopt?

To design a seckill system, consider these questions:

How to solve these problems?

  • Page statics
  • Button greying
  • Single responsibility for services
  • Salting the seckill link
  • Rate limiting
  • Distributed locks
  • MQ asynchronous processing
  • Rate limiting & downgrading & circuit breaking

9.1 Static Page

Most of the content of the activity page is fixed, such as the product name and product pictures, so the page can be made static to reduce requests to the server. Seckill users are spread all over the country, some in Shanghai, some in Shenzhen, with different distances and network speeds. To let users access the page as quickly as possible, a Content Delivery Network (CDN) can be used, so that users fetch the content from a node near them.

9.2 Button Greying

Before the seckill activity starts, the button is usually greyed out and only becomes clickable when the time comes. This prevents users from frantically hitting the server in the seconds before the event and bringing it down before the seckill even begins.

9.3 Single responsibility for service

We all know the idea of microservice design, which is to break up the functional modules, put the similar functional modules together, and deploy them in a distributed manner.

For user login, design a user service; for orders, an order service; for gifts, a gift service; and so on. The seckill business logic can likewise be grouped into a seckill service, with its own separate seckill database.

Single responsibility for services has an advantage: if the seckill service cannot cope with the high concurrency and goes down, the other services on the system are not affected.

9.4 Salting the Seckill Link

If the link is exposed in plain text, anyone who obtains the URL can fire requests at it ahead of time. Therefore the seckill link needs to be salted: the URL can be made dynamic, for example by using the MD5 algorithm to hash a random string into the URL.
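A sketch of what a salted link could look like. The path layout, class, and method names here are illustrative assumptions, not a standard:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.UUID;

// The path segment is MD5(itemId + random salt), so it cannot be guessed
// before the server hands it out.
class SeckillUrl {
    static String md5Hex(String input) throws Exception {
        byte[] digest = MessageDigest.getInstance("MD5")
                .digest(input.getBytes(StandardCharsets.UTF_8));
        StringBuilder sb = new StringBuilder();
        for (byte b : digest) sb.append(String.format("%02x", b));
        return sb.toString();
    }

    // The server generates and stores the salt (e.g. in Redis) when the
    // user requests the link, shortly before the event starts.
    static String buildPath(long itemId, String salt) throws Exception {
        return "/seckill/" + itemId + "/" + md5Hex(itemId + ":" + salt);
    }

    public static void main(String[] args) throws Exception {
        String salt = UUID.randomUUID().toString();
        System.out.println(buildPath(1001L, salt));
        // On submit, the server recomputes md5Hex(itemId + ":" + storedSalt) and compares.
    }
}
```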

9.5 Rate Limiting

There are two common places to do rate limiting: Nginx and Redis.

  • To prevent a user from requesting too frequently, we can limit the traffic of the same user.
  • To prevent scalpers from simulating several user requests, we can limit the flow of a certain IP address.
  • To prevent someone from using a proxy and changing IP requests with each request, we can limit the flow of the interface.
  • Components such as Alibaba's Sentinel or Hystrix can also be used for rate limiting, to prevent an instantaneous traffic spike from overwhelming the system.
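A minimal fixed-window counter shows the idea behind all four bullets. In production this usually lives in Nginx or Redis (e.g. INCR + EXPIRE); this in-process sketch is for illustration only:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// At most `limit` requests per key (user id, IP, or interface name) per window.
class FixedWindowLimiter {
    private final int limit;
    private final long windowMillis;
    private final Map<String, long[]> state = new ConcurrentHashMap<>(); // key -> {windowStart, count}

    FixedWindowLimiter(int limit, long windowMillis) {
        this.limit = limit;
        this.windowMillis = windowMillis;
    }

    synchronized boolean allow(String key, long nowMillis) {
        long[] s = state.computeIfAbsent(key, k -> new long[]{nowMillis, 0});
        if (nowMillis - s[0] >= windowMillis) {  // window expired: start a new one
            s[0] = nowMillis;
            s[1] = 0;
        }
        if (s[1] < limit) {
            s[1]++;
            return true;
        }
        return false;  // over the limit: reject or degrade
    }
}
```

The same `allow(key, now)` call works whether the key is a user id, an IP, or an interface name, matching the three limiting strategies above.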

9.6 Distributed Lock

Redis distributed locks can be used to solve the oversold problem.

Use Redis SET key value with the NX and EX/PX options plus a unique random value; verify that value before deleting, to release the lock safely.

// Acquire the lock: NX = only set if absent, EX = expire after 100 seconds
String result = jedis.set(key_resource_id, uni_request_id, "NX", "EX", 100);
if ("OK".equals(result)) {
    try {
        // do something
    } finally {
        // Check that the current client still holds the lock before releasing it
        if (uni_request_id.equals(jedis.get(key_resource_id))) {
            jedis.del(key_resource_id);  // release the lock
        }
    }
}

In this code, checking whether the current client holds the lock and releasing the lock are not one atomic operation. Between jedis.get() and jedis.del(), the lock may expire and be acquired by another client, so the del() could release a lock that no longer belongs to the current client.

To be rigorous, a Lua script is usually used instead, so the check and the delete execute atomically:

if redis.call('get',KEYS[1]) == ARGV[1] then 
   return redis.call('del',KEYS[1]) 
else
   return 0
end;

9.7 MQ asynchronous processing

If instantaneous traffic is especially heavy, a message queue can be used for peak shaving and asynchronous processing. When a user request comes in, it is first put on the message queue and then consumed at a steady rate.
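In miniature, peak shaving is just a bounded queue between the fast web layer and a slower consumer. A real system would use a broker such as RocketMQ or Kafka; this in-process sketch only illustrates the shape:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Requests are accepted quickly into a bounded queue; a worker drains them
// at the rate the database can actually handle.
class OrderQueueDemo {
    private final BlockingQueue<String> queue = new ArrayBlockingQueue<>(1000);

    // Fast path, called by the web layer: returns immediately.
    boolean submit(String orderId) {
        return queue.offer(orderId);  // false when full -> reject or degrade
    }

    // Slow path, called by a worker thread: blocks until an order is available.
    String takeNext() throws InterruptedException {
        return queue.take();
    }
}
```

The bounded capacity is the point: when the queue is full, excess requests are rejected up front instead of crushing the downstream services.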

9.8 Rate Limiting & Downgrading & Circuit Breaking

  • Rate limiting caps the number of requests, so that a flood of them cannot overwhelm the server;
  • Downgrading means that if the seckill service has a problem, it is degraded gracefully so that other services are not affected;
  • Circuit breaking cuts off a failing service; circuit breaking and downgrading usually appear together.

10. Talk about sharding databases and tables. Does sharding require downtime, and how can it be done without downtime?

10.1 Database and table sharding schemes

  • Horizontal database sharding: split the data of one database into multiple databases by field, according to some strategy (hash, range, and so on).
  • Horizontal table sharding: split the data of one table into multiple tables by field, according to some strategy (such as hash or range).
  • Vertical database sharding: split different tables into different databases according to the business they belong to.
  • Vertical table sharding: split the fields of a table into different tables (a main table and extension tables) according to how frequently the fields are used.

10.2 Common sharding middleware:

  • Sharding – JDBC (dangdang)
  • Mycat
  • TDDL (Taobao)
  • 58.com database middleware
  • Vitess (Database middleware developed by Google)
  • Atlas(Qihoo 360)

10.3 Problems sharding may bring

  • Transaction problem: distributed transactions are needed
  • Cross-node join problem: can be solved with two queries
  • Cross-node count, order by, group by, and aggregate function problem: get the result on each node and merge on the application side.
  • Data migration, capacity planning, and capacity expansion
  • ID problem: after sharding, the database's own primary-key generation can no longer be relied on; the simplest alternative is UUID
  • Cross-shard sorting and paging
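The "merge on the application side" bullet can be sketched as follows (each map stands in for one shard's partial aggregate result; the names are illustrative):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Cross-node count/group-by: run the aggregate on each shard,
// then combine the partial results in the application.
class ShardMerge {
    // Each map plays the role of one shard's
    // "SELECT city, COUNT(*) ... GROUP BY city" result.
    static Map<String, Long> mergeCounts(List<Map<String, Long>> perShard) {
        Map<String, Long> merged = new HashMap<>();
        for (Map<String, Long> shard : perShard) {
            shard.forEach((k, v) -> merged.merge(k, v, Long::sum));
        }
        return merged;
    }
}
```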

10.4 Does sharding require downtime? How do I shard without downtime?

It does not. A no-downtime migration can be done in five steps:

  1. Write a proxy layer with a switch (controlling whether to go to the new DAO, the old DAO, or both), grey-release it, still routing to the old DAO.
  2. After the full release, turn on double-writing: every insert and update goes to both the old table and the new table. Record the starting ID of the new table in a log or temporary table; the data in the old table below that value is the stock data to migrate.
  3. Migrate the old table's stock data to the new table with a script.
  4. Stop reading from the old table. The new table now carries all reads and writes, but do not stop writing to the old table yet.
  5. After the new table has served reads and writes for a while with no business problems, stop writing to the old table.

11. What if Redis hangs up?

Redis is a memory-based non-relational K-V database, and since it is memory-based, if the Redis server fails, the data will be lost. To avoid data loss, Redis provides persistence, that is, saving data to disk.

Redis provides two persistence mechanisms, RDB and AOF. The process of persistent file loading is as follows:

11.1 RDB

RDB is to save memory data to disk in the form of snapshot.

What is a snapshot? One way to think about it is, take a picture of the data at the current moment and save it.

RDB persistence refers to writing snapshots of the in-memory data set to disk within a specified time interval after a specified number of write operations. It is the default Redis persistence mode. When it completes, a dump.rdb file is generated in the configured directory, and when Redis restarts, the dump.rdb file is loaded to restore the data. The RDB trigger mechanisms are as follows:

The advantages of RDB

  • Suitable for large-scale data recovery scenarios, such as backup and full replication

RDB shortcomings

  • There is no way to do real-time persistence/second persistence.
  • The RDB format is incompatible with the old version

11.2 AOF

AOF (Append Only File) persistence records every write operation as a log entry appended to a file. When Redis restarts, the commands in the AOF file are replayed to recover the data. It mainly addresses the real-time aspect of persistence and is disabled by default.

AOF’s workflow is as follows:

The advantages of AOF

  • Higher data consistency and integrity

The disadvantage of AOF

  • The more AOF records, the larger the file, and the slower data recovery.

12. How do you keep coupons from being swiped twice?

For repeated requests, consider interface idempotency and anti-duplication.

Take a look at the article I wrote earlier: Talk about idempotent design

For anti-abuse, you can apply rate limiting and blacklists.

  • To prevent one user from requesting coupons too often, we can limit the flow of the same user.
  • To prevent scalpers from simulating several user requests, we can limit the flow of a certain IP address.
  • To prevent someone from using a proxy and changing IP requests with each request, we can limit the flow of the interface.

13. How to design douyin comment system? How to add friend relationship?

You need to consider performance as well as scalability. Have you ever built comment or friend-relationship features in a project? Use your wits and think about how you would answer this one.

14. How to design a short link service, considering cross-machine-room deployment

14.1 Why do we need short links?

Why do we need short links at all? Aren't long links fine? Some platforms limit message length, and overly long links are easily flagged, for example as suspicious hyperlinks.

14.2 How short links work

It’s just a 302 redirect.

302 Status code Indicates temporary redirection.

14.3 How to generate short links

Short links can be generated with a hash algorithm, but there will be hash collisions. How do you resolve them? A Bloom filter can be used to quickly check whether a short code has already been issued.

Is there another option? An auto-increment scheme: each time a long link arrives, assign it an ID and convert the ID to base 62 as the short code. Under high concurrency, however, the auto-increment ID generator can become a bottleneck.
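The base-62 conversion is only a few lines (the alphabet order below is an arbitrary choice; any fixed 62-character alphabet works as long as encode and decode agree):

```java
// Base62 sketch for short links: the auto-increment id becomes the short code.
class Base62 {
    private static final String ALPHABET =
            "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";

    static String encode(long id) {
        if (id == 0) return "0";
        StringBuilder sb = new StringBuilder();
        while (id > 0) {
            sb.append(ALPHABET.charAt((int) (id % 62)));  // least-significant digit first
            id /= 62;
        }
        return sb.reverse().toString();
    }

    static long decode(String code) {
        long id = 0;
        for (char c : code.toCharArray()) {
            id = id * 62 + ALPHABET.indexOf(c);
        }
        return id;
    }
}
```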

There are generally four distributed ID generation methods:

  1. UUID: guaranteed unique across machines in the same space and time, but the generated IDs are long and unordered, which wastes space and hurts insert performance.
  2. Snowflake algorithm: a good option, but if a machine's system clock rolls back, it may produce conflicting, duplicate, or out-of-order IDs (consider the cross-machine-room deployment problem).
  3. MySQL auto-increment primary key: under high concurrency, the write pressure on the DB is heavy.
  4. Redis as an auto-increment ID generator: high performance, but persistence has to be considered; alternatively, modify the Snowflake algorithm to solve the clock rollback problem (for example by changing the workerId).
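A compact Snowflake-style sketch using the common 41/10/12-bit layout. Note it simply throws on clock rollback instead of handling it, which is exactly the weakness noted above; production versions also subtract a custom epoch to extend the usable timestamp range:

```java
// Snowflake-style id: 41-bit timestamp | 10-bit workerId | 12-bit sequence.
class Snowflake {
    private final long workerId;       // 0..1023
    private long lastTimestamp = -1L;
    private long sequence = 0L;

    Snowflake(long workerId) {
        if (workerId < 0 || workerId > 1023) throw new IllegalArgumentException("workerId");
        this.workerId = workerId;
    }

    synchronized long nextId() {
        long ts = System.currentTimeMillis();
        if (ts < lastTimestamp) {
            // The clock-rollback weakness: here we just fail fast.
            throw new IllegalStateException("clock moved backwards");
        }
        if (ts == lastTimestamp) {
            sequence = (sequence + 1) & 0xFFF;            // 12-bit sequence within one ms
            if (sequence == 0) {                           // sequence exhausted: wait next ms
                while ((ts = System.currentTimeMillis()) <= lastTimestamp) { }
            }
        } else {
            sequence = 0;
        }
        lastTimestamp = ts;
        return (ts << 22) | (workerId << 12) | sequence;   // production code subtracts a custom epoch first
    }
}
```

IDs from a single instance are strictly increasing and unique, which is what makes them friendly to B+ tree inserts, unlike UUIDs.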

15. Given an array whose integer elements first increase and then decrease, find the maximum value

For example, given 1,3,5,7,9,8,6,4,2, write a function that finds the largest element in the array.

This may look easy: sorting with quicksort works, but its complexity is O(N log N), and the interviewer may not be satisfied. You can actually use binary search: just find the last element of the ascending part.

Let’s take 1,6,5,4,2 as an example.

How does binary search narrow the range here? Compare the middle element with its next element:

  1. If the middle element is larger than its next element, the maximum is at mid or to its left, so the right pointer moves to mid - 1
  2. If the middle element is smaller than its next element, the maximum is to its right, so the left pointer moves to mid + 1

  1. First round: left=0, right=4, mid=2; nums[mid]=5 > nums[mid+1]=4, so right = mid - 1 = 1.
  2. Second round: left=0, right=1, mid=0; nums[mid]=1 < nums[mid+1]=6, so left = mid + 1 = 1.
  3. Third round: left=1, right=1, mid=1; nums[mid]=6 > nums[mid+1]=5, so right = mid - 1 = 0, and the loop ends.

The maximum is nums[left] = nums[1] = 6.

The implementation code is as follows:

class Solution {
    public static void main(String[] args) {
        int[] nums = {1, 3, 5, 7, 9, 8, 6, 4, 2};
        System.out.println(getLargestNumInArray(nums)); // prints 9
    }

    // Assumes the array first increases and then decreases.
    private static int getLargestNumInArray(int[] nums) {
        if (nums == null || nums.length == 0) {
            return -1;
        }
        int left = 0, right = nums.length - 1;
        while (left <= right) {
            int mid = left + ((right - left) / 2);
            if (nums[mid] < nums[mid + 1]) {
                left = mid + 1;   // still ascending: the maximum is to the right
            } else {
                right = mid - 1;  // descending: the maximum is at mid or to its left
            }
        }
        return nums[left];        // left ends up pointing at the peak
    }
}