preface

Hello, everyone. I am the little boy picking up field snails. A friend recently went to an interview and shared the questions with me, so I have organized answers for all of you. If anything is wrong, please point it out, and we can improve together.

  • What are the maximum values of Redis keys and values?
  • How to use Redis to achieve data deduplication?
  • When does Redis need serialization? What are the ways in which Redis serializes?
  • How much data can a MySQL B+ tree hold?
  • What are the states of the thread pool? What are the ways to get the results of multi-threaded concurrent execution?
  • Thread pool principle? The role of each parameter.
  • What are the usage scenarios for ThreadLocal? The principle? Memory leak?
  • How does Kafka guarantee message ordering?
  • Do you know Nacos election mechanism? Raft algorithm?
  • Talk about TCC compensation

1. What are the maximum values of Redis key and value?

  • The size of a key is capped at 512 MB, but it is generally recommended to keep keys under 1 KB. This not only saves storage space but also speeds up Redis lookups.
  • The maximum size of a value is also 512 MB. For a String value the limit is 512 MB, and for aggregate types such as set, list, and hash, each individual element is likewise limited to 512 MB.

2. How to use Redis to achieve data deduplication?

  • Redis set: it removes duplicate elements and can quickly check whether an element exists in the set. But when there are many elements (say, counts in the hundreds of millions), deduplicating this way consumes a lot of memory.
  • Redis bitmap: a counting structure far more memory-compact than a set. Each bit is set to 1 or 0 to record whether an element exists. For example, to count a website's unique visitors, use the user_id as the bit offset; a bit set to 1 means that user visited. 1 MB of space can hold daily visit statistics for more than 8 million users.
  • HyperLogLog: exact unique counts over huge data sets are expensive, but HyperLogLog can count hundreds of millions of unique elements with only about 12 KB of memory, with the error kept around one percent.
  • Bloom filter: a data structure that takes up very little space, consisting of a long binary vector and a set of hash functions, used to test whether an element is in a set.

If you are interested in Bloom filters, you can read a dedicated article on them.
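The bitmap approach above can be sketched in plain Java with java.util.BitSet, no Redis server needed; in Redis itself the equivalent commands would be SETBIT and BITCOUNT. The user IDs here are made up for illustration.

```java
import java.util.BitSet;

public class BitmapDedup {
    // One bit per user id, mirroring Redis "SETBIT key offset 1"
    private final BitSet bits = new BitSet();

    // Record a visit; returns true only the first time this id is seen
    public boolean markVisited(int userId) {
        if (bits.get(userId)) {
            return false;      // duplicate, already counted
        }
        bits.set(userId);      // set the bit at offset userId to 1
        return true;
    }

    // Unique visitor count, mirroring Redis BITCOUNT
    public int uniqueVisitors() {
        return bits.cardinality();
    }

    public static void main(String[] args) {
        BitmapDedup uv = new BitmapDedup();
        int[] visits = {1001, 1002, 1001, 1003, 1002};  // duplicates on purpose
        for (int id : visits) {
            uv.markVisited(id);
        }
        System.out.println(uv.uniqueVisitors());  // 3 unique users
    }
}
```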

3. When does Redis need serialization? What are the ways in which Redis serializes?

First, a quick refresher on Java serialization: what is it, and when do you serialize?

  • Serialization: The process of converting a Java object into a byte stream.
  • Deserialization: The process of converting byte streams into Java objects.

Why serialization?

To use a metaphor: as a roaming coder in a big city, moving is the norm. When we move a desk, it’s too big to fit through a smaller door, so we have to take it apart and move it through, and the process of taking it apart is called serialization. The process of putting the desk back together is deserialization.

  • For example, when you want to save the state of an in-memory object to a file or database (most commonly, Redis);
  • Serialization is required when you want to use a socket to send objects over the network.

In Spring Data Redis, the RedisSerializer interface is used to serialize Redis keys and values. Common serialization options:

  • JDK serialization (default)
  • String serialization
  • JSON serialization
  • XML serialization
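A minimal sketch of the JDK-default style of serialization, which is roughly what the default JDK serializer does under the hood: a Java object goes out as a byte stream and comes back as an object. The User class here is a hypothetical example value.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class SerializeDemo {
    // A hypothetical value object; it must implement Serializable for JDK serialization
    static class User implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;
        User(String name) { this.name = name; }
    }

    // Serialization: Java object -> byte stream (what would be stored in Redis)
    static byte[] serialize(Object obj) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(obj);
        }
        return bos.toByteArray();
    }

    // Deserialization: byte stream -> Java object (what comes back on a read)
    static Object deserialize(byte[] bytes) throws IOException, ClassNotFoundException {
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return ois.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] bytes = serialize(new User("snail"));   // take the desk apart
        User back = (User) deserialize(bytes);         // put it back together
        System.out.println(back.name);                 // snail
    }
}
```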

4. How much data can a MySQL B+ tree hold? (Take 1,000,000 rows as an example.)

In the InnoDB storage engine, the smallest storage unit is a page, and the default page size is 16 KB.

In a B+ tree, leaf nodes store the row data, while internal nodes store keys plus pointers. The index-organized table walks the non-leaf nodes (binary-searching each page's keys and following pointers) to locate the page that holds the data, then finds the needed row inside that data page.

Suppose the height of the B+ tree is 2: one root node and several leaf nodes. The total number of records this B+ tree can store = (number of pointers in the root node) × (number of records per leaf node).

  • If each row is 1 KB, a single leaf node can hold 16 KB / 1 KB = 16 records.
  • How many pointers fit in a non-leaf node? Assume the primary key is a bigint, 8 bytes (an int would be 4 bytes), and InnoDB's source code sets the pointer size to 6 bytes, so each entry takes 8 + 6 = 14 bytes. 16 KB / 14 B = 16 × 1024 B / 14 B ≈ 1170 pointers.

Thus, a B+ tree of height 2 can hold 1170 × 16 = 18,720 such records. Similarly, a B+ tree of height 3 can store 1170 × 1170 × 16 = 21,902,400, i.e., roughly 20 million records. So a B+ tree just 1-3 levels high can already store tens of millions of rows.
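The arithmetic above can be checked directly:

```java
public class BTreeCapacity {
    public static void main(String[] args) {
        int pageSize = 16 * 1024;        // InnoDB page: 16 KB
        int rowSize = 1024;              // assume 1 KB per row
        int keyPlusPointer = 8 + 6;      // bigint key (8 B) + pointer (6 B)

        int rowsPerLeaf = pageSize / rowSize;            // records per leaf page
        int pointersPerNode = pageSize / keyPlusPointer; // pointers per internal page

        long height2 = (long) pointersPerNode * rowsPerLeaf;
        long height3 = (long) pointersPerNode * pointersPerNode * rowsPerLeaf;

        System.out.println(rowsPerLeaf);      // 16
        System.out.println(pointersPerNode);  // 1170
        System.out.println(height2);          // 18720
        System.out.println(height3);          // 21902400
    }
}
```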

5. What are the states of the thread pool? What are the ways to get the results of multi-threaded concurrent execution?

Note that thread pool states are not the same thing as thread states. A thread pool has these states: RUNNING, SHUTDOWN, STOP, TIDYING, TERMINATED.

private static final int RUNNING    = -1 << COUNT_BITS;
private static final int SHUTDOWN   =  0 << COUNT_BITS;
private static final int STOP       =  1 << COUNT_BITS;
private static final int TIDYING    =  2 << COUNT_BITS;
private static final int TERMINATED =  3 << COUNT_BITS;

The thread pool state switching diagram is as follows:

RUNNING

  • The thread pool in this state receives new tasks and processes tasks in the blocking queue;
  • Call the shutdown() method of the thread pool to switch to the shutdown state;
  • Call the thread pool shutdownNow() method to switch to the STOP state;

SHUTDOWN

  • The thread pool in this state does not receive new tasks, but processes tasks in the blocking queue;
  • When the blocking queue is empty and no tasks are executing in the pool, it enters the TIDYING state.

STOP

  • A thread in this state does not receive new tasks, does not process tasks in a blocking queue, and interrupts running tasks;
  • When no tasks are executing in the pool, it enters the TIDYING state.

TIDYING

  • This state indicates that all tasks have terminated and the recorded task count is 0.
  • After the terminated() hook method executes, the state becomes TERMINATED.

TERMINATED

  • This status indicates that the thread pool is completely terminated
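The RUNNING → SHUTDOWN → TERMINATED path can be observed through ThreadPoolExecutor's public queries (the internal state constants are private, so isShutdown()/isTerminated() stand in for them). A minimal sketch:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.TimeUnit;

public class PoolStates {
    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        System.out.println(pool.isShutdown());     // false: pool is RUNNING

        pool.submit(() -> { });                    // accepted while RUNNING
        pool.shutdown();                           // RUNNING -> SHUTDOWN
        System.out.println(pool.isShutdown());     // true

        // A SHUTDOWN pool rejects new tasks but still drains the queue
        try {
            pool.submit(() -> { });
        } catch (RejectedExecutionException e) {
            System.out.println("rejected after shutdown");
        }

        // Queue drained, workers done: SHUTDOWN -> TIDYING -> TERMINATED
        pool.awaitTermination(5, TimeUnit.SECONDS);
        System.out.println(pool.isTerminated());   // true
    }
}
```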

6. Thread pool principle? The role of each parameter.

Constructor of ThreadPoolExecutor:

public ThreadPoolExecutor(int corePoolSize,
                          int maximumPoolSize,
                          long keepAliveTime,
                          TimeUnit unit,
                          BlockingQueue<Runnable> workQueue,
                          ThreadFactory threadFactory,
                          RejectedExecutionHandler handler)

The role of several core parameters:

  • corePoolSize: the maximum number of core threads in the thread pool
  • maximumPoolSize: the maximum number of threads in the thread pool
  • keepAliveTime: how long an idle non-core thread survives
  • unit: the time unit for the idle-thread keep-alive time
  • workQueue: the blocking queue that holds waiting tasks
  • threadFactory: the factory used to create threads; you can give threads meaningful names to ease troubleshooting
  • handler: the saturation (rejection) policy of the thread pool; there are four types

Four saturation rejection strategies

  • AbortPolicy (throws RejectedExecutionException; the default)
  • DiscardPolicy (silently discards the task)
  • DiscardOldestPolicy (discards the oldest task in the queue, then resubmits the current task)
  • CallerRunsPolicy (the task is run by the thread that submitted it)

Thread pool principle:

  • When a task is submitted and the number of surviving core threads in the thread pool is smaller than corePoolSize, the thread pool creates a core thread to handle the submitted task.
  • If the number of core threads in the thread pool is full, that is, if the number of threads is equal to corePoolSize, a newly submitted task will be queued to the workQueue for execution.
  • When the number of threads in the thread pool is equal to corePoolSize and the workQueue is full, determine whether the number of threads reaches maximumPoolSize, that is, whether the maximum number of threads is full. If not, create a non-core thread to execute the submitted task.
  • If the current number of threads reaches maximumPoolSize and a new task comes in, reject it.

To visualize thread pool execution, let me use an analogy:

  • Core threads are compared to company employees
  • Non-core threads are compared to outsourced employees
  • Blocking queues are compared to requirements pools
  • Submitting tasks is like making a request

  • When the product makes a requirement, regular employees (core threads) take the requirement first (perform the task)
  • If all the regular employees are busy with requirements (i.e., the core thread count is full), the product first puts the requirement into the requirements pool (blocking queue).
  • What if the requirements pool (blocking queue) is also full, but the product keeps raising requirements? Then outsourced employees (non-core threads) are brought in to do the work.
  • If every employee is busy (the maximum thread count is full too), the rejection policy is executed.
  • Once an outsourced employee finishes and has been idle for keepAliveTime, they leave the company.
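A minimal sketch of this flow with deliberately tiny, illustrative sizes (1 core thread, a queue of 1, at most 2 threads), so the fourth concurrent task gets rejected:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class SubmitFlow {
    // Submits 4 long-running tasks to a tiny pool; returns how many are rejected
    static int run() throws InterruptedException {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1,                              // corePoolSize: 1 "regular employee"
                2,                              // maximumPoolSize: plus 1 "outsourced employee"
                60L, TimeUnit.SECONDS,          // keepAliveTime for the non-core thread
                new ArrayBlockingQueue<>(1),    // "requirements pool" holds 1 task
                Executors.defaultThreadFactory(),
                new ThreadPoolExecutor.AbortPolicy());

        CountDownLatch release = new CountDownLatch(1);
        Runnable blocker = () -> {
            try { release.await(); } catch (InterruptedException ignored) { }
        };

        int rejected = 0;
        for (int i = 0; i < 4; i++) {
            try {
                pool.execute(blocker);  // 1st: core thread; 2nd: queue; 3rd: non-core thread
            } catch (RejectedExecutionException e) {
                rejected++;             // 4th: queue and maximumPoolSize both full
            }
        }
        release.countDown();            // let the blocked tasks finish
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
        return rejected;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("rejected: " + run());  // rejected: 1
    }
}
```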

7. What are the usage scenarios of ThreadLocal? The principle? Memory leak?

ThreadLocal, a thread-local variable. If you create a ThreadLocal variable, each thread accessing the variable will have a local copy of the variable. When multiple threads manipulate the variable, they are actually manipulating the variable in their local memory, thus providing thread isolation and avoiding thread-safety issues.

Application scenarios of ThreadLocal

  • Database connection pool
  • Used in session management

ThreadLocal memory structure:

ThreadLocal principle

  • Each Thread object holds a ThreadLocalMap member variable.
  • ThreadLocalMap maintains an Entry array. Each Entry is one key-value pair: the key is the ThreadLocal object itself, and the value is the value set for that ThreadLocal.
  • When a thread sets a value on a ThreadLocal, it stores the value into its own ThreadLocalMap; when it reads, it uses the ThreadLocal as the key to look up its own map. This is what achieves thread isolation.

The ThreadLocal memory leaks

Take a look at ThreadLocal’s reference diagram.

The key used in ThreadLocalMap is a weak reference to the ThreadLocal.

Weak reference: whenever garbage collection runs, an object reachable only through weak references has its memory reclaimed, regardless of whether the JVM has sufficient memory.

Weak references are easy to reclaim. So if a ThreadLocal (the key of the ThreadLocalMap) is collected by the garbage collector, while the ThreadLocalMap itself lives as long as the Thread and is not collected, we end up with an Entry whose key is null but whose value is still strongly referenced. That value can never be reached again, causing a memory leak.

How to avoid the memory leak? After using a ThreadLocal, call its remove() method to delete the Entry and free the memory.
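A minimal sketch of the isolation and of remove(); the variable names are illustrative:

```java
public class ThreadLocalDemo {
    private static final ThreadLocal<String> CONTEXT = new ThreadLocal<>();

    public static void main(String[] args) throws InterruptedException {
        CONTEXT.set("main-user");   // stored in the main thread's own ThreadLocalMap

        Thread worker = new Thread(() -> {
            // This thread's copy is independent: it sees null, not "main-user"
            System.out.println("worker sees: " + CONTEXT.get());
            CONTEXT.set("worker-user");
            System.out.println("worker sees: " + CONTEXT.get());
            CONTEXT.remove();       // avoid the leak: clear the Entry when done
        });
        worker.start();
        worker.join();

        System.out.println("main sees: " + CONTEXT.get());  // still "main-user"
        CONTEXT.remove();
    }
}
```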

8. How does Kafka guarantee message ordering?

Kafka can guarantee message ordering in the following ways:

  • One topic, one partition, one consumer, consuming single-threaded internally. Single-threaded throughput is too low, so this is rarely used. (Global ordering)
  • Write to N in-memory queues, routing all data with the same key to the same queue; then have N threads, each consuming one queue, which preserves ordering. (Local ordering)

Let’s see how message-queue ordering is derived:

The ordering of messages means that they can be consumed in the order in which they are sent. Some businesses require the order of messages, such as placing an order, then paying for it, completing the order, and so on. Suppose the producer produces two messages successively, namely the order message (M1) and the payment message (M2). M1 is generated before M2. How can we ensure that M1 is consumed before M2?

To ensure message ordering, M1 and M2 can be sent to the same server: M2 is sent only after the server acknowledges (ACKs) M1. As shown in the figure:

This can still go wrong, because there can be network latency between the MQ server and the consumer: although M1 was sent first, it may arrive at the consumer later than M2.

So what else can be done to keep messages ordered? Deliver M1 and M2 to the same consumer: deliver M1 first, wait for the consumer’s ACK, then deliver M2.

That is the core idea behind ordered message queues. Kafka’s globally ordered messages, for example, embody exactly this idea: the producer sends to a Topic that has only one Partition, with one Consumer consuming single-threaded internally.

But the throughput of that setup is too low; in general it is enough to guarantee local ordering. When sending a message, specify a Partition Key. Kafka hashes the Partition Key to decide which Partition the message goes to, so messages with the same Partition Key always land on the same Partition. Multiple consumers then each consume their assigned Partition single-threaded.
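The "same Partition Key → same Partition" routing can be sketched as a hash-and-mod (Kafka's default partitioner actually uses murmur2 hashing; String.hashCode() here is just for illustration):

```java
public class KeyPartitioner {
    // Route a message key to one of numPartitions partitions;
    // equal keys always land on the same partition, preserving their relative order
    static int partitionFor(String key, int numPartitions) {
        return (key.hashCode() & 0x7fffffff) % numPartitions;  // mask keeps it non-negative
    }

    public static void main(String[] args) {
        int partitions = 3;
        // All events of the same order go to one partition,
        // so "create order -> pay -> complete" stays in order for that order id
        int p1 = partitionFor("order-1001", partitions);
        int p2 = partitionFor("order-1001", partitions);
        System.out.println(p1 == p2);  // true: same key, same partition
    }
}
```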

9. Do you know the electoral mechanism of Nacos? Raft algorithm?

The function of Nacos as a configuration hub is implemented based on the Raft algorithm.

Raft is a consensus algorithm of choice in distributed system development. In an “everything follows the leader” fashion, it reaches consensus on a series of values and keeps the logs of all nodes consistent.

A Raft election involves three roles plus the notion of a term (Term):

  • Follower: silently receives and processes messages from the Leader; when its wait for the Leader’s heartbeat times out, it steps up and nominates itself as a Candidate.
  • Candidate: sends vote requests to the other nodes, asking them to vote; if it wins a majority of the votes (N/2 + 1), it is promoted to Leader.
  • Leader: handles client requests and log replication. Each election aims to elect one Leader. The Leader keeps sending heartbeats to tell the other nodes, “I’m the leader, I’m alive, don’t start a new election, don’t find a new leader to replace me.”
  • Term: much like a term of office in a democratic election, each new round of leadership is a new Term.

Leading the election process

  1. In the initial state, all nodes in the cluster are Followers, each with a random election timeout (generally 150ms-300ms).

  2. If a Follower receives no heartbeat from the Leader within its timeout, it starts an election: it switches to the Candidate state, increments its term number, and sends vote requests to the other nodes in the cluster asking them to elect it as Leader.

  3. When another node receives candidate A’s vote request, if it has not yet voted in term 1, it votes for node A and updates its own term number.

  4. After receiving votes from more than half of the nodes in the cluster, node A becomes the Leader for the current term and periodically sends heartbeats to tell the other nodes it is the Leader, preventing Followers from starting a new election.
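The "one vote per term, majority wins" rule from steps 2-4 can be sketched as follows (a deliberately simplified model: no network, no logs, just vote bookkeeping):

```java
import java.util.ArrayList;
import java.util.List;

public class RaftVote {
    // One node's per-term vote bookkeeping: a node grants at most one vote per term
    static class Node {
        int currentTerm = 0;
        Integer votedFor = null;   // candidate id voted for in currentTerm, or null

        boolean handleVoteRequest(int candidateTerm, int candidateId) {
            if (candidateTerm < currentTerm) return false;  // stale candidate
            if (candidateTerm > currentTerm) {              // newer term: reset our vote
                currentTerm = candidateTerm;
                votedFor = null;
            }
            if (votedFor == null || votedFor == candidateId) {
                votedFor = candidateId;                     // grant the vote
                return true;
            }
            return false;                                   // already voted for someone else
        }
    }

    public static void main(String[] args) {
        List<Node> cluster = new ArrayList<>();
        for (int i = 0; i < 5; i++) cluster.add(new Node());

        // Node 0 times out, becomes Candidate for term 1, and asks the others to vote
        int votes = 1;  // a candidate votes for itself first
        for (int i = 1; i < cluster.size(); i++) {
            if (cluster.get(i).handleVoteRequest(1, 0)) votes++;
        }
        boolean isLeader = votes > cluster.size() / 2;  // majority: N/2 + 1
        System.out.println(votes + " votes, leader=" + isLeader);  // 5 votes, leader=true
    }
}
```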

10. Talk about TCC compensation

TCC is a solution for distributed transactions. It uses a compensation mechanism, the core idea of which is that for each operation, a corresponding acknowledgement and compensation (undo) operation should be registered. Try-confirm-cancel (TCC) consists of three stages:

  • Try phase: perform consistency checks for all services and reserve the necessary business resources.
  • Confirm phase: commit the business without further checks, because the Try phase has already checked everything; by convention, the Confirm phase is assumed not to fail.
  • Cancel phase: if the business fails, release the business resources reserved in the Try phase and roll back the operations performed there.

The following example is used to simulate the process of TCC implementing distributed transactions:

Suppose user A has a balance of 100 gold coins and 5 gifts. A spends 10 gold coins to place an order for 10 roses. Balance, orders, and gifts live in different databases.

Try phase of TCC:

  • Generate an order record with the order status set to to-be-confirmed.
  • Update the gold-coin balance in user A’s account to 90 and freeze 10 gold coins (reserved business resources).
  • Keep the user’s gift count at 5 and set the pre-added count to 10.
  • If the Try succeeds, enter the Confirm phase.
  • If any exception occurs during the Try, enter the Cancel phase.

TCC Confirm phase:

  • Update the order status to paid.
  • Keep the user balance at 90 and set the frozen amount to 0.
  • Update the user’s gift count to 15 and the pre-added count to 0.
  • If any exception occurs during Confirm, enter the Cancel phase.
  • If Confirm succeeds, the transaction is complete.

Cancel phase of TCC:

  • Change the order status to Cancelled
  • Update the user balance back to 100
  • Update user gift count to 5
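The three phases of the order example can be sketched as plain state transitions (all classes and fields here are hypothetical single-process stand-ins; a real system would span three services and use a TCC framework):

```java
public class TccDemo {
    // Hypothetical account state from the example: 100 gold coins, 5 gifts
    static class Account {
        int balance = 100, frozen = 0;
        int gifts = 5, preAddedGifts = 0;
    }

    // Try: only reserve resources, commit nothing yet
    static void tryPhase(Account a, int cost, int giftCount) {
        a.balance -= cost;            // 100 -> 90
        a.frozen += cost;             // freeze 10
        a.preAddedGifts += giftCount; // pre-add 10 roses
    }

    // Confirm: commit the reserved resources
    static void confirm(Account a) {
        a.frozen = 0;                 // frozen 10 -> 0
        a.gifts += a.preAddedGifts;   // 5 -> 15
        a.preAddedGifts = 0;
    }

    // Cancel: release the reservations made by Try, rolling back
    static void cancel(Account a, int cost) {
        a.balance += cost;            // back to 100
        a.frozen = 0;
        a.preAddedGifts = 0;          // gifts stay at 5
    }

    public static void main(String[] args) {
        Account a = new Account();
        tryPhase(a, 10, 10);
        confirm(a);
        System.out.println(a.balance + " gold, " + a.gifts + " gifts");  // 90 gold, 15 gifts
    }
}
```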

  • The advantage of TCC is that you can customize the granularity of database operations, reducing lock conflicts and improving performance
  • The disadvantage of TCC is that it is highly intrusive to the application: different rollback strategies must be implemented for different failure causes (network failures, system crashes, and so on), which is hard to get right. In practice an open-source TCC framework such as ByteTCC, tcc-transaction, or Hmily is usually used.

Finally

Thank you for reading this far; I hope you land your ideal offer. Public account: a boy picking up snails