Linux kernel: an ingenious unlocked ring queue

Kernel version Linux 2.6.12
Author Where that accompany a carp
Email [email protected]
Date 2021.7.12

1. Introduction

In The Return of the Condor Heroes, Jin Yong said that when Dugu sought the defeated xuan Iron epee, “Epee has no edge, and big Qiao does not work”. He said that if one reaches a certain level of personal cultivation, “flowers, stones, plants and trees can be turned into swords”, and no more skills are needed. There is no shortage of concise, beautiful, and efficient implementation code in the Linux kernel. What is lacking is the eye and perseverance to find that beauty. In the Linux kernel, the simplicity and efficiency of code does not mean that there is a secret recipe. On the contrary, they often use the most basic knowledge and data structures to achieve perfect code, and KFIFO can be said to be a good example of this.

The reason why the term “lockless ring queues” in Linux is not appropriate here is that lockless ring queues are elaborate, simple, ingenious, and not simple. It uses the most basic technical knowledge to achieve important functions. Here we have a glimpse of him.

2. Kfifo profile

This article examines the version of the original code 2.6.12
Kfifo header file Linux – 2.6.12 \ include/Linux/kfifo. H
Source file of KFIFO Linux – 2.6.12 \ kernel \ kfifo. C

Kfifo is a “First In First Out “data structure that uses the previously mentioned ring buffer to provide an unbounded byte stream service. The advantage of using a ring buffer is that when one data element is used up, the remaining data elements do not need to move their storage location, thus reducing copying and improving efficiency. More importantly, == KFIFO adopts parallel lockless technology. Kfifo realizes single production/single consumption mode of shared queue without locking ==.

The origin of parallel lockless technology:

Most of the current high performance server software (such as HTTP accelerators) runs on multi-core servers. Current hardware can support 32, 64 or more cpus. In such a high concurrency environment, lock contention sometimes hurts system performance more than data copy, context switch, etc. Important data structures need to be moved down from lock protection to lock-free environments to improve software performance.

Therefore, lockless mechanism is becoming more and more popular, and using different lockless queues in different environments can save cost and improve program efficiency.

struct kfifo {
	unsigned char *buffer;	/* the buffer holding the data */
	unsigned int size;	/* the size of the allocated buffer */
	unsigned int in;	/* data is added at offset (in % size) */
	unsigned int out;	/* data is extracted from off. (out % size) */
	spinlock_t *lock;	/* protects concurrent modifications */
};
Copy the code

Kfifo structure of the meaning of the fields:

buffer A cache used to store data
size The size of the buffer space, to the power of 2
in Point to buffer squadron leader
out Points to the end of the queue in the buffer
lock

Kfifo unlocked queue application note:

  • == Single producer/single consumer does not need to use locks for synchronization ==
  • == Not used. Kfifo_reset ()==
  • == KfiFO_reset_out ()== is used only on the consumer side

Kfifo lockless queue can be used when the above three conditions are met. On the other hand, if there are multiple producers or consumers, the lock can be used to synchronize:

  • == Multiple producers and one consumer mode, lock synchronization on the producer side ==
  • == Single producer with multiple consumers. Consumer lock synchronization ==

Kfifo as a basic FIFO structure, including the queue function ___kfiFO_PUT, queue out function __kfifo_get() and other basic operations. Here are the details.

3. Initialize the Kfifo

The initialization of Kfifo refers to the operation of allocating space for Kfifo and initializing various parameters in Kfifo.

/** * kfifo_alloc - allocates a new FIFO and its internal buffer * @size: the size of the internal buffer to be allocated. * @gfp_mask: get_free_pages mask, passed to kmalloc() * @lock: the lock to be used to protect the fifo buffer * * The size will be rounded-up to a power of 2. */
struct kfifo *kfifo_alloc(unsigned int size, unsigned int __nocast gfp_mask, spinlock_t *lock)
{
	unsigned char *buffer;
	struct kfifo *ret;

	/* * round up to the next power of 2, since our 'let the indices * wrap' tachnique works only in this case. */
	if (size & (size - 1)) {/* If it's not a power of 2, then it goes up to a power of 2 */
		BUG_ON(size > 0x80000000);
		size = roundup_pow_of_two(size);
	}

	buffer = kmalloc(size, gfp_mask);
	if(! buffer)return ERR_PTR(-ENOMEM);

	ret = kfifo_init(buffer, size, gfp_mask, lock);

	if (IS_ERR(ret))
		kfree(buffer);

	return ret;
}
Copy the code

3.1 Determine whether a number is raised to a power of two

In this function, kfifo_alloc() requires size to be a power of 2.

In binary, the power of two is easy to represent: a number that has only one bit and all the rest zeros. For example:

Decimal representation Binary representation Is it a power of two
8 0000, 1000, is
30 0001, 1110, no
666 001010011010 no
1024 * * * * 0100 0000 0000 is
2000 0111 1101 0000 no
4096 0001 0000 0000 0000 is

In other words, if we can determine that only one of the binary bits of a number is 1, then the number must be raised to a power of 2. Equivalence conversion occurs, so how do we determine how many ones a number contains in binary?? . Here’s a common interview question and tip. The method is: x & (x -1)==0, then the number in binary has only one 1, otherwise contains multiple 1. This method is usually used to calculate how many ones there are in a number.

Int numberof1(int n) {int count = 0; while(n){ count++; n = n & (n-1); } return count; }Copy the code

Simply put: ==x & (n-1) sets the lowest 1 in binary x to 0(the last 1 is set to 0)==. So if n& n minus 1 is equal to 0, that means that there’s only one bit in binary that has a 1, so it must be a power of 2.

3.2 Find an integer power not less than some number 2

Let’s go straight to the kernel implementation:

static __inline__ int generic_fls(int x)
{
	int r = 32;

	if(! x)return 0;
	if(! (x &0xffff0000u)) { 
		x <<= 16;
		r -= 16;
	}
	if(! (x &0xff000000u)) {
		x <<= 8;
		r -= 8;
	}
	if(! (x &0xf0000000u)) {
		x <<= 4;
		r -= 4;
	}
	if(! (x &0xc0000000u)) {
		x <<= 2;
		r -= 2;
	}
	if(! (x &0x80000000u)) {
		x <<= 1;
		r -= 1;1
	}
	return r;
}

static inline unsigned long __attribute_const__ roundup_pow_of_two(unsigned long x)
{
	return (1UL << generic_fls(x - 1));
}
Copy the code

This efficiency? Since all of them are bit operations, it is certain that the efficiency of four operations, such as modulus and mod, should be high, and no point that can be optimized can be missed. As for the principle of doing so, their own product bar, is also quite classic existence.

root@ubantu:/home/toney# ./a.out 
12 --- output=4
16 --- output=5
24 --- output=5
32 --- output=6
128 --- output=8
1024 --- output=11
1400 --- output=11
2040 --- output=11
Copy the code

3.3 Why do I want two to the power?

To use bits, fast, fast, fast by hook or by crook

4. Kfifo entry and exit

__kfifo_put is the Kfifo queue function, source implementation as follows:

unsigned int __kfifo_put(struct kfifo *fifo,
			 unsigned char *buffer, unsigned int len)
{
	unsigned int l;
    
	len = min(len, fifo->size - fifo->in + fifo->out);
    
	/* first put the data starting from fifo->in to buffer end */
	l = min(len, fifo->size - (fifo->in & (fifo->size - 1)));
	memcpy(fifo->buffer + (fifo->in & (fifo->size - 1)), buffer, l);

	/* then put the rest (if any) at the beginning of the buffer */
	memcpy(fifo->buffer, buffer + l, len - l);

	fifo->in += len;

	return len;
}
Copy the code

== It is important to note that memory barriers are not used in the Linux 2.6.12 kernel implementation. Memory barriers were added in later versions and are the core and key to implementing lockfree queues ==. Here we are using the Linux2.6.12 implementation to illustrate the simple principles. For details on Memory Barriers, see my blog post “What are Memory Barriers? Why Memory Barriers?”

Line 6 (in-out) indicates the space used, and size – (in-out) indicates the remaining space. Use min() to prevent write overruns.
Line 9 In & (size – 1) indicates that in falls at the position of the size space pointer, == acts as in%size==, but the bit operation is more efficient.Min is used to obtain the remaining space on the right of the FIFO
Line 10 Copy data to the remaining space on the right of the FIFO
Line 13 Len -l indicates the length of data to be filled in on the left when the space on the right is insufficient
Line 15 Move the in pointer position

__kfifo_put() is Kfifo out of the queue function, source code implementation as follows:

unsigned int __kfifo_get(struct kfifo *fifo,
			 unsigned char *buffer, unsigned int len)
{
	unsigned int l;

	len = min(len, fifo->in - fifo->out);

	/* first get the data from fifo->out until the end of the buffer */
	l = min(len, fifo->size - (fifo->out & (fifo->size - 1)));
	memcpy(buffer, fifo->buffer + (fifo->out & (fifo->size - 1)), l);

	/* then get the rest (if any) from the beginning of the buffer */
	memcpy(buffer + l, fifo->buffer, len - l);

	fifo->out += len;

	return len;
}
Copy the code

You don’t even want to use an “if”. How many if-else judgments can you make me feel better?

4.1 Kfifo joins the queue on the right

When the space on the right side of the FIFO is sufficient, that is, size – in%size > len, directly fill the data on the right side, the position is **[in, in+len]**.

l = min(len, fifo->size - (fifo->in & (fifo->size - 1)));
memcpy(fifo->buffer + (fifo->in & (fifo->size - 1)), buffer, l);
Copy the code

How to express in % size efficiently? Yes, in & size minus 1. Here is a premise: == that requires size to be a power of 2 ==. Why ?

First, in % size range is [0, size-1]; In & (size-1) ranges from [0, size-1].

Secondly, its principle is: size is a power of 2, size-1 means [0, size-1] each bit is 1, you can get all the values in the range, this is the reason why size is a power of 2.

Finally, the two are essentially equivalent, but in & (size-1) only performs bitwise operations, which is much more efficient.

4.2 Kfifo right side + left side to join the queue

When the length of the right side is not enough to join the queue, it is necessary to join the queue on the left side of kFIFO. At this time, the range of about kFIFO is ** [0, len-l]., the range on the left is【in, in+l】**.

/* first put the data starting from fifo->in to buffer end */
l = min(len, fifo->size - (fifo->in & (fifo->size - 1)));
memcpy(fifo->buffer + (fifo->in & (fifo->size - 1)), buffer, l);

/* then put the rest (if any) at the beginning of the buffer */
memcpy(fifo->buffer, buffer + l, len - l);
Copy the code

4.3 Unsigned Integer Overflow Winding

Let’s start with an example:

void main()
{
	unsigned int a = 0xfffffffa;
	unsigned int b = a + 10;
	unsigned int c = 4;
	printf("a = %u\n",a);
	printf("b = %u\n",b);
	printf("b - a =%d\n",b-a);
	printf("c - a =%d\n",c-a);

}
Copy the code

Here are the results:

root@ubantu:/home/toney# gcc kfifo.c 
root@ubantu:/home/toney# ./a.out 
a = 4294967290
b = 4
b - a =10
c - a =10
root@ubantu:/home/toney# 
Copy the code

Here’s the explanation:

a = 4294967290;
b = 4; // A + 10 overflow 4, --> 0x1 00 00 00 04butunsigned intfor4Bytes in total32Bit, so the highest bit cannot be obtained, and b can only be obtained after32bIt, that is,0x00 00 00 04
b - a = - 4294967285.; namely0x1 FF FF FF F6
6 : 0110-- > radix-minus-one complement1001 = 9
- 4294967285.The storage mode in the memory is: complement = inverse +1, i.e.,0x1 00 00 00 09 +1 = 0x1 00 00 00 0A so b minus a is equal to10;

Copy the code

Therefore, whenever integer winding occurs, the variables in KFIFO have the following relationship:


f i f o . i n f i f o . o u t < = f i f o . s i z e fifo.in – fifo.out <= fifo.size

Wonderful!

Unfortunately, I still don’t feel that deep.

Experience 5.

No, no, no, no, no, no, no, no, no, no, no, no, no. If this code is written by me or my colleagues, I would think it will not have a lot of bugs, but in this I think no, no, really no!!