The CPU is the brain of the entire computer: its task is to control the machine and process external input. The CPU is composed of three modules: the arithmetic unit, the registers, and the controller.

Arithmetic unit

The arithmetic logic unit (ALU) performs arithmetic and logical calculations. Arithmetic calculations are the four operations of addition, subtraction, multiplication and division; logical operations mainly include XOR, OR, NOT, AND, comparison and other operations. The data processed by the arithmetic unit is usually binary data. In modern computers, the length of the data processed by the arithmetic unit is usually 64 bits. The more bits the unit can process in a single operation, the higher the processing performance.

The register is the data warehouse of the arithmetic unit. The data needed by the arithmetic unit is stored in registers. When the arithmetic unit completes its calculation, it writes the result back to a register, from which the result is eventually passed on to the output device. The registers are divided into several types, which are described in the register section below.
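The arithmetic and logical operations listed above can be sketched in Java on 64-bit values. This is only a software illustration, not how an ALU is built in hardware, and the class name AluSketch and its method names are hypothetical.

```java
// Toy model of ALU operations on 64-bit words (AluSketch is a hypothetical name).
class AluSketch {
    // Arithmetic operations on 64-bit two's-complement values.
    static long add(long a, long b) { return a + b; }
    static long sub(long a, long b) { return a - b; }
    static long mul(long a, long b) { return a * b; }
    static long div(long a, long b) { return a / b; } // throws ArithmeticException when b == 0

    // Logical (bitwise) operations.
    static long and(long a, long b) { return a & b; }
    static long or(long a, long b)  { return a | b; }
    static long xor(long a, long b) { return a ^ b; }
    static long not(long a)         { return ~a; }

    // Comparison: negative, zero, or positive result.
    static int compare(long a, long b) { return Long.compare(a, b); }

    public static void main(String[] args) {
        long a = 0b1100, b = 0b1010;
        System.out.println(Long.toBinaryString(and(a, b))); // 1000
        System.out.println(Long.toBinaryString(or(a, b)));  // 1110
        System.out.println(Long.toBinaryString(xor(a, b))); // 110
        System.out.println(add(a, b));                      // 22
    }
}
```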

The controller

The controller includes the program counter, timing generator, instruction decoder, and registers.

Program counter

The program counter (PC) records the address of the instruction the CPU is to execute. To execute an instruction, the CPU first gets the instruction's address from the PC and then fetches the instruction at that address. When the instruction is completed, the PC points to the address of the next instruction.
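The role of the PC can be illustrated with a minimal fetch loop. The sketch below assumes a toy memory that holds instruction names as strings; the class name and the instruction names are hypothetical, and real CPUs also change the PC through jumps, which are left out here.

```java
// Toy fetch loop driven by a program counter (all names are hypothetical).
class ProgramCounterSketch {
    // Runs a toy program and returns the order in which addresses were fetched.
    static String run(String[] memory) {
        StringBuilder trace = new StringBuilder();
        int pc = 0; // program counter: address of the next instruction to fetch
        while (pc < memory.length) {
            String instruction = memory[pc]; // fetch the instruction at the address in the PC
            trace.append(pc).append(':').append(instruction).append(' ');
            pc++; // once the instruction completes, the PC points to the next address
        }
        return trace.toString().trim();
    }

    public static void main(String[] args) {
        // prints "0:LOAD 1:ADD 2:STORE"
        System.out.println(run(new String[] {"LOAD", "ADD", "STORE"}));
    }
}
```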

Timing generator

The timing generator sends timing pulses. The CPU works in rhythm with these pulses, so the timing generator acts like the metronome of the CPU.

Instruction decoder

The instruction decoder translates (decodes) an instruction to determine the operation it specifies.
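As a rough illustration of decoding, the sketch below assumes a made-up 16-bit instruction format in which the top 4 bits are the opcode and the low 12 bits are the operand. The format, the mnemonics and the class name are all hypothetical and do not correspond to any real instruction set.

```java
// Decodes a hypothetical 16-bit instruction word: 4-bit opcode, 12-bit operand.
class DecoderSketch {
    // Hypothetical opcode table: index = opcode value.
    static final String[] MNEMONICS = {"NOP", "LOAD", "ADD", "STORE"};

    static int opcode(int word)  { return (word >>> 12) & 0xF; } // top 4 bits
    static int operand(int word) { return word & 0xFFF; }        // low 12 bits

    static String decode(int word) {
        int op = opcode(word);
        String name = op < MNEMONICS.length ? MNEMONICS[op] : "UNKNOWN";
        return name + " " + operand(word);
    }

    public static void main(String[] args) {
        System.out.println(decode(0x2005)); // ADD 5
        System.out.println(decode(0x1010)); // LOAD 16
    }
}
```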

Register

The Data Register (DR), also known as the data buffer register, serves as the transfer station for information passing between the CPU, main memory and peripherals, making up for the differences in operating speed between them. The data register temporarily holds an instruction or a data word read out of main memory; conversely, when an instruction or a data word is written to main memory, it is also held temporarily in the data register.

The Instruction Register (IR) stores the instruction the CPU is currently executing. The address in the PC identifies the instruction to be loaded into the instruction register. When an instruction is executed, it is first read from main memory into the data register and then transferred to the instruction register.

The Address Register (AR) holds the main memory address of the data that the CPU exchanges with memory via the IO bus. Because of the difference in operating speed between main memory and the CPU, the address register must hold the address information until the main memory access is completed. The address register and the data register are both used whenever the CPU and main memory exchange information, i.e. whenever the CPU writes data or instructions to main memory or reads them from main memory.

The Accumulator (AC) is a general-purpose register. When the arithmetic logic unit (ALU) performs an arithmetic or logical operation, the accumulator provides it with a workspace that temporarily stores an operand or the result of the operation.

The Program Status Word (PSW) register represents the state of the current operation and the working mode of the program. It also stores information such as interrupts and the system working state, so that the CPU and the system can learn the running state of the machine and of the program in time. The program status word register is therefore a register that holds the flags of various state conditions.
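The way these registers cooperate can be sketched with a toy machine. The code below assumes a made-up decimal instruction encoding (opcode = word / 100, operand address = word % 100) and models only two PSW flags; the class name, the encoding and the opcodes are hypothetical and chosen purely for illustration.

```java
// Toy machine showing how PC, AR, DR, IR, AC and PSW flags interact (hypothetical design).
class RegisterSketch {
    int pc, ar, dr, ir, ac;          // program counter, address, data, instruction registers, accumulator
    boolean zeroFlag, negativeFlag;  // two PSW condition flags
    final int[] memory;

    RegisterSketch(int[] memory) { this.memory = memory; }

    // Every memory read goes through AR (address) and DR (data).
    private void readMemory(int address) {
        ar = address;
        dr = memory[ar];
    }

    // Hypothetical opcodes: 1 = LOAD, 2 = ADD, 3 = STORE.
    void step() {
        readMemory(pc); // the instruction is read from main memory into DR
        ir = dr;        // and then transferred from DR into IR
        pc++;
        int opcode = ir / 100, address = ir % 100;
        switch (opcode) {
            case 1: readMemory(address); ac = dr; break;
            case 2: readMemory(address); ac = ac + dr; break;
            case 3: ar = address; dr = ac; memory[ar] = dr; break;
            default: throw new IllegalStateException("unknown opcode " + opcode);
        }
        zeroFlag = (ac == 0);    // PSW records the state of the last result
        negativeFlag = (ac < 0);
    }

    public static void main(String[] args) {
        // Program: LOAD [10]; ADD [11]; STORE [12]. Data: 7 and -7.
        int[] memory = new int[16];
        memory[0] = 110; memory[1] = 211; memory[2] = 312;
        memory[10] = 7; memory[11] = -7;
        RegisterSketch cpu = new RegisterSketch(memory);
        for (int i = 0; i < 3; i++) cpu.step();
        System.out.println(memory[12] + " zero=" + cpu.zeroFlag); // 0 zero=true
    }
}
```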

CPU cache

When the CPU accesses memory, it first looks for the data in the CPU cache. If the data is there, it is fetched directly; otherwise the CPU accesses the data in main memory, puts it into the CPU cache, and finally returns it for use.
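The lookup described above can be sketched as a small software cache in front of an array that stands in for main memory. The sketch assumes a direct-mapped placement (slot = address % number of slots), which is only one possible policy; the class and field names are hypothetical.

```java
// Toy read path: check the cache first, fall back to main memory on a miss (hypothetical names).
class CacheLookupSketch {
    final int[] memory;        // stands in for main memory
    final int[] cachedAddress; // address held by each cache slot, -1 when empty
    final int[] cachedValue;
    int hits, misses;

    CacheLookupSketch(int[] memory, int slots) {
        this.memory = memory;
        cachedAddress = new int[slots];
        cachedValue = new int[slots];
        java.util.Arrays.fill(cachedAddress, -1);
    }

    int read(int address) {
        int slot = address % cachedAddress.length; // direct-mapped placement (an assumption)
        if (cachedAddress[slot] == address) {      // hit: fetch directly from the cache
            hits++;
            return cachedValue[slot];
        }
        misses++;
        int value = memory[address];               // miss: access main memory
        cachedAddress[slot] = address;             // put the data into the cache
        cachedValue[slot] = value;
        return value;                              // finally return the data
    }

    public static void main(String[] args) {
        CacheLookupSketch cache = new CacheLookupSketch(new int[] {10, 20, 30, 40}, 2);
        cache.read(1); // miss
        cache.read(1); // hit
        cache.read(3); // miss, replaces address 1 in the same slot
        cache.read(1); // miss again
        System.out.println("hits=" + cache.hits + " misses=" + cache.misses); // hits=1 misses=3
    }
}
```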

The following figure shows the cache architecture of the CPU



As shown in the figure, CPU caches are divided into level 1 (L1), level 2 (L2), and level 3 (L3) caches. The closer a cache is to the CPU, the faster it can be accessed and the higher its cost. Each CPU core has its own L1 and L2 caches, while multiple cores of the same physical CPU share one L3 cache.

When the CPU fetches data, it looks in L1 first, then in L2 and L3, and finally in main memory. Access is fastest when L1 contains the required data and slow when the data has to be fetched from main memory every time.
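The search order through the cache levels can be sketched as follows. This toy model assumes unlimited capacity at every level and fills all closer levels when data is found further away; the class name, the evict helper and these policies are hypothetical simplifications.

```java
// Toy L1 -> L2 -> L3 -> main memory lookup order (hypothetical simplification).
class CacheHierarchySketch {
    static final String[] LEVEL_NAMES = {"L1", "L2", "L3"};
    final boolean[][] present; // present[level][address]: is the address cached at this level?

    CacheHierarchySketch(int addressCount) {
        present = new boolean[LEVEL_NAMES.length][addressCount];
    }

    // Returns the name of the level that served the request.
    String access(int address) {
        String servedBy = "main memory";
        int foundAt = LEVEL_NAMES.length;
        for (int level = 0; level < LEVEL_NAMES.length; level++) {
            if (present[level][address]) { // search L1 first, then L2, then L3
                servedBy = LEVEL_NAMES[level];
                foundAt = level;
                break;
            }
        }
        // Fill every level closer to the CPU than the one where the data was found.
        for (int level = 0; level < foundAt; level++) present[level][address] = true;
        return servedBy;
    }

    // Hypothetical helper that drops an address from one level.
    void evict(int level, int address) { present[level][address] = false; }

    public static void main(String[] args) {
        CacheHierarchySketch caches = new CacheHierarchySketch(16);
        System.out.println(caches.access(5)); // main memory
        System.out.println(caches.access(5)); // L1
        caches.evict(0, 5);
        System.out.println(caches.access(5)); // L2
    }
}
```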

Cache line

The minimum unit of data exchanged between the CPU cache and main memory is the cache line, which is typically 64 bytes in size. Due to spatial locality, data adjacent to what was just accessed is likely to be accessed soon. Therefore, when the CPU loads main memory data into the cache, it loads 64 adjacent bytes together, and these 64 bytes occupy exactly one cache line. When data in a cache line is modified, other cores' copies of that cache line are invalidated and have to be reloaded from main memory.

The problem of false sharing occurs when multiple threads simultaneously operate on different bytes of the same cache line. Each thread's write invalidates the cache line for the others, which must then read it again and reload it into the CPU cache. The result is frequent cache loads and invalidations, with data effectively read from main memory, which greatly reduces CPU performance.

To avoid false sharing, we can place the data operated on by different threads in different cache lines so that the threads do not interfere with each other. If thread A modifies its own data, it only invalidates the cache line that holds thread A’s data without affecting thread B’s data. There are two ways to align data to cache lines: (1) pad the shared variable on the left and right with unused data so that it occupies a full 64-byte cache line; (2) annotate the shared variable with the @Contended annotation.
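The padding approach can be sketched in Java as below. This is a minimal sketch that assumes a 64-byte cache line and 8-byte long fields; the JVM is free to reorder or lay out fields differently, so manual padding is a common idiom rather than a guarantee. The class and method names are hypothetical. For the annotation approach, note that in JDK 8 the annotation is sun.misc.Contended and in JDK 9 and later it is jdk.internal.vm.annotation.Contended, and that it generally only takes effect on application classes when the JVM is started with -XX:-RestrictContended.

```java
// Sketch of padding to keep two threads' counters on separate cache lines (hypothetical names).
class PaddedCounterSketch {
    static final int CACHE_LINE_BYTES = 64; // assumed cache line size

    // Index of the cache line that contains a given byte address.
    static long cacheLineIndex(long byteAddress) {
        return byteAddress / CACHE_LINE_BYTES;
    }

    // A counter with 7 unused longs (56 bytes) on each side of the value.
    // Field layout is ultimately decided by the JVM, so this is best effort.
    static class PaddedCounter {
        long p1, p2, p3, p4, p5, p6, p7; // left padding
        volatile long value;
        long q1, q2, q3, q4, q5, q6, q7; // right padding
    }

    // Each thread increments only its own counter, so no write is lost.
    static long[] runTwoThreads(int iterations) {
        PaddedCounter a = new PaddedCounter();
        PaddedCounter b = new PaddedCounter();
        Thread ta = new Thread(() -> { for (int i = 0; i < iterations; i++) a.value++; });
        Thread tb = new Thread(() -> { for (int i = 0; i < iterations; i++) b.value++; });
        ta.start();
        tb.start();
        try {
            ta.join();
            tb.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        }
        return new long[] {a.value, b.value};
    }

    public static void main(String[] args) {
        System.out.println(cacheLineIndex(63)); // 0
        System.out.println(cacheLineIndex(64)); // 1
        long[] counts = runTwoThreads(1_000_000);
        System.out.println(counts[0] + " " + counts[1]); // 1000000 1000000
    }
}
```

The sketch checks correctness only. Whether padding actually improves throughput depends on the hardware and the JVM, so it is worth measuring with and without the padding fields.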