Author: Doug, a 10+ year veteran of embedded development.

WeChat official account: [IOT town], focusing on C/C++, the Linux operating system, application design, the Internet of Things, MCUs, and embedded development. Reply [books] to the account to get classic books in the Linux and embedded fields.

Reprinting: you are welcome to reprint this article, but please indicate the source.

[TOC]

Why learn Linux from scratch?

Over the past two years, my focus has been on x86 Linux, from driver development to the middle layer to application-level development.

As the scope of that work expands, it has become increasingly clear that I have forgotten many of the basics, such as this table (Understanding the Linux Kernel, p. 47):

This table describes several segment descriptors in a Linux system: data segments and code segments. Reading the book carefully tells you what each field of these descriptors means, but:

Why are the Base addresses of these segments 0x00000000?

Why is Limit always 0xfffff?

Why do they have different Type values and different privilege levels (DPL)?

Without some basic understanding of the x86 platform, this book is a real struggle to finish!

What’s more, the Linux kernel code keeps growing in size; the archive of the latest version, 5.13, is already over 100 megabytes:

How can you really learn Linux from such a giant?!

Even if you start with Linux 0.11, a lot of the code is still heavy going!

Over the weekend, while sorting through some books I had read before, I found several good ones: Assembly Language by Wang Shuang, From Real Mode to Protected Mode by Li Zhong, and Assembly Language Programming by Ma Zhaohui.

They are very, very old books, but leafing through them again, I really felt how good the content is!

Their descriptions of concepts, principles, and design ideas are clear and thorough.

Many of the aspects of segmentation, memory, and register-related design in Linux can be found in these books.

That gave me the idea of rereading these books and organizing their Linux-related content, though not as a simple rehash of what they say.

After thinking about it, I have the following ideas:

  1. Determine the ultimate goal: learn the Linux operating system;

  2. These books are about assembly language and other low-level fundamentals. I will downplay the assembly language itself and focus on the principles relevant to the Linux operating system;

  3. The articles will not strictly follow the order of the books; instead, related content from the several books will be pulled together, studied, and discussed as a whole;

  4. Some of the content will be compared with the corresponding parts of Linux 2.6, so that when you later study the kernel you can see the underlying hardware support;

  5. Finally, I hope I can stick with this series; it will also be a way of organizing my own knowledge.

In a word: start from the fundamentals!

As the opening article of the series, this one covers the steps shown in the following diagram:

Start now!

The old Intel 8086 processor

The 8086 was Intel’s first 16-bit processor. It was born in 1978 and is probably a little older than most of you.

Among all Intel processors it occupies a very important position: it is the ancestor of the entire Intel 32-bit architecture (IA-32).

So, what is a 16-bit processor?

Some people confuse the width of the processor with the width of the address bus!

As we know, when the CPU accesses memory, it transmits the physical address through the address bus.

The 8086 CPU has 20 address lines and can transmit a 20-bit address.

Each address line carries one bit, so 20 bits can represent 2^20 = 1,048,576 distinct addresses.

This is called the addressing capability of the CPU.

However, the 8086 processor is 16-bit because:

  1. The arithmetic unit can process at most 16 bits of data at a time;

  2. The maximum width of the register is 16 bits;

  3. The path between the register and the arithmetic unit is 16 bits;

In other words, the maximum data width that the 8086 can process, transfer, and temporarily store in a single operation is 16 bits. That is why we call it a 16-bit CPU.

What is main storage?

The essence of a computer is storing and processing data, so where does the data involved in a computation come from? From a physical device called storage, or memory.

In a broad sense, any device that can store data can be called memory, such as hard disk, USB disk, etc.

Inside a computer, however, there is a special type of memory that is connected to the CPU and stores the programs and data currently being executed. It is commonly called internal memory or main memory, or simply memory.

Memory is organized in bytes, and the minimum unit of access is 1 byte, which is the most basic storage unit.

Each storage unit, that is, a byte, corresponds to an address, as shown below:

The address bus is what the CPU uses to indicate which memory location it wants to access.

The address of the first byte is 0000H, the address of the second byte is 0001H, and so on.

In the figure, the maximum memory address is FFFFH, which is 65535 in decimal notation, so the memory capacity is 65536 bytes, or 64 KB.
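As a tiny illustration (the array and names below are just for this sketch, not anything from the book), a 64 KB memory can be modeled as a byte array whose index is the address:

```c
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    static uint8_t mem[0x10000];   /* 65536 storage units: addresses 0000H..FFFFH */

    mem[0x0000] = 0xAB;            /* the first byte lives at address 0000H */
    mem[0xFFFF] = 0xCD;            /* the last byte lives at address FFFFH  */

    printf("highest address FFFFH = %u\n", 0xFFFF);            /* 65535       */
    printf("capacity = %zu bytes = %zu KB\n",
           sizeof(mem), sizeof(mem) / 1024);                   /* 65536, 64   */
    return 0;
}
```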

Here is a question about atomic operations to think about.

In Linux kernel code, atomic operations are used in many places, such as mutex implementation code.

Why do atomic operations restrict the variable to an int type? The answer has to do with how memory is read and written.

Although the smallest addressable unit of memory is a byte, CPUs of different widths are carefully designed so that memory can be accessed by byte, by word, or by doubleword.

In other words, a 16-bit processor can read or write a 16-bit value, and a 32-bit processor a 32-bit value, in a single memory access.
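As a rough userspace illustration of the same idea (the Linux kernel has its own atomic_t type built around an int; the sketch below uses C11 atomics instead), an int-sized variable fits in one machine word, so the hardware can read or write it in a single indivisible access:

```c
#include <stdatomic.h>
#include <stdio.h>

/* A word-sized counter: because it fits in one machine word, a plain
 * load or store of it happens in a single memory access, and the
 * fetch-and-add below is guaranteed to be indivisible. */
static atomic_int counter = 0;

int main(void)
{
    atomic_fetch_add(&counter, 1);                    /* atomic read-modify-write */
    printf("counter = %d\n", atomic_load(&counter));
    return 0;
}
```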

What is a register?

Inside the CPU, data exists as electrical signals representing zeros and ones. A binary number appears on the processor’s internal circuits as a combination of high and low voltage levels, one level for each bit.

Inside the processor, this data has to be latched in a circuit called a register.

Registers, therefore, are essentially a kind of memory. They just happen to live inside the processor, and the CPU can access them much faster than it can access main memory.

The processor is always busy; as it works, data stays in a register only briefly before being sent elsewhere, which is why it is called a register.

The registers in the 8086 are 16 bits wide and hold two bytes, that is, one word: a high byte (bits 8 to 15) and a low byte (bits 0 to 7).

The 8086 has the following registers:

As mentioned, these registers are all 16 bits wide. Four of them, AX, BX, CX, and DX, can each also be used as two 8-bit registers, for compatibility with earlier 8-bit processors.

For example, AX is a 16-bit register, while AH and AL are the 8-bit registers formed from its high and low bytes:

```asm
mov al, 5DH    ; write the byte 5DH into AL, the low 8 bits of AX
```
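To make the high/low split concrete in C terms (a sketch with names of my own choosing, not anything from the books), AX can be rebuilt from AH and AL with shifts and masks:

```c
#include <stdint.h>
#include <stdio.h>

int main(void)
{
    uint8_t  ah = 0x01, al = 0x5D;               /* the two 8-bit halves */
    uint16_t ax = (uint16_t)((ah << 8) | al);    /* AX = AH:AL = 0x015D  */

    printf("AX = %04X, AH = %02X, AL = %02X\n",
           ax, (ax >> 8) & 0xFF, ax & 0xFF);
    return 0;
}
```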

Three buses

When we start an application, the program’s code and data are loaded into physical memory.

Whether the CPU is fetching instructions or operating on data, it needs to exchange information with memory:

  1. Select the address of the storage unit (address information);

  2. Select the device and issue a read or write command (control information);

  3. Read or write the data itself (data information).

In a computer, there is a dedicated set of wires, called a bus, that connects the CPU to the other chips.

Logically, these wires are classified into the following three buses:

Address bus: used to select the address of the storage unit;
Control bus: used by the CPU to control external devices;
Data bus: used to transfer data between the CPU and memory or other devices.

The 8086 has 20 address lines (that is, the width of its address bus is 20), so it can address 2^20 memory units.

Similarly, the width of the 8086 data bus is 16, which means it can transmit 16 bits of data at a time.

The width of the control bus determines how many different kinds of control the CPU can exert, that is, its ability to control external devices.

How does the CPU address memory?

In the Linux 2.6 kernel, the address generated by the compiler is called a virtual address (also known as a logical address). This logical address is translated by the segmentation unit into a linear address, which is then translated by the paging unit into a physical address in physical memory.

Remember the table of segment descriptors at the beginning of this article?

Both the code segment and data segment descriptors have a base address of 0x00000000, which means the virtual address is numerically equal to the resulting linear address (you’ll see why this is the case later).
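Here is a minimal sketch of that flat-segment translation (function and variable names are mine, not kernel code): with a base address of 0, the linear address that comes out is numerically the same as the logical offset that goes in.

```c
#include <stdint.h>
#include <stdio.h>

/* Toy model of the segmentation step: linear = segment base + offset.
 * Linux gives its code and data segments a base of 0x00000000, so the
 * translation leaves the address numerically unchanged. */
static uint32_t segment_to_linear(uint32_t base, uint32_t offset)
{
    return base + offset;
}

int main(void)
{
    uint32_t logical = 0x080484D0;   /* a made-up logical address */
    printf("linear = %08X\n", segment_to_linear(0x00000000, logical));
    return 0;
}
```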

Let’s take a look at the simpler address translation in 8086.

As mentioned earlier, memory is a linear storage device, and the CPU locates each storage unit by its address.

The 8086 CPU has 20 address lines and can transmit a 20-bit address, giving it an addressing capability of up to 1 MB.

However, the 8086 is also a 16-bit design: internally it can process, transfer, and temporarily store addresses only 16 bits at a time.

If an address were simply sent out over this internal structure, only 16 bits could be sent, which would limit the addressing capacity to 64 KB.

So how do you get the most out of 20 address lines?

The 8086’s solution is to combine two 16-bit addresses internally to form a 20-bit physical address, as follows:

The first 16-bit address is called a segment address, and the second 16-bit address is called an offset address.

The address adder uses the following formula to “compose” a 20-bit physical address:

Physical address = Segment address x 16 + offset address

For example: we write a program, and after it is loaded it occupies a region of memory.

When executing these instructions, the CPU uses the CS register as the segment register and the IP register as the offset register, then computes CS x 16 + IP to obtain the physical address of the instruction.
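Here is a short sketch of that calculation in C (the function name is mine), with the multiply-by-16 written as a left shift by 4 bits:

```c
#include <stdint.h>
#include <stdio.h>

/* 8086 real-mode address formation: physical = segment * 16 + offset. */
static uint32_t phys_addr(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}

int main(void)
{
    /* e.g. CS = 2000H, IP = 0100H  ->  20000H + 0100H = 20100H */
    printf("CS:IP = 2000:0100 -> physical %05X\n", phys_addr(0x2000, 0x0100));
    return 0;
}
```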

From the description above, it may look as if the 8086 uses this address-composition scheme only because its registers cannot directly hold a 20-bit physical address.

In fact, the more fundamental reason is that the 8086 is designed to address memory as base address + offset (here the base address is the segment address shifted 4 bits to the left).

That is, even if the CPU has the ability to output a 20-bit address directly, it may still use the base address + offset approach for memory addressing.

Consider this: when we build a shared library on Linux, we usually add the -fPIC compiler option, indicating that the resulting dynamic library is position independent and can be relocated to wherever it is loaded in memory.

The base address + offset addressing mode provides the underlying support for relocation.

How do we control the CPU?

The CPU is actually a very simple, very mechanical thing. The only thing it does is fetch an instruction from the memory location designated by the CS:IP register pair, and then execute it:

Of course, an instruction set must be defined in advance. Everything stored in the instruction area of memory must be a valid instruction, otherwise the CPU will not recognize it.

Each instruction uses a specific number (an instruction code, or opcode) to tell the CPU to perform a specific operation.

The CPU knows these instruction codes: as soon as it sees one, it knows how many bytes of operands follow and what operation it needs to perform.

For example, the instruction F4H means halt the processor: when the CPU executes it, it stops working.

(Saying “the CPU” here is a bit imprecise, since the CPU is a unit made up of many components. It would perhaps be more accurate to say the execution unit inside the CPU.)

Another caveat: everything in memory is data, and it is entirely up to the operating system designer to decide which bits of data are to be executed as instructions and which bits of data are to be manipulated as “variables”.

At the level of the 8086 processor, any area of memory that CS:IP “points to” is executed as an instruction.

From the description above: the only components inside the CPU that programmers can read and write with instructions are the registers, and it is by changing the contents of the registers that we control the CPU.

To put it more bluntly: we can control the CPU to execute target instructions by changing the contents of CS and IP registers.

As embedded developers, most of us have configured registers in an MCU to define pin functions or multiplex ports. Those operations can be regarded as exactly this kind of control over the CPU.

If the CPU is a puppet, then the registers are the ropes that control the puppet.

Let’s compare the CPU with PLC programming in the field of industrial control.

When we get a new PLC, it contains only a runtime, and that runtime does just one job:

  1. Scan all input ports and latch their states into the input image area;

  2. Run the computation and control logic, producing a set of output signals that are latched into the output image area;

  3. Refresh the output ports from the output image area.

In a brand-new PLC, the computation and control logic required in the second step does not exist yet.

Therefore, with only the runtime, a PLC cannot do any meaningful work.

To make a PLC achieve a specific control goal, we also need the host programming software provided by the PLC manufacturer to develop a computation and control logic program; the programming language is usually a ladder diagram.

When this program is downloaded into the PLC, it drives the runtime to do meaningful work.

Put simply: the ladder diagram controls the PLC’s runtime.
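For intuition, here is a rough sketch of that scan cycle in C (the port-access helpers and the trivial logic are made up for this sketch; a real runtime is far more involved):

```c
#include <stdbool.h>

#define IO_POINTS 16

static bool input_image[IO_POINTS];    /* snapshot of the inputs, per cycle */
static bool output_image[IO_POINTS];   /* computed outputs, per cycle       */

/* Hypothetical hardware-access helpers (stubs for this sketch). */
static bool read_input_port(int i)            { (void)i; return false; }
static void write_output_port(int i, bool on) { (void)i; (void)on; }

/* The downloaded "ladder" logic: here output 0 simply follows input 0. */
static void user_logic(void)
{
    output_image[0] = input_image[0];
}

int main(void)
{
    for (;;) {
        for (int i = 0; i < IO_POINTS; i++)       /* 1. scan inputs       */
            input_image[i] = read_input_port(i);

        user_logic();                             /* 2. run control logic */

        for (int i = 0; i < IO_POINTS; i++)       /* 3. refresh outputs   */
            write_output_port(i, output_image[i]);
    }
}
```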

Likewise, to make the CPU execute the instructions at a given memory location, we simply change the CS and IP registers.

In other words: as long as you know the memory layout of a program well enough, you can manipulate the CPU to execute code anywhere.

How the CPU executes an instruction

Now that we understand address translation and memory addressing, we are left with the last two units the CPU needs in order to execute an instruction: the instruction buffer and the control circuit.

Simply put: the instruction buffer caches instructions read from memory, and the control circuit coordinates how the various units use shared resources such as the buses.

In the diagram below, there are four instructions:

Take the first instruction, which goes through five steps:

  1. The contents of CS:IP are fed into the address adder, which calculates the 20-bit physical address 20000H;

  2. The control circuit places the 20-bit address on the address bus;

  3. The instruction B8 23 01 stored at location 20000H is sent over the data bus to the instruction buffer;

  4. The offset register IP is incremented by 3, so that it points to the next instruction to be executed (opcode B8 tells the CPU that the current instruction is 3 bytes long);

  5. The instruction in the instruction buffer is executed: the value 0123H is loaded into register AX.

These are the basic steps of executing an instruction, and of course, the execution of instructions in modern processors is much more complicated than this.
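As a toy model of those five steps (the memory contents, register values, and the single opcode handled are all made up for illustration), here is a sketch in C that fetches and executes the instruction B8 23 01, i.e. mov ax, 0123H:

```c
#include <stdint.h>
#include <stdio.h>

static uint8_t  mem[1 << 20];                  /* 1 MB of "physical memory"  */
static uint16_t cs = 0x2000, ip = 0x0000, ax;

int main(void)
{
    /* Place "mov ax, 0123H" (B8 23 01) at physical address 20000H. */
    mem[0x20000] = 0xB8;
    mem[0x20001] = 0x23;                        /* low byte of the immediate  */
    mem[0x20002] = 0x01;                        /* high byte of the immediate */

    uint32_t phys   = ((uint32_t)cs << 4) + ip; /* 1. address adder: CS*16+IP */
    uint8_t  opcode = mem[phys];                /* 2./3. fetch over the buses */

    if (opcode == 0xB8) {                       /* mov ax, imm16              */
        ip += 3;                                /* 4. advance IP by 3 bytes   */
        ax = (uint16_t)(mem[phys + 1] | (mem[phys + 2] << 8)); /* 5. execute  */
    }

    printf("AX = %04X, next CS:IP = %04X:%04X\n", ax, cs, ip);
    return 0;
}
```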





Great oaks from little acorns grow!

This article describes only the minimum knowledge required for a CPU to execute an instruction.

In the next article, we’ll take a closer look at the memory segmentation mechanism.

Recommended reading

Album 0: Selected articles

Album 1: C language

Album 2: Application Design

Album 3: Linux Operating System

Album 4: Internet of Things