Make writing a habit together! This is the 13th day of my participation in the “Gold Digging Day New Plan · April More Text Challenge”. Click here for more details.

What are the risks we need to take to improve CPU throughput through pipelining design?

Three major risks to be solved in assembly line design:

  • Structural Hazard
  • Data Hazard
  • Control Hazard

In CPU pipeline design, there are various “dangers” that make the next instruction of the pipeline not work properly. But still by “jumping the gun”, “risk” to get a chance to increase the throughput of orders. Pipelined cpus are risky choices made proactively. This is not a desperate response, nor is it a crisis, in the expectation of higher returns through risk taking.

You have a plan in place to deal with the problems your risks may cause.

Structure of adventure

In essence, it is a resource competition problem at hardware level, that is, a problem at hardware circuit level.

The CPU is running at different stages of two computer instructions in the same clock cycle. But these two different phases may use the same hardware circuitry.

The most typical example is in-memory data access.

  • In the same clock cycle, two different instructions access the same resource (diagram of level 5 pipeline)

When the first instruction performs Fetch (MEM), the fourth instruction in the pipeline performs Fetch (Fetch). Access and access instructions, both to carry out memory data read. Memory has only one address decoder as the address input, so it can only read one data in a clock cycle, can not simultaneously execute the first instruction read memory data and the fourth instruction read instruction code.

The most common example of such resource conflicts is the thin-film keyboard “lock key”. Thin-film keyboards do not have separate circuits behind each key, but multiple keys share a single circuit. If you press two buttons that share the same circuit at the same time, the signals from both buttons can’t be transmitted. Heavy keyboard users should buy mechanical or capacitive keyboards. Because the keys have independent transmission lines, “full key without impact”, a lot of writing articles, writing programs, or playing games, will not press the key but did not take effect.

The essence of “full key without impact” is to increase resources. Also available in CPU architecture adventures. Conflicts between accessing memory data and fetching instructions split our memory into two parts, each with its own address decoder. These two parts are the program memory that holds instructions and the data memory that holds data.

This solution of splitting memory into two parts is known in computer Architecture as Harvard Architecture. Von Neumann Architecture is also known as Princeton Architecture.

Today’s CPU is still a von Neumann architecture, without splitting memory into program memory and data memory. Because of this separation, the memory space required for program instructions and data cannot be dynamically allocated according to the actual application. While resource conflicts are resolved, flexibility is lost.Modern CPU architectures, borrowed from Harvard architecture, split the cache level into instruction cache and data cache. However, borrowed from Harvard architecture, modern cpus do not split the memory level, but they do split the cache inside the CPU. The Cache is divided into Instruction Cache and Data Cache.

Memory access is much slower than CPU speed, so modern cpus do not read main memory directly. It loads instructions and data from main memory into the cache so that subsequent accesses are to the cache. Instruction cache and data cache split, so that our CPU in the data access and instruction, there will be no resource conflict problem.

Structural risk taking is a hardware problem that can be solved by increasing hardware resources. There are, however, a lot of risky problems at the level of procedural logic. One of the most common is data risk-taking.

Data risk-taking: Three different dependencies

There are data dependencies between simultaneous execution of multiple instructions.

These data dependencies can be divided into three categories:

  • Read After Write (RAW)
  • Write After Read, WAR
  • Write After Write (WAW)

Read After Write

C language code compiled out of the assembly instructions.

int main(a) {
  int a = 1;
  int b = 2;
  a = a + 2;
  b = a + 3;
}
int main(a) {
   0:   55                      push   rbp
   1:   48 89 e5                mov    rbp,rsp
  int a = 1;
   4:   c7 45 fc 01 00 00 00    mov    DWORD PTR [rbp-0x4].0x1
  int b = 2;
   b:   c7 45 f8 02 00 00 00    mov    DWORD PTR [rbp-0x8].0x2
  a = a + 2;
  12:   83 45 fc 02             add    DWORD PTR [rbp-0x4].0x2
  b = a + 3;
  16:   8b 45 fc                mov    eax,DWORD PTR [rbp-0x4]
  19:   83 c0 03                add    eax,0x3
  1c:   89 45 f8                mov    DWORD PTR [rbp-0x8],eax
}
  1f:   5d                      pop    rbp
  20:   c3                      ret  
Copy the code
  • Add 0x2 to rBP-0x4 for machine code 12
  • From rBP-0x4, data is written to the EAX register.

Therefore, it is necessary to ensure that the operation of writing the instruction at memory address 12 to RBP-0x4 must be completed before the instruction at memory address 16 can read the value of RBP-0x4. This is the data dependency of write before read. This order is not guaranteed, the program is wrong!

This write-before-read Dependency is called a Data Dependency.

Write After Read

So this time we’re going to do a is equal to b plus a, and then we’re going to do b is equal to a plus b.

int main(a) {
  int a = 1;
  int b = 2;
  a = b + a;
  b = a + b;
}
int main(a) {
   0:   55                      push   rbp
   1:   48 89 e5                mov    rbp,rsp
   int a = 1;
   4:   c7 45 fc 01 00 00 00    mov    DWORD PTR [rbp-0x4].0x1
   int b = 2;
   b:   c7 45 f8 02 00 00 00    mov    DWORD PTR [rbp-0x8].0x2
   a = b + a;
  12:   8b 45 f8                mov    eax,DWORD PTR [rbp-0x8]
  15:   01 45 fc                add    DWORD PTR [rbp-0x4],eax
   b = a + b;
  18:   8b 45 fc                mov    eax,DWORD PTR [rbp-0x4]
  1b:   01 45 f8                add    DWORD PTR [rbp-0x8],eax
}
  1e:   5d                      pop    rbp
  1f:   c3                      ret       
Copy the code

The assembly instruction at address 15 reads the eAX register value and adds it to the rBP-0x4 memory address. In the assembly instruction at memory address 18, write to update the EAX register.

If the eAX at memory 18 is written first and the eAX is fetched at memory 15, the program is wrong. Here, we also want to protect the read before write order for eAX.

This read-before-write Dependency is called anti-dependency.

Write After Write

Set variable A = 1, then variable A = 2.

int main(a) {
  int a = 1;
  a = 2;
}
int main(a) {
   0:   55                      push   rbp
   1:   48 89 e5                mov    rbp,rsp
  int a = 1;
   4:   c7 45 fc 01 00 00 00    mov    DWORD PTR [rbp-0x4].0x1
  a = 2;
   b:   c7 45 fc 02 00 00 00    mov    DWORD PTR [rbp-0x4].0x2
}
Copy the code

The instruction at memory address 4 and the instruction at memory address B both write the corresponding data to the memory address rBP-0x4. If the instruction at memory address B is written after the instruction at memory address 4. After these instructions are completed, the data in RBP-0x4 is incorrect. This will result in the subsequent use of the data instruction in the memory address, and will not be able to get the correct value. Therefore, it is necessary to ensure that the instruction at memory address 4 is written before the instruction at memory address B is written.

Output Dependency This Dependency is called Output Dependency.

Line stop

Except for read after read, operations on the same register or memory address are enforced in an explicit order. And this sequential operation requirements, also bring challenges to the use of pipelining. Because the core of pipelining architecture is that subsequent instructions begin to execute before the previous instruction has finished.

So, there needs to be a solution to these data adventures. The simplest and dumbest is Pipeline Stall, or Pipeline Bubbling.

If it is found that a later instruction will have a data-level dependency on the earlier instruction, it will “wait”. When the instruction is decoded, the register and memory address that the corresponding instruction needs to access will be obtained, and then it can be determined whether the instruction will trigger the data adventure. When it does, it can decide to shut down the entire assembly line for one or more cycles.The clock signal will automatically switch between 0 and 1 constantly. So, you can’t really stop. Every step of the assembly line has to do something. So instead of actually stopping the pipeline, you insert a NOP operation, which is just a fish operation, before performing the next step.The insertion of the command is like an air bubble in a Pipeline. As the water passes, it doesn’t actually transport the water to the next step. Instead, it creates a Bubble of empty air, hence the name Pipeline Bubble.

conclusion

  • Structural risk-taking can be addressed by increasing resources.

Modern CPU architecture is also a hybrid architecture solution borrowed from Harvard architecture under the von Neumann architecture. Although the memory is not divided by function, it is divided into instruction cache and data cache at the cache level. From the hardware level, the competition for the same resource under the same clock no longer occurs.

  • It is also possible to solve risky problems, pipeline pauses, by “waiting,” inserting NOP operations.

However, solutions such as pipeline pauses sacrifice CPU performance. Because, in fact, in the worst-case scenario, our pipelined CPU will revert to a single-instruction cycle CPU.

reference

  • Chapters 4.5-4.7 in Computer composition and design: Hardware/software interfaces