Why did humans invent computers? Humans have always been lazy, and it was precisely for this reason that computers were invented: to increase productivity and give people more free time to play.

How the von Neumann architecture works

A von Neumann computer consists of five parts: the arithmetic logic unit (ALU), the control unit, memory, input devices, and output devices; the ALU and the control unit together form the CPU (Central Processing Unit). The defining characteristic of the von Neumann architecture is that data and instructions are stored uniformly, in binary form, in the same memory. Take the addition of two numbers as an example: the relevant code and data are first loaded into memory, and the compiler translates the code into assembly. MOV EAX, [1000] loads the contents at address 1000 into register EAX; MOV EBX, [1004] loads the contents at address 1004 into register EBX; ADD EBX, EAX adds the contents of EAX and EBX and stores the result in register EBX. We find that a simple addition of two numbers corresponds to three instructions at the assembly level, and more complex operations require correspondingly more instructions.

How the parts of a computer are connected

The CPU is connected to memory via the I/O bridge, and the I/O bridge in turn connects to the I/O bus, onto which the USB controller, graphics adapter, and disk controller are mounted. This is a bus structure: devices are not connected pairwise, but communicate over a shared bus.

  • SOA is also a bus-style structure: services are not connected pairwise, but communicate through a shared bus

  • The code is stored on disk. To run it, the code must first be loaded from disk into memory; the CPU then finds the program's entry address, such as the address of the main method

The I/O bridge is an abstraction of the northbridge and the southbridge

  • Northbridge: connected to fast devices, such as memory

  • Southbridge: connected to slow devices, such as disks and USB devices

Instructions and pipelining

Common instruction formats include three-address, two-address, one-address, and zero-address instructions, as shown in the figure below. A two-address instruction occupies less space than a three-address instruction; x86 processors use the two-address form. A one-address instruction usually operates on an implicit accumulator. Zero-address instructions are the most compact, but generally require more instructions than the two- or three-address forms to accomplish the same task. For example:

iconst_1   // push the constant 1 onto the operand stack
iconst_2   // push the constant 2
iadd       // pop two ints, push their sum
istore_0   // pop the sum into local variable 0

The iadd (integer add) instruction takes no operands. What use is a zero-address instruction if no source address is specified? Zero-address means that both the sources and the destination are implicit; the implementation relies on a common data structure, the stack: iadd pops its two operands off the operand stack and pushes the result back on.

An instruction executes in stages (fetch, decode, execute, and so on), and each stage corresponds to a different hardware unit. These units should be kept fully busy, hence the concept of the pipeline. Without pipelining the CPU is slow: while the fetch unit is working, the decode and execute units sit idle. With pipelining, while one instruction is being decoded, the next is already being fetched from memory, so the fetch unit and the decode unit are in use at the same time.

The speed mismatch problem: the core problem of computing

The following figures scale the speed differences between the CPU, memory, hard disk, and network: if one CPU cycle took 1 second, a main-memory access would take about 6 minutes, a hard-disk access 1 to 12 months, and a network round trip about 19 years. The differences between these devices are enormous; on this scale the network is hundreds of millions of times slower than the CPU. So can the CPU afford to sit back and wait while memory or the hard disk does its slow work? Or is there a way out of this situation?

How to solve the speed mismatch problem

1. Make hard disks and other devices as fast as the CPU (currently impossible)

2. Accept the limitation, but squeeze every last drop of performance out of the CPU.

  • A classic example of asynchrony is Direct Memory Access (DMA). After the CPU initiates a read from the hard disk, it immediately moves on to other work instead of waiting for the disk to finish. The DMA controller transfers the data from disk into memory and notifies the CPU (via an interrupt) when the read completes.

  • Sequential -> concurrent. Sequential execution means programs run one after another in order. Concurrency is the interleaved execution of different instruction streams on a single CPU by switching time slices; because each slice is very short, humans cannot perceive the switching, so multiple programs appear to run simultaneously. Parallelism is the truly simultaneous execution of multiple programs on multiple CPUs.

  • Adding an intermediate layer: locality. The principle of locality divides into temporal locality and spatial locality. Temporal locality: if an instruction is executed once, it is likely to be executed again soon; if a piece of data is accessed, it is likely to be accessed again soon. Spatial locality: once a program accesses a storage location, nearby locations are likely to be accessed soon as well. For the CPU, adding an intermediate layer means adding a cache. When the CPU reads data, it first searches the cache; on a hit, the data is sent to the CPU immediately. On a miss, the data is read from the relatively slow memory and sent to the CPU, and at the same time the whole block containing that data is brought into the cache, so that future reads within that block are served from the cache without going to memory.

Welcome to follow the WeChat public account Mukeda; all articles are synchronized there.