Many of us programmers dream of becoming experts. We may be immersed in various frameworks and believe the framework is everything, that the application layer is what matters most. That is a mistake. In today's computer industry, being able to use tools is only the baseline; understanding the principles behind them is what lets you go further, and the fundamentals of computing are the most important part of that. Follow along below as I introduce these fundamentals.

CPU

Not familiar with CPUs? Let's walk through what a CPU is.

CPU stands for Central Processing Unit, and it is arguably the most hardcore component of your computer. The CPU is the core component that makes your computer a computer, but it is not the whole computer; the CPU is to the computer what the brain is to a person. At its core, the CPU takes instructions from programs or applications and performs calculations. This process can be divided into three key stages: fetch, decode, and execute. The CPU fetches an instruction from the system's main memory, decodes what the instruction actually says, and then the relevant parts of the CPU execute it.

CPU internal processing

The following figure shows the running process of a general program (taking C language as an example). It can be said that understanding the running process of a program is the basis and prerequisite for mastering the running mechanism of a program.

In this process, the CPU is responsible for interpreting and running the content that is ultimately translated into machine language.

The CPU consists of two main parts: the control unit and the arithmetic logic Unit (ALU)

  • Control unit: extracts instructions from memory and decodes them
  • Arithmetic logic unit (ALU) : Handles arithmetic and logical operations

The CPU is the heart and brain of a computer. It and memory are electronic components made up of many transistors. It receives data input, executes instructions and processes information. It communicates with input/output (I/O) devices that send and receive data to and from the CPU.

Functionally, the CPU's internals consist of four parts: registers, the controller, the arithmetic unit, and the clock, which communicate with one another electrically.

  • Registers: part of the central processing unit. They hold instructions, data, and addresses; you can think of them as a kind of memory. Depending on the type, a CPU can have between 20 and 100 internal registers.
  • Controller: reads instructions and data from memory into the registers and controls the computer according to the results of executing the instructions.
  • Arithmetic unit: performs computations on the data read from memory into the registers.
  • Clock: issues the clock signal that paces the CPU.

A CPU is a collection of registers

Of these four parts, programmers really only need to understand the registers; the other three require little of our attention. Why? Because a program is essentially described in terms of registers.

Different types of CPUs have different kinds of internal registers, different numbers of them, and different ranges of values they can store. However, by function, registers can be divided into the following categories.

  • Accumulator register: stores the data being operated on and the results of operations.
  • Flag register: reflects the processor's state and the results of operations, and controls the execution of instructions.
  • Program counter: holds the address of the memory cell where the next instruction resides.
  • Base address register: holds the starting address of the memory area where data is stored.
  • Index register: holds an offset (relative address) from the base address register.
  • General-purpose register: stores arbitrary data.
  • Instruction register: holds the instruction currently being executed; it is used internally by the CPU and cannot be read or written by the programmer.
  • Stack register: holds the starting address of the stack area.

There is only one of each of the program counter, accumulator register, flag register, instruction register and stack register; the other kinds of registers generally exist in multiples.

Each register is described below

Program counter

The Program Counter is used to store the address of the cell in which the next instruction resides.

When the program starts, the initial value of the PC is the address of the program's first instruction. As the program executes sequentially, the controller first fetches an instruction from memory at the address indicated by the program counter, then analyzes and executes it. At the same time, the PC is incremented by 1 so that it points to the next instruction to be executed.

Let’s take a detailed look at the execution of the program counter with an example

This is a sum operation. The program starts, and after compilation and parsing, the operating system copies the program from the hard disk to memory. In this example, the program adds 123 and 456, and outputs the results to the monitor.

Address 0100 is where the program starts. When an operating system such as Windows copies a program from hard disk to memory, it sets the program counter to the starting position 0100 and then runs the program. With each instruction, the program counter increases by one (or is set directly to the address of the next instruction). Instructions are read from memory and executed; in other words, the program counter controls the flow of the program.

Conditional branching and loop mechanisms

Control flow in a high-level language falls mainly into three kinds: sequential execution, conditional branches, and loops. Sequential execution runs instructions in address order. A conditional branch executes the instruction at an arbitrary address depending on a condition. A loop repeatedly executes the instructions at the same addresses.

  • Sequential execution is simple: the program counter is incremented by 1 for each instruction executed.
  • Conditional branches and loops set the program counter to an arbitrary address, so the program can go back to an earlier address and repeat the same instructions, or jump ahead to any instruction.

The conditional branch is used as an example to illustrate the execution of the program (the loop is similar)

The program starts the same way as in the sequential case: the CPU begins executing from 0100, the instructions at 0100 and 0101 run in order, and the PC is incremented by 1 each time. When the instruction at address 0102 executes, it checks whether the value being tested (stored at 0106) is greater than 0; if so, execution jumps to the instruction at address 0104, the value is output to the display, and the program ends. The instruction at 0103 is skipped. This is just like an if() check in our programs: when the condition is not met, the corresponding instructions are skipped. In that case the PC is not simply incremented by 1; it is set to the address of the next instruction to execute.

Flag register

Conditional branches and loops use jump instructions, which decide whether to jump based on the result of the most recent operation. We mentioned the flag register above: the flag register records whether the result currently in the accumulator register is positive, negative, or zero.

During an operation, the CPU automatically sets the flag register according to the result. The three states positive, negative and zero are represented by three bits of the flag register: when the first, second, or third of these bits is 1, the result is positive, zero, or negative, respectively.

The CPU's execution mechanism here is interesting. If the value XXX stored in the accumulator register is compared with the value YYY stored in a general-purpose register, the CPU actually performs a subtraction behind the comparison. Whether the result of that subtraction is positive, zero or negative is saved in the flag register: if it is positive, XXX is larger than YYY; if it is zero, XXX equals YYY; if it is negative, XXX is smaller than YYY. In other words, a program's comparison instructions are really doing subtraction inside the CPU.
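As a rough illustration in C (this is ordinary application code, not the CPU's internal circuitry; the variable names are made up), checking the sign of a difference mirrors what the flag bits record:

#include <stdio.h>

int main(void) {
    int xxx = 100, yyy = 50;
    int diff = xxx - yyy;   /* the CPU performs a subtraction behind the comparison */

    if (diff > 0)
        printf("XXX is larger than YYY\n");   /* the "positive" flag bit */
    else if (diff == 0)
        printf("XXX is equal to YYY\n");      /* the "zero" flag bit */
    else
        printf("XXX is smaller than YYY\n");  /* the "negative" flag bit */
    return 0;
}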

Function call mechanism

Next, we move on to the function call mechanism. Even for programs written in high-level languages, calling a function is handled by setting the program counter to the address where the function is stored. Unlike a plain jump, however, a function must eventually return, so a simple jump instruction is not enough. Here is an example of how a function call jump is implemented.

In the figure, variables a and b are assigned the values 123 and 456, and calling MyFun(a, b) causes an instruction jump. The addresses shown are those assigned when the C code is compiled into machine language; they are scattered because one line of C is usually converted into several lines of machine language. After MyFun(a, b) finishes, execution returns to the instruction following the call, and the CPU continues from there.
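The figure itself is not reproduced here, so below is a minimal C sketch of the program it describes (the function body and the name of the calling function are illustrative assumptions; only the call and return matter):

int MyFun(int a, int b) {
    return a + b;              /* the body is not important for the call mechanism */
}

void Caller(void) {
    int a = 123;
    int b = 456;
    MyFun(a, b);               /* jump to MyFun; the return address is saved */
    /* execution resumes here after MyFun returns */
}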

The call and return instructions are the key. Before setting the program counter to the function's entry address, the call instruction stores the address of the instruction that should run after the function returns in an area of main memory called the stack. When the function finishes, the return instruction at the function's exit is executed; its job is to set the address saved on the stack back into the program counter. In the example, the address 0154 is saved on the stack before MyFun is called, and when MyFun finishes, 0154 is loaded back into the program counter. The call proceeds as follows.

When a high-level language is compiled, a function call is converted into a call instruction, and the end of the function's processing is converted into a return instruction.

Implement an array by address and index

Next we look at the base address register and the index register. With these two registers we can treat a specific area of main memory like an array. First, memory addresses, written in hexadecimal, run from 00000000 to FFFFFFFF. A single 32-bit register can reach any address in this range, but if you want to treat a specific region of memory like an array and scan through it continuously, it is more convenient to use two registers together.

For example, we can use two registers (the base address register and the index register) to designate values in memory.

This representation is very much like an array, which is a sequence of same-sized data elements laid out contiguously in memory. The array name refers to the whole collection of values, and the index distinguishes the individual elements, for example a[0] to a[4], where the 0-4 inside the brackets are the array's indices.
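As a rough analogy in C (a pointer and a loop variable stand in for the base address register and the index register; this is not literally what the hardware does):

#include <stdio.h>

int main(void) {
    int a[5] = {10, 20, 30, 40, 50};
    int *base = a;                      /* plays the role of the base address register */

    for (int i = 0; i < 5; i++) {       /* i plays the role of the index register */
        printf("a[%d] = %d, *(base + %d) = %d\n", i, a[i], i, *(base + i));
    }
    return 0;
}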

CPU instruction execution process

In almost all von Neumann CPUs, the work of executing an instruction can be divided into five stages: instruction fetch, instruction decode, execute, memory access, and write back.

  • Instruction fetch: the instruction is read from memory into the CPU, using the address held in the register that stores the address of the next instruction (the program counter).
  • Instruction decode: the instruction decoder splits and interprets the fetched instruction according to the predefined instruction format, identifying the instruction type and how its operands are to be obtained.
  • Execute: after decoding, the instruction is carried out; this stage performs the operations the instruction specifies and realizes its function.
  • Memory access: depending on the instruction, data may need to be fetched from memory; this stage computes the operand's address in main memory from the instruction's address code and reads the operand for the operation.
  • Write back: as the last stage, the write back (WB) stage writes the result of the execute stage to some form of storage, usually the CPU's internal registers, so that subsequent instructions can access it quickly.

memory

The CPU and memory are like inseparable partners. Without memory, the CPU cannot execute program instructions, so the computer would be meaningless; with only memory and no instructions to execute, the computer still would not run.

So what is memory? How do memory and CPU interact? Here is an introduction

What is memory

Memory is one of the most important components in a computer; it is the bridge between a program and the CPU. All programs in the computer run in memory, so memory has a very large impact on the computer. Memory is also known as main memory. Its role is to hold the data the CPU is computing with, as well as the data exchanged with the hard disk and other external storage devices. As long as the computer is running, the CPU moves data into main memory for calculation and transmits the results out when the calculation completes. How main memory operates also determines how stably the computer runs.

The physical structure of memory

Internally, memory is made up of various IC (integrated circuit) chips. There are many kinds, but they mainly fall into three categories of memory:

  • Random access memory (RAM): the most important type of memory, from which data can be both read and written. When the machine shuts down, the information in RAM is lost.
  • Read-only memory (ROM): ROM can only be read, not written, but its data is not lost when the machine is powered off.
  • Cache: caches are also common; they are divided into L1 cache, L2 cache and L3 cache. A cache sits between main memory and the CPU and is memory that can be read and written faster than main memory. When the CPU writes data, the data is also written to the cache; when the CPU needs to read data, it reads it directly from the cache. Of course, if the data is not in the cache, the CPU reads it from main memory.

A memory IC is a complete package: it has pins for power, address signals, data signals and control signals, and it reads and writes data by addressing through these pins. Below is a schematic of a hypothetical memory IC's pins.

In the figure, VCC and GND are the power pins, A0-A9 are the address signal pins, D0-D7 are the data signal pins, and RD and WR are the control signals; I use different colors to distinguish them. After connecting power to VCC and GND, you can pass 0 and 1 signals to the other pins, and in most cases +5V means 1 and 0V means 0.

We all know memory is used to store data, so how much data can this memory IC hold? D0-D7 are the data signals, meaning 8 bits = 1 byte of data can be input or output at a time. A0-A9 are ten address signals, meaning 2^10 = 1024 addresses can be specified (0000000000 to 1111111111). Each address holds 1 byte of data, so we can say this memory IC has a capacity of 1 KB.

The process of reading and writing memory

Let’s focus on the process of memory IC reading and writing data. Let’s look at a model for writing and reading data to a memory IC

To describe the process in detail, suppose we want to write 1byte of data to a memory IC. The process looks like this:

  • To write, first connect +5V to VCC and 0V to GND, use A0-A9 to specify the address where the data is to be stored, put the data's value on the D0-D7 data signals, and set WR (write) to 1. After these operations, the data is written into the memory IC.
  • To read, you only need to specify the location of the data through the A0-A9 address signals and then set RD (read) to 1.
  • RD and WR are called control signals. When both WR and RD are 0, neither writing nor reading can be performed.

A realistic model of memory

To make this easier to picture, we can map the memory model onto the real world: memory is very much like a building. In this building, each floor stores one byte of data, and the floor number is the address. Below is a diagram combining the memory model with the building.

As we know, the data in a program has not only values but also data types, which from memory's point of view means how much memory (how many floors) a value occupies. Even though memory is physically read and written one byte at a time, a program can read and write data in units of a specific number of bytes by specifying a data type.
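A small sketch of this idea in C: the data type decides how many bytes (how many "floors") a value occupies. The sizes in the comments are typical values, not guarantees of the C standard.

#include <stdio.h>

int main(void) {
    printf("char : %zu byte(s)\n", sizeof(char));    /* always 1 */
    printf("short: %zu byte(s)\n", sizeof(short));   /* typically 2 */
    printf("int  : %zu byte(s)\n", sizeof(int));     /* typically 4 */
    printf("long : %zu byte(s)\n", sizeof(long));    /* typically 4 or 8 */
    return 0;
}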

binary

As we all know, at the lowest level a computer uses binary to represent and transfer data. So why does a computer use binary? Put another way, what is a binary number? And going a step further, how do you add, subtract, multiply and divide with binary? Let's take a look.

What is a binary number

So what is a binary number? To illustrate, let's first convert the binary number 00100111 to decimal. Converting binary to decimal means multiplying the value in each position by that position's bit weight and summing the results, so let's convert the number above.

In other words, the binary number 00100111 is 39 in decimal. The decimal 39 is not just "3 and 9": it is 3 * 10 + 9 * 1, where 10 and 1 are the bit weights. Likewise, in the example above the exponents of the bit weights, from the highest position to the lowest, are 7, 6, 5, 4, 3, 2, 1, 0. This exponent is also called the power, so the highest position's weight is 2 to the 7th, the next is 2 to the 6th, and so on. Binary numbers always use 2 as the base, just as a decimal number uses base 10. In either case, the exponent of a position's weight is that position's number minus 1: the first position's exponent is 1 - 1 = 0, the second position's is 2 - 1 = 1, and so on.

So a binary number is simply a number written with 0s and 1s; its base is 2, and its value is the sum of each digit multiplied by its bit weight. We normally speak in decimal, where 39 is 3 * 10 + 9 * 1; in binary, 00100111 is 1 * 32 + 1 * 4 + 1 * 2 + 1 * 1 = 39.
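A small C check of the conversion above (the string and loop are just one way of summing digit times bit weight):

#include <stdio.h>

int main(void) {
    const char *bits = "00100111";
    int value = 0;

    for (int i = 0; bits[i] != '\0'; i++) {
        value = value * 2 + (bits[i] - '0');   /* same as summing digit * 2^position */
    }
    printf("binary 00100111 = decimal %d\n", value);   /* prints 39 */
    return 0;
}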

Shift operation and multiplication and division

Now that we know what binary is, let's look at binary arithmetic. Just as with decimal numbers, addition, subtraction, multiplication and division all work for binary numbers, as long as you remember to carry at 2. Binary operations are also something unique to computer programs, so it is necessary to understand them.

First, let's introduce the shift operation. A shift moves the digits of a binary value to the left or to the right, as shown in the figure below.
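Since the figure is not reproduced here, a small C sketch of the left shift: shifting left by n multiplies the value by 2 to the n, as long as no bits overflow.

#include <stdio.h>

int main(void) {
    int x = 39;                            /* binary 00100111 */
    printf("%d << 1 = %d\n", x, x << 1);   /* 78  = 39 * 2 */
    printf("%d << 2 = %d\n", x, x << 2);   /* 156 = 39 * 4 */
    return 0;
}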

complement

We did not show the right shift just now, because the high-order bits left empty after a right shift can be filled in two ways, with 0 or with 1. To know when to add 0 and when to add 1, you first need to know how negative numbers are represented in binary.

When negative values are represented in binary numbers, the highest bit is usually used as a sign, so we use this highest bit as a sign bit. A sign bit 0 represents a positive number and a sign bit 1 represents a negative number. So what’s minus one in binary terms? Many people might think that since the binary number of 1 is 0000 0001, and the highest bit is the sign bit, the correct representation of -1 should be 1000 0001, but is that really the right answer?

There is no separate subtraction in the computer's world: when the computer does subtraction it is actually doing addition, that is, subtraction is implemented with addition. For example, 100 - 50 is, from the computer's point of view, 100 + (-50). For this reason, when writing negative numbers you have to use the binary complement, which represents a negative value using an ordinary (non-negative) bit pattern.

To get the complement, we invert all the bits of the binary number and then add 1 to the result. Keep that in mind; let's demonstrate it.

To be specific: take the binary representation of the value, invert each of its bits (0 becomes 1, 1 becomes 0), and finally add 1 to the inverted result. That completes the computation of the complement.

Obtaining the complement may be hard to grasp intuitively, but it is logically rigorous. For example, let's walk through 1 - 1, first representing -1 with the earlier (incorrect) guess 1000 0001.

Strangely, 1 - 1 comes out as 1000 0010, that is 130, instead of 0. It follows that using 1000 0001 to mean -1 is completely wrong.

So what is the correct representation? Actually, we already gave the answer above: it is 1111 1111. Let's verify that this is true.

We can see that 1 - 1 is really 1 + (-1). Taking the complement of 0000 0001 (invert the bits, then add 1) gives 1111 1111; adding that to 0000 0001 gives 1 0000 0000. An overflow occurs, and the computer simply discards the overflow bit, so the result is 0000 0000, which is 0. That is correct, so 1111 1111 is indeed -1.

So the binary representation of a negative number is its complement, and the complement is obtained by inverting the bits of the original number and adding 1 to the result.
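A small C sketch of the rule: inverting the bits and adding 1 produces the complement (shown here on a single byte).

#include <stdio.h>

int main(void) {
    unsigned char one = 0x01;                              /* 0000 0001 */
    unsigned char minus_one = (unsigned char)(~one + 1);   /* invert, then add 1 */

    printf("complement of 1 = 0x%02X\n", minus_one);       /* 0xFF = 1111 1111 */
    printf("1 + (-1) in one byte = 0x%02X\n",
           (unsigned char)(one + minus_one));              /* 0x00: overflow bit discarded */
    return 0;
}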

The difference between an arithmetic right shift and a logical right shift

Now that we know about the complement, let's revisit the right shift. The high-order bits left empty by a right shift can be filled in two ways: with 0 or with 1.

When a binary number is right-shifted as a signed value, the vacated high-order bits must be filled with the value the sign bit had before the shift (0 or 1). This is called an arithmetic right shift. If the value is a negative number represented in complement form, filling the vacated high-order bits with 1 correctly implements division by 2, 4, 8 and so on. If the value is positive, simply filling with 0 is enough.

Let’s look at an example of a right shift. We’re going to move negative 4 two places to the right, and we’re going to look at each of them

As the figure above shows, with a logical right shift, -4 shifted right two places becomes 63, which is obviously not a quarter of -4, so a logical shift cannot be used here. With an arithmetic right shift, -4 shifted right two places becomes -1, which is exactly a quarter of -4, so an arithmetic shift is what is needed.

So we can draw a conclusion: when shifting left, whether the value is treated as a bit pattern or as a number, you only need to fill the low-order positions with 0; when shifting right, you have to decide whether a logical or an arithmetic shift is required.
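A small C sketch of the difference. Note that in C, right-shifting a negative signed value is implementation-defined; most compilers perform an arithmetic shift, while unsigned values always get a logical shift.

#include <stdio.h>

int main(void) {
    int s = -4;               /* ...1111 1100 in two's complement */
    unsigned char u = 0xFC;   /* the same low 8 bits, treated as unsigned */

    printf("arithmetic: -4 >> 2 = %d\n", s >> 2);      /* -1 on most compilers */
    printf("logical   : 0xFC >> 2 = %d\n", u >> 2);    /* 63 */
    return 0;
}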

Here is a look at sign extension: sign extension produces a result with more digits (for example, twice as many) without changing the value. Some instructions require operands of a certain width, for example a dividend that must be longer than the divisor, or simply use more digits to reduce error during calculation.

Take 8-bit binary numbers: sign extension means converting them to 16-bit or 32-bit binary numbers without changing their value. Converting the positive 8-bit value 0111 1111 to 16 bits easily gives the correct result 0000 0000 0111 1111. But what about a value represented in complement form, such as 1111 1111? It simply becomes 1111 1111 1111 1111. In other words, whether the value is positive or a negative number in complement form, you just fill the new high-order bits with copies of the sign bit, 0 or 1.
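A small C sketch: assigning a signed 8-bit value to a wider type copies the sign bit into the new high-order bits, so the value stays the same.

#include <stdio.h>

int main(void) {
    signed char c = (signed char)0xFF;   /* 1111 1111 = -1 */
    short s = c;                         /* sign-extended to 1111 1111 1111 1111 */
    int   i = c;                         /* sign-extended to 32 bits */

    printf("c = %d, s = %d, i = %d\n", c, s, i);         /* all print -1 */
    printf("s as bits = 0x%04hX\n", (unsigned short)s);  /* FFFF */
    return 0;
}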

Relationship between memory and disk

As we all know, the five basic components of a computer are the arithmetic unit, the controller, memory, input devices and output devices. From the perspective of storage, "memory" covers both main memory and disk. We introduced main memory above; the following introduces the relationship between disk and memory.

Programs cannot run without reading into memory

The main storage components of a computer are memory and disk. Programs stored on disk must be loaded into memory to run. Programs stored on disk cannot run directly. This is because the CPU, which is responsible for parsing and running the program contents, needs to read the program instructions by specifying the memory address through the program counter.

The disk structure

Disk cache

As mentioned above, disk and memory have a close, cooperative relationship. Memory often reads the same data from disk again and again, so there needs to be somewhere to keep the content that is read frequently. We often use caching when building software, and the hardware level is no exception: this is the disk cache.

Disk caching means storing data read from disk in memory, so that when the same content is needed again it is read from the disk cache instead of the actual disk. Every technology or framework exists to solve some problem, and disk caching greatly improves effective disk access speed.

Virtual memory

Virtual memory is the second medium for memory and disk interaction. Virtual memory is the use of a portion of the disk as imaginary memory. This is in contrast to disk caching, which is imaginary disk (actually memory), and virtual memory, which is imaginary memory (actually disk).

Virtual memory is a memory-management technique of computer systems. It lets an application believe it has contiguous available memory (a complete address space), while in reality that space is usually split into multiple physical fragments, some of which are kept on disk and swapped in and out as needed.

With the help of virtual memory, programs can still run when memory is low. For example, you can run a 10 MB program with only 5 MB of memory left. Since the CPU can only execute programs loaded into memory, parts of the virtual memory space must be swapped in and out of real memory as the program runs.

How to swap virtual memory with memory

There are two ways to implement virtual memory: paging and segmentation. Windows uses paging. In this scheme, the program is divided into pages of a fixed size, without regard to the program's structure, and pages are swapped in and out. In paging, reading a page from disk into memory is called a Page In, and writing a page from memory out to disk is called a Page Out. The page size on a Windows computer is 4 KB, so an application is split into 4 KB pages, kept on disk page by page, and swapped page by page.

To implement virtual memory, Windows provides a virtual-memory file (the page file) on disk. This file is created and managed by Windows; its size corresponds to the virtual memory and is usually 1 to 2 times the size of physical memory.

Physical structure of the disk

Now that we’ve looked at the physical structure of CPU and memory, let’s look at the physical structure of disk. The physical structure of a disk refers to how data is stored on the disk.

A disk is used by dividing its physical surface into multiple spaces. There are two ways to do this: variable-length and sectors. The former divides the surface into spaces of variable length, the latter into spaces of fixed length. Windows hard disks and floppy disks use sectors. With sectors, the disk surface is first divided into concentric rings called tracks, and each track is divided into fixed-size storage spaces called sectors.

A sector is the smallest unit of physical read and write on a disk; on disks used by Windows a sector is usually 512 bytes. However, Windows logically reads and writes disks in units of a whole number of sectors, called clusters. A cluster can be 512 bytes (1 sector per cluster), 1 KB (2 sectors per cluster), 2 KB, 4 KB, 8 KB, 16 KB, or 32 KB (64 sectors per cluster). A cluster is thus always a whole number of sectors; when a cluster is one sector, the two are equal in size.

Compression algorithm

We have all compressed and decompressed files, using compression to shrink files that have become too large. For example, WeChat limits file uploads to 100 MB; if I have a folder that is too big to upload, I can compress it so that it comes in under 100 MB and then upload it.

In addition, when we save the photos taken by the camera to the computer, we also use a compression algorithm for file compression, the file compression format is generally JPEG.

So what is a compression algorithm? How is a compression algorithm defined? Before we get to the algorithms, we need to know a little about how files are stored.

File storage

A file is a way of storing data on a storage medium such as a disk. The most basic unit of data stored in a program file is the byte, and a file's size is expressed as xxx KB or xxx MB precisely because files are stored and measured in bytes (B).

A file is a collection of byte data. One byte (8 bits) can take 256 different values, 0000 0000 to 1111 1111 in binary. If the data stored in a file is text, the file is a text file; if it is an image, the file is an image file. In either case, the bytes of the file are stored consecutively.

Definition of compression algorithm

Since a file is really a collection of byte data, we can now give a definition of a compression algorithm.

A compression algorithm is an algorithm that compresses data; using one involves two steps, compression and decompression.

In essence, it is an algorithm that reduces the space a file's bytes occupy without changing the content the file represents.

According to the definition of compression algorithm, we can divide it into different types:

Lossy and lossless

Lossless compression: the compressed data can be reconstructed without distortion, recovering the original data exactly. It is used where data must be exact, such as executable files, ordinary documents, disks, and multimedia data with strict accuracy requirements. The achievable compression is smaller. Examples: difference coding, RLE, Huffman coding, LZW coding, arithmetic coding.

Lossy compression: there is distortion; the original data cannot be recovered exactly, and the reconstructed data is only an approximation of the original. It is used where exactness is not required, such as multimedia data compression. The achievable compression is larger. Examples: predictive coding, perceptual audio coding, fractal compression, wavelet compression, JPEG/MPEG.

symmetry

If encoding and decoding have roughly the same complexity and take roughly the same time, the method is symmetric, and most compression algorithms are symmetric. There are also asymmetric ones, generally hard to encode but easy to decode, such as Huffman coding and fractal coding. Encoding methods used in cryptography are the opposite: easy to encode and very hard to decode.

Between frames and within frames

In video coding, intra-frame and inter-frame coding are used together. Intra-frame coding is done entirely within a single frame of the image, like still-image coding such as JPEG; inter-frame coding refers to the preceding and following frames when encoding and decoding, taking the temporal redundancy between frames into account, as in MPEG.

Real time

In some multimedia applications, data need to be processed or transmitted in real time (such as live digital recording and video recording, MP3/RM/VCD/DVD, video/audio on demand, live network broadcast, video telephone, and video conference). The codec delay is generally less than or equal to 50 ms. This requires simple/fast/efficient algorithms and high speed/complex CPU/DSP chips.

Scalability

Some compression algorithms can simultaneously process multimedia data of different resolutions, different transmission rates and different quality levels, such as JPEG2000 and MPEG-2/4.

These concepts are abstract and are mainly meant to give you an idea of how compression algorithms are classified. Next, we will look concretely at the characteristics and strengths of several commonly used compression algorithms.

Understanding of several commonly used compression algorithms

Mechanism of RLE algorithm

Let's take a formal look at file compression. First, let's try to compress a text file containing the 17 half-width characters AAAAAABBCDDEEEEEF. The text has no practical meaning, but it is a good way to describe the RLE compression mechanism.

Since half-width characters are stored as one byte each in a file, the file's size is 17 bytes, as shown in the figure.

So, how do you compress this file? Any compression method that makes the file smaller than 17 bytes will do.

The most obvious approach, which you have probably already thought of, is to collapse runs of the same character into the character followed by the number of times it repeats. Compressed this way, the file above looks like this.

From the figure, the 17 characters AAAAAABBCDDEEEEEF are compressed into the 12 characters A6B2C1D2E5F1; 12/17 ≈ 70%, so the compression ratio is about 70% and the compression succeeds.

This compression method, which expresses the file contents as data * repeat count, is called the RLE (Run Length Encoding) algorithm. RLE is a good compression method, often used to compress fax images and the like; since an image file is ultimately also a collection of byte data, RLE can be used to compress it.
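Here is a minimal RLE sketch in C under simplifying assumptions (runs are capped at 9 so each count fits in a single character; real RLE implementations handle longer runs and binary data):

#include <stdio.h>
#include <string.h>

/* Compress 'in' as character + repeat count, e.g. "AAAAAA" -> "A6". */
void rle_compress(const char *in, char *out) {
    size_t n = strlen(in);
    size_t o = 0;
    for (size_t i = 0; i < n; ) {
        size_t run = 1;
        while (i + run < n && in[i + run] == in[i] && run < 9) {
            run++;
        }
        out[o++] = in[i];               /* the character itself */
        out[o++] = (char)('0' + run);   /* how many times it repeats */
        i += run;
    }
    out[o] = '\0';
}

int main(void) {
    char out[64];
    rle_compress("AAAAAABBCDDEEEEEF", out);
    printf("%s\n", out);   /* prints A6B2C1D2E5F1: 12 bytes instead of 17 */
    return 0;
}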

Huffman algorithm and Morse code

Next we introduce another compression algorithm, the Huffman algorithm. Before learning it, you must let go of the idea that each half-width character is necessarily 1 byte (8 bits) of data. Let's look at the basic idea of the Huffman algorithm.

Text files are composed of different characters, and different characters occur different numbers of times. For example, in a text file it is not unusual for A to appear around 100 times while Q is used only 3 times. The key of the Huffman algorithm is to represent frequently occurring data with fewer than 8 bits and infrequently occurring data with more than 8 bits. If A and Q are both represented with 8 bits, the original file's size is 100 * 8 bits + 3 * 8 bits = 824 bits. If A is represented with 2 bits and Q with 10 bits, it is 2 * 100 + 10 * 3 = 230 bits.

Note, however, that the final disk storage is 8 bits per byte to hold files.

Huffman algorithm is a bit more complicated, but before we get into it let’s have some dessert and learn about Morse code. You must have seen American TV shows or war movies, and Morse code is often used to transmit information during war, as shown below

Now let’s talk about Morse code. Here’s an example of Morse code. Think of 1 as a short dot (di) and 11 as a long dot (da).

Morse code generally assigns the shortest codes to the characters that appear most frequently in ordinary text. As the table shows, if a short tone is 1 and a long tone is 11, then the character E (dit) can be represented with the single bit 1, and C (dah-dit-dah-dit) with the 9 bits 110101101. In real Morse code, if the length of a short tone is 1, a long tone has length 3 and the gap between tones within a character has length 1, where "length" refers to the duration of the sound. Now let's rewrite the AAAAAABBCDDEEEEEF example above in Morse code, separating characters with a symbol that represents the time gap; here we will use 00.

So the text AAAAAABBCDDEEEEEF becomes A * 6 + B * 2 + C * 1 + D * 2 + E * 5 + F * 1 + character gap * 16 = 4 bits * 6 + 8 bits * 2 + 9 bits * 1 + 6 bits * 2 + 1 bit * 5 + 8 bits * 1 + 2 bits * 16 = 106 bits ≈ 14 bytes.

So the compression ratio using Morse code is 14/17 ≈ 82%. The efficiency is not outstanding.

Implement Huffman algorithm with binary tree

As mentioned earlier, Morse code decides how long each character's code is based on how often that character appears in everyday text. However, that assignment is not the most efficient one for a specific text such as AAAAAABBCDDEEEEEF.

Now let's look at the Huffman algorithm. The Huffman algorithm builds the optimal coding system for each file being compressed and compresses the file on the basis of that coding system, so which codes (the Huffman codes) are used to split the data depends on the file. A file compressed with the Huffman algorithm stores both the Huffman code information and the compressed data.

Next, let's assign codes to the characters A-F in AAAAAABBCDDEEEEEF according to the principle that characters with high frequency should be represented with as few bits as possible. Sorted from highest to lowest frequency, the characters and a candidate coding scheme are listed below.

Character   Frequency   Code (scheme)   Bits
A           6           0               1
E           5           1               1
B           2           10              2
D           2           11              2
C           1           100             3
F           1           101             3

In the coding scheme above, as the frequency decreases, the number of bits in the character's code gradually increases, from 1 and 2 bits at the start to 3 bits. But this coding system has a problem: you cannot tell whether the 3-bit sequence 100 means E, A, A (1, 0, 0), or B followed by A (10, 0), or C (100).

In the Huffman algorithm, by building the coding system as a Huffman tree, codes can be constructed that are decoded unambiguously even without separator characters. The Huffman tree algorithm is somewhat more involved; the following shows the construction process of a Huffman tree.

A natural tree grows from the root toward the leaves, whereas a Huffman tree is built starting from the leaves and working toward the root.

Huffman trees can increase the compression ratio

With a Huffman tree, more frequent data is represented with fewer bits, which is the core idea of the Huffman tree. Step 2 in the figure above shows that when connecting branches we start from the least frequent data; this means that less frequent data ends up farther from the root, passing through more branches, and more branches means more bits in its code.

Next, let's look at the compression ratio achieved with the Huffman tree in the figure above. With that tree, AAAAAABBCDDEEEEEF becomes 000000000000 100100 110 101101 0101010101 111, which is 40 bits = 5 bytes. Before compression the data was 17 bytes; after compression it is only 5 bytes, a compression ratio of 5/17 ≈ 29%. Such a high compression ratio is remarkable.
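As a small check, the 40 bits can be recomputed from the code lengths. The codes used below (A=00, E=01, B=100, D=101, C=110, F=111) are inferred from the bit string above; treat them as an assumption rather than a copy of the figure.

#include <stdio.h>
#include <string.h>

int main(void) {
    const char *text = "AAAAAABBCDDEEEEEF";
    /* code length in bits for the characters A, B, C, D, E, F */
    const int code_len[6] = {2, 3, 3, 3, 2, 3};

    int total_bits = 0;
    for (size_t i = 0; i < strlen(text); i++) {
        total_bits += code_len[text[i] - 'A'];
    }
    printf("total = %d bits = %d byte(s)\n", total_bits, (total_bits + 7) / 8);
    /* prints: total = 40 bits = 5 byte(s) */
    return 0;
}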

For reference, the Huffman tree can be used as a compression algorithm for any type of data:

File type    Before compression   After compression   Compression ratio
Text file    14862 bytes          4119 bytes          28%
Image file   96062 bytes          9456 bytes          10%
EXE file     24576 bytes          4652 bytes          19%

Reversible compression and non-reversible compression

Finally, let’s look at the data form of the image file. The purpose of image file is to output image data to display, printer and other devices. Commonly used image formats are: BMP, JPEG, TIFF, GIF format, etc.

  • BMP: An image form created by using the Windows paintbrush
  • JPEG: a form of image data commonly used in digital cameras
  • TIFF: Is a form of image that quickly displays the nature of the data by including “tags” in the file
  • GIF: A form of data developed in the United States that requires no more than 256 colors

Image files can use the RLE and Huffman algorithms introduced earlier, but in most cases image files do not require the data to be restored exactly to its pre-compression state; losing some data is acceptable. Compression that can restore the data to its pre-compression state is called reversible compression, and compression that cannot is called non-reversible compression.

Generally speaking, JPEG files use non-reversible compression, so some image information is blurred after restoration; GIF uses reversible compression.

The operating system

Operating system Environment

A program includes its operating environment; we can say operating environment = operating system + hardware. The operating system itself is software, composed of a series of instructions. We are not going to cover the operating system in depth here; we will focus on how hardware is identified.

I'm sure we've all played games. What do you do before playing one? You check whether your laptop or desktop can actually run it. Here is a game's configuration requirements (I miss WoW).

The main configuration in the figure is as follows

  • Operating system version: which system environment the application runs in. There are three main operating system environments on the market: Windows, Linux and Unix. The large games we play almost all run on Windows; you could say Windows is the heaven of gaming. Windows itself comes in 32-bit and 64-bit versions, which are not compatible with each other.

  • Processor: the processor is the CPU. A computer's computing power is usually described by the number of instructions it can process per second. For a deeper understanding, read another post by this blogger: CPU core knowledge programmers need to know.

  • Graphics card: the graphics card handles graphics output, so it is also known as the GPU (Graphics Processing Unit). It matters a great deal: for example, a game I used to play (Blade & Soul) would stutter when I turned the graphics up to level 5 (which really just makes the image clearer), and that was the graphics card failing to keep up.

  • Memory: memory here means main memory, the space the application has for its instructions and data while it is running. Its size also affects how fast your computer runs.

  • Storage: storage is the disk space needed to install the application. As the figure shows, the game needs at least 5 GB of free space, which is what we have to set aside to install it.

The type of CPU is a particularly important parameter from the point of view of the environment in which the program is running. In order for the program to run properly, the minimum CPU configuration must be met.

The CPU can only interpret its own native language. Different CPUs interpret different kinds of machine language, and machine language programs are called native code. Programs written by programmers in a high-level language such as C are simply text files; text files (leaving aside text encoding) can be displayed and edited in any environment, and we call them source code. By compiling the source code, you get native code. The diagram below illustrates this process.


The Windows operating system overcomes hardware differences other than cpus

The hardware of a computer is not made up of the CPU alone; it also includes the memory that stores program instructions and data, as well as peripherals such as keyboards, monitors, hard disks and printers connected through I/O.

Under Windows, software does not send commands for keyboard input, monitor output and so on directly to the hardware; it sends instructions to Windows. Programmers therefore do not have to worry about differences in memory layout or I/O addresses. It is Windows that operates the hardware, not the application; by operating through Windows, software achieves its goal of controlling the hardware.

APIs vary among operating systems

Let's look at operating system types. The same model of computer can have several choices of operating system. For example, besides Windows, an AT-compatible machine can also run operating systems such as Linux, the Unix family, and FreeBSD (also a Unix-style operating system). Of course, applications must be developed for each specific operating system type. Just as machine language varies with the CPU type, the way an application communicates with the operating system varies with the operating system type.

The way an application passes instructions to the operating system is called the Application Programming Interface (API). The APIs of Windows and Linux are provided as collections of functions that any application can call. Because the APIs of different operating systems differ, porting the same application to another operating system means rewriting the parts of the application that use the API.

The functions of keyboard input, mouse input, display output, file input and output to interact with peripherals are provided through the API.

This is why Windows applications cannot be ported directly to Linux: the APIs are too different.

On the same type of operating system, the API is almost the same regardless of the hardware. However, because the machine language of different cpus varies, the native code varies.

A history of operating system functionality

The operating system is also a kind of software, and like anything new it has a historical background; the operating system did not appear out of a vacuum.

In the days before computers had operating systems, there were no supporting programs at all, and people controlled computers directly with switches and buttons, which was very cumbersome. Eventually, someone developed a monitor program with only load and run functions, the prototype of the operating system. By starting the monitor in advance, programmers could load various programs into memory and run them as needed. It was still a bit of a hassle, but much less work than developing with no supporting programs at all.

As time went on, people found that programs written on top of the monitor shared many common parts, such as reading text typed on the keyboard or displaying data on the monitor; writing the same handling for every new application would be a waste of time. So the basic input/output parts were added to the monitor program. That is how the early operating systems were born.

Following the same idea, people found that more and more programs could be added to the monitor: hardware control programs, programming language processors (assemblers, compilers, interpreters), various utilities, and so on. The result was something not far from today's operating systems; in other words, an operating system is really a collection of multiple programs.

Features of Windows operating system

Windows has the largest user base of any operating system in the world. As a veteran Windows user, do you know what characterizes it? Here are some features of the Windows operating system.

  • The Windows operating system comes in two versions: 32-bit and 64-bit
  • System calls are provided through a set of API functions
  • A graphical user interface (GUI) is provided
  • Print output is achieved through WYSIWYG; WYSIWYG stands for What You See Is What You Get, meaning the graphics and text shown on the display can be sent to the printer and printed exactly as they appear
  • Multitasking is provided, that is, multiple tasks can run at the same time
  • Network functions and database functions are provided
  • Device drivers are set up automatically through Plug and Play

These are the features that matter to programmers; let's look at some of them in more detail.

32-bit operating system

Here, "32-bit" indicates the size of data that can be handled most efficiently: Windows processes data in basic units of 32 bits. This differs from older 16-bit operating systems such as MS-DOS, because on a 16-bit operating system processing 32 bits of data takes two operations, while a 32-bit operating system does it in one. Therefore the data handled most naturally by Windows applications is 32 bits wide.

For example, when handling integer data in C there are three options: the 8-bit char, the 16-bit short, and the 32-bit long. Choosing the wider long mainly increases memory and disk usage; it has little effect on performance.

Most operating systems today are 64-bit, and for them the same reasoning applies with 64 bits as the basic unit.

System calls are provided through a set of API functions

Windows provides system calls through a collection of functions called the API. The API is the interface between an application and the operating system.

The API of the mainstream 32-bit versions of Windows is called the Win32 API. The name distinguishes it from other variants, such as the earlier 16-bit Win16 API and the later Win64 API.

The API is provided through multiple DLL files, and each API function is implemented in C, so the API is easiest to use from C. For example, the MessageBox() API function is stored in user32.dll, a DLL file provided by Windows.
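A minimal sketch of calling a Win32 API function from C (MessageBox really does live in user32.dll; the sketch assumes a Windows build environment linking against user32):

#include <windows.h>

int WINAPI WinMain(HINSTANCE hInstance, HINSTANCE hPrevInstance,
                   LPSTR lpCmdLine, int nCmdShow) {
    /* The application asks Windows to show the box through the API;
       it never talks to the display hardware directly. */
    MessageBox(NULL, "Hello from the Win32 API", "Sample", MB_OK);
    return 0;
}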

Provides a GUI based user interface

GUI stands for Graphical User Interface: a user interface operated visually by clicking windows and icons on the display. For example, Linux can be used in two ways: a lean setup controlled entirely from the command line, and a desktop setup controlled by clicking a graphical interface with the cursor.

Print the output through WYSIWYG

WYSIWYG refers to the fact that the output on the display can be printed directly from the printer. In Windows, the monitor and printer are treated as equivalent graphics output devices, and this feature also enables WYSIWYG.

WYSIWYG also makes programmers' lives easier. Originally, displaying on the screen and printing on the printer required separate code; with WYSIWYG in Windows, one program can basically handle both display and printing.

Provides multi-task function

Multitasking means running multiple applications at the same time, and Windows achieves it with time slicing. Time slicing means switching between multiple programs at very short intervals; to the user, they appear to run simultaneously, while underneath it is the CPU's time being divided up, which is also the heart of multithreading and multitasking.

Provides network functions and database functions

In Windows, networking is provided as a standard feature, and database (database server) functionality is sometimes added on. Network and database functions are not strictly essential to the operating system, but because they sit so close to it they are called middleware rather than applications, meaning they live in a layer between the operating system and the applications. The operating system and middleware together are called system software. Applications can use not only the operating system but also the functionality of middleware.

While the operating system cannot easily be replaced once installed, middleware can be swapped as needed. However, for most applications, changing the middleware also forces changes to the application, so from that point of view replacing middleware is not so easy either.

Realize the automatic setting of device driver through plug and play

Plug-and-play refers to a mechanism that can be used when a new device is plugged into a computer, which automatically installs and sets up drivers to control the device

A device driver is the part of the operating system that provides the basic input/output functions for a piece of hardware. Drivers for the essential hardware of a computer, such as the keyboard, mouse, monitor and disk devices, are generally installed together with the operating system.

Sometimes DLL files are installed along with the device driver files. These DLLs contain the API for the newly added hardware; through that API, applications can be written that make full use of the hardware.

Assembly language and native code

As discussed in earlier sections, a computer's CPU can only run native code (machine language) programs. Code written in a high-level language such as C must be compiled by the compiler and converted into native code before the CPU can interpret and execute it.

But native code is very hard to read, so we need a form that can be read directly: in place of each native-code instruction we write an abbreviation for what it does. For example, the native code for addition gets the abbreviation add, and the native code for comparison gets cmp (compare). These symbols that stand for specific native-code instructions are called mnemonics, and a language made of mnemonics is called assembly language. By reading assembly language, you can understand what the native code means.

However, even source code written in assembly language must ultimately be converted to native code before it can run. The program that does this is called an assembler, and the conversion process is called assembly. In their role of turning source code into native code, an assembler and a compiler are the same.

Source code written in assembly language corresponds to native code one to one. Thus, native code can also be converted to code written in assembly language. The process of converting native code into assembly code is called disassembly, and the program that performs disassembly is called a disassembler.

Even source code written in C is converted into native code for a particular CPU when compiled. By disassembling that native code, you can obtain assembly-language source and inspect its contents. However, converting native code back into C source code (decompilation) is harder than converting it into assembly code (disassembly), because C code and native code do not correspond one to one.

Through the compiler output assembly language source code

We mentioned above that native code can be disassembled into assembly code, but is that the only way to obtain it? Obviously not: source code written in C can also be turned into assembly code by the compiler. Let's try it out.

First, some preparation: download the Borland C++ 5.5 compiler. For convenience I have shared it directly; readers can fetch it from my Baidu network drive (link: pan.baidu.com/s/19LqVICpn… password: hz1u).

After downloading, it needs to be configured; the configuration instructions are here (wenku.baidu.com/view/22e2f4…).

First, use Windows Notepad and other text editors to write the following code

int AddNum(int a, int b) {
    return a + b;
}

void MyFunc() {
    int c;
    c = AddNum(123, 456);
}

After writing it, save the file as sample4.c; C source files usually use the .c extension. The program above takes two input parameters and returns their sum.

In Windows, open the command prompt, switch to the folder where sample4.c is saved, and type in the command prompt

bcc32 -c -S sample4.c

bcc32 is the command that starts Borland C++; the -c option means compile only, without linking, and the -S option tells it to generate assembly-language source.

As a result of the compilation, an assembly-language source file named sample4.asm is generated in the current directory. Assembly source files usually use the .asm extension. Let's open sample4.asm in an editor and look at its contents.

.386p
ifdef ??version
if ??version GT 500H
.mmx
endif
endif
model flat
ifndef ??version
?debug macro
endm
endif
?debug S "Sample4.c"
?debug T "Sample4.c"
_TEXT segment dword public use32 'CODE'
_TEXT ends
_DATA segment dword public use32 'DATA'
_DATA ends
_BSS segment dword public use32 'BSS'
_BSS ends
DGROUP group _BSS,_DATA
_TEXT segment dword public use32 'CODE'
_AddNum proc near
?live1@0:
;
; int AddNum(int a,int b){
;
    push ebp
    mov ebp,esp
;
; return a + b;
;
@1:
    mov eax,dword ptr [ebp+8]
    add eax,dword ptr [ebp+12]
;
; }
;
@3:
@2:
    pop ebp
    ret
_AddNum endp
_MyFunc proc near
?live1@48:
;
; void MyFunc(){
;
    push ebp
    mov ebp,esp
;
; int c;
; c = AddNum(123,456);
;
@4:
    push 456
    push 123
    call _AddNum
    add esp,8
;
; }
;
@5:
    pop ebp
    ret
_MyFunc endp
_TEXT ends
public _AddNum
public _MyFunc
?debug D "sample4.c" 20343 45835
end

Thus, the compiler has successfully converted C into assembly code.

Pseudo-instructions that are not converted into native code

While assembly code may look difficult to first-time readers, it is actually quite simple, perhaps even simpler than C. To read assembly source code, there are just a few points to keep in mind.

Assembly language source code consists of instructions that are converted into native code (the opcodes described below) and pseudo-instructions aimed at the assembler. Pseudo-instructions tell the assembler (the converter) about the structure of the program and how to assemble it; they themselves are not converted into native code. Here are the pseudo-instructions extracted from the program above

_TEXT segment dword public use32 'CODE'
_TEXT ends
_DATA segment dword public use32 'DATA'
_DATA ends
_BSS segment dword public use32 'BSS'
_BSS ends
DGROUP group _BSS,_DATA
_AddNum proc near
_AddNum endp
_MyFunc proc near
_MyFunc endp
_TEXT ends
end

The part enclosed by the segment and ends directives is called a segment definition: it gives a name to a collection of the instructions and data that make up the program. The word segment here carries the meaning of a region. In this program, a segment definition is such a collection of instructions or data, and a program consists of several segment definitions.

At the beginning of the code above, three segments are defined with the names _TEXT, _DATA, and _BSS. _TEXT is the segment definition for instructions, _DATA is the segment definition for initialized data (data with an initial value), and _BSS is the segment definition for uninitialized data. This naming is a Borland C++ convention and is assigned automatically by the Borland C++ compiler, so the order of the segment definitions is _TEXT, _DATA, _BSS, which also keeps each kind of content contiguous in memory

_TEXT segment dword public use32 'CODE'
_TEXT ends
_DATA segment dword public use32 'DATA'
_DATA ends
_BSS segment dword public use32 'BSS'
_BSS ends

A segment is used to distinguish or divide regions. In assembly language, the segment directive marks the beginning of a segment definition and the ends directive marks its end. A segment definition is a contiguous region of memory

The group pseudo-instruction gathers the two segment definitions _BSS and _DATA into a group named DGROUP

DGROUP group _BSS,_DATA

_TEXT segment and _TEXT ends enclose _AddNum and _MyFunc, placing both functions in the segment definition named _TEXT.

_TEXT segment dword public use32 'CODE'
_TEXT ends

Thus, even if instructions and data are jumbled up in source code, they are translated into neat native code after compilation and assembly.

The portions enclosed by _AddNum proc and _AddNum endp, and the portions enclosed by _MyFunc proc and _MyFunc endp represent the ranges of AddNum and MyFunc functions, respectively.

_AddNum proc near
_AddNum endp
_MyFunc proc near
_MyFunc endp

Borland C++ prefixes function names with an underscore (_) after compilation, so the AddNum function written in C is handled internally under the name _AddNum. The part enclosed by the proc and endp directives represents the range of a procedure; in assembly language, the equivalent of a C function is called a procedure.

The end directive at the very end marks the end of the source code.

The syntax of assembly language is opcode + operand

In assembly language, one line corresponds to one CPU instruction. The syntax of an assembly language instruction is opcode + operand; some instructions have only an opcode and no operand.

The opcode represents the action of the instruction, and the operands represent its targets. An opcode together with its operands reads a little like an English sentence: the opcode is the verb and the operands are the objects. For example, in the English command "Give me money", "Give" is the opcode, and "me" and "money" are the operands. When an assembly instruction has multiple operands, they are separated by commas, as in Give me,money.

The form of opcodes that can be used depends on the type of CPU. The functions of opcodes are summarized below.

To run, native code needs to be loaded into memory, which holds the instructions and data that make up the native code. As the program runs, the CPU reads data and instructions from memory and places them in internal CPU registers for processing.

If you are not familiar with the cpu-memory relationship, please read another article by the author about CPU core knowledge programmers need to know.

Registers are storage areas inside the CPU. Besides temporarily storing data, some registers also take part in calculation. The following figure shows the main types and roles of registers in the x86 series

Instruction analysis

Below, we analyze some of the CPU instructions that appear in the code above

The most commonly used MOV instruction

The most commonly used instruction is MOV, which moves data between registers and memory. The two operands of the MOV instruction specify the destination and the source of the data, respectively. An operand can be a register, a constant, a label (attached to an address), or any of these enclosed in square brackets ([]). If the operand is not enclosed in square brackets, the value itself is processed; if it is enclosed in square brackets, the value inside the brackets is interpreted as a memory address, and data is read from or written to that memory address. Let's illustrate with the following snippet from the code above

mov ebp,esp
mov eax,dword ptr [ebp+8]

mov ebp,esp stores the value of the esp register directly into the ebp register; that is, if the esp register holds 100, the ebp register also becomes 100.

In mov eax,dword ptr [ebp+8], the value of the ebp register plus 8 is interpreted as a memory address. If the ebp register holds 100, the eax register is loaded with the value stored at memory address 100 + 8. dword ptr (double word pointer) indicates that 4 bytes of data are read from the specified memory address.
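If it helps, the square-bracket notation behaves much like dereferencing a pointer in C. The snippet below is only an analogy of my own (the variable names are not part of sample4.c): using a value directly corresponds to an operand without brackets, while reading the memory at an address corresponds to dword ptr [...].

#include <stdio.h>

int main(void) {
    int value = 123;     /* an ordinary value, like an operand without brackets     */
    int *addr = &value;  /* an address, playing the role of ebp+8                   */

    int a = value;       /* like "mov eax,123":     use the value itself            */
    int b = *addr;       /* like "mov eax,[ebp+8]": read the memory at that address */

    printf("a=%d b=%d\n", a, b);  /* both print 123 */
    return 0;
}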

Push and pop the stack

When a program runs, it is allocated an area of memory called the stack. The stack works last in, first out: data is stored starting from the bottom of the area (larger addresses) and grows toward the top (smaller addresses), and it is read back starting from the top.

The stack is an area for storing temporary data. Its distinguishing feature is that data is stored and read out with the push and pop instructions: storing data onto the stack is called pushing, and reading data off the stack is called popping. In 32-bit x86 CPUs, one push or pop processes 32 bits (4 bytes) of data.
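To make the last-in-first-out behavior concrete, here is a minimal C sketch of a stack (the array, names, and size are my own; a real CPU stack lives in memory and grows toward smaller addresses): push stores a 4-byte value on top, and pop reads back the most recently stored one.

#include <stdio.h>
#include <stdint.h>

#define STACK_SIZE 16

static int32_t stack_area[STACK_SIZE];
static int     top = 0;                   /* plays the role of the stack pointer (esp) */

static void    push(int32_t v) { stack_area[top++] = v; }    /* store onto the stack   */
static int32_t pop(void)       { return stack_area[--top]; } /* read back, LIFO order  */

int main(void) {
    push(456);
    push(123);
    printf("%d\n", pop());   /* 123: the value pushed last comes out first */
    printf("%d\n", pop());   /* 456 */
    return 0;
}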

Function call mechanism

Now let's analyze the function call mechanism, using the C code we wrote above as an example. We start with the assembly for the part where the MyFunc function calls the AddNum function, since the stack plays a huge role in function calls. Here is the assembly code for MyFunc after compilation

_MyFunc proc near
    push ebp          ; (1) push the value of the ebp register onto the stack
    mov ebp,esp       ; (2) store the value of the esp register in the ebp register
    push 456          ; (3) push 456 onto the stack
    push 123          ; (4) push 123 onto the stack
    call _AddNum      ; (5) call the AddNum function
    add esp,8         ; (6) add 8 to the value of the esp register
    pop ebp           ; (7) read the value off the stack back into the ebp register
    ret               ; (8) end the MyFunc function and return to the call source
_MyFunc endp

Steps (1), (2), (7), and (8) in the commented code appear in every C function; we will explain what the AddNum function itself does later. For now, focus on (3)-(6), which are crucial to understanding the mechanism of function calls.

(3) and (4) push the arguments passed to the AddNum function onto the stack. The C source code says AddNum(123,456), but the values are pushed in the order 456 first, then 123; that is, the later argument is pushed first. This is a rule of C. (5) is the call instruction, which jumps the program flow to the address of the AddNum function's instructions; in assembly language, a function name represents the memory address where the function starts. After AddNum finishes, the flow must return to line (6). When the call instruction executes, the memory address of the line following it (line (6)) is pushed onto the stack automatically. That value is popped off the stack by the ret instruction at the end of AddNum's processing, and the program flow returns to line (6).

(6) destroys the two parameters (456 and 123) stored on the stack. This could be done with two pop instructions, but adding 8 to esp is more efficient because it takes a single operation. Since values are pushed and popped in 4-byte units, adding 8 (4 bytes × 2) to the esp register, which manages the stack address, has the same effect as running the pop instruction twice. The data actually remains in memory, but it is effectively destroyed because the value of the esp register is updated to point in front of the locations where the data was stored.

When I compile the sample4.c file, I get the message shown below

The message says that the variable c is defined in MyFunc but never used. This is a compiler optimization: because the variable c, which holds the return value of AddNum, is never used afterward, the compiler treats it as meaningless and does not generate the corresponding assembly code for it.
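As a variation of my own (not part of sample4.c): if the result were actually used, for example returned to the caller, the compiler could no longer discard it, and code for storing and returning c would be generated.

int AddNum(int a, int b) {
    return a + b;
}

/* Because c is returned, it counts as "used", so the compiler
   cannot simply optimize the variable away. */
int MyFunc2(void) {
    int c;
    c = AddNum(123, 456);
    return c;
}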

The following figure shows the stack memory changes before and after the AddNum function is called

Internal processing of a function

Having analyzed the whole flow of sample4.c in assembly code, we now focus on the AddNum function itself and analyze the mechanism by which parameters are received and the return value is passed back

_AddNum proc near
    push ebp                    ; (1)
    mov ebp,esp                 ; (2)
    mov eax,dword ptr [ebp+8]   ; (3)
    add eax,dword ptr [ebp+12]  ; (4)
    pop ebp                     ; (5)
    ret                         ; (6)
_AddNum endp

The value of the ebp register is pushed onto the stack in (1) and popped back in (5). This is mainly to restore the ebp register, which is used inside the function, to its state before the function call.

In (2), the value of the esp register, which manages the stack address, is copied into the ebp register. This is needed because the operand inside the square brackets of the mov instruction is not allowed to specify the esp register. Therefore, instead of using esp directly, the ebp register is used to read and write the contents of the stack.

(3) uses [ebp+8] to refer to the first parameter, 123, stored on the stack, and reads it into the eax register. In this way, the contents of the stack can be referenced without using the pop instruction. The eax register is chosen from among the registers because eax is the accumulator register responsible for computation.

The add instruction in (4) adds the current value of the eax register to the second parameter and stores the result in the eax register. [ebp+12] refers to the second parameter, 456. In C, the convention is that a function's return value must be returned through the eax register. In other words, a function's parameters are passed via the stack and its return value is passed back via a register.

When the ret instruction in (6) runs, the return-destination memory address is automatically popped off the stack, and the program flow jumps back to the line following call _AddNum in MyFunc (line (6) of the MyFunc listing). The changes in the state of the stack at the entry and exit of the AddNum function are shown in the figure below

Global and local variables

Now that we are familiar with some assembly language, let's look at global variables and local variables. A variable defined outside a function is called a global variable, and a variable defined inside a function is called a local variable. A global variable can be used from any function, while a local variable can only be used inside the function that defines it. Let's look at how global and local variables differ at the assembly language level.

The following C code defines local and global variables, and assigns values to each variable. Let’s look at the source code section first

// Define initialized global variables
int a1 = 1;
int a2 = 2;
int a3 = 3;
int a4 = 4;
int a5 = 5;

// Define uninitialized global variables
int b1, b2, b3, b4, b5;

// Define the function
void MyFunc() {
    // Define local variables
    int c1, c2, c3, c4, c5, c6, c7, c8, c9, c10;

    // Assign values to the local variables
    c1 = 1; c2 = 2; c3 = 3; c4 = 4; c5 = 5;
    c6 = 6; c7 = 7; c8 = 8; c9 = 9; c10 = 10;

    // Assign the local variables to the global variables
    a1 = c1; a2 = c2; a3 = c3; a4 = c4; a5 = c5;
    b1 = c6; b2 = c7; b3 = c8; b4 = c9; b5 = c10;
}

The code above is crude, but that doesn't matter; it makes the analysis easy. The assembly code produced by Borland C++ is shown below. The compiled source is quite long, so we use only part of it for the analysis (the order of the segment definitions has been changed and some comments removed).

_DATA segment dword public use32 'DATA'
    align 4
_a1 label dword
    dd 1
    align 4
_a2 label dword
    dd 2
    align 4
_a3 label dword
    dd 3
    align 4
_a4 label dword
    dd 4
    align 4
_a5 label dword
    dd 5
_DATA ends
_BSS segment dword public use32 'BSS'
    align 4
_b1 label dword
    db 4 dup(?)
    align 4
_b2 label dword
    db 4 dup(?)
    align 4
_b3 label dword
    db 4 dup(?)
    align 4
_b4 label dword
    db 4 dup(?)
    align 4
_b5 label dword
    db 4 dup(?)
_BSS ends
_TEXT segment dword public use32 'CODE'
_MyFunc proc near
    push ebp
    mov ebp,esp
    add esp,-20
    push ebx
    push esi
    mov eax,1
    mov edx,2
    mov ecx,3
    mov ebx,4
    mov esi,5
    mov dword ptr [ebp-4],6
    mov dword ptr [ebp-8],7
    mov dword ptr [ebp-12],8
    mov dword ptr [ebp-16],9
    mov dword ptr [ebp-20],10
    mov dword ptr [_a1],eax
    mov dword ptr [_a2],edx
    mov dword ptr [_a3],ecx
    mov dword ptr [_a4],ebx
    mov dword ptr [_a5],esi
    mov eax,dword ptr [ebp-4]
    mov dword ptr [_b1],eax
    mov edx,dword ptr [ebp-8]
    mov dword ptr [_b2],edx
    mov ecx,dword ptr [ebp-12]
    mov dword ptr [_b3],ecx
    mov eax,dword ptr [ebp-16]
    mov dword ptr [_b4],eax
    mov edx,dword ptr [ebp-20]
    mov dword ptr [_b5],edx
    pop esi
    pop ebx
    mov esp,ebp
    pop ebp
    ret
_MyFunc endp
_TEXT ends

The compiled program is divided into groups called segment definitions:

  • Initialized global variables are summarized into a segment definition named _DATA

    _DATA segment dword public use32 'DATA'
    …
    _DATA ends

  • Global variables that are not initialized are summarized into a segment definition named _BSS

    _BSS segment dword public use32 'BSS'
    …
    _BSS ends

  • The function's assembly code is summarized into a segment definition named _TEXT, as specified by Borland C++

    _TEXT segment dword public use32 'CODE'
    _MyFunc proc near
    …
    _MyFunc endp
    _TEXT ends

Before analyzing the assembly code above, let's look at a few more assembly instructions. The table below continues the earlier summary of opcodes and their functions

| Opcode | Operand | Function |
| --- | --- | --- |
| add | A,B | Add the values of A and B and assign the result to A |
| call | A | Call function A |
| cmp | A,B | Compare A and B; the result is automatically stored in the flag register |
| inc | A | Add 1 to A |
| jge | Label name | Used together with cmp; jump to the label line when the compared value is greater than or equal |
| jl | Label name | Used together with cmp; jump to the label line when the compared value is less |
| jle | Label name | Used together with cmp; jump to the label line when the compared value is less than or equal |
| jmp | Label name | Always jump to the label line, unconditionally |
| mov | A,B | Assign the value of B to A |
| pop | A | Read a value from the top of the stack and store it in A |
| push | A | Store the value of A on the stack |
| ret | None | Return processing to the calling source |
| xor | A,B | XOR the bits of A and B and store the result in A |

Let's first look at the contents of the _DATA segment definition. _a1 label dword defines the label _a1. A label represents a position relative to the start of its segment definition; since _a1 is at the very beginning of the _DATA segment definition, its relative position is 0. _a1 corresponds to the global variable a1: after compilation, function and variable names are prefixed with an underscore (_), as Borland C++ requires. dd 1 means that 4 bytes of memory are allocated and the initial value 1 is stored there. dd (define double word) indicates two word-length (2-byte) fields, i.e. 4 bytes.

In Borland C++, int is 4 bytes long, so the assembler converts int a1 = 1 into _a1 label dword plus dd 1. Similarly, the labels _a2 to _a5, corresponding to the global variables a2 to a5, are defined, and their initial values 2 to 5 are each stored in their own 4 bytes.

Next, let's look at the contents of the _BSS segment definition. Here the labels _b1 to _b5, corresponding to the global variables b1 to b5, are defined. db 4 dup(?) means that 4 bytes of memory are reserved but their value is not yet determined. db (define byte) indicates a 1-byte memory area, so db 4 dup(?) reserves 4 bytes of memory.

Note: db 4 dup(?) should not be confused with dd 4. The former reserves four 1-byte memory areas (4 bytes in total, value undetermined), whereas dd 4 reserves one double-word (4-byte) memory area holding the value 4.

Temporarily securing memory space for local variables

As we know, local variables are stored temporarily in registers or on the stack. The stack space used for local variables is released after the function call ends, and registers may then be reused for other purposes. Either way, local variables occupy registers or stack space only temporarily, while the function is being processed.

Recall that the code above defines 10 local variables. This is deliberate: it shows that local variables are stored not only on the stack but also in registers. To secure the space needed for c1-c10, registers are used while they are free, and the stack is used once registers run short.

Let's return to the code above. The _TEXT segment definition contains the MyFunc function. The memory needed for the local variables defined in MyFunc is allocated in registers as far as possible. You might think that using high-performance registers instead of ordinary memory is extravagant, but the compiler doesn't see it that way: it uses a register whenever one is free. Because registers are far faster to access than memory, using them directly makes processing more efficient. Allocating local variables to registers is an optimization performed by the Borland C++ compiler.

The following part of the code listing allocates local variables to registers

mov eax,1
mov edx,2
mov ecx,3
mov ebx,4
mov esi,5

Merely defining a local variable does not allocate anything; a register or memory area is assigned only when the local variable is given a value. The code above corresponds to assigning the values 1-5 to the five local variables c1-c5. eax, edx, ecx, ebx, and esi are names of 32-bit x86 CPU registers; which register is used is up to the compiler.

An x86 CPU has only a dozen or so registers that programs can operate on, and at any moment only a few of them are free. Thus, when the number of local variables exceeds the number of available registers, the compiler uses the stack to store the remaining local variables.

In this code, after registers have been assigned to the local variables c1-c5, no suitable registers remain, so the remaining five local variables c6-c10 are allocated space on the stack, as shown in the code below

mov dword ptr [ebp-4],6
mov dword ptr [ebp-8],7
mov dword ptr [ebp-12],8
mov dword ptr [ebp-16],9
mov dword ptr [ebp-20],10

add esp,-20 subtracts 20 from the esp register (the stack pointer). To place the local variables c6-c10 on the stack, space for five int variables (4 bytes × 5 = 20 bytes) has to be reserved. mov ebp,esp saves the esp register value in the ebp register, and the mov esp,ebp at the exit of the function restores the esp register to its original state, thereby freeing the reserved stack space. At that point, the local variables stored on the stack disappear; this is the stack cleanup. In the case of registers, local variables disappear automatically once the register is reused for another purpose, as shown in the figure below.

mov dword ptr [ebp-4],6
mov dword ptr [ebp-8],7
mov dword ptr [ebp-12],8
mov dword ptr [ebp-16],9
mov dword ptr [ebp-20],10

These five lines store values into the reserved stack space. Before the stack space was reserved, the value of the esp register was saved in the ebp register by mov ebp,esp. Therefore, using [ebp-4], [ebp-8], [ebp-12], [ebp-16], and [ebp-20], the 20 bytes of reserved stack space can be used as five 4-byte areas. For example, mov dword ptr [ebp-4],6 stores the 4-byte value 6 at the address 4 bytes below the location indicated by the ebp register.

How loop processing is implemented

Everything so far has been sequential processing. Now let's analyze loop processing and see how flow control in C, such as the for loop and the if conditional branch, is actually implemented. As before, we take the C code and its compiled result as an example and look at how the program flow is controlled.

// Define the MySub function
void MySub() {
    // do nothing
}

// Define the MyFunc function
void MyFunc() {
    int i;
    // Call MySub ten times
    for (i = 0; i < 10; i++) {
        MySub();
    }
}

The code above uses the local variable i as the loop counter and calls the MySub function ten times. Here is the main part of its assembly code

    xor ebx,ebx      ; set the ebx register to 0
@4:
    call _MySub      ; call the MySub function
    inc ebx          ; add 1 to the ebx register
    cmp ebx,10       ; compare the value of the ebx register with 10
    jl short @4      ; if less than 10, jump to @4

The for statement in C executes loop processing by specifying, inside the parentheses, the initial value of the loop counter (i = 0), the condition for continuing the loop (i < 10), and the update of the loop counter (i++). In the corresponding assembly code, this is implemented with the comparison instruction (cmp) and a jump instruction (jl).

Let’s explain the above code

The only local variable used in MyFunc is i, and it is allocated to the ebx register. The i = 0 in the parentheses of the for statement is converted into xor ebx,ebx. The xor instruction XORs the first operand with the second operand and stores the result in the first operand. Because both operands here are ebx, the value is XORed with itself, so whatever the register held before, the result is 0. mov ebx,0 would achieve the same thing, but the xor instruction is faster, and the compiler chooses it when optimization is enabled.

XOR is the exclusive-or operation: for each bit, if the two values differ the result is 1, and if they are the same the result is 0. XOR of a value with itself is therefore always 0. For example, XORing 01010101 with 01010101 bit by bit gives 0, because every bit is the same.
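The same rule can be checked directly in C, where ^ is the XOR operator (a small example of my own):

#include <stdio.h>

int main(void) {
    unsigned char a = 0x55;    /* 01010101 */
    unsigned char b = 0x55;    /* 01010101 */

    printf("%d\n", a ^ b);     /* 0: every bit is the same          */
    printf("%d\n", a ^ 0xFF);  /* 170 (10101010): every bit differs */

    int i = 12345;
    i = i ^ i;                 /* XOR with itself always gives 0    */
    printf("%d\n", i);         /* 0 */
    return 0;
}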

After the ebx register has been initialized, the _MySub function is called with the call instruction. After returning from _MySub, the inc ebx instruction adds 1 to the value of ebx.

Here we need to know the difference between i++ and ++i:

i++ uses the current value of i first, and performs the +1 operation afterward

++i performs the +1 operation first, and then the updated value is used
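A short C example of my own makes the difference visible:

#include <stdio.h>

int main(void) {
    int i = 0;
    int a = i++;   /* a gets the old value 0, then i becomes 1 */

    int j = 0;
    int b = ++j;   /* j becomes 1 first, then b gets 1 */

    printf("a=%d i=%d\n", a, i);   /* prints a=0 i=1 */
    printf("b=%d j=%d\n", b, j);   /* prints b=1 j=1 */
    return 0;
}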

The cmp on the line after inc is the instruction that compares the values of its first and second operands. cmp ebx,10 corresponds to the i < 10 test in C: it compares the value of the ebx register with 10. The result of an assembly comparison instruction is stored in the CPU's flag register. However, the program cannot reference the flag register's value directly, so how is the comparison result used?

Assembly language has several jump instructions, and they decide whether to jump based on the value of the flag register. For example, the jl in the last line decides whether to jump according to the result of cmp ebx,10 stored in the flag register. jl stands for jump on less than: if i is less than 10, the program jumps to the line labeled @4 and continues from there.

The meaning of this assembly code can also be rewritten in C to deepen our understanding

i ^= i;
L4:
MySub();
i++;
if (i < 10) goto L4;

The first line, i ^= i, XORs i with itself, setting it to 0. The label L4 marks the call to MySub(); after the call, i is incremented, and if i is still less than 10 the program jumps back to L4, so MySub() is called repeatedly.

Conditional branch processing method

Conditional branches are handled in a similar way to loops, using CMP and jump instructions. The following is the conditional branch code written in C language

// Define the MySub1 function
void MySub1() {
    // do nothing
}

// Define the MySub2 function
void MySub2() {
    // do nothing
}

// Define the MySub3 function
void MySub3() {
    // do nothing
}

// Define the MyFunc function
void MyFunc() {
    int a = 123;
    if (a > 100) {
        MySub1();
    } else if (a < 50) {
        MySub2();
    } else {
        MySub3();
    }
}

This is a very simple piece of C code with a conditional judgment. Compiling it with Borland C++ gives the following result

_MyFunc proc near
    push ebp
    mov ebp,esp
    mov eax,123       ; store 123 in the eax register
    cmp eax,100       ; compare the value of the eax register with 100
    jle short @8      ; if less than or equal to 100, jump to the @8 label
    call _MySub1      ; call the MySub1 function
    jmp short @11     ; jump to the @11 label
@8:
    cmp eax,50        ; compare the value of the eax register with 50
    jge short @10     ; if greater than or equal to 50, jump to the @10 label
    call _MySub2      ; call the MySub2 function
    jmp short @11     ; jump to the @11 label
@10:
    call _MySub3      ; call the MySub3 function
@11:
    pop ebp
    ret
_MyFunc endp

The above code uses three kinds of jump instruction: jle (jump on less or equal), which jumps when the compared value is less than or equal; jge (jump on greater or equal), which jumps when the compared value is greater than or equal; and jmp, which always jumps unconditionally. Each conditional jump is preceded by the comparison instruction cmp, and together they form the basic logic of the assembly code shown above.

Understand the logic necessary to run the program

By comparing the assembly code above with the C source code, we gain a new understanding of how programs run. The knowledge gained from reading assembly source is also helpful for understanding the features of high-level languages such as Java: for example, Java has methods marked with the native keyword whose underlying implementation is written in C, and there is Java syntactic sugar whose real logic only becomes clear at a lower level. In some cases this knowledge also helps in tracking down the cause of a bug.

All the programming methods we have seen above are serial processing. What are the features of serial processing?

One of the biggest features of serial processing is that you focus on doing one thing and then do another.

Computers support multithreading, the core of which is CPU switching, as shown in the following figure

As a practical example, let’s look at a piece of code

// Define the global variable
int counter = 100;

// Define MyFunc1()
void MyFunc1() {
    counter *= 2;
}

// Define MyFunc2()
void MyFunc2() {
    counter *= 2;
}

The above C program updates the value of counter. MyFunc1() and MyFunc2() each read counter, double it, and write the result back. Suppose MyFunc1 and MyFunc2 are called at roughly the same time from two threads; we would expect the global variable counter to end up as 100 * 2 * 2 = 400. With multiple threads running, however, you will sometimes find that counter ends up as 200, and without knowing how the program actually runs it is hard to see why.

We convert the above code to assembly language as follows

mov eax,dword ptr [_counter]   ; read counter into the eax register
add eax,eax                    ; double the value of the eax register
mov dword ptr [_counter],eax   ; store the value of the eax register back into counter

In a multithreaded program, execution can switch to another thread after each line of assembly code runs. So if MyFunc1 reads the value 100 from counter, and before it writes back the doubled value 200, MyFunc2 also reads 100 from counter, the final result will be 200.

To avoid this kind of bug, we can use locking so that a function, or a block of C code, executes as an indivisible unit with respect to other threads, or use some other thread-safe mechanism to avoid the problem.
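As one possible sketch of the locking approach (assuming a POSIX environment with pthreads; the thread setup and names are mine, not from the original code): protecting every read-modify-write of counter with a mutex prevents the update from being interrupted halfway.

#include <stdio.h>
#include <pthread.h>

int counter = 100;
pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

/* Each doubling is done while holding the lock, so the
   read-modify-write cannot interleave with another thread. */
void *double_counter(void *arg) {
    pthread_mutex_lock(&counter_lock);
    counter *= 2;
    pthread_mutex_unlock(&counter_lock);
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, double_counter, NULL);
    pthread_create(&t2, NULL, double_counter, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("counter = %d\n", counter);  /* always 400 with the lock held */
    return 0;                            /* build with the pthread library, e.g. gcc file.c -lpthread */
}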

Almost no one writes whole programs in assembly language anymore, because high-level languages such as C and Java are far more productive. Even so, experience with assembly language remains valuable: through it, we can better understand how the computer actually works.
