The assembly series now has three articles. Each one is a careful summary by the author, and I hope they help you.

A hands-on guide to Debug for assembly

Love it, these registers are kind of interesting

In the previous article we covered some basic assembly instructions and used a debugging tool called Debug to see how instructions and data are stored in memory. With that background, we can now make sense of a complete assembly program.

The execution of a program

First, let's use a schematic diagram to walk through how a program is built and run, taking a simple C program, hello.c, as the example.

This is the complete path of a Hello World program from source to execution. It involves several core components: a preprocessor, a compiler, an assembler, and a linker, which we'll go through one by one.

  • In the Preprocessing Phase, the preprocessor modifies the original C program according to the directives that begin with the # character. The #include <stdio.h> directive tells the preprocessor to read the system header file stdio.h and insert it into the program text. The result is another C program, typically saved with the .i suffix (hello.i).
  • Next comes the Compilation Phase: the compiler translates the text file hello.i into the text file hello.s, which contains an assembly-language program.

Assembly language is very useful because it provides a common output language for the compilers of different high-level languages.

  • After compilation comes the Assembly Phase: the assembler translates hello.s into machine instructions and packages them into a relocatable object program, stored in the file hello.o.
  • The last one is the Linking Phase: the linker merges the translated object files into an executable file that the operating system can load and run directly.

So, in general, an executable file contains two kinds of information:

  1. The program (machine code) and its data, which are the basic content of the executable.
  2. Descriptive information, such as how large the program is and how much memory it needs, which is also an essential part of the executable.

Understanding an assembly program

Again, let's start with a piece of assembly code and then summarize it step by step.

assume cs:code
code segment
		mov ax,1234H
		add ax,ax
		mov bx,1111H
		add bx,bx
code ends
end

There are a few things in this code you may not have seen yet, but you should recognize the mov and add instructions (if you read my previous article carefully).

The instructions in an assembly source program fall into two kinds: assembly instructions and pseudo-instructions. Assembly instructions are the mov and add instructions mentioned above, which do real work: mov copies a value into a register or between registers and memory, and add adds registers or data. Assembly instructions have corresponding machine code and are ultimately executed by the CPU. Everything else in the code above (assume, segment, ends, and end) is a pseudo-instruction. For example, segment and ends simply define a program segment. Pseudo-instructions are processed directly by the assembler. They have no corresponding machine code in memory, so the CPU never executes them.
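
To make the distinction concrete, here is the same program annotated line by line. This is only a sketch: the byte counts in the comments are the standard 8086 encoding sizes (3 bytes for mov reg16,imm16 and 2 bytes for add reg16,reg16) and are added here as an aid; they do not appear in the original listing.

```asm
assume cs:code          ; pseudo-instruction: handled by the assembler, no machine code
code segment            ; pseudo-instruction: opens the segment, no machine code
        mov ax,1234H    ; assembly instruction: 3 bytes of machine code
        add ax,ax       ; assembly instruction: 2 bytes of machine code
        mov bx,1111H    ; assembly instruction: 3 bytes of machine code
        add bx,bx       ; assembly instruction: 2 bytes of machine code
code ends               ; pseudo-instruction: closes the segment, no machine code
end                     ; pseudo-instruction: stops the assembler, no machine code
```

Only the four middle lines end up in memory, 10 bytes in total. The other four lines leave no trace in the machine code.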

Three kinds of pseudo-instructions appear in the code above. The first is the pair that defines a segment:

code segment
	......
code ends

segment and ends are a pair of pseudo-instructions that must always be used together. The pair defines a segment: segment marks where it begins and ends marks where it ends. code is the name of the segment, and you can choose any name you like.

An assembly program consists of one or more segments, which hold code, data, or stack space. The segment in the example above holds code, so it is called a code segment.

Besides segments, an assembly program also needs assume, which is another pseudo-instruction. It tells the assembler to assume that a certain segment register is associated with a certain segment. You don't need to understand assume deeply for now. It is enough to know how to associate a segment used for a particular purpose with the relevant segment register.
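
As a rough sketch of a program with more than one segment, the example below defines a data segment, a stack segment, and a code segment, and associates each with a segment register. The names dataseg, stackseg, codeseg, and start, as well as the data values, are made up for illustration. Note that assume only informs the assembler; the program still has to load DS and SS itself.

```asm
assume cs:codeseg, ds:dataseg, ss:stackseg

dataseg segment                 ; hypothetical data segment
        dw 0123H,0456H          ; two words of data
dataseg ends

stackseg segment                ; hypothetical stack segment
        dw 8 dup (0)            ; 16 bytes of stack space
stackseg ends

codeseg segment
start:  mov ax,dataseg
        mov ds,ax               ; assume does not load DS, so do it by hand
        mov ax,stackseg
        mov ss,ax
        mov sp,16               ; empty stack: SP points just past the 16 bytes
        mov bx,0
        mov ax,[bx]             ; AX = 0123H (word at dataseg:0)
        add ax,[bx+2]           ; AX = 0123H + 0456H = 0579H
        mov ax,4c00H
        int 21H                 ; return to DOS
codeseg ends
end start                       ; start is the entry point
```

Because the code segment is not the first thing in the file, the label after end tells the assembler where execution should begin.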

end marks the end of an assembly program and is also a pseudo-instruction. When the assembler meets end, it stops assembling. So when we finish writing a program, we must put end at its end.

Besides assembly instructions and pseudo-instructions, an assembly program also contains labels, such as code in the example above. Placed in front of segment, the label is the name of the segment, and after assembling and linking it is ultimately turned into the address of that segment.

Be careful to distinguish the two: ends is paired with segment and marks the end of a segment, while end marks the end of the whole program.

To sum up, a source program written in assembly language contains pseudo-instructions and assembly instructions. Pseudo-instructions are processed by the assembler, while assembly instructions are translated into machine code and finally executed by the CPU.

From here on, we will call the contents of the source file the source program, and call the instructions and data in it that the computer ultimately executes and processes the program. The program first exists as assembly instructions in the source program. After assembling and linking, it is converted into machine code and stored in the executable file, as shown in the following figure

So, to sum up, writing an assembly program consists of the following steps

  • Define a segment with a name such as code or abc
  • Write the assembly instructions inside the segment
  • Indicate where the program ends (end)
  • Associate the segment label with a segment register (assume)
  • Return from the program (covered in the next section)

Program return

A complete program has to return. Only when a program has run its code and then executed its return does it give up the CPU, so that the operating system can hand CPU time to other programs. A program must not hold on to the CPU forever: that wastes resources, and a program that never returns keeps executing whatever bytes follow it in memory, which usually ends in a crash.

In assembly language, the return takes only two instructions

mov ax,4c00H
int 21H

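Here is what these two instructions mean:

mov ax,4c00H moves 4c00H into AX, and int 21H triggers a software interrupt that calls a DOS system service. With AH = 4CH, that service terminates the program and returns control to DOS, using AL (00H here) as the return code.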
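
Since AX is made up of AH (high byte) and AL (low byte), the return can also be written with the two halves loaded separately. The following minimal sketch is a program that does nothing except return. It is only meant to show which half of 4c00H does what.

```asm
assume cs:code
code segment
        mov ah,4cH      ; AH = 4CH selects the DOS "terminate program" service
        mov al,0        ; AL = return code handed back to DOS (0 here)
        int 21H         ; same effect as mov ax,4c00H followed by int 21H
code ends
end
```

Changing the value in AL changes the return code the program reports, while AH must stay 4CH for the program to terminate.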

So far we have met several kinds of "ending": the end of a segment (ends), the end of the program (end), and the program return. The following table lists the differences between the three.

Program errors

Generally speaking, errors in an assembly program fall into two kinds: syntax errors and logic errors.

Syntax errors are simple. To put it bluntly, you wrote an instruction incorrectly, and the assembler will find the error when it assembles the program.

Logic errors only show up at run time and are generally hard to detect and troubleshoot. For example, the following code has a logic error because it never returns.

assume cs:code
code segment
		mov ax,1234H
		add ax,ax
		mov bx,1111H
		add bx,bx
code ends
end

Why is that? Because this code contains no return logic. There are many logic errors of this kind, and they have to be tracked down case by case.
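
For comparison, here is a sketch of the same program with the missing return added. The register values in the comments are worked out by hand from the instructions.

```asm
assume cs:code
code segment
        mov ax,1234H
        add ax,ax       ; AX = 1234H + 1234H = 2468H
        mov bx,1111H
        add bx,bx       ; BX = 1111H + 1111H = 2222H
        mov ax,4c00H    ; program return added:
        int 21H         ; hand control back to DOS
code ends
end
```

The assembler accepts both versions without complaint, which is exactly why this kind of mistake counts as a logic error and not a syntax error.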

Writing the assembly program

Now let's use an editor to write the assembly source program. As long as the source is saved as a plain text file, it can be assembled, linked, and then run by the CPU.

Any plain-text format will do. For example, we can write it in the simplest text file (the environment here is Windows 7)

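assume cs:codeseg
codeseg segment
		mov ax,0123H	; AX = 0123H
		mov bx,0456H	; BX = 0456H
		add ax,bx	; AX = 0123H + 0456H = 0579H
		add ax,ax	; AX = 0579H + 0579H = 0AF2H
		mov ax,4c00H	; program return, so control goes back to DOS
		int 21H
codeseg ends
end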

Once written, save it with the .asm suffix, which marks it as an assembly source file.

Compiling

The full life cycle of an assembly program is: write, compile (assemble), link, and run. So next we need to assemble the program we just wrote. For that we need an assembler. Here we use MASM 5.0, whose executable is masm.exe.

(To save you from hunting for it online, I have downloaded it and put it on a network disk. You can get it by replying "MASM" to the official account "programmer Cxuan".)

I fell into quite a few pits while using MASM 5.0, so here are some reminders to help you avoid them!

  • MASM 5.0 is a stable version. Version 6.x is also circulating on the Internet. I don't know why, but I could not get it to run.
  • I ran MASM 5.0 on Windows 7, where it works fine. I also tested Windows 11, where the program is not compatible. I don't know about other versions.

After downloading and unzipping it, open CMD and change to the MASM 5.0 folder.

Then type masm directly

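The first prompt, Source filename [.ASM], asks for the name of the source file to assemble. The [.ASM] means the default extension is .asm, so for a file with that suffix you only need to type its name. If the source file does not have the .asm suffix, enter its full name, for example test.txt.

Here we just typed test, because we saved our file with the .asm suffix.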

After you enter the source file name and press Enter, the program asks for the name of the object file to generate. The object file is the result of assembling the source program, and its default extension is .obj. Because test.asm is assembled to test.obj by default, we don't need to type a file name. Just press Enter and the .obj file will be generated.

The Source listing prompt asks for the name of the listing file, which is an intermediate result produced while assembling the source program into the object file. You can press Enter to tell the assembler not to generate it. If the assembler does generate this file, its suffix is .lst.

The Cross-reference prompt works the same way: the cross-reference file is another intermediate result. Press Enter and the assembler will not generate it. If the assembler does generate this file, its suffix is .crf.

At the end, the assembler prints a summary showing the number of warnings and the number of severe errors that must be corrected. As you can see from the figure above, our program has no warnings or errors.

During this step you may also run into the error "Unable to open input file". It means the assembler could not open the source file you named, so check the file name and path.

Linking

After assembling the source program into an object file, we need to link the object file to get an executable. In the last step we got the .obj file. Now we need to link the .obj file into an .exe executable.

For this we use the Microsoft Overlay Linker 3.60, whose file is link.exe. It does not need to be downloaded separately (the package from my official account includes both the assembler and the linker, and both are in the MASM folder once unzipped).

Now go back to the command prompt, cd to the MASM folder, and type link.

Note that the default extension is .obj, so if the file you want to link is an .obj file, you do not need to type the suffix. If it is not, you need to enter the full name.

We just produced test.obj, so let's link that file directly.

Enter the name of the file you want to link (you still need to include the path to the .obj file here) and press Enter.

After you press Enter, three more prompts follow.

The first prompt asks for the name of the executable to generate. The executable is the final result of linking. The default name is test.exe, so we don't need to specify one. You can also specify the directory where the executable will be generated, but we don't need that either, so just press Enter.

The second prompt asks for the name of the map file, which is an intermediate result of linking the object file into an executable. Press Enter to tell the linker not to generate it.

The third prompt asks for the names of library files. A library file contains subroutines that can be called. If your program calls a subroutine from a library, you need to specify the library here.

At the end, a "warning: no stack segment" message appears. At first I thought the executable had not been generated, but on closer inspection it is only a warning. The final executable is in the masm folder.

This message only tells us that the program has no stack segment, and we can safely ignore it. Of course, if your program has a real error, the linker will not be able to produce the executable.

Linking is useful and, in the end, serves three main purposes

  • When the source program is large, it can be split into multiple source files that are assembled separately, and the resulting object files are then linked together into one executable file.
  • When the program calls a subroutine from a library file, the library file must be linked together with the object file to produce the executable file.
  • The object file produced by assembling contains machine code, but some of its contents cannot be executed directly. The linker has to turn these contents into executable information before the object file can become an executable.

Executing the application

Now I have an .asm file in my left hand, an .obj file in my right hand, and an .exe file in my mouth, so I guess that makes me the king of big talk. It took a lot of effort to finally turn the .asm file into an .exe, and I'm worn out, but don't rest just yet: one last step remains, running it!

So let's run test.exe

I'm a little confused. Why is there nothing here? Where is the output?

On second thought, of course: we didn't call anything to print to the console. We only moved and added some data in registers.

Of course we can output to the console, but we’ll show you that later.

Let’s talk a little bit about the loading process

As we all know, a program must be loaded into memory before it can run. The CPU then fetches instructions from memory and executes them.

So, when we use DOS, who is responsible for loading executable programs into memory?

In DOS there is a program called the command interpreter, command.com, which is also the shell of the DOS system.

After DOS starts, it initializes the system and then runs command.com. Once command.com has finished its other startup tasks, it displays a prompt and waits for user input.

If the user enters a command, such as cd or dir, command.com executes it and then waits for the next input.

If the user enters the name of a program, command.com finds the executable file by name, loads it into memory, and sets CS:IP to the program's entry point. command.com then suspends and the CPU runs the program. When the program finishes, control returns to command.com, which again waits for user input.
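
The entry point that ends up in CS:IP does not have to be the first byte of the code segment. As a minimal sketch (the label name start is an arbitrary choice and the data words are made up for illustration), MASM lets you name the entry point after end:

```asm
assume cs:code
code segment
        dw 0123H,0456H      ; data at offsets 0 to 3, not meant to be executed
start:  mov bx,0
        mov ax,cs:[bx]      ; AX = 0123H (word at code:0)
        add ax,cs:[bx+2]    ; AX = 0123H + 0456H = 0579H
        mov ax,4c00H
        int 21H             ; return to command.com
code ends
end start                   ; entry point is the label start
```

Here the two data words occupy offsets 0 to 3, so start sits at offset 4. The entry point is recorded in the executable, and when the program is loaded, IP is set to 4 instead of 0, so the data is never executed as instructions.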

So, the full life of an assembly program, from writing to execution, is as follows.

If you found this article well written and helpful, please give it a thumbs up!