It is well known that Go uses plan9 assembly, a relic of the Unix era. Even if you are familiar with x86 assembly, plan9 differs in several ways. While reading code you may well come across an SP that looks like the SP you know but is not.

This article provides a comprehensive introduction to plan9 assembly and answers most of the questions you may encounter when working with plan9 assembly.

The platform used in this article is Linux AMD64. Different platforms have different instruction sets and registers, so they cannot be discussed together; that is simply the nature of assembly.

Basic instructions

Stack adjustment

Intel and AT&T assembly provide a family of push and pop instructions. Plan9 also has push and pop instructions, but they generally do not appear in generated code. The stack adjustments we see are mostly done by arithmetic on the hardware SP register, for example:

SUBQ $0x18, SP // Allocate the function's stack frame by subtracting from SP
...            // Irrelevant code omitted
ADDQ $0x18, SP // Release the stack frame by adding back to SP

The general instructions are similar to those for the X64 platform and are described in the following sections.

Data movement

Constants are written in plan9 assembly as $num. They can be negative and are decimal by default. Hexadecimal numbers can be written as $0x123.

MOVB $1, DI      // 1 byte
MOVW $0x10, BX   // 2 bytes
MOVL $1, DX      // 4 bytes
MOVQ $-10, AX    // 8 bytes

As you can see, the number of bytes moved is determined by the MOV suffix, which differs slightly from Intel assembly. Compare the similar X64 assembly:

mov rax, 0x1   // 8 bytes
mov eax, 0x100 // 4 bytes
mov ax, 0x22   // 2 bytes
mov ah, 0x33   // 1 byte
mov al, 0x44   // 1 byte

The plan9 assembly has operands in the opposite direction from the Intel assembly, similar to AT&T.

MOVQ $0x10, AX ===== mov rax, 0x10
       |    |------------|      |
       |------------------------|

However, there are always exceptions, and if you want to know more about these exceptions, see Resources [1].

Common arithmetic instructions

ADDQ  AX, BX   // BX += AX
SUBQ  AX, BX   // BX -= AX
IMULQ AX, BX   // BX *= AX

As with the data movement instructions, changing the instruction suffix selects the operand width, for example ADDQ/ADDL/ADDW/ADDB.
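As a rough illustration of how operand width maps to the instruction suffix, the following Go sketch picks the suffix for a value of a given size. The helper name suffixFor is hypothetical and exists only for this example; it is not part of any Go tool.

```go
package main

import (
	"fmt"
	"unsafe"
)

// suffixFor returns the width suffix (B, W, L or Q) used by Go's amd64
// assembler for a general-purpose operand of the given size in bytes.
// Hypothetical helper, for illustration only.
func suffixFor(size uintptr) (string, error) {
	switch size {
	case 1:
		return "B", nil
	case 2:
		return "W", nil
	case 4:
		return "L", nil
	case 8:
		return "Q", nil
	}
	return "", fmt.Errorf("no general-purpose suffix for %d-byte operands", size)
}

func main() {
	var (
		a int8
		b int16
		c int32
		d int64
	)
	sizes := []uintptr{unsafe.Sizeof(a), unsafe.Sizeof(b), unsafe.Sizeof(c), unsafe.Sizeof(d)}
	for _, size := range sizes {
		s, err := suffixFor(size)
		if err != nil {
			fmt.Println(err)
			continue
		}
		fmt.Printf("%d byte(s): MOV%s / ADD%s\n", size, s, s)
	}
}
```

The same suffix applies to the arithmetic instructions, so an int32 addition would use ADDL and an int64 addition ADDQ.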

Conditional jump/Unconditional jump

// Unconditional jumps
JMP addr   // Jump to an address. The address can be a code address, but this is rarely written by hand
JMP label  // Jump to a label; the label must be within the same function
JMP 2(PC)  // Jump forward/backward x instructions relative to the current instruction
JMP -2(PC) // Same as above

// Conditional jump
JZ target // Jump if the zero flag is set

Instruction set

Refer to the arch part of the Go source code.

As an added bonus, Go 1.10 added support for many SIMD instructions, so writing them is less painful than before: you no longer have to encode instructions by hand as raw bytes.

Registers

General purpose registers

The general purpose registers of AMD64:

(lldb) reg read
General Purpose Registers:
       rax = 0x0000000000000005
       rbx = 0x000000c420088000
       rcx = 0x0000000000000000
       rdx = 0x0000000000000000
       rdi = 0x000000c420088008
       rsi = 0x0000000000000000
       rbp = 0x000000c420047f78
       rsp = 0x000000c420047ed8
        r8 = 0x0000000000000004
        r9 = 0x0000000000000000
       r10 = 0x000000c420020001
       r11 = 0x0000000000000202
       r12 = 0x0000000000000000
       r13 = 0x00000000000000f1
       r14 = 0x0000000000000011
       r15 = 0x0000000000000001
       rip = 0x000000000108ef85  int`main.main + 213 at int.go:19
    rflags = 0x0000000000000212
        cs = 0x000000000000002b
        fs = 0x0000000000000000
        gs = 0x0000000000000000

All of these can be used in plan9 assembly. The general purpose registers mainly used at the application code level are rax, rbx, rcx, rdx, rdi, rsi, and r8~r15. rbp and rsp can also be used, but since BP and SP are used to manage the bottom and top of the stack, it is best not to use them for computation.

Registers in plan9 do not need the r or e prefix; for rax, just write AX:

MOVQ $101, AX = mov rax, 101

Here is how the general purpose register names correspond between X64 and plan9:

X64    rax  rbx  rcx  rdx  rdi  rsi  rbp  rsp  r8  r9  r10  r11  r12  r13  r14  rip
Plan9  AX   BX   CX   DX   DI   SI   BP   SP   R8  R9  R10  R11  R12  R13  R14  PC

Pseudo register

The assembly of Go also introduces four pseudo-registers, as described in the official document:

  • FP: Frame pointer: arguments and locals.
  • PC: Program counter: jumps and branches.
  • SB: Static base pointer: global symbols.
  • SP: Stack pointer: top of stack.

The official description is slightly flawed, so we expand on these notes a bit:

  • FP: Use the form symbol+offset(FP) to refer to a function's input parameters, for example arg0+0(FP) and arg1+8(FP). Using FP without a symbol does not compile. At the assembly level the symbol has no effect; it exists mainly to improve readability. In addition, although the official document calls the pseudo-register FP a frame pointer, it is not a frame pointer at all. By traditional x86 convention, the frame pointer is the BP register, which points to the bottom of the whole stack frame. If the current callee is add and the code of add references FP, that FP points not into the callee's stack frame but into the caller's stack frame. See the section on the stack structure below for details.
  • PC: This is the PC register familiar from computer architecture; it corresponds to the ip register on x86 and rip on amd64. Apart from the occasional jump, hand-written plan9 code rarely touches the PC register.
  • SB: The global static base pointer, generally used to declare functions or global variables. Its use is shown later in the sections on functions and in the examples.
  • SP: The plan9 SP register points to the start of the local variables of the current stack frame. Use the form symbol+offset(SP) to refer to a function's local variables. The valid range of offset is [-framesize, 0). If the local variables are all 8 bytes, the first local variable can be written as localvar0-8(SP). This is another register whose name does not say what it means: it is not the same thing as the hardware register SP. When the stack frame size is 0, the pseudo-register SP and the hardware register SP point to the same location. In hand-written assembly, the form symbol+offset(SP) denotes the pseudo-register SP, and the form offset(SP) denotes the hardware register SP. Pay close attention to this. In compiler output (go tool compile -S / go tool objdump), every SP is currently the hardware register SP, with or without a symbol.

Here are the points that are easily confused:

  1. The pseudo SP and the hardware SP are not the same thing. In hand-written code they are distinguished by whether SP is preceded by a symbol: with a symbol it is the pseudo-register, without a symbol it is the hardware SP register.
  2. The relative positions of SP and FP vary, so do not try to use the pseudo SP register to reach values that are referenced as offsets from FP, such as the function's arguments and return values.
  3. The official document's statement that the pseudo SP points to the top of the stack is problematic. The local variables it points to are actually at the bottom of the stack frame (apart from the caller BP), so "bottom" is more appropriate.
  4. The output of go tool objdump and go tool compile -S contains no pseudo SP or FP registers, so the rule above for telling the pseudo SP from the hardware SP does not apply to the output of those two commands. In compiler output and disassembly there is only the real SP register.
  5. framepointer in the compiler source code refers to the value of the caller BP register, which here is equal to the caller's pseudo SP.

Do not worry if the explanation above is not yet clear; it should make sense if you come back to it after becoming familiar with the function stack structure. In my personal opinion, these are traps dug by the Go authors.

Variable declarations

A variable in assembly is usually a read-only value stored in the .rodata or .data section. At the application level, these are the initialized global const, var, and static variables/constants.

Use DATA together with GLOBL to define a variable. The usage of DATA is:

DATA    symbol+offset(SB)/width, value

Most of the arguments are self-explanatory, but the offset needs a little attention: it is an offset relative to the symbol, not relative to a global address.

Use the GLOBL directive to declare the variable as global. It takes two additional arguments: the flag and the total size of the variable.

GLOBL divtab(SB), RODATA, $64

GLOBL must follow the DATA directives. Here is a complete example that defines several read-only global variables:

DATA age+0x00(SB)/4, $18  // forever 18
GLOBL age(SB), RODATA, $4

DATA pi+0(SB)/8, $3.1415926
GLOBL pi(SB), RODATA, $8

DATA birthYear+0(SB)/4, $1988
GLOBL birthYear(SB), RODATA, $4

As mentioned above, the offset in a symbol's declaration is generally 0.

It is also possible to define an array, or a string, in a global variable, and use a non-zero offset. For example:

DATA bio<>+0(SB)/8, $"oh yes i"
DATA bio<>+8(SB)/8, $"am here "
GLOBL bio<>(SB), RODATA, $16

Most of this is self-explanatory, but there is one new marker, <>. It follows the symbol name and indicates that the global variable is visible only in the current file, similar to static in C. Referencing such a variable from another file fails with a "relocation target not found" error.
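To make the offset rule concrete, here is a small Go sketch that prints the DATA and GLOBL lines for a file-local read-only string, 8 bytes per DATA line. The helper dataDirectives is hypothetical and written only for this article; it assumes the string length is a multiple of 8 and that the text needs no escaping.

```go
package main

import (
	"fmt"
	"strings"
)

// dataDirectives renders DATA/GLOBL lines for a file-local (<>) read-only
// string, 8 bytes per DATA line. Hypothetical helper for illustration;
// it assumes len(s) is a multiple of 8 and that s needs no escaping.
func dataDirectives(name, s string) (string, error) {
	if len(s)%8 != 0 {
		return "", fmt.Errorf("length %d is not a multiple of 8", len(s))
	}
	var b strings.Builder
	for off := 0; off < len(s); off += 8 {
		// The offset is relative to the symbol, not to a global address.
		fmt.Fprintf(&b, "DATA %s<>+%d(SB)/8, $\"%s\"\n", name, off, s[off:off+8])
	}
	// GLOBL comes after the DATA lines and carries the total size.
	fmt.Fprintf(&b, "GLOBL %s<>(SB), RODATA, $%d\n", name, len(s))
	return b.String(), nil
}

func main() {
	out, err := dataDirectives("bio", "oh yes iam here ")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Print(out)
}
```

For the 16-byte string above this prints the same three lines as the bio example: two DATA lines at offsets 0 and 8, followed by a GLOBL line with size $16.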

The flag mentioned in this section can also take other values:

  • NOPROF = 1
    (For TEXT items.) Don't profile the marked function. This flag is deprecated.
  • DUPOK = 2
    It is legal to have multiple instances of this symbol in a single binary. The linker will choose one of the duplicates to use.
  • NOSPLIT = 4
    (For TEXT items.) Don't insert the preamble to check if the stack must be split. The frame for the routine, plus anything it calls, must fit in the spare space at the top of the stack segment. Used to protect routines such as the stack splitting code itself.
  • RODATA = 8
    (For DATA and GLOBL items.) Put this data in a read-only section.
  • NOPTR = 16
    (For DATA and GLOBL items.) This data contains no pointers and therefore does not need to be scanned by the garbage collector.
  • WRAPPER = 32
    (For TEXT items.) This is a wrapper function and should not count as disabling recover.
  • NEEDCTXT = 64
    (For TEXT items.) This function is a closure so it uses its incoming context register.

To use these flag literals, add #include "textflag.h" to the assembly file.
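Because each flag value is a distinct power of two, flags can be combined with a bitwise OR (for example RODATA|NOPTR on a GLOBL line). The Go sketch below mirrors the values listed above to show what such a combination evaluates to. The constants are copied from this article's list, and describe is a hypothetical helper, not something provided by textflag.h.

```go
package main

import (
	"fmt"
	"strings"
)

// Flag values as listed in this article (mirroring textflag.h).
const (
	NOPROF   = 1
	DUPOK    = 2
	NOSPLIT  = 4
	RODATA   = 8
	NOPTR    = 16
	WRAPPER  = 32
	NEEDCTXT = 64
)

var flagNames = []struct {
	bit  int
	name string
}{
	{NOPROF, "NOPROF"},
	{DUPOK, "DUPOK"},
	{NOSPLIT, "NOSPLIT"},
	{RODATA, "RODATA"},
	{NOPTR, "NOPTR"},
	{WRAPPER, "WRAPPER"},
	{NEEDCTXT, "NEEDCTXT"},
}

// describe lists the names of the flags set in mask, joined with "|".
// Hypothetical helper, for illustration only.
func describe(mask int) string {
	var set []string
	for _, f := range flagNames {
		if mask&f.bit != 0 {
			set = append(set, f.name)
		}
	}
	return strings.Join(set, "|")
}

func main() {
	mask := RODATA | NOPTR
	fmt.Println(mask, describe(mask)) // 24 RODATA|NOPTR
}
```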

Sharing global variables between .s and .go files

Global variables defined in a .go file can be used directly in a .s file. Here is a simple example:

refer.go:

package main

var a = 999
func get() int

func main() {
    println(get())
}

refer.s:

#include "textflag.h"

TEXT ·get(SB), NOSPLIT, $0-8
    MOVQ ·a(SB), AX
    MOVQ AX, ret+0(FP)
    RET

·a(SB) means that the linker must relocate this symbol for us. If the symbol cannot be found, a "relocation target not found" error is reported.

The example is simple, so you can try it yourself.

Function declaration

Let’s look at a typical plan9 assembly function definition:

// func add(a, b int) int
//   => The declaration lives in any .go file within the same package
//   => Only the function header, no implementation
TEXT pkgname·add(SB), NOSPLIT, $0-24
    MOVQ a+0(FP), AX
    MOVQ b+8(FP), BX
    ADDQ AX, BX
    MOVQ BX, ret+16(FP)
    RET

Why is it called TEXT? If you know a little about how program data is partitioned in files and in memory, you know that code is stored in the .text section of the binary, so this is a naming convention. In plan9, TEXT is actually a directive that defines a function. Besides TEXT, there are the DATA/GLOBL directives mentioned earlier for variable declarations.

The pkgname part of the definition may be omitted. If you do write pkgname, you have to change the code whenever the package is renamed, so it is best to leave it out.

The middle dot · deserves special mention: it is the Unicode middle dot, typed with option+shift+9 on a Mac. After the program is linked, every middle dot · is replaced by a period. For example, if your function is runtime·main, the symbol in the compiled program is runtime.main. Admittedly, it looks odd. To summarize:

                              argument and return value size
                                  |
 TEXT pkgname·add(SB),NOSPLIT,$32-32
         |     |               |
    package   function        stack frame size: the local variables plus the argument
    name      name            space that calls to other functions may need, but not the
                              ret address pushed when calling other functions

The stack structure

Here is the stack structure of a typical function:

    -----------------
    current func arg0
    ----------------- <----------- FP (pseudo FP)
     caller ret addr
    +---------------+
    | caller BP (*) |
    ----------------- <----------- SP (pseudo SP; actually the BP position of the current stack frame)
    |  Local Var0   |
    -----------------
    |  Local Var1   |
    -----------------
    |  Local Var2   |
    -----------------
    |   ........    |
    -----------------
    |  Local VarN   |
    -----------------
    |               |
    |               |
    |  temporarily  |
    |  unused space |
    |               |
    |               |
    -----------------
    |  call retn    |
    -----------------
    |  call ret(n-1)|
    -----------------
    |  ..........   |
    -----------------
    |  call ret1    |
    -----------------
    |  call argn    |
    -----------------
    |   .....       |
    -----------------
    |  call arg3    |
    -----------------
    |  call arg2    |
    |---------------|
    |  call arg1    |
    ----------------- <----------- hardware SP position
      return addr
    +---------------+

In principle, when the current function calls another function, the return addr is also on the caller's stack. However, pushing the return addr onto the stack is done by the CALL instruction, and on RET the SP is restored to the position shown in the figure. When calculating the relative positions of SP and the parameters, we can treat the hardware SP as pointing to the position shown in the figure.

The caller BP in the figure is the caller's BP register value. Some people call the caller BP the caller's frame pointer; this convention is inherited from the x86 architecture. The Go asm documentation also calls the pseudo-register FP a frame pointer, but the two frame pointers are not the same thing at all.

Note also that the caller BP is inserted by the compiler at compile time. When you write assembly by hand, the caller BP is not included in the framesize calculation. The main criteria for whether the caller BP is inserted are:

  1. The stack frame size of the function is greater than 0
  2. The following function returns true

func Framepointer_enabled(goos, goarch string) bool {
    return framepointer_enabled != 0 && goarch == "amd64" && goos != "nacl"
}

If the compiler does not insert the caller BP (the frame pointer, in the source code's terms) into the final assembly, only the 8-byte caller return address lies between the pseudo SP and the pseudo FP; if BP is inserted, there are 8 more bytes. In other words, the relative position of the pseudo SP and pseudo FP is not fixed: they may be 8 bytes apart or 16 bytes apart, and the criteria differ by platform and Go version.
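The rule just described can be written down as a tiny Go sketch. The function fpMinusPseudoSP is a hypothetical name used only here; it encodes this article's description for amd64 and is not taken from the compiler source.

```go
package main

import "fmt"

// fpMinusPseudoSP returns the distance in bytes between the pseudo FP and
// the pseudo SP on amd64, following the rule described in the text:
// the 8-byte return address always sits in between, and the compiler adds
// 8 more bytes for the caller BP when the frame is non-empty and frame
// pointers are enabled. Hypothetical helper, for illustration only.
func fpMinusPseudoSP(framesize int, framePointerEnabled bool) int {
	gap := 8 // caller return address
	if framesize > 0 && framePointerEnabled {
		gap += 8 // caller BP inserted by the compiler
	}
	return gap
}

func main() {
	fmt.Println(fpMinusPseudoSP(0, true))  // 8
	fmt.Println(fpMinusPseudoSP(8, true))  // 16
	fmt.Println(fpMinusPseudoSP(8, false)) // 8
}
```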

As can be seen from the figure, the FP pseudo-register points to the starting position of the passed parameters of the function. Since the stack grows in the direction of low address, for the convenience of referring to parameters through registers, the placement direction of parameters is opposite to the growth direction of the stack, namely:

                              FP
high ----------------------> low
argN, ... arg3, arg2, arg1, arg0

Assuming all parameters are 8 bytes, we can use symname+0(FP) to access the first parameter, symname+8(FP) to access the second, and so on. Referring to local variables through the pseudo SP works on a similar principle, but since the pseudo SP points to the bottom of the local variables, symname-8(SP) is the first local variable, symname-16(SP) is the second, and so on. Of course, this assumes that the local variables are all 8 bytes.

The caller return address and current func arg0 at the top of the figure are allocated by the caller; they are not part of the current stack frame.

Since the official document itself is rather vague, let's take a panoramic view of a function call to see how the real and pseudo SP, FP, and BP relate to one another:

                            caller
                      +------------------+
                      |                  |
  +---------------->  --------------------
  |                   |                  |
  |                   | caller parent BP |
  |    BP(pseudo SP)  --------------------
  |                   |                  |
  |                   |   Local Var0     |
  |                   --------------------
  |                   |                  |
  |                   |   .......        |
  |                   --------------------
  |                   |                  |
  |                   |   Local VarN     |
                      --------------------
caller stack frame    |                  |
                      |   callee arg2    |
  |                   |------------------|
  |                   |                  |
  |                   |   callee arg1    |
  |                   |------------------|
  |                   |                  |
  |                   |   callee arg0    |
  |                   ----------------------------------------------+   FP(virtual register)
  |                   |                  |                          |
  |                   |   return addr    |  parent return address   |
  +---------------->  +------------------+---------------------------   <------------------+
                                         |  caller BP               |                      |
                                         |  (caller frame pointer)  |                      |
                          BP(pseudo SP)  ----------------------------                      |
                                         |                          |                      |
                                         |     Local Var0           |                      |
                                         ----------------------------                      |
                                         |                          |
                                         |     Local Var1           |
                                         ----------------------------             callee stack frame
                                         |                          |
                                         |       .....              |
                                         ----------------------------                      |
                                         |                          |                      |
                                         |     Local VarN           |                      |
                      SP(Real Register)  ----------------------------                      |
                                         |                          |                      |
                                         |                          |                      |
                                         |                          |                      |
                                         +--------------------------+   <------------------+

                                                   callee

Rules for calculating argsize and framesize

argsize

In the function declaration:

TEXT pkgname·add(SB),NOSPLIT,$16-32

As mentioned earlier, $16-32 means $framesize-argsize. When a Go function is called, the caller must provide space for both the arguments and the return values on its own stack frame, and the callee still needs to know this argsize in its declaration. argsize is the sum of the argument sizes plus the sum of the return value sizes; for example, if the arguments and return values together are four int64 values, then argsize = sizeof(int64) * 4.

But the real world is never quite as rosy as we assume. Function arguments tend to be a mix of different types, and there are memory alignment issues to consider.

If you are not sure how large an argsize your function signature needs, you can simply implement an empty function with the same signature and then inspect it with go tool objdump to work out how much space should be allocated.

framesize

The framesize of a function is a little more complicated. In hand-written code, framesize does not need to account for the caller BP inserted by the compiler. What you do need to consider:

  1. Local variables, and the size of each.
  2. Whether the function calls other functions. If it does, the callee's arguments and return values must be counted. Although the return address (rip) value is also stored on the caller's stack frame, saving and restoring the PC register is done by the CALL and RET instructions, so the 8 bytes the PC register needs on the stack do not need to be counted.
  3. In principle, it is enough that calling a function does not overwrite the local variables. Allocating a few extra bytes of framesize does no harm.
  4. If you are sure the logic is correct, overwriting local variables is also fine, as long as the caller and callee can correctly obtain the return values when entering and leaving the assembly function.

Address arithmetic

Addresses on amd64 are 8 bytes, so address arithmetic simply uses LEAQ:

LEAQ (BX)(AX*8), CX
// In the code above, 8 is the scale
// The scale can only be 1, 2, 4, or 8
// If any other value is written:
// LEAQ (BX)(AX*3), CX
// ./a.s:6: bad scale: 3

// With LEAQ, you must provide a scale even when simply adding two register values
// The following is not accepted
// LEAQ (BX)(AX), CX
// asm: asmidx: bad address 0/2064/2067
// The correct way to write it is
LEAQ (BX)(AX*1), CX


// In addition to the register arithmetic, an extra offset can be added
LEAQ 16(BX)(AX*1), CX

// Three registers cannot be combined in one operand
// LEAQ DX(BX)(AX*8), CX
// ./a.s:13: expected end of operand, found (

The advantage of using LEAQ is also obvious: it saves instructions. If basic arithmetic instructions were used to implement LEAQ’s function, it would take two or three more computation instructions to achieve LEAQ’s full function.

Examples

add/sub/mul

math.go:

package main

import "fmt"

func add(a, b int) int // Assembly function declaration

func sub(a, b int) int // Assembly function declaration

func mul(a, b int) int // Assembly function declaration

func main() {
    fmt.Println(add(10, 11))
    fmt.Println(sub(99, 15))
    fmt.Println(mul(11, 12))
}

math.s:

#include "textflag.h" // We need to include textflag.h since we declare functions with flags like NOSPLIT

// func add(a, b int) int
TEXT ·add(SB), NOSPLIT, $0-24
    MOVQ a+0(FP), AX // parameter a
    MOVQ b+8(FP), BX // parameter b
    ADDQ BX, AX    // AX += BX
    MOVQ AX, ret+16(FP) // return value
    RET

// func sub(a, b int) int
TEXT ·sub(SB), NOSPLIT, $0-24
    MOVQ a+0(FP), AX
    MOVQ b+8(FP), BX
    SUBQ BX, AX    // AX -= BX
    MOVQ AX, ret+16(FP)
    RET

// func mul(a, b int) int
TEXT ·mul(SB), NOSPLIT, $0-24
    MOVQ  a+0(FP), AX
    MOVQ  b+8(FP), BX
    IMULQ BX, AX    // AX *= BX
    MOVQ  AX, ret+16(FP)
    RET
    // The file must end with a newline, otherwise "unexpected EOF" may be reported

Put the two files in any directory, then run go build and execute the binary to see the result.

Pseudo register SP, pseudo register FP and hardware register SP

Let's write a simple piece of code to demonstrate the positional relationship between the pseudo SP, the pseudo FP, and the hardware SP. spspfp.s:

#include "textflag.h"

// func output(int) (int, int, int)
TEXT ·output(SB), $8-48
    MOVQ 24(SP), DX // Without a symbol, SP is the hardware register SP
    MOVQ DX, ret3+24(FP) // Return the third value
    MOVQ perhapsArg1+16(SP), BX // The current stack frame size is > 0, so FP is 16 bytes above the pseudo SP
    MOVQ BX, ret2+16(FP) // Return the second value
    MOVQ arg1+0(FP), AX
    MOVQ AX, ret1+8(FP)  // Return the first value
    RET

spspfp.go:

package main

import (
    "fmt"
)

func output(int) (int, int, int) // Assembly function declaration

func main() {
    a, b, c := output(987654321)
    fmt.Println(a, b, c)
}

Executing the above code yields the following output:

987654321 987654321 987654321

Thinking in conjunction with the code, our current stack structure looks like this:

------
ret2 (8 bytes)
------
ret1 (8 bytes)
------
ret0 (8 bytes)
------
arg0 (8 bytes)
------ FP
ret addr (8 bytes)
------
caller BP (8 bytes)
------ pseudo SP
frame content (8 bytes)
------ hardware SP

The framesize in this section's example is greater than zero. Readers can try changing framesize to zero and then adjusting the offsets in the code that references the pseudo SP and the hardware SP, to explore the relative positions of the pseudo FP, the pseudo SP, and the hardware SP when framesize is zero.

The point of this example is that the relative position of the pseudo SP and pseudo FP varies. When writing assembly by hand, you should not reference data through the pseudo SP with an offset greater than 0; otherwise the results may surprise you.

Calling non-assembly functions from assembly

output.s:

#include "textflag.h"

// func output(a,b int) int
TEXT ·output(SB), NOSPLIT, $24-24
    MOVQ a+0(FP), DX // arg a
    MOVQ DX, 0(SP) // arg x
    MOVQ b+8(FP), CX // arg b
    MOVQ CX, 8(SP) // arg y
    CALL ·add(SB) // Before calling add, the arguments have been moved to the top of the stack via the hardware register SP
    MOVQ 16(SP), AX // The add function puts its return value here
    MOVQ AX, ret+16(FP) // return result
    RET

output.go:

package main

import "fmt"

func add(x, y int) int {
    return x + y
}

func output(a, b int) int

func main() {
    s := output(10, 13)
    fmt.Println(s)
}

Loop in assembly

By combining DECQ and JZ, we can implement the looping logic of high-level languages:

sum.s:

#include "textflag.h"

// func sum(sl []int64) int64
TEXT ·sum(SB), NOSPLIT, $0-32
    MOVQ $0, SI
    MOVQ sl+0(FP), BX // &sl[0], addr of the first elem
    MOVQ sl+8(FP), CX // len(sl)
    INCQ CX           // CX++, because it's going to loop len times

start:
    DECQ CX       // CX--
    JZ   done
    ADDQ (BX), SI // SI += *BX
    ADDQ $8, BX   // Move the pointer
    JMP  start

done:
    // Why is the return value at offset 24?
    // You can find out with go tool compile -S math.go
    // When we call the sum function, we pass in three values:
    // the start address of the slice, the len of the slice, and the cap of the slice
    // We only need len for the sum, but cap still takes up argument space
    // namely 16(FP)
    MOVQ SI, ret+24(FP)
    RET

sum.go:

package main

func sum([]int64) int64

func main() {
    println(sum([]int64{1, 2, 3, 4, 5}))
}

Further topics

Some of the data structures in the standard library

Numeric types

There are many numeric types in the library:

  1. int/int8/int16/int32/int64
  2. uint/uint8/uint16/uint32/uint64
  3. float32/float64
  4. byte/rune
  5. uintptr

In assembly, each of these types is just a contiguous piece of memory holding the data; only the length differs, so you simply have to operate on the right data width.

slice

As mentioned in the previous example, a slice is actually expanded into three arguments when passed to a function:

  1. The address of the first element
  2. The len of the slice
  3. The cap of the slice

Once you know this, working with slices in assembly is easy: traverse them sequentially or by index as you like.

string

package main

//go:noinline
func stringParam(s string) {}

func main() {
    var x = "abcc"
    stringParam(x)
}

Output its assembly with go tool compile -S:

0x001d 00029 (stringParam.go:11)    LEAQ    go.string."abcc"(SB), AX  // Get the address of the string in the RODATA section
0x0024 00036 (stringParam.go:11)    MOVQ    AX, (SP) // Put the address at the top of the stack as the first argument
0x0028 00040 (stringParam.go:11)    MOVQ    $4, 8(SP) // Pass the string length as the second argument
0x0031 00049 (stringParam.go:11)    PCDATA  $0, $0 // gc related
0x0031 00049 (stringParam.go:11)    CALL    "".stringParam(SB) // Call stringParam

At the assembly level a string is an address + a string length.

struct

When passed as an argument to a function, the struct will be expanded on the caller’s stack and passed to the corresponding callee:

struct.go

package main

type address struct {
    lng int
    lat int
}

type person struct {
    age    int
    height int
    addr   address
}

func readStruct(p person) (int, int, int, int)

func main() {
    var p = person{
        age:    99,
        height: 88,
        addr: address{
            lng: 77,
            lat: 66,
        },
    }
    a, b, c, d := readStruct(p)
    println(a, b, c, d)
}

struct.s

#include "textflag.h"

TEXT ·readStruct(SB), NOSPLIT, $0-64
    MOVQ arg0+0(FP), AX
    MOVQ AX, ret0+32(FP)
    MOVQ arg1+8(FP), AX
    MOVQ AX, ret1+40(FP)
    MOVQ arg2+16(FP), AX
    MOVQ AX, ret2+48(FP)
    MOVQ arg3+24(FP), AX
    MOVQ AX, ret3+56(FP)
    RET

The above program outputs 99 88 77 66, indicating that even nested structs are laid out contiguously in memory.

map

Use go tool compile -S to see what a map does when a key is assigned:

m.go:

package main

func main() {
    var m = map[int]int{}
    m[43] = 1
    var n = map[string]int{}
    n["abc"] = 1
    println(m, n)
}

Take a look at the output for line 7:

0x0085 00133 (m.go:7)   LEAQ    type.map[int]int(SB), AX
0x008c 00140 (m.go:7)   MOVQ    AX, (SP)
0x0090 00144 (m.go:7)   LEAQ    ""..autotmp_2+232(SP), AX
0x0098 00152 (m.go:7)   MOVQ    AX, 8(SP)
0x009d 00157 (m.go:7)   MOVQ    $43, 16(SP)
0x00a6 00166 (m.go:7)   PCDATA  $0, $1
0x00a6 00166 (m.go:7)   CALL    runtime.mapassign_fast64(SB)
0x00ab 00171 (m.go:7)   MOVQ    24(SP), AX
0x00b0 00176 (m.go:7)   MOVQ    $1, (AX)

We have already analyzed how a function is called. The first few lines here prepare the arguments for runtime.mapassign_fast64(SB). Look in the runtime package for the signature of this function:

func mapassign_fast64(t *maptype, h *hmap, key uint64) unsafe.Pointer {

Without looking at the implementation of the function, we can roughly infer the relationship between its inputs and outputs by matching the input parameters to the assembly instructions:

t *maptype
=>
LEAQ    type.map[int]int(SB), AX
MOVQ    AX, (SP)

h *hmap
=>
LEAQ    ""..autotmp_2+232(SP), AX
MOVQ    AX, 8(SP)

key uint64
=>
MOVQ    $43, 16(SP)

The return value is the memory address, for this key, to which the value can be written. After obtaining the address, we write the value we want:

MOVQ    24(SP), AX
MOVQ    $1, (AX)

The whole process is rather involved, though we could reproduce it by hand. If you are interested, you can try it yourself.

Overall, manipulating maps in assembly is not a wise choice.

channel

A channel is also a complex data structure in the runtime. Operating on one at the assembly level really means calling the functions in the runtime's chan.go, which is similar to the map case.

Get goroutine id

Each Go goroutine is a structure called g with its own unique id. The runtime does not expose this id, but many people want to get it anyway, so a variety of libraries for obtaining the goroutine id have appeared.

As mentioned in the struct section, a struct is itself a contiguous piece of memory. If we know its starting address and the offset of a field, we can easily read that data out:

go_tls.h:

#ifdef GOARCH_arm
#define LR R14
#endif

#ifdef GOARCH_amd64
#define    get_tls(r)    MOVQ TLS, r
#define    g(r)    0(r)(TLS*1)
#endif

#ifdef GOARCH_amd64p32
#define    get_tls(r)    MOVL TLS, r
#define    g(r)    0(r)(TLS*1)
#endif

#ifdef GOARCH_386
#define    get_tls(r)    MOVL TLS, r
#define    g(r)    0(r)(TLS*1)
#endif

goid.go:

package goroutineid

import "runtime"

var offsetDict = map[string]int64{
    // ... some lines omitted
    "go1.7":   192,
    "go1.7.1": 192,
    "go1.7.2": 192,
    "go1.7.3": 192,
    "go1.7.4": 192,
    "go1.7.5": 192,
    "go1.7.6": 192,
    // ... some lines omitted
}

var offset = offsetDict[runtime.Version()]

// GetGoID returns the goroutine id
func GetGoID() int64 {
    return getGoID(offset)
}

func getGoID(off int64) int64

goid.s:

#include "textflag.h"
#include "go_tls.h"

// func getGoID(off int64) int64
TEXT ·getGoID(SB), NOSPLIT, $0-16
    get_tls(CX)
    MOVQ g(CX), AX
    MOVQ offset(FP), BX
    LEAQ 0(AX)(BX*1), DX
    MOVQ (DX), AX
    MOVQ AX, ret+8(FP)
    RET

This gives us a simple little library that reads the goid field from struct g. It is published here as a toy:

github.com/cch123/goro…

SIMD

SIMD is short for Single Instruction, Multiple Data. The SIMD instruction sets on Intel platforms are, in order of appearance, SSE, AVX, AVX2, and AVX512. These instruction sets introduce additional instructions and wider registers, such as:

  • 128-bit registers XMM0~XMM31
  • 256-bit registers YMM0~YMM31
  • 512-bit registers ZMM0~ZMM31

The relationship between these registers is similar to that between RAX, EAX, and AX. The instructions can move or operate on multiple pieces of data at once, for example:

  • movups: move four unaligned single-precision values to an XMM register or memory
  • movaps: move four aligned single-precision values to an XMM register or memory

When we pass an array as an argument to a function, there is a good chance of seeing this in action, for example:

arr_par.go:

package main

import "fmt"

func pr(input [3]int) {
    fmt.Println(input)
}

func main() {
    pr([3]int{1, 2, 3})
}

go tool compile -S:

0x001d 00029 (arr_par.go:10)    MOVQ    "".statictmp_0(SB), AX
0x0024 00036 (arr_par.go:10)    MOVQ    AX, (SP)
0x0028 00040 (arr_par.go:10)    MOVUPS  "".statictmp_0+8(SB), X0
0x002f 00047 (arr_par.go:10)    MOVUPS  X0, 8(SP)
0x0034 00052 (arr_par.go:10)    CALL    "".pr(SB)

As you can see, the compiler has taken performance into account in some cases, helping us optimize data transport using the SIMD instruction set.

Because the topic of SIMD itself is quite broad, I will not elaborate here.

Special thanks

During my research, whenever I ran into something I did not understand, I pestered Zhuo, the author of mzh.io/. Special thanks to him for the many clues and hints.

Resources

  1. quasilyte.github.io/blog/post/g…
  2. davidwong.fr/goasm
  3. www.doxsey.net/blog/go-and…
  4. github.com/golang/go/f…
  5. golang.org/doc/asm

Note on reference [4]: the slide includes the caller's return address in the callee stack frame.
