Find the program entry

package main

func main() {}

After go build main.go, start GDB with sudo gdb main (I'm on macOS; GDB hangs if sudo is omitted), then run info files inside GDB to find the executable's entry point.

 $  sudo gdb main
Password:
GNU gdb (GDB) 11.1
Copyright (C) 2021 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-apple-darwin20.4.0".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word".
Reading symbols from main...
(No debugging symbols found in main)
Loading Go Runtime support.
(gdb) info files
Symbols from "goroutine/main".
Local exec file:
	`goroutine/main', file type mach-o-x86-64.
	Entry point: 0x105c660
	0x0000000001001000 - 0x000000000105e270 is .text
	0x000000000105e280 - 0x000000000105e35e is __TEXT.__symbol_stub1
	0x000000000105e360 - 0x000000000108b0eb is __TEXT.__rodata
	0x000000000108b100 - 0x000000000108b59c is __TEXT.__typelink
	0x000000000108b5a0 - 0x000000000108b5a8 is __TEXT.__itablink
	0x000000000108b5a8 - 0x000000000108b5a8 is __TEXT.__gosymtab
	0x000000000108b5c0 - 0x00000000010c7658 is __TEXT.__gopclntab
	0x00000000010c8000 - 0x00000000010c8020 is __DATA.__go_buildinfo
	0x00000000010c8020 - 0x00000000010c8148 is __DATA.__nl_symbol_ptr
	0x00000000010c8160 - 0x00000000010c9360 is __DATA.__noptrdata
	0x00000000010c9360 - 0x00000000010cb150 is .data
	0x00000000010cb160 - 0x00000000010f8470 is .bss
	0x00000000010f8480 - 0x00000000010fd570 is __DATA.__noptrbss

The entry point is 0x105c660, so let's set a breakpoint there.

(gdb) b *0x105c660
Breakpoint 1 at 0x105c660
(gdb) r
Starting program: goroutine/main
[New Thread 0x1f03 of process 10002]
[New Thread 0x2303 of process 10002]
warning: unhandled dyld version (17)

Thread 2 hit Breakpoint 1, 0x000000000105c660 in _rt0_amd64_darwin ()
(gdb)

This lands us in the entry function _rt0_amd64_darwin.

rt0_darwin_amd64.s:7

TEXT _rt0_amd64_darwin(SB),NOSPLIT,$-8
   JMP    _rt0_amd64(SB)

This function does nothing but JMP to _rt0_amd64, so let's continue there.

asm_amd64.s:15

TEXT _rt0_amd64(SB),NOSPLIT,$-8
   MOVQ   0(SP), DI  // DI = argc
   LEAQ   8(SP), SI  // SI = &argv
   JMP    runtime·rt0_go(SB)

argc is loaded into DI, &argv into SI, and then we JMP to runtime·rt0_go.

asm_amd64.s:81

TEXT runtime·rt0_go(SB),NOSPLIT|TOPFRAME,$0
   // copy arguments forward on an even stack
   MOVQ   DI, AX    // argc
   MOVQ   SI, BX    // &argv
   SUBQ   $(4*8+7), SP      // 2args 2auto
   // The purpose is to align SP with 16 bytes
   ANDQ   $~15, SP
   MOVQ   AX, 16(SP)
   MOVQ   BX, 24(SP)

These instructions do the following:

  1. Align SP to a 16-byte boundary
  2. Copy argc and argv to their new locations

The resulting stack layout is shown below.
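As a rough illustration, the 16-byte alignment performed by ANDQ $~15, SP can be mimicked in Go; alignDown16 is a hypothetical helper for this article, not runtime code:

```go
package main

import "fmt"

// alignDown16 mimics ANDQ $~15, SP: clearing the low four bits
// rounds an address down to a multiple of 16.
func alignDown16(sp uintptr) uintptr {
	return sp &^ 15 // &^ is Go's AND NOT, same as sp & ~15 in assembly
}

func main() {
	fmt.Printf("%#x\n", alignDown16(0x10c8037)) // prints 0x10c8030
}
```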

Initialize g0

asm_amd64.s:92

// create istack out of the given (operating system) stack.
// _cgo_init may update stackguard.
MOVQ   $runtime·g0(SB), DI        // DI = &g0
LEAQ   (-64*1024+104)(SP), BX     // BX = SP - 64*1024 + 104
MOVQ   BX, g_stackguard0(DI)      // g0.stackguard0 = SP - 64*1024 + 104
MOVQ   BX, g_stackguard1(DI)      // g0.stackguard1 = SP - 64*1024 + 104
MOVQ   BX, (g_stack+stack_lo)(DI) // g0.stack.lo = SP - 64*1024 + 104
MOVQ   SP, (g_stack+stack_hi)(DI) // g0.stack.hi = SP

Here g0's stack is initialized to roughly 64 KB; the layout of g0 and its stack is shown below:
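These bounds can be sketched in Go; the stack struct mirrors the runtime's, but g0Stack is an illustrative stand-in, not the runtime's code:

```go
package main

import "fmt"

// stack mirrors the runtime's stack struct: the [lo, hi) address range.
type stack struct{ lo, hi uintptr }

// g0Stack carves g0's bounds out of the OS stack the way rt0_go does:
// hi is the current SP, and lo sits 64 KB minus 104 bytes below it.
func g0Stack(sp uintptr) stack {
	return stack{lo: sp - 64*1024 + 104, hi: sp}
}

func main() {
	s := g0Stack(0x7ff000000000)
	fmt.Println(s.hi - s.lo) // prints 65432, i.e. just under 64 KB
}
```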

The main thread is bound to M0

On macOS, execution jumps straight to the ok label.

asm_amd64.s:181

ok:
   // set the per-goroutine and per-mach "registers"
   // #define get_tls(r) MOVQ TLS, r
   get_tls(BX)                  // BX = TLS
   LEAQ   runtime·g0(SB), CX    // CX = &g0
   MOVQ   CX, g(BX)             // TLS.g = &g0
   LEAQ   runtime·m0(SB), AX    // AX = &m0

   // save m->g0 = g0
   MOVQ   CX, m_g0(AX)          // m0.g0 = &g0
   // save m0 to g0->m
   MOVQ   AX, g_m(CX)           // g0.m = &m0

At this point, the relationship between m0, g0, and g0's stack is as follows:

Initialize m0 and P

asm_amd64.s:206

MOVL   16(SP), AX            // copy argc
MOVL   AX, 0(SP)
MOVQ   24(SP), AX            // copy argv
MOVQ   AX, 8(SP)
CALL   runtime·args(SB)      // handle the arguments
CALL   runtime·osinit(SB)    // mainly gets the number of CPU cores
CALL   runtime·schedinit(SB)

argc and &argv are moved down the stack as arguments for the upcoming function calls; the stack layout is shown below. Next, the scheduler is initialized.

proc.go:654

// The bootstrap sequence is:
//
// call osinit
// call schedinit
// make & queue new G
// call runtime·mstart
//
// The new G calls runtime·main.
func schedinit() {
   (...)
   // raceinit must be the first call to race detector.
   // In particular, it must be done before mallocinit below calls racemapshadow.
   _g_ := getg()                // _g_ = g0
   if raceenabled {
      _g_.racectx, raceprocctx0 = raceinit()
   }

   sched.maxmcount = 10000      // the maximum number of Ms is 10000
   (...)
   mcommoninit(_g_.m, -1)       // initialize m0; _g_.m is m0
   (...)
   lock(&sched.lock)
   sched.lastpoll = uint64(nanotime())
   procs := ncpu
   if n, ok := atoi32(gogetenv("GOMAXPROCS")); ok && n > 0 {
      procs = n
   }
   if procresize(procs) != nil {
      throw("unknown runnable goroutine during bootstrap")
   }
   (...)
}

In general, schedinit does two things here:

  1. Initialize m0
  2. Initialize P

So how is m0 initialized?

// Pre-allocated ID may be passed as 'id', or omitted by passing -1.
func mcommoninit(mp *m, id int64) {
   _g_ := getg()                  // g=g0

   // g0 stack won't make sense for user (and is not necessary unwindable).
   if _g_ != _g_.m.g0 {          // it must be equal here
      callers(1, mp.createstack[:])
   }

   lock(&sched.lock)            

   if id >= 0 {
      mp.id = id
   } else {
      mp.id = mReserveID()       // Check if the number of m is greater than 10000
   }

   mp.fastrand[0] = uint32(int64Hash(uint64(mp.id), fastrandseed))
   mp.fastrand[1] = uint32(int64Hash(uint64(cputicks()), ^fastrandseed))
   if mp.fastrand[0]|mp.fastrand[1] == 0 {
      mp.fastrand[1] = 1
   }

   mpreinit(mp)
   if mp.gsignal != nil {
      mp.gsignal.stackguard1 = mp.gsignal.stack.lo + _StackGuard
   }

   // Add to allm so garbage collector doesn't free g->m
   // when it is just in a register or thread-local storage.
   mp.alllink = allm            // add m to allM

   // NumCgoCall() iterates over allm w/o schedlock,
   // so we need to publish it safely.
   atomicstorep(unsafe.Pointer(&allm), unsafe.Pointer(mp))  // publish mp as the new head of allm
   unlock(&sched.lock)

   
}

This function mainly links m0 into allm; the resulting relationship is shown below:
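The head insertion into allm can be sketched as follows; mrec and addM are hypothetical names for illustration, not the runtime's:

```go
package main

import "fmt"

// mrec mirrors the two fields of m that matter here: an id and the
// alllink pointer that chains every m together.
type mrec struct {
	id      int64
	alllink *mrec
}

var allm *mrec

// addM mimics mcommoninit's publishing step: the new m's alllink points
// at the old head, then allm is updated to the new m.
func addM(mp *mrec) {
	mp.alllink = allm
	allm = mp
}

func main() {
	addM(&mrec{id: 0}) // m0
	addM(&mrec{id: 1})
	fmt.Println(allm.id, allm.alllink.id) // prints 1 0
}
```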

proc.go:4994

// Change number of processors.
//
// sched.lock must be held, and the world must be stopped.
//
// gcworkbufs must not be being modified by either the GC or the write barrier
// code, so the GC must not be running if the number of Ps actually changes.
//
// Returns list of Ps with local work, they need to be scheduled by the caller.
func procresize(nprocs int32) *p {
   assertLockHeld(&sched.lock)
   assertWorldStopped()

   old := gomaxprocs   // Initialization phase old=0
   if old < 0 || nprocs <= 0 {
      throw("procresize: invalid arg")
   }
   (...)

   // Grow allp if necessary.
   // len(allp) == 0 during initialization
   if nprocs > int32(len(allp)) {
      // Synchronize with retake, which could be running
      // concurrently since it doesn't run on a P.
      lock(&allpLock)
      if nprocs <= int32(cap(allp)) {
         allp = allp[:nprocs]
      } else {
         // The initialization phase is executed here
         nallp := make([]*p, nprocs)
         // Copy everything up to allp's cap so we
         // never lose old allocated Ps.
         copy(nallp, allp[:cap(allp)])
         allp = nallp
      }

      (...)
      
      unlock(&allpLock)
   }

   // initialize new P's
   for i := old; i < nprocs; i++ {
      pp := allp[i]
      if pp == nil {
         pp = new(p)
      }
      pp.init(i)  // Initialize pp, pp.id = id; pp.status = _Pgcstop
      atomicstorep(unsafe.Pointer(&allp[i]), unsafe.Pointer(pp))
   }

   _g_ := getg()     // _g_=g0
   if _g_.m.p != 0 && _g_.m.p.ptr().id < nprocs {  // during initialization, m.p is empty
      // continue to use the current P
      _g_.m.p.ptr().status = _Prunning
      _g_.m.p.ptr().mcache.prepareForSweep()
   } else {
      // release the current P and acquire allp[0].
      //
      // We must do this before destroying our current P
      // because p.destroy itself has write barriers, so we
      // need to do that from a valid P.
      if _g_.m.p != 0 {       // during initialization, m.p is empty
         if trace.enabled {
            // Pretend that we were descheduled
            // and then scheduled again to keep
            // the trace sane.
            traceGoSched()
            traceProcStop(_g_.m.p.ptr())
         }
         _g_.m.p.ptr().m = 0
      }
      _g_.m.p = 0
      p := allp[0]
      p.m = 0
      p.status = _Pidle
      acquirep(p)  // m0.p = allp[0]; allp[0].m = m0
      if trace.enabled {  
         traceGoStart()   
      }
   }

   // g.m.p is now set, so we no longer need mcache0 for bootstrapping.
   mcache0 = nil

   // Initialization does not reach here, but later calls may.
   // This transfers resources owned by the excess Ps (for example, the Gs a P
   // holds move to the global free list) so those Ps can be parked for reuse.
   for i := nprocs; i < old; i++ {
      p := allp[i]
      p.destroy()
      // can't free P itself because it can be referenced by an M in syscall
   }
   
   // Truncate allP again to make sure the length matches the expected length
   // Trim allp.
   if int32(len(allp)) != nprocs {
      lock(&allpLock)
      allp = allp[:nprocs]
      idlepMask = idlepMask[:maskWords]
      timerpMask = timerpMask[:maskWords]
      unlock(&allpLock)
   }
   var runnablePs *p
   for i := nprocs - 1; i >= 0; i-- {
      p := allp[i]
      if _g_.m.p.ptr() == p {   // Ignore the current associated p
         continue
      }
      p.status = _Pidle
      if runqempty(p) { // During initialization, all P's have no tasks, so they are put in sched.pidle
         pidleput(p) 
      } else {         // Initialization does not go here, so the return value here is not used
         p.m.set(mget())
         p.link.set(runnablePs)
         runnablePs = p
      }
   }
   stealOrder.reset(uint32(nprocs))
   var int32p *int32 = &gomaxprocs // make compiler check that gomaxprocs is an int32
   atomic.Store((*uint32)(unsafe.Pointer(int32p)), uint32(nprocs))
   return runnablePs
}
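The grow-and-trim treatment of allp can be sketched on a plain slice; resizeAllp is a simplified, hypothetical stand-in for procresize:

```go
package main

import "fmt"

type p struct{ id int32 }

// resizeAllp mimics procresize's handling of the allp slice: grow it
// (preserving existing Ps), initialize any new slots, then trim to the
// requested length. Illustrative only, not the runtime's actual code.
func resizeAllp(allp []*p, nprocs int32) []*p {
	if nprocs > int32(len(allp)) {
		if nprocs <= int32(cap(allp)) {
			allp = allp[:nprocs]
		} else {
			nallp := make([]*p, nprocs)
			copy(nallp, allp[:cap(allp)]) // never lose old allocated Ps
			allp = nallp
		}
	}
	for i := int32(0); i < nprocs; i++ {
		if allp[i] == nil {
			allp[i] = &p{id: i} // initialize new P's
		}
	}
	return allp[:nprocs]
}

func main() {
	allp := resizeAllp(nil, 4)
	fmt.Println(len(allp), allp[3].id) // prints 4 3
	allp = resizeAllp(allp, 2)
	fmt.Println(len(allp), allp[0].id) // prints 2 0: old Ps survive the trim
}
```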

At this point, the relationship between m0, g0, g0's stack, and allp[0] is as follows:

Now that the scheduler is initialized, the next section looks at how the main function runs and how goroutines are created.