@@ -0,0 +1,332 @@
+This is a living document and at times it will be out of date. It is
+intended to articulate how programming in the Go runtime differs from
+writing normal Go. It focuses on pervasive concepts rather than
+details of particular interfaces.
+
+Scheduler structures
+====================
+
+The scheduler manages three types of resources that pervade the
+runtime: Gs, Ms, and Ps. It's important to understand these even if
+you're not working on the scheduler.
+
+Gs, Ms, Ps
+----------
+
+A "G" is simply a goroutine. It's represented by type `g`. When a
+goroutine exits, its `g` object is returned to a pool of free `g`s and
+can later be reused for some other goroutine.
+
+An "M" is an OS thread that can be executing user Go code, runtime
+code, or a system call, or can be idle. It's represented by type `m`. There
+can be any number of Ms at a time since any number of threads may be
+blocked in system calls.
+
+Finally, a "P" represents the resources required to execute user Go
+code, such as scheduler and memory allocator state. It's represented
+by type `p`. There are exactly `GOMAXPROCS` Ps. A P can be thought of
+as a CPU in the OS scheduler and the contents of the `p` type as
+per-CPU state. This is a good place to put state that needs to be
+sharded for efficiency, but doesn't need to be per-thread or
+per-goroutine.
+
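As a rough user-level illustration (ordinary Go, not runtime-internal code), the number of Ps can be observed through the public `runtime` package. The helper name `numPs` is made up for this sketch; `runtime.GOMAXPROCS(0)` only queries the setting and does not change it.

```go
package main

import (
	"fmt"
	"runtime"
)

// numPs (a hypothetical helper) reports the current GOMAXPROCS setting,
// which is the number of Ps. Passing 0 queries the value without
// changing it.
func numPs() int {
	return runtime.GOMAXPROCS(0)
}

func main() {
	fmt.Println("Ps (GOMAXPROCS):", numPs())
	fmt.Println("logical CPUs:", runtime.NumCPU())
	fmt.Println("live goroutines (Gs):", runtime.NumGoroutine())
}
```

The number of Ms has no equivalent fixed setting, since threads blocked in system calls do not hold a P.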
+The scheduler's job is to match up a G (the code to execute), an M
+(where to execute it), and a P (the rights and resources to execute
+it). When an M stops executing user Go code, for example by entering a
+system call, it returns its P to the idle P pool. In order to resume
+executing user Go code, for example on return from a system call, it
+must acquire a P from the idle pool.
+
+All `g`, `m`, and `p` objects are heap allocated, but are never freed,
+so their memory remains type stable. As a result, the runtime can
+avoid write barriers in the depths of the scheduler.
+
+`getg()` and `getg().m.curg`
+----------------------------
+
+To get the current user `g`, use `getg().m.curg`.
+
+`getg()` alone returns the current `g`, but when executing on the
+system or signal stacks, this will return the current M's "g0" or
+"gsignal", respectively. This is usually not what you want.
+
+To determine if you're running on the user stack or the system stack,
+use `getg() == getg().m.curg`.
+
+Stacks
+======
+
+Every non-dead G has a *user stack* associated with it, which is what
+user Go code executes on. User stacks start small (e.g., 2K) and grow
+or shrink dynamically.
+
+Every M has a *system stack* associated with it (also known as the M's
+"g0" stack because it's implemented as a stub G) and, on Unix
+platforms, a *signal stack* (also known as the M's "gsignal" stack).
+System and signal stacks cannot grow, but are large enough to execute
+runtime and cgo code (8K in a pure Go binary; system-allocated in a
+cgo binary).
+
+Runtime code often temporarily switches to the system stack using
+`systemstack`, `mcall`, or `asmcgocall` to perform tasks that must not
+be preempted, that must not grow the user stack, or that switch user
+goroutines. Code running on the system stack is implicitly
+non-preemptible and the garbage collector does not scan system stacks.
+While running on the system stack, the current user stack is not used
+for execution.
+
+nosplit functions
+-----------------
+
+Most functions start with a prologue that inspects the stack pointer
+and the current G's stack bound and calls `morestack` if the stack
+needs to grow.
+
+Functions can be marked `//go:nosplit` (or `NOSPLIT` in assembly) to
+indicate that they should not get this prologue. This has several
+uses:
+
+- Functions that must run on the user stack, but must not call into
+ stack growth, for example because this would cause a deadlock, or
+ because they have untyped words on the stack.
+
+- Functions that must not be preempted on entry.
+
+- Functions that may run without a valid G. For example, functions
+ that run in early runtime start-up, or that may be entered from C
+ code such as cgo callbacks or the signal handler.
+
+Splittable functions ensure there's some amount of space on the stack
+for nosplit functions to run in and the linker checks that any static
+chain of nosplit function calls cannot exceed this bound.
+
+Any function with a `//go:nosplit` annotation should explain why it is
+nosplit in its documentation comment.
+
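A minimal sketch of that documentation convention, using a trivial leaf function. The function `add` and its stated reason are hypothetical and exist only to show where the directive and the explanation go; real runtime functions carry reasons like those listed above.

```go
package main

import "fmt"

// add returns a + b.
//
// It is marked nosplit purely for illustration (hypothetical reason):
// assume it may be called from a context that must not grow the stack.
// As a small leaf function it needs very little stack space.
//
//go:nosplit
func add(a, b int) int {
	return a + b
}

func main() {
	fmt.Println(add(2, 3))
}
```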
+Error handling and reporting
+============================
+
+Errors that can reasonably be recovered from in user code should use
+`panic` as usual. However, there are some situations where `panic`
+will cause an immediate fatal error, such as when called on the system
+stack or when called during `mallocgc`.
+
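From the user's side, a recoverable runtime panic arrives as a value implementing `runtime.Error`. The sketch below is ordinary user-level Go rather than runtime code, and `safeDiv` is a hypothetical helper; it shows what "recoverable in user code" looks like in practice.

```go
package main

import (
	"fmt"
	"runtime"
)

// safeDiv (hypothetical) divides a by b and converts a runtime panic,
// here integer division by zero, into an ordinary error.
func safeDiv(a, b int) (q int, err error) {
	defer func() {
		if r := recover(); r != nil {
			if re, ok := r.(runtime.Error); ok {
				err = re
				return
			}
			panic(r) // not a runtime error: propagate it unchanged
		}
	}()
	return a / b, nil
}

func main() {
	fmt.Println(safeDiv(6, 3))
	fmt.Println(safeDiv(1, 0))
}
```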
+Most errors in the runtime are not recoverable. For these, use
+`throw`, which dumps the traceback and immediately terminates the
+process. In general, `throw` should be passed a string constant to
+avoid allocating in perilous situations. By convention, additional
+details are printed before `throw` using `print` or `println` and the
+messages are prefixed with "runtime:".
+
+For unrecoverable errors where user code is expected to be at fault for the
+failure (such as racing map writes), use `fatal`.
+
+For runtime error debugging, it may be useful to run with `GOTRACEBACK=system`
+or `GOTRACEBACK=crash`. The output of `panic` and `fatal` is as described by
+`GOTRACEBACK`. The output of `throw` always includes runtime frames, metadata
+and all goroutines regardless of `GOTRACEBACK` (i.e., equivalent to
+`GOTRACEBACK=system`). Whether `throw` crashes or not is still controlled by
+`GOTRACEBACK`.
+
+Synchronization
+===============
+
+The runtime has multiple synchronization mechanisms. They differ in
+semantics and, in particular, in whether they interact with the
+goroutine scheduler or the OS scheduler.
+
+The simplest is `mutex`, which is manipulated using `lock` and
+`unlock`. This should be used to protect shared structures for short
+periods. Blocking on a `mutex` directly blocks the M, without
+interacting with the Go scheduler. This means it is safe to use from
+the lowest levels of the runtime, but also prevents any associated G
+and P from being rescheduled. `rwmutex` is similar.
+
+For one-shot notifications, use `note`, which provides `notesleep` and
+`notewakeup`. Unlike traditional UNIX `sleep`/`wakeup`, `note`s are
+race-free, so `notesleep` returns immediately if the `notewakeup` has
+already happened. A `note` can be reset after use with `noteclear`,
+which must not race with a sleep or wakeup. Like `mutex`, blocking on
+a `note` blocks the M. However, there are different ways to sleep on a
+`note`: `notesleep` also prevents rescheduling of any associated G and
+P, while `notetsleepg` acts like a blocking system call that allows
+the P to be reused to run another G. This is still less efficient than
+blocking the G directly since it consumes an M.
+
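The race-free one-shot behavior can be mimicked in ordinary Go. This is only an analogy, and `oneShot`, `newOneShot`, `wakeup`, and `sleep` are hypothetical names. Because the sketch is built on a channel, sleeping parks the goroutine instead of blocking the M as a real `note` does. It also tolerates repeated wakeups, which is a choice of this sketch and not a statement about `notewakeup`.

```go
package main

import (
	"fmt"
	"sync"
)

// oneShot is a user-level stand-in for a one-shot notification.
type oneShot struct {
	once sync.Once
	done chan struct{}
}

func newOneShot() *oneShot {
	return &oneShot{done: make(chan struct{})}
}

// wakeup signals the notification. Closing the channel makes the signal
// sticky, so it is observed no matter when sleep is called.
func (n *oneShot) wakeup() {
	n.once.Do(func() { close(n.done) })
}

// sleep blocks until wakeup has been called. If wakeup already
// happened, the receive on the closed channel returns immediately.
func (n *oneShot) sleep() {
	<-n.done
}

func main() {
	n := newOneShot()
	n.wakeup() // wakeup before sleep: no lost notification
	n.sleep()
	fmt.Println("woke up")
}
```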
+To interact directly with the goroutine scheduler, use `gopark` and
+`goready`. `gopark` parks the current goroutine—putting it in the
+"waiting" state and removing it from the scheduler's run queue—and
+schedules another goroutine on the current M/P. `goready` puts a
+parked goroutine back in the "runnable" state and adds it to the run
+queue.
+
+In summary,
+
+<table>
+<tr><th></th><th colspan="3">Blocks</th></tr>
+<tr><th>Interface</th><th>G</th><th>M</th><th>P</th></tr>
+<tr><td>(rw)mutex</td><td>Y</td><td>Y</td><td>Y</td></tr>
+<tr><td>note</td><td>Y</td><td>Y</td><td>Y/N</td></tr>
+<tr><td>park</td><td>Y</td><td>N</td><td>N</td></tr>
+</table>
+
+Atomics
+=======
+
+The runtime uses its own atomics package at `runtime/internal/atomic`.
+This corresponds to `sync/atomic`, but functions have different names
+for historical reasons and there are a few additional functions needed
+by the runtime.
+
+In general, we think hard about the uses of atomics in the runtime and
+try to avoid unnecessary atomic operations. If access to a variable is
+sometimes protected by another synchronization mechanism, the
+already-protected accesses generally don't need to be atomic. There
+are several reasons for this:
+
+1. Using non-atomic or atomic access where appropriate makes the code
+ more self-documenting. Atomic access to a variable implies there's
+ somewhere else that may concurrently access the variable.
+
+2. Non-atomic access allows for automatic race detection. The runtime
+ doesn't currently have a race detector, but it may in the future.
+ Atomic access defeats the race detector, while non-atomic access
+ allows the race detector to check your assumptions.
+
+3. Non-atomic access may improve performance.
+
+Of course, any non-atomic access to a shared variable should be
+documented to explain how that access is protected.
+
+Some common patterns that mix atomic and non-atomic access are:
+
+* Read-mostly variables where updates are protected by a lock. Within
+ the locked region, reads do not need to be atomic, but the write
+ does. Outside the locked region, reads need to be atomic.
+
+* Reads that only happen during STW, where no writes can happen during
+ STW, do not need to be atomic.
+
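The first pattern can be sketched in user-level Go. This sketch uses `sync/atomic` as a stand-in for the runtime's own atomics package, and the `config` type with its `raise` and `get` methods is hypothetical. Each non-atomic access carries a comment explaining how it is protected.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// config holds a read-mostly limit.
type config struct {
	// limit is written atomically with mu held. It may be read
	// atomically without mu, or non-atomically with mu held.
	limit int64
	mu    sync.Mutex
}

// raise sets limit to v if v is larger than the current value.
func (c *config) raise(v int64) {
	c.mu.Lock()
	defer c.mu.Unlock()
	// Non-atomic read: mu excludes every writer, so nothing can
	// modify limit while we hold the lock.
	if v > c.limit {
		// Atomic write: lock-free readers may load concurrently.
		atomic.StoreInt64(&c.limit, v)
	}
}

// get reads limit without taking mu, so the load must be atomic.
func (c *config) get() int64 {
	return atomic.LoadInt64(&c.limit)
}

func main() {
	var c config
	var wg sync.WaitGroup
	for i := int64(1); i <= 4; i++ {
		wg.Add(1)
		go func(v int64) {
			defer wg.Done()
			c.raise(v * 10)
			_ = c.get() // lock-free read, may overlap with raise
		}(i)
	}
	wg.Wait()
	fmt.Println("limit:", c.get())
}
```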
+That said, the advice from the Go memory model stands: "Don't be
+[too] clever." The performance of the runtime matters, but its
+robustness matters more.
+
+Unmanaged memory
+================
+
+In general, the runtime tries to use regular heap allocation. However,
+in some cases the runtime must allocate objects outside of the garbage
+collected heap, in *unmanaged memory*. This is necessary if the
+objects are part of the memory manager itself or if they must be
+allocated in situations where the caller may not have a P.
+
+There are three mechanisms for allocating unmanaged memory:
+
+* `sysAlloc` obtains memory directly from the OS. This comes in whole
+ multiples of the system page size, but it can be freed with `sysFree`.
+
+* `persistentalloc` combines multiple smaller allocations into a single
+ sysAlloc to avoid fragmentation. However, there is no way to free
+ persistentalloced objects (hence the name).
+
+* `fixalloc` is a SLAB-style allocator that allocates objects of a fixed
+ size. fixalloced objects can be freed, but this memory can only be
+ reused by the same fixalloc pool, so it can only be reused for
+ objects of the same type.
+
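To make the reuse rule of a fixed-size allocator concrete, here is a small user-level sketch of a free-list pool. It is not the runtime's `fixalloc`: the names `fixedPool`, `alloc`, `release`, `chunkSize`, and `span` are hypothetical, and the backing memory here comes from the ordinary garbage-collected heap rather than from unmanaged memory. Clearing objects on reuse is likewise a choice made for this sketch.

```go
package main

import "fmt"

// chunkSize is the number of objects carved from each backing chunk
// (an arbitrary value for this sketch).
const chunkSize = 64

// fixedPool hands out *T values and recycles freed ones through a free
// list, so freed memory is only ever reused for objects of type T.
type fixedPool[T any] struct {
	free  []*T // freed objects available for reuse
	chunk []T  // remainder of the current backing chunk
}

// alloc returns a zeroed object, preferring a previously freed one.
func (p *fixedPool[T]) alloc() *T {
	if n := len(p.free); n > 0 {
		obj := p.free[n-1]
		p.free = p.free[:n-1]
		var zero T
		*obj = zero // clear stale contents before reuse
		return obj
	}
	if len(p.chunk) == 0 {
		p.chunk = make([]T, chunkSize)
	}
	obj := &p.chunk[0]
	p.chunk = p.chunk[1:]
	return obj
}

// release returns obj to this pool's free list.
func (p *fixedPool[T]) release(obj *T) {
	p.free = append(p.free, obj)
}

// span is a stand-in object type for the example.
type span struct{ start, npages uintptr }

func main() {
	var pool fixedPool[span]
	a := pool.alloc()
	a.npages = 4
	pool.release(a)
	b := pool.alloc() // reuses the memory that backed a
	fmt.Println(a == b, b.npages)
}
```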
+In general, types that are allocated using any of these should be
+marked as not in heap by embedding `runtime/internal/sys.NotInHeap`.
+
+Objects that are allocated in unmanaged memory **must not** contain
+heap pointers unless the following rules are also obeyed:
+
+1. Any pointers from unmanaged memory to the heap must be garbage
+ collection roots. More specifically, any pointer must either be
+ accessible through a global variable or be added as an explicit
+ garbage collection root in `runtime.markroot`.
+
+2. If the memory is reused, the heap pointers must be zero-initialized
+ before they become visible as GC roots. Otherwise, the GC may
+ observe stale heap pointers. See "Zero-initialization versus
+ zeroing".
+
+Zero-initialization versus zeroing
+==================================
+
+There are two types of zeroing in the runtime, depending on whether
+the memory is already initialized to a type-safe state.
+
+If memory is not in a type-safe state, meaning it potentially contains
+"garbage" because it was just allocated and it is being initialized
+for first use, then it must be *zero-initialized* using
+`memclrNoHeapPointers` or non-pointer writes. This does not perform
+write barriers.
+
+If memory is already in a type-safe state and is simply being set to
+the zero value, this must be done using regular writes, `typedmemclr`,
+or `memclrHasPointers`. This performs write barriers.
+
+Runtime-only compiler directives
+================================
+
+In addition to the "//go:" directives documented in "go doc compile",
+the compiler supports additional directives only in the runtime.
+
+go:systemstack
+--------------
+
+`go:systemstack` indicates that a function must run on the system
+stack. This is checked dynamically by a special function prologue.
+
+go:nowritebarrier
+-----------------
+
+`go:nowritebarrier` directs the compiler to emit an error if the
+following function contains any write barriers. (It *does not*
+suppress the generation of write barriers; it is simply an assertion.)
+
+Usually you want `go:nowritebarrierrec`. `go:nowritebarrier` is
+primarily useful in situations where it's "nice" not to have write
+barriers, but not required for correctness.
+
+go:nowritebarrierrec and go:yeswritebarrierrec
+----------------------------------------------
+
+`go:nowritebarrierrec` directs the compiler to emit an error if the
+following function or any function it calls recursively, up to a
+`go:yeswritebarrierrec`, contains a write barrier.
+
+Logically, the compiler floods the call graph starting from each
+`go:nowritebarrierrec` function and produces an error if it encounters
+a function containing a write barrier. This flood stops at
+`go:yeswritebarrierrec` functions.
+
+`go:nowritebarrierrec` is used in the implementation of the write
+barrier to prevent infinite loops.
+
+Both directives are used in the scheduler. The write barrier requires
+an active P (`getg().m.p != nil`) and scheduler code often runs
+without an active P. In this case, `go:nowritebarrierrec` is used on
+functions that release the P or may run without a P and
+`go:yeswritebarrierrec` is used when code re-acquires an active P.
+Since these are function-level annotations, code that releases or
+acquires a P may need to be split across two functions.
+
+go:uintptrkeepalive
+-------------------
+
+The `//go:uintptrkeepalive` directive must be followed by a function declaration.
+
+It specifies that the function's uintptr arguments may be pointer values that
+have been converted to uintptr and must be kept alive for the duration of the
+call, even though from the types alone it would appear that the object is no
+longer needed during the call.
+
+This directive is similar to `//go:uintptrescapes`, but it does not force
+arguments to escape. Since stack growth does not understand these arguments,
+this directive must be used with `//go:nosplit` (in the marked function and all
+transitive calls) to prevent stack growth.
+
+The conversion from pointer to uintptr must appear in the argument list of any
+call to this function. This directive is used for some low-level system call
+implementations.