
Introduction

Go’s assembler is based on the Plan 9 assembler syntax, which differs from traditional assemblers. It operates on a semi-abstract instruction set rather than providing direct access to machine instructions. This guide explains Go’s assembly language and how to use it effectively.
Go’s assembler is not a direct representation of the underlying machine. Some details map precisely to the hardware, but others do not, and instruction selection occurs partly after code generation.

Key Concepts

Semi-Abstract Instructions

The assembler works with semi-abstract instructions:
  • A MOV might not generate a move instruction at all; the toolchain may emit a clear, a load, or another operation instead
  • Machine-specific operations tend to appear as themselves
  • General concepts (memory move, calls) are more abstract

Viewing Assembly Output

To see what assembly your Go code generates:
# Compile and show assembly
go build -gcflags=-S main.go

# Or use compile tool directly
GOOS=linux GOARCH=amd64 go tool compile -S x.go

# Disassemble compiled binary
go build -o program main.go
go tool objdump -s main.main program

Syntax and Structure

Constants

Constant expressions use Go operator precedence:
// This is 4, not 0 (parsed as (3&1)<<2, not 3&(1<<2))
3&1<<2

// Constants are always evaluated as 64-bit unsigned integers
-2  // Represented as unsigned 64-bit with same bit pattern
Division or right shift where the right operand’s high bit is set is rejected to avoid ambiguity.
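Go source uses the same precedence rules, so a short Go program can confirm how 3&1<<2 groups and which bit pattern stands for -2. This sketch only evaluates the expressions in Go; it does not run the assembler, and the helper names are made up for illustration.

```go
package main

import "fmt"

// precedence evaluates 3&1<<2 with Go's rules: & and << share a precedence
// level and associate left to right, so this means (3&1)<<2.
func precedence() uint64 {
	return 3 & 1 << 2
}

// asUnsigned returns the unsigned 64-bit value with the same bit pattern as v,
// which is how the assembler treats a constant such as -2.
func asUnsigned(v int64) uint64 {
	return uint64(v)
}

func main() {
	fmt.Println(precedence())           // 4
	fmt.Println(3 & (1 << 2))           // 0
	fmt.Printf("%#x\n", asUnsigned(-2)) // 0xfffffffffffffffe
}
```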

Symbols and Pseudo-Registers

Four predeclared pseudo-registers, virtual registers maintained by the toolchain rather than real hardware registers, are the same on all architectures:
  • FP (Frame Pointer): Arguments and locals
  • PC (Program Counter): Jumps and branches
  • SB (Static Base): Global symbols
  • SP (Stack Pointer): Top of local stack frame

Static Base (SB)

Used for global functions and data:
// Global function
TEXT runtime·profileloop(SB), NOSPLIT, $8

// Address of a global symbol ($ loads the address, not the contents)
MOVQ $runtime·profileloop1(SB), CX

// File-local symbol (like static in C)
TEXT foo<>(SB), NOSPLIT, $0

// Offset from symbol
MOVQ foo+4(SB), AX  // 4 bytes past start of foo

Frame Pointer (FP)

Access function arguments:
// Must use names with offsets
MOVQ first_arg+0(FP), AX    // First argument
MOVQ second_arg+8(FP), BX   // Second argument (after an 8-byte first argument)

// On 32-bit systems, the halves of a 64-bit value get _lo and _hi suffixes:
MOVL arg_lo+0(FP), AX
MOVL arg_hi+4(FP), DX
Plain 0(FP) is rejected - you must use a name like arg+0(FP). The name is for documentation and verification by go vet.

Stack Pointer (SP)

Access local variables and prepare function calls:
// Negative offsets from SP for locals
MOVQ x-8(SP), AX    // Local variable
MOVQ y-16(SP), BX   // Another local

// Range: [-framesize, 0)
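On architectures with a hardware SP register, the name prefix distinguishes the two, so x-8(SP) and -8(SP) are different memory locations:
  • x-8(SP) - virtual stack pointer (pseudo-register)
  • -8(SP) - hardware SP register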
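The valid range for named locals is plain interval arithmetic. The sketch below restates it in Go, assuming a local of size bytes at offset off occupies [off, off+size); localFits is a hypothetical helper, not something the assembler or go vet exposes.

```go
package main

import "fmt"

// localFits reports whether a local at offset off (as in x-8(SP)) with the
// given size lies entirely inside the valid range [-framesize, 0).
func localFits(off, size, framesize int64) bool {
	return size > 0 && off >= -framesize && off+size <= 0
}

func main() {
	const framesize = 16 // e.g. a TEXT directive with $16 as its frame size
	fmt.Println(localFits(-8, 8, framesize))  // true:  x-8(SP)
	fmt.Println(localFits(-16, 8, framesize)) // true:  y-16(SP)
	fmt.Println(localFits(-24, 8, framesize)) // false: below the frame
	fmt.Println(localFits(0, 8, framesize))   // false: offset 0 is outside [-16, 0)
}
```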

Labels and Jumps

label:
    MOVW $0, R1
    JMP label  // Jump to label

// Labels are function-local
// Multiple functions can use same label names
Direct jumps use symbols:
CALL name(SB)      // OK
JMP name(SB)       // OK  
JMP name+4(SB)     // ERROR: cannot use offset

Directives

TEXT Directive

Declares a function:
TEXT runtime·profileloop(SB), NOSPLIT, $8
    MOVQ $runtime·profileloop1(SB), CX
    MOVQ CX, 0(SP)
    CALL runtime·externalthreadhandler(SB)
    RET
Format: TEXT symbol(SB), flags, $framesize-argsize
  • framesize: Local stack frame size
  • argsize: Size of the arguments and results, which live on the caller’s frame (the minus sign is a separator, not a subtraction)
  • flags: See textflag.h (NOSPLIT, WRAPPER, etc.)
The last instruction in a TEXT block must be a jump (usually RET). If it is not, the linker appends a jump-to-itself; there is no fallthrough between TEXT blocks.
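To make the operand format concrete, here is a small Go parser for the size operand of a TEXT directive. It is a sketch, not the assembler's own parser: parseFrameSpec is hypothetical and accepts only decimal numbers, while real sources may use constant expressions.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// frameSpec holds the two numbers in a TEXT directive's size operand.
type frameSpec struct {
	FrameSize int64 // local stack frame in bytes
	ArgSize   int64 // arguments plus results in bytes; -1 when omitted
}

// parseFrameSpec reads operands such as "$8", "$0-24", or ARM's "$-4".
// The '-' between the numbers is a separator, not a subtraction; a leading
// '-' belongs to the frame size.
func parseFrameSpec(op string) (frameSpec, error) {
	if !strings.HasPrefix(op, "$") {
		return frameSpec{}, fmt.Errorf("operand %q does not start with $", op)
	}
	body := op[1:]
	sign := ""
	if strings.HasPrefix(body, "-") {
		sign, body = "-", body[1:] // negative frame size such as $-4
	}
	frame, args, hasArgs := strings.Cut(body, "-")
	spec := frameSpec{ArgSize: -1}
	var err error
	if spec.FrameSize, err = strconv.ParseInt(sign+frame, 10, 64); err != nil {
		return frameSpec{}, err
	}
	if hasArgs {
		if spec.ArgSize, err = strconv.ParseInt(args, 10, 64); err != nil {
			return frameSpec{}, err
		}
	}
	return spec, nil
}

func main() {
	for _, op := range []string{"$8", "$0-24", "$-4"} {
		spec, err := parseFrameSpec(op)
		if err != nil {
			fmt.Println(op, "error:", err)
			continue
		}
		fmt.Printf("%s: frame %d, args %d\n", op, spec.FrameSize, spec.ArgSize)
	}
}
```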

Common Flags

NOSPLIT     = 4    // Don't check for stack split
RODATA      = 8    // Read-only data
NOPTR       = 16   // Contains no pointers (GC)
WRAPPER     = 32   // Wrapper function (for recover)
NEEDCTXT    = 64   // Closure, uses context register
NOFRAME     = 512  // No frame allocation (frame must be $0)
TOPFRAME    = 2048 // Outermost frame (stop traceback)
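Each flag is a distinct power of two, so flags combine with |, as in NOSPLIT|NOFRAME. The Go sketch below copies the values from the list for illustration only (a .s file gets them from #include "textflag.h") and decodes a combined word back into names; flagNames is a hypothetical helper.

```go
package main

import (
	"fmt"
	"strings"
)

// Values copied from the list above; only this subset is modeled.
const (
	NOSPLIT  = 4
	RODATA   = 8
	NOPTR    = 16
	WRAPPER  = 32
	NEEDCTXT = 64
	NOFRAME  = 512
	TOPFRAME = 2048
)

// flagNames lists the names of the flags set in a combined flag word.
func flagNames(flags int) string {
	known := []struct {
		bit  int
		name string
	}{
		{NOSPLIT, "NOSPLIT"}, {RODATA, "RODATA"}, {NOPTR, "NOPTR"},
		{WRAPPER, "WRAPPER"}, {NEEDCTXT, "NEEDCTXT"},
		{NOFRAME, "NOFRAME"}, {TOPFRAME, "TOPFRAME"},
	}
	var set []string
	for _, f := range known {
		if flags&f.bit != 0 {
			set = append(set, f.name)
		}
	}
	return strings.Join(set, "|")
}

func main() {
	combined := NOSPLIT | NOFRAME
	fmt.Println(combined, flagNames(combined)) // 516 NOSPLIT|NOFRAME
}
```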

DATA and GLOBL Directives

Define global data:
// Initialize data
DATA divtab<>+0x00(SB)/4, $0xf4f8fcff
DATA divtab<>+0x04(SB)/4, $0xe6eaedf0
// ... (entries 0x08 through 0x38 omitted)
DATA divtab<>+0x3c(SB)/4, $0x81828384

// Declare global symbol
GLOBL divtab<>(SB), RODATA, $64

// Implicitly zeroed variable
GLOBL runtime·tlsoffset(SB), NOPTR, $4
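Format: DATA symbol+offset(SB)/width, value. The DATA directives for a symbol must use increasing offsets, and its GLOBL directive must follow them; bytes not initialized by DATA are zero.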
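Long tables such as divtab are tedious to write by hand, so one option is to generate the directives. The sketch below is a hypothetical generator (dataDirectives is not a toolchain API) that emits 4-byte DATA lines at increasing offsets, followed by the GLOBL that declares the total size.

```go
package main

import (
	"fmt"
	"strings"
)

// dataDirectives renders vals as 4-byte DATA directives for symbol name
// (used verbatim, e.g. "divtab<>"), then a GLOBL with the given flags.
func dataDirectives(name string, vals []uint32, flags string) string {
	var b strings.Builder
	for i, v := range vals {
		// Offsets increase by the 4-byte width of each entry.
		fmt.Fprintf(&b, "DATA %s+0x%02x(SB)/4, $0x%08x\n", name, i*4, v)
	}
	// GLOBL comes after the DATA lines and gives the total size in bytes.
	fmt.Fprintf(&b, "GLOBL %s(SB), %s, $%d\n", name, flags, len(vals)*4)
	return b.String()
}

func main() {
	fmt.Print(dataDirectives("divtab<>", []uint32{0xf4f8fcff, 0xe6eaedf0}, "RODATA"))
}
```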

Special Instructions

FUNCDATA and PCDATA

Generated by compiler for GC information:
FUNCDATA $0, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
PCDATA $0, $0
These provide stack map information for the garbage collector.

PCALIGN

Align next instruction:
PCALIGN $32
MOVD $2, R0  // Start of MOVD aligned to 32 bytes
Supported on: arm64, amd64, ppc64, loong64, riscv64
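PCALIGN pads with no-op instructions until the next instruction's address is a multiple of the operand. The arithmetic amounts to the Go sketch below; padFor is a hypothetical helper, and the actual padding and any per-architecture limits on the alignment value are decided by the assembler.

```go
package main

import "fmt"

// padFor returns the number of padding bytes needed so that an instruction
// starting at pc begins on a multiple of align (align must be > 0).
func padFor(pc, align uint64) uint64 {
	return (align - pc%align) % align
}

func main() {
	// PCALIGN $32 before an instruction that would otherwise start at 0x1234.
	fmt.Println(padFor(0x1234, 32)) // 12
	fmt.Println(padFor(0x1240, 32)) // 0, already aligned
}
```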

Interacting with Go

go_asm.h Header

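When a package has .s files, go build directs the compiler to emit a header named go_asm.h, which the .s files can #include. It defines constants for most of the package’s Go const declarations, its struct sizes, and its struct field offsets. For example, given const bufSize = 1024 and type reader struct { buf [bufSize]byte; r int }, it provides:
const_bufSize     // value of the Go constant bufSize
reader__size      // size of the struct type reader
reader_buf        // offset of field buf
reader_r          // offset of field r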
Usage in assembly:
#include "go_asm.h"

// Load the r field of the *reader whose address is in BX
MOVQ reader_r(BX), AX
This keeps assembly robust to changes in Go type layouts. Always use these constants instead of hard-coding offsets.

Runtime Coordination

Assembly functions need pointer information for GC:
  1. Define a Go prototype in a .go file, declared without a body:
// Implemented in assembly; no Go body.
func asmFunction(arg1 int, arg2 *byte) int
  2. Implement it in assembly:
TEXT ·asmFunction(SB), NOSPLIT, $0-24
    MOVQ arg1+0(FP), AX
    MOVQ arg2+8(FP), BX
    // ... implementation ...
    MOVQ AX, ret+16(FP)
    RET
Rules:
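  • Symbols in the current package are usually written without the package path (·Function rather than pkg·Function); the linker prepends the package path to names that begin with the middle dot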
  • Always provide Go prototype for pointer safety
  • Mark data with NOPTR if it contains no pointers
  • Use NO_LOCAL_POINTERS (defined in funcdata.h) if the local frame has no pointers

Operand Order

Data flow is left to right:
MOVQ $0, CX    // Clear CX (0 → CX)
ADDQ AX, BX    // Add AX to BX (AX + BX → BX)
This applies even on architectures with opposite conventional notation.

Architecture-Specific Details

x86 (386 and amd64)

Accessing g and m

#include "go_tls.h"
#include "go_asm.h"

get_tls(CX)
MOVQ g(CX), AX        // Move g into AX
MOVQ g_m(AX), BX      // Move g.m into BX  

Addressing Modes

(DI)(BX*2)            // Address DI + BX*2
64(DI)(BX*2)          // Address DI + BX*2 + 64
// Scale factors: 1, 2, 4, 8 only
In -dynlink or -shared modes, any load or store of a global variable (or other fixed memory location) must be assumed to overwrite CX, so use CX only for values that need not survive such a memory reference.
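The address arithmetic behind these modes can be written out in Go. effectiveAddr is a hypothetical helper that evaluates disp(base)(index*scale) and rejects the scale factors the assembler does not accept; negative displacements wrap as two's complement, as they do in hardware.

```go
package main

import (
	"errors"
	"fmt"
)

// effectiveAddr computes the address named by disp(base)(index*scale),
// e.g. 64(DI)(BX*2). Only scale factors 1, 2, 4, and 8 are allowed.
func effectiveAddr(base, index uint64, scale int, disp int64) (uint64, error) {
	switch scale {
	case 1, 2, 4, 8:
	default:
		return 0, errors.New("scale must be 1, 2, 4, or 8")
	}
	// uint64 arithmetic wraps, so a negative disp subtracts as expected.
	return base + index*uint64(scale) + uint64(disp), nil
}

func main() {
	di, bx := uint64(0x1000), uint64(3)
	a, _ := effectiveAddr(di, bx, 2, 0)  // (DI)(BX*2)
	b, _ := effectiveAddr(di, bx, 2, 64) // 64(DI)(BX*2)
	fmt.Printf("%#x %#x\n", a, b)        // 0x1006 0x1046
	if _, err := effectiveAddr(di, bx, 3, 0); err != nil {
		fmt.Println(err)
	}
}
```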

ARM64

Registers:
  • R18: Platform register (reserved on Apple)
  • R27, R28: Reserved by compiler/linker
  • R29: Frame pointer
  • R30: Link register
Instruction modifiers:
MOVW.P    // Post-increment
MOVW.W    // Pre-increment
Addressing modes:
R0->16        // Arithmetic right shift
R0>>16        // Logical right shift  
R0<<16        // Left shift
R0@>16        // Rotate right

$(8<<12)      // Immediate with shift
8(R0)         // R0 + 8
(R2)(R0)      // R0 + R2

R0.UXTB       // Zero-extend byte
R0.SXTB       // Sign-extend byte
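As a rough model of what these register operands evaluate to, the Go functions below apply the same shift, rotate, and extension to a 64-bit value. They are illustrative helpers named after the operand forms, not part of any Go package, and they model only the operand value, not which instructions accept each form.

```go
package main

import (
	"fmt"
	"math/bits"
)

// sample is an arbitrary 64-bit register value with the top bit set.
const sample uint64 = 0x80000000000000f0

func asr(r uint64, n uint) uint64 { return uint64(int64(r) >> n) }    // R0->n: arithmetic right shift
func lsr(r uint64, n uint) uint64 { return r >> n }                   // R0>>n: logical right shift
func lsl(r uint64, n uint) uint64 { return r << n }                   // R0<<n: left shift
func ror(r uint64, n int) uint64  { return bits.RotateLeft64(r, -n) } // R0@>n: rotate right
func uxtb(r uint64) uint64        { return uint64(uint8(r)) }         // R0.UXTB: zero-extend low byte
func sxtb(r uint64) uint64        { return uint64(int64(int8(r))) }   // R0.SXTB: sign-extend low byte

func main() {
	fmt.Printf("0x%016x\n", asr(sample, 16)) // 0xffff800000000000
	fmt.Printf("0x%016x\n", lsr(sample, 16)) // 0x0000800000000000
	fmt.Printf("0x%016x\n", lsl(sample, 16)) // 0x0000000000f00000
	fmt.Printf("0x%016x\n", ror(sample, 16)) // 0x00f0800000000000
	fmt.Printf("0x%016x\n", uxtb(sample))    // 0x00000000000000f0
	fmt.Printf("0x%016x\n", sxtb(sample))    // 0xfffffffffffffff0
}
```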

ARM (32-bit)

Registers:
  • R10: Points to g (goroutine structure) - use g not R10
  • R11: Reserved for linker temps
  • R13: Hardware SP (use R13, not SP)
Special:
  • Frame size $-4 tells linker not to save LR (leaf function)
  • Condition codes and other suffixes are appended after a period, as in MOVW.EQ; several can be chained, as in MOVM.IA.W

Writing Assembly Functions

Complete Example

Go declaration:
package main

// add is implemented in an amd64 .s file in this package; it has no Go body.
func add(x, y int64) int64

func main() {
    println(add(2, 3)) // 5
}
Assembly implementation:
#include "textflag.h"

// func add(x, y int64) int64
TEXT ·add(SB), NOSPLIT, $0-24
    MOVQ x+0(FP), AX
    MOVQ y+8(FP), BX
    ADDQ BX, AX
    MOVQ AX, ret+16(FP)
    RET
Argument size calculation:
  • 2 arguments × 8 bytes = 16 bytes
  • 1 result × 8 bytes = 8 bytes
  • Total: $0-24 (no local frame; 24 bytes of arguments and results)

Using BYTE and WORD

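For instructions the assembler does not support, BYTE and WORD lay down raw machine code in the instruction stream. This 386 example encodes MMX instructions by hand: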
TEXT runtime·atomicload64(SB), NOSPLIT, $0-12
    MOVL ptr+0(FP), AX
    LEAL ret_lo+4(FP), BX
    
    // MOVQ (%EAX), %MM0
    BYTE $0x0f; BYTE $0x6f; BYTE $0x00
    
    // MOVQ %MM0, 0(%EBX)
    BYTE $0x0f; BYTE $0x7f; BYTE $0x03
    
    // EMMS
    BYTE $0x0F; BYTE $0x77
    RET
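If the encoding comes from another assembler or a disassembler listing, a small helper can format it into a BYTE line. byteDirectives is a hypothetical Go helper; it only formats bytes and does nothing to check that they form a valid instruction.

```go
package main

import (
	"fmt"
	"strings"
)

// byteDirectives formats raw instruction bytes as a line of BYTE
// directives suitable for pasting into a .s file.
func byteDirectives(code []byte) string {
	parts := make([]string, len(code))
	for i, c := range code {
		parts[i] = fmt.Sprintf("BYTE $0x%02x", c)
	}
	return strings.Join(parts, "; ")
}

func main() {
	// EMMS, using the encoding shown in the example above.
	fmt.Println(byteDirectives([]byte{0x0f, 0x77})) // BYTE $0x0f; BYTE $0x77
}
```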

Best Practices

  1. Always provide Go prototypes for pointer safety and go vet checking
  2. Use go_asm.h constants instead of hard-coding offsets
  3. Mark nosplit functions appropriately and keep them small
  4. Document why assembly is needed in comments
  5. Test thoroughly - assembly bypasses safety checks
  6. Use NOPTR for data without pointers to help GC
  7. Avoid architecture-specific code when possible - use Go instead
Assembly code bypasses Go’s type safety and bounds checking. Use only when necessary for performance or to access features not available in Go.
