assembly-x86 by mohitmishra786/low-level-dev-skills
npx skills add https://github.com/mohitmishra786/low-level-dev-skills --skill assembly-x86
Guide agents through x86-64 assembly: reading compiler output, understanding the ABI, writing inline asm, and common patterns.
%rsp / %rbp: what does it mean?"
# AT&T syntax (GCC default)
gcc -S -O2 -fverbose-asm foo.c -o foo.s
# Intel syntax
gcc -S -masm=intel -O2 foo.c -o foo.s
# From GDB
(gdb) disassemble /s main   # with source
(gdb) x/20i $rip
# From objdump
objdump -d -M intel -S prog   # Intel syntax + source (needs -g)
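To try the commands above, a small input file helps. The sketch below is a hypothetical foo.c (the function name add3 is an assumption, not part of the skill). On x86-64 Linux, the three arguments are expected in %rdi, %rsi and %rdx and the result in %rax, though the exact instructions emitted vary by compiler and version.

```c
// foo.c: hypothetical input for `gcc -S -O2 -fverbose-asm foo.c -o foo.s`.
// A tiny leaf function keeps the generated assembly short enough to read
// in full.
long add3(long a, long b, long c) {
    return a + b + c;
}
```

Comparing foo.s produced with and without -masm=intel on this file is a quick way to see the operand-order difference between the two syntaxes.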
| 64-bit | 32-bit | 16-bit | 8-bit high | 8-bit low | Purpose |
|---|---|---|---|---|---|
| %rax | %eax | %ax | %ah | %al | Return value / accumulator |
| %rbx | %ebx | %bx | %bh | %bl | Callee-saved |
| %rcx | %ecx | %cx | %ch | %cl | 4th arg / count |
| %rdx | %edx | %dx | %dh | %dl | 3rd arg / 2nd return |
| %rsi | %esi | %si | - | %sil | 2nd arg |
| %rdi | %edi | %di | - | %dil | 1st arg |
| %rbp | %ebp | %bp | - | %bpl | Frame pointer (callee-saved) |
| %rsp | %esp | %sp | - | %spl | Stack pointer |
| %r8–%r9 | %r8d–%r9d | %r8w–%r9w | - | %r8b–%r9b | 5th–6th args / caller-saved |
| %r10–%r11 | %r10d–%r11d | %r10w–%r11w | - | %r10b–%r11b | Caller-saved scratch |
| %r12–%r15 | %r12d–%r15d | %r12w–%r15w | - | %r12b–%r15b | Callee-saved |
| %rip | - | - | - | - | Instruction pointer |
| %rflags | %eflags | - | - | - | Status flags |
| %xmm0–%xmm7 | - | - | - | - | FP/SIMD args and return |
| %xmm8–%xmm15 | - | - | - | - | Caller-saved SIMD |
| %ymm0–%ymm15 | - | - | - | - | AVX 256-bit |
| %zmm0–%zmm31 | - | - | - | - | AVX-512 512-bit |
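The sub-register columns in the table are views of the same physical register, with one asymmetry worth seeing in code: a 32-bit write zero-extends into the full 64-bit register, while an 8-bit write leaves the upper bits alone. A minimal sketch, assuming GCC or Clang on x86-64; the function names are hypothetical, and %k0 / %b0 are the GCC operand modifiers that print the 32-bit and 8-bit names of operand 0. A plain-C fallback with the same result is used on other targets.

```c
#include <stdint.h>

// Write 1 through the 32-bit view (e.g. %eax): upper 32 bits become zero.
uint64_t write_low32(uint64_t v) {
#if defined(__x86_64__)
    __asm__("movl $1, %k0" : "+r"(v));
    return v;
#else
    (void)v;
    return 1;  // fallback: same result as the zero-extending write
#endif
}

// Write 1 through the 8-bit view (e.g. %al): bits 8..63 are preserved.
uint64_t write_low8(uint64_t v) {
#if defined(__x86_64__)
    __asm__("movb $1, %b0" : "+r"(v));
#else
    v = (v & ~(uint64_t)0xff) | 1;  // fallback: same result
#endif
    return v;
}
```

This zero-extension is why xor %eax, %eax is enough to clear all of %rax.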
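Integer/pointer argument registers (in order, System V AMD64 ABI): %rdi, %rsi, %rdx, %rcx, %r8, %r9
Floating-point argument registers: %xmm0–%xmm7
Return values:
Integer/pointer: %rax (low), %rdx (high if 128-bit)
Floating-point: %xmm0 (low), %xmm1 (high)
Caller-saved (scratch): %rax, %rcx, %rdx, %rsi, %rdi, %r8–%r11, %xmm0–%xmm15
Callee-saved (must preserve): %rbx, %rbp, %r12–%r15
Stack: %rsp is 16-byte aligned immediately before a call; call pushes an 8-byte return address, so %rsp is 8 bytes past a 16-byte boundary at function entry and is 16-byte aligned again once the prologue pushes %rbp.
Red zone: the 128 bytes below %rsp may be used by leaf functions without adjusting %rsp. It is not safe in kernel code, where interrupts can overwrite it (kernels are typically built with -mno-red-zone); in user space the ABI guarantees that signal handlers leave it intact.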
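The register assignments above can be checked with a function written directly in assembly and called from C. This is a minimal sketch, assuming GCC or Clang on x86-64 Linux (ELF); the name sum6_asm is hypothetical. It reads the six integer arguments from %rdi, %rsi, %rdx, %rcx, %r8 and %r9 and returns the sum in %rax. Because it is a leaf that touches only caller-saved registers, it needs no prologue. Other targets fall back to plain C.

```c
#include <stdint.h>

#if defined(__x86_64__) && defined(__linux__)
// Top-level (basic) asm: registers are written with a single % here.
__asm__(
    ".pushsection .text\n"
    ".globl sum6_asm\n"
    ".type sum6_asm, @function\n"
    "sum6_asm:\n"
    "    leaq (%rdi,%rsi), %rax\n"   /* rax = arg1 + arg2 */
    "    addq %rdx, %rax\n"          /* + arg3 */
    "    addq %rcx, %rax\n"          /* + arg4 */
    "    addq %r8, %rax\n"           /* + arg5 */
    "    addq %r9, %rax\n"           /* + arg6 */
    "    ret\n"                      /* result is returned in rax */
    ".size sum6_asm, .-sum6_asm\n"
    ".popsection\n"
);
// Prototype so the C compiler places the arguments per the ABI.
int64_t sum6_asm(int64_t a, int64_t b, int64_t c,
                 int64_t d, int64_t e, int64_t f);
#else
// Fallback for other targets: same behaviour in plain C.
int64_t sum6_asm(int64_t a, int64_t b, int64_t c,
                 int64_t d, int64_t e, int64_t f) {
    return a + b + c + d + e + f;
}
#endif
```

A seventh integer argument would be passed on the stack, which is where the alignment rule starts to matter.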
| Pattern | Meaning |
|---|---|
| mov %rdi, %rax | Copy rdi to rax |
| mov (%rdi), %rax | Load 8 bytes from the address in rdi |
| mov %rax, 8(%rdi) | Store rax to address rdi+8 |
| lea 8(%rdi), %rax | Load effective address rdi+8 into rax (no memory access) |
| push %rbx | Push rbx; rsp -= 8 |
| pop %rbx | Pop into rbx; rsp += 8 |
| call foo | Push return address; jump to foo |
| ret | Pop return address; jump to it |
| xor %eax, %eax | Zero rax (smaller encoding than mov $0, %rax) |
| test %rax, %rax | Set ZF if rax == 0 (shorter encoding than cmp $0, %rax) |
| cmp $5, %rdi | Set flags for rdi - 5 |
| jl label | Jump if less than (signed) |
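Two of the patterns above, lea as pure address arithmetic and test as a zero check, can be exercised from C. This is a minimal sketch, assuming GCC or Clang on x86-64. The function names are hypothetical, and sete (store ZF into a byte) is not in the table; it is used here only to make the flag visible. Other targets use an equivalent C expression.

```c
#include <stdint.h>

// lea computes base + index*4 + 8 without reading memory.
uint64_t lea_demo(uint64_t base, uint64_t index) {
#if defined(__x86_64__)
    uint64_t out;
    __asm__("leaq 8(%1,%2,4), %0" : "=r"(out) : "r"(base), "r"(index));
    return out;
#else
    return base + index * 4 + 8;
#endif
}

// test sets ZF when the register is zero; sete copies ZF into a byte.
int is_zero(uint64_t v) {
#if defined(__x86_64__)
    unsigned char z;
    __asm__("testq %1, %1\n\tsete %0" : "=q"(z) : "r"(v) : "cc");
    return z;
#else
    return v == 0;
#endif
}
```

Compilers often emit lea for small multiply-and-add expressions for the same reason: it does not modify the flags and does not access memory.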
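| Feature | AT&T | Intel |
|---|---|---|
| Operand order | source, dest | dest, source |
| Register prefix | %rax | rax |
| Immediate prefix | $42 | 42 |
| Memory operand | 8(%rdi) | [rdi+8] |
| Size suffix | movl, movq | none (inferred from operands, or DWORD PTR / QWORD PTR on memory operands) |
GCC emits AT&T syntax by default. Use -masm=intel for Intel syntax.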
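One consequence of the two syntaxes is that inline asm written in AT&T form can break when a file is built with -masm=intel. GCC documents a dialect-alternative form, {att|intel}, inside asm templates for this case. The sketch below assumes GCC or a compatible Clang on x86; the function name is hypothetical, and other targets fall back to plain C.

```c
// Adds b to a with a template that assembles under either dialect:
//   AT&T:  addl %1, %0   (source, dest)
//   Intel: add  %0, %1   (dest, source)
int add_any_dialect(int a, int b) {
#if defined(__x86_64__) || defined(__i386__)
    __asm__("add{l %1, %0| %0, %1}" : "+r"(a) : "r"(b) : "cc");
    return a;
#else
    return a + b;
#endif
}
```

The size suffix (l) sits inside the AT&T alternative only, matching the table above, where Intel syntax infers the size from the operands.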
#include <stdint.h> // uint32_t for the CPUID example
// Basic: increment a register
int x = 5;
__asm__ volatile (
"incl %0"
: "=r"(x) // outputs: =r means write-only register
: "0"(x) // inputs: 0 means use the same register as output 0
: // clobbers: none
);
// CPUID example
uint32_t eax, ebx, ecx, edx;
__asm__ volatile (
"cpuid"
: "=a"(eax), "=b"(ebx), "=c"(ecx), "=d"(edx)
: "a"(1) // input: leaf 1
);
// Atomic increment: xadd leaves the old value in ret, so return ret + 1
static inline int atomic_inc(volatile int *p) {
int ret;
__asm__ volatile (
"lock; xaddl %0, %1"
: "=r"(ret), "+m"(*p)
: "0"(1)
: "memory"
);
return ret + 1;
}
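The CPUID snippet above can be wrapped into a function that returns something observable. The sketch below assumes GCC or Clang on x86-64; the function name cpu_vendor is hypothetical. It queries leaf 0, where the 12-byte vendor string is returned in %ebx, %edx and %ecx, in that order. Other targets get a placeholder string.

```c
#include <stdint.h>
#include <string.h>

// Fills out with the NUL-terminated CPU vendor string (up to 12 characters).
void cpu_vendor(char out[13]) {
#if defined(__x86_64__)
    uint32_t eax, ebx, ecx, edx;
    __asm__ volatile (
        "cpuid"
        : "=a"(eax), "=b"(ebx), "=c"(ecx), "=d"(edx)
        : "a"(0), "c"(0)   // leaf 0, subleaf 0
    );
    (void)eax;             // eax holds the highest supported leaf; unused here
    memcpy(out + 0, &ebx, 4);
    memcpy(out + 4, &edx, 4);
    memcpy(out + 8, &ecx, 4);
    out[12] = '\0';
#else
    strcpy(out, "unknown"); // fallback: no CPUID on this target
#endif
}
```

The "=a" through "=d" constraints pin each output to the register CPUID writes, so no explicit clobber list is needed.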
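Constraint codes:
"r": any general register
"m": memory operand
"i": immediate integer
"a", "b", "c", "d": specific registers (%rax, %rbx, %rcx, %rdx)
"=" prefix: output (write-only)
"+" prefix: read-write
"memory" clobber: tells the compiler memory may be modified (compiler barrier)
#include <immintrin.h> // includes all x86 SIMD headers
// Add 8 floats at once with AVX
__m256 a = _mm256_loadu_ps(arr_a); // load 8 floats (unaligned)
__m256 b = _mm256_loadu_ps(arr_b);
__m256 c = _mm256_add_ps(a, b);
_mm256_storeu_ps(result, c);
Enable the instructions at compile time: -mavx is enough for this example; -mavx2 or -march=native also work. Check CPU support at runtime: __builtin_cpu_supports("avx") or __builtin_cpu_supports("avx2").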
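The compile-time flag and the runtime check can be combined so that one binary runs on CPUs with and without AVX. This is a minimal sketch, assuming GCC or Clang on x86-64; the function names are hypothetical. The AVX path is enabled per function with the target attribute rather than a global -mavx flag, and a scalar loop serves as the fallback.

```c
#include <stddef.h>

#if defined(__x86_64__) && defined(__GNUC__)
#include <immintrin.h>
#define HAVE_AVX_PATH 1

// Compiled with AVX enabled for this function only.
__attribute__((target("avx")))
static void add_f32_avx(const float *a, const float *b, float *out, size_t n) {
    size_t i = 0;
    for (; i + 8 <= n; i += 8) {             // 8 floats per 256-bit vector
        __m256 va = _mm256_loadu_ps(a + i);
        __m256 vb = _mm256_loadu_ps(b + i);
        _mm256_storeu_ps(out + i, _mm256_add_ps(va, vb));
    }
    for (; i < n; i++) out[i] = a[i] + b[i]; // tail elements
}
#endif

static void add_f32_scalar(const float *a, const float *b, float *out, size_t n) {
    for (size_t i = 0; i < n; i++) out[i] = a[i] + b[i];
}

// Dispatch: use AVX only when the running CPU reports support for it.
void add_f32(const float *a, const float *b, float *out, size_t n) {
#ifdef HAVE_AVX_PATH
    if (__builtin_cpu_supports("avx")) {
        add_f32_avx(a, b, out, n);
        return;
    }
#endif
    add_f32_scalar(a, b, out, n);
}
```

Keeping the AVX code in its own function matters: the compiler will not inline it into the dispatcher, so no AVX instruction executes before the runtime check.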
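For a full register and instruction reference, see references/reference.md.
skills/low-level-programming/assembly-arm for AArch64/ARM assembly
skills/compilers/gcc for -S -masm=intel flag details
skills/debuggers/gdb for stepping through assembly (si, ni, x/i)
Weekly installs: 54
Repository: mohitmishra786/low-level-dev-skills
GitHub stars: 34
First seen: Feb 20, 2026
Security audits: Gen Agent Trust Hub: Pass, Socket: Pass, Snyk: Pass
Installed on: codex 52, kimi-cli 51, gemini-cli 51, amp 51, cline 51, github-copilot 51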