nanogpt by orchestra-research/ai-research-skills
npx skills add https://github.com/orchestra-research/ai-research-skills --skill nanogpt
nanoGPT is a simplified GPT implementation designed for learning and experimentation.
Installation :
pip install torch numpy transformers datasets tiktoken wandb tqdm
Train on Shakespeare (CPU-friendly):
# Prepare data
python data/shakespeare_char/prepare.py
# Train (5 minutes on CPU)
python train.py config/train_shakespeare_char.py
# Generate text
python sample.py --out_dir=out-shakespeare-char
Output :
ROMEO:
What say'st thou? Shall I speak, and be a man?
JULIET:
I am afeard, and yet I'll speak; for thou art
One that hath been a man, and yet I know not
What thou art.
Complete training pipeline :
# Step 1: Prepare data (creates train.bin, val.bin)
python data/shakespeare_char/prepare.py
# Step 2: Train small model
python train.py config/train_shakespeare_char.py
# Step 3: Generate text
python sample.py --out_dir=out-shakespeare-char
Config (config/train_shakespeare_char.py):
# Model config
n_layer = 6 # 6 transformer layers
n_head = 6 # 6 attention heads
n_embd = 384 # 384-dim embeddings
block_size = 256 # 256 char context
# Training config
batch_size = 64
learning_rate = 1e-3
max_iters = 5000
eval_interval = 500
# Hardware
device = 'cpu' # Or 'cuda'
compile = False # Set True on PyTorch 2.0+
Training time : ~5 minutes (CPU), ~1 minute (GPU)
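As a sanity check on model size, the config above implies a model of roughly 10.7M parameters (nanoGPT's own log reports ~10.65M, counting slightly differently). A back-of-envelope estimate, ignoring layer norms and bias terms:

```python
# Rough parameter count for the char-level config above
# (estimate only; ignores layer-norm and bias parameters).
n_layer, n_embd = 6, 384
block_size, vocab_size = 256, 65  # the Shakespeare char vocab has 65 symbols

per_layer = 12 * n_embd ** 2                    # 4*d^2 attention + 8*d^2 MLP
embeddings = (vocab_size + block_size) * n_embd  # token + position embeddings
total = n_layer * per_layer + embeddings
print(f"~{total / 1e6:.1f}M parameters")         # ~10.7M
```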
Multi-GPU training on OpenWebText :
# Step 1: Prepare OpenWebText (takes ~1 hour)
python data/openwebtext/prepare.py
# Step 2: Train GPT-2 124M with DDP (8 GPUs)
torchrun --standalone --nproc_per_node=8 \
train.py config/train_gpt2.py
# Step 3: Sample from trained model
python sample.py --out_dir=out
Config (config/train_gpt2.py):
# GPT-2 (124M) architecture
n_layer = 12
n_head = 12
n_embd = 768
block_size = 1024
dropout = 0.0
# Training
batch_size = 12
gradient_accumulation_steps = 5 * 8 # Total batch ~0.5M tokens
learning_rate = 6e-4
max_iters = 600000
lr_decay_iters = 600000
# System
compile = True # PyTorch 2.0
Training time : ~4 days (8× A100)
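The gradient-accumulation comment can be verified with quick arithmetic: each optimizer step consumes batch_size × block_size × gradient_accumulation_steps tokens.

```python
# Effective batch size in tokens per optimizer step, assuming
# gradient_accumulation_steps = 5 * 8 already spans all 8 GPUs
# (5 micro-steps per GPU).
batch_size, block_size = 12, 1024
grad_accum = 5 * 8
tokens_per_step = batch_size * block_size * grad_accum
print(tokens_per_step)  # 491520, i.e. ~0.5M tokens
```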
Start from OpenAI checkpoint :
# In train.py or config
init_from = 'gpt2' # Options: gpt2, gpt2-medium, gpt2-large, gpt2-xl
# Model loads OpenAI weights automatically
python train.py config/finetune_shakespeare.py
Example config (config/finetune_shakespeare.py):
# Start from GPT-2
init_from = 'gpt2'
# Dataset
dataset = 'shakespeare'  # BPE-tokenized; GPT-2 checkpoints use the GPT-2 BPE vocab, so the char-level dataset won't load
batch_size = 1
block_size = 1024
# Fine-tuning
learning_rate = 3e-5 # Lower LR for fine-tuning
max_iters = 2000
warmup_iters = 100
# Regularization
weight_decay = 1e-1
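nanoGPT schedules the learning rate as linear warmup followed by cosine decay down to a floor. A standalone sketch using the fine-tuning values above (the `min_lr` floor here is an assumed value, not taken from the config):

```python
import math

# Sketch of a nanoGPT-style LR schedule: linear warmup, then cosine decay.
learning_rate, min_lr = 3e-5, 3e-6   # min_lr is an assumption for illustration
warmup_iters, lr_decay_iters = 100, 2000

def get_lr(it):
    if it < warmup_iters:                 # 1) linear warmup
        return learning_rate * (it + 1) / (warmup_iters + 1)
    if it > lr_decay_iters:               # 2) constant floor after decay ends
        return min_lr
    ratio = (it - warmup_iters) / (lr_decay_iters - warmup_iters)
    coeff = 0.5 * (1.0 + math.cos(math.pi * ratio))  # 3) cosine from 1 to 0
    return min_lr + coeff * (learning_rate - min_lr)
```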
Train on your own text :
# data/custom/prepare.py
import pickle
import numpy as np
# Load your data
with open('my_data.txt', 'r') as f:
    text = f.read()
# Create character mappings
chars = sorted(list(set(text)))
stoi = {ch: i for i, ch in enumerate(chars)}
itos = {i: ch for i, ch in enumerate(chars)}
# Tokenize
data = np.array([stoi[ch] for ch in text], dtype=np.uint16)
# Split train/val (90/10)
n = len(data)
train_data = data[:int(n*0.9)]
val_data = data[int(n*0.9):]
# Save binary token streams
train_data.tofile('data/custom/train.bin')
val_data.tofile('data/custom/val.bin')
# Save the vocab so train.py picks up vocab_size and sample.py can decode
meta = {'vocab_size': len(chars), 'stoi': stoi, 'itos': itos}
with open('data/custom/meta.pkl', 'wb') as f:
    pickle.dump(meta, f)
Train :
python data/custom/prepare.py
python train.py --dataset=custom
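Before training on your own data, it can help to confirm the character encoding round-trips losslessly. A standalone sketch mirroring the mappings built in prepare.py (not part of nanoGPT itself):

```python
import numpy as np

# Sanity-check the char-level encode/decode round trip.
text = "To be, or not to be"
chars = sorted(set(text))
stoi = {ch: i for i, ch in enumerate(chars)}
itos = {i: ch for i, ch in enumerate(chars)}

data = np.array([stoi[ch] for ch in text], dtype=np.uint16)
decoded = ''.join(itos[int(i)] for i in data)
assert decoded == text  # lossless round trip
```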
Use nanoGPT when : you want a minimal, hackable GPT codebase for learning, research experiments, or training small models from scratch.
Simplicity advantages : the model definition lives in a single model.py and the entire training loop in a single train.py, so the whole codebase can be read in one sitting.
Use alternatives instead : when you need production-scale features (tokenizer tooling, inference serving, very large multi-node training) beyond nanoGPT's scope.
Issue: CUDA out of memory
Reduce batch size or context length:
batch_size = 1 # Reduce from 12
block_size = 512 # Reduce from 1024
gradient_accumulation_steps = 40 # Increase to maintain effective batch
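Note that with these reduced values the effective token batch still shrinks; quick arithmetic shows by how much, in case you want to raise gradient accumulation further to compensate fully:

```python
# Effective batch in tokens = batch_size * block_size * grad_accum.
before = 12 * 1024 * 40   # original config: 491,520 tokens per step
after = 1 * 512 * 40      # reduced config: 20,480 tokens per step
scale = before // after   # 24: grad_accum would need ~24x more to fully match
print(before, after, scale)
```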
Issue: Training too slow
Enable compilation (PyTorch 2.0+):
compile = True # 2× speedup
Use mixed precision:
dtype = 'bfloat16' # Or 'float16'
Issue: Poor generation quality
Train longer:
max_iters = 10000 # Increase from 5000
Lower temperature:
# In sample.py
temperature = 0.7 # Lower from 1.0
top_k = 200 # Add top-k sampling
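Temperature scaling and top-k filtering are easy to illustrate in isolation. This is a standalone sketch of the idea, not nanoGPT's actual sampling code:

```python
import numpy as np

# Minimal sketch of temperature + top-k sampling over a logit vector.
def sample(logits, temperature=0.7, top_k=200, rng=np.random.default_rng(0)):
    logits = np.asarray(logits, dtype=np.float64) / temperature  # sharpen
    if top_k is not None and top_k < len(logits):
        cutoff = np.sort(logits)[-top_k]                 # k-th largest logit
        logits = np.where(logits < cutoff, -np.inf, logits)  # mask the rest
    probs = np.exp(logits - logits.max())                # stable softmax
    probs /= probs.sum()
    return rng.choice(len(probs), p=probs)

token = sample([2.0, 1.0, 0.1, -1.0], temperature=0.7, top_k=2)
# With top_k=2, only the two highest-logit tokens (indices 0 and 1) can win.
```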
Issue: Can't load GPT-2 weights
Install transformers:
pip install transformers
Check model name:
init_from = 'gpt2' # Valid: gpt2, gpt2-medium, gpt2-large, gpt2-xl
Model architecture : See references/architecture.md for GPT block structure, multi-head attention, and MLP layers explained simply.
Training loop : See references/training.md for learning rate schedule, gradient accumulation, and distributed data parallel setup.
Data preparation : See references/data.md for tokenization strategies (character-level vs BPE) and binary format details.
Shakespeare (char-level) :
GPT-2 (124M) :
GPT-2 Medium (350M) :
Performance :
compile=True: 2× speedup
dtype=bfloat16: 50% memory reduction