nanogpt by davila7/claude-code-templates

npx skills add https://github.com/davila7/claude-code-templates --skill nanogpt
nanoGPT is a simplified GPT implementation designed for learning and experimentation.
Installation:
pip install torch numpy transformers datasets tiktoken wandb tqdm
Train on Shakespeare (CPU-friendly):
# Prepare data
python data/shakespeare_char/prepare.py
# Train (5 minutes on CPU)
python train.py config/train_shakespeare_char.py
# Generate text
python sample.py --out_dir=out-shakespeare-char
Output:
ROMEO:
What say'st thou? Shall I speak, and be a man?
JULIET:
I am afeard, and yet I'll speak; for thou art
One that hath been a man, and yet I know not
What thou art.
Complete training pipeline:
# Step 1: Prepare data (creates train.bin, val.bin)
python data/shakespeare_char/prepare.py
# Step 2: Train small model
python train.py config/train_shakespeare_char.py
# Step 3: Generate text
python sample.py --out_dir=out-shakespeare-char
Config (config/train_shakespeare_char.py):
# Model config
n_layer = 6 # 6 transformer layers
n_head = 6 # 6 attention heads
n_embd = 384 # 384-dim embeddings
block_size = 256 # 256 char context
# Training config
batch_size = 64
learning_rate = 1e-3
max_iters = 5000
eval_interval = 500
# Hardware
device = 'cpu' # Or 'cuda'
compile = False # Set True for PyTorch 2.0
Training time: ~5 minutes (CPU), ~1 minute (GPU)
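The config files above are plain Python: train.py defines defaults at the top of the file and then lets a config file or --key=value command-line flags override them (via nanoGPT's configurator.py). A minimal sketch of that override pattern, with illustrative default values:

```python
# Defaults, as they would appear at the top of train.py (values here are illustrative).
config = {"device": "cpu", "batch_size": 64, "learning_rate": 1e-3, "compile": False}

def apply_overrides(config, argv):
    """Apply --key=value overrides, keeping each value's original type."""
    for arg in argv:
        if not arg.startswith("--"):
            continue  # in nanoGPT, a bare argument is treated as a config file path
        key, _, raw = arg[2:].partition("=")
        if key not in config:
            raise KeyError(f"unknown config key: {key}")
        old = config[key]
        if isinstance(old, bool):                # check bool first: bool is a subclass of int
            config[key] = raw.lower() in ("1", "true")
        else:
            config[key] = type(old)(raw)         # cast to the default's type
    return config

apply_overrides(config, ["--device=cuda", "--batch_size=12", "--compile=True"])
print(config["device"], config["batch_size"], config["compile"])  # cuda 12 True
```

This is why `python train.py config/train_shakespeare_char.py --device=cuda` works: the config file sets the experiment, and the flags tweak individual keys on top of it.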
Multi-GPU training on OpenWebText:
# Step 1: Prepare OpenWebText (takes ~1 hour)
python data/openwebtext/prepare.py
# Step 2: Train GPT-2 124M with DDP (8 GPUs)
torchrun --standalone --nproc_per_node=8 \
train.py config/train_gpt2.py
# Step 3: Sample from trained model
python sample.py --out_dir=out
Config (config/train_gpt2.py):
# GPT-2 (124M) architecture
n_layer = 12
n_head = 12
n_embd = 768
block_size = 1024
dropout = 0.0
# Training
batch_size = 12
gradient_accumulation_steps = 5 * 8 # Total batch ~0.5M tokens
learning_rate = 6e-4
max_iters = 600000
lr_decay_iters = 600000
# System
compile = True # PyTorch 2.0
Training time: ~4 days (8× A100)
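The "~0.5M tokens" comment in the config above follows from batch_size × block_size × gradient_accumulation_steps. A quick check of the arithmetic:

```python
batch_size = 12        # sequences per GPU per micro-step
block_size = 1024      # tokens per sequence
grad_accum = 5 * 8     # micro-steps accumulated, across 8 GPUs

tokens_per_step = batch_size * block_size * grad_accum
print(tokens_per_step)  # 491520, i.e. ~0.5M tokens per optimizer step
```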
Start from OpenAI checkpoint:
# In train.py or config
init_from = 'gpt2' # Options: gpt2, gpt2-medium, gpt2-large, gpt2-xl
# Model loads OpenAI weights automatically
python train.py config/finetune_shakespeare.py
Example config (config/finetune_shakespeare.py):
# Start from GPT-2
init_from = 'gpt2'
# Dataset
dataset = 'shakespeare'  # BPE-tokenized; char-level ids don't match the GPT-2 vocab
batch_size = 1
block_size = 1024
# Fine-tuning
learning_rate = 3e-5 # Lower LR for fine-tuning
max_iters = 2000
warmup_iters = 100
# Regularization
weight_decay = 1e-1
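The warmup_iters setting above feeds nanoGPT's learning-rate schedule: linear warmup, then cosine decay down to a minimum LR (the floor is conventionally learning_rate/10). A sketch of that schedule under those assumptions, using this config's numbers as defaults:

```python
import math

def get_lr(it, learning_rate=3e-5, warmup_iters=100, lr_decay_iters=2000, min_lr=3e-6):
    # 1) Linear warmup up to warmup_iters.
    if it < warmup_iters:
        return learning_rate * (it + 1) / (warmup_iters + 1)
    # 2) Past the decay horizon, hold the floor.
    if it > lr_decay_iters:
        return min_lr
    # 3) Cosine decay from learning_rate down to min_lr in between.
    ratio = (it - warmup_iters) / (lr_decay_iters - warmup_iters)
    coeff = 0.5 * (1.0 + math.cos(math.pi * ratio))  # 1 -> 0
    return min_lr + coeff * (learning_rate - min_lr)

print(get_lr(0), get_lr(100), get_lr(2000))
```

The fine-tuning rule of thumb is visible in the defaults: a short warmup and a peak LR (3e-5) far below the from-scratch value (6e-4).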
Train on your own text:
# data/custom/prepare.py
import numpy as np
# Load your data
with open('my_data.txt', 'r') as f:
text = f.read()
# Create character mappings
chars = sorted(list(set(text)))
stoi = {ch: i for i, ch in enumerate(chars)}
itos = {i: ch for i, ch in enumerate(chars)}
# Tokenize
data = np.array([stoi[ch] for ch in text], dtype=np.uint16)
# Split train/val
n = len(data)
train_data = data[:int(n*0.9)]
val_data = data[int(n*0.9):]
# Save
train_data.tofile('data/custom/train.bin')
val_data.tofile('data/custom/val.bin')
Train:
python data/custom/prepare.py
python train.py --dataset=custom
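At train time, nanoGPT reads these .bin files back with np.memmap and samples random (x, y) windows, where y is x shifted by one token. A NumPy-only sketch of that batch loader (nanoGPT returns torch tensors; the temp-file setup here is just for the demo):

```python
import os
import tempfile
import numpy as np

def get_batch(path, batch_size=4, block_size=8, seed=0):
    """Sample input/target windows from a uint16 token file, nanoGPT-style."""
    data = np.memmap(path, dtype=np.uint16, mode="r")
    rng = np.random.default_rng(seed)
    ix = rng.integers(0, len(data) - block_size, size=batch_size)  # random offsets
    x = np.stack([data[i : i + block_size] for i in ix])           # inputs
    y = np.stack([data[i + 1 : i + 1 + block_size] for i in ix])   # targets, shifted by one
    return x.astype(np.int64), y.astype(np.int64)

# Write a tiny fake token file, then sample from it.
path = os.path.join(tempfile.mkdtemp(), "train.bin")
np.arange(1000, dtype=np.uint16).tofile(path)
x, y = get_batch(path)
assert (y == x + 1).all()  # each target is the next token after its input
```

The memmap keeps even multi-GB token files out of RAM, which is why prepare.py writes raw uint16 binaries rather than, say, pickled Python lists.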
Use nanoGPT when: you want a minimal, hackable codebase for training, fine-tuning, or studying GPT-style models.
Simplicity advantages: the whole model fits in a single model.py and the whole training loop in a single train.py, so both are easy to read and modify.
Use alternatives instead: when you need production features such as large-scale model parallelism, tokenizer tooling, or serving infrastructure.
Issue: CUDA out of memory
Reduce batch size or context length:
batch_size = 1 # Reduce from 12
block_size = 512 # Reduce from 1024
gradient_accumulation_steps = 40 # Increase to maintain effective batch
Issue: Training too slow
Enable compilation (PyTorch 2.0+):
compile = True # 2× speedup
Use mixed precision:
dtype = 'bfloat16' # Or 'float16'
Issue: Poor generation quality
Train longer:
max_iters = 10000 # Increase from 5000
Lower temperature:
# In sample.py
temperature = 0.7 # Lower from 1.0
top_k = 200 # Add top-k sampling
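Temperature divides the logits before the softmax (lower = sharper, more conservative), and top-k masks everything outside the k most likely tokens. A NumPy sketch of that sampling step (sample.py does this with torch tensors; the tiny logit vector here is made up):

```python
import numpy as np

def sample_next(logits, temperature=0.7, top_k=2, seed=0):
    """Sample one token id from logits with temperature and top-k filtering."""
    logits = np.asarray(logits, dtype=np.float64) / temperature   # sharpen/flatten
    if top_k is not None and top_k < len(logits):
        kth = np.sort(logits)[-top_k]                      # k-th largest logit
        logits = np.where(logits < kth, -np.inf, logits)   # mask everything below it
    probs = np.exp(logits - logits.max())                  # stable softmax
    probs /= probs.sum()
    return np.random.default_rng(seed).choice(len(probs), p=probs)

# Four-token toy vocabulary: top_k=2 keeps only ids 2 and 3.
token_id = sample_next([0.1, 0.2, 1.5, 2.0])
assert token_id in (2, 3)
```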
Issue: Can't load GPT-2 weights
Install transformers:
pip install transformers
Check model name:
init_from = 'gpt2' # Valid: gpt2, gpt2-medium, gpt2-large, gpt2-xl
Model architecture: See references/architecture.md for GPT block structure, multi-head attention, and MLP layers explained simply.
Training loop: See references/training.md for learning rate schedule, gradient accumulation, and distributed data parallel setup.
Data preparation: See references/data.md for tokenization strategies (character-level vs BPE) and binary format details.
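The two tokenization strategies trade vocabulary size for sequence length: character-level (used by shakespeare_char) maps each distinct character to an id, giving a tiny vocab but long sequences, while BPE (used by the OpenWebText/GPT-2 configs, via tiktoken's gpt2 encoding) uses a 50,257-token vocab and much shorter sequences. A self-contained sketch of the char-level side:

```python
# Char-level encode/decode round trip, mirroring data/shakespeare_char/prepare.py.
text = "to be or not to be"
chars = sorted(set(text))                         # the tiny character vocabulary
stoi = {ch: i for i, ch in enumerate(chars)}      # char -> id
itos = {i: ch for i, ch in enumerate(chars)}      # id -> char

encode = lambda s: [stoi[c] for c in s]
decode = lambda ids: "".join(itos[i] for i in ids)

ids = encode(text)
assert decode(ids) == text                        # lossless round trip
print(len(chars), len(ids))  # 7 18: vocab size vs sequence length
```

With BPE the same string would tokenize to far fewer ids against a ~50K vocab, which is why the GPT-2 configs can afford block_size = 1024.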
Shakespeare (char-level):
GPT-2 (124M):
GPT-2 Medium (350M):
Performance:
compile=True: 2× speedup
dtype='bfloat16': 50% memory reduction

Weekly Installs: 155
GitHub Stars: 22.6K
First Seen: Jan 21, 2026
Security Audits: Gen Agent Trust Hub: Pass; Socket: Pass; Snyk: Warn
Installed on: opencode (124), claude-code (123), gemini-cli (118), cursor (110), codex (104), antigravity (102)