implementing-llms-litgpt by davila7/claude-code-templates
npx skills add https://github.com/davila7/claude-code-templates --skill implementing-llms-litgpt
LitGPT provides 20+ pretrained LLM implementations with clean, readable code and production-ready training workflows.
Installation:
pip install 'litgpt[extra]'
Load and use any model:
from litgpt import LLM
# Load pretrained model
llm = LLM.load("microsoft/phi-2")
# Generate text
result = llm.generate(
    "What is the capital of France?",
    max_new_tokens=50,
    temperature=0.7
)
print(result)
List available models:
litgpt download list
Copy this checklist:
Fine-Tuning Setup:
- [ ] Step 1: Download pretrained model
- [ ] Step 2: Prepare dataset
- [ ] Step 3: Configure training
- [ ] Step 4: Run fine-tuning
Step 1: Download pretrained model
# Download Llama 3 8B
litgpt download meta-llama/Meta-Llama-3-8B
# Download Phi-2 (smaller, faster)
litgpt download microsoft/phi-2
# Download Gemma 2B
litgpt download google/gemma-2b
Models are saved to the checkpoints/ directory.
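A quick way to see what is already downloaded locally, assuming the checkpoints/&lt;org&gt;/&lt;model&gt; layout described here (adjust the path if your checkpoints live elsewhere):
# List locally downloaded models stored under checkpoints/<org>/<model>.
from pathlib import Path

for model_dir in sorted(Path("checkpoints").glob("*/*")):
    print(model_dir.relative_to("checkpoints"))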
Step 2: Prepare dataset
LitGPT supports multiple formats:
Alpaca format (instruction-response):
[
{
"instruction": "What is the capital of France?",
"input": "",
"output": "The capital of France is Paris."
},
{
"instruction": "Translate to Spanish: Hello, how are you?",
"input": "",
"output": "Hola, ¿cómo estás?"
}
]
Save as data/my_dataset.json.
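If your examples start out in Python rather than JSON, a short script can write them in this format; the records below are placeholders, and only the instruction/input/output keys matter:
# Write an Alpaca-style dataset to data/my_dataset.json (placeholder records).
import json
from pathlib import Path

records = [
    {
        "instruction": "Summarize in one sentence.",
        "input": "LitGPT provides readable implementations of 20+ LLMs.",
        "output": "LitGPT is a library of clean, readable LLM implementations.",
    },
]

Path("data").mkdir(exist_ok=True)
Path("data/my_dataset.json").write_text(
    json.dumps(records, ensure_ascii=False, indent=2), encoding="utf-8"
)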
Step 3: Configure training
# Full fine-tuning (requires 40GB+ GPU memory for 7-8B models)
litgpt finetune \
meta-llama/Meta-Llama-3-8B \
--data JSON \
--data.json_path data/my_dataset.json \
--train.max_steps 1000 \
--train.learning_rate 2e-5 \
--train.micro_batch_size 1 \
--train.global_batch_size 16
# LoRA fine-tuning (efficient, 16GB GPU)
litgpt finetune_lora \
microsoft/phi-2 \
--data JSON \
--data.json_path data/my_dataset.json \
--lora_r 16 \
--lora_alpha 32 \
--lora_dropout 0.05 \
--train.max_steps 1000 \
--train.learning_rate 1e-4
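To relate --train.max_steps to passes over your data: each optimizer step consumes global_batch_size examples, so steps per epoch is roughly the dataset size divided by the global batch size. A quick check with made-up numbers:
# Rough sanity check for --train.max_steps (dataset_size is an example value).
dataset_size = 5000        # number of instruction examples
global_batch_size = 16     # from --train.global_batch_size
max_steps = 1000           # from --train.max_steps

steps_per_epoch = dataset_size / global_batch_size
print(f"{steps_per_epoch:.0f} steps per epoch; "
      f"{max_steps / steps_per_epoch:.1f} epochs at max_steps={max_steps}")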
Step 4: Run fine-tuning
Training saves checkpoints to out/finetune/ automatically.
Monitor training:
# View logs
tail -f out/finetune/logs.txt
# TensorBoard (if using --train.logger_name tensorboard)
tensorboard --logdir out/finetune/lightning_logs
LoRA is the most memory-efficient fine-tuning option.
LoRA Training:
- [ ] Step 1: Choose base model
- [ ] Step 2: Configure LoRA parameters
- [ ] Step 3: Train with LoRA
- [ ] Step 4: Merge LoRA weights (optional)
Step 1: Choose base model
For limited GPU memory (12-16GB), start from a smaller base model such as microsoft/phi-2 or google/gemma-2b from the download step above.
Step 2: Configure LoRA parameters
litgpt finetune_lora \
microsoft/phi-2 \
--data JSON \
--data.json_path data/my_dataset.json \
--lora_r 16 \
--lora_alpha 32 \
--lora_dropout 0.05 \
--lora_query true \
--lora_key false \
--lora_value true \
--lora_projection true \
--lora_mlp false \
--lora_head false
What each flag controls:
--lora_r 16: LoRA rank (8-64; higher = more capacity)
--lora_alpha 32: LoRA scaling factor (typically 2×r)
--lora_dropout 0.05: dropout to prevent overfitting
--lora_query, --lora_value, --lora_projection: apply LoRA to the query, value, and output projections
--lora_key, --lora_mlp, --lora_head: usually not needed
LoRA rank guide:
r=8: Lightweight, 2-4 MB adapters
r=16: Standard, good quality
r=32: High capacity, use for complex tasks
r=64: Maximum quality, 4× larger adapters
Step 3: Train with LoRA
litgpt finetune_lora \
microsoft/phi-2 \
--data JSON \
--data.json_path data/my_dataset.json \
--lora_r 16 \
--train.epochs 3 \
--train.learning_rate 1e-4 \
--train.micro_batch_size 4 \
--train.global_batch_size 32 \
--out_dir out/phi2-lora
# Memory usage: ~8-12GB for Phi-2 with LoRA
Step 4: Merge LoRA weights (optional)
Merge LoRA adapters into base model for deployment:
litgpt merge_lora \
out/phi2-lora/final \
--out_dir out/phi2-merged
Now use merged model:
from litgpt import LLM
llm = LLM.load("out/phi2-merged")
Train a new model on your domain data.
Pretraining:
- [ ] Step 1: Prepare pretraining dataset
- [ ] Step 2: Configure model architecture
- [ ] Step 3: Set up multi-GPU training
- [ ] Step 4: Launch pretraining
Step 1: Prepare pretraining dataset
LitGPT expects tokenized data. Use prepare_dataset.py:
python scripts/prepare_dataset.py \
--source_path data/my_corpus.txt \
--checkpoint_dir checkpoints/tokenizer \
--destination_path data/pretrain \
--split train,val
Step 2: Configure model architecture
Edit a config file or use an existing one:
# config/pythia-160m.yaml
model_name: pythia-160m
block_size: 2048
vocab_size: 50304
n_layer: 12
n_head: 12
n_embd: 768
rotary_percentage: 0.25
parallel_residual: true
bias: true
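You can sanity-check the "160m" in the name against these fields: the embedding and output head contribute about 2 × vocab_size × n_embd parameters (assuming an untied head, as in Pythia), and each transformer block roughly 12 × n_embd², ignoring biases and layer norms. A rough estimate:
# Approximate parameter count from the config above (ignores biases and LayerNorms;
# assumes an untied output head, as in Pythia).
vocab_size, n_layer, n_embd = 50304, 12, 768

embed_and_head = 2 * vocab_size * n_embd   # token embedding + LM head
per_block = 12 * n_embd ** 2               # ~4*d^2 attention + ~8*d^2 MLP
total = embed_and_head + n_layer * per_block
print(f"~{total / 1e6:.0f}M parameters")   # ~162M, consistent with pythia-160m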
Step 3: Set up multi-GPU training
# Single GPU
litgpt pretrain \
--config config/pythia-160m.yaml \
--data.data_dir data/pretrain \
--train.max_tokens 10_000_000_000
# Multi-GPU with FSDP
litgpt pretrain \
--config config/pythia-1b.yaml \
--data.data_dir data/pretrain \
--devices 8 \
--train.max_tokens 100_000_000_000
Step 4: Launch pretraining
For large-scale pretraining on a cluster:
# Using SLURM
sbatch --nodes=8 --gpus-per-node=8 \
pretrain_script.sh
# pretrain_script.sh content:
litgpt pretrain \
--config config/pythia-1b.yaml \
--data.data_dir /shared/data/pretrain \
--devices 8 \
--num_nodes 8 \
--train.global_batch_size 512 \
--train.max_tokens 300_000_000_000
Export LitGPT models for production.
Model Deployment:
- [ ] Step 1: Test inference locally
- [ ] Step 2: Quantize model (optional)
- [ ] Step 3: Convert to GGUF (for llama.cpp)
- [ ] Step 4: Deploy with API
Step 1: Test inference locally
from litgpt import LLM
llm = LLM.load("out/phi2-lora/final")
# Single generation
print(llm.generate("What is machine learning?"))
# Streaming
for token in llm.generate("Explain quantum computing", stream=True):
    print(token, end="", flush=True)
# Batch inference
prompts = ["Hello", "Goodbye", "Thank you"]
results = [llm.generate(p) for p in prompts]
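For a rough feel of throughput, the batch loop above can be timed; this sequential sketch uses placeholder prompts and an arbitrary token budget:
# Time sequential batch inference (illustrative prompts and token budget).
import time
from litgpt import LLM

llm = LLM.load("out/phi2-lora/final")
prompts = ["Hello", "Goodbye", "Thank you"]

start = time.perf_counter()
results = [llm.generate(p, max_new_tokens=50) for p in prompts]
elapsed = time.perf_counter() - start
print(f"{len(prompts)} prompts in {elapsed:.1f}s "
      f"({elapsed / len(prompts):.2f}s per prompt)")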
Step 2: Quantize model (optional)
Reduce model size with minimal quality loss:
# 4-bit NF4 quantization (~75% size reduction)
litgpt convert_lit_checkpoint \
out/phi2-lora/final \
--dtype bfloat16 \
--quantize bnb.nf4
# 4-bit NF4 with double quantization (further size reduction)
litgpt convert_lit_checkpoint \
out/phi2-lora/final \
--quantize bnb.nf4-dq # Double quantization
Step 3: Convert to GGUF (for llama.cpp)
python scripts/convert_lit_checkpoint.py \
--checkpoint_path out/phi2-lora/final \
--output_path models/phi2.gguf \
--model_name microsoft/phi-2
Step 4: Deploy with API
from fastapi import FastAPI
from litgpt import LLM
app = FastAPI()
llm = LLM.load("out/phi2-lora/final")
@app.post("/generate")
def generate(prompt: str, max_tokens: int = 100):
result = llm.generate(
prompt,
max_new_tokens=max_tokens,
temperature=0.7
)
return {"response": result}
# Run: uvicorn api:app --host 0.0.0.0 --port 8000
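Because prompt and max_tokens are plain scalar parameters on the route above, FastAPI exposes them as query parameters, so a client passes them via params rather than a JSON body. A minimal client sketch (the URL is a placeholder):
# Minimal client for the /generate endpoint above (placeholder URL).
# Scalar parameters on a FastAPI POST route are query parameters, hence params=.
import requests

resp = requests.post(
    "http://localhost:8000/generate",
    params={"prompt": "What is LitGPT?", "max_tokens": 80},
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["response"])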
Use LitGPT when:
Use alternatives instead:
Issue: Out of memory during fine-tuning
Use LoRA instead of full fine-tuning:
# Instead of litgpt finetune (requires 40GB+)
litgpt finetune_lora # Only needs 12-16GB
Or lower per-step memory with gradient accumulation:
litgpt finetune_lora \
... \
--train.gradient_accumulation_iters 4 # Accumulate gradients
Issue: Training too slow
Enable Flash Attention (built-in, automatic on compatible hardware):
# Already enabled by default on Ampere+ GPUs (A100, RTX 30/40 series)
# No configuration needed
Use smaller micro-batch and accumulate:
--train.micro_batch_size 1 \
--train.global_batch_size 32 \
--train.gradient_accumulation_iters 32 # Effective batch=32
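These flags are linked: the effective (global) batch size equals micro_batch_size × gradient_accumulation_iters × number of devices, so on a single GPU the values above are consistent:
# Effective batch size = micro_batch_size * gradient_accumulation_iters * devices.
micro_batch_size, grad_accum_iters, devices = 1, 32, 1
print(micro_batch_size * grad_accum_iters * devices)  # 32, matching --train.global_batch_size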
Issue: Model not loading
Check model name:
# List all available models
litgpt download list
# Download if not exists
litgpt download meta-llama/Meta-Llama-3-8B
Verify checkpoints directory:
ls checkpoints/
# Should see: meta-llama/Meta-Llama-3-8B/
Issue: LoRA adapters too large
Reduce LoRA rank:
--lora_r 8 # Instead of 16 or 32
Apply LoRA to fewer layers:
--lora_query true \
--lora_value true \
--lora_projection false \
--lora_mlp false
# Disabling lora_projection and lora_mlp keeps the adapter small
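Both knobs act on the same quantity: each adapted weight matrix of shape (d_out, d_in) adds r × (d_in + d_out) LoRA parameters, so adapter size scales linearly with the rank and with the number of adapted matrices. A back-of-the-envelope estimate (the layer shapes are illustrative, not read from a real checkpoint):
# LoRA adapter size: sum over adapted matrices of r * (d_in + d_out) parameters.
# Shapes below are illustrative (a hypothetical 24-layer model, hidden size 2048).
def adapter_mb(r, shapes, bytes_per_param=2):  # 2 bytes for bf16/fp16
    params = sum(r * (d_in + d_out) for d_out, d_in in shapes)
    return params * bytes_per_param / 1e6

qv_only = [(2048, 2048)] * 2 * 24  # query + value projections in every layer
with_proj_mlp = qv_only + [(2048, 2048)] * 24 + [(8192, 2048)] * 24  # + output proj + one MLP matrix per layer

for r in (8, 16, 32):
    print(f"r={r}: q+v only ~{adapter_mb(r, qv_only):.1f} MB, "
          f"+projection/MLP ~{adapter_mb(r, with_proj_mlp):.1f} MB")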
Supported architectures: See references/supported-models.md for a complete list of 20+ model families with sizes and capabilities.
Training recipes: See references/training-recipes.md for proven hyperparameter configurations for pretraining and fine-tuning.
FSDP configuration: See references/distributed-training.md for multi-GPU training with Fully Sharded Data Parallel.
Custom architectures: See references/custom-models.md for implementing new model architectures in LitGPT style.