pufferlib by davila7/claude-code-templates
npx skills add https://github.com/davila7/claude-code-templates --skill pufferlib
PufferLib is a high-performance reinforcement learning library designed for fast parallel environment simulation and training. It reaches millions of training steps per second through optimized vectorization, native multi-agent support, and an efficient PPO implementation (PuffeRL). The library provides the Ocean suite of 20+ environments and integrates seamlessly with Gymnasium, PettingZoo, and specialized RL frameworks.
Use this skill when:
PuffeRL is PufferLib's optimized PPO+LSTM training algorithm achieving 1M-4M steps/second.
Quick start training:
# CLI training
puffer train procgen-coinrun --train.device cuda --train.learning-rate 3e-4
# Distributed training
torchrun --nproc_per_node=4 train.py
Python training loop:
import pufferlib
from pufferlib import PuffeRL
# Create vectorized environment
env = pufferlib.make('procgen-coinrun', num_envs=256)
# Create trainer
trainer = PuffeRL(
    env=env,
    policy=my_policy,
    device='cuda',
    learning_rate=3e-4,
    batch_size=32768
)
# Training loop
for iteration in range(num_iterations):
    trainer.evaluate()       # Collect rollouts
    trainer.train()          # Train on batch
    trainer.mean_and_log()   # Log results
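Under the hood, PPO trainers compute advantages from the collected rollouts, typically with generalized advantage estimation (GAE). A standalone numpy sketch of that computation (illustrative only, not PuffeRL's actual code):

```python
import numpy as np

def gae(rewards, values, dones, gamma=0.99, lam=0.95):
    """Generalized advantage estimation over one rollout.

    values has one extra entry: the bootstrap value for the
    state after the final step.
    """
    T = len(rewards)
    advantages = np.zeros(T, dtype=np.float64)
    last_adv = 0.0
    for t in reversed(range(T)):
        nonterminal = 1.0 - dones[t]
        delta = rewards[t] + gamma * values[t + 1] * nonterminal - values[t]
        last_adv = delta + gamma * lam * nonterminal * last_adv
        advantages[t] = last_adv
    return advantages

# Toy rollout: 3 steps, episode terminates on the last step
rewards = np.array([1.0, 0.0, 1.0])
values = np.array([0.5, 0.4, 0.6, 0.0])  # includes bootstrap value
dones = np.array([0.0, 0.0, 1.0])
adv = gae(rewards, values, dones)
```

The terminal step's advantage reduces to its TD error (here 1.0 - 0.6 = 0.4), since nothing is bootstrapped past the episode boundary.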
For comprehensive training guidance, read references/training.md for:
Create custom high-performance environments with the PufferEnv API.
Basic environment structure:
import numpy as np
from pufferlib import PufferEnv
class MyEnvironment(PufferEnv):
    def __init__(self, buf=None):
        super().__init__(buf)
        # Define spaces
        self.observation_space = self.make_space((4,))
        self.action_space = self.make_discrete(4)
        self.reset()

    def reset(self):
        # Reset state and return initial observation
        return np.zeros(4, dtype=np.float32)

    def step(self, action):
        # Execute action, compute reward, check done
        obs = self._get_observation()
        reward = self._compute_reward()
        done = self._is_done()
        info = {}
        return obs, reward, done, info
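The template above leaves helpers like `_get_observation` undefined. For a concrete sense of the reset/step contract, here is a fully self-contained toy environment in plain numpy (no PufferLib dependency; the class and its logic are illustrative inventions):

```python
import numpy as np

class CountingEnv:
    """Toy environment: reward 1 when the action matches the step counter mod 4."""

    def __init__(self, horizon=8):
        self.horizon = horizon
        self.t = 0

    def reset(self):
        self.t = 0
        return np.zeros(4, dtype=np.float32)

    def step(self, action):
        self.t += 1
        obs = np.full(4, self.t, dtype=np.float32)   # observation encodes the step count
        reward = 1.0 if action == self.t % 4 else 0.0
        done = self.t >= self.horizon
        return obs, reward, done, {}

env = CountingEnv()
obs = env.reset()
obs, reward, done, info = env.step(1)  # action 1 matches counter 1 -> reward 1.0
```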
Use the template script: scripts/env_template.py provides complete single-agent and multi-agent environment templates with examples of:
For complete environment development, read references/environments.md for:
Achieve maximum throughput with optimized parallel simulation.
Vectorization setup:
import pufferlib
# Automatic vectorization
env = pufferlib.make('environment_name', num_envs=256, num_workers=8)
# Performance benchmarks:
# - Pure Python envs: 100k-500k SPS
# - C-based envs: 100M+ SPS
# - With training: 400k-4M total SPS
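To sanity-check throughput numbers like these on your own machine, time a fixed number of steps. This sketch uses a stand-in no-op environment rather than a real PufferLib env; substitute your own environment to measure it:

```python
import time
import numpy as np

class NoopEnv:
    """Minimal stand-in environment for timing the step loop itself."""
    def reset(self):
        return np.zeros(4, dtype=np.float32)
    def step(self, action):
        return np.zeros(4, dtype=np.float32), 0.0, False, {}

def measure_sps(env, n_steps=100_000):
    """Return raw single-env steps per second over n_steps."""
    env.reset()
    start = time.perf_counter()
    for _ in range(n_steps):
        env.step(0)
    elapsed = time.perf_counter() - start
    return n_steps / elapsed

sps = measure_sps(NoopEnv())
print(f"{sps:,.0f} steps/second")
```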
Key optimizations:
For vectorization optimization, read references/vectorization.md for:
Build policies as standard PyTorch modules with optional utilities.
Basic policy structure:
import torch.nn as nn
from pufferlib.pytorch import layer_init
class Policy(nn.Module):
    def __init__(self, observation_space, action_space):
        super().__init__()
        obs_dim = observation_space.shape[0]   # flat observation size
        num_actions = action_space.n           # discrete action count
        # Encoder
        self.encoder = nn.Sequential(
            layer_init(nn.Linear(obs_dim, 256)),
            nn.ReLU(),
            layer_init(nn.Linear(256, 256)),
            nn.ReLU()
        )
        # Actor and critic heads
        self.actor = layer_init(nn.Linear(256, num_actions), std=0.01)
        self.critic = layer_init(nn.Linear(256, 1), std=1.0)

    def forward(self, observations):
        features = self.encoder(observations)
        return self.actor(features), self.critic(features)
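The `std` arguments follow the common PPO convention: a small scale on the actor head keeps initial action logits near zero, so the starting policy is close to uniform. `layer_init`-style helpers typically apply orthogonal weight initialization scaled by `std` with zero bias; here is a numpy sketch of that scheme (an assumption about the convention, not PufferLib's exact code):

```python
import numpy as np

def orthogonal(shape, gain=1.0, rng=None):
    """Orthogonal init: QR-decompose a Gaussian matrix, keep Q, scale by gain."""
    if rng is None:
        rng = np.random.default_rng(0)
    a = rng.standard_normal(shape)
    q, r = np.linalg.qr(a)
    q = q * np.sign(np.diag(r))  # fix column signs so the result is deterministic
    return gain * q

# Actor-head-style init: tiny gain keeps initial outputs near zero
w = orthogonal((256, 256), gain=0.01)
```

Because the columns of `q` are orthonormal, `w.T @ w` equals `gain**2` times the identity, which is what makes the scale of the output directly controllable.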
For complete policy development, read references/policies.md for:
Seamlessly integrate environments from popular RL frameworks.
Gymnasium integration:
import gymnasium as gym
import pufferlib
# Wrap Gymnasium environment
gym_env = gym.make('CartPole-v1')
env = pufferlib.emulate(gym_env, num_envs=256)
# Or use make directly
env = pufferlib.make('gym-CartPole-v1', num_envs=256)
PettingZoo multi-agent:
# Multi-agent environment
env = pufferlib.make('pettingzoo-knights-archers-zombies', num_envs=128)
Supported frameworks:
For integration details, read references/integration.md for:
Use scripts/train_template.py as a starting point
Read references/training.md for optimization
Start from scripts/env_template.py
Implement the reset() and step() methods
Vectorize with pufferlib.emulate() or make()
See references/environments.md for advanced patterns
Read references/vectorization.md if needed
Use layer_init for proper weight initialization
Follow the patterns in references/policies.md
Read references/vectorization.md for systematic optimization
train_template.py - Complete training script template with:
env_template.py - Environment implementation templates:
training.md - Comprehensive training guide:
environments.md - Environment development guide:
vectorization.md - Vectorization optimization:
policies.md - Policy architecture guide:
integration.md - Framework integration guide:
Start simple: Begin with Ocean environments or Gymnasium integration before creating custom environments
Profile early: Measure steps per second from the start to identify bottlenecks
Use templates: scripts/train_template.py and scripts/env_template.py provide solid starting points
Read references as needed: Each reference file is self-contained and focused on a specific capability
Optimize progressively: Start with Python, profile, then optimize critical paths with C if needed
Leverage vectorization: PufferLib's vectorization is key to achieving high throughput
Monitor training: Use WandB or Neptune to track experiments and identify issues early
Test environments: Validate environment logic before scaling up training
Check existing environments: The Ocean suite provides 20+ pre-built environments
Use proper initialization: Always use layer_init from pufferlib.pytorch for policies
# Atari
env = pufferlib.make('atari-pong', num_envs=256)
# Procgen
env = pufferlib.make('procgen-coinrun', num_envs=256)
# Minigrid
env = pufferlib.make('minigrid-empty-8x8', num_envs=256)
# PettingZoo
env = pufferlib.make('pettingzoo-pistonball', num_envs=128)
# Shared policy for all agents
policy = create_policy(env.observation_space, env.action_space)
trainer = PuffeRL(env=env, policy=policy)
# Create custom environment
class MyTask(PufferEnv):
    ...  # implement environment
# Vectorize and train
env = pufferlib.emulate(MyTask, num_envs=256)
trainer = PuffeRL(env=env, policy=my_policy)
# Maximize throughput
env = pufferlib.make(
    'my-env',
    num_envs=1024,       # Large batch
    num_workers=16,      # Many workers
    envs_per_worker=64   # Optimize per worker
)
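In typical vectorized setups, and assuming PufferLib follows the same convention, these knobs are related by num_envs = num_workers * envs_per_worker, so environments shard evenly across workers. A quick sanity check of the numbers above:

```python
num_envs, num_workers = 1024, 16

# Envs must divide evenly across workers
assert num_envs % num_workers == 0
envs_per_worker = num_envs // num_workers
print(envs_per_worker)  # 64, matching the config above
```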
uv pip install pufferlib
Weekly Installs
120
Repository
GitHub Stars
22.6K
First Seen
Jan 21, 2026
Security Audits
Gen Agent Trust Hub: Pass
Socket: Pass
Snyk: Pass
Installed on
claude-code: 103
opencode: 96
cursor: 93
gemini-cli: 92
antigravity: 86
codex: 81