AgentDB Learning Plugins by ruvnet/claude-flow
npx skills add https://github.com/ruvnet/claude-flow --skill 'AgentDB Learning Plugins'

Provides access to 9 reinforcement learning algorithms via AgentDB's plugin system. Create, train, and deploy learning plugins for autonomous agents that improve through experience. Includes offline RL (Decision Transformer), value-based learning (Q-Learning), policy gradients (Actor-Critic), and advanced techniques.

Performance: Train models 10-100x faster with WASM-accelerated neural inference.
# Interactive wizard
npx agentdb@latest create-plugin
# Use a specific template
npx agentdb@latest create-plugin -t decision-transformer -n my-agent
# Preview without creating
npx agentdb@latest create-plugin -t q-learning --dry-run
# Custom output directory
npx agentdb@latest create-plugin -t actor-critic -o ./plugins
# Show all plugin templates
npx agentdb@latest list-templates
# Available templates:
# - decision-transformer (sequence-modeling RL - recommended)
# - q-learning (value-based learning)
# - sarsa (on-policy TD learning)
# - actor-critic (policy gradient with baseline)
# - curiosity-driven (exploration-based)
# List installed plugins
npx agentdb@latest list-plugins
# Get plugin information
npx agentdb@latest plugin-info my-agent
# Shows: algorithm, configuration, training status
import { createAgentDBAdapter } from 'agentic-flow/reasoningbank';
// Initialize with learning enabled
const adapter = await createAgentDBAdapter({
  dbPath: '.agentdb/learning.db',
  enableLearning: true, // Enable learning plugins
  enableReasoning: true,
  cacheSize: 1000,
});
// Store a training experience
await adapter.insertPattern({
  id: '',
  type: 'experience',
  domain: 'game-playing',
  pattern_data: JSON.stringify({
    embedding: await computeEmbedding('state-action-reward'),
    pattern: {
      state: [0.1, 0.2, 0.3],
      action: 2,
      reward: 1.0,
      next_state: [0.15, 0.25, 0.35],
      done: false
    }
  }),
  confidence: 0.9,
  usage_count: 1,
  success_count: 1,
  created_at: Date.now(),
  last_used: Date.now(),
});
// Train the learning model
const metrics = await adapter.train({
  epochs: 50,
  batchSize: 32,
});
console.log('Training loss:', metrics.loss);
console.log('Duration:', metrics.duration, 'ms');
Type: Offline Reinforcement Learning. Best for: learning from logged experiences, imitation learning. Strengths: no online interaction needed, stable training.
npx agentdb@latest create-plugin -t decision-transformer -n dt-agent
Use cases:
Configuration:
{
  "algorithm": "decision-transformer",
  "model_size": "base",
  "context_length": 20,
  "embed_dim": 128,
  "n_heads": 8,
  "n_layers": 6
}
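Decision Transformers frame RL as sequence modeling: each timestep is conditioned on the return still to be earned. A minimal sketch of computing those returns-to-go from a logged episode (the `returnsToGo` helper and `episode` shape are illustrative, not part of the AgentDB API):

```javascript
// Build the (return-to-go, state, action) tokens a Decision Transformer
// trains on. Field names mirror the experience records stored above.
function returnsToGo(rewards, gamma = 1.0) {
  // Walk the episode backwards, accumulating the (discounted) future reward.
  const rtg = new Array(rewards.length).fill(0);
  let running = 0;
  for (let t = rewards.length - 1; t >= 0; t--) {
    running = rewards[t] + gamma * running;
    rtg[t] = running;
  }
  return rtg;
}

const episode = {
  states: [[0.1], [0.2], [0.3]],
  actions: [0, 1, 0],
  rewards: [0, 0, 1],
};
const rtg = returnsToGo(episode.rewards); // [1, 1, 1]
const tokens = episode.states.map((s, t) => ({
  returnToGo: rtg[t],
  state: s,
  action: episode.actions[t],
}));
```

At inference time the same conditioning is run with a *desired* return in place of the logged one, which is what lets the model extract better-than-average behavior from mixed-quality data.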
Type: Value-Based RL (off-policy). Best for: discrete action spaces, sample efficiency. Strengths: proven, simple, works well for small-to-medium problems.
npx agentdb@latest create-plugin -t q-learning -n q-agent
Use cases:
Configuration:
{
  "algorithm": "q-learning",
  "learning_rate": 0.001,
  "gamma": 0.99,
  "epsilon": 0.1,
  "epsilon_decay": 0.995
}
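These fields map directly onto the tabular Q-learning update. A minimal sketch, assuming a small discrete state/action space (the `Q` table and helpers are illustrative; the plugin runs this internally):

```javascript
// Q-learning update, parameterized by the configuration above.
const cfg = { learning_rate: 0.001, gamma: 0.99, epsilon: 0.1, epsilon_decay: 0.995 };

function qLearningUpdate(Q, s, a, reward, sNext, done) {
  // Off-policy: bootstrap from the greedy next action, regardless of what
  // the (epsilon-greedy) behavior policy actually does next.
  const target = done ? reward : reward + cfg.gamma * Math.max(...Q[sNext]);
  Q[s][a] += cfg.learning_rate * (target - Q[s][a]);
}

// Epsilon-greedy action selection with decaying exploration.
function selectAction(Q, s, epsilon) {
  if (Math.random() < epsilon) return Math.floor(Math.random() * Q[s].length);
  return Q[s].indexOf(Math.max(...Q[s]));
}

let epsilon = cfg.epsilon;
const Q = [[0, 0], [0.5, 1.0]]; // 2 states x 2 actions
qLearningUpdate(Q, 0, 1, 1.0, 1, false); // target = 1.0 + 0.99 * 1.0 = 1.99
epsilon *= cfg.epsilon_decay; // anneal exploration after each step
```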
Type: Value-Based RL (on-policy). Best for: safe exploration, risk-sensitive tasks. Strengths: more conservative than Q-Learning, better for safety.
npx agentdb@latest create-plugin -t sarsa -n sarsa-agent
Use cases:
Configuration:
{
  "algorithm": "sarsa",
  "learning_rate": 0.001,
  "gamma": 0.99,
  "epsilon": 0.1
}
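SARSA differs from Q-learning in a single term: it bootstraps from the action the policy actually takes next, which is what makes it on-policy and more conservative under exploration. A sketch under the same tabular assumptions (illustrative, not the plugin's internals):

```javascript
const cfg = { learning_rate: 0.001, gamma: 0.99, epsilon: 0.1 };

function sarsaUpdate(Q, s, a, reward, sNext, aNext, done) {
  // On-policy: the target uses Q[sNext][aNext] for the action actually taken,
  // not max(Q[sNext]) -- exploratory (risky) next actions lower the target.
  const target = done ? reward : reward + cfg.gamma * Q[sNext][aNext];
  Q[s][a] += cfg.learning_rate * (target - Q[s][a]);
}

const Q = [[0, 0], [0.5, 1.0]];
// The agent explored: its next action (index 0) is not the greedy one (index 1),
// so the SARSA target (1.0 + 0.99 * 0.5 = 1.495) is lower than Q-learning's 1.99.
sarsaUpdate(Q, 0, 1, 1.0, 1, 0, false);
```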
Type: Policy Gradient with Value Baseline. Best for: continuous actions, variance reduction. Strengths: stable, works for continuous and discrete actions.
npx agentdb@latest create-plugin -t actor-critic -n ac-agent
Use cases:
Configuration:
{
  "algorithm": "actor-critic",
  "actor_lr": 0.001,
  "critic_lr": 0.002,
  "gamma": 0.99,
  "entropy_coef": 0.01
}
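The critic's role is to supply the baseline in the advantage estimate that scales the actor's policy gradient. A sketch of the TD(0) advantage under the configuration above (the `V` table is an illustrative stand-in for the critic network):

```javascript
const cfg = { actor_lr: 0.001, critic_lr: 0.002, gamma: 0.99, entropy_coef: 0.01 };

function advantage(V, s, reward, sNext, done) {
  // How much better this transition turned out than the critic predicted.
  // Positive advantage -> reinforce the action; negative -> discourage it.
  const target = done ? reward : reward + cfg.gamma * V[sNext];
  return target - V[s];
}

const V = [0.2, 0.5]; // critic's value estimates for two states
const adv = advantage(V, 0, 1.0, 1, false); // 1.0 + 0.99 * 0.5 - 0.2 = 1.295
// The critic itself is regressed toward the same target, at its own rate:
const criticStep = cfg.critic_lr * adv; // gradient step applied to V[0]
```

Subtracting the baseline leaves the policy gradient unbiased while cutting its variance, which is why the critic typically gets a slightly higher learning rate (here 0.002 vs 0.001) so the baseline tracks the policy.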
Type: Query-Based Learning. Best for: label-efficient learning, human-in-the-loop. Strengths: minimizes labeling cost, focuses on uncertain samples.
Use cases:
Type: Robustness Enhancement. Best for: safety, robustness to perturbations. Strengths: improves model robustness, adversarial defense.
Use cases:
Type: Progressive Difficulty Training. Best for: complex tasks, faster convergence. Strengths: stable learning, faster convergence on hard tasks.
Use cases:
Type: Distributed Learning. Best for: privacy, distributed data. Strengths: privacy-preserving, scalable.
Use cases:
Type: Transfer Learning. Best for: related tasks, knowledge sharing. Strengths: faster learning on new tasks, better generalization.
Use cases:
// Store experiences during agent execution
for (let i = 0; i < numEpisodes; i++) {
  const episode = runEpisode();
  for (const step of episode.steps) {
    await adapter.insertPattern({
      id: '',
      type: 'experience',
      domain: 'task-domain',
      pattern_data: JSON.stringify({
        embedding: await computeEmbedding(JSON.stringify(step)),
        pattern: {
          state: step.state,
          action: step.action,
          reward: step.reward,
          next_state: step.next_state,
          done: step.done
        }
      }),
      confidence: step.reward > 0 ? 0.9 : 0.5,
      usage_count: 1,
      success_count: step.reward > 0 ? 1 : 0,
      created_at: Date.now(),
      last_used: Date.now(),
    });
  }
}
// Train on the collected experiences
const trainingMetrics = await adapter.train({
  epochs: 100,
  batchSize: 64,
  learningRate: 0.001,
  validationSplit: 0.2,
});
console.log('Training metrics:', trainingMetrics);
// {
//   loss: 0.023,
//   valLoss: 0.028,
//   duration: 1523,
//   epochs: 100
// }
// Retrieve similar successful experiences
const testQuery = await computeEmbedding(JSON.stringify(testState));
const result = await adapter.retrieveWithReasoning(testQuery, {
  domain: 'task-domain',
  k: 10,
  synthesizeContext: true,
});
// Evaluate action quality
const suggestedAction = result.memories[0].pattern.action;
const confidence = result.memories[0].similarity;
console.log('Suggested action:', suggestedAction);
console.log('Confidence:', confidence);
// Store experiences in a buffer
const replayBuffer = [];
// Sample a random batch for training
const batch = sampleRandomBatch(replayBuffer, 32);
// Train on the batch
await adapter.train({
  data: batch,
  epochs: 1,
  batchSize: 32,
});
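`sampleRandomBatch` is a hypothetical helper here, not part of the AgentDB API. One minimal way to implement it is uniform sampling without replacement via a partial Fisher-Yates shuffle:

```javascript
function sampleRandomBatch(buffer, batchSize) {
  // Copy so the replay buffer itself is left untouched.
  const pool = buffer.slice();
  const n = Math.min(batchSize, pool.length);
  for (let i = 0; i < n; i++) {
    // Swap a uniformly chosen remaining element into slot i.
    const j = i + Math.floor(Math.random() * (pool.length - i));
    [pool[i], pool[j]] = [pool[j], pool[i]];
  }
  return pool.slice(0, n);
}

const replayBuffer = Array.from({ length: 100 }, (_, i) => ({ id: i }));
const batch = sampleRandomBatch(replayBuffer, 32);
// batch holds 32 distinct experiences, sampled uniformly
```

Sampling uniformly from a large buffer breaks the temporal correlation between consecutive steps, which is the point of experience replay.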
// Store experiences with a priority (the TD error)
await adapter.insertPattern({
  // ... standard fields
  confidence: tdError, // Use the TD error as confidence/priority
  // ...
});
// Retrieve high-priority experiences
const highPriority = await adapter.retrieveWithReasoning(queryEmbedding, {
  domain: 'task-domain',
  k: 32,
  minConfidence: 0.7, // Only high-TD-error experiences
});
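The `confidence` values used elsewhere in this document all fall in [0, 1], while raw TD errors are unbounded, so some squashing is needed before storing them as priorities. One possible mapping (an assumption, not prescribed by AgentDB):

```javascript
// Map an unbounded TD error onto the [0, 1) confidence/priority scale.
function tdErrorToConfidence(tdError) {
  // |e| / (1 + |e|) is monotonic in the magnitude of the surprise:
  // small errors -> low priority, large errors -> priority approaching 1.
  const mag = Math.abs(tdError);
  return mag / (1 + mag);
}

const low = tdErrorToConfidence(0.1); // nearly-predicted transition, low priority
const high = tdErrorToConfidence(3);  // 0.75: surprising transition, high priority
```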
// Collect experiences from multiple agents
for (const agent of agents) {
  const experience = await agent.step();
  await adapter.insertPattern({
    // ... store the experience tagged with the agent ID
    domain: `multi-agent/${agent.id}`,
  });
}
// Train a shared model
await adapter.train({
  epochs: 50,
  batchSize: 64,
});
// Collect a batch of experiences
const experiences = collectBatch(1000);
// Batch insert (500x faster)
for (const exp of experiences) {
  await adapter.insertPattern({ /* ... */ });
}
// Train on the batch
await adapter.train({
  epochs: 10,
  batchSize: 128, // Larger batches for efficiency
});
// Train incrementally as new data arrives
setInterval(async () => {
  const newExperiences = getNewExperiences();
  if (newExperiences.length > 100) {
    await adapter.train({
      epochs: 5,
      batchSize: 32,
    });
  }
}, 60000); // Every minute
Combine learning with reasoning for better performance:
// Train the learning model
await adapter.train({ epochs: 50, batchSize: 32 });
// Use reasoning agents for inference
const result = await adapter.retrieveWithReasoning(queryEmbedding, {
  domain: 'decision-making',
  k: 10,
  useMMR: true, // Diverse experiences
  synthesizeContext: true, // Rich context
  optimizeMemory: true, // Consolidate patterns
});
// Decide based on learned experiences plus reasoning
const decision = result.context.suggestedAction;
const confidence = result.memories[0].similarity;
# Create a plugin
npx agentdb@latest create-plugin -t decision-transformer -n my-plugin
# List plugins
npx agentdb@latest list-plugins
# Get plugin info
npx agentdb@latest plugin-info my-plugin
# List templates
npx agentdb@latest list-templates
// Reduce the learning rate
await adapter.train({
  epochs: 100,
  batchSize: 32,
  learningRate: 0.0001, // Lower learning rate
});
// Use a validation split
await adapter.train({
  epochs: 50,
  batchSize: 64,
  validationSplit: 0.2, // 20% held out for validation
});
// Enable memory optimization
await adapter.retrieveWithReasoning(queryEmbedding, {
  optimizeMemory: true, // Consolidate patterns, reduce overfitting
});
# Enable quantization for faster inference
# Use binary quantization (32x faster)
npx agentdb@latest mcp

Category: Machine Learning / Reinforcement Learning. Difficulty: Intermediate to Advanced. Estimated time: 30-60 minutes.
Weekly installs
–
Repository
GitHub stars
26.9K
First seen
–
Security audits