agentic-development by alinaqi/claude-bootstrap
npx skills add https://github.com/alinaqi/claude-bootstrap --skill agentic-development
Load with: base.md + llm-patterns.md + [language].md
For building autonomous AI agents that perform multi-step tasks with tools.
Sources: Claude Agent SDK | Anthropic Claude Code Best Practices | Pydantic AI | Google Gemini Agent Development | OpenAI Building Agents
| Language/Framework | Default | Why |
|---|---|---|
| Python | Pydantic AI | Type-safe, Pydantic validation, multi-model support, production-ready |
| Node.js / Next.js | Claude Agent SDK | Official Anthropic SDK, tool support, multi-agent, native streaming |
from pydantic_ai import Agent
from pydantic import BaseModel
class SearchResult(BaseModel):
title: str
url: str
summary: str
agent = Agent(
'claude-sonnet-4-20250514',
result_type=list[SearchResult],
system_prompt='You are a research assistant.',
)
# Type-safe result
result = await agent.run('Find articles about AI agents')
for item in result.data:
print(f"{item.title}: {item.url}")
import Anthropic from "@anthropic-ai/sdk";
const client = new Anthropic();
// Define tools
const tools: Anthropic.Tool[] = [
{
name: "web_search",
description: "Search the web for information",
input_schema: {
type: "object",
properties: {
query: { type: "string", description: "Search query" },
},
required: ["query"],
},
},
];
// Agentic loop
async function runAgent(prompt: string) {
const messages: Anthropic.MessageParam[] = [
{ role: "user", content: prompt },
];
while (true) {
const response = await client.messages.create({
model: "claude-sonnet-4-20250514",
max_tokens: 4096,
tools,
messages,
});
// Check for tool use
if (response.stop_reason === "tool_use") {
const toolUse = response.content.find((b) => b.type === "tool_use");
if (toolUse) {
const result = await executeTool(toolUse.name, toolUse.input);
messages.push({ role: "assistant", content: response.content });
messages.push({
role: "user",
content: [{ type: "tool_result", tool_use_id: toolUse.id, content: result }],
});
continue;
}
}
// Done - return final response
return response.content.find((b) => b.type === "text")?.text;
}
}
Plan first, act incrementally, verify always.
Agents that research and plan before executing consistently outperform those that jump straight to action. Break complex tasks into verifiable steps, use tools judiciously, and maintain clear state throughout execution.
┌─────────────────────────────────────────────────┐
│ AGENT │
├─────────────────────────────────────────────────┤
│ Model (Brain) │ LLM for reasoning & │
│ │ decision-making │
├─────────────────────┼───────────────────────────┤
│ Tools (Arms/Legs) │ APIs, functions, external │
│ │ systems for action │
├─────────────────────┼───────────────────────────┤
│ Instructions │ System prompts defining │
│ (Rules) │ behavior & boundaries │
└─────────────────────┴───────────────────────────┘
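The anatomy above maps naturally onto a plain configuration object. A minimal TypeScript sketch of that mapping; every name here is illustrative, not taken from any specific SDK:

```typescript
// Illustrative only: the three agent components as one config shape.
interface AgentConfig {
  model: string;                                   // the "brain": which LLM reasons and decides
  tools: { name: string; description: string }[];  // the "arms/legs": actions the agent may take
  instructions: string;                            // the "rules": system prompt and boundaries
}

const config: AgentConfig = {
  model: "claude-sonnet-4-20250514",
  tools: [{ name: "web_search", description: "Search the web for information" }],
  instructions: "You are a research assistant. Stay within scope.",
};
```

Keeping all three concerns in one declarative object makes it easy to audit what an agent can do before it runs.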
project/
├── src/
│ ├── agents/
│ │ ├── orchestrator.ts # Main agent coordinator
│ │ ├── specialized/ # Task-specific agents
│ │ │ ├── researcher.ts
│ │ │ ├── coder.ts
│ │ │ └── reviewer.ts
│ │ └── base.ts # Shared agent interface
│ ├── tools/
│ │ ├── definitions/ # Tool schemas
│ │ ├── implementations/ # Tool logic
│ │ └── registry.ts # Tool discovery
│ ├── prompts/
│ │ ├── system/ # Agent instructions
│ │ └── templates/ # Task templates
│ └── memory/
│ ├── conversation.ts # Short-term context
│ └── persistent.ts # Long-term storage
├── tests/
│ ├── agents/ # Agent behavior tests
│ ├── tools/ # Tool unit tests
│ └── evals/ # End-to-end evaluations
└── skills/ # Agent skills (Anthropic pattern)
├── skill-name/
│ ├── instructions.md
│ ├── scripts/
│ └── resources/
// Gather context before acting
async function explore(task: Task): Promise<Context> {
const relevantFiles = await agent.searchCodebase(task.query);
const existingPatterns = await agent.analyzePatterns(relevantFiles);
const dependencies = await agent.identifyDependencies(task);
return { relevantFiles, existingPatterns, dependencies };
}
// Plan explicitly before execution
async function plan(task: Task, context: Context): Promise<Plan> {
const prompt = `
Task: ${task.description}
Context: ${JSON.stringify(context)}
Create a step-by-step plan. For each step:
1. What action to take
2. What tools to use
3. How to verify success
4. What could go wrong
Output JSON with steps array.
`;
return await llmCall({ prompt, schema: PlanSchema });
}
// Execute with verification at each step
async function execute(plan: Plan): Promise<Result[]> {
const results: Result[] = [];
for (const step of plan.steps) {
// Execute single step
const result = await executeStep(step);
// Verify before continuing
if (!await verify(step, result)) {
// Self-correct or escalate
const corrected = await selfCorrect(step, result);
if (!corrected.success) {
return handleFailure(step, results);
}
}
results.push(result);
}
return results;
}
// Independent verification prevents overfitting
async function verify(step: Step, result: Result): Promise<boolean> {
// Run tests if available
if (step.testCommand) {
const testResult = await runCommand(step.testCommand);
if (!testResult.success) return false;
}
// Use LLM to verify against criteria
const verification = await llmCall({
prompt: `
Step: ${step.description}
Expected: ${step.successCriteria}
Actual: ${JSON.stringify(result)}
Does the result satisfy the success criteria?
Respond with { "passes": boolean, "reasoning": string }
`,
schema: VerificationSchema
});
return verification.passes;
}
// tools/definitions/file-operations.ts
import { z } from 'zod';
export const ReadFileTool = {
name: 'read_file',
description: 'Read contents of a file. Use before modifying any file.',
parameters: z.object({
path: z.string().describe('Absolute path to the file'),
startLine: z.number().optional().describe('Start line (1-indexed)'),
endLine: z.number().optional().describe('End line (1-indexed)'),
}),
// Risk level for guardrails (OpenAI pattern)
riskLevel: 'low' as const,
};
export const WriteFileTool = {
name: 'write_file',
description: 'Write content to a file. Always read first to understand context.',
parameters: z.object({
path: z.string().describe('Absolute path to the file'),
content: z.string().describe('Complete file content'),
}),
riskLevel: 'medium' as const,
// Require confirmation for high-risk operations
requiresConfirmation: true,
};
// tools/implementations/file-operations.ts
export async function readFile(
params: z.infer<typeof ReadFileTool.parameters>
): Promise<ToolResult> {
try {
const content = await fs.readFile(params.path, 'utf-8');
const lines = content.split('\n');
const start = (params.startLine ?? 1) - 1;
const end = params.endLine ?? lines.length;
return {
success: true,
data: lines.slice(start, end).join('\n'),
metadata: { totalLines: lines.length }
};
} catch (error) {
return {
success: false,
error: `Failed to read file: ${error.message}`
};
}
}
// Use platform-provided tools when available
const agent = createAgent({
tools: [
// Built-in tools (handled by platform)
{ type: 'web_search' },
{ type: 'code_interpreter' },
// Custom tools only when needed
{ type: 'function', function: customDatabaseTool },
],
});
Use one agent for most tasks. Multiple agents add complexity.
// Expose specialized agents as callable tools
const researchAgent = createAgent({
name: 'researcher',
instructions: 'You research topics and return structured findings.',
tools: [webSearchTool, documentReadTool],
});
const mainAgent = createAgent({
tools: [
{
type: 'function',
function: {
name: 'research_topic',
description: 'Delegate research to specialized agent',
parameters: ResearchQuerySchema,
handler: async (query) => researchAgent.run(query),
},
},
],
});
// One-way transfer between agents
const customerServiceAgent = createAgent({
tools: [
// Handoff to specialist when needed
{
name: 'transfer_to_billing',
description: 'Transfer to billing specialist for payment issues',
handler: async (context) => {
return { handoff: 'billing_agent', context };
},
},
],
});
// memory/conversation.ts
interface ConversationMemory {
messages: Message[];
maxTokens: number;
add(message: Message): void;
getContext(): Message[];
summarize(): Promise<string>;
}
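One way the ConversationMemory interface above might be realized is a sliding window that evicts the oldest messages once a token budget is exceeded. This is a minimal sketch under stated assumptions: the word-count heuristic stands in for a real tokenizer, and the class name is hypothetical.

```typescript
// Naive sliding-window memory: keeps recent messages within a token budget.
interface Message { role: "user" | "assistant"; content: string }

class SlidingWindowMemory {
  private messages: Message[] = [];
  constructor(private maxTokens: number) {}

  // Crude stand-in for a real tokenizer: count whitespace-separated words
  private estimateTokens(m: Message): number {
    return m.content.split(/\s+/).filter(Boolean).length;
  }

  add(message: Message): void {
    this.messages.push(message);
    // Evict oldest messages until the window fits the budget (keep at least one)
    while (this.totalTokens() > this.maxTokens && this.messages.length > 1) {
      this.messages.shift();
    }
  }

  totalTokens(): number {
    return this.messages.reduce((sum, m) => sum + this.estimateTokens(m), 0);
  }

  getContext(): Message[] {
    return [...this.messages];
  }
}
```

A production version would use the model's tokenizer and implement `summarize()` to compress evicted history instead of discarding it.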
// Maintain state across tool calls (Gemini pattern)
interface AgentState {
thoughtSignature?: string; // Encrypted reasoning state
conversationId: string; // For shared memory
currentPlan?: Plan;
completedSteps: Step[];
}
// memory/persistent.ts
interface PersistentMemory {
// Store learnings across sessions
store(key: string, value: any): Promise<void>;
retrieve(key: string): Promise<any>;
// Semantic search over past interactions
search(query: string, limit: number): Promise<Memory[]>;
}
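For illustration, the PersistentMemory interface above can be stubbed with a Map-backed store; the case-insensitive substring `search` is only a placeholder for real embedding-based semantic retrieval, and the class name is hypothetical.

```typescript
// Map-backed stand-in for PersistentMemory (no real semantic search).
class NaivePersistentMemory {
  private entries = new Map<string, string>();

  async store(key: string, value: string): Promise<void> {
    this.entries.set(key, value);
  }

  async retrieve(key: string): Promise<string | undefined> {
    return this.entries.get(key);
  }

  // Placeholder "semantic" search: case-insensitive substring match
  async search(query: string, limit: number): Promise<string[]> {
    const q = query.toLowerCase();
    return [...this.entries.values()]
      .filter((v) => v.toLowerCase().includes(q))
      .slice(0, limit);
  }
}
```

Swapping the Map for a vector store (and substring match for embedding similarity) upgrades this to the long-term memory the interface intends.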
// guards/index.ts
interface GuardrailConfig {
// Input validation
inputClassifier: (input: string) => Promise<SafetyResult>;
// Output validation
outputValidator: (output: string) => Promise<SafetyResult>;
// Tool risk assessment
toolRiskLevels: Record<string, 'low' | 'medium' | 'high'>;
// Actions requiring human approval
humanInTheLoop: string[];
}
async function executeWithGuardrails(
agent: Agent,
input: string,
config: GuardrailConfig
): Promise<Result> {
// 1. Check input safety
const inputCheck = await config.inputClassifier(input);
if (!inputCheck.safe) {
return { blocked: true, reason: inputCheck.reason };
}
// 2. Execute with tool monitoring
const result = await agent.run(input, {
beforeTool: async (tool, params) => {
const risk = config.toolRiskLevels[tool.name];
if (risk === 'high' || config.humanInTheLoop.includes(tool.name)) {
return await requestHumanApproval(tool, params);
}
return { approved: true };
},
});
// 3. Validate output
const outputCheck = await config.outputValidator(result.output);
if (!outputCheck.safe) {
return { blocked: true, reason: outputCheck.reason };
}
return result;
}
// Agent must stay within defined scope
const agentInstructions = `
You are a customer service agent for Acme Corp.
SCOPE BOUNDARIES (non-negotiable):
- Only answer questions about Acme products and services
- Never provide legal, medical, or financial advice
- Never access or modify data outside your authorized scope
- If a request is out of scope, politely decline and explain why
If you cannot complete a task within scope, notify the user
and request explicit approval before proceeding.
`;
| Task Complexity | Recommended Model | Notes |
|---|---|---|
| Simple, fast | gpt-5-mini, claude-haiku | Low latency |
| General purpose | gpt-4.1, claude-sonnet | Balanced |
| Complex reasoning | o4-mini, claude-opus | Higher accuracy |
| Deep planning | gpt-5 + reasoning, ultrathink | Maximum capability |
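The routing table can be encoded directly as a lookup. A sketch, where the model IDs are the table's own examples rather than an endorsement of specific versions:

```typescript
// Route a task to a model tier by declared complexity (tiers from the table above).
type Complexity = "simple" | "general" | "complex" | "deep";

const MODEL_BY_COMPLEXITY: Record<Complexity, string> = {
  simple: "claude-haiku",   // low latency
  general: "claude-sonnet", // balanced
  complex: "claude-opus",   // higher accuracy
  deep: "gpt-5",            // maximum capability
};

function pickModel(complexity: Complexity): string {
  return MODEL_BY_COMPLEXITY[complexity];
}
```

Centralizing the mapping means upgrading a tier is a one-line change rather than a hunt through call sites.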
// Use thinking_level for reasoning depth
const response = await gemini.generate({
model: 'gemini-3',
thinking_level: 'high', // For complex planning
temperature: 1.0, // Optimized for reasoning engine
});
// Preserve thought state across tool calls
const nextResponse = await gemini.generate({
thoughtSignature: response.thoughtSignature, // Required for function calling
// ... rest of params
});
// Trigger extended thinking with keywords
const thinkingLevels = {
'think': 'standard analysis',
'think hard': 'deeper reasoning',
'think harder': 'extensive analysis',
'ultrathink': 'maximum reasoning budget',
};
const prompt = `
Think hard about this problem before proposing a solution.
Task: ${task.description}
`;
describe('readFile tool', () => {
it('reads file content correctly', async () => {
const result = await readFile({ path: '/test/file.txt' });
expect(result.success).toBe(true);
expect(result.data).toContain('expected content');
});
});
describe('agent planning', () => {
it('creates plan before executing file modifications', async () => {
const trace = await agent.runWithTrace('Refactor the auth module');
// Verify planning happened first
const firstToolCall = trace.toolCalls[0];
expect(firstToolCall.name).toBe('read_file');
// Verify no writes without reads
const writeIndex = trace.toolCalls.findIndex(t => t.name === 'write_file');
const readIndex = trace.toolCalls.findIndex(t => t.name === 'read_file');
expect(readIndex).toBeLessThan(writeIndex);
});
});
// Run nightly, not in regular CI
describe('Agent Accuracy (Eval)', () => {
const testCases = loadTestCases('./evals/coding-tasks.json');
it.each(testCases)('completes $name correctly', async (testCase) => {
const result = await agent.run(testCase.input);
// Verify against expected outcomes
expect(result.filesModified).toEqual(testCase.expectedFiles);
expect(await runTests(testCase.testCommand)).toBe(true);
}, 120000);
});
project/
├── src/
│ ├── agents/
│ │ ├── __init__.py
│ │ ├── researcher.py # Research agent
│ │ ├── coder.py # Coding agent
│ │ └── orchestrator.py # Main coordinator
│ ├── tools/
│ │ ├── __init__.py
│ │ ├── web.py # Web search tools
│ │ ├── files.py # File operations
│ │ └── database.py # DB queries
│ ├── models/
│ │ ├── __init__.py
│ │ └── schemas.py # Pydantic models
│ └── deps.py # Dependencies
├── tests/
│ ├── test_agents.py
│ └── test_tools.py
└── pyproject.toml
from pydantic_ai import Agent, RunContext
from pydantic import BaseModel
from httpx import AsyncClient
class SearchResult(BaseModel):
title: str
url: str
snippet: str
class ResearchDeps(BaseModel):
http_client: AsyncClient
api_key: str
research_agent = Agent(
'claude-sonnet-4-20250514',
deps_type=ResearchDeps,
result_type=list[SearchResult],
system_prompt='You are a research assistant. Use tools to find information.',
)
@research_agent.tool
async def web_search(ctx: RunContext[ResearchDeps], query: str) -> list[dict]:
"""Search the web for information."""
response = await ctx.deps.http_client.get(
'https://api.search.com/search',
params={'q': query},
headers={'Authorization': f'Bearer {ctx.deps.api_key}'},
)
return response.json()['results']
@research_agent.tool
async def read_webpage(ctx: RunContext[ResearchDeps], url: str) -> str:
"""Read and extract content from a webpage."""
response = await ctx.deps.http_client.get(url)
return response.text[:5000] # Truncate for context
# Usage
async def main():
async with AsyncClient() as client:
deps = ResearchDeps(http_client=client, api_key='...')
result = await research_agent.run(
'Find recent articles about LLM agents',
deps=deps,
)
for item in result.data:
print(f"- {item.title}")
from pydantic import BaseModel, Field
from pydantic_ai import Agent
class CodeReview(BaseModel):
summary: str = Field(description="Brief summary of the review")
issues: list[str] = Field(description="List of issues found")
suggestions: list[str] = Field(description="Improvement suggestions")
approval: bool = Field(description="Whether code is approved")
confidence: float = Field(ge=0, le=1, description="Confidence score")
review_agent = Agent(
'claude-sonnet-4-20250514',
result_type=CodeReview,
system_prompt='Review code for quality, security, and best practices.',
)
# Result is validated Pydantic model
result = await review_agent.run(f"Review this code:\n```python\n{code}\n```")
if result.data.approval:
print("Code approved!")
else:
for issue in result.data.issues:
print(f"Issue: {issue}")
from pydantic_ai import Agent
# Specialized agents
planner = Agent('claude-sonnet-4-20250514', system_prompt='Create detailed plans.')
executor = Agent('claude-sonnet-4-20250514', system_prompt='Execute tasks precisely.')
reviewer = Agent('claude-sonnet-4-20250514', system_prompt='Review and verify work.')
async def orchestrate(task: str):
# 1. Plan
plan = await planner.run(f"Create a plan for: {task}")
# 2. Execute each step
results = []
for step in plan.data.steps:
result = await executor.run(f"Execute: {step}")
results.append(result.data)
# 3. Review
review = await reviewer.run(
f"Review the results:\nTask: {task}\nResults: {results}"
)
return review.data
from pydantic_ai import Agent
agent = Agent('claude-sonnet-4-20250514')
async def stream_response(prompt: str):
async with agent.run_stream(prompt) as response:
async for chunk in response.stream():
print(chunk, end='', flush=True)
# Get final structured result
result = await response.get_data()
return result
import pytest
from pydantic_ai import Agent
from pydantic_ai.models.test import TestModel
@pytest.fixture
def test_agent():
return Agent(
TestModel(), # Mock model for testing
result_type=str,
)
async def test_agent_response(test_agent):
result = await test_agent.run('Test prompt')
assert result.data is not None
# Test with specific responses
async def test_with_mock_response():
model = TestModel()
model.seed_response('Expected output')
agent = Agent(model)
result = await agent.run('Any prompt')
assert result.data == 'Expected output'
skills/
└── code-review/
├── instructions.md # How to perform code reviews
├── scripts/
│ └── run-linters.sh # Supporting scripts
└── resources/
└── checklist.md # Review checklist
# Code Review Skill
## When to Use
Activate this skill when asked to review code, PRs, or diffs.
## Process
1. Read the changed files completely
2. Run linters: `./scripts/run-linters.sh`
3. Check against resources/checklist.md
4. Provide structured feedback
## Output Format
- Summary (1-2 sentences)
- Issues found (severity: critical/major/minor)
- Suggestions for improvement
- Approval recommendation
async function loadSkill(skillName: string): Promise<Skill> {
const skillPath = `./skills/${skillName}`;
const instructions = await fs.readFile(`${skillPath}/instructions.md`, 'utf-8');
const scripts = await glob(`${skillPath}/scripts/*`);
const resources = await glob(`${skillPath}/resources/*`);
return {
name: skillName,
instructions,
scripts: scripts.map(s => ({ name: path.basename(s), path: s })),
resources: await Promise.all(resources.map(loadResource)),
};
}
"think" → Standard analysis
"think hard" → Deeper reasoning
"think harder" → Extensive analysis
"ultrathink" → Maximum reasoning
thinking_level: "high" | "low"
temperature: 1.0 (keep at 1.0 for reasoning)
thoughtSignature: <pass back for function calling>
Weekly Installs: 132
GitHub Stars: 530
First Seen: Jan 20, 2026
Security Audits: Gen Agent Trust Hub: Pass | Socket: Warn | Snyk: Warn
Installed on: opencode (106), codex (101), gemini-cli (99), claude-code (96), cursor (88), github-copilot (80)