Agent Architecture Analysis Tool: 12-Factor Compliance Checks and Code Review for AI Agents

agent-architecture-analysis by existential-birds/beagle

83 weekly installs

45 GitHub Stars

Install command

npx skills add https://github.com/existential-birds/beagle --skill agent-architecture-analysis

Tags: AI/Machine Learning, Automation, Code Quality

12-Factor Agents Compliance Analysis

Reference: 12-Factor Agents

Input Parameters

Parameter	Description	Required
`docs_path`	Path to documentation directory (for existing analyses)	Optional
`codebase_path`	Root path of the codebase to analyze	Required

Analysis Framework

Factor 1: Natural Language to Tool Calls

Principle: Convert natural language inputs into structured, deterministic tool calls using schema-validated outputs.

Search Patterns:

# Look for Pydantic schemas
grep -r "class.*BaseModel" --include="*.py"
grep -r "TaskDAG\|TaskResponse\|ToolCall" --include="*.py"

# Look for JSON schema generation
grep -r "model_json_schema\|json_schema" --include="*.py"

# Look for structured output generation
grep -r "output_type\|response_model" --include="*.py"

File Patterns: **/agents/*.py, **/schemas/*.py, **/models/*.py

Compliance Criteria:

Level	Criteria
Strong	All LLM outputs use Pydantic/dataclass schemas with validators
Partial	Some outputs typed, but dict returns or unvalidated strings exist
Weak	LLM returns raw strings parsed manually or with regex

Anti-patterns:

- `json.loads(llm_response)` without schema validation
- `output.split()` or regex parsing of LLM responses
- `dict[str, Any]` return types from agents
- No validation between LLM output and handler execution
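As an illustration of the "Strong" level, here is a minimal sketch that validates LLM output against a dataclass schema before any handler sees it. The `ToolCall` fields, the `ALLOWED_TOOLS` names, and `parse_tool_call` are hypothetical, chosen only for this example; a Pydantic model would serve the same purpose.

```python
import json
from dataclasses import dataclass

ALLOWED_TOOLS = {"search_code", "run_tests"}  # hypothetical tool names


@dataclass(frozen=True)
class ToolCall:
    """Schema for one tool call emitted by the LLM (hypothetical fields)."""
    tool: str
    arguments: dict

    def __post_init__(self):
        # Validate at construction time so handlers never see bad data.
        if not isinstance(self.tool, str) or self.tool not in ALLOWED_TOOLS:
            raise ValueError(f"unknown tool: {self.tool!r}")
        if not isinstance(self.arguments, dict):
            raise ValueError("arguments must be a JSON object")


def parse_tool_call(llm_response: str) -> ToolCall:
    """Parse raw LLM text into a validated ToolCall or raise ValueError."""
    data = json.loads(llm_response)  # JSONDecodeError subclasses ValueError
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    try:
        return ToolCall(tool=data["tool"], arguments=data["arguments"])
    except KeyError as exc:
        raise ValueError(f"missing field: {exc}") from exc
```

Every failure mode surfaces as `ValueError`, so the caller has a single place to decide whether to retry or feed the error back to the model.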

Factor 2: Own Your Prompts

Principle: Treat prompts as first-class code you control, version, and iterate on.

Search Patterns:

# Look for embedded prompts
grep -r "SYSTEM_PROMPT\|system_prompt" --include="*.py"
grep -r '""".*You are' --include="*.py"

# Look for template systems
grep -r "jinja\|Jinja\|render_template" --include="*.py"
find . -name "*.jinja2" -o -name "*.j2"

# Look for prompt directories
find . -type d -name "prompts"

File Patterns: **/prompts/**, **/templates/**, **/agents/*.py

Compliance Criteria:

Level	Criteria
Strong	Prompts in separate files, templated (Jinja2), versioned
Partial	Prompts as module constants, some parameterization
Weak	Prompts hardcoded inline in functions, f-strings only

Anti-patterns:

- `f"You are a {role}..."` inline in agent methods
- Prompts mixed with business logic
- No way to iterate on prompts without code changes
- No prompt versioning or A/B testing capability
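A minimal sketch of prompts kept in separate files and rendered through a template. It uses the standard library's `string.Template` as a stand-in for Jinja2; the `render_prompt` helper and the `<name>.txt` file layout are assumptions made for this example.

```python
from pathlib import Path
from string import Template


def render_prompt(prompts_dir: Path, name: str, **params: str) -> str:
    """Load prompts_dir/<name>.txt and fill in its $placeholders."""
    text = (prompts_dir / f"{name}.txt").read_text(encoding="utf-8")
    # substitute() raises KeyError on a missing parameter instead of
    # silently sending an incomplete prompt to the model.
    return Template(text).substitute(**params)
```

Because the prompt text lives in a file under version control, it can be diffed, reviewed, and rolled back independently of the agent code.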

Factor 3: Own Your Context Window

Principle: Control how history, state, and tool results are formatted for the LLM.

Search Patterns:

# Look for context/message management
grep -r "AgentMessage\|ChatMessage\|messages" --include="*.py"
grep -r "context_window\|context_compiler" --include="*.py"

# Look for custom serialization
grep -r "to_xml\|to_context\|serialize" --include="*.py"

# Look for token management
grep -r "token_count\|max_tokens\|truncate" --include="*.py"

File Patterns: **/context/*.py, **/state/*.py, **/core/*.py

Compliance Criteria:

Level	Criteria
Strong	Custom context format, token optimization, typed events, compaction
Partial	Basic message history with some structure
Weak	Raw message accumulation, standard OpenAI format only

Anti-patterns:

- Unbounded message accumulation
- Large artifacts embedded inline (diffs, files)
- No agent-specific context filtering
- Same context for all agent types
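One way to picture typed events plus a token budget is the sketch below. The `Event` type, the XML-like line format, and the four-characters-per-token estimate are all assumptions for illustration; a real system would use the model's tokenizer and likely summarize dropped history rather than discard it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    kind: str     # e.g. "user", "tool_call", "tool_result", "error"
    content: str


def estimate_tokens(text: str) -> int:
    """Crude stand-in for a real tokenizer: roughly 4 characters per token."""
    return max(1, len(text) // 4)


def build_context(events: list[Event], max_tokens: int) -> str:
    """Serialize the newest events that fit in max_tokens, oldest first."""
    kept: list[str] = []
    used = 0
    for event in reversed(events):  # walk from newest to oldest
        line = f"<{event.kind}>{event.content}</{event.kind}>"
        cost = estimate_tokens(line)
        if used + cost > max_tokens:
            break  # budget exhausted: drop this and all older events
        kept.append(line)
        used += cost
    return "\n".join(reversed(kept))
```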

Factor 4: Tools Are Structured Outputs

Principle: Tools produce schema-validated JSON that triggers deterministic code, not magic function calls.

Search Patterns:

# Look for tool/response schemas
grep -r "class.*Response.*BaseModel" --include="*.py"
grep -r "ToolResult\|ToolOutput" --include="*.py"

# Look for deterministic handlers
grep -r "def handle_\|def execute_" --include="*.py"

# Look for validation layer
grep -r "model_validate\|parse_obj" --include="*.py"

File Patterns: **/tools/*.py, **/handlers/*.py, **/agents/*.py

Compliance Criteria:

Level	Criteria
Strong	All tool outputs schema-validated, handlers type-safe
Partial	Most tools typed, some loose dict returns
Weak	Tools return arbitrary dicts, no validation layer

Anti-patterns:

- Tool handlers that directly execute LLM output
- `eval()` or `exec()` on LLM-generated code
- No separation between decision (LLM) and execution (code)
- Magic method dispatch based on string matching
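The separation between decision and execution can be sketched with an explicit handler registry. `ToolResult`, `handle_add`, and `HANDLERS` are hypothetical names for this example; the point is that the LLM only selects a registered tool and supplies arguments, while ordinary code does the work.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ToolResult:
    ok: bool
    output: str


def handle_add(args: dict) -> ToolResult:
    # Deterministic code path: plain arithmetic, no eval() of LLM text.
    return ToolResult(ok=True, output=str(int(args["a"]) + int(args["b"])))


# Explicit allowlist of tools; anything not registered here is rejected.
HANDLERS: dict[str, Callable[[dict], ToolResult]] = {"add": handle_add}


def execute(tool: str, args: dict) -> ToolResult:
    """Run the registered handler for `tool`, returning a typed result."""
    handler = HANDLERS.get(tool)
    if handler is None:
        return ToolResult(ok=False, output=f"unknown tool: {tool}")
    try:
        return handler(args)
    except (KeyError, TypeError, ValueError) as exc:
        return ToolResult(ok=False, output=f"invalid arguments: {exc}")
```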

Factor 5: Unify Execution State

Principle: Merge execution state (step, retries) with business state (messages, results).

Search Patterns:

# Look for state models
grep -r "ExecutionState\|WorkflowState\|Thread" --include="*.py"

# Look for dual state systems
grep -r "checkpoint\|MemorySaver" --include="*.py"
grep -r "sqlite\|database\|repository" --include="*.py"

# Look for state reconstruction
grep -r "load_state\|restore\|reconstruct" --include="*.py"

File Patterns: **/state/*.py, **/models/*.py, **/database/*.py

Compliance Criteria:

Level	Criteria
Strong	Single serializable state object with all execution metadata
Partial	State exists but split across systems (memory + DB)
Weak	Execution state scattered, requires multiple queries to reconstruct

Anti-patterns:

- Retry count stored separately from task state
- Error history in logs but not in state
- LangGraph checkpoints + separate database storage
- No unified event thread
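A minimal sketch of a single serializable state object, assuming a hypothetical `ThreadState` that carries both execution metadata (step, retries) and business data (messages, errors):

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class ThreadState:
    step: int = 0                                  # execution metadata
    retries: int = 0                               # execution metadata
    messages: list = field(default_factory=list)   # business state
    errors: list = field(default_factory=list)     # error history lives here too

    def to_json(self) -> str:
        """Serialize the whole thread so one write persists everything."""
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, raw: str) -> "ThreadState":
        """Rebuild the full state from one stored document."""
        return cls(**json.loads(raw))
```

With this shape, reconstructing a run takes one read instead of joining checkpoints, logs, and database rows.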

Factor 6: Launch/Pause/Resume

Principle: Agents support simple APIs for launching, pausing at any point, and resuming.

Search Patterns:

# Look for REST endpoints
grep -r "@router.post\|@app.post" --include="*.py"
grep -r "start_workflow\|pause\|resume" --include="*.py"

# Look for interrupt mechanisms
grep -r "interrupt_before\|interrupt_after" --include="*.py"

# Look for webhook handlers
grep -r "webhook\|callback" --include="*.py"

File Patterns: **/routes/*.py, **/api/*.py, **/orchestrator/*.py

Compliance Criteria:

Level	Criteria
Strong	REST API + webhook resume, pause at any point including mid-tool
Partial	Launch/pause/resume exists but only at coarse-grained points
Weak	CLI-only launch, no pause/resume capability

Anti-patterns:

- Blocking `input()` or `confirm()` calls
- No way to resume after process restart
- Approval only at plan level, not per-tool
- No webhook-based resume from external systems
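The pause and resume idea can be sketched without any framework. The `Run` record, the `advance` and `resume` functions, and the `approved:` log convention are assumptions for this example; in practice `resume` would be called from a REST or webhook handler after loading the persisted run. The run is updated in place here to keep the sketch short.

```python
from dataclasses import dataclass, field


@dataclass
class Run:
    steps: list               # ordered step names
    cursor: int = 0           # index of the next step to execute
    status: str = "running"
    log: list = field(default_factory=list)


def advance(run: Run, needs_approval: set) -> Run:
    """Execute steps until done or until a step that needs approval."""
    run.status = "running"
    while run.cursor < len(run.steps):
        step = run.steps[run.cursor]
        if step in needs_approval and f"approved:{step}" not in run.log:
            run.status = "paused"  # caller persists `run` and returns
            return run
        run.log.append(f"done:{step}")
        run.cursor += 1
    run.status = "finished"
    return run


def resume(run: Run, needs_approval: set) -> Run:
    """Record approval of the pending step, then continue the run."""
    run.log.append(f"approved:{run.steps[run.cursor]}")
    return advance(run, needs_approval)
```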

Factor 7: Contact Humans with Tools

Principle: Human contact is a tool call with question, options, and urgency.

Search Patterns:

# Look for human input mechanisms
grep -r "typer.confirm\|input(\|prompt(" --include="*.py"
grep -r "request_human_input\|human_contact" --include="*.py"

# Look for approval patterns
grep -r "approval\|approve\|reject" --include="*.py"

# Look for structured question formats
grep -r "question.*options\|HumanInputRequest" --include="*.py"

File Patterns: **/agents/*.py, **/tools/*.py, **/orchestrator/*.py

Compliance Criteria:

Level	Criteria
Strong	`request_human_input` tool with question/options/urgency/format
Partial	Approval gates exist but hardcoded in graph structure
Weak	Blocking CLI prompts, no tool-based human contact

Anti-patterns:

- `typer.confirm()` in agent code
- Human contact hardcoded at specific graph nodes
- No way for agents to ask clarifying questions
- Single response format (yes/no only)
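A sketch of what a structured `request_human_input` payload might look like. The field names follow the question/options/urgency/format list in the criteria above, but the concrete urgency levels and format values are assumptions for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HumanInputRequest:
    question: str
    options: tuple = ()
    urgency: str = "normal"          # assumed levels: low / normal / high
    response_format: str = "choice"  # assumed values: "choice" or "free_text"

    def __post_init__(self):
        if self.urgency not in {"low", "normal", "high"}:
            raise ValueError(f"unknown urgency: {self.urgency!r}")
        if self.response_format not in {"choice", "free_text"}:
            raise ValueError(f"unknown format: {self.response_format!r}")
        if self.response_format == "choice" and len(self.options) < 2:
            raise ValueError("choice requests need at least two options")


def validate_answer(request: HumanInputRequest, answer: str) -> str:
    """Accept the human's answer only if it fits the requested format."""
    if request.response_format == "choice" and answer not in request.options:
        raise ValueError(f"answer must be one of {request.options}")
    return answer
```

Since the request is plain data, it can be delivered over any channel (chat, email, a web form) and answered asynchronously.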

Factor 8: Own Your Control Flow

Principle: Custom control flow, not framework defaults. Full control over routing, retries, compaction.

Search Patterns:

# Look for routing logic
grep -r "add_conditional_edges\|route_\|should_continue" --include="*.py"

# Look for custom loops
grep -r "while True\|for.*in.*range" --include="*.py" | grep -v test

# Look for execution mode control
grep -r "execution_mode\|agentic\|structured" --include="*.py"

File Patterns: **/orchestrator/*.py, **/graph/*.py, **/core/*.py

Compliance Criteria:

Level	Criteria
Strong	Custom routing functions, conditional edges, execution mode control
Partial	Framework control flow with some customization
Weak	Default framework loop with no custom routing

Anti-patterns:

- Single path through graph with no branching
- No distinction between tool types (all treated same)
- Framework-default error handling only
- No rate limiting or resource management
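A custom routing function might look like the sketch below. The state keys, node names, and the set of high-risk tools are hypothetical; the point is that the next step is chosen by code you own and can distinguish between tool types.

```python
HIGH_RISK_TOOLS = {"deploy", "delete_branch"}  # hypothetical tool names


def route(state: dict) -> str:
    """Pick the next node from explicit state, not a framework default."""
    if state.get("consecutive_errors", 0) >= 3:
        return "escalate_to_human"
    pending = state.get("pending_tool")
    if pending in HIGH_RISK_TOOLS:
        return "await_approval"   # high-risk tools get a separate path
    if pending:
        return "run_tool"
    if state.get("done"):
        return "finish"
    return "call_llm"
```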

Factor 9: Compact Errors into Context

Principle: Errors in context enable self-healing. Track consecutive errors, escalate after threshold.

Search Patterns:

# Look for error handling
grep -r "except.*Exception\|error_history\|consecutive_errors" --include="*.py"

# Look for retry logic
grep -r "retry\|backoff\|max_attempts" --include="*.py"

# Look for escalation
grep -r "escalate\|human_escalation" --include="*.py"

File Patterns: **/agents/*.py, **/orchestrator/*.py, **/core/*.py

Compliance Criteria:

Level	Criteria
Strong	Errors in context, retry with threshold, automatic escalation
Partial	Errors logged and returned, no automatic retry loop
Weak	Errors logged only, not fed back to LLM, task fails immediately

Anti-patterns:

- `logger.error()` without adding to context
- No retry mechanism (fail immediately)
- No consecutive error tracking
- No escalation to humans after repeated failures
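The retry-with-threshold loop can be sketched as follows. `run_with_error_context` and the event dictionaries are assumptions for this example, and `call_tool` stands in for "ask the LLM for the next action given the context, then execute it".

```python
def run_with_error_context(call_tool, context: list,
                           max_consecutive_errors: int = 3) -> str:
    """Retry until success; escalate after too many consecutive errors."""
    consecutive = 0
    while consecutive < max_consecutive_errors:
        try:
            result = call_tool(context)
        except Exception as exc:
            consecutive += 1
            # Put the error in the context so the next attempt can see it.
            context.append({"type": "error",
                            "content": f"{type(exc).__name__}: {exc}"})
            continue
        context.append({"type": "tool_result", "content": result})
        return "ok"
    context.append({"type": "escalation",
                    "content": "too many consecutive errors"})
    return "escalated"
```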

Factor 10: Small, Focused Agents

Principle: Each agent has narrow responsibility, 3-10 steps max.

Search Patterns:

# Look for agent classes
grep -r "class.*Agent\|class.*Architect\|class.*Developer" --include="*.py"

# Look for step definitions
grep -r "steps\|tasks" --include="*.py" | head -20

# Count function definitions per agent file (prints file:count)
grep -c "def " agents/*.py 2>/dev/null

File Patterns: **/agents/*.py

Compliance Criteria:

Level	Criteria
Strong	3+ specialized agents, each with single responsibility, step limits
Partial	Multiple agents but some have broad scope
Weak	Single "god" agent that handles everything

Anti-patterns:

- Single agent with 20+ tools
- Agent with unbounded step count
- Mixed responsibilities (planning + execution + review)
- No step or time limits on agent execution
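A step limit can be enforced by the loop that drives the agent. In this sketch `run_agent` and `StepLimitExceeded` are hypothetical names, and `next_step` stands in for the agent's decision function, returning `None` when it considers the task complete.

```python
class StepLimitExceeded(RuntimeError):
    """Raised when an agent tries to exceed its step budget."""


def run_agent(next_step, max_steps: int = 10) -> list:
    """Drive an agent until it signals completion or hits max_steps."""
    history: list = []
    while True:
        action = next_step(history)
        if action is None:           # agent signals it is finished
            return history
        if len(history) >= max_steps:
            raise StepLimitExceeded(f"agent exceeded {max_steps} steps")
        history.append(action)
```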

Factor 11: Trigger from Anywhere

Principle: Workflows triggerable from CLI, REST, WebSocket, Slack, webhooks, etc.

Search Patterns:

# Look for entry points
grep -r "@cli.command\|@router.post\|@app.post" --include="*.py"

# Look for WebSocket support
grep -r "WebSocket\|websocket" --include="*.py"

# Look for external integrations
grep -r "slack\|discord\|webhook" --include="*.py" -i

File Patterns: **/routes/*.py, **/cli/*.py, **/main.py

Compliance Criteria:

Level	Criteria
Strong	CLI + REST + WebSocket + webhooks + chat integrations
Partial	CLI + REST API available
Weak	CLI only, no programmatic access

Anti-patterns:

- Only `if __name__ == "__main__"` entry point
- No REST API for external systems
- No event streaming for real-time updates
- Trigger logic tightly coupled to execution
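Decoupling triggers from execution can be sketched as thin adapters around one entry point. The function names and the payload shape are assumptions for this example; a real REST adapter would sit behind a web framework route.

```python
import argparse
import json


def start_workflow(task: str, source: str) -> dict:
    """Single entry point; every trigger funnels into this function."""
    return {"task": task, "source": source, "status": "queued"}


def from_cli(argv: list) -> dict:
    """CLI adapter: parse arguments, then delegate."""
    parser = argparse.ArgumentParser(prog="agent")
    parser.add_argument("task")
    args = parser.parse_args(argv)
    return start_workflow(args.task, source="cli")


def from_http(body: bytes) -> dict:
    """HTTP adapter: decode a JSON request body, then delegate."""
    payload = json.loads(body)
    return start_workflow(payload["task"], source="rest")
```

Adding a Slack or WebSocket trigger then means writing another small adapter, not touching the workflow itself.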

Factor 12: Stateless Reducer

Principle: Agents as pure functions: (state, input) -> (state, output). No side effects in agent logic.

Search Patterns:

# Look for state mutation patterns
grep -r "\.status = \|\.field = " --include="*.py"

# Look for immutable updates
grep -r "model_copy\|\.copy(\|with_" --include="*.py"

# Look for side effects in agents
grep -r "write_file\|subprocess\|requests\." agents/*.py 2>/dev/null

File Patterns: **/agents/*.py, **/nodes/*.py

Compliance Criteria:

Level	Criteria
Strong	Immutable state updates, side effects isolated to tools/handlers
Partial	Mostly immutable, some in-place mutations
Weak	State mutated in place, side effects mixed with agent logic

Anti-patterns:

- `state.field = new_value` (mutation)
- File writes inside agent methods
- HTTP calls inside agent decision logic
- Shared mutable state between agents
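The `(state, input) -> (state, output)` shape can be sketched with a frozen dataclass and `dataclasses.replace`. `AgentState` and `step` are hypothetical names for this example.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class AgentState:
    status: str = "pending"
    messages: tuple = ()


def step(state: AgentState, event: str) -> tuple:
    """Return (new_state, output) without touching the input state."""
    new_state = replace(
        state,
        status="running",
        messages=state.messages + (event,),
    )
    output = f"processed {len(new_state.messages)} event(s)"
    return new_state, output
```

Because the input state is never modified, the same call can be replayed in tests or after a crash with identical results.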

Factor 13: Pre-fetch Context

Principle: Fetch likely-needed data upfront rather than mid-workflow.

Search Patterns:

# Look for context pre-fetching
grep -r "pre_fetch\|prefetch\|fetch_context" --include="*.py"

# Look for RAG/embedding systems
grep -r "embedding\|vector\|semantic_search" --include="*.py"

# Look for related file discovery
grep -r "related_tests\|similar_\|find_relevant" --include="*.py"

File Patterns: **/context/*.py, **/retrieval/*.py, **/rag/*.py

Compliance Criteria:

Level	Criteria
Strong	Automatic pre-fetch of related tests, files, docs before planning
Partial	Manual context passing, design doc support
Weak	No pre-fetching, LLM must request all context via tools

Anti-patterns:

- Architect starts with issue only, no codebase context
- No semantic search for similar past work
- Related tests/files discovered only during execution
- No RAG or document retrieval system
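A very small form of pre-fetching is to look up related tests by naming convention before planning starts. The `test_<stem>.py` and `<stem>_test.py` conventions and both helper names are assumptions for this sketch; a fuller system might add semantic search.

```python
from pathlib import PurePosixPath


def related_tests(changed_file: str, all_files: list) -> list:
    """Return test files named test_<stem>.py or <stem>_test.py."""
    stem = PurePosixPath(changed_file).stem
    wanted = {f"test_{stem}.py", f"{stem}_test.py"}
    return sorted(f for f in all_files if PurePosixPath(f).name in wanted)


def prefetch_context(issue: str, changed_files: list, all_files: list) -> dict:
    """Bundle the issue with likely-relevant files before planning."""
    tests = sorted({t for f in changed_files
                    for t in related_tests(f, all_files)})
    return {"issue": issue, "files": list(changed_files), "tests": tests}
```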

Output Format

Executive Summary Table

| Factor | Status | Notes |
|--------|--------|-------|
| 1. Natural Language -> Tool Calls | **Strong/Partial/Weak** | [Key finding] |
| 2. Own Your Prompts | **Strong/Partial/Weak** | [Key finding] |
| ... | ... | ... |
| 13. Pre-fetch Context | **Strong/Partial/Weak** | [Key finding] |

**Overall**: X Strong, Y Partial, Z Weak
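The table and the overall tally above could be produced by a helper along these lines. `render_summary` and the `(name, level, note)` tuple shape are assumptions for this sketch.

```python
from collections import Counter

LEVELS = ("Strong", "Partial", "Weak")


def render_summary(findings: list) -> str:
    """findings: list of (factor_name, level, note) tuples in factor order."""
    lines = ["| Factor | Status | Notes |", "|--------|--------|-------|"]
    counts = Counter()
    for number, (name, level, note) in enumerate(findings, start=1):
        if level not in LEVELS:
            raise ValueError(f"unknown level: {level!r}")
        counts[level] += 1
        lines.append(f"| {number}. {name} | **{level}** | {note} |")
    overall = ", ".join(f"{counts[level]} {level}" for level in LEVELS)
    lines.append("")
    lines.append(f"**Overall**: {overall}")
    return "\n".join(lines)
```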

Per-Factor Analysis

For each factor, provide:

1. Current Implementation
   - Evidence with file:line references
   - Code snippets showing patterns
2. Compliance Level
   - Strong/Partial/Weak with justification
3. Gaps
   - What's missing vs. 12-Factor ideal
4. Recommendations
   - Actionable improvements with code examples

Analysis Workflow

1. Initial Scan
   - Run search patterns for all factors
   - Identify key files for each factor
   - Note any existing compliance documentation
2. Deep Dive (per factor)
   - Read identified files
   - Evaluate against compliance criteria
   - Document evidence with file paths
3. Gap Analysis
   - Compare current vs. 12-Factor ideal
   - Identify anti-patterns present
   - Prioritize by impact
4. Recommendations
   - Provide actionable improvements
   - Include before/after code examples
   - Reference roadmap if exists
5. Summary
   - Compile executive summary table
   - Highlight strengths and critical gaps
   - Suggest priority order for improvements
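The initial scan can also be scripted so that evidence comes out with file:line references. This sketch rewrites a few of the Factor 1 and Factor 2 grep patterns as Python regular expressions; the `scan` function and the `PATTERNS` keys are assumptions, and a full version would cover all thirteen factors.

```python
import re
from pathlib import Path

# A subset of the search patterns, expressed as Python regexes.
PATTERNS = {
    "factor_1": re.compile(r"class.*BaseModel|model_json_schema|response_model"),
    "factor_2": re.compile(r"SYSTEM_PROMPT|system_prompt"),
}


def scan(codebase_path: str) -> dict:
    """Return {factor: ["path:line: text", ...]} for every *.py file."""
    hits = {factor: [] for factor in PATTERNS}
    for path in sorted(Path(codebase_path).rglob("*.py")):
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            for factor, pattern in PATTERNS.items():
                if pattern.search(line):
                    hits[factor].append(f"{path}:{lineno}: {line.strip()}")
    return hits
```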

Quick Reference: Compliance Scoring

Score	Meaning	Action
Strong	Fully implements principle	Maintain, minor optimizations
Partial	Some implementation, significant gaps	Planned improvements
Weak	Minimal or no implementation	High priority for roadmap

When to Use This Skill

- Evaluating new LLM-powered systems
- Reviewing agent architecture decisions
- Auditing production agentic applications
- Planning improvements to existing agents
- Comparing frameworks or implementations

Repository: existential-birds/beagle

First Seen: Jan 20, 2026

Security Audits: Gen Agent Trust Hub (Pass), Socket (Pass), Snyk (Pass)

Installed on: gemini-cli (69), codex (69), opencode (69), claude-code (67), cursor (65), github-copilot (60)
