senior-prompt-engineer by alirezarezvani/claude-skills
npx skills add https://github.com/alirezarezvani/claude-skills --skill senior-prompt-engineer
Prompt engineering patterns, LLM evaluation frameworks, and agentic system design.
# Analyze and optimize a prompt file
python scripts/prompt_optimizer.py prompts/my_prompt.txt --analyze
# Evaluate RAG retrieval quality
python scripts/rag_evaluator.py --contexts contexts.json --questions questions.json
# Visualize agent workflow from definition
python scripts/agent_orchestrator.py agent_config.yaml --visualize
Analyzes prompts for token efficiency, clarity, and structure. Generates optimized versions.
Input: Prompt text file or string
Output: Analysis report with optimization suggestions
Usage:
# Analyze a prompt file
python scripts/prompt_optimizer.py prompt.txt --analyze
# Output:
# Token count: 847
# Estimated cost: $0.0025 (GPT-4)
# Clarity score: 72/100
# Issues found:
# - Ambiguous instruction at line 3
# - Missing output format specification
# - Redundant context (lines 12-15 repeat lines 5-8)
# Suggestions:
# 1. Add explicit output format: "Respond in JSON with keys: ..."
# 2. Remove redundant context to save 89 tokens
# 3. Clarify "analyze" -> "list the top 3 issues with severity ratings"
# Generate optimized version
python scripts/prompt_optimizer.py prompt.txt --optimize --output optimized.txt
# Count tokens for cost estimation
python scripts/prompt_optimizer.py prompt.txt --tokens --model gpt-4
# Extract and manage few-shot examples
python scripts/prompt_optimizer.py prompt.txt --extract-examples --output examples.json
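The --tokens estimate above can be approximated without any tooling; a minimal sketch using the common ~4-characters-per-token heuristic (prompt_optimizer.py presumably uses an exact tokenizer, and the price here is an assumed example rate, not a quoted one):

```python
# Rough approximation of the --tokens step; the real script presumably
# uses an exact tokenizer rather than this characters/4 heuristic.
def estimate_tokens(text: str) -> int:
    """Approximate token count (~4 characters per token for English text)."""
    return max(1, len(text) // 4)

def estimate_cost(tokens: int, price_per_1k: float = 0.03) -> float:
    """Estimated input cost in USD; price_per_1k is an assumed example rate."""
    return tokens / 1000 * price_per_1k
```

For exact counts, a tokenizer matched to the target model (e.g. tiktoken for GPT-4) is the usual choice.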
Evaluates Retrieval-Augmented Generation quality by measuring context relevance and answer faithfulness.
Input: Retrieved contexts (JSON) and questions/answers
Output: Evaluation metrics and quality report
Usage:
# Evaluate retrieval quality
python scripts/rag_evaluator.py --contexts retrieved.json --questions eval_set.json
# Output:
# === RAG Evaluation Report ===
# Questions evaluated: 50
#
# Retrieval Metrics:
# Context Relevance: 0.78 (target: >0.80)
# Retrieval Precision@5: 0.72
# Coverage: 0.85
#
# Generation Metrics:
# Answer Faithfulness: 0.91
# Groundedness: 0.88
#
# Issues Found:
# - 8 questions had no relevant context in top-5
# - 3 answers contained information not in context
#
# Recommendations:
# 1. Improve chunking strategy for technical documents
# 2. Add metadata filtering for date-sensitive queries
# Evaluate with custom metrics
python scripts/rag_evaluator.py --contexts retrieved.json --questions eval_set.json \
--metrics relevance,faithfulness,coverage
# Export detailed results
python scripts/rag_evaluator.py --contexts retrieved.json --questions eval_set.json \
--output report.json --verbose
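A metric like Retrieval Precision@5 in the report above reduces to a few lines once each question has relevance labels; an illustrative sketch (the evaluator's exact scoring may differ):

```python
def precision_at_k(retrieved_ids, relevant_ids, k=5):
    """Fraction of the top-k retrieved contexts that are relevant."""
    top_k = retrieved_ids[:k]
    if not top_k:
        return 0.0
    relevant = set(relevant_ids)
    hits = sum(1 for doc_id in top_k if doc_id in relevant)
    return hits / len(top_k)

def mean_precision_at_k(runs, k=5):
    """Average precision@k over (retrieved, relevant) pairs, one per question."""
    scores = [precision_at_k(ret, rel, k) for ret, rel in runs]
    return sum(scores) / len(scores) if scores else 0.0
```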
Parses agent definitions and visualizes execution flows. Validates tool configurations.
Input: Agent configuration (YAML/JSON)
Output: Workflow visualization, validation report
Usage:
# Validate agent configuration
python scripts/agent_orchestrator.py agent.yaml --validate
# Output:
# === Agent Validation Report ===
# Agent: research_assistant
# Pattern: ReAct
#
# Tools (4 registered):
# [OK] web_search - API key configured
# [OK] calculator - No config needed
# [WARN] file_reader - Missing allowed_paths
# [OK] summarizer - Prompt template valid
#
# Flow Analysis:
# Max depth: 5 iterations
# Estimated tokens/run: 2,400-4,800
# Potential infinite loop: No
#
# Recommendations:
# 1. Add allowed_paths to file_reader for security
# 2. Consider adding early exit condition for simple queries
# Visualize agent workflow (ASCII)
python scripts/agent_orchestrator.py agent.yaml --visualize
# Output:
# ┌─────────────────────────────────────────┐
# │ research_assistant │
# │ (ReAct Pattern) │
# └─────────────────┬───────────────────────┘
# │
# ┌────────▼────────┐
# │ User Query │
# └────────┬────────┘
# │
# ┌────────▼────────┐
# │ Think │◄──────┐
# └────────┬────────┘ │
# │ │
# ┌────────▼────────┐ │
# │ Select Tool │ │
# └────────┬────────┘ │
# │ │
# ┌─────────────┼─────────────┐ │
# ▼ ▼ ▼ │
# [web_search] [calculator] [file_reader]
# │ │ │ │
# └─────────────┼─────────────┘ │
# │ │
# ┌────────▼────────┐ │
# │ Observe │───────┘
# └────────┬────────┘
# │
# ┌────────▼────────┐
# │ Final Answer │
# └─────────────────┘
# Export workflow as Mermaid diagram
python scripts/agent_orchestrator.py agent.yaml --visualize --format mermaid
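The --validate checks above amount to a walk over the tool list; a minimal sketch, where field names such as tools, requires_api_key, and allowed_paths are assumptions inferred from the sample report, not the actual agent_orchestrator.py schema:

```python
# Minimal validation sketch over a parsed agent config dict.
# Field names are assumptions based on the sample report above.
def validate_agent(config: dict) -> list[str]:
    """Return a list of '[ERROR] ...' / '[WARN] ...' messages."""
    messages = []
    if "name" not in config:
        messages.append("[ERROR] missing agent name")
    for tool in config.get("tools", []):
        name = tool.get("name", "<unnamed>")
        if tool.get("requires_api_key") and not tool.get("api_key"):
            messages.append(f"[WARN] {name} - API key not configured")
        if name == "file_reader" and "allowed_paths" not in tool:
            messages.append(f"[WARN] {name} - Missing allowed_paths")
    return messages
```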
Use when improving an existing prompt's performance or reducing token costs.
Step 1: Baseline current prompt
python scripts/prompt_optimizer.py current_prompt.txt --analyze --output baseline.json
Step 2: Identify issues. Review the analysis report for problems such as ambiguous instructions, redundant context, and missing output format specifications.
Step 3: Apply optimization patterns
| Issue | Pattern to Apply |
|---|---|
| Ambiguous output | Add explicit format specification |
| Too verbose | Extract to few-shot examples |
| Inconsistent results | Add role/persona framing |
| Missing edge cases | Add constraint boundaries |
Step 4: Generate optimized version
python scripts/prompt_optimizer.py current_prompt.txt --optimize --output optimized.txt
Step 5: Compare results
python scripts/prompt_optimizer.py optimized.txt --analyze --compare baseline.json
# Shows: token reduction, clarity improvement, issues resolved
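The comparison in Step 5 amounts to diffing two analysis reports; a sketch assuming the JSON reports expose token_count and clarity_score fields (field names are illustrative):

```python
def compare_reports(baseline: dict, optimized: dict) -> dict:
    """Summarize token reduction and clarity change between two analyses."""
    before = baseline["token_count"]
    after = optimized["token_count"]
    return {
        "token_reduction": before - after,
        "token_reduction_pct": round(100 * (before - after) / before, 1),
        "clarity_delta": optimized["clarity_score"] - baseline["clarity_score"],
    }
```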
Step 6: Validate with test cases. Run both prompts against your evaluation set and compare outputs.
Use when creating examples for in-context learning.
Step 1: Define the task clearly
Task: Extract product entities from customer reviews
Input: Review text
Output: JSON with {product_name, sentiment, features_mentioned}
Step 2: Select diverse examples (3-5 recommended)
| Example Type | Purpose |
|---|---|
| Simple case | Shows basic pattern |
| Edge case | Handles ambiguity |
| Complex case | Multiple entities |
| Negative case | What NOT to extract |
Step 3: Format consistently
Example 1:
Input: "Love my new iPhone 15, the camera is amazing!"
Output: {"product_name": "iPhone 15", "sentiment": "positive", "features_mentioned": ["camera"]}
Example 2:
Input: "The laptop was okay but battery life is terrible."
Output: {"product_name": "laptop", "sentiment": "mixed", "features_mentioned": ["battery life"]}
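One way to keep the formatting consistent is to assemble the prompt programmatically from a list of examples; an illustrative sketch (the data structure is an assumption, not a format the scripts require):

```python
import json

def build_few_shot_prompt(task: str, examples: list[dict], query: str) -> str:
    """Render the task, consistently formatted examples, then the new query."""
    parts = [task, ""]
    for i, ex in enumerate(examples, 1):
        parts.append(f"Example {i}:")
        parts.append(f'Input: "{ex["input"]}"')
        parts.append(f"Output: {json.dumps(ex['output'])}")
        parts.append("")
    parts.append(f'Input: "{query}"')
    parts.append("Output:")
    return "\n".join(parts)
```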
Step 4: Validate example quality
python scripts/prompt_optimizer.py prompt_with_examples.txt --validate-examples
# Checks: consistency, coverage, format alignment
Step 5: Test with held-out cases. Ensure the model generalizes beyond your examples.
Use when you need reliable JSON/XML/structured responses.
Step 1: Define schema
{
"type": "object",
"properties": {
"summary": {"type": "string", "maxLength": 200},
"sentiment": {"enum": ["positive", "negative", "neutral"]},
"confidence": {"type": "number", "minimum": 0, "maximum": 1}
},
"required": ["summary", "sentiment"]
}
Step 2: Include schema in prompt
Respond with JSON matching this schema:
- summary (string, max 200 chars): Brief summary of the content
- sentiment (enum): One of "positive", "negative", "neutral"
- confidence (number 0-1): Your confidence in the sentiment
Step 3: Add format enforcement
IMPORTANT: Respond ONLY with valid JSON. No markdown, no explanation.
Start your response with { and end with }.
Step 4: Validate outputs
python scripts/prompt_optimizer.py structured_prompt.txt --validate-schema schema.json
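The validation in Step 4 can also be applied at runtime to each model response; a hand-rolled sketch of the checks implied by the Step 1 schema (a library such as jsonschema can enforce the full specification instead):

```python
import json

def check_response(raw: str) -> list[str]:
    """Validate a model response against the Step 1 schema; return errors."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return ["response is not valid JSON"]
    errors = []
    for key in ("summary", "sentiment"):
        if key not in data:
            errors.append(f"missing required key: {key}")
    if len(data.get("summary", "")) > 200:
        errors.append("summary exceeds 200 characters")
    if data.get("sentiment") not in (None, "positive", "negative", "neutral"):
        errors.append("sentiment not in allowed enum")
    conf = data.get("confidence")
    if conf is not None and not 0 <= conf <= 1:
        errors.append("confidence outside [0, 1]")
    return errors
```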
| File | Contains | Load when user asks about |
|---|---|---|
| references/prompt_engineering_patterns.md | 10 prompt patterns with input/output examples | "which pattern?", "few-shot", "chain-of-thought", "role prompting" |
| references/llm_evaluation_frameworks.md | Evaluation metrics, scoring methods, A/B testing | "how to evaluate?", "measure quality", "compare prompts" |
| references/agentic_system_design.md | Agent architectures (ReAct, Plan-Execute, Tool Use) | "build agent", "tool calling", "multi-agent" |
| Pattern | When to Use | Example |
|---|---|---|
| Zero-shot | Simple, well-defined tasks | "Classify this email as spam or not spam" |
| Few-shot | Complex tasks, consistent format needed | Provide 3-5 examples before the task |
| Chain-of-Thought | Reasoning, math, multi-step logic | "Think step by step..." |
| Role Prompting | Expertise needed, specific perspective | "You are an expert tax accountant..." |
| Structured Output | Need parseable JSON/XML | Include schema + format enforcement |
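As a concrete illustration of the table, the scaffolding for a few of these patterns can be captured in one small helper (the wording of each scaffold is illustrative, not prescriptive):

```python
def apply_pattern(task: str, pattern: str) -> str:
    """Wrap a task with a prompt-pattern scaffold from the table above."""
    scaffolds = {
        "zero-shot": task,
        "chain-of-thought": f"{task}\n\nThink step by step before giving your final answer.",
        "role": f"You are an expert tax accountant.\n\n{task}",
    }
    return scaffolds[pattern]
```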
# Prompt Analysis
python scripts/prompt_optimizer.py prompt.txt --analyze # Full analysis
python scripts/prompt_optimizer.py prompt.txt --tokens # Token count only
python scripts/prompt_optimizer.py prompt.txt --optimize # Generate optimized version
# RAG Evaluation
python scripts/rag_evaluator.py --contexts ctx.json --questions q.json # Evaluate
python scripts/rag_evaluator.py --contexts ctx.json --compare baseline # Compare to baseline
# Agent Development
python scripts/agent_orchestrator.py agent.yaml --validate # Validate config
python scripts/agent_orchestrator.py agent.yaml --visualize # Show workflow
python scripts/agent_orchestrator.py agent.yaml --estimate-cost # Token estimation
Weekly Installs: 223
Repository
GitHub Stars: 2.8K
First Seen: Jan 20, 2026
Security Audits: Gen Agent Trust Hub: Pass, Socket: Pass, Snyk: Warn
Installed on:
claude-code: 193
opencode: 170
gemini-cli: 166
codex: 162
cursor: 148
github-copilot: 135