customaize-agent:prompt-engineering by neolabhq/context-engineering-kit
npx skills add https://github.com/neolabhq/context-engineering-kit --skill customaize-agent:prompt-engineering
Advanced prompt engineering techniques to maximize LLM performance, reliability, and controllability.
Teach the model by showing examples instead of explaining rules. Include 2-5 input-output pairs that demonstrate the desired behavior. Use when you need consistent formatting, specific reasoning patterns, or handling of edge cases. More examples improve accuracy but consume tokens—balance based on task complexity.
Example:
Extract key information from support tickets:
Input: "My login doesn't work and I keep getting error 403"
Output: {"issue": "authentication", "error_code": "403", "priority": "high"}
Input: "Feature request: add dark mode to settings"
Output: {"issue": "feature_request", "error_code": null, "priority": "low"}
Now process: "Can't upload files larger than 10MB, getting timeout"
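The few-shot pattern above can be assembled programmatically so the examples live in one place. A minimal sketch, assuming the ticket examples from this section (the helper function and its signature are illustrative, not part of the original skill):

```python
import json

def build_few_shot_prompt(instruction, examples, query):
    """Assemble an instruction, 2-5 input-output pairs, and the new input."""
    parts = [instruction, ""]
    for inp, out in examples:
        parts.append(f'Input: "{inp}"')
        # json.dumps keeps the output format identical across examples
        parts.append(f"Output: {json.dumps(out)}")
    parts.append(f'Now process: "{query}"')
    return "\n".join(parts)

examples = [
    ("My login doesn't work and I keep getting error 403",
     {"issue": "authentication", "error_code": "403", "priority": "high"}),
    ("Feature request: add dark mode to settings",
     {"issue": "feature_request", "error_code": None, "priority": "low"}),
]
prompt = build_few_shot_prompt(
    "Extract key information from support tickets:",
    examples,
    "Can't upload files larger than 10MB, getting timeout",
)
```

Keeping examples as data makes it easy to add or drop pairs when balancing accuracy against token cost.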
Request step-by-step reasoning before the final answer. Add "Let's think step by step" (zero-shot) or include example reasoning traces (few-shot). Use for complex problems requiring multi-step logic, mathematical reasoning, or when you need to verify the model's thought process. Improves accuracy on analytical tasks by 30-50%.
Example:
Analyze this bug report and determine root cause.
Think step by step:
1. What is the expected behavior?
2. What is the actual behavior?
3. What changed recently that could cause this?
4. What components are involved?
5. What is the most likely root cause?
Bug: "Users can't save drafts after the cache update deployed yesterday"
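The numbered scaffold above can be generated from a list of questions, with a fallback to the generic zero-shot cue. A small sketch (the helper is hypothetical):

```python
def with_reasoning_steps(task, steps, data, zero_shot=False):
    """Prepend either a generic CoT cue (zero-shot) or explicit numbered steps."""
    if zero_shot:
        scaffold = "Let's think step by step."
    else:
        scaffold = "Think step by step:\n" + "\n".join(
            f"{i}. {s}" for i, s in enumerate(steps, 1)
        )
    return f"{task}\n{scaffold}\n{data}"

prompt = with_reasoning_steps(
    "Analyze this bug report and determine root cause.",
    ["What is the expected behavior?",
     "What is the actual behavior?",
     "What is the most likely root cause?"],
    'Bug: "Users can\'t save drafts after the cache update deployed yesterday"',
)
```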
Systematically improve prompts through testing and refinement. Start simple, measure performance (accuracy, consistency, token usage), then iterate. Test on diverse inputs including edge cases. Use A/B testing to compare variations. Critical for production prompts where consistency and cost matter.
Example:
Version 1 (Simple): "Summarize this article"
→ Result: Inconsistent length, misses key points
Version 2 (Add constraints): "Summarize in 3 bullet points"
→ Result: Better structure, but still misses nuance
Version 3 (Add reasoning): "Identify the 3 main findings, then summarize each"
→ Result: Consistent, accurate, captures key information
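Iterating on a prompt requires a metric. Below is a minimal harness sketch that scores one of the complaints above, inconsistent summary length, by measuring the spread of output lengths across inputs; the model is stubbed so the snippet runs offline, and every name here is illustrative:

```python
import statistics

def summary_length_consistency(model, prompt_template, articles):
    """Lower standard deviation of output length = more consistent prompt.
    `model` is any callable mapping a prompt string to a text response."""
    lengths = [len(model(prompt_template.format(article=a)).split())
               for a in articles]
    return statistics.pstdev(lengths)

# Stub model: returns a fixed summary, so the harness itself is testable.
def stub_model(prompt):
    return "point one. point two. point three."

score = summary_length_consistency(
    stub_model, "Summarize in 3 bullet points:\n{article}", ["a", "b"]
)
```

The same harness can compare version 1 against version 3 in an A/B test by running both templates over the same article set.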
Build reusable prompt structures with variables, conditional sections, and modular components. Use for multi-turn conversations, role-based interactions, or when the same pattern applies to different inputs. Reduces duplication and ensures consistency across similar tasks.
Example:
# Reusable code review template
template = """
Review this {language} code for {focus_area}.
Code:
{code_block}
Provide feedback on:
{checklist}
"""
# Usage
prompt = template.format(
    language="Python",
    focus_area="security vulnerabilities",
    code_block=user_code,
    checklist="1. SQL injection\n2. XSS risks\n3. Authentication"
)
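The section also mentions conditional sections and modular components: render the base template, then append only the optional parts that apply. A sketch of one way to do that (the `render` helper is hypothetical):

```python
def render(template, *, optional_sections=(), **vars):
    """Fill a template, appending only the optional sections that apply."""
    body = template.format(**vars)
    return "\n".join([body, *optional_sections]) if optional_sections else body

base = "Review this {language} code for {focus_area}."
prompt = render(
    base,
    language="Python",
    focus_area="security vulnerabilities",
    # Include this section only for code that ships with a test suite.
    optional_sections=["Also check test coverage."],
)
```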
Set global behavior and constraints that persist across the conversation. Define the model's role, expertise level, output format, and safety guidelines. Use system prompts for stable instructions that shouldn't change turn-to-turn, freeing up user message tokens for variable content.
Example:
System: You are a senior backend engineer specializing in API design.
Rules:
- Always consider scalability and performance
- Suggest RESTful patterns by default
- Flag security concerns immediately
- Provide code examples in Python
- Use early return pattern
Format responses as:
1. Analysis
2. Recommendation
3. Code example
4. Trade-offs
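In chat APIs, this persona typically travels in a dedicated system field so that only user turns vary between requests. A sketch assuming an Anthropic-style request payload (field names differ by provider, and nothing is sent over the network here):

```python
SYSTEM_PROMPT = (
    "You are a senior backend engineer specializing in API design.\n"
    "Always consider scalability and performance."
)

def make_request(user_message, history=()):
    """The system prompt is stable across turns; only the messages vary."""
    return {
        "system": SYSTEM_PROMPT,
        "messages": [*history, {"role": "user", "content": user_message}],
    }

req = make_request("Design a pagination scheme for /orders.")
```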
Start with simple prompts, add complexity only when needed:
Level 1: Direct instruction
Level 2: Add constraints
Level 3: Add reasoning
Level 4: Add examples
[System Context] → [Task Instruction] → [Examples] → [Input Data] → [Output Format]
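That ordering can be enforced with a small assembler that skips empty sections, which also lets the Level 1-4 progression reuse one function. A sketch (all names are illustrative):

```python
def assemble_prompt(system_context, task, examples, input_data, output_format):
    """Join sections in the order: system -> task -> examples -> input -> format."""
    sections = [system_context, task, examples, input_data, output_format]
    return "\n\n".join(s for s in sections if s)  # skip empty sections

prompt = assemble_prompt(
    "You are a data extraction assistant.",
    "Extract the error code from the ticket.",
    "",  # examples omitted at Level 1
    'Ticket: "error 403 on login"',
    "Respond with JSON only.",
)
```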
Build prompts that gracefully handle failures:
# Combine retrieved context with prompt engineering
prompt = f"""Given the following context:
{retrieved_context}
{few_shot_examples}
Question: {user_question}
Provide a detailed answer based solely on the context above. If the context doesn't contain enough information, explicitly state what's missing."""
# Add self-verification step
prompt = f"""{main_task_prompt}
After generating your response, verify it meets these criteria:
1. Answers the question directly
2. Uses only information from provided context
3. Cites specific sources
4. Acknowledges any uncertainty
If verification fails, revise your response."""
Based on Anthropic's official best practices for agent prompting.
The "context window" is the full amount of text a language model can look back on and reference while generating new text, plus the new text it generates. This is distinct from the large corpus the model was trained on; it is the model's "working memory." A larger context window lets the model understand and respond to longer, more complex prompts, while a smaller one limits its ability to handle long prompts or stay coherent over extended conversations.
The context window is a shared resource. Your prompts, commands, and skills share it with everything else Claude needs to know, including:
Default assumption: Claude is already very smart
Only add context Claude doesn't already have. Challenge each piece of information:
Good example: Concise (approximately 50 tokens):
## Extract PDF text
Use pdfplumber for text extraction:
```python
import pdfplumber
with pdfplumber.open("file.pdf") as pdf:
    text = pdf.pages[0].extract_text()
```
Bad example: Too verbose (approximately 150 tokens):
## Extract PDF text
PDF (Portable Document Format) files are a common file format that contains
text, images, and other content. To extract text from a PDF, you'll need to
use a library. There are many libraries available for PDF processing, but we
recommend pdfplumber because it's easy to use and handles most cases well.
First, you'll need to install it using pip. Then you can use the code below...
The concise version assumes Claude knows what PDFs are and how libraries work.
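One way to keep snippets lean is a budget check before shipping. English prose averages roughly four characters per token, so a crude estimator can flag oversized sections for review (the threshold and section names below are illustrative):

```python
def rough_token_count(text):
    """Crude estimate: roughly 4 characters per token for English prose."""
    return max(1, len(text) // 4)

def flag_verbose_sections(sections, budget=100):
    """Return the titles of sections whose estimated tokens exceed the budget."""
    return [title for title, body in sections.items()
            if rough_token_count(body) > budget]

sections = {
    "Extract PDF text": "Use pdfplumber for text extraction.",
    "History of PDF": "x" * 800,  # stand-in for a verbose explanation
}
oversized = flag_verbose_sections(sections)
```

For real token budgets, a model-specific tokenizer gives exact counts; the heuristic is only for quick triage.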
Match the level of specificity to the task's fragility and variability.
High freedom (text-based instructions):
Use when:
Example:
## Code review process
1. Analyze the code structure and organization
2. Check for potential bugs or edge cases
3. Suggest improvements for readability and maintainability
4. Verify adherence to project conventions
Medium freedom (pseudocode or scripts with parameters):
Use when:
Example:
## Generate report
Use this template and customize as needed:
```python
def generate_report(data, format="markdown", include_charts=True):
    # Process data
    # Generate output in the specified format
    # Optionally include visualizations
    ...
```
Low freedom (specific scripts, few or no parameters):
Use when:
Example:
## Database migration
Run exactly this script:
```bash
python scripts/migrate.py --verify --backup
```
Do not modify the command or add additional flags.
Analogy: Think of Claude as a robot exploring a path:
Useful for writing prompts of all kinds: commands, hooks, and skills for Claude Code, prompts for sub-agents, or any other LLM interaction.
LLMs respond to the same persuasion principles as humans. Understanding this psychology helps you design more effective skills: not to manipulate, but to ensure critical practices are followed even under pressure.
Research foundation: Meincke et al. (2025) tested 7 persuasion principles with N=28,000 AI conversations. Persuasion techniques more than doubled compliance rates (33% → 72%, p < .001).
Authority
What it is: Deference to expertise, credentials, or official sources.
How it works in prompts:
When to use:
Example:
✅ Write code before test? Delete it. Start over. No exceptions.
❌ Consider writing tests first when feasible.
Commitment
What it is: Consistency with prior actions, statements, or public declarations.
How it works in prompts:
When to use:
Example:
✅ When you find a skill, you MUST announce: "I'm using [Skill Name]"
❌ Consider letting your partner know which skill you're using.
Scarcity
What it is: Urgency from time limits or limited availability.
How it works in prompts:
When to use:
Example:
✅ After completing a task, IMMEDIATELY request code review before proceeding.
❌ You can review code when convenient.
Social Proof
What it is: Conformity to what others do or what's considered normal.
How it works in prompts:
When to use:
Example:
✅ Checklists without TodoWrite tracking = steps get skipped. Every time.
❌ Some people find TodoWrite helpful for checklists.
Unity
What it is: Shared identity, "we-ness", in-group belonging.
How it works in prompts:
When to use:
Example:
✅ We're colleagues working together. I need your honest technical judgment.
❌ You should probably tell me if I'm wrong.
Reciprocity
What it is: Obligation to return benefits received.
How it works:
When to avoid:
Liking
What it is: Preference for cooperating with those we like.
How it works:
When to avoid:
| Prompt Type | Use | Avoid |
|---|---|---|
| Discipline-enforcing | Authority + Commitment + Social Proof | Liking, Reciprocity |
| Guidance/technique | Moderate Authority + Unity | Heavy authority |
| Collaborative | Unity + Commitment | Authority, Liking |
| Reference | Clarity only | All persuasion |
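The table maps directly onto a lookup structure that a prompt checklist or linter could consult. A sketch (keys and naming mirror the table; nothing here is an official API):

```python
# Which persuasion principles to use or avoid, per prompt type (from the table).
PERSUASION_GUIDE = {
    "discipline": {"use": ["authority", "commitment", "social_proof"],
                   "avoid": ["liking", "reciprocity"]},
    "guidance": {"use": ["moderate_authority", "unity"],
                 "avoid": ["heavy_authority"]},
    "collaborative": {"use": ["unity", "commitment"],
                      "avoid": ["authority", "liking"]},
    "reference": {"use": ["clarity"],
                  "avoid": ["all_persuasion"]},
}

def principles_for(prompt_type):
    """Return the principles recommended for a given prompt type."""
    return PERSUASION_GUIDE[prompt_type]["use"]
```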
Bright-line rules reduce rationalization:
Implementation intentions create automatic behavior:
LLMs are parahuman:
Legitimate:
Illegitimate:
The test: Would this technique serve the user's genuine interests if they fully understood it?
When designing a prompt, ask:
Weekly Installs
254
Repository
GitHub Stars
699
First Seen
Feb 19, 2026
Installed on
opencode: 244
codex: 243
github-copilot: 241
gemini-cli: 240
amp: 238
kimi-cli: 238