```
npx skills add https://github.com/crinkj/common-claude-setting --skill project-development
```
This skill covers the principles for identifying tasks suited to LLM processing, designing effective project architectures, and iterating rapidly using agent-assisted development. The methodology applies whether building a batch processing pipeline, a multi-agent research system, or an interactive agent application.
Activate this skill when:
Not every problem benefits from LLM processing. The first step in any project is evaluating whether the task characteristics align with LLM strengths. This evaluation should happen before writing any code.
LLM-suited tasks share these characteristics:
| Characteristic | Why It Fits |
|---|---|
| Synthesis across sources | LLMs excel at combining information from multiple inputs |
| Subjective judgment with rubrics | LLMs handle grading, evaluation, and classification with criteria |
| Natural language output | When the goal is human-readable text, not structured data |
| Error tolerance | Individual failures do not break the overall system |
| Batch processing | No conversational state required between items |
| Domain knowledge in training | The model already has relevant context |
LLM-unsuited tasks share these characteristics:
| Characteristic | Why It Fails |
|---|---|
| Precise computation | Math, counting, and exact algorithms are unreliable |
| Real-time requirements | LLM latency is too high for sub-second responses |
| Perfect accuracy requirements | Hallucination risk makes 100% accuracy impossible |
| Proprietary data dependence | The model lacks necessary context |
| Sequential dependencies | Each step depends heavily on the previous result |
| Deterministic output requirements | Same input must produce identical output |
The evaluation should happen through manual prototyping: take one representative example and test it directly with the target model before building any automation.
Before investing in automation, validate task-model fit with a manual test. Copy one representative input into the model interface. Evaluate the output quality. This takes minutes and prevents hours of wasted development.
This validation answers critical questions:
If the manual prototype fails, the automated system will fail. If it succeeds, you have a baseline for comparison and a template for prompt design.
LLM projects benefit from staged pipeline architectures where each stage is:
The canonical pipeline structure:
```
acquire → prepare → process → parse → render
```
Stages 1, 2, 4, and 5 are deterministic. Stage 3 is non-deterministic and expensive. This separation allows re-running the expensive LLM stage only when necessary, while iterating quickly on parsing and rendering.
Use the file system to track pipeline state rather than databases or in-memory structures. Each processing unit gets a directory. Each stage completion is marked by file existence.
```
data/{id}/
├── raw.json       # acquire stage complete
├── prompt.md      # prepare stage complete
├── response.md    # process stage complete
├── parsed.json    # parse stage complete
```
To check if an item needs processing: check if the output file exists. To re-run a stage: delete its output file and downstream files. To debug: read the intermediate files directly.
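The file-existence convention above can be sketched in a few lines of Python (a minimal sketch; the file names follow the example layout, and the helper names are illustrative, not part of any prescribed API):

```python
from pathlib import Path

# Stage outputs in dependency order; a stage is complete iff its file exists.
STAGE_OUTPUTS = ["raw.json", "prompt.md", "response.md", "parsed.json"]

def pending_stages(item_dir: Path) -> list[str]:
    """Return the stage outputs that still need to be produced for this item."""
    return [name for name in STAGE_OUTPUTS if not (item_dir / name).exists()]

def invalidate_from(item_dir: Path, stage_output: str) -> None:
    """Force a re-run of a stage by deleting its output and all downstream files."""
    idx = STAGE_OUTPUTS.index(stage_output)
    for name in STAGE_OUTPUTS[idx:]:
        (item_dir / name).unlink(missing_ok=True)
```

Because state lives entirely in the file system, a crashed run resumes by rerunning the driver: items with all four files are skipped automatically.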
This pattern provides:
When LLM outputs must be parsed programmatically, prompt design directly determines parsing reliability. The prompt must specify exact format requirements with examples.
Effective structure specification includes:
Example prompt structure:
```
Analyze the following and provide your response in exactly this format:

## Summary
[Your summary here]

## Score
Rating: [1-10]

## Details
- Key point 1
- Key point 2

Follow this format exactly because I will be parsing it programmatically.
```
The parsing code must handle variations gracefully. LLMs do not follow instructions perfectly. Build parsers that:
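A tolerant parser for the example format above might be sketched as follows (the section names match the example prompt; the fallback behavior is illustrative, not prescribed):

```python
import re

def parse_response(text: str) -> dict:
    """Extract '## Heading' sections, tolerating extra whitespace, case
    differences, and missing sections rather than raising."""
    sections = {}
    for match in re.finditer(r"^[ \t]*##\s*(.+?)\s*$", text, re.MULTILINE):
        name = match.group(1).strip().lower()
        start = match.end()
        nxt = re.search(r"^[ \t]*##\s", text[start:], re.MULTILINE)
        end = start + nxt.start() if nxt else len(text)
        sections[name] = text[start:end].strip()

    # Score: accept "Rating: 7", "7/10", or a bare number; None if absent.
    score = None
    m = re.search(r"\b(\d{1,2})\b", sections.get("score", ""))
    if m:
        score = int(m.group(1))
    return {"summary": sections.get("summary"),
            "score": score,
            "details": sections.get("details")}
```

Missing sections surface as `None` rather than exceptions, so one malformed response degrades a single item instead of aborting the batch.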
Modern agent-capable models can accelerate development significantly. The pattern is:
This is about rapid iteration: generate, test, fix, repeat. The agent handles boilerplate and initial structure while you focus on domain-specific requirements and edge cases.
Key practices for effective agent-assisted development:
LLM processing has predictable costs that should be estimated before starting. The formula:
```
Total cost = (items × tokens_per_item × price_per_token) + API overhead
```
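Plugging numbers into the formula looks like this (the token counts and per-million-token prices below are placeholder assumptions, not current rates for any model):

```python
def estimate_cost(items: int,
                  input_tokens: int, output_tokens: int,
                  price_in_per_mtok: float, price_out_per_mtok: float,
                  overhead: float = 0.0) -> float:
    """Estimate batch cost in dollars; prices are per million tokens."""
    per_item = (input_tokens * price_in_per_mtok
                + output_tokens * price_out_per_mtok) / 1_000_000
    return items * per_item + overhead

# Hypothetical: 930 items, ~8k input / 2k output tokens each,
# at $3/Mtok input and $15/Mtok output.
cost = estimate_cost(930, 8_000, 2_000, 3.0, 15.0)
```

Running the estimate before the batch makes the per-item cost explicit, so a decision like "use a cheaper model for the first pass" can be made on numbers rather than surprise invoices.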
For batch processing:
Track actual costs during development. If costs exceed estimates significantly, re-evaluate the approach. Consider:
Single-agent pipelines work for:
Multi-agent architectures work for:
The primary reason for multi-agent is context isolation, not role anthropomorphization. Sub-agents get fresh context windows for focused subtasks. This prevents context degradation on long-running tasks.
See multi-agent-patterns skill for detailed architecture guidance.
Start with minimal architecture. Add complexity only when proven necessary. Production evidence shows that removing specialized tools often improves performance.
Vercel's d0 agent achieved 100% success rate (up from 80%) by reducing from 17 specialized tools to 2 primitives: bash command execution and SQL. The file system agent pattern uses standard Unix utilities (grep, cat, find, ls) instead of custom exploration tools.
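The two-primitive idea can be sketched as a minimal tool registry (a hypothetical wiring; d0's internals are not public, so this only illustrates the shape, using SQLite as a stand-in backend):

```python
import sqlite3
import subprocess

def run_bash(command: str, timeout: int = 30) -> str:
    """Primitive 1: execute a shell command and return its combined output.
    Exploration (grep/cat/find/ls over schema docs) happens through this."""
    result = subprocess.run(command, shell=True, capture_output=True,
                            text=True, timeout=timeout)
    return result.stdout + result.stderr

def run_sql(db_path: str, query: str) -> list[tuple]:
    """Primitive 2: execute a SQL statement against the analytics database."""
    with sqlite3.connect(db_path) as conn:
        return conn.execute(query).fetchall()

# The agent sees exactly two tools instead of 17 specialized ones.
TOOLS = {"bash": run_bash, "sql": run_sql}
```

The design point is that both primitives are composable: anything a bespoke "list_tables" or "describe_schema" tool would do can be expressed as a SQL query or a shell one-liner, so removing the specialized tools removes decision overhead without removing capability.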
When reduction outperforms complexity:
When complexity is necessary:
See tool-design skill for detailed tool architecture guidance.
Expect to refactor. Production agent systems at scale require multiple architectural iterations; Manus has refactored its agent framework five times since launch. The Bitter Lesson suggests that structure added to work around current model limitations becomes a constraint as models improve.
Build for change:
1. Task Analysis
2. Manual Validation
3. Architecture Selection
4. Cost Estimation
5. Development Plan
- **Skipping manual validation**: Building automation before verifying the model can do the task wastes significant time when the approach is fundamentally flawed.
- **Monolithic pipelines**: Combining all stages into one script makes debugging and iteration difficult. Separate stages with persistent intermediate outputs.
- **Over-constraining the model**: Adding guardrails, pre-filtering, and validation logic that the model could handle on its own. Test whether your scaffolding helps or hurts.
- **Ignoring costs until production**: Token costs compound quickly at scale. Estimate and track from the beginning.
- **Perfect parsing requirements**: Expecting LLMs to follow format instructions perfectly. Build robust parsers that handle variations.
- **Premature optimization**: Adding caching, parallelization, and optimization before the basic pipeline works correctly.
Example 1: Batch Analysis Pipeline (Karpathy's HN Time Capsule)
Task: Analyze 930 HN discussions from 10 years ago with hindsight grading.
Architecture:
Results: $58 total cost, ~1 hour execution, static HTML output.
Example 2: Architectural Reduction (Vercel d0)
Task: Text-to-SQL agent for internal analytics.
Before: 17 specialized tools, 80% success rate, 274s average execution.
After: 2 tools (bash + SQL), 100% success rate, 77s average execution.
Key insight: The semantic layer was already good documentation. Claude just needed access to read files directly.
See Case Studies for detailed analysis.
This skill connects to:
Internal references:
Related skills in this collection:
External resources:
Created: 2025-12-25 · Last Updated: 2025-12-25 · Author: Agent Skills for Context Engineering Contributors · Version: 1.0.0