Important Prerequisite
Installing AI Skills requires a working proxy connection with TUN mode enabled; this directly determines whether the installation can complete successfully.
project-development by guanyang/antigravity-skills
npx skills add https://github.com/guanyang/antigravity-skills --skill project-development
This skill covers the principles for identifying tasks suited to LLM processing, designing effective project architectures, and iterating rapidly using agent-assisted development. The methodology applies whether building a batch processing pipeline, a multi-agent research system, or an interactive agent application.
Activate this skill when:
Evaluate task-model fit before writing any code, because building automation on a fundamentally mismatched task wastes days of effort. Run every proposed task through these two tables to decide proceed-or-stop.
Proceed when the task has these characteristics:
| Characteristic | Rationale |
|---|---|
| Synthesis across sources | LLMs combine information from multiple inputs better than rule-based alternatives |
| Subjective judgment with rubrics | Grading, evaluation, and classification with criteria map naturally to language reasoning |
| Natural language output | When the goal is human-readable text, LLMs deliver it natively |
| Error tolerance | Individual failures do not break the overall system, so LLM non-determinism is acceptable |
| Batch processing | No conversational state required between items, which keeps context clean |
| Domain knowledge in training | The model already has relevant context, reducing prompt engineering overhead |
Stop when the task has these characteristics:
| Characteristic | Rationale |
|---|---|
| Precise computation | Math, counting, and exact algorithms are unreliable in language models |
| Real-time requirements | LLM latency is too high for sub-second responses |
| Perfect accuracy requirements | Hallucination risk makes 100% accuracy impossible |
| Proprietary data dependence | The model lacks necessary context and cannot acquire it from prompts alone |
| Sequential dependencies | Each step depends heavily on the previous result, compounding errors |
| Deterministic output requirements | Same input must produce identical output, which LLMs cannot guarantee |
Always validate task-model fit with a manual test before investing in automation. Copy one representative input into the model interface, evaluate the output quality, and use the result to answer these questions:
Do this because a failed manual prototype predicts a failed automated system, while a successful one provides both a quality baseline and a prompt-design template. The test takes minutes and prevents hours of wasted development.
Structure LLM projects as staged pipelines because separation of deterministic and non-deterministic stages enables fast iteration and cost control. Design each stage to be:
Use this canonical pipeline structure:
acquire -> prepare -> process -> parse -> render
Stages 1, 2, 4, and 5 are deterministic. Stage 3 is non-deterministic and expensive. Maintain this separation because it allows re-running the expensive LLM stage only when necessary, while iterating quickly on parsing and rendering.
Use the file system to track pipeline state rather than databases or in-memory structures, because file existence provides natural idempotency and human-readable debugging.
data/{id}/
raw.json # acquire stage complete
prompt.md # prepare stage complete
response.md # process stage complete
parsed.json # parse stage complete
Check if an item needs processing by checking whether the output file exists. Re-run a stage by deleting its output file and downstream files. Debug by reading the intermediate files directly. This pattern works because each directory is independent, enabling simple parallelization and trivial caching.
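The existence-checking pattern above can be sketched in Python. A minimal sketch, assuming the directory layout shown; the helper names and the sample item are illustrative, not part of any real API:

```python
from pathlib import Path

# Output files, in pipeline order; each marks its stage as complete.
STAGES = ["raw.json", "prompt.md", "response.md", "parsed.json"]

def needs_stage(item_dir: Path, output_name: str) -> bool:
    """A stage still needs to run exactly when its output file is absent."""
    return not (item_dir / output_name).exists()

def rerun_from(item_dir: Path, output_name: str) -> None:
    """Re-run a stage by deleting its output file and all downstream files."""
    idx = STAGES.index(output_name)
    for name in STAGES[idx:]:
        (item_dir / name).unlink(missing_ok=True)

item = Path("data") / "item-001"
item.mkdir(parents=True, exist_ok=True)
(item / "raw.json").write_text("{}")      # simulate: acquire stage complete
print(needs_stage(item, "raw.json"))      # False: acquire already done
print(needs_stage(item, "prompt.md"))     # True: prepare still pending
rerun_from(item, "raw.json")              # wipe this item back to the start
print(needs_stage(item, "raw.json"))      # True again
```

Because each item directory is independent, a driver can scan `data/*/`, skip complete items, and process pending ones in parallel with no shared state.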
Design prompts for structured, parseable outputs because prompt design directly determines parsing reliability. Include these elements in every structured prompt:
Build parsers that handle LLM output variations gracefully, because LLMs do not follow instructions perfectly. Use regex patterns flexible enough for minor formatting variations, provide sensible defaults when sections are missing, and log parsing failures for review rather than crashing.
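A minimal sketch of such a lenient parser, assuming a rubric-style response; the "Score" and "Summary" section names are hypothetical placeholders for whatever structure your prompt requests:

```python
import logging
import re

def parse_response(text: str) -> dict:
    """Extract fields tolerantly: defaults on missing sections, logs instead of crashing."""
    result = {"score": None, "summary": ""}

    # Tolerate variations like "Score: 7", "**Score:** 7", "score - 7".
    m = re.search(r"score\W{0,5}(\d{1,2})", text, re.IGNORECASE)
    if m:
        result["score"] = int(m.group(1))
    else:
        logging.warning("score section missing; keeping default None")

    m = re.search(r"summary\W{0,5}(.+)", text, re.IGNORECASE)
    if m:
        result["summary"] = m.group(1).strip()
    else:
        logging.warning("summary section missing; keeping default ''")

    return result

print(parse_response("**Score:** 8\nSummary: held up well."))
# → {'score': 8, 'summary': 'held up well.'}
```

Logged failures accumulate into a review list, so a handful of malformed responses degrade into defaults rather than aborting a whole batch run.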
Use agent-capable models to accelerate development through rapid iteration: describe the project goal and constraints, let the agent generate initial implementation, test and iterate on specific failures, then refine prompts and architecture based on results.
Adopt these practices because they keep agent output focused and high-quality:
Estimate LLM processing costs before starting, because token costs compound quickly at scale and late discovery of budget overruns forces costly rework. Use this formula:
Total cost = (items x tokens_per_item x price_per_token) + API overhead
For batch processing, estimate input tokens per item (prompt + context), estimate output tokens per item (typical response length), multiply by item count, and add 20-30% buffer for retries and failures.
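The formula and buffer above can be turned into a quick estimator. The item counts and per-million-token prices below are illustrative placeholders, not real API rates:

```python
def estimate_cost(items: int, in_tokens: int, out_tokens: int,
                  in_price_per_mtok: float, out_price_per_mtok: float,
                  buffer: float = 0.25) -> float:
    """Batch cost estimate: per-item token costs times item count, plus a retry buffer."""
    base = items * (in_tokens * in_price_per_mtok +
                    out_tokens * out_price_per_mtok) / 1_000_000
    return base * (1 + buffer)  # 20-30% buffer for retries and failures

# e.g. 930 items, 4k input / 1k output tokens, at $3 / $15 per million tokens
cost = estimate_cost(930, 4_000, 1_000, 3.0, 15.0)
print(f"${cost:,.2f}")
```

Running the estimate before the first API call turns "is this in budget?" into a one-line check instead of a mid-project surprise.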
Track actual costs during development. If costs exceed estimates significantly, reduce context length through truncation, use smaller models for simpler items, cache and reuse partial results, or add parallel processing to reduce wall-clock time.
Default to single-agent pipelines for batch processing with independent items, because they are simpler to manage, cheaper to run, and easier to debug. Escalate to multi-agent architectures only when one of these conditions holds:
Choose multi-agent for context isolation, not role anthropomorphization. Sub-agents get fresh context windows for focused subtasks, which prevents context degradation on long-running tasks.
See multi-agent-patterns skill for detailed architecture guidance.
Start with minimal architecture and add complexity only when production evidence proves it necessary, because over-engineered scaffolding often constrains rather than enables model performance.
Vercel's d0 agent achieved 100% success rate (up from 80%) by reducing from 17 specialized tools to 2 primitives: bash command execution and SQL. The file system agent pattern uses standard Unix utilities (grep, cat, find, ls) instead of custom exploration tools.
Reduce when:
Add complexity when:
See tool-design skill for detailed tool architecture guidance.
Plan for multiple architectural iterations from the start, because production agent systems at scale always require refactoring. Manus refactored their agent framework five times since launch. The Bitter Lesson suggests that structures added for current model limitations become constraints as models improve.
Build for change by following these practices:
Follow this template in order, because each step validates assumptions before the next step invests effort.
Task Analysis
Manual Validation
Architecture Selection
Cost Estimation
Development Plan
Example 1: Batch Analysis Pipeline (Karpathy's HN Time Capsule)
Task: Analyze 930 HN discussions from 10 years ago with hindsight grading.
Architecture:
Results: $58 total cost, ~1 hour execution, static HTML output.
Example 2: Architectural Reduction (Vercel d0)
Task: Text-to-SQL agent for internal analytics.
Before: 17 specialized tools, 80% success rate, 274s average execution.
After: 2 tools (bash + SQL), 100% success rate, 77s average execution.
Key insight: The semantic layer was already good documentation. Claude just needed access to read files directly.
See Case Studies for detailed analysis.
This skill connects to:
Internal references:
Related skills in this collection:
External resources:
Created: 2025-12-25 | Last Updated: 2026-03-17 | Author: Agent Skills for Context Engineering Contributors | Version: 1.1.0
Weekly Installs: 58
GitHub Stars: 544
First Seen: Jan 26, 2026
Security Audits: Gen Agent Trust Hub (Pass), Socket (Pass), Snyk (Warn)
Installed on: opencode (53), codex (51), cursor (49), github-copilot (49), gemini-cli (48), amp (47)