building-with-llms by refoundai/lenny-skills
npx skills add https://github.com/refoundai/lenny-skills --skill building-with-llms
Help the user build effective AI applications using practical techniques from 60 product leaders and AI practitioners.
When the user asks for help building with LLMs:
Few-shot examples beat descriptions Sander Schulhoff: "If there's one technique I'd recommend, it's few-shot prompting—giving examples of what you want. Instead of describing your writing style, paste a few previous emails and say 'write like this.'"
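As a concrete illustration, a few-shot prompt can be assembled mechanically from prior examples. This is a generic sketch; the `Input:`/`Output:` framing is an assumption, not a requirement of any particular model API:

```python
def build_few_shot_prompt(task, examples, new_input):
    """Show the model what you want via input/output pairs, then give the new input."""
    parts = [task]
    for example_input, example_output in examples:
        parts.append(f"Input: {example_input}\nOutput: {example_output}")
    parts.append(f"Input: {new_input}\nOutput:")
    return "\n\n".join(parts)

# e.g. teach "my email style" by pasting previous emails rather than describing it
prompt = build_few_shot_prompt(
    "Reply to emails in my style.",
    [("Can we meet Friday?", "Friday works -- see you then!")],
    "Are you free next Tuesday?",
)
```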
Provide your point of view Wes Kao: "Sharing my POV makes output way better. Don't just ask 'What would you say?' Tell it: 'I want to say no, but I'd like to preserve the relationship. Here's what I'd ideally do...'"
Use decomposition for complex tasks Sander Schulhoff: "Ask 'What subproblems need solving first?' Get the list, solve each one, then synthesize. Don't ask the model to solve everything at once."
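The decompose-then-synthesize loop can be sketched as below; `ask_llm` is a placeholder for whatever model call you use:

```python
def solve_by_decomposition(problem, ask_llm):
    """Ask for subproblems, solve each in isolation, then synthesize."""
    listing = ask_llm(f"What subproblems need solving first for: {problem}? One per line.")
    subproblems = [line.strip() for line in listing.splitlines() if line.strip()]
    partial_solutions = [ask_llm(f"Solve this subproblem: {sub}") for sub in subproblems]
    return ask_llm("Synthesize a final answer from these parts:\n" + "\n".join(partial_solutions))
```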
Self-criticism improves output Sander Schulhoff: "Ask the LLM to check and critique its own response, then improve it. Models can catch their own errors when prompted to look."
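A generate-critique-revise pass might look like this (again with a generic `ask_llm` placeholder):

```python
def critique_and_revise(task, ask_llm):
    """Draft, ask the model to find its own errors, then revise using the critique."""
    draft = ask_llm(task)
    critique = ask_llm(f"Check this response for errors and weaknesses:\n{draft}")
    return ask_llm(
        f"Task: {task}\nDraft: {draft}\nCritique: {critique}\n"
        "Write an improved response that addresses the critique."
    )
```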
Roles help style, not accuracy Sander Schulhoff: "Roles like 'Act as a professor' don't help accuracy tasks. But they're great for controlling tone and style in creative work."
Put context at the beginning Sander Schulhoff: "Place long context at the start of your prompt. It gets cached (cheaper), and the model won't forget its task when processing."
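A trivial assembly helper makes the ordering explicit. The cache benefit assumes your provider does prefix caching on repeated leading tokens, which is why the stable context goes first and the per-request task last:

```python
def assemble_prompt(long_context, instruction):
    """Stable, long context first (reusable by a provider's prefix cache across
    requests); the task instruction comes last, after the context is 'read'."""
    return f"Context:\n{long_context}\n\nTask: {instruction}"

p = assemble_prompt("...many pages of docs...", "Summarize the API changes.")
```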
Context engineering > prompt engineering Bret Taylor: "If a model makes a bad decision, it's usually lack of context. Fix it at the root—feed better data via MCP or RAG."
RAG quality = data prep quality Chip Huyen: "The biggest gains come from data preparation, not vector database choice. Rewrite source data into Q&A format. Add annotations for context humans take for granted."
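A minimal sketch of that data-prep step, rewriting chunks into Q&A records before indexing so queries match question text rather than raw prose (`ask_llm` is a placeholder):

```python
def to_qa_records(chunks, ask_llm):
    """Rewrite each source chunk into a question/answer record for retrieval."""
    records = []
    for chunk in chunks:
        question = ask_llm(f"Write the question this text best answers:\n{chunk}")
        records.append({"question": question, "answer": chunk})
    return records
```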
Layer models for robustness Bret Taylor: "Having AI supervise AI is effective. Layer cognitive steps—one model generates, another reviews. This moves you from 90% to 99% accuracy."
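One way to wire that up is a draft/review loop. The 90%-to-99% figure in the comment assumes each layer misses errors independently, which is idealized:

```python
def generate_with_review(task, generator, reviewer, max_rounds=3):
    """One model drafts, a second reviews; revise until approved.
    If each layer independently misses ~10% of errors, two layers miss
    ~1% (0.1 * 0.1) -- the idealized 90% -> 99% jump."""
    draft = generator(task)
    for _ in range(max_rounds):
        feedback = reviewer(draft)
        if feedback == "approve":
            return draft
        draft = generator(f"{task}\nReviewer feedback: {feedback}")
    return draft
```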
Use specialized models for specialized tasks Amjad Masad: "We use Claude Sonnet for coding, other models for critiquing. A 'society of models' with different roles outperforms one general model."
200ms is the latency threshold Ryan J. Salva (GitHub Copilot): "The sweet spot for real-time suggestions is ~200ms. Slower feels like an interruption. Design your architecture around this constraint."
Evals are mandatory, not optional Kevin Weil (OpenAI): "Writing evals is becoming a core product skill. A 60% reliable model needs different UX than 95% or 99.5%. You can't design without knowing your accuracy."
Binary scores > Likert scales Hamel Husain: "Force Pass/Fail, not 1-5 scores. Scales produce meaningless averages like '3.7'. Binary forces real decisions."
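A pass/fail eval harness then aggregates to an interpretable rate instead of a scale average; this is a generic sketch, not a specific eval framework:

```python
def run_eval(cases, model, grade):
    """Grade each case strictly pass/fail; the aggregate is a pass rate,
    not a meaningless '3.7 out of 5' average."""
    passes = [grade(case["expected"], model(case["input"])) for case in cases]
    return sum(passes) / len(passes)
```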
Start with vibes, evolve to evals Howie Liu: "For novel products, start with open-ended vibes testing. Only move to formal evals once use cases converge."
Validate your LLM judge Hamel Husain: "If using LLM-as-judge, you must eval the eval. Measure agreement with human experts. Iterate until it aligns."
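The simplest version of "eval the eval" is raw agreement with human labels (a chance-corrected statistic such as Cohen's kappa is stricter, but this shows the shape):

```python
def judge_agreement(judge_labels, human_labels):
    """Fraction of items where the LLM judge matches the human expert label.
    Iterate on the judge prompt until this is acceptably high."""
    if len(judge_labels) != len(human_labels):
        raise ValueError("label lists must align")
    matches = sum(j == h for j, h in zip(judge_labels, human_labels))
    return matches / len(human_labels)
```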
Retry failures—models are stochastic Benjamin Mann (Anthropic): "If it fails, try the exact same prompt again. Success rates are much higher on retry than on banging on a broken approach."
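A retry wrapper makes this mechanical. `call` and `is_valid` are placeholders for your model call and output check:

```python
def call_with_retry(call, prompt, is_valid, max_attempts=3):
    """Models are stochastic: re-sending the identical prompt often succeeds
    where the first attempt failed."""
    last = None
    for _ in range(max_attempts):
        last = call(prompt)
        if is_valid(last):
            return last
    raise RuntimeError(f"still failing after {max_attempts} attempts: {last!r}")
```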
Be ambitious in your asks Benjamin Mann: "The difference between effective and ineffective Claude Code users: ambitious requests. Ask for the big change, not incremental tweaks."
Cross-pollinate between models Guillermo Rauch: "When stuck after 100+ iterations, copy the code to a different model (e.g., from v0 to ChatGPT o1). Fresh perspective unblocks you."
Compounding engineering Dan Shipper: "For every unit of work, make the next unit easier. Save prompts that work. Build a library. Your team's AI effectiveness compounds."
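One lightweight way to "build a library" of working prompts (in-memory here for illustration; a real one might be a shared file or repo):

```python
class PromptLibrary:
    """Save prompts that worked so each unit of work makes the next easier."""
    def __init__(self):
        self._prompts = {}

    def save(self, name, prompt, notes=""):
        self._prompts[name] = {"prompt": prompt, "notes": notes}

    def get(self, name):
        return self._prompts[name]["prompt"]

    def names(self):
        return sorted(self._prompts)

library = PromptLibrary()
library.save("pr-review", "Review this diff for correctness and style.",
             notes="works well for small PRs")
```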
Learn to read and debug, not memorize syntax Amjad Masad: "The ROI on coding doubles every 6 months because AI amplifies it. Focus on reading code and debugging—syntax is handled."
Use chat mode to understand Anton Osika: "Use 'chat mode' to ask the AI to explain its logic. 'Why did you do this? What am I missing?' Treat it as a tutor."
Vibe coding is a real skill Elena Verna: "I put vibe coding on my resume. Build functional prototypes with natural language before handing to engineering."
For all 110 insights from 60 guests, see references/guest-insights.md
Weekly Installs: 728
GitHub Stars: 555
First Seen: Jan 29, 2026
Security Audits: Gen Agent Trust Hub: Pass; Socket: Pass; Snyk: Warn
Installed on: opencode (606), codex (577), gemini-cli (576), cursor (542), claude-code (538), github-copilot (521)