software-backend by vasilyu1983/ai-agents-public
npx skills add https://github.com/vasilyu1983/ai-agents-public --skill software-backend
Use this skill to design, implement, and review production-grade backend services: API boundaries, data layer, auth, caching, observability, error handling, testing, and deployment.
Default biases: type-safe boundaries (validation at the edge), OpenTelemetry for observability, zero-trust assumptions, idempotency for retries, RFC 9457 errors, Postgres + pooling, structured logs, timeouts, and rate limiting.
Scaffolding rule: When scaffolding a new project, show full working implementations for all domain logic: fraud rules, audit logging, webhook handlers, validation pipelines, background jobs. Don't just reference file names or stub functions; show the actual code so the user can run it immediately.
| Task | Default Picks | Notes |
|---|---|---|
| REST API | Fastify / Express / NestJS | Prefer typed boundaries + explicit timeouts |
| Edge API | Hono / platform-native handlers | Keep work stateless, CPU-light |
| Type-Safe API | tRPC | Prefer for TS monorepos and internal APIs |
| GraphQL API | Apollo Server / Pothos | Prefer for complex client-driven queries |
| Database | PostgreSQL | Use pooling + migrations + query budgets |
| ORM / Query Layer | Prisma / Drizzle / SQLAlchemy / GORM / SeaORM / EF Core | Prefer explicit transactions |
| Authentication | OIDC/OAuth + sessions/JWT | Prefer httpOnly cookies for browsers |
| Validation | Zod / Pydantic / validator libs | Validate at the boundary, not deep inside |
| Caching | Redis (or managed) | Use TTLs + invalidation strategy |
| Background Jobs | BullMQ / platform queues | Make jobs idempotent + retry-safe |
| Testing | Unit + integration + contract/E2E | Keep most tests below the UI layer |
| Observability | Structured logs + OpenTelemetry | Correlation IDs end-to-end |
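The Observability row calls for correlation IDs end-to-end. A minimal sketch of one way to do that in Node, assuming `AsyncLocalStorage` for request-scoped context and a hypothetical `x-request-id` header convention (neither is prescribed by this skill): reuse an incoming ID or mint one, and stamp it on every JSON log line.

```typescript
import { AsyncLocalStorage } from "node:async_hooks";
import { randomUUID } from "node:crypto";

type HeaderMap = Record<string, string | undefined>;

// Request-scoped store: follows the async call chain of a single request.
const requestContext = new AsyncLocalStorage<{ requestId: string }>();

// Reuse the caller's ID so one trace spans services; otherwise mint a UUID.
// "x-request-id" is an assumed header name; the length cap guards against log abuse.
function resolveRequestId(headers: HeaderMap): string {
  const incoming = headers["x-request-id"];
  return incoming && incoming.length <= 128 ? incoming : randomUUID();
}

// Emit one JSON object per line; every entry carries the current request ID.
function log(level: "info" | "warn" | "error", msg: string, fields: Record<string, unknown> = {}): string {
  const requestId = requestContext.getStore()?.requestId ?? "none";
  const line = JSON.stringify({ ...fields, ts: new Date().toISOString(), level, requestId, msg });
  console.log(line);
  return line;
}

// Wrap each request handler so everything it awaits sees the same request ID.
function withRequestContext<T>(headers: HeaderMap, handler: () => T): T {
  return requestContext.run({ requestId: resolveRequestId(headers) }, handler);
}
```

In a real service the wrapper would sit in a middleware or `onRequest` hook, and the same ID would usually be echoed back in a response header so clients can quote it in bug reports.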
Use this skill to:
Use a different skill when:
Pick based on the strongest constraint, not feature lists:
| Constraint | Default Pick | Why |
|---|---|---|
| Team knows TypeScript only | Fastify/Hono + Prisma/Drizzle | Ecosystem depth, hiring ease |
| Need <50ms P95, CPU-bound work | Go (net/http + sqlc/pgx) | Goroutines isolate CPU work; no event-loop risk |
| Data-heavy / ML integration | Python (FastAPI + SQLAlchemy) | Best ecosystem for numpy/pandas/ML pipelines |
| Memory-safety critical | Rust (Axum + SeaORM/SQLx) | Zero-cost abstractions, no GC |
| Enterprise/.NET team | C# (ASP.NET Core + EF Core) | Azure integration, mature tooling |
| Edge/serverless | Hono / platform-native handlers | Stateless, CPU-light, fast cold starts |
| Fintech/audit-sensitive | Go + sqlc (or raw SQL) | ORM magic is a liability; you need auditable SQL |
For detailed framework/ORM/auth/caching selection trees, see references/edge-deployment-guide.md and language-specific references. See assets/ for starter templates per language.
All mutating operations MUST support idempotency for retry safety.
Implementation:
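```typescript
// Idempotency key header (sketch: assumes an ioredis client and a Fastify-style reply)
const idempotencyKey = request.headers['idempotency-key'];
if (!idempotencyKey) return reply.code(400).send({ title: 'Idempotency-Key header is required' });
const key = `idem:${idempotencyKey}`;
// Claim the key atomically (SET ... NX) so two concurrent retries cannot both run the operation
const claimed = await redis.set(key, 'in-progress', 'EX', 86400, 'NX');
if (!claimed) {
  const cached = await redis.get(key);
  if (cached === null || cached === 'in-progress') return reply.code(409).send({ title: 'Request already in progress' });
  return JSON.parse(cached); // replay the stored response
}
const result = await processOperation().catch(async (err) => { await redis.del(key); throw err; }); // free the key for a retry
await redis.set(key, JSON.stringify(result), 'EX', 86400);
return result;
```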
| Do | Avoid |
|---|---|
| Store idempotency keys with TTL (24h typical) | Processing duplicate requests |
| Return cached response for duplicate keys | Different responses for same key |
| Use client-generated UUIDs | Server-generated keys |
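The table's rules leave two edge cases open: a retry that arrives while the first attempt is still running, and a client that reuses a key with a different body. A minimal in-memory sketch that handles both, assuming a SHA-256 fingerprint of the request body and the 409/422 status codes suggested by the IETF httpapi Idempotency-Key draft. The `Map` stands in for Redis; `IdempotencyStore` is a hypothetical name.

```typescript
import { createHash } from "node:crypto";

type Entry = {
  state: "in-progress" | "done";
  fingerprint: string;
  expiresAt: number;
  response?: unknown;
};

type Outcome =
  | { kind: "executed" | "replayed"; response: unknown }
  | { kind: "conflict"; status: 409 | 422; reason: string };

const TTL_MS = 24 * 60 * 60 * 1000; // 24h, as in the table above

// In-memory stand-in for Redis; production needs a store shared by all instances.
class IdempotencyStore {
  private readonly entries = new Map<string, Entry>();

  async run(key: string, body: unknown, op: () => Promise<unknown>, now = Date.now()): Promise<Outcome> {
    // JSON.stringify is key-order sensitive; canonicalize bodies in real code.
    const fingerprint = createHash("sha256").update(JSON.stringify(body ?? null)).digest("hex");
    const existing = this.entries.get(key);
    if (existing && existing.expiresAt > now) {
      if (existing.fingerprint !== fingerprint) {
        return { kind: "conflict", status: 422, reason: "key reused with a different payload" };
      }
      if (existing.state === "in-progress") {
        return { kind: "conflict", status: 409, reason: "original request still in progress" };
      }
      return { kind: "replayed", response: existing.response };
    }
    // Claim the key before any await, so a concurrent duplicate sees "in-progress".
    this.entries.set(key, { state: "in-progress", fingerprint, expiresAt: now + TTL_MS });
    try {
      const response = await op();
      this.entries.set(key, { state: "done", fingerprint, expiresAt: now + TTL_MS, response });
      return { kind: "executed", response };
    } catch (err) {
      this.entries.delete(key); // a failed attempt should not block a retry
      throw err;
    }
  }
}
```

Storing the in-progress marker before running the operation is what makes the concurrent case safe; the Redis version gets the same guarantee from `SET ... NX`.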
| Pattern | Use When | Example |
|---|---|---|
| Cursor-based | Large datasets, real-time data | ?cursor=abc123&limit=20 |
| Offset-based | Small datasets, random access | ?page=3&per_page=20 |
| Keyset | Sorted data, high performance | ?after_id=1000&limit=20 |
Prefer cursor-based pagination for APIs with frequent inserts.
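Cursor-based and keyset pagination usually meet in practice: the cursor is an opaque encoding of the last row's sort key. A minimal sketch over an in-memory array, assuming rows are ordered by `(createdAt DESC, id DESC)` with `id` as the tie-breaker; the comment shows the equivalent Postgres predicate. `page`, `encodeCursor`, and `decodeCursor` are hypothetical names.

```typescript
interface Row { id: number; createdAt: string } // createdAt as an ISO-8601 UTC string

// Opaque cursor: base64url-encoded JSON of the last row's sort key.
function encodeCursor(row: Row): string {
  return Buffer.from(JSON.stringify([row.createdAt, row.id])).toString("base64url");
}
function decodeCursor(cursor: string): [string, number] {
  return JSON.parse(Buffer.from(cursor, "base64url").toString("utf8")); // validate in real code
}

// Rows after the cursor in (createdAt DESC, id DESC) order. Postgres equivalent:
//   WHERE (created_at, id) < ($1, $2) ORDER BY created_at DESC, id DESC LIMIT $3
function page(rows: Row[], limit: number, cursor?: string): { items: Row[]; nextCursor: string | null } {
  const sorted = [...rows].sort((a, b) =>
    a.createdAt === b.createdAt ? b.id - a.id : a.createdAt < b.createdAt ? 1 : -1,
  );
  let start = 0;
  if (cursor) {
    const [ts, id] = decodeCursor(cursor);
    start = sorted.findIndex((r) => r.createdAt < ts || (r.createdAt === ts && r.id < id));
    if (start === -1) start = sorted.length;
  }
  const items = sorted.slice(start, start + limit);
  const hasMore = start + limit < sorted.length;
  return { items, nextCursor: hasMore ? encodeCursor(items[items.length - 1]) : null };
}
```

Because the cursor names a position in the sort order rather than a row count, a row inserted at the head between requests does not shift the next page, which is what goes wrong with `?page=N` offsets.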
Use a consistent machine-readable error format (RFC 9457 Problem Details): https://www.rfc-editor.org/rfc/rfc9457
```json
{
  "type": "https://example.com/problems/invalid-request",
  "title": "Invalid request",
  "status": 400,
  "detail": "email is required",
  "instance": "/v1/users"
}
```
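A small helper keeps every error response in the same shape. This is a sketch, not a library API: `problem()` is a hypothetical name, and it relies on RFC 9457's rule that an omitted `type` means `about:blank`.

```typescript
// Members defined by RFC 9457; extension members may sit alongside them.
interface ProblemDetails {
  type: string;
  title: string;
  status: number;
  detail?: string;
  instance?: string;
  [extension: string]: unknown;
}

interface ProblemOptions {
  type?: string;
  detail?: string;
  instance?: string;
}

// Hypothetical helper: returns the status, headers, and serialized body to send.
function problem(status: number, title: string, opts: ProblemOptions = {}) {
  const body: ProblemDetails = {
    type: opts.type ?? "about:blank", // RFC 9457: an absent type means "about:blank"
    title,
    status,
    ...(opts.detail !== undefined ? { detail: opts.detail } : {}),
    ...(opts.instance !== undefined ? { instance: opts.instance } : {}),
  };
  return {
    status,
    headers: { "content-type": "application/problem+json" },
    body: JSON.stringify(body),
  };
}
```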
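```typescript
// Liveness: is the process running? Keep it free of dependency checks.
app.get('/health/live', (req, res) => {
  res.status(200).json({ status: 'ok' });
});
// Readiness: can the service handle traffic? checkDatabase/checkRedis are app-specific
// helpers resolving to true/false; a rejected check counts as down instead of crashing the route.
app.get('/health/ready', async (req, res) => {
  const [db, cache] = await Promise.allSettled([checkDatabase(), checkRedis()]);
  const dbOk = db.status === 'fulfilled' && db.value === true;
  const cacheOk = cache.status === 'fulfilled' && cache.value === true;
  const ready = dbOk && cacheOk;
  res.status(ready ? 200 : 503).json({
    status: ready ? 'ready' : 'not ready',
    db: dbOk ? 'ok' : 'down',
    cache: cacheOk ? 'ok' : 'down',
  });
});
```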
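A readiness probe that awaits a hung database will itself hang, which delays the signal until the orchestrator's own probe timeout fires. A minimal sketch that bounds each check, assuming every dependency check is an async function returning a boolean; `runCheck`, `readiness`, and the 1 s default are assumptions, not part of this skill.

```typescript
type Check = () => Promise<boolean>;
type CheckResult = "ok" | "down" | "timeout";

// Settle a single check within `ms`; errors count as "down", silence as "timeout".
async function runCheck(check: Check, ms: number): Promise<CheckResult> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timedOut = new Promise<CheckResult>((resolve) => {
    timer = setTimeout(() => resolve("timeout"), ms);
  });
  // Promise.resolve().then(check) also catches checks that throw synchronously.
  const outcome = Promise.resolve()
    .then(check)
    .then(
      (ok): CheckResult => (ok ? "ok" : "down"),
      (): CheckResult => "down",
    );
  try {
    return await Promise.race([outcome, timedOut]);
  } finally {
    clearTimeout(timer); // don't keep the event loop alive after the race settles
  }
}

// Run all checks concurrently; ready only if every dependency reports "ok".
async function readiness(checks: Record<string, Check>, ms = 1000) {
  const names = Object.keys(checks);
  const results = await Promise.all(names.map((name) => runCheck(checks[name], ms)));
  const body: Record<string, string> = { status: results.every((r) => r === "ok") ? "ready" : "not ready" };
  names.forEach((name, i) => { body[name] = results[i]; });
  return { httpStatus: body.status === "ready" ? 200 : 503, body };
}
```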
| Avoid | Instead | Why |
|---|---|---|
| N+1 queries | include/select or DataLoader | 10-100x perf hit; easy to miss in ORM code |
| No request timeouts | Timeouts on HTTP clients, DB, handlers | Hung deps cascade; see Production Hardening below |
| Missing connection pooling | Prisma pool / PgBouncer / pgx pool | Exhaustion under load on shared DB tiers |
| Catching errors silently | Log + rethrow or handle explicitly | Hidden failures, impossible to debug |
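The DataLoader fix for N+1 works by coalescing per-row lookups into one batched query. A hand-rolled sketch of the idea (not the `dataloader` package's API; `BatchLoader` is a hypothetical name): loads issued in the same synchronous pass are queued and flushed once, so N lookups become one call that could run `WHERE id = ANY($1)`.

```typescript
type Pending<K, V> = {
  key: K;
  resolve: (value: V | undefined) => void;
  reject: (error: unknown) => void;
};

// Coalesces load() calls made in the same synchronous pass into one batchFn call.
class BatchLoader<K, V> {
  private queue: Pending<K, V>[] = [];
  private readonly batchFn: (keys: K[]) => Promise<Map<K, V>>;

  constructor(batchFn: (keys: K[]) => Promise<Map<K, V>>) {
    this.batchFn = batchFn;
  }

  load(key: K): Promise<V | undefined> {
    return new Promise<V | undefined>((resolve, reject) => {
      this.queue.push({ key, resolve, reject });
      // The first key in this pass schedules the flush; later keys just join the queue.
      if (this.queue.length === 1) void Promise.resolve().then(() => this.flush());
    });
  }

  private async flush(): Promise<void> {
    const batch = this.queue;
    this.queue = [];
    try {
      const keys = Array.from(new Set(batch.map((p) => p.key))); // de-duplicate repeated ids
      const found = await this.batchFn(keys);
      for (const p of batch) p.resolve(found.get(p.key)); // missing keys resolve to undefined
    } catch (error) {
      for (const p of batch) p.reject(error);
    }
  }
}
```

A loader like this should be created per request, so cached or batched results never leak between users.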
These are the patterns that separate "works in dev" from "survives production." Models tend to skip them unless explicitly prompted — add them to every service.
Every outbound call needs a timeout. Without one, a hung dependency leaks connections and cascades failures.
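```typescript
import Fastify from 'fastify';
// HTTP client timeout: abort the request after 5 s
const response = await fetch(url, { signal: AbortSignal.timeout(5000) });
// Database query timeout (Prisma + Postgres): SET LOCAL scopes the limit to this transaction,
// so it cannot leak onto other pooled connections
const users = await prisma.$transaction(async (tx) => {
  await tx.$executeRaw`SET LOCAL statement_timeout = '3000'`;
  return tx.user.findMany();
});
// Fastify: limit how long the server waits to receive a request (ms); this does not
// cap handler run time, which still needs its own timeout
const server = Fastify({ requestTimeout: 30000 });
```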
| Layer | Default Timeout | Rationale |
|---|---|---|
| HTTP client calls | 5s | External APIs shouldn't block you |
| Database queries | 3s | Slow queries = missing index or bad plan |
| Request handler | 30s | Safety net for the whole request lifecycle |
| Background jobs | 5min | Jobs that run longer need chunking |
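Per-layer defaults do not add up on their own: a handler making several sequential 5 s calls can still exceed its 30 s budget. One way to reconcile them, sketched under the assumption of a single request-scoped deadline object (a hypothetical `Deadline` class), is to give each call the smaller of its layer default and the time remaining.

```typescript
// Per-layer defaults from the table above (milliseconds).
const LAYER_TIMEOUT_MS = { http: 5_000, db: 3_000, handler: 30_000 } as const;
type Layer = keyof typeof LAYER_TIMEOUT_MS;

class Deadline {
  readonly expiresAt: number;
  private readonly now: () => number;

  // `now` is injectable so the budget can be tested with a fake clock.
  constructor(budgetMs: number = LAYER_TIMEOUT_MS.handler, now: () => number = Date.now) {
    this.now = now;
    this.expiresAt = now() + budgetMs;
  }

  remaining(): number {
    return Math.max(0, this.expiresAt - this.now());
  }

  // Timeout for the next call at this layer; throws once the request budget is spent.
  timeoutFor(layer: Layer): number {
    const left = this.remaining();
    if (left === 0) throw new Error("deadline exceeded");
    return Math.min(LAYER_TIMEOUT_MS[layer], left);
  }
}
```

A call site would then pass `AbortSignal.timeout(deadline.timeoutFor("http"))` to `fetch`, so late calls get progressively shorter timeouts instead of a fixed 5 s each.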
Avoid the ORM equivalent of SELECT *: ORMs default to fetching all columns. On wide tables this wastes bandwidth and hides performance problems.
```typescript
// BAD: fetches all 30 columns (plus every column of every post)
const wideUsers = await prisma.user.findMany({ include: { posts: true } });
// GOOD: fetch only what the endpoint needs. Prisma rejects select and include
// at the same level, so the relation is selected inside select.
const users = await prisma.user.findMany({
  select: {
    id: true, name: true, email: true,
    posts: { select: { id: true, title: true } },
  },
});
```
For Go (sqlc): write explicit column lists in SQL queries — sqlc enforces this naturally. For Python (SQLAlchemy): use load_only() or explicit column selection.
Return machine-readable errors from day one. Clients shouldn't have to regex-parse error messages.
```json
{
  "type": "https://api.example.com/problems/validation-error",
  "title": "Validation failed",
  "status": 422,
  "detail": "email must be a valid email address",
  "instance": "/v1/users",
  "errors": [{ "field": "email", "message": "invalid format" }]
}
```
Set Content-Type: application/problem+json. This format is a standard (RFC 9457) and parseable by any HTTP client.
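Validation libraries report failures in their own shapes, so it helps to map them into the `errors` extension member in one place at the boundary. A sketch assuming a generic `{ path, message }` issue shape (Zod issues carry `path` and `message`, but `validationProblem` itself is hypothetical); the type URI mirrors the example above.

```typescript
// Assumed issue shape; adapt from your validator's error type.
interface FieldIssue { path: Array<string | number>; message: string }

// Map boundary-validation failures to a 422 problem with an "errors" extension member.
function validationProblem(issues: FieldIssue[], instance: string) {
  const body = {
    type: "https://api.example.com/problems/validation-error",
    title: "Validation failed",
    status: 422,
    // Human-readable summary; clients should read `errors`, not parse this string.
    detail: issues.map((i) => `${i.path.join(".") || "(root)"}: ${i.message}`).join("; "),
    instance,
    errors: issues.map((i) => ({ field: i.path.join("."), message: i.message })),
  };
  return {
    status: body.status,
    headers: { "content-type": "application/problem+json" },
    body: JSON.stringify(body),
  };
}
```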
Before shipping any new query to production, verify its execution plan:
```sql
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT ... FROM ... WHERE ...;
```
Red flags in the output: Seq Scan on large tables, Nested Loop with high row estimates, Sort without index. Add indexes or rewrite the query before deploying.
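The same red flags can be checked mechanically, for example in a CI job that runs `EXPLAIN (ANALYZE, FORMAT JSON)` against a staging database. A sketch that walks the JSON plan tree; the field names are the ones Postgres emits in JSON format, while the 10,000-row threshold and counting filtered-out rows for Seq Scans are assumptions to tune per table. Without ANALYZE only estimates are available, so a selective Seq Scan may go unflagged.

```typescript
// Subset of a plan node as emitted by EXPLAIN (FORMAT JSON); "Actual Rows" and
// "Rows Removed by Filter" only appear with ANALYZE.
interface PlanNode {
  "Node Type": string;
  "Relation Name"?: string;
  "Plan Rows": number;
  "Actual Rows"?: number;
  "Rows Removed by Filter"?: number;
  Plans?: PlanNode[];
}

// Walk the plan tree and collect the red flags listed above.
// Note: "Actual Rows" is a per-loop figure; this sketch ignores "Actual Loops".
function planRedFlags(root: PlanNode, rowThreshold = 10_000): string[] {
  const flags: string[] = [];
  const visit = (node: PlanNode): void => {
    const rows = node["Actual Rows"] ?? node["Plan Rows"];
    switch (node["Node Type"]) {
      case "Seq Scan": {
        // A selective filter hides a big scan, so count rows the filter discarded too.
        const scanned = rows + (node["Rows Removed by Filter"] ?? 0);
        if (scanned >= rowThreshold) flags.push(`Seq Scan on ${node["Relation Name"] ?? "?"} (${scanned} rows scanned)`);
        break;
      }
      case "Nested Loop":
        if (rows >= rowThreshold) flags.push(`Nested Loop producing ${rows} rows`);
        break;
      case "Sort":
        if (rows >= rowThreshold) flags.push(`Sort of ${rows} rows without a supporting index`);
        break;
    }
    node.Plans?.forEach(visit);
  };
  visit(root);
  return flags;
}
```

Postgres returns the JSON plan as a one-element array, so the root node to pass in is `JSON.parse(output)[0].Plan`.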
When a service is slow, work through these layers in order. Fix the cheapest layer first — don't add caching before fixing N+1 queries.
| Step | What to Check | Fix |
|---|---|---|
| 1. Query analysis | Enable query logging, find N+1s and slow queries | Rewrite with include/joins, add select for field-level optimization |
| 2. Indexing | Run EXPLAIN ANALYZE on slow queries | Add composite indexes matching WHERE + ORDER BY patterns |
| 3. Connection pooling | Check connection count vs. pool size | Configure pool limits (Prisma connection_limit, PgBouncer, pgx pool) |
| 4. Caching | Identify read-heavy, rarely-changing data | Add Redis/in-memory cache with TTL + invalidation strategy |
| 5. Timeouts | Check for missing timeouts on DB, HTTP, handlers | Add timeouts at every layer (see Production Hardening above) |
| 6. Platform tuning | Shared DB limits, cold starts, memory | Upgrade tier, add read replicas, tune runtime settings |
Key principle: always measure before and after. Use structured logging with request IDs to trace specific slow requests end-to-end.
Backend architecture decisions directly impact cost and revenue. See references/infrastructure-economics.md for detailed cost modeling, SLA-to-revenue mapping, unit economics checklists, and FinOps practices.
Resources
Shared Utilities (Centralized patterns - extract, don't duplicate)
Templates
Related Skills
When users ask version-sensitive recommendation questions, do a quick freshness check before asserting "best" choices or quoting versions. Start from data/sources.json (official docs, release notes, support policies).
Weekly Installs: 105
GitHub Stars: 46
First Seen: Jan 23, 2026
Security Audits: Gen Agent Trust Hub (Pass), Socket (Pass), Snyk (Warn)
Installed on: opencode (86), codex (83), gemini-cli (82), cursor (80), github-copilot (78), amp (69)