ai-regression-testing by affaan-m/everything-claude-code
```shell
npx skills add https://github.com/affaan-m/everything-claude-code --skill ai-regression-testing
```
Testing patterns specifically designed for AI-assisted development, where the same model writes code and reviews it — creating systematic blind spots that only automated tests can catch.
Run /bug-check (or a similar review command) after code changes.

When an AI writes code and then reviews its own work, it carries the same assumptions into both steps. This creates a predictable failure pattern:
AI writes fix → AI reviews fix → AI says "looks correct" → Bug still exists
Real-world example (observed in production):
```text
Fix 1: Added notification_settings to API response
  → Forgot to add it to the SELECT query
  → AI reviewed and missed it (same blind spot)

Fix 2: Added it to SELECT query
  → TypeScript build error (column not in generated types)
  → AI reviewed Fix 1 but didn't catch the SELECT issue

Fix 3: Changed to SELECT *
  → Fixed production path, forgot sandbox path
  → AI reviewed and missed it AGAIN (4th occurrence)

Fix 4: Test caught it instantly on first run ✅
```
The pattern: sandbox/production path inconsistency is the #1 AI-introduced regression.
Most projects with AI-friendly architecture have a sandbox/mock mode. This is the key to fast, DB-free API testing.
```typescript
// vitest.config.ts
import { defineConfig } from "vitest/config";
import path from "path";

export default defineConfig({
  test: {
    environment: "node",
    globals: true,
    include: ["__tests__/**/*.test.ts"],
    setupFiles: ["__tests__/setup.ts"],
  },
  resolve: {
    alias: {
      "@": path.resolve(__dirname, "."),
    },
  },
});
```

```typescript
// __tests__/setup.ts
// Force sandbox mode — no database needed
process.env.SANDBOX_MODE = "true";
process.env.NEXT_PUBLIC_SUPABASE_URL = "";
process.env.NEXT_PUBLIC_SUPABASE_ANON_KEY = "";
```
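A minimal sketch of the `isSandboxMode()` helper that the route handlers in the later examples are assumed to call (the function name matches those examples; your project's actual implementation may differ):

```typescript
// lib/sandbox.ts (hypothetical location)
// Route handlers branch on this flag. The test setup file forces it to
// "true", so the whole suite exercises the sandbox path without a database.
export function isSandboxMode(): boolean {
  return process.env.SANDBOX_MODE === "true";
}
```

Because the flag is read at call time rather than cached at import time, flipping the environment variable in `setup.ts` takes effect for every request the tests make.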
```typescript
// __tests__/helpers.ts
import { NextRequest } from "next/server";

export function createTestRequest(
  url: string,
  options?: {
    method?: string;
    body?: Record<string, unknown>;
    headers?: Record<string, string>;
    sandboxUserId?: string;
  },
): NextRequest {
  const { method = "GET", body, headers = {}, sandboxUserId } = options || {};
  const fullUrl = url.startsWith("http") ? url : `http://localhost:3000${url}`;
  const reqHeaders: Record<string, string> = { ...headers };
  if (sandboxUserId) {
    reqHeaders["x-sandbox-user-id"] = sandboxUserId;
  }
  const init: { method: string; headers: Record<string, string>; body?: string } = {
    method,
    headers: reqHeaders,
  };
  if (body) {
    init.body = JSON.stringify(body);
    reqHeaders["content-type"] = "application/json";
  }
  return new NextRequest(fullUrl, init);
}

export async function parseResponse(response: Response) {
  const json = await response.json();
  return { status: response.status, json };
}
```
The key principle: write tests for bugs that were found, not for code that works.
```typescript
// __tests__/api/user/profile.test.ts
import { describe, it, expect } from "vitest";
import { createTestRequest, parseResponse } from "../../helpers";
import { GET, PATCH } from "@/app/api/user/profile/route";

// Define the contract — what fields MUST be in the response
const REQUIRED_FIELDS = [
  "id",
  "email",
  "full_name",
  "phone",
  "role",
  "created_at",
  "avatar_url",
  "notification_settings", // ← Added after a bug found it missing
];

describe("GET /api/user/profile", () => {
  it("returns all required fields", async () => {
    const req = createTestRequest("/api/user/profile");
    const res = await GET(req);
    const { status, json } = await parseResponse(res);
    expect(status).toBe(200);
    for (const field of REQUIRED_FIELDS) {
      expect(json.data).toHaveProperty(field);
    }
  });

  // Regression test — this exact bug was introduced by AI 4 times
  it("notification_settings is not undefined (BUG-R1 regression)", async () => {
    const req = createTestRequest("/api/user/profile");
    const res = await GET(req);
    const { json } = await parseResponse(res);
    expect("notification_settings" in json.data).toBe(true);
    const ns = json.data.notification_settings;
    expect(ns === null || typeof ns === "object").toBe(true);
  });
});
```
The most common AI regression: fixing production path but forgetting sandbox path (or vice versa).
```typescript
// __tests__/api/user/messages.test.ts
// Imports assumed; the route path follows the app-router convention
// used in the profile test above.
import { describe, it, expect } from "vitest";
import { createTestRequest, parseResponse } from "../../helpers";
import { GET } from "@/app/api/user/messages/route";

// Test that sandbox responses match the expected contract
describe("GET /api/user/messages (conversation list)", () => {
  it("includes partner_name in sandbox mode", async () => {
    const req = createTestRequest("/api/user/messages", {
      sandboxUserId: "user-001",
    });
    const res = await GET(req);
    const { json } = await parseResponse(res);
    // This caught a bug where partner_name was added
    // to production path but not sandbox path
    if (json.data.length > 0) {
      for (const conv of json.data) {
        expect("partner_name" in conv).toBe(true);
      }
    }
  });
});
```
```markdown
<!-- .claude/commands/bug-check.md -->
# Bug Check

## Step 1: Automated Tests (mandatory, cannot skip)

Run these commands FIRST before any code review:

    npm run test    # Vitest test suite
    npm run build   # TypeScript type check + build

- If tests fail → report as highest priority bug
- If build fails → report type errors as highest priority
- Only proceed to Step 2 if both pass

## Step 2: Code Review (AI review)

1. Sandbox / production path consistency
2. API response shape matches frontend expectations
3. SELECT clause completeness
4. Error handling with rollback
5. Optimistic update race conditions

## Step 3: For each bug fixed, propose a regression test
```
```text
User: "バグチェックして" ("run a bug check", or "/bug-check")
│
├─ Step 1: npm run test
│    ├─ FAIL → Bug found mechanically (no AI judgment needed)
│    └─ PASS → Continue
│
├─ Step 2: npm run build
│    ├─ FAIL → Type error found mechanically
│    └─ PASS → Continue
│
├─ Step 3: AI code review (with known blind spots in mind)
│    └─ Findings reported
│
└─ Step 4: For each fix, write a regression test
     └─ The next bug-check catches it if the fix breaks
```
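The two commands the checklist relies on map onto ordinary npm scripts; a minimal `package.json` wiring (script bodies assumed, adjust to your stack) might look like:

```json
{
  "scripts": {
    "test": "vitest run",
    "build": "next build"
  }
}
```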
Frequency: Most common (observed in 3 out of 4 regressions)
```typescript
// ❌ AI adds field to production path only
if (isSandboxMode()) {
  return { data: { id, email, name } }; // Missing new field
}
// Production path
return { data: { id, email, name, notification_settings } };
```

```typescript
// ✅ Both paths must return the same shape
if (isSandboxMode()) {
  return { data: { id, email, name, notification_settings: null } };
}
return { data: { id, email, name, notification_settings } };
```
Test to catch it:
```typescript
it("sandbox and production return same fields", async () => {
  // In the test env, sandbox mode is forced ON
  const res = await GET(createTestRequest("/api/user/profile"));
  const { json } = await parseResponse(res);
  for (const field of REQUIRED_FIELDS) {
    expect(json.data).toHaveProperty(field);
  }
});
```
Frequency: Common with Supabase/Prisma when adding new columns
```typescript
// ❌ New column added to the response but not to the SELECT
const { data } = await supabase
  .from("users")
  .select("id, email, name") // notification_settings not here
  .single();
return { data: { ...data, notification_settings: data.notification_settings } };
// → notification_settings is always undefined
```

```typescript
// ✅ Use SELECT * or explicitly include new columns
const { data } = await supabase
  .from("users")
  .select("*")
  .single();
```
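Note that `toHaveProperty` alone will not catch the spread-with-undefined case above: the spread creates the key even when the column was never selected, so the key exists with the value `undefined`. A small helper (hypothetical, not part of the original suite) can flag keys that are present but undefined:

```typescript
// Returns the names of fields that are missing OR present-but-undefined —
// the exact symptom of a column left out of the SELECT clause.
function undefinedFields(
  obj: Record<string, unknown>,
  required: string[],
): string[] {
  return required.filter((f) => obj[f] === undefined);
}

// Simulates the buggy SELECT: notification_settings was never fetched,
// so the spread produces a key whose value is undefined.
const row = { id: "u1", email: "a@b.c", name: "A" } as Record<string, unknown>;
const response = { ...row, notification_settings: row["notification_settings"] };

console.log(undefinedFields(response, ["id", "email", "notification_settings"]));
// → ["notification_settings"]
```

Asserting `undefinedFields(json.data, REQUIRED_FIELDS)` is empty is a stricter check than `toHaveProperty` for this class of bug.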
Frequency: Moderate (when adding error handling to existing components)
```typescript
// ❌ Error state set but old data not cleared
catch (err) {
  setError("Failed to load");
  // reservations still shows data from the previous tab!
}
```

```typescript
// ✅ Clear related state on error
catch (err) {
  setReservations([]); // Clear stale data
  setError("Failed to load");
}
```
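This discipline can be tested without mounting a component. A sketch (names hypothetical) that models the fetch-and-clear logic as a plain state transition:

```typescript
// Models the state transition of a tab switch: on failure the stale list
// must be cleared, not just accompanied by an error message.
interface ListState {
  reservations: string[];
  error: string | null;
}

function applyFetchResult(
  prev: ListState,
  result: { ok: true; data: string[] } | { ok: false },
): ListState {
  if (result.ok) {
    return { reservations: result.data, error: null };
  }
  // Clear stale data AND set the error, mirroring the ✅ pattern above.
  return { reservations: [], error: "Failed to load" };
}

const stale: ListState = { reservations: ["tab-A-item"], error: null };
const after = applyFetchResult(stale, { ok: false });
console.log(after); // → { reservations: [], error: "Failed to load" }
```

A regression test then asserts that the failure branch returns an empty list, which is exactly the check the AI review kept missing.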
```typescript
// ❌ No rollback on failure
const handleRemove = async (id: string) => {
  setItems(prev => prev.filter(i => i.id !== id));
  await fetch(`/api/items/${id}`, { method: "DELETE" });
  // If the API call fails, the item is gone from the UI but still in the DB
};
```

```typescript
// ✅ Capture previous state and roll back on failure
const handleRemove = async (id: string) => {
  const prevItems = [...items];
  setItems(prev => prev.filter(i => i.id !== id));
  try {
    const res = await fetch(`/api/items/${id}`, { method: "DELETE" });
    if (!res.ok) throw new Error("API error");
  } catch {
    setItems(prevItems); // Rollback
    alert("削除に失敗しました"); // "Deletion failed"
  }
};
```
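The rollback behaviour is also testable in isolation. A minimal sketch (function names hypothetical) that factors the optimistic update out of the component so it can run without React:

```typescript
interface Item { id: string }

// Optimistically removes an item, restoring the previous list if the
// delete call rejects or returns a non-ok response.
async function optimisticRemove(
  items: Item[],
  id: string,
  deleteFn: (id: string) => Promise<{ ok: boolean }>,
): Promise<Item[]> {
  const prev = [...items];
  const next = items.filter((i) => i.id !== id);
  try {
    const res = await deleteFn(id);
    if (!res.ok) throw new Error("API error");
    return next;
  } catch {
    return prev; // rollback
  }
}

// Failure path: the list is restored after rollback.
const items: Item[] = [{ id: "a" }, { id: "b" }];
optimisticRemove(items, "a", async () => ({ ok: false })).then((result) =>
  console.log(result.length), // → 2
);
```

Injecting `deleteFn` lets a test simulate the failing API call directly, instead of intercepting `fetch`.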
Don't aim for 100% coverage. Instead:
- Bug found in /api/user/profile → write a test for the profile API
- Bug found in /api/user/messages → write a test for the messages API
- Bug found in /api/user/favorites → write a test for the favorites API
- No bug found in /api/user/notifications → don't write a test (yet)
Why this works with AI development:
| AI Regression Pattern | Test Strategy | Priority |
|---|---|---|
| Sandbox/production mismatch | Assert same response shape in sandbox mode | 🔴 High |
| SELECT clause omission | Assert all required fields in response | 🔴 High |
| Error state leakage | Assert state cleanup on error | 🟡 Medium |
| Missing rollback | Assert state restored on API failure | 🟡 Medium |
| Type cast masking null | Assert field is not undefined | 🟡 Medium |
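The last row of the table has no worked example above. The pattern, sketched with hypothetical names, is a type cast that silences the compiler where the value can legitimately be `undefined`:

```typescript
interface Profile {
  notification_settings: { email: boolean } | null;
}

// ❌ A cast like this compiles even when the column was never selected,
// so `undefined` flows through typed as `{ email: boolean } | null`.
const raw = {} as Record<string, unknown>;
const profile = raw as unknown as Profile;

// The assertion a test should make: the field may be null, never undefined.
console.log(profile.notification_settings === undefined); // → true (the bug)
```

Because the cast erases the compiler's ability to catch the mismatch, only a runtime assertion (`expect(field).not.toBe(undefined)`) closes this gap.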
- Weekly Installs: 332
- Repository: affaan-m/everything-claude-code
- GitHub Stars: 102.1K
- First Seen: 8 days ago
- Security Audits: Gen Agent Trust Hub: Pass, Socket: Pass, Snyk: Pass
- Installed on: codex (317), cursor (284), github-copilot (281), gemini-cli (281), opencode (281), cline (281)