⚠️

重要前提

安装AI Skills的关键前提是：必须科学上网，且开启TUN模式，这一点至关重要，直接决定安装能否顺利完成，在此郑重提醒三遍：科学上网，科学上网，科学上网。查看完整安装教程 →

reduce-unoptimized-query-oracle

reduce-unoptimized-query-oracle by cockroachdb/cockroach

58 周安装量

32,000 GitHub Stars

GitHub

安装命令

npx skills add https://github.com/cockroachdb/cockroach --skill reduce-unoptimized-query-oracle

🇨🇳中文介绍

缩减未优化查询预言测试失败

将未优化查询预言测试失败日志缩减至最简单的复现案例。

未优化查询预言 roachtest 运行一系列随机 SQL 语句以创建随机数据集，然后以不同的优化设置执行两次随机"关注查询"。如果两次执行返回不同结果，则表明 CockroachDB 中存在错误。

使用时机

在以下情况使用此技能：

你遇到了来自未优化查询预言 roachtest 的测试失败。
你需要找到能复现测试失败的最小化 SQL。

步骤 1：定位产物文件

询问用户产物目录的位置。

在产物目录中找到相关文件：

测试参数 : params.log（来自 roachtest 的参数）
测试日志 : test.log（来自 roachtest 的日志）
失败日志 : failure*.log（来自 roachtest 的失败日志）
完整 SQL 日志 : unoptimized-query-oracle*.log（导致失败的 SQL 语句）
关注查询日志 : unoptimized-query-oracle*.failure.log（包含关注查询以及可能的更多失败信息）

广告位招租

在这里展示您的产品或服务

触达数万 AI 开发者，精准高效

联系我们

步骤 2：确定测试配置

从 cockroach.log 确定 git 提交：

grep "binary: CockroachDB" cockroach.log

在版本字符串中查找提交哈希（例如 cb94db961b8f55e3473f279d98ae90f0eeb0adcb）。

通过检查以下内容确定是否启用了运行时断言：

params.log 中的 "runtimeAssertionsBuild": "true"
或 test.log 中的 Runtime assertions enabled

通过查找以下内容确定是否应用了 metamorphic 设置：

params.log 中的类似行：

"metamorphicBufferedSender": "true",
"metamorphicWriteBuffering": "true",

或 test.log 中的类似行：

metamorphically setting "kv.rangefeed.buffered_sender.enabled" to 'true'
metamorphically setting "kv.transaction.write_buffering.enabled" to 'true'

从 cockroach.log 开头确定环境变量：

grep -A10 "using local environment variables:" cockroach.log

重要的环境变量包括：

COCKROACH_INTERNAL_CHECK_CONSISTENCY_FATAL
COCKROACH_INTERNAL_DISABLE_METAMORPHIC_TESTING
COCKROACH_RANDOM_SEED
COCKROACH_TESTING_FORCE_RELEASE_BRANCH 但可能还有其他重要环境变量，因此最好获取所有变量。

通过检查以下内容确定这是多区域测试还是单区域测试：

测试名称（例如 test.log 中的 seed-multi-region 表示多区域）
或完整 SQL 日志中是否存在 \connect 行如果两者都缺失，则为单区域测试。

步骤 3：检出并构建

对于普通构建，使用：

git checkout <commit-hash>
./dev build short

如果启用了运行时断言，则使用测试构建：

git checkout <commit-hash>
./dev build short -- --crdb_test

注意： 仅当复现使用地理空间函数（BOX2D、geometry、geography 等）时才构建 libgeos：

./dev build libgeos

步骤 4：准备完整 SQL 日志文件

首先，检查完整 SQL 日志文件顶部是否存在以下语句。如果不存在，则添加它们：

SET statement_timeout='1m0s';
SET sql_safe_updates = false;

如果使用了 metamorphic 设置，也将其添加到完整 SQL 日志文件的顶部：

SET CLUSTER SETTING kv.rangefeed.buffered_sender.enabled = true;
SET CLUSTER SETTING kv.transaction.write_buffering.enabled = true;

在产物目录或仓库根目录中创建一个合适的目录来存放临时文件。

步骤 5：初始复现

根据测试类型确定正确的 demo 命令：

多区域测试 : 使用 --nodes=9
单区域测试 : 省略 --nodes 选项

使用类似以下命令尝试从完整 SQL 日志文件复现测试失败。此命令可能需要长达 20 分钟才能完成。

<env vars> ./cockroach demo --multitenant=false --nodes=9 --insecure --set=errexit=false --no-example-database --format=tsv -f <full-sql-log-file>

检查输出是否复现了失败日志中描述的测试失败。 存在许多可能的失败模式。查找以下一种，它应与失败日志匹配：

关注查询两次执行之间的结果不同（这是日志末尾附近重复两次的随机生成的 SELECT 语句，包裹在各种 SET 和 RESET 语句中）。这些不同的结果可能表现为不同的结果集，也可能是一种情况出现错误而另一种情况没有错误。这是**"预言"失败**。
或者，internal error 或断言失败。记下错误信息以供缩减步骤使用。
或者，panic。记下错误信息以供缩减步骤使用。
或者，超时。记下超时的语句。

重要提示： 许多失败是非确定性的，尤其是对于多区域测试。如果第一次运行没有发生失败，请尝试最多 10 次，然后才得出结论认为无法复现。

此时，将输出与 failure*.log 进行比较会很有帮助，后者应显示原始测试运行中的失败情况。

如果初始运行尝试 10 次后仍无法复现，请在此处暂停并向用户报告无法复现失败，并显示已尝试的命令。 用户可能有其他指示。

如果看起来可以复现，则可以继续下一步。

步骤 6：使用缩减工具

构建缩减工具：

./dev build reduce

再次准备完整 SQL 日志文件

对于多区域测试，移除 \connect 行（它们会导致 reduce 工具出现语法错误）：

grep -v '^\\connect' <full-sql-log-file> > <cleaned-log>

重要提示： 缩减工具必须从 cockroach 仓库根目录运行，因为它会在当前目录中查找 ./cockroach。

对于多区域测试使用 -multi-region 选项，对于单区域测试则省略。

对于"预言"失败（结果不同）：

./bin/reduce -unoptimized-query-oracle -multi-region -chunk 25 -v -file <cleaned-log> 2>&1 | tee reduce-output.log

-unoptimized-query-oracle 选项检查"关注查询"的两次执行是否产生相同结果。

对于内部错误/断言失败/panic：

./bin/reduce -contains "<error-regex>" -multi-region -chunk 25 -v -file <cleaned-log> 2>&1 | tee reduce-output.log

使用错误信息的独特部分作为 -contains 正则表达式（例如 "nil LeafTxnInputState"）。

缩减工具可能需要长达一小时才能运行。

提取缩减后的 SQL

缩减工具输出进度行，然后是最终的 SQL。仅提取 SQL：

grep -A1000 "^reduction: " reduce-output.log | tail -n +2 > reduced.sql

重要提示： 在手动简化之前立即保存缩减输出的备份：

cp reduced.sql reduced_original.sql

这提供了一个恢复点，以防工作文件在简化过程中损坏。

如果缩减工具无法复现，请在此处暂停并向用户报告。他们可能有其他指示。 偶尔，如果测试失败无法复现，我们必须修改缩减工具本身。

步骤 7：创建测试脚本并确定复现率

重要提示： 许多错误是非确定性的。在手动简化之前，创建一个可重用的测试脚本并确定复现率。

创建一个小型测试脚本（根据需要调整）：

cat > test_repro.sh << 'EOF'
#!/bin/bash
# 测试 reduced_v2.sql 是否能复现错误（首次成功时退出，最多尝试 10 次）
for i in {1..10}; do
  if ./cockroach demo --multitenant=false --nodes=9 --insecure \
     --set=errexit=false --no-example-database --format=tsv \
     -f reduced_v2.sql 2>&1 | grep -q "<error-pattern>"; then
    echo "Run $i: REPRODUCED"
    exit 0
  else
    echo "Run $i: no error"
  fi
done
echo "FAILED"
EOF
chmod +x test_repro.sh

对于"预言"失败，测试脚本可能需要隔离并比较"关注查询"两次执行的结果，而不是检查错误模式。

运行测试脚本以确定复现率。它并不总是 100%。

此速率决定了测试简化时需要多少次尝试：

100% 速率：单次尝试足够
50% 速率：通常 2-3 次尝试足够
10% 速率：需要约 10 次尝试才能确信
<5% 速率：可能需要 20+ 次尝试

请注意，在某些情况下，可能需要将以下设置添加回缩减后的文件中才能获得复现：

SET statement_timeout='1m0s';
SET sql_safe_updates = false;

如果缩减后的 SQL 在 10 次尝试后仍无法复现，请在此处暂停并向用户报告。他们可能有其他指示。

步骤 8：手动简化

现在迭代简化 SQL，同时保持复现能力。

关键提示： 对于非确定性失败，你必须根据复现率用足够的尝试次数测试每个简化。单次尝试失败并不意味着简化破坏了复现能力——可能只是非确定性。

每次简化的工作流程

复制 reduced.sql 到 reduced_v2.sql
对 reduced_v2.sql 做一个小更改
运行 ./test_repro.sh（它测试 reduced_v2.sql）
如果复现：将 reduced_v2.sql 复制到 reduced.sql，继续简化
如果在足够尝试后仍未复现：丢弃 reduced_v2.sql，尝试不同的更改（即回退）。

此工作流程避免了需要恢复文件——你始终在 reduced.sql 中保留最后一个工作版本。

重要提示： 将复制、编辑和测试作为单独的 bash 命令运行（不要用 && 链接）。这减少了权限检查的次数。

尝试移除的内容（大致顺序）

查询投影和聚合 - 将 SELECT 列表简化为仅必要的列
查询谓词 - 简化 WHERE 子句
索引 - 尝试移除二级索引
查询连接 - 简化 WHERE 子句
CREATE TABLE 中的列 - 移除失败查询中未引用的列
奇怪字符 - 移除或替换名称和数据中的非 ASCII 字符
其他 SQL 简化

对于"预言"失败，编辑关注查询时，请确保编辑关注查询的两个副本，使它们完全相同。 否则，在比较结果集时就不是公平比较。

常见的必需元素

这些通常无法移除：

优化器随机种子 : SET testing_optimizer_random_seed = <value> - 这个特定值通常不能更改，因为它决定了哪些优化器规则被禁用
优化器规则概率 : SET testing_optimizer_disable_rule_probability
- 影响查询计划选择
优化器设置的特定 RESET/SET 序列，例如 distsql 和 vectorize
某些索引（影响查询计划）
分布式查询错误的多节点设置（--nodes=9）（尽管先尝试单节点——它可能有效且更简单）
CREATE STATISTICS 语句（影响查询规划）

如果一个更改破坏了复现能力：

丢弃 reduced_v2.sql（不要复制到 reduced.sql）
验证 reduced.sql 是否仍能复现。如果不能，这意味着复现是非确定性的。（它可能一开始就是非确定性的，或者在简化过程中变得非确定性。）尝试复现 10 次并记下新的复现率。使用新的复现率来调整后续每个简化步骤中的复现尝试次数。
尝试不同的简化

永远不要从破坏的状态继续简化。

如果你卡住了（即回退后再次无法复现），请停止并向用户报告你正在尝试的确切命令。

步骤 9：最终验证和输出

经过约 20 分钟的简化，或者回退几次后没有更多简化，就该停止了。

运行复现 10+ 次以确认稳定性并确定最终复现率
记录最小化复现步骤
注意哪些元素是必需的，哪些是可选的

最终输出应包括两个可以展示给用户的文件：

reduced.sql - 复现错误的最小化 SQL 脚本
bisect_run.sh - 用于 git bisect run 的脚本

以可以复制粘贴到终端的方式编写输出。

（此输出中的命令应编辑为匹配复现所需的内容。）

# 最小化复现

# reduced.sql
cat > reduced.sql << 'EOF'
CREATE TABLE t ();

SET testing_optimizer_random_seed = 1234567890;
SET testing_optimizer_disable_rule_probability = 0.5;

SELECT ...;
EOF

# bisect_run.sh
cat > bisect_run.sh << 'EOF'
#!/bin/bash
# Git bisect 运行脚本
# 退出代码：0=良好（错误不存在），1=不良（错误存在），125=跳过（构建失败）

REPO_DIR="/path/to/cockroach"
REPRO_SQL="/path/to/reduced.sql"

cd "$REPO_DIR" || exit 125

echo "=== Testing commit $(git rev-parse --short HEAD) ==="

# 构建（如果原始测试中启用了运行时断言，则使用 --crdb_test）
if ! ./dev build short -- --crdb_test 2>&1 | grep -q "Successfully built"; then
    echo "BUILD FAILED - skipping"
    exit 125
fi

# 测试错误（对于不稳定的错误尝试 3 次）
for i in {1..3}; do
    if ./cockroach demo --multitenant=false --insecure \
        --set=errexit=false --no-example-database --format=tsv \
        -f "$REPRO_SQL" 2>&1 | grep -q "<error-pattern>"; then
        echo "BUG PRESENT - marking as BAD"
        exit 1
    fi
done

echo "Bug not present - marking as GOOD"
exit 0
EOF
chmod +x bisect_run.sh

# 复现命令
git checkout <commit-hash>
./bisect_run.sh

# 二分查找命令
git bisect start ...
git bisect run bisect_run.sh

# 失败
# <在此处粘贴堆栈跟踪或相关失败详细信息>

# 复现率：约 X%（可能需要多次尝试）

显示此输出后，询问用户是否想在 master 分支上尝试复现错误。

可选步骤 10：检查错误是否已在 Master 分支修复

在二分查找之前，检查错误是否已在 master 分支修复。

git stash  # 如果需要
git checkout master
./dev build short -- --crdb_test
./cockroach demo --multitenant=false --insecure --set=errexit=false --no-example-database --format=tsv -f reduced.sql

运行几次以考虑不稳定性。注意错误在 master 分支是否复现。

可选步骤 11：二分查找

如果用户想找到引入或修复错误的提交，使用 git bisect。

如果错误已在 Master 分支修复

二分查找以找到修复提交（错误不再复现的第一个提交）。由于"良好"提交（master）比"不良"提交更新，使用自定义术语：

git bisect start --first-parent --term-old=broken --term-new=fixed
git bisect broken <commit-where-bug-exists>   # 例如，原始失败提交
git bisect fixed master                        # master 已修复

git bisect run ./bisect_run.sh

# 完成后
git bisect reset

注意： --first-parent 选项仅遵循主分支上的合并提交，避免进入功能分支。当错误不存在（已修复）时，二分脚本必须返回 0；当错误存在（损坏）时返回 1。

如果错误在 Master 分支仍然存在

二分查找以找到回归提交（引入错误的第一个提交）：

git bisect start --first-parent
git bisect good <known-good-commit>   # 例如，之前的发布标签
git bisect bad master                  # master 存在错误

git bisect run ./bisect_run.sh

# 完成后
git bisect reset

二分查找将识别引入或修复错误的提交。

如果你不知道一个良好提交（错误不存在的地方），可以跳回过去寻找一个。

# 在主分支上查找约 6 个月前的提交
git rev-list --first-parent -1 --before="6 months ago" HEAD

测试错误在该提交是否存在。如果不存在，将其用作二分查找的良好提交。如果错误仍然存在，尝试更早的时间，但不要超过 1 年。

如果在 1 年内找不到已知的良好提交，请停止并向用户报告。

🇺🇸English

Reduce Unoptimized Query Oracle Test Failure

Reduce an unoptimized-query-oracle test failure log to the simplest possible reproduction case.

The unoptimized-query-oracle roachtest runs a series of random SQL statements to create a random dataset, and then executes a random "Query of Interest" twice, with different optimization settings. If the two executions return different results, it indicates a bug in CockroachDB.

When to Use

Use this skill when:

You have a test failure from the unoptimized-query-oracle roachtest.
You need to find the minimal SQL to reproduce the test failure.

Step 1: Locate artifacts

Ask the user where the artifacts directory is.

Find the relevant files in the artifacts directory:

Test parameters : params.log (the parameters from the roachtest)
Test log : test.log (the log from the roachtest)
Failure log : failure*.log (the failure log from the roachtest)
Full SQL log : unoptimized-query-oracle*.log (the SQL statements that led to failure)
Query of interest log : unoptimized-query-oracle*.failure.log (containing the query of interest and possibly more information about the failure)
Cockroach log : logs/1.unredacted/cockroach.log or logs/unredacted/cockroach.log (contains the git commit)

Step 2: Determine test configuration

Determine the git commit from cockroach.log:

grep "binary: CockroachDB" cockroach.log

Look for the commit hash in the version string (e.g., cb94db961b8f55e3473f279d98ae90f0eeb0adcb).

Determine if runtime assertions are enabled by checking for:

"runtimeAssertionsBuild": "true" in params.log
or Runtime assertions enabled in test.log

Determine if metamorphic settings apply by looking for:

lines like these in params.log:

"metamorphicBufferedSender": "true",
"metamorphicWriteBuffering": "true",

or lines like these in test.log:

metamorphically setting "kv.rangefeed.buffered_sender.enabled" to 'true'
metamorphically setting "kv.transaction.write_buffering.enabled" to 'true'

Determine environment variables from the beginning of cockroach.log:

grep -A10 "using local environment variables:" cockroach.log

Important environment variables include:

COCKROACH_INTERNAL_CHECK_CONSISTENCY_FATAL
COCKROACH_INTERNAL_DISABLE_METAMORPHIC_TESTING
COCKROACH_RANDOM_SEED
COCKROACH_TESTING_FORCE_RELEASE_BRANCH But there might be more important environment variables, so best to get all of them.

Determine if this is a multi-region test or single-region test by checking:

the test name (e.g., seed-multi-region in test.log indicates multi-region)
or the presence of \connect lines in the full SQL log If both of these are missing, it's a single-region test.

Step 3: Check Out and Build

For a normal build use:

git checkout <commit-hash>
./dev build short

If runtime assertions were enabled, use a test build instead:

git checkout <commit-hash>
./dev build short -- --crdb_test

Note: Only build libgeos if the reproduction uses geospatial functions (BOX2D, geometry, geography, etc.):

./dev build libgeos

Step 4: Prepare the Full SQL Log File

First, check that the following statements are at the top of the full SQL log file. If they are not, add them:

SET statement_timeout='1m0s';
SET sql_safe_updates = false;

If metamorphic settings were used, also add them to the top of the full SQL log file:

SET CLUSTER SETTING kv.rangefeed.buffered_sender.enabled = true;
SET CLUSTER SETTING kv.transaction.write_buffering.enabled = true;

Create an appropriate directory either in the artifacts directory or in the repository root for holding temp files.

Step 5: Initial Reproduction

Determine the correct demo command based on test type:

Multi-region test : Use --nodes=9
Single-region test : Omit --nodes option

Use a command like this to try reproducing the test failure from the full SQL log file. This command could take up to 20 minutes to finish.

<env vars> ./cockroach demo --multitenant=false --nodes=9 --insecure --set=errexit=false --no-example-database --format=tsv -f <full-sql-log-file>

Check that the output reproduces the test failure described in the failure log. There are many possible failure modes. Look for one of the following, which should match the failure log:

Different results between the two executions of the "Query of Interest" (which is the randomly generated SELECT statement repeated twice near the end of the log, wrapped in various SET and RESET staements). These different results could take the form of different result sets, or could also be an error in one case and no error in the other case. This is an "oracle" failure.
Or, internal error or assertion failure. Note the error message for the reduce step.
Or, a panic. Note the error message for the reduce step.
Or, a timeout. Note the statement that timed out.

Troubleshooting

IMPORTANT: Many failures are nondeterministic, especially for multi-region tests. If no failure happens on the first run, try up to 10 times before concluding it doesn't reproduce.

It can be helpful at this point to compare the output with the failure*.log which should show the failure from the original test run.

If the initial run fails to reproduce after 10 times, pause here and report to the user that the failure cannot be reproduced, and show the command that was tried. The user might have additional instructions.

If it looks like it reproduces, it's time to move on to the next step.

Step 6: Use the Reduce Tool

Build the reduce tool:

./dev build reduce

Prepare the Full SQL Log File again

For multi-region tests, remove \connect lines (they cause syntax errors in the reduce tool):

grep -v '^\\connect' <full-sql-log-file> > <cleaned-log>

Run Reduce

IMPORTANT: The reduce tool must be run from the cockroach repository root directory, because it looks for ./cockroach in the current directory.

Use the -multi-region option for multi-region tests, or omit it for single-region tests.

For "oracle" failures (different results):

./bin/reduce -unoptimized-query-oracle -multi-region -chunk 25 -v -file <cleaned-log> 2>&1 | tee reduce-output.log

The -unoptimized-query-oracle option checks whether the two executions of the "Query of Interest" produce the same results.

For internal errors/assertion failures/panics:

./bin/reduce -contains "<error-regex>" -multi-region -chunk 25 -v -file <cleaned-log> 2>&1 | tee reduce-output.log

Use a distinctive part of the error message as the -contains regex (e.g., "nil LeafTxnInputState").

The reduce tool might take up to an hour to run.

Extract the Reduced SQL

The reduce tool outputs progress lines followed by the final SQL. Extract just the SQL:

grep -A1000 "^reduction: " reduce-output.log | tail -n +2 > reduced.sql

IMPORTANT: Immediately save a backup of the reduce output before manual simplification:

cp reduced.sql reduced_original.sql

This provides a recovery point if the working file gets corrupted during simplification.

If the reduce tool fails to reproduce, pause here and report this to the user. They might have additional instructions. Occasionally we have to modify the reduce tool itself, if the test failure is not reproducing.

Step 7: Create Test Script and Determine Reproduction Rate

IMPORTANT: Many bugs are nondeterministic. Before manual simplification, create a reusable test script and determine the reproduction rate.

Create a small test script (adjust as needed):

cat > test_repro.sh << 'EOF'
#!/bin/bash
# Test if reduced_v2.sql reproduces the error (exits on first success, up to 10 attempts)
for i in {1..10}; do
  if ./cockroach demo --multitenant=false --nodes=9 --insecure \
     --set=errexit=false --no-example-database --format=tsv \
     -f reduced_v2.sql 2>&1 | grep -q "<error-pattern>"; then
    echo "Run $i: REPRODUCED"
    exit 0
  else
    echo "Run $i: no error"
  fi
done
echo "FAILED"
EOF
chmod +x test_repro.sh

For "oracle" failures, instead of checking for an error pattern, the test script probably needs to isolate and diff the results of the two executions of the "Query of Interest".

Run the test script to determine the reproduction rate. It's not always 100%.

This rate determines how many attempts you need when testing simplifications:

100% rate: Single attempt sufficient
50% rate: 2-3 attempts usually sufficient
10% rate: Need ~10 attempts to be confident
<5% rate: May need 20+ attempts

Note that in some cases, the following settings might need to be added back to the reduced file to get a repro:

SET statement_timeout='1m0s';
SET sql_safe_updates = false;

If the reduced SQL fails to reproduce after 10 attempts, pause here and report this to the user. They might have additional instructions.

Step 8: Manual Simplification

Now iteratively simplify the SQL while maintaining reproduction.

CRITICAL: For nondeterministic failures, you MUST test each simplification with enough attempts based on the repro rate. A single failed attempt does NOT mean the simplification broke the repro - it may just be nondeterminism.

Workflow for Each Simplification

Copy reduced.sql to reduced_v2.sql
Make ONE small change to reduced_v2.sql
Run ./test_repro.sh (which tests reduced_v2.sql)
If it reproduces: Copy reduced_v2.sql to reduced.sql, continue simplifying
If it doesn't reproduce after enough attempts: Discard reduced_v2.sql, try a different change (i.e. backtrack).

This workflow avoids needing to restore files - you always keep the last working version in reduced.sql.

IMPORTANT: Run copy, edit, and test as separate bash commands (not chained with &&). This reduces the number of permission checks.

What to Try Removing (in rough order)

Query projections and aggregations - Simplify SELECT list to just essential columns
Query predicates - Simplify WHERE clause
Indexes - Try removing secondary indexes
Query joins - Simplify WHERE clause
Columns from CREATE TABLE - Remove columns not referenced in the failing query
Weird characters - Remove or replace non-ASCII characters from names and data
other SQL simplifications

For "oracle" failures, when editing the Query of Interest, be sure to edit BOTH copies of the Query of Interest so that they are identical. Otherwise it won't be an apples-to-apples comparison when diffing the result sets.

Common Required Elements

These often cannot be removed:

Optimizer random seed : SET testing_optimizer_random_seed = <value> - this specific value often cannot be changed, as it determines which optimizer rules are disabled
Optimizer rule probability : SET testing_optimizer_disable_rule_probability
- affects query plan selection
Specific RESET/SET sequences for optimizer settings, such as distsql and vectorize
Certain indexes (affect query plans)
Multi-node setup (--nodes=9) for distributed query bugs (though try single-node first - it may work and is simpler)
CREATE STATISTICS statements (affect query planning)

Backtracking

If a change breaks reproduction:

Discard reduced_v2.sql (don't copy it to reduced.sql)
Verify reduced.sql still reproduces. If it doesn't, this means the repro is nondeterministic. (It might have started out nondeterministic, or might have become nondeterministic over the course of simplification.) Try reproducing it 10 times and note the new repro rate. Use the new repro rate to adjust the number of repro attempts during each simplification step going forward.
Try a DIFFERENT simplification

Never continue simplifying from a broken state.

If you get stuck (i.e. cannot reproduce again after backtracking), stop and report to the user with the exact command you were trying.

Step 9: Final Verification and Output

After about 20 minutes of simplification, or if there are no more simplifications after backtracking a few times, it's time to stop.

Run reproduction 10+ times to confirm stability and determine final repro rate
Document the minimal reproduction steps
Note which elements were required vs optional

Output

The final output should include two files that can be shown to the user:

reduced.sql - The minimal SQL script that reproduces the bug
bisect_run.sh - A script for use with git bisect run

Write the output in such a way that it could be copied and pasted into a terminal.

Example Output Format

(The commands in this output should be edited to match what was necessary to reproduce.)

# Minimal Reproduction

# reduced.sql
cat > reduced.sql << 'EOF'
CREATE TABLE t ();

SET testing_optimizer_random_seed = 1234567890;
SET testing_optimizer_disable_rule_probability = 0.5;

SELECT ...;
EOF

# bisect_run.sh
cat > bisect_run.sh << 'EOF'
#!/bin/bash
# Git bisect run script
# Exit codes: 0=good (bug not present), 1=bad (bug present), 125=skip (build failed)

REPO_DIR="/path/to/cockroach"
REPRO_SQL="/path/to/reduced.sql"

cd "$REPO_DIR" || exit 125

echo "=== Testing commit $(git rev-parse --short HEAD) ==="

# Build (use --crdb_test if runtime assertions were enabled in the original test)
if ! ./dev build short -- --crdb_test 2>&1 | grep -q "Successfully built"; then
    echo "BUILD FAILED - skipping"
    exit 125
fi

# Test for bug (try 3 times for flaky bugs)
for i in {1..3}; do
    if ./cockroach demo --multitenant=false --insecure \
        --set=errexit=false --no-example-database --format=tsv \
        -f "$REPRO_SQL" 2>&1 | grep -q "<error-pattern>"; then
        echo "BUG PRESENT - marking as BAD"
        exit 1
    fi
done

echo "Bug not present - marking as GOOD"
exit 0
EOF
chmod +x bisect_run.sh

# Command to reproduce
git checkout <commit-hash>
./bisect_run.sh

# Command to bisect
git bisect start ...
git bisect run bisect_run.sh

# Failure
# <paste stacktrace or relevant failure details here>

# Repro rate: ~X% (may need multiple attempts)

After showing this output, ask the user if they want to try reproducing the bug on master branch.

Optional Step 10: Check if Bug is Fixed on Master

Before bisecting, check whether the bug has already been fixed on master.

git stash  # if needed
git checkout master
./dev build short -- --crdb_test
./cockroach demo --multitenant=false --insecure --set=errexit=false --no-example-database --format=tsv -f reduced.sql

Run this a few times to account for flakiness. Note whether the bug reproduces on master or not.

Optional Step 11: Bisect

If the user wants to find the commit that introduced or fixed the bug, use git bisect.

If the Bug is Already Fixed on Master

Bisect to find the fix commit (the first commit where the bug no longer reproduces). Use custom terms since the "good" commit (master) is newer than the "bad" commit:

git bisect start --first-parent --term-old=broken --term-new=fixed
git bisect broken <commit-where-bug-exists>   # e.g., the original failing commit
git bisect fixed master                        # master is fixed

git bisect run ./bisect_run.sh

# When done
git bisect reset

Note: The --first-parent option follows only merge commits on the main branch, avoiding detours into feature branches. The bisect script must return 0 when the bug is NOT present (fixed) and 1 when the bug IS present (broken).

If the Bug Still Exists on Master

Bisect to find the regression commit (the first commit where the bug was introduced):

git bisect start --first-parent
git bisect good <known-good-commit>   # e.g., a previous release tag
git bisect bad master                  # master has the bug

git bisect run ./bisect_run.sh

# When done
git bisect reset

The bisect will identify the commit that introduced or fixed the bug.

Finding a Good Commit

If you don't know a good commit (where the bug doesn't exist), you can jump back in time to find one.

# Find a commit from ~6 months ago on the main branch
git rev-list --first-parent -1 --before="6 months ago" HEAD

Test whether the bug exists at that commit. If not, use it as the good commit for bisect. If the bug still exists, try going back further in time, but don't go back further than 1 year.

If a known good commit can't be found within 1 year, stop and report this to the user.

Weekly Installs

Repository

cockroachdb/cockroach

GitHub Stars

32.1K

First Seen

Feb 6, 2026

Security Audits

Gen Agent Trust HubPass SocketPass SnykPass

Installed on

opencode48

gemini-cli48

github-copilot48

codex48

kimi-cli48

amp48

身份认证漏洞测试指南：OWASP Top 10 Broken Authentication 攻防与修复

57 周安装