重要前提
安装AI Skills的关键前提是:必须科学上网,且开启TUN模式,这一点至关重要,直接决定安装能否顺利完成,在此郑重提醒三遍:科学上网,科学上网,科学上网。查看完整安装教程 →
reduce-unoptimized-query-oracle by cockroachdb/cockroach
npx skills add https://github.com/cockroachdb/cockroach --skill reduce-unoptimized-query-oracle将未优化查询预言测试失败日志缩减至最简单的复现案例。
未优化查询预言 roachtest 运行一系列随机 SQL 语句以创建随机数据集,然后以不同的优化设置执行两次随机"关注查询"。如果两次执行返回不同结果,则表明 CockroachDB 中存在错误。
在以下情况使用此技能:
询问用户产物目录的位置。
在产物目录中找到相关文件:
params.log(来自 roachtest 的参数)test.log(来自 roachtest 的日志)failure*.log(来自 roachtest 的失败日志)unoptimized-query-oracle*.log(导致失败的 SQL 语句)unoptimized-query-oracle*.failure.log(包含关注查询以及可能的更多失败信息)广告位招租
在这里展示您的产品或服务
触达数万 AI 开发者,精准高效
logs/1.unredacted/cockroach.log 或 logs/unredacted/cockroach.log(包含 git 提交信息)从 cockroach.log 确定 git 提交:
grep "binary: CockroachDB" cockroach.log
在版本字符串中查找提交哈希(例如 cb94db961b8f55e3473f279d98ae90f0eeb0adcb)。
通过检查以下内容确定是否启用了运行时断言:
params.log 中的 "runtimeAssertionsBuild": "true"test.log 中的 Runtime assertions enabled通过查找以下内容确定是否应用了 metamorphic 设置:
params.log 中的类似行:
"metamorphicBufferedSender": "true",
"metamorphicWriteBuffering": "true",
或 test.log 中的类似行:
metamorphically setting "kv.rangefeed.buffered_sender.enabled" to 'true'
metamorphically setting "kv.transaction.write_buffering.enabled" to 'true'
从 cockroach.log 开头确定环境变量:
grep -A10 "using local environment variables:" cockroach.log
重要的环境变量包括:
COCKROACH_INTERNAL_CHECK_CONSISTENCY_FATALCOCKROACH_INTERNAL_DISABLE_METAMORPHIC_TESTINGCOCKROACH_RANDOM_SEEDCOCKROACH_TESTING_FORCE_RELEASE_BRANCH 但可能还有其他重要环境变量,因此最好获取所有变量。通过检查以下内容确定这是多区域测试还是单区域测试:
test.log 中的 seed-multi-region 表示多区域)\connect 行 如果两者都缺失,则为单区域测试。对于普通构建,使用:
git checkout <commit-hash>
./dev build short
如果启用了运行时断言,则使用测试构建:
git checkout <commit-hash>
./dev build short -- --crdb_test
注意: 仅当复现使用地理空间函数(BOX2D、geometry、geography 等)时才构建 libgeos:
./dev build libgeos
首先,检查完整 SQL 日志文件顶部是否存在以下语句。如果不存在,则添加它们:
SET statement_timeout='1m0s';
SET sql_safe_updates = false;
如果使用了 metamorphic 设置,也将其添加到完整 SQL 日志文件的顶部:
SET CLUSTER SETTING kv.rangefeed.buffered_sender.enabled = true;
SET CLUSTER SETTING kv.transaction.write_buffering.enabled = true;
在产物目录或仓库根目录中创建一个合适的目录来存放临时文件。
根据测试类型确定正确的 demo 命令:
--nodes=9--nodes 选项使用类似以下命令尝试从完整 SQL 日志文件复现测试失败。此命令可能需要长达 20 分钟才能完成。
<env vars> ./cockroach demo --multitenant=false --nodes=9 --insecure --set=errexit=false --no-example-database --format=tsv -f <full-sql-log-file>
检查输出是否复现了失败日志中描述的测试失败。 存在许多可能的失败模式。查找以下一种,它应与失败日志匹配:
internal error 或断言失败。记下错误信息以供缩减步骤使用。重要提示: 许多失败是非确定性的,尤其是对于多区域测试。如果第一次运行没有发生失败,请尝试最多 10 次,然后才得出结论认为无法复现。
此时,将输出与 failure*.log 进行比较会很有帮助,后者应显示原始测试运行中的失败情况。
如果初始运行尝试 10 次后仍无法复现,请在此处暂停并向用户报告无法复现失败,并显示已尝试的命令。 用户可能有其他指示。
如果看起来可以复现,则可以继续下一步。
构建缩减工具:
./dev build reduce
对于多区域测试,移除 \connect 行(它们会导致 reduce 工具出现语法错误):
grep -v '^\\connect' <full-sql-log-file> > <cleaned-log>
重要提示: 缩减工具必须从 cockroach 仓库根目录运行,因为它会在当前目录中查找 ./cockroach。
对于多区域测试使用 -multi-region 选项,对于单区域测试则省略。
对于"预言"失败(结果不同):
./bin/reduce -unoptimized-query-oracle -multi-region -chunk 25 -v -file <cleaned-log> 2>&1 | tee reduce-output.log
-unoptimized-query-oracle 选项检查"关注查询"的两次执行是否产生相同结果。
对于内部错误/断言失败/panic:
./bin/reduce -contains "<error-regex>" -multi-region -chunk 25 -v -file <cleaned-log> 2>&1 | tee reduce-output.log
使用错误信息的独特部分作为 -contains 正则表达式(例如 "nil LeafTxnInputState")。
缩减工具可能需要长达一小时才能运行。
缩减工具输出进度行,然后是最终的 SQL。仅提取 SQL:
grep -A1000 "^reduction: " reduce-output.log | tail -n +2 > reduced.sql
重要提示: 在手动简化之前立即保存缩减输出的备份:
cp reduced.sql reduced_original.sql
这提供了一个恢复点,以防工作文件在简化过程中损坏。
如果缩减工具无法复现,请在此处暂停并向用户报告。他们可能有其他指示。 偶尔,如果测试失败无法复现,我们必须修改缩减工具本身。
重要提示: 许多错误是非确定性的。在手动简化之前,创建一个可重用的测试脚本并确定复现率。
创建一个小型测试脚本(根据需要调整):
cat > test_repro.sh << 'EOF'
#!/bin/bash
# 测试 reduced_v2.sql 是否能复现错误(首次成功时退出,最多尝试 10 次)
for i in {1..10}; do
if ./cockroach demo --multitenant=false --nodes=9 --insecure \
--set=errexit=false --no-example-database --format=tsv \
-f reduced_v2.sql 2>&1 | grep -q "<error-pattern>"; then
echo "Run $i: REPRODUCED"
exit 0
else
echo "Run $i: no error"
fi
done
echo "FAILED"
EOF
chmod +x test_repro.sh
对于"预言"失败,测试脚本可能需要隔离并比较"关注查询"两次执行的结果,而不是检查错误模式。
运行测试脚本以确定复现率。它并不总是 100%。
此速率决定了测试简化时需要多少次尝试:
请注意,在某些情况下,可能需要将以下设置添加回缩减后的文件中才能获得复现:
SET statement_timeout='1m0s';
SET sql_safe_updates = false;
如果缩减后的 SQL 在 10 次尝试后仍无法复现,请在此处暂停并向用户报告。他们可能有其他指示。
现在迭代简化 SQL,同时保持复现能力。
关键提示: 对于非确定性失败,你必须根据复现率用足够的尝试次数测试每个简化。单次尝试失败并不意味着简化破坏了复现能力——可能只是非确定性。
reduced.sql 到 reduced_v2.sqlreduced_v2.sql 做一个小更改./test_repro.sh(它测试 reduced_v2.sql)reduced_v2.sql 复制到 reduced.sql,继续简化reduced_v2.sql,尝试不同的更改(即回退)。此工作流程避免了需要恢复文件——你始终在 reduced.sql 中保留最后一个工作版本。
重要提示: 将复制、编辑和测试作为单独的 bash 命令运行(不要用 && 链接)。这减少了权限检查的次数。
对于"预言"失败,编辑关注查询时,请确保编辑关注查询的两个副本,使它们完全相同。 否则,在比较结果集时就不是公平比较。
这些通常无法移除:
SET testing_optimizer_random_seed = <value> - 这个特定值通常不能更改,因为它决定了哪些优化器规则被禁用SET testing_optimizer_disable_rule_probability
--nodes=9)(尽管先尝试单节点——它可能有效且更简单)CREATE STATISTICS 语句(影响查询规划)如果一个更改破坏了复现能力:
reduced_v2.sql(不要复制到 reduced.sql)reduced.sql 是否仍能复现。如果不能,这意味着复现是非确定性的。(它可能一开始就是非确定性的,或者在简化过程中变得非确定性。)尝试复现 10 次并记下新的复现率。使用新的复现率来调整后续每个简化步骤中的复现尝试次数。永远不要从破坏的状态继续简化。
如果你卡住了(即回退后再次无法复现),请停止并向用户报告你正在尝试的确切命令。
经过约 20 分钟的简化,或者回退几次后没有更多简化,就该停止了。
最终输出应包括两个可以展示给用户的文件:
git bisect run 的脚本以可以复制粘贴到终端的方式编写输出。
(此输出中的命令应编辑为匹配复现所需的内容。)
# 最小化复现
# reduced.sql
cat > reduced.sql << 'EOF'
CREATE TABLE t ();
SET testing_optimizer_random_seed = 1234567890;
SET testing_optimizer_disable_rule_probability = 0.5;
SELECT ...;
EOF
# bisect_run.sh
cat > bisect_run.sh << 'EOF'
#!/bin/bash
# Git bisect 运行脚本
# 退出代码:0=良好(错误不存在),1=不良(错误存在),125=跳过(构建失败)
REPO_DIR="/path/to/cockroach"
REPRO_SQL="/path/to/reduced.sql"
cd "$REPO_DIR" || exit 125
echo "=== Testing commit $(git rev-parse --short HEAD) ==="
# 构建(如果原始测试中启用了运行时断言,则使用 --crdb_test)
if ! ./dev build short -- --crdb_test 2>&1 | grep -q "Successfully built"; then
echo "BUILD FAILED - skipping"
exit 125
fi
# 测试错误(对于不稳定的错误尝试 3 次)
for i in {1..3}; do
if ./cockroach demo --multitenant=false --insecure \
--set=errexit=false --no-example-database --format=tsv \
-f "$REPRO_SQL" 2>&1 | grep -q "<error-pattern>"; then
echo "BUG PRESENT - marking as BAD"
exit 1
fi
done
echo "Bug not present - marking as GOOD"
exit 0
EOF
chmod +x bisect_run.sh
# 复现命令
git checkout <commit-hash>
./bisect_run.sh
# 二分查找命令
git bisect start ...
git bisect run bisect_run.sh
# 失败
# <在此处粘贴堆栈跟踪或相关失败详细信息>
# 复现率:约 X%(可能需要多次尝试)
显示此输出后,询问用户是否想在 master 分支上尝试复现错误。
在二分查找之前,检查错误是否已在 master 分支修复。
git stash # 如果需要
git checkout master
./dev build short -- --crdb_test
./cockroach demo --multitenant=false --insecure --set=errexit=false --no-example-database --format=tsv -f reduced.sql
运行几次以考虑不稳定性。注意错误在 master 分支是否复现。
如果用户想找到引入或修复错误的提交,使用 git bisect。
二分查找以找到修复提交(错误不再复现的第一个提交)。由于"良好"提交(master)比"不良"提交更新,使用自定义术语:
git bisect start --first-parent --term-old=broken --term-new=fixed
git bisect broken <commit-where-bug-exists> # 例如,原始失败提交
git bisect fixed master # master 已修复
git bisect run ./bisect_run.sh
# 完成后
git bisect reset
注意: --first-parent 选项仅遵循主分支上的合并提交,避免进入功能分支。当错误不存在(已修复)时,二分脚本必须返回 0;当错误存在(损坏)时返回 1。
二分查找以找到回归提交(引入错误的第一个提交):
git bisect start --first-parent
git bisect good <known-good-commit> # 例如,之前的发布标签
git bisect bad master # master 存在错误
git bisect run ./bisect_run.sh
# 完成后
git bisect reset
二分查找将识别引入或修复错误的提交。
如果你不知道一个良好提交(错误不存在的地方),可以跳回过去寻找一个。
# 在主分支上查找约 6 个月前的提交
git rev-list --first-parent -1 --before="6 months ago" HEAD
测试错误在该提交是否存在。如果不存在,将其用作二分查找的良好提交。如果错误仍然存在,尝试更早的时间,但不要超过 1 年。
如果在 1 年内找不到已知的良好提交,请停止并向用户报告。
每周安装次数
48
仓库
GitHub 星标数
32.1K
首次出现
2026年2月6日
安全审计
安装于
opencode48
gemini-cli48
github-copilot48
codex48
kimi-cli48
amp48
Reduce an unoptimized-query-oracle test failure log to the simplest possible reproduction case.
The unoptimized-query-oracle roachtest runs a series of random SQL statements to create a random dataset, and then executes a random "Query of Interest" twice, with different optimization settings. If the two executions return different results, it indicates a bug in CockroachDB.
Use this skill when:
Ask the user where the artifacts directory is.
Find the relevant files in the artifacts directory:
params.log (the parameters from the roachtest)test.log (the log from the roachtest)failure*.log (the failure log from the roachtest)unoptimized-query-oracle*.log (the SQL statements that led to failure)unoptimized-query-oracle*.failure.log (containing the query of interest and possibly more information about the failure)logs/1.unredacted/cockroach.log or logs/unredacted/cockroach.log (contains the git commit)Determine the git commit from cockroach.log:
grep "binary: CockroachDB" cockroach.log
Look for the commit hash in the version string (e.g., cb94db961b8f55e3473f279d98ae90f0eeb0adcb).
Determine if runtime assertions are enabled by checking for:
"runtimeAssertionsBuild": "true" in params.logRuntime assertions enabled in test.logDetermine if metamorphic settings apply by looking for:
lines like these in params.log:
"metamorphicBufferedSender": "true",
"metamorphicWriteBuffering": "true",
or lines like these in test.log:
metamorphically setting "kv.rangefeed.buffered_sender.enabled" to 'true'
metamorphically setting "kv.transaction.write_buffering.enabled" to 'true'
Determine environment variables from the beginning of cockroach.log:
grep -A10 "using local environment variables:" cockroach.log
Important environment variables include:
COCKROACH_INTERNAL_CHECK_CONSISTENCY_FATALCOCKROACH_INTERNAL_DISABLE_METAMORPHIC_TESTINGCOCKROACH_RANDOM_SEEDCOCKROACH_TESTING_FORCE_RELEASE_BRANCH But there might be more important environment variables, so best to get all of them.Determine if this is a multi-region test or single-region test by checking:
seed-multi-region in test.log indicates multi-region)\connect lines in the full SQL log If both of these are missing, it's a single-region test.For a normal build use:
git checkout <commit-hash>
./dev build short
If runtime assertions were enabled, use a test build instead:
git checkout <commit-hash>
./dev build short -- --crdb_test
Note: Only build libgeos if the reproduction uses geospatial functions (BOX2D, geometry, geography, etc.):
./dev build libgeos
First, check that the following statements are at the top of the full SQL log file. If they are not, add them:
SET statement_timeout='1m0s';
SET sql_safe_updates = false;
If metamorphic settings were used, also add them to the top of the full SQL log file:
SET CLUSTER SETTING kv.rangefeed.buffered_sender.enabled = true;
SET CLUSTER SETTING kv.transaction.write_buffering.enabled = true;
Create an appropriate directory either in the artifacts directory or in the repository root for holding temp files.
Determine the correct demo command based on test type:
--nodes=9--nodes optionUse a command like this to try reproducing the test failure from the full SQL log file. This command could take up to 20 minutes to finish.
<env vars> ./cockroach demo --multitenant=false --nodes=9 --insecure --set=errexit=false --no-example-database --format=tsv -f <full-sql-log-file>
Check that the output reproduces the test failure described in the failure log. There are many possible failure modes. Look for one of the following, which should match the failure log:
internal error or assertion failure. Note the error message for the reduce step.IMPORTANT: Many failures are nondeterministic, especially for multi-region tests. If no failure happens on the first run, try up to 10 times before concluding it doesn't reproduce.
It can be helpful at this point to compare the output with the failure*.log which should show the failure from the original test run.
If the initial run fails to reproduce after 10 times, pause here and report to the user that the failure cannot be reproduced, and show the command that was tried. The user might have additional instructions.
If it looks like it reproduces, it's time to move on to the next step.
Build the reduce tool:
./dev build reduce
For multi-region tests, remove \connect lines (they cause syntax errors in the reduce tool):
grep -v '^\\connect' <full-sql-log-file> > <cleaned-log>
IMPORTANT: The reduce tool must be run from the cockroach repository root directory, because it looks for ./cockroach in the current directory.
Use the -multi-region option for multi-region tests, or omit it for single-region tests.
For "oracle" failures (different results):
./bin/reduce -unoptimized-query-oracle -multi-region -chunk 25 -v -file <cleaned-log> 2>&1 | tee reduce-output.log
The -unoptimized-query-oracle option checks whether the two executions of the "Query of Interest" produce the same results.
For internal errors/assertion failures/panics:
./bin/reduce -contains "<error-regex>" -multi-region -chunk 25 -v -file <cleaned-log> 2>&1 | tee reduce-output.log
Use a distinctive part of the error message as the -contains regex (e.g., "nil LeafTxnInputState").
The reduce tool might take up to an hour to run.
The reduce tool outputs progress lines followed by the final SQL. Extract just the SQL:
grep -A1000 "^reduction: " reduce-output.log | tail -n +2 > reduced.sql
IMPORTANT: Immediately save a backup of the reduce output before manual simplification:
cp reduced.sql reduced_original.sql
This provides a recovery point if the working file gets corrupted during simplification.
If the reduce tool fails to reproduce, pause here and report this to the user. They might have additional instructions. Occasionally we have to modify the reduce tool itself, if the test failure is not reproducing.
IMPORTANT: Many bugs are nondeterministic. Before manual simplification, create a reusable test script and determine the reproduction rate.
Create a small test script (adjust as needed):
cat > test_repro.sh << 'EOF'
#!/bin/bash
# Test if reduced_v2.sql reproduces the error (exits on first success, up to 10 attempts)
for i in {1..10}; do
if ./cockroach demo --multitenant=false --nodes=9 --insecure \
--set=errexit=false --no-example-database --format=tsv \
-f reduced_v2.sql 2>&1 | grep -q "<error-pattern>"; then
echo "Run $i: REPRODUCED"
exit 0
else
echo "Run $i: no error"
fi
done
echo "FAILED"
EOF
chmod +x test_repro.sh
For "oracle" failures, instead of checking for an error pattern, the test script probably needs to isolate and diff the results of the two executions of the "Query of Interest".
Run the test script to determine the reproduction rate. It's not always 100%.
This rate determines how many attempts you need when testing simplifications:
Note that in some cases, the following settings might need to be added back to the reduced file to get a repro:
SET statement_timeout='1m0s';
SET sql_safe_updates = false;
If the reduced SQL fails to reproduce after 10 attempts, pause here and report this to the user. They might have additional instructions.
Now iteratively simplify the SQL while maintaining reproduction.
CRITICAL: For nondeterministic failures, you MUST test each simplification with enough attempts based on the repro rate. A single failed attempt does NOT mean the simplification broke the repro - it may just be nondeterminism.
reduced.sql to reduced_v2.sqlreduced_v2.sql./test_repro.sh (which tests reduced_v2.sql)reduced_v2.sql to reduced.sql, continue simplifyingreduced_v2.sql, try a different change (i.e. backtrack).This workflow avoids needing to restore files - you always keep the last working version in reduced.sql.
IMPORTANT: Run copy, edit, and test as separate bash commands (not chained with &&). This reduces the number of permission checks.
For "oracle" failures, when editing the Query of Interest, be sure to edit BOTH copies of the Query of Interest so that they are identical. Otherwise it won't be an apples-to-apples comparison when diffing the result sets.
These often cannot be removed:
SET testing_optimizer_random_seed = <value> - this specific value often cannot be changed, as it determines which optimizer rules are disabledSET testing_optimizer_disable_rule_probability
--nodes=9) for distributed query bugs (though try single-node first - it may work and is simpler)CREATE STATISTICS statements (affect query planning)If a change breaks reproduction:
reduced_v2.sql (don't copy it to reduced.sql)reduced.sql still reproduces. If it doesn't, this means the repro is nondeterministic. (It might have started out nondeterministic, or might have become nondeterministic over the course of simplification.) Try reproducing it 10 times and note the new repro rate. Use the new repro rate to adjust the number of repro attempts during each simplification step going forward.Never continue simplifying from a broken state.
If you get stuck (i.e. cannot reproduce again after backtracking), stop and report to the user with the exact command you were trying.
After about 20 minutes of simplification, or if there are no more simplifications after backtracking a few times, it's time to stop.
The final output should include two files that can be shown to the user:
git bisect runWrite the output in such a way that it could be copied and pasted into a terminal.
(The commands in this output should be edited to match what was necessary to reproduce.)
# Minimal Reproduction
# reduced.sql
cat > reduced.sql << 'EOF'
CREATE TABLE t ();
SET testing_optimizer_random_seed = 1234567890;
SET testing_optimizer_disable_rule_probability = 0.5;
SELECT ...;
EOF
# bisect_run.sh
cat > bisect_run.sh << 'EOF'
#!/bin/bash
# Git bisect run script
# Exit codes: 0=good (bug not present), 1=bad (bug present), 125=skip (build failed)
REPO_DIR="/path/to/cockroach"
REPRO_SQL="/path/to/reduced.sql"
cd "$REPO_DIR" || exit 125
echo "=== Testing commit $(git rev-parse --short HEAD) ==="
# Build (use --crdb_test if runtime assertions were enabled in the original test)
if ! ./dev build short -- --crdb_test 2>&1 | grep -q "Successfully built"; then
echo "BUILD FAILED - skipping"
exit 125
fi
# Test for bug (try 3 times for flaky bugs)
for i in {1..3}; do
if ./cockroach demo --multitenant=false --insecure \
--set=errexit=false --no-example-database --format=tsv \
-f "$REPRO_SQL" 2>&1 | grep -q "<error-pattern>"; then
echo "BUG PRESENT - marking as BAD"
exit 1
fi
done
echo "Bug not present - marking as GOOD"
exit 0
EOF
chmod +x bisect_run.sh
# Command to reproduce
git checkout <commit-hash>
./bisect_run.sh
# Command to bisect
git bisect start ...
git bisect run bisect_run.sh
# Failure
# <paste stacktrace or relevant failure details here>
# Repro rate: ~X% (may need multiple attempts)
After showing this output, ask the user if they want to try reproducing the bug on master branch.
Before bisecting, check whether the bug has already been fixed on master.
git stash # if needed
git checkout master
./dev build short -- --crdb_test
./cockroach demo --multitenant=false --insecure --set=errexit=false --no-example-database --format=tsv -f reduced.sql
Run this a few times to account for flakiness. Note whether the bug reproduces on master or not.
If the user wants to find the commit that introduced or fixed the bug, use git bisect.
Bisect to find the fix commit (the first commit where the bug no longer reproduces). Use custom terms since the "good" commit (master) is newer than the "bad" commit:
git bisect start --first-parent --term-old=broken --term-new=fixed
git bisect broken <commit-where-bug-exists> # e.g., the original failing commit
git bisect fixed master # master is fixed
git bisect run ./bisect_run.sh
# When done
git bisect reset
Note: The --first-parent option follows only merge commits on the main branch, avoiding detours into feature branches. The bisect script must return 0 when the bug is NOT present (fixed) and 1 when the bug IS present (broken).
Bisect to find the regression commit (the first commit where the bug was introduced):
git bisect start --first-parent
git bisect good <known-good-commit> # e.g., a previous release tag
git bisect bad master # master has the bug
git bisect run ./bisect_run.sh
# When done
git bisect reset
The bisect will identify the commit that introduced or fixed the bug.
If you don't know a good commit (where the bug doesn't exist), you can jump back in time to find one.
# Find a commit from ~6 months ago on the main branch
git rev-list --first-parent -1 --before="6 months ago" HEAD
Test whether the bug exists at that commit. If not, use it as the good commit for bisect. If the bug still exists, try going back further in time, but don't go back further than 1 year.
If a known good commit can't be found within 1 year, stop and report this to the user.
Weekly Installs
48
Repository
GitHub Stars
32.1K
First Seen
Feb 6, 2026
Security Audits
Gen Agent Trust HubPassSocketPassSnykPass
Installed on
opencode48
gemini-cli48
github-copilot48
codex48
kimi-cli48
amp48
身份认证漏洞测试指南:OWASP Top 10 Broken Authentication 攻防与修复
57 周安装