backtest-expert by tradermonty/claude-trading-skills
npx skills add https://github.com/tradermonty/claude-trading-skills --skill backtest-expert
Systematic approach to backtesting trading strategies based on professional methodology that prioritizes robustness over optimistic results.
Goal: Find strategies that "break the least", not strategies that "profit the most" on paper.
Principle: Add friction, stress-test assumptions, and see what survives. If a strategy holds up under pessimistic conditions, it's more likely to work in live trading.
Use this skill when:
Define the edge in one sentence.
Example: "Stocks that gap up >3% on earnings and pull back to the previous day's close within the first hour provide a mean-reversion opportunity."
If you can't articulate the edge clearly, don't proceed to testing.
Define with complete specificity:
Critical: No subjective judgment allowed. Every decision must be rule-based and unambiguous.
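As an illustration of "no subjective judgment", the rules can be captured as a plain data structure plus a crude completeness check. All field names and thresholds below are hypothetical, not part of the skill itself:

```python
# Hypothetical sketch: a fully specified, rule-based strategy definition.
# Every field is a concrete number or rule -- nothing left to judgment.
strategy_spec = {
    "universe": "US common stocks, price > 5 USD, avg daily volume > 1M shares",
    "entry": "gap up > 3.0% at open after earnings, then price touches prior close within first 60 min",
    "position_size_pct": 2.0,      # percent of equity per trade
    "stop_loss_pct": 2.0,          # fixed stop below entry
    "profit_target_pct": 1.5,      # fixed target above entry
    "max_holding_minutes": 120,    # time-based exit if neither level is hit
}

def is_fully_specified(spec: dict) -> bool:
    """Crude completeness check: every required field present and non-empty."""
    required = {"universe", "entry", "position_size_pct", "stop_loss_pct",
                "profit_target_pct", "max_holding_minutes"}
    return required.issubset(spec) and all(spec[k] not in (None, "") for k in required)

print(is_fully_specified(strategy_spec))  # True
```

A spec that fails this check (e.g. `{"entry": "buy dips"}`) still contains judgment calls and is not ready for testing.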
Test over:
Examine initial results for basic viability. If fundamentally broken, iterate on hypothesis.
This is where 80% of testing time should be spent.
Parameter sensitivity:
Execution friction:
Time robustness:
Sample size:
Walk-forward analysis:
Warning signs:
Questions to answer:
Decision criteria:
Use the evaluation script for a structured, quantitative assessment:
python3 skills/backtest-expert/scripts/evaluate_backtest.py \
--total-trades 150 \
--win-rate 62 \
--avg-win-pct 1.8 \
--avg-loss-pct 1.2 \
--max-drawdown-pct 15 \
--years-tested 8 \
--num-parameters 3 \
--slippage-tested \
--output-dir reports/
The script scores across 5 dimensions (Sample Size, Expectancy, Risk Management, Robustness, Execution Realism), detects red flags, and outputs a Deploy/Refine/Abandon verdict.
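The expectancy implied by the numbers passed to the script above can be sanity-checked by hand. The formula below is the standard expectancy calculation (an assumption here; the script's internal scoring is not shown):

```python
# Expectancy per trade from the CLI example's numbers.
# Assumed formula: expectancy = win_rate * avg_win - (1 - win_rate) * avg_loss
win_rate = 0.62
avg_win_pct = 1.8
avg_loss_pct = 1.2

expectancy_pct = win_rate * avg_win_pct - (1 - win_rate) * avg_loss_pct
print(f"Expectancy: {expectancy_pct:.3f}% per trade")  # Expectancy: 0.660% per trade
```

A positive expectancy is necessary but not sufficient; it still has to survive the friction and robustness checks below.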
Add friction everywhere:
Rationale: Strategies that survive pessimistic assumptions often outperform in live trading.
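A minimal sketch of what "adding friction" can look like in code. The slippage and commission figures are illustrative assumptions, not recommendations:

```python
# Apply pessimistic round-trip friction to a list of raw per-trade returns (%).
def apply_friction(trade_returns_pct, slippage_pct=0.10, commission_pct=0.05):
    """Subtract assumed slippage and commission from every trade's return."""
    cost = slippage_pct + commission_pct
    return [r - cost for r in trade_returns_pct]

raw = [1.8, -1.2, 0.9, 1.8, -1.2]   # fabricated example trades
net = apply_friction(raw)
print(sum(raw) / len(raw), sum(net) / len(net))
```

If the edge disappears under these haircuts, it was probably never large enough to trade.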
Look for parameter ranges where performance is stable, not optimal values that create performance spikes.
Good: Strategy is profitable with a stop loss anywhere from 1.5% to 3.0%.
Bad: Strategy only works with a stop loss at exactly 2.13%.
Stable performance indicates genuine edge; narrow optima suggest curve-fitting.
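The plateau-vs-spike distinction can be sketched as a parameter sweep. `backtest` below is a toy stand-in for a real backtest, shaped so that profits form a broad plateau:

```python
# Sweep the stop-loss parameter and look for a stable profitable plateau.
def backtest(stop_loss_pct):
    # Toy response surface: broadly profitable around a 2.25% stop.
    return 1.0 - (stop_loss_pct - 2.25) ** 2

grid = [round(1.0 + 0.25 * i, 2) for i in range(9)]   # 1.00% .. 3.00%
results = {sl: backtest(sl) for sl in grid}
plateau = [sl for sl, pnl in results.items() if pnl > 0]
print(plateau)  # a contiguous profitable range, not a single spike
```

A real sweep would replace the toy function with your backtest and also check that the metric varies smoothly across the plateau, not just that it stays positive.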
Wrong approach: Study hand-picked "market leaders" that worked.
Right approach: Test every stock that met the criteria, including those that failed.
Selective examples create survivorship bias and overestimate strategy quality.
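A small illustration of how survivorship filtering inflates results. Symbols, gaps, and returns are fabricated:

```python
# Build the test universe from the rule, not from memorable winners.
candidates = [
    {"symbol": "AAA", "gap_pct": 4.1, "delisted": False, "return_pct":  2.0},
    {"symbol": "BBB", "gap_pct": 3.5, "delisted": True,  "return_pct": -8.0},
    {"symbol": "CCC", "gap_pct": 2.0, "delisted": False, "return_pct":  1.0},
    {"symbol": "DDD", "gap_pct": 5.0, "delisted": False, "return_pct": -1.5},
]

# Right: everything meeting the criterion, including names that later delisted.
universe = [c for c in candidates if c["gap_pct"] > 3.0]
avg_all = sum(c["return_pct"] for c in universe) / len(universe)

# Wrong: quietly dropping the failures flips the sign of the result.
survivors = [c for c in universe if not c["delisted"]]
avg_survivors = sum(c["return_pct"] for c in survivors) / len(survivors)
print(avg_all, avg_survivors)
```

Here the honest average is negative while the survivor-only average looks profitable, which is exactly the overestimate the text warns about.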
Intuition: Useful for generating hypotheses.
Validation: Must be purely data-driven.
Never let attachment to an idea influence interpretation of test results.
Recognize these patterns early to save time:
See references/failed_tests.md for detailed examples and diagnostic framework.
reports/backtest_eval_<timestamp>.json — structured evaluation with per-dimension scores, red flags, and verdict
reports/backtest_eval_<timestamp>.md — human-readable report with dimension table, key metrics, and red flag details
File: references/methodology.md
When to read: For detailed guidance on specific testing techniques.
Contents:
File: references/failed_tests.md
When to read: When a strategy fails tests, or when learning from past mistakes.
Contents:
Time allocation: Spend 20% of your time generating ideas and 80% trying to break them.
Context-free requirement: If a strategy requires "perfect context" to work, it's not robust enough for systematic trading.
Red flag: If backtest results look too good to be true (>90% win rate, minimal drawdowns, perfect timing), audit carefully for look-ahead bias or data issues.
Tool limitations: Understand your backtesting platform's quirks (interpolation methods, handling of low liquidity, data alignment issues).
Statistical significance: Small edges require large sample sizes to prove. A 5% edge per trade needs 100+ trades to distinguish from luck.
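The 100-trade figure can be reproduced with a back-of-envelope t-statistic calculation, assuming (illustratively) 25% per-trade return volatility and requiring the mean edge to sit two standard errors above zero:

```python
import math

# Rough sample-size check: how many trades before a mean edge of edge_pct
# per trade stands t_stat standard errors above zero?
# Derivation: t = mean * sqrt(n) / stdev >= t_stat  =>  n >= (t_stat * stdev / mean)^2
def trades_needed(edge_pct, stdev_pct, t_stat=2.0):
    return math.ceil((t_stat * stdev_pct / edge_pct) ** 2)

# 5% edge, assumed 25% per-trade volatility -> 100 trades
print(trades_needed(5, 25))  # 100
```

The volatility input dominates the answer; plug in your own strategy's per-trade return standard deviation rather than the illustrative 25%.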
This skill focuses on systematic/quantitative backtesting where:
Discretionary traders do their research differently; this skill may not apply to setups requiring subjective judgment.
Weekly Installs
287
Repository
GitHub Stars
398
First Seen
Jan 26, 2026
Security Audits
Gen Agent Trust Hub: Pass
Socket: Pass
Snyk: Pass
Installed on
gemini-cli: 269
opencode: 269
codex: 264
cursor: 263
github-copilot: 262
kimi-cli: 258