Professional Trading Strategy Backtesting: A Guide to Systematic Stress Testing and Robustness Validation

backtest-expert by nicepkg/ai-workflow

60 weekly installs

154 GitHub Stars

Install command

npx skills add https://github.com/nicepkg/ai-workflow --skill backtest-expert

Fintech, Testing, Risk Management

January 24, 2026

🇺🇸English

Backtest Expert

Systematic approach to backtesting trading strategies based on professional methodology that prioritizes robustness over optimistic results.

Core Philosophy

Goal: Find strategies that "break the least", not strategies that "profit the most" on paper.

Principle: Add friction, stress-test assumptions, and see what survives. If a strategy holds up under pessimistic conditions, it's more likely to work in live trading.

When to Use This Skill

Use this skill when:

Developing or validating systematic trading strategies
Evaluating whether a trading idea is robust enough for live implementation
Troubleshooting why a backtest might be misleading
Learning proper backtesting methodology
Avoiding common pitfalls (curve-fitting, look-ahead bias, survivorship bias)
Assessing parameter sensitivity and regime dependence
Setting realistic expectations for slippage and execution costs

Backtesting Workflow

1. State the Hypothesis

Define the edge in one sentence.

Example: "Stocks that gap up more than 3% on earnings and pull back to the previous day's close within the first hour provide a mean-reversion opportunity."

If you can't articulate the edge clearly, don't proceed to testing.

2. Codify Rules with Zero Discretion

Define with complete specificity:

Entry: Exact conditions, timing, price type
Exit: Stop loss, profit target, time-based exit
Position sizing: Fixed dollar amount, % of portfolio, or volatility-adjusted
Filters: Market cap, volume, sector, volatility conditions
Universe: Which instruments are eligible

Critical: No subjective judgment allowed. Every decision must be rule-based and unambiguous.
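One way to enforce zero discretion is to hold every rule as a fixed, typed value. The sketch below is only an illustration built around the example hypothesis from step 1; the class name, field names, and all numeric values are hypothetical and do not come from the skill itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StrategyRules:
    """Every field is a fixed number, so nothing is left to judgment."""
    min_gap_pct: float          # entry: minimum earnings gap, in percent
    entry_window_minutes: int   # entry: pullback must happen within this window
    stop_loss_pct: float        # exit: stop distance below entry, in percent
    profit_target_pct: float    # exit: target distance above entry, in percent
    max_hold_minutes: int       # exit: time-based exit
    risk_per_trade_pct: float   # sizing: percent of portfolio risked per trade
    min_avg_volume: int         # filter: minimum average daily volume


def entry_signal(rules: StrategyRules, gap_pct: float,
                 minutes_since_open: int, price: float, prev_close: float) -> bool:
    """Return True only when every coded entry condition holds."""
    return (
        gap_pct > rules.min_gap_pct
        and minutes_since_open <= rules.entry_window_minutes
        and price <= prev_close
    )


# Hypothetical values, chosen only to make the example concrete.
RULES = StrategyRules(
    min_gap_pct=3.0,
    entry_window_minutes=60,
    stop_loss_pct=2.0,
    profit_target_pct=4.0,
    max_hold_minutes=240,
    risk_per_trade_pct=0.5,
    min_avg_volume=500_000,
)

print(entry_signal(RULES, gap_pct=4.2, minutes_since_open=35,
                   price=99.8, prev_close=100.0))
```

Freezing the dataclass means the rules cannot be adjusted partway through a test run, which keeps the backtest honest about what was actually specified in advance.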

3. Run Initial Backtest

Test over:

Minimum 5 years (preferably 10+)
Multiple market regimes (bull, bear, high/low volatility)
Realistic costs: Commissions + conservative slippage

Examine initial results for basic viability. If the strategy is fundamentally broken, iterate on the hypothesis.

4. Stress Test the Strategy

This is where 80% of testing time should be spent.

Parameter sensitivity:

Test stop loss at 50%, 75%, 100%, 125%, 150% of baseline
Test profit target at 80%, 90%, 100%, 110%, 120% of baseline
Vary entry/exit timing by ±15-30 minutes
Look for "plateaus" of stable performance, not narrow spikes
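The variations listed above can be enumerated mechanically so that no combination is skipped. This is a minimal sketch: the multipliers and timing offsets come from the list, while the baseline values (2.0% stop, 4.0% target) and the function name are assumptions for illustration.

```python
from itertools import product

# Multipliers and offsets taken from the sensitivity list above.
STOP_MULTIPLIERS = (0.50, 0.75, 1.00, 1.25, 1.50)
TARGET_MULTIPLIERS = (0.80, 0.90, 1.00, 1.10, 1.20)
TIMING_OFFSETS_MIN = (-30, -15, 0, 15, 30)


def sensitivity_grid(baseline_stop_pct: float, baseline_target_pct: float) -> list:
    """Return one parameter set per combination of stop, target, and timing offset."""
    return [
        {
            "stop_pct": round(baseline_stop_pct * s, 4),
            "target_pct": round(baseline_target_pct * t, 4),
            "offset_min": offset,
        }
        for s, t, offset in product(STOP_MULTIPLIERS, TARGET_MULTIPLIERS,
                                    TIMING_OFFSETS_MIN)
    ]


# Hypothetical baseline: 2.0% stop loss and 4.0% profit target.
grid = sensitivity_grid(2.0, 4.0)
print(len(grid))   # 5 * 5 * 5 = 125 parameter sets
print(grid[0])
```

Each parameter set would then be passed to whatever backtest function the platform provides; the results across the grid are what reveal plateaus or narrow spikes.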

Execution friction:

Increase slippage to 1.5-2x typical estimates
Model worst-case fills (buy at ask+1 tick, sell at bid-1 tick)
Add realistic order rejection scenarios
Test with pessimistic commission structures
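As a rough illustration of the friction rules above, the sketch below prices a long round trip with worst-case fills and inflated slippage. The function names, the quote format, and the example prices and costs are assumptions, not part of the skill.

```python
def worst_case_fill(side: str, bid: float, ask: float, tick: float) -> float:
    """Pessimistic fill: buy one tick above the ask, sell one tick below the bid."""
    if side == "buy":
        return ask + tick
    if side == "sell":
        return bid - tick
    raise ValueError(f"unknown side: {side!r}")


def stressed_cost_per_share(typical_slippage: float, commission: float,
                            slippage_multiplier: float = 2.0) -> float:
    """Per-share cost with slippage inflated to a multiple of the typical estimate."""
    return typical_slippage * slippage_multiplier + commission


def round_trip_pnl(entry_quote, exit_quote, tick: float,
                   shares: int, cost_per_share: float) -> float:
    """Net P&L of a long round trip; each quote is a (bid, ask) pair."""
    entry_price = worst_case_fill("buy", *entry_quote, tick)
    exit_price = worst_case_fill("sell", *exit_quote, tick)
    # Costs are charged on both the entry and the exit.
    return (exit_price - entry_price) * shares - 2 * cost_per_share * shares


# Hypothetical quotes and costs, in dollars.
cost = stressed_cost_per_share(typical_slippage=0.01, commission=0.005)
pnl = round_trip_pnl((100.00, 100.02), (101.00, 101.02),
                     tick=0.01, shares=100, cost_per_share=cost)
print(round(pnl, 2))
```

Stacking inflated slippage on top of worst-case fills deliberately overstates costs; that is in keeping with the goal of punishing the strategy rather than estimating costs precisely.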

Time robustness:

Analyze year-by-year performance
Require positive expectancy in majority of years
Ensure strategy doesn't rely on 1-2 exceptional periods
Test in different market regimes separately
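The year-by-year checks above can be sketched as follows, assuming each trade is recorded as a (year, pnl) pair. Dropping the two most profitable years is one possible way to test reliance on exceptional periods; it is an assumption here, not a method stated by the skill.

```python
from collections import defaultdict
from statistics import mean


def yearly_expectancy(trades: list) -> dict:
    """Average P&L per trade for each year; trades is a list of (year, pnl) pairs."""
    by_year = defaultdict(list)
    for year, pnl in trades:
        by_year[year].append(pnl)
    return {year: mean(pnls) for year, pnls in sorted(by_year.items())}


def time_robustness(trades: list, drop_best_years: int = 2) -> dict:
    """Check for a majority of positive years and for profit without the best years."""
    expectancy = yearly_expectancy(trades)
    positive_years = sum(1 for value in expectancy.values() if value > 0)

    totals = defaultdict(float)
    for year, pnl in trades:
        totals[year] += pnl
    # Remove the most profitable years and see whether any profit is left.
    remaining = sorted(totals.values(), reverse=True)[drop_best_years:]

    return {
        "majority_positive": positive_years > len(expectancy) / 2,
        "profit_without_best_years": sum(remaining),
    }


# Hypothetical trade log.
trades = [(2019, 10), (2019, -4), (2020, -6), (2020, 2),
          (2021, 5), (2021, 3), (2022, 50), (2023, 1)]
print(yearly_expectancy(trades))
print(time_robustness(trades))
```

In this made-up log most years are positive, but nearly all of the profit comes from 2022, which is exactly the kind of dependence the checklist asks you to look for.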

Sample size:

Absolute minimum: 30 trades
Preferred: 100+ trades
High confidence: 200+ trades
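The thresholds above map directly onto a small lookup. The tier labels returned by this hypothetical helper are illustrative wording, not terms defined by the skill.

```python
def sample_size_tier(n_trades: int) -> str:
    """Classify a backtest by trade count using the 30 / 100 / 200 thresholds."""
    if n_trades < 30:
        return "insufficient"      # below the absolute minimum
    if n_trades < 100:
        return "minimum"           # usable, but below the preferred count
    if n_trades < 200:
        return "preferred"
    return "high confidence"


for count in (12, 45, 150, 320):
    print(count, sample_size_tier(count))
```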

5. Out-of-Sample Validation

Walk-forward analysis:

Optimize on training period (e.g., Year 1-3)
Test on validation period (Year 4)
Roll forward and repeat
Compare in-sample vs out-of-sample performance
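The rolling scheme above can be expressed as a generator of train/test year windows. This is a minimal sketch assuming calendar-year granularity and a step equal to the test length; the function name and the example years are hypothetical.

```python
def walk_forward_windows(first_year: int, last_year: int,
                         train_years: int = 3, test_years: int = 1) -> list:
    """Return (train, test) lists of years, rolling forward by the test length."""
    windows = []
    start = first_year
    # Stop once the test period would run past the last available year.
    while start + train_years + test_years - 1 <= last_year:
        train = list(range(start, start + train_years))
        test = list(range(start + train_years, start + train_years + test_years))
        windows.append((train, test))
        start += test_years
    return windows


for train, test in walk_forward_windows(2015, 2020):
    print("optimize on", train, "then test on", test)
```

Parameters are optimized only on each training list and then applied unchanged to the matching test list, so every test year is evaluated with parameters chosen without seeing it.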

Warning signs:

Out-of-sample performance below 50% of in-sample performance
Need frequent parameter re-optimization
Parameters change dramatically between periods
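Two of these warning signs lend themselves to a simple numeric check. In the sketch below the 50% ratio comes from the list above, while the 0.5 drift limit used to define "dramatically" is an assumption, as are the function names and the requirement that performance be a single positive metric.

```python
def degradation_ratio(in_sample_perf: float, out_of_sample_perf: float) -> float:
    """Out-of-sample performance as a fraction of in-sample performance."""
    if in_sample_perf <= 0:
        raise ValueError("in-sample performance must be positive")
    return out_of_sample_perf / in_sample_perf


def max_relative_change(values: list) -> float:
    """Largest relative change in an optimized parameter between consecutive windows.

    Assumes every value is non-zero.
    """
    return max(abs(b - a) / abs(a) for a, b in zip(values, values[1:]))


def warning_signs(in_sample_perf: float, out_of_sample_perf: float,
                  param_values: list, drift_limit: float = 0.5) -> list:
    """Return the warning signs triggered by one walk-forward run."""
    flags = []
    if degradation_ratio(in_sample_perf, out_of_sample_perf) < 0.5:
        flags.append("out-of-sample below 50% of in-sample")
    # drift_limit is an assumed threshold for a "dramatic" parameter change.
    if len(param_values) > 1 and max_relative_change(param_values) > drift_limit:
        flags.append("parameters change dramatically between periods")
    return flags


# Hypothetical numbers: performance metric per sample, optimal stop per window.
print(warning_signs(1.2, 0.4, [2.0, 2.1, 2.0]))
print(warning_signs(1.0, 0.8, [2.0, 4.0]))
```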

6. Evaluate Results

Questions to answer:

Does edge survive pessimistic assumptions?
Is performance stable across parameter variations?
Does strategy work in multiple market regimes?
Is sample size sufficient for statistical confidence?
Are results realistic, not "too good to be true"?

Decision criteria:

✅ Deploy: Survives all stress tests with acceptable performance
🔄 Refine: Core logic sound but needs parameter adjustment
❌ Abandon: Fails stress tests or relies on fragile assumptions

Key Testing Principles

Punish the Strategy

Add friction everywhere:

Commissions higher than reality
Slippage 1.5-2x typical
Worst-case fills
Order rejections
Partial fills

Rationale: Strategies that survive pessimistic assumptions often outperform in live trading.

Seek Plateaus, Not Peaks

Look for parameter ranges where performance is stable, not optimal values that create performance spikes.

Good: Strategy profitable with stop loss anywhere from 1.5% to 3.0%
Bad: Strategy only works with stop loss at exactly 2.13%

Stable performance indicates genuine edge; narrow optima suggest curve-fitting.
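One simple way to tell a plateau from a peak is to measure the longest run of adjacent parameter values that all stay profitable. This sketch assumes results are (parameter value, net profit) pairs from a sensitivity sweep; the minimum width of three adjacent values is an arbitrary choice, and the sample numbers are made up to mirror the good and bad cases above.

```python
def longest_profitable_run(results: list) -> list:
    """Return the longest run of adjacent parameter values with positive profit."""
    best, current = [], []
    for value, profit in sorted(results):
        if profit > 0:
            current.append(value)
            if len(current) > len(best):
                best = list(current)
        else:
            current = []   # a losing value breaks the run
    return best


def looks_like_plateau(results: list, min_width: int = 3) -> bool:
    """True when at least min_width adjacent parameter values are profitable."""
    return len(longest_profitable_run(results)) >= min_width


# Hypothetical sweeps over stop-loss percent.
plateau = [(1.0, -5), (1.5, 20), (2.0, 25), (2.5, 22), (3.0, 18), (3.5, -3)]
spike = [(2.0, -4), (2.13, 30), (2.25, -6)]

print(longest_profitable_run(plateau), looks_like_plateau(plateau))
print(longest_profitable_run(spike), looks_like_plateau(spike))
```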

Test All Cases, Not Cherry-Picked Examples

Wrong approach: Study hand-picked "market leaders" that worked
Right approach: Test every stock that met criteria, including those that failed

Selective examples create survivorship bias and overestimate strategy quality.

Separate Idea Generation from Validation

Intuition: Useful for generating hypotheses
Validation: Must be purely data-driven

Never let attachment to an idea influence interpretation of test results.

Common Failure Patterns

Recognize these patterns early to save time:

Parameter sensitivity: Only works with exact parameter values
Regime-specific: Great in some years, terrible in others
Slippage sensitivity: Unprofitable when realistic costs added
Small sample: Too few trades for statistical confidence
Look-ahead bias: "Too good to be true" results
Over-optimization: Many parameters, poor out-of-sample results

See references/failed_tests.md for detailed examples and diagnostic framework.
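A quick screen for "too good to be true" results can be run on the list of per-trade P&L values. The 90% win-rate limit is the figure this document gives under Critical Reminders; the helper names are hypothetical, and the drawdown is only reported, since the document gives no numeric limit for it.

```python
def win_rate(pnls: list) -> float:
    """Fraction of trades with positive P&L."""
    return sum(1 for p in pnls if p > 0) / len(pnls)


def max_drawdown(pnls: list) -> float:
    """Largest peak-to-trough drop of the cumulative P&L curve."""
    equity = peak = worst = 0.0
    for p in pnls:
        equity += p
        peak = max(peak, equity)
        worst = max(worst, peak - equity)
    return worst


def needs_audit(pnls: list, win_rate_limit: float = 0.90) -> bool:
    """Flag results whose win rate is suspiciously high."""
    return win_rate(pnls) > win_rate_limit


# Hypothetical per-trade P&L values.
pnls = [10, -4, 6, -12, 5]
print(win_rate(pnls), max_drawdown(pnls), needs_audit(pnls))
```

A flagged result is not proof of look-ahead bias; it only marks the backtest as one whose data handling deserves a careful audit.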

Available Reference Documentation

Methodology Reference

File: references/methodology.md

When to read: For detailed guidance on specific testing techniques.

Contents:

Stress testing methods
Parameter sensitivity analysis
Slippage and friction modeling
Sample size requirements
Market regime classification
Common biases and pitfalls (survivorship, look-ahead, curve-fitting, etc.)

Failed Tests Reference

File: references/failed_tests.md

When to read: When a strategy fails tests, or when learning from past mistakes.

Contents:

Why failures are valuable
Common failure patterns with examples
Case study documentation framework
Red flags checklist for evaluating backtests

Critical Reminders

Time allocation: Spend 20% generating ideas, 80% trying to break them.

Context-free requirement: If a strategy requires "perfect context" to work, it's not robust enough for systematic trading.

Red flag: If backtest results look too good (>90% win rate, minimal drawdowns, perfect timing), audit carefully for look-ahead bias or data issues.

Tool limitations: Understand your backtesting platform's quirks (interpolation methods, handling of low liquidity, data alignment issues).

Statistical significance: Small edges require large sample sizes to prove. A 5% edge per trade needs 100+ trades to distinguish from luck.
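The link between edge size and sample size can be made concrete with a standard-error calculation. This is a textbook approximation, not the skill's own derivation: it treats trades as independent, uses a z-score of 1.96 (roughly 95% confidence), and assumes a per-trade standard deviation, which the document does not specify.

```python
import math
from statistics import mean, stdev


def t_statistic(returns: list) -> float:
    """Mean return divided by its standard error; larger means less likely luck."""
    return mean(returns) / (stdev(returns) / math.sqrt(len(returns)))


def trades_needed(edge: float, per_trade_stdev: float, z: float = 1.96) -> int:
    """Trades required for the edge to sit z standard errors away from zero."""
    return math.ceil((z * per_trade_stdev / edge) ** 2)


# Assumed figures: a 5% average edge with a 25% per-trade standard deviation.
print(trades_needed(edge=0.05, per_trade_stdev=0.25))
# Doubling the assumed variability roughly quadruples the requirement.
print(trades_needed(edge=0.05, per_trade_stdev=0.50))
```

Because the required count grows with the square of the ratio between variability and edge, noisier strategies or smaller edges can need far more trades than the thresholds quoted in this document.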

Discretionary vs Systematic Differences

This skill focuses on systematic/quantitative backtesting where:

All rules are codified in advance
No discretion or "feel" in execution
Testing happens on all historical examples, not cherry-picked cases
Context (news, macro) is deliberately stripped out

Discretionary traders study differently, so this skill may not apply to setups requiring subjective judgment.

Repository: nicepkg/ai-workflow
GitHub Stars: 142
First Seen: Jan 24, 2026
Security Audits: Gen Agent Trust Hub (Pass), Socket (Pass), Snyk (Pass)
Installed on: opencode (39), gemini-cli (33), claude-code (33), codex (32), cursor (31), github-copilot (26)
