prompt-repetition by akillness/oh-my-skills
npx skills add https://github.com/akillness/oh-my-skills --skill prompt-repetition
LLMs are trained as causal language models, where each token attends only to previous tokens. This leads to:

```
[Context] → [Question]
              ↓
Cannot reference Question content when processing Context tokens
Attention weights for Context are already finalized by the time Question tokens appear
```

Prompt repetition enables the second pass to reference the entire first pass, effectively mimicking some of the benefits of bidirectional attention:

```
[First Pass]            [Second Pass]
Context → Question  →  Context' → Question'
                            ↑         ↑
              Can reference entire first pass
```
In the second repetition, the model reprocesses information across the entire first prompt and strengthens attention weights on key concepts, resulting in improved performance.
Note: This does not change the model architecture to bidirectional; it is a prompt engineering technique to mitigate the limitations of causal models.
| Metric | Result |
|---|---|
| Significant improvement (p < 0.1) | 47 / 70 benchmarks |
| Performance degradation | 0 |
| Neutral | 23 |
| Improvement rate | 67% |
Most dramatic improvement: Gemini 2.0 Flash-Lite on NameIndex: 21.33% → 97.33% (+76%p)
| Provider | Auto-apply models | Excluded models |
|---|---|---|
| Claude | haiku series | opus, sonnet |
| Gemini | flash, flash-lite | pro, ultra |
| OpenAI | gpt-4o-mini, gpt-low | gpt-4o, gpt-4 |
| Task Type | Keyword Pattern | Repetitions | Expected Improvement |
|---|---|---|---|
| Options-First MCQ | A. B. C. D. choices first | 2× | +15-40%p |
| Index/Position | slot, position, index, N-th | 3× | +50-76%p |
| Context + Question | General question | 2× | +5-15%p |
| With CoT | step by step, think through | 0× (not applied) | ~0% |
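As a minimal sketch, the routing in the table above can be expressed as a keyword check. `repetitions_for` is a hypothetical helper, and the pattern lists are trimmed for illustration:

```python
import re

# Abbreviated pattern lists; the full transformer below defines more patterns.
POSITION_PATTERNS = [r"slot \d+", r"position \d+", r"index \d+", r"\d+(st|nd|rd|th)"]
COT_PATTERNS = [r"step by step", r"think through", r"let's think"]

def repetitions_for(prompt: str) -> int:
    p = prompt.lower()
    if any(re.search(pat, p) for pat in COT_PATTERNS):
        return 0  # CoT prompt: do not apply repetition
    if any(re.search(pat, p) for pat in POSITION_PATTERNS):
        return 3  # index/position task: 3×
    return 2      # general default: 2×
```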
```python
# Check the context budget before auto-applying
max_context = model_context_window * 0.8  # 80% safety margin
if len(prompt_tokens) * repetitions > max_context:
    repetitions = max(1, int(max_context / len(prompt_tokens)))
```
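The same guard can be written as a self-contained function; `cap_repetitions` is a hypothetical helper name, taking an already-estimated token count:

```python
def cap_repetitions(prompt_tokens: int, repetitions: int,
                    context_window: int, ratio: float = 0.8) -> int:
    """Reduce the repetition count so the repeated prompt stays within
    `ratio` (default 80%) of the model's context window."""
    max_context = int(context_window * ratio)
    if prompt_tokens * repetitions > max_context:
        repetitions = max(1, max_context // prompt_tokens)
    return repetitions
```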
```python
def apply_prompt_repetition(prompt: str, times: int = 2) -> str:
    """Repeat the prompt a specified number of times.

    Args:
        prompt: Original prompt
        times: Number of repetitions (default 2)

    Returns:
        Repeated prompt
    """
    if times <= 1:
        return prompt
    return "\n\n".join([prompt] * times)
```
Before:
A. Paris
B. London
C. Berlin
D. Madrid
Which city is the capital of France?
Reply with one letter.
After (repetition ×2 applied):
A. Paris
B. London
C. Berlin
D. Madrid
Which city is the capital of France?
Reply with one letter.
A. Paris
B. London
C. Berlin
D. Madrid
Which city is the capital of France?
Reply with one letter.
Expected output:
A
Accuracy: original 78% → after repetition 93% (+15%p)
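The transformation in this example is just a separator join; a quick sanity check, where `before` mirrors the prompt above:

```python
before = """A. Paris
B. London
C. Berlin
D. Madrid
Which city is the capital of France?
Reply with one letter."""

# Equivalent to apply_prompt_repetition(before, times=2):
after = "\n\n".join([before] * 2)
```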
Before:
Inventory:
1. Iron Sword
2. Leather Armor
3. Health Potion (x5)
4. Magic Staff
...
25. Dragon Scale
...
50. Ancient Map
What item is in slot 25?
After (repetition ×3 applied): the entire prompt above is repeated 3 times (omitted here for brevity)
Expected output:
Dragon Scale
Accuracy: original 21% → after repetition 97% (+76%p)
Note: Prompts containing tool call instructions are also repeated in their entirety. The full-repetition approach was adopted for implementation simplicity and consistency.
Before:
Use the calculator tool to compute 234 * 567.
What is the result?
After (repetition ×2):
Use the calculator tool to compute 234 * 567.
What is the result?
Use the calculator tool to compute 234 * 567.
What is the result?
Research results show that full repetition including tool call sections is also effective.
"""prompt_repetition_transformer.py"""
from dataclasses import dataclass, field
from typing import Optional, Callable, List
import re
# Context window per model (in tokens)
MODEL_CONTEXT_WINDOWS = {
"claude-3-haiku": 200_000,
"claude-haiku": 200_000,
"gemini-flash": 1_000_000,
"gemini-flash-lite": 1_000_000,
"gemini-2.0-flash": 1_000_000,
"gpt-4o-mini": 128_000,
"gpt-low": 128_000,
}
# Models targeted for auto-apply
AUTO_APPLY_MODELS = list(MODEL_CONTEXT_WINDOWS.keys())
# CoT patterns (excluded from apply)
COT_PATTERNS = [
r"step by step",
r"think through",
r"let's think",
r"reasoning:",
r"chain of thought",
]
# Position/Index patterns (3× repetition)
POSITION_PATTERNS = [
r"slot \d+",
r"position \d+",
r"index \d+",
r"\d+(st|nd|rd|th)",
r"item \d+",
r"row \d+",
r"column \d+",
]
@dataclass
class PromptRepetitionConfig:
"""Prompt repetition configuration"""
default_repetitions: int = 2
position_repetitions: int = 3
separator: str = "\n\n"
max_context_ratio: float = 0.8
applied_marker: str = "<!-- prompt-repetition-applied -->"
class PromptRepetitionTransformer:
"""Auto-apply prompt repetition transformer for lightweight models"""
def __init__(self, config: Optional[PromptRepetitionConfig] = None):
self.config = config or PromptRepetitionConfig()
def should_apply(self, model: str, prompt: str) -> bool:
"""Determine whether to auto-apply"""
# Skip if already applied
if self.config.applied_marker in prompt:
return False
# Check target model
model_lower = model.lower()
if not any(m in model_lower for m in AUTO_APPLY_MODELS):
return False
# Skip when CoT pattern detected
prompt_lower = prompt.lower()
for pattern in COT_PATTERNS:
if re.search(pattern, prompt_lower):
return False
return True
def determine_repetitions(self, prompt: str, model: str) -> int:
"""Determine repetition count based on task type"""
prompt_lower = prompt.lower()
# Position/Index pattern detected → 3×
for pattern in POSITION_PATTERNS:
if re.search(pattern, prompt_lower):
return self.config.position_repetitions
return self.config.default_repetitions
def estimate_tokens(self, text: str) -> int:
"""Simple token count estimation (speed over precision)"""
# Estimate approximately 4 characters = 1 token
return len(text) // 4
def transform(self, prompt: str, model: str) -> str:
"""Apply repetition to prompt"""
if not self.should_apply(model, prompt):
return prompt
repetitions = self.determine_repetitions(prompt, model)
# Check context limit
model_lower = model.lower()
max_tokens = 128_000 # Default value
for m, tokens in MODEL_CONTEXT_WINDOWS.items():
if m in model_lower:
max_tokens = tokens
break
max_allowed = int(max_tokens * self.config.max_context_ratio)
prompt_tokens = self.estimate_tokens(prompt)
# Reduce repetitions if token limit exceeded
while prompt_tokens * repetitions > max_allowed and repetitions > 1:
repetitions -= 1
if repetitions <= 1:
return prompt
# Apply repetition + add marker
repeated = self.config.separator.join([prompt] * repetitions)
return f"{self.config.applied_marker}\n{repeated}"
def wrap_llm_call(self, llm_fn: Callable, model: str) -> Callable:
"""Wrap LLM call function"""
def wrapped(prompt: str, **kwargs):
transformed = self.transform(prompt, model)
return llm_fn(transformed, **kwargs)
return wrapped
def run_ab_test(prompts: List[str], llm_fn, model: str, ground_truth: List[str]):
"""A/B test for prompt repetition effectiveness"""
transformer = PromptRepetitionTransformer()
results = {"baseline": [], "repeated": []}
for prompt, expected in zip(prompts, ground_truth):
# Baseline
response_a = llm_fn(prompt)
results["baseline"].append(response_a == expected)
# With Repetition
repeated_prompt = transformer.transform(prompt, model)
response_b = llm_fn(repeated_prompt)
results["repeated"].append(response_b == expected)
baseline_acc = sum(results["baseline"]) / len(prompts)
repeated_acc = sum(results["repeated"]) / len(prompts)
print(f"Baseline accuracy: {baseline_acc:.2%}")
print(f"Repeated accuracy: {repeated_acc:.2%}")
print(f"Improvement: {repeated_acc - baseline_acc:+.2%}p")
| Metric | Measurement Method |
|---|---|
| Accuracy | Compare correct answer rates |
| Consistency | Variance across 10 runs of same prompt |
| Token cost | Input token increase rate |
| Latency | Compare p50, p99 latency |
| Case | Reason |
|---|---|
| Using CoT | Reasoning process already provides context |
| Reasoning models (opus, sonnet) | Already optimized; minimal effect |
| Very long prompts | Risk of exceeding context limit |
| Already repeated | Duplicate application wastes tokens |
| Metric | Baseline | With Repetition | Change |
|---|---|---|---|
| Input tokens | 500/req | 1000/req | +100% |
| Output tokens | 100/req | 100/req | 0% |
| Latency (p50) | 450ms | 460ms | +2% |
| Latency (p99) | 1200ms | 1250ms | +4% |
| Accuracy | 78% | 89% | +11%p |
| Cost per correct answer | $0.019 | $0.020 | +5% |
Key insight: The prefill phase is highly parallelized on GPU, so doubling input tokens has minimal impact on latency.
| Agent | Model | Repetition Applied | Applied At |
|---|---|---|---|
| Claude Orchestrator | opus/sonnet | Optional | - |
| Claude Executor | haiku | Auto | skill_loader.py |
| Gemini Analyst | flash | Auto | On MCP call |
| OpenAI | gpt-4o-mini | Auto | skill_loader.py |
To prevent duplicate application in multi-agent pipelines:

- `<!-- prompt-repetition-applied -->` marker to detect already-processed prompts
- `x-prompt-repetition-applied: true` header between agents

```
[Claude Sonnet] Planning (no repetition needed)
        ↓
[Gemini Flash] Analysis (repetition ×2 auto-applied, marker added)
        ↓
[Claude Haiku] Execution (marker detected → skip duplicate apply)
```
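The marker check makes repetition idempotent across agents; `repeat_once` below is a hypothetical minimal version of the transformer's marker logic:

```python
MARKER = "<!-- prompt-repetition-applied -->"

def repeat_once(prompt: str, times: int = 2) -> str:
    # Idempotent: a prompt that already carries the marker is returned unchanged,
    # so downstream agents never re-apply repetition.
    if MARKER in prompt:
        return prompt
    return MARKER + "\n" + "\n\n".join([prompt] * times)

once = repeat_once("hello")
twice = repeat_once(once)  # second application is a no-op
```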
```python
# Code to add to skill_loader.py
from prompt_repetition_transformer import PromptRepetitionTransformer

class SkillLoader:
    def __init__(self, ...):
        # ... existing code ...
        self.prompt_transformer = PromptRepetitionTransformer()

    def apply_auto_skills(self, prompt: str, model: str) -> str:
        """Handle auto-apply skills"""
        # Auto-apply prompt repetition
        for skill in self.skills.values():
            auto_apply = skill.get('data', {}).get('auto-apply', {})
            if auto_apply.get('trigger') == 'auto':
                target_models = auto_apply.get('models', [])
                if any(m in model.lower() for m in target_models):
                    prompt = self.prompt_transformer.transform(prompt, model)
        return prompt
```
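The model-matching condition the loader applies can be exercised in isolation; the `skill` dict below is a hypothetical example of the `auto-apply` metadata shape read above:

```python
# Hypothetical skill metadata mirroring the shape the loader reads.
skill = {
    "data": {
        "auto-apply": {
            "trigger": "auto",
            "models": ["haiku", "flash", "flash-lite", "gpt-4o-mini"],
        }
    }
}

def matches(skill: dict, model: str) -> bool:
    # Same substring check as apply_auto_skills: trigger must be "auto"
    # and the model name must contain one of the target patterns.
    auto_apply = skill.get("data", {}).get("auto-apply", {})
    if auto_apply.get("trigger") != "auto":
        return False
    return any(m in model.lower() for m in auto_apply.get("models", []))
```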
Note: merely increasing prompt length with filler characters such as "." has no effect (per research); the gain comes from repeating the content itself.

=== Auto-Apply Target Models ===
claude-3-haiku, claude-haiku
gemini-flash, gemini-flash-lite, gemini-2.0-flash
gpt-4o-mini, gpt-low
=== Repetition Count ===
General tasks: 2×
Position/Index (slot/position/index keywords): 3×
With CoT: 0× (not applied)
=== Effect (Google Research 2025) ===
Improvement rate: 67% (47/70 benchmarks)
Performance degradation: 0 cases
Maximum improvement: +76%p (NameIndex)
=== Cost ===
Input tokens: +100%
Latency: +2% (Prefill parallelization)
Cost per correct answer: +5%
=== Duplicate Application Prevention ===
Marker: <!-- prompt-repetition-applied -->