nowait-reasoning-optimizer by davila7/claude-code-templates
npx skills add https://github.com/davila7/claude-code-templates --skill nowait-reasoning-optimizer
Implements the NOWAIT technique from the paper "Wait, We Don't Need to 'Wait'! Removing Thinking Tokens Improves Reasoning Efficiency" (Wang et al., 2025).
NOWAIT is a training-free inference-time intervention that suppresses self-reflection tokens (e.g., "Wait", "Hmm", "Alternatively") during generation, reducing chain-of-thought (CoT) trajectory length by 27-51% without compromising model utility.
| Model Series | Type | Token Reduction |
|---|---|---|
| QwQ-32B | RL-based | 16-31% |
| Phi4-Reasoning-Plus | RL-based | 23-28% |
| Qwen3-32B | RL-based | 13-16% |
| Kimi-VL-A3B | Multimodal | 40-60% |
| QvQ-72B-Preview | Multimodal | 20-30% |
Important: NOWAIT works best with RL-based models. Distilled models (Qwen3-4B/8B/14B) show degraded performance when reflection tokens are suppressed.
```python
from scripts.nowait_processor import NOWAITLogitProcessor

# Initialize the processor for your model's tokenizer
processor = NOWAITLogitProcessor(tokenizer)

# Use during generation
outputs = model.generate(
    inputs,
    logits_processor=[processor],
    max_new_tokens=32768,
)
```
See references/keywords.md for the complete list. Core keywords:
```
wait, alternatively, hmm, but, however, check,
double-check, maybe, verify, again, oh, ah
```
| Token | Logit (before) | Logit (after) |
|---|---|---|
| Wait | 0.8 | -inf |
| First | 0.6 | 0.6 |
| Hmm | 0.5 | -inf |
| Let | 0.4 | 0.4 |
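The masking step illustrated above can be sketched in plain Python. This is a minimal sketch of the idea only; the internals of the shipped `NOWAITLogitProcessor` are an assumption here, and real tokenizers split words into subword ids rather than whole tokens:

```python
# Hypothetical sketch: at each decoding step, token ids whose surface form
# is a reflection keyword get their logit forced to -inf, so the sampler
# can never pick them.
REFLECTION_KEYWORDS = {
    "wait", "alternatively", "hmm", "but", "however", "check",
    "double-check", "maybe", "verify", "again", "oh", "ah",
}

def banned_ids(vocab, keywords=REFLECTION_KEYWORDS):
    """Ids of vocab entries matching a keyword, ignoring case and the
    leading-space marker (\u0120) used by BPE tokenizers."""
    return {i for tok, i in vocab.items()
            if tok.lstrip("\u0120 ").lower() in keywords}

def mask_logits(logits, banned):
    """Copy of `logits` with every banned id set to -inf."""
    return [float("-inf") if i in banned else x for i, x in enumerate(logits)]

# Toy vocabulary mirroring the table above
vocab = {"Wait": 0, "First": 1, "Hmm": 2, "Let": 3}
masked = mask_logits([0.8, 0.6, 0.5, 0.4], banned_ids(vocab))
```

"Wait" and "Hmm" are suppressed while neutral tokens like "First" and "Let" keep their original logits, which is what allows the model to keep reasoning without re-entering reflection loops.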
| Model Type | NOWAIT Effect | Recommendation |
|---|---|---|
| RL-based (QwQ, Phi4, Qwen3-32B) | Stable accuracy, significant token reduction | ✅ Recommended |
| Distilled (Qwen3-4B/8B/14B) | Accuracy degradation on hard tasks | ⚠️ Use with caution |
Distilled models rely heavily on CoT structure from training data—removing reflection tokens disrupts their reasoning patterns.
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from scripts.nowait_processor import NOWAITLogitProcessor

model = AutoModelForCausalLM.from_pretrained("Qwen/QwQ-32B")
tokenizer = AutoTokenizer.from_pretrained("Qwen/QwQ-32B")
processor = NOWAITLogitProcessor(tokenizer)

prompt = "..."  # your reasoning prompt
response = model.generate(
    tokenizer(prompt, return_tensors="pt").input_ids,
    logits_processor=[processor],
    max_new_tokens=32768,
    do_sample=True,
    temperature=0.7,
)
```
```python
from vllm import LLM, SamplingParams
from scripts.nowait_processor import get_nowait_bad_words_ids

llm = LLM(model="Qwen/QwQ-32B")
bad_words_ids = get_nowait_bad_words_ids(llm.get_tokenizer())

sampling_params = SamplingParams(
    max_tokens=32768,
    bad_words_ids=bad_words_ids,
)
```
| Task Type | Original Tokens | NOWAIT Tokens | Reduction |
|---|---|---|---|
| Math (AIME) | 15,000 | 10,500 | 30% |
| Visual QA (MMMU) | 2,900 | 1,450 | 50% |
| Video QA (MMVU) | 1,700 | 1,250 | 27% |
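The "Reduction" column above follows directly from `1 - nowait / original`, e.g. for the math and visual-QA rows:

```python
# Reduction figures are computed as: reduction = 1 - nowait / original.
def token_reduction(original, nowait):
    """Fractional CoT-token reduction achieved by NOWAIT on one task."""
    return 1 - nowait / original

aime = token_reduction(15_000, 10_500)  # Math (AIME)
mmmu = token_reduction(2_900, 1_450)    # Visual QA (MMMU)
```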
Files: references/keywords.md, scripts/nowait_processor.py