skill-logger by erichowens/some_claude_skills
npx skills add https://github.com/erichowens/some_claude_skills --skill skill-logger

Track, measure, and improve skill quality through systematic logging and scoring.
Use for:
NOT for:
┌────────────────────────────────────────────────────────────────┐
│ SKILL LOGGING PIPELINE │
├────────────────────────────────────────────────────────────────┤
│ │
│ 1. CAPTURE 2. ANALYZE 3. SCORE │
│ ├─ Invocation ├─ Output parse ├─ Quality metrics │
│ ├─ Input context ├─ Token usage ├─ User satisfaction │
│ ├─ Output ├─ Tool calls ├─ Goal completion │
│ └─ Timing └─ Error patterns └─ Efficiency │
│ │
│ 4. AGGREGATE 5. ALERT 6. IMPROVE │
│ ├─ Per-skill stats ├─ Quality drops ├─ Identify patterns │
│ ├─ Trend analysis ├─ Error spikes ├─ Suggest changes │
│ └─ Comparisons └─ Underuse └─ Track experiments │
│ │
└────────────────────────────────────────────────────────────────┘
```json
{
  "invocation_id": "uuid",
  "timestamp": "ISO8601",
  "skill_name": "wedding-immortalist",
  "skill_version": "1.2.0",
  "input": {
    "user_query": "Create a 3D model from my wedding photos",
    "context_tokens": 1500,
    "files_referenced": ["photos/", "config.json"]
  },
  "execution": {
    "duration_ms": 45000,
    "tool_calls": [
      {"tool": "Bash", "count": 5},
      {"tool": "Write", "count": 3}
    ],
    "tokens_used": {
      "input": 8500,
      "output": 3200
    },
    "errors": []
  },
  "output": {
    "type": "code_generation",
    "artifacts_created": ["pipeline.py", "config.yaml"],
    "response_length": 3200
  }
}
```
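A record in this shape can be assembled with a small helper. This is a sketch: `build_invocation_record` is an illustrative name, and only a subset of the fields above is filled in.

```python
import json
import uuid
from datetime import datetime, timezone

def build_invocation_record(skill_name, skill_version, user_query,
                            duration_ms, tokens_used, errors=None):
    """Assemble an invocation record matching the log format above."""
    return {
        "invocation_id": str(uuid.uuid4()),
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "skill_name": skill_name,
        "skill_version": skill_version,
        "input": {"user_query": user_query},
        "execution": {
            "duration_ms": duration_ms,
            "tokens_used": tokens_used,
            "errors": errors or [],
        },
    }

record = build_invocation_record(
    "wedding-immortalist", "1.2.0",
    "Create a 3D model from my wedding photos",
    45000, {"input": 8500, "output": 3200},
)
print(json.dumps(record, indent=2))
```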
```python
QUALITY_SIGNALS = {
    # Implicit signals (automated)
    'completion': 'Did the skill complete without errors?',
    'token_efficiency': 'Output quality per token used',
    'tool_success_rate': 'Tool calls that succeeded',
    'retry_count': 'How many retries needed?',

    # Explicit signals (user feedback)
    'user_edit_ratio': 'How much did user modify output?',
    'user_accepted': 'Did user accept/use the output?',
    'follow_up_needed': 'Did user need to ask for fixes?',
    'explicit_rating': 'Thumbs up/down if available',

    # Outcome signals (delayed)
    'code_ran_successfully': 'Did generated code work?',
    'tests_passed': 'Did it pass tests?',
    'reverted': 'Was the output later reverted?',
}
```
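An implicit signal such as `tool_success_rate` can be computed directly from the logged tool calls. This sketch assumes each call records a boolean `succeeded` field, which is not part of the log format above; adapt it to whatever your harness actually captures.

```python
def tool_success_rate(tool_calls):
    """Fraction of tool calls that succeeded (implicit quality signal)."""
    if not tool_calls:
        return 1.0  # no calls, nothing failed
    succeeded = sum(1 for call in tool_calls if call.get("succeeded", True))
    return succeeded / len(tool_calls)

calls = [
    {"tool": "Bash", "succeeded": True},
    {"tool": "Write", "succeeded": False},
    {"tool": "Bash", "succeeded": True},
]
print(tool_success_rate(calls))  # 2 of 3 calls succeeded
```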
```python
BASELINE_TOKENS = 10000  # illustrative baseline; calibrate per skill

def calculate_skill_score(invocation_log):
    """Score a skill invocation 0-100.

    Expects a flattened log dict that includes the quality-signal fields
    ('recovered', 'user_accepted', ...) alongside the execution data.
    """
    # tokens_used is a dict ({'input': ..., 'output': ...}) in the log format
    total_tokens = (invocation_log['tokens_used']['input']
                    + invocation_log['tokens_used']['output'])
    scores = {
        # Completion (25%)
        'completion': (
            25 if invocation_log['errors'] == [] else
            15 if invocation_log['recovered'] else
            0
        ),
        # Efficiency (20%): spending under the baseline scores higher
        'efficiency': min(20, 20 * (BASELINE_TOKENS / total_tokens)),
        # Output Quality (30%)
        'quality': (
            30 if invocation_log['user_accepted'] else
            20 if invocation_log['user_edit_ratio'] < 0.2 else
            10 if invocation_log['user_edit_ratio'] < 0.5 else
            0
        ),
        # User Satisfaction (25%)
        'satisfaction': (
            25 if invocation_log['explicit_rating'] == 'positive' else
            15 if invocation_log['no_follow_up'] else
            5 if invocation_log['follow_up_resolved'] else
            0
        ),
    }
    return sum(scores.values())
```
| Score Range | Quality Level | Action |
|---|---|---|
| 90-100 | Excellent | Document as exemplar |
| 75-89 | Good | Monitor for consistency |
| 50-74 | Acceptable | Review for improvements |
| 25-49 | Poor | Prioritize fixes |
| 0-24 | Failing | Immediate intervention |
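The table reduces to a simple lookup; `quality_tier` is an illustrative name.

```python
def quality_tier(score):
    """Map a 0-100 quality score to the tiers in the table above."""
    if score >= 90:
        return ("Excellent", "Document as exemplar")
    if score >= 75:
        return ("Good", "Monitor for consistency")
    if score >= 50:
        return ("Acceptable", "Review for improvements")
    if score >= 25:
        return ("Poor", "Prioritize fixes")
    return ("Failing", "Immediate intervention")

print(quality_tier(85))  # → ('Good', 'Monitor for consistency')
```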
```sql
CREATE TABLE skill_invocations (
    id TEXT PRIMARY KEY,
    skill_name TEXT NOT NULL,
    skill_version TEXT,
    timestamp DATETIME DEFAULT CURRENT_TIMESTAMP,
    -- Input
    user_query TEXT,
    context_tokens INTEGER,
    -- Execution
    duration_ms INTEGER,
    tokens_input INTEGER,
    tokens_output INTEGER,
    tool_calls_json TEXT,
    errors_json TEXT,
    -- Output
    output_type TEXT,
    artifacts_json TEXT,
    response_length INTEGER,
    -- Quality signals
    user_accepted BOOLEAN,
    user_edit_ratio REAL,
    follow_up_needed BOOLEAN,
    explicit_rating TEXT,
    -- Computed
    quality_score REAL
);

-- SQLite has no inline INDEX clause; create the indexes separately
CREATE INDEX idx_skill_name ON skill_invocations (skill_name);
CREATE INDEX idx_timestamp ON skill_invocations (timestamp);
CREATE INDEX idx_quality ON skill_invocations (quality_score);

CREATE TABLE skill_aggregates (
    skill_name TEXT,
    period TEXT, -- 'daily', 'weekly', 'monthly'
    period_start DATE,
    invocation_count INTEGER,
    avg_quality_score REAL,
    error_rate REAL,
    avg_tokens_used INTEGER,
    avg_duration_ms INTEGER,
    PRIMARY KEY (skill_name, period, period_start)
);
```
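The rollup from `skill_invocations` into `skill_aggregates` can be sketched against an in-memory database. This uses trimmed versions of both tables (token and duration columns omitted) and sample data, so treat it as a shape demo rather than the production job.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE skill_invocations (
    id TEXT PRIMARY KEY,
    skill_name TEXT NOT NULL,
    timestamp DATETIME,
    quality_score REAL,
    errors_json TEXT
);
CREATE TABLE skill_aggregates (
    skill_name TEXT,
    period TEXT,
    period_start DATE,
    invocation_count INTEGER,
    avg_quality_score REAL,
    error_rate REAL,
    PRIMARY KEY (skill_name, period, period_start)
);
""")
conn.executemany(
    "INSERT INTO skill_invocations VALUES (?, ?, ?, ?, ?)",
    [
        ("a", "skill-coach", "2025-01-15 10:00:00", 90.0, "[]"),
        ("b", "skill-coach", "2025-01-15 11:00:00", 70.0, '["timeout"]'),
        ("c", "skill-coach", "2025-01-16 09:00:00", 80.0, "[]"),
    ],
)
# Daily rollup: one aggregate row per (skill, day)
conn.execute("""
    INSERT INTO skill_aggregates
    SELECT skill_name, 'daily', date(timestamp),
           COUNT(*), AVG(quality_score), AVG(errors_json != '[]')
    FROM skill_invocations
    GROUP BY skill_name, date(timestamp)
""")
for row in conn.execute(
    "SELECT period_start, invocation_count, avg_quality_score, error_rate "
    "FROM skill_aggregates ORDER BY period_start"
):
    print(row)
# → ('2025-01-15', 2, 80.0, 0.5)
# → ('2025-01-16', 1, 80.0, 0.0)
```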
```json
{
  "logs_version": "1.0",
  "skill_name": "wedding-immortalist",
  "entries": [
    {
      "id": "uuid",
      "timestamp": "2025-01-15T14:30:00Z",
      "input": {...},
      "execution": {...},
      "output": {...},
      "quality": {
        "signals": {...},
        "score": 85,
        "computed_at": "2025-01-15T14:35:00Z"
      }
    }
  ]
}
```
```sql
-- Overall skill rankings
SELECT
    skill_name,
    COUNT(*) AS uses,
    AVG(quality_score) AS avg_quality,
    AVG(tokens_output) AS avg_tokens,
    SUM(CASE WHEN errors_json != '[]' THEN 1 ELSE 0 END) * 100.0 / COUNT(*) AS error_rate
FROM skill_invocations
WHERE timestamp > datetime('now', '-30 days')
GROUP BY skill_name
ORDER BY avg_quality DESC;

-- Quality trend (weekly)
SELECT
    skill_name,
    strftime('%Y-%W', timestamp) AS week,
    AVG(quality_score) AS avg_quality,
    COUNT(*) AS uses
FROM skill_invocations
GROUP BY skill_name, week
ORDER BY skill_name, week;

-- Problem detection
SELECT skill_name, COUNT(*) AS failures
FROM skill_invocations
WHERE quality_score < 50
  AND timestamp > datetime('now', '-7 days')
GROUP BY skill_name
HAVING failures >= 3
ORDER BY failures DESC;
```
```python
def identify_improvement_opportunities(skill_name, logs):
    """Analyze logs to suggest skill improvements.

    Relies on analysis helpers (extract_follow_up_patterns,
    analyze_edit_patterns, cluster_errors) defined elsewhere.
    """
    opportunities = []

    # Pattern 1: Common follow-up questions
    follow_ups = extract_follow_up_patterns(logs)
    if follow_ups:
        opportunities.append({
            'type': 'missing_capability',
            'description': f'Users frequently ask: {follow_ups[0]}',
            'suggestion': 'Add guidance for this common need'
        })

    # Pattern 2: High edit ratio in specific output types
    edit_patterns = analyze_edit_patterns(logs)
    if edit_patterns['code'] > 0.4:
        opportunities.append({
            'type': 'code_quality',
            'description': 'Users frequently edit generated code',
            'suggestion': 'Review code examples and templates'
        })

    # Pattern 3: Repeated errors
    error_patterns = cluster_errors(logs)
    for error_type, count in error_patterns:
        if count >= 3:
            opportunities.append({
                'type': 'recurring_error',
                'description': f'{error_type} occurred {count} times',
                'suggestion': 'Add error handling or documentation'
            })

    return opportunities
```
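The analysis helpers are left undefined above. As one possible sketch, a minimal `cluster_errors` could bucket errors by the first token of each message; real clustering would normalize messages more carefully.

```python
from collections import Counter

def cluster_errors(logs):
    """Hypothetical helper: group logged errors by type and count them.

    Assumes each log entry carries a list of error strings under 'errors';
    the 'type' here is simply the first word of the message.
    """
    counts = Counter()
    for entry in logs:
        for err in entry.get("errors", []):
            counts[err.split()[0]] += 1
    return counts.most_common()

logs = [
    {"errors": ["TimeoutError waiting for Bash"]},
    {"errors": ["TimeoutError waiting for Bash", "FileNotFoundError config.json"]},
    {"errors": []},
    {"errors": ["TimeoutError waiting for Write"]},
]
print(cluster_errors(logs))  # → [('TimeoutError', 3), ('FileNotFoundError', 1)]
```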
```python
# hooks/skill_logger.py
import json
import sqlite3
import uuid
from datetime import datetime
from pathlib import Path

LOG_DB = Path.home() / '.claude' / 'skill_logs.db'

def log_skill_invocation(
    skill_name: str,
    user_query: str,
    output: str,
    tool_calls: list,
    duration_ms: int,
    tokens: dict,
    errors: list = None
):
    """Log a skill invocation to the database."""
    conn = sqlite3.connect(LOG_DB)
    cursor = conn.cursor()
    cursor.execute('''
        INSERT INTO skill_invocations
            (id, skill_name, timestamp, user_query, duration_ms,
             tokens_input, tokens_output, tool_calls_json, errors_json,
             response_length)
        VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
    ''', (
        str(uuid.uuid4()),
        skill_name,
        datetime.utcnow().isoformat(),
        user_query,
        duration_ms,
        tokens.get('input', 0),
        tokens.get('output', 0),
        json.dumps(tool_calls),
        json.dumps(errors or []),
        len(output)
    ))
    conn.commit()
    conn.close()
```
```python
def collect_quality_signals(invocation_id: str, signals: dict):
    """Update an invocation with quality signals."""
    conn = sqlite3.connect(LOG_DB)
    cursor = conn.cursor()
    # Update with user feedback and the recomputed score
    cursor.execute('''
        UPDATE skill_invocations
        SET user_accepted = ?,
            user_edit_ratio = ?,
            follow_up_needed = ?,
            explicit_rating = ?,
            quality_score = ?
        WHERE id = ?
    ''', (
        signals.get('accepted'),
        signals.get('edit_ratio'),
        signals.get('follow_up'),
        signals.get('rating'),
        calculate_score(signals),  # signal-based scorer, assumed defined alongside calculate_skill_score
        invocation_id
    ))
    conn.commit()
    conn.close()
```
```python
ALERT_CONDITIONS = {
    'quality_drop': {
        'condition': 'avg_quality_7d < avg_quality_30d * 0.8',
        'message': 'Skill {skill} quality dropped 20%+ in past week',
        'severity': 'warning'
    },
    'error_spike': {
        'condition': 'error_rate_24h > error_rate_7d * 2',
        'message': 'Skill {skill} error rate doubled in past 24h',
        'severity': 'critical'
    },
    'underused': {
        'condition': 'uses_7d < uses_30d_avg * 0.5',
        'message': 'Skill {skill} usage down 50%+ this week',
        'severity': 'info'
    },
    'high_performer': {
        'condition': 'avg_quality_7d > 90 AND uses_7d > 10',
        'message': 'Skill {skill} performing excellently',
        'severity': 'positive'
    }
}
```
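A checker for these conditions might look like the following sketch; the `stats` keys (`avg_quality_7d`, `error_rate_24h`, and so on) are assumed to come from a separate aggregation step.

```python
def check_alerts(skill_name, stats):
    """Evaluate the alert conditions above against precomputed stats."""
    alerts = []
    if stats["avg_quality_7d"] < stats["avg_quality_30d"] * 0.8:
        alerts.append(("warning", f"Skill {skill_name} quality dropped 20%+ in past week"))
    if stats["error_rate_24h"] > stats["error_rate_7d"] * 2:
        alerts.append(("critical", f"Skill {skill_name} error rate doubled in past 24h"))
    if stats["uses_7d"] < stats["uses_30d_avg"] * 0.5:
        alerts.append(("info", f"Skill {skill_name} usage down 50%+ this week"))
    if stats["avg_quality_7d"] > 90 and stats["uses_7d"] > 10:
        alerts.append(("positive", f"Skill {skill_name} performing excellently"))
    return alerts

stats = {
    "avg_quality_7d": 60, "avg_quality_30d": 85,
    "error_rate_24h": 0.10, "error_rate_7d": 0.04,
    "uses_7d": 20, "uses_30d_avg": 22,
}
for severity, msg in check_alerts("legacy-code-converter", stats):
    print(severity, msg)  # fires the quality_drop and error_spike alerts
```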
Wrong: logging complete input/output for every invocation. Why: privacy concerns, storage explosion, noise. Right: log metadata and summaries, with opt-in detailed logging.
Wrong: calculating the quality score immediately after completion. Why: misses delayed signals (did the code work? was it reverted?). Right: collect signals over time and recalculate periodically.
Wrong: tracking only average quality scores. Why: hides the distribution and misses failure modes. Right: track percentiles, failure rates, and patterns.
Wrong: measuring quality without establishing baselines. Why: improvements and regressions cannot be detected. Right: establish a baseline per skill and compare trends.
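Following the first guideline (log metadata and summaries, not full payloads), a summarizer might capture just the length, line count, and a short preview; `summarize_output` is an illustrative name.

```python
def summarize_output(output: str, max_chars: int = 200):
    """Build a compact log entry instead of storing the full payload."""
    return {
        "response_length": len(output),
        "line_count": output.count("\n") + 1,
        "preview": output[:max_chars],
    }

summary = summarize_output("def f():\n    return 42\n" * 50, max_chars=40)
print(summary["response_length"], summary["line_count"])  # → 1150 101
```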
# Skill Health Report - Week of 2025-01-13
## Overview
- Total invocations: 247
- Average quality: 78.3 (up 2.1 from last week)
- Error rate: 4.2% (down 1.8%)
## Top Performers
1. **wedding-immortalist** - 92.1 avg quality, 18 uses
2. **skill-coach** - 89.4 avg quality, 34 uses
3. **api-architect** - 87.2 avg quality, 22 uses
## Needs Attention
1. **legacy-code-converter** - 52.3 avg quality (down 15%)
- Common issue: Missing dependency detection
- Suggested fix: Add dependency scanning step
## Improvement Opportunities
- `partner-text-coach`: Users frequently ask for tone adjustment
- `yard-landscaper`: High edit ratio on plant recommendations
Core philosophy: what gets measured gets improved. Skill logging turns intuition about skill quality into actionable data, enabling continuous improvement across the entire skill ecosystem.
Weekly Installs: 52
Repository: erichowens/some_claude_skills
GitHub Stars: 78
First Seen: Jan 24, 2026
Security Audits: Gen Agent Trust Hub Pass, Socket Pass, Snyk Pass
Installed on: gemini-cli (47), codex (47), cursor (46), opencode (46), github-copilot (44), cline (39)