skill-logger by erichowens/some_claude_skills
npx skills add https://github.com/erichowens/some_claude_skills --skill skill-logger

Track, measure, and improve skill quality through systematic logging and scoring.
Use for:
NOT for:
┌────────────────────────────────────────────────────────────────┐
│ SKILL LOGGING PIPELINE │
├────────────────────────────────────────────────────────────────┤
│ │
│ 1. CAPTURE 2. ANALYZE 3. SCORE │
│ ├─ Invocation ├─ Output parse ├─ Quality metrics │
│ ├─ Input context ├─ Token usage ├─ User satisfaction │
│ ├─ Output ├─ Tool calls ├─ Goal completion │
│ └─ Timing └─ Error patterns └─ Efficiency │
│ │
│ 4. AGGREGATE 5. ALERT 6. IMPROVE │
│ ├─ Per-skill stats ├─ Quality drops ├─ Identify patterns │
│ ├─ Trend analysis ├─ Error spikes ├─ Suggest changes │
│ └─ Comparisons └─ Underuse └─ Track experiments │
│ │
└────────────────────────────────────────────────────────────────┘
```json
{
  "invocation_id": "uuid",
  "timestamp": "ISO8601",
  "skill_name": "wedding-immortalist",
  "skill_version": "1.2.0",
  "input": {
    "user_query": "Create a 3D model from my wedding photos",
    "context_tokens": 1500,
    "files_referenced": ["photos/", "config.json"]
  },
  "execution": {
    "duration_ms": 45000,
    "tool_calls": [
      {"tool": "Bash", "count": 5},
      {"tool": "Write", "count": 3}
    ],
    "tokens_used": {
      "input": 8500,
      "output": 3200
    },
    "errors": []
  },
  "output": {
    "type": "code_generation",
    "artifacts_created": ["pipeline.py", "config.yaml"],
    "response_length": 3200
  }
}
```
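A record in this shape can be assembled with a small helper. This is a sketch: `build_invocation_record` is an illustrative name, and only a subset of the fields above is filled in.

```python
import json
import uuid
from datetime import datetime, timezone

def build_invocation_record(skill_name, skill_version, user_query,
                            duration_ms, tokens_used, errors=None):
    """Assemble an invocation record matching the log format above."""
    return {
        "invocation_id": str(uuid.uuid4()),
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "skill_name": skill_name,
        "skill_version": skill_version,
        "input": {"user_query": user_query},
        "execution": {
            "duration_ms": duration_ms,
            "tokens_used": tokens_used,
            "errors": errors or [],
        },
    }

record = build_invocation_record(
    "wedding-immortalist", "1.2.0",
    "Create a 3D model from my wedding photos",
    45000, {"input": 8500, "output": 3200},
)
print(json.dumps(record, indent=2))
```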
```python
QUALITY_SIGNALS = {
    # Implicit signals (automated)
    'completion': 'Did the skill complete without errors?',
    'token_efficiency': 'Output quality per token used',
    'tool_success_rate': 'Tool calls that succeeded',
    'retry_count': 'How many retries needed?',

    # Explicit signals (user feedback)
    'user_edit_ratio': 'How much did user modify output?',
    'user_accepted': 'Did user accept/use the output?',
    'follow_up_needed': 'Did user need to ask for fixes?',
    'explicit_rating': 'Thumbs up/down if available',

    # Outcome signals (delayed)
    'code_ran_successfully': 'Did generated code work?',
    'tests_passed': 'Did it pass tests?',
    'reverted': 'Was the output later reverted?',
}
```
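An implicit signal such as `tool_success_rate` can be computed directly from the logged tool calls. This sketch assumes each call records a boolean `succeeded` field, which is not part of the log format above; adapt it to whatever your harness actually captures.

```python
def tool_success_rate(tool_calls):
    """Fraction of tool calls that succeeded (implicit quality signal)."""
    if not tool_calls:
        return 1.0  # no calls, nothing failed
    succeeded = sum(1 for call in tool_calls if call.get("succeeded", True))
    return succeeded / len(tool_calls)

calls = [
    {"tool": "Bash", "succeeded": True},
    {"tool": "Write", "succeeded": False},
    {"tool": "Bash", "succeeded": True},
]
print(tool_success_rate(calls))  # 2 of 3 calls succeeded
```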
```python
BASELINE_TOKENS = 10000  # illustrative baseline; calibrate per skill

def calculate_skill_score(invocation_log):
    """Score a skill invocation 0-100.

    Expects a flattened log dict that includes the quality-signal fields
    ('recovered', 'user_accepted', ...) alongside the execution data.
    """
    # tokens_used is a dict ({'input': ..., 'output': ...}) in the log format
    total_tokens = (invocation_log['tokens_used']['input']
                    + invocation_log['tokens_used']['output'])
    scores = {
        # Completion (25%)
        'completion': (
            25 if invocation_log['errors'] == [] else
            15 if invocation_log['recovered'] else
            0
        ),
        # Efficiency (20%): spending under the baseline scores higher
        'efficiency': min(20, 20 * (BASELINE_TOKENS / total_tokens)),
        # Output Quality (30%)
        'quality': (
            30 if invocation_log['user_accepted'] else
            20 if invocation_log['user_edit_ratio'] < 0.2 else
            10 if invocation_log['user_edit_ratio'] < 0.5 else
            0
        ),
        # User Satisfaction (25%)
        'satisfaction': (
            25 if invocation_log['explicit_rating'] == 'positive' else
            15 if invocation_log['no_follow_up'] else
            5 if invocation_log['follow_up_resolved'] else
            0
        ),
    }
    return sum(scores.values())
```
| Score Range | Quality Level | Action |
|---|---|---|
| 90-100 | Excellent | Document as exemplar |
| 75-89 | Good | Monitor for consistency |
| 50-74 | Acceptable | Review for improvements |
| 25-49 | Poor | Prioritize fixes |
| 0-24 | Failing | Immediate intervention |
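The table reduces to a simple lookup; `quality_tier` is an illustrative name.

```python
def quality_tier(score):
    """Map a 0-100 quality score to the tiers in the table above."""
    if score >= 90:
        return ("Excellent", "Document as exemplar")
    if score >= 75:
        return ("Good", "Monitor for consistency")
    if score >= 50:
        return ("Acceptable", "Review for improvements")
    if score >= 25:
        return ("Poor", "Prioritize fixes")
    return ("Failing", "Immediate intervention")

print(quality_tier(85))  # → ('Good', 'Monitor for consistency')
```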
```sql
CREATE TABLE skill_invocations (
    id TEXT PRIMARY KEY,
    skill_name TEXT NOT NULL,
    skill_version TEXT,
    timestamp DATETIME DEFAULT CURRENT_TIMESTAMP,
    -- Input
    user_query TEXT,
    context_tokens INTEGER,
    -- Execution
    duration_ms INTEGER,
    tokens_input INTEGER,
    tokens_output INTEGER,
    tool_calls_json TEXT,
    errors_json TEXT,
    -- Output
    output_type TEXT,
    artifacts_json TEXT,
    response_length INTEGER,
    -- Quality signals
    user_accepted BOOLEAN,
    user_edit_ratio REAL,
    follow_up_needed BOOLEAN,
    explicit_rating TEXT,
    -- Computed
    quality_score REAL
);

-- SQLite has no inline INDEX clause; create the indexes separately
CREATE INDEX idx_skill_name ON skill_invocations (skill_name);
CREATE INDEX idx_timestamp ON skill_invocations (timestamp);
CREATE INDEX idx_quality ON skill_invocations (quality_score);

CREATE TABLE skill_aggregates (
    skill_name TEXT,
    period TEXT, -- 'daily', 'weekly', 'monthly'
    period_start DATE,
    invocation_count INTEGER,
    avg_quality_score REAL,
    error_rate REAL,
    avg_tokens_used INTEGER,
    avg_duration_ms INTEGER,
    PRIMARY KEY (skill_name, period, period_start)
);
```
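The rollup from `skill_invocations` into `skill_aggregates` can be sketched against an in-memory database. This uses trimmed versions of both tables (token and duration columns omitted) and sample data, so treat it as a shape demo rather than the production job.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE skill_invocations (
    id TEXT PRIMARY KEY,
    skill_name TEXT NOT NULL,
    timestamp DATETIME,
    quality_score REAL,
    errors_json TEXT
);
CREATE TABLE skill_aggregates (
    skill_name TEXT,
    period TEXT,
    period_start DATE,
    invocation_count INTEGER,
    avg_quality_score REAL,
    error_rate REAL,
    PRIMARY KEY (skill_name, period, period_start)
);
""")
conn.executemany(
    "INSERT INTO skill_invocations VALUES (?, ?, ?, ?, ?)",
    [
        ("a", "skill-coach", "2025-01-15 10:00:00", 90.0, "[]"),
        ("b", "skill-coach", "2025-01-15 11:00:00", 70.0, '["timeout"]'),
        ("c", "skill-coach", "2025-01-16 09:00:00", 80.0, "[]"),
    ],
)
# Daily rollup: one aggregate row per (skill, day)
conn.execute("""
    INSERT INTO skill_aggregates
    SELECT skill_name, 'daily', date(timestamp),
           COUNT(*), AVG(quality_score), AVG(errors_json != '[]')
    FROM skill_invocations
    GROUP BY skill_name, date(timestamp)
""")
for row in conn.execute(
    "SELECT period_start, invocation_count, avg_quality_score, error_rate "
    "FROM skill_aggregates ORDER BY period_start"
):
    print(row)
# → ('2025-01-15', 2, 80.0, 0.5)
# → ('2025-01-16', 1, 80.0, 0.0)
```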
```json
{
  "logs_version": "1.0",
  "skill_name": "wedding-immortalist",
  "entries": [
    {
      "id": "uuid",
      "timestamp": "2025-01-15T14:30:00Z",
      "input": {...},
      "execution": {...},
      "output": {...},
      "quality": {
        "signals": {...},
        "score": 85,
        "computed_at": "2025-01-15T14:35:00Z"
      }
    }
  ]
}
```
```sql
-- Overall skill rankings
SELECT
    skill_name,
    COUNT(*) AS uses,
    AVG(quality_score) AS avg_quality,
    AVG(tokens_output) AS avg_tokens,
    SUM(CASE WHEN errors_json != '[]' THEN 1 ELSE 0 END) * 100.0 / COUNT(*) AS error_rate
FROM skill_invocations
WHERE timestamp > datetime('now', '-30 days')
GROUP BY skill_name
ORDER BY avg_quality DESC;

-- Quality trend (weekly)
SELECT
    skill_name,
    strftime('%Y-%W', timestamp) AS week,
    AVG(quality_score) AS avg_quality,
    COUNT(*) AS uses
FROM skill_invocations
GROUP BY skill_name, week
ORDER BY skill_name, week;

-- Problem detection
SELECT skill_name, COUNT(*) AS failures
FROM skill_invocations
WHERE quality_score < 50
  AND timestamp > datetime('now', '-7 days')
GROUP BY skill_name
HAVING failures >= 3
ORDER BY failures DESC;
```
```python
def identify_improvement_opportunities(skill_name, logs):
    """Analyze logs to suggest skill improvements.

    Relies on analysis helpers (extract_follow_up_patterns,
    analyze_edit_patterns, cluster_errors) defined elsewhere.
    """
    opportunities = []

    # Pattern 1: Common follow-up questions
    follow_ups = extract_follow_up_patterns(logs)
    if follow_ups:
        opportunities.append({
            'type': 'missing_capability',
            'description': f'Users frequently ask: {follow_ups[0]}',
            'suggestion': 'Add guidance for this common need'
        })

    # Pattern 2: High edit ratio in specific output types
    edit_patterns = analyze_edit_patterns(logs)
    if edit_patterns['code'] > 0.4:
        opportunities.append({
            'type': 'code_quality',
            'description': 'Users frequently edit generated code',
            'suggestion': 'Review code examples and templates'
        })

    # Pattern 3: Repeated errors
    error_patterns = cluster_errors(logs)
    for error_type, count in error_patterns:
        if count >= 3:
            opportunities.append({
                'type': 'recurring_error',
                'description': f'{error_type} occurred {count} times',
                'suggestion': 'Add error handling or documentation'
            })

    return opportunities
```
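The analysis helpers are left undefined above. As one possible sketch, a minimal `cluster_errors` could bucket errors by the first token of each message; real clustering would normalize messages more carefully.

```python
from collections import Counter

def cluster_errors(logs):
    """Hypothetical helper: group logged errors by type and count them.

    Assumes each log entry carries a list of error strings under 'errors';
    the 'type' here is simply the first word of the message.
    """
    counts = Counter()
    for entry in logs:
        for err in entry.get("errors", []):
            counts[err.split()[0]] += 1
    return counts.most_common()

logs = [
    {"errors": ["TimeoutError waiting for Bash"]},
    {"errors": ["TimeoutError waiting for Bash", "FileNotFoundError config.json"]},
    {"errors": []},
    {"errors": ["TimeoutError waiting for Write"]},
]
print(cluster_errors(logs))  # → [('TimeoutError', 3), ('FileNotFoundError', 1)]
```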
```python
# hooks/skill_logger.py
import json
import sqlite3
import uuid
from datetime import datetime
from pathlib import Path

LOG_DB = Path.home() / '.claude' / 'skill_logs.db'

def log_skill_invocation(
    skill_name: str,
    user_query: str,
    output: str,
    tool_calls: list,
    duration_ms: int,
    tokens: dict,
    errors: list = None
):
    """Log a skill invocation to the database."""
    conn = sqlite3.connect(LOG_DB)
    cursor = conn.cursor()
    cursor.execute('''
        INSERT INTO skill_invocations
            (id, skill_name, timestamp, user_query, duration_ms,
             tokens_input, tokens_output, tool_calls_json, errors_json,
             response_length)
        VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
    ''', (
        str(uuid.uuid4()),
        skill_name,
        datetime.utcnow().isoformat(),
        user_query,
        duration_ms,
        tokens.get('input', 0),
        tokens.get('output', 0),
        json.dumps(tool_calls),
        json.dumps(errors or []),
        len(output)
    ))
    conn.commit()
    conn.close()
```
```python
def collect_quality_signals(invocation_id: str, signals: dict):
    """Update an invocation with quality signals."""
    conn = sqlite3.connect(LOG_DB)
    cursor = conn.cursor()
    # Update with user feedback and the recomputed score
    cursor.execute('''
        UPDATE skill_invocations
        SET user_accepted = ?,
            user_edit_ratio = ?,
            follow_up_needed = ?,
            explicit_rating = ?,
            quality_score = ?
        WHERE id = ?
    ''', (
        signals.get('accepted'),
        signals.get('edit_ratio'),
        signals.get('follow_up'),
        signals.get('rating'),
        calculate_score(signals),  # signal-based scorer, assumed defined alongside calculate_skill_score
        invocation_id
    ))
    conn.commit()
    conn.close()
```
```python
ALERT_CONDITIONS = {
    'quality_drop': {
        'condition': 'avg_quality_7d < avg_quality_30d * 0.8',
        'message': 'Skill {skill} quality dropped 20%+ in past week',
        'severity': 'warning'
    },
    'error_spike': {
        'condition': 'error_rate_24h > error_rate_7d * 2',
        'message': 'Skill {skill} error rate doubled in past 24h',
        'severity': 'critical'
    },
    'underused': {
        'condition': 'uses_7d < uses_30d_avg * 0.5',
        'message': 'Skill {skill} usage down 50%+ this week',
        'severity': 'info'
    },
    'high_performer': {
        'condition': 'avg_quality_7d > 90 AND uses_7d > 10',
        'message': 'Skill {skill} performing excellently',
        'severity': 'positive'
    }
}
```
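A checker for these conditions might look like the following sketch; the `stats` keys (`avg_quality_7d`, `error_rate_24h`, and so on) are assumed to come from a separate aggregation step.

```python
def check_alerts(skill_name, stats):
    """Evaluate the alert conditions above against precomputed stats."""
    alerts = []
    if stats["avg_quality_7d"] < stats["avg_quality_30d"] * 0.8:
        alerts.append(("warning", f"Skill {skill_name} quality dropped 20%+ in past week"))
    if stats["error_rate_24h"] > stats["error_rate_7d"] * 2:
        alerts.append(("critical", f"Skill {skill_name} error rate doubled in past 24h"))
    if stats["uses_7d"] < stats["uses_30d_avg"] * 0.5:
        alerts.append(("info", f"Skill {skill_name} usage down 50%+ this week"))
    if stats["avg_quality_7d"] > 90 and stats["uses_7d"] > 10:
        alerts.append(("positive", f"Skill {skill_name} performing excellently"))
    return alerts

stats = {
    "avg_quality_7d": 60, "avg_quality_30d": 85,
    "error_rate_24h": 0.10, "error_rate_7d": 0.04,
    "uses_7d": 20, "uses_30d_avg": 22,
}
for severity, msg in check_alerts("legacy-code-converter", stats):
    print(severity, msg)  # fires the quality_drop and error_spike alerts
```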
Wrong: logging complete input/output for every invocation. Why: privacy concerns, storage explosion, noise. Right: log metadata and summaries, with opt-in detailed logging.
Wrong: calculating the quality score immediately after completion. Why: misses delayed signals (did the code work? was it reverted?). Right: collect signals over time and recalculate periodically.
Wrong: tracking only average quality scores. Why: hides the distribution and misses failure modes. Right: track percentiles, failure rates, and patterns.
Wrong: measuring quality without establishing baselines. Why: improvements and regressions cannot be detected. Right: establish a baseline per skill and compare trends.
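Following the first guideline (log metadata and summaries, not full payloads), a summarizer might capture just the length, line count, and a short preview; `summarize_output` is an illustrative name.

```python
def summarize_output(output: str, max_chars: int = 200):
    """Build a compact log entry instead of storing the full payload."""
    return {
        "response_length": len(output),
        "line_count": output.count("\n") + 1,
        "preview": output[:max_chars],
    }

summary = summarize_output("def f():\n    return 42\n" * 50, max_chars=40)
print(summary["response_length"], summary["line_count"])  # → 1150 101
```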
# Skill Health Report - Week of 2025-01-13
## Overview
- Total invocations: 247
- Average quality: 78.3 (up 2.1 from last week)
- Error rate: 4.2% (down 1.8%)
## Top Performers
1. **wedding-immortalist** - 92.1 avg quality, 18 uses
2. **skill-coach** - 89.4 avg quality, 34 uses
3. **api-architect** - 87.2 avg quality, 22 uses
## Needs Attention
1. **legacy-code-converter** - 52.3 avg quality (down 15%)
- Common issue: Missing dependency detection
- Suggested fix: Add dependency scanning step
## Improvement Opportunities
- `partner-text-coach`: Users frequently ask for tone adjustment
- `yard-landscaper`: High edit ratio on plant recommendations
Core philosophy: what gets measured gets improved. Skill logging turns intuition about skill quality into actionable data, enabling continuous improvement across the entire skill ecosystem.
Weekly Installs: 52
Repository: erichowens/some_claude_skills
GitHub Stars: 78
First Seen: Jan 24, 2026
Security Audits: Gen Agent Trust Hub Pass, Socket Pass, Snyk Pass
Installed on: gemini-cli (47), codex (47), cursor (46), opencode (46), github-copilot (44), cline (39)