csv-excel-merger by onewave-ai/claude-skills
npx skills add https://github.com/onewave-ai/claude-skills --skill csv-excel-merger智能合并多个 CSV 或 Excel 文件,具备自动列匹配和数据去重功能。
当用户需要合并 CSV 或 Excel 文件时:
列匹配: * 精确名称匹配:"email" = "email" * 不区分大小写:"Email" = "email" * 模糊匹配:"E-mail" ≈ "email" * 常见模式: * "first_name", "firstname", "First Name" → "first_name" * "phone", "phone_number", "tel" → "phone" * "email", "e-mail", "email_address" → "email"
冲突解决(当同一记录出现在多个文件中时): * 保留第一个:使用第一个文件中的值 * 保留最后一个:使用最后一个文件中的值(最新的) * 保留最长的:使用最完整的值 * 人工审核:标记冲突供用户审查 * 合并:合并非冲突字段
去重: * 根据主键识别重复行 * 选项:保留第一个、保留最后一个、保留全部、合并值 * 跟踪每一行的来源文件 4. 执行合并:
# 示例合并逻辑
import pandas as pd
# 读取文件
df1 = pd.read_csv('file1.csv')
df2 = pd.read_csv('file2.csv')
# 规范化列名
df1.columns = df1.columns.str.lower().str.strip()
df2.columns = df2.columns.str.lower().str.strip()
# 映射相似的列
column_mapping = {
'firstname': 'first_name',
'e_mail': 'email',
# ...
}
df2 = df2.rename(columns=column_mapping)
# 合并
merged = pd.concat([df1, df2], ignore_index=True)
# 去重
merged = merged.drop_duplicates(subset=['email'], keep='last')
# 保存
merged.to_csv('merged_output.csv', index=False)
广告位招租
在这里展示您的产品或服务
触达数万 AI 开发者,精准高效
5. 格式化输出:
📊 CSV/EXCEL MERGER REPORT
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
📁 INPUT FILES
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
File 1: contacts_jan.csv
Rows: 1,245
Columns: 8 (name, email, phone, company, ...)
File 2: contacts_feb.csv
Rows: 987
Columns: 9 (firstname, lastname, email, mobile, ...)
File 3: leads_export.xlsx
Rows: 2,103
Columns: 12 (full_name, email_address, phone, ...)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
🔄 COLUMN MAPPING
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Unified Schema:
• first_name ← [firstname, first name, fname]
• last_name ← [lastname, last name, lname]
• email ← [email, e-mail, email_address]
• phone ← [phone, mobile, phone_number, tel]
• company ← [company, organization, org]
• title ← [title, job_title, position]
• source ← [file origin tracking]
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
🔍 MERGE ANALYSIS
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Total rows before merge: 4,335
Duplicate records found: 892
Conflicts detected: 47
Deduplication Strategy: Keep most recent (by source file date)
Primary Key: email
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
⚠️ CONFLICTS
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Record: john.doe@example.com
File 1 phone: (555) 123-4567
File 2 phone: (555) 987-6543
Resolution: Kept most recent (File 2)
[List top 10 conflicts]
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
✅ MERGE RESULTS
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Output File: merged_contacts.csv
Total Rows: 3,443
Columns: 7
Duplicates Removed: 892
Breakdown by Source:
• contacts_jan.csv: 1,245 rows (398 unique)
• contacts_feb.csv: 987 rows (521 unique)
• leads_export.xlsx: 2,103 rows (2,524 unique)
Data Quality:
• Email completeness: 98.2%
• Phone completeness: 87.5%
• Company completeness: 91.3%
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
💡 RECOMMENDATIONS
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
• Review 47 conflict records manually
• Standardize phone number format
• Fill missing company names (8.7% incomplete)
• Export conflicts to: conflicts_review.csv
6. 处理特殊情况:
多个主键: * 使用复合键:(email + company) * 在模糊不清时提供选项
不同的数据类型: * 将日期转换为标准格式 * 规范化电话号码 * 标准化国家代码 * 清理空格和大小写
缺失的列: * 用空值填充 * 标记缺失数据 * 提供创建新列的选项
大文件: * 对大于 100MB 的文件使用分块处理 * 显示进度指示器 * 预估内存使用量 7. 生成代码:提供 Python/pandas 脚本,用于: * 读取所有文件 * 执行智能列匹配 * 根据策略去重 * 解决冲突 * 保存合并后的输出 * 生成详细报告 8. 导出选项: * CSV (UTF-8) * Excel (.xlsx) * JSON * SQL INSERT 语句 * Parquet(适用于大型数据集)
列匹配:
数据质量:
性能:
透明度:
确保合并操作:
生成干净、去重后的合并文件,并具备完全的透明度和数据质量检查。
周安装量
392
代码仓库
GitHub 星标
74
首次出现
Jan 24, 2026
安全审计
安装于
opencode351
gemini-cli337
codex335
cursor329
github-copilot323
kimi-cli305
Intelligently merge multiple CSV or Excel files with automatic column matching and data deduplication.
When a user needs to merge CSV or Excel files:
Analyze Input Files :
Inspect File Structures :
Create Merge Strategy :
Column Matching :
* Exact name match: "email" = "email"
* Case-insensitive: "Email" = "email"
* Fuzzy match: "E-mail" ≈ "email"
* Common patterns:
* "first_name", "firstname", "First Name" → "first_name"
* "phone", "phone_number", "tel" → "phone"
* "email", "e-mail", "email_address" → "email"
Conflict Resolution (when same record appears in multiple files):
* **Keep first** : Use value from first file
* **Keep last** : Use value from last file (most recent)
* **Keep longest** : Use most complete value
* **Manual review** : Flag conflicts for user review
* **Merge** : Combine non-conflicting fields
Deduplication :
* Identify duplicate rows based on primary key
* Options: keep first, keep last, keep all, merge values
* Track source file for each row
4. Perform Merge :
# Example merge logic
import pandas as pd
# Read files
df1 = pd.read_csv('file1.csv')
df2 = pd.read_csv('file2.csv')
# Normalize column names
df1.columns = df1.columns.str.lower().str.strip()
df2.columns = df2.columns.str.lower().str.strip()
# Map similar columns
column_mapping = {
'firstname': 'first_name',
'e_mail': 'email',
# ...
}
df2 = df2.rename(columns=column_mapping)
# Merge
merged = pd.concat([df1, df2], ignore_index=True)
# Deduplicate
merged = merged.drop_duplicates(subset=['email'], keep='last')
# Save
merged.to_csv('merged_output.csv', index=False)
5. Format Output :
📊 CSV/EXCEL MERGER REPORT
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
📁 INPUT FILES
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
File 1: contacts_jan.csv
Rows: 1,245
Columns: 8 (name, email, phone, company, ...)
File 2: contacts_feb.csv
Rows: 987
Columns: 9 (firstname, lastname, email, mobile, ...)
File 3: leads_export.xlsx
Rows: 2,103
Columns: 12 (full_name, email_address, phone, ...)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
🔄 COLUMN MAPPING
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Unified Schema:
• first_name ← [firstname, first name, fname]
• last_name ← [lastname, last name, lname]
• email ← [email, e-mail, email_address]
• phone ← [phone, mobile, phone_number, tel]
• company ← [company, organization, org]
• title ← [title, job_title, position]
• source ← [file origin tracking]
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
🔍 MERGE ANALYSIS
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Total rows before merge: 4,335
Duplicate records found: 892
Conflicts detected: 47
Deduplication Strategy: Keep most recent (by source file date)
Primary Key: email
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
⚠️ CONFLICTS
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Record: john.doe@example.com
File 1 phone: (555) 123-4567
File 2 phone: (555) 987-6543
Resolution: Kept most recent (File 2)
[List top 10 conflicts]
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
✅ MERGE RESULTS
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Output File: merged_contacts.csv
Total Rows: 3,443
Columns: 7
Duplicates Removed: 892
Breakdown by Source:
• contacts_jan.csv: 1,245 rows (398 unique)
• contacts_feb.csv: 987 rows (521 unique)
• leads_export.xlsx: 2,103 rows (2,524 unique)
Data Quality:
• Email completeness: 98.2%
• Phone completeness: 87.5%
• Company completeness: 91.3%
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
💡 RECOMMENDATIONS
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
• Review 47 conflict records manually
• Standardize phone number format
• Fill missing company names (8.7% incomplete)
• Export conflicts to: conflicts_review.csv
6. Handle Special Cases :
Multiple Primary Keys :
* Use compound keys: (email + company)
* Offer options when ambiguous
Different Data Types :
* Convert dates to standard format
* Normalize phone numbers
* Standardize country codes
* Clean whitespace and casing
Missing Columns :
* Fill with empty values
* Flag missing data
* Offer to create new columns
Large Files :
* Use chunking for files > 100MB
* Show progress indicator
* Estimate memory usage
7. Generate Code : Provide Python/pandas script that:
* Reads all files
* Performs intelligent column matching
* Deduplicates based on strategy
* Resolves conflicts
* Saves merged output
* Generates detailed report
8. Export Options :
* CSV (UTF-8)
* Excel (.xlsx)
* JSON
* SQL INSERT statements
* Parquet (for large datasets)
Column Matching :
Data Quality :
Performance :
Transparency :
Ensure merges:
Generate clean, deduplicated merged files with full transparency and data quality checks.
Weekly Installs
392
Repository
GitHub Stars
74
First Seen
Jan 24, 2026
Security Audits
Gen Agent Trust HubPassSocketPassSnykPass
Installed on
opencode351
gemini-cli337
codex335
cursor329
github-copilot323
kimi-cli305
Python PDF处理教程:合并拆分、提取文本表格、创建PDF文件
54,200 周安装
社交媒体内容策略指南:LinkedIn、Twitter、Instagram、TikTok、Facebook平台优化与内容创作模板
303 周安装
data-extractor 数据提取技能:从PDF、Word、Excel等文档自动提取结构化数据
304 周安装
架构决策框架:需求驱动架构设计,ADR记录决策,权衡分析指南
304 周安装
使用reveal.js创建HTML幻灯片 | 交互式演示文稿制作工具 | 代码高亮与动画效果
304 周安装
移动应用发布策略指南:从ASO优化到推广渠道的完整发布计划
304 周安装
计分卡营销系统:四步法生成高转化率潜在客户,互动评估提升线索质量
304 周安装