data-anonymizer by dkyazzentwatwa/chatgpt-skills
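npx skills add https://github.com/dkyazzentwatwa/chatgpt-skills --skill data-anonymizer
Detect and mask personally identifiable information (PII) in text documents and structured data. Supports multiple masking strategies and can process CSV files at scale.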
from scripts.data_anonymizer import DataAnonymizer
# Anonymize text
anonymizer = DataAnonymizer()
result = anonymizer.anonymize("Contact John Smith at john@email.com or 555-123-4567")
print(result)
# "Contact [NAME] at [EMAIL] or [PHONE]"
# Anonymize CSV
anonymizer.anonymize_csv("customers.csv", "customers_anon.csv")
anonymizer = DataAnonymizer(
    strategy="mask",   # mask, redact, hash, fake
    reversible=False   # set to True to enable token mapping
)
# Basic anonymization
result = anonymizer.anonymize(text)
# Restrict detection to specific PII types
result = anonymizer.anonymize(text, pii_types=["email", "phone"])
# Get a report of the detected PII
result, report = anonymizer.anonymize(text, return_report=True)
text = "Email john@test.com, call 555-1234"
# Mask (default): replace with type labels
anonymizer.strategy = "mask"
# "Email [EMAIL], call [PHONE]"
# Redact: replace with asterisks
anonymizer.strategy = "redact"
# "Email ***************, call ********"
# Hash: replace with a hash value
anonymizer.strategy = "hash"
# "Email a1b2c3d4, call e5f6g7h8"
# Fake: replace with realistic fake data
anonymizer.strategy = "fake"
# "Email jane@example.org, call 555-9876"
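The strategies above differ only in what is substituted for each detected span. The following is a minimal sketch of mask, redact, and hash applied to email addresses. It assumes a simplified regex and a truncated SHA-256 digest; `EMAIL_RE` and `apply_strategy` are illustrative names, not part of the skill, and the skill's real output format (for example the number of asterisks) may differ.

```python
import hashlib
import re

# Simplified email pattern for illustration only; production detection needs more care.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def apply_strategy(text: str, strategy: str) -> str:
    """Replace every email match in `text` according to `strategy` (hypothetical helper)."""

    def replace(match):
        value = match.group(0)
        if strategy == "mask":
            return "[EMAIL]"                      # type label
        if strategy == "redact":
            return "*" * len(value)               # one asterisk per original character
        if strategy == "hash":
            digest = hashlib.sha256(value.encode("utf-8")).hexdigest()
            return digest[:8]                     # truncated digest, assumed length
        raise ValueError(f"unknown strategy: {strategy}")

    return EMAIL_RE.sub(replace, text)


sample = "Email john@test.com, call 555-1234"
print(apply_strategy(sample, "mask"))    # Email [EMAIL], call 555-1234
print(apply_strategy(sample, "redact"))  # Email *************, call 555-1234
print(apply_strategy(sample, "hash"))
```

The phone number is left untouched here because the sketch only detects emails; a fuller version would run one pattern per PII type.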
# Auto-detect PII columns
anonymizer.anonymize_csv("input.csv", "output.csv")
# Specify columns
anonymizer.anonymize_csv(
    "input.csv",
    "output.csv",
    columns=["name", "email", "phone"]
)
# Different strategies per column
anonymizer.anonymize_csv(
    "input.csv",
    "output.csv",
    column_strategies={
        "name": "fake",
        "email": "hash",
        "ssn": "redact"
    }
)
anonymizer = DataAnonymizer(reversible=True)
# Anonymize with token mapping
result = anonymizer.anonymize("John Smith: john@test.com")
mapping = anonymizer.get_mapping()
# Save the mapping securely
anonymizer.save_mapping("mapping.json", encrypt=True, password="secret")
# Later, de-anonymize
anonymizer.load_mapping("mapping.json", password="secret")
original = anonymizer.deanonymize(result)
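Reversible anonymization relies on remembering which token replaced which original value. A minimal sketch of that idea follows, assuming numbered tokens and email-only detection. `TokenMapper` and the `[EMAIL_n]` token format are hypothetical; the skill's own mapping format and its encryption step are not shown in the source.

```python
import re

# Simplified email pattern for illustration only.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


class TokenMapper:
    """Hypothetical helper: swaps emails for numbered tokens and remembers the originals."""

    def __init__(self):
        self.token_to_value = {}
        self.value_to_token = {}

    def anonymize(self, text):
        def replace(match):
            value = match.group(0)
            if value not in self.value_to_token:
                # First time this value is seen: mint a new numbered token.
                token = f"[EMAIL_{len(self.value_to_token) + 1}]"
                self.value_to_token[value] = token
                self.token_to_value[token] = value
            return self.value_to_token[value]

        return EMAIL_RE.sub(replace, text)

    def deanonymize(self, text):
        # Restore every known token to its original value.
        for token, value in self.token_to_value.items():
            text = text.replace(token, value)
        return text


mapper = TokenMapper()
masked = mapper.anonymize("John Smith: john@test.com")
print(masked)                      # John Smith: [EMAIL_1]
print(mapper.deanonymize(masked))  # John Smith: john@test.com
```

Because the mapping can undo the anonymization, it is as sensitive as the original data, which is why the example above stores it encrypted.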
# Add a custom PII pattern
anonymizer.add_pattern(
    name="employee_id",
    pattern=r"EMP-\d{6}",
    label="[EMPLOYEE_ID]"
)
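It can help to check a custom pattern with plain `re` before registering it. The sketch below uses the `EMP-\d{6}` pattern from the example; the sample text is made up, and the word-boundary variant is a suggestion rather than something the skill requires.

```python
import re

EMPLOYEE_ID_RE = re.compile(r"EMP-\d{6}")

text = "Badge EMP-004217 was issued; EMP-12 is too short to match."
print(EMPLOYEE_ID_RE.sub("[EMPLOYEE_ID]", text))
# Badge [EMPLOYEE_ID] was issued; EMP-12 is too short to match.

# Without word boundaries the pattern also matches the first six digits of a
# longer number; adding \b on both sides rejects such partial matches.
STRICT_RE = re.compile(r"\bEMP-\d{6}\b")
print(bool(EMPLOYEE_ID_RE.search("EMP-1234567")))  # True
print(bool(STRICT_RE.search("EMP-1234567")))       # False
```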
# Anonymize a text file
python data_anonymizer.py --input document.txt --output document_anon.txt
# Anonymize a CSV
python data_anonymizer.py --input customers.csv --output customers_anon.csv
# Choose a strategy
python data_anonymizer.py --input data.csv --output anon.csv --strategy fake
# Generate an audit report
python data_anonymizer.py --input document.txt --report audit.json
# Detect specific PII types only
python data_anonymizer.py --input doc.txt --types email phone ssn
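| Argument | Description | Default |
|---|---|---|
| --input | Input file | Required |
| --output | Output file | Required |
| --strategy | Masking strategy | mask |
| --types | PII types to detect | all |
| --columns | CSV columns to process | auto |
| --report | Generate audit report | - |
| --reversible | Enable token mapping | False |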
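The table above could be declared with `argparse` roughly as follows. This is a sketch under assumptions, not the script's actual parser: `build_parser` is a hypothetical name, `--columns` is assumed to take space-separated values like `--types`, and `--output` follows the table in being required even though some of the examples above omit it.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented arguments (hypothetical reconstruction)."""
    parser = argparse.ArgumentParser(description="Anonymize PII in text or CSV files")
    parser.add_argument("--input", required=True, help="Input file")
    parser.add_argument("--output", required=True, help="Output file")
    parser.add_argument(
        "--strategy",
        choices=["mask", "redact", "hash", "fake"],
        default="mask",
        help="Masking strategy",
    )
    parser.add_argument("--types", nargs="+", default=["all"], help="PII types to detect")
    parser.add_argument("--columns", nargs="+", default=["auto"], help="CSV columns to process")
    parser.add_argument("--report", default=None, help="Path for the audit report")
    parser.add_argument("--reversible", action="store_true", help="Enable token mapping")
    return parser


args = build_parser().parse_args(
    ["--input", "doc.txt", "--output", "doc_anon.txt", "--types", "email", "phone", "ssn"]
)
print(args.strategy, args.types, args.reversible)
# mask ['email', 'phone', 'ssn'] False
```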
| Type | Examples | Pattern |
|---|---|---|
| name | John Smith, Mary Johnson | NLP-based |
| email | user@domain.com | Regex |
| phone | 555-123-4567, (555) 123-4567 | Regex |
| ssn | 123-45-6789 | Regex |
| credit_card | 4111-1111-1111-1111 | Regex + Luhn |
| address | 123 Main St, City, ST 12345 | NLP + Regex |
| date_of_birth | 01/15/1990, January 15, 1990 | Regex |
| ip_address | 192.168.1.1 | Regex |
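The "Regex + Luhn" entry for `credit_card` means a regex finds candidate digit groups and the Luhn checksum filters out random digit strings. A minimal sketch of the checksum step is shown below; `luhn_valid` is an illustrative name, and the 13 to 19 digit length limit is an assumption about typical card numbers rather than something stated by the skill.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]  # ignore dashes and spaces
    if not 13 <= len(digits) <= 19:                      # assumed card-number lengths
        return False
    total = 0
    # Walk from the rightmost digit, doubling every second one.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


print(luhn_valid("4111-1111-1111-1111"))  # True
print(luhn_valid("4111-1111-1111-1112"))  # False
```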
anonymizer = DataAnonymizer(strategy="mask")
log = """
Ticket #1234: Customer John Doe (john.doe@company.com) called about
billing issue. SSN on file: 123-45-6789. Callback number: 555-867-5309.
Address: 123 Oak Street, Springfield, IL 62701.
"""
result = anonymizer.anonymize(log)
print(result)
# Ticket #1234: Customer [NAME] ([EMAIL]) called about
# billing issue. SSN on file: [SSN]. Callback number: [PHONE].
# Address: [ADDRESS].
anonymizer = DataAnonymizer(strategy="hash")
# Consistent hashing for joins
anonymizer.anonymize_csv(
    "users.csv",
    "users_anon.csv",
    columns=["email", "name", "phone"]
)
anonymizer.anonymize_csv(
    "orders.csv",
    "orders_anon.csv",
    columns=["customer_email"]  # same hash as users.email
)
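Joins across anonymized files work only if the same input always yields the same token. One way to get that property is a keyed hash, sketched below with HMAC-SHA256 from the standard library. This is an assumption about how such hashing could work, not the skill's confirmed method; `SECRET_KEY`, `pseudonymize`, the lowercase normalization, and the 16-character truncation are all illustrative.

```python
import hashlib
import hmac

# Hypothetical key; in practice it should be kept out of source control.
SECRET_KEY = b"replace-with-a-secret-key"


def pseudonymize(value: str, key: bytes = SECRET_KEY) -> str:
    """Map a value to a stable token: same value and key always give the same token."""
    normalized = value.strip().lower()  # assumed normalization so casing does not break joins
    return hmac.new(key, normalized.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


users = [{"email": "Ann@Example.com", "plan": "pro"}]
orders = [{"customer_email": "ann@example.com", "total": 42}]

users_anon = [{**row, "email": pseudonymize(row["email"])} for row in users]
orders_anon = [
    {**row, "customer_email": pseudonymize(row["customer_email"])} for row in orders
]

# The two tables can still be joined on the pseudonymized email.
joined = [
    (u, o) for u in users_anon for o in orders_anon if u["email"] == o["customer_email"]
]
print(len(joined))  # 1
```

Using a keyed hash rather than a bare digest makes it harder to reverse tokens by hashing a list of known email addresses, as long as the key stays secret.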
anonymizer = DataAnonymizer(strategy="fake")
# Replace real PII with realistic fake data
anonymizer.anonymize_csv(
    "production_data.csv",
    "test_data.csv"
)
# Test data has the same structure but fake PII
pandas>=2.0.0
faker>=18.0.0
Weekly installs: 42
Repository: dkyazzentwatwa/chatgpt-skills
GitHub stars: 23
First seen: January 24, 2026
Installed on: opencode (34), claude-code (33), gemini-cli (33), codex (32), github-copilot (30), cursor (29)