citation-management by davila7/claude-code-templates
npx skills add https://github.com/davila7/claude-code-templates --skill citation-management
Manage citations systematically throughout the research and writing process. This skill provides tools and strategies for searching academic databases (Google Scholar, PubMed), extracting accurate metadata from multiple sources (CrossRef, PubMed, arXiv), validating citation information, and generating properly formatted BibTeX entries.
Critical for maintaining citation accuracy, avoiding reference errors, and ensuring reproducible research. Integrates seamlessly with the literature-review skill for comprehensive research workflows.
Use this skill when:
When creating documents with this skill, always consider adding scientific diagrams and schematics to enhance visual communication.
If your document does not already contain schematics or diagrams:
For new documents: Scientific schematics should be generated by default to visually represent key concepts, workflows, architectures, or relationships described in the text.
How to generate schematics:
python scripts/generate_schematic.py "your diagram description" -o figures/output.png
The AI will automatically:
When to add schematics:
For detailed guidance on creating schematics, refer to the scientific-schematics skill documentation.
Citation management follows a systematic process:
Goal: Find relevant papers using academic search engines.
Google Scholar provides the most comprehensive coverage across disciplines.
Basic Search:
# Search for papers on a topic
python scripts/search_google_scholar.py "CRISPR gene editing" \
--limit 50 \
--output results.json
# Search with year filter
python scripts/search_google_scholar.py "machine learning protein folding" \
--year-start 2020 \
--year-end 2024 \
--limit 100 \
--output ml_proteins.json
Advanced Search Strategies (see references/google_scholar_search.md):
"deep learning"
author:LeCun
intitle:"neural networks"
machine learning -survey

Best Practices:
PubMed specializes in biomedical and life sciences literature (35+ million citations).
Basic Search:
# Search PubMed
python scripts/search_pubmed.py "Alzheimer's disease treatment" \
--limit 100 \
--output alzheimers.json
# Search with MeSH terms and filters
python scripts/search_pubmed.py \
--query '"Alzheimer Disease"[MeSH] AND "Drug Therapy"[MeSH]' \
--date-start 2020 \
--date-end 2024 \
--publication-types "Clinical Trial,Review" \
--output alzheimers_trials.json
Advanced PubMed Queries (see references/pubmed_search.md):
"Diabetes Mellitus"[MeSH]
"cancer"[Title], "Smith J"[Author]
AND, OR, NOT
2020:2024[Publication Date]
"Review"[Publication Type]

Best Practices:
Goal: Convert paper identifiers (DOI, PMID, arXiv ID) to complete, accurate metadata.
For single DOIs, use the quick conversion tool:
# Convert single DOI
python scripts/doi_to_bibtex.py 10.1038/s41586-021-03819-2
# Convert multiple DOIs from a file
python scripts/doi_to_bibtex.py --input dois.txt --output references.bib
# Different output formats
python scripts/doi_to_bibtex.py 10.1038/nature12345 --format json
For DOIs, PMIDs, arXiv IDs, or URLs:
# Extract from DOI
python scripts/extract_metadata.py --doi 10.1038/s41586-021-03819-2
# Extract from PMID
python scripts/extract_metadata.py --pmid 34265844
# Extract from arXiv ID
python scripts/extract_metadata.py --arxiv 2103.14030
# Extract from URL
python scripts/extract_metadata.py --url "https://www.nature.com/articles/s41586-021-03819-2"
# Batch extraction from file (mixed identifiers)
python scripts/extract_metadata.py --input identifiers.txt --output citations.bib
Metadata Sources (see references/metadata_extraction.md):
CrossRef API: Primary source for DOIs
PubMed E-utilities: Biomedical literature
arXiv API: Preprints in physics, math, CS, and quantitative biology
DataCite API: Research datasets, software, and other resources

What Gets Extracted:
Goal: Generate clean, properly formatted BibTeX entries.
See references/bibtex_formatting.md for the complete guide.
Common Entry Types:
@article: Journal articles (most common)
@book: Books
@inproceedings: Conference papers
@incollection: Book chapters
@phdthesis: Dissertations
@misc: Preprints, software, datasets

Required Fields by Type:
@article{citationkey,
author = {Last1, First1 and Last2, First2},
title = {Article Title},
journal = {Journal Name},
year = {2024},
volume = {10},
number = {3},
pages = {123--145},
doi = {10.1234/example}
}
@inproceedings{citationkey,
author = {Last, First},
title = {Paper Title},
booktitle = {Conference Name},
year = {2024},
pages = {1--10}
}
@book{citationkey,
author = {Last, First},
title = {Book Title},
publisher = {Publisher Name},
year = {2024}
}
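The list above also includes @misc for preprints, software, and datasets; no template is shown for it, so here is an illustrative arXiv-style entry (all field values are placeholders, following the same pattern as the templates above):

```bibtex
@misc{citationkey,
  author        = {Last, First},
  title         = {Preprint Title},
  year          = {2024},
  eprint        = {2103.14030},
  archivePrefix = {arXiv},
  primaryClass  = {q-bio.BM},
  note          = {Preprint}
}
```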
Use the formatter to standardize BibTeX files:
# Format and clean BibTeX file
python scripts/format_bibtex.py references.bib \
--output formatted_references.bib
# Sort entries by citation key
python scripts/format_bibtex.py references.bib \
--sort key \
--output sorted_references.bib
# Sort by year (newest first)
python scripts/format_bibtex.py references.bib \
--sort year \
--descending \
--output sorted_references.bib
# Remove duplicates
python scripts/format_bibtex.py references.bib \
--deduplicate \
--output clean_references.bib
# Validate and report issues
python scripts/format_bibtex.py references.bib \
--validate \
--report validation_report.txt
Formatting Operations:
Goal: Verify all citations are accurate and complete.
# Validate BibTeX file
python scripts/validate_citations.py references.bib
# Validate and fix common issues
python scripts/validate_citations.py references.bib \
--auto-fix \
--output validated_references.bib
# Generate detailed validation report
python scripts/validate_citations.py references.bib \
--report validation_report.json \
--verbose
Validation Checks (see references/citation_validation.md):
DOI Verification:
Required Fields:
Data Consistency:
Duplicate Detection:
Format Compliance:
Validation Output:
{
"total_entries": 150,
"valid_entries": 145,
"errors": [
{
"citation_key": "Smith2023",
"error_type": "missing_field",
"field": "journal",
"severity": "high"
},
{
"citation_key": "Jones2022",
"error_type": "invalid_doi",
"doi": "10.1234/broken",
"severity": "high"
}
],
"warnings": [
{
"citation_key": "Brown2021",
"warning_type": "possible_duplicate",
"duplicate_of": "Brown2021a",
"severity": "medium"
}
]
}
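The DOI verification step above can be approximated with a small helper. This is an illustrative sketch, not the skill's actual implementation; a syntactic check like this catches malformed DOIs, while confirming a DOI resolves still requires a request to doi.org:

```python
import re

# A DOI is a registrant prefix "10.<4-9 digits>", a slash, and a suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")

def normalize_doi(raw: str) -> str:
    """Strip resolver prefixes and whitespace; DOIs are case-insensitive, so lowercase."""
    doi = raw.strip()
    for prefix in ("https://doi.org/", "http://doi.org/", "https://dx.doi.org/", "doi:"):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    return doi.lower()

def is_wellformed_doi(raw: str) -> bool:
    """Syntactic check only; it does not confirm the DOI actually resolves."""
    return bool(DOI_PATTERN.match(normalize_doi(raw)))
```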
Complete workflow for creating a bibliography:
# 1. Search for papers on your topic
python scripts/search_pubmed.py \
'"CRISPR-Cas Systems"[MeSH] AND "Gene Editing"[MeSH]' \
--date-start 2020 \
--limit 200 \
--output crispr_papers.json
# 2. Extract DOIs from search results and convert to BibTeX
python scripts/extract_metadata.py \
--input crispr_papers.json \
--output crispr_refs.bib
# 3. Add specific papers by DOI
python scripts/doi_to_bibtex.py 10.1038/nature12345 >> crispr_refs.bib
python scripts/doi_to_bibtex.py 10.1126/science.abcd1234 >> crispr_refs.bib
# 4. Format and clean the BibTeX file
python scripts/format_bibtex.py crispr_refs.bib \
--deduplicate \
--sort year \
--descending \
--output references.bib
# 5. Validate all citations
python scripts/validate_citations.py references.bib \
--auto-fix \
--report validation.json \
--output final_references.bib
# 6. Review validation report and fix any remaining issues
cat validation.json
# 7. Use in your LaTeX document
# \bibliography{final_references}
This skill complements the literature-review skill:
Literature Review Skill → Systematic search and synthesis
Citation Management Skill → Technical citation handling
Combined Workflow:
1. literature-review for comprehensive multi-database search
2. citation-management to extract and validate all citations
3. literature-review to synthesize findings thematically
4. citation-management to verify final bibliography accuracy

# After completing literature review
# Verify all citations in the review document
python scripts/validate_citations.py my_review_references.bib --report review_validation.json
# Format for specific citation style if needed
python scripts/format_bibtex.py my_review_references.bib \
--style nature \
--output formatted_refs.bib
Finding Seminal Papers:
Advanced Operators (full list in references/google_scholar_search.md):
"exact phrase" # Exact phrase matching
author:lastname # Search by author
intitle:keyword # Search in title only
source:journal # Search specific journal
-exclude # Exclude terms
OR # Alternative terms
2020..2024 # Year range
Example Searches:
# Find recent reviews on a topic
"CRISPR" intitle:review 2023..2024
# Find papers by specific author on topic
author:Church "synthetic biology"
# Find highly cited foundational work
"deep learning" 2012..2015 sort:citations
# Exclude surveys and focus on methods
"protein folding" -survey -review intitle:method
Using MeSH Terms: MeSH (Medical Subject Headings) provides a controlled vocabulary for precise searching.
"Diabetes Mellitus, Type 2"[MeSH]

Field Tags:
[Title] # Search in title only
[Title/Abstract] # Search in title or abstract
[Author] # Search by author name
[Journal] # Search specific journal
[Publication Date] # Date range
[Publication Type] # Article type
[MeSH] # MeSH term
Building Complex Queries:
# Clinical trials on diabetes treatment published recently
"Diabetes Mellitus, Type 2"[MeSH] AND "Drug Therapy"[MeSH]
AND "Clinical Trial"[Publication Type] AND 2020:2024[Publication Date]
# Reviews on CRISPR in specific journal
"CRISPR-Cas Systems"[MeSH] AND "Nature"[Journal] AND "Review"[Publication Type]
# Specific author's recent work
"Smith AB"[Author] AND cancer[Title/Abstract] AND 2022:2024[Publication Date]
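Complex queries like these can also be composed programmatically. The field tags are real PubMed syntax, but the helper below is a hypothetical sketch, not part of the skill's scripts:

```python
def pubmed_query(mesh=(), title_abstract=(), authors=(), date_range=None, pub_types=()):
    """Compose a PubMed query string from field-tagged terms, joined with AND."""
    parts = []
    parts += [f'"{term}"[MeSH]' for term in mesh]
    parts += [f'{term}[Title/Abstract]' for term in title_abstract]
    parts += [f'"{name}"[Author]' for name in authors]
    if date_range:
        start, end = date_range
        parts.append(f'{start}:{end}[Publication Date]')
    parts += [f'"{ptype}"[Publication Type]' for ptype in pub_types]
    return " AND ".join(parts)
```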
E-utilities for Automation: The scripts use the NCBI E-utilities API for programmatic access:
See references/pubmed_search.md for complete API documentation.
Search Google Scholar and export results.
Features:
Usage:
# Basic search
python scripts/search_google_scholar.py "quantum computing"
# Advanced search with filters
python scripts/search_google_scholar.py "quantum computing" \
--year-start 2020 \
--year-end 2024 \
--limit 100 \
--sort-by citations \
--output quantum_papers.json
# Export directly to BibTeX
python scripts/search_google_scholar.py "machine learning" \
--limit 50 \
--format bibtex \
--output ml_papers.bib
Search PubMed using E-utilities API.
Features:
Usage:
# Simple keyword search
python scripts/search_pubmed.py "CRISPR gene editing"
# Complex query with filters
python scripts/search_pubmed.py \
--query '"CRISPR-Cas Systems"[MeSH] AND "therapeutic"[Title/Abstract]' \
--date-start 2020-01-01 \
--date-end 2024-12-31 \
--publication-types "Clinical Trial,Review" \
--limit 200 \
--output crispr_therapeutic.json
# Export to BibTeX
python scripts/search_pubmed.py "Alzheimer's disease" \
--limit 100 \
--format bibtex \
--output alzheimers.bib
Extract complete metadata from paper identifiers.
Features:
Usage:
# Single DOI
python scripts/extract_metadata.py --doi 10.1038/s41586-021-03819-2
# Single PMID
python scripts/extract_metadata.py --pmid 34265844
# Single arXiv ID
python scripts/extract_metadata.py --arxiv 2103.14030
# From URL
python scripts/extract_metadata.py \
--url "https://www.nature.com/articles/s41586-021-03819-2"
# Batch processing (file with one identifier per line)
python scripts/extract_metadata.py \
--input paper_ids.txt \
--output references.bib
# Different output formats
python scripts/extract_metadata.py \
--doi 10.1038/nature12345 \
--format json # or bibtex, yaml
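Under the hood, metadata extraction amounts to mapping an API record onto BibTeX fields. The sketch below shows one plausible mapping for a CrossRef "message" object; it is a simplified illustration of what a script like extract_metadata.py might do, not its actual code:

```python
def crossref_to_bibtex(work: dict) -> str:
    """Map a CrossRef work record to an @article entry (simplified sketch)."""
    authors = " and ".join(
        f"{a['family']}, {a['given']}" for a in work.get("author", [])
    )
    first_author = work.get("author", [{}])[0].get("family", "unknown")
    year = work.get("issued", {}).get("date-parts", [[None]])[0][0]
    key = f"{first_author.lower()}{year}"
    fields = {
        "author": authors,
        "title": work.get("title", [""])[0],
        "journal": work.get("container-title", [""])[0],
        "year": year,
        "volume": work.get("volume", ""),
        "pages": work.get("page", ""),
        "doi": work.get("DOI", ""),
    }
    # Emit only non-empty fields, one per line, matching the templates above.
    body = ",\n".join(f"  {k} = {{{v}}}" for k, v in fields.items() if v)
    return f"@article{{{key},\n{body}\n}}"
```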
Validate BibTeX entries for accuracy and completeness.
Features:
Usage:
# Basic validation
python scripts/validate_citations.py references.bib
# With auto-fix
python scripts/validate_citations.py references.bib \
--auto-fix \
--output fixed_references.bib
# Detailed validation report
python scripts/validate_citations.py references.bib \
--report validation_report.json \
--verbose
# Only check DOIs
python scripts/validate_citations.py references.bib \
--check-dois-only
Format and clean BibTeX files.
Features:
Usage:
# Basic formatting
python scripts/format_bibtex.py references.bib
# Sort by year (newest first)
python scripts/format_bibtex.py references.bib \
--sort year \
--descending \
--output sorted_refs.bib
# Remove duplicates
python scripts/format_bibtex.py references.bib \
--deduplicate \
--output clean_refs.bib
# Complete cleanup
python scripts/format_bibtex.py references.bib \
--deduplicate \
--sort year \
--validate \
--auto-fix \
--output final_refs.bib
Quick DOI to BibTeX conversion.
Features:
Usage:
# Single DOI
python scripts/doi_to_bibtex.py 10.1038/s41586-021-03819-2
# Multiple DOIs
python scripts/doi_to_bibtex.py \
10.1038/nature12345 \
10.1126/science.abc1234 \
10.1016/j.cell.2023.01.001
# From file (one DOI per line)
python scripts/doi_to_bibtex.py --input dois.txt --output references.bib
# Copy to clipboard
python scripts/doi_to_bibtex.py 10.1038/nature12345 --clipboard
Start broad, then narrow:
Use multiple sources:
Leverage citations:
Document your searches:
Always use DOIs when available:
Verify extracted metadata:
Handle edge cases:
Maintain consistency:
Follow conventions:
Keep it clean:
Organize systematically:
Validate early and often:
Fix issues promptly:
Manual review for critical citations:
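"Follow conventions" usually means predictable citation keys such as FirstAuthorYearFirstWord. One possible sketch of such a key generator (the convention is common practice; the helper itself is hypothetical):

```python
import re

def make_citation_key(first_author_last: str, year: int, title: str) -> str:
    """Build a key like 'vaswani2017attention' from author, year, and the first significant title word."""
    stopwords = {"a", "an", "the", "on", "of", "for", "and", "in", "to"}
    words = re.findall(r"[a-z]+", title.lower())
    first = next((w for w in words if w not in stopwords), "untitled")
    author = re.sub(r"[^a-z]", "", first_author_last.lower())
    return f"{author}{year}{first}"
```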
Single source bias: Only using Google Scholar or PubMed
Accepting metadata blindly: Not verifying extracted information
Ignoring DOI errors: Broken or incorrect DOIs in the bibliography
Inconsistent formatting: Mixed citation key styles and formatting
Duplicate entries: Same paper cited multiple times under different keys
Missing required fields: Incomplete BibTeX entries
Outdated preprints: Citing a preprint when a published version exists
Special character issues: LaTeX compilation broken by unescaped characters
No validation before submission: Submitting with citation errors
Manual BibTeX entry: Typing entries by hand
* **Solution**: Always extract from metadata sources using scripts
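Duplicate entries can often be caught by comparing normalized titles. A rough sketch of the idea (not the skill's actual deduplication algorithm):

```python
import re
from collections import defaultdict

def normalize_title(title: str) -> str:
    """Lowercase, strip punctuation, and collapse whitespace so trivial variants compare equal."""
    return re.sub(r"\s+", " ", re.sub(r"[^a-z0-9 ]", "", title.lower())).strip()

def find_duplicates(entries: dict) -> list:
    """entries: {citation_key: title}. Return groups of keys that share a normalized title."""
    groups = defaultdict(list)
    for key, title in entries.items():
        groups[normalize_title(title)].append(key)
    return [keys for keys in groups.values() if len(keys) > 1]
```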
# Step 1: Find key papers on your topic
python scripts/search_google_scholar.py "transformer neural networks" \
--year-start 2017 \
--limit 50 \
--output transformers_gs.json
python scripts/search_pubmed.py "deep learning medical imaging" \
--date-start 2020 \
--limit 50 \
--output medical_dl_pm.json
# Step 2: Extract metadata from search results
python scripts/extract_metadata.py \
--input transformers_gs.json \
--output transformers.bib
python scripts/extract_metadata.py \
--input medical_dl_pm.json \
--output medical.bib
# Step 3: Add specific papers you already know
python scripts/doi_to_bibtex.py 10.1038/s41586-021-03819-2 >> specific.bib
python scripts/doi_to_bibtex.py 10.1126/science.aam9317 >> specific.bib
# Step 4: Combine all BibTeX files
cat transformers.bib medical.bib specific.bib > combined.bib
# Step 5: Format and deduplicate
python scripts/format_bibtex.py combined.bib \
--deduplicate \
--sort year \
--descending \
--output formatted.bib
# Step 6: Validate
python scripts/validate_citations.py formatted.bib \
--auto-fix \
--report validation.json \
--output final_references.bib
# Step 7: Review any issues
cat validation.json | grep -A 3 '"errors"'
# Step 8: Use in LaTeX
# \bibliography{final_references}
# You have a text file with DOIs (one per line)
# dois.txt contains:
# 10.1038/s41586-021-03819-2
# 10.1126/science.aam9317
# 10.1016/j.cell.2023.01.001
# Convert all to BibTeX
python scripts/doi_to_bibtex.py --input dois.txt --output references.bib
# Validate the result
python scripts/validate_citations.py references.bib --verbose
# You have a messy BibTeX file from various sources
# Clean it up systematically
# Step 1: Format and standardize
python scripts/format_bibtex.py messy_references.bib \
--output step1_formatted.bib
# Step 2: Remove duplicates
python scripts/format_bibtex.py step1_formatted.bib \
--deduplicate \
--output step2_deduplicated.bib
# Step 3: Validate and auto-fix
python scripts/validate_citations.py step2_deduplicated.bib \
--auto-fix \
--output step3_validated.bib
# Step 4: Sort by year
python scripts/format_bibtex.py step3_validated.bib \
--sort year \
--descending \
--output clean_references.bib
# Step 5: Final validation report
python scripts/validate_citations.py clean_references.bib \
--report final_validation.json \
--verbose
# Review report
cat final_validation.json
# Find highly cited papers on a topic
python scripts/search_google_scholar.py "AlphaFold protein structure" \
--year-start 2020 \
--year-end 2024 \
--sort-by citations \
--limit 20 \
--output alphafold_seminal.json
# Extract the top 10 by citation count
# (script will have included citation counts in JSON)
# Convert to BibTeX
python scripts/extract_metadata.py \
--input alphafold_seminal.json \
--output alphafold_refs.bib
# The BibTeX file now contains the most influential papers
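Picking the top-cited records out of the exported JSON takes only a few lines, assuming each record carries a "citations" field (an assumption about the export schema, not a documented guarantee):

```python
import json

def top_cited(papers: list, n: int = 10) -> list:
    """Return the n records with the highest citation counts (missing counts sort last)."""
    return sorted(papers, key=lambda p: p.get("citations", 0), reverse=True)[:n]

# Typical use after a search export:
# with open("alphafold_seminal.json") as f:
#     seminal = top_cited(json.load(f), n=10)
```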
Citation Management provides the technical infrastructure for Literature Review:
Combined workflow:
Citation Management ensures accurate references for Scientific Writing:
Citation Management works with Venue Templates for submission-ready manuscripts:
References (in references/):
google_scholar_search.md: Complete Google Scholar search guide
pubmed_search.md: PubMed and E-utilities API documentation
metadata_extraction.md: Metadata sources and field requirements
citation_validation.md: Validation criteria and quality checks
bibtex_formatting.md: BibTeX entry types and formatting rules

Scripts (in scripts/):
search_google_scholar.py: Google Scholar search automation
search_pubmed.py: PubMed E-utilities API client
extract_metadata.py: Universal metadata extractor
validate_citations.py: Citation validation and verification
format_bibtex.py: BibTeX formatter and cleaner
doi_to_bibtex.py: Quick DOI to BibTeX converter

Assets (in assets/):
bibtex_template.bib: Example BibTeX entries for all types
citation_checklist.md: Quality assurance checklist

Search Engines:
Metadata APIs:
Tools and Validators:
Citation Styles:
# Core dependencies
pip install requests # HTTP requests for APIs
pip install bibtexparser # BibTeX parsing and formatting
pip install biopython # PubMed E-utilities access
# Optional (for Google Scholar)
pip install scholarly # Google Scholar API wrapper
# or
pip install selenium # For more robust Scholar scraping
# For advanced validation
pip install crossref-commons # Enhanced CrossRef API access
pip install pylatexenc # LaTeX special character handling
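pylatexenc handles LaTeX special characters in full; as a minimal stdlib-only fallback, the most common offenders can be escaped in a single pass (a hypothetical helper, not a replacement for pylatexenc):

```python
# Characters that commonly break LaTeX compilation inside BibTeX fields.
SPECIAL = {
    "&": r"\&", "%": r"\%", "$": r"\$", "#": r"\#",
    "_": r"\_", "{": r"\{", "}": r"\}", "\\": r"\textbackslash{}",
}

def escape_latex(text: str) -> str:
    """Escape LaTeX special characters in one pass, so inserted braces are not re-escaped."""
    return "".join(SPECIAL.get(ch, ch) for ch in text)
```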
The citation-management skill provides:
Use this skill to maintain accurate, complete citations throughout your research and ensure publication-ready bibliographies.