folder-organization by delphine-l/claude_global
npx skills add https://github.com/delphine-l/claude_global --skill folder-organization
Expert guidance for organizing project directories, establishing file naming conventions, and maintaining clean, navigable project structures for research and development work.
project-name/
├── README.md # Project overview and getting started
├── .gitignore # Exclude data, outputs, env files
├── environment.yml # Conda environment (or requirements.txt)
├── data/ # Input data (often gitignored)
│ ├── raw/ # Original, immutable data
│ ├── processed/ # Cleaned, transformed data
│ └── external/ # Third-party data
├── notebooks/ # Jupyter notebooks for exploration
│ ├── 01-exploration.ipynb
│ ├── 02-analysis.ipynb
│ └── figures/ # Notebook-generated figures
├── src/ # Source code (reusable modules)
│ ├── __init__.py
│ ├── data_processing.py
│ ├── analysis.py
│ └── visualization.py
├── scripts/ # Standalone scripts and workflows
│ ├── download_data.sh
│ └── run_pipeline.py
├── tests/ # Unit tests
│ └── test_analysis.py
├── docs/ # Documentation
│ ├── methods.md
│ └── references.md
├── results/ # Analysis outputs (gitignored)
│ ├── figures/
│ ├── tables/
│ └── models/
└── config/ # Configuration files
└── analysis_config.yaml
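A skeleton like the one above can be scaffolded in a single step. The sketch below assumes bash (brace expansion) and only creates empty placeholders:

```shell
# Scaffold the research-project skeleton shown above (bash brace expansion)
set -e
project="project-name"
mkdir -p "$project"/{data/{raw,processed,external},notebooks/figures,src,scripts,tests,docs,results/{figures,tables,models},config}
# Placeholder files; fill in real content afterwards
touch "$project"/README.md "$project"/.gitignore "$project"/environment.yml "$project"/src/__init__.py
```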
project-name/
├── README.md
├── .gitignore
├── setup.py # Package configuration
├── requirements.txt # or pyproject.toml
├── src/
│ └── package_name/
│ ├── __init__.py
│ ├── core.py
│ └── utils.py
├── tests/
│ ├── test_core.py
│ └── test_utils.py
├── docs/
│ ├── api.md
│ └── usage.md
├── examples/ # Example usage
│ └── example_workflow.py
└── .github/ # CI/CD workflows
└── workflows/
└── tests.yml
project-name/
├── README.md
├── data/
│ ├── raw/ # Raw sequencing data
│ ├── reference/ # Reference genomes, annotations
│ └── processed/ # Workflow outputs
├── workflows/ # Galaxy .ga or Snakemake files
│ ├── preprocessing.ga
│ └── assembly.ga
├── config/
│ ├── workflow_params.yaml
│ └── sample_sheet.tsv
├── scripts/ # Helper scripts
│ ├── submit_workflow.py
│ └── quality_check.py
├── results/ # Final outputs
│ ├── figures/
│ ├── tables/
│ └── reports/
└── logs/ # Workflow execution logs
For projects involving Jupyter notebooks, data analysis, and visualization with many generated figures:
project-name/
├── README.md # Project overview
├── .gitignore
├── notebooks/ # Jupyter notebooks (analysis, exploration)
│ ├── 01-data-loading.ipynb
│ ├── 02-exploratory-analysis.ipynb
│ └── 03-final-analysis.ipynb
├── figures/ # ALL generated visualizations (PNG, PDF, SVG)
│ ├── fig1_distribution.png
│ ├── fig2_correlation.png
│ └── supplementary_*.png
├── data/ # Data files (JSON, CSV, TSV, Excel)
│ ├── raw/ # (optional) Original, unprocessed data
│ ├── processed/ # (optional) Cleaned, processed data
│ ├── input_data.json
│ └── metadata.tsv
├── tests/ # Test scripts (test_*.py, pytest)
│ ├── test_processing.py
│ └── test_analysis.py
├── scripts/ # Standalone Python/R scripts
│ ├── data_fetch.py
│ └── preprocessing.py
├── docs/ # Documentation (MD, RST files)
│ ├── methods.md
│ └── analysis_notes.md
└── archives/ # Compressed archives, old versions
└── backup_YYYYMMDD.tar.gz
**Benefits of This Structure**:
**When to Use This Structure**:
**MANIFEST Integration**:
For enhanced navigation and token efficiency, add MANIFEST.md files:
project-name/
├── MANIFEST.md # Root project index (~1,500 tokens)
├── MANIFEST_TEMPLATE.md # Template for creating new MANIFESTs
├── notebooks/
│ ├── MANIFEST.md # (optional) Notebook catalog
│ └── [notebook files]
├── figures/
│ ├── MANIFEST.md # Figure catalog (~500-1,000 tokens)
│ └── [figure files]
├── data/
│ ├── MANIFEST.md # Data inventory (~500-1,000 tokens)
│ └── [data files]
├── scripts/
│ ├── MANIFEST.md # Script documentation (~500-1,000 tokens)
│ └── [script files]
└── documentation/
├── MANIFEST.md # Doc organization (~500-1,000 tokens)
└── [doc files]
Benefits:
See "MANIFEST System for Token-Efficient Navigation" section below for complete documentation.
Use lowercase with hyphens or underscores
- Good: data-analysis.py or data_analysis.py
- Avoid: DataAnalysis.py or data analysis.py
Be descriptive but concise
- Good: process-telomere-data.py
- Avoid: script.py or process_all_the_telomere_sequencing_data_from_experiments.py
Use consistent separators
- Good: report-2026-01-23.pdf or model-v2.pkl
- Avoid: report-final-final-v3.pdf
For sequential files (notebooks, scripts), use zero-padded numbers:
notebooks/
├── 01-data-exploration.ipynb
├── 02-quality-control.ipynb
├── 03-statistical-analysis.ipynb
└── 04-visualization.ipynb
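Zero-padding matters because it keeps lexical sort order identical to numeric order (without it, 10 sorts before 2). `printf` handles the padding:

```shell
# Generate zero-padded sequence names; 10 correctly sorts after 03
for i in 1 2 3 10; do
  printf '%02d-analysis-step.ipynb\n' "$i"
done
# → 01-analysis-step.ipynb ... 10-analysis-step.ipynb
```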
Include metadata in filename when possible:
data/raw/
├── sample-A_hifi_reads_2026-01-15.fastq.gz
├── sample-B_hifi_reads_2026-01-15.fastq.gz
└── reference_genome_v3.fasta
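Metadata-bearing names are easiest to keep consistent when assembled from parts rather than typed by hand. A minimal sketch (the variable names are illustrative):

```shell
# Build a filename from sample, assay, and date components
sample="sample-A"
assay="hifi_reads"
run_date="2026-01-15"
fname="${sample}_${assay}_${run_date}.fastq.gz"
echo "$fname"   # sample-A_hifi_reads_2026-01-15.fastq.gz
```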
DO commit:
DON'T commit:
- Data and outputs (list them in .gitignore)
- Virtual environments (venv/, conda-env/)
Example .gitignore:
# Claude Code (local skills and settings)
.claude/
# Python
__pycache__/
*.py[cod]
*$py.class
.venv/
venv/
*.egg-info/
# Jupyter
.ipynb_checkpoints/
*.ipynb_checkpoints
# Data
data/raw/
data/processed/
*.fastq.gz
*.bam
*.vcf.gz
# Outputs
results/
outputs/
*.png
*.pdf
*.html
# Logs
logs/
*.log
# Environment
.env
environment.local.yml
# OS
.DS_Store
Thumbs.db
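The rules above can be spot-checked with `git check-ignore`, which prints each path that matches an ignore pattern. The sketch uses a throwaway repo so it is self-contained:

```shell
# Verify ignore rules in a scratch repository
set -e
git init -q ignore-demo && cd ignore-demo
printf '%s\n' 'results/' '*.log' '.env' > .gitignore
mkdir results && touch results/out.txt app.log .env keep.py
git check-ignore results/out.txt app.log .env   # prints the three ignored paths
git check-ignore keep.py || echo "keep.py is not ignored"
```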
Standardize on session-saves/ (not sessions-history/, Archive/, or other variants)
Benefits:
- archived/ pattern
When reorganizing:
mv sessions-history/ session-saves/
mv Archive/ session-saves/ # if used for session notes
Project folder template (with session notes):
project-name/
├── TO-DOS.md # Project-specific tasks
├── session-saves/ # Working session notes (tagged with #dump)
├── archived/ # Processed/consolidated notes
│ ├── daily/ # Daily consolidations
│ └── monthly/ # Monthly summaries
├── Planning/ # Planning documents
├── Development/ # Development notes
└── [other content folders]
Integration with Obsidian:
- Tag session notes with the dump tag in frontmatter for easy filtering
- Consolidate notes from session-saves/ to archived/daily/ or archived/monthly/
Clean up old session folder variations:
If you renamed folders during reorganization (e.g., sessions-history/ → session-saves/):
Move archived sessions:
mv old-folder/*.md session-saves/archived/daily/
Safely remove the empty old folder:
rmdir old-folder/ # Fails if not empty - safety check
Update any links that point to the old folder structure
Don't leave multiple session folders:
- Bad: sessions-history/ and session-saves/ both present
- Good: a single session-saves/ holding the archived content
Verification:
# Check for session folder variations
find . -type d -name "*session*" -not -path "*/archived/*"
# Should show only: ./session-saves/
- Keep original data in data/raw/ and make it read-only if possible
data/
├── raw/ # Original, immutable
├── interim/ # Intermediate processing steps
├── processed/ # Final, analysis-ready data
└── external/ # Third-party data
Every project should have a README with:
# Project Name
Brief description
## Installation
How to set up the environment
## Usage
How to run the analysis/code
## Project Structure
Brief overview of directories
## Data
Where data lives and how to access it
## Results
Where to find outputs
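These sections can be dropped into a new project as a stub README (a sketch; replace the placeholder text as the project takes shape):

```shell
# Write a README skeleton with the standard sections
cat > README.md << 'EOF'
# Project Name

Brief description

## Installation
How to set up the environment

## Usage
How to run the analysis/code

## Project Structure
Brief overview of directories

## Data
Where data lives and how to access it

## Results
Where to find outputs
EOF
```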
After major changes (cleanup, deprecation, restoration), create summary documents:
Create a dated summary document:
OPERATION_SUMMARY_YYYY-MM-DD.md
Essential sections:
Examples of good summary docs:
- FIGURE_RESTORATION_SUMMARY.md - Documents restored files
- DEPRECATION_SUMMARY.md - Documents deprecated notebooks
- RECENT_CHANGES_SUMMARY.md - High-level overview
# [Operation] Summary - [Date]
## Problem
[Brief description of the issue]
## Solution
[What was done to address it]
### Files Changed
- **Moved**: [list]
- **Restored**: [list]
- **Updated**: [list]
## Current State
- **Active files**: [count and list]
- **Deprecated files**: [count and list]
- **Status**: [Ready/In Progress/etc.]
## Restoration Instructions
```bash
# Commands to undo changes if needed
```
Date: YYYY-MM-DD
Status: [Complete/Partial/etc.]
**Why This Matters**:
- Future users (including yourself) understand what changed
- Provides restoration instructions if needed
- Creates audit trail for project history
- Helps collaborators understand project evolution
## Common Anti-Patterns to Avoid
❌ **Flat structure with everything in root**
project/
├── script1.py
├── script2.py
├── data.csv
├── output1.png
├── output2.png
└── final_really_final_v3.xlsx
❌ **Ambiguous naming**
notebooks/
├── notebook1.ipynb
├── test.ipynb
├── analysis.ipynb
└── analysis_new.ipynb
❌ **Mixed concerns**
project/
├── src/
│   ├── analysis.py
│   ├── data.csv      # Data in source code directory
│   └── figure1.png   # Output in source code directory
## Project Reorganization
When reorganizing an existing project's folder structure, follow this systematic approach to avoid breaking dependencies.
### Post-Reorganization Path Updates
After reorganizing folders, **ALWAYS verify and update file paths** in all scripts and notebooks. Files often contain hardcoded paths that break after reorganization.
#### Systematic Path Update Process
1. **Identify all files that reference other files**:
```bash
# Find notebooks
find . -name "*.ipynb"
# Find Python scripts
find . -name "*.py"
# Search for file references
grep -r "\.json\|\.csv\|\.tsv\|\.png" --include="*.ipynb" --include="*.py"
```
2. **Common file reference patterns to search for**:
* Data files: `.json`, `.csv`, `.tsv`, `.xlsx`
* Figures: `.png`, `.jpg`, `.pdf`, `.svg`
* Python: `open()`, `read_csv()`, `read_json()`, `load()`
* Jupyter: `savefig()`, file I/O operations
3. **Update paths using sed for batch replacements**:
```bash
# Update data file paths
sed -i.bak "s|'filename.json'|'../data/filename.json'|g" notebook.ipynb
# Update figure output paths
sed -i.bak "s|savefig('fig|savefig('../figures/fig|g" notebook.ipynb
# Update with double quotes (common in Python code)
sed -i.bak 's|"filename.csv"|"../data/filename.csv"|g' script.py
```
4. **Verify updates**:
```bash
# Check that paths were updated correctly
grep -o "'../data/[^']*'" notebook.ipynb | head -5
grep -o "'../figures/[^']*'" notebook.ipynb | head -5
```
5. **Clean up backup files**:
```bash
rm *.bak *.bak2 *.bak3
```
Always check these file types after reorganization:
- Notebooks (.ipynb): Data loading, figure saving
- Python scripts (.py): File I/O operations
- Tests (test_*.py): Often reference data fixtures

| Original Location | After Reorganization | Relative Path |
|---|---|---|
| ./data.json | ../data/data.json | Go up one level, into data/ |
| ./figure.png | ../figures/figure.png | Go up one level, into figures/ |
| ./test.py accessing data.json | ../data/data.json | From tests/ to data/ |
| ./notebook.ipynb accessing both | ../data/, ../figures/ | From notebooks/ to both |

- Use relative paths (../data/), not absolute paths, for portability
- Keep backup files (.bak) until verification is complete

After completing a reorganization, always verify the results and clean up:
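Relative paths like those in the table can be computed rather than reasoned out by hand. This sketch assumes GNU coreutils (`realpath --relative-to`):

```shell
# Compute the path a notebook in demo/notebooks/ should use for files elsewhere
mkdir -p demo/{notebooks,data,figures}
touch demo/data/input.csv demo/figures/fig1.png
realpath --relative-to=demo/notebooks demo/data/input.csv    # → ../data/input.csv
realpath --relative-to=demo/notebooks demo/figures/fig1.png  # → ../figures/fig1.png
```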
# Count files moved to each directory
for dir in figures data tests notebooks docs archives; do
echo "$dir: $(ls $dir 2>/dev/null | wc -l | tr -d ' ') files"
done
# Ensure root is clean
ls -la
# Should only see:
# - Organized directories (figures/, data/, etc.)
# - Project-specific folders (Fetch_data/, sharing/)
# - Config directories (.claude/, .git/)
# - Essential files (README.md, .gitignore, etc.)
# Remove Jupyter checkpoints (auto-generated, not needed in version control)
rm -rf .ipynb_checkpoints
# Remove sed backup files
rm *.bak *.bak2 *.bak3
# Remove duplicate/backup data files
rm *.backup *.backup2 *.old
# Show clean directory tree
tree -L 2 -d
# Or list directories only
ls -d */
- TODO.md, NOTES.md from completed work
- Old outputs in results/

When projects accumulate many files over time, use this systematic approach to identify and keep only essential files:
# Extract figure references from Jupyter notebooks
grep -o "figures/[^'\"]*\.png" YourNotebook.ipynb | sort -u
# For multiple notebooks, check each one
for nb in *.ipynb; do
echo "=== $nb ==="
grep -o "figures/[^'\"]*\.png" "$nb" | sort -u
done
# Find which script generates a specific figure
grep -l "figure_name" scripts/*.py
# Search for output directory patterns
grep -l "figures/curation_impact" scripts/*.py
Create clear structure:
mkdir -p deprecated/{figures,scripts,notebooks}
mkdir -p deprecated/figures/{unused_category1,unused_category2}
mkdir -p deprecated/scripts/unused_utilities
Use descriptive subdirectory names:
Good structure:
deprecated/
├── figures/
│ ├── unused_regression_plots/ # Category-based names
│ ├── unused_curation_impact/
│ └── exploratory_analysis/
├── scripts/
│ ├── unused_utilities/ # Purpose-based organization
│ ├── old_data_fetch/
│ └── notebook_fixes/
└── data/
├── intermediate_tables/
└── old_versions/
Poor structure:
deprecated/
├── old_stuff/ # Too vague
├── misc/ # Unclear purpose
└── temp/ # Ambiguous
Benefits of good naming:
Create MINIMAL_ESSENTIAL_FILES.md:
Example structure:
## Active Figures
1. figure_01.png - Used in Notebook A (Figure 1)
- Generated by: script_14.py
## Essential Scripts
1. script_14.py - Generates Figures 1-4, 7
2. build_data.py - Required infrastructure
Before finalizing cleanup:
When cleaning up analysis directories with multiple config file versions:
Patterns indicating old/superseded files:
Naming patterns:
- Files without the _UPDATED suffix when _UPDATED versions exist
- Files matching *_old.txt, *_backup.csv
Content indicators:
Multiple similar files:
- itol_branch_colors.txt, itol_branch_colors_v2.txt, itol_branch_colors_UPDATED.txt - keep only the _UPDATED.txt (current version)
Cleanup Strategy:
# Create deprecation directory with descriptive name
mkdir -p deprecated/phylo_old_configs
# Move superseded files (preserve for reference)
mv old_file1.txt old_file2.csv deprecated/phylo_old_configs/
# Verify move (should be empty or only current files)
ls *.txt *.csv
Files to keep in active directory:
- Current versions (*_UPDATED.*)
Files to deprecate:
Example from phylo cleanup (18 files deprecated):
- itol_3category_colorstrip.txt (old colors) → superseded by itol_3category_colorstrip_UPDATED.txt (new colors)
- species_curation_methods.csv (2-category system) → superseded by species_3category_methods_UPDATED.csv (3-category system)
Benefits:
Projects accumulate documentation files (.md, .log, .txt) in the root directory. Consolidate them effectively:
documentation/
├── README.md # Index to all documentation
├── logs/ # Log files from processes
├── working_files/ # Temporary/working files
└── [organized .md files]
# 1. Create structure
mkdir -p documentation/{logs,working_files}
# 2. Move documentation
mv *.md documentation/
mv *.log documentation/logs/
mv *.txt documentation/working_files/ # or keep essential ones in root
# 3. Create index (documentation/README.md)
cat > documentation/README.md << 'EOF'
# Project Documentation
## Quick Start
- ESSENTIAL_FILE.md - Start here
- RECENT_CHANGES.md - Latest updates
## By Category
### Analysis
- analysis_summary.md
- results.md
### Methods
- methods.md
- protocols.md
[etc...]
EOF
Include in documentation/README.md:
Keep in project root:
- README.md - Project overview
- LICENSE, CONTRIBUTING.md - Standard files
- .gitignore, config files
Move to documentation/:
❌ Don't delete old documentation - move it to documentation/archive/
✓ Preserve history but organize it clearly
For large data analysis projects, implement a MANIFEST system to enable efficient project navigation and minimize token usage in Claude Code sessions.
**Challenge**: Large projects with many files consume excessive tokens during session startup:
**Solution**: MANIFEST files provide lightweight project indexes (~500-2,000 tokens each) that give complete context without reading actual files.
Before MANIFESTs (reading actual files):
Root MANIFEST: N/A
Notebooks: ~10,000-15,000 tokens (reading 3 large notebooks)
Data exploration: ~3,000-5,000 tokens
Scripts analysis: ~2,000-3,000 tokens
---
Total: ~15,000-23,000 tokens for project orientation
After MANIFESTs (reading indexes):
Root MANIFEST: ~1,500 tokens (full project overview)
Subdirectory MANIFEST: ~500-1,000 tokens (specific area)
---
Total: ~2,000-2,500 tokens for complete context
Savings: 85-90% token reduction!
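A quick way to sanity-check that a MANIFEST stays within its budget: word count times ~4/3 is a rough token estimate (a common heuristic, not an exact tokenizer). The stand-in file below makes the sketch self-contained:

```shell
# Create a 1,500-word stand-in MANIFEST, then estimate its token count
printf 'word %.0s' $(seq 1 1500) > MANIFEST.md
words=$(wc -w < MANIFEST.md)
echo "~$((words * 4 / 3)) tokens"   # ~2000 tokens
```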
A MANIFEST.md file is a comprehensive index for a directory that includes:
**Key Principle**: A MANIFEST provides 80% of the context needed to resume work in 500-2,000 tokens, instead of reading 5,000-15,000 tokens of actual files.
# [Project Name] - ROOT MANIFEST
**Last Updated**: YYYY-MM-DD
**Purpose**: [1-2 sentence description]
**Status**: Active/Deprecated/Archive
---
## Quick Reference
**Entry Points**: [Which files to read first]
**Key Outputs**: [Main deliverables]
**Dependencies**: [External requirements]
---
## Files
### Notebooks
#### `notebook_name.ipynb` (Size)
- **Purpose**: [What analysis/questions does this answer?]
- **Depends on**: [Input files, data, scripts]
- **Generates**: [Output files, figures]
- **Key findings**: [1-2 sentence summary]
- **Last modified**: YYYY-MM-DD
- **Execution time**: [~X minutes if relevant]
- **Priority**: [Main document or complementary analysis]
### Key Directories
#### `data/` (Size)
See `data/MANIFEST.md` for details.
- **Contents**: [Brief description]
- **Key files**: [Most important files]
[Repeat for figures/, scripts/, documentation/]
---
## Directory Structure
project/
├── MANIFEST.md (this file)
├── data/
│   └── MANIFEST.md
├── figures/
│   └── MANIFEST.md
└── [etc.]
---
## Workflow Dependencies
[Visual or text description of data → processing → outputs flow]
Example:
1. Data fetching: fetch_data.py → data/raw/
2. Processing: process.py → data/processed/
3. Analysis: analysis.ipynb → figures/
---
## Notes for Resuming Work
**Current Status**: [What was last completed?]
**Next Steps**: [What needs to be done next?]
**Known Issues**: [Problems, TODOs, blockers]
**Reference**: [Links to related docs, other MANIFESTs]
---
## Metadata
**Created by**: Claude Code
**Project**: [Project name]
**Tags**: [#keywords #for #searching]
**Environment**: [conda env name or venv path]
**Obsidian notes path**: [Link to project notes]
---
## For Claude Code Sessions
**Quick Start for New Sessions**:
1. Read this MANIFEST.md (~500 tokens)
2. Read relevant subdirectory MANIFEST.md (~500 tokens)
3. Only read actual files when editing them
**Token Efficiency**:
- This MANIFEST provides 80% of context needed
- Subdirectory MANIFESTs provide detailed file info
- Read actual code/notebooks only when making changes
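Bootstrapping a new project can start from a stub generated out of the template fields above (a minimal sketch; the bracketed fields are filled in by hand):

```shell
# Write a minimal ROOT MANIFEST stub with the key sections
cat > MANIFEST.md << 'EOF'
# [Project Name] - ROOT MANIFEST
**Last Updated**: YYYY-MM-DD
**Purpose**: [1-2 sentence description]
**Status**: Active

## Quick Reference
**Entry Points**: [Which files to read first]
**Key Outputs**: [Main deliverables]

## Notes for Resuming Work
**Current Status**: [What was last completed?]
**Next Steps**: [What needs to be done next?]
EOF
```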
Use the same structure but focused on the specific directory. For subdirectories:
data/MANIFEST.md - Focus on:
figures/MANIFEST.md - Focus on:
scripts/MANIFEST.md - Focus on:
documentation/MANIFEST.md - Focus on:
When implementing analysis_files/ (Iteration 2), create bidirectional links:
In figures/MANIFEST.md - Link to analysis files:
**01_figure_name.png** (318 KB)
- **Description**: Brief description
- **Analysis file**: `../analysis_files/figures/01_figure_name.md` - Detailed analysis
In analysis_files/MANIFEST.md - Link back to figures:
#### 01_figure_name.md
- **Figure file**: `figures/curation_impact_3cat/01_figure_name.png`
- **Purpose**: Detailed analysis and interpretation
In root MANIFEST.md - Reference both:
#### `analysis_files/` (~90 KB) **[NEW - ITERATION 2]**
- **Purpose**: Separate markdown files for figure analyses
- **Token efficiency**: ~98% reduction vs notebooks
- **Links to**: figures/ directory
This creates a navigable web of documentation.
Create MANIFEST_TEMPLATE.md in project root as a starting point. See the template in the Curation_Paper_figures project for a complete example with all sections and guidance.
Create this command in .claude/commands/generate-manifest.md (or symlink from global commands):
**Purpose**: Automatically generates MANIFEST files by analyzing directory contents
**Key Features**:
**Usage**:
/generate-manifest # Interactive mode
/generate-manifest data # Generate for data/
/generate-manifest figures # Generate for figures/
**Implementation Tips**:
- Use wc -l for CSV files
Create this command in .claude/commands/update-manifest.md:
**Purpose**: Quickly updates existing MANIFESTs while preserving user content
**Key Features**:
**Usage**:
/update-manifest # Update current directory
/update-manifest data # Update data/MANIFEST.md
/update-manifest --quick # Force quick mode
/update-manifest --full # Full re-analysis
**Session End Pattern**:
/update-manifest # Capture session progress
/update-skills # Save new knowledge
/safe-exit # Clean exit with notes
Create the template:
# Copy MANIFEST_TEMPLATE.md to project root
# Or use /generate-manifest to create from scratch
Generate root MANIFEST:
/generate-manifest
# Choose "root directory"
# Fill in user-specific fields
Generate subdirectory MANIFESTs:
/generate-manifest data
/generate-manifest figures
/generate-manifest scripts
/generate-manifest documentation
Customize MANIFESTs:
Start session - Read MANIFESTs for context:
cat MANIFEST.md # Project overview
cat figures/MANIFEST.md # If working on figures
Work on project - Normal development
End session - Update MANIFESTs:
/update-manifest # Captures session progress
When you:
Run full regeneration:
/generate-manifest --update # Full re-analysis
ALWAYS include:
USER FILL fields for:
Tip for filling user-specific fields:
- `.claude/project-config` file - it often contains the vault path in the `obsidian_vault` or similar field
- `environment.yml`/`requirements.txt`

Auto-generate from code:
Template for Analysis Notebook Entries:
When adding a new analysis notebook to MANIFEST.md, include:
#### `Notebook_Name.ipynb` (file size) **[NEW]**
- **Purpose**: One-sentence objective of the analysis
- **Type**: Category (e.g., "Confounding analysis", "Data enrichment", "Primary analysis")
- **Rationale**: Why this analysis is needed (2-3 sentences explaining motivation)
- **Approach**:
- Bullet points of analytical steps
- Key methodological decisions
- **Key Questions**:
- Question 1 the analysis addresses
- Question 2 the analysis addresses
- **Depends on**:
- data/input_file.csv (description)
- scripts/processing_script.py
- **Generates**:
- figures/output_dir/figure1.png (what it shows)
- results/statistics.csv
- **Dataset**: N assemblies/samples, key statistics
- **Last modified**: YYYY-MM-DD
- **Status**: Current state (e.g., "Code optimized", "In progress", "Complete")
- **Execution time**: ~XX minutes
- **Priority**: Role in project (e.g., "Confounding analysis - validates main findings")
- **Note**: Important caveats or special considerations
Example: Technology/Temporal Confounding Analysis
#### `Technology_Temporal_Analysis.ipynb` (32 KB) **[NEW]**
- **Purpose**: Investigate whether sequencing technology (CLR vs HiFi) and temporal trends confound the curation method comparisons
- **Type**: Confounding analysis - technology and temporal effects
- **Rationale**: Sequencing technology evolved rapidly (CLR → HiFi), and assembly methods may correlate with technology era. Need to determine if observed quality differences are due to curation methods or underlying technology/temporal confounders.
- **Approach**:
- Technology-separated analysis: Compare categories split by sequencing technology
- Temporal trend analysis: Plot quality metrics over time (2019-2025)
- HiFi-only temporal analysis: Eliminate technology confounding
- **Key Questions**:
- Are quality differences consistent across technologies (HiFi vs CLR)?
- Do quality metrics improve over time, and is this technology-driven?
- Do temporal trends persist when technology is held constant?
- **Depends on**:
- `data/vgp_assemblies_3categories_tech.csv` (3-category data with technology inference)
- scipy for statistical tests (Mann-Whitney U, Spearman correlation)
- **Generates**:
- `figures/technology_temporal/01_prialt_tech_comparison.png` (HiFi vs CLR)
- `figures/technology_temporal/04_hifi_only_temporal_trends.png` (HiFi-only, 2021-2025)
- `figures/technology_temporal/technology_effects_statistics.csv`
- `figures/technology_temporal/temporal_trends_hifi_only_statistics.csv`
- **Dataset**: 541 VGP assemblies, 464/541 (86%) with technology assignment (355 HiFi, 107 CLR)
- **Last modified**: 2026-02-25
- **Status**: Code optimized (DPI reduced 300→150 to prevent image loading errors)
- **Execution time**: ~10-15 minutes
- **Priority**: Confounding analysis - validates that curation effects are not driven by technology or temporal biases
- **Note**: Figure sizes reduced (DPI 150) to prevent notebook image loading errors
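The bookkeeping fields of such an entry (file size, last-modified date) can be auto-filled rather than typed by hand. A Python sketch, under the assumption that the substantive fields stay as [USER TO FILL] placeholders (the function name is illustrative, not part of the skill):

```python
from datetime import date
from pathlib import Path

def notebook_entry_skeleton(notebook_path: str) -> str:
    """Return a MANIFEST entry skeleton for a notebook with size and
    modification date auto-filled; analytical fields are left to the user."""
    p = Path(notebook_path)
    stat = p.stat()
    size_kb = max(1, round(stat.st_size / 1024))
    modified = date.fromtimestamp(stat.st_mtime).isoformat()
    return "\n".join([
        f"#### `{p.name}` ({size_kb} KB) **[NEW]**",
        "- **Purpose**: [USER TO FILL]",
        "- **Rationale**: [USER TO FILL]",
        "- **Depends on**: [USER TO FILL]",
        "- **Generates**: [USER TO FILL]",
        f"- **Last modified**: {modified}",
        "- **Status**: In progress",
    ])
```

The returned block can be pasted into MANIFEST.md and the placeholders filled in during `/update-manifest`.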
Benefits of Comprehensive Documentation:
- `/update-manifest`
- `/generate-manifest`

Structure for "Recent Session Work" Section:
When running /update-manifest, document the session in this format:
**Recent Session Work** (YYYY-MM-DD):
- **[Action taken]**:
- Specific change 1 with details
- Specific change 2 with quantitative results
- Why this change was made
- Brief description of problem solved or feature added
- Any updates to directory structure or workflow
Example Session Documentation:
**Recent Session Work** (2026-02-25):
- **Updated Technology_Temporal_Analysis.ipynb code**:
- Reduced DPI from 300→150 in global settings and all savefig calls
- Reduced figure sizes: 01_prialt (15×10→12×8), 02_all_tech (18×12→14×9)
- Prevents image loading errors while maintaining publication quality
- File sizes reduced by ~75% (combination of DPI and size reduction)
- Added Technology_Temporal_Analysis.ipynb entry to root MANIFEST
- Updated directory structure to include figures/technology_temporal/
- Added HiFi-only temporal analysis section to eliminate technology confounding
Next Steps Format:
Prioritize and number action items:
**Next Steps**:
1. **[High priority action]** - [Why it's important]
2. **[Medium priority]** - [Context]
3. **[Future work]** - [When to tackle]
Example:
**Next Steps**:
1. **Re-run Technology_Temporal_Analysis.ipynb** - Execute cells to regenerate figures with optimized 150 DPI settings
2. **Generate notebook with temporal effect and only HiFi data** - Already added to notebook, need to execute new cells
3. **Write integrated manuscript Results section** - Combine findings from all 5 clades into cohesive narrative
This structure makes it easy to resume work by quickly understanding what was done and what's next.
Add MANIFESTs to standard project organization:
project/
├── MANIFEST.md # Root project index
├── MANIFEST_TEMPLATE.md # Template for new MANIFESTs
├── data/
│ ├── MANIFEST.md # Data inventory
│ └── [data files]
├── figures/
│ ├── MANIFEST.md # Figure catalog
│ └── [figure files]
├── scripts/
│ ├── MANIFEST.md # Script documentation
│ └── [script files]
├── documentation/
│ ├── MANIFEST.md # Doc organization
│ └── [doc files]
└── [other directories with MANIFESTs as needed]
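Adding placeholder MANIFESTs to an existing project can be scripted. A sketch that creates an empty MANIFEST.md in the project root and each visible top-level directory (skipping hidden directories such as `.git`), never overwriting existing files:

```python
from pathlib import Path

TEMPLATE = "# MANIFEST\n\n[USER TO FILL: purpose, contents, usage]\n"

def scaffold_manifests(project_root: str) -> list[str]:
    """Create MANIFEST.md placeholders in the root and each visible
    top-level subdirectory; leave any existing MANIFEST untouched."""
    root = Path(project_root)
    targets = [root] + sorted(
        d for d in root.iterdir() if d.is_dir() and not d.name.startswith(".")
    )
    created = []
    for d in targets:
        manifest = d / "MANIFEST.md"
        if not manifest.exists():
            manifest.write_text(TEMPLATE)
            created.append(str(manifest))
    return created
```

After scaffolding, `/generate-manifest` (or the template) fills in the real content per directory.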
See the Curation_Paper_figures project for a complete implementation:
For Claude Code:
For Users:
For Teams:
MANIFEST too long (>2,500 tokens):
MANIFEST outdated:
- `/update-manifest` before `/safe-exit`
- `/generate-manifest --update` for full refresh

Too many [USER TO FILL] fields:
- `/update-manifest` to capture context immediately

Unclear what to include:
This skill works well with:
# Create standard research project structure
mkdir -p data/{raw,processed,external} notebooks scripts src tests docs results config
touch README.md .gitignore environment.yml
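The `.gitignore` created above typically excludes data, outputs, and environment files, matching the structures shown earlier. A plausible starting point (adjust to your project):

```
# Data and outputs are regenerated, not versioned
data/
results/

# Environments and caches
.venv/
__pycache__/
.ipynb_checkpoints/
```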
Consider using cookiecutter for standardized project templates:
- `cookiecutter-data-science` - Data science projects
- `cookiecutter-research` - Research projects
Include version/date for important outputs
Use `report-2026-01-23.pdf` or `model-v2.pkl`, not `report-final-final-v3.pdf`.

| Scenario | Relative path | Meaning |
|---|---|---|
| … | `../data/` | From `tests/` to `data/` |
| `./notebook.ipynb` accessing both | `../data/`, `../figures/` | From `notebooks/` to both |

Avoid suffixes like `_UPDATED.txt` to mark the current version.