项目文件夹组织最佳实践指南：代码、数据、文档结构规范与命名规则

folder-organization by delphine-l/claude_global

80 周安装量

10 GitHub Stars

GitHub

安装命令

npx skills add https://github.com/delphine-l/claude_global --skill folder-organization

开发项目结构代码规范

🇨🇳中文介绍

文件夹组织最佳实践

为研究和开发工作提供组织项目目录、建立文件命名规范以及维护清晰、可导航项目结构的专家指导。

何时使用此技能

设置新项目
重组现有项目
建立团队规范
创建可重复的研究结构
管理数据密集型项目

核心原则

可预测性 - 常见文件类型有标准位置
可扩展性 - 结构随项目优雅增长
可发现性 - 便于他人（以及未来的你）导航
关注点分离 - 代码、数据、文档、输出分离
版本控制友好 - 适当排除大型/生成的文件

标准项目结构

研究/分析项目

project-name/
├── README.md                 # 项目概述和入门指南
├── .gitignore               # 排除数据、输出、环境文件
├── environment.yml          # Conda 环境（或 requirements.txt）
├── data/                    # 输入数据（通常被 gitignore）
│   ├── raw/                # 原始、不可变数据
│   ├── processed/          # 清理、转换后的数据
│   └── external/           # 第三方数据
├── notebooks/               # 用于探索的 Jupyter notebooks
│   ├── 01-exploration.ipynb
│   ├── 02-analysis.ipynb
│   └── figures/            # Notebook 生成的图表
├── src/                     # 源代码（可重用模块）
│   ├── __init__.py
│   ├── data_processing.py
│   ├── analysis.py
│   └── visualization.py
├── scripts/                 # 独立脚本和工作流
│   ├── download_data.sh
│   └── run_pipeline.py
├── tests/                   # 单元测试
│   └── test_analysis.py
├── docs/                    # 文档
│   ├── methods.md
│   └── references.md
├── results/                 # 分析输出（gitignored）
│   ├── figures/
│   ├── tables/
│   └── models/
└── config/                  # 配置文件
    └── analysis_config.yaml

广告位招租

在这里展示您的产品或服务

触达数万 AI 开发者，精准高效

联系我们

生物信息学/工作流项目

project-name/
├── README.md
├── data/
│   ├── raw/                # 原始测序数据
│   ├── reference/          # 参考基因组、注释
│   └── processed/          # 工作流输出
├── workflows/               # Galaxy .ga 或 Snakemake 文件
│   ├── preprocessing.ga
│   └── assembly.ga
├── config/
│   ├── workflow_params.yaml
│   └── sample_sheet.tsv
├── scripts/                # 辅助脚本
│   ├── submit_workflow.py
│   └── quality_check.py
├── results/                # 最终输出
│   ├── figures/
│   ├── tables/
│   └── reports/
└── logs/                   # 工作流执行日志

使用 Notebooks 的数据分析项目

对于涉及 Jupyter notebooks、数据分析和生成大量图表可视化的项目：

project-name/
├── README.md                # 项目概述
├── .gitignore
├── notebooks/               # Jupyter notebooks（分析、探索）
│   ├── 01-data-loading.ipynb
│   ├── 02-exploratory-analysis.ipynb
│   └── 03-final-analysis.ipynb
├── figures/                 # 所有生成的可视化（PNG、PDF、SVG）
│   ├── fig1_distribution.png
│   ├── fig2_correlation.png
│   └── supplementary_*.png
├── data/                    # 数据文件（JSON、CSV、TSV、Excel）
│   ├── raw/                # （可选）原始、未处理数据
│   ├── processed/          # （可选）清理、处理后的数据
│   ├── input_data.json
│   └── metadata.tsv
├── tests/                   # 测试脚本（test_*.py、pytest）
│   ├── test_processing.py
│   └── test_analysis.py
├── scripts/                 # 独立的 Python/R 脚本
│   ├── data_fetch.py
│   └── preprocessing.py
├── docs/                    # 文档（MD、RST 文件）
│   ├── methods.md
│   └── analysis_notes.md
└── archives/                # 压缩归档、旧版本
    └── backup_YYYYMMDD.tar.gz

此结构的优点：

清晰的关注点分离（代码 vs 数据 vs 输出）
易于导航：在一个地方找到所有图表
可扩展性：处理 50+ 个图表而不混乱根目录
Git 友好：易于 .gitignore 大型数据/图表
协作：标准结构减少上手时间

何时使用此结构：

包含多个 notebooks 的项目
生成许多可视化（10+ 个图表）的分析
多个数据源/格式
团队协作
长期研究项目

MANIFEST 集成：

为了增强导航和令牌效率，添加 MANIFEST.md 文件：

project-name/
├── MANIFEST.md                  # 根项目索引（~1,500 令牌）
├── MANIFEST_TEMPLATE.md         # 创建新 MANIFEST 的模板
├── notebooks/
│   ├── MANIFEST.md             # （可选）Notebook 目录
│   └── [notebook 文件]
├── figures/
│   ├── MANIFEST.md             # 图表目录（~500-1,000 令牌）
│   └── [图表文件]
├── data/
│   ├── MANIFEST.md             # 数据清单（~500-1,000 令牌）
│   └── [数据文件]
├── scripts/
│   ├── MANIFEST.md             # 脚本文档（~500-1,000 令牌）
│   └── [脚本文件]
└── documentation/
    ├── MANIFEST.md             # 文档组织（~500-1,000 令牌）
    └── [文档文件]

会话启动令牌减少 85-90%
在约 2,000 令牌（vs 15,000+）内获得完整项目上下文
无需读取文件即可快速恢复工作
清晰的工作流文档

有关完整文档，请参阅下面的“用于令牌高效导航的 MANIFEST 系统”部分。

使用小写字母，用连字符或下划线分隔
- ✅ data-analysis.py 或 data_analysis.py
- ❌ DataAnalysis.py 或 data analysis.py
描述性但简洁
- ✅ process-telomere-data.py
- ❌ script.py 或 process_all_the_telomere_sequencing_data_from_experiments.py
使用一致的分隔符
- 选择连字符或下划线并坚持使用
- 惯例：文件名用连字符，Python 模块用下划线
重要输出包含版本/日期
- ✅ report-2026-01-23.pdf 或 model-v2.pkl
- ❌ report-final-final-v3.pdf

对于顺序文件（notebooks、脚本），使用零填充编号：

notebooks/
├── 01-data-exploration.ipynb
├── 02-quality-control.ipynb
├── 03-statistical-analysis.ipynb
└── 04-visualization.ipynb

尽可能在文件名中包含元数据：

data/raw/
├── sample-A_hifi_reads_2026-01-15.fastq.gz
├── sample-B_hifi_reads_2026-01-15.fastq.gz
└── reference_genome_v3.fasta

目录管理最佳实践

源代码
文档
配置文件
小型测试数据集（<1MB）
需求/环境文件
README 文件

不应该提交：

大型数据文件（使用 .gitignore）
生成的输出
环境目录（venv/、conda-env/）
日志
临时文件
API 密钥/密钥

# Claude Code（本地技能和设置）
.claude/

# Python
__pycache__/
*.py[cod]
*$py.class
.venv/
venv/
*.egg-info/

# Jupyter
.ipynb_checkpoints/
*.ipynb_checkpoints

# Data
data/raw/
data/processed/
*.fastq.gz
*.bam
*.vcf.gz

# Outputs
results/
outputs/
*.png
*.pdf
*.html

# Logs
logs/
*.log

# Environment
.env
environment.local.yml

# OS
.DS_Store
Thumbs.db

标准化使用 session-saves/（而不是 sessions-history/、Archive/ 或其他变体）

与标准命名模式一致（名词 + 复数）
匹配 archived/ 模式
适用于所有项目
编写自动化脚本时防止混淆

mv sessions-history/ session-saves/
mv Archive/ session-saves/  # 如果用于会话笔记

项目文件夹模板（包含会话笔记）：

project-name/
├── TO-DOS.md                # 项目特定任务
├── session-saves/           # 工作会话笔记（标记为 #dump）
├── archived/                # 已处理/合并的笔记
│   ├── daily/              # 每日合并
│   └── monthly/            # 月度摘要
├── Planning/               # 规划文档
├── Development/            # 开发笔记
└── [其他内容文件夹]

与 Obsidian 集成：

会话笔记应在 frontmatter 中包含 dump 标签以便过滤
有关 dump 标签要求和 frontmatter 模式的完整详细信息，请参阅 obsidian 技能
归档工作流：从 session-saves/ 移动到 archived/daily/ 或 archived/monthly/

清理旧的会话文件夹变体：

如果在重组期间重命名了文件夹（例如，sessions-history/ → session-saves/）：

移动已归档的会话：

mv old-folder/*.md session-saves/archived/daily/

安全移除空旧文件夹：

rmdir old-folder/  # 如果不为空则失败 - 安全检查

更新指向旧文件夹结构的任何链接

不要留下多个会话文件夹：

❌ sessions-history/ 和 session-saves/ 同时存在
✅ 单个 session-saves/ 包含归档内容

# 检查会话文件夹变体
find . -type d -name "*session*" -not -path "*/archived/*"
# 应仅显示：./session-saves/

原始数据是神圣的

绝不修改原始数据 - 始终保留原始文件不变
存储在 data/raw/ 中，如果可能使其只读
记录数据来源（来自何处、何时下载）

处理后的数据层次结构

data/
├── raw/                    # 原始、不可变
├── interim/                # 中间处理步骤
├── processed/              # 最终、可用于分析的数据
└── external/               # 第三方数据

每个项目都应有一个包含以下内容的 README：

# 项目名称

简要描述

## 安装

如何设置环境

## 使用

如何运行分析/代码

## 项目结构

目录简要概述

## 数据

数据位置及如何访问

## 结果

在哪里找到输出

所有函数/类的文档字符串
复杂逻辑的注释
用于跟踪更改的 CHANGELOG.md
用于跟踪工作的 TODO.md（在合并前 gitignored 或移除）

变更文档最佳实践

在重大更改（清理、弃用、恢复）后，创建摘要文档：

创建带日期的摘要文档：
```
OPERATION_SUMMARY_YYYY-MM-DD.md
```
必要部分：
- 概述：做了什么以及为什么
- 问题：正在解决什么问题
- 解决方案：采取的行动
- 结果：更改后的当前状态
- 受影响文件：移动/更改/恢复了什么
- 恢复：如何撤销（如果需要）
良好摘要文档示例：
- FIGURE_RESTORATION_SUMMARY.md - 记录恢复的文件
- DEPRECATION_SUMMARY.md - 记录弃用的 notebooks
- RECENT_CHANGES_SUMMARY.md - 高级概述

# [操作] 摘要 - [日期]

## 问题
[问题简要描述]

## 解决方案
[为解决它做了什么]

### 更改的文件
- **移动**：[列表]
- **恢复**：[列表]
- **更新**：[列表]

## 当前状态
- **活动文件**：[数量和列表]
- **弃用文件**：[数量和列表]
- **状态**：[就绪/进行中/等]

## 恢复说明
```bash
# 如果需要撤销更改的命令

[已更新的文档列表]

日期：YYYY-MM-DD 状态：[完成/部分/等]


**为什么这很重要**：
- 未来用户（包括你自己）了解发生了什么变化
- 提供恢复说明（如果需要）
- 创建项目历史的审计跟踪
- 帮助协作者理解项目演变

## 常见反模式避免

❌ **根目录扁平结构**

project/ ├── script1.py ├── script2.py ├── data.csv ├── output1.png ├── output2.png └── final_really_final_v3.xlsx

❌ **模糊命名**

notebooks/ ├── notebook1.ipynb ├── test.ipynb ├── analysis.ipynb └── analysis_new.ipynb

❌ **关注点混合**

project/ ├── src/ │ ├── analysis.py │ ├── data.csv # 源代码目录中的数据 │ └── figure1.png # 源代码目录中的输出

## 项目重组

当重组现有项目的文件夹结构时，遵循此系统方法以避免破坏依赖关系。

### 重组后路径更新

重组文件夹后，**务必验证并更新所有脚本和 notebooks 中的文件路径**。文件通常包含硬编码路径，在重组后会失效。

#### 系统路径更新过程

1. **识别所有引用其他文件的文件**：
   ```bash
   # 查找 notebooks
   find . -name "*.ipynb"

   # 查找 Python 脚本
   find . -name "*.py"

   # 搜索文件引用
   grep -r "\.json\|\.csv\|\.tsv\|\.png" --include="*.ipynb" --include="*.py"

要搜索的常见文件引用模式：
- 数据文件：.json、.csv、.tsv、.xlsx
- 图表：.png、.jpg、.pdf、.svg
- Python：open()、read_csv()、read_json()、load()
- Jupyter：savefig()、文件 I/O 操作

使用 sed 进行批量替换更新路径：

# 更新数据文件路径
sed -i.bak "s|'filename.json'|'../data/filename.json'|g" notebook.ipynb

# 更新图表输出路径
sed -i.bak "s|savefig('fig|savefig('../figures/fig|g" notebook.ipynb

# 使用双引号更新（Python 代码中常见）
sed -i.bak 's|"filename.csv"|"../data/filename.csv"|g' script.py

验证更新：

# 检查路径是否正确更新
grep -o "'../data/[^']*'" notebook.ipynb | head -5
grep -o "'../figures/[^']*'" notebook.ipynb | head -5

清理备份文件：
```
rm *.bak *.bak2 *.bak3
```

重组后始终检查这些文件类型：

Jupyter notebooks（.ipynb）：数据加载、图表保存
Python 脚本（.py）：文件 I/O 操作
测试文件（test_*.py）：通常引用数据夹具
数据处理脚本：输入/输出路径
文档：包含文件路径的代码示例
配置文件：资源路径

原始位置	重组后	相对路径
`./data.json`	`../data/data.json`	上一级，进入 data/
`./figure.png`	`../figures/figure.png`	上一级，进入 figures/
`./test.py` 访问 `data.json`	`../data/data.json`	从 tests/ 到 data/
`./notebook.ipynb` 访问两者	`../data/`、`../figures/`	从 notebooks/ 到两者

使用相对路径（../data/）而非绝对路径以提高可移植性
批量处理 类似更改，使用 sed 或查找/替换
一次测试一种文件类型（先 notebooks，然后脚本）
保留备份文件（.bak）直到验证完成
使用 grep 验证 更改是否正确应用
考虑不同操作系统上的大小写敏感性

重组后的验证和清理

完成重组后，始终验证结果并进行清理：

1. 验证文件计数

# 统计移动到每个目录的文件数
for dir in figures data tests notebooks docs archives; do
  echo "$dir: $(ls $dir 2>/dev/null | wc -l | tr -d ' ') files"
done

# 确保根目录干净
ls -la

# 应仅看到：
# - 组织好的目录（figures/、data/ 等）
# - 项目特定文件夹（Fetch_data/、sharing/）
# - 配置目录（.claude/、.git/）
# - 必要文件（README.md、.gitignore 等）

3. 移除临时文件

# 移除 Jupyter 检查点（自动生成，版本控制中不需要）
rm -rf .ipynb_checkpoints

# 移除 sed 备份文件
rm *.bak *.bak2 *.bak3

# 移除重复/备份数据文件
rm *.backup *.backup2 *.old

4. 显示最终结构

# 显示干净的目录树
tree -L 2 -d

# 或仅列出目录
ls -d */

所有目标目录成功创建
文件计数符合预期（没有文件丢失）
根目录干净（没有散乱的文件）
临时/备份文件已移除
notebooks/脚本中的路径已更新（见“重组后路径更新”）
结构已记录（README 或类似文件）
测试 notebooks/脚本仍能正确运行

归档旧分支 - 删除已合并的功能分支
清理临时文件 - 从已完成工作中移除 TODO.md、NOTES.md
更新文档 - 保持 README 与更改同步
审查 .gitignore - 确保大型文件未被跟踪
组织 notebooks - 随着项目发展重命名/重新编号

README 完整且准确
代码已文档化
测试通过
大型文件已 gitignored
工作文件已移除（TODO.md、草稿 notebooks）
最终输出在 results/ 中
环境文件最新
已添加许可证（如果适用）

项目清理：识别必要文件

当项目随时间积累许多文件时，使用此系统方法识别并仅保留必要文件：

1. 分析 Notebooks 以查找使用的图表

# 从 Jupyter notebooks 中提取图表引用
grep -o "figures/[^'\"]*\.png" YourNotebook.ipynb | sort -u

# 对于多个 notebooks，检查每个
for nb in *.ipynb; do
    echo "=== $nb ==="
    grep -o "figures/[^'\"]*\.png" "$nb" | sort -u
done

2. 将图表映射到生成脚本

# 查找哪个脚本生成特定图表
grep -l "figure_name" scripts/*.py

# 搜索输出目录模式
grep -l "figures/curation_impact" scripts/*.py

3. 组织弃用文件

创建清晰结构：

mkdir -p deprecated/{figures,scripts,notebooks}
mkdir -p deprecated/figures/{unused_category1,unused_category2}
mkdir -p deprecated/scripts/unused_utilities

使用描述性的子目录名称：

deprecated/
├── figures/
│   ├── unused_regression_plots/       # 基于类别的名称
│   ├── unused_curation_impact/
│   └── exploratory_analysis/
├── scripts/
│   ├── unused_utilities/              # 基于目的的组织
│   ├── old_data_fetch/
│   └── notebook_fixes/
└── data/
    ├── intermediate_tables/
    └── old_versions/

deprecated/
├── old_stuff/        # 太模糊
├── misc/             # 目的不明确
└── temp/             # 模糊不清

良好命名的优点：

未来的你理解每个文件夹中的内容
易于恢复特定类别
清楚哪些可以安全删除 vs 归档
记录项目演变

4. 记录保留内容

创建 MINIMAL_ESSENTIAL_FILES.md：

列出所有活动图表及其源脚本
列出必要脚本及其目的
提供重新生成说明
包括弃用文件的恢复说明

## 活动图表
1. figure_01.png - 在 Notebook A 中使用（图 1）
   - 生成者：script_14.py

## 必要脚本
1. script_14.py - 生成图 1-4、7
2. build_data.py - 必需的基础设施

在最终确定清理前：

所有 notebook 引用的图表已识别
生成这些图表的脚本已识别
未使用文件已移动（非删除）到 deprecated/
文档已创建（MINIMAL_ESSENTIAL_FILES.md）
重新生成命令已测试
Notebooks 在清理后的结构中仍能工作

减少混淆：清楚哪些文件是活动的 vs 历史的
更易维护：只需更新必要文件
更好的文档：明确的图表 → 脚本映射
可恢复：弃用文件被保留，而非删除
上手：新协作者看到最小必要集合

识别要弃用的文件

清理包含多个配置文件版本的分析目录时：

指示旧/被取代文件的模式：

命名模式：
- 存在 _UPDATED 版本时没有 _UPDATED 后缀的文件
- 带有中间版本号或日期的文件
- 名为 *_old.txt、*_backup.csv 的文件
内容指示：
- 旧参数值（例如，旧配色方案）
- 旧物种名称（例如，Time Tree 替换）
- 不完整的覆盖范围（比当前更少的物种）
多个类似文件：
- itol_branch_colors.txt、itol_branch_colors_v2.txt、itol_branch_colors_UPDATED.txt
- 保留：_UPDATED.txt（当前版本）
- 弃用：其他（被取代的）

# 创建具有描述性名称的弃用目录
mkdir -p deprecated/phylo_old_configs

# 移动被取代的文件（保留供参考）
mv old_file1.txt old_file2.csv deprecated/phylo_old_configs/

# 验证移动（应为空或仅当前文件）
ls *.txt *.csv

活动目录中保留的文件：

当前版本（例如，*_UPDATED.*）
生成配置的源脚本
文档（README、MANIFEST）
正在使用的数据文件

要弃用的文件：

带有旧参数的被取代配置
中间测试版本
来自先前分析版本的文件
不再被 notebooks 引用的配置

来自 phylo 清理的示例（18 个文件被弃用）：

旧：itol_3category_colorstrip.txt（旧颜色）
当前：itol_3category_colorstrip_UPDATED.txt（新颜色）
旧：species_curation_methods.csv（2 类别系统）
当前：species_3category_methods_UPDATED.csv（3 类别系统）

清楚哪些文件是当前的 vs 历史的
更新配置时减少混淆
保留旧版本以供比较
更易识别需要更新的文件

项目在根目录中积累文档文件（.md、.log、.txt）。有效地整合它们：

documentation/
├── README.md                    # 所有文档的索引
├── logs/                        # 流程的日志文件
├── working_files/               # 临时/工作文件
└── [组织好的 .md 文件]

# 1. 创建结构
mkdir -p documentation/{logs,working_files}

# 2. 移动文档
mv *.md documentation/
mv *.log documentation/logs/
mv *.txt documentation/working_files/  # 或将必要的保留在根目录

# 3. 创建索引（documentation/README.md）
cat > documentation/README.md << 'EOF'
# 项目文档

## 快速开始
- ESSENTIAL_FILE.md - 从这里开始
- RECENT_CHANGES.md - 最新更新

## 按类别
### 分析
- analysis_summary.md
- results.md

### 方法
- methods.md
- protocols.md

[等等...]
EOF

文档 README 模板

在 documentation/README.md 中包含：

快速开始部分 - 最重要的文档
分类组织 - 按目的分组
文件描述 - 单行摘要
文件计数 - 显示组织规模
归档策略 - 哪些文档是历史的
访问说明 - 如何查找特定信息

根目录保留内容

保留在项目根目录：

README.md - 项目概述
LICENSE、CONTRIBUTING.md - 标准文件
.gitignore、配置文件

移动到 documentation/：

分析摘要
会话笔记
方法描述
更新日志
所有其他 markdown 文件

干净的根目录：仅可见必要的项目文件
组织好的文档：易于查找特定文档
分类：日志与摘要与方法是分开的
已索引：README 提供路线图
可扩展：新文档有明确位置

❌ 不要删除旧文档 - 将其移动到 documentation/archive/ ✓ 保留历史但清晰地组织它

用于令牌高效导航的 MANIFEST 系统

对于大型数据分析项目，实施 MANIFEST 系统以实现高效的项目导航并最小化 Claude Code 会话中的令牌使用。

挑战：包含许多文件的大型项目在会话启动期间消耗过多令牌：

读取大型 notebooks（4-6 MB）= 5,000-15,000 令牌
探索数据文件和结构 = 3,000-5,000 令牌
理解脚本和工作流 = 2,000-3,000 令牌
总计：仅用于定位就 15,000-23,000 令牌！

解决方案：MANIFEST 文件提供轻量级项目索引（每个约 500-2,000 令牌），无需读取实际文件即可提供完整上下文。

使用 MANIFEST 前（读取实际文件）：

Root MANIFEST: N/A
Notebooks: ~10,000-15,000 tokens (reading 3 large notebooks)
Data exploration: ~3,000-5,000 tokens
Scripts analysis: ~2,000-3,000 tokens
---
Total: ~15,000-23,000 tokens for project orientation

使用 MANIFEST 后（读取索引）：

Root MANIFEST: ~1,500 tokens (full project overview)
Subdirectory MANIFEST: ~500-1,000 tokens (specific area)
---
Total: ~2,000-2,500 tokens for complete context
Savings: 85-90% token reduction!

什么是 MANIFEST？

MANIFEST.md 文件是目录的全面索引，包括：

快速参考：入口点、关键输出、依赖关系
文件清单：所有文件及其描述、大小、用途
工作流依赖关系：文件如何关联和相互依赖
恢复工作笔记：当前状态、后续步骤、已知问题
元数据：标签、环境信息、Obsidian 笔记链接

关键原则：MANIFEST 提供恢复工作所需 80% 的上下文，仅需 500-2,000 令牌，而不是读取 5,000-15,000 令牌的实际文件。

根目录 MANIFEST 模板

# [项目名称] - 根 MANIFEST

**最后更新**：YYYY-MM-DD
**目的**：[1-2 句描述]
**状态**：活动/弃用/归档

---

## 快速参考

**入口点**：[首先读取哪些文件]
**关键输出**：[主要交付物]
**依赖关系**：[外部要求]

---

## 文件

### Notebooks

#### `notebook_name.ipynb`（大小）
- **目的**：[此分析回答什么问题？]
- **依赖**：[输入文件、数据、脚本]
- **生成**：[输出文件、图表]
- **关键发现**：[1-2 句摘要]
- **最后修改**：YYYY-MM-DD
- **执行时间**：[如果相关，约 X 分钟]
- **优先级**：[主要文档或补充分析]

### 关键目录

#### `data/`（大小）
详见 `data/MANIFEST.md`。
- **内容**：[简要描述]
- **关键文件**：[最重要的文件]

[为 figures/、scripts/、documentation/ 重复]

---

## 目录结构

project/ ├── MANIFEST.md（此文件） ├── data/ │ └── MANIFEST.md ├── figures/ │ └── MANIFEST.md └── [等等。]

---

## 工作流依赖关系

[数据 → 处理 → 输出的可视化或文本描述]

示例：

1. 数据获取：fetch_data.py → data/raw/
2. 处理：process.py → data/processed/
3. 分析：analysis.ipynb → figures/

---

## 恢复工作笔记

**当前状态**：[最后完成的是什么？]
**后续步骤**：[接下来需要做什么？]
**已知问题**：[问题、待办事项、阻碍]
**参考**：[相关文档、其他 MANIFEST 的链接]

---

## 元数据

**创建者**：Claude Code
**项目**：[项目名称]
**标签**：[#用于 #搜索的 #关键词]
**环境**：[conda 环境名称或 venv 路径]
**Obsidian 笔记路径**：[项目笔记链接]

---

## 对于 Claude Code 会话

**新会话快速开始**：
1. 读取此 MANIFEST.md（~500 令牌）
2. 读取相关子目录 MANIFEST.md（~500 令牌）
3. 仅在编辑时读取实际文件

**令牌效率**：
- 此 MANIFEST 提供所需 80% 的上下文
- 子目录 MANIFEST 提供详细文件信息
- 仅在更改时读取实际代码/notebooks

子目录 MANIFEST 模板

使用相同结构但专注于特定目录。对于子目录：

data/MANIFEST.md - 专注于：

数据来源（数据来自何处）
文件格式和结构（行、列、大小）
数据依赖关系（哪些文件依赖哪些）
处理历史（原始 → 处理后的版本）

figures/MANIFEST.md - 专注于：

哪些代码生成哪些图表
图表目的和关键信息
手稿图表编号
图表依赖关系（数据源）

scripts/MANIFEST.md - 专注于：

脚本目的和 I/O
执行顺序和依赖关系
使用示例和参数
所需依赖关系

documentation/MANIFEST.md - 专注于：

按类别组织的文档
关键入口点（RESUME_HERE.md）
活动 vs 归档状态
会话摘要位置

跨目录链接 MANIFEST

当实现 analysis_files/（迭代 2）时，创建双向链接：

在 figures/MANIFEST.md 中 - 链接到分析文件：

**01_figure_name.png**（318 KB）
- **描述**：简要描述
- **分析文件**：`../analysis_files/figures/01_figure_name.md` - 详细分析

在 analysis_files/MANIFEST.md 中 - 链接回图表：

#### 01_figure_name.md
- **图表文件**：`figures/curation_impact_3cat/01_figure_name.png`
- **目的**：详细分析和解释

在根 MANIFEST.md 中 - 引用两者：

#### `analysis_files/`（~90 KB）**[新 - 迭代 2]**
- **目的**：用于图表分析的独立 markdown 文件
- **令牌效率**：比 notebooks 减少约 98%
- **链接到**：figures/ 目录

这创建了一个可导航的文档网络。

MANIFEST 模板文件

在项目根目录创建 MANIFEST_TEMPLATE.md 作为起点。有关包含所有部分和指导的完整示例，请参阅 Curation_Paper_figures 项目中的模板。

MANIFEST 管理命令

/generate-manifest 命令

在 .claude/commands/generate-manifest.md 中创建

🇺🇸English

Folder Organization Best Practices

Expert guidance for organizing project directories, establishing file naming conventions, and maintaining clean, navigable project structures for research and development work.

When to Use This Skill

Setting up new projects
Reorganizing existing projects
Establishing team conventions
Creating reproducible research structures
Managing data-intensive projects

Core Principles

Predictability - Standard locations for common file types
Scalability - Structure grows gracefully with project
Discoverability - Easy for others (and future you) to navigate
Separation of Concerns - Code, data, documentation, outputs separated
Version Control Friendly - Large/generated files excluded appropriately

Standard Project Structure

Research/Analysis Projects

project-name/
├── README.md                 # Project overview and getting started
├── .gitignore               # Exclude data, outputs, env files
├── environment.yml          # Conda environment (or requirements.txt)
├── data/                    # Input data (often gitignored)
│   ├── raw/                # Original, immutable data
│   ├── processed/          # Cleaned, transformed data
│   └── external/           # Third-party data
├── notebooks/               # Jupyter notebooks for exploration
│   ├── 01-exploration.ipynb
│   ├── 02-analysis.ipynb
│   └── figures/            # Notebook-generated figures
├── src/                     # Source code (reusable modules)
│   ├── __init__.py
│   ├── data_processing.py
│   ├── analysis.py
│   └── visualization.py
├── scripts/                 # Standalone scripts and workflows
│   ├── download_data.sh
│   └── run_pipeline.py
├── tests/                   # Unit tests
│   └── test_analysis.py
├── docs/                    # Documentation
│   ├── methods.md
│   └── references.md
├── results/                 # Analysis outputs (gitignored)
│   ├── figures/
│   ├── tables/
│   └── models/
└── config/                  # Configuration files
    └── analysis_config.yaml

Development Projects

project-name/
├── README.md
├── .gitignore
├── setup.py                 # Package configuration
├── requirements.txt         # or pyproject.toml
├── src/
│   └── package_name/
│       ├── __init__.py
│       ├── core.py
│       └── utils.py
├── tests/
│   ├── test_core.py
│   └── test_utils.py
├── docs/
│   ├── api.md
│   └── usage.md
├── examples/                # Example usage
│   └── example_workflow.py
└── .github/                 # CI/CD workflows
    └── workflows/
        └── tests.yml

Bioinformatics/Workflow Projects

project-name/
├── README.md
├── data/
│   ├── raw/                # Raw sequencing data
│   ├── reference/          # Reference genomes, annotations
│   └── processed/          # Workflow outputs
├── workflows/               # Galaxy .ga or Snakemake files
│   ├── preprocessing.ga
│   └── assembly.ga
├── config/
│   ├── workflow_params.yaml
│   └── sample_sheet.tsv
├── scripts/                # Helper scripts
│   ├── submit_workflow.py
│   └── quality_check.py
├── results/                # Final outputs
│   ├── figures/
│   ├── tables/
│   └── reports/
└── logs/                   # Workflow execution logs

Data Analysis Projects with Notebooks

For projects involving Jupyter notebooks, data analysis, and visualization with many generated figures:

project-name/
├── README.md                # Project overview
├── .gitignore
├── notebooks/               # Jupyter notebooks (analysis, exploration)
│   ├── 01-data-loading.ipynb
│   ├── 02-exploratory-analysis.ipynb
│   └── 03-final-analysis.ipynb
├── figures/                 # ALL generated visualizations (PNG, PDF, SVG)
│   ├── fig1_distribution.png
│   ├── fig2_correlation.png
│   └── supplementary_*.png
├── data/                    # Data files (JSON, CSV, TSV, Excel)
│   ├── raw/                # (optional) Original, unprocessed data
│   ├── processed/          # (optional) Cleaned, processed data
│   ├── input_data.json
│   └── metadata.tsv
├── tests/                   # Test scripts (test_*.py, pytest)
│   ├── test_processing.py
│   └── test_analysis.py
├── scripts/                 # Standalone Python/R scripts
│   ├── data_fetch.py
│   └── preprocessing.py
├── docs/                    # Documentation (MD, RST files)
│   ├── methods.md
│   └── analysis_notes.md
└── archives/                # Compressed archives, old versions
    └── backup_YYYYMMDD.tar.gz

Benefits of This Structure :

Clear separation of concerns (code vs. data vs. outputs)
Easy navigation : Find all figures in one place
Scalability : Handles 50+ figures without cluttering root
Git-friendly : Easy to .gitignore large data/figures
Collaboration : Standard structure reduces onboarding time

When to Use This Structure :

Projects with multiple notebooks
Analysis generating many visualizations (10+ figures)
Multiple data sources/formats
Team collaboration
Long-term research projects

MANIFEST Integration :

For enhanced navigation and token efficiency, add MANIFEST.md files:

project-name/
├── MANIFEST.md                  # Root project index (~1,500 tokens)
├── MANIFEST_TEMPLATE.md         # Template for creating new MANIFESTs
├── notebooks/
│   ├── MANIFEST.md             # (optional) Notebook catalog
│   └── [notebook files]
├── figures/
│   ├── MANIFEST.md             # Figure catalog (~500-1,000 tokens)
│   └── [figure files]
├── data/
│   ├── MANIFEST.md             # Data inventory (~500-1,000 tokens)
│   └── [data files]
├── scripts/
│   ├── MANIFEST.md             # Script documentation (~500-1,000 tokens)
│   └── [script files]
└── documentation/
    ├── MANIFEST.md             # Doc organization (~500-1,000 tokens)
    └── [doc files]

Benefits:

85-90% token reduction for session startup
Complete project context in ~2,000 tokens (vs 15,000+)
Quick work resumption without reading files
Clear workflow documentation

See "MANIFEST System for Token-Efficient Navigation" section below for complete documentation.

File Naming Conventions

General Rules

Use lowercase with hyphens or underscores
- ✅ data-analysis.py or data_analysis.py
- ❌ DataAnalysis.py or data analysis.py
Be descriptive but concise
- ✅ process-telomere-data.py
- ❌ script.py or process_all_the_telomere_sequencing_data_from_experiments.py
Use consistent separators
- Choose either hyphens or underscores and stick with it

Numbered Sequences

For sequential files (notebooks, scripts), use zero-padded numbers:

notebooks/
├── 01-data-exploration.ipynb
├── 02-quality-control.ipynb
├── 03-statistical-analysis.ipynb
└── 04-visualization.ipynb

Data Files

Include metadata in filename when possible:

data/raw/
├── sample-A_hifi_reads_2026-01-15.fastq.gz
├── sample-B_hifi_reads_2026-01-15.fastq.gz
└── reference_genome_v3.fasta

Directory Management Best Practices

What to Version Control

DO commit:

Source code
Documentation
Configuration files
Small test datasets (<1MB)
Requirements/environment files
README files

DON'T commit:

Large data files (use .gitignore)
Generated outputs
Environment directories (venv/, conda-env/)
Logs
Temporary files
API keys/secrets

.gitignore Template

# Claude Code (local skills and settings)
.claude/

# Python
__pycache__/
*.py[cod]
*$py.class
.venv/
venv/
*.egg-info/

# Jupyter
.ipynb_checkpoints/
*.ipynb_checkpoints

# Data
data/raw/
data/processed/
*.fastq.gz
*.bam
*.vcf.gz

# Outputs
results/
outputs/
*.png
*.pdf
*.html

# Logs
logs/
*.log

# Environment
.env
environment.local.yml

# OS
.DS_Store
Thumbs.db

Session Notes Storage

Standardize onsession-saves/ (not sessions-history/, Archive/, or other variants)

Benefits:

Consistent with standard naming pattern (noun + plural)
Matches archived/ pattern
Works across all projects
Prevents confusion when writing automation scripts

When reorganizing:

mv sessions-history/ session-saves/
mv Archive/ session-saves/  # if used for session notes

Project folder template (with session notes):

project-name/
├── TO-DOS.md                # Project-specific tasks
├── session-saves/           # Working session notes (tagged with #dump)
├── archived/                # Processed/consolidated notes
│   ├── daily/              # Daily consolidations
│   └── monthly/            # Monthly summaries
├── Planning/               # Planning documents
├── Development/            # Development notes
└── [other content folders]

Integration with Obsidian:

Session notes should have dump tag in frontmatter for easy filtering
See the obsidian skill for complete details on the dump tag requirement and frontmatter schema
Archiving workflow: move from session-saves/ to archived/daily/ or archived/monthly/

After Reorganization

Clean up old session folder variations:

If you renamed folders during reorganization (e.g., sessions-history/ → session-saves/):

Move archived sessions:

mv old-folder/*.md session-saves/archived/daily/

Remove empty old folders safely:

rmdir old-folder/  # Fails if not empty - safety check

Update any links to old folder structure

Don't leave multiple session folders:

❌ sessions-history/ and session-saves/ both present
✅ Single session-saves/ with archived content

Verification:

# Check for session folder variations
find . -type d -name "*session*" -not -path "*/archived/*"
# Should show only: ./session-saves/

Data Organization

Raw Data is Sacred

Never modify raw data - Always keep originals untouched
Store in data/raw/ and make it read-only if possible
Document data provenance (where it came from, when downloaded)

Processed Data Hierarchy

data/
├── raw/                    # Original, immutable
├── interim/                # Intermediate processing steps
├── processed/              # Final, analysis-ready data
└── external/               # Third-party data

Documentation Standards

README.md Essentials

Every project should have a README with:

# Project Name

Brief description

## Installation

How to set up the environment

## Usage

How to run the analysis/code

## Project Structure

Brief overview of directories

## Data

Where data lives and how to access it

## Results

Where to find outputs

Code Documentation

Docstrings for all functions/classes
Comments for complex logic
CHANGELOG.md for tracking changes
TODO.md for tracking work (gitignored or removed before merge)

Change Documentation Best Practices

After major changes (cleanup, deprecation, restoration), create summary documents:

Create a dated summary document :
```
OPERATION_SUMMARY_YYYY-MM-DD.md
```
Essential sections :
- Overview : What was done and why
- Problem : What issue was being addressed
- Solution : Actions taken
- Result : Current state after changes
- Files affected : What was moved/changed/restored
- Restoration : How to undo if needed
Examples of good summary docs :
- FIGURE_RESTORATION_SUMMARY.md - Documents restored files
- DEPRECATION_SUMMARY.md - Documents deprecated notebooks
- RECENT_CHANGES_SUMMARY.md - High-level overview

Template for Change Summaries

# [Operation] Summary - [Date]

## Problem
[Brief description of the issue]

## Solution
[What was done to address it]

### Files Changed
- **Moved**: [list]
- **Restored**: [list]
- **Updated**: [list]

## Current State
- **Active files**: [count and list]
- **Deprecated files**: [count and list]
- **Status**: [Ready/In Progress/etc.]

## Restoration Instructions
```bash
# Commands to undo changes if needed

Documentation Updated

[List of docs that were updated]

Date : YYYY-MM-DD Status : [Complete/Partial/etc.]

**Why This Matters**:
- Future users (including yourself) understand what changed
- Provides restoration instructions if needed
- Creates audit trail for project history
- Helps collaborators understand project evolution

## Common Anti-Patterns to Avoid

❌ **Flat structure with everything in root**

project/ ├── script1.py ├── script2.py ├── data.csv ├── output1.png ├── output2.png └── final_really_final_v3.xlsx

❌ **Ambiguous naming**

notebooks/ ├── notebook1.ipynb ├── test.ipynb ├── analysis.ipynb └── analysis_new.ipynb

❌ **Mixed concerns**

project/ ├── src/ │ ├── analysis.py │ ├── data.csv # Data in source code directory │ └── figure1.png # Output in source code directory

## Project Reorganization

When reorganizing an existing project's folder structure, follow this systematic approach to avoid breaking dependencies.

### Post-Reorganization Path Updates

After reorganizing folders, **ALWAYS verify and update file paths** in all scripts and notebooks. Files often contain hardcoded paths that break after reorganization.

#### Systematic Path Update Process

1. **Identify all files that reference other files**:
   ```bash
   # Find notebooks
   find . -name "*.ipynb"

   # Find Python scripts
   find . -name "*.py"

   # Search for file references
   grep -r "\.json\|\.csv\|\.tsv\|\.png" --include="*.ipynb" --include="*.py"

2. Common file reference patterns to search for :

 * Data files: `.json`, `.csv`, `.tsv`, `.xlsx`
 * Figures: `.png`, `.jpg`, `.pdf`, `.svg`
 * Python: `open()`, `read_csv()`, `read_json()`, `load()`
 * Jupyter: `savefig()`, file I/O operations

3. Update paths using sed for batch replacements :

     # Update data file paths
     sed -i.bak "s|'filename.json'|'../data/filename.json'|g" notebook.ipynb
     
     # Update figure output paths
     sed -i.bak "s|savefig('fig|savefig('../figures/fig|g" notebook.ipynb
     
     # Update with double quotes (common in Python code)
     sed -i.bak 's|"filename.csv"|"../data/filename.csv"|g' script.py

4. Verify updates :

     # Check that paths were updated correctly
     grep -o "'../data/[^']*'" notebook.ipynb | head -5
     grep -o "'../figures/[^']*'" notebook.ipynb | head -5

5. Clean up backup files :

     rm *.bak *.bak2 *.bak3

Files to Check

Always check these file types after reorganization:

Jupyter notebooks (.ipynb): Data loading, figure saving
Python scripts (.py): File I/O operations
Test files (test_*.py): Often reference data fixtures
Data processing scripts : Input/output paths
Documentation : Code examples with file paths
Configuration files : Paths to resources

Common Path Patterns

Original Location	After Reorganization	Relative Path
`./data.json`	`../data/data.json`	Go up one level, into data/
`./figure.png`	`../figures/figure.png`	Go up one level, into figures/
`./test.py` accessing `data.json`	`../data/data.json`

Tips

Use relative paths (../data/) not absolute paths for portability
Batch process similar changes using sed or find/replace
Test one file type at a time (notebooks first, then scripts)
Keep backup files (.bak) until verification complete
Grep to verify changes were applied correctly
Consider case sensitivity on different operating systems

Verification and Cleanup After Reorganization

After completing a reorganization, always verify the results and clean up:

1. Verify File Counts

# Count files moved to each directory
for dir in figures data tests notebooks docs archives; do
  echo "$dir: $(ls $dir 2>/dev/null | wc -l | tr -d ' ') files"
done

2. Check Root Directory

# Ensure root is clean
ls -la

# Should only see:
# - Organized directories (figures/, data/, etc.)
# - Project-specific folders (Fetch_data/, sharing/)
# - Config directories (.claude/, .git/)
# - Essential files (README.md, .gitignore, etc.)

3. Remove Temporary Files

# Remove Jupyter checkpoints (auto-generated, not needed in version control)
rm -rf .ipynb_checkpoints

# Remove sed backup files
rm *.bak *.bak2 *.bak3

# Remove duplicate/backup data files
rm *.backup *.backup2 *.old

4. Display Final Structure

# Show clean directory tree
tree -L 2 -d

# Or list directories only
ls -d */

Verification Checklist

All target directories created successfully
File counts match expectations (no files lost)
Root directory is clean (no scattered files)
Temporary/backup files removed
Paths in notebooks/scripts updated (see "Post-Reorganization Path Updates")
Structure documented (README or similar)
Test that notebooks/scripts still run correctly

Cleanup and Maintenance

Regular Maintenance Tasks

Archive old branches - Delete merged feature branches
Clean temp files - Remove TODO.md, NOTES.md from completed work
Update documentation - Keep README current with changes
Review .gitignore - Ensure large files aren't tracked
Organize notebooks - Rename/renumber as project evolves

End-of-Project Checklist

README complete and accurate
Code documented
Tests passing
Large files gitignored
Working files removed (TODO.md, scratch notebooks)
Final outputs in results/
Environment files current
License added (if applicable)

Project Cleanup: Identifying Essential Files

When projects accumulate many files over time, use this systematic approach to identify and keep only essential files:

1. Analyze Notebooks to Find Used Figures

# Extract figure references from Jupyter notebooks
grep -o "figures/[^'\"]*\.png" YourNotebook.ipynb | sort -u

# For multiple notebooks, check each one
for nb in *.ipynb; do
    echo "=== $nb ==="
    grep -o "figures/[^'\"]*\.png" "$nb" | sort -u
done

2. Map Figures to Generating Scripts

# Find which script generates a specific figure
grep -l "figure_name" scripts/*.py

# Search for output directory patterns
grep -l "figures/curation_impact" scripts/*.py

3. Organize Deprecated Files

Create clear structure:

mkdir -p deprecated/{figures,scripts,notebooks}
mkdir -p deprecated/figures/{unused_category1,unused_category2}
mkdir -p deprecated/scripts/unused_utilities

Use descriptive subdirectory names:

Good structure:

deprecated/
├── figures/
│   ├── unused_regression_plots/       # Category-based names
│   ├── unused_curation_impact/
│   └── exploratory_analysis/
├── scripts/
│   ├── unused_utilities/              # Purpose-based organization
│   ├── old_data_fetch/
│   └── notebook_fixes/
└── data/
    ├── intermediate_tables/
    └── old_versions/

Poor structure:

deprecated/
├── old_stuff/        # Too vague
├── misc/             # Unclear purpose
└── temp/             # Ambiguous

Benefits of good naming:

Future-you understands what's in each folder
Easy to restore specific categories
Clear what can be safely deleted vs archived
Documents project evolution

4. Document What Was Kept

Create MINIMAL_ESSENTIAL_FILES.md:

List all active figures and their source scripts
List essential scripts with their purposes
Provide regeneration instructions
Include restoration instructions for deprecated files

Example structure :

## Active Figures
1. figure_01.png - Used in Notebook A (Figure 1)
   - Generated by: script_14.py

## Essential Scripts
1. script_14.py - Generates Figures 1-4, 7
2. build_data.py - Required infrastructure

5. Verification Checklist

Before finalizing cleanup:

All notebook-referenced figures identified
Scripts generating those figures identified
Unused files moved (not deleted) to deprecated/
Documentation created (MINIMAL_ESSENTIAL_FILES.md)
Regeneration commands tested
Notebooks still work with cleaned structure

Benefits of This Approach

Reduced confusion : Clear which files are active vs historical
Easier maintenance : Only essential files to update
Better documentation : Explicit mapping of figures → scripts
Recoverable : Deprecated files preserved, not deleted
Onboarding : New collaborators see minimal essential set

Identifying Files for Deprecation

When cleaning up analysis directories with multiple config file versions:

Patterns indicating old/superseded files:

Naming patterns:
- Files without _UPDATED suffix when _UPDATED versions exist
- Files with intermediate version numbers or dates
- Files named *_old.txt, *_backup.csv
Content indicators:
- Old parameter values (e.g., old color schemes)
- Old species names (e.g., Time Tree replacements)
- Incomplete coverage (fewer species than current)
Multiple similar files:
- itol_branch_colors.txt, itol_branch_colors_v2.txt, itol_branch_colors_UPDATED.txt

Cleanup Strategy:

# Create deprecation directory with descriptive name
mkdir -p deprecated/phylo_old_configs

# Move superseded files (preserve for reference)
mv old_file1.txt old_file2.csv deprecated/phylo_old_configs/

# Verify move (should be empty or only current files)
ls *.txt *.csv

Files to keep in active directory:

Current versions (e.g., *_UPDATED.*)
Source scripts that generate configs
Documentation (README, MANIFEST)
Data files actively used

Files to deprecate:

Superseded configs with old parameters
Intermediate test versions
Files from previous analysis versions
Configs no longer referenced by notebooks

Example from phylo cleanup (18 files deprecated):

Old: itol_3category_colorstrip.txt (old colors)
Current: itol_3category_colorstrip_UPDATED.txt (new colors)
Old: species_curation_methods.csv (2-category system)
Current: species_3category_methods_UPDATED.csv (3-category system)

Benefits:

Clear which files are current vs historical
Reduced confusion when updating configs
Preserved old versions for comparison
Easier to identify files needing updates

Documentation Organization Strategy

Projects accumulate documentation files (.md, .log, .txt) in the root directory. Consolidate them effectively:

Structure

documentation/
├── README.md                    # Index to all documentation
├── logs/                        # Log files from processes
├── working_files/               # Temporary/working files
└── [organized .md files]

Implementation

# 1. Create structure
mkdir -p documentation/{logs,working_files}

# 2. Move documentation
mv *.md documentation/
mv *.log documentation/logs/
mv *.txt documentation/working_files/  # or keep essential ones in root

# 3. Create index (documentation/README.md)
cat > documentation/README.md << 'EOF'
# Project Documentation

## Quick Start
- ESSENTIAL_FILE.md - Start here
- RECENT_CHANGES.md - Latest updates

## By Category
### Analysis
- analysis_summary.md
- results.md

### Methods
- methods.md
- protocols.md

[etc...]
EOF

Documentation README Template

Include in documentation/README.md:

Quick start section - Most important docs
Categorical organization - Group by purpose
File descriptions - One-line summaries
File counts - Show organization scale
Archive policy - Which docs are historical
Access instructions - How to find specific info

What to Keep in Root

Keep in project root:

README.md - Project overview
LICENSE, CONTRIBUTING.md - Standard files
.gitignore, config files

Move to documentation/:

Analysis summaries
Session notes
Method descriptions
Update logs
All other markdown files

Benefits

Clean root directory : Only essential project files visible
Organized docs : Easy to find specific documentation
Categorized : Logs separate from summaries separate from methods
Indexed : README provides roadmap
Scalable : Clear place for new documentation

Common Mistake

❌ Don't delete old documentation - move it to documentation/archive/ ✓ Preserve history but organize it clearly

MANIFEST System for Token-Efficient Navigation

For large data analysis projects, implement a MANIFEST system to enable efficient project navigation and minimize token usage in Claude Code sessions.

Problem Statement

Challenge : Large projects with many files consume excessive tokens during session startup:

Reading large notebooks (4-6 MB) = 5,000-15,000 tokens
Exploring data files and structure = 3,000-5,000 tokens
Understanding scripts and workflows = 2,000-3,000 tokens
Total: 15,000-23,000 tokens just for orientation!

Solution : MANIFEST files provide lightweight project indexes (~500-2,000 tokens each) that give complete context without reading actual files.

Token Efficiency Impact

Before MANIFESTs (reading actual files):

Root MANIFEST: N/A
Notebooks: ~10,000-15,000 tokens (reading 3 large notebooks)
Data exploration: ~3,000-5,000 tokens
Scripts analysis: ~2,000-3,000 tokens
---
Total: ~15,000-23,000 tokens for project orientation

After MANIFESTs (reading indexes):

Root MANIFEST: ~1,500 tokens (full project overview)
Subdirectory MANIFEST: ~500-1,000 tokens (specific area)
---
Total: ~2,000-2,500 tokens for complete context
Savings: 85-90% token reduction!

What is a MANIFEST?

A MANIFEST.md file is a comprehensive index for a directory that includes:

Quick Reference : Entry points, key outputs, dependencies
File Inventory : All files with descriptions, sizes, purposes
Workflow Dependencies : How files relate and depend on each other
Notes for Resuming Work : Current status, next steps, known issues
Metadata : Tags, environment info, Obsidian notes links

Key Principle : MANIFEST provides 80% of context needed to resume work in 500-2,000 tokens instead of reading 5,000-15,000 tokens of actual files.

MANIFEST Structure

Root Directory MANIFEST Template

# [Project Name] - ROOT MANIFEST

**Last Updated**: YYYY-MM-DD
**Purpose**: [1-2 sentence description]
**Status**: Active/Deprecated/Archive

---

## Quick Reference

**Entry Points**: [Which files to read first]
**Key Outputs**: [Main deliverables]
**Dependencies**: [External requirements]

---

## Files

### Notebooks

#### `notebook_name.ipynb` (Size)
- **Purpose**: [What analysis/questions does this answer?]
- **Depends on**: [Input files, data, scripts]
- **Generates**: [Output files, figures]
- **Key findings**: [1-2 sentence summary]
- **Last modified**: YYYY-MM-DD
- **Execution time**: [~X minutes if relevant]
- **Priority**: [Main document or complementary analysis]

### Key Directories

#### `data/` (Size)
See `data/MANIFEST.md` for details.
- **Contents**: [Brief description]
- **Key files**: [Most important files]

[Repeat for figures/, scripts/, documentation/]

---

## Directory Structure

project/ ├── MANIFEST.md (this file) ├── data/ │ └── MANIFEST.md ├── figures/ │ └── MANIFEST.md └── [etc.]

---

## Workflow Dependencies

[Visual or text description of data → processing → outputs flow]

Example:

Data Acquisition: fetch_data.py → data/raw/
Processing: process.py → data/processed/
Analysis: analysis.ipynb → figures/

---

## Notes for Resuming Work

**Current Status**: [What was last completed?]
**Next Steps**: [What needs to be done next?]
**Known Issues**: [Problems, TODOs, blockers]
**Reference**: [Links to related docs, other MANIFESTs]

---

## Metadata

**Created by**: Claude Code
**Project**: [Project name]
**Tags**: [#keywords #for #searching]
**Environment**: [conda env name or venv path]
**Obsidian notes path**: [Link to project notes]

---

## For Claude Code Sessions

**Quick Start for New Sessions**:
1. Read this MANIFEST.md (~500 tokens)
2. Read relevant subdirectory MANIFEST.md (~500 tokens)
3. Only read actual files when editing them

**Token Efficiency**:
- This MANIFEST provides 80% of context needed
- Subdirectory MANIFESTs provide detailed file info
- Read actual code/notebooks only when making changes

Subdirectory MANIFEST Template

Use the same structure but focused on the specific directory. For subdirectories:

data/MANIFEST.md - Focus on:

Data provenance (where data came from)
File formats and structure (rows, columns, size)
Data dependencies (which files depend on which)
Processing history (original → processed versions)

figures/MANIFEST.md - Focus on:

Which code generates which figure
Figure purpose and key message
Manuscript figure numbering
Figure dependencies (data sources)

scripts/MANIFEST.md - Focus on:

Script purpose and I/O
Execution order and dependencies
Usage examples and parameters
Required dependencies

documentation/MANIFEST.md - Focus on:

Document organization by category
Critical entry points (RESUME_HERE.md)
Active vs archived status
Session summary locations

Linking MANIFESTs Across Directories

When implementing analysis_files/ (Iteration 2), create bidirectional links:

In figures/MANIFEST.md - Link to analysis files:

**01_figure_name.png** (318 KB)
- **Description**: Brief description
- **Analysis file**: `../analysis_files/figures/01_figure_name.md` - Detailed analysis

In analysis_files/MANIFEST.md - Link back to figures:

#### 01_figure_name.md
- **Figure file**: `figures/curation_impact_3cat/01_figure_name.png`
- **Purpose**: Detailed analysis and interpretation

In root MANIFEST.md - Reference both:

#### `analysis_files/` (~90 KB) **[NEW - ITERATION 2]**
- **Purpose**: Separate markdown files for figure analyses
- **Token efficiency**: ~98% reduction vs notebooks
- **Links to**: figures/ directory

This creates a navigable web of documentation.

MANIFEST Template File

Create MANIFEST_TEMPLATE.md in project root as a starting point. See the template in the Curation_Paper_figures project for a complete example with all sections and guidance.

Commands for MANIFEST Management

/generate-manifest Command

Create this command in .claude/commands/generate-manifest.md (or symlink from global commands):

Purpose : Automatically generates MANIFEST files by analyzing directory contents

Key Features :

Analyzes directory type (root, data, figures, scripts, documentation)
Extracts file information efficiently (sizes, dates, row counts)
Identifies dependencies by searching code for file references
Maps workflow relationships
Uses AskUserQuestion for ambiguous information
Marks fields requiring user input

Usage :

/generate-manifest              # Interactive mode
/generate-manifest data         # Generate for data/
/generate-manifest figures      # Generate for figures/

Implementation Tips :

Don't read entire large files - use targeted searches
Extract docstrings and header comments from scripts
Use grep to find file references in code
Check first/last cells of notebooks for descriptions
Get row counts with wc -l for CSV files
Target 1000-2000 tokens for root, 500-1000 for subdirectories

/update-manifest Command

Create this command in .claude/commands/update-manifest.md:

Purpose : Quickly updates existing MANIFESTs while preserving user content

Key Features :

Preserves user-entered descriptions and notes
Updates dates, sizes, and file existence
Captures session context (asks "What did you accomplish?")
Three modes: Minimal, Quick (default), Full
Provides update summary

Usage :

/update-manifest              # Update current directory
/update-manifest data         # Update data/MANIFEST.md
/update-manifest --quick      # Force quick mode
/update-manifest --full       # Full re-analysis

Session End Pattern :

/update-manifest              # Capture session progress
/update-skills               # Save new knowledge
/safe-exit                   # Clean exit with notes

MANIFEST Workflow

Initial Setup

Create template :

# Copy MANIFEST_TEMPLATE.md to project root
# Or use /generate-manifest to create from scratch

Generate root MANIFEST :

/generate-manifest
# Choose "root directory"
# Fill in user-specific fields

Generate subdirectory MANIFESTs :

/generate-manifest data
/generate-manifest figures
/generate-manifest scripts
/generate-manifest documentation

Customize MANIFESTs :
- Fill in [USER TO FILL] placeholders
- Add key findings summaries
- Document environment setup
- Add Obsidian notes paths

During Active Development

Start session - Read MANIFESTs for context:

cat MANIFEST.md              # Project overview
cat figures/MANIFEST.md      # If working on figures

Work on project - Normal development

End session - Update MANIFESTs:

/update-manifest              # Captures session progress

After Major Changes

When you:

Add new files or directories
Reorganize structure
Complete major analysis
Make significant changes

Run full regeneration:

/generate-manifest --update     # Full re-analysis

MANIFEST Best Practices

Content Guidelines

Be concise but informative : Target 500-2,000 tokens
Front-load important info : Put critical details first
Use bullet points : Not paragraphs
Include dates : Everything should have timestamps
Think "6 months from now" : What would you need to know?

What to Document

ALWAYS include :

File purpose and key message
Dependencies (inputs)
Outputs (what it generates)
Last modified date
Size for large files

USER FILL fields for :

Key findings (requires understanding)
Priority classification (main vs complementary)
Known issues and TODOs
Environment names
Obsidian note paths

Tip for filling user-specific fields :

Obsidian notes path : Check .claude/project-config file - it often contains the vault path in the obsidian_vault or similar field
Environment name : Check conda env list or look for environment.yml/requirements.txt
Key findings : Analyze generation scripts or read notebook markdown cells for summaries

Auto-generate from code :

File sizes and dates
Row/column counts for data
Dependencies (by searching code)
Script usage (from docstrings)

Documenting New Analysis Notebooks in MANIFEST

Template for Analysis Notebook Entries:

When adding a new analysis notebook to MANIFEST.md, include:

#### `Notebook_Name.ipynb` (file size) **[NEW]**
- **Purpose**: One-sentence objective of the analysis
- **Type**: Category (e.g., "Confounding analysis", "Data enrichment", "Primary analysis")
- **Rationale**: Why this analysis is needed (2-3 sentences explaining motivation)
- **Approach**:
  - Bullet points of analytical steps
  - Key methodological decisions
- **Key Questions**:
  - Question 1 the analysis addresses
  - Question 2 the analysis addresses
- **Depends on**:
  - data/input_file.csv (description)
  - scripts/processing_script.py
- **Generates**:
  - figures/output_dir/figure1.png (what it shows)
  - results/statistics.csv
- **Dataset**: N assemblies/samples, key statistics
- **Last modified**: YYYY-MM-DD
- **Status**: Current state (e.g., "Code optimized", "In progress", "Complete")
- **Execution time**: ~XX minutes
- **Priority**: Role in project (e.g., "Confounding analysis - validates main findings")
- **Note**: Important caveats or special considerations

Example: Technology/Temporal Confounding Analysis

#### `Technology_Temporal_Analysis.ipynb` (32 KB) **[NEW]**
- **Purpose**: Investigate whether sequencing technology (CLR vs HiFi) and temporal trends confound the curation method comparisons
- **Type**: Confounding analysis - technology and temporal effects
- **Rationale**: Sequencing technology evolved rapidly (CLR → HiFi), and assembly methods may correlate with technology era. Need to determine if observed quality differences are due to curation methods or underlying technology/temporal confounders.
- **Approach**:
  - Technology-separated analysis: Compare categories split by sequencing technology
  - Temporal trend analysis: Plot quality metrics over time (2019-2025)
  - HiFi-only temporal analysis: Eliminate technology confounding
- **Key Questions**:
  - Are quality differences consistent across technologies (HiFi vs CLR)?
  - Do quality metrics improve over time, and is this technology-driven?
  - Do temporal trends persist when technology is held constant?
- **Depends on**:
  - `data/vgp_assemblies_3categories_tech.csv` (3-category data with technology inference)
  - scipy for statistical tests (Mann-Whitney U, Spearman correlation)
- **Generates**:
  - `figures/technology_temporal/01_prialt_tech_comparison.png` (HiFi vs CLR)
  - `figures/technology_temporal/04_hifi_only_temporal_trends.png` (HiFi-only, 2021-2025)
  - `figures/technology_temporal/technology_effects_statistics.csv`
  - `figures/technology_temporal/temporal_trends_hifi_only_statistics.csv`
- **Dataset**: 541 VGP assemblies, 464/541 (86%) with technology assignment (355 HiFi, 107 CLR)
- **Last modified**: 2026-02-25
- **Status**: Code optimized (DPI reduced 300→150 to prevent image loading errors)
- **Execution time**: ~10-15 minutes
- **Priority**: Confounding analysis - validates that curation effects are not driven by technology or temporal biases
- **Note**: Figure sizes reduced (DPI 150) to prevent notebook image loading errors

Benefits of Comprehensive Documentation:

Resume work easily : Understand analysis purpose months later
Collaboration : Others can understand without reading code
Dependency tracking : Know what data/scripts are required
Output tracking : Know what files this notebook generates
Execution planning : Estimate time needed to re-run
Prioritization : Understand role in overall project

Update Frequency

End of every session : Quick update with /update-manifest
After adding files : Note new files, mark as [TO BE DOCUMENTED]
After major changes : Full regeneration with /generate-manifest
Before sharing : Ensure MANIFESTs are current

MANIFEST Session Context Updates

Structure for "Recent Session Work" Section:

When running /update-manifest, document the session in this format:

**Recent Session Work** (YYYY-MM-DD):
- **[Action taken]**:
  - Specific change 1 with details
  - Specific change 2 with quantitative results
  - Why this change was made
- Brief description of problem solved or feature added
- Any updates to directory structure or workflow

Example Session Documentation:

**Recent Session Work** (2026-02-25):
- **Updated Technology_Temporal_Analysis.ipynb code**:
  - Reduced DPI from 300→150 in global settings and all savefig calls
  - Reduced figure sizes: 01_prialt (15×10→12×8), 02_all_tech (18×12→14×9)
  - Prevents image loading errors while maintaining publication quality
  - File sizes reduced by ~75% (combination of DPI and size reduction)
- Added Technology_Temporal_Analysis.ipynb entry to root MANIFEST
- Updated directory structure to include figures/technology_temporal/
- Added HiFi-only temporal analysis section to eliminate technology confounding

Next Steps Format:

Prioritize and number action items:

**Next Steps**:
1. **[High priority action]** - [Why it's important]
2. **[Medium priority]** - [Context]
3. **[Future work]** - [When to tackle]

Example:

**Next Steps**:
1. **Re-run Technology_Temporal_Analysis.ipynb** - Execute cells to regenerate figures with optimized 150 DPI settings
2. **Generate notebook with temporal effect and only HiFi data** - Already added to notebook, need to execute new cells
3. **Write integrated manuscript Results section** - Combine findings from all 5 clades into cohesive narrative

This structure makes it easy to resume work by quickly understanding what was done and what's next.

Integration with Project Structure

Add MANIFESTs to standard project organization:

project/
├── MANIFEST.md                 # Root project index
├── MANIFEST_TEMPLATE.md        # Template for new MANIFESTs
├── data/
│   ├── MANIFEST.md            # Data inventory
│   └── [data files]
├── figures/
│   ├── MANIFEST.md            # Figure catalog
│   └── [figure files]
├── scripts/
│   ├── MANIFEST.md            # Script documentation
│   └── [script files]
├── documentation/
│   ├── MANIFEST.md            # Doc organization
│   └── [doc files]
└── [other directories with MANIFESTs as needed]

Common MANIFEST Patterns

For Data Directories

Emphasize:

Data provenance and source
File formats and structure
Original vs processed versions
Data dependencies and lineage
Size and scale information

For Figure Directories

Emphasize:

Generating code (which notebook/script)
Data sources
Manuscript figure numbers
Key messages and findings
Figure dependencies

For Script Directories

Emphasize:

Input/output relationships
Execution order
Usage examples
Dependencies (packages)
Script purpose and logic

For Documentation Directories

Emphasize:

Entry points (where to start)
Document categories
Active vs archived
Session summaries location
Critical documents for resuming work

Real-World Example

See the Curation_Paper_figures project for a complete implementation:

5 MANIFEST files (root + 4 subdirectories)
~10,000 lines of documentation
Covers 3 notebooks, 12 scripts, 18 figures, 48 doc files
Enables session startup in 2,000 tokens vs 15,000+ tokens
Includes working examples of all MANIFEST types

Benefits Summary

For Claude Code :

85-90% reduction in session startup tokens
Fast project orientation (2-3 MANIFESTs vs 20+ files)
Clear entry points and workflow understanding
Efficient file navigation without exploring

For Users :

Quick work resumption (read 1 MANIFEST vs 10+ files)
Clear project documentation
Session continuity (Notes for Resuming Work)
Workflow transparency (dependency maps)

For Teams :

Faster onboarding for new members
Shared understanding of project structure
Clear documentation of decisions
Easier code review (understand context quickly)

Troubleshooting

MANIFEST too long (>2,500 tokens):

Break into subdirectory MANIFESTs
Use "See subdirectory MANIFEST" links
Summarize instead of listing all files

MANIFEST outdated :

Set up session-end habit: /update-manifest before /safe-exit
Use /generate-manifest --update for full refresh
Add "Last Updated" reminders

Too many [USER TO FILL] fields :

Fill in during active work, not after
Use /update-manifest to capture context immediately
Ask user questions during generation for key info

Unclear what to include :

Think: "What would I need to resume work in 6 months?"
Include anything that saves reading a file
Front-load critical information

Integration with Other Skills

This skill works well with:

python-environment - Environment setup and management
claude-collaboration - Team workflow best practices
jupyter-notebook-analysis - Notebook organization standards
data-backup - Backup system should include MANIFESTs
project-sharing - Include MANIFESTs in shared packages

Templates and Tools

Quick Project Setup

# Create standard research project structure
mkdir -p data/{raw,processed,external} notebooks scripts src tests docs results config
touch README.md .gitignore environment.yml

Cookiecutter Templates

Consider using cookiecutter for standardized project templates:

cookiecutter-data-science - Data science projects
cookiecutter-research - Research projects
Custom team templates

References and Resources

Weekly Installs

Repository

delphine-l/claude_global

GitHub Stars

First Seen

Jan 24, 2026

Security Audits

Gen Agent Trust HubPass SocketPass SnykPass

Installed on

opencode57

codex56

gemini-cli54

github-copilot51

cursor49

kimi-cli47

React 组合模式指南：Vercel 组件架构最佳实践，提升代码可维护性

120,000 周安装

Convention: hyphens for file names, underscores for Python modules

Include version/date for important outputs

✅ report-2026-01-23.pdf or model-v2.pkl
❌ report-final-final-v3.pdf

Keep: _UPDATED.txt (current version)

Deprecate: others (superseded)

项目文件夹组织最佳实践指南：代码、数据、文档结构规范与命名规则

🇨🇳中文介绍

文件夹组织最佳实践

何时使用此技能

核心原则

标准项目结构

研究/分析项目

相关 Skills

开发项目

生物信息学/工作流项目

使用 Notebooks 的数据分析项目

文件命名规范

通用规则

编号序列

数据文件

目录管理最佳实践

版本控制内容

.gitignore 模板

会话笔记存储

重组后

数据组织

原始数据是神圣的

处理后的数据层次结构

文档标准

README.md 要点

代码文档

变更文档最佳实践

变更摘要模板

已更新的文档

要检查的文件

常见路径模式

提示

重组后的验证和清理

1. 验证文件计数

2. 检查根目录

3. 移除临时文件

4. 显示最终结构

验证清单

清理和维护

定期维护任务

项目结束清单

项目清理：识别必要文件

1. 分析 Notebooks 以查找使用的图表

2. 将图表映射到生成脚本

3. 组织弃用文件

4. 记录保留内容

5. 验证清单

此方法的优点

识别要弃用的文件

文档组织策略

结构

实现

文档 README 模板

根目录保留内容

优点

常见错误

用于令牌高效导航的 MANIFEST 系统

问题陈述

令牌效率影响

什么是 MANIFEST？

MANIFEST 结构

根目录 MANIFEST 模板

子目录 MANIFEST 模板

跨目录链接 MANIFEST

MANIFEST 模板文件

MANIFEST 管理命令

/generate-manifest 命令

🇺🇸English

Folder Organization Best Practices

When to Use This Skill

Core Principles

Standard Project Structure

Research/Analysis Projects

Development Projects

Bioinformatics/Workflow Projects

Data Analysis Projects with Notebooks

File Naming Conventions

General Rules

Numbered Sequences

Data Files