lamindb by davila7/claude-code-templates
npx skills add https://github.com/davila7/claude-code-templates --skill lamindb
LaminDB is an open-source data framework for biology designed to make data queryable, traceable, reproducible, and FAIR (Findable, Accessible, Interoperable, Reusable). It provides a unified platform that combines lakehouse architecture, lineage tracking, feature stores, biological ontologies, LIMS (Laboratory Information Management System), and ELN (Electronic Lab Notebook) capabilities through a single Python API.
Core Value Proposition:
Use this skill when:
LaminDB provides six interconnected capability areas, each documented in detail in the references folder.
Core entities:
Key workflows:
- ln.track() and ln.finish() to track notebook/script runs
- artifact.view_lineage() to visualize the data lineage graph

Reference: references/core-concepts.md - Read this for detailed information on artifacts, records, runs, transforms, features, versioning, and lineage tracking.
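To picture what view_lineage() traverses: artifacts and runs form a directed graph of inputs and outputs. The following is a toy pure-Python model of that idea (hypothetical node names, not LaminDB's internal representation):

```python
# toy lineage graph: each edge points from an input to the run that consumed it,
# or from a run to the artifact it produced
EDGES = [
    ("raw.h5ad", "qc_run"),
    ("qc_run", "clean.h5ad"),
    ("clean.h5ad", "train_run"),
    ("train_run", "model.pkl"),
]

def upstream(node):
    # walk predecessor links back to the original input
    preds = {dst: src for src, dst in EDGES}
    chain = []
    while node in preds:
        node = preds[node]
        chain.append(node)
    return chain

print(upstream("model.pkl"))  # ['train_run', 'clean.h5ad', 'qc_run', 'raw.h5ad']
```

LaminDB records these edges automatically whenever artifacts are saved inside a tracked run, which is why every analysis should begin with ln.track().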
Query capabilities:
- get(), one(), one_or_none() for single-record retrieval
- Django-style field lookups (__gt, __lte, __contains, __startswith)

Key workflows:
Reference: references/data-management.md - Read this for comprehensive query patterns, filtering examples, streaming strategies, and data organization best practices.
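The Django-style lookup suffixes follow simple, predictable semantics. This pure-Python miniature (an illustration of the semantics, not LaminDB's implementation) shows how a lookup such as key__startswith is interpreted:

```python
def matches(value, lookup, target):
    # interpret a Django-style lookup suffix against a single field value
    ops = {
        "exact": lambda v, t: v == t,
        "gt": lambda v, t: v > t,
        "lte": lambda v, t: v <= t,
        "contains": lambda v, t: t in v,
        "startswith": lambda v, t: v.startswith(t),
    }
    return ops[lookup](value, target)

def filter_records(records, **lookups):
    # records: list of dicts; lookups: e.g. key__startswith="scrna/"
    out = []
    for rec in records:
        ok = True
        for expr, target in lookups.items():
            field, _, lookup = expr.partition("__")
            ok = ok and matches(rec[field], lookup or "exact", target)
        if ok:
            out.append(rec)
    return out

records = [
    {"key": "scrna/batch_0.h5ad", "size": 10},
    {"key": "bulk/a.csv", "size": 99},
]
print(filter_records(records, key__startswith="scrna/"))
```

In LaminDB the same suffixes are passed as keyword arguments to filter(), e.g. ln.Artifact.filter(key__startswith="scrna/").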
Curation process:
Schema types:
Supported data types:
Key workflows:
- DataFrameCurator or AnnDataCurator for validation
- .cat.standardize() to standardize values
- .cat.add_ontology() to map values to ontologies

Reference: references/annotation-validation.md - Read this for detailed curation workflows, schema design patterns, handling validation errors, and best practices.
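Conceptually, the standardize step maps synonyms onto canonical terms while leaving unknown labels untouched. A pure-Python sketch with a hypothetical synonym table (Bionty derives the real mapping from public ontologies):

```python
# hypothetical synonym table; in practice Bionty builds this from ontology records
SYNONYMS = {
    "T cell": "T cell",
    "T-cell": "T cell",
    "t cells": "T cell",
    "B cell": "B cell",
}

def standardize(values):
    # map each observed label to its canonical term, keeping unknowns as-is
    return [SYNONYMS.get(v, v) for v in values]

print(standardize(["T-cell", "B cell", "NK cell"]))
```

Labels that survive standardization unmapped (here, "NK cell") are exactly what validation surfaces for manual review or ontology import.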
Available ontologies (via Bionty):
Key workflows:
- bt.CellType.import_source() to import public ontologies

Reference: references/ontologies.md - Read this for comprehensive ontology operations, standardization strategies, hierarchy navigation, and annotation workflows.
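Hierarchy navigation amounts to walking parent links in the ontology graph. A toy sketch with hypothetical terms (an illustration of the concept, not Bionty's API):

```python
# hypothetical parent links; Bionty loads real ones from ontologies such as Cell Ontology
PARENTS = {
    "CD4-positive T cell": "T cell",
    "T cell": "lymphocyte",
    "lymphocyte": "leukocyte",
}

def ancestors(term):
    # collect all ancestors by following parent links up to the root
    chain = []
    while term in PARENTS:
        term = PARENTS[term]
        chain.append(term)
    return chain

print(ancestors("CD4-positive T cell"))  # ['T cell', 'lymphocyte', 'leukocyte']
```

This is why annotating with a specific term (e.g. a T-cell subtype) still lets you query at coarser levels of the hierarchy.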
Workflow managers:
MLOps platforms:
Storage systems:
Array stores:
Visualization:
Version control:
Reference: references/integrations.md - Read this for integration patterns, code examples, and troubleshooting for third-party systems.
Installation:
- uv pip install lamindb
- uv pip install 'lamindb[gcp,zarr,fcs]' for optional extras

Instance types:
Storage options:
Configuration:
Deployment patterns:
Reference: references/setup-deployment.md - Read this for detailed installation, configuration, storage setup, database management, security best practices, and troubleshooting.
import lamindb as ln
import bionty as bt
import anndata as ad
# Start tracking
ln.track(params={"analysis": "scRNA-seq QC and annotation"})
# Import cell type ontology
bt.CellType.import_source()
# Load data
adata = ad.read_h5ad("raw_counts.h5ad")
# Validate and standardize cell types
adata.obs["cell_type"] = bt.CellType.standardize(adata.obs["cell_type"])
# Curate with a schema (assumes `schema` is a previously defined ln.Schema)
curator = ln.curators.AnnDataCurator(adata, schema)
curator.validate()
artifact = curator.save_artifact(key="scrna/validated.h5ad")
# Link ontology annotations
cell_types = bt.CellType.from_values(adata.obs.cell_type)
artifact.feature_sets.add_ontology(cell_types)
ln.finish()
import lamindb as ln
import anndata as ad
# Register multiple experiments (data_files, tissues, conditions defined upstream)
for i, file in enumerate(data_files):
    artifact = ln.Artifact.from_anndata(
        ad.read_h5ad(file),
        key=f"scrna/batch_{i}.h5ad",
        description=f"scRNA-seq batch {i}"
    ).save()
    # Annotate with features
    artifact.features.add_values({
        "batch": i,
        "tissue": tissues[i],
        "condition": conditions[i]
    })
# Query across all experiments
# (call .to_dataframe() on the queryset to inspect the metadata as a table)
immune_datasets = ln.Artifact.filter(
    key__startswith="scrna/",
    tissue="PBMC",
    condition="treated"
)
# Load specific datasets
for artifact in immune_datasets:
    adata = artifact.load()
    # Analyze
import lamindb as ln
import wandb
# Initialize both systems
wandb.init(project="drug-response", name="exp-42")
ln.track(params={"model": "random_forest", "n_estimators": 100})
# Load training data from LaminDB
train_artifact = ln.Artifact.get(key="datasets/train.parquet")
train_data = train_artifact.load()
# Train model (train_model is your own training routine)
model = train_model(train_data)
# Log to W&B
wandb.log({"accuracy": 0.95})
# Save model in LaminDB with W&B linkage
import joblib
joblib.dump(model, "model.pkl")
model_artifact = ln.Artifact("model.pkl", key="models/exp-42.pkl").save()
model_artifact.features.add_values({"wandb_run_id": wandb.run.id})
ln.finish()
wandb.finish()
# In Nextflow process script
import lamindb as ln
ln.track()
# Load input artifact
input_artifact = ln.Artifact.get(key="raw/batch_${batch_id}.fastq.gz")
input_path = input_artifact.cache()
# Process (alignment, quantification, etc.)
# ... Nextflow process logic ...
# Save output
output_artifact = ln.Artifact(
"counts.csv",
key="processed/batch_${batch_id}_counts.csv"
).save()
ln.finish()
To start using LaminDB effectively:
Installation & Setup (references/setup-deployment.md)
- lamin login to authenticate
- lamin init --storage ... to initialize an instance

Learn Core Concepts (references/core-concepts.md)
- Use ln.track() and ln.finish() in workflows

Master Querying (references/data-management.md)
Set Up Validation (references/annotation-validation.md)
Integrate Ontologies (references/ontologies.md)
Connect Tools (references/integrations.md)
Follow these principles when working with LaminDB:
Track everything: Use ln.track() at the start of every analysis for automatic lineage capture
Validate early: Define schemas and validate data before extensive analysis
Use ontologies: Leverage public biological ontologies for standardized annotations
Organize with keys: Structure artifact keys hierarchically (e.g., project/experiment/batch/file.h5ad)
Query metadata first: Filter and search before loading large files
Version, don't duplicate: Use built-in versioning instead of creating new keys for modifications
Annotate with features: Define typed features for queryable metadata
Document thoroughly: Add descriptions to artifacts, schemas, and transforms
Leverage lineage: Use view_lineage() to understand data provenance
Start local, scale cloud: Develop locally with SQLite, deploy to cloud with PostgreSQL
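The "organize with keys" principle can be made concrete with a small helper (a hypothetical convenience function, not part of LaminDB):

```python
from pathlib import PurePosixPath

def make_key(project: str, experiment: str, batch: int, filename: str) -> str:
    # hierarchical artifact key: project/experiment/batch_<n>/<file>
    return str(PurePosixPath(project) / experiment / f"batch_{batch}" / filename)

print(make_key("immuno", "scrna", 3, "counts.h5ad"))
# immuno/scrna/batch_3/counts.h5ad
```

Keys built this way pair naturally with prefix queries such as key__startswith="immuno/scrna/", supporting the "query metadata first" principle.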
This skill includes comprehensive reference documentation organized by capability:
references/core-concepts.md - Artifacts, records, runs, transforms, features, versioning, lineage
references/data-management.md - Querying, filtering, searching, streaming, organizing data
references/annotation-validation.md - Schema design, curation workflows, validation strategies
references/ontologies.md - Biological ontology management, standardization, hierarchies
references/integrations.md - Workflow managers, MLOps platforms, storage systems, tools
references/setup-deployment.md - Installation, configuration, deployment, troubleshooting

Read the relevant reference file(s) based on the specific LaminDB capability needed for the task at hand.
Weekly Installs
122
Repository
GitHub Stars
22.6K
First Seen
Jan 21, 2026
Security Audits
Gen Agent Trust Hub: Pass · Socket: Pass · Snyk: Warn
Installed on
claude-code: 104
opencode: 98
gemini-cli: 93
cursor: 93
antigravity: 84
codex: 82