Instrument Data to Allotrope ASM Format Tool - Automated Data Conversion and Standardization Solution

instrument-data-to-allotrope by anthropics/knowledge-work-plugins

173 weekly installs

10,900 GitHub Stars


Install command

npx skills add https://github.com/anthropics/knowledge-work-plugins --skill instrument-data-to-allotrope

Automation, research tools, data processing




Instrument Data to Allotrope Converter

Convert instrument files into standardized Allotrope Simple Model (ASM) format for LIMS upload, data lakes, or handoff to data engineering teams.

Note: This is an Example Skill

This skill demonstrates how skills can support your data engineering tasks—automating schema transformations, parsing instrument outputs, and generating production-ready code.

To customize for your organization:

Modify the references/ files to include your company's specific schemas or ontology mappings

Use an MCP server to connect to systems that define your schemas (e.g., your LIMS, data catalog, or schema registry)

Extend the scripts/ to handle proprietary instrument formats or internal data standards

This pattern can be adapted for any data transformation workflow where you need to convert between formats or validate against organizational standards.

Workflow Overview

1. Detect instrument type from file contents (auto-detect or user-specified)
2. Parse file using allotropy library (native) or flexible fallback parser
3. Generate outputs:
   - ASM JSON (full semantic structure)
   - Flattened CSV (2D tabular format)
   - Python parser code (for data engineer handoff)
4. Deliver files with summary and usage instructions
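The detection step can be pictured as a keyword-signature lookup with a confidence score. This is a minimal sketch, not the logic in scripts/convert_to_asm.py. The `SIGNATURES` table, its keywords, and the scoring rule are hypothetical. Checking the allotropy Vendor list remains the preferred route.

```python
from pathlib import Path
from typing import Optional, Tuple

# Hypothetical keyword signatures (vendor enum name -> lowercase strings expected in the export).
# Illustrative only; these are not verified fingerprints of real instrument files.
SIGNATURES = {
    "BECKMAN_VI_CELL_BLU": ["viability", "total cells/ml", "cell type"],
    "THERMO_FISHER_NANODROP_EIGHT": ["a260", "a280", "260/280"],
    "MOLDEV_SOFTMAX_PRO": ["##blocks", "plate:"],
}


def detect_instrument(text: str) -> Tuple[Optional[str], float]:
    """Return (vendor, confidence); confidence is the fraction of a vendor's keywords found."""
    haystack = text.lower()
    best_vendor, best_score = None, 0.0
    for vendor, keywords in SIGNATURES.items():
        score = sum(kw in haystack for kw in keywords) / len(keywords)
        if score > best_score:
            best_vendor, best_score = vendor, score
    return best_vendor, best_score


def detect_file(path: str, user_vendor: Optional[str] = None) -> Tuple[Optional[str], float]:
    """A vendor named by the user always overrides auto-detection."""
    if user_vendor:
        return user_vendor, 1.0
    return detect_instrument(Path(path).read_text(errors="ignore"))
```

Binary exports such as .xlsx would first need to be read with a spreadsheet library before their text can be scored this way.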

When Uncertain: If you're unsure how to map a field to ASM (e.g., is this raw data or calculated? device setting or environmental condition?), ask the user for clarification. Refer to references/field_classification_guide.md for guidance, but when ambiguity remains, confirm with the user rather than guessing.

Quick Start

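# Install requirements first (this line is a shell command, not Python)
pip install allotropy pandas openpyxl pdfplumber --break-system-packages

# Core conversion (Python)
import json

from allotropy.parser_factory import Vendor
from allotropy.to_allotrope import allotrope_from_file

# Convert with allotropy; the result is the ASM document as a Python dict
asm = allotrope_from_file("instrument_data.csv", Vendor.BECKMAN_VI_CELL_BLU)

# Save the ASM JSON output
with open("instrument_data_asm.json", "w", encoding="utf-8") as f:
    json.dump(asm, f, indent=2, ensure_ascii=False)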

Output Format Selection

ASM JSON (default) - Full semantic structure with ontology URIs

Best for: LIMS systems expecting ASM, data lakes, long-term archival
Validates against Allotrope schemas

Flattened CSV - 2D tabular representation

Best for: Quick analysis, Excel users, systems without JSON support
Each measurement becomes one row with metadata repeated

Both - Generate both formats for maximum flexibility
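One way to get "one row per measurement with metadata repeated" is a recursive flatten. This is a minimal sketch, not the code in scripts/flatten_asm.py. Two choices here are assumptions of the sketch: dotted-path column names, and fanning out every list of objects into rows.

```python
from typing import Any, Dict, List


def flatten(node: Dict[str, Any], prefix: str = "") -> List[Dict[str, Any]]:
    """Flatten nested ASM-style JSON into 2D rows.

    Nested objects become dotted column names; each list of objects fans out
    into one row per element, with the parent's fields repeated on every row.
    """
    rows: List[Dict[str, Any]] = [{}]
    for key, value in node.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            sub_rows = flatten(value, prefix=f"{name}.")
        elif isinstance(value, list) and value and all(isinstance(v, dict) for v in value):
            # One group of rows per list element, all sharing the same column names
            sub_rows = [r for item in value for r in flatten(item, prefix=f"{name}.")]
        else:
            sub_rows = [{name: value}]  # scalar (or list of scalars) kept as one cell
        # Combine every row built so far with every row produced for this key
        rows = [{**row, **sub} for row in rows for sub in sub_rows]
    return rows
```

`pandas.DataFrame(flatten(asm)).to_csv("output_flat.csv", index=False)` would then write the table. Two sibling lists in the same object, such as measurement and calculated data documents, multiply into a cross product. A production flattener would more likely emit those as separate tables. references/flattening_guide.md describes what the skill actually does.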

Calculated Data Handling

IMPORTANT: Separate raw measurements from calculated/derived values.

Raw data → measurement-document (direct instrument readings)
Calculated data → calculated-data-aggregate-document (derived values)

Calculated values MUST include traceability via data-source-aggregate-document:

{
  "calculated data aggregate document": {
    "calculated data document": [{
      "calculated data identifier": "SAMPLE_B1_DIN_001",
      "calculated data name": "DNA integrity number",
      "calculated result": {"value": 9.5, "unit": "(unitless)"},
      "data source aggregate document": {
        "data source document": [{
          "data source identifier": "SAMPLE_B1_MEASUREMENT",
          "data source feature": "electrophoresis trace"
        }]
      }
    }]
  }
}
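A small helper can build a calculated data document together with its traceability link. This is a hypothetical sketch: the function names are not part of the skill's scripts or of allotropy. It uses space-separated keys, in line with the field naming rule listed under Validation.

```python
from typing import Dict, List


def calculated_data_document(identifier: str, name: str, value: float, unit: str,
                             source_id: str, source_feature: str) -> Dict:
    """Build one calculated data document that points back to the raw measurement it came from."""
    return {
        "calculated data identifier": identifier,
        "calculated data name": name,
        "calculated result": {"value": value, "unit": unit},
        "data source aggregate document": {
            "data source document": [
                {"data source identifier": source_id, "data source feature": source_feature}
            ]
        },
    }


def calculated_data_aggregate(docs: List[Dict]) -> Dict:
    """Wrap calculated data documents in their aggregate document."""
    return {"calculated data aggregate document": {"calculated data document": docs}}
```

For example, `calculated_data_document("SAMPLE_B1_DIN_001", "DNA integrity number", 9.5, "(unitless)", "SAMPLE_B1_MEASUREMENT", "electrophoresis trace")` reproduces the inner document of the JSON example.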

Common calculated fields by instrument type:

Instrument	Calculated Fields
Cell counter	Viability %, dilution-adjusted cell density values
Spectrophotometer	Concentration (from absorbance), 260/280 ratio
Plate reader	Concentrations from standard curve, %CV
Electrophoresis	DIN/RIN, region concentrations, average sizes
qPCR	Relative quantities, fold change

See references/field_classification_guide.md for detailed guidance on raw vs. calculated classification.
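Separating raw from calculated values can start from a per-instrument lookup. Anything the lookup does not recognize is returned for the user to classify rather than guessed. The field sets below are hypothetical and cover only the cell counter row of the table.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical field sets for a cell counter, loosely based on the table above.
CALCULATED_FIELDS = {"viability", "dilution-adjusted cell density"}
RAW_FIELDS = {"total cell count", "viable cell count"}


def classify_fields(record: Dict[str, Any]) -> Tuple[Dict[str, Any], Dict[str, Any], List[str]]:
    """Split a parsed record into raw and calculated values plus unknown field names."""
    raw, calculated, unknown = {}, {}, []
    for field, value in record.items():
        key = field.strip().lower()
        if key in CALCULATED_FIELDS:
            calculated[field] = value
        elif key in RAW_FIELDS:
            raw[field] = value
        else:
            unknown.append(field)  # ambiguous: confirm with the user instead of guessing
    return raw, calculated, unknown
```

Viability (viable cells divided by total cells, times 100) is derived from counts. It therefore lands in the calculated bucket, and its data source would be the count measurement.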

Validation

Always validate ASM output before delivering to the user:

python scripts/validate_asm.py output.json
python scripts/validate_asm.py output.json --reference known_good.json  # Compare to reference
python scripts/validate_asm.py output.json --strict  # Treat warnings as errors

Validation Rules:

Based on Allotrope ASM specification (December 2024)
Last updated: 2026-01-07
Source: https://gitlab.com/allotrope-public/asm

Soft Validation Approach: Unknown techniques, units, or sample roles generate warnings (not errors) to allow for forward compatibility. If Allotrope adds new values after December 2024, the validator won't block them—it will flag them for manual verification. Use --strict mode to treat warnings as errors if you need stricter validation.
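The soft-versus-strict behavior can be sketched as follows: unknown values are collected as warnings, and strict mode promotes them to errors. This is not the code in scripts/validate_asm.py. The known-value sets are small illustrative subsets, and the authoritative lists come from the ASM specification.

```python
from typing import Iterable, List, Tuple

# Illustrative subsets only; the authoritative lists come from the ASM specification.
KNOWN_UNITS = {"nm", "mAU", "%", "(unitless)"}
KNOWN_SAMPLE_ROLES = {"sample role", "standard sample role", "blank role", "control sample role"}


def soft_validate(units: Iterable[str], roles: Iterable[str],
                  strict: bool = False) -> Tuple[List[str], List[str]]:
    """Return (errors, warnings): unknown values are warnings unless strict=True."""
    issues = [f"unknown unit {u!r}" for u in units if u not in KNOWN_UNITS]
    issues += [f"unknown sample role {r!r}" for r in roles if r not in KNOWN_SAMPLE_ROLES]
    # In strict mode every warning is promoted to an error
    return (issues, []) if strict else ([], issues)
```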

What it checks:

Correct technique selection (e.g., multi-analyte profiling vs plate reader)
Field naming conventions (space-separated, not hyphenated)
Calculated data has traceability (data-source-aggregate-document)
Unique identifiers exist for measurements and calculated values
Required metadata present
Valid units and sample roles (with soft validation for unknown values)
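Three of these checks can be sketched as one walk over the document: naming, identifier uniqueness, and traceability. This is a simplified illustration, not the skill's validator. Treating any hyphen in a key as a naming error, and skipping keys that start with "$", are heuristics assumed here.

```python
from typing import Any, Iterator, List, Tuple

# Document key -> identifier key it must carry (assumed ASM key names)
ID_KEYS = {"measurement document": "measurement identifier",
           "calculated data document": "calculated data identifier"}


def iter_items(node: Any, path: str = "") -> Iterator[Tuple[str, str, Any]]:
    """Yield (path, key, value) for every key in nested dicts and lists."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield path, key, value
            yield from iter_items(value, f"{path}/{key}")
    elif isinstance(node, list):
        for i, item in enumerate(node):
            yield from iter_items(item, f"{path}[{i}]")


def check_structure(asm: dict) -> List[str]:
    problems: List[str] = []
    seen: set = set()
    for path, key, value in iter_items(asm):
        # Naming: keys should be space-separated; "$asm..." style keys are skipped
        if "-" in key and not key.startswith("$"):
            problems.append(f"{path}: hyphenated key {key!r}")
        # Measurement and calculated-value identifiers must be unique
        if key in ID_KEYS.values():
            if value in seen:
                problems.append(f"{path}: duplicate identifier {value!r}")
            seen.add(value)
        # Every document needs its identifier; calculated ones also need traceability
        if key in ID_KEYS:
            for i, doc in enumerate(value):
                where = f"{path}/{key}[{i}]"
                if ID_KEYS[key] not in doc:
                    problems.append(f"{where}: missing {ID_KEYS[key]!r}")
                if key == "calculated data document" and "data source aggregate document" not in doc:
                    problems.append(f"{where}: no data source aggregate document")
    return problems
```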

Supported Instruments

See references/supported_instruments.md for complete list. Key instruments:

Category	Instruments
Cell Counting	Vi-CELL BLU, Vi-CELL XR, NucleoCounter
Spectrophotometry	NanoDrop One/Eight/8000, Lunatic
Plate Readers	SoftMax Pro, EnVision, Gen5, CLARIOstar
ELISA	SoftMax Pro, BMG MARS, MSD Workbench
qPCR	QuantStudio, Bio-Rad CFX
Chromatography	Empower, Chromeleon

Detection & Parsing Strategy

Tier 1: Native allotropy parsing (PREFERRED)

Always try allotropy first. Check available vendors directly:

from allotropy.parser_factory import Vendor

# List all supported vendors
for v in Vendor:
    print(v.name)

# Common vendors:
# AGILENT_TAPESTATION_ANALYSIS  (for TapeStation XML)
# BECKMAN_VI_CELL_BLU
# THERMO_FISHER_NANODROP_EIGHT
# MOLDEV_SOFTMAX_PRO
# APPBIO_QUANTSTUDIO
# ... many more

When the user provides a file, check if allotropy supports it before falling back to manual parsing. The scripts/convert_to_asm.py auto-detection only covers a subset of allotropy vendors.
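Rather than scanning the full printout, a keyword search over the enum member names narrows the candidates. This is a hypothetical helper. It works on any Python Enum, so it assumes only that allotropy's Vendor iterates like one, which the listing loop above already relies on.

```python
from enum import Enum
from typing import List, Type


def find_vendors(keyword: str, vendor_enum: Type[Enum]) -> List[str]:
    """Return enum member names containing the keyword (case-insensitive; '-' and ' ' match '_')."""
    needle = keyword.upper().replace("-", "_").replace(" ", "_")
    return [member.name for member in vendor_enum if needle in member.name]
```

Usage with allotropy installed: `from allotropy.parser_factory import Vendor`, then `find_vendors("nanodrop", Vendor)`. An empty result is a signal to move on to Tier 2.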

Tier 2: Flexible fallback parsing

Only use if allotropy doesn't support the instrument. This fallback:

Does NOT generate calculated-data-aggregate-document
Does NOT include full traceability
Produces simplified ASM structure

Use flexible parser with:

Column name fuzzy matching
Unit extraction from headers
Metadata extraction from file structure
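The first two fallback techniques can be sketched with the standard library: fuzzy column matching via `difflib.get_close_matches`, and unit extraction from headers with a regular expression. The canonical field names, the 0.6 cutoff, and the "Name (unit)" / "Name [unit]" header convention are assumptions of this sketch.

```python
import difflib
import re
from typing import Dict, Iterable, List, Optional, Tuple

# Hypothetical target field names for the fallback parser.
CANONICAL = ["sample identifier", "well position", "absorbance", "concentration"]

# "Name (unit)" or "Name [unit]" at the end of a header
UNIT_PATTERN = re.compile(r"^\s*(?P<name>.*?)\s*[(\[](?P<unit>[^)\]]+)[)\]]\s*$")


def split_header(header: str) -> Tuple[str, Optional[str]]:
    """'Concentration (ng/uL)' -> ('Concentration', 'ng/uL'); no unit -> (header, None)."""
    match = UNIT_PATTERN.match(header)
    if match:
        return match.group("name"), match.group("unit").strip()
    return header.strip(), None


def map_columns(headers: Iterable[str], canonical: List[str] = CANONICAL,
                cutoff: float = 0.6) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Map each raw header to (closest canonical name or None, unit or None)."""
    mapping = {}
    for header in headers:
        name, unit = split_header(header)
        normalized = re.sub(r"[_\-.]+", " ", name).strip().lower()
        close = difflib.get_close_matches(normalized, canonical, n=1, cutoff=cutoff)
        mapping[header] = (close[0] if close else None, unit)
    return mapping
```

A header like "Sample (A)" would be misread as having unit "A". Unmatched or suspicious columns are good candidates for the "ask the user" rule.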

Tier 3: PDF extraction

For PDF-only files, extract tables using pdfplumber, then apply Tier 2 parsing.
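A minimal sketch of the PDF step follows. It collects `page.extract_tables()` output from every page and turns each table into records. It assumes the first row of each extracted table is its header, which will not hold for tables that continue across pages without one. The helper names are hypothetical.

```python
from typing import Dict, List, Optional


def tables_to_records(tables: List[List[List[Optional[str]]]]) -> List[Dict[str, str]]:
    """Treat the first row of each table as headers and turn the remaining rows into dicts."""
    records = []
    for table in tables:
        if not table:
            continue
        headers = [(h or "").strip() for h in table[0]]
        for row in table[1:]:
            cells = [(c or "").strip() for c in row]  # pdfplumber may return None for empty cells
            if any(cells):  # skip blank rows
                records.append(dict(zip(headers, cells)))
    return records


def extract_pdf_records(path: str) -> List[Dict[str, str]]:
    import pdfplumber  # imported lazily so tables_to_records works without it

    tables = []
    with pdfplumber.open(path) as pdf:
        for page in pdf.pages:
            tables.extend(page.extract_tables())
    return tables_to_records(tables)
```

The resulting records can then go through the same column matching and unit extraction as Tier 2.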

Pre-Parsing Checklist

Before writing a custom parser, ALWAYS:

1. Check if allotropy supports it - use the native parser if available
2. Find a reference ASM file - check references/examples/ or ask the user
3. Review the instrument-specific guide - check references/instrument_guides/
4. Validate against the reference - run validate_asm.py --reference <file>

Common Mistakes to Avoid

Mistake	Correct Approach
Manifest as object	Use URL string
Lowercase detection types	Use "Absorbance" not "absorbance"
"emission wavelength setting"	Use "detector wavelength setting" for emission
All measurements in one document	Group by well/sample location
Missing procedure metadata	Extract ALL device settings per measurement
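The "group by well/sample location" fix amounts to bucketing flat readings by position before building documents. This is a hypothetical sketch. The input keys ("well", "wavelength", "value") and output keys ("well", "readings") are placeholders, not ASM field names. Mapping them onto the technique's real schema is a separate step.

```python
from collections import defaultdict
from typing import Any, Dict, List


def group_by_well(readings: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Turn flat readings into one document per well, keeping first-seen well order."""
    by_well: Dict[str, List[Dict[str, Any]]] = defaultdict(list)
    for reading in readings:
        by_well[reading["well"]].append(reading)
    return [
        {"well": well, "readings": [{k: v for k, v in r.items() if k != "well"} for r in items]}
        for well, items in by_well.items()
    ]
```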

Code Export for Data Engineers

Generate standalone Python scripts that scientists can hand off:

# Export parser code
python scripts/export_parser.py --input "data.csv" --vendor "VI_CELL_BLU" --output "parser_script.py"

The exported script:

Has no external dependencies beyond pandas/allotropy
Includes inline documentation
Can run in Jupyter notebooks
Is production-ready for data pipelines

File Structure

instrument-data-to-allotrope/
├── SKILL.md                          # This file
├── scripts/
│   ├── convert_to_asm.py            # Main conversion script
│   ├── flatten_asm.py               # ASM → 2D CSV conversion
│   ├── export_parser.py             # Generate standalone parser code
│   └── validate_asm.py              # Validate ASM output quality
└── references/
    ├── supported_instruments.md     # Full instrument list with Vendor enums
    ├── asm_schema_overview.md       # ASM structure reference
    ├── field_classification_guide.md # Where to put different field types
    └── flattening_guide.md          # How flattening works

Usage Examples

Example 1: Vi-CELL BLU file

User: "Convert this cell counting data to Allotrope format"
[uploads viCell_Results.xlsx]

Claude:
1. Detects Vi-CELL BLU (95% confidence)
2. Converts using allotropy native parser
3. Outputs:
   - viCell_Results_asm.json (full ASM)
   - viCell_Results_flat.csv (2D format)
   - viCell_parser.py (exportable code)

Example 2: Request for code handoff

User: "I need to give our data engineer code to parse NanoDrop files"

Claude:
1. Generates self-contained Python script
2. Includes sample input/output
3. Documents all assumptions
4. Provides Jupyter notebook version

Example 3: LIMS-ready flattened output

User: "Convert this ELISA data to a CSV I can upload to our LIMS"

Claude:
1. Parses plate reader data
2. Generates flattened CSV with columns:
   - sample_identifier, well_position, measurement_value, measurement_unit
   - instrument_serial_number, analysis_datetime, assay_type
3. Validates against common LIMS import requirements
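Writing the Example 3 columns in a fixed order could look like this. The column list comes from the example. Which fields count as required is a hypothetical stand-in for a real LIMS import specification.

```python
import csv
import io
from typing import Any, Dict, Iterable, List

LIMS_COLUMNS = [
    "sample_identifier", "well_position", "measurement_value", "measurement_unit",
    "instrument_serial_number", "analysis_datetime", "assay_type",
]


def to_lims_csv(rows: List[Dict[str, Any]],
                required: Iterable[str] = ("sample_identifier", "measurement_value")) -> str:
    """Return CSV text with a fixed column order; raise if a required field is empty."""
    required = list(required)
    for i, row in enumerate(rows):
        missing = [c for c in required if row.get(c) in (None, "")]
        if missing:
            raise ValueError(f"row {i}: missing required fields {missing}")
    buf = io.StringIO()
    # Absent columns are written as empty cells; unexpected extra keys are dropped
    writer = csv.DictWriter(buf, fieldnames=LIMS_COLUMNS, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```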

Implementation Notes

Installing allotropy

pip install allotropy --break-system-packages

Handling parse failures

If allotropy native parsing fails:

Log the error for debugging
Fall back to the flexible parser
Report reduced metadata completeness to the user
Suggest exporting a different format from the instrument
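The failure-handling steps can be wrapped in one function that returns a flag, so the caller knows to warn the user. This is a sketch with hypothetical names. The broad `except Exception` is deliberate here and could be narrowed if you know which exception types your parser raises.

```python
import logging
from typing import Callable, Dict, Tuple

logger = logging.getLogger("asm_conversion")


def convert_with_fallback(path: str,
                          native: Callable[[str], Dict],
                          fallback: Callable[[str], Dict]) -> Tuple[Dict, bool]:
    """Try the native parser first; on failure log the traceback and use the fallback.

    Returns (asm, used_fallback) so the caller can report reduced metadata completeness.
    """
    try:
        return native(path), False
    except Exception:
        logger.exception("Native parsing failed for %s; using fallback parser", path)
        return fallback(path), True
```

With allotropy, `native` could be `lambda p: allotrope_from_file(p, Vendor.BECKMAN_VI_CELL_BLU)`. When `used_fallback` is true, tell the user that metadata is incomplete and suggest a different instrument export.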

ASM Schema Validation

Validate output against Allotrope schemas when available:

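import json
import jsonschema  # Schema URLs are listed in references/asm_schema_overview.md
schema = json.load(open("schema.json"))  # a locally downloaded ASM schema (assumed filename)
jsonschema.validate(instance=json.load(open("output.json")), schema=schema)  # raises ValidationError on failure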


First Seen: Jan 31, 2026

Security Audits: Gen Agent Trust Hub (Pass), Socket (Pass), Snyk (Warn)

Installed on: opencode (112), codex (111), gemini-cli (106), github-copilot (102), claude-code (101), cursor (97)
