instrument-data-to-allotrope by anthropics/knowledge-work-plugins
npx skills add https://github.com/anthropics/knowledge-work-plugins --skill instrument-data-to-allotrope
Convert instrument files into standardized Allotrope Simple Model (ASM) format for LIMS upload, data lakes, or handoff to data engineering teams.
Note: This is an Example Skill
This skill demonstrates how skills can support your data engineering tasks—automating schema transformations, parsing instrument outputs, and generating production-ready code.
To customize for your organization:
- Modify the references/ files to include your company's specific schemas or ontology mappings
- Use an MCP server to connect to systems that define your schemas (e.g., your LIMS, data catalog, or schema registry)
- Extend the scripts/ directory to handle proprietary instrument formats or internal data standards
This pattern can be adapted for any data transformation workflow where you need to convert between formats or validate against organizational standards.
When Uncertain: If you're unsure how to map a field to ASM (e.g., is this raw data or calculated? device setting or environmental condition?), ask the user for clarification. Refer to references/field_classification_guide.md for guidance, but when ambiguity remains, confirm with the user rather than guessing.
# Install requirements first
pip install allotropy pandas openpyxl pdfplumber --break-system-packages
# Core conversion
from allotropy.parser_factory import Vendor
from allotropy.to_allotrope import allotrope_from_file
# Convert with allotropy
asm = allotrope_from_file("instrument_data.csv", Vendor.BECKMAN_VI_CELL_BLU)
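The conversion call above is expected to return the ASM as a plain Python dictionary, so it can be written to disk with the standard library. The sketch below assumes that return type; `save_asm` and the output file name are hypothetical helpers for illustration, not part of allotropy or this skill's scripts.

```python
import json
from pathlib import Path


def save_asm(asm: dict, output_path: str) -> Path:
    """Write an ASM dictionary to disk as indented UTF-8 JSON."""
    path = Path(output_path)
    # Create the output folder if it does not exist yet.
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", encoding="utf-8") as handle:
        # ensure_ascii=False keeps unit symbols such as "µL" readable.
        json.dump(asm, handle, indent=2, ensure_ascii=False)
    return path


# Typical use, assuming `asm` came from allotrope_from_file(...):
# save_asm(asm, "output/instrument_data_asm.json")
```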
- ASM JSON (default) - Full semantic structure with ontology URIs
- Flattened CSV - 2D tabular representation
- Both - Generate both formats for maximum flexibility
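To make the difference between the nested JSON and the flattened CSV concrete, here is a minimal sketch of one way to flatten a nested record into dotted column names. It is a generic illustration only: `flatten` and `rows_to_csv` are hypothetical helpers, and the skill's own scripts/flatten_asm.py may work differently.

```python
import csv
import io
from typing import Any


def flatten(node: Any, prefix: str = "") -> dict[str, Any]:
    """Collapse nested dicts and lists into one level of dotted keys."""
    flat: dict[str, Any] = {}
    if isinstance(node, dict):
        for key, value in node.items():
            flat.update(flatten(value, f"{prefix}{key}."))
    elif isinstance(node, list):
        for index, value in enumerate(node):
            flat.update(flatten(value, f"{prefix}{index}."))
    else:
        # Leaf value: drop the trailing dot from the accumulated path.
        flat[prefix.rstrip(".")] = node
    return flat


def rows_to_csv(rows: list[dict[str, Any]]) -> str:
    """Render flattened rows as CSV text; missing cells are left empty."""
    columns = sorted({key for row in rows for key in row})
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Dotted paths keep every column traceable to its position in the JSON, at the cost of long header names.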
IMPORTANT: Separate raw measurements from calculated/derived values.
- Raw measurements: measurement-document (direct instrument readings)
- Calculated values: calculated-data-aggregate-document (derived values)
Calculated values MUST include traceability via data-source-aggregate-document:
"calculated-data-aggregate-document": {
  "calculated-data-document": [{
    "calculated-data-identifier": "SAMPLE_B1_DIN_001",
    "calculated-data-name": "DNA integrity number",
    "calculated-result": {"value": 9.5, "unit": "(unitless)"},
    "data-source-aggregate-document": {
      "data-source-document": [{
        "data-source-identifier": "SAMPLE_B1_MEASUREMENT",
        "data-source-feature": "electrophoresis trace"
      }]
    }
  }]
}
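The traceability requirement can be enforced when the document is built rather than checked afterwards. The sketch below uses the key names from the JSON fragment above; both function names are hypothetical and are not part of allotropy or the skill's scripts.

```python
def make_calculated_data_document(identifier, name, value, unit,
                                  source_id, source_feature):
    """Build one calculated-data-document that always names its data source."""
    return {
        "calculated-data-identifier": identifier,
        "calculated-data-name": name,
        "calculated-result": {"value": value, "unit": unit},
        "data-source-aggregate-document": {
            "data-source-document": [{
                "data-source-identifier": source_id,
                "data-source-feature": source_feature,
            }]
        },
    }


def untraceable_calculations(aggregate: dict) -> list:
    """Return identifiers of calculated values that list no data source."""
    missing = []
    for doc in aggregate.get("calculated-data-document", []):
        sources = (doc.get("data-source-aggregate-document", {})
                      .get("data-source-document", []))
        if not sources:
            missing.append(doc.get("calculated-data-identifier", "<no identifier>"))
    return missing
```

Running `untraceable_calculations` on the aggregate before delivery gives a quick list of values that still need a source link.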
Common calculated fields by instrument type:
| Instrument | Calculated Fields |
|---|---|
| Cell counter | Viability %, cell density dilution-adjusted values |
| Spectrophotometer | Concentration (from absorbance), 260/280 ratio |
| Plate reader | Concentrations from standard curve, %CV |
| Electrophoresis | DIN/RIN, region concentrations, average sizes |
| qPCR | Relative quantities, fold change |
See references/field_classification_guide.md for detailed guidance on raw vs. calculated classification.
Always validate ASM output before delivering to the user:
python scripts/validate_asm.py output.json
python scripts/validate_asm.py output.json --reference known_good.json # Compare to reference
python scripts/validate_asm.py output.json --strict # Treat warnings as errors
Validation Rules:
Soft Validation Approach: Unknown techniques, units, or sample roles generate warnings (not errors) to allow for forward compatibility. If Allotrope adds new values after December 2024, the validator won't block them—it will flag them for manual verification. Use --strict mode to treat warnings as errors if you need stricter validation.
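The soft-validation idea can be sketched as a report that keeps errors and warnings apart, with strict mode deciding whether warnings fail the run. This illustrates the pattern only and is not the code in scripts/validate_asm.py. `KNOWN_UNITS`, `ValidationReport`, and `check_unit` are hypothetical names, and the unit list is a made-up sample.

```python
from dataclasses import dataclass, field

# Hypothetical allow-list; the validator's real list is not shown in this document.
KNOWN_UNITS = {"(unitless)", "mL", "nm", "%"}


@dataclass
class ValidationReport:
    errors: list = field(default_factory=list)
    warnings: list = field(default_factory=list)

    def passed(self, strict: bool = False) -> bool:
        """Errors always fail; warnings fail only in strict mode."""
        if self.errors:
            return False
        return not (strict and self.warnings)


def check_unit(unit, report: ValidationReport) -> None:
    """Malformed units are errors; unrecognized ones are only warnings."""
    if not isinstance(unit, str) or not unit:
        report.errors.append(f"unit must be a non-empty string, got {unit!r}")
    elif unit not in KNOWN_UNITS:
        report.warnings.append(f"unknown unit {unit!r}; verify manually")
```

A unit added to the standard after the allow-list was written produces a warning, so the file still passes unless strict mode is requested.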
What it checks:
data-source-aggregate-document)
See references/supported_instruments.md for the complete list. Key instruments:
| Category | Instruments |
|---|---|
| Cell Counting | Vi-CELL BLU, Vi-CELL XR, NucleoCounter |
| Spectrophotometry | NanoDrop One/Eight/8000, Lunatic |
| Plate Readers | SoftMax Pro, EnVision, Gen5, CLARIOstar |
| ELISA | SoftMax Pro, BMG MARS, MSD Workbench |
| qPCR | QuantStudio, Bio-Rad CFX |
| Chromatography | Empower, Chromeleon |
Always try allotropy first. Check available vendors directly:
from allotropy.parser_factory import Vendor
# List all supported vendors
for v in Vendor:
    print(f"{v.name}")
# Common vendors:
# AGILENT_TAPESTATION_ANALYSIS (for TapeStation XML)
# BECKMAN_VI_CELL_BLU
# THERMO_FISHER_NANODROP_EIGHT
# MOLDEV_SOFTMAX_PRO
# APPBIO_QUANTSTUDIO
# ... many more
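Because instrument names rarely match enum names character for character, a loose search over the vendor names can help when checking support. The helper below is a hypothetical sketch that works on any list of names. With allotropy installed, the list could be built as `[v.name for v in Vendor]`.

```python
def find_vendors(vendor_names, keyword):
    """Case-insensitive search that ignores spaces, hyphens, and underscores."""
    def normalize(text):
        return "".join(ch for ch in text.lower() if ch.isalnum())

    needle = normalize(keyword)
    return [name for name in vendor_names if needle in normalize(name)]


# Names copied from the comment list above, used here as sample input.
sample_names = [
    "AGILENT_TAPESTATION_ANALYSIS",
    "BECKMAN_VI_CELL_BLU",
    "THERMO_FISHER_NANODROP_EIGHT",
    "MOLDEV_SOFTMAX_PRO",
    "APPBIO_QUANTSTUDIO",
]
print(find_vendors(sample_names, "Vi-CELL"))
```

An empty result means the name did not match, not that the instrument is unsupported, so the full list should still be reviewed.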
When the user provides a file, check if allotropy supports it before falling back to manual parsing. The scripts/convert_to_asm.py auto-detection only covers a subset of allotropy vendors.
Only use if allotropy doesn't support the instrument. This fallback:
calculated-data-aggregate-document
Use flexible parser with:
For PDF-only files, extract tables using pdfplumber, then apply Tier 2 parsing.
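A minimal sketch of the PDF step, assuming the report contains ruled tables that pdfplumber can detect. `clean_table` and `extract_pdf_tables` are hypothetical helper names. The Tier 2 parser that would consume the resulting records is not shown in this document.

```python
def clean_table(raw_rows):
    """Turn a table (list of rows; cells may be None) into dict records.

    The first non-empty row is treated as the header.
    """
    rows = [[(cell or "").strip() for cell in row] for row in raw_rows]
    rows = [row for row in rows if any(row)]  # drop fully empty rows
    if not rows:
        return []
    header, *body = rows
    return [dict(zip(header, row)) for row in body]


def extract_pdf_tables(pdf_path):
    """Collect records from every table on every page of a PDF."""
    import pdfplumber  # third-party; imported here so clean_table works without it

    records = []
    with pdfplumber.open(pdf_path) as pdf:
        for page in pdf.pages:
            for table in page.extract_tables():
                records.extend(clean_table(table))
    return records
```

Tables that span pages or have merged header cells usually need instrument-specific handling, which this sketch does not attempt.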
Before writing a custom parser, ALWAYS:
- Check references/examples/ or ask the user
- Check references/instrument_guides/
- Run validate_asm.py --reference <file>

| Mistake | Correct Approach |
|---|---|
| Manifest as object | Use URL string |
| Lowercase detection types | Use "Absorbance" not "absorbance" |
| "emission wavelength setting" | Use "detector wavelength setting" for emission |
| All measurements in one document | Group by well/sample location |
| Missing procedure metadata | Extract ALL device settings per measurement |
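Some of the mistakes in the table can be caught mechanically. The sketch below checks three of them by walking the ASM dictionary. The key names it looks for ("$asm.manifest", "detection type", "emission wavelength setting") are assumptions based on the wording of the table, so adjust them to match the keys in your output. `find_common_mistakes` is a hypothetical helper.

```python
def find_common_mistakes(asm: dict) -> list:
    """Return human-readable notes for a few known ASM pitfalls."""
    problems = []

    # Assumed key name for the manifest; it should hold a URL string.
    manifest = asm.get("$asm.manifest")
    if manifest is not None and not isinstance(manifest, str):
        problems.append("manifest should be a URL string, not an object")

    def walk(node, path):
        if isinstance(node, dict):
            for key, value in node.items():
                here = f"{path}/{key}"
                if key == "emission wavelength setting":
                    problems.append(
                        f"{here}: use 'detector wavelength setting' for emission")
                if (key == "detection type" and isinstance(value, str)
                        and value[:1].islower()):
                    problems.append(f"{here}: capitalize detection type ({value!r})")
                walk(value, here)
        elif isinstance(node, list):
            for index, item in enumerate(node):
                walk(item, f"{path}/{index}")

    walk(asm, "")
    return problems
```

The grouping and metadata rows of the table depend on the instrument and are left to manual review.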
Generate standalone Python scripts that scientists can hand off:
# Export parser code
python scripts/export_parser.py --input "data.csv" --vendor "VI_CELL_BLU" --output "parser_script.py"
The exported script:
instrument-data-to-allotrope/
├── SKILL.md # This file
├── scripts/
│ ├── convert_to_asm.py # Main conversion script
│ ├── flatten_asm.py # ASM → 2D CSV conversion
│ ├── export_parser.py # Generate standalone parser code
│ └── validate_asm.py # Validate ASM output quality
└── references/
├── supported_instruments.md # Full instrument list with Vendor enums
├── asm_schema_overview.md # ASM structure reference
├── field_classification_guide.md # Where to put different field types
└── flattening_guide.md # How flattening works
User: "Convert this cell counting data to Allotrope format"
[uploads viCell_Results.xlsx]
Claude:
1. Detects Vi-CELL BLU (95% confidence)
2. Converts using allotropy native parser
3. Outputs:
- viCell_Results_asm.json (full ASM)
- viCell_Results_flat.csv (2D format)
- viCell_parser.py (exportable code)
User: "I need to give our data engineer code to parse NanoDrop files"
Claude:
1. Generates self-contained Python script
2. Includes sample input/output
3. Documents all assumptions
4. Provides Jupyter notebook version
User: "Convert this ELISA data to a CSV I can upload to our LIMS"
Claude:
1. Parses plate reader data
2. Generates flattened CSV with columns:
- sample_identifier, well_position, measurement_value, measurement_unit
- instrument_serial_number, analysis_datetime, assay_type
3. Validates against common LIMS import requirements
pip install allotropy --break-system-packages
If allotropy native parsing fails:
Validate output against Allotrope schemas when available:
import jsonschema
# Schema URLs in references/asm_schema_overview.md