tracing-upstream-lineage by astronomer/agents
npx skills add https://github.com/astronomer/agents --skill tracing-upstream-lineage
Trace the origins of data - answer "Where does this data come from?"
Determine what we're tracing:
Tables are typically populated by Airflow DAGs. Find the connection:
- Search DAGs by name: use `af dags list` and look for DAG names matching the table name
  - `load_customers` -> `customers` table
  - `etl_daily_orders` -> `orders` table
- Explore DAG source code: use `af dags source <dag_id>` to read the DAG definition
- Check DAG tasks: use `af tasks list <dag_id>` to see what operations the DAG performs
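
The name-matching step above can be sketched in Python. This is a minimal illustration, not part of the skill: `candidate_dags` is a hypothetical helper, and the list of DAG IDs is assumed to have been copied from the output of `af dags list` by hand.

```python
import re


def candidate_dags(table_name: str, dag_ids: list[str]) -> list[str]:
    """Return DAG IDs whose name contains the table's base name as a token."""
    # Drop any schema prefix: "analytics.orders" -> "orders".
    base = table_name.split(".")[-1].lower()
    # Require underscore or string boundaries around the name, so "orders"
    # matches "etl_daily_orders" but not "reorders_sync".
    pattern = re.compile(rf"(^|_){re.escape(base)}(_|$)")
    return [dag_id for dag_id in dag_ids if pattern.search(dag_id.lower())]


# Hypothetical DAG IDs, assumed to come from `af dags list`.
dag_ids = ["load_customers", "etl_daily_orders", "ingest_orders", "reorders_sync"]
print(candidate_dags("analytics.orders", dag_ids))
```

A name match is only a hint. Confirm each candidate by reading its source with `af dags source <dag_id>`.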
If you're running on Astro, the Lineage tab in the Astro UI provides visual lineage exploration across DAGs and datasets. Use it to quickly trace upstream dependencies without manually searching DAG source code.
If you're not on Astro, there is no built-in cross-DAG lineage UI. Use DAG source code and task logs to trace lineage instead.
From the DAG code, identify source tables and systems:
SQL Sources (look for FROM clauses):
    -- In DAG code:
    SELECT * FROM source_schema.source_table  -- this is an upstream source
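
Scanning for FROM clauses can be partly automated. The sketch below is a rough regex pass over SQL text pulled from a DAG file, not a SQL parser: `upstream_tables` is a hypothetical helper, and it will also report CTE names and can be fooled by expressions such as `EXTRACT(YEAR FROM col)`, so treat its output as candidates to verify.

```python
import re

# Capture the identifier that follows FROM or JOIN (optionally schema-qualified).
TABLE_REF = re.compile(r"\b(?:FROM|JOIN)\s+([A-Za-z_][\w.]*)", re.IGNORECASE)


def upstream_tables(sql: str) -> list[str]:
    """Return table names referenced after FROM/JOIN, in first-seen order."""
    seen: list[str] = []
    for name in TABLE_REF.findall(sql):
        if name not in seen:
            seen.append(name)
    return seen


# Illustrative query, assumed to be embedded in a DAG task.
sql = """
INSERT INTO analytics.orders_daily
SELECT o.order_id, c.region
FROM raw.orders AS o
JOIN dim.customers AS c ON c.id = o.customer_id
"""
print(upstream_tables(sql))
```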
External Sources (look for connection references):
- `S3Operator` -> S3 bucket source
- `PostgresOperator` -> Postgres database source
- `SalesforceOperator` -> Salesforce API source
- `HttpOperator` -> REST API source

File Sources:
Recursively trace each source:
    TARGET: analytics.orders_daily
        ^
        +-- DAG: etl_daily_orders
                ^
                +-- SOURCE: raw.orders (table)
                |       ^
                |       +-- DAG: ingest_orders
                |               ^
                |               +-- SOURCE: Salesforce API (external)
                |
                +-- SOURCE: dim.customers (table)
                        ^
                        +-- DAG: load_customers
                                ^
                                +-- SOURCE: PostgreSQL (external DB)
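
The recursive trace can be sketched as a small Python function. This assumes you have already recorded, by hand, which DAG populates each table and what that DAG reads. The `LINEAGE` mapping and the `trace` function are hypothetical names for illustration, and anything absent from the mapping is treated as an external source.

```python
# Hypothetical lineage map: table -> (producing DAG, sources that DAG reads).
LINEAGE = {
    "analytics.orders_daily": ("etl_daily_orders", ["raw.orders", "dim.customers"]),
    "raw.orders": ("ingest_orders", ["Salesforce API"]),
    "dim.customers": ("load_customers", ["PostgreSQL"]),
}


def trace(target, lineage, depth=0, path=()):
    """Return indented lines describing everything upstream of `target`."""
    indent = "  " * depth
    if target in path:
        # Guard against circular dependencies between tables.
        return [f"{indent}{target} (cycle, stopping)"]
    if target not in lineage:
        # No known producing DAG: treat as an external system.
        return [f"{indent}{target} (external)"]
    dag, sources = lineage[target]
    lines = [f"{indent}{target} <- DAG: {dag}"]
    for source in sources:
        lines.extend(trace(source, lineage, depth + 1, path + (target,)))
    return lines


print("\n".join(trace("analytics.orders_daily", LINEAGE)))
```

The `path` tuple tracks the chain currently being followed, so a table that feeds itself through another DAG stops the recursion instead of looping forever.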
For each upstream source:
- Check recent run status with `af dags stats`

When tracing a specific column:
- `source.col AS target_col`
- `COALESCE(a.col, b.col) AS target_col`
- `SUM(detail.amount) AS total_amount`

One-line answer: "This table is populated by DAG X from sources Y and Z"
    [Salesforce] --> [raw.opportunities] --> [stg.opportunities] --> [fct.sales]
                              |                       |
                      DAG: ingest_sfdc      DAG: transform_sales
| Source | Type | Connection | Freshness | Owner |
|---|---|---|---|---|
| raw.orders | Table | Internal | 2h ago | data-team |
| Salesforce | API | salesforce_conn | Real-time | sales-ops |
Describe how data flows and transforms:
1. Data lands in `raw.orders` via Salesforce API sync
2. `transform_orders` cleans and dedupes into `stg.orders`
3. `build_order_facts` joins with dimensions into `fct.orders`