使用dbt回答自然语言问题：语义层查询与SQL修改指南

answering-natural-language-questions-with-dbt by dbt-labs/dbt-agent-skills

120 周安装量

311 GitHub Stars

GitHub

安装命令

npx skills add https://github.com/dbt-labs/dbt-agent-skills --skill answering-natural-language-questions-with-dbt

自动化数据分析业务分析

🇨🇳中文介绍

使用 dbt 回答自然语言问题

概述

使用最佳可用方法回答数据问题：优先使用语义层，然后是 SQL 修改，接着是模型发现，最后是清单分析。在说"无法回答"之前，务必穷尽所有选项。

适用于： 用户提出的需要数据答案的业务问题

"上个月的总销售额是多少？"
"我们有多少活跃客户？"
"按地区显示收入"

不适用于：

开发期间验证模型逻辑
测试 dbt 模型或语义层定义
构建或修改 dbt 模型
dbt run、dbt test 或 dbt build 工作流

决策流程

flowchart TD
    start([收到业务问题])
    check_sl{语义层工具可用？}
    list_metrics[list_metrics]
    metric_exists{相关指标存在？}
    get_dims[get_dimensions]
    sl_sufficient{语义层能直接回答？}
    query_metrics[query_metrics]
    answer([返回答案])
    try_compiled[get_metrics_compiled_sql<br/>修改 SQL，execute_sql]
    check_discovery{模型发现工具可用？}
    try_discovery[get_mart_models<br/>get_model_details<br/>编写 SQL，执行]
    check_manifest{在 dbt 项目中？}
    try_manifest[分析清单/目录<br/>编写 SQL]
    cannot([无法回答])
    suggest{在 dbt 项目中？}
    improvements[建议语义层更改]
    done([完成])

    start --> check_sl
    check_sl -->|是| list_metrics
    check_sl -->|否| check_discovery
    list_metrics --> metric_exists
    metric_exists -->|是| get_dims
    metric_exists -->|否| check_discovery
    get_dims --> sl_sufficient
    sl_sufficient -->|是| query_metrics
    sl_sufficient -->|否| try_compiled
    query_metrics --> answer
    try_compiled -->|成功| answer
    try_compiled -->|失败| check_discovery
    check_discovery -->|是| try_discovery
    check_discovery -->|否| check_manifest
    try_discovery -->|成功| answer
    try_discovery -->|失败| check_manifest
    check_manifest -->|是| try_manifest
    check_manifest -->|否| cannot
    try_manifest -->|SQL 就绪| answer
    answer --> suggest
    cannot --> done
    suggest -->|是| improvements
    suggest -->|否| done
    improvements --> done

广告位招租

在这里展示您的产品或服务

触达数万 AI 开发者，精准高效

联系我们

优先级	条件	方法	工具
1	语义层激活	直接查询指标	`list_metrics`, `get_dimensions`, `query_metrics`
2	语义层激活但需要少量修改（缺少维度、自定义过滤器、case when、不同聚合方式）	修改编译后的 SQL	`get_metrics_compiled_sql`, 然后 `execute_sql`
3	无语义层，发现工具激活	探索模型，编写 SQL	`get_mart_models`, `get_model_details`, 然后 `show`/`execute_sql`
4	无 MCP，在 dbt 项目中	分析工件，编写 SQL	读取 `target/manifest.json`, `target/catalog.json`

方法 1：语义层查询

当 list_metrics 和 query_metrics 可用时：

list_metrics - 查找相关指标
get_dimensions - 验证所需维度是否存在
query_metrics - 使用适当的过滤器执行

如果语义层不能直接回答（缺少维度、需要自定义逻辑）→ 转到方法 2。

方法 2：修改编译后的 SQL

当语义层有指标但需要少量修改时：

缺少维度（连接 + 分组）
自定义过滤器不作为维度提供
用于自定义分类的 case when 逻辑
与定义不同的聚合方式

get_metrics_compiled_sql - 获取将要运行的 SQL（返回原始 SQL，不是 Jinja）
修改 SQL 以添加所需内容
execute_sql 以运行原始 SQL
始终建议 如果修改内容可重用，则更新语义模型

-- 示例：添加 sales_rep 维度

WITH base AS (
    -- ... 编译后的指标逻辑（已解析为表名）...
)
SELECT base.*, reps.sales_rep_name
FROM base
JOIN analytics.dim_sales_reps reps ON base.rep_id = reps.id
GROUP BY ...

-- 示例：自定义过滤器
SELECT * FROM (compiled_metric_sql) WHERE region = 'EMEA'

-- 示例：Case when 分类
SELECT
    CASE WHEN amount > 1000 THEN 'large' ELSE 'small' END as deal_size,
    SUM(amount)
FROM (compiled_metric_sql)
GROUP BY 1

注意： 编译后的 SQL 包含已解析的表名，而不是 {{ ref() }}。请使用返回的原始 SQL。

方法 3：模型发现

当没有语义层但 get_all_models/get_model_details 可用时：

get_mart_models - 从集市模型开始，而不是暂存模型
get_model_details 用于相关模型 - 理解模式
使用 {{ ref('model_name') }} 编写 SQL
show --inline "..." 或 execute_sql

优先使用集市模型而非暂存模型 - 集市模型已应用业务逻辑。

方法 4：清单/目录分析

当在 dbt 项目中但没有 MCP 服务器时：

检查 target/manifest.json 和 target/catalog.json
在读取前过滤 - 这些文件可能很大

# 在清单中查找集市模型

jq '.nodes | to_entries | map(select(.key | startswith("model.") and contains("mart"))) | .[].value | {name: .name, schema: .schema, columns: .columns}' target/manifest.json

# 从目录获取列信息
jq '.nodes["model.project_name.model_name"].columns' target/catalog.json

3. 根据发现的模式编写 SQL 4. 解释："此 SQL 应在您的数据仓库中运行。没有数据库访问权限我无法执行它。"

当在 dbt 项目中时，在回答后（或无法回答时）建议语义层更改：

差距	建议
指标不存在	"将指标定义添加到您的语义模型中"
缺少维度	"将 `dimension_name` 添加到语义模型的维度列表中"
无语义层	"考虑为此数据添加语义层"

保持在语义层级别。 不要建议：

数据库模式更改
ETL 管道修改
"请您的数据工程团队..."

需要抵制的理由

您在想...	现实情况
"语义层不支持这个确切的查询"	获取编译后的 SQL 并修改它（方法 2）
"没有 MCP 工具，无法帮助"	检查本地是否有清单/目录
"用户需要快速得到答案，跳过系统检查"	系统方法才是最快的途径
"直接写 SQL，更快"	语义层存在是有原因的 - 优先使用它
"数据中不存在该维度"	也许它存在，但不在语义层配置中

危险信号 - 停止

未检查语义层是否能回答就编写 SQL
未尝试所有 4 种方法就说"无法回答"
为语义层差距建议数据库级别的修复
未过滤就读取整个 manifest.json
存在集市模型时使用暂存模型
使用此方法来验证模型正确性而非回答业务问题

错误	修复
当语义层不能直接回答时放弃	获取编译后的 SQL 并修改它
查询暂存模型	首先使用 `get_mart_models`
读取完整的 manifest.json	使用 jq 过滤
建议 ETL 更改	将建议保持在语义层级别
未检查工具可用性	在选择方法前列出可用工具

🇺🇸English

Answering Natural Language Questions with dbt

Overview

Answer data questions using the best available method: semantic layer first, then SQL modification, then model discovery, then manifest analysis. Always exhaust options before saying "cannot answer."

Use for: Business questions from users that need data answers

"What were total sales last month?"
"How many active customers do we have?"
"Show me revenue by region"

Not for:

Validating model logic during development
Testing dbt models or semantic layer definitions
Building or modifying dbt models
dbt run, dbt test, or dbt build workflows

Decision Flow

flowchart TD
    start([Business question received])
    check_sl{Semantic layer tools available?}
    list_metrics[list_metrics]
    metric_exists{Relevant metric exists?}
    get_dims[get_dimensions]
    sl_sufficient{SL can answer directly?}
    query_metrics[query_metrics]
    answer([Return answer])
    try_compiled[get_metrics_compiled_sql<br/>Modify SQL, execute_sql]
    check_discovery{Model discovery tools available?}
    try_discovery[get_mart_models<br/>get_model_details<br/>Write SQL, execute]
    check_manifest{In dbt project?}
    try_manifest[Analyze manifest/catalog<br/>Write SQL]
    cannot([Cannot answer])
    suggest{In dbt project?}
    improvements[Suggest semantic layer changes]
    done([Done])

    start --> check_sl
    check_sl -->|yes| list_metrics
    check_sl -->|no| check_discovery
    list_metrics --> metric_exists
    metric_exists -->|yes| get_dims
    metric_exists -->|no| check_discovery
    get_dims --> sl_sufficient
    sl_sufficient -->|yes| query_metrics
    sl_sufficient -->|no| try_compiled
    query_metrics --> answer
    try_compiled -->|success| answer
    try_compiled -->|fail| check_discovery
    check_discovery -->|yes| try_discovery
    check_discovery -->|no| check_manifest
    try_discovery -->|success| answer
    try_discovery -->|fail| check_manifest
    check_manifest -->|yes| try_manifest
    check_manifest -->|no| cannot
    try_manifest -->|SQL ready| answer
    answer --> suggest
    cannot --> done
    suggest -->|yes| improvements
    suggest -->|no| done
    improvements --> done

Quick Reference

Priority	Condition	Approach	Tools
1	Semantic layer active	Query metrics directly	`list_metrics`, `get_dimensions`, `query_metrics`
2	SL active but minor modifications needed (missing dimension, custom filter, case when, different aggregation)	Modify compiled SQL	`get_metrics_compiled_sql`, then `execute_sql`
3	No SL, discovery tools active	Explore models, write SQL	, , then /

Approach 1: Semantic Layer Query

When list_metrics and query_metrics are available:

list_metrics - find relevant metric
get_dimensions - verify required dimensions exist
query_metrics - execute with appropriate filters

If semantic layer can't answer directly (missing dimension, need custom logic) → go to Approach 2.

Approach 2: Modified Compiled SQL

When semantic layer has the metric but needs minor modifications:

Missing dimension (join + group by)
Custom filter not available as a dimension
Case when logic for custom categorization
Different aggregation than what's defined

get_metrics_compiled_sql - get the SQL that would run (returns raw SQL, not Jinja)
Modify SQL to add what's needed
execute_sql to run the raw SQL
Always suggest updating the semantic model if the modification would be reusable

-- Example: Adding sales_rep dimension

WITH base AS (
    -- ... compiled metric logic (already resolved to table names) ...
)
SELECT base.*, reps.sales_rep_name
FROM base
JOIN analytics.dim_sales_reps reps ON base.rep_id = reps.id
GROUP BY ...

-- Example: Custom filter
SELECT * FROM (compiled_metric_sql) WHERE region = 'EMEA'

-- Example: Case when categorization
SELECT
    CASE WHEN amount > 1000 THEN 'large' ELSE 'small' END as deal_size,
    SUM(amount)
FROM (compiled_metric_sql)
GROUP BY 1

Note: The compiled SQL contains resolved table names, not {{ ref() }}. Work with the raw SQL as returned.

Approach 3: Model Discovery

When no semantic layer but get_all_models/get_model_details available:

get_mart_models - start with marts, not staging
get_model_details for relevant models - understand schema
Write SQL using {{ ref('model_name') }}
show --inline "..." or execute_sql

Prefer marts over staging - marts have business logic applied.

Approach 4: Manifest/Catalog Analysis

When in a dbt project but no MCP server:

Check for target/manifest.json and target/catalog.json
Filter before reading - these files can be large

# Find mart models in manifest

jq '.nodes | to_entries | map(select(.key | startswith("model.") and contains("mart"))) | .[].value | {name: .name, schema: .schema, columns: .columns}' target/manifest.json

# Get column info from catalog
jq '.nodes["model.project_name.model_name"].columns' target/catalog.json

3. Write SQL based on discovered schema 4. Explain: "This SQL should run in your warehouse. I cannot execute it without database access."

Suggesting Improvements

When in a dbt project , suggest semantic layer changes after answering (or when cannot answer):

Gap	Suggestion
Metric doesn't exist	"Add a metric definition to your semantic model"
Dimension missing	"Add `dimension_name` to the dimensions list in the semantic model"
No semantic layer	"Consider adding a semantic layer for this data"

Stay at semantic layer level. Do NOT suggest:

Database schema changes
ETL pipeline modifications
"Ask your data engineering team to..."

Rationalizations to Resist

You're Thinking...	Reality
"Semantic layer doesn't support this exact query"	Get compiled SQL and modify it (Approach 2)
"No MCP tools, can't help"	Check for manifest/catalog locally
"User needs this quickly, skip the systematic check"	Systematic approach IS the fastest path
"Just write SQL, it's faster"	Semantic layer exists for a reason - use it first
"The dimension doesn't exist in the data"	Maybe it exists but not in semantic layer config

Red Flags - STOP

Writing SQL without checking if semantic layer can answer
Saying "cannot answer" without trying all 4 approaches
Suggesting database-level fixes for semantic layer gaps
Reading entire manifest.json without filtering
Using staging models when mart models exist
Using this to validate model correctness rather than answer business questions

Common Mistakes

Mistake	Fix
Giving up when SL can't answer directly	Get compiled SQL and modify it
Querying staging models	Use `get_mart_models` first
Reading full manifest.json	Use jq to filter
Suggesting ETL changes	Keep suggestions at semantic layer
Not checking tool availability	List available tools before choosing approach

Weekly Installs

120

Repository

dbt-labs/dbt-ag…t-skills

GitHub Stars

311

First Seen

Jan 29, 2026

Security Audits

Gen Agent Trust HubPass SocketPass SnykPass

Installed on

github-copilot75

opencode73

gemini-cli72

codex72

amp69

kimi-cli69

Python PDF处理教程：合并拆分、提取文本表格、创建PDF文件

62,000 周安装