adding-dbt-unit-test by dbt-labs/dbt-agent-skills
npx skills add https://github.com/dbt-labs/dbt-agent-skills --skill adding-dbt-unit-test
dbt unit tests validate SQL modeling logic on static inputs before materializing in production. If any unit test for a model fails, dbt will not materialize that model.
You should unit test a model when its SQL contains complex logic. More examples: regex, date math, window functions, and case when statements when there are many whens.

Cases we don't recommend creating unit tests for: simple transformations that just use built-in SQL functions such as min(), etc.

A dbt unit test uses a trio of the model, given inputs, and expected outputs (Model-Inputs-Outputs):
- model - when building this model
- given inputs - given a set of sources, seeds, and models as preconditions
- expect output - then expect this row content of the model as a postcondition

The name of each unit test is self explanatory -- the title says it all! For each given input, specify the format if it is different than the default (YAML dict); see the "formats for unit tests" section below to determine which format to use.

Tip: Use dbt show to explore existing data from upstream models or sources. This helps you understand realistic input structures. However, always sanitize the sample data to remove any sensitive or PII information before using it in your unit test fixtures.
# Preview upstream model data
dbt show --select upstream_model --limit 5
Suppose you have this model:
-- models/hello_world.sql
select 'world' as hello
Minimal unit test for that model:
# models/_properties.yml
unit_tests:
  - name: test_hello_world
    # Always only one transformation to test
    model: hello_world
    # No inputs needed this time!
    # Most unit tests will have inputs -- see the "real world example" section below
    given: []
    # Expected output can have zero to many rows
    expect:
      rows:
        - {hello: world}
Run the unit tests, build the model, and run the data tests for the hello_world model:
dbt build --select hello_world
This saves on warehouse spend, as the model is only materialized (and the data tests only run) if the unit tests pass successfully.
Or only run the unit tests without building the model or running the data tests:
dbt test --select "hello_world,test_type:unit"
Or choose a specific unit test by name:
dbt test --select test_is_valid_email_address
dbt Labs strongly recommends only running unit tests in development or CI environments. Since the inputs of the unit tests are static, there's no need to use additional compute cycles running them in production. Use them when doing development for a test-driven approach and CI to ensure changes don't break them.
Use the --exclude-resource-type flag or the DBT_EXCLUDE_RESOURCE_TYPES environment variable to exclude unit tests from your production builds and save compute.
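For example, assuming dbt 1.8 or later (where unit tests are a selectable resource type), a production job could skip them like this:

```shell
# Build everything except unit tests (e.g., in a production job)
dbt build --exclude-resource-type unit_test

# Equivalent, via the environment variable
DBT_EXCLUDE_RESOURCE_TYPES=unit_test dbt build
```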
unit_tests:
  - name: test_order_items_count_drink_items_with_zero_drinks
    description: >
      Scenario: Order without any drinks
      When the `order_items_summary` table is built
      Given an order with nothing but 1 food item
      Then the count of drink items is 0
    # Model
    model: order_items_summary
    # Inputs
    given:
      - input: ref('order_items')
        rows:
          - {order_id: 76, order_item_id: 3, is_drink_item: false}
      - input: ref('stg_orders')
        rows:
          - {order_id: 76}
    # Output
    expect:
      rows:
        - {order_id: 76, count_drink_items: 0}
For more examples of unit tests, see references/examples.md
Caveats to keep in mind:
- Unit tests are not supported for models that use the materialized view materialization.
- For incremental models, the expect output is not the final state of the database table after inserting/merging; it is the expect output for what will be merged/inserted.
- Unit tests are defined in YAML files in your model-paths directory (models/ by default).
- Fixture files are defined in your test-paths directory (tests/fixtures by default).
- Include all ref or source model references in the unit test configuration as inputs to avoid "node not found" errors during compilation.
- To unit test a model that depends on an ephemeral model, use format: sql for the ephemeral model input.
- Table names must be aliased in order to unit test join logic.

Use inputs in your unit tests to reference a specific model or source for the test:
For each input:, use a string that represents a ref or source call:
- ref('my_model') or ref('my_model', v='2') or ref('dougs_project', 'users')
- source('source_schema', 'source_name')

Use rows: [] for an "empty" input. This is useful if a model has a ref or source dependency whose values are irrelevant to this particular unit test. Beware, though: if the model has a join on that input, rows may be dropped!

# models/schema.yml
unit_tests:
  - name: test_is_valid_email_address # this is the unique name of the test
    model: dim_customers # name of the model I'm unit testing
    given: # the mock data for your inputs
      - input: ref('stg_customers')
        rows:
          - {email: cool@example.com, email_top_level_domain: example.com}
          - {email: cool@unknown.com, email_top_level_domain: unknown.com}
          - {email: badgmail.com, email_top_level_domain: gmail.com}
          - {email: missingdot@gmailcom, email_top_level_domain: gmail.com}
      - input: ref('top_level_email_domains')
        rows:
          - {tld: example.com}
          - {tld: gmail.com}
      - input: ref('irrelevant_dependency') # dependency that we need to acknowledge, but does not need any data
        rows: []
    ...
Formats for unit tests

dbt supports three formats for mock data within unit tests:
- dict (default): Inline YAML dictionary values.
- csv: Inline CSV values or a CSV file.
- sql: Inline SQL query or a SQL file.

To see examples of each of the formats, see references/examples.md
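As an illustration of the csv format, an inline CSV input and expected output might look like the following sketch (the test, model, and column names here are hypothetical, not from this skill):

```yaml
unit_tests:
  - name: test_orders_rollup_counts    # hypothetical test name
    model: orders_rollup               # hypothetical model
    given:
      - input: ref('stg_orders')       # hypothetical upstream model
        format: csv
        rows: |
          order_id,status
          1,completed
          2,returned
    expect:
      format: csv
      rows: |
        status,order_count
        completed,1
        returned,1
```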
Choosing a format: use the dict format by default, but fall back to another format as needed.
- Use the sql format when testing a model that depends on an ephemeral model.
- Use the sql format when unit testing a column whose data type is not supported by the dict or csv formats.
- Use the csv or sql formats when using a fixture file. Default to csv, but fall back to sql if any of the column data types are not supported by the csv format.

The sql format is the least readable and requires supplying mock data for all columns, so prefer other formats when possible. But it is also the most flexible, and should be used as the fallback in scenarios where dict or csv won't work.

Notes:
- With the sql format, you must supply mock data for all columns, whereas dict and csv may supply only a subset.
- The sql format allows you to unit test a model that depends on an ephemeral model -- dict and csv can't be used in that case.
- The dict format only supports inline YAML mock data, but you can also use csv or sql either inline or in a separate fixture file. Store your fixture files in a fixtures subdirectory in any of your test-paths. For example, tests/fixtures/my_unit_test_fixture.sql.
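To sketch the sql format (model and column names are hypothetical), an input for an ephemeral upstream model can be mocked as an inline query; note that every column of the input must be supplied:

```yaml
given:
  - input: ref('ephemeral_customers')  # hypothetical ephemeral model
    format: sql
    rows: |
      select 1 as customer_id, 'gerda' as name
      union all
      select 2 as customer_id, 'holger' as name
```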
When using the dict or csv format, you only have to define the mock data for the columns relevant to you. This enables you to write succinct and specific unit tests. For the sql format all columns need to be defined.
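A fixture-file version of a csv input (file and model names hypothetical) references the file by name, without the extension, via the fixture key:

```yaml
given:
  - input: ref('stg_customers')       # hypothetical upstream model
    format: csv
    fixture: stg_customers_fixture    # loads tests/fixtures/stg_customers_fixture.csv
```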
There are platform-specific details to be aware of when implementing unit tests on certain warehouses (Redshift, BigQuery, etc.). Read the caveats file for your database (if it exists):
Unit tests are designed to test for the expected values, not for the data types themselves. dbt takes the value you provide and attempts to cast it to the data type as inferred from the input and output models.
How you specify input and expected values in your unit test YAML definitions is largely consistent across data warehouses, with some variation for more complex data types.
Read the data types file for your database:
By default, all specified unit tests are enabled and will be included according to the --select flag.
To disable a unit test from being executed, set:
config:
  enabled: false
This is helpful if a unit test is incorrectly failing and it needs to be disabled until it is fixed.
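In context, the override sits under the unit test entry (the test and model names below are from the earlier email-validation example):

```yaml
unit_tests:
  - name: test_is_valid_email_address
    model: dim_customers
    config:
      enabled: false
    # ...given and expect as before...
```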
When a unit test fails, there will be a log message of "actual differs from expected", and it will show a "data diff" between the two:
actual differs from expected:
@@ ,email ,is_valid_email_address
→ ,cool@example.com,True→False
,cool@unknown.com,False
There are two main possibilities when a unit test fails:
- The model's SQL logic is genuinely wrong, and the test has caught a real bug.
- The unit test itself is wrong: its mocked inputs or expected output don't reflect the intended behavior.
It takes expert judgement to determine one from the other.
The --empty flag

The direct parents of the model that you're unit testing need to exist in the warehouse before you can execute the unit test. The run and build commands support the --empty flag for building schema-only dry runs. The --empty flag limits the refs and sources to zero rows. dbt will still execute the model SQL against the target data warehouse but will avoid expensive reads of input data. This validates dependencies and ensures your models will build properly.
Use the --empty flag to build an empty version of the models to save warehouse spend.
dbt run --select "stg_customers top_level_email_domains" --empty
| Mistake | Fix |
|---|---|
| Testing simple SQL using built-in functions | Only unit test complex logic: regex, date math, window functions, multi-condition case statements |
| Mocking all columns in input data | Only include columns relevant to the test case |
| Using sql format when dict works | Prefer dict (most readable), fall back to csv or sql only when needed |
| Missing input for a ref or source | Include all model dependencies to avoid "node not found" errors |
Weekly Installs: 84
Repository: https://github.com/dbt-labs/dbt-agent-skills
GitHub Stars: 246
First Seen: Jan 29, 2026
Security Audits: Gen Agent Trust Hub: Pass, Socket: Pass, Snyk: Pass
Installed on: github-copilot (59), opencode (58), codex (57), gemini-cli (56), amp (53), kimi-cli (53)
| Testing Python models or snapshots | Unit tests only support SQL models |