adding-dbt-unit-test by dbt-labs/dbt-agent-skills
npx skills add https://github.com/dbt-labs/dbt-agent-skills --skill adding-dbt-unit-test
dbt unit tests validate SQL modeling logic on static inputs before materializing in production. If any unit test for a model fails, dbt will not materialize that model.
You should unit test a model when its SQL contains complex logic. More examples: regex, date math, window functions, and case when statements when there are many whens.

Cases we don't recommend creating unit tests for: simple transformations that just use built-in SQL functions such as min(), etc.

A dbt unit test uses a trio of the model, given inputs, and expected outputs (Model-Inputs-Outputs):
- model - when building this model
- given inputs - given a set of sources, seeds, and models as preconditions
- expect output - then expect this row content of the model as a postcondition

The name of each unit test is self explanatory -- the title says it all! For each given input, specify the format if it is different than the default (YAML dict); see the "formats for unit tests" section below to determine which format to use.

Tip: Use dbt show to explore existing data from upstream models or sources. This helps you understand realistic input structures. However, always sanitize the sample data to remove any sensitive or PII information before using it in your unit test fixtures.
# Preview upstream model data
dbt show --select upstream_model --limit 5
Suppose you have this model:
-- models/hello_world.sql
select 'world' as hello
Minimal unit test for that model:
# models/_properties.yml
unit_tests:
  - name: test_hello_world
    # Always only one transformation to test
    model: hello_world
    # No inputs needed this time!
    # Most unit tests will have inputs -- see the "real world example" section below
    given: []
    # Expected output can have zero to many rows
    expect:
      rows:
        - {hello: world}
Run the unit tests, build the model, and run the data tests for the hello_world model:
dbt build --select hello_world
This saves on warehouse spend, as the model is only materialized (and the data tests only run) if the unit tests pass successfully.
Or only run the unit tests without building the model or running the data tests:
dbt test --select "hello_world,test_type:unit"
Or choose a specific unit test by name:
dbt test --select test_is_valid_email_address
dbt Labs strongly recommends only running unit tests in development or CI environments. Since the inputs of the unit tests are static, there's no need to use additional compute cycles running them in production. Use them when doing development for a test-driven approach and CI to ensure changes don't break them.
Use the --exclude-resource-type flag or the DBT_EXCLUDE_RESOURCE_TYPES environment variable to exclude unit tests from your production builds and save compute.
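For example, assuming dbt 1.8 or later (where unit tests are a selectable resource type), a production job could skip them like this:

```shell
# Build everything except unit tests (e.g., in a production job)
dbt build --exclude-resource-type unit_test

# Equivalent, via the environment variable
DBT_EXCLUDE_RESOURCE_TYPES=unit_test dbt build
```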
unit_tests:
  - name: test_order_items_count_drink_items_with_zero_drinks
    description: >
      Scenario: Order without any drinks
      When the `order_items_summary` table is built
      Given an order with nothing but 1 food item
      Then the count of drink items is 0
    # Model
    model: order_items_summary
    # Inputs
    given:
      - input: ref('order_items')
        rows:
          - {order_id: 76, order_item_id: 3, is_drink_item: false}
      - input: ref('stg_orders')
        rows:
          - {order_id: 76}
    # Output
    expect:
      rows:
        - {order_id: 76, count_drink_items: 0}
For more examples of unit tests, see references/examples.md
Caveats to keep in mind:
- Unit tests are not supported for models that use the materialized view materialization.
- For incremental models, the expect output is not the final state of the database table after inserting/merging; it is the expect output for what will be merged/inserted.
- Unit tests are defined in YAML files in your model-paths directory (models/ by default).
- Fixture files are defined in your test-paths directory (tests/fixtures by default).
- Include all ref or source model references in the unit test configuration as inputs to avoid "node not found" errors during compilation.
- To unit test a model that depends on an ephemeral model, use format: sql for the ephemeral model input.
- Table names must be aliased in order to unit test join logic.

Use inputs in your unit tests to reference a specific model or source for the test:
For each input:, use a string that represents a ref or source call:
- ref('my_model') or ref('my_model', v='2') or ref('dougs_project', 'users')
- source('source_schema', 'source_name')

Use rows: [] for an "empty" input. This is useful if a model has a ref or source dependency whose values are irrelevant to this particular unit test. Beware, though: if the model has a join on that input, rows may be dropped!

# models/schema.yml
unit_tests:
  - name: test_is_valid_email_address # this is the unique name of the test
    model: dim_customers # name of the model I'm unit testing
    given: # the mock data for your inputs
      - input: ref('stg_customers')
        rows:
          - {email: cool@example.com, email_top_level_domain: example.com}
          - {email: cool@unknown.com, email_top_level_domain: unknown.com}
          - {email: badgmail.com, email_top_level_domain: gmail.com}
          - {email: missingdot@gmailcom, email_top_level_domain: gmail.com}
      - input: ref('top_level_email_domains')
        rows:
          - {tld: example.com}
          - {tld: gmail.com}
      - input: ref('irrelevant_dependency') # dependency that we need to acknowledge, but does not need any data
        rows: []
    ...
Formats for unit tests

dbt supports three formats for mock data within unit tests:
- dict (default): Inline YAML dictionary values.
- csv: Inline CSV values or a CSV file.
- sql: Inline SQL query or a SQL file.

To see examples of each of the formats, see references/examples.md
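As an illustration of the csv format, an inline CSV input and expected output might look like the following sketch (the test, model, and column names here are hypothetical, not from this skill):

```yaml
unit_tests:
  - name: test_orders_rollup_counts    # hypothetical test name
    model: orders_rollup               # hypothetical model
    given:
      - input: ref('stg_orders')       # hypothetical upstream model
        format: csv
        rows: |
          order_id,status
          1,completed
          2,returned
    expect:
      format: csv
      rows: |
        status,order_count
        completed,1
        returned,1
```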
Choosing a format: use the dict format by default, but fall back to another format as needed.
- Use the sql format when testing a model that depends on an ephemeral model.
- Use the sql format when unit testing a column whose data type is not supported by the dict or csv formats.
- Use the csv or sql formats when using a fixture file. Default to csv, but fall back to sql if any of the column data types are not supported by the csv format.

The sql format is the least readable and requires supplying mock data for all columns, so prefer other formats when possible. But it is also the most flexible, and should be used as the fallback in scenarios where dict or csv won't work.

Notes:
- With the sql format, you must supply mock data for all columns, whereas dict and csv may supply only a subset.
- The sql format allows you to unit test a model that depends on an ephemeral model -- dict and csv can't be used in that case.
- The dict format only supports inline YAML mock data, but you can also use csv or sql either inline or in a separate fixture file. Store your fixture files in a fixtures subdirectory in any of your test-paths. For example, tests/fixtures/my_unit_test_fixture.sql.
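To sketch the sql format (model and column names are hypothetical), an input for an ephemeral upstream model can be mocked as an inline query; note that every column of the input must be supplied:

```yaml
given:
  - input: ref('ephemeral_customers')  # hypothetical ephemeral model
    format: sql
    rows: |
      select 1 as customer_id, 'gerda' as name
      union all
      select 2 as customer_id, 'holger' as name
```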
When using the dict or csv format, you only have to define the mock data for the columns relevant to you. This enables you to write succinct and specific unit tests. For the sql format all columns need to be defined.
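A fixture-file version of a csv input (file and model names hypothetical) references the file by name, without the extension, via the fixture key:

```yaml
given:
  - input: ref('stg_customers')       # hypothetical upstream model
    format: csv
    fixture: stg_customers_fixture    # loads tests/fixtures/stg_customers_fixture.csv
```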
There are platform-specific details to be aware of when implementing unit tests on certain warehouses (Redshift, BigQuery, etc.). Read the caveats file for your database (if it exists):
Unit tests are designed to test for the expected values, not for the data types themselves. dbt takes the value you provide and attempts to cast it to the data type as inferred from the input and output models.
How you specify input and expected values in your unit test YAML definitions is largely consistent across data warehouses, with some variation for more complex data types.
Read the data types file for your database:
By default, all specified unit tests are enabled and will be included according to the --select flag.
To disable a unit test from being executed, set:
config:
  enabled: false
This is helpful if a unit test is incorrectly failing and it needs to be disabled until it is fixed.
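In context, the override sits under the unit test entry (the test and model names below are from the earlier email-validation example):

```yaml
unit_tests:
  - name: test_is_valid_email_address
    model: dim_customers
    config:
      enabled: false
    # ...given and expect as before...
```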
When a unit test fails, there will be a log message of "actual differs from expected", and it will show a "data diff" between the two:
actual differs from expected:
@@ ,email ,is_valid_email_address
→ ,cool@example.com,True→False
,cool@unknown.com,False
There are two main possibilities when a unit test fails:
- The model's SQL logic is genuinely wrong, and the test has caught a real bug.
- The unit test itself is wrong: its mocked inputs or expected output don't reflect the intended behavior.
It takes expert judgement to determine one from the other.
The --empty flag

The direct parents of the model that you're unit testing need to exist in the warehouse before you can execute the unit test. The run and build commands support the --empty flag for building schema-only dry runs. The --empty flag limits the refs and sources to zero rows. dbt will still execute the model SQL against the target data warehouse but will avoid expensive reads of input data. This validates dependencies and ensures your models will build properly.
Use the --empty flag to build an empty version of the models to save warehouse spend.
dbt run --select "stg_customers top_level_email_domains" --empty
| Mistake | Fix |
|---|---|
| Testing simple SQL using built-in functions | Only unit test complex logic: regex, date math, window functions, multi-condition case statements |
| Mocking all columns in input data | Only include columns relevant to the test case |
| Using sql format when dict works | Prefer dict (most readable), fall back to csv or sql only when needed |
| Missing input for a ref or source | Include all model dependencies to avoid "node not found" errors |
Weekly Installs: 84
Repository: https://github.com/dbt-labs/dbt-agent-skills
GitHub Stars: 246
First Seen: Jan 29, 2026
Security Audits: Gen Agent Trust Hub: Pass, Socket: Pass, Snyk: Pass
Installed on: github-copilot (59), opencode (58), codex (57), gemini-cli (56), amp (53), kimi-cli (53)
| Testing Python models or snapshots | Unit tests only support SQL models |