Databricks vs dbt testing – Singular test
In Databricks Delta Live Tables (DLT), singular tests can be implemented using the @dlt.expect decorator. A singular test in dbt is a one-off SQL query, saved as a .sql file under the tests directory, that returns the rows violating a rule; it is typically used for specific data quality checks such as nulls, duplicates, or referential integrity in a particular field.
In DLT, you can express similar checks as part of your transformation logic by attaching @dlt.expect expectations to your tables. These expectations assert that the data adheres to certain rules; by default a violation is recorded in the pipeline's data quality metrics, while the stricter variants @dlt.expect_or_drop and @dlt.expect_or_fail drop the offending rows or fail the update, the latter being the closest match to a failing dbt test.
Singular Tests in dbt vs. Databricks DLT
In dbt, you might write a singular test like this:
- Not Null Test (dbt):
-- dbt: Check for nulls in a column
select *
from {{ ref('my_model') }}
where my_column is null
- Unique Test (dbt):
-- dbt: Check for duplicates
select my_column, count(*)
from {{ ref('my_model') }}
group by my_column
having count(*) > 1
In Databricks DLT, similar tests can be written using the @dlt.expect decorator inside your pipeline code.
How to Implement Singular Tests in Databricks DLT
Let’s walk through how you can replicate singular tests for common checks such as non-null, unique, and other conditions in DLT.
1. Not Null Test (Singular Test)
To check whether a column contains any null values, use @dlt.expect with a SQL expression that asserts the column is not null. This is a singular test on a single column.
Example: Not Null Test in DLT
import dlt
from pyspark.sql.functions import col

# Singular test: the 'id' column must not contain nulls
@dlt.table
@dlt.expect("id_not_null", "id IS NOT NULL")
def cleaned_data():
    return (
        dlt.read("raw_data")
        .filter(col("id").isNotNull())
        .dropDuplicates(["id"])
    )
- Explanation:
  - @dlt.expect("id_not_null", "id IS NOT NULL"): a singular test that checks whether the id column contains any null values. The constraint is a SQL expression string evaluated against each row.
  - With plain @dlt.expect, rows that violate the rule are still written; the violation count is recorded in the pipeline's data quality metrics. If you want the run to stop instead, use @dlt.expect_or_fail, as sketched just after this explanation.
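A minimal sketch of the stricter variant, assuming the same illustrative raw_data source (the table name cleaned_data_strict is just for illustration): @dlt.expect_or_fail stops the pipeline update as soon as a row violates the constraint, which is the closest match to a failing dbt test.
import dlt

# Fail the pipeline update if any row has a null 'id'
@dlt.table
@dlt.expect_or_fail("id_not_null", "id IS NOT NULL")
def cleaned_data_strict():
    return dlt.read("raw_data")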
2. Unique Test (Singular Test)
You can also implement a uniqueness check, similar to dbt's unique test, but it works a little differently: DLT expectations are evaluated row by row, so the rule is attached to an aggregated helper table rather than written as an aggregate condition on the table itself.
Example: Unique Test in DLT
import dlt
from pyspark.sql.functions import col, count

# Cleaned table: de-duplicate on 'id' as part of the transformation
@dlt.table
def cleaned_data():
    return (
        dlt.read("raw_data")
        .filter(col("id").isNotNull())
        .dropDuplicates(["id"])
    )

# Singular test: each 'id' in the source must appear exactly once
@dlt.table
@dlt.expect_or_fail("id_unique", "row_count = 1")
def id_unique_check():
    return (
        dlt.read("raw_data")
        .groupBy("id")
        .agg(count("*").alias("row_count"))
    )
- Explanation:
  - DLT evaluates expectation constraints per row, so you cannot put an aggregate such as COUNT(DISTINCT id) directly into @dlt.expect. Instead, the uniqueness rule lives on a small aggregated check table (id_unique_check).
  - The expectation "row_count = 1" asserts that every id occurs exactly once; because it uses @dlt.expect_or_fail, a duplicate id stops the pipeline update, mirroring a failing dbt unique test.
3. Range Validity Test (Singular Test)
If you want to check if a column contains values within a specific range, this can be implemented as a singular test.
Example: Range Test (e.g., Age)
import dlt
from pyspark.sql.functions import col

# Singular test: 'age' must fall within the valid range (18 to 100)
@dlt.table
@dlt.expect("valid_age_range", "age BETWEEN 18 AND 100")
def cleaned_data():
    return (
        dlt.read("raw_data")
        .filter(col("age").isNotNull())
    )
- Explanation:
  - @dlt.expect("valid_age_range", "age BETWEEN 18 AND 100"): this singular test checks whether the age column contains values between 18 and 100 (inclusive).
  - Rows outside this range are flagged in the pipeline's data quality metrics; switch to @dlt.expect_or_drop or @dlt.expect_or_fail if such rows should be dropped or should stop the run.
4. Check for Accepted Values (Singular Test)
For some columns, you might want to ensure that the values belong to an allowed set. For example, a status column might only be allowed to contain "active", "inactive", or "pending".
Example: Accepted Values Test (Status)
import dlt
from pyspark.sql.functions import col

# Singular test: 'status' must contain only accepted values
@dlt.table
@dlt.expect("status_valid", "status IN ('active', 'inactive', 'pending')")
def cleaned_data():
    return (
        dlt.read("raw_data")
        .filter(col("status").isNotNull())
    )
- Explanation:
  - @dlt.expect("status_valid", "status IN ('active', 'inactive', 'pending')"): this singular test ensures that the status column contains only the values "active", "inactive", or "pending".
  - Any other value is counted as a violation in the pipeline metrics; use @dlt.expect_or_drop or @dlt.expect_or_fail for stricter handling.
5. Combining Multiple Singular Tests
You can also stack multiple expectations on a single table to enforce several data quality rules at once. This is similar to attaching several tests to a single model in dbt.
Example: Combining Singular Tests
import dlt
from pyspark.sql.functions import col

@dlt.table
@dlt.expect("id_not_null", "id IS NOT NULL")                                # Check for nulls in 'id'
@dlt.expect("valid_age_range", "age BETWEEN 18 AND 100")                    # Check that 'age' is within range
@dlt.expect("status_valid", "status IN ('active', 'inactive', 'pending')")  # Check accepted 'status' values
def cleaned_data():
    return (
        dlt.read("raw_data")
        .filter(col("id").isNotNull())
        .dropDuplicates(["id"])
        .filter((col("age") >= 18) & (col("age") <= 100))
    )
- Explanation:
  - Here, three singular tests are stacked on one table:
    - id must not be null.
    - age must fall within the valid range of 18 to 100.
    - status must be one of the accepted values.
  - The uniqueness check from section 2 is kept on its own aggregated check table, because expectation constraints are evaluated per row.
With plain @dlt.expect, a violated rule is recorded under its name in the pipeline's data quality metrics while processing continues; switch any of these decorators to @dlt.expect_or_fail if a violation should stop the update and surface an error naming the failed expectation. If you prefer to group several rules into a single decorator, @dlt.expect_all accepts a dictionary of name/constraint pairs, as sketched below.
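A minimal sketch of the same row-level rules grouped with @dlt.expect_all (the table and column names are the illustrative ones used throughout this post):
import dlt

# Group several row-level rules into one decorator; each rule is
# reported under its own name in the pipeline's data quality metrics.
@dlt.table
@dlt.expect_all({
    "id_not_null": "id IS NOT NULL",
    "valid_age_range": "age BETWEEN 18 AND 100",
    "status_valid": "status IN ('active', 'inactive', 'pending')",
})
def cleaned_data():
    return dlt.read("raw_data")
There are matching variants (@dlt.expect_all_or_drop and @dlt.expect_all_or_fail) if the grouped rules should drop offending rows or fail the update.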
Handling Test Failures in DLT
- When an expectation defined with @dlt.expect is violated, the pipeline keeps running; the violation is counted in the pipeline's event log and surfaced in the Databricks UI.
- The @dlt.expect_or_drop variant drops the offending rows, and @dlt.expect_or_fail stops the update with an error naming the failed expectation, which is the closest match to a failing dbt test.
- You can use the Databricks UI or API to monitor pipeline runs and see which expectations were violated during execution.
If you want to capture errors or take action after a failure, you can also implement custom handling inside your pipeline logic; a short sketch of the three expectation severities follows.
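A minimal side-by-side sketch of the three severities, using the same illustrative columns (the table name cleaned_data_with_policies is hypothetical):
import dlt

# Same decorator family, three behaviors when a row violates the constraint:
@dlt.table
@dlt.expect("age_known", "age IS NOT NULL")                        # record the violation, keep the row
@dlt.expect_or_drop("valid_age_range", "age BETWEEN 18 AND 100")   # drop offending rows
@dlt.expect_or_fail("id_not_null", "id IS NOT NULL")               # fail the pipeline update
def cleaned_data_with_policies():
    return dlt.read("raw_data")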
Summary: Singular Tests in Databricks DLT
In Databricks Delta Live Tables (DLT), you can implement singular tests using the @dlt.expect decorator to enforce data quality checks on specific columns, similar to dbt's singular tests.
- Not Null: Check if a column contains null values.
- Unique: Ensure there are no duplicate values in a column.
- Range Validity: Check if values in a column fall within a defined range.
- Accepted Values: Ensure that column values belong to a predefined set of valid values.
These singular tests can be combined to apply multiple checks on a single table or column, and you choose per rule whether a violation is only recorded, drops the offending rows, or fails the pipeline.
By using @dlt.expect, you can enforce data quality rules within your DLT pipeline, ensuring that your data is validated and cleaned before further processing.