Data Quality Dimensions and Implementation

Prevent bad data from propagating downstream. Addressing data quality issues at later stages can be costly and time-consuming.
In certain scenarios, poor data quality may result in reputational damage, increased risk exposure, and regulatory breaches.
Automated data quality checks should be used to abort pipelines when critical issues are detected.

Note: This is a case-by-case decision. While early prevention is appropriate in some scenarios, stopping pipelines for every issue can increase maintenance complexity.
In such cases, allow the issues to flow through, flag them clearly in DQ reporting so they can be fixed later, and keep users aware of any known issues.

These two approaches, stopping pipelines and flagging issues, can be implemented independently within the same pipeline by using appropriate implementation techniques and modern data processing tools.
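As a minimal sketch of how the two approaches can coexist, the Python below gives each check a severity: failed "error" checks abort the run, while failed "warn" checks are returned for DQ reporting. The names CheckResult, enforce, and CriticalDataQualityError, and the example check names, are hypothetical and only illustrate the idea; the error/warn split loosely mirrors the severity levels that dbt tests support.

```python
from dataclasses import dataclass


class CriticalDataQualityError(Exception):
    """Raised to abort the pipeline when a critical check fails."""


@dataclass
class CheckResult:
    name: str
    passed: bool
    severity: str  # "error" aborts the run, "warn" only flags the issue


def enforce(results):
    """Abort on failed 'error' checks; return failed 'warn' checks for reporting."""
    critical = [r.name for r in results if not r.passed and r.severity == "error"]
    if critical:
        raise CriticalDataQualityError(f"Aborting pipeline, failed checks: {critical}")
    return [r.name for r in results if not r.passed and r.severity == "warn"]


# Hypothetical results: one critical check passes, one non-critical check fails.
flagged = enforce([
    CheckResult("worker_id_not_null", passed=True, severity="error"),
    CheckResult("emergency_contact_present", passed=False, severity="warn"),
])
print(flagged)  # ['emergency_contact_present']
```

Keeping the severity on the check itself means the same check can be promoted from flagging to aborting without changing the pipeline code.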

Choosing the right ETL tool plays a key role in data quality because it lets you control how errors are handled, whether that means aborting the pipeline or raising warnings. In our case, we use dbt's built-in tests to control this.
For reporting data quality issues at a granular level, platforms like Snowflake or Databricks make it possible to capture row-level errors in JSON format and store them in a separate field. These errors can then be unpivoted in a downstream view and surfaced in BI dashboards, giving users transparency and the details of each reported error.
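The row-level pattern can be sketched in plain Python, independent of any particular database: each row gets a JSON string of its errors in a separate field, and a later step unpivots that field into one record per error. The field name dq_errors, the two checks, and the sample rows are assumptions made for illustration; in Snowflake or Databricks the same shape would normally be built and flattened in SQL.

```python
import json


def validate_row(row):
    """Return a list of error records for a single row (empty if the row is clean)."""
    errors = []
    if row.get("fte") is None or not 0 <= row["fte"] <= 1:
        errors.append({"check": "fte_range", "dimension": "Accuracy"})
    if not row.get("employment_type"):
        errors.append({"check": "employment_type_present", "dimension": "Completeness"})
    return errors


def tag_rows(rows):
    """Add a 'dq_errors' field holding the row's errors as a JSON string."""
    return [{**row, "dq_errors": json.dumps(validate_row(row))} for row in rows]


def unpivot_errors(tagged_rows, key="worker_id"):
    """Turn the JSON field into one output record per (row, error) pair."""
    return [
        {key: row[key], **error}
        for row in tagged_rows
        for error in json.loads(row["dq_errors"])
    ]


# Hypothetical sample data: the second worker fails both checks.
rows = [
    {"worker_id": 1, "fte": 0.8, "employment_type": "Permanent"},
    {"worker_id": 2, "fte": 1.5, "employment_type": None},
]
tagged = tag_rows(rows)
error_rows = unpivot_errors(tagged)
for record in error_rows:
    print(record)
```

Because clean rows carry an empty list, every row still flows through, and the unpivoted output contains only the rows that need attention.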

For Data Quality (DQ) reporting, the following are required:

Apply data quality checks and measure each data quality dimension.
Report the total number of code bases, including checks applied vs. not applied.
For applied checks, report the count of Pass, Fail, and Not Evaluated/Tested.
Provide reporting broken down by System, Data Product, and Module.
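One possible way to produce the Pass / Fail / Not Evaluated breakdown is sketched below. The record layout, the summarise function, and the data product and module names are hypothetical; in practice these results would come from the test tool's run output.

```python
STATUSES = ("Pass", "Fail", "Not Evaluated")


def summarise(results):
    """Count check statuses per (system, data product, module)."""
    summary = {}
    for r in results:
        key = (r["system"], r["data_product"], r["module"])
        counts = summary.setdefault(key, dict.fromkeys(STATUSES, 0))
        counts[r["status"]] += 1
    return summary


# Hypothetical check results for illustration only.
check_results = [
    {"system": "Workday", "data_product": "Workforce", "module": "Worker", "status": "Pass"},
    {"system": "Workday", "data_product": "Workforce", "module": "Worker", "status": "Fail"},
    {"system": "Datapay", "data_product": "Payroll", "module": "Tax", "status": "Not Evaluated"},
]
summary = summarise(check_results)
for key, counts in summary.items():
    print(key, counts)
```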

DQ Dimension checks and examples

Accuracy

Data accurately represents the real-world value.

Value matches the master data (direct comparison)
Salary amount is correct as per the contract / employee type / band (indirect comparison or sense check)
Age or DOB is within the valid range
FTE is within the valid range, i.e. between 0 and 1
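The range checks above can be sketched as follows. The age bounds of 16 and 100 and the inclusive treatment of 0 and 1 for FTE are assumptions chosen for the example, not rules from any particular system.

```python
from datetime import date

MIN_AGE, MAX_AGE = 16, 100  # assumed bounds; set these from your own policy


def age_on(dob, today):
    """Age in whole years on the given date."""
    had_birthday = (today.month, today.day) >= (dob.month, dob.day)
    return today.year - dob.year - (0 if had_birthday else 1)


def accuracy_issues(worker, today):
    """Return the names of the accuracy sense checks that the worker fails."""
    issues = []
    if not 0 <= worker["fte"] <= 1:
        issues.append("fte_out_of_range")
    if not MIN_AGE <= age_on(worker["dob"], today) <= MAX_AGE:
        issues.append("age_out_of_range")
    return issues


# Hypothetical workers, evaluated against a fixed reference date.
reference_date = date(2024, 6, 1)
print(accuracy_issues({"fte": 0.5, "dob": date(1990, 7, 15)}, reference_date))  # []
print(accuracy_issues({"fte": 1.2, "dob": date(2015, 1, 1)}, reference_date))
```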

Consistency

Data is the same across different systems, databases, or tables.

Employee status in Workday = status in Datapay = status in payroll
Codes are the same across systems, i.e. the values and their meaning match everywhere. Example: cost centre / location / department / division is the same in every system.
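A cross-system comparison like the employee status example could look like this sketch. The input shape (one dictionary of worker ID to status per system) and the function name are assumptions; a worker missing from a system is treated here as a mismatch, which may or may not suit your case.

```python
def inconsistent_workers(status_by_system):
    """Return workers whose status differs between systems.

    status_by_system maps a system name to {worker_id: status}.
    A worker absent from a system shows up with status None.
    """
    worker_ids = set().union(*(records.keys() for records in status_by_system.values()))
    mismatches = {}
    for worker_id in sorted(worker_ids):
        statuses = {
            system: records.get(worker_id)
            for system, records in status_by_system.items()
        }
        if len(set(statuses.values())) > 1:
            mismatches[worker_id] = statuses
    return mismatches


# Hypothetical extracts from two systems.
mismatches = inconsistent_workers({
    "Workday": {1: "Active", 2: "Terminated"},
    "Datapay": {1: "Active", 2: "Active"},
})
print(mismatches)  # {2: {'Workday': 'Terminated', 'Datapay': 'Active'}}
```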

Completeness

Required data is present (Not null / Not missing)

All workers have a work schedule, employment type, contracts, and agreements
Workers have a work address and emergency contact filled in
Workers who are part of payroll have their tax details
These issues can arise for new joiners or because of failures in the data capture process.
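A completeness measure for required fields can be sketched as the share of rows where each field is populated. The field names and the rule that blank strings count as missing are assumptions for illustration.

```python
# Hypothetical required fields, based on the examples above.
REQUIRED_FIELDS = ("work_schedule", "employment_type", "work_address", "emergency_contact")


def is_missing(value):
    """Treat None and blank strings as missing."""
    return value is None or (isinstance(value, str) and not value.strip())


def completeness(rows, required=REQUIRED_FIELDS):
    """Return the fraction of rows with a populated value, per required field."""
    total = len(rows)
    if total == 0:
        return {field: None for field in required}
    return {
        field: sum(not is_missing(row.get(field)) for row in rows) / total
        for field in required
    }


# Hypothetical workers: the second one is a new joiner with gaps.
workers = [
    {"work_schedule": "Mon-Fri", "employment_type": "Permanent",
     "work_address": "1 Main St", "emergency_contact": "J. Smith"},
    {"work_schedule": "Mon-Fri", "employment_type": "",
     "work_address": None, "emergency_contact": "   "},
]
scores = completeness(workers)
print(scores)
```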

Timeliness

Measures whether data is delivered on time and remains up to date when needed. Freshness checks and pipeline monitoring provide early warning of delays and availability issues, so teams can be notified promptly.

Make sure the report displays the extract date and time, the sources used, and their freshness
Analysing trends in timeliness data quality tests, such as source freshness, over time helps identify how often data falls out of sync and impacts reporting.
It also reveals whether events occurred in the correct order, for example by highlighting cases where an action was recorded before its prerequisite process was completed.
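Both timeliness ideas, source freshness and event ordering, can be sketched with simple timestamp comparisons. The 24-hour threshold, the function names, and the event names contract_signed and first_payroll_run are hypothetical.

```python
from datetime import datetime, timedelta


def is_fresh(last_loaded_at, now, max_age=timedelta(hours=24)):
    """True if the source was loaded within the allowed age (assumed 24 hours)."""
    return now - last_loaded_at <= max_age


def recorded_before_prerequisite(events, prerequisite, action):
    """True if the action's timestamp is earlier than its prerequisite's."""
    return events[action] < events[prerequisite]


# Hypothetical timestamps for illustration.
now = datetime(2024, 6, 2, 9, 0)
last_loaded_at = datetime(2024, 6, 1, 6, 0)  # 27 hours before 'now'
fresh = is_fresh(last_loaded_at, now)

events = {
    "contract_signed": datetime(2024, 5, 10),
    "first_payroll_run": datetime(2024, 5, 1),
}
out_of_order = recorded_before_prerequisite(events, "contract_signed", "first_payroll_run")
print(fresh, out_of_order)  # False True
```

Storing these results per run, rather than only alerting on them, is what makes the trend analysis described above possible.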

Validity & Conformity

Data conforms to defined rules, formats, and standards, and is valid in the domain context.

Email address is in a valid format
Data is in the valid format
As with accuracy, checks that data is within a valid range can be reported under this DQ dimension
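A format check and a domain check can be sketched as below. The email pattern is deliberately simple (one @, no whitespace, a dot in the domain) and is not a full RFC 5322 validator; the allowed employment types are hypothetical values.

```python
import re

# Simple shape check: something@something.something, with no spaces or extra '@'.
EMAIL_PATTERN = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")

# Hypothetical domain values; replace with the organisation's reference data.
ALLOWED_EMPLOYMENT_TYPES = {"Permanent", "Fixed Term", "Casual"}


def is_valid_email(value):
    """True if the value is a string matching the simple email shape."""
    return isinstance(value, str) and EMAIL_PATTERN.fullmatch(value) is not None


def is_valid_employment_type(value):
    """True if the value is one of the allowed domain values."""
    return value in ALLOWED_EMPLOYMENT_TYPES


print(is_valid_email("jane.doe@example.com"))   # True
print(is_valid_email("jane.doe@example"))       # False
print(is_valid_employment_type("Contractor"))   # False
```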

Uniqueness & Integrity Checks

Uniqueness checks cover duplicate tests. Prevent duplicate records from entering the database due to:

Issues in the source system
Recent changes in business processes
Incorrect code updates in recent releases
Duplicate loads during ETL or data migration

Detect and flag existing duplicates in datasets to maintain accurate reporting and analytics.
Ensure primary and composite keys are consistently enforced across systems.
Enable automated deduplication checks or reconciliation processes where appropriate.
Improve decision-making by maintaining a single version of truth for each entity.
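Detecting duplicates on a primary or composite key can be sketched with a counter over the key values. The key fields worker_id and effective_date and the sample rows are assumptions for the example.

```python
from collections import Counter


def duplicate_keys(rows, key_fields):
    """Return the key values that appear on more than one row."""
    counts = Counter(tuple(row[field] for field in key_fields) for row in rows)
    return sorted(key for key, n in counts.items() if n > 1)


# Hypothetical rows: worker 1 was loaded twice for the same effective date.
rows = [
    {"worker_id": 1, "effective_date": "2024-01-01"},
    {"worker_id": 1, "effective_date": "2024-01-01"},
    {"worker_id": 2, "effective_date": "2024-01-01"},
]
duplicates = duplicate_keys(rows, ("worker_id", "effective_date"))
print(duplicates)  # [(1, '2024-01-01')]
```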

Integrity checks ensure that the relationships between tables are valid.
Example: every manager ID exists in the worker table.
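The manager ID example amounts to a referential integrity check, sketched here over a single worker list. The field names and sample data are hypothetical, and a missing manager ID (None) is assumed to be allowed, for instance for the top of the hierarchy.

```python
def orphan_manager_ids(workers):
    """Return manager IDs that do not match any worker ID."""
    worker_ids = {w["worker_id"] for w in workers}
    manager_ids = {w["manager_id"] for w in workers if w["manager_id"] is not None}
    return sorted(manager_ids - worker_ids)


# Hypothetical workers: manager 99 does not exist in the worker table.
workers_table = [
    {"worker_id": 1, "manager_id": None},
    {"worker_id": 2, "manager_id": 1},
    {"worker_id": 3, "manager_id": 99},
]
orphans = orphan_manager_ids(workers_table)
print(orphans)  # [99]
```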

Summary

Most teams are familiar with data quality dimensions, but the real impact comes from how effectively they are implemented and used in practice. This determines an organisation’s maturity in managing data quality.