Data Done Right: 6 Dimensions of Data Quality

Improving data quality is about much more than “cleaning out” bad data; it’s a dedicated process that involves considering every aspect of how you collect and manage data. We’ll explain how data quality can degrade over time, and the six dimensions to consider when determining what makes data “bad.”

Bad data quality leads to slow, inaccurate decision-making. The explosive growth of data that comes with modernizing cloud data & analytics only makes the problem worse. To be successful in business, you need to make decisions quickly, based on the right information.

One of the important functions of a data warehouse and an enterprise business intelligence solution is to provide users with a snapshot of their business at any given point in time. This allows decision makers to gain better insight into their business and market so that they can make better decisions, faster. This is the “new normal” in the age of digital transformation, and table stakes for any organization wishing to gain a competitive advantage.

Why is Data Quality Such an Issue?

While a business intelligence system makes it much simpler to analyze and report on the data loaded into a data warehouse, the existence of data alone does not ensure that executives can make decisions smoothly; the quality of the data is equally important.

You can have data without information, but you cannot have information without data.

Daniel Keys Moran, an American computer programmer and science fiction writer

Furthermore, as more and more organizations are going through digital transformations, analytics capabilities are getting more sophisticated. The complexity of data quality management is growing while the risk and impact of bad data quality is also rising.

Consider a high-level meeting to review company performance: if two reports compiled from supposedly the same set of data show two different revenue figures, no one knows which is accurate, and important decisions may be postponed while the “truth” is investigated.

One cause of data quality issues is source data housed in a patchwork of operational systems and enterprise applications. Each of these data sources can contain scattered or misplaced values, outdated and duplicate records, and inconsistent (or undefined) data standards and formats across customers, products, transactions, financials and more.

Data quality problems can also arise when an enterprise consolidates data during a merger or acquisition. But perhaps the largest contributor to data quality issues is that the data are being entered, edited, maintained, manipulated and reported on by people.

These problems are only exacerbated in modern data & analytics architectures in the cloud. Many include data lakes or other raw-data ingestion pipelines whose primary purpose is speed, not necessarily quality. Other use cases, such as streaming and IoT, introduce new patterns that legacy data quality tools and processes may not cover.

To maintain the accuracy and value of the business-critical operational information that drives strategic decision-making, businesses should implement a data quality strategy that embeds data quality techniques into business processes, enterprise applications and data integration. They should also consider approaches that address quality downstream in their modern data & analytics architectures.

On the surface, it is obvious that data quality is about cleaning up bad data – data that are missing, incorrect or invalid in some way. But in order to ensure data are trustworthy, it is important to understand the key dimensions of data quality to assess how the data are “bad” in the first place.


The 6 dimensions of data quality are: Completeness, Consistency, Conformity, Accuracy, Integrity and Timeliness. Let’s look at how to address each of them.

6 Dimensions of Data Quality

#1: Completeness

Completeness is defined as expected comprehensiveness. Data can be complete even if optional data is missing; as long as the data meets expectations, it is considered complete.

For example, a customer’s first name and last name are mandatory, but the middle name is optional; a record can be considered complete even if a middle name is not available.

Questions you can ask yourself:

Is all the requisite information available?

Do any data values have missing elements? Or are they in an unusable state?
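To make this concrete, here is a minimal sketch of a completeness check, using pandas and a hypothetical customer extract in which only first_name and last_name are mandatory:

```python
import pandas as pd

# Hypothetical customer extract: first_name and last_name are mandatory,
# middle_name is optional, so missing middle names do not count against completeness.
customers = pd.DataFrame({
    "first_name":  ["Ana", "Raj", None],
    "middle_name": [None, "K.", None],
    "last_name":   ["Lopez", "Patel", "Chen"],
})

mandatory = ["first_name", "last_name"]

# A record is complete when every mandatory field is populated.
incomplete = customers[customers[mandatory].isna().any(axis=1)]
print(f"{len(incomplete)} of {len(customers)} records fail the completeness check")
```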

#2: Consistency

Consistency means that data across all systems reflects the same information and stays in sync across the enterprise. Examples of inconsistency:

  • A business unit status is closed but there are sales for that business unit.

  • Employee status is terminated but pay status is active.

Questions you can ask yourself:

Are data values the same across the data sets?

Are there any distinct occurrences of the same data instances that provide conflicting information?
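A consistency check typically compares values across systems or tables. Here is a minimal sketch, assuming pandas and hypothetical business unit and sales extracts, that flags sales recorded against closed business units:

```python
import pandas as pd

# Hypothetical extracts from two systems that are expected to agree.
business_units = pd.DataFrame({
    "unit_id": [10, 20, 30],
    "status":  ["open", "closed", "open"],
})
sales = pd.DataFrame({
    "unit_id": [10, 20, 20],
    "amount":  [1200.0, 310.0, 95.0],
})

# Flag sales recorded against business units that are marked closed.
merged = sales.merge(business_units, on="unit_id", how="left")
inconsistent = merged[merged["status"] == "closed"]
print(inconsistent)
```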

#3: Conformity

Conformity means the data follows a set of standard data definitions, such as data type, size and format. For example, a customer’s date of birth follows the format “mm/dd/yyyy”.

Questions you can ask yourself:

Do data values comply with the specified formats?

If so, do all the data values comply with those formats?

Maintaining conformance to specific formats is important.
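As an illustration, here is a minimal sketch of a conformity check on the “mm/dd/yyyy” rule, using pandas and a hypothetical customer table; it verifies both the pattern and that the value parses as a real date:

```python
import pandas as pd

# Hypothetical customer extract where date_of_birth must follow mm/dd/yyyy.
customers = pd.DataFrame({
    "customer_id":   [1, 2, 3],
    "date_of_birth": ["04/15/1988", "1990-07-02", "13/40/2001"],
})

# Check the format pattern and that the value is a real calendar date.
pattern = r"^(0[1-9]|1[0-2])/(0[1-9]|[12]\d|3[01])/\d{4}$"
matches_format = customers["date_of_birth"].str.match(pattern)
parses = pd.to_datetime(
    customers["date_of_birth"], format="%m/%d/%Y", errors="coerce"
).notna()

nonconforming = customers[~(matches_format & parses)]
print(nonconforming)
```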

#4: Accuracy

Accuracy is the degree to which data correctly reflects the real-world object or event being described. Examples:

  • The sales recorded for a business unit reflect the amounts actually sold.
  • The address of an employee in the employee database is that employee’s real address.

Questions you can ask yourself:

Do data objects accurately represent the “real world” values they are expected to model?

Are there incorrect spellings of product or person names, inaccurate addresses, or data that is untimely or not current?

These issues can impact operational and advanced analytics applications.
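Accuracy usually has to be verified against an authoritative source. As a sketch, assuming pandas and a hypothetical reference extract (for example, from a validated address service), recorded values can be compared against reference values:

```python
import pandas as pd

# Hypothetical comparison of recorded employee addresses against a trusted reference.
employees = pd.DataFrame({
    "employee_id": [1, 2],
    "address": ["500 Main St, Houston TX", "12 Oak Ave, Dallas TX"],
})
reference = pd.DataFrame({
    "employee_id": [1, 2],
    "address": ["500 Main St, Houston TX", "14 Oak Ave, Dallas TX"],
})

# Records whose stored address differs from the reference are flagged as inaccurate.
compared = employees.merge(reference, on="employee_id", suffixes=("_recorded", "_reference"))
inaccurate = compared[compared["address_recorded"] != compared["address_reference"]]
print(inaccurate)
```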

As a general rule, the most successful man in life is the man who has the best information.

Benjamin Disraeli (1804-1881), 1st Earl of Beaconsfield, British Prime Minister, and Parliamentarian

#5: Integrity

Integrity means the validity of data across relationships and ensures that all data in a database can be traced and connected to related data.

For example, in a customer database there should be valid customers, valid addresses and valid relationships between them. If an address record exists without a corresponding customer, that record is invalid and is considered an orphaned record.

Ask yourself:

Are there any data missing important relationship linkages?

The inability to link related records together may actually introduce duplication across your systems.
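An integrity check can look for exactly these orphaned records. Here is a minimal sketch, assuming pandas and hypothetical customer and address tables:

```python
import pandas as pd

# Hypothetical tables: every address should reference an existing customer.
customers = pd.DataFrame({"customer_id": [1, 2, 3]})
addresses = pd.DataFrame({
    "address_id":  [100, 101, 102],
    "customer_id": [1, 2, 99],   # 99 has no matching customer record
})

# Addresses whose customer_id has no match in the customer table are orphaned.
orphaned = addresses[~addresses["customer_id"].isin(customers["customer_id"])]
print(orphaned)
```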

#6: Timeliness

Timeliness refers to whether information is available when it is expected and needed. Timeliness of data is very important. This is reflected in:

  • Companies that are required to publish their quarterly results within a given time frame
  • Customer service providing up-to-date information to customers
  • A credit system checking credit card account activity in real time

Timeliness depends on user expectations: online availability of data may be required for a room allocation system in hospitality, while nightly data may be perfectly acceptable for a billing system.
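A timeliness (freshness) check compares when data was last loaded against the expectation for each use case. Here is a minimal sketch, assuming pandas and a hypothetical load log:

```python
import pandas as pd

# Hypothetical load log with a per-dataset freshness expectation in hours.
load_log = pd.DataFrame({
    "dataset":       ["billing", "room_allocation"],
    "last_loaded":   pd.to_datetime(["2024-04-30 02:10", "2024-05-01 23:55"]),
    "max_age_hours": [24, 1],
})

now = pd.Timestamp("2024-05-02 00:15")

# A dataset is stale when its age exceeds the agreed freshness window.
age_hours = (now - load_log["last_loaded"]).dt.total_seconds() / 3600
stale = load_log[age_hours > load_log["max_age_hours"]]
print(stale)
```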

As you can see, data quality is an important issue that should be considered from initial application design all the way through implementation, maintenance and use. In future blog posts, we will discuss techniques for addressing these types of data quality issues.


Looking for more on data management?

Explore more insights and expertise at Smartbridge.com/data
