When you work with data (or “in data”), you come across inefficiencies and process flaws that are hard to overlook: bad data. Sometimes, though, it’s hard to put those problems into words and present a case to leadership or decision-makers. In this article, we try to do exactly that.
If you are responsible for data or are trying to build a data quality management and data governance program in your organization, this guide will explain bad data quality, its pains and issues, and how it affects business initiatives.
Below are our top causes and consequences of bad data quality. If any of these problems sound familiar, it might be time to start building or improving your data management program.
If you already know the causes and consequences of bad data quality, you can download our complete data quality framework: a comprehensive guide to delivering value quickly and building your data quality program iteratively.
What is bad data quality?
Bad data quality refers to data held by your organization that is inaccurate, inconsistent, incomplete, or outdated. Essentially, it’s data that does not reflect reality. It can result from missing values, duplicate records, discrepancies between data sources, and many other causes. Ultimately, this bad data can hurt your business by slowing down processes and diminishing the outcomes of any data-driven initiative.
What causes bad data quality?
Bad data quality can stem from various sources, but several key culprits consistently undermine the reliability and accuracy of data. Here are our top 5 causes of bad data quality:
1 Unclear ownership of data
Data ownership—as in assigning an owner to a data domain or data source—is critical for two reasons:
- Without clearly defined ownership, there is no accountability for the data that is produced. As a result, you get bad data quality.
- If anything goes wrong, or a change is about to happen that might affect a particular data source, it’s also unclear who to contact to fix the bad data quality problem.
The lack of documented data owners also makes it hard to implement data improvement initiatives or create self-service access to data. Ownership matters because it imposes accountability for data quality.
2 Siloed operations
Teams often live in silos. Business teams do not communicate effectively with each other, technical teams work in isolation, and business and technical teams are disconnected, all of which leads to data silos. Teams may cooperate on a one-off basis, but the outcomes they produce are not shared with others. In the long term, the same issues resurface on new projects, and if teams don’t communicate their expectations of data, the issues won’t fix themselves.
Data scientists spend 50-80% of their time collecting and cleansing data held by different teams in different locations.
When data owners, data stewards, data engineers, and data scientists collaborate, they can minimize bad data quality by creating standards and expectations for critical data assets used for models and reporting.
3 No data quality program in place
You can’t conquer bad data quality without looking at it strategically. Data quality needs to become an enterprise-wide program, with shared tools, data quality rules, enablement, and reporting both on data quality metrics and the impact of improved data on business initiatives.
Data quality can emerge as a bottom-up push, but it absolutely needs to become a top-down motion, where the whole organization gets it and comes on board.
4 No visibility into the state of data and data flows
Teams don’t have an easy way to understand what data is available and how it flows through IT systems. Difficulty finding and understanding data can lead to bad data quality. For example, a change in a database structure or an API can create downstream issues in the data warehouse, where data lands in an unexpected format. In turn, this causes issues with reporting.
This issue affects data scientists, data engineers, business analysts, business subject matter experts, and IT.
Proper tooling can solve this issue: a data catalog with data lineage and quality capabilities can track and store this information, making it easier to understand.
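A catalog handles this at enterprise scale, but even a simple schema guard in a pipeline illustrates the principle: catch the upstream change before data lands in an unexpected format. Here is a minimal sketch in Python; the expected schema and field names are hypothetical illustrations, not a specific product’s API.

```python
# A minimal sketch of a schema guard that catches an upstream change
# before it corrupts downstream tables. The expected schema and field
# names are hypothetical.
EXPECTED_SCHEMA = {
    "customer_id": int,
    "email": str,
    "signup_date": str,  # e.g. ISO 8601: "2024-01-31"
}

def validate_record(record: dict) -> list[str]:
    """Return human-readable schema violations for one record."""
    problems = []
    for field, expected_type in EXPECTED_SCHEMA.items():
        if field not in record:
            problems.append(f"missing field: {field}")
        elif not isinstance(record[field], expected_type):
            problems.append(
                f"{field}: expected {expected_type.__name__}, "
                f"got {type(record[field]).__name__}"
            )
    return problems

# Quarantine records that fail instead of letting them land in the
# warehouse in an unexpected format.
print(validate_record({"customer_id": "42", "email": "a@b.com"}))
# -> ['customer_id: expected int, got str', 'missing field: signup_date']
```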
5 Manual data management processes
Every organization that generates or collects data manages it either manually or automatically. The following data processes are big time-wasters when done manually:
- Data collection
- Issue correction
- Data classification
- Data validation
- Data cleansing
Organizations without dedicated data quality tools, controls, and workflows end up repeating these processes by hand. This drains budgets and leaves more room for human error, and with it a higher chance of bad data quality.
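Even a small script can take some of these chores off the list. Here is a minimal sketch, using pandas, of automating the cleansing, deduplication, and validation steps above; the column names and rules are hypothetical.

```python
# A minimal sketch of automating cleansing, deduplication, and validation
# with pandas. The column names and rules are hypothetical.
import pandas as pd

df = pd.DataFrame({
    "customer_id": [1, 2, 2, 3],
    "email": [" A@Example.com ", "b@example.com", "b@example.com", None],
})

# Cleansing: normalize emails so " A@Example.com " and "a@example.com" match.
df["email"] = df["email"].str.strip().str.lower()

# Deduplication: keep one row per customer_id.
df = df.drop_duplicates(subset="customer_id", keep="first")

# Validation: route records with missing critical fields to a review queue.
needs_review = df[df["email"].isna()]
print(f"{len(needs_review)} record(s) need manual review")  # -> 1
```

Once these steps live in code rather than in someone’s head, they run the same way every time, and human error stops compounding with each repetition.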
What are the consequences of bad data quality?
If bad data quality goes unaddressed, it can cause growing problems for your business, both now and in the future. Here are a few significant consequences of poor data quality:
1 The data engineering team is flooded with requests to fix data
Data engineers are usually responsible for fixing data issues in organizations with complex data pipelines. They have to do this repeatedly, and it often takes a lot of time to find the root cause of the issue. While they hunt for the answer, more issues can build up, causing poor overall data quality for the business.
Fixing all these issues leaves data engineers with less time to code and maintain data quality checks. This process scales poorly: problems can grow exponentially as the workload piles up.
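One way to break the loop is to codify recurring fixes as reusable, automated checks, so each issue is caught once instead of re-investigated on every run. Here is a minimal sketch; the rule names and fields are illustrative assumptions.

```python
# A minimal sketch of codifying recurring fixes as reusable checks.
# The rule names, fields, and messages are illustrative assumptions.
from typing import Callable

Check = Callable[[list[dict]], str | None]

def no_null_ids(rows: list[dict]) -> str | None:
    """Fail if any row is missing its order_id (a hypothetical key field)."""
    nulls = sum(1 for r in rows if r.get("order_id") is None)
    return f"{nulls} row(s) with null order_id" if nulls else None

def not_empty(rows: list[dict]) -> str | None:
    """Fail if the extract produced no rows at all."""
    return "table is empty" if not rows else None

CHECKS: list[Check] = [no_null_ids, not_empty]

def run_checks(rows: list[dict]) -> list[str]:
    """Run every registered check and return failures for alerting."""
    return [msg for check in CHECKS if (msg := check(rows)) is not None]

print(run_checks([{"order_id": None}, {"order_id": 7}]))
# -> ['1 row(s) with null order_id']
```

Registering checks in one place means adding a new rule is a one-function change, which is exactly what makes the process scale as pipelines multiply.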
2 Data-dependent teams don’t trust the data
It’s hard to be data-driven when you don’t trust your data. If your teams believe you have bad data quality, they won’t be able to do their work without double- and triple-checking results. Here are the signs that you have data trust issues:
- Business leaders don’t trust the reports
- Data scientists spend too much time validating and cleansing data
- Product teams are reluctant to use data for decision-making on creating new products
- Teams are reluctant to use data from other business units
3 Long lead time for deriving value from data
If it takes you weeks or months to access data and finally create reports, something needs to change. Unfortunately, organizations with low data maturity operate in this fashion. When someone needs access to data, they go through a convoluted process of figuring out where that data might be stored and who owns it. Then, they wait for approval to export that data. When they finally do, they notice the bad data quality, and they either try a new source or try to fix it, whether on their own or by involving someone technical.
By the time they’re finished, they have done a lot of busy work and probably used up a lot of other people’s time. This complex issue stems from the lack of the right systems, data governance processes, and tools to manage data.
4 Your M&As didn’t go as intended
Mergers and acquisitions are data-intensive activities, and 70-90% of them fail, with integration being one of the top reasons.
Indeed, M&As achieve little without integrating systems and data. That is why master data management best practices are so important.
If your company has been through one or more M&As, watch for the following signs of a suboptimal M&A where poor data quality was likely the issue:
- The integration timeline was extended
- Fewer systems were integrated or migrated than expected
- Organizations use inconsistent business language
- No single view of customers, employees, or other data domains exists
5 AI models have questionable ROI and performance
It’s a cliché at this point, but “garbage in, garbage out” holds true for ML models. Data quality is one of the top factors influencing model performance, deployment speed, and long-term reliability. Top performers in AI generate up to 20% of their EBIT from AI models, but getting to that level requires solid investment in data management foundations:
- Data & AI governance
- Data quality automation
- Monitoring models for data drift
If you don’t have these basics in place, you’re likely dealing with one or several of the following issues:
- Frequent reports of data drift and long investigation times, from days to weeks (see the drift-check sketch after this list)
- Consistently fewer models are deployed than expected
- AI projects do not deliver the expected results (model accuracy)
- Your Head of AI or Chief Data Scientist regularly brings up data quality issues
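Drift monitoring, in particular, lends itself to automation. Below is a minimal sketch using the Population Stability Index (PSI), one common way to compare a training baseline with live scoring data; the bin count and the 0.2 alert threshold are rule-of-thumb assumptions, not universal standards.

```python
# A minimal sketch of automated drift monitoring with the Population
# Stability Index (PSI). The 10 bins and the 0.2 alert threshold are
# common rule-of-thumb assumptions.
import math

def psi(baseline: list[float], live: list[float], bins: int = 10) -> float:
    lo, hi = min(baseline), max(baseline)

    def fractions(values: list[float]) -> list[float]:
        counts = [0] * bins
        for v in values:
            # Clamp out-of-range live values into the edge bins.
            idx = int((v - lo) / (hi - lo) * bins) if hi > lo else 0
            counts[min(max(idx, 0), bins - 1)] += 1
        # A small floor avoids log(0) for empty bins.
        return [max(c / len(values), 1e-6) for c in counts]

    base, curr = fractions(baseline), fractions(live)
    return sum((c - b) * math.log(c / b) for b, c in zip(base, curr))

# A scheduled job can compute this per feature and alert when PSI > 0.2,
# turning week-long drift investigations into a routine dashboard check.
score = psi([0.1, 0.2, 0.3, 0.4, 0.5] * 20, [0.5, 0.6, 0.7, 0.8] * 25)
print(f"PSI = {score:.2f}, drift suspected: {score > 0.2}")
```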
6 System modernization projects go over time and budget
Data & system modernization is on the agenda of every data-driven and innovative organization. Modernization projects simplify the IT system landscape and data flows, consolidate billing and operations, and accelerate data-related activities. Some examples are ERP and core systems consolidation, CRM migrations, Customer 360 projects, and data consumption modernization, like moving from on-premises DWH to a data lakehouse architecture in the cloud.
All of these projects depend on the state of your data. When data is not consistent, valid, and accurate, they eventually grind to a halt. You definitely have a larger problem with data if:
- Modernization projects go over time and budget
- Projects are often scrapped or put on hold
7 Reporting is manual, ad hoc, and unreliable
Accurate reporting is the bedrock of any data-driven organization. Companies in regulated industries, such as banking, insurance, and life sciences, must submit regulatory reports to authorities, which sets an even higher standard.
Here are some common pains that these organizations experience when dealing with bad data quality:
- Reporting periods end, and the responsible teams must work overtime to manually compile data for reporting.
- Teams manually aggregate spreadsheets into a data mart (see the sketch after this list).
- Authorities reject reports, and teams have to fix data issues manually and prepare the reports again.
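Much of this overtime can be scripted away. Here is a minimal sketch of replacing manual spreadsheet aggregation with a repeatable job that validates rows before submission; the folder, file, and column names are hypothetical.

```python
# A minimal sketch of replacing manual spreadsheet aggregation with a
# repeatable script. Folder, file, and column names are hypothetical.
from pathlib import Path
import pandas as pd

frames = []
for path in Path("reports").glob("*.xlsx"):
    frame = pd.read_excel(path)        # reading .xlsx requires openpyxl
    frame["source_file"] = path.name   # keep lineage for audit questions
    frames.append(frame)

combined = pd.concat(frames, ignore_index=True)

# Validate once, before submission, instead of reworking rejected reports:
# here, every row must carry an account ID and a reporting period.
invalid = combined[combined["account_id"].isna() | combined["period"].isna()]
invalid.to_csv("rejected_rows.csv", index=False)  # route back to data owners
combined.drop(invalid.index).to_csv("datamart_load.csv", index=False)
```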
8 Customer acquisition and retention metrics are degrading
Customer-centricity is the single most important factor for successful business digitalization. Companies built around and for their customers are 60% more profitable than others. They are also more likely to receive more information from their customers. It’s a virtuous cycle.
However, what happens when bad data quality causes customer data to degrade over time? Here is a non-exhaustive list of signs that your customer data needs attention:
- Declining marketing ROI.
- Lack of agility and long lead times to prepare data for marketing campaigns.
- Marketing leadership questioning reporting & analytics.
- Frequent customer complaints about preferred methods of communication.
- Delayed billing & reconciliation.
Overcome bad data quality today with our DQ framework
Bad data quality isn’t just a nuisance. It’s a far-reaching issue that can significantly hinder your data-driven organization’s ability to thrive. From operational inefficiencies and missed opportunities to failed projects and damaged customer relationships, the cost of neglecting data quality can be substantial.
To ensure your organization’s data is a valuable asset rather than a liability, it’s crucial to proactively address data quality issues. Download our free ebook, “The End-to-End Data Quality Framework,” and discover a comprehensive guide for building a robust data quality program.