August 31, 2021
Jeff George, Senior Director of Technology
Like Cybersecurity, Data Governance is easy to overlook; it’s a lot of work to do it right, and most of the time, it doesn’t seem like a pressing issue. But – like Cybersecurity – ignoring the risk can seriously damage your business.
As organizations increasingly rely on data to make critical decisions, artifacts and practices that used to be nice-to-haves, like data dictionaries, a consistent naming strategy for fields and tables, a taxonomy of values, business rule documentation, and audits, are now essential. If you’re not addressing those most basic issues, you’re putting your business at risk.
Capturing, storing, and managing data is a necessary competency for modern businesses. Most businesses have gotten very good at leveraging modern tools to manage data streams that would have easily overwhelmed the technology from just a few years ago. But in our zeal to collect the data, we far too often overlook the meaning of data.
We all know the “Three V’s” of Big Data: Velocity, Volume, and Variety. Big data tools like Hadoop and Spark, coupled with the immense scalability of cloud technologies, can handle rapidly streaming data (velocity), vast amounts of data (volume), and many kinds of data (variety). These “Three V’s” are the classic challenges that big data tools solve. But there’s another V that can wreck all of that great work: Vagueness.
Metadata – data about the data – is a crucial component to making data usable, and without it, we introduce inefficiency and risk into every analysis. At a minimum, a lack of clarity around the meaning of data makes your technology and analytics teams inefficient. Digging around to uncover institutional knowledge, cleaning up erratic values, and keeping track of inconsistent names take a toll on some of your most highly skilled resources.
Worse, a lack of clarity around data’s purpose, use, and limitations can result in bad decisions. It’s pretty easy to recognize bad data, but misunderstood data is far more insidious. Using the right data in the wrong way is harder to catch and can have a profound impact on the quality of an analysis and, ultimately, a decision.
So what are the signals that we have a Vagueness problem? First, talk to your stakeholders, and be open to their responses. Now is not the time to explain why it’s challenging; it’s the time to confront the brutal facts. Listen for phrases like “I think,” “usually,” and my personal favorite, “that’s how we’ve always done it.” If your users can’t point to a source of truth, or worse, if they’re pointing to different sources of truth, you have a problem.
Launching a data governance program would require a whole article (or three) by itself, but you can start with the basics:
· Begin with your stakeholders. What are their pain points? Get a high-level understanding of the data used and produced across your organization, which data is essential, and where the biggest data challenges arise.
· Establish data stewards for your most important data. Data stewards are subject matter experts (SMEs), and they’re responsible for maintaining the quality of the data they own (they usually come from the department that produces or consumes the data, not necessarily from IT).
· Establish your standards: What data do you collect, and when, why, and how do you collect it? Consider your company’s policies, industry norms, and regulations.
· Build a data dictionary to document your core tables and fields. Tables need to map to logical entities, and fields need clear definitions of their business use and what the various values mean.
· Use a risk-value matrix to figure out where to start. Consider the value and risk of various data sources. Start with the high-value/high-risk sources and save the low-value/low-risk ones for the end.
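To make the last two items concrete, here is a minimal Python sketch of a data dictionary entry and a risk-value ranking. Everything in it is an assumption made for illustration: the class and function names, the 1-to-5 scoring scale, the threshold of 3, and the sample sources are hypothetical, and your own scores should come from your stakeholders and data stewards.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class FieldEntry:
    """One data dictionary entry: a field and its business meaning."""
    name: str
    definition: str = ""  # empty means nobody has documented it yet
    value_meanings: Dict[str, str] = field(default_factory=dict)


@dataclass
class DataSource:
    """A table or feed, scored for the risk-value matrix (hypothetical scale)."""
    name: str
    steward: str  # the SME accountable for this data
    value: int    # 1 (low) to 5 (high)
    risk: int     # 1 (low) to 5 (high)
    fields: List[FieldEntry] = field(default_factory=list)


def quadrant(source: DataSource, threshold: int = 3) -> str:
    """Label the matrix quadrant a source falls into."""
    value = "high-value" if source.value >= threshold else "low-value"
    risk = "high-risk" if source.risk >= threshold else "low-risk"
    return f"{value}/{risk}"


def prioritize(sources: List[DataSource], threshold: int = 3) -> List[DataSource]:
    """Order sources so high-value/high-risk comes first and low/low comes last."""
    def sort_key(source: DataSource):
        highs = int(source.value >= threshold) + int(source.risk >= threshold)
        # More "high" ratings first, then larger combined score, then name.
        return (-highs, -(source.value + source.risk), source.name)
    return sorted(sources, key=sort_key)


def undocumented_fields(source: DataSource) -> List[str]:
    """Names of fields that still lack a business definition."""
    return [entry.name for entry in source.fields if not entry.definition.strip()]


# Made-up sample sources, purely for illustration.
sources = [
    DataSource("office_seating_chart", "Facilities", value=1, risk=1),
    DataSource("web_clickstream", "Marketing", value=4, risk=2),
    DataSource("customer_orders", "Sales Ops", value=5, risk=5, fields=[
        FieldEntry("order_status", "Current fulfillment state of the order",
                   {"P": "Pending", "S": "Shipped"}),
        FieldEntry("cust_seg"),  # no definition yet: a Vagueness risk
    ]),
]

for source in prioritize(sources):
    print(source.name, quadrant(source), "steward:", source.steward,
          "undocumented:", undocumented_fields(source))
```

A spreadsheet would serve just as well as code here. The point is that every source gets an owner and a score, and every field gets a written definition that someone can be held to.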
Getting the basics right is an important start, so don’t let the lack of a complete plan stop you from making progress. You may not be able to solve the whole problem, but in most cases, you can move the needle with small investments and a little bit (OK, a lot) of discipline.