Deborah Poindexter: Meta Data Driven Enterprise Data Management – Future or Fantasy?
By Deborah Poindexter
Is your data out of control? Is your enterprise data management fragmented and inconsistent? Are you unable to answer these questions? If the answer to any of these questions is “Yes”, then Enterprise Data Management may be a discipline you should investigate. Enterprise Data Management (EDM) is continually gaining acceptance as a critical function of IT, since data is the foundation of all business decisions. Data must be carefully managed and controlled to achieve its full usefulness and value to the organization, and to allow sound business decisions to be made and refined. EDM comprises several components: Data Quality, Data Governance, Master Data Management, and a Managed Meta Data Environment. Typically we manage each of these components separately with little or no overlap. Sometimes they are managed by many organizations using many disparate tools. But does this approach really accomplish what we are all looking for: consistent, high quality, dependable, interoperable data assets? If the answer is no, we must ask how we can achieve data nirvana with EDM. A meta data driven methodology for EDM allows all components to share information about the data as it moves through its lifecycle, thereby enabling consistency, accountability, and true control of our data assets.
What is a meta data driven EDM? It is simply the centralized management of all meta data to create a semantically-rich, robust, and dynamic meta data interchange. Meta data flows bi-directionally from source to meta data repository and back to source. The managed meta data environment (MME) becomes the origination point for semantic changes, and also the system of record for security, compliance, access, and regulatory policy. And of course, the MME is the single source of truth for all information about data and process.
Let’s examine how a meta data driven approach could affect each component of EDM. To achieve Data Quality, a holistic view of the data is required. Solving a problem in application A can break application B. Having access to the meta data for all the steps in the lifecycle allows the data quality team to easily spot points of failure, origination points, and redundancies. When data quality issues are discovered, the meta data will point us to the appropriate business and IT contacts to begin the process of correcting the data.
Data Governance requires an owner and a steward for every piece of data. Having a named person who is responsible for the care and feeding of the data at any given point in its lifecycle (“steward”) expedites the resolution of data quality issues and change requirements, influences the development of new applications, and can enable reuse of the data. Governance also helps us understand whether the data is enterprise level data, or business unit or subject area specific, knowledge which can be extremely valuable in data warehouse development. Since data stewards understand their data intimately, drive standards and data quality, and generally have a vested interest in “their” data, they can be wonderful champions for a meta data driven approach.
Master Data Management (MDM) is defined as the formulation and implementation of a unified set of principles, processes, and practices, fully supported by a governing body, to provide consistent management for all corporate master data environments. MDM is such a logical way to track, explain, understand, compare, and report on master data that it should be a fundamental practice in all organizations.
For most of us, master data is the beginning of the data lifecycle. A managed meta data environment allows us to understand redundant master data (sounds like an oxymoron, doesn’t it?) and whether we can clean it and reuse it. Once again, the data steward plays a critical role in this process. Nowhere in the data lifecycle is impact analysis more critical than in targeting master data to be managed as part of the enterprise’s key assets. Complete, accurate, and reportable meta data easily reveals the impact of changes to master data on ALL systems and their owners, which means no nasty surprises in data usage at any point in the lifecycle.
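To make the idea of impact analysis concrete, here is a minimal sketch in Python. It assumes the MME can expose lineage as a simple "who consumes what" mapping. The system names, the owner labels, and the functions `impacted_systems` and `impact_report` are all hypothetical and invented for illustration; they do not describe any particular repository product.

```python
from collections import deque

# Hypothetical lineage: each key is a data source, each value lists the
# systems that consume it directly. All names are illustrative.
LINEAGE = {
    "customer_master": ["billing_oltp", "crm_oltp"],
    "billing_oltp": ["finance_warehouse"],
    "crm_oltp": ["finance_warehouse", "marketing_mart"],
    "finance_warehouse": ["executive_dashboard"],
}

# Hypothetical owner recorded in the MME for each system.
OWNERS = {
    "billing_oltp": "billing team",
    "crm_oltp": "sales operations",
    "finance_warehouse": "finance data steward",
    "marketing_mart": "marketing analytics",
    "executive_dashboard": "BI team",
}


def impacted_systems(changed, lineage=LINEAGE):
    """Return every system downstream of `changed`, in breadth-first order."""
    visited = {changed}
    ordered = []
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for consumer in lineage.get(current, []):
            if consumer not in visited:
                visited.add(consumer)
                ordered.append(consumer)
                queue.append(consumer)
    return ordered


def impact_report(changed, lineage=LINEAGE, owners=OWNERS):
    """Pair each impacted system with the owner recorded for it."""
    return [
        (system, owners.get(system, "unassigned"))
        for system in impacted_systems(changed, lineage)
    ]


if __name__ == "__main__":
    for system, owner in impact_report("customer_master"):
        print(f"{system}: notify {owner}")
```

A breadth-first walk lists direct consumers before indirect ones, which roughly matches the order in which a change to master data would be felt. A real lineage graph would also carry transformation rules on each edge.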
While we certainly capture information about data quality, data governance and master data in our Managed Meta Data Environment (MME), isn’t there more to our total data galaxy? What about our OLTP systems, data warehouses, messages, and unstructured data? They should all be part of the MME. An MME allows us to link all data to its stewards, master data sources (if not the original source), and applications which access or update it. An MME shows us the flow of the data into and through our OLTP systems, who is responsible for those systems and the data created and updated therein. It allows us to see these systems as sources and targets, both for other OLTP systems as well as for data warehouse environments. What happens to the data as it flows along its path? Where are our points of failure? What metrics are we capturing? All this information should be part of an MME.
If all these attributes are captured, managed and reported from a managed meta data environment, doesn’t it make sense to make the MME the center of our data universe? That’s what a meta data driven methodology is all about – using the meta data as the starting point for all EDM functions. Imagine a data warehouse front end that simply provides the user with a pick list of meta data objects on which to build a report. Based on the user, our MME would decide which meta data objects to display, which associated objects the user was cleared to see, what the content should contain, what formula to apply to any derived or calculated fields, and how and where to deliver the finished report. All this is meta data. Unfortunately, it is currently captured in many places by many applications and not typically managed as an asset. Having a single source for meta data knowledge is tantamount to having the Rosetta Stone for the organization’s data.
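The pick list described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `MetaObject` class, the numeric clearance levels, the catalog entries, and the formula text are hypothetical, and a real MME would hold far richer security and delivery rules.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MetaObject:
    """A hypothetical meta data object as the MME might describe it."""
    name: str
    min_clearance: int              # lowest clearance level allowed to see it
    formula: Optional[str] = None   # derivation rule for calculated fields


# Illustrative catalog; the names, levels, and formula are assumptions.
CATALOG = [
    MetaObject("customer_name", 1),
    MetaObject("order_total", 1),
    MetaObject("gross_margin", 2, "revenue - cost_of_goods"),
    MetaObject("salary", 3),
]


def pick_list(user_clearance, catalog=CATALOG):
    """Return the names of the objects this user is cleared to see."""
    return [obj.name for obj in catalog if obj.min_clearance <= user_clearance]


def formula_for(name, catalog=CATALOG):
    """Look up the derivation rule recorded for a calculated field, if any."""
    for obj in catalog:
        if obj.name == name:
            return obj.formula
    return None


if __name__ == "__main__":
    print(pick_list(2))
    print(formula_for("gross_margin"))
```

The point of the sketch is that visibility and derivation rules live in the meta data rather than in the reporting tool, so the front end only has to ask.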
Imagine every new development project starting with the MME to determine what data currently exists which can be reused, who owns the data, how comprehensive it is, what other processes affect it, and where it is currently reported. The MME would also determine project scope by performing impact analysis for the entire data lifecycle. It would deliver data models to your data architects, transformation rules to your integration architects, and data quality requirements and metrics to your data quality staff. All from a single point of reference.
Okay, so you agree to the premise that a meta data driven enterprise data management approach is the right thing to do. Can you actually do it with existing tools? Probably not, unfortunately. Current meta data management tools are not able to effectively extract and couple all the meta data needed to produce a robust data lineage, which must include identifying data stewards, transformation/integration rules, processes, metrics, and security. Could you accomplish this with existing technology? Probably, but it would not be a trivial exercise.
So what is the answer to this conundrum? There are many factors which will help you decide which route to take. First, is your organization managing its meta data at the enterprise level currently? If so, you are on the path to success with a meta data driven EDM approach. If not, rally the troops and educate your organization and its management on the critical nature of meta data management. Sarbanes-Oxley can be a good place to start! Management already understands the importance of Sarbanes-Oxley compliance. Meta data management can make gathering the legally required information for SOX a breeze.
The first step in any data related project – or any project actually – is requirements gathering. Really understand the meta data requirements for your organization. Can the meta data driven approach satisfy most of these requirements? If so, you can make a business case to continue your quest. Create a meta data model to help you understand these requirements. In the future, this model may become the basis for extending your current MME.
For some organizations, a large data warehousing project provides a great place to start driving this approach. To successfully build and administer a data warehouse, meta data must be clearly understood and managed. On the sourcing end of the warehouse, meta data describes where the data is extracted from, who owns it, what transformations and mappings must occur to get it into the warehouse, and what cleansing activities apply. Operational meta data is required to properly manage, distribute, and secure the data in the data warehouse. Building or interfacing with the data delivery layer of a data warehouse can be enhanced by well-managed meta data. If you are extracting data from the warehouse into marts, use your MME to capture, track, and report on the destination for each data element and what rules are applied. Operational metrics may also be part of your MME. Obviously, a great deal of meta data is created and should be managed within the data warehouse. Since a data warehouse project is an area that is fairly well understood, unlike some of our legacy OLTP systems, it is often a good starting point for a meta data driven EDM approach.
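As a small illustration of tracking the destination and rule for each data element, the Python sketch below records warehouse-to-mart mappings as plain dictionaries and groups them by element. The field names, table names, and rules are hypothetical, and `destinations_by_element` is an invented helper rather than a feature of any repository tool.

```python
from collections import defaultdict

# Hypothetical warehouse-to-mart mappings captured in the MME.
MAPPINGS = [
    {"element": "customer_id", "source": "warehouse.dim_customer",
     "target": "sales_mart", "rule": "copy unchanged"},
    {"element": "customer_id", "source": "warehouse.dim_customer",
     "target": "finance_mart", "rule": "copy unchanged"},
    {"element": "order_total", "source": "warehouse.fact_order",
     "target": "finance_mart", "rule": "convert to USD"},
]


def destinations_by_element(mappings=MAPPINGS):
    """Group (target, rule) pairs under the data element they describe."""
    report = defaultdict(list)
    for mapping in mappings:
        report[mapping["element"]].append((mapping["target"], mapping["rule"]))
    return dict(report)


if __name__ == "__main__":
    for element, destinations in destinations_by_element().items():
        for target, rule in destinations:
            print(f"{element} -> {target} ({rule})")
```

Even this flat structure answers the question "where does this element go, and what happens to it on the way?", which is the kind of report the paragraph above asks the MME to produce.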
Sound like a big job? It is. Which is why we are all counting on the meta data repository vendors to continue expanding and enhancing their products in the years to come. An MME must provide a way to capture, associate, maintain, navigate, and report on the most valuable asset in your organization – meta data.
In the final analysis, it is obvious that a meta data driven EDM is the best way to care for your data, but few organizations are ready for the cultural and technical challenges it presents. By continuing to grow awareness of the essential value of enterprise data and meta data management, by urging software vendors to embrace this vision, and by continuing to support the principles, processes and practices of good data management, we lay the foundation for future realization of meta data driven enterprise data management.
About the Author
Deborah Poindexter is a principal consultant and methodologist with EWSolutions specializing in meta data management methodology and enterprise data architecture. Previous accomplishments include the development and implementation of an Enterprise Meta Data Management Program for Hewlett Packard, design of a Meta Data driven architecture for HP’s Enterprise Data Warehouse, and the formation and management of several Communities of Practice focused on Meta Data Management and general Data Architecture Standards and Best Practices. She also developed and implemented a Data Architecture training and certification program. Deborah began her career as a COBOL and Assembly language programmer. She quickly determined that data was her true passion and began working as a DBA. Moving into Data Architecture was a natural transition and she began working as a Data Architect in 1991. Deborah has developed and delivered Data Architecture and Meta Data training and certification programs for several clients and regularly speaks at technical conferences. Her hobbies include cooking, throwing pottery and creating stained glass. Deborah can be reached at DPoindexter@EWSolutions.com