Meta Data Repository Myths
By David Marco
As the meta data repository industry continues to grow, so do the myths and misunderstandings surrounding it. I am fortunate to meet and speak with thousands of meta data professionals every year. Their questions often reveal that a good deal of inaccurate information about meta data is being disseminated. In this month’s column, I address the most common meta data myths and set the record straight.
Myth #1: Meta Data Only Exists In Tools And Is Only Used For Tool Interoperability
Before we dive into this myth, let’s revisit the definition of meta data. Meta data is:
“All physical data (contained in software and other media) and knowledge (contained in employees and various media) from within and outside an organization, containing information about your company’s physical data, industry, technical processes, and business processes.”
This definition of meta data can be summed up in one word: knowledge. Meta data efforts seek to capture electronically the knowledge, both technical and business, that exists in our companies.
When we look at where knowledge exists within our companies, it is clear that the vast majority of it is stored in the minds of the employees (see Figure 1).
Figure 1: Where Is Corporate Knowledge Stored?
With this definition in mind, it is clear that meta data is much more than what exists in a tool. The myth takes a limited view of technical meta data and completely ignores business meta data. Business meta data is every bit as important as technical meta data, and often even more valuable. As IT professionals, we need to learn how to capture, maintain and disseminate business meta data.
I don’t want to create any misunderstandings: software tools store a great deal of very valuable meta data. Extraction, transformation and load (ETL) tools (e.g. Ascential, Informatica), data modeling tools (e.g. ERwin), relational databases (e.g. Oracle, DB2, MS SQL Server) and business intelligence tools (e.g. Business Objects, Cognos) all hold very valuable technical meta data that is commonly stored in a repository.
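As a small illustration of technical meta data held inside a tool, the sketch below reads table and column descriptions from a relational database catalog. It uses Python's built-in sqlite3 module with an in-memory database only because that keeps the example self-contained. The customer table and the extract_technical_metadata helper are hypothetical, and a repository load against Oracle, DB2 or MS SQL Server would query those products' own catalogs instead.

```python
import sqlite3


def extract_technical_metadata(conn):
    """Return one record per column found in the SQLite catalog."""
    records = []
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table_name,) in tables:
        # PRAGMA table_info yields: cid, name, type, notnull, dflt_value, pk
        rows = conn.execute(f'PRAGMA table_info("{table_name}")')
        for _cid, column, data_type, notnull, _default, pk in rows:
            records.append(
                {
                    "table": table_name,
                    "column": column,
                    "data_type": data_type,
                    "nullable": not notnull,
                    "primary_key": bool(pk),
                }
            )
    return records


# Hypothetical source system with a single table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer ("
    "customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL, region TEXT)"
)

metadata = extract_technical_metadata(conn)
for record in metadata:
    print(record)
```

Records like these cover only technical meta data. The business definition of "customer" or "region" would still have to come from the people who know it.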
A related misconception is the belief that meta data’s only purpose is to enable software tools to communicate with one another (tool interoperability). In fact, when we hear about meta data standards (e.g. the Object Management Group’s (OMG) Common Warehouse Metamodel (CWM)), there is often a great deal of focus on tool interoperability. Once again, tool interoperability is important; however, it is only one piece of the complete meta data management pie.
Myth #2: Repositories Always Require A Large IT Development Effort
This misperception is as common for data warehouse initiatives as it is for meta data repository projects. Far too often, corporations approach meta data repository initiatives by thinking about what they “can” capture as opposed to what they “should” capture. This approach is as misguided as building a data warehouse by asking what data “can” we capture. Instead, the company must first decide which business and technical objectives it wants to accomplish by building an enterprise meta data repository. It should then take a subset of these objectives and build the repository iteratively to accomplish them.
A good meta data repository is best built iteratively. Do not misunderstand: this is not advice against building a fully functional, enterprise-wide meta data repository that supports all of a company’s systems. It simply means that the highest probability of success comes from implementing the repository in phases instead of trying to “boil the ocean.” For our repository clients, I always target project cycles of six to nine months. Using the first iteration as an opportunity to train the corporation sets the stage for bigger and better future implementations.
Myth #3: A Centralized Architecture Stores All Meta Data Centrally
There is a belief that the term “central meta data repository” means that all meta data should be stored centrally. This is the equivalent of believing that a data warehouse should store all of a corporation’s data, and it couldn’t be further from the truth. Only meta data that is necessary to meet the defined business requirements, and that it makes sense to centralize, should be stored centrally. For example, business rules and business definitions are critical meta data that should be stored in a centralized fashion. On the other hand, meta data that no requirement calls for can stay in its current source.
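The selection rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the MetaDataItem fields, the kind labels, the sample items and the split_by_requirement helper are all hypothetical, and a real repository would derive the required kinds from its documented business requirements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetaDataItem:
    name: str
    kind: str    # hypothetical label, e.g. "business rule"
    source: str  # tool or system where the item lives today


def split_by_requirement(items, required_kinds):
    """Separate items to centralize from items left in their source."""
    central, left_in_source = [], []
    for item in items:
        if item.kind in required_kinds:
            central.append(item)
        else:
            left_in_source.append(item)
    return central, left_in_source


# Hypothetical inventory of meta data found across the company.
inventory = [
    MetaDataItem("Net revenue calculation", "business rule", "finance team"),
    MetaDataItem("Definition of active customer", "business definition", "marketing"),
    MetaDataItem("Nightly load job trace", "etl log", "ETL tool"),
]

# Only the kinds that a defined requirement calls for are centralized.
required = {"business rule", "business definition"}
central, left_in_source = split_by_requirement(inventory, required)

print([item.name for item in central])
print([item.name for item in left_in_source])
```

The point of the sketch is that the requirement list drives what is centralized. Everything else stays where it already lives.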
Understanding these myths will help you avoid their pitfalls and build a successful meta data repository that provides your company with a competitive advantage in the marketplace.
1. Marco, David, “Building and Managing the Meta Data Repository,” John Wiley & Sons, 2000.