Managed Meta Data Environment: A Complete Walkthrough (part 5 of 8)
By David Marco
This article is adapted from the book “Universal Meta Data Models” by David Marco & Michael Jennings, John Wiley & Sons.
Over the last several parts of this series, I presented the first component of a Managed Meta Data Environment (MME), the Meta Data Sourcing Layer. This fifth installment on the six architectural components of a MME will walk through the second and third major components of a MME: the Meta Data Integration Layer and the Meta Data Repository.
Meta Data Integration Layer
The meta data integration layer (Figure 1) takes the various sources of meta data, integrates them, and loads them into the meta data repository. This approach differs slightly from the common techniques used to load data into a data warehouse, as the data warehouse clearly separates the transformation (what we call integration) process from the load process. In a MME these steps are combined because the volume of meta data is not nearly that of data warehousing data. As a general rule, a MME holds between 5 and 20 gigabytes of meta data. However, as MMEs begin to target data audit related meta data, storage can grow into the 20-75 gigabyte range, and over the next few years some MMEs will reach the terabyte range.
Figure 1: Meta Data Integration Layer
The specific steps in this process depend on whether you are building a custom process or using a meta data integration tool to assist your effort. If you decide to use a meta data integration tool, the specific tool you select can also greatly affect this process.
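To make the combined integrate-and-load step concrete, here is a minimal sketch of a custom process, written in Python with the standard sqlite3 module. Everything in it is an assumption made for illustration: the two source extracts, their field names, the entity_meta table and the integrate_and_load function are hypothetical and do not come from any particular tool or from the meta models in the book.

```python
import sqlite3

# Hypothetical extracts from two meta data sources (illustrative field names).
modeling_tool_extract = [
    {"entity": "Customer", "definition": "A party that buys our products"},
]
etl_tool_extract = [
    {"ENTITY_NAME": "customer ", "TARGET_TABLE": "DW_CUSTOMER"},
]


def integrate_and_load(conn, modeling_rows, etl_rows):
    """Integrate both sources on a normalized entity name, then load in the same pass."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS entity_meta ("
        "entity_name TEXT PRIMARY KEY, definition TEXT, target_table TEXT)"
    )
    merged = {}
    # Integration: normalize the key so both sources describe the same entity.
    for row in modeling_rows:
        key = row["entity"].strip().upper()
        merged.setdefault(key, {})["definition"] = row["definition"]
    for row in etl_rows:
        key = row["ENTITY_NAME"].strip().upper()
        merged.setdefault(key, {})["target_table"] = row["TARGET_TABLE"]
    # Load: write the integrated rows to the repository in one transaction.
    with conn:
        conn.executemany(
            "INSERT OR REPLACE INTO entity_meta VALUES (?, ?, ?)",
            [(k, v.get("definition"), v.get("target_table")) for k, v in merged.items()],
        )
    return len(merged)


conn = sqlite3.connect(":memory:")  # stand-in for the meta data repository
loaded = integrate_and_load(conn, modeling_tool_extract, etl_tool_extract)
rows = conn.execute("SELECT * FROM entity_meta").fetchall()
```

Because the volumes involved are small, the sketch holds the merged result in memory and writes it in a single transaction instead of staging it between a separate transformation job and a load job.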
Meta Data Repository
A meta data repository is a fancy name for a database designed to gather, retain, and disseminate meta data. The meta data repository (Figure 2) is responsible for the cataloging and persistent physical storage of the meta data.
Figure 2: Meta Data Repository
The Meta Data Repository should be generic, integrated, current and historical. Generic means that the physical meta model stores meta data by meta data subject area rather than in an application-specific way. For example, a generic meta model will have an attribute named “DATABASE_PHYS_NAME” that will hold the physical database names within the company. A meta model that is application-specific would name this same attribute “ORACLE_PHYS_NAME”. The problem with application-specific meta models is that meta data subject areas change. To return to our example, today Oracle may be our company’s database standard. Tomorrow we may switch the standard to SQL Server for cost or compatibility advantages. With an application-specific meta model, this switch would force needless changes to the physical meta model.1
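The difference can be sketched in a few lines of Python using the standard sqlite3 module. Only the attribute name DATABASE_PHYS_NAME comes from the example above. The database_meta table, the DBMS_PRODUCT column and the sample database names are assumptions added for illustration and are not taken from the physical meta models in the book.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Generic meta model: the attribute is named for the subject area, and the
# database product is stored as data (DBMS_PRODUCT is a hypothetical column).
conn.execute(
    "CREATE TABLE database_meta ("
    "DATABASE_PHYS_NAME TEXT PRIMARY KEY, "
    "DBMS_PRODUCT TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO database_meta VALUES (?, ?)",
    [("SALES_DW", "Oracle"), ("CRM_PROD", "Oracle")],  # hypothetical databases
)

# The company standard changes: only the data is updated, not the model.
conn.execute(
    "UPDATE database_meta SET DBMS_PRODUCT = 'SQL Server' "
    "WHERE DATABASE_PHYS_NAME = 'SALES_DW'"
)

# The column list is the same before and after the switch.
columns = [row[1] for row in conn.execute("PRAGMA table_info(database_meta)")]
products = dict(
    conn.execute("SELECT DATABASE_PHYS_NAME, DBMS_PRODUCT FROM database_meta")
)
```

Had the column been named ORACLE_PHYS_NAME, the same switch would have required renaming the column and revising every query that referenced it.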
A Meta Data Repository also provides an integrated view of the enterprise’s major meta data subject areas. The repository should allow the user to view all entities within the company, and not just entities loaded in Oracle or entities that are just in the customer relationship management (CRM) applications.
Third, the Meta Data Repository contains current and future meta data, meaning that the meta data is periodically updated to reflect the current and future technical and business environment. Keep in mind that a Meta Data Repository must be updated constantly in order to be truly valuable.
Lastly, meta data repositories are historical. A good repository will hold historical views of the meta data, even as it changes over time. This allows a corporation to understand how its business has changed over time. This is especially critical if the MME is supporting an application that contains historical data, like a data warehouse or a CRM application. For example, suppose the business meta data definition for “customer” is “anyone that has purchased a product from our company within one of our stores or through our catalog”. A year later a new distribution channel is added to the strategy: the company constructs a Web site to allow customers to order our products. At that point in time, the business meta data definition for customer would be modified to “anyone that has purchased a product from our company within one of our stores, through our mail order catalog or through the Web”. A good Meta Data Repository stores both of these definitions because they both have validity, depending on what data you are analyzing (and the age of that data). Finally, it is strongly recommended that you implement your Meta Data Repository component on an open, relational database platform, as opposed to a proprietary database engine.
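One way to keep both definitions of “customer” is to give each one an effective date range. The sketch below uses Python and the standard sqlite3 module. The business_definition table, its columns, the definition_as_of function and the dates themselves are all hypothetical and chosen only to illustrate the idea; the two definitions are the ones from the example above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE business_definition ("
    "term TEXT, definition TEXT, effective_from TEXT, effective_to TEXT)"
)
conn.executemany(
    "INSERT INTO business_definition VALUES (?, ?, ?, ?)",
    [
        # Hypothetical dates; effective_to is exclusive, NULL means still current.
        ("customer",
         "anyone that has purchased a product from our company within one of "
         "our stores or through our catalog",
         "2003-01-01", "2004-01-01"),
        ("customer",
         "anyone that has purchased a product from our company within one of "
         "our stores, through our mail order catalog or through the Web",
         "2004-01-01", None),
    ],
)


def definition_as_of(conn, term, as_of_date):
    """Return the definition that was valid on as_of_date (ISO date string), or None."""
    row = conn.execute(
        "SELECT definition FROM business_definition "
        "WHERE term = ? AND effective_from <= ? "
        "AND (effective_to IS NULL OR effective_to > ?)",
        (term, as_of_date, as_of_date),
    ).fetchone()
    return row[0] if row else None


old_definition = definition_as_of(conn, "customer", "2003-06-15")
new_definition = definition_as_of(conn, "customer", "2004-06-15")
```

An analyst working with older data would look up the definition as of the date of that data, while current reporting would use the row whose effective_to is still empty.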
1 See Chapters 4-8 of “Universal Meta Data Models” (David Marco & Michael Jennings, Wiley 2004) for various physical meta models.