Managed Meta Data Environment: A Complete Walkthrough (part 2 of 8)
By David Marco
This article is adapted from the book “Universal Meta Data Models” by David Marco & Michael Jennings, John Wiley & Sons
In part one of this series, I presented the Managed Meta Data Environment (MME) and its six components:
Meta data sourcing layer
Meta data integration layer
Meta data repository
Meta data management layer
Meta data marts
Meta data delivery layer
In the second part of this series on the MME I will discuss the Meta Data Sourcing Layer and begin to discuss the most common sources of meta data that this layer targets.
Meta Data Sourcing Layer
The meta data sourcing layer is the first component of the MME architecture. Its purpose is to extract meta data from its source and send it either to the Meta Data Integration Layer or directly into the Meta Data Repository (see Figure 1). Some meta data is not extracted at all: the MME accesses it through pointers (distributed meta data) that present the meta data to the end user at the time it is requested. These pointers are managed by the Meta Data Sourcing Layer and stored in the Meta Data Repository.
Figure 1: Meta Data Sourcing Layer
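The difference between extracted and pointer-based (distributed) meta data can be illustrated with a minimal sketch. Everything here is hypothetical and for illustration only: the Pointer and Repository classes, the keys, and the in-memory external_docs dictionary that stands in for an external source. A real MME would resolve pointers against actual systems.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Union


@dataclass
class Pointer:
    """Hypothetical pointer record: where the meta data lives and how to fetch it."""
    location: str
    fetch: Callable[[str], str]


class Repository:
    """Toy stand-in for the meta data repository."""

    def __init__(self) -> None:
        self._entries: Dict[str, Union[str, Pointer]] = {}

    def store(self, key: str, value: Union[str, Pointer]) -> None:
        self._entries[key] = value

    def get(self, key: str) -> str:
        entry = self._entries[key]
        if isinstance(entry, Pointer):
            # Distributed meta data: resolved only at the time it is requested.
            return entry.fetch(entry.location)
        # Extracted meta data: a copy already held in the repository.
        return entry


# Pretend external source (an assumption made for this sketch).
external_docs = {"policies/retention.txt": "Keep 7 years"}

repo = Repository()
repo.store("customer.name.definition", "Legal name of the customer")
repo.store(
    "retention.policy",
    Pointer("policies/retention.txt", lambda loc: external_docs[loc]),
)

# The source changes after the pointer was stored.
external_docs["policies/retention.txt"] = "Keep 10 years"
current_policy = repo.get("retention.policy")
```

Because only the pointer is stored, the request returns whatever the source holds at that moment, whereas the extracted definition stays as it was when it was loaded.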
It is best to send the extracted meta data to the same hardware location as the Meta Data Repository. Often meta data architects incorrectly build meta data integration processes on the platform that the meta data is sourced from (other than record selection, which is acceptable). This merging of the meta data sourcing layer with the meta data integration layer is a common mistake that causes a whole host of issues.
As sources of meta data are changed and added (and they will be), the meta data integration process is negatively impacted. When the meta data sourcing layer is kept separate from the Meta Data Integration Layer, only the meta data sourcing layer is impacted by this type of change. By keeping all of the meta data together on the target platform, the meta data architect can adapt the integration processes much more easily.
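One way to picture this separation is a sketch in which each source has its own small extractor that writes a neutral record format to a staging area on the target platform, while the integration step reads only the staged files. The extractor functions, the record fields, and the JSON staging format are all assumptions chosen for illustration, not part of the MME definition.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable, Dict, List

Record = Dict[str, str]


def extract_from_modeling_tool() -> List[Record]:
    # Hypothetical: a real extractor would read the tool's own repository.
    return [{"source": "modeling_tool", "table": "CUSTOMER", "column": "CUST_NM"}]


def extract_from_spreadsheet() -> List[Record]:
    # Hypothetical: a real extractor would parse a spreadsheet file.
    return [{"source": "spreadsheet", "table": "CUSTOMER", "column": "CUST_ID"}]


# Adding or changing a source only touches this registry and its extractor.
EXTRACTORS: Dict[str, Callable[[], List[Record]]] = {
    "modeling_tool": extract_from_modeling_tool,
    "spreadsheet": extract_from_spreadsheet,
}


def run_sourcing(staging_dir: Path) -> None:
    """Sourcing layer: pull from each source and stage the result as-is."""
    for name, extractor in EXTRACTORS.items():
        (staging_dir / f"{name}.json").write_text(json.dumps(extractor()))


def run_integration(staging_dir: Path) -> List[Record]:
    """Integration layer: knows only the staged format, never the sources."""
    merged: List[Record] = []
    for path in sorted(staging_dir.glob("*.json")):
        merged.extend(json.loads(path.read_text()))
    return merged


staging = Path(tempfile.mkdtemp())
run_sourcing(staging)
integrated = run_integration(staging)
```

In this arrangement a new source means one new extractor and one new registry entry; run_integration does not change.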
Keeping the sourcing layer separate from the integration layer also provides a tidy backup and restart point. Meta data loading errors typically happen in the meta data integration layer. Without a separate sourcing layer, if an error occurred the architect would have to go back to the source of the meta data and re-read it. This can cause a number of problems. If the source of meta data has been updated in the meantime, it may be out of sync with some of the other sources of meta data that it integrates with. Also, the meta data source may currently be in use, and this processing could impact its performance. The golden rule of meta data extraction is:
Never have multiple processes extracting the same meta data from the same meta data source.
In these situations, the timeliness and consequently the accuracy of the meta data can be compromised. For example, suppose that you have built one meta data extraction process (Process #1) that reads physical attribute names from a modeling tool’s tables to load the target entity in the meta model that holds physical attribute names. You also built a second process (Process #2) to read and load attribute domain values. It is possible that the attribute table in the modeling tool has been changed between the running of Process #1 and Process #2. This situation would cause the meta data to be out of sync.
This situation can also cause unnecessary delays in loading meta data from sources that have limited availability or batch windows. For example, if you were reading database logs from your enterprise resource planning (ERP) system, you would not want to run multiple extraction processes on these logs, since they most likely have a limited batch window. While this situation doesn’t happen often, there is no reason to build unnecessary flaws into your meta data architecture.
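The golden rule can be sketched as a single read of the source whose snapshot feeds every downstream load. The ModelingToolSource class, its attribute rows, and the two load functions below are hypothetical stand-ins for the Process #1 and Process #2 example; they are a minimal illustration, not a prescribed design.

```python
import copy
from typing import Dict, List


class ModelingToolSource:
    """Hypothetical stand-in for a modeling tool's attribute table."""

    def __init__(self) -> None:
        self.read_count = 0
        self._attributes = [
            {"name": "CUST_NM", "domain": "VARCHAR(60)"},
            {"name": "CUST_ID", "domain": "INTEGER"},
        ]

    def read_attributes(self) -> List[Dict[str, str]]:
        # Each call represents one extraction pass against the source.
        self.read_count += 1
        return copy.deepcopy(self._attributes)

    def rename(self, old: str, new: str) -> None:
        for attribute in self._attributes:
            if attribute["name"] == old:
                attribute["name"] = new


def load_names(snapshot: List[Dict[str, str]]) -> List[str]:
    """Counterpart of Process #1: physical attribute names."""
    return [row["name"] for row in snapshot]


def load_domains(snapshot: List[Dict[str, str]]) -> Dict[str, str]:
    """Counterpart of Process #2: attribute domain values."""
    return {row["name"]: row["domain"] for row in snapshot}


source = ModelingToolSource()
snapshot = source.read_attributes()  # the only read of the source

names = load_names(snapshot)
source.rename("CUST_NM", "CUSTOMER_NAME")  # source changes between the loads
domains = load_domains(snapshot)  # still works from the same snapshot
```

Both loads see the same attribute names even though the source changed in between, and the source is touched only once, which also keeps the work inside a single batch window.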
The number and variety of meta data sources will vary greatly based on the business requirements of your MME. Though there are sources of meta data that many companies commonly use, I’ve never seen two meta data repositories that have exactly the same meta data sources (have you ever seen two data warehouses with exactly the same source information?). The following are the most common meta data sources:
Documents and spreadsheets
Messaging and transactions
Web sites and E-commerce
A great deal of valuable meta data is stored in various software tools. Keep in mind that many of these tools have internal meta data repositories designed to enable the tool’s specific functionality; they typically are not designed to be accessed by meta data users or integrated with other sources of meta data. You will need to set up processes to go into these tools’ repositories and pull the meta data out.
Of these tools, relational databases and modeling tools are the most common sources of meta data for the meta data sourcing layer. The MME usually reads the database’s system tables to extract meta data about physical column names, logical attribute names, physical table names, logical entity names, relationships, indexing, change data capture, and data access.
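As a small illustration of reading a database's system tables, the sketch below uses SQLite only because it ships with Python; other database products expose different catalogs, so the queries would differ. The customer and orders tables are made up for the example, and the shape of the returned dictionary is an assumption, not a standard meta model.

```python
import sqlite3
from typing import Any, Dict

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customer (
        cust_id INTEGER PRIMARY KEY,
        cust_nm TEXT NOT NULL
    );
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        cust_id  INTEGER REFERENCES customer(cust_id),
        total    REAL
    );
    CREATE INDEX idx_orders_cust ON orders (cust_id);
    """
)


def extract_catalog(connection: sqlite3.Connection) -> Dict[str, Dict[str, Any]]:
    """Read table, column, index, and relationship meta data from SQLite."""
    catalog: Dict[str, Dict[str, Any]] = {}
    tables = connection.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # table_info rows: cid, name, type, notnull, dflt_value, pk
        columns = [
            {"name": r[1], "type": r[2], "not_null": bool(r[3]), "pk": bool(r[5])}
            for r in connection.execute(f'PRAGMA table_info("{table}")')
        ]
        # index_list rows: seq, name, unique, origin, partial
        indexes = [r[1] for r in connection.execute(f'PRAGMA index_list("{table}")')]
        # foreign_key_list rows: id, seq, table, from, to, ...
        foreign_keys = [
            {"column": r[3], "ref_table": r[2], "ref_column": r[4]}
            for r in connection.execute(f'PRAGMA foreign_key_list("{table}")')
        ]
        catalog[table] = {
            "columns": columns,
            "indexes": indexes,
            "foreign_keys": foreign_keys,
        }
    return catalog


catalog = extract_catalog(conn)
```

Note that a system catalog like this yields physical names, relationships, and indexing; logical attribute and entity names would normally come from a modeling tool rather than from the database itself.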
Part Three of this series will continue walking through the sources of meta data that the Meta Data Sourcing Layer targets.