Managed Meta Data Environment: A Complete Walkthrough (part 3 of 8)
By David Marco
This article is adapted from the book “Universal Meta Data Models” by David Marco & Michael Jennings, John Wiley & Sons
In part 2 of this series, the Meta Data Sourcing Layer of a Managed Meta Data Environment (MME) was presented, along with a walkthrough of one of the most common sources of meta data: software tools. In this month’s column I will walk through two additional common meta data sources:
End users
Documents and spreadsheets
End users are one of the most important sources of the meta data that is brought into the MME. These users come in two flavors: business and technical. Figure 1 lists the types of meta data entry done by each group.
Often the business meta data for a corporation is stored in the collective consciousness of its employees as “tribal knowledge”. As a result, it is vital for business users to enter business meta data into the repository. The need for active and engaged business users ties into the topic of data stewardship (see note 1).
Figure 1: Meta Data Sourcing Layer: End User Meta Data Entry
The technical users also need direct access to the Meta Data Repository to enter their technical meta data. Because much of the technical meta data is already stored in various software tools, entering it is not as demanding a task for technical users as entering business meta data is for business users.
The interface for both of these user groups should be Web-enabled. The Web provides an easy-to-use and intuitive interface that both of these groups are familiar with. It is critical that this interface be directly linked to the meta data in the repository. I strongly suggest the use of drop-down boxes and pick lists, as these are functions that users are highly familiar with. You should also always use the referential integrity that the database software provides, so that an entry cannot reference a value that does not exist in the repository.
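To make the pick list and referential integrity advice concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names (subject_area, business_term), the column names, and the helper functions are hypothetical and are not taken from the book's meta models; any relational database with foreign key support would serve the same purpose.

```python
import sqlite3


def open_repository() -> sqlite3.Connection:
    """Create a tiny in-memory repository (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    # SQLite enforces foreign keys only when enabled on each connection.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE subject_area (
            subject_area_id INTEGER PRIMARY KEY,
            name            TEXT NOT NULL UNIQUE
        );
        CREATE TABLE business_term (
            term_id         INTEGER PRIMARY KEY,
            name            TEXT NOT NULL UNIQUE,
            definition      TEXT NOT NULL,
            subject_area_id INTEGER NOT NULL
                REFERENCES subject_area (subject_area_id)
        );
        """
    )
    conn.executemany(
        "INSERT INTO subject_area (name) VALUES (?)",
        [("Customer",), ("Product",)],
    )
    conn.commit()
    return conn


def pick_list(conn: sqlite3.Connection) -> list:
    """Return (id, name) pairs to populate a drop-down box in the Web form."""
    return conn.execute(
        "SELECT subject_area_id, name FROM subject_area ORDER BY name"
    ).fetchall()


def add_term(conn: sqlite3.Connection, name: str, definition: str,
             subject_area_id: int) -> bool:
    """Insert a business term; the database rejects unknown subject areas."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO business_term (name, definition, subject_area_id) "
                "VALUES (?, ?, ?)",
                (name, definition, subject_area_id),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

The Web form would fill its drop-down box from pick_list, so users normally choose only valid values. The foreign key is the backstop: even if a bad identifier reaches add_term, the database refuses the row.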
Documents and Spreadsheets
A great deal of meta data is stored in corporate documents (Microsoft Word) and spreadsheets (Microsoft Excel). The requirements of your MME will greatly impact the degree to which you need to bring in meta data from documents or to provide pointers to them. Sometimes these documents and spreadsheets are located in a central area of a network; sometimes they sit on an employee’s computer. In most organizations, though, documents and spreadsheets tend to be highly volatile and lack standardized formats and business rules. As a result, they are traditionally one of the most unreliable and problematic sources of meta data in the MME. Sometimes business meta data for these sources can be found in the note or comment fields associated with the document or with a cell (in a spreadsheet). Technical meta data, such as calculations, dependencies, or lookup values, is stored in the application’s (Microsoft Excel or Lotus 1-2-3) proprietary data store.
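As an illustration of pulling document-level comment fields out of such files, the sketch below reads the core properties of an Office Open XML file (.docx or .xlsx) using only the Python standard library. It rests on several assumptions: the file is in the newer zip-based format rather than the older binary .doc or .xls formats, the core properties part sits at its conventional location docProps/core.xml, and the function name and output keys are hypothetical. Cell-level comments and formulas live in other parts of the package and are not covered here.

```python
import xml.etree.ElementTree as ET
import zipfile

# XML namespaces used by the Office Open XML core properties part.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}

# Hypothetical repository field name -> element in docProps/core.xml.
FIELDS = {
    "title": "dc:title",
    "author": "dc:creator",
    "comments": "dc:description",  # the document-level "Comments" field
    "last_modified_by": "cp:lastModifiedBy",
    "modified": "dcterms:modified",
}


def read_core_properties(path_or_file) -> dict:
    """Return the core properties that are present and non-empty."""
    with zipfile.ZipFile(path_or_file) as package:
        try:
            xml_bytes = package.read("docProps/core.xml")
        except KeyError:
            # Assumption: no part at the conventional path means no meta data.
            return {}
    root = ET.fromstring(xml_bytes)
    found = {}
    for key, tag in FIELDS.items():
        node = root.find(tag, NS)
        if node is not None and node.text:
            found[key] = node.text.strip()
    return found
```

Calling read_core_properties on a file path returns a plain dictionary that a sourcing process could map onto repository columns. Because these fields are optional and filled in by hand, many files will yield an empty or partial result, which is consistent with how unreliable this source tends to be.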
For companies that have implemented a document management system, it’s important to extract the meta data out of these sources and bring it into the MME’s repository. Typically when a company builds a document or content management system, it also purchases a software product to aid in the management of meta data on documents, images, audio, geospatial data (geographical topography), and spreadsheets. It is important to have a meta data sourcing layer that can read the meta data in the document management tool and load it into the MME’s repository. This task is extremely difficult because most document management companies do not understand that their products are really meta data repositories and, as such, need to be accessible. These tools often use proprietary database software to persist their meta data, or their internal database structure is highly obfuscated, meaning that the structure of the meta data is not represented in the meta model but is instead represented in program code. As a result, it can be difficult to build processes to pull meta data out of these sources (Figure 2).
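One way to contain this difficulty is to keep the knowledge of each tool's structure in an explicit, reviewable mapping rather than scattering it through extraction code. The sketch below is only an illustration of that idea: the source name example_dms, the column names DOC_NO, DOC_TTL, and KIND, and all class and function names are hypothetical, and a real tool would need its own export mechanism behind export_records.

```python
from dataclasses import dataclass
from typing import Iterable, List, Protocol, Tuple


@dataclass(frozen=True)
class DocumentMetaData:
    """Hypothetical common record loaded into the MME repository."""
    source_system: str
    document_id: str
    title: str
    media_type: str


class DocumentSource(Protocol):
    """Anything that can hand over raw meta data rows from one tool."""
    name: str

    def export_records(self) -> Iterable[dict]: ...


class ExampleDmsSource:
    """Stand-in for a document management tool (in-memory rows only)."""
    name = "example_dms"

    def __init__(self, rows: Iterable[dict]) -> None:
        self._rows = list(rows)

    def export_records(self) -> Iterable[dict]:
        return iter(self._rows)


# Repository field -> tool-specific column, kept as data so it can be reviewed.
FIELD_MAP = {
    "example_dms": {
        "document_id": "DOC_NO",
        "title": "DOC_TTL",
        "media_type": "KIND",
    },
}


def extract(source: DocumentSource) -> Tuple[List[DocumentMetaData], List[dict]]:
    """Translate raw rows to common records; set aside rows missing a field."""
    mapping = FIELD_MAP[source.name]
    loaded: List[DocumentMetaData] = []
    rejected: List[dict] = []
    for raw in source.export_records():
        try:
            fields = {target: raw[column] for target, column in mapping.items()}
        except KeyError:
            rejected.append(raw)  # incomplete row: keep it for follow-up
            continue
        loaded.append(DocumentMetaData(source_system=source.name, **fields))
    return loaded, rejected
```

Supporting another tool would then mean adding one source class and one entry in FIELD_MAP, while the rejected list gives the sourcing process a place to report rows it could not interpret.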
Figure 2: Meta Data Sourcing Layer: Document Management Sources
1. For a detailed discussion of data stewardship, please see David Marco’s four-part series in the Data Governance & Stewardship Resource Portal.