“What’s In YOUR Data Architecture?” Part Two
By Mark Mosley
A popular credit card commercial asks, “What’s in YOUR wallet?” I’ve been asking a similar question of data architects at several different organizations – “What’s in YOUR data architecture?” In Part One of this article, I rephrased this question more precisely to be, “What specification artifacts should be in your target enterprise data architecture?” and identified three major categories of enterprise data architecture artifacts. In Part Two, I will briefly describe common components of an enterprise data model, saving discussion of information value chain analysis and related data delivery architecture for Part Three.
Figure 1. Enterprise Architecture Artifacts
The core of any enterprise data architecture is an enterprise data model (EDM). No enterprise data architecture exists without an enterprise data model. The EDM is an integrated subject-oriented data model defining the essential data produced and consumed across an entire organization.
- Essential means the data critical to the effective operation and decision-making of the organization. Few (if any) enterprise data models define all the data within an enterprise. Decisions must be made (and revisited) about the scope of enterprise data modeling efforts. “Essential” does not mean “common” or “shared.” Essential data requirements may or may not be common to multiple applications and projects. Some data defined in the enterprise data model may be shared by multiple systems, but other data may be critically important yet created and used within a single system. Over time, the enterprise data model should define all data of importance to the enterprise.
- Integrated means that all of the entities, attributes and rules in the model are defined once, without redundancy. The concepts in the model fit together as the CEO sees the enterprise, not reflecting separate and limited functional or departmental views. There is only one version of the Customer entity, one Order entity, etc. Every data element also has a single name and definition. The data model may also identify common synonyms and important distinctions between different sub-types of the same common business entity.
- Subject-oriented means the model is divided into commonly recognized subject areas that span across multiple business processes and application systems. Subject areas are focused around the most essential business entities.
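To make the "integrated" property concrete, here is a minimal sketch of a registry that accepts each business concept only once and resolves synonyms to the single official entity. The class names, the `define` and `resolve` methods, and the sample entities are all hypothetical illustrations, not features of any particular data modeling tool.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class BusinessEntity:
    """One business entity in a hypothetical enterprise data model."""
    name: str                 # the single official name
    definition: str           # the single business definition
    subject_area: str
    synonyms: set[str] = field(default_factory=set)


class EnterpriseDataModel:
    """Registry that refuses a second definition of the same concept."""

    def __init__(self) -> None:
        self._entities: dict[str, BusinessEntity] = {}
        self._labels: dict[str, str] = {}  # lower-cased name/synonym -> official name

    def define(self, entity: BusinessEntity) -> None:
        labels = {entity.name, *entity.synonyms}
        # Check every label first so a rejected entity leaves no partial state.
        for label in labels:
            owner = self._labels.get(label.lower())
            if owner is not None:
                raise ValueError(f"{label!r} is already defined as {owner!r}")
        self._entities[entity.name] = entity
        for label in labels:
            self._labels[label.lower()] = entity.name

    def resolve(self, label: str) -> BusinessEntity:
        """Return the one official entity for a name or synonym."""
        return self._entities[self._labels[label.lower()]]


model = EnterpriseDataModel()
model.define(BusinessEntity(
    name="Customer",
    definition="A party that has purchased or may purchase our products.",
    subject_area="Party",
    synonyms={"Client", "Account Holder"},
))
model.define(BusinessEntity(
    name="Order",
    definition="A customer's request to buy one or more products.",
    subject_area="Sales",
))

print(model.resolve("client").name)  # Customer
```

Trying to define a second entity named "Client" would raise an error, because that term already resolves to Customer.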
The enterprise data model is an integrated set of data specifications (meta data!), viewable through reports and subject area diagrams. Each subject area diagram depicts business entities and the relationships between these entities. Business entities are classes of things and concepts of interest to the enterprise. We capture, store and analyze data about specific instances of business entities. The model includes an official name and business definition for each entity; common synonyms, instance examples and related business rules often complement the business definition. The model also defines each relationship between two entities, usually as a bi-directional pair of verb phrases, with business rules that govern the numeric relationships (cardinality) between instances of each entity. Other relationships identify one business entity as a kind (sub-type) of another entity.
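The relationship structure described above can be sketched as follows. This is only an illustration under assumed names: `Cardinality`, `Relationship`, `SUBTYPE_OF` and the Customer/Order example are hypothetical, and real modeling tools store this meta data in their own ways.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Cardinality:
    """Business rule on how many instances may participate."""
    minimum: int
    maximum: Optional[int]  # None means "many" (no upper limit)

    def allows(self, count: int) -> bool:
        if count < self.minimum:
            return False
        return self.maximum is None or count <= self.maximum

    def __str__(self) -> str:
        upper = "many" if self.maximum is None else str(self.maximum)
        return f"{self.minimum}..{upper}"


@dataclass(frozen=True)
class Relationship:
    """A relationship between two entities, readable in both directions."""
    subject: str
    forward_verb: str            # read from subject to other
    other: str
    reverse_verb: str            # read from other to subject
    others_per_subject: Cardinality
    subjects_per_other: Cardinality

    def sentences(self) -> tuple[str, str]:
        return (
            f"Each {self.subject} {self.forward_verb} "
            f"{self.others_per_subject} {self.other}(s).",
            f"Each {self.other} {self.reverse_verb} "
            f"{self.subjects_per_other} {self.subject}(s).",
        )


# Sub-type relationships: each key "is a kind of" its value.
SUBTYPE_OF = {"Corporate Customer": "Customer", "Individual Customer": "Customer"}


def is_kind_of(entity: str, ancestor: str) -> bool:
    """Walk up the sub-type chain looking for the ancestor."""
    while entity in SUBTYPE_OF:
        entity = SUBTYPE_OF[entity]
        if entity == ancestor:
            return True
    return False


places = Relationship(
    subject="Customer", forward_verb="places", other="Order",
    reverse_verb="is placed by",
    others_per_subject=Cardinality(0, None),
    subjects_per_other=Cardinality(1, 1),
)
for sentence in places.sentences():
    print(sentence)
```

Reading the relationship aloud in both directions is a quick way to let business experts confirm or correct the rule.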
The scope of each subject area includes 5-30 business entities and their relationships. Each subject area is described with an entity relationship diagram depicting business entities as boxes and business relationships as lines connecting the boxes. Several different modeling styles are commonly used to depict business relationships in entity relationship diagrams. The scope of a given subject area overlaps with the scope of other subject areas, so that a business entity and its relationships may be included in more than one subject area. The collective scope of the subject areas in the enterprise data model should cover all the essential interests of the enterprise.
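A few simple checks follow from these scoping rules. The sketch below treats each subject area as a set of entity names and reports which entities are shared, which essential entities are not yet covered, and which subject areas fall outside the 5-30 guideline. The subject areas, entity names and function names are invented for illustration.

```python
from __future__ import annotations

# Hypothetical subject areas, each listing the business entities it covers.
SUBJECT_AREAS: dict[str, set[str]] = {
    "Party": {"Customer", "Supplier", "Employee", "Organization", "Address"},
    "Product": {"Product", "Product Category", "Price List", "Supplier", "Warehouse"},
    "Sales": {"Customer", "Order", "Order Line", "Product", "Invoice"},
}


def shared_entities(areas):
    """Entities that appear in more than one subject area."""
    seen: dict[str, list[str]] = {}
    for area, entities in sorted(areas.items()):
        for entity in entities:
            seen.setdefault(entity, []).append(area)
    return {entity: where for entity, where in seen.items() if len(where) > 1}


def uncovered(essential, areas):
    """Essential entities that no subject area covers yet."""
    covered = set().union(*areas.values())
    return set(essential) - covered


def out_of_range(areas, low=5, high=30):
    """Subject areas whose entity count falls outside the guideline."""
    return sorted(
        area for area, entities in areas.items()
        if not low <= len(entities) <= high
    )


print(shared_entities(SUBJECT_AREAS))
print(uncovered({"Customer", "Contract"}, SUBJECT_AREAS))
print(out_of_range(SUBJECT_AREAS))
```

Overlap is expected rather than an error here; the point of listing shared entities is to know which definitions more than one group of stewards will care about.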
The conceptual views of business entities and business relationships don’t include any data attributes. These views model business semantics, the meaning of business terms, and are in fact more accurately described as semantic models (also known as ontologies). Non-technical people are often surprised to discover that these conceptual models have so little to do with technology.
The goals of the enterprise data model are:
- To capture at a high level the collective data requirements of the enterprise.
- To align information systems and data management efforts with business strategy.
- To guide data integration.
- To guide continual improvement of data quality.
- To build deeper business understanding and wiser interpretation of data.
- To enable and organize data stewardship and data governance.
Some enterprise data models also include essential data attributes, shown in more detailed “logical” views of the same business entities and relationships (either in the same subject areas or smaller subsets). An enterprise data model does not attempt to identify all the data attributes required by the enterprise. The model identifies the data attributes of most importance to the operation and management of the enterprise. The model depicts these attributes independent of any specific usage or application context. These “application neutral” logical views are quite different from application-specific logical data models. The enterprise data model is only partially normalized; no “data entities” are created to resolve many-to-many relationships. Including essential data attributes enables the enterprise data model to better address its objectives – to identify enterprise data requirements and to guide data integration.
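One way to picture the difference between the conceptual and logical views is as two projections of the same entities. The sketch below is an assumption-laden illustration: the `Entity` and `Attribute` classes, the sample entities and the `MANY_TO_MANY` list are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Attribute:
    name: str
    definition: str


@dataclass
class Entity:
    name: str
    definition: str
    # Only the attributes judged essential to the enterprise, not every
    # column that some application happens to store.
    essential_attributes: list[Attribute] = field(default_factory=list)


# Many-to-many relationships stay as direct entity pairs; no associative
# "data entity" is introduced to resolve them at this level.
MANY_TO_MANY = [("Product", "Supplier")]

ENTITIES = [
    Entity("Product", "An item offered for sale.", [
        Attribute("Product Name", "The name under which the product is marketed."),
        Attribute("Product Status", "The current life cycle state of the product."),
    ]),
    Entity("Supplier", "An organization that provides products to us.", [
        Attribute("Supplier Name", "The legal name of the supplier."),
    ]),
]


def conceptual_view(entities):
    """Entity names and definitions only, with no attributes."""
    return {entity.name: entity.definition for entity in entities}


def logical_view(entities):
    """The same entities, now with their essential attributes listed."""
    return {
        entity.name: [attribute.name for attribute in entity.essential_attributes]
        for entity in entities
    }


print(conceptual_view(ENTITIES))
print(logical_view(ENTITIES))
```

Because both views are derived from one set of entities, they cannot drift apart, which is the appeal of keeping them in a single integrated model.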
The enterprise data model is often organized into three layers of abstraction: the subject area model, the enterprise conceptual data model and the enterprise logical data model. The subject area model is simply a list or hierarchy of the subject areas within the enterprise data model. It serves as an introduction to the model and an index to the conceptual and logical views. Sometimes subject areas are depicted graphically in a sort of conceptual picture or map of the enterprise.
Figure 2. Enterprise Data Model Layers
The subject area model taxonomy enables people to more easily access and navigate their way through the subject areas of most interest to them in the enterprise data model. It is also an essential organizational structure for data governance and stewardship. Furthermore, most enterprise data models are developed iteratively and incrementally, focusing on higher priority subject areas first. For all these reasons, it is very important to define a practical and commonly acceptable taxonomy of subject areas from the very start.
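As a small illustration of the subject area model acting as an index, the sketch below stores a taxonomy as a nested dictionary, prints it as an outline, and finds the path to a given subject area. The taxonomy contents and the `outline` and `path_to` helpers are hypothetical.

```python
from __future__ import annotations

from typing import Optional

# Hypothetical taxonomy: each key is a subject area, each value its sub-areas.
SUBJECT_AREA_MODEL = {
    "Party": {"Customer": {}, "Supplier": {}},
    "Product": {},
    "Sales": {"Orders": {}, "Invoicing": {}},
}


def outline(tree: dict, depth: int = 0) -> list[str]:
    """Flatten the hierarchy into indented lines, one per subject area."""
    lines = []
    for name, children in tree.items():
        lines.append("  " * depth + name)
        lines.extend(outline(children, depth + 1))
    return lines


def path_to(tree: dict, target: str, trail: tuple = ()) -> Optional[tuple]:
    """Return the chain of subject areas leading to target, or None."""
    for name, children in tree.items():
        here = trail + (name,)
        if name == target:
            return here
        found = path_to(children, target, here)
        if found is not None:
            return found
    return None


print("\n".join(outline(SUBJECT_AREA_MODEL)))
print(path_to(SUBJECT_AREA_MODEL, "Invoicing"))
```

The same structure could carry a priority or a steward for each subject area, which is one reason to settle the taxonomy early.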
The choice of data modeling tool used to capture and maintain the enterprise data model will dictate to some extent how the model is structured. Some organizations keep the enterprise conceptual and logical data models in one integrated data model, while other organizations synchronize two separate data model files. Any graphical depiction of the subject area model is likely to be maintained separately, outside the data modeling tool itself, and so its contents and structure must also be synchronized with the data model as the data model evolves.
Some enterprise data models are extended to include:
- A more complete business glossary, expanded beyond the definition of business entities to include other terms (including processes, roles and organizations).
- Data stewardship responsibility assignments – who is accountable for the quality of meta data in the model and the actual data in the enterprise, either for a subject area or a business entity. The meta data attributes in the data model should be extended to include these assignments.
- Data quality requirements for essential data attributes, stated for specific dimensions of data quality, either for any context or for the most common contexts, such as:
  - “Is this a required (mandatory, non-nullable) attribute?”
  - “How current must the data be?”
  - “How accurate and precise must the data be?”
- Entity life cycle states, shown as state transition diagrams that depict the trigger events that change the status of particularly important business entities. Not all data modeling tools support these diagrams, but they are relatively simple, so many organizations maintain a supplemental set of diagrams in another tool.
- Reference data value sets for particularly important data attributes, which may be defined externally or internally. While small value sets (domains with fewer than 20 values) may be listed in the data model itself, large reference data value sets are likely to be maintained outside the data model. Of course, all reference data value sets should be maintained in some form of master data management or code management application.
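Of the extensions listed above, entity life cycle states are perhaps the easiest to capture outside a modeling tool. A minimal sketch, assuming a hypothetical Order entity with invented states and trigger events, represents the state transition diagram as a lookup table:

```python
# Hypothetical life cycle for an Order entity:
# (current state, trigger event) -> next state.
ORDER_LIFE_CYCLE = {
    ("Draft", "submit"): "Submitted",
    ("Submitted", "approve"): "Approved",
    ("Submitted", "cancel"): "Cancelled",
    ("Approved", "ship"): "Shipped",
    ("Approved", "cancel"): "Cancelled",
    ("Shipped", "deliver"): "Closed",
}


def next_state(state: str, event: str) -> str:
    """Apply one trigger event, rejecting transitions the diagram lacks."""
    try:
        return ORDER_LIFE_CYCLE[(state, event)]
    except KeyError:
        raise ValueError(
            f"event {event!r} is not allowed in state {state!r}"
        ) from None


def replay(events, start="Draft"):
    """Run a sequence of trigger events and return the final state."""
    state = start
    for event in events:
        state = next_state(state, event)
    return state


print(replay(["submit", "approve", "ship"]))  # Shipped
```

A table like this is also a compact reference data value set in its own right: the distinct states form the domain of the entity's status attribute.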
The enterprise data model is guided by modeling standards, especially naming conventions for entities and attributes. Each subject area view in the enterprise data model is developed collaboratively with data stewards and other subject matter experts. Data architects facilitate and coordinate these efforts through workshops and review sessions. The data model is developed and refined iteratively over time.
The enterprise data model by itself is not enough. The model is part of the overall enterprise architecture. It is critical to understand how data relates to business strategy, process, organization, application systems and technology infrastructure. In Part Three, I’ll introduce how this is done through information value chain analysis and related data delivery architecture.
About the Author
Mark Mosley, Principal Consultant with EWSolutions (www.ewsolutions.com), is a leading expert in enterprise information management (EIM). Mark has over 25 years of experience in data modeling, data warehousing, data architecture and organizational change management. As a consultant, enterprise data architect and director of data resource management for multiple corporations, Mark has coordinated several successful data governance programs, developed enterprise data models and implemented enterprise data warehouses and master data management solutions. During his 13 years with IBM, Mark led the development of IBM’s AD Effectiveness Consulting Methodology and trained consultants worldwide in its techniques. Mark has B.S. and M.S. degrees from the University of Illinois at Urbana-Champaign. Mark is a guest lecturer at DePaul University and a Certified Data Management Professional (CDMP). Mark serves as chief editor for The DAMA Guide to the Data Management Body of Knowledge (DAMA-DMBOK Guide) and as editor of the DAMA Dictionary of Data Management. He is the author of the DAMA-DMBOK Framework white paper, available for free download at www.dama.org. You can email Mark at firstname.lastname@example.org.