So You Think You Don’t Need A Conceptual Data Model
Many organizations take a shortcut on the time required for modeling by skipping the vital step of developing a Conceptual Data Model (CDM). Conceptual Data Models describe the business and should be developed before attempting to develop any kind of solution model, be that for a Data Warehouse, OLTP, MDM, or any other application.
In the Conceptual Data Model, key business entities or objects and their relationships are identified. In the CDM it is not uncommon to discover a significant number of many to many relationships – how these relationships are addressed in the logical model can make the difference between a successful or failed application.
A many to many relationship does not just affect the logical data model – it can affect nearly every aspect of your application. Apart from the tremendous impact to data design and data quality (which I’ve written about elsewhere), other aspects of your application are much more complex. Impacts include:
project scheduling and estimation
application design and development
data exchange interfaces
Typically a M:M (or M:N) is resolved by an associative entity (though not the only way), therefore one more table and one more relationship is involved which have to be accommodated in the application. The bad news is that the other ways of dealing with a M:M, have their own complexities and tradeoffs as well. The good news is that by identifying these complexities in advance (during requirements gathering) you’ve just helped your team better scope the project, define and understand requirements, and increased the likelihood of coming in on time and under budget.
Assume that in your CDM 50% of relationships are M:M (not unreasonable when you account for longitudinal or over time perspectives – a CDM is not a state model and so the model needs to reflect the “over time” view). Assume that 65% of M:M relationships in your CDM must be resolved by an associative entity. Now, assume there are 50 conceptual entities and 100 relationships overall – roughly 32 of these relationships would be resolved with an associative entity – you’ve just added 64 new objects to your logical model (32 associative entities/tables plus 32 extra relationships). You’ve roughly increased your complexity by over 40%.
The calculation is: (M * AE%) * 100
(E + R)
Where M = number of M:M relationships (50)
AE = percent to be resolved by an associative entity (65%)
E = total number of conceptual entities (50)
R = total number of conceptual relationships (100)
The table below describes the methods for resolving a M:M relationship. These relationships can arise from different scenarios – it may be helpful to be cognizant of the origination of the M:M in order to help determine the best way to resolve it:
Inherent nature of the business. E.g. A customer may have many accounts, an account may have many customers.
Longitudinal perspective. Something that normally would be thought of as a 1:M may over time become M:M. For example, a clerk works for only 1 store at a time, but over time a clerk can move and work at different stores.
Some conceptual entities may be implemented as an attribute in the logical model – but there may still be M:M’s to resolve. For example, in a denormalized model it may be acceptable or desirable for a dimension to have repeating groups (e.g. for a Type 3 Slowly Changing Dimension)
As you can see resolving a M:M relationship is not a trivial affair and so identifying these relationships in advance is critical to understanding the complexity of your application. The Conceptual Data Model, besides its benefits for semantic reconciliation, improved business/IT alignment, data design and quality is just the tool to identify these many to many relationships. Not just your data is at stake – your entire application is at stake when you don’t properly identify relationships. In order for your application to be successful you must identify the correct data relationships – the question is "when are you going to pay to find these?" Uncovering these M:M’s late in a project or in production results in exponential cost increases.
About The Author
Pete Stiglich is a Senior Consultant/Trainer with EWSolutions with nearly 25 years of IT experience in the fields of data modeling, Data Warehousing (DW), Business Intelligence (BI), Meta Data management, Enterprise Architecture (EA), Data Governance, Data Integration, Customer Relationship Management (CRM), Customer Data Integration (CDI), Master Data Management (MDM), Database Design and Administration, Data Quality, and Project Management.
Pete has taught courses on Managed Meta Data Environments (MME), Data Modeling, Dimensional Data Modeling, Conceptual Data Modeling, ER/Studio, and SQL. Pete has presented at the 2008 MIT Information Quality conference, 2007 and 2008 Marco Masters Series, at DAMA at the international and local level, and at the 2007 IADQ Conference.
Pete’s articles on Information Architecture have been published in Real World Decision Support, DMForum, InfoAdvisors, and the Information and Data Quality Newsletter. Pete is a listed expert for SearchDataManagement on the topics of data modeling and data warehousing. Pete is an industry thought leader in the field of Conceptual Data Modeling.