Abstract Data Models – Blessing or Curse
Many managers, developers, architects, and data modelers I have met have questions about the challenges and benefits of working with an abstract data model and database schema. The presence of abstract models, and exposure to them, has grown significantly in the last 10 years. However, most education on these types of models seems to come from vendors who either sell a product that runs on such a model or provide a service that depends on proving the value of an abstract model.
On the surface, the topic alone frequently generates significant banter from those that have strong opinions either way. However, when we look below the surface, there are many facets to consider when thinking about abstract data models: where they have value, where they present challenges, and some critical success factors you will want to rely on.
What Is An Abstract Model?
First, let’s look at what an abstract data model is. Abstracting data means combining like things into higher-level classifications and groupings. For example, patients, providers, employees, and researchers are all people. In theory, if you roll them all up to people, you can have one definition and one storage area for all people. The promise of significantly more simplicity is obviously attractive; it should result in reduced timeframes and lower costs. While this sounds much easier in theory, the information a corporation wants to capture about different types of people will likely differ. Thus, the challenge becomes how to capture and separate the information that differs among patients, providers, employees, and so on.
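To make the roll-up concrete, here is a minimal sketch of one way it is sometimes done: a single `person` table plus a generic name/value table for the facts that differ by role. The table names, column names, and sample rows are all hypothetical, and a name/value table is only one possible design, not the approach of any particular vendor.

```python
import sqlite3

# Hypothetical schema: every name below is an illustration, not a standard.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- One abstract table holds every kind of person.
    CREATE TABLE person (
        person_id INTEGER PRIMARY KEY,
        full_name TEXT NOT NULL,
        role      TEXT NOT NULL   -- 'patient', 'provider', 'employee', ...
    );
    -- Facts that differ by role go into a generic name/value table.
    CREATE TABLE person_attribute (
        person_id  INTEGER NOT NULL REFERENCES person(person_id),
        attr_name  TEXT NOT NULL,
        attr_value TEXT NOT NULL
    );
""")
conn.executemany("INSERT INTO person VALUES (?, ?, ?)", [
    (1, "Pat Example", "patient"),
    (2, "Dr. Sample", "provider"),
])
conn.executemany("INSERT INTO person_attribute VALUES (?, ?, ?)", [
    (1, "blood_type", "O+"),
    (2, "specialty", "cardiology"),
])


def attributes_for(role):
    """Return {full_name: {attr_name: attr_value}} for everyone in one role."""
    rows = conn.execute(
        """SELECT p.full_name, a.attr_name, a.attr_value
           FROM person p JOIN person_attribute a USING (person_id)
           WHERE p.role = ?""",
        (role,),
    ).fetchall()
    result = {}
    for name, attr, value in rows:
        result.setdefault(name, {})[attr] = value
    return result
```

Loading a new kind of person needs no new tables, which is the attraction. The cost shows up on the way out: nothing in the schema says which attributes a patient is supposed to have, so that knowledge has to live in code such as `attributes_for`.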
Abstract models most commonly come from two main sources:
- Vendors – they are able to preach the promise that one database schema/model can run for any business in any industry.
- Desire to generalize functions – the promise of a Service Oriented Architecture can be further leveraged by having abstracted models to handle edits, validations, and retrievals of similar data.
Where Is The Value?
There are several areas where truly well-built abstract-based solutions have value. To examine that value properly, let’s separate them by the types of solutions they support.
In applications, an abstract model can offer several benefits. Generally, it is intended to make it easier to get data in. A few specific examples of value include:
- Reduction in overall code
- Reduction in support and maintenance
- More efficient use of data structures – data is stored once and leveraged many times
- Consistent application of edits and rules makes data more consistent; for example, all addresses across all entities are handled with the same editing rules.
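The last point, one set of editing rules for every address, can be sketched roughly as follows. The `Address` type, the `clean_address` function, and the rules themselves (trimmed whitespace, title case, a five-digit postal code) are assumptions made for illustration; real editing rules would be considerably richer.

```python
from dataclasses import dataclass


@dataclass
class Address:
    street: str
    city: str
    state: str
    postal_code: str


def clean_address(raw: Address) -> Address:
    """Apply one shared set of (hypothetical) editing rules to any address."""
    postal = raw.postal_code.strip()
    # Assumed rule: postal codes are exactly five digits.
    if not (len(postal) == 5 and postal.isdigit()):
        raise ValueError(f"invalid postal code: {raw.postal_code!r}")
    return Address(
        street=" ".join(raw.street.split()).title(),  # collapse extra spaces
        city=raw.city.strip().title(),
        state=raw.state.strip().upper(),
        postal_code=postal,
    )
```

Because patients, providers, and employees all pass through the same function, a change to an address rule is made once rather than once per entity.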
The value of an abstract model in analytics also shows up in several ways, but the details and dependencies seem more significant. Getting data from disparate sources to a level of integration that will support various analytics is probably the single biggest benefit, although there are many ways to do this that do not require an abstract database. A few specific examples of value related to analytics include:
- Easier integration of data from disparate sources
- Faster build of data loading jobs – consistent edits and cleansing rules that are reusable.
- Fewer tables in a normalized data structure
- Data marts have very few tables and are easier for operations to manage and support
- Fewer tables mean fewer joins, generally much more efficient data load times, and less hardware and infrastructure required to perform the loads.
What Are Some Of The Challenges?
Again, this is probably best examined by separating applications and analytics.
- While the promise of a vendor having an abstract-based solution that allows you to easily get all data in is alluring, the fact that most abstract-based vended solutions take far more time and money to implement than vendors tell you should be very concerning. The stories of implementations that take several years and hundreds of millions of dollars are easy to find and easy to validate. I have personally seen two of these firsthand, and both could have been avoided had executive leadership not been enticed by a promise that reality failed to match. While there are many horror stories with these large packages, there are few success stories. Most implementations that end successfully are more about surviving the project. The root cause has mostly to do with the challenge of mapping and defining specific data to very high-level definitions. Developers will struggle with this, and there really aren’t tools to help them resolve these things easily. The problem is magnified further in maintenance activities. When a developer knows that a user has a problem with a specific reference to a field, finding that field and the correct definition for its role is typically extremely time-consuming.
- In healthcare there is a critical need for a master patient index to tie all patient data to the correct patient. Thinking in abstract terms, you can easily see that taking that to the level of a master person index would provide additional flexibility. However, please consider the ramifications of this before jumping in. If you do not have a master patient index that is sound and reliable, that is likely because you either have not addressed it yet or have not worked out the issues required to solve that challenging dilemma. If that dilemma is not easy, consider how much harder it will be to solve the challenge of a master person index. Even if you have the time, money, and wherewithal to accomplish this, you still must consider the value of such an effort. If you have a provider table and a patient table, I can probably quickly write a query that identifies which providers are also patients with relative certainty and at very little to no cost. How much is the effort and challenge worth to your organization compared with the reward you will realize?
- Data marts become very hard for users to navigate. You must take time to build all of the user insulation into the front-end solution, which causes it to become very complex and difficult to support. This is the single biggest tradeoff and architecture question you need to answer: do you want complexity in the data or in the usage? For organizations that are having a hard time deciding, I would suggest always falling back to the data. However, in the right circumstances, an abstract solution can be the perfect fit.
- While getting data in is easier with this type of solution, getting data out is much more difficult. You do not want your users navigating an abstract database, just as you do not want them writing direct SQL against your warehouse or mart. A sound architecture puts an insulating BI tool between the users and the database, but an abstracted solution makes that insulation layer incredibly difficult to create. Seasoned developers who work well with abstracted solutions are very good at this and can mitigate the overhead, but only to an extent. Developers who have not dealt with this type of solution most often struggle immensely without really strong mentorship.
- Classifications that are very clearly defined and delineated are great candidates for abstraction. Unfortunately, many areas of data, especially in organizations new to analytics, do not fit these criteria. In this case, building and supporting these classifications becomes a significant challenge, if not an insurmountable issue.
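As a rough illustration of the provider and patient point above, the kind of quick query I have in mind might look like the sketch below. The table layouts and sample rows are invented, and matching on name plus birth date is an assumption chosen for brevity; a real match would need more identifiers and some tolerance for data entry differences.

```python
import sqlite3

# Hypothetical tables and rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE patient (
        patient_id INTEGER PRIMARY KEY,
        last_name TEXT, first_name TEXT, birth_date TEXT
    );
    CREATE TABLE provider (
        provider_id INTEGER PRIMARY KEY,
        last_name TEXT, first_name TEXT, birth_date TEXT
    );
""")
conn.executemany("INSERT INTO patient VALUES (?, ?, ?, ?)", [
    (1, "Rivera", "Ana", "1975-03-02"),
    (2, "Chen", "Li", "1988-11-20"),
])
conn.executemany("INSERT INTO provider VALUES (?, ?, ?, ?)", [
    (10, "RIVERA", "Ana", "1975-03-02"),
    (11, "Okafor", "Ben", "1969-07-14"),
])

# Assumed matching rule: same name (ignoring case) and same birth date.
MATCH_SQL = """
    SELECT pr.provider_id, pa.patient_id
    FROM provider pr
    JOIN patient pa
      ON lower(pr.last_name)  = lower(pa.last_name)
     AND lower(pr.first_name) = lower(pa.first_name)
     AND pr.birth_date        = pa.birth_date
    ORDER BY pr.provider_id
"""
matches = conn.execute(MATCH_SQL).fetchall()
```

Here `matches` pairs each provider with the patient record that appears to be the same person, without either table having been restructured around a master person index.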
Critical Success Factors
Here are a few critical success factors you should be aware of should you decide to leverage an abstract model in your applications or analytics.
- The level of abstraction you go to provides the single biggest challenge. If your goal is to abstract slightly from where root data would be defined, the impact is minimal. If you want to abstract all the way to the highest level (I have always argued that this is the definition of a noun: a person, place, or thing, and everything will fit into those three buckets), the challenges of leveraging that data effectively will be immense.
- Developer experience. Having senior developers who have not just read about these types of solutions, but have actually worked with them, is critical to success. The best source of these employees is people who have worked directly in developing vended software, or integrators who have had to successfully implement packages that they did not develop. In most cases, these resources have had to figure out how to work with and maintain these solutions for multiple business clients that have different needs.
- Modeler experience. Data modelers who have built these types of models are nice to have, but if they have not developed solutions that leverage them and are not experienced in how to make them successful, you are in serious jeopardy of having a solution that looks good from a model perspective but never gets used by the business. This is one way that data modelers unfortunately get reputations for not delivering to the business.
- One solution to very high-level abstraction that I like is a hybrid: keep the logical model abstracted, but change the physical implementation. In this case, the logical model will contain information that is highly abstracted, but the physical model will have a master person table that holds all people, plus a separate table for patients, another for providers, and so on. This lets you make sure that any new person associated with a specific role has all of the information necessary to clearly define that type of person. Yet each person is still formally linked within the data to each and every role they perform.
- Consider the target area for your model. An abstract model to run everything in your business means you had better have very deep pockets and be willing to spend eons of time to address applying a new model to all of your existing applications. A more targeted solution for a specific need will allow you to keep focused and deliver a level of abstraction that is appropriate for that solution – be it one application or an area of applications or analytical solutions.
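The hybrid physical design described above, a master person table with a separate table per role, can be sketched as follows. The column choices (`medical_record_no`, `specialty`), the sample data, and the `roles_for` helper are hypothetical; the point is only the shape of the tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # make SQLite enforce REFERENCES
conn.executescript("""
    -- Master table: every person appears here exactly once.
    CREATE TABLE person (
        person_id INTEGER PRIMARY KEY,
        full_name TEXT NOT NULL
    );
    -- One table per role, each carrying that role's required columns.
    CREATE TABLE patient (
        person_id INTEGER PRIMARY KEY REFERENCES person(person_id),
        medical_record_no TEXT NOT NULL
    );
    CREATE TABLE provider (
        person_id INTEGER PRIMARY KEY REFERENCES person(person_id),
        specialty TEXT NOT NULL
    );
""")
conn.executemany("INSERT INTO person VALUES (?, ?)",
                 [(1, "Ana Rivera"), (2, "Li Chen")])
conn.executemany("INSERT INTO patient VALUES (?, ?)",
                 [(1, "MRN-001"), (2, "MRN-002")])
conn.execute("INSERT INTO provider VALUES (?, ?)", (1, "cardiology"))


def roles_for(person_id):
    """List the role tables in which this person has a row."""
    roles = []
    for table in ("patient", "provider"):  # fixed list, safe to interpolate
        found = conn.execute(
            f"SELECT 1 FROM {table} WHERE person_id = ?", (person_id,)
        ).fetchone()
        if found:
            roles.append(table)
    return roles
```

Each role table can declare its own required columns, so a new patient cannot be stored without a record number, while the shared `person_id` keeps every role tied back to one person.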
If your organization is looking at designing or implementing solutions based on abstract data models, hopefully this article has given you a better idea of their makeup and purpose, and of some criteria to help you decide whether that is the best solution for your need. This topic feels like it warrants much more in-depth analysis and discussion, but in the form of an article, this is about as deep as I could take it.
My one suggestion to those trying to decide whether to use abstraction, and to what level, would be: when in doubt, go without. You do not need one of these types of solutions in order for your business to be successful. Generally they are harder to work with and learn. With really strong, seasoned resources, you can build very effective solutions based on abstract models, but keep in mind the maintenance and support, not to mention what happens if you lose those experienced resources.
About the Author
Bruce has over 20 years of IT experience focused on data / application architecture, and IT management, mostly relating to Data Warehousing. His work spans the industries of healthcare, finance, travel, transportation, retailing, and other areas working formally as an IT architect, manager/director, and consultant. Bruce has successfully engaged business leadership in understanding the value of enterprise data management and establishing the backing and funding to build enterprise data architecture programs for large companies. He has taught classes to business and IT resources ranging from data modeling and ETL architecture to specific BI/ETL tools and subjects like “getting business value from BI tools”. He enjoys speaking at conferences and seminars on data delivery and data architectures. Bruce D. Johnson is the Managing director of Data Architecture, Strategy, and Governance for Recombinant Data (a healthcare solutions provider) and can be reached at firstname.lastname@example.org