“Independent” Data Marts: Being Stranded on Islands of Data – Part 1
By David Marco
There is a severe disease that has spread to epidemic proportions throughout our society. This disease is particularly dangerous as its effects are not readily identifiable at the time of infection. However, if this condition goes untreated it can be debilitating and even terminal. This disease is not hepatitis, but rather “independent” data marts. While this imagery may seem a bit dramatic, unfortunately it reflects the reality in many of today’s companies. For example, at EWSolutions we have a large client that has many multi-terabyte data warehouses. Upon closer examination we have estimated that they have 75 – 200 independent data marts. The cost to this company for data warehousing is well over $500 million annually. Sadly, their situation is not a unique one. If you work for a government agency or a Global 2000 company, it is highly likely that your data warehouse architecture is that of independent data marts.
This column is the first of a three-part series on migrating from independent data marts to an architected data warehouse solution. This installment will address the characteristics of independent data marts, the flaws in their architecture, and the reasons why they exist. Part two will address specifically how a company can migrate off of the independent data mart architecture to an architected data warehouse solution.
Characteristics of Independent Data Marts
Independent data marts are characterized by several traits. First, each data mart is sourced directly from the operational systems without the structure of a data warehouse to supply the architecture necessary to sustain and grow the data marts. Second, these data marts are typically built independently from one another by autonomous teams. These teams usually use varying tools, software, hardware, procedures, standards and processes.
Possibly the most visually descriptive trait of a company that has constructed independent data marts is that once it maps out a schema of its data warehousing systems, the schema will resemble a “spaghetti” chart (see Figure 1).*
What is most disturbing is the number of companies that have said that this chart resembles their current data warehousing architecture.
Figure 1: Independent Data Mart Architecture
As we can see, this architecture is not an architecture at all. Instead it is a series of “stovepipe” data mart systems. This architecture greatly differs from that of an architected data warehouse (see Figure 2).
The purpose of this article is to discuss independent data marts and the process for migrating off of them to an architected solution; however, we will briefly touch on the topic of data warehouse architecture. We will not go into a detailed discussion of top-down vs. bottom-up approaches (we will save that topic for a future column), except to say that the “classic” top-down approach is a more scalable and logical approach for constructing a data warehousing system. It is surprising how often the top-down methodology is mistaken for a “galactic” approach. This is a misunderstanding, as the top-down approach is best used iteratively and incrementally to build the data warehousing system. When used in this fashion, the cost of building a data warehouse that feeds “dependent” data marts becomes highly comparable to the cost of building independent data marts.
Figure 2: Architected Data Warehousing System
Problems With Independent Data Marts
As the number of independent data marts grows, the amount of redundant data begins to grow uncontrollably across the enterprise. This redundancy occurs because each of the independent data marts requires its own, typically duplicated, copy of the detailed corporate data. Often a great deal of this detailed data is not required in the data marts, which typically provide summarized views.
It would be enlightening if a study were conducted to calculate the costs of maintaining unnecessary redundant data for Fortune 1000 companies. The end total would be in the billions of dollars in expenses and lost opportunity. Certainly it has been my experience working with large government agencies and Global 2000 companies that needlessly duplicated data is running rampant throughout our industry. As a result, IT budgets are straining under this weight.
A data warehouse provides the architecture to centralize the data integration and cleansing activities common to all of the data marts of a company. Without the data warehouse, all of these data integration and cleansing processes need to be duplicated for each of the independent data marts. This greatly increases the number of support staff required to maintain the data warehousing system, as these tasks are the largest and most costly data warehousing activities.
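As a rough illustration of centralizing integration and cleansing, the sketch below defines one cleansing rule at the warehouse layer and lets a dependent data mart build its summarized view from the already cleansed rows. The record layout, the function names (`cleanse_customer`, `load_warehouse`, `revenue_by_region`), and the sample data are all hypothetical, chosen only to make the idea concrete.

```python
def cleanse_customer(record):
    """Hypothetical cleansing rule, defined once at the warehouse layer."""
    return {
        "customer_id": record["customer_id"].strip().upper(),
        "region": record["region"].strip().title(),
        "revenue": float(record["revenue"]),
    }


def load_warehouse(raw_records):
    """Cleanse every raw record once and de-duplicate on the cleansed key."""
    cleaned = {}
    for raw in raw_records:
        row = cleanse_customer(raw)
        cleaned[row["customer_id"]] = row
    return list(cleaned.values())


def revenue_by_region(warehouse_rows):
    """A dependent data mart: a summarized view built from warehouse rows."""
    totals = {}
    for row in warehouse_rows:
        totals[row["region"]] = totals.get(row["region"], 0.0) + row["revenue"]
    return totals


# Made-up operational records with inconsistent formatting and a duplicate.
RAW = [
    {"customer_id": " c001 ", "region": "north", "revenue": "100.0"},
    {"customer_id": "C001", "region": "North ", "revenue": "100.0"},
    {"customer_id": "c002", "region": "SOUTH", "revenue": "50.5"},
]

warehouse = load_warehouse(RAW)
summary = revenue_by_region(warehouse)
print(summary)
```

With independent data marts, each team would have to write and maintain its own version of `cleanse_customer`, and nothing guarantees that those versions would agree.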
Separate teams will typically build each of the independent data marts in isolation from one another. As a result, these teams do not leverage one another’s standards, processes, knowledge, and lessons learned. This results in a great deal of rework and reanalysis.
These autonomous teams will commonly select differing tools, software, and hardware. This forces the enterprise to retain skilled employees to support each of these technologies. In addition, a great deal of financial savings is lost, as standardization on these tools doesn’t occur. Often a software, hardware, or tool contract can be negotiated to provide considerable discounts for enterprise licenses, which can be phased in. These economies of scale can provide tremendous cost savings to the organization.
Independent data marts directly read operational system files and/or tables, which greatly limits the data warehousing system’s ability to scale. For example, if a company has five independent data marts, it is likely that each data mart would require customer information. Therefore, there would be five separate extracts being pulled off of the same customer tables in the operational system of record. Most operational systems have limited batch windows and cannot support this number of extracts. With a data warehouse, only one extract is required from the operational system of record.
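The extract arithmetic in the five-mart example can be sketched in a few lines. The mart names and source tables below are hypothetical; the only point carried over from the text is that every mart needs the customer table.

```python
from collections import Counter

# Hypothetical marts and the operational tables each one needs.
MART_SOURCES = {
    "sales_mart": {"customer", "orders"},
    "marketing_mart": {"customer", "campaigns"},
    "finance_mart": {"customer", "invoices"},
    "service_mart": {"customer", "tickets"},
    "risk_mart": {"customer", "accounts"},
}


def extracts_independent(mart_sources):
    """Each mart pulls every table it needs straight from the operational system."""
    counts = Counter()
    for tables in mart_sources.values():
        counts.update(tables)
    return counts


def extracts_warehouse(mart_sources):
    """The warehouse pulls each distinct table once; the marts read from the warehouse."""
    distinct_tables = set().union(*mart_sources.values())
    return Counter({table: 1 for table in distinct_tables})


independent = extracts_independent(MART_SOURCES)
architected = extracts_warehouse(MART_SOURCES)

print("customer extracts, independent marts:", independent["customer"])
print("customer extracts, data warehouse:", architected["customer"])
print("total extracts:", sum(independent.values()), "vs", sum(architected.values()))
```

In this made-up configuration the customer table is read five times per batch window under the independent approach and once under the warehouse approach, and the gap widens with every additional mart that shares a source table.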
As previously discussed, each independent data mart is built by an autonomous team, typically working for a separate department. As a result, these data marts are not integrated and none of them contains an enterprise view of the corporation. Therefore, if the CEO asks the IT department to provide him with a “listing of our most profitable customers,” each data mart will offer a different answer. Having worked with a company that had experienced this exact situation, I can attest that the CIO is rarely pleased to have to explain why his department cannot answer this seemingly simple question. In this company’s case the CIO and his directors were removed from their positions.
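One way this can happen is that each department encodes its own definition of “profitable.” The following sketch uses made-up customers, figures, and three hypothetical departmental rules; it is not drawn from the client described above.

```python
# Made-up customer figures, shared here only so the comparison is easy to follow.
CUSTOMERS = {
    "Acme": {"revenue": 1000, "cost_of_goods": 700, "support_cost": 250},
    "Globex": {"revenue": 800, "cost_of_goods": 500, "support_cost": 20},
    "Initech": {"revenue": 600, "cost_of_goods": 200, "support_cost": 300},
}


def sales_rule(c):
    """Hypothetical sales mart: ranks customers by revenue alone."""
    return c["revenue"]


def finance_rule(c):
    """Hypothetical finance mart: revenue minus cost of goods."""
    return c["revenue"] - c["cost_of_goods"]


def service_rule(c):
    """Hypothetical service mart: also subtracts support cost."""
    return c["revenue"] - c["cost_of_goods"] - c["support_cost"]


def most_profitable(customers, rule):
    """Return the customer name that scores highest under the given rule."""
    return max(customers, key=lambda name: rule(customers[name]))


answers = {
    "sales_mart": most_profitable(CUSTOMERS, sales_rule),
    "finance_mart": most_profitable(CUSTOMERS, finance_rule),
    "service_mart": most_profitable(CUSTOMERS, service_rule),
}
print(answers)
```

Here the three marts name three different customers even though they start from identical figures. In practice the marts would also hold differently extracted and cleansed copies of the data, which makes the answers diverge further.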
Why Do Independent Data Marts Exist?
With all of these architectural flaws it would seem surprising that so many companies have built their data warehousing systems around this architecture. There are several reasons why this aberration has occurred.
Data Warehousing Systems Are Complex
When the decision support craze spread, most companies were looking to build a data warehouse of their own. Unfortunately, the task of building a well-architected and scalable business intelligence system is complicated and requires sophisticated software, expensive hardware, and a highly skilled and experienced team. Finding data warehouse architects and project leaders who truly understand data warehouse architecture is a daunting challenge, both in the corporate and consulting ranks.
In order to construct a data warehouse, a corporation must truly come to terms with its data and the business procedures that the data represent. While this task is challenging, it is a necessary step and one from which the true value of the data warehousing process is derived.
Independent Data Mart Shortcut
Building independent data marts is less expensive than building an architected data warehousing system. In addition, independent data marts can be constructed fairly quickly and, unlike a data warehouse, do not require a company to really understand its data beyond the level of individual departments. These points have been effectively used to sell the concept of constructing independent data marts. Unfortunately, it is this lack of thorough analysis and long-term planning that prevents independent data marts from being an effective business intelligence system.
Inappropriate Vendor Messages
Many vendors (both consulting and software) have developed tools and methodologies that are effective at building small departmental independent data marts. These companies, in their rush to market with these tools, have worked very hard at selling the independent data mart concept (of course it is never worded like this). The reasons are obvious. These companies can significantly reduce their sales cycles because only one department is involved in the software purchasing decision. In addition, their software requires much less sophistication because it merely needs to build a standalone data store.
My next column, the second part of this three-part series, will take an in-depth look at how to migrate off of this flawed architecture. In that article we will present the two approaches for migrating from independent data marts, identify the necessary initial corporate decisions, describe methods for identifying the migration path to the architected solution, and walk through an independent data mart migration case study.
* It is important to note that this chart is an actual client’s data warehousing architecture schematic. I’m proud to say that they are no longer on this architecture.