Data Warehouse Goals and Objectives: Part 1
By Larissa Moss
Traditional Decision Support Systems
Larissa Moss and Sid Adelman
This article is excerpted from a book tentatively titled, Data Warehouse Project Management (Addison-Wesley, Spring 2000). All material is copyright Addison Wesley Longman, Sid Adelman and Larissa Moss. It is the second in a three-part series on Data Warehouse Goals and Objectives.
In this article, we will examine the traditional decision support systems (DSS) and the reasons why they have failed to provide complete, correct, and timely information to the organization. In the two follow-up articles, we will describe how short term and long term data warehouse objectives address the deficiencies of traditional DSS environments.
Shortly after manual processes were automated, management in all companies and across all industries started to ask for data from their newly automated systems. Their requests were heard and promptly put on the backlog, because the main focus of the time was still on automating processes. However, the hunger for data was so great, and the tools for manipulating that data were so limited, that it did not take long for data processing to split into two segments: Operational Systems and Decision Support Systems (DSS).
What happened during the decades that followed is a familiar story to all of us. Decision Support Systems could not be built fast enough; a new market for DSS tools was created; business units started to hire their own programmers, and more recently some business units are even buying and managing their own technology platforms. Even the term “data processing” underwent some transformations to reflect the shift from automating processes to providing information. Data Processing (DP) became Information Systems (IS) and Management Information Systems (MIS) and now Information Technology (IT). Newer terms like Corporate Information Factory and Business Intelligence (BI) are further indications of this shift.
On the surface it appears that all is well now; that all information needs are being met; that all participants are satisfied. Or are they? Let’s look under the covers and see what’s wrong with today’s DSS environment. When the impact of the deficiencies of our traditional DSS environment is really understood and felt, the goals and objectives of a data warehouse environment become very clear.
Departmental Views of Data
The speedy proliferation of traditional DSS environments happened in two ways.
IT technicians dedicated to one or more business unit(s) – This is the most common support model. One or more business units are matched up with an IT unit which is dedicated to solving the decision support needs for those business units. The IT unit may be a formal unit within the IT organization, or it may just be an informal group of two to three people whose priority it is to serve those business units.
Business units’ own technicians – With the pervasiveness of client/server, this support model is becoming more and more popular. Analysts and programmers are hired directly by the business units, and in many cases they develop and implement their systems on a hardware/ software platform owned and maintained by their business unit.
In both cases, the technicians are exclusively supporting their business units. Independent from each other, they analyze the requirements, define the data, find the most suitable operational system from which to source their DSS, apply the most appropriate “spin” on the data, and basically cater in all ways to the business unit they are supporting This results in independently developed stand-alone systems. It requires little examination to understand that both models result in DSS deficiencies when seen from the organizational point of view.
As we look at some of the most common DSS deficiencies today, we realize that they are so prevalent, and that we have become so accustomed to them, that we almost accept them as part of the fabric of system development. Those involved may think of the situation as “job security”. As long as these deficiencies exist, there will always be a need to build another system, write another bridge between two systems, rewrite the system in another language, and buy another tool. However, if we seriously think about the impact of the situation, we realize the tremendous waste of time, money and resources we have also been accepting for all this time.
Just what is the impact for the most common DSS deficiencies? Here are some examples:
Data is not understood – This applies in a lesser degree to the original users of a DSS who had the system custom built. They probably understood their data, or at least they thought they did. However, new users, either in the same business unit or from other business units, who want to use the data often do not have the same understanding of it. The data names are often reused and misleading, the data content has changed over the years, and how the data is used in reports may also not be self explanatory. These new users must now take the time to search through documentation, if any exists, or ask other people who have been using the data to explain it to them. Most of the time these explanations are not documented and repeated over and over. And if they are documented, the documentation seldom leaves the original business unit, is almost always out of date, and its existence is often not known to everyone else in the organization. The time spent by the new users and the staff assisting them to learn about the data is wasted time.
Users disagree on data definitions – Since the traditional DSS are developed independently of each other by different people, data is interpreted and used in a business unit’s view and not an organizational view. Where it is possible and appropriate for different business units to have their own view on the data, the absence of reconciling and capturing these views in meta data leads to arguments between users about what the data means, what its values should be, and how it should be used. Again, a lot of unnecessary time is spent on these arguments.
Reports are inconsistent – When users don’t even realize that they have a different interpretation and view of data, they may reject reports that show different totals or a different breakdown than their own. They will further label these other reports, along with the systems that generated them, as “bad”, “wrong”, “useless”, “unreliable”. Too often these labels spill over to the innocent staff that developed the other systems and reports. Not only is energy wasted on arguing about who is right and who is wrong, but the bigger damage is done when different groups no longer respect each other and the work atmosphere is degraded. The impact is lower morale, which usually manifests itself as lower productivity and lower quality.
Users don’t trust the reports – This is often the result when the communication between IT and users is limited, and when IT develops the system with little or no user involvement besides gathering the original requirements. Here we have a situation where the understanding of data is not only different between business units, but also between the business unit and its supporting IT staff. To make things worse, not only may IT lack understanding of the business view of the data, but the users often don’t understand how the data is being captured and manipulated in the operational system. In this “us versus them” environment, users often try to handle their distrust by creating their own pseudo-IT environment. The impact of this solution surpasses low morale and low productivity into tangible costs for duplicating technology platforms.
Data is “dirty” – This is a complaint we hear frequently from users and IT alike. Yet, when faced with the task of analyzing the operational source data and cleaning it up for the DSS, we hear equally frequently such excuses as:
“We are used to the dirty content in this field”
“We are too busy with other things”
“We will just write our queries to eliminate certain records”
“We know how to interpret the bad data values”
“It would take much too long to clean it up”
“It would cost too much to clean it up”
“They will never tighten edit checks on the operational system”
“It’s really not that bad, once you get used to it”
When confronted with the fact that every other person who does not intimately know the dirty data would need a lot of time to learn how to “eliminate certain records” or to “interpret bad data values”, there are usually two types of responses from the users:
- “They’ll just have to take the time to learn. That’s what we had to do. That’s just how it is.”
- “This is our data. No one else needs to access it. They should call us when they need our data.”
Needless to say, the time it takes for many people to relearn the same facts is not being taken into account, and neither is the associated cost being considered.
- Data is not shared or shared reluctantly – Because traditional DSS development has been business-unit centric it is no surprise that data sharing is neither encouraged nor sought after. Users who have full control over their systems in terms of data definitions, data acquisition, data cleansing, data transformation, database design, and tools used, have no incentive to share what they develop. After all, since the funding for their system came out of their own budget, they “have the right” to be in full control of what goes into the system, what comes out of it, and who uses it. And there certainly are no incentives to give up that control, especially if upper management is still holding them accountable for delivering the system in a very short time frame. Sharing means involving other users. Involving other users means reconciling their views of the data. Considering other views will slow down development. Slowed down development could result in missed deadlines. Missed deadlines will be remembered at performance appraisal time. An unfavorable performance appraisal will result in a small raise, or no raise at all, and most likely no bonus for the year. On the other hand, by keeping things separate the raises and bonuses are safe, even though different users will spend time and money to reinvent the proverbial wheel.
- Data is not integrated – Even if there were a willingness to allow other users to share their data, as long as business units build their stand-alone systems based on their individual views, data between systems will not be integrated. Therefore, accessing data across multiple systems often involves writing complicated bridges between systems; a time consuming and costly solution.
- Historical data is not available – We cannot conclude this section without talking about another paradigm shift in DSS: the change from operational decision making to tactical and strategic. The user community of the DSS environment is also changing from business administrators to business analysts and executives from such business units as marketing, legal, finance and human resources. The new focus on tactical, and especially on strategic decision making, brings a new requirement to DSS: the ability to compare data between time periods, geographic regions, and other business dimensions. This translates into the need for historical data which is as easily accessible as current data. Since traditional DSS do not typically store historical data in the same manner as current data, analysts end up creating new elaborate systems to accomplish their analyses. It can be weeks before an analyst can complete an analysis assignment, because that is how long the process of extracting current data, merging it with historical data, running queries and analyzing the query results may take each time.
Data Management Solutions
Data warehousing is not the first attempt at tackling the data management problems discussed above, but it seems to be, if done correctly, the most effective so far.
Data Management became a major topic in most organizations in the early 1980’s. Information Engineering was created as a new IT unit, chartered with developing and applying methods and techniques to manage the organization’s data the same way as any other corporate resource is managed.
One of the first methods for managing data was the corporate data dictionary. Information Engineers spent years loading a central data dictionary with technical data from their operational systems. They ended up with thousands of data elements, hundreds of files, hundreds and thousands of programs and job control language (JCL) procedures representing all of their systems. It took many more years to analyze and define all the accumulated systems-related data in the data dictionary. This was an honorable first attempt at gathering and maintaining meta data. However, the only visible benefit of this exercise appeared to be a tremendous understanding of corporate data by one or two IT analysts, and not much else.
Most companies decided that this was not a cost-effective approach to solving their data management problems, and Information Engineering is now an affectionate memory of “the good old past”.
Since the idea was good but the concept was unworkable, Information Engineers quickly re-invented themselves as Data Administrators. Their mission was the same, but their method was different. A new technique was gaining much popularity during the rise of relational database management systems. This technique was called Entity-Relationship modeling, or logical data modeling. There were some distinct benefits to the logical data modeling approach over the old data dictionary approach:
Top-down analysis – Logical data modeling was based on user participation. Having users involved in the analysis shortened the process considerably.
Business-centric – The data was being analyzed from a business perspective, not from a technical perspective. The benefit was that users, not just one or two IT analysts, now had a tremendous understanding of corporate data.
Relational design – But what really saved data administration from becoming another “extinct IT species” was the direct applicability of a logical data model to relational database design. With few modifications to a logical data model we were actually able to implement the model as an application database.
Data Administration is still aiming for more than managing the data for isolated stand-alone application databases. The charter of Data Administration remains the management of corporate data across all business units and all systems in the organization.
Data Warehouse Administration
The mission of a Data Warehouse is to provide consistent and reconciled business intelligence, which is based on operational data, decision support data, and external data, to all business units in the organization. In order to do that, corporate data must be analyzed, understood, transformed and delivered. Therefore, the Data Warehouse Administration must coordinate and oversee the development, delivery, management and maintenance of the entire data warehouse environment.
It has been difficult for IT professionals to keep up with user demands for information. Many DSS projects are running in parallel to satisfy the different business units of an organization. Since these parallel development activities are neither coordinated nor integrated, they are producing stand-alone DSS systems. These stand-alone systems do not provide an acceptable data management solution because of their inherent deficiencies.
In our next article, we will examine some short term data warehouse objectives that will reduce the deficiencies of traditional decision support systems.
About the Authors
Larissa Moss is president of Method Focus Inc., a consulting firm specializing in data warehousing. Larissa has published numerous articles on various data management topics. She frequently teaches seminars in the United States, Canada and Europe.
Larissa can be reached at firstname.lastname@example.org
Sid Adelman is a charter member of Business Information Alliance. He co-authored a methodology and project planning product tailored specifically for data warehouse. Sid is an international speaker at data warehouse and industry conferences.
He can be reached at email@example.com