Data Warehouse Failures
By Sid Adelman
“They couldn’t hit an elephant at this dis—-“ General John B. Sedgwick –The general’s last words at the Battle of Spotsylvania, 1864.
This article will present the types of failures that have been experienced by various data warehouse projects. There is disagreement over the failure rate of data warehouse projects. Rather than contribute to the debate we will detail the types of situations that could be characterized as failures, and leave it to the reader to decide if they truly constitute failure.
The Project Is Over Budget
Depending on how much the actual expenditures exceeded the budget, the project may be considered a failure. The cause may have been an overly optimistic budget or the inexperience of those calculating the estimate. The inadequate budget might be the result of not wanting to tell management the bitter truth about the costs of a data warehouse.
Unanticipated and expensive consulting help may have been needed. Performance or capacity problems, more users, more queries or more complex queries may have required more hardware or extra effort to resolve the problems. The project scope may have been extended without a change in the budget. Extenuating circumstances such as delays caused by hardware problems, software problems, user unavailability, changes in the business or other factors may have resulted in additional expenses.
Most of the factors listed in the preceding section could also have contributed to the schedule not being met, but the major reason for a slipped schedule is the inexperience or optimism of those creating the project plan. In many cases management wanting to “put a stake in the ground” were the ones who set the schedule by choosing an arbitrary date for delivery in the hope of giving project managers something to shoot for. The schedule becomes a deadline without any real reason for a fixed delivery date. In those cases the schedule is usually established without input from those who know how long it takes to actually perform the data warehouse tasks. The deadline is usually set without the benefit of a project plan. Without a project plan that details the tasks, dependencies and resources, it is impossible to develop a realistic date by which the project should be completed.
Functions and Capabilities Not Implemented
The project agreement specified certain functions and capabilities. These would have included what data to deliver, the quality of the data, the training given to the users, the number of users, the method of delivery e.g. web based, service level agreements (performance and availability), pre-defined queries, etc. If important functions and capabilities were not realized or were postponed to subsequent implementation phases, these would be indications of failure
If the users are unhappy, the project should be considered a failure. Unhappiness is often the result of unrealistic expectations. Users were expecting far more than they got. They may have been promised too much or there may have been a breakdown in communication between IT and the user. IT might not have known enough to correct the users’ false expectations, or may have been afraid to tell them the truth. We often observe situations where the user says jump, and IT is told to say “how high?” Also, the users may have believed the vendors’ promises for grand capabilities and grossly optimistic schedules.
Furthermore, users may be unhappy about the cleanliness of their data, response time, availability, usability of the system, anticipated function and capability, or the quality and availability of support and training.
Unacceptable performance has often been the reason that data warehouse projects are cancelled. Data warehouse performance should be explored for both the query response time and the extract/transform/load time.
Any characterization of good query response time is relative to what is realistic and whether it is acceptable to the user. If the user was expecting sub second response time for queries that join two multi-million-row tables, the expectation would cause the user to say that performance was unacceptable. In this example, good performance should have been measured in minutes, not fractions of a second. The user needs to understand what to expect. Even though the data warehouse may require executing millions of instructions and may require accessing millions of rows of data, there are limits to what the user should be expected to tolerate. We have seen queries where response time is measured in days. Except for a few exceptions, this is clearly unacceptable.
As data warehouses get larger, the extract/transform/load (ETL) process will take longer, sometimes as long as days. This will impact the availability of the data warehouse to the users. Database design, architecture, and hardware configuration, database tuning and the ETL code – whether an ETL product or hand written code – will significantly impact ETL performance. As the ETL process time increases, all of the factors have to be evaluated and adjusted. In some cases the service level agreement for availability will also have to be adjusted. Without such adjustments, the ETL processes may not complete on time, and the project would be considered a failure.
Availability is both scheduled availability (the days per week and the number of hours per day) as well as the percentage of time the system is accessible during scheduled hours. Availability failure is usually the result of the data warehouse being treated as a second-class system. Operational systems usually demand availability service level agreements. The performance evaluations and bonus plans of those IT members who work in operations and in systems often depends on reaching high availability percentages. If the same standards are not applied to the data warehouse, problems will go unnoticed and response to problems will be casual, untimely and ineffective.
Inability to Expand
If a robust architecture and design is not part of the data warehouse implementation, any significant increase in the number of users or increase in the number of queries or complexity of queries may exceed the capabilities of the system. If the data warehouse is successful, there will also be a demand for more data, for more detailed data and, perhaps, a demand for more historical data to perform extended trend analysis, e.g. five years of monthly data.
Poor Quality Data/Reports
If the data is not clean and accurate, the queries and reports will be wrong, In which case users will either make the wrong decisions or, if they recognize that the data is wrong, will mistrust the reports and not act on them. Users may spend significant time validating the report figures, which in turn will impact their productivity. This impact on productivity puts the value of the data warehouse in question.
Too Complicated for Users
Some tools are too difficult for the target audience. Just because IT is comfortable with a tool and its interfaces, it does not follow that all the users will be as enthusiastic. If the tool is too complicated, the users will find ways to avoid it, including asking other people in their department or asking IT to run a report for them. This nullifies one of the primary benefits of a data warehouse, to empower the users to develop their own queries and reports.
Project Not Cost Justified
Every organization should cost justify their data warehouse projects. Justification includes an evaluation of both the costs and the benefits. When the benefits were actually measured after implementation, they may have turned out to be much lower than expected, or the benefits came much later than anticipated. The actual costs may have been much higher than the estimated costs. In fact, the costs may have exceeded both the tangible and intangible benefits.
Management Does Not Recognize the Benefits
In many cases, organizations do not measure the benefits of the data warehouse or do not properly report those benefits to management. Project managers, and IT as a whole, are often shy in boasting about their accomplishments. Sometimes they may not know how to report on their progress or on the impact the data warehouse is having on the organization. The project managers may believe that everyone in the organization will automatically know how wonderfully IT performed, and that everyone will recognize the data warehouse for the success that it is. They are wrong. In most cases, if management is not properly briefed on the usage and the benefits of the data warehouse, they will not recognize its benefits and will be reluctant to continue funding something they do not appreciate.
There are many ways for a data warehouse project to fail. The project can be over budget, the schedule may slip, critical functions may not be implemented, the users could be unhappy and the performance may be unacceptable. The system may not be available when the users expect it, the system may not be able to expand function or users, the data and the reports coming from the data may be of poor quality, the interface may be too complicated for the users, the project may not be cost justified and management might not recognize the benefits of the data warehouse. By knowing the types of failures others have experienced you are in a position to avoid those failures. You must know what risks to anticipate with the data warehouse if you are going to deal with those risks and head them off before they sink your project.
This article is excerpted from the chapter on Data Warehouse Risks inData Warehouse Project Management,by Sid Adelman and Larissa Moss
About the Author
Sid Adelman is a principal consultant with Sid Adelman & Associates, an organization specializing in planning and implementing data warehouses, performing data warehouse and BI assessments, and in establishing effective data strategies. He is a regular speaker at “The Data Warehouse Institute” and IBM’s “DB2 and Data Warehouse Conference”. Sid chairs the “Ask the Experts” column on www.dmreview.com, and has had a bi-monthly column in DMReview. He is a frequent contributor to journals that focus on data warehousing. He co-authored one of the initial works in data warehousing, Data Warehousing, Practical Advice from the Experts, and is co-author of Data Warehouse Project Management with Larissa Moss. He is the principal author of Impossible Data Warehouse Situations with Solutions from the Experts and his newest book, Data Strategy, was co-authored by Larissa Moss and Majid Abai. He can be reached at 818 783 9634and firstname.lastname@example.org. His web site is www.sidadelman.com.