Data Warehouse Indicators of Success (Part 2)
By Larissa Moss
This article is excerpted from Data Warehouse Project Management (Addison-Wesley, Spring 2000). All material is copyright Addison Wesley Longman, Sid Adelman and Larissa Moss. It is the second article in a two-part series on Data Warehouse Indicators of Success.
In our previous article, we talked about the measures of success. In this article, we will discuss the factors that are necessary for a project to succeed, and how to measure your data warehouse results to determine if your project was indeed a success.
Critical Success Factors
If a factor or characteristic is critical to the success of a project, we shall call it a critical success factor (CSF). The absence of that factor or characteristic dooms the project. CSFs provide a measure for the completion and quality of the project. By knowing and understanding what is very important, the project manager can make a case for adequate budget, resources, schedule improvement, and management commitment.
Examples of CSFs
These CSFs are mandatory for a successful data warehouse project:
Common data definitions
The definitions in most organizations make the Tower of Babel look communicative. Every department has its own set of definitions for business terms, which are often defined differently by other departments. To make matters worse, these departmental definitions are rarely documented. Department heads assume that everyone shares their understanding of the business and shares their definition of the major business terms. Not wanting to appear stupid, most employees do not question the meaning of business terms. While it is not possible to gain definitional concurrence among departments, each project must have a glossary of business terms that support the project.
Well-defined transformation rules
As the data is brought over from the source systems to the data warehouse, much of the data will be transformed in one way or another. The data may be specifically selected, recoded, summarized, integrated with other data or changed in some other way. The rules for the transformations are critical to the users getting what they expect and need.
Users properly trained
In spite of what the vendors tell you, users must be trained and the training should be geared to the level of user and the way they plan to use the data warehouse. In addition to the tool, users should learn about the availability of pre-defined queries and reports. Users must learn about the data and the power users should have more in-depth training on the data structures. Just-in-time training will solidify and reinforce the skills learned in class as the students immediately begin using the data warehouse at the conclusion of the class.
- Expectations communicated to the users
IT is often unwilling or afraid to tell the users what they will be getting and when
Performance – Users must know that not all of their queries will have sub-second response time. A query joining two tables of ten million rows each will take minutes, or even hours, and the users should expect such.
Expectations of availability include the time and days (ex. 6 AM to 11 PM, Monday through Saturday) the system is scheduled to be accessible as well as the percentage of time planned for availability (ex. 97% availability during scheduled hours). A service level agreement (SLA) will normally document an availability agreement.
Function includes what data will be accessible and what pre-defined queries and reports are available. It also means the level of detail data as well as how the data is integrated and aggregated.
The expectation of simplicity is the ease of use. Users do not want a complex system.
The expectations of accuracy are for both the cleanliness of the data as well as an understanding of what the data means.
Timeliness is both when the data will be available (ex. Three days after month-end) as well as the frequency of refreshing the data (ex. Daily, weekly, monthly).
Schedule expectations involve when the system is due for delivery. Since not all the users will be getting access on the first implementation, each user needs to know when they will get their turn.
The expectation for support comes into play as the users have problems. Where will they go for help, how knowledgeable will the support be for the query tools, and how well will they understand the data.
All these expectations should be documented in a scope agreement. Besides the scope agreement document, every opportunity must be seized to clarify expectations for the users, especially when a casual comment may create a misunderstanding of what they will be getting.
Ensured user involvement
There are three level of user involvement:
“Build it; they will use it.”
Solicit requirements input from the users.
Have the users involved all the way through the project
Number 3 is by far the most successful approach while number 1 almost always results in failure. A commitment by the users to the project is critical to the project’s success and the users’ involvement is an excellent indicator of that commitment.
- The project has a good sponsor
The best sponsor is from the business side, not from IT. The sponsor should be well connected, willing to provide an ample budget and be able to get other resources needed for the project. The sponsor should be accepting of problems as they occur and not use those problems as an excuse to either kill the project or withdraw support. Most importantly, the sponsor should be in serious need of the data warehouse’s capabilities to solve a specific problem or gain some advantage for his or her department.
- The team has the right skill set
Without the right skills dedicated to the team, the project will fail. The emphasis is on “dedicated to the team.” It does little good to have skills somewhere in the organization if they are unavailable to the project. The critical roles should be reporting directly to the project manager. Matrix management does not allow the project manager to control these resources. Without this control, there are no guarantees that the people will be available when needed.
- The schedule is realistic
The most common cause of failure is an unrealistic schedule, usually imposed without the input or the concurrence of the project manager or the team members. Most often, the imposed schedules have no rationale for specific dates but are only means to “hold the project manager to a schedule.” Those imposing the schedule usually have little concept of the tasks and effort required.
- The project has proper control procedures (change control)
There will always be changes in the scope, but the scope must be controlled and change control must be implemented just as it is in transactional systems.
- The right tools have been chosen
The first decisions to be made are the categories of tools (extract, transform and load, data cleansing, OLAP, ROLAP, data modeling, administration, etc.). Many of these tools are expensive, not just for their initial costs and maintenance costs, but in training, consulting and in terms of the internal people required to implement and support the tool. The tools must match the requirements of the organization, the users and the project. The tools must work together without the need to build interfaces or write special code.
You are going to have to determine what the CSFs are for your organization and your project. With these in mind and after they have been documented, you will be able to compete for the scarce resources (good people, budget, time, etc.) you will need for your project.
Measuring Data Warehouse Results
The only way we will know if we are successful is to monitor and measure the project. The measurements are both subjective and objective. Just like certain medical tests, some of these measures are invasive and may have negative consequences, such as an impact on performance or the stability of the system. Some measures are costly; they require knowledgeable people and machine resources to carry them out. The clever project manager will select the appropriate metrics by evaluating both cost and impact. The metrics that should be considered are:
- Functional quality
Do the capabilities of the data warehouse satisfy the user requirements? Does the data warehouse provide the information necessary for the users to do his or her their job?
- Data quality
If the data warehouse data is of poor quality, the users will reject it. There are two means of measuring quality:
- Ask the users if their reports are accurate,
- Use a software tool to provide a scorecard on the quality of the data.
Be aware that the software tools cannot evaluate all types of data quality.
- Computer performance
There are four indicators of performance we should consider:
- Query response time,
- Report response time,
- Time to load/update/refresh the data warehouse,
- Machine resource.
Some organizations have established benchmark performance numbers for known queries and reports, and they exercise and measure these benchmarks periodically looking for impending performance problems. There are a number of tools that measure performance. Most of the database management systems have imbedded capabilities to measure database performance. Third-party utilities supplement this capability. A number of the query and report tools have response time metrics.
- Network performance
The ability of the network to handle the data traffic will directly impact response time. Network software measures line load, line traffic and indicates conditions where an activity was waiting for line availability. Besides the software, network administrators must be available to analyze the results and take appropriate action.
- User satisfaction
Users must be polled shortly after being given the data warehouse capability and then polled periodically to identify changes in their level of satisfaction and to watch trends. At the end of this article, we have a sample user satisfaction questionnaire and can be easily tailored to your organization.
- Number of queries
Many of the query tools provide metrics on the number of queries executed by department and by individual.
- What data is accessed
Many organizations have data that is never accessed. This is the result of inaccurate or incomplete requirements gathering or of the users changing their minds. Sometimes IT loads all of the source data fearing the user will ask for something they did not anticipate in the requirements gathering phase. IT has been beaten up so often by the users who want “all the data, they want to keep it forever and they want the system delivered yesterday”. There are tools that will identify what data is actually being accessed and how often.
- Satisfies scope agreement
A scope agreement documents what functions the users will be getting and when. It’s appropriate to review the scope document and determine which functions might not have been satisfied and why.
- Benefits achieved
Before the project began, you estimated the benefits, both tangible and intangible for your project. Now you need to measure the tangible benefits and make some approximations for the intangibles. Since the benefits will not materialize the first day the system is installed, measurement should wait at least two months after implementation.
No project is perfect initially. There are always opportunities for improvement. In many cases, these improvements are necessary for the project to even survive. By measuring results you and your project will be in a position to know where the resources must be directed to make the necessary improvements that every data warehouse project needs.
About the Authors
Sid Adelman is a charter member of Business Information Alliance. He co-authored a methodology and project planning product tailored specifically for data warehouse. Sid is an international speaker at data warehouse and industry conferences.
He can be reached at firstname.lastname@example.org
Larissa Moss is president of Method Focus Inc., a consulting firm specializing in data warehousing. Larissa has published numerous articles on various data management topics. She frequently teaches seminars in the United States, Canada and Europe.
Larissa can be reached at email@example.com.