Data Warehouse Risks Part 1 – Co Authored by Larissa Moss

By Sid Adelman

This article will provide you with the ability to identify the potential risks within your project and to address those risks before they become major problems. Risk is inherent in any project but the risks involved in a data warehouse project seem to be greater than in others and there are different types of risks. In this article we will present the inherent risks with all their attendant horrors and techniques to deal with each of them.

Types of Risk

All projects have risks and many of the ones discussed here are relevant to operational systems as well as to the data warehouse. Nonetheless, they must be addressed because you are likely to encounter these problems in one way or another.

No Mission or Objectives

Neither the mission nor the objectives for the data warehouse have been defined. This is a boat without a rudder. It is not clear where the data warehouse is going or what business problem it is supposed to solve. Cliches may be used but they may have no real meaning to either management or to the team.

To avoid this situation, we suggest the following remedies:

Identify the Data Warehouse Sponsor –The sponsor – usually on the business side – is someone who has a strong stake in the success of the data warehouse. It may be a department head who desperately needs the information the data warehouse will provide. Occasionally it is an IT executive who, aligned with an important business requirement, has a career stake in the success of the data warehouse.

Define the mission and objectives first –Insist – or strongly recommend that the mission and objectives be defined prior to any serious activity on the project. This is an issue of what comes first. Management may want you to get started and, perhaps, they see defining the mission and objectives as an activity that is not contributing to moving the project along. Now is the time to bring out the examples of those projects that failed or got delayed because their direction was not clear.

Develop a sample set of missions and objectives –The fastest way to build and get approval for a set of missions and objectives is to develop a sample set of missions and present this set to the IT Advisory Board and the Business Advisory Board and ask for their approval. Expect discussion, disagreement and changes to this set.

Convene a committee –Bring a small group of people together who have a strong interest in the data warehouse and where it is going. This will take a little longer than the previous approach but should result in more support from the troops. The list of missions and objectives would still need the approval of the IT and Business Advisory Boards. Carefully select the members of the committee. There will be some important influencers and decision-makers who will not have time for the committee, but be sure to at least copy them on drafts and to solicit their input.

Data Warehouse Mission and Objectives Do Not Map to Those of the Enterprise

Document explicit enterprise objectives –These objectives can be found in the Chairman of the Board’s letter to the shareholders. It can also be found in the organization’s mission statement, vision statement and in documents outlining the strategic direction. Some examples are:

  • Outstanding customer service
  • Superior quality products
  • Low cost provider
  • Fastest delivery to the market

Document implicit enterprise objectives – If there are no explicit enterprise objectives, there probably are implicit or assumed objectives to which most people in the enterprise subscribe. These should be documented and mapped to the data warehouse objectives.

If the Data Warehouse does not support the enterprise objectives – If enterprise objectives exist but the data warehouse does not support them, rethink what you are trying to accomplish with the data warehouse, and consider data warehouse applications that do support the strategic objectives of the enterprise.

Quality of the Source Data Is Not Known

In most organizations, the quality of the operational data is either unknown or grossly overestimated. Organizations often do not even know how to assess the quality of their source systems. The effort involved in transforming and cleansing the source data can be significant. Without knowing about the source data quality, it is impossible to estimate the effort and time to cleanse the data.

Data quality profiling tool – Use a quality evaluation tool to determine just how bad things are. Identify operational data with quality problems to someone high in the organization, perhaps to the CIO, but definitely to the business executive who owns the data. Identify the impact of the dirty data on the reliability of decision making, determine the cost for cleansing, determine the benefits for cleansing, prioritize the cleansing effort, and negotiate time, effort, cost, and scope with the user and the owner of the data.

Implement the cleanest first – There are often multiple files and databases that can be used for the source data. Choose the cleanest source. The data warehouse is always implemented in phases, which are discreet deliverables handed over to the users. Different phases will be using different source data. Try to structure the phases so that the first phase is able to use the cleanest source data.

Metadata reflecting quality – Quality indicators can be stored in the metadata repository. You may choose to assign one of four categories to each data element: “Pristine,” “Questionable,” “Dirty” and “Not Evaluated.”

Skills Are Not in Place

It is rare that the team initially has the right number of people in the right roles with the right skills, and that they are available at the right time. To mitigate the risks associated with shortage of skill, we recommend the following:

Define responsibilities – Define the functional responsibilities of Data Administrators, Database Administrators, Application Developers, both Back End ETL developers and Business Intelligence developers, User Liaisons and any other required roles.

Develop a project plan – By defining the resources required and when they are needed in the project plan, you will be able to explicitly identify the need for the right people with the right skills. Without such a plan, the project manager has nothing to support his or her requests.

Identify candidate resources – List all candidate staff members who are being considered for the data warehouse team and evaluate their skill levels. If there is a shortage of skills in the organization, consider supplementing the team with contractors and consultants.

Convince management – Management will often ask you to get by with a substandard team – inadequate resources and skills. You need to sell management on the need to have skilled people on the data warehouse team. You also need to convince management of the need to have these people sufficiently dedicated to the project. Management may ask you to use a person who is 100% committed to another project. This will not work. This is the time to bring out your notes from discussions with references where they told you how many experienced and trained people it required to implement their data warehouse. Based on size and project complexity, you should be able to extrapolate and provide management with a realistic resource requirement.

If you are unable to convince management on the need for resources, hire an expensive consultant (management doesn’t listen to low-cost consultants) and ask them to either present or write a resource recommendation.

Inadequate Budget

It is often difficult to know ahead of time how much a data warehouse will cost. Data warehouse budgets are often underestimated.

Research – Compile industry publications, presentations and consultant reports that indicate what a typical average data warehouse will normally cost. Watch out for those who give figures for selected subsets of the effort, for those that do not count software or hardware they already have in place or those who do not include costs assigned to some other departments.

Estimate the costs – Itemize each of the costs for your project. Don’t pad the numbers, but don’t underestimate just because you think the true cost will cause management paralysis.

Think smaller – If the costs are too high, consider a smaller project or one that does not require some expensive items (a new RDBMS, fancy tools, or major new hardware)

Lack of Supporting Software

In many cases, supporting software (ETL, cleansing, BI tools, RDBMS, etc.) have not been chosen or have not been installed in time.

Evaluate software benefits – Understand the benefit of the software to the project. If it does not benefit this specific project, justification can only be accomplished if major follow-on projects will significantly benefit from its use. However, if it does not benefit your project, don’t waste your time and energy justifying the software.

Cost of not using the software – Quantify the costs of not using the software. These costs should include the additional effort to write code or create operational procedures, the ongoing costs to maintain the code, the costs of a delayed implementation, the increased risk and the potential for reduced quality of the deliverable.

Choose the big hitters – ;Identify only the software that can make a major contribution. Avoid recommending a piece of software that is fun, leading edge, and a resume enhancer, but does not significantly contribute to the project’s success.

Source Data Not Understood

Most organizations do not have a documented understanding of the source data. The knowledge is probably in someone’s head or is documented on his laptop but this information is generally unavailable to the rest of the organization.

Inventory and model source data – If source data has been neither inventoried nor modeled, it is probably because users and IT management do not recognize the importance of these activities. Any such recommendations would probably be seen as delaying the project. In truth, the inventory and modeling effort is long and laborious. If management has not already recognized their benefits, it’s unlikely that the data warehouse project will support it either. The danger here is that, whereas stand-alone systems could get by without this type of analysis and documentation, integrated systems cannot. Do not try to model the entire organization but you must analyze and model those data elements that are targeted for the data warehouse as part of your project. Integration cannot occur unless the data relationships are known and can actually be built from the source data.

Reverse engineering – If a modeling tool is in place that has reverse engineering capability (the ability to take database definitions [DDL], capture them in the tool repository, and generate rough models), this reverse engineering could be the least costly and most acceptable starting point. Unfortunately, many existing databases are unintelligible and the reverse engineering process will require significant effort to make the documentation on those databases useful.

Weak Sponsor

For the project to succeed it needs a strong, well-placed user sponsor who makes reasonable decisions.

Solicit the best – Take your time. Make a list of sponsors and put the strongest ones at the top of your list. Research their decision support requirements and determine which problems could be well served by the data warehouse. Invite the top candidate to lunch, present your research results to that user, and try to sell them on the benefits of a data warehouse. Explain to them what they would be getting, outline what would be needed from them and from their department, and ask for their sponsorship.

Solicit the second best – If the top candidate is not agreeable, invite the second on the list. As you progress down the list and when you are down toThe User From Hell, stop and do something else. In fact, it is not productive to go too far down the list. ;

Users Not Computer Literate

There will always be users who, based on their experience and their willingness, will be open to try something new. However, there will be other users who may not be computer literate, or at least not literate with your desktop operating system, and who may be reluctant to use new technology.

Expand User Support -& ;If your users are not computer literate, budget more money for User Support. You will be getting many more calls, and explaining how to use the system. It will take longer than it would if you were supporting power users.

Adjust expectations – Computer illiterate users will generate fewer queries. Your projections of the volume of queries may have assumed users who were more comfortable with the system. Allow more time for the expected volume of queries to be achieved. Allow more time for the users to be satisfied with the system and to be able to use the system productively. Readjust your expectations and those of the sponsor.

Training – ;Slow down the training so as not to frighten the students. Be sure the students are successful in their workshops. Provide mentors in the training process.

Pre-defined queries and reports – Develop a more comprehensive set of pre-defined queries and reports and communicate their availability to the user community. Take the initiative to add pre-defined queries and reports to the library.

User friendly front-end – ;Choose an extra-user-friendly BI tool. Choose “warm and fuzzy” over “power and function”. It does not matter about the wonderful features of the tool if the users are afraid to use it. The ideal tool is both easy to learn and easy to use.

Our next column will focus on political problems, unrealistic user expectations, scope creep and changing requirements, out-of-control vendors, key people leaving the project, and the loss of a sponsor.

About the Author

Sid Adelman is a principal consultant with Sid Adelman & Associates, an organization specializing in planning and implementing data warehouses, performing data warehouse and BI assessments, and in establishing effective data strategies. He is a regular speaker at “The Data Warehouse Institute” and IBM’s “DB2 and Data Warehouse Conference”. Sid chairs the “Ask the Experts” column on www.dmreview.com, and has had a bi-monthly column in DMReview. He is a frequent contributor to journals that focus on data warehousing.  He co-authored one of the initial works in data warehousing, Data Warehousing, Practical Advice from the Experts, and is co-author of Data Warehouse Project Management with Larissa Moss. He is the principal author of Impossible Data Warehouse Situations with Solutions from the Experts and his newest book, Data Strategy, was co-authored by Larissa Moss and Majid Abai. He can be reached at 818 783 9634and sidadelman@aol.com. His web site is www.sidadelman.com.

 
Free Expert Consultation