Data Warehouse Cost
By Sid Adelman
This article is an excerpt from Data Warehouse Project Management at Addison Wesley Longman, Sid Adelman and Larissa Moss.
Not every data warehouse project will initially be cost justified. Many have been implemented without really knowing how much they will cost. Many will be authorized without even estimating the potential benefits. Projects are often started because a sponsor has the need for information and has the budget.
The Need For Cost Justification
Budgeting – Without cost justification, projects will always be in jeopardy. During future budget cycles, management will be looking for ways to reduce cost and if there is no documented reason for completing the project, they are likely to forget the flush of excitement that accompanied the project’s initiation.
Staffing – Without cost justification, staffing with the right people may be difficult. By having some real dollar numbers to back up a request, the request is more likely to be satisfied.
Prioritization – Without a cost/benefit analysis, project prioritization is difficult and management has little to compare projects other than a gut-feel that one is more important than another. Without cost/benefit analysis, the line-of-business manager with the most power is likely to get her project approved. The project that is most important to the enterprise may never be implemented.
Costs can and must be controlled. It is the project manager who has the responsibility for controlling costs along with the other responsibilities. Adhering to the Project Agreement is a major start for controlling costs. The Project Agreement specifies the data that will be in the data warehouse, the periods for which the data is kept, the number of users and predefined queries and reports. Any one of these factors, if not held in check, will increase the cost and possibly the schedule of the project. A primary role of the project manager will be to control scope creep.
User Support staff or the Help Desk staff will be the users’ primary contact when there are problems. Providing adequate User Support will require more people, and more training of those people, to answer questions and help the users through difficult situations. The cost for the additional people, the training and possibly an upgrade in the number and knowledge-level of the staff answering the phones must be added into the data warehouse costs.
Consultants and Contractors
Consultant and contractor expenses can balloon a project’s cost. Consultants are used to supplement the lack of experience of the project team, contractors are used to supplement the lack of skilled personnel. There are two types of consultant/contractors:
Product specific contractors – These persons are brought in because they know the product. They can either help or actually install the product, and they can tune the product. They will customize the product, if it is necessary. The product-specific consultants may either be in the employ of the tool vendor or may be independent. An example of their services would be installing and using an ETL tool to extract, transform and load data from your source files to the data warehouse. In this activity they may be generating the ETL code on their own or working with your people in this endeavor. To control costs, the most important thing is to have the right person(s) on your staff work with the consultant/contractors and absorb their knowledge and understand the process. The acquired knowledge would allow your staff to perform this work in the future and to maintain what has already been done. Your goal is to make your staff self-sufficient as soon as possible.
General data warehouse consultants – These consultants may have a specific niche such as data modeling, performance, data mining, tool selection, requirements gathering or project planning. They will typically be involved for a shorter period of time than the product-specific consultant/contractor. They have two roles that are equally important. The first is working with your people to complete a task such as selecting a query tool or developing a project plan. The second is the knowledge transfer to your staff so they can perform the activity the next time on their own. Just as in the case of the product-specific consultant/contractor, your goal is to make your staff as self-sufficient as soon as possible.
Very often, contractors and consultants are asked to participate in activities beyond their original statement of work. These may or may not be useful activities and they may or may not contribute to the success of your project. Keep the contractors and consultants focused on their stated objectives and on your project and, if they are working on something other than your project, be sure some other department is paying for them.
You didn’t marry the contractors and consultants. They are not there until death do you part (or when the project is cancelled). Don’t let them get too comfortable. Watch their time, their activities and their deliverables. By having specific deliverables associated with their contracts, they are more likely to complete their projects on time and within your budget. Do not bring them in too early and do not keep them beyond the time when they no longer make significant contributions to your project.
The software products that support the data warehouse can be very expensive. The first thing to consider is which categories of tools you need. Do not bring in more categories of products than you need. Do not try to accomplish everything with your first implementation. Be very selective.
Hopefully, you have someone in your organization experienced in dealing with vendors and understanding their contracts. You will be working closely with this person. They will know the things to watch out for in a contract, but you will need to give them some help to acquaint them with data warehousing. You will also have to give them some warning if you heard anything negative about the vendor. Your contract people will know how to include protection in the contract to keep the vendor from arbitrarily raising their prices. They will know how to control maintenance costs. They will know about protection if the vendor sells out to a less-accommodating company. They will know about enforcing satisfactory service from the vendor. If you do not have anyone in your company who can perform these functions, engage a procurement consultant.
Most of the products have base prices and many add-ons. For example, add-ons with the ETL tools could include additional costs for each different type of source file or target database. The tools are often priced based on the platform (MVS, UNIX, and NT) and the size of the platform. For the query tools, they are based on the number of seats and these could either be designated users or concurrent users. When you talk to the references, you will want to ask them what they planned to buy from the vendor, and what they had to eventually buy. There often are surprises.
Anticipate growing your environment. The growth will be in the number of users, the size of the database, much more machine resources necessary to perform the more complex queries, an extension to users beyond your enterprise (customers and suppliers), and possibly, the need to migrate to a more robust and better performing platform. Be sure your contract allows for these types of growth factors, and that the growth, while it may increase, does not explode your costs.
Your organization most likely already has an RDBMS. Should you have to pay for it as part of your data warehouse project? If there is a site license, there may be no charge to your department or you may have to pay a portion of the site license. You may have to pay if the data warehouse will be on another CPU, and if the RDBMS is charged by CPU. You may have to pay an upgrade if the data warehouse requires going to a larger CPU, and if there is an additional cost for the larger CPU.
What if you already have a site license for a query tool or report writer? You may have to pay a proportional percentage of the cost or at least a proportional percentage of the yearly maintenance fee. A reasonable method of assigning proportionality is by the number of users your project supports.
Capacity planning for a data warehouse is extremely difficult because:
- The actual amount of data that will be in the warehouse is very difficult to anticipate.
- The number of users will also be difficult to estimate.
- The number of queries each user will run is difficult to anticipate.
- The time of day and the day in the week when the queries will be run is difficult to guess (we know there will not be an even distribution, expecting more activity at month-end, etc.).
- The nature of the queries, the number of I/Os, the internal processing is almost impossible to estimate.
All these unknowns mean that whatever hardware is chosen, it must be scalable. It must be able to scale and grow to at least three times the largest anticipated size.
The turnpike effect may cause you to underestimate the resources needed. (The number of lanes needed for turnpikes was based on the traffic in the area. When the turnpikes were built, more traffic was attracted because of the road’s convenience. This meant the turnpikes were underbuilt). If the data warehouse is successful, it will be used more than anticipated, and an increasing number of users will want to use it. It will be used for more functions than were originally considered. More data will be desired. The data will be expected at a more detailed level and for more historical periods (five years instead of two years) will also be desired.
For the data warehouse, you will need CPUs, disks, networks and desktop workstations. The hardware vendors can help size the machines and disks. Be aware that unanticipated growth of the data, increased number of users and increased usage will explode the hardware costs. Existing desktop workstations may not be able to support the query tool. Do not ask the query tool vendor for the minimum desktop configuration. Ask for the recommended configuration. Call references to find out if and how they had to upgrade their desktop workstations.
Raw Data Multiplier
There are many debates over how much disk is needed as a multiplier of the raw data. Besides the raw data itself, space is needed for indexes, summary tables and working space. Additional space may be needed for replicated data that may be required for both performance and security reasons. The actual space is very dependent on how much is indexed and how many summary tables are needed. The summary tables may be created as you learn more about what the users are asking for and how often they ask. From this information, you will be creating summary tables, one at a time. The RDBM vendors should be able to help you with estimates of how much working space you will need. We did a study of what other data warehouse installations were using as a multiplier. It was highly varied but we arrived at a five times multiplier– if you have 100 gigabytes of raw data you should provide for 500 gigabytes of disk space.
How should you account for existing hardware that can be used for the data warehouse? It may mean you do not have to buy any additional hardware. The Y2K testing may have required hardware that is now redundant and unused. Should that be included in our data warehouse cost? It is a safe assumption that your organization will need additional hardware in the future. By using the redundant hardware for the data warehouse, it means that additional hardware for non-data warehouse purposes must be purchased sooner. You may be able to defer the cost of the redundant hardware; you will eventually have to pay. At the time the hardware is purchased, it will undoubtedly be less than today’s costs.
Controlling Hardware Costs
Your ability to control hardware costs will depend primarily on whether your organization has a chargeback system. Even though department heads are supposed to have the best interests of the organization at heart, what they care most about is meeting their performance objectives. These, of course, include the costs assigned to their department. If department heads are paying for what they get, they will be more thoughtful about asking for resources that may not be cost justified. We had an experience with a user asking to store ten years worth of detailed data. When he was presented with the bill (an additional $1.5 million), he decided that two years worth of data was adequate.
Internal People Costs
These people are getting paid anyway regardless of whether we use them on this project or not. Why should we have to include their costs in our budget? We have to assume these people would be working on other productive projects. Otherwise, there is no reason for the organization to keep them employed. Count on having to include the fully burdened costs of the people on your project. Keep in mind that you are much better off with a small team of highly skilled and dedicated workers than with a larger team of the type of people to avoid for your project.
User training is usually done on the premises and not at a vendor site. There are four cost areas for user training that must be considered.
- The cost to engage a trainer from the outside or the time it takes for your in-house trainer to develop and teach the class.
- The facilities including the desktop workstations for the workshop.
- The time the users spend away from the job being in class, and the time it takes them to become proficient with the tool.
- If not all the users are in the same location, travel expenses for either the users or the trainer must be included.
Generic training may be appropriate. Examples are classes in logical data modeling, data warehouse project management or star schema database designs. Data warehouse conferences and seminars can provide an overall perspective as well as training in specific areas. IT will need to attend training on the complex tools and products. IT will also need enough time to work with the products to become proficient. The cost of training is sometimes included in the price of the tool.
Most organizations focus on the cost to implement the initial data warehouse application and give little thought to on-going expense. Over a period of years, the continuing cost will very likely exceed the cost of the initial application. The data warehouse will grow in size, in the number of users and in the number of queries and reports. The database will not remain static. New data will be added, sometimes more than for the initial implementation and the design most probably will change, and the database will need to be tuned. New software will be introduced, new releases will be installed and some interfaces will have to be rewritten. As the data warehouse grows, the hardware and network will have to be upgraded.
User Support of the data warehouse will be an ongoing expense both from the maintenance team and the Help Desk. Desktop workstations will need to be upgraded as will the software on those workstations.
Total Cost of Ownership
The total cost of ownership (TCO) includes all the direct and the indirect costs. TCO includes costs that are normally hidden from a budget such as the additional time the users spend working with the data warehouse directly, assuming that in the past they sent their requests to IT. TCO includes costs you incur initially as well as costs in future periods. If your recruiting requires using placement agencies, they must be included. Insurance, tax, floor space, utility costs will all be part of the TCO. The TCO should be calculated for the initial implementation and for projected costs in subsequent years for the enhancement and sustenance of the data warehouse.
About the Author
Sid Adelman is a principal consultant with Sid Adelman & Associates, an organization specializing in planning and implementing data warehouses, performing data warehouse and BI assessments, and in establishing effective data strategies. He is a regular speaker at “The Data Warehouse Institute” and IBM’s “DB2 and Data Warehouse Conference”. Sid chairs the “Ask the Experts” column on www.dmreview.com, and has had a bi-monthly column in DMReview. He is a frequent contributor to journals that focus on data warehousing. He co-authored one of the initial works in data warehousing, Data Warehousing, Practical Advice from the Experts, and is co-author of Data Warehouse Project Management with Larissa Moss. He is the principal author of Impossible Data Warehouse Situations with Solutions from the Experts and his newest book, Data Strategy, was co-authored by Larissa Moss and Majid Abai. He can be reached at 818 783 9634and firstname.lastname@example.org. His web site is www.sidadelman.com.