Data Warehousing: a Short Overview
By Anne Marie Smith, Ph.D.
Many corporations are experiencing significant business benefits using data warehouse technology. Users report gains in market competitiveness (increased revenue) and information management (reduced costs).
- Data: observable, recordable facts.
- Information: an organized collection of facts in a meaningful context; data given relevance and purpose.
- Operational system: an environment of data and programs used to support the continuing activities of an organization.
- Informational Data Warehouse: a collection of data and programs used to support analysis and decision-making needs, separate from the operational systems.
A data warehouse is a separate architecture used to maintain critical historical data that has been extracted from operational data storage and transformed into formats understandable to the organization’s analytical community.
Data integrity is a major issue within most organizations, and the development of a data warehouse is frequently used as a vehicle to improve data quality significantly. Accuracy in data can mean realized savings of thousands of dollars in areas such as marketing, customer service, and finance. Many studies by organizations such as Gartner Group and Innovative Systems point to the savings obtained from a 4% increase in the integrity of data in many diverse companies.
Some of the benefits of the development of a data warehouse include:
- more accurate predictions of customer demand based on the use of trends analysis
- the response improvement in direct-marketing campaigns through the use of household demographics and current customer analysis
- the improvement in vendor relations and price reductions by targeting selected vendors with increased levels of purchasing over the enterprise
- significant savings from improved data quality across the enterprise
Costs associated with the development of a data warehouse can be identified to include:
- Relational Database Management Systems
- Data Warehouse Administration software
- Repository and Information Directory software
- Decision Support and Analysis software
- Middleware (interfaces)
A data warehouse provides the foundation for information processing by creating an integrated database of critical, subject-oriented, historical data for analysis. The data in the data warehouse has been integrated; that is, data from various operational systems has been combined to form a consistent view of a subject, such as customer data. This data has been reformatted, edited and cleansed to permit compatible results from analysis. The data is also historical, enabling analysis over time to detect trends and variations in data patterns.
The need for better quality information that can be more easily accessed and analyzed is a primary focus of the data warehousing efforts at many companies. The most common application areas so far are sales, marketing, customer information, finance, procurement, actuarial and statistical analysis, and asset and risk management. Another reason for developing a data warehouse is the need to empower users by providing access to data scattered over many disparate systems, thus saving IS staff time and improving user and systems productivity.
Analysis and decision making usually requires the integration of data from across multiple subject areas and systems, such as customers, product usage and billing. It may also require the inclusion of data from sources outside the company’s applications, such as demographic data. This is very different from operational systems, where all the data needed for an application is collected and used within that application. Performance issues are also different between operational systems and analytical systems. Operational transactions place a fairly consistent burden on computing resources. Querying functions, however, tend to be unpredictable, with alternating periods of intense activity and minimal use. By providing two distinct environments, one for operations and another for analysis, a company can tailor each environment to meet the specific needs.
A data warehouse serves as the focus for analytical and decision making querying and reporting. Analysis and decision making can mean executive information systems (EIS) with highly summarized data structures; managerial analysis summarizing departments or product lines; workstation analysis with more detail than in managerial analysis but still aggregated for trend and other types of analysis.
Data warehouses are built iteratively; that is, each subject area is built as a separate project. The extremely poor performance of project plans that called for a massive single development of all subject areas strongly suggests that the iterative approach be used.
A data warehouse architecture will provide many lasting benefits to many companies. Competitive advantage, improved knowledge of relationships among products and services and their performances, analytical and decision making gains can all be realized by integrating a data warehouse into your information environment.