Data Strategy
By Sid Adelman
A Chief Financial Officer (CFO) was approached by the CEO and asked for an accounting of the company’s financial assets. The CFO gave a vague response that revealed a lack of knowledge of the corporate bank accounts, little idea of what was in each account, and no idea about the status of accounts receivable. When the Board of Directors asked the CEO about the intended use of the corporate assets, the answer was “there is no plan for their use.” The CFO and the CEO were soon pursuing new personal interests.
Data is a primary asset in IT, yet most CIOs, if asked about the assets under their control, would be forced to respond that there are no plans for the use of this primary asset. They would have to admit that there is no inventory of data, that little is known about what data is in which database, that they have no idea about the quality of the data, and finally that “there is no plan” for the productive use of this asset.
Current Status In Contemporary Organizations
Very few organizations, large or small, have a well-defined data strategy. If asked, some will point you to dusty, outdated volumes of database standards, usually geared specifically to their Database Management System (DBMS). The more advanced organizations will have a subset of standards and perhaps a documented strategy on portions of what should be included in an overall strategy.
In most organizations, the value of data is not well understood. Data is considered the province of the department that creates it and is often jealously guarded by that department under the guise of data ownership, which to the mind of the department head means secreting the data from all others in the organization. Some of the more astute people in your organization may recognize that data, and the information derived from that data, translate into real power, and that control of that data can be the steppingstone to advancement.
A data strategy is usually addressed piecemeal. A company will launch an effort to choose its preferred DBMS or will attack a database performance problem when response time becomes excessive. Rarely do organizations work from the big picture, and as a result they sub-optimize solutions, introduce programs which may have a deleterious effect on the overall enterprise, cause inconsistencies that result in major efforts for interfacing, or develop systems that cannot be easily integrated.
As application packages or enterprise resource planning (ERP) systems are introduced, they bring with them a plethora of diverse standards, naming conventions, codes, and DBMS platforms. Modifying these packages to conform to the organization’s standards is unthinkable and so the data Tower of Babel becomes even more unmanageable.
Most organizations have databases running rogue and stealth applications. These are often on Excel or on Access but they may also be on SQL Server or on some other DBMS. The existence of these applications is often unknown (and purposely undisclosed) to IT. They came into existence because a department needed an application and its users were unwilling to wait for IT to deliver, because the department wanted total control, or because it did not want to conform to IT’s standards. These departments either developed the applications themselves or hired an outside consultant to write them. The choice of the platform, the DBMS, and the development language is usually determined by the skills of those writing the application. There is almost no thought given to conforming to organizational standards and certainly none to integrating with other applications.
Why A Data Strategy Is Needed
Working without a data strategy is analogous to a company allowing each department and each person within each department to develop their own financial chart of accounts. This empowerment would allow each person in the organization to choose their own numbering scheme. Existing charts of accounts would be ignored as each person exercised his or her own creativity. Even to those of us who don’t wear green eye shades, the resulting chaos is obvious and easy to predict.
The chaos without a data strategy is not as obvious, but the indicators abound: dirty data, redundant data, inconsistent data, inability to integrate, poor performance, terrible availability, little accountability, users who are becoming increasingly dissatisfied with the performance of IT, and the general feeling that things are out of control.
Without a data strategy, the IT people within the organization have no guidelines for making the decisions that are absolutely crucial to the success of the IT organization. In addition, the absence of a strategy gives a blank check to those who want to pursue their own agendas including attachment to certain technologies, or Machiavellian aspirations of power. This includes those who want to try a new DBMS, new technologies (often unproven), and new tools that may or may not be appropriate. This type of environment provides no checks or validation for those who might be pursuing a strategy that has no hope for success.
A data strategy should result in the development of systems with less risk and a much higher success rate. It should also result in much higher quality systems. A data strategy provides a chief technology officer (CTO) and CIO with a rationale to counter arguments for immature technology and for data management approaches that are inconsistent with existing strategies.
Value Of Data As An Organization Asset
Organizations have data on their customers and their suppliers, as well as transactional data that captures the heart of the business: the purchases, sales, customer calls, and activities, along with financial data. This data has value, which means it is an asset that is just as important as, if not more important than, the buildings, the parts inventory, accounts receivable, and equipment assets of the organization. Some organizations that sell their data do carry it as an asset on their books and this may be considered intellectual capital. When a company gets evaluated for acquisition, evaluated as a merger candidate, or appraised by Wall Street, the notion of a going concern includes the value of the data as an asset. The accounting method adopted in Europe and the United States is referred to as “fair-value accounting.” This means accounting for previously unrecognized assets and placing those assets on the books. An asset has future value, and an organization’s data certainly has future value, but this value has rarely been reflected on a company’s books. However, this value is lurking there, sometimes as sound management, excellent technology, or goodwill, and sometimes it is reflected in the price of the stock. It will be difficult to properly value this asset, especially data that has been in the organization for some time.
Why would we need to assign value to data, since an organization cannot exist without the data that supports its applications? The reason is that budgets and resources are limited, and it may be difficult to get the needed budget and the right people. By showing the business value of data, the person making the request should find the budget and the right staff easier to acquire.
New federal regulations require the CEO and the CFO to certify the accuracy of what is reported to the investment community and to the governmental regulators. Such certification would be greatly improved if the organization had a strong and viable data strategy.
Components Of A Data Strategy
There are a number of key components in a data strategy. They are:
- Data integration
- Data quality
- Metadata
- Data modeling
- Organizational roles and responsibilities
- Performance and measurement
- Security and privacy
- DBMS selection
- Business intelligence
- Unstructured data
- Data categorization – categorizing data on the basis of performance requirements, security requirements, and availability requirements
I’ll briefly expand on two of these components, data integration and BI.
Data Integration
Data integration has continued to elude most organizations. They have not been able to integrate business data, and understanding the customer continues to be a problem. Organizations have been taxed with massive volumes of data redundancy, and they have been plagued with the difficulties of integration because their data has resided on different DBMSs for decades and no thought had ever been given to the need for eventual integration.
Legacy data is the data on which most organizations run their operational systems. Organizations need to address how they plan to inventory their data, the redundancy factor, and the evaluation of the legacy data for its use and quality. Organizations must decide what to do with much of their legacy data. Should they migrate to another platform? Is it possible to dispose of some of this data (it’s expensive to keep it)? How should it best be archived? Historical data is required for some business intelligence (BI) operations as well as for some operational systems. Legal requirements and the threat of legal action demand the retention of certain selected data.
The benefits of integrating your data are:
- Minimizing data redundancy – a reduction in redundant data means fewer hardware and software requirements, less program maintenance, and less time reconciling inconsistent reports.
- Providing a comprehensive view of your key subject areas such as customer, patient, member, and supplier.
- Minimizing the effort and the potential for error when data needs to be pulled together from multiple sources.
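To make these benefits concrete, here is a minimal sketch of what pulling customer data together from two departmental systems might look like. Everything in it is an assumption made for illustration: the two source extracts, the field names, the choice of a normalized email address as the matching key, and the "first non-empty value wins" merge rule. A real integration effort would need far more careful matching and survivorship rules.

```python
from collections import defaultdict

# Hypothetical extracts from two departmental systems; all names and values
# are made up for illustration.
BILLING = [
    {"email": "Ann.Lee@example.com", "name": "Ann Lee", "phone": "555-0101"},
    {"email": "bob@example.com", "name": "Bob Ray", "phone": "555-0102"},
]
MARKETING = [
    {"email": "ann.lee@example.com ", "name": "Ann Lee", "phone": "555-0199"},
    {"email": "cara@example.com", "name": "Cara Diaz", "phone": "555-0103"},
]


def match_key(record):
    """Normalize the email so the same customer matches across systems."""
    return record["email"].strip().lower()


def integrate(sources):
    """Group records by match key and build one consolidated view per customer.

    Returns (views, conflicts): views maps key -> merged record (assumed rule:
    first non-empty value wins); conflicts maps key -> fields that disagree.
    """
    grouped = defaultdict(list)
    for system, records in sources.items():
        for record in records:
            grouped[match_key(record)].append((system, record))

    views, conflicts = {}, {}
    for key, entries in grouped.items():
        merged = {"email": key, "systems": [system for system, _ in entries]}
        disagreeing = []
        for field in ("name", "phone"):
            values = [rec[field] for _, rec in entries if rec.get(field)]
            merged[field] = values[0] if values else None
            if len(set(values)) > 1:
                disagreeing.append(field)
        views[key] = merged
        if disagreeing:
            conflicts[key] = disagreeing
    return views, conflicts


views, conflicts = integrate({"billing": BILLING, "marketing": MARKETING})
# Customers stored in more than one system are candidates for consolidation.
redundant = [key for key, view in views.items() if len(view["systems"]) > 1]
print("customers:", len(views))
print("held in more than one system:", redundant)
print("fields to reconcile:", conflicts)
```

The list of customers held in more than one system is a rough measure of redundancy, and the list of disagreeing fields is the reconciliation work that otherwise gets repeated every time someone builds a report by hand.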
BI
Business Intelligence (BI), including data warehousing (DW), has become critical for an organization’s ability to make intelligent tactical and strategic decisions. BI includes the DW infrastructure, the DW data, the DW tools, and the methodology, as well as the organization and training. The data strategy will profoundly affect BI as it addresses performance, security, data modeling, data integration, and metadata.
The benefits of a BI platform are:
- Revenue Enhancement – Improved marketing can produce more revenue per customer resulting from increased spending, and a greater share of the customer’s wallet. Selling higher margin products, focusing on the more profitable customers, and turning unprofitable customers into profitable ones will all enhance revenue.
- Analyst Productivity – In the past, analysts and knowledge workers had to spend 80% of their time gathering data with only 20% left over to perform the analysis. With some data warehouses, we are seeing those numbers reversed. Depending on the degree of consolidation and integration of the data, and the availability of useful metadata, analysts now may spend only 20% of their time gathering the data.
- Cost Containment – In contrast to revenue, which must be balanced with the cost required to produce that revenue, cost savings flow directly to the bottom line. The data warehouse can help to control costs in a number of areas. Costs can be reduced by minimizing inventory, minimizing promotional mailers, and using more cost effective channels to deliver services.
- Fraud Reduction – The DW and specifically data mining have been used to detect fraudulent insurance claims and fraudulent credit card usage. Analysis of claims has identified fraudulent health and workers’ compensation claims coming from specific doctors and lawyers. The types and patterns of the claims alert the investigators, who then conduct a more thorough audit to uncover fraud and abuse.
- Customer Conversion Rates – Better understanding the customers and targeting the prospects with the right products, channels, and incentives can dramatically improve the conversion of shoppers to buyers.
- Customer Attrition/Retention Rates – By knowing which customers are likely to leave, and knowing their relative profitability, you can take appropriate action to minimize customer attrition.
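The attrition point above combines two pieces of knowledge: how likely a customer is to leave and how profitable that customer is. As a rough sketch only, the two can be multiplied to rank customers by the profit at risk. The customer records, the `churn_probability` scores (which would come from a model or scoring rule not shown here), and the idea of a fixed outreach budget are all hypothetical.

```python
# Hypothetical customers. churn_probability is assumed to come from some
# scoring process that is not shown; annual_profit from the data warehouse.
customers = [
    {"id": "C1", "churn_probability": 0.80, "annual_profit": 100.0},
    {"id": "C2", "churn_probability": 0.25, "annual_profit": 2000.0},
    {"id": "C3", "churn_probability": 0.50, "annual_profit": -40.0},
    {"id": "C4", "churn_probability": 0.10, "annual_profit": 900.0},
]


def profit_at_risk(customer):
    """Expected yearly profit lost if nothing is done for this customer."""
    return customer["churn_probability"] * customer["annual_profit"]


def retention_priorities(customers, budget):
    """Return the ids of the `budget` profitable customers most worth retaining."""
    # Unprofitable customers are left out: retaining them would not help.
    profitable = [c for c in customers if c["annual_profit"] > 0]
    ranked = sorted(profitable, key=profit_at_risk, reverse=True)
    return [c["id"] for c in ranked[:budget]]


print(retention_priorities(customers, budget=2))
```

In this made-up data the customer most likely to leave (C1) is not the first priority, because the profit at stake is small. That is the point of knowing relative profitability as well as likelihood of leaving.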
Data Environment Assessment
Orienteering (the British call it Cunning Running) features cross-country running with a map and a compass. When learning to use a map and compass, the first exercise is to determine where you are on the map. Without this knowledge, you are absolutely lost. So it is with establishing a target data environment under the umbrella of a data strategy. First you must determine where you are today. This includes an assessment of your existing DBMSs, internal skills, culture, and legacy systems.
How Is This Going To Happen?
The only way a data strategy will be developed and then implemented is with the very strong support of the CIO and business executives, who must be educated on the criticality and impact of a good data strategy. It will also require that the business managers understand and concur with the goals and with the process; their cooperation is essential. The work calls for a small team of dedicated people (dedicated means full time with no other duties) who will concentrate on building the data strategy and then on selling and implementing it. The ideal team will be composed of the chief technology officer (CTO), who will lead the project or at least be a major contributor to it, a data administrator, a strong DBA, and a business analyst who comes from the business side. All these people should have a strong track record of accomplishment and a good relationship with key members of both IT and the business. They should also be intimately familiar with the business and the data that supports the organization. The team will have to sell the strategy and should be authorized to create the strategy and to implement it. Again, this means strong support from the CIO, senior directors, and the business executives, and this support must be communicated to everyone involved. A data strategy may attract many would-be assassins and the only thing that will keep them at bay is their understanding that the strategy will happen.
This article is excerpted from Data Strategy by Sid Adelman, Larissa Moss, and Majid Abai, Addison Wesley, 2005.
About the Author
Sid Adelman is a principal consultant with Sid Adelman & Associates, an organization specializing in planning and implementing data warehouses, performing data warehouse and BI assessments, and in establishing effective data strategies. He is a regular speaker at “The Data Warehouse Institute” and IBM’s “DB2 and Data Warehouse Conference”. Sid chairs the “Ask the Experts” column on www.dmreview.com, and has had a bi-monthly column in DMReview. He is a frequent contributor to journals that focus on data warehousing. He co-authored one of the initial works in data warehousing, Data Warehousing, Practical Advice from the Experts, and is co-author of Data Warehouse Project Management with Larissa Moss. He is the principal author of Impossible Data Warehouse Situations with Solutions from the Experts and his newest book, Data Strategy, was co-authored by Larissa Moss and Majid Abai. He can be reached at 818 783 9634 and [email protected]. His web site is www.sidadelman.com.