Data Warehouse Security
By Larissa Moss
As I was cleaning out my file drawers the other day, I came across an old DM Review article onData Warehousing Securitywritten by Kenneth Allard and Ramon Barquin. Those of you in the healthcare industry will recognize Ken and Ramon’s lead-in story as a type of breach of patient privacy and confidentiality, which led to the stringent HIPAA regulation.
“The CEO stared hard at his chief corporate attorney, the bitter reality of the man’s words still sinking in. The HMO for which both of them worked was being sued by patients whose confidential, on-line medical records had been electronically altered, either by outside hackers or a disgruntled employee with a grudge. It was already costing a fortune to double-check each record. But then word had leaked out and now the class-action suit, alleging “multiple violations of privacy and patient confidentiality,” sought damages in excess of one hundred million dollars. The HMO’s outside counsel thought that amount was probably excessive, but they were certain that an out-of-court settlement was unlikely for less than fifty million. Worse yet was the attorney’s latest piece of bad news: there was a strong possibility that the operating and settlement losses would trigger a stockholders lawsuit against the CEO, his top officers and the board of directors. Shaking his head in bewilderment, the CEO wondered how this had all happened. Data warehousing had been such an obviously right move, both for improved efficiency and lower operating costs; and his CIO had assured him that the system would have password and virus protections. But something had gone very, very wrong.” [DM Review, November 1997]
Like so many other fundamental topics I discuss in my column, implementing security is very basic to any system, including a data warehouse. Thus, it warrants a quick review of such basic questions as:
- What type of security measures do we need? What needs to be secured?
- How are we securing the data? The applications? The tools?
- What type of security measures do we already have in place?
- Do the security measures have to include encryption and decryption, especially for a Web-enabled BI application?
- Are single-user authentication services of a Web portal part of this data warehouse or this BI application?
The following discussion is an excerpt from the bookBusiness Intelligence Roadmapauthored by Larissa T. Moss and Shaku Atre, published by Addison Wesley in 2003.
Organizations that have a strong security umbrella in their operational mainframe environment are more likely to pay attention to security measures for their data warehouse on a multi-tier platform. Organizations that have a very lax security policy for their operational environment are usually prone to treat security casually for their data warehouse as well. These organizations may unwittingly be exposing themselves to security breaches, especially if the plan is to deliver information from the data warehouse databases over the Web.
The following is an example of a security requirement that may need to be imposed on a data warehouse. Suppose an organization wanted to give its distributors the ability to analyze their orders and shipments via a multi-dimensional BI application. To prevent one distributor from searching through other distributors’ data, there would have to be a mechanism for restricting the order and shipment information to only those pertaining to that particular distributor. In other words, some security lock is required to prevent access to the sales records of the distributor’s competitors. This is not as straightforward as it sounds, because:
There are no off-the-shelf umbrella security solutions to impose this kind of security. This security requirement would have to be implemented through the various security features of the RDBMS and of the BI tools used by the BI application.
The solution of imposing security at a table level may not be granular enough. Although, one possible way to achieve this type of security is to partition the tables either physically or logically (through VIEWs). Partitioning will restrict access solely to the appropriate distributor as long as both the fact tables and the dimension tables are partitioned. Therefore, this method could become too cumbersome.
An alternative may be to enhance the meta data with definitions of data parameters, which could control access to the data. This form of security would be implemented with appropriate program logic to tell the meta data repository “who the distributor is,” allowing the application to return the appropriate data for that distributor only. This type of security measure will only be as good as the program controlling it.
This example illustrates that the required security measures must be well thought through, and that the security features of the RDBMS and of the BI tools must be well understood and cross-tested. Complete reliance on one comprehensive security package that has the capability to implement any and all types of security measures is not a security solution, because such a security package does not exist.
If you are installing purchased security packages – as you should, be sure to minimize the number of security packages you implement because one of two things may happen:
Business people will be logging in through multiple security packages, using multiple logon IDs, and multiple passwords that expire at different times. They will get frustrated very quickly if they have to go through different logon procedures and remember different IDs and passwords for each procedure. Complaints will run high.
Business people will stop using the data warehouse entirely because it is too cumbersome. You don’t want this to happen either.
A number of organizations are avoiding this problem by adopting a single-sign-on scheme, which keeps the frustration level to a minimum but still allows tracking any security breaches, albeit in a less sophisticated way.
Centralized vs. Decentralized Security
The goal of centralized security is “one entry point — one guard.” It is much easier to guard a single door than multiple doors. In a centralized environment, all security measures can be implemented in one location because all the data is in one place. However, keeping all the data in one central place is not always feasible, or desirable.
If data needs to be stored in a distributed fashion, implementing security measures becomes much more complicated. The steps involved are:
Identify the endpoints in your network architecture and the paths connecting the endpoints.
Determine the connectivity paths (from the entry points) to get to the data. Link and label the connectivity paths.
Compare the paths with the existing security you have in place. You may already have some security packages installed, and some of them may be sufficient to guard a subset of the data. It may be useful to draw a matrix for security gap analysis purposes.
The security gap analysis matrix will help to identify where security is still needed and what type of security is needed. Keep in mind that:
Password security may be the least expensive to implement, but it can be easily violated.
RDBMS security is the most important component of the security solution and should override all other security measures that may contradict the authority granted to the data in the RDBMS.
Encryption is not that prevalent in data warehouses because of the complicated encryption and decryption algorithms. Encryption and decryption processes also degrade performance considerably. However, with the frequent use of the Internet as an access and delivery mechanism, encryption should be seriously considered to protect the organization from costly security breaches.
Security for Internet Access
The Internet enables distribution of information worldwide, and the data warehouse provides easy access to organizational data. Combining these two capabilities appears to be a giant leap forward for engaging in e-commerce. However, consider the implications of combining these technologies carefully before you decide to take the risk of potentially exposing sensitive organizational data.
Many product vendors are enabling Web access to databases in general and some vendors to data warehouse databases in particular. This complicates the concern for:
Security of the data warehouse in general.
Security issues associated with allowing Web access to the organization’s data.
If you opt to display the data over the Internet, spend extra time and money on authorization and authentication of internal staff and external customers. And if you are transmitting sensitive data to and from external customers, consider investing in encryption and decryption software.
Authentication— is the process of identifying a person, usually based on a logon ID and password. This process is meant to ensure that the person is who he or she claims to be.
Authorization— is the process of granting or denying a person access to a resource, such as an application or a Web page. In security software, authentication is distinct from authorization; and most security packages implement a two-step authentication and authorization process.
Encryption— is the “translation” of data into a secret code. It is the most effective way to achieve data security. To read an encrypted file, you must have access to a secret key or password that enables you to decrypt it.
The bottom line on security is that you need to define your security requirements early in order to have time to consider and weigh all factors.
In conclusion, I would like to quote from Kenneth Allard’s and Ramon Barquin’s article again: “Asked why he robbed banks, Willie Sutton famously replied, ‘Because that’s where the money is!’ In the information age, data warehouses are where the information is — so cyber-thieves are certain to try to loot them.”
References and Additional Reading
Allard, Kenneth, and Ramon Barquin.Data Warehousing Security. DM Review, November 1997.
Moss, Larissa, and Shaku Atre.Business Intelligence Roadmap.Upper Saddle River, NJ: Addison-Wesley, 2004.
About the Author
Larissa Moss is president of Method Focus Inc., and a senior consultant for the BI Practice at the Cutter Consortium. She has 27 years of IT experience, focused on information management. She frequently speaks at conferences worldwide on the topics of data warehousing, business intelligence, master data management, project management, development methodologies, enterprise architecture, data integration, and information quality. She is widely published and has co-authored the books Data Warehouse Project Management, Impossible Data Warehouse Situations, Business Intelligence Roadmap, and Data Strategy. Her present and past associations include Friends of NCR-Teradata, the IBM Gold Group, the Cutter Consortium, DAMA Los Angeles Chapter, the Relational Institute, and Codd & Date Consulting Group. She was a part-time faculty member at the Extended University of California Polytechnic University Pomona, and has been lecturing for TDWI, the Cutter Consortium, MIS Training Institute, Digital Consulting Inc. and Professional Education Strategies Group, Inc. She can be reached at email@example.com