Data Integrity in a New Light
By Anne Marie Smith, Ph.D.
One of the main areas of responsibility for any data steward is the enforcement of data integrity. Most data administration texts define data integrity as “attention to the consistency, accuracy and correctness of data stored in a database or other electronic file” (Watson, R., “Data Management”, Wiley, 2000). Commonly, data integrity refers to the validity of data in all its incarnations (electronic, paper, etc.). This approach is primarily a reactive one: it focuses on the rules used to create and store data values, ensuring that the “right” value is created and stored for each data element.
However, data stewardship in its most robust and active form should be concerned with much more than simply enforcing rules for creating and storing “right” data values. A steward is responsible for the proper use and welfare of the assets under his or her stewardship (Brackett, M. H., “Data Resource Quality”, Addison-Wesley, 2000). If stewards are to fulfill their responsibilities completely, perhaps we should adopt an alternative definition of “data integrity”: one whose goal is the use and presentation of data without bias, refusing to let data support one point of view to the exclusion of any competing view.
As defined by Webster, “integrity” is “firm adherence to a code, a set of moral values, honesty and incorruptibility”. Therefore, “data integrity” can be defined as “using data according to a code or set of values, with honesty”. Data stewards can serve as the foundation of this new approach to data integrity through their oversight of the data rules, values and access to the data in their areas of responsibility. By choosing to teach data stewards about the ethical and honest uses of the data in their areas, organizations can formulate and articulate the acceptable uses of their data, as well as the consequences of unacceptable uses. Some commonly accepted prohibitions on the use of data, taken from Cornell University’s Information Use in the 21st Century project, include:
- Do not use information (even if authorized to access it) to support actions by which individuals might profit (e.g., a change in salary, title, or similar administrative category). Do not disclose information about individuals without prior supervisor authorization.
- Do not engage in what might be termed “administrative voyeurism” (e.g., tracking the pattern of salary raises; determining the source and/or destination of telephone calls or Internet protocol addresses; exploring race and ethnicity indicators; tracking internal stock purchases; perusing server-stored email), unless authorized to conduct such analyses for stated business efforts.
- Do not circumvent the nature or level of data access given to others by providing access or data sets that are broader than those available to them via their own approved levels of access (e.g., providing a company-wide data set of human resource information to a coworker who only has approved access to a single human resource department), unless authorized.
- Do not facilitate another’s illegal access to the company’s administrative systems or compromise the integrity of the systems’ data by sharing your passwords or other information.
The principle of ethical data use also applies to the creation, analysis and interpretation of data in reports and similar documents, especially those drawn from a data warehouse. Presenting data from a predetermined view (e.g., looking for evidence to support an already-chosen outcome or result) is a common practice in organizations, even in organizations that espouse the traditional definition of “data integrity”. These reports are written by analysts who can be considered “custodians” of data, defined as “anyone who has access to, receives or dispenses data” (Watson, 2000). As custodians, these analysts should be expected to present data in an unbiased format, so that the final recipient of the results or report can draw his or her own conclusions. Giving the final users of the data the opportunity and freedom to make decisions and form conclusions from unbiased data should be a goal of all data stewards, in transactional systems as well as in decision support and data warehousing systems.
Why is the impartial presentation of data an issue for data warehouse data stewards? Data drawn from a data warehouse can be combined in ways not anticipated with traditional systems, since the dimensional view of data in a data warehouse allows users and analysts to relate previously unrelated values. These new relationships can result in a presentation of data that is slanted toward or against a particular outcome, and this bias may be invisible to the eventual report reader.
Having analysts present data impartially is considered a core integrity value in many organizations that have adopted Peter Block’s “Organizational Stewardship” approach, since it gives the report reader (i.e., the final user) the opportunity to use the data as he or she wishes, without the need to filter the analysis through a distant analyst’s prism. Block’s view of stewardship can be succinctly defined as “giving order to the dispersion of power, moving choice, resources and control to the edges of the organization where actual activity occurs” (Block, P., “Stewardship”, Berrett-Koehler, 1993). Recently, this unbiased approach has been used by some investment firms to overcome the impression that their analysts have presented past results to external customers in less-than-impartial terms. The impartial presentation of the facts, with the opportunity for the final user to draw fact-based conclusions, could become an objective in the performance measurement of data warehousing and other decision support systems. Such an objective could enhance the integrity of organizations that use decision support systems to generate data for multiple levels of analysis.
However, attempts to present data without bias can be taken to an extreme, resulting in analysts abdicating their responsibility to provide advice and guidance in interpreting complicated data. “Impartial” does not have to mean “without any attempt at clarification or examination”, since many uses of data require some interpretation and deduction to be actionable. Analysts, and data custodians in general, should strive for a balance between the raw presentation of data and the tendency to slant the presentation to serve a predetermined outcome or decision. Data stewards can help analysts and other custodians develop this balanced approach by drafting guidelines for data integrity that include this impartial yet advisory presentation of data. Stewards can lead this effort by educating all members of the organization about the need to use data with integrity, and by facilitating discussions about presenting unbiased yet examined analyses of the organization’s data to internal and external customers.
In conclusion, adopting a new definition of “data integrity” could expand awareness of the need for active data stewardship within organizations and within data warehouses. Data stewards can foster this approach to data integrity by communicating that impartial presentation and use of data are possible, and by exhibiting the principles of true data integrity in their development of standards, definitions and guidelines for data usage.