Information Quality Characteristics (part 1), guest author: James Funk

By Richard Wang

Information quality can have many different definitions. If you listen carefully to people describing issues they have with the data they use, you hear them talk about inaccurate data, data that is not relevant, data that is not timely, and having too much information. Research on data quality conducted at the Massachusetts Institute of Technology (MIT) by Richard Wang, Yang Lee, Diane Strong, and Leo Pipino indicates that one can identify 16 characteristics that affect the overall quality of the information people are expected to use in fulfilling their job and task responsibilities. That does not mean that each information quality situation involves all of the characteristics. It does mean that one has to listen carefully to the person who is describing the situation and identify which of these 16 characteristics exist within the specific context being explained. We will examine each of these characteristics in future columns. A list of the 16 characteristics can be found at the end of this month’s column.

But to set the stage for future discussions, I would like to share with you recent newspaper and magazine articles dealing with aspects of information quality. I do this because there has been an increasing number of such articles, and they help one start thinking about different data quality issues and characteristics as well as what individuals and organizations can do to address those issues. Although some people equate information quality with “accuracy”, our experience has shown that concerns about information quality extend far beyond that single characteristic.

Betsy Burton from Deloitte, in an “INFOWORLD” article about business intelligence (BI) and associated data issues, states that “Data quality and data integrity are not going away. There’s no easy way to solve them.” The article correctly mentions that BI software vendors have tried to address data quality and integration issues with Master Data Management (MDM) solutions but that they have had limited success in trying to cleanse and reconcile data. One of the main reasons BI efforts fall short in this regard is that they usually deal with one part of the organization, and the data quality issues arise as they attempt to integrate and use information from across the entire organization. Within each particular context the information is perceived and measured to have a high level of quality. It is when one tries to integrate the information within an overall organizational context that troubles begin to appear, and they are not easy to overcome. You should realize that for some sources the data will always be dirty. In such instances you should try to determine whether the data is really needed. Is it relevant for the purpose at hand? If it is, you need to think about the best approach to handling the data to make it meaningful.

The article mentions a situation where two different people using the same information run reports using two different tools and get different results. It reminds one of a recent advertisement by a BI vendor in which three different people walk into a meeting with the CEO, each with a different answer to a question. The question could be as simple as “What is the firm’s gross profit for the current fiscal period?” Depending upon the specific context used to calculate the answers, all three could be correct or none of them could be correct. An all too frequent experience is attending a meeting and spending too much time determining whose numbers are correct. The time should have been spent analyzing the numbers and determining the proper action that should be taken by the organization. Consistency and usefulness of the information are important to any organization.
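The "three answers to one question" situation can be made concrete with a small Python sketch. Everything in it is an assumption made for illustration: the ledger categories, the amounts, and the three definitions of gross profit are hypothetical and do not come from the article or the advertisement. It only shows how the same source data can yield three different, internally consistent answers.

```python
# Hypothetical ledger lines for one fiscal period (illustrative figures only).
LEDGER = [
    ("product_sales", 1000),
    ("returns", -50),
    ("materials", -400),
    ("freight_in", -30),
    ("direct_labor", -200),
]

# Three hypothetical definitions of "gross profit", expressed as the set of
# ledger categories each group includes in its calculation.
DEFINITIONS = {
    # Net sales minus all direct costs.
    "finance": {"product_sales", "returns", "materials", "freight_in", "direct_labor"},
    # Gross sales (ignoring returns) minus materials only.
    "sales": {"product_sales", "materials"},
    # Net sales minus materials and labor, excluding freight.
    "operations": {"product_sales", "returns", "materials", "direct_labor"},
}


def total(ledger, categories):
    """Sum the amounts of the ledger lines whose category is in `categories`."""
    return sum(amount for category, amount in ledger if category in categories)


def gross_profit_by_definition(ledger, definitions):
    """Compute one gross profit figure per definition from the same ledger."""
    return {name: total(ledger, cats) for name, cats in definitions.items()}


answers = gross_profit_by_definition(LEDGER, DEFINITIONS)
for name, value in answers.items():
    print(f"{name}: {value}")
```

Each figure is correct under its own definition, which is why the meeting ends up arguing about whose number is right. Writing the definitions down as data, as in `DEFINITIONS`, is one possible way to make the differences visible before the meeting.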

In a “Baseline Magazine” article it was reported that a recent Gartner Group study indicated that more than 25% of critical data held by Fortune 1000 companies was flawed. The data was inaccurate, incomplete or duplicated. Think about the implications to an organization! These data issues are addressed in some fashion when the financial statements for the firm are prepared. Unfortunately, when someone tries to build aggregates of the information from the original source data, complications will arise and issues of inconsistency and misunderstanding will occur.
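As a rough illustration of the three kinds of flaw named above, the following Python sketch flags records as incomplete, inaccurate, or duplicated and reports the share of flawed records. The field names, the sample records, and the reference list of country codes are hypothetical. Note also that the "inaccurate" check is only a validity rule standing in for accuracy; establishing true accuracy would require comparison against the real-world value.

```python
REQUIRED_FIELDS = ("customer_id", "name", "country")
VALID_COUNTRIES = {"US", "DE", "JP"}  # hypothetical reference list

# Hypothetical sample records.
RECORDS = [
    {"customer_id": "C1", "name": "Acme Corp", "country": "US"},
    {"customer_id": "C2", "name": "", "country": "DE"},            # missing name
    {"customer_id": "C3", "name": "Globex", "country": "ZZ"},      # unknown country
    {"customer_id": "C4", "name": "acme corp ", "country": "US"},  # same customer as C1
]


def classify(records):
    """Return a mapping of customer_id to the list of problems found."""
    seen_keys = set()
    flags = {}
    for rec in records:
        problems = []
        # Incomplete: any required field is missing or empty.
        if any(not rec.get(field) for field in REQUIRED_FIELDS):
            problems.append("incomplete")
        # Inaccurate (proxy): the country code is not in the reference list.
        if rec.get("country") and rec["country"] not in VALID_COUNTRIES:
            problems.append("inaccurate")
        # Duplicated: same normalized name and country as an earlier record.
        key = (rec.get("name", "").strip().lower(), rec.get("country"))
        if key[0] and key in seen_keys:
            problems.append("duplicated")
        seen_keys.add(key)
        flags[rec["customer_id"]] = problems
    return flags


def flawed_share(flags):
    """Fraction of records that have at least one problem."""
    return sum(1 for problems in flags.values() if problems) / len(flags)


flags = classify(RECORDS)
print(flags)
print(f"flawed share: {flawed_share(flags):.0%}")
```

A record can carry more than one flag, so the share of flawed records is counted per record rather than per problem.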

A personal experience involved the development of an initial data warehouse for global financial information. The initial effort was to build a new source of global information that would be more available and would allow senior management to monitor the current month’s progress toward budget goals for gross revenue and other profit and loss (P&L) items. I will be referring to this effort in more detail in future columns, as it is a good example for discussing different information quality issues. The effort was to build the information from the source systems that feed the process used to develop the P&L statements. To deliver information that would be believable to the senior executives, a stated goal was to match the published P&L information. After a great deal of effort the initial goal was narrowed to delivering that capability for gross revenue only. This change was necessitated because there was no consistent source data for the other P&L items. Even the new goal proved elusive, as the definition of gross revenue varied among the more than 75 corporate subsidiaries. Initial attempts to aggregate a subsidiary’s sales so that they matched its reported amounts proved to be extremely challenging. We found that we had to develop a different process to aggregate sales for each subsidiary. Even then we were not always successful in matching the published revenue amounts. It took almost two years to successfully match the revenue numbers for all but one of the “major subsidiaries”. The only one that we were not able to match was the United States. Needless to say, the senior executives were not thrilled with the progress or results. Fortunately they understood we were just the messengers about the current situation. The result was major global efforts, including a new definition for “gross revenue”, which we will discuss in future columns as part of possible solutions for information quality issues.
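The reconciliation described above, in which each subsidiary needed its own aggregation process and the result was compared with the published figure, might be sketched in Python as follows. This is not the process the project actually used: the subsidiary names, transaction types, aggregation rules, amounts, and tolerance are all hypothetical and serve only to show the shape of the problem.

```python
from decimal import Decimal


def revenue_invoices_only(txns):
    """Hypothetical rule: gross revenue is the sum of invoices."""
    return sum((t["amount"] for t in txns if t["type"] == "invoice"), Decimal("0"))


def revenue_net_of_credits(txns):
    """Hypothetical rule: gross revenue is invoices net of credit notes."""
    wanted = {"invoice", "credit_note"}
    return sum((t["amount"] for t in txns if t["type"] in wanted), Decimal("0"))


# One aggregation rule per subsidiary (hypothetical names).
RULES = {"SUB_A": revenue_invoices_only, "SUB_B": revenue_net_of_credits}

# Hypothetical source-system transactions and published revenue amounts.
SOURCE = {
    "SUB_A": [
        {"type": "invoice", "amount": Decimal("100.00")},
        {"type": "credit_note", "amount": Decimal("-10.00")},
    ],
    "SUB_B": [
        {"type": "invoice", "amount": Decimal("200.00")},
        {"type": "credit_note", "amount": Decimal("-25.00")},
    ],
}
PUBLISHED = {"SUB_A": Decimal("100.00"), "SUB_B": Decimal("180.00")}


def reconcile(source, published, rules, tolerance=Decimal("0.01")):
    """Aggregate each subsidiary with its own rule and compare to published."""
    report = {}
    for subsidiary, txns in source.items():
        computed = rules[subsidiary](txns)
        difference = computed - published[subsidiary]
        report[subsidiary] = {
            "computed": computed,
            "published": published[subsidiary],
            "difference": difference,
            "matches": abs(difference) <= tolerance,
        }
    return report


report = reconcile(SOURCE, PUBLISHED, RULES)
for subsidiary, row in report.items():
    print(subsidiary, row["computed"], row["published"], row["matches"])
```

In this sketch `SUB_A` reconciles and `SUB_B` does not, which mirrors the experience of matching most subsidiaries while one remained unmatched. `Decimal` is used instead of floating point so that small rounding errors are not mistaken for real differences.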

My hope is that you begin to understand the complexity surrounding this concept of information quality and that there usually are no quick and simple solutions within most organizations. That is why we use the metaphor of a “journey” as well as continuous improvement when talking about solving information quality problems. It took a long time to create the problems and it will take time to correct them. We will have more to say about these journeys in future columns.

We look forward to our continuing conversations about information quality and wish you success in your information quality journey. If you have questions about what we have discussed, want more clarity about what we have said, or have suggestions for what you want discussed, contact us at either jimfunk@mit.edu or rwang@mit.edu, http://mitiq.mit.edu.

About the Author

Richard Y. Wang is Director of the MIT Information Quality (MITIQ) Program at the Massachusetts Institute of Technology. He also holds an appointment as University Professor of Information Quality, University of Arkansas at Little Rock. Before heading the MITIQ program Dr. Wang served as a professor at MIT for a decade. He also served on the faculty of the University of Arizona and Boston University. Dr. Wang received a Ph.D. in Information Technology from MIT. Wang has put the term Information Quality on the intellectual map with myriad publications. In 1996, Prof. Wang organized the premier International Conference on Information Quality, for which he has served as the general conference chair and currently serves as Chairman of the Board. Wang’s books on information quality include Quality Information and Knowledge (Prentice Hall, 1999), Data Quality (Kluwer Academic, 2001), Introduction to Information Quality (MITIQ Publications, 2005), and Journey to Data Quality (MIT Press, 2006). Prof. Wang has been instrumental in the establishment of the Master of Science in Information Quality degree program at the University of Arkansas at Little Rock (25 students enrolled in the first offering in September 2005), the Stuart Madnick IQ Best Paper Award for the International Conference on Information Quality (the first award was made in 2006), the comprehensive IQ Ph.D. dissertations website, and the Donald Ballou & Harry Pazer IQ Ph.D. Dissertation Award. Wang’s current research focuses on extending information quality to enterprise issues such as architecture, governance, and data sharing. Additionally, he heads a U.S. Government project on Leadership in Enterprise Architecture Deployment (LEAD). The MITIQ program offers certificate programs and executive courses on information quality. Dr. Wang is the recipient of the 2005 DAMA International Academic Achievement Award (previous recipients of this award include Ted Codd for the Relational Data model, Peter Chen for the Entity Relationship model, and Bill Inmon for data warehouse contributions to the data management field). He has given numerous speeches in the public and private sectors internationally, including a thought-leader presentation to some 25 CIOs at a gathering of the Advanced Practices Council of the Society for Information Management (SIM APC) in 2007. Dr. Wang can be reached at rwang@mit.edu, http://mitiq.mit.edu