Accuracy is not Always Easy to Obtain – Guest Author James Funk

By Richard Wang

Several recent articles have brought to light how difficult it can be to determine and maintain the accuracy of a piece of data or information. In a recent Wall Street Journal article about global warming, NASA’s Goddard Institute for Space Studies indicated that it had revised its records of which year was the warmest on record. The new information indicates that the warmest year on record is 1934, not 1998. In addition, it was discovered that NASA had made a technical error in standardizing the United States air temperature data for the period from 2000 to the present. Based on the old data, it was reported that six of the 10 hottest years on record have occurred since 1990. The revised data indicates that six of the 10 hottest years occurred in the 1930s and 1940s. The revised data for the period 2000-2006 lowered the average temperature by 0.15 degrees Celsius. NASA states that the change is trivial, but it is difficult to know whether that is so: total warming in the United States since 1920 has been about 0.21 degrees Celsius.

A New York Times article offered a fascinating discussion about calculating the odds of a natural catastrophe and the cost of such disasters. An insurance company can function only if it is able to control its exposure to loss. Auto insurance companies cannot forecast an individual automobile accident, but the total number of accidents over a large population is amazingly predictable. The company knows from past experience what percentage of the drivers it insures will file claims and how much those claims will cost. The risk associated with insuring natural catastrophes is much different. The article relates how the insurance industry was unprepared for the claims generated by Hurricane Andrew in 1992 and Hurricane Katrina in 2005. Insurance companies had ignored historical data about the frequency of hurricanes hitting the U.S. coast and the tremendous growth in property value along the coasts most vulnerable to hurricane landfalls. As a result, they had greatly underestimated the losses from such catastrophes. Insurance premiums had been set too low because the companies had not properly assessed the risk involved in policies covering such disasters, even though accurate data was available to them concerning the relative risk and the magnitude of potential losses. The losses in Dade County from Hurricane Andrew exceeded all of the insurance premiums ever collected in that county.
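To see why total auto claims are so predictable across a large population, consider a rough sketch that treats each policy as an independent event with the same chance of a claim. Both the independence assumption and the 5 percent claim probability are illustrative assumptions, not figures from the article. Under them, the spread of the total claim count relative to its expected value shrinks as the number of policies grows.

```python
import math


def relative_spread(n_policies: int, claim_prob: float) -> float:
    """Standard deviation of the total claim count divided by its mean.

    Assumes independent claims with equal probability (a binomial model),
    which is a simplification made for this illustration.
    """
    mean = n_policies * claim_prob
    std_dev = math.sqrt(n_policies * claim_prob * (1.0 - claim_prob))
    return std_dev / mean


# Hypothetical 5% annual claim probability, for three portfolio sizes.
for n in (100, 10_000, 1_000_000):
    print(f"{n:>9} policies: relative spread {relative_spread(n, 0.05):.4f}")
```

The sketch does not carry over to catastrophes. A single hurricane damages many insured properties at once, so the claims are not independent and a larger pool of policies does not smooth the total in the same way.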

Another example involves global organizations that report trends in sales. Most of them report year-over-year trends, usually presented as a percentage increase or decrease in global sales volume. Many times the report does not take into account variations due to inflation or currency exchange rates. If an organization is concerned only with a one-year trend for reporting purposes, these variations tend to be small and probably have little impact on any decision based on the calculated change. If the trend is calculated over a longer period of time, these variations can have a large impact on any decision based on the trend. An organization can overcome some of these issues by maintaining a series of related pieces of data. Sales amounts in the original currency should be maintained because that is what the local operation uses. When those original sales amounts are converted to another currency, the exchange rate used to make the conversion should be maintained. In addition, the inflation rates for both currencies should be kept. By keeping all of the basic data, organizations retain the capability to calculate and report trends in the manner that best fits the situation. As an aside, users of trend information from sources outside their own organization should always question how the numbers were calculated so that they fully understand what the numbers are trying to convey. Organizations can also keep data on the number of units sold and use it to make trend calculations. This, however, adds another set of complexities dealing with product mix and product innovation that are much harder to explain than financial representations of trends.
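A minimal sketch can show how retaining the basic data lets one trend be reported several ways. The record layout, field names, and figures below are all hypothetical, chosen only for illustration. For brevity the sketch deflates only the local currency; a fuller version would keep a price index for the reporting currency as well.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SalesRecord:
    """One period of sales plus the context needed to restate it later."""
    year: int
    local_amount: float       # sales in the original (local) currency
    fx_rate: float            # reporting-currency units per local unit at conversion
    local_price_index: float  # local price level, base year = 100


def pct_change(old: float, new: float) -> float:
    """Percentage change from old to new."""
    return (new - old) / old * 100.0


def reported_growth(first: SalesRecord, last: SalesRecord) -> float:
    """Growth in the reporting currency, each period at its own exchange rate."""
    return pct_change(first.local_amount * first.fx_rate,
                      last.local_amount * last.fx_rate)


def constant_currency_growth(first: SalesRecord, last: SalesRecord) -> float:
    """Growth with the exchange-rate effect removed (local-currency growth)."""
    return pct_change(first.local_amount, last.local_amount)


def real_local_growth(first: SalesRecord, last: SalesRecord) -> float:
    """Local-currency growth after deflating both periods to base-year prices."""
    real_first = first.local_amount / first.local_price_index * 100.0
    real_last = last.local_amount / last.local_price_index * 100.0
    return pct_change(real_first, real_last)


# Made-up figures: local sales rise 50% while the local currency weakens
# and local prices rise 20% over the same five years.
first = SalesRecord(year=2000, local_amount=1000.0, fx_rate=1.00, local_price_index=100.0)
last = SalesRecord(year=2005, local_amount=1500.0, fx_rate=0.75, local_price_index=120.0)

print(f"Reported (converted) growth: {reported_growth(first, last):.1f}%")           # 12.5%
print(f"Constant-currency growth:    {constant_currency_growth(first, last):.1f}%")  # 50.0%
print(f"Inflation-adjusted growth:   {real_local_growth(first, last):.1f}%")         # 25.0%
```

With these invented numbers the same five-year period shows growth of 12.5, 50, or 25 percent depending on the method, which is why a reader of any trend figure needs to know how it was calculated.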

One may ask, what is the point of these stories? There was never a question about the accuracy of the underlying basic data that measured the phenomena associated with the calculations. The NASA revision did not change the basic data. It changed the method used to standardize the data so that it could be reliably compared over time, much like what needs to be done with financial data. Experience shows that such basic data tends to be very accurate. Most accuracy issues arise for basic data that is not used in an immediate transaction, such as an order, a claim, or a shipment.

When data is generated that is not subject to a normal feedback cycle, such as a customer reviewing an invoice or a claims agent reviewing an insurance claim, data errors are more prevalent. When this data is used to calculate another number, it is easy for the errors in the underlying data to get lost and over time to become undetectable. It is also easy for numbers to become less accurate when they are aggregated. There are many pitfalls that can occur when aggregating data, but that is a subject for another time.

We have much more to talk about concerning what constitutes high quality data. It involves much more than raw accuracy. We will begin to address those characteristics of quality data next month.

We look forward to our continuing conversations about information quality and wish you success in your information quality journey. If you have questions about what we have discussed or want more clarity about what we have said, contact us at either jimfunk@mit.edu or rwang@mit.edu, or visit http://mitiq.mit.edu.

About the Author

Richard Y. Wang is Director of the MIT Information Quality (MITIQ) Program at the Massachusetts Institute of Technology. He also holds an appointment as University Professor of Information Quality at the University of Arkansas at Little Rock. Before heading the MITIQ program, Dr. Wang served as a professor at MIT for a decade. He also served on the faculty of the University of Arizona and Boston University. Dr. Wang received a Ph.D. in Information Technology from MIT. Wang has put the term Information Quality on the intellectual map with myriad publications. In 1996, Prof. Wang organized the premier International Conference on Information Quality, for which he has served as general conference chair and currently serves as Chairman of the Board. Wang’s books on information quality include Quality Information and Knowledge (Prentice Hall, 1999), Data Quality (Kluwer Academic, 2001), Introduction to Information Quality (MITIQ Publications, 2005), and Journey to Data Quality (MIT Press, 2006). Prof. Wang has been instrumental in the establishment of the Master of Science in Information Quality degree program at the University of Arkansas at Little Rock (25 students enrolled in the first offering in September 2005), the Stuart Madnick IQ Best Paper Award for the International Conference on Information Quality (the first award was made in 2006), the comprehensive IQ Ph.D. dissertations website, and the Donald Ballou & Harry Pazer IQ Ph.D. Dissertation Award. Wang’s current research focuses on extending information quality to enterprise issues such as architecture, governance, and data sharing. Additionally, he heads a U.S. Government project on Leadership in Enterprise Architecture Deployment (LEAD). The MITIQ program offers certificate programs and executive courses on information quality. Dr. Wang is the recipient of the 2005 DAMA International Academic Achievement Award (previous recipients of this award include Ted Codd for the Relational Data model, Peter Chen for the Entity Relationship model, and Bill Inmon for data warehouse contributions to the data management field). He has given numerous speeches in the public and private sectors internationally, including a thought-leader presentation to some 25 CIOs at a gathering of the Advanced Practices Council of the Society for Information Management (SIM APC) in 2007. Dr. Wang can be reached at rwang@mit.edu or through http://mitiq.mit.edu.