Abstract 220 Poster Session III, Monday, 5/3 (poster 95)

Technology to verify and validate the accuracy of data entered into large databases by hand typing, by electronic upload from peripheral equipment, or by direct download from hospital computer systems has received little attention in the current nationwide effort to computerize the health care industry. At present, verification of typed or electronically acquired clinical data relies on human inspection. This hands-on approach to the "verification and validation" problem is impractical, even dangerous, because checking electronically uploaded data in real time demands far more effort than a single full-time verifier (one FTE) can supply. We have written software (ErrorCheck) that finds erroneous values in large electronic datasets by using statistical techniques based on linear mathematical theory to calculate whether a given datapoint is likely to be incorrect. The software adapts statistical algorithms that were developed for edge detection in image intensification. The algorithm calculates the standard deviation of the preceding 100 observations and tests each value against a sensitivity and specificity window that the user defines by setting the variance in a dialog box. The smaller the variance, the smaller the permitted error range. ErrorCheck installs from a 4" floppy onto a 486 PC running Windows and Office, where it works within MS Excel to translate data files (Lotus, ASCI...) into the text-file format that ErrorCheck uses. ErrorCheck detects trend-break, column-shift, decimal-point, typographical, out-of-range and string-variable errors in hand-typed or electronically uploaded datasets. This allows the verifier to focus on the points the program has deemed likely to be errors or artifacts. ErrorCheck reports in one of two formats: a paper report or a real-time graphical report. The paper report identifies each suspect point in sequence together with its potentially erroneous value, followed by two dated signature lines, one for the verifier and one for the operator who corrects the database files. The graphical report replaces each potential error point with a red line while archiving the flagged point.
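
The moving-window check described above might be sketched as follows. This is a minimal illustration under stated assumptions, not the ErrorCheck code: the function name `flag_suspect_points`, the `tolerance` parameter (a multiple of the standard deviation standing in for the user-set variance), the `min_history` warm-up length, and the choice to keep flagged values out of the baseline are all hypothetical.

```python
from collections import deque
from statistics import mean, stdev


def flag_suspect_points(values, window=100, tolerance=3.0, min_history=10):
    """Return the indices of values that fall outside the tolerance band.

    Each value is compared with the mean and standard deviation of up to
    `window` preceding accepted observations. `tolerance` is an assumed
    stand-in for the user-defined variance setting: a smaller tolerance
    gives a narrower permitted range.
    """
    history = deque(maxlen=window)  # most recent accepted observations
    suspects = []
    for i, x in enumerate(values):
        if len(history) >= min_history:
            centre = mean(history)
            spread = stdev(history)
            if abs(x - centre) > tolerance * spread:
                suspects.append(i)
                continue  # assumption: suspect values do not enter the baseline
        history.append(x)
    return suspects


# Hypothetical temperature-like readings with two decimal-point errors.
readings = [98.6 + 0.1 * (i % 5) for i in range(120)]
readings[60] = 986.0   # decimal point shifted right
readings[100] = 9.86   # decimal point shifted left
print(flag_suspect_points(readings))
```

A check of this kind targets out-of-range and decimal-point errors; trend-break, column-shift and string-variable errors would need separate tests that this sketch does not attempt.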

To our knowledge, this is the first time a statistical approach to error detection has been applied to the numerical, spreadsheet-type datasets common in classical science, medicine and business. The software offers user-definable variance settings through a simple point-and-click dialog box that can write complex error-detection algorithms onto "open-platform" software. In medical applications, when used on-line to verify and validate electronically uploaded clinical data, the report printed at intervals could help to identify and judge problematic values (= "verification"), thereby assisting the nurse who "signs off" on the clinical data prior to permanent archiving (= "validation").