Electronic health databases have become extremely valuable resources for pharmaco-epidemiological and translational research, as noted in the recent Perspective article, Beyond debacle and debate: developing solutions in drug safety (Nature Rev. Drug Discov. 8, 775–779; 2009)1. Through these databases, researchers can gain a better understanding of the short- and long-term impact of exposure to drugs and devices, identify populations at risk for adverse effects, estimate the prevalence and natural history of medical conditions, and assess drug utilization across different demographic groups2,3,4.
However, the daunting size and complexity of these databases have made them inaccessible to all but a few experts with advanced data-management and statistical programming skills. Although simpler interfaces have begun to emerge5,6, a truly integrative approach that combines convenient access with advanced analytics is still lacking.
Our Advanced Biological and Chemical Discovery (ABCD) system has demonstrated the potential of such an approach in drug discovery7,8. Recently, we extended our toolset and the underlying design principles to the field of outcomes research. The solution consists of four major components.
The first is a high-performance relational database with a specialized data organization and indexing strategy, which reduces data processing time by orders of magnitude compared with traditional file-based approaches. To speed up cohort selection by distinct criteria, such as exposure to medications or clinical diagnoses, the database stores separate copies of the same data tables, each with a different clustered index (Box 1).
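The idea of keeping one copy of a table per access path can be illustrated with a minimal sketch. It is not the ABCD implementation: the source does not name its database engine or schema, so SQLite is used here as a stand-in (a `WITHOUT ROWID` table is stored in primary-key order, which approximates a clustered index), and the table names, column names and sample rows are all hypothetical.

```python
import sqlite3

# Hypothetical schema and data; the source does not describe its tables.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Copy 1: rows stored in drug_code order, for exposure-based selection.
    CREATE TABLE claims_by_drug (
        drug_code      TEXT    NOT NULL,
        patient_id     INTEGER NOT NULL,
        service_date   TEXT    NOT NULL,
        diagnosis_code TEXT    NOT NULL,
        PRIMARY KEY (drug_code, patient_id, service_date, diagnosis_code)
    ) WITHOUT ROWID;

    -- Copy 2: same rows stored in diagnosis_code order, for diagnosis-based selection.
    CREATE TABLE claims_by_diagnosis (
        diagnosis_code TEXT    NOT NULL,
        patient_id     INTEGER NOT NULL,
        service_date   TEXT    NOT NULL,
        drug_code      TEXT    NOT NULL,
        PRIMARY KEY (diagnosis_code, patient_id, service_date, drug_code)
    ) WITHOUT ROWID;
    """
)

# (drug_code, patient_id, service_date, diagnosis_code)
rows = [
    ("D001", 1, "2008-01-05", "I10"),
    ("D002", 2, "2008-02-11", "E11"),
    ("D001", 3, "2008-03-20", "E11"),
    ("D003", 1, "2008-04-02", "I10"),
]
conn.executemany(
    "INSERT INTO claims_by_drug "
    "(drug_code, patient_id, service_date, diagnosis_code) VALUES (?, ?, ?, ?)",
    rows,
)
conn.executemany(
    "INSERT INTO claims_by_diagnosis "
    "(drug_code, patient_id, service_date, diagnosis_code) VALUES (?, ?, ?, ?)",
    rows,
)


def patients_exposed_to(drug_code):
    """Select by medication using the copy ordered by drug_code."""
    cur = conn.execute(
        "SELECT DISTINCT patient_id FROM claims_by_drug "
        "WHERE drug_code = ? ORDER BY patient_id",
        (drug_code,),
    )
    return [row[0] for row in cur]


def patients_diagnosed_with(diagnosis_code):
    """Select by diagnosis using the copy ordered by diagnosis_code."""
    cur = conn.execute(
        "SELECT DISTINCT patient_id FROM claims_by_diagnosis "
        "WHERE diagnosis_code = ? ORDER BY patient_id",
        (diagnosis_code,),
    )
    return [row[0] for row in cur]


if __name__ == "__main__":
    print(patients_exposed_to("D001"))
    print(patients_diagnosed_with("E11"))
```

Each query reads only the copy whose physical ordering matches its selection criterion, so matching rows sit next to each other on disk. The cost of this design is extra storage and the need to keep the copies in step when data are loaded.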
The second is a graphical interface that allows the user to define complex queries with multiple inclusion or exclusion criteria through a series of dialogue boxes, employing language and identifiers that are familiar to epidemiologists (Fig. 1). This interface, which was implemented as a plug-in to Third Dimension Explorer (3DX)7, eliminates the need for programming expertise by hiding complex Structured Query Language (SQL) query generation, execution and post-processing.
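As a rough illustration of what hiding SQL generation behind a form might involve, the sketch below turns a list of inclusion and exclusion criteria into one parameterized query. It is an assumption-laden example, not the plug-in's actual code: the `Criterion` class, the `build_cohort_query` function, the `claims` table and its columns are all hypothetical, and SQLite stands in for the unnamed production database.

```python
import sqlite3
from dataclasses import dataclass
from typing import List, Sequence, Tuple

# Columns a criterion may filter on (hypothetical schema).
ALLOWED_COLUMNS = {"drug_code", "diagnosis_code"}


@dataclass(frozen=True)
class Criterion:
    column: str           # which code column to match
    code: str             # drug or diagnosis code
    start: str            # ISO date, inclusive
    end: str              # ISO date, inclusive
    include: bool = True  # False marks an exclusion criterion


def build_cohort_query(criteria: Sequence[Criterion]) -> Tuple[str, List[str]]:
    """Translate criteria into a parameterized SQL query and its parameters."""
    clauses: List[str] = []
    params: List[str] = []
    for c in criteria:
        # Column names cannot be bound as parameters, so check them explicitly.
        if c.column not in ALLOWED_COLUMNS:
            raise ValueError(f"unsupported column: {c.column}")
        op = "IN" if c.include else "NOT IN"
        clauses.append(
            f"patient_id {op} (SELECT patient_id FROM claims "
            f"WHERE {c.column} = ? AND service_date BETWEEN ? AND ?)"
        )
        params.extend([c.code, c.start, c.end])
    where = " AND ".join(clauses) if clauses else "1 = 1"
    sql = (
        "SELECT DISTINCT patient_id FROM claims "
        f"WHERE {where} ORDER BY patient_id"
    )
    return sql, params


def demo() -> List[int]:
    """Run one include/exclude query against a tiny in-memory table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE claims (patient_id INTEGER NOT NULL, "
        "drug_code TEXT NOT NULL, diagnosis_code TEXT NOT NULL, "
        "service_date TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO claims VALUES (?, ?, ?, ?)",
        [
            (1, "D001", "I10", "2008-01-05"),
            (2, "D001", "K92", "2008-02-11"),
            (3, "D002", "I10", "2008-03-20"),
        ],
    )
    sql, params = build_cohort_query(
        [
            Criterion("drug_code", "D001", "2008-01-01", "2008-12-31"),
            Criterion("diagnosis_code", "K92", "2008-01-01", "2008-12-31",
                      include=False),
        ]
    )
    return [row[0] for row in conn.execute(sql, params)]


if __name__ == "__main__":
    print(demo())
```

In the demo, patients 1 and 2 both received the hypothetical drug D001, but patient 2 also carries the excluded diagnosis, so only patient 1 remains. Binding the codes and dates as parameters, rather than pasting them into the SQL string, keeps user input out of the query text.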
The third is 3DX itself, an intuitive data analysis and visualization environment for interactive plotting, analysis, filtering and manipulation of the query results7 (Fig. 2). Follow-up queries and more elaborate statistical assessments can be invoked through specialized routines or services available from within 3DX, or by exporting the data into an external statistical package.
The final component is a rigorous evaluation of the tool's performance, usability and versatility for different users and study types, together with a strict validation of its results against the current gold standard: customized SAS routines written specifically for the question at hand. We found that queries that previously required many hours or even days of painstaking programming and data processing could be constructed and executed with the tool within a few minutes, yielding identical results.
Examples of queries that are greatly simplified by the current interface include retrieving records of patients who were administered a specific medication within a particular time period and subsequently developed specific adverse events; identifying subjects who have experienced a particular event of interest, regardless of exposure history; developing an inception cohort of subjects with a particular diagnosis; and a number of other largely descriptive analytic routines.
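One of these queries, the inception cohort, can be sketched in a few lines. The source does not say how the tool defines such a cohort, so the rule used here is an assumption: a subject enters the cohort at the first recorded diagnosis, provided that at least `washout_days` of prior enrolment are observable, which is a common way of making it plausible that the diagnosis is new. The function name, the input layout and the sample data are hypothetical.

```python
from datetime import date, timedelta
from typing import Dict, Iterable, Tuple


def inception_cohort(
    diagnoses: Iterable[Tuple[int, str, date]],
    code: str,
    enrollment_start: Dict[int, date],
    washout_days: int = 365,
) -> Dict[int, date]:
    """Return {patient_id: index_date} for newly diagnosed subjects.

    diagnoses: (patient_id, diagnosis_code, diagnosis_date) records.
    enrollment_start: first date each patient is observable in the database.
    """
    # Earliest recorded date of the diagnosis of interest per patient.
    first_seen: Dict[int, date] = {}
    for patient_id, dx_code, dx_date in diagnoses:
        if dx_code != code:
            continue
        if patient_id not in first_seen or dx_date < first_seen[patient_id]:
            first_seen[patient_id] = dx_date

    # Keep only patients with enough diagnosis-free observation time beforehand.
    washout = timedelta(days=washout_days)
    cohort: Dict[int, date] = {}
    for patient_id, index_date in first_seen.items():
        start = enrollment_start.get(patient_id)
        if start is not None and index_date - start >= washout:
            cohort[patient_id] = index_date
    return cohort


if __name__ == "__main__":
    records = [
        (1, "E11", date(2008, 6, 1)),
        (1, "E11", date(2009, 1, 1)),  # later record, not the index date
        (2, "E11", date(2008, 2, 1)),  # too soon after enrolment
        (3, "I10", date(2008, 7, 1)),  # different diagnosis
    ]
    enrolled = {1: date(2007, 1, 1), 2: date(2008, 1, 1), 3: date(2006, 1, 1)}
    print(inception_cohort(records, "E11", enrolled))
```

With these sample records only patient 1 qualifies, with an index date of 1 June 2008. Changing `washout_days` trades cohort size against confidence that the first recorded diagnosis is truly incident.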
Our future plans include the development of interfaces that simultaneously query multiple databases for the same investigation, as well as the direct linking of discovery, clinical, biomarker and health outcomes data to address translational hypotheses.
References
1. Ray, A. Beyond debacle and debate: developing solutions in drug safety. Nature Rev. Drug Discov. 8, 775–779 (2009).
2. Suissa, S. & Garbe, E. Primer: administrative health databases in observational studies of drug effects–advantages and disadvantages. Nature Clin. Pract. Rheumatol. 3, 725–732 (2007).
3. Schneeweiss, S. & Avorn, J. A review of uses of health care utilization databases for epidemiologic research on therapeutics. J. Clin. Epidemiol. 58, 323–337 (2005).
4. Strom, B. L. How the US drug safety system should be changed. J. Am. Med. Assoc. 295, 2072–2075 (2006).
5. IMS Health. PharMetrics Integrated Database [online]
6. GPRD [online]
7. Agrafiotis, D. K. et al. Advanced Biological and Chemical Discovery (ABCD): centralizing discovery knowledge in an inherently decentralized world. J. Chem. Inf. Model. 47, 1999–2014 (2007).
8. Kirkpatrick, P. The ABCD of data management. Nature Rev. Drug Discov. 6, 956–957 (2007).
Acknowledgements
We would like to thank C. Confoy, L. Bulusu, G. Griffin, V. Ogay and R. Chizhevski for their contributions to this project.
Cite this article
Cepeda, M., Lobanov, V., Farnum, M. et al. Broadening access to electronic healthcare databases. Nat Rev Drug Discov 9, 84 (2010). https://doi.org/10.1038/nrd2988-c1