Broadening access to electronic healthcare databases

Cepeda, M. Soledad; Lobanov, Victor S.; Farnum, Michael; Weinstein, Rachel; Gates, Peter; Agrafiotis, Dimitris K.; Stang, Paul; Berlin, Jesse A.

doi:10.1038/nrd2988-c1

Correspondence
Published: January 2010

Nature Reviews Drug Discovery volume 9, page 84 (2010)

Electronic health databases have become extremely valuable resources for pharmaco-epidemiological and translational research, as noted in the recent Perspective article, Beyond debacle and debate: developing solutions in drug safety (Nature Rev. Drug Discov. 8, 775–779; 2009)¹. Through these databases, researchers can gain a better understanding of the short- and long-term impact of exposure to drugs and devices, identify populations at risk for adverse effects, estimate the prevalence and natural history of medical conditions, and assess drug utilization across different demographic groups²,³,⁴.

However, the daunting size and complexity of these databases have made them inaccessible to all but a few experts with advanced data-management and statistical programming skills. Although simpler interfaces have begun to emerge⁵,⁶, a truly integrative approach that combines convenient access with advanced analytics is still lacking.

Our Advanced Biological and Chemical Discovery (ABCD) system has demonstrated the potential of such an approach in drug discovery⁷,⁸. Recently, we extended our toolset and the underlying design principles to the field of outcomes research. The solution consists of four major components.

The first is a high-performance relational database with a specialized data organization and indexing strategy, which decreases the data processing time by orders of magnitude compared with traditional, file-based approaches. To facilitate cohort selection by distinct criteria such as exposure to medications or clinical diagnoses, it creates separate copies of the same data tables with different clustered indices (Box 1), thus greatly improving performance.
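The idea of keeping several copies of the same table, each physically ordered for a different selection criterion, can be sketched as follows. This is a minimal illustration and not the authors' implementation: it assumes SQLite (through Python's sqlite3 module), where a WITHOUT ROWID table is stored in primary-key order and so behaves like a clustered index. The table names, column names and codes are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical data: one row per recorded event (a dispensing or a diagnosis).
rows = [
    # (patient_id, event_date, kind, code)
    (1, "2008-01-10", "drug", "D001"),
    (1, "2008-02-01", "dx", "X100"),
    (2, "2008-01-15", "drug", "D002"),
    (2, "2008-03-20", "dx", "X100"),
    (3, "2008-02-05", "drug", "D001"),
]

# Copy 1: rows stored in (kind, code) order, suited to "who was exposed to X?".
conn.execute("""
    CREATE TABLE events_by_code (
        kind TEXT, code TEXT, patient_id INTEGER, event_date TEXT,
        PRIMARY KEY (kind, code, patient_id, event_date)
    ) WITHOUT ROWID
""")
# Copy 2: the same rows stored in patient order, suited to "what happened to P?".
conn.execute("""
    CREATE TABLE events_by_patient (
        patient_id INTEGER, event_date TEXT, kind TEXT, code TEXT,
        PRIMARY KEY (patient_id, event_date, kind, code)
    ) WITHOUT ROWID
""")
conn.executemany(
    "INSERT INTO events_by_code VALUES (?, ?, ?, ?)",
    [(kind, code, pid, day) for pid, day, kind, code in rows],
)
conn.executemany("INSERT INTO events_by_patient VALUES (?, ?, ?, ?)", rows)


def exposed_patients(code):
    """Cohort selection by medication: served by the code-ordered copy."""
    cur = conn.execute(
        "SELECT DISTINCT patient_id FROM events_by_code "
        "WHERE kind = 'drug' AND code = ? ORDER BY patient_id",
        (code,),
    )
    return [r[0] for r in cur]


def patient_history(patient_id):
    """Per-patient lookup: served by the patient-ordered copy."""
    cur = conn.execute(
        "SELECT event_date, kind, code FROM events_by_patient "
        "WHERE patient_id = ? ORDER BY event_date",
        (patient_id,),
    )
    return cur.fetchall()
```

The trade-off in this sketch is extra storage and load time in exchange for each kind of query reading a contiguous run of rows.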

The second is a graphical interface that allows the user to define complex queries with multiple inclusion or exclusion criteria through a series of dialogue boxes, employing language and identifiers that are familiar to epidemiologists (Fig. 1). This interface, which was implemented as a plug-in to Third Dimension Explorer (3DX)⁷, eliminates the need for programming expertise by hiding complex Structured Query Language (SQL) query generation, execution and post-processing.
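To make the notion of hidden SQL generation concrete, the sketch below turns a list of inclusion and exclusion criteria into a parameterized query. It is only an illustration under stated assumptions: the Criterion class, the build_cohort_sql function, and the patients and events tables are hypothetical and are not taken from the 3DX plug-in.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    kind: str             # hypothetical event kind, e.g. "drug" or "dx"
    code: str             # hypothetical medication or diagnosis code
    include: bool = True  # False marks an exclusion criterion


def build_cohort_sql(criteria):
    """Return (sql, params) selecting patients who satisfy every criterion."""
    clauses, params = [], []
    for c in criteria:
        op = "EXISTS" if c.include else "NOT EXISTS"
        clauses.append(
            f"{op} (SELECT 1 FROM events e WHERE e.patient_id = p.patient_id "
            "AND e.kind = ? AND e.code = ?)"
        )
        params += [c.kind, c.code]
    where = " AND ".join(clauses) if clauses else "1 = 1"
    sql = f"SELECT p.patient_id FROM patients p WHERE {where} ORDER BY p.patient_id"
    return sql, params


# Tiny in-memory database so the generated query can be executed.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE patients (patient_id INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE events (patient_id INTEGER, kind TEXT, code TEXT)")
conn.executemany("INSERT INTO patients VALUES (?)", [(1,), (2,), (3,)])
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [(1, "drug", "D001"), (1, "dx", "X100"), (2, "drug", "D002"),
     (2, "dx", "X100"), (3, "drug", "D001")],
)


def run_cohort(criteria):
    """Generate the SQL, execute it, and return the matching patient ids."""
    sql, params = build_cohort_sql(criteria)
    return [row[0] for row in conn.execute(sql, params)]
```

For example, run_cohort([Criterion("drug", "D001"), Criterion("dx", "X100", include=False)]) selects patients exposed to the hypothetical drug D001 who have no record of diagnosis X100. Values are passed as bound parameters rather than being pasted into the SQL text.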

The third is 3DX itself, an intuitive data analysis and visualization environment for interactive plotting, analysis, filtering and manipulation of the query results⁷ (Fig. 2). Follow-up queries and more elaborate statistical assessments can be invoked through specialized routines or services available from within 3DX, or by exporting the data into an external statistical package.

The final component is a rigorous evaluation of the tool's performance, usability and versatility for different users and study types, and a strict validation of the results by comparing them with the current gold standard: customized SAS routines that are specific to the issue at hand. We found that queries requiring many hours or even days of painstaking programming and data processing were constructed and executed within a few minutes, yielding identical results.

Examples of queries that are greatly simplified by the current interface include retrieving records of patients who have been administered a specific medication within a particular time period and have developed specific adverse events; identifying subjects who have experienced a particular event of interest, regardless of exposure history; developing an inception cohort of subjects with a particular diagnosis; and a number of other largely descriptive analytic routines.
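Two of these query types can be written down in a few lines to show the logic involved. The sketch below is illustrative only: the function names, the tuple-based record layout, and the rule that an event must follow the exposure within a fixed number of days are assumptions, not definitions from the tool.

```python
from datetime import date, timedelta


def exposed_then_event(exposures, events, start, end, max_gap_days):
    """Patients exposed within [start, end] who had an event on or after the
    exposure date and no more than max_gap_days later.

    exposures and events are lists of (patient_id, date) tuples.
    """
    max_gap = timedelta(days=max_gap_days)
    hits = set()
    for pid, exposure_date in exposures:
        if not (start <= exposure_date <= end):
            continue
        for event_pid, event_date in events:
            gap = event_date - exposure_date
            if event_pid == pid and timedelta(0) <= gap <= max_gap:
                hits.add(pid)
    return sorted(hits)


def inception_cohort(diagnoses, start, end):
    """Patients whose first recorded diagnosis falls within [start, end]."""
    first_seen = {}
    for pid, dx_date in diagnoses:
        if pid not in first_seen or dx_date < first_seen[pid]:
            first_seen[pid] = dx_date
    return sorted(pid for pid, d in first_seen.items() if start <= d <= end)


# Hypothetical records for three patients.
exposures = [(1, date(2008, 1, 10)), (2, date(2008, 1, 15)), (3, date(2007, 12, 1))]
events = [(1, date(2008, 1, 25)), (2, date(2008, 6, 1)), (3, date(2007, 12, 5))]
diagnoses = [(1, date(2007, 5, 1)), (1, date(2008, 2, 1)), (2, date(2008, 3, 20))]
```

With these records, a study window of January to March 2008 and a 30-day follow-up keeps only patient 1 in the exposure query, and an inception cohort for 2008 keeps only patient 2, because patient 1 was first diagnosed in 2007.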

Our future plans include the development of interfaces that simultaneously query multiple databases for the same investigation, as well as the direct linking of discovery, clinical, biomarker and health outcomes data to address translational hypotheses.

Box 1 | Indices

Indices are data structures that improve the speed of operations on a database table. Clustering is an indexing technique that re-arranges the raw data blocks in a way that matches the index, resembling an address book in which entries are ordered by last name. This technique leads to dramatic improvements when the data are accessed sequentially, in the same or reverse order of the clustered index, or when a range of items are selected, which is typical in cohort selection.
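The address-book analogy can be illustrated with a sorted in-memory list. This is a simplified sketch, not a database engine: the records and codes are hypothetical, and Python's bisect module stands in for the index lookup. Because the rows are kept in key order, a range of keys corresponds to one contiguous slice.

```python
from bisect import bisect_left, bisect_right

# Hypothetical (code, patient_id) rows, stored sorted by code, which mimics
# a table whose data blocks are arranged to match a clustered index on code.
records = sorted([
    ("D003", 7),
    ("D001", 1),
    ("D002", 4),
    ("D001", 3),
    ("D005", 9),
])
keys = [code for code, _ in records]


def range_select(low, high):
    """Return all rows with low <= code <= high as one contiguous slice."""
    start = bisect_left(keys, low)    # first position with code >= low
    stop = bisect_right(keys, high)   # first position with code > high
    return records[start:stop]
```

Finding the two boundaries takes a logarithmic number of comparisons, after which the matching rows are read sequentially instead of being gathered from scattered locations.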

References

1. Ray, A. Beyond debacle and debate: developing solutions in drug safety. Nature Rev. Drug Discov. 8, 775–779 (2009).
2. Suissa, S. & Garbe, E. Primer: administrative health databases in observational studies of drug effects–advantages and disadvantages. Nature Clin. Pract. Rheumatol. 3, 725–732 (2007).
3. Schneeweiss, S. & Avorn, J. A review of uses of health care utilization databases for epidemiologic research on therapeutics. J. Clin. Epidemiol. 58, 323–337 (2005).
4. Strom, B. L. How the US drug safety system should be changed. J. Am. Med. Assoc. 295, 2072–2075 (2006).
5. IMS Health. PharMetrics Integrated Database [online]
6. GPRD [online]
7. Agrafiotis, D. K. et al. Advanced Biological and Chemical Discovery (ABCD): Centralizing discovery knowledge in an inherently decentralized world. J. Chem. Inf. Mod. 47, 1999–2014 (2007).
8. Kirkpatrick, P. The ABCD of data management. Nature Rev. Drug Discov. 6, 956–957 (2007).

Acknowledgements

We would like to thank C. Confoy, L. Bulusu, G. Griffin, V. Ogay and R. Chizhevski for their contributions to this project.

Author information

Authors and Affiliations

M. Soledad Cepeda, Rachel Weinstein, Peter Gates, Paul Stang and Jesse A. Berlin are at the Department of Epidemiology, Johnson & Johnson Pharmaceutical Research & Development, L.L.C., 1125 Trenton Harbourton Road, Titusville, New Jersey 08560, USA.
Victor S. Lobanov, Michael Farnum and Dimitris K. Agrafiotis are at Informatics, Johnson & Johnson Pharmaceutical Research & Development, L.L.C., 665 Stockton Drive, Exton, Pennsylvania 19341, USA.

Corresponding authors

Correspondence to M. Soledad Cepeda or Victor S. Lobanov.

About this article

Cite this article

Cepeda, M., Lobanov, V., Farnum, M. et al. Broadening access to electronic healthcare databases. Nat Rev Drug Discov 9, 84 (2010). https://doi.org/10.1038/nrd2988-c1

Issue Date: January 2010
DOI: https://doi.org/10.1038/nrd2988-c1
