Main

The Electronic Medical Records and Genomics (eMERGE) Network is a National Human Genome Research Institute (NHGRI)–funded consortium tasked with developing methods and best practices for the utilization of the electronic medical record (EMR) as a tool for genomic research. The eMERGE Network comprises nine geographically distinct groups ( Figure 1 ), each with its own biorepository where DNA specimens are linked to phenotypic data contained within EMRs. The large number of study participants and considerable diversity of the network sites provide a unique opportunity to conduct cost-effective studies in genomic medicine. Longitudinal phenotypic data already contained within EMRs linked to each group’s biorepository can be extracted and repurposed so that cases and controls for a large number of phenotypes can be collected efficiently and merged across eMERGE Network sites. These data can then be combined with genomic data for the discovery of genotype–phenotype associations, and these discoveries, once validated, may be introduced back into the EMR to augment clinical care ( Figure 2 ).

Figure 1

Locations of member sites, affiliates, and support and service centers of the eMERGE Network. Red indicates the nine members of eMERGE-II, gray indicates the eMERGE Coordinating Center, blue indicates an eMERGE affiliate or subcontract site, and black indicates centers that provide services and support to eMERGE. eMERGE, Electronic Medical Records and Genomics.

Figure 2

Outline of the activities in the eMERGE Network. The main activities of the network and how they are integrated together are summarized. See text for details. eMERGE, Electronic Medical Records and Genomics; EMR, electronic medical record; GWAS, genome-wide association studies.

Now in its sixth year and second funding cycle, the network continues to make advances in multiple disciplines related to the fields of genomics and health-care informatics. Locations of the nine research groups, their affiliated sites, a coordinating center, and the services and support centers constituting the current eMERGE Network are shown in Figure 1 . Outlines of the activities of the eMERGE Network are shown in Figure 2 , and the organizational structure of the network is represented in Figure 3 . Details of the biorepositories, EMR systems, and genotyping projects are summarized in Table 1 , and goals of the projects at each eMERGE site are listed in Supplementary Table S1 online. The primary and secondary phenotypes selected by each site are summarized in Supplementary Table S2 online. Additional site and project descriptions were authored by each site and are presented in the Supplementary Materials online. In the following sections, we describe the evolution of the network in the context of the rapidly changing landscape of genomic medicine.

Figure 3

Structure of eMERGE Network. The Steering Committee, composed of the principal investigators from each institution and the NIH Project Scientist, is the governing body for the consortium. The External Scientific Panel provides input to the NHGRI director about the progress and direction of the network. The Coordinating Center provides centralized support and infrastructure. Genotyping Centers provide genotyping under CLIA certification for clinically actionable genetic variants. For details on the activities by the workgroups listed at the bottom of the figure, please see the main text. CLIA, Clinical Laboratory Improvement Amendments; eMERGE, Electronic Medical Records and Genomics; NHGRI, National Human Genome Research Institute; NIH, National Institutes of Health.

Table 1 Summary of biorepositories, electronic medical records, and available genome-wide data at 10 eMERGE institutions

Summary of Phase I Scope and Aims

A request for applications from the NHGRI for eMERGE was released in March 2007 (ref. 1) and was intended “to provide support for investigative groups affiliated with existing biorepositories to develop … methods and procedures for genome-wide studies in participants with phenotypes … derived from EMR.” In September 2007, grants were awarded to five sites (hereafter referred to as eMERGE-I): Group Health Cooperative/University of Washington, Marshfield Clinic, Mayo Clinic, Northwestern University, and Vanderbilt University, which also served as the network’s coordinating center.

eMERGE-I had three major aims: (i) use EMR data for robust electronic phenotyping, (ii) conduct genome-wide association studies (GWAS) using the phenotypes derived in the above-mentioned first aim, and (iii) explore the ethical, legal, and social implications associated with EMR-based GWAS and wide-scale data sharing. The network formed workgroups that became the main drivers of progress in the key subject areas. In eMERGE-I, the workgroups included an informatics group, a genomics group, and a consent and community consultation group. Besides numerous publications (for a complete listing, see http://www.gwas.org), the workgroups had several accomplishments that were fundamental to the aims of phase I. The consent and community consultation group published model consent language for EMR-linked biorepositories, intended to harmonize the consent process for the collection and storage of human biospecimens and data for future research, particularly those collections that have an EMR component.2 The genomics workgroup created a unified data set of genotyped samples across all sites and published a “how to” paper that outlined the procedures and lessons learned from combining genotype data across a research network. The documented pitfalls of merging data from different genotyping facilities (even when generated on the same genotyping platform), such as inconsistencies in strand orientation, sample relatedness and population stratification across sites, site-specific batch effects, and errors introduced in the merging process, are of relevance to any group attempting to merge data from multiple sites.3 The informatics workgroup created and published a library of EMR-based phenotyping algorithms accrued throughout phase I that is available to investigators outside of the eMERGE Network.4
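One of the merging pitfalls named above, inconsistent strand orientation, can be illustrated with a minimal sketch. The function names and return labels below are hypothetical and are not taken from the eMERGE pipeline; the sketch only assumes that each site reports an allele pair per SNP and that a reference allele pair is available for comparison.

```python
# Illustrative strand check for one SNP when merging genotype data.
# All names here are hypothetical, not part of any eMERGE tool.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def flip(alleles):
    """Return the allele pair as read from the opposite strand."""
    return tuple(COMPLEMENT[a] for a in alleles)


def classify_strand(site_alleles, reference_alleles):
    """Compare a site's allele pair with the reference pair.

    Returns 'same', 'flipped', 'ambiguous', or 'mismatch'.
    """
    site = frozenset(site_alleles)
    ref = frozenset(reference_alleles)
    # A/T and C/G SNPs read the same on both strands, so orientation
    # cannot be resolved from the alleles alone.
    if ref == frozenset(flip(reference_alleles)):
        return "ambiguous" if site == ref else "mismatch"
    if site == ref:
        return "same"
    if frozenset(flip(site_alleles)) == ref:
        return "flipped"
    return "mismatch"


if __name__ == "__main__":
    print(classify_strand(("T", "C"), ("A", "G")))  # site reported on the other strand
```

SNPs classified as ambiguous would need additional information, such as allele frequencies, before they could be merged safely; that step is outside this sketch.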

Lessons Learned from Phase I

Much of the success of eMERGE-I resulted from utilizing the full capacity of the network, and several key lessons learned were used to augment its structure.5

Although the founding sites initially focused on projects relating to phenotypes of local interest as well as joint projects, it became clear that projects had better outcomes when deployed across the network. Development of a phenotype algorithm was generally led by one site and then deployed at a second site. The issues encountered as the second site implemented the algorithm led to revisions that made it more robust and generalizable when deployed across the network. In addition, there was increased statistical power when cases and controls were shared. The eMERGE Network has played a major role in validating the concept that phenotypes derived from EMRs can be used successfully for GWAS and has disseminated its methods and findings extensively.6,7,8,9,10,11,12,13

Most eMERGE participants have consented to contributing their data to health research of any kind. However, whenever combining large data sets pertaining to individual-level information such as health or genomic data, even when fully deidentified, there exists the potential risk for the identification of individuals. Through network-wide projects, eMERGE-I was compelled to develop best practices for the sharing of genomic data and EMR-derived phenotypes while protecting the privacy of participants, and these have been published to aid other investigators engaged in the field.14,15,16,17,18,19

The issue of returning research results to participants emerged as another key point for discussion as network analyses identified individual-level chromosomal anomalies such as Klinefelter and Turner syndromes. In response, the network convened a return of results (RoR) oversight committee to provide ongoing support and clinical information on incidental findings from GWAS. These discussions were also brought to local constituencies for final decision making. The process is outlined and published and may form the basis for a deliberative model for adoption by other collaborative research groups faced with similar challenges.20

Transition to Phase II (eMERGE-II)

The key advances and challenges encountered in phase I were instrumental in shaping the goals of eMERGE as the network transitioned to phase II in August 2011 following a second request for applications.21 The memberships of the five initial sites were renewed and two new sites, Geisinger Clinic and Mount Sinai School of Medicine, were added. A separate award for the network coordinating center was granted to Vanderbilt University. In August 2012, following a request for applications22 for pediatric sites, eMERGE membership was extended to Children’s Hospital of Philadelphia and a joint membership for Cincinnati Children’s Hospital and Boston Children’s Hospital ( Figure 1 ). In particular, the new, larger network was interested in broadening its scope from using EMR data for discovery of genotype–phenotype associations all the way through to incorporation of genotype data into the EMR ( Figure 2 ). This would allow the network to assess the utility of these results in clinical decision making such as informing clinicians of relevant pharmacogenomic (PGx) variants before a drug is prescribed or identifying persons at high genomic risk for a given condition.

This new focus required restructuring of the eMERGE-I workgroups for phase II. eMERGE-II introduced workgroups on EMR integration of genomic information, return of genomic results, and PGx, which were designed to address the complexities of linking genetic variation data with EMRs for effective clinical use as well as the difficulties in determining which results to use and how to return these results to participants and providers. The consent and community consultation group was restructured to include a focus on clinician and patient education, and the informatics workgroup was restructured to become the phenotyping workgroup. As in phase I, an External Scientific Panel was formed to meet annually with eMERGE-II investigators in order to challenge the focus of the network and to encourage appropriate dissemination of products and lessons learned (Figures 2 and 3).

Major Goals and Activities of eMERGE Phase II

The eMERGE Network continues to discover genomic variants associated with clinical conditions identified using EMRs and to develop algorithms for electronic phenotyping. Building on this success, the network is now extending its focus to pilot studies for implementing genomic medicine through the EMR.23 Critical goals include determining the optimal methods and infrastructure needed for aspects such as patient consent, laboratory assays, RoR, integrating findings into the EMR, and providing sufficient decision support and patient/clinician education to use them effectively ( Figure 2 ). These components are essential to facilitating the translation of genomic medicine from bench to bedside. To illustrate the regular activities of the eMERGE-II workgroups, case studies detailing a typical project have been authored by each group.

Phenotyping workgroup: phenotype algorithm development and PheKB

The phenotyping workgroup aims to create, validate, and execute phenotype algorithms across the network and beyond. To aid in this process, investigators have developed the Phenotype KnowledgeBase (PheKB),4 a repository for phenotype algorithms. Users can read, upload, search, and provide feedback on the algorithms and upload a variety of documents and metadata. Algorithms can be published and shared publicly or restricted to a particular collaborative group within a social networking framework to facilitate development and revision of the phenotypes. Users can comment and ask questions on phenotypes, receive e-mail notification when updates are made, and create “implementation” records, which capture site-specific validation of a phenotype algorithm. In eMERGE, phenotype algorithms on PheKB are validated at the creating site and at one or more other institutions. PheKB is currently searchable by metadata fields.
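To make the idea of an electronic phenotype algorithm concrete, the following is a minimal sketch of a rule-based case/control classifier. It is not a PheKB algorithm: the record structure, the code and medication sets, and the threshold of two coded visits are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class PatientRecord:
    """Hypothetical, simplified view of one patient's EMR data."""
    diagnosis_codes: list = field(default_factory=list)  # one entry per coded visit
    medications: set = field(default_factory=set)


def classify(record, case_codes, case_meds, min_code_count=2):
    """Label a record as 'case', 'control', or 'excluded'.

    The rule (assumed for illustration): a case needs at least
    `min_code_count` qualifying diagnosis codes and a qualifying
    medication; a control has neither; everything else is excluded.
    """
    code_hits = sum(1 for code in record.diagnosis_codes if code in case_codes)
    on_med = bool(record.medications & case_meds)
    if code_hits >= min_code_count and on_med:
        return "case"
    if code_hits == 0 and not on_med:
        return "control"
    return "excluded"


if __name__ == "__main__":
    patient = PatientRecord(diagnosis_codes=["DX1", "DX1"], medications={"MED_A"})
    print(classify(patient, case_codes={"DX1"}, case_meds={"MED_A"}))
```

Excluding records that meet only part of the definition, rather than forcing them into cases or controls, is one common way to keep both groups clean; a real algorithm would typically add laboratory values and text-derived evidence.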

Genomics workgroup: genotype imputation to facilitate network-wide genetic studies

To allow for the aggregation of genomic and phenotype data across all eMERGE sites, a genotype imputation pipeline was implemented to create a single and uniform data set for all individuals genotyped across the network. Genotype imputation is the process of inferring unobserved genotypes in a sample based on the haplotypes observed in a more densely genotyped reference sample. Imputation is computationally intensive and involves several steps, including phasing the haplotypes, filling in the missing genotypes, and finally assembling and assessing the accuracy of the data. Version 1.0 of the eMERGE imputed data set includes more than 13 million single-nucleotide polymorphisms in more than 42,000 samples that have been imputed using the BEAGLE software (ref. 24) and the 1000 Genomes (ref. 25) cosmopolitan reference panel, October 2011 release. The imputation process for eMERGE-II consumed ~1.1 × 10^6 CPU h.
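The core idea of imputation, filling in unobserved genotypes from a reference panel, can be shown with a toy sketch. This is deliberately not the statistical model used by BEAGLE or any production tool: it assumes an already phased sample haplotype, codes alleles as 0/1, and simply copies missing alleles from the single reference haplotype that best matches the observed positions.

```python
def impute(sample, reference_haplotypes):
    """Fill missing alleles in `sample` from the closest reference haplotype.

    sample: list of alleles (0 or 1), with None at ungenotyped positions.
    reference_haplotypes: list of equal-length lists of 0/1 alleles.
    Toy nearest-haplotype matching, assumed here for illustration only.
    """
    observed = [i for i, allele in enumerate(sample) if allele is not None]

    def mismatches(haplotype):
        # Count disagreements at the positions the sample was genotyped for.
        return sum(haplotype[i] != sample[i] for i in observed)

    best = min(reference_haplotypes, key=mismatches)
    return [best[i] if allele is None else allele
            for i, allele in enumerate(sample)]


if __name__ == "__main__":
    panel = [[0, 0, 1, 1], [1, 1, 0, 0]]
    print(impute([1, None, 0, None], panel))
```

Real imputation methods weigh many reference haplotypes probabilistically and report a per-genotype accuracy estimate, which is part of why the process is so computationally demanding.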

RoR workgroup: penetrance of hemochromatosis mutations

The genetic and EMR data available in the eMERGE Network provide an opportunity to estimate the penetrance of genetic diseases such as hemochromatosis, a common autosomal-recessive disorder of increased iron absorption and subsequent adult-onset iron overload. Most individuals with C282Y or H63D mutations in the HFE gene are asymptomatic. Adults who are homozygous or compound heterozygous for these HFE mutations will be identified from the eMERGE cohort, and a chart review will be carried out to establish the prevalence of hemochromatosis as well as the penetrance of related phenotypes. Because iron overload can be easily screened for and treated by phlebotomy, the cost–benefit of genetic screening is dependent on penetrance. The RoR workgroup is collaborating with the consent, education, regulation, and consultation workgroup on issues related to the process of returning clinically relevant HFE variants.
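Penetrance in this setting can be read as the fraction of individuals with an at-risk genotype whose chart review confirms the phenotype. The sketch below shows that arithmetic; the function name is hypothetical and the counts in the usage line are made-up placeholders, not eMERGE results.

```python
def penetrance(affected_carriers, total_carriers):
    """Fraction of at-risk genotype carriers who show the phenotype.

    Both arguments are counts from a (hypothetical) chart review.
    """
    if total_carriers <= 0:
        raise ValueError("total_carriers must be positive")
    if not 0 <= affected_carriers <= total_carriers:
        raise ValueError("affected_carriers must be between 0 and total_carriers")
    return affected_carriers / total_carriers


if __name__ == "__main__":
    # Placeholder counts for illustration only.
    print(penetrance(affected_carriers=25, total_carriers=100))
```

In practice the estimate would be reported separately for each genotype group and each related phenotype, with a confidence interval that reflects the number of carriers reviewed.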

Consent, education, regulation, and consultation workgroup: evaluating the impact of returning hemochromatosis results

The consent, education, regulation, and consultation workgroup is working closely with the RoR workgroup on issues relating to the return of hemochromatosis results. Although there is compelling evidence that medical management of hemochromatosis can provide benefit to those with penetrant disease, a number of issues relating to the penetrance of HFE variants remain when making the decision to return results: is it possible to safely return low-penetrant results without unduly alarming participants and health-care providers? Do patients and their health-care providers find this information valuable? How do these decisions impact health-care costs? To answer some of these questions, the workgroup is developing a protocol to deliver HFE results and assess their impact. Education of research participants and health-care providers about low-risk genetic test results before the results are returned is critical. The effectiveness of educational tools, including those used within the EMR, will be evaluated, and the amount of pre- and postreturn education required will be studied.

EMR integration workgroup: PGx pilot project

A major challenge in implementing genomic medicine is presenting relevant information to clinicians at the point of care. The increase of actionable genomic information needs to be matched with development and implementation of knowledge-based clinical decision support (CDS) systems deployed through EMRs. The eMERGE PGx project (also discussed in the next section) will preemptively genotype drug-naive patients who have an increased probability of receiving target drugs, primarily clopidogrel, warfarin, or simvastatin, in the next 3 years. The network consensus is that there is sufficient evidence and guidelines for preliminary clinical implementation of genotype-guided prescribing for these medications.26 For study patients, prescription of any one of these three drugs placed in computerized order entry systems will automatically trigger processing of clinical and genomic data. If predefined rules are met, information will be presented to the ordering clinician that could inform dosing or medication choice. Clinicians’ decisions to use or disregard the information will be analyzed along with feedback to identify factors that promote or impede implementation. The outcomes measured in eMERGE-PGx will be primarily process outcomes (e.g., number of patients identified with an actionable pharmaceutical genotype, number of times a CDS rule fires, percentage of clinicians who follow recommendation, and appropriate changes in medication or dose based on recommendation). However, sites that are farther along the translation spectrum plan to include measurement of some health outcomes, including documented adverse drug reaction within 24 h of initiation of opioid medication, development of myopathy, and adherence to medication.
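The order-entry workflow described above (a prescription triggers a check of stored genomic data, and a message is shown only if a predefined rule is met) can be sketched as follows. The rule table and function name are hypothetical, and the drug-gene pairings are well-known examples chosen as an assumption for illustration; the text does not specify which genes the eMERGE rules use.

```python
# Hypothetical rule table: drug name -> gene whose result is checked.
# The pairings are illustrative assumptions, not taken from the eMERGE protocol.
ILLUSTRATIVE_RULES = {
    "clopidogrel": "CYP2C19",
    "warfarin": "CYP2C9",
    "simvastatin": "SLCO1B1",
}


def on_order(drug, patient_genotypes):
    """Return an alert for the ordering clinician, or None if no rule fires.

    patient_genotypes maps a gene name to True when an actionable
    variant is on file for the patient (a simplification).
    """
    gene = ILLUSTRATIVE_RULES.get(drug.lower())
    if gene is None:
        return None  # drug is not covered by any rule
    if not patient_genotypes.get(gene, False):
        return None  # no actionable result on file for this gene
    return f"Actionable {gene} result on file: review before prescribing {drug}."


if __name__ == "__main__":
    print(on_order("Warfarin", {"CYP2C9": True}))
```

Counting how often `on_order` returns a message, and whether the clinician then changes the order, corresponds to the process outcomes the project plans to measure.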

Collaborations with external groups

Of the lessons learned through the eMERGE experience, none is more prominent than that of collaboration. The many individuals and groups with diverse geography, experience, and expertise that constitute eMERGE have undoubtedly increased both the yield and quality of our work. The tools created by eMERGE investigators, as well as the genomic and clinical databases within the network, provide valuable resources for collaborations. In addition to collaborations within and between the eMERGE sites and workgroups, the network is also working closely with other groups focused on similar goals and activities.

The NHGRI’s 2011 Strategic Plan emphasized implementation of genomic medicine, leading to the formation of the genomic medicine working group27 with members from more than 40 eMERGE and non-eMERGE institutions.28 The genomic medicine working group provides guidance to NHGRI and organizes meetings to discuss diverse implementation issues and develop pilot implementation projects.

Another key example of successful external collaboration is the eMERGE-PGx project, developed with the Pharmacogenetics Research Network.29 eMERGE-PGx will deploy targeted next-generation sequencing of 84 very important pharmacogenes. The activities of eMERGE-PGx include (i) clinical reporting restricted to very important pharmacogenes with evidence for “actionability” such as those included in guidelines promulgated by the Pharmacogenetics Research Network’s Clinical Pharmacogenomics Implementation Consortium;26 (ii) preemptive testing and presentation of “actionable” variants in the EMR with CDS at the point of care; and (iii) creating a repository of the other very important pharmacogene variants that will enable future genotype–phenotype studies.

The eMERGE Network has also forged successful links with other NHGRI-funded consortia including the Population Architecture Using Genomics and Epidemiology Consortium,30 the Return of Results Consortium,31 and the Clinical Sequencing Exploratory Research Program.32 These links have allowed the network to exchange expertise with other groups doing complementary and often synergistic work in the genomic medicine domain.

The eMERGE Steering Committee has established guidelines on how external institutions can apply for affiliate membership to the eMERGE Network (http://www.gwas.org), and this is strongly encouraged.

eMERGE Phase II Network Opportunities, Challenges, and Lessons Learned

The combined resources of the eMERGE Network provide opportunities accompanied by some significant challenges, which the workgroups are addressing. Some notable examples are highlighted below.

Portability of electronic phenotypes within and outside eMERGE

There is currently no formal “phenotyping language” for the purpose of building EMR phenotyping algorithms nor is there a common approach to their implementation. Developing portable phenotyping algorithms is an area of high priority in eMERGE, with a view to easing implementation within and outside the network. One potential solution is the National Quality Forum’s Quality Data Model, an XML-based information model for representing EMR-based quality measures to support meaningful-use reporting requirements.33,34,35 Nine algorithms have been implemented using the Quality Data Model, and eMERGE investigators are testing Drools36 and Konstanz Information Miner37 as common execution engines. The network’s experiences will be formally documented and disseminated to the community.

Approaches to EMR integration of genomic information

EMRs and CDS systems can improve the quality of care and reduce adverse drug events,38,39,40,41 but no commercial EMR integrates pharmacogenetic information systematically even though the US Food and Drug Administration drug labels include pharmacogenetic variants for 105 drugs in 117 contexts.42 Nomenclatures and ontologies,43 such as SNOMED-CT and LOINC, reasonably represent concepts related to genetic tests, but mechanisms for long-term storage of genomic data as well as secure, generalizable, and interoperable data exchange between health-care settings are needed to ensure continuity of care.44 Given that most of the genomic data gained through high-density genotyping arrays or whole-exome/whole-genome sequencing are not actionable at this time, and that knowledge and interpretation are changing rapidly, the data will likely be stored external to the EMR.45 eMERGE is investigating external CDS, but there is no standard for external CDS and subsequent user actions (e.g., placing an order). An external CDS engine cannot specify choices for what happens next, whereas integrated CDS can specify a litany of options. eMERGE is collaborating with the Clinical Decision Support Consortium46 and participating in other national efforts to address these issues. These interactions are expected to lead to the establishment of a standard for genome-informed CDS.

Integration of pediatric sites

The addition of pediatric eMERGE sites affords opportunities to explore new phenotypes and data sets while posing several challenges. Integration of pediatric and adult projects into one eMERGE Network is nontrivial but could provide valuable information about heritable diseases that present early in life and continue to adulthood. In theory, identifying genetic contributions to complex diseases should be easier in children because environmental exposures have had less time to take effect. A study of childhood obesity47 that replicated adult obesity loci and also identified novel loci supports this hypothesis. The network’s experiences in combining adult and pediatric data will produce insights that are useful beyond the genomics community to large, heterogeneous collaborative research endeavors in general.

Longitudinal cost-effective genomic medicine discovery and implementation

The size and diversity of the collective eMERGE biobank and the rich EMR-linked phenotypic data provide a unique opportunity for cost-effective longitudinal studies in genomic medicine, permitting study of incident disease, age, and period biases,48 as well as reducing prevalence and incidence bias.49 Continued collection of data in the clinical setting at no additional cost to the research program not only increases its value and utility over time but may also necessitate informing participants about new interpretations of the results, either because knowledge about significant health impacts of identified variants50 is accruing rapidly or because new conditions or use of new medications change the risk profile context for the individual. The burden, ethics, and costs of revisiting genomic variation in a given person, as knowledge evolves about that person and the variation he/she carries, will continue to be a significant focus of the eMERGE Network. Any lessons learned are likely to be of great importance to the genomic medicine community as we near the possibility of comprehensive genomic information being the norm in clinical care.

Generalizable framework for the return of genomic results

The opportunities gained through longitudinal genomic discovery are closely tied to the challenges of returning results. It is generally accepted that results with an immediate impact on a person’s health should be returned to the research participant.50,51,52,53 There is, however, far less consensus on how “medically actionable” or the related concept of “clinical utility” should be defined.53,54 Returning genomic research results raises practical, financial, psychosocial, and ethical challenges for both investigators and patients.53 The network is investigating models that allow patients to make choices about their results, is evaluating the benefits and costs of returning results,50 and has initiated consultation about returning research results with stakeholders, including physicians, patients, advisory committees, laboratory directors, and health plans.

The eMERGE network in the context of a translational framework

Implementing genomic medicine in the clinic is part of the strategic vision of the NHGRI and has been discussed recently.28,55,56 Five phases of moving genomic research into practice and policy have been defined,57,58,59 with the early phases focusing on biologic discoveries (T0), development of candidate health applications (T1), and assessing outcomes of interventions (T2). eMERGE-I focused largely on the T0 discovery phase through GWAS. eMERGE-II is developing T1 applications such as genomic risk prediction algorithms and clinically validated PGx assays, while continuing T0 discovery research through GWAS and phenome-wide association studies.10 eMERGE is not powered to assess outcomes directly (T2) but is building upon available literature and expert opinion to investigate how best to move genomic findings into health practice (T3) in its pilot implementation projects. The continued need for T2 research is expected to be greatly facilitated by the infrastructure for genomic research in biorepositories that eMERGE is developing and freely disseminating—especially its methods for electronic phenotyping and mining of EMRs, consent, returning results, patient education, and providing education and decision support to clinicians. eMERGE-II resources and findings will also facilitate the conduct of future T3 implementation research and potentially provide the foundation for comparative effectiveness research and public health surveillance (T4).

Conclusions

In the nearly 6 years since its inception, eMERGE has made great strides in the fields of genomics and informatics, contributing significantly to the now-established notion that the EMR is a powerful and cost-effective tool for genomics research. The network has developed tools and best practices that are being shared and utilized by the genomics and informatics communities and beyond. Building on its success, eMERGE is poised to lead the implementation of genomic medicine in clinical care through the EMR. It is hoped that this will result in improvements in health care, through safer and more effective prescribing, augmentation of primary and secondary prevention strategies, and enhanced understanding of the biology of disease. With the passage of the Patient Protection and Affordable Care Act and major changes to health-care delivery now upon us, there has never been a greater need and opportunity to improve safety and efficiency while reducing costs.

Disclosure

The authors declare no conflict of interest.