Introduction

Background to the study

Clinical sequencing is an area of rapid growth, as laboratories, medical centers, and hospitals adopt this method of comprehensively scanning the human genome for genomic variants associated with clinical disease or pharmacogenomic effects.1 Clinical sequencing impacts multiple facets of clinical care, such as processing of large-scale genomic data; reporting of results, including results to diagnose the presenting condition; integration of results into the medical record; and support of genome-enabled clinical decision making. To guide the development of best practices for the integration of clinical sequencing into clinical care, as well as to research the ethical, legal, and psychosocial implications of delivering broad genomic data into the clinic, the National Human Genome Research Institute, in collaboration with the National Cancer Institute, funded the Clinical Sequencing Exploratory Research (CSER) Program.2 Currently composed of six leading academic medical centers, CSER is working collaboratively to incorporate sequence data into the clinical care of patients and to examine the relevant ethical, legal, and psychosocial issues. Each site is completing a clinical genomic trial (whole-exome sequencing (WES) or whole-genome sequencing) with a unique patient population. The acceleration in uptake of WES outside the research setting is illustrated by the Baylor experience, in which clinical WES for direct patient care was offered beginning in November 2011. In the first 18 months, the laboratory has reported more than 1,000 WES results and currently receives ~200 clinical test requests monthly. The CSER consortium members awarded funding by the National Institutes of Health in 2011 are Baylor College of Medicine (BCM), Brigham and Women’s Hospital–Harvard Medical School (BWH), The Children’s Hospital of Philadelphia (CHOP), the Dana-Farber Cancer Institute (DFCI), the University of North Carolina (UNC), and the University of Washington (UW).

The CSER Electronic Medical Record Working Group (CSER EMR WG) was created to explore informatics issues related to annotation and prioritization of genomic variants, as well as to the integration of genomic results into the EMR/electronic health record (EHR) and EHR-enabled clinical decision support (CDS). The CSER EMR WG (chaired by P.T.-H.) includes representation from each CSER site and the National Institutes of Health. The initial goal was to systematically describe and compare the current state across the CSER sites regarding how WES and whole-genome sequencing results are incorporated into the EHR, including approaches to decision support. This work is complementary to the special issue article on “Opportunities for Genomic Clinical Decision Support Interventions”3 by the Electronic Medical Records and Genomics Network,4 which describes opportunities for genomic CDS informed by the literature on traditional computerized CDS. Our work takes a bottom-up approach (examining current practices at the CSER sites), whereas the Electronic Medical Records and Genomics article takes top-down approach (describing an ideal state and requirements to achieve this state) to elucidating the future directions for genomic CDS.

Background on the six current CSER sites

The six initial CSER sites5 represent projects looking at the issue of generating and incorporating next-generation sequencing (NGS) data (WES and whole-genome sequencing) across a broad range of clinical settings to study the implementation and impact of genomic medicine. The sites are exploring outcomes including (i) the communication of results to their target populations, (ii) the preferences of their participants with respect to return of incidental findings, (iii) the impact of reporting NGS to participants, (iv) how best to report and store NGS data, (v) the incorporation of novel decision support technologies and solutions, and (vi) the use of NGS as a diagnostic modality. Table 1 describes the patient populations and unique features at each site.

Table 1 Populations and features of the six current Clinical Sequencing Exploratory Research sites

Materials and Methods

The CSER EMR WG developed a survey to describe and compare across the CSER sites how NGS results are being incorporated into the EHR, including approaches to decision support. Table 2 provides an overview of the survey, and Figure 1 illustrates graphically in shaded form the scope of the survey.

Table 2 Overview of survey
Figure 1
figure 1

This figure created by the National Institutes of Health Clinical Sequencing Exploratory Research (CSER) Electronic Medical Records Working Group (EMR WG) shows the typical CSER site workflow from specimen acquisition through the reporting of whole-exome or whole-genome results into the EMR. As described in the text, each of the six CSER sites has its own site-specific workflow and its own variant database/knowledge base. The focus of the CSER EMR WG (and the survey presented in this article) is indicated by the shaded areas of the figure.

Over a 3-month period, the CSER EMR WG members drafted and refined the survey, which was then completed by each of the six sites in early 2013. Each site analyzed and summarized the results for one of the six main sections of the survey, using a qualitative approach. For certain questions, additional clarifying information was sought during the analysis and summarization, which was reviewed by all the sites. The discussion section includes material from CSER EMR WG teleconferences and in-person meetings in which the data from the survey and site-specific approaches to NGS integration into the EHR were presented and discussed.

Results

Context

All six CSER network sites are performing massively parallel sequencing using the Illumina HiSeq platform (San Diego, CA) and analyzing sequence data for germline mutations associated with inherited disease and/or risk of disease. BCM and DFCI are focused on cancer ( Table 1 ) and thus they are also analyzing tumor samples for the presence of somatic alterations.

Five CSER sites are performing NGS locally, and one site is outsourcing NGS followed by on-site variant confirmation by Sanger sequencing. BCM and BWH are performing NGS in a Clinical Laboratory Improvement Amendments (CLIA)–certified laboratory setting. The other sites are performing NGS in a research setting followed by Sanger confirmation of variants in a CLIA–certified laboratory ( Table 3 ). Most sites are planning for a transition of NGS to a CLIA–certified laboratory in the near future.

Table 3 Overview of Clinical Laboratory Improvement Amendments (CLIA) model, variant annotation, and report generation by CSER site

Each site has built a local bioinformatics workflow for variant annotation and interpretation ( Table 3 ). All sites have incorporated multiple public data sources for variant annotation as part of their bioinformatics workflow. In addition, each site has built tools to incorporate locally derived NGS data for variant annotation, such as local allele frequencies or tracking of previously ascertained sequence variants. All sites incorporate manual or semi-automated curation of sequence variants by searching the medical literature and/or relevant locus-specific databases to determine the clinical significance of sequence variants before reporting.

All sites report different categories of information ( Table 3 ) based on whether a variant is related to the disease phenotype, a medically actionable incidental finding, or other incidental but otherwise deemed reportable finding (e.g., carrier status, pharmacogenetic information). Each site has developed its own strategy, described in detail elsewhere,6 for how to classify and report incidental information. Several sites incorporate patient choice to receive certain types of incidental genomic information with opt-in or opt-out categories. All sites provide an indication-specific, focused report for potentially diagnostic information and also report incidental findings not related to the disease phenotype as additional information or in a separate report. In addition to relevant literature references, most of the CSER sites provide links in their reports to websites that provide additional information regarding variant classification and interpretation, such as OMIM,7 PubMed,8 RefSeq,9 dbSNP,10 GeneTests,11,12 GeneReviews,13,14,15 Clinicaltrials.gov,16 and PharmGKB,17 as relevant. As a result of the heterogeneity apparent in Tables 1 and 3 , each site has developed its own custom annotation workflow. Currently, these bioinformatics tools are not being shared, and commercial tools are not being used.

Variant databases/knowledge bases

The capabilities and characteristics of site-specific, accumulated variant databases/knowledge bases (VDBKBs) have implications for what structured and computable data can be sent to the EHR. External and internal VDBKBs are a critical component of NGS diagnostic processes. The goal of these processes is to determine whether any of the tens of thousands to millions of variants identified by NGS are clinically important (e.g., associated with the indication for testing, a medically actionable incidental finding, or having a significant pharmacogenomic association). The need for these assessments and the creation of local VDBKBs stems from the fact that the NGS platforms currently available do not include tools for annotation of the variants they report nor are they integrated with VDBKBs.

The sites reported the use of different external variant and gene databases ( Table 3 ). External variant databases that catalog reported associations between specific variants and known clinical phenotypes serve as a first-pass filter for annotation. However, thoroughly evaluating NGS data often requires verifying reported associations and searching for other potentially important variants that have not yet been associated with clinical phenotypes. This task is particularly challenging because most annotation features available for filtering provide suboptimal sensitivity and/or specificity. External databases with information on variant frequencies or gene-level annotations are particularly useful in this context.

In addition to utilizing these external databases, each CSER site has developed a custom local VDBKB to manage its internal variant information and support its reporting process ( Table 4 ). These systems record the variants identified in each patient and most CSER sites are also recording their internal variant assessments. Investigating the clinical implications of variants can be a time-consuming process; therefore, it is useful to store variant classifications so that they can be leveraged if a variant is identified again in another patient. BWH uses VDBKB annotation software (GeneInsight, Boston, MA)6 that is commercially available. The other sites have developed their own systems.

Table 4 Overview of variant database, report generation, EHR, reporting, and decision support by CSER site

Reporting of results into the EHR

The systems used for report generation (before integration of the report into the EHR) are unique to each site ( Table 4 ). Reporting is semi-automated (manual at CHOP), involving a combination of expertise in genetics, genomics, bioinformatics, and laboratory medicine/pathology using custom systems for report generation, including the site’s local VDBKB. Four sites send the same report(s) to all providers/patients. CHOP has separate reports for each group. DFCI and BCM have a tumor board–specific report. BCM has a more in-depth clinical report for external providers. The UW report has the first section designed for nongeneticists and later sections aimed at experts in molecular variants. Each of the sites uses a different laboratory information management system (LIMS) ( Table 4 ) that is either locally developed or involves custom adaptations to a commercial platform.

The destination systems for reports are both commercial and custom-developed EHRs ( Table 4 ). Four sites use a single EHR and do not have partner sites, whereas UNC and UW have partner sites with separate EHRs. UW has a heterogeneous EHR environment ( Table 4 ), using Cerner,18 Sorian,19 and Epic20 EHRs. BWH, DFCI, and UNC are using custom-developed EHRs to report NGS results while making plans to report into the commercial EHRs in the future.

Despite the heterogeneity in workflows, VDBKBs, bioinformatics tools, LIMS, and EHRs ( Tables 3 and 4 ), the common end result at all sites is a PDF human-readable structured document designed to be sent to the EHR (analogous to a pathology or other text report). Several CSER sites also report to outside labs that are not part of the CSER project, using their normal reporting process. Some sites also provide structured reporting in machine-readable format. Due to a lack of standards in report content, structure, coding, generation, and LIMS ( Table 4 ), current EHRs are not able to process these structured reports because their format is unique to each site. For a subset of actionable indication and incidental findings, UW uses structured laboratory data (molecular testing results pathway) in the SunQuest clinical laboratory system for active decision support.

Communication of results to providers

All sites feel that it is important to have highly trained personnel in medical/molecular genetics available to the ordering physician. Each laboratory has identified genetic counselors, molecular geneticists, or medical geneticists to communicate with the ordering physician. In addition, the majority of sites make the medical director of the lab available to ordering physicians as needed to explain more complex results.

Sites estimated that conversations to explain the implications of results, discuss interpretation of uncertain variants, and answer questions typically take 15–30 min. The ability to effectively explain the results to ordering physicians was felt to be an important challenge in scaling up the process. The BCM laboratory, currently the largest clinical WES CSER testing site, has already had to increase the number of staff who perform this specific function.

Handling of changing variant/annotation information

As new genomic discoveries are made, genomic findings may be reclassified over time, and other medical knowledge relevant to an NGS analysis or test interpretation may be gained. Therefore, each CSER site has considered the challenge of reinterpreting genomic events and what, if any, actions are required when new information is available for a patient’s genome report. The results of this survey demonstrate the wide spectrum of study policies across the sites for the reanalysis of genomic data and the disclosure and reporting mechanisms for new information. The diversity of approaches across the CSER sites highlights the challenges of integrating new genomic information into clinical care given the rapid pace of scientific discovery.

CSER sites have segregated into one of two overarching options when it comes to disclosure of new genomic information that may impact a patient’s clinical management: (i) strict nondisclosure of new information (DFCI, UW) or (ii) disclosure of new information if logistically feasible ( Table 4 ). The BWH site has an automated method for disclosure using GeneInsight. Automated e-mail alerts are sent to clinicians when a variant is reclassified in a manner that generates a “high” alert, and clinicians receive “medium-” and “low- priority” alerts in batch form or on a weekly basis.21 A “high” or “medium” alert reflects a possible change in treatment (e.g., uncertain significance to pathogenic or pathogenic to nonpathogenic category), whereas a low-level alert specifies less substantial reclassifications (e.g., benign to likely benign). Other sites have opted for semi-annual reviews (UNC) or periodically as determined by the diagnostic lab (BCM, CHOP).

Regarding incidental findings that do not directly relate to the diagnostic question under evaluation but that could impact decision making in a broader clinical context, one site (UNC) has opted to issue amended reports that include new data. However, all other sites that would issue new information plan to do so only for variants that fall within the original indication for testing where this original assessment may include a “general genome report” that assesses highly penetrant conditions independent of a prior probability of disease.

CDS

Given the growing number of reportable and actionable genes, the ideal approach to NGS CDS in the EHR would include a combination of active (e.g., alerts inside the EHR triggered by context) and passive CDS (e.g., reports requiring providers to seek out and review reports).22

Because all sites use PDF documents for passive CDS ( Table 4 ), we explored how the sites ensure clinicians are made aware of the reports. Two organizations (DFCI and UW) used features built in to the EHR to deliver sequence results (with notification) to a wide range of clinical stakeholders, whereas two locations (BWH and BCM) sent e-mails (no protected health information) outside the EHR to staff at the clinic where the order was placed and provided verbal notification for more complex cases. At BWH, the e-mails provide deep links that enable clinicians to authenticate and access relevant information or navigate to the information through a system integrated with the EHR under a patient genetic summary table. A medical geneticist at a fourth location (UNC) provided results verbally to study subjects and solicited informed consent for results to be entered into the EHR. In addition to the built-in notification features of the EHR, UW also directly contacted the primary provider for a subset of the participants.

Active decision support rules were functional at two organizations, BWH and UW ( Table 4 ). BWH developed custom decision support external to, but integrated with, its custom (locally built) EHR. This system, along with medication data from the EHR, provides pharmacogenomics decision support and sends notifications when variants are reclassified. UW is using native features in its commercial EHR (Cerner) to implement alerts as part of pilot efforts for selected variants for patients enrolled in the CSER study by building on prior UW pilot work.23

Other CDS approaches are also used. Two organizations (UNC, UW) enhance their reports with clickable links to supporting materials as a variant of passive decision support. BCM has put significant effort into developing an iPad application providing the test report in a more dynamic platform. The tablet application allows control of the content and direct educational links (glossary, OMIM, PubMed) independent of the current limitations of the EHR at the study site as well as the variety of EHRs used by hospitals outside the study receiving test results as a simple PDF.

Discussion

Across the sites there is a common sequencing approach (Illumina HiSeq) and a common end point (PDF text providing passive CDS). Despite this commonality, there is great diversity of workflow, bioinformatics tools, and approaches starting from sequence data and ending with report generation, which presents both opportunities and challenges for the community.

A key reason for this diversity is that each site has its own approach to annotating variants ( Table 3 ), with its own informatics tools and overall workflow. This heterogeneity is independent of NGS platform and results from the manual annotation currently required of NGS data ( Figure 1 ) and different decisions around this annotation process at each site ( Table 3 ). Another cause of diversity in workflow and tools is that each site has its own internal VDBKB to capture site-specific assessments of variants ( Table 4 ). Unfortunately, VDBKB content cannot yet be shared across sites due to lack of standardization. A consequence of this heterogeneity is that the same variant in a sequence may not be annotated or reported on in the same way across the sites. There are a number of National Institutes of Health–supported initiatives, including ClinVar24 and a proposed resource for the identification and dissemination of consensus information on genetic variants relevant for clinical care (described in RFA-HG-12-016, ref. 25) to address the current duplication and heterogeneity of internal VDBKBs.

Passive CDS in the EHR is implemented at all sites as human-readable structured documents. The ability to easily generate a PDF report independent of the workflow leading to the report, and the ability of virtually any commercial or custom EHR system to accept a PDF report, suggests this is likely to be a common first step to genomic decision support. However, this approach has the known risks of passive decision support,22 which are substantially exacerbated by the complexity of NGS.

Active CDS thus will be necessary as NGS-based testing is increasingly adopted, and as the number of genes and variants deemed actionable/reportable for a given patient rises.26 Providers cannot be expected to read and remember all variants for a given patient via a passive CDS. Active CDS triggered by context (e.g., drug–gene interactions at the time of electronic order entry) will be critical to effectively scaling up EHR-based CDS of NGS results. EHRs, particularly those of commercial systems, include decision support engines that can be adapted for active genomic CDS subject to the availability of trigger conditions.27 Institutions have published literature about using CDS engines for pharmacogenomics decision support using different types of genomic data. For example, the University of Utah used limited single-nucleotide polymorphism and allele data to pilot pharmacogenomic decision support for CYP2C9.28 St Jude Hospital (Memphis, TN) used data from the DMET Plus array (affymetrix, Santa Clara, CA 1,936 genomic variants in 225 genes) and the Cerner EHR to implement a set of pharmacogenomics decision support rules for 29 CYP2D6 alleles and 9 TMPT alleles.29 Two CSER sites are extending this work to NGS ( Table 4 ). UW is building on its proof-of-concept work23 and extending the standard single-gene molecular testing result mechanism to put multiple actionable pharmacogenomics variants into the EHR in a computable representation. These data can then be used by the UW commercial EHR for active CDS. This approach is consistent with the recommendation made by Masys et al.,30 which involves putting only the actionable variants into the EHR as discrete data rather than all the NGS data. This approach is also consistent with the model presented by the Electronic Medical Records and Genomics consortium.31 Representing actionable variants via a clinical LIMS single-gene molecular testing result reporting data structure could generalize to any LIMS and any EMR; however, it is difficult to scale up because a new “test” would need to be created for each gene. BWH is using an external tool (GeneInsight) integrated into the EHR through a single sign-on mechanism that passes patient context. GeneInsight sends structured data into EHRs to support CDS rules that rely on these data (e.g., supplying data to the EHR needed to provide a pharmacogenomic alert that appears at the time of electronic order entry). Both CSER active CDS approaches have the potential to be implemented at other sites.

A challenge to scaling up active CDS is that currently computable rules cannot be automatically derived or created from the VDBKB. Although not explicitly asked in the survey, four of the six sites stated that they felt more sophisticated (and standardized) biomedical informatics tools for interpretation of sequence variants are needed in order to effectively scale up NGS diagnostics. The emerging discipline of translational bioinformatics32 includes as its focus these types of translational tools. The change of variant/annotation information over time magnifies the challenge of maintaining these rules. These challenges are independent of the NGS method and the approach to delivering the results into the EHR and are inherent to the nature of NGS. The need to re-annotate is especially an issue for the sites disclosing new genomic information ( Table 4 ).

A final barrier to putting actionable variant NGS data into the EHR is poor adherence to existing standards to represent genomic variants, such as the guideline proposed by the Human Genome Variation Society33 (in contrast to adopted standardized coding systems such as the International Classification of Diseases, 10th revision).37 This challenge is magnified by the lack of any VDBKB collective that would feed actionable variant results into the EHR. Mark Hoffman (of the Cerner EHR company) has over the past 8 years begun to put into the Cerner system an open source ontology along these lines,34 working toward a vision of a genome-enabled EHR35 to enable a more generalizable and scalable approach to personalized medicine.36 Ultimately, adoption of such standardized ontologic approaches will be key to enabling sharing of active genomic decision support rules across organizations and EHR vendors.

CDS alone is not sufficient and needs to be augmented by a mechanism to provide human-to-human communication. All sites feel it is important to provide the ordering physician access to highly trained personnel in medical/molecular genetics. This need is independent of the EHR environment and reflects broader perspectives in the genetics community. Given the complexity and nuances of interpreting genomic test results, as well as the possible involvement of different medical stakeholders with widely varied perspectives and genomic knowledge, there is a need for better leveraging of standard EHR mechanisms for person-to-person consultation (e.g., provider-to-provider messaging embedded in commercial EHRs such as inboxes and electronic consult workflows).

The issues above and the heterogeneity in approaches to variant identification, annotation, prioritization, VDBKBs, and reporting tools ( Tables 3 and 4 ) has led each CSER site to build bioinformatics tools and workflows for the four intermediate steps between sequence data and decision support outlined here ( Figure 1 ). This represents significant past and ongoing investment in bioinformatics infrastructure within and across the sites. Based on the current study and the work of the CSER EMR WG, it appears that these tools are too closely linked to local variations in approaches ( Tables 3 and 4 ) to be generalizable across the sites, even if they are open source. A number of early-phase companies are attempting to fill this gap between the sequence data and the EHR. BWH uses the GeneInsight and Alamut applications in their workflow. CHOP uses Cartagenia and Alamut. The CSER EMR WG is thus exploring opportunities for cross-site collaboration across the steps in Figure 1 as part of the recently formed CSER Coordinating Center.

We conclude that future directions to maximize the ability to scale up NGS for clinical use include (i) more cross-site collaboration in creation, curation, and integration of VDBKBs; (ii) ensuring these knowledge bases are able to generate both human-readable and computable reports (and standardizing the vocabulary or ontology used to code the computable reports); (iii) development of standards for automating the translation of information in VDBKBs into active decision support rules; (iv) development of best practices for integrating biomedical informatics into clinical and communication workflows; and (v) collaboration with vendors on adapting their active CDS (both EHR and related genetic system vendors as well as the emerging third-party NGS decision support vendors).

Disclosure

S.J.A. and M.L.: Partners HealthCare licensed the GeneInsight technology in November 2012 to a company in which it acquired 100% equity ownership. This transaction was reviewed by the Partners Committee on conflicts of interest in light of Partners’ acquisition of this financial interest, and the Committee, consistent with Partners policy, required that notice of Partners financial interest in the technology be provided to journals and in publications and presentations on the technology. S.J.A. and M.L. are staff and employee. The other authors declare no conflict of interest.