INTRODUCTION

Increasing numbers of patients are undergoing panel, exome, or genome sequencing as a part of routine clinical care; consequently, the identification of genetic variation that is medically actionable but not related (secondary) to the indication for testing will become increasingly common. Though there are recommendations regarding the return of these findings,1,2,3 challenges remain regarding the identification, interpretation, and return of these results to patients. To begin to address these challenges, the American College of Medical Genetics and Genomics (ACMG) published guidelines listing 56 genes1 from which secondary findings (SFs, sometimes called incidental or additional findings) should be reported when genomic sequencing is used for clinical purposes, and the types of variants, per interpretation guidelines, that would qualify as true SFs. In 2017, the ACMG working group published an updated version of these guidelines, adding 4 genes and removing 1 gene to make up the current list of 59 genes.2

Despite continued development and uptake of this list by the community, the most recent update to the ACMG guideline for secondary findings return still refers to pathogenicity attributes from the 2008 ACMG variant interpretation guidelines.4 However, many labs return SFs classified as likely pathogenic (LP) or pathogenic (P) based on the 2015 ACMG/Association for Molecular Pathology (ACMG/AMP) interpretation framework,5 which is not a direct match to the known pathogenic (KP) and expected pathogenic (EP) categories recommended by current ACMG secondary findings guidelines. One major challenge to the application of the ACMG secondary findings guidelines is that, to date, much of the expert opinion regarding actionability of these gene–disease pairs does not derive from the experience of testing the broader patient populations that these guidelines are primarily intended to support. Specifically, prior estimates of SF frequency have been limited in the age, ancestry, and phenotypic diversity of the sample, restricted to a subset of all known actionable genes (e.g., hereditary cancer only), and often constrained to a single health-care entity or testing lab.6,7,8,9

Providing data to guide the process and inform the rationale for or against SF return was a major goal of the third phase of the National Human Genome Research Institute’s (NHGRI) Electronic Medical Records and Genomics Network (eMERGE III). eMERGE III aims to study and improve standards and methods for delivery of clinical and research data across a multisite cohort, while providing actionable genetic results derived from a next-generation sequencing platform to eMERGE III research participants. This current phase builds on the network’s decade of experience in genomics and return of results in the context of health systems, clinical data abstraction from the electronic health record (EHR), linking health-care processes that inform care with those that catalyze research, and studying multiple models of participant consent.

To this end, we developed eMERGEseq, a sequencing panel focused on 109 genes of network interest, and then selected a subset deemed actionable by network consensus including all 56 genes on the initial ACMG list alongside an additional 12 genes and 14 individual variants in 11 additional genes. The panel was deployed by two CLIA sequencing labs on samples from 25,015 participants drawn from ten different clinical sites across the United States;10,11,12 the 21,915 participants in the SF cohort described here are participants eligible for results from the entire consensus return list who were not ascertained based on prior sequencing results. Here we describe the frequency of actionable SFs within this cohort to better inform researchers, patients, and providers seeking an understanding of the spectrum of results that can be expected from personal genomic testing.

MATERIALS AND METHODS

Panel development and actionability assessment

The process to determine the eMERGE III sequencing platform content and consensus actionability list, described elsewhere,10 was led by the eMERGE III Sequencing Centers and Clinical Annotation Working Group (ClinAnn WG), with input from the Return of Results Working Group and all eMERGE network members. At the time the eMERGEseq platform was designed, the ACMG recommended return of pathogenic variants in 56 gene–disease pairs1 as secondary findings when a genomic sequencing test was ordered for a clinical indication. These genes were all included on the platform, in addition to 53 site-nominated additional genes and single-nucleotide variants (SNVs) for inclusion on the panel, to support both discovery-focused and clinical return of result aims.

Pathogenic variants in all 56 ACMG gene–disease pairs were determined to be returnable, as well as variants in these genes pathogenic for other conditions (e.g., WT1 NM_024426.6:c.1447+5G>A, pathogenic for Frasier syndrome in addition to Wilms tumor, its ACMG pair). Additional gene actionability recommendations from the literature were also considered for the remaining 53 genes; these were then individually reviewed as previously described,10 considering both evidence for the gene–disease association and nature of the recommended clinical action or intervention. Additional consensus genes included BMPR1A, PALB2, POLD1, POLE, SMAD4, COL5A1, KCNE1, KCNJ2, HNF1A, HNF1B, CACNA1A, and OTC; genes with individual actionable variants included ACADM (NM_000016.5:c.985A>G [p.Lys329Glu]), ALDOB (NM_000035.4:c.356_359CAAA[1] [p.Asn120fs]), BCKDHB (NM_183050.4:c.832G>A (p.Gly278Ser) and NM_183050.4:c.548G>C [p.Arg183Pro]), FAH (NM_000137.3:c.782C>T [p.Pro261Leu]), G6PC (NM_000151.4:c.247C>T [p.Arg83Cys]), CPT2 (NM_000098.3:c.1239_1240del [p.Lys414fs]), BLM (NM_000057.4:c.2207_2212delinsTAGATTC [p.Tyr736fs]), CYP21A2 (NM_000500.9:c.293-13C>G), F5 (NM_000130.4:c.1601G>A [p.Arg534Gln]), HFE (NM_000410.3:c.845G>A [p.Cys282Tyr], homozygotes only), and MEFV (NM_000243.2:c.2177T>C [p.Val726Ala] and NM_000243.2:c.2080A>G [p.Met694Val]).10 Though all sites participated in this review, latitude was given to individual sites regarding returning these results to account for differences in study population (e.g., pediatrics sites), study aims, consent, and institutional review.

The ACMG updated its recommendation to 59 genes in 2017, dropping 1 (MYLK) and adding 4 (ATP7B, BMPR1A, SMAD4, and OTC).2 Three of these new genes were included on the eMERGEseq panel and had already been deemed actionable by the eMERGE III network, and the fourth was represented by a single SNV (ATP7B; NM_000053.4:c.3207C>A [p.His1069Gln]), which was returned by only two clinical sites. MYLK was specifically discussed and, given that its calculated score per the ClinGen gene–disease validity framework13,14 did not differ from that of other ACMG-included genes associated with thoracic aortic aneurysms and aortic dissections (TAAD), it was retained as actionable by the eMERGE network. Given the research nature of the project, the consent process, and the participants’ accessibility to medical care, it was decided by network consensus to return both pathogenic (P) and likely pathogenic (LP) variants in all 68 genes.

Cohort description

The eMERGE III network consists of ten clinical study sites, two sequencing centers (SCs), and a coordinating center. A primary goal of eMERGE III is to harmonize and implement an entire, unified pipeline simulating real-life clinical genomic testing within a broad spectrum of participants and clinical settings, with each clinical site’s cohort of ~2500 participants being, by design, markedly different in ascertainment strategy and demographics. The study adhered to the principles set out in the Declaration of Helsinki, with informed consent from all participants, as approved by individual sites’ institutional review boards (IRBs); a specific discussion of the ethical considerations across the eMERGE III study is described elsewhere.15 One clinical site that did not return the consensus list to participants could not be included in this analysis. Another clinical site could not be included as they ascertained participants based on previous non-CLIA sequencing results enriching for putative pathogenic variants, which would bias the analysis; results from this site have been previously described elsewhere.9 The resulting SF cohort (N = 21,915) includes a diverse range of participants drawn from across the lifespan and across the health system, from primary care to specialty clinics; Table 1 details the demographic makeup of both the overall and individual site cohorts.

Table 1 Demographic characteristics of the eMERGE III secondary findings cohort and nine contributing clinical sites.

Clinical sequencing and interpretation pipeline

The eMERGEseq clinical sequencing pipeline is described in detail elsewhere.10 Briefly, participants were enrolled at each site, blood was collected, and DNA was extracted locally and then sent to one of two assigned SCs for targeted sequencing in a College of American Pathologists (CAP)/CLIA setting: the Baylor College of Medicine Human Genome Sequencing Center (BCM-HGSC), Houston, Texas; or the Broad Institute and Partners Laboratory for Molecular Medicine (LMM), Cambridge, Massachusetts. Variants identified through this sequencing test were interpreted by the assigned SC in the context of the site-provided disease status and test indication (if present). Variant classifications from both laboratories were based primarily on ACMG/AMP criteria,5 supplemented by recommendations from ClinGen’s expert panels and Sequence Variant Interpretation Working Group. Final interpretation decisions incorporated these recommendations along with evidence from the literature and, when present, additional SC internal data accrued from previous testing. Evidence summaries from the laboratories supporting these interpretations were included on the clinical reports. Variant classifications along with evidence and interpretive summaries for all reported variants were submitted directly to ClinVar by both BCM-HGSC and LMM (submitter IDs 500199 and 21766, respectively). Though there are subtle differences between the sequencing and variant analysis approaches at the two SCs, an extensive preliminary harmonization effort between the two SCs at all critical components of the sequencing and interpretation workflow ensured that test quality and accuracy were highly concordant.10

Working definition and identification of secondary findings

Although there is some debate as to the specific terminology of each case, the rate of clinically actionable findings unrelated to any participant ascertainment indication can be estimated. In this study, for the purpose of clarity, a secondary finding (SF) is defined as a pathogenic (P) or likely pathogenic (LP) variant in a gene on the consensus list with an associated disease phenotype unrelated to any participant indication that was provided to the SCs at sample intake. For individuals without an indication for testing, any positive (P or LP) finding was considered secondary. Thus, SFs in this cohort derive from individuals both with and without a reported indication for testing. Supplementary Table 1 lists these indications and their frequency in this cohort.
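The working definition above can be expressed as a small decision rule. The following Python sketch is illustrative only: the `Finding` record, the phenotype labels, and the exact-match comparison between phenotype and indication are assumptions made for the example, not the sequencing centers' actual intake or reporting logic.

```python
from dataclasses import dataclass
from typing import Optional, Set

# Classifications treated as "positive" in this study.
REPORTABLE = {"P", "LP"}


@dataclass(frozen=True)
class Finding:
    """Hypothetical record for one reported variant."""
    gene: str
    classification: str  # e.g. "P", "LP", "VUS"
    phenotype: str       # disease phenotype associated with the gene


def is_secondary_finding(finding: Finding,
                         consensus_genes: Set[str],
                         indication: Optional[str] = None) -> bool:
    """Apply the working SF definition to a single finding.

    Assumption: phenotype and indication use the same vocabulary, so
    relatedness is approximated by simple string equality.
    """
    if finding.classification not in REPORTABLE:
        return False  # only P and LP variants qualify
    if finding.gene not in consensus_genes:
        return False  # gene must be on the consensus list
    if indication is None:
        return True   # no indication: any positive finding is secondary
    return finding.phenotype != indication
```

Under this sketch, a P variant in LDLR in a participant enrolled for a lipid disorder would count as a primary finding, while the same variant in a participant with no indication would count as an SF.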

Given the mix of ascertainment protocols, the distribution of those eligible to receive primary findings (i.e., those with an indication for testing) varies considerably across site and disease phenotype; this could artificially skew overall estimates of SF frequency for phenotypes that were also indications for testing. To correct for this, we adjusted the total number of participants contributing to each frequency estimate based on indication for testing. For example, in calculating the SF rate for LDLR, we excluded the 1626 participants indicated to have a lipid disorder, as these participants by definition could not have received an SF in LDLR. Although a limited set of pharmacogenetic markers were also assayed and reported by the SCs, these results, known to be relatively common,16 are not included in this analysis. Likewise, carrier status for conditions with autosomal recessive inheritance is not included in this analysis.
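The denominator adjustment described above can be sketched as follows. The cohort size (21,915) and the number of participants with a lipid disorder indication (1626) come from the text; the function name and the SF count used in the usage note are hypothetical placeholders, since per-gene counts are not given in this section.

```python
COHORT_SIZE = 21_915            # SF cohort size reported in the text
LIPID_INDICATION_COUNT = 1_626  # participants indicated to have a lipid disorder


def adjusted_sf_rate(sf_count: int,
                     cohort_size: int,
                     n_with_related_indication: int) -> float:
    """Return the SF rate among participants eligible to receive that SF.

    Participants whose test indication matches the phenotype are removed
    from the denominator, because a finding in them would be primary.
    """
    eligible = cohort_size - n_with_related_indication
    if eligible <= 0:
        raise ValueError("no participants remain eligible for this SF")
    return sf_count / eligible


# Denominator used for LDLR under this adjustment: 21,915 - 1,626 = 20,289.
ldlr_denominator = COHORT_SIZE - LIPID_INDICATION_COUNT
```

For instance, `adjusted_sf_rate(50, COHORT_SIZE, LIPID_INDICATION_COUNT)` would divide a made-up count of 50 findings by 20,289 eligible participants rather than by the full cohort.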

RESULTS

Overall SF rates

Among the 1166 positive reports across the entire eMERGE III-SF cohort (N = 21,915), we identified 661 actionable findings unrelated to participant test indication, resulting in an overall SF rate of 3.02% (Fig. 1). Of these, 556 findings (2.54%) are variants in genes recommended by ACMG to be reported as SFs, while the remaining 105 findings (0.48%) are in other consensus actionable loci. Ten individuals were found to have two independent SFs, each unrelated to participant test indication. Ten variants were each reported in ≥5 participants (Table 2). Three of these ten variants, in MYBPC3, KCNQ1, and RYR1 respectively, were reported as LP. A full list of variants determined to be SFs and their rates within this cohort can be found in Supplementary Table 2. Additionally, two participants were found to have the unanticipated SF of acquired mosaic trisomy 12, a marker in chronic lymphocytic leukemia;17 after consultation between the relevant SC and clinical site, these results were included on the participant’s clinical report.
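As a quick arithmetic check, the overall percentages can be reproduced from the counts reported above. This is a minimal sketch; the variable and function names are ours, and it uses the full cohort as the denominator, as the overall figures do.

```python
COHORT_SIZE = 21_915

# Counts of findings reported in the text.
SF_COUNTS = {
    "all_sf": 661,           # all actionable findings unrelated to indication
    "acmg_genes": 556,       # findings in ACMG-recommended genes
    "other_consensus": 105,  # findings in other consensus actionable loci
}


def percent(count: int, total: int) -> float:
    """Return count/total as a percentage rounded to two decimals."""
    return round(100 * count / total, 2)


rates = {name: percent(n, COHORT_SIZE) for name, n in SF_COUNTS.items()}
# rates == {"all_sf": 3.02, "acmg_genes": 2.54, "other_consensus": 0.48}
```

The two subgroup counts sum to the overall count (556 + 105 = 661), so the subgroup rates sum to the overall rate.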

Fig. 1: Overall frequency of secondary findings (SFs) across the eMERGE III-SF cohort.

Ten individuals were found to have two different SFs. ACMG American College of Medical Genetics and Genomics.

Table 2 Actionable secondary finding variants identified in ≥5 participants.

SF rates by gene and disease domain

SF counts by disease domain are illustrated in Fig. 2a. Most SFs identified in this study were in cancer-associated genes, with cardiovascular disease–associated genes being the next most common group; these two domains represent the majority of genes on the SF list. For cancer-associated genes, most SFs were P (1.12% P vs. 0.26% LP), while for the cardiac-associated genes most were LP (0.36% P vs. 0.51% LP). Figure 2b illustrates the SFs reported by gene. SFs were most commonly reported in BRCA2, LDLR, HFE, MYBPC3, and BRCA1, in that order. Of note, for HFE, only homozygosity for p.Cys282Tyr (NM_001300749.2:c.845G>A) was returned.

Fig. 2: Secondary finding (SF) rates.

SF rates grouped by (a) associated disease domain and (b) gene. Interpretations as reported by the sequencing centers. Rates are calculated using the proportion of participants who do not have a related indication as enumerated in Supplementary Table 1. Hemochromatosis findings are HFE p.C282Y homozygotes only (NM_001300749.2:c.845G>A). ACMG American College of Medical Genetics and Genomics, SNV single-nucleotide variant.

SF rates across self-reported ancestry groups

Table 3 summarizes the rate of SFs by self-reported ancestry. Self-reported Caucasian/White participants had the highest rate, 3.10% (95% confidence interval [CI] 2.84–3.40, N = 14,480); followed by Asian, 2.74% (2.02–3.69, N = 1497); Black/African American, 2.29% (1.83–2.86, N = 3279); Hispanic/Latinx, 1.98% (1.41–2.77, N = 1666); and American Indian, Alaska Native, or Native Pacific Islander, 1.30% (0.23–6.97, N = 77). In participants where self-reported race was known, there was a significant excess of SFs in those who self-reported as White versus those who did not when HFE p.Cys282Tyr (NM_001300749.2:c.845G>A) results were included (chi-square p = 0.003, 1 df). This allele is known to be common only in European ancestry individuals18 and indeed, 65/68 participants in our cohort with this finding identified as White; of the remaining 3, one identified as Black/African American and the final 2 were unknown. After excluding this allele, frequency of SFs did not differ significantly between those self-reporting their race as White (2.68%) versus all other groups combined (2.36%; chi-square p = 0.20, 1 df). The fraction of SFs classified as LP also did not differ between those self-reporting as White (38.1%) and other groups (41.0%; chi-square, p = 0.50).
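The text does not state which method was used to compute the 95% confidence intervals. One common choice for proportions from small groups is the Wilson score interval, sketched below as an assumption rather than as the authors' method. The example count of 1 finding among 77 participants is inferred from the reported 1.30% rate, not taken from Table 3.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, n: int, z: float = 1.959964) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion.

    z defaults to the two-sided 95% normal quantile.
    Returns (lower, upper) as proportions between 0 and 1.
    """
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("require n > 0 and 0 <= successes <= n")
    p = successes / n
    z2 = z * z
    denom = 1 + z2 / n
    center = (p + z2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z2 / (4 * n * n)) / denom
    return center - half_width, center + half_width


# Inferred example: 1 SF among the 77 participants in the smallest group.
low, high = wilson_interval(1, 77)
```

For this example the interval is roughly 0.2% to 7.0%, close to the reported 0.23–6.97; small differences would be expected if a different interval method or rounding was used. The width of the interval illustrates how little a group of 77 participants constrains the underlying rate.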

Table 3 Secondary finding rates stratified by self-reported participant race/ethnicity.

We did not have access to data on whether participants self-reported Ashkenazi Jewish (AJ) ancestry. These individuals, representing roughly 1.73% of the US population, are known to have a 1/40 rate of harboring a pathogenic BRCA1/2 founder allele.19 Based on these rates, we would expect to find ~10 individuals with one of these variants across our cohort of 21,915; we found 26 participants with one of these variants (NM_000059.3[BRCA2]:c.5946del [p.Ser1982fs], NM_007294.3[BRCA1]:c.66_67AG[1] [p.Glu23fs]), suggesting enrichment of individuals of AJ ancestry in our cohort. This enrichment was not driven by one clinical site alone. However, excluding participants with these variants from analyses does not change our finding that the most frequent disease category of SFs was genes associated with cancer susceptibility (1.38%; 1.24% without AJ founder alleles), though previous studies found the highest rate to be cardiac disease–associated genes8,20 (0.87% in our cohort).
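The expected count quoted above follows directly from the two population figures. A minimal sketch of the arithmetic, using only numbers from the text (the variable names are ours):

```python
COHORT_SIZE = 21_915
AJ_POPULATION_FRACTION = 0.0173  # roughly 1.73% of the US population
FOUNDER_CARRIER_RATE = 1 / 40    # BRCA1/2 founder allele rate among AJ individuals
OBSERVED_CARRIERS = 26           # participants found with a founder variant

# Expected carriers if the cohort mirrored the US population.
expected_carriers = COHORT_SIZE * AJ_POPULATION_FRACTION * FOUNDER_CARRIER_RATE

# Ratio of observed to expected carriers.
observed_to_expected = OBSERVED_CARRIERS / expected_carriers
```

Here `expected_carriers` is about 9.5, which the text rounds to ~10, and the observed count is roughly 2.7 times that expectation. The calculation assumes founder alleles occur only in individuals of AJ ancestry, which is a simplification.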

DISCUSSION

As patients and research participants consider the option to receive SFs as part of a genomic test, it is important that they have information on the probability and type of SFs that may be reported. Health systems must also be prepared to return these results and offer follow-up care. To our knowledge, this report is the largest study of medically actionable secondary findings to date. In our cohort, the frequency of SFs was 3.02% overall, with 2.54% in the 59 genes recommended by ACMG to be reported as SFs. This result is consistent with previous estimates of this frequency, though prior studies differ in the genes included as SFs, precede the current ACMG/AMP variant classification system, derive from a single health system, or have considerably smaller sample size.6,7,8,9,20

Our study cohort includes more individuals identifying as non-White than most studies evaluating SF rates. After removal of the HFE p.Cys282Tyr variant (NM_001300749.2:c.845G>A), we did not find a higher frequency in those who self-report as White versus other race/ethnicity groups, and the proportion of SFs that were classified as LP versus P did not differ between these groups. This differs from some previous studies of SF rate, which reported an excess of SFs in those of European ancestry versus African ancestry. Aside from our exclusion of HFE, this difference could also be explained by their use of slightly different lists of genes to return, cohorts outside the realm of clinical genetic testing, and earlier versions of population databases with limited representation of non-White individuals.6,7 While there may be differences in the rate of predicted deleterious variants among ancestry groups,21,22 the frequency of medically actionable secondary findings in our cohort does not mirror this prediction, though our study was not powered to test differences among the individual groups comprising non-White. This limitation, and the dearth of published evidence to accurately classify variants specific to other ancestry backgrounds (e.g., cosegregation), underscores the need to amplify participant recruitment across ancestry groups historically underrepresented in genomics research to ameliorate genomic health-care disparities.23

eMERGE considered several genes to be returnable beyond those recommended by ACMG. Of these, we returned SFs in HFE (associated with hemochromatosis; only in homozygotes for the higher-penetrance p.Cys282Tyr variant20), PALB2 (breast cancer), KCNE1 (arrhythmia), HNF1A (maturity-onset diabetes of the young), and MEFV (familial Mediterranean fever; only in homozygotes or compound heterozygotes for NM_000243.2:c.2080A>G [p.Met694Val] or NM_000243.2:c.2177T>C [p.Val726Ala]). The p.Cys282Tyr variant in HFE (NM_001300749.2:c.845G>A) was the third most common SF returned and contributed to the higher rate of SFs in those identifying as White. eMERGE has previously advocated for the return of this genotype as an SF,18,24 reporting a penetrance rate of hemochromatosis of 24.4% in males and 14.0% in females with this genotype.18

The eMERGE III network elected to return LP SFs in this research context; 43.5% of the reported results were LP. One goal of this return was to collect further data on participants who harbor an LP variant (and potentially also their family members) to assist in any potential future reclassification. In our cohort, the proportion of variants classified as LP was greater among cardiac genes than the proportion in cancer genes. Factors that may account for this difference are (1) lower or unknown penetrance for many cardiac genes; (2) a dramatically higher burden of missense variants among results (47% of cardiac SFs vs. 21% of cancer SFs in our study), which are generally classified with less certainty than putative truncations; and (3) the smaller evidence base available for variant interpretation in cardiac genes, which have generally been studied and reported clinically for a shorter time period and in fewer individuals than cancer genes.25 While current ACMG/AMP guidelines do not explicitly recommend return of LP variants, they are often reported by clinical labs. We note, however, that even over the course of this study, some variants initially classified as LP (e.g., NM_000256.3(MYBPC3):c.3628-41_3628-17del25 and NM_198056.2(SCN5A):c.3911C>T [p.Thr1304Met]) were downgraded by one of the SCs to variant of uncertain significance (VUS) or below. This highlights that the true frequency of LP will likely shift over time as new evidence emerges, new terminology for low-risk and low-penetrance loci is adopted, and the interpretation guidelines themselves continue to evolve.

Limitations of the study include smaller sample sizes for non-European ancestry groups, though this cohort is more heterogeneous than previous studies reporting SF frequencies.6,7,8,9,20 Due to sample size, which limited power of interpopulation comparisons, we prespecified testing the frequency in Whites versus all other ancestries combined. As such, differences between Whites and specific non-White groups, even if small, cannot be ruled out using these data. Despite these limitations, to our knowledge, this is the largest study of SF rates in self-identified non-Whites. Though we did not have information on self-reported AJ ancestry, removing known AJ founder alleles from the analysis does not alter our findings that SFs are most likely to be in genes associated with cancer risk, though a previous study found the highest rate to be in genes associated with cardiac disease.20

An additional limitation is that we could not independently confirm patient indication for testing via EHR review. For a minority of cases recruited in specialty clinics, it was difficult to determine whether a finding was truly “secondary.” In some cases, participants were assigned an indication based on the specialty clinic in which they enrolled, but it is possible that some of these participants were family members of those with the condition necessitating the visit to these clinics. Regardless, detailed EHR review for participants with positive genetics results is underway by the eMERGE Outcomes and Clinical Annotation workgroups to determine the spectrum of clinical features in all individuals in whom a P or LP variant was identified. Finally, ATP7B, associated with Wilson disease, was not fully sequenced but only genotyped for a single known P variant. However, given the rarity of that disorder (estimated as 1/30,000), our estimate of the SF rate for ACMG genes is likely not affected by its omission.

In summary, we evaluated the rate of SFs in a diverse cohort of 21,915 eMERGE III participants. We found a 3.02% overall frequency of SFs, including 2.54% in the genes deemed actionable by the ACMG. The most frequent category of SFs in our cohort was genes associated with cancer susceptibility (1.38%), followed by cardiac diseases (0.87%) and lipid disorders (0.50%). Though the frequency of SFs and proportion of P versus LP SFs were higher in those self-identifying as White than in those identifying as another group, this difference was not significant after removing HFE results from the analysis. These findings serve as a resource to inform decision making in patients and research participants undergoing genomic testing, aid the ongoing development of practice standards and guidelines in genomic medicine, and drive future research efforts in variant interpretation and SF return.