Introduction

As whole-exome and whole-genome sequencing have become less expensive and bioinformatics tools have improved, the clinical use of genome sequencing has increased rapidly. One important concern related to this expanded clinical use is the identification and reporting of secondary findings, which are unrelated to the indication for sequencing but are of medical value for patient care.1,2 The topic of secondary findings has been widely discussed in the literature,3,4,5 and the American College of Medical Genetics and Genomics (ACMG) has released a set of guidelines.1 The ACMG recommends restricting the return of secondary findings to a minimum set of 56 actionable genes and restricting the variants reported as secondary findings to those that fit one of two categories: “known pathogenic” or “expected pathogenic.”1

However, classifying identified variants into the appropriate category is difficult, even with the ACMG recommendations for interpretive categories of sequence variants.6 During the 2013 ACMG Annual Clinical Genetics Meeting, the ACMG and the Association for Molecular Pathology (AMP) recently reached a joint consensus on updated guidelines for variant classification. The new recommendation is to classify variants as pathogenic or likely pathogenic according to evidence-based guidelines that combine multiple lines of evidence (e.g., population data, computational data, functional data, and segregation data). The proposed evidence-based guidelines are more stringent and objective than previous guidelines; however, their value for clinical exome analysis has not been established.

A substantial number of secondary findings exist in the human genome. Each individual carries about 50–100 variants in disease-associated genes,7 and a screen of 1,000 exomes for actionable variants estimated that 3.4% of individuals of European descent and 1.2% of individuals of African descent harbor pathogenic or likely pathogenic variants.8 In Korea, the number of individuals undergoing exome or genome sequencing is also increasing rapidly. However, no study has yet addressed the prevalence of secondary findings per individual in the Korean population. In this study, we analyzed exome data from 196 Korean individuals to search for actionable pathogenic variants as secondary findings using the proposed evidence-based guidelines.

Materials and Methods

Study population

This study was approved by the institutional review board of Samsung Medical Center, Seoul, South Korea. A total of 196 exomes, comprising 100 exomes from healthy controls (group A, control exomes) and 96 exomes from patients with various underlying disorders (group B, disease exomes), were screened for variants in the 56 genes recommended by the ACMG for the return of secondary findings.1 Control exome data were obtained from the Korean Genome and Epidemiology Study, an ongoing population-based study of Korean adults in Ansan and Ansung, South Korea, that began in 2001. Disease exome data were obtained from 96 consecutive, unrelated Korean patients with suspected Mendelian disorders who underwent whole-exome sequencing at Samsung Medical Center between 2011 and December 2013. The underlying conditions in these patients were Alzheimer disease (3/96, 3%), amyotrophic lateral sclerosis (5/96, 5%), atelosteogenesis (1/96, 1%), autism spectrum disorder (1/96, 1%), frontotemporal dementia (3/96, 3%), inherited metabolic disease (4/96, 4%), nontuberculous mycobacterial infection (20/96, 21%), primary congenital glaucoma (12/96, 13%), retinitis pigmentosa (1/96, 1%), subcortical vascular cognitive impairment (45/96, 47%), and vasculopathy (1/96, 1%). The population structure of the sample was examined by principal component analysis to assess its genetic homogeneity and stratification.9
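The population-structure check can be illustrated with a minimal principal component analysis in pure Python. This is a sketch under stated assumptions, not the pipeline of reference 9: genotypes are assumed to be coded as 0/1/2 alternate-allele dosages, each SNP is centered and scaled by its estimated allele frequency (in the spirit of EIGENSTRAT-style PCA), and the leading components of the sample-by-sample matrix are found by power iteration with deflation. The function names and the simulated two-population data are hypothetical.

```python
import random
from typing import List, Tuple


def standardize(genotypes: List[List[int]]) -> List[List[float]]:
    """Center each SNP (column) and scale by sqrt(p(1-p)), p = estimated allele frequency."""
    n, m = len(genotypes), len(genotypes[0])
    out = [[0.0] * m for _ in range(n)]
    for j in range(m):
        mean = sum(row[j] for row in genotypes) / n
        p = mean / 2.0
        sd = (p * (1.0 - p)) ** 0.5
        for i in range(n):
            # Monomorphic SNPs carry no information and are set to zero
            out[i][j] = (genotypes[i][j] - mean) / sd if sd > 0 else 0.0
    return out


def sample_gram(x: List[List[float]]) -> List[List[float]]:
    """n x n sample relationship matrix X X^T / m."""
    n, m = len(x), len(x[0])
    return [[sum(x[a][k] * x[b][k] for k in range(m)) / m for b in range(n)]
            for a in range(n)]


def top_eigenpair(mat: List[List[float]], iters: int = 500,
                  seed: int = 0) -> Tuple[float, List[float]]:
    """Leading eigenvalue/eigenvector of a symmetric matrix by power iteration."""
    rng = random.Random(seed)
    n = len(mat)
    v = [rng.random() + 0.1 for _ in range(n)]
    for _ in range(iters):
        w = [sum(mat[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    lam = sum(v[i] * sum(mat[i][j] * v[j] for j in range(n)) for i in range(n))
    return lam, v


def principal_components(genotypes: List[List[int]], k: int = 2) -> List[List[float]]:
    """Return k principal components as per-sample score vectors (arbitrary sign)."""
    g = sample_gram(standardize(genotypes))
    n = len(g)
    pcs = []
    for _ in range(k):
        lam, v = top_eigenpair(g)
        pcs.append(v)
        # Deflate: remove the component just found before searching for the next one
        g = [[g[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return pcs


def simulate(freqs: List[float], n: int, rng: random.Random) -> List[List[int]]:
    """Hypothetical toy data: n individuals, each SNP drawn as Binomial(2, p)."""
    return [[sum(rng.random() < p for _ in range(2)) for p in freqs] for _ in range(n)]
```

For example, simulating 10 individuals from each of two populations with very different allele frequencies and calling `principal_components` should place the two groups on opposite sides of zero on the first component. Real analyses typically use dedicated tools such as PLINK or EIGENSOFT on LD-pruned genome-wide SNPs and include reference populations (e.g., from the 1000 Genomes Project) so that a study cohort can be seen to form its own cluster.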

Exome sequencing

Genomic DNA from the participants in group A was enriched using the SureSelect Human All Exon v2 kit (Agilent Technologies, Santa Clara, CA) and sequenced on an Illumina HiSeq2000 platform (Illumina, San Diego, CA). Genomic DNA from the participants in group B was enriched using the Agilent SureSelect Human All Exon v3 kit (Agilent Technologies), except for five samples from patients with amyotrophic lateral sclerosis, which were enriched using an Illumina TruSeq kit (Illumina); group B samples were sequenced on the Illumina HiSeq2000 platform (Illumina).

Criteria for evidence-based classification

We adopted the evidence-based guidelines recently recommended at the 2013 joint consensus meeting of the ACMG and the Association for Molecular Pathology to classify the identified variants. Variants were classified into five categories: pathogenic, likely pathogenic, uncertain significance, likely benign, and benign (Supplementary Table S1 online). The criteria are intended to be stringent and to stratify variants by combining multiple lines of evidence, including population data, computational and predictive data, functional data, segregation data, de novo data, allelic data, and other data. Each piece of evidence was weighted as very strong, strong, moderate, or supporting. Detailed definitions of each evidence criterion are given in Supplementary Table S2 online.
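How weighted evidence is combined into a single category can be sketched as a set of counting rules. The thresholds below are an assumption: they are modeled on the combining rules of the final ACMG/AMP guideline published in 2015, which also adds a "stand-alone" weight on the benign side, whereas the 2013 draft criteria actually applied in this study are those in Supplementary Table S1 and may differ. The `Evidence` tallies and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Evidence:
    """Hypothetical tally of criteria met at each weight."""
    pvs: int = 0  # pathogenic, very strong
    ps: int = 0   # pathogenic, strong
    pm: int = 0   # pathogenic, moderate
    pp: int = 0   # pathogenic, supporting
    ba: int = 0   # benign, stand-alone
    bs: int = 0   # benign, strong
    bp: int = 0   # benign, supporting


def meets_pathogenic(e: Evidence) -> bool:
    if e.pvs >= 1 and (e.ps >= 1 or e.pm >= 2 or (e.pm >= 1 and e.pp >= 1) or e.pp >= 2):
        return True
    if e.ps >= 2:
        return True
    return e.ps >= 1 and (e.pm >= 3 or (e.pm >= 2 and e.pp >= 2) or (e.pm >= 1 and e.pp >= 4))


def meets_likely_pathogenic(e: Evidence) -> bool:
    return ((e.pvs >= 1 and e.pm >= 1)
            or (e.ps >= 1 and e.pm >= 1)
            or (e.ps >= 1 and e.pp >= 2)
            or e.pm >= 3
            or (e.pm >= 2 and e.pp >= 2)
            or (e.pm >= 1 and e.pp >= 4))


def meets_benign(e: Evidence) -> bool:
    return e.ba >= 1 or e.bs >= 2


def meets_likely_benign(e: Evidence) -> bool:
    return (e.bs >= 1 and e.bp >= 1) or e.bp >= 2


def classify(e: Evidence) -> str:
    """Map evidence tallies to one of the five categories."""
    pathogenic_side = meets_pathogenic(e) or meets_likely_pathogenic(e)
    benign_side = meets_benign(e) or meets_likely_benign(e)
    if pathogenic_side and benign_side:
        return "uncertain significance"  # conflicting evidence
    if meets_pathogenic(e):
        return "pathogenic"
    if meets_likely_pathogenic(e):
        return "likely pathogenic"
    if meets_benign(e):
        return "benign"
    if meets_likely_benign(e):
        return "likely benign"
    return "uncertain significance"  # too little evidence either way
```

Under these assumed rules, a hypothetical variant with one very strong and one moderate pathogenic criterion, `Evidence(pvs=1, pm=1)`, is classified as likely pathogenic, while a single moderate criterion alone leaves it of uncertain significance.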

To determine the potential pathogenicity of each identified variant, we reviewed the primary literature cited in the Human Gene Mutation Database (HGMD; professional version, release 2014.1) and in PubMed. Population data were assessed using the allele frequency of each variant in the general population as reported in databases such as the 1000 Genomes Project; the National Heart, Lung, and Blood Institute Exome Sequencing Project database; and the Single Nucleotide Polymorphism Database (dbSNP). If the variant was absent from controls or was present only at an extremely low frequency (minor allele frequency <0.005) in these databases, moderate evidence for pathogenicity could be assigned. Computational and predictive data were assessed using several prediction tools, including Sorting Intolerant From Tolerant (SIFT), Polymorphism Phenotyping version 2 (PolyPhen-2), Combined Annotation Dependent Depletion (CADD), and Human Splicing Finder, along with a conservation score (PhyloP). Cosegregation of a variant with disease was also considered in determining pathogenicity. A variant reported in the literature as de novo was considered more likely to be pathogenic.
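The population-frequency rule can be written as a small function. This is a sketch under two assumptions not stated in the text: "extremely low frequency in the databases" is read as the highest observed minor allele frequency across all queried databases being below 0.005, and the in silico rule (supporting evidence only when every tool that returned a call predicts a damaging effect) is a hypothetical convention added for illustration. Database and tool keys are placeholders.

```python
from typing import Mapping, Optional

MAF_THRESHOLD = 0.005  # "extremely low frequency" cut-off stated in the text


def population_evidence(mafs: Mapping[str, Optional[float]]) -> Optional[str]:
    """Return 'moderate' if the variant is absent (None) from, or below the MAF
    threshold in, every queried database; otherwise no pathogenic evidence.
    Using the maximum across databases is an assumption of this sketch."""
    observed = [f for f in mafs.values() if f is not None]
    if not observed or max(observed) < MAF_THRESHOLD:
        return "moderate"
    return None


def computational_evidence(damaging: Mapping[str, bool]) -> Optional[str]:
    """Hypothetical rule: 'supporting' only if every tool that gave a call
    predicts a damaging effect; any benign call withholds the evidence."""
    if damaging and all(damaging.values()):
        return "supporting"
    return None
```

For instance, `population_evidence({"1000G": None, "ESP": 0.001, "dbSNP": None})` returns `"moderate"`, whereas a frequency of 0.02 in any one database withholds the evidence.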

Selection of pathogenic or likely pathogenic variants

We first restricted all variants to those within the 56 genes and manually curated this set to exclude nongenic and intronic variants. To reduce the list to a manageable size, we prioritized variants previously listed as disease-causing in the HGMD, as well as novel truncating variants that were expected to cause disease but had not been reported as disease-causing in the HGMD. All available lines of evidence were reviewed for each selected variant according to the evidence-based guidelines. After careful classification, variants classified as “pathogenic” or “likely pathogenic” were selected as actionable pathogenic secondary findings.
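The three reduction steps can be sketched as a single filter. In the study the exclusion of nongenic and intronic variants was a manual curation step; here it is approximated by a lookup on consequence terms. The `Variant` record, the function name, and the use of Sequence Ontology consequence terms (as produced by annotators such as VEP) are assumptions of this sketch; "DM" is the HGMD tag for disease-causing mutations.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set

# Assumed Sequence Ontology consequence terms
NONGENIC_OR_INTRONIC = {"intergenic_variant", "upstream_gene_variant",
                        "downstream_gene_variant", "intron_variant"}
TRUNCATING = {"stop_gained", "frameshift_variant",
              "splice_donor_variant", "splice_acceptor_variant"}


@dataclass
class Variant:
    gene: str
    consequence: str
    hgmd_class: Optional[str] = None  # "DM" = disease-causing mutation; None if absent


def candidate_variants(variants: Iterable[Variant], acmg_genes: Set[str]) -> List[Variant]:
    """Reduce annotated variants to the set sent for evidence-based review."""
    selected = []
    for v in variants:
        if v.gene not in acmg_genes:
            continue  # step 1: keep only the ACMG-reportable genes
        if v.consequence in NONGENIC_OR_INTRONIC:
            continue  # step 2: drop nongenic and intronic variants
        # step 3: HGMD disease-causing variants, plus truncating variants not listed as DM
        if v.hgmd_class == "DM" or v.consequence in TRUNCATING:
            selected.append(v)
    return selected
```

Every variant returned by such a filter would then go through the full evidence review; the filter only narrows the list and does not classify anything.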

Results

Population stratification and exome sequencing statistics

Principal component analysis showed that the Korean subjects in our study were clearly separated from other ethnic groups, indicating the genetic homogeneity of our study population (Supplementary Figure S1 online).

The targeted coding sequences of the 56 genes spanned 286,381 bp (1,161 exons). The proportion of target bases covered by at least 10 sequence reads was 90% in group A and 93% in group B. Detailed statistics are given in Supplementary Table S3 online.
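The coverage metric above is the fraction of targeted bases whose read depth is at least 10. A minimal sketch, assuming non-overlapping, 0-based half-open (BED-style) target intervals on a single chromosome and a per-base depth map such as one built from `samtools depth` output; the function names are hypothetical.

```python
from typing import Dict, Iterable, Tuple

Interval = Tuple[int, int]  # 0-based, half-open (BED-style) target interval


def target_size(intervals: Iterable[Interval]) -> int:
    """Total number of targeted bases (intervals assumed non-overlapping)."""
    return sum(end - start for start, end in intervals)


def fraction_at_depth(intervals: Iterable[Interval], depth: Dict[int, int],
                      min_depth: int = 10) -> float:
    """Fraction of targeted bases with read depth >= min_depth.

    `depth` maps position -> read depth on one chromosome; positions absent
    from the map are treated as having zero coverage.
    """
    total = covered = 0
    for start, end in intervals:
        for pos in range(start, end):
            total += 1
            if depth.get(pos, 0) >= min_depth:
                covered += 1
    return covered / total if total else 0.0
```

Treating unreported positions as zero depth matters in practice, because some depth tools omit uncovered bases by default; omitting them from the denominator would overstate coverage.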

Pathogenic or likely pathogenic variants by study group

As shown in Table 1, a total of 4,409 unique variants in the 56 ACMG-reportable genes were identified in the 196 exomes (2,309 and 2,100 in groups A and B, respectively). After exclusion of nongenic and intronic variants, 70 potentially pathogenic variants classified as disease-causing in the HGMD were identified (33 and 37 in groups A and B, respectively). Six novel truncating variants not listed as disease-causing in the HGMD were also identified (two and four in groups A and B, respectively). These 76 variants were reviewed according to all available lines of evidence. This review identified 11 pathogenic or likely pathogenic variants in 13 individuals (7 and 6 in groups A and B, respectively). These variants are listed in Table 2, and the detailed classification of each variant is described in Supplementary Table S4 online. None of the pathogenic or likely pathogenic variants found in group B appeared to be related to the disorder for which the patient had been ascertained.

Table 1 Filtering and classification of variants by study group
Table 2 Pathogenic or likely pathogenic variants according to the evidence-based guidelines in ACMG-reportable genes

Estimation of the population frequency of actionable pathogenic secondary findings

We identified 13 participants (7 in group A and 6 in group B) with pathogenic or likely pathogenic variants as secondary findings in the 56 medically actionable genes recommended by the ACMG (Table 1). Thus, ~7% of the Korean participants analyzed (13/196; 7/100 in group A and 6/96 in group B) had secondary findings. Nine of the 11 variants were identified through their classification as disease-causing mutations in the HGMD, whereas two were identified as novel variants predicted to cause premature truncation (Table 2). Among the 56 ACMG-reportable genes, only MUTYH is an autosomal-recessive locus; the MUTYH variants found in this study were each present in a single copy (heterozygous).
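The ~7% figure comes from a small sample, so it can help to see how much uncertainty surrounds it. The sketch below computes each proportion with a 95% Wilson score interval from the counts reported above; the interval method is an illustrative choice and is not part of the original analysis.

```python
import math
from typing import Tuple


def wilson_interval(k: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion k/n (z=1.96 gives ~95%)."""
    if n <= 0 or not 0 <= k <= n:
        raise ValueError("need 0 <= k <= n and n > 0")
    p = k / n
    denom = 1.0 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half_width, centre + half_width


# Counts taken from the text above
for label, k, n in [("group A", 7, 100), ("group B", 6, 96), ("all", 13, 196)]:
    lo, hi = wilson_interval(k, n)
    print(f"{label}: {k}/{n} = {k / n:.1%} (95% CI {lo:.1%} to {hi:.1%})")
```

The Wilson interval is used here rather than the simple normal approximation because it behaves better for small counts and proportions near zero, which is the situation for per-group carrier counts of 6 or 7.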

Discussion

In this study we found that ~7% (13/196) of the Korean individuals examined carried a pathogenic or likely pathogenic variant in the 56 ACMG-reportable genes, as identified by exome sequencing. To our knowledge, this study is the first to estimate the actionable pathogenic variant load in an Asian population and to apply evidence-based guidelines to variant classification.

This work provides several important findings. First, the frequency of secondary findings in our Korean study population falls within the range reported in previous studies.8,10,11 A prior report of 1,000 clinical exomes from subjects of European and African descent estimated that 1.2–3.4% of subjects had secondary findings.8 A study by Xue et al.11 of 179 healthy individuals from the 1000 Genomes Project found frequencies of up to 11%. A screen of 104 exomes for 448 severe recessive diseases found an average carrier burden of 2.8 variants per person.12 Another study found a high proportion of pathogenic variants (8.5%) in 1,092 asymptomatic individuals.10 Several factors could explain the variation in these rates across studies. Differences in database annotations and interpretations are a major source of differences in apparent secondary findings.13 Moreover, our interpretation criteria (the evidence-based guidelines) differ from those used in earlier reports and may be an additional source of variation.

Second, the detection rate of secondary findings was similar across study groups: approximately 7% (7/100) of the individuals in the control group and 6% (6/96) of the individuals suspected to have a genetic disease carried pathogenic or likely pathogenic variants. The burden of actionable pathogenic secondary findings in the Korean population is therefore estimated to be ~7% (13/196), although additional data are needed before these findings can be extrapolated to a wider population. Third, because they provide detailed guidance on how to classify a variant based on multiple lines of evidence, the evidence-based guidelines allowed the variants identified in this study to be classified effectively.
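Whether the two detection rates differ can be framed as a 2×2 comparison of carriers versus non-carriers by group. The following sketch implements a two-sided Fisher's exact test using only the standard library; the test is an illustration and not part of the reported analysis, and with groups of about 100 it has limited power to detect modest differences.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows are groups; columns are carriers / non-carriers. The p-value sums the
    hypergeometric probabilities of every table (with the same margins) that is
    no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x: int) -> float:
        # P(x carriers in row 1 | fixed margins)
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    observed = prob(a)
    lo, hi = max(0, col1 - row2), min(col1, row1)
    # Small relative tolerance so tables tied with the observed one are counted
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= observed * (1 + 1e-7))
    return min(1.0, p)


# Group A: 7 carriers / 93 non-carriers; group B: 6 carriers / 90 non-carriers
p_value = fisher_exact_two_sided(7, 93, 6, 90)
print(f"two-sided Fisher exact p = {p_value:.3f}")
```

Readers working in SciPy would typically call `scipy.stats.fisher_exact` on the same table instead; the hand-written version is shown only to make the calculation explicit.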

This study had several limitations: (i) it included only 196 Korean participants, (ii) phenotype data were not available, (iii) the data may have been skewed by missense variants that are common among Koreans but absent from current exome databases, and (iv) the gene list did not consider disease prevalence by ethnic group. We focused our analysis on the 56 genes recommended by the ACMG. Although this gene list may be applied to any ethnicity, it might omit genes that are more prevalent and highly penetrant in the Asian population. For example, CDH1, the causative gene for hereditary diffuse gastric cancer, which confers a lifetime risk of over 80%, was not included in the ACMG minimum gene list.1 In this study, we found two likely pathogenic CDH1 variants, c.1018A>G (p.Thr340Ala) and c.2494G>A (p.Val832Met), in 0.5% (1/196) and 1.0% (2/196) of all participants, respectively. The burden of gastric cancer in Korea is high, with crude incidence rates of 92.7 and 43.5 per 100,000 people per year among men and women, respectively, as estimated in 2014.14 Moreover, the frequency of this disease is expected to continue to increase.

Taken together, our results demonstrate that actionable pathogenic secondary findings are frequently identified in the Korean population, occurring in approximately 7% of the individuals examined.

Disclosure

The authors declare no conflict of interest.