Analysis of 589,306 genomes identifies individuals resilient to severe Mendelian childhood diseases

Chen, Rong; Shi, Lisong; Hakenberg, Jörg; Naughton, Brian; Sklar, Pamela; Zhang, Jianguo; Zhou, Hanlin; Tian, Lifeng; Prakash, Om; Lemire, Mathieu; Sleiman, Patrick; Cheng, Wei-yi; Chen, Wanting; Shah, Hardik; Shen, Yulan; Fromer, Menachem; Omberg, Larsson; Deardorff, Matthew A; Zackai, Elaine; Bobe, Jason R; Levin, Elissa; Hudson, Thomas J; Groop, Leif; Wang, Jun; Hakonarson, Hakon; Wojcicki, Anne; Diaz, George A; Edelmann, Lisa; Schadt, Eric E; Friend, Stephen H

doi:10.1038/nbt.3514

Article
Published: 11 April 2016

Rong Chen ORCID: orcid.org/0000-0001-6322-0340^1,2^na1,
Lisong Shi^1,2^na1,
Jörg Hakenberg^1,2,
Brian Naughton³^nAff11,
Pamela Sklar^1,2,4,
Jianguo Zhang⁵,
Hanlin Zhou⁵,
Lifeng Tian ORCID: orcid.org/0000-0003-1880-5811⁶,
Om Prakash⁷,
Mathieu Lemire⁸,
Patrick Sleiman⁶,
Wei-yi Cheng^1,2,
Wanting Chen⁵,
Hardik Shah^1,2,
Yulan Shen ORCID: orcid.org/0000-0002-2866-4146⁵,
Menachem Fromer ORCID: orcid.org/0000-0003-3749-4342^1,2,4,
Larsson Omberg⁹,
Matthew A Deardorff⁶,
Elaine Zackai⁶,
Jason R Bobe^1,2,
Elissa Levin^1,2,
Thomas J Hudson ORCID: orcid.org/0000-0002-1376-4849⁸,
Leif Groop⁷,
Jun Wang¹⁰,
Hakon Hakonarson⁶,
Anne Wojcicki³,
George A Diaz^1,2,
Lisa Edelmann^1,2,
Eric E Schadt^1,2 &
Stephen H Friend ORCID: orcid.org/0000-0002-0830-7600^1,2,9

Nature Biotechnology volume 34, pages 531–538 (2016)

This article has been updated

Abstract

Genetic studies of human disease have traditionally focused on the detection of disease-causing mutations in afflicted individuals. Here we describe a complementary approach that seeks to identify healthy individuals resilient to highly penetrant forms of genetic childhood disorders. A comprehensive screen of 874 genes in 589,306 genomes led to the identification of 13 adults harboring mutations for 8 severe Mendelian conditions, with no reported clinical manifestation of the indicated disease. Our findings demonstrate the promise of broadening genetic studies to systematically search for well individuals who are buffering the effects of rare, highly penetrant, deleterious mutations. They also indicate that incomplete penetrance for Mendelian diseases is likely more common than previously believed. The identification of resilient individuals may provide a first step toward uncovering protective genetic variants that could help elucidate the mechanisms of Mendelian diseases and new therapeutic strategies.

Main

Advances in genomic technologies have rapidly expanded our knowledge of the genetic basis of human disease. To date, >6,000 Mendelian disorders have been described (Online Mendelian Inheritance in Man (OMIM)¹), with more than 150,000 disease-associated variants identified across these disorders in the Human Gene Mutation Database (HGMD)². Despite the success of genome-wide association and whole-exome and whole-genome sequencing (WES/WGS) studies in revealing the DNA variants that underlie the genetic basis of disease, the development of effective treatments for most diseases has remained a challenge. Even for Mendelian disorders, only a handful of drugs have been developed³. One reason for this lack of success is the difficulty of using small-molecule therapies to restore protein activity in the presence of loss-of-function (LoF) mutations. As a result, treatment of Mendelian disorders typically focuses on the relief of symptoms rather than on a biological 'cure'.

A promising avenue for addressing some of these limitations is to focus analysis on the genetic and environmental modulators that keep people well by suppressing the effects of disease-causing mutations⁴. However, a major challenge in identifying resilient individuals is accurately cataloging disease mutations. Currently, there are no databases that provide a complete characterization of disease genes and their mutations as well as in-depth clinical annotations. For example, the OMIM¹ database contains all known Mendelian disorders with detailed clinical characterizations, but has limited descriptions of disease-causing mutations. In contrast, HGMD² has collected almost all disease-associated variants reported to date, but has almost no parameters pertaining to the clinical characteristics attributed to these variants. Furthermore, although many commercial pan-ethnic screening panels cover the most common highly penetrant mutations^5,6,7, important mutations might be omitted owing to technological limitations and cost-benefit considerations. Also, the exact mutations in these commercial pan-ethnic screening panels are typically inaccessible to the public.

Despite these challenges, identification of secondary modulators has proven successful across a multitude of model organisms in which the prominent role of second-site suppressors that buffer or modify traits has been established^8,9,10,11. For example, human genetic studies have identified rare mutations in CCR5 that confer resilience against HIV infection¹², mutations in globin genes that modify the severity of sickle cell disease by buffering primary mutations in β-globin genes¹³, and LoF mutations in PCSK9 that protect carriers from high lipid levels and resulting heart disease¹⁴. Second-site mutations in disease genes have also been shown to revert clinical phenotype in patients with recessive dystrophic epidermolysis bullosa¹⁵ and Fanconi anemia¹⁶, whereas LoF mutations in zinc transporter 8 have been found to protect obese individuals from diabetes¹⁷. Most recently, a variant identified in the gene Jagged1 was found to confer resilience to Duchenne muscular dystrophy in two dogs, implicating Jagged1 as a therapeutic target for the disorder¹⁸.

Here we analyze sequence and genotype data from 589,306 individuals across 12 studies (complete list in Online Methods) to identify healthy individuals harboring what are currently believed to be completely penetrant Mendelian disease-causing mutations. We refer to this search for resilient individuals as the Resilience Project. We screen mutations in 874 genes believed to cause 584 distinct severe Mendelian childhood disorders. In total, we identified 13 candidate resilient individuals spanning 8 diseases. The genomes of such resilient individuals, if appropriately decoded, hold promise in elucidating protective mechanisms of disease that could lead to novel treatments¹⁹.

Results

We carried out a search of existing genomic data for individuals who may be resilient to disease by focusing on mutations annotated as being completely penetrant for severe childhood Mendelian disorders. Our rationale for restricting attention to these disorders is manifold. First, there is a significant unmet medical need for many of these disorders that have the potential to benefit from the identification of resilient individuals. Second, a focus on diseases with a more profound phenotype and a simple genetic architecture decreases the chances of diagnostic errors or missed diagnoses due to subclinical manifestation of disease. This is particularly important for our screen, given that we generally did not have access to medical records and depended on self-reporting of conditions by study participants. Finally, restricting attention to severe childhood disorders and including only individuals over the age of 18 reduces the likelihood that subjects harboring deleterious mutations will manifest the disorder later in life. The overall workflow for the retrospective search for resilient individuals is depicted in Figure 1.

Building gene and allele panels

The search for individuals who are resilient to severe childhood disorders required the construction of a screening panel of alleles known to cause such disorders with complete penetrance (Supplementary Fig. 1). A multi-stage filter was applied to identify the subset of disorders that fit our criteria. Diseases annotated as mild or of unknown severity, with an unknown age of onset or an age of onset later than 18 years, or with incomplete or unknown penetrance were removed, leaving 584 unique Mendelian diseases spanning 16 different disease categories and 874 implicated genes. This comprised the disease gene panel for our study (Table 1 and Supplementary Table 1). The top three most-represented disease categories were metabolic conditions, neurological diseases and developmental disorders, which accounted for 22.9%, 16.8% and 15.6% of the disease genes, respectively.
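The multi-stage filter can be pictured as a chain of predicates over disease annotations. The following is a minimal Python sketch, assuming a hypothetical record layout: the field names `severity`, `onset_age` and `penetrance`, and the example entries, are illustrative and are not the study's curated schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Disease:
    name: str
    severity: Optional[str]      # "severe", "mild", or None if unknown
    onset_age: Optional[float]   # typical age of onset in years, None if unknown
    penetrance: Optional[str]    # "complete", "incomplete", or None if unknown


def passes_panel_filter(d: Disease, max_onset_age: float = 18.0) -> bool:
    """Keep only severe, childhood-onset, completely penetrant diseases."""
    if d.severity != "severe":                 # drops mild and unknown severity
        return False
    if d.onset_age is None or d.onset_age > max_onset_age:
        return False                           # drops unknown or adult onset
    return d.penetrance == "complete"          # drops incomplete and unknown


# Hypothetical annotations, for illustration only.
diseases = [
    Disease("disease A", "severe", 2.0, "complete"),
    Disease("disease B", "mild", 5.0, "complete"),
    Disease("disease C", "severe", None, "complete"),
    Disease("disease D", "severe", 30.0, "complete"),
    Disease("disease E", "severe", 1.0, "incomplete"),
]
panel = [d.name for d in diseases if passes_panel_filter(d)]
print(panel)
```

Treating every unknown annotation as a reason for exclusion mirrors the conservative wording of the filter described above.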

Table 1 The Resilience Project gene and allele panels cover diseases from 16 categories

Disease-causing mutations in genes in the disease gene panel were identified using two independent pipelines. The first, comprising a core allele panel (CAP; Supplementary Table 2), aimed to identify well-established and well-annotated disease mutations, and the second, comprising an expanded allele panel (EAP), aimed to identify mutations that have strong support for causing severe childhood disorders. The CAP comprised 674 founder or major recurrent mutations from 162 genes representing 125 severe, early-onset diseases. Among these mutations, 47% were missense, 20% were nonsense, 11% affected splicing, 4% were in-frame insertions or deletions, and the remaining 18% were frameshift insertions or deletions resulting in premature stop codons (Supplementary Fig. 2). The EAP was intended to complement the CAP by casting a broader net for disease mutations in genes in the disease gene panel, tolerating a higher number of false positives with respect to our selection criteria for the initial identification of resilient individuals, and resolving the false-positive identifications by manual curation and clinical review. The EAP covered 24,186 variants from HGMD tagged as “disease causing mutations” (DM) with allele frequencies lower than 0.5% in the 1000 Genomes Project²⁰ and NHLBI GO Exome Sequencing Project (ESP)6500 (ref. 21; Table 1).
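The EAP selection rule (an HGMD "DM" tag plus an allele frequency below 0.5% in both reference data sets) can be sketched as a simple predicate. The dictionary keys and variant identifiers below are hypothetical, and treating a variant absent from a reference set as having frequency 0 is an assumption rather than something stated in the text.

```python
from typing import Dict, List

MAX_AF = 0.005  # 0.5% allele-frequency ceiling stated in the text


def in_expanded_panel(variant: Dict) -> bool:
    """True for a DM-tagged variant that is rare in both reference sets."""
    if variant.get("hgmd_tag") != "DM":
        return False
    # Assumption: a missing frequency means the variant was not observed.
    af_1kg = variant.get("af_1000g") or 0.0
    af_esp = variant.get("af_esp6500") or 0.0
    return af_1kg < MAX_AF and af_esp < MAX_AF


# Hypothetical variants, for illustration only.
variants: List[Dict] = [
    {"id": "var1", "hgmd_tag": "DM", "af_1000g": 0.0001, "af_esp6500": 0.0002},
    {"id": "var2", "hgmd_tag": "DM", "af_1000g": 0.02, "af_esp6500": 0.001},
    {"id": "var3", "hgmd_tag": "DP", "af_1000g": 0.0001, "af_esp6500": None},
    {"id": "var4", "hgmd_tag": "DM", "af_1000g": None, "af_esp6500": None},
]
eap = [v["id"] for v in variants if in_expanded_panel(v)]
print(eap)
```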

Applying CAP and EAP to screen 589,306 genomes

In our search for resilient individuals, we analyzed existing DNA sequence and genotype data from 12 past and ongoing genetic studies worldwide (Online Methods and Table 2). Combined, these data sets provided genome-wide variant data on 589,306 individuals. Because individual-level data could not be shared across studies, we were unable to definitively assess the number of unique individuals represented. However, we anticipate that all 589,306 individuals are unique given the geographic separation between most of the studies and the low sampling rates in the studies that sampled across broader geographic regions. We verified this in the samples from 2 of the 12 studies, the 1000 Genomes and UK10K²² projects, using a single-nucleotide polymorphism (SNP) panel of 40 polymorphic markers. In comparing all samples pairwise across these two studies, we identified no duplicate samples other than 18 twin pairs from UK10K.
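A pairwise duplicate check over a small marker panel might look like the sketch below. It assumes genotypes are coded as alternate-allele counts (0, 1 or 2, with `None` for a missing call) and that a pair is flagged only when every shared marker agrees; the coding, the threshold and the sample names are illustrative assumptions, not the procedure used in the study. Identical twins would be flagged in the same way as true duplicates.

```python
from itertools import combinations
from typing import Dict, List, Optional, Sequence, Tuple

Genotype = Optional[int]  # alt-allele count 0/1/2, or None if missing


def concordance(a: Sequence[Genotype], b: Sequence[Genotype]) -> float:
    """Fraction of markers with identical calls, ignoring missing calls."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return 0.0
    return sum(x == y for x, y in shared) / len(shared)


def find_duplicates(
    samples: Dict[str, Sequence[Genotype]], threshold: float = 1.0
) -> List[Tuple[str, str]]:
    """Return all sample pairs whose concordance reaches the threshold."""
    return [
        (s1, s2)
        for s1, s2 in combinations(sorted(samples), 2)
        if concordance(samples[s1], samples[s2]) >= threshold
    ]


# Toy example with 5 markers instead of 40.
samples = {
    "A": [0, 1, 2, 1, 0],
    "B": [0, 1, 2, 1, 0],
    "C": [2, 1, 0, 0, 1],
}
print(find_duplicates(samples))
```

An all-pairs comparison grows quadratically with the number of samples, which is manageable for a verification on two cohorts but would need indexing or hashing for much larger collections.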

Table 2 Data sources used in current retrospective study

Given the different genotyping or sequencing assays run across the cohorts in our study, the coverage across all variants represented in CAP and EAP varied widely among the samples (Supplementary Fig. 3). A subset of 59 loci in CAP was covered across all samples in the study. For The Cancer Genome Atlas (TCGA) Project, UK10K and 1000 Genomes studies, which comprised 19,820 samples, the assays covered all 674 loci in the CAP. However, for these data sets we did not obtain the per-sample coverage for each locus, so individual samples may not cover all loci. Per-sample coverage was available for only one cohort, the Swedish schizophrenia cohort (SWE-SCZ)²³. These data were used to assess the extent of coverage achieved across all CAP loci. For the 5,092 samples in SWE-SCZ, 670 of the 674 loci in CAP were well covered in all samples, with the remaining four loci having no coverage in any sample. The four loci not covered are intronic and are at least 20 nucleotides from the closest exon. For cohorts with genotype data, we used both assayed and imputed genotypes in the screen, making use of information on the quality of the called genotype, genotype likelihood and imputed genotype confidence to filter out spurious candidates. Of the 674 loci in CAP, the 23andMe, Mount Sinai BioBank, the Children's Hospital of Philadelphia (CHOP) BioBank and Finnish (components listed in Online Methods) cohorts had 297, 105, 59 and 163 filtered loci, respectively (Supplementary Fig. 4). Over all studies, the effective number of loci (as a proportion of all loci covered in CAP) was 36.5%.
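One way to read an "effective number of loci" is as a sample-weighted average of the loci covered per cohort, divided by the panel size. The sketch below illustrates that reading only; the weighting scheme is an assumption about how the proportion could be defined, and the cohort sizes in the example are hypothetical, so it is not expected to reproduce the 36.5% figure.

```python
from typing import List, Tuple

CAP_SIZE = 674  # number of loci in the core allele panel


def effective_loci_fraction(
    cohorts: List[Tuple[int, int]], panel_size: int = CAP_SIZE
) -> float:
    """Sample-weighted fraction of panel loci covered.

    cohorts: (number of samples, number of panel loci covered) per cohort.
    """
    total_samples = sum(n for n, _ in cohorts)
    covered = sum(n * loci for n, loci in cohorts)
    return covered / (panel_size * total_samples)


# Hypothetical cohort sizes: one cohort covering the full panel,
# one three times larger covering half of it.
toy = [(1000, 674), (3000, 337)]
print(f"{effective_loci_fraction(toy):.3f}")
```

Under this definition, large array-based cohorts with few covered loci pull the overall proportion down even when the sequenced cohorts cover the whole panel.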

Identifying candidate resilient individuals

We identified 15,597 candidate resilient individuals from our screen of 589,306 genomes against the CAP and EAP panels, representing 300 compound heterozygous or homozygous mutations across 188 genes for 163 Mendelian diseases. Of these 15,597 candidates, 367 were identified from the CAP (44 mutations), whereas the remaining 15,230 were identified from the EAP (256 mutations). We manually reviewed all mutations represented in this group to ensure that the corresponding phenotype associated with these mutations met our criteria for inclusion (completely penetrant, severe phenotype, early age of onset) and to ensure the genotype calls were made with high confidence. We excluded 6,667 of 15,597 candidates due to low confidence in the genotype call as represented by either low sequencing depth, high GC or AT content, repetitive sequence region or skewed Hardy-Weinberg equilibrium statistics. We excluded an additional 8,627 candidates owing to high population frequency (>0.5%) of discovered variants or an inability to access individual data for follow-up (e.g., ESP data set) (Table 3).

Table 3 Reasons for filtering out initial candidates due to sequencing quality, inaccurate information obtained from databases, clinical review of mutations, and clinical review of individual medical record

For the remaining 303 candidates, we carried out a manual review of each mutation with a review team composed of bioinformatics scientists, board-certified clinical geneticists, medical consultants and genetic counselors to assess whether variation in the ages of onset and/or variations in the expression of the corresponding phenotype could explain why a candidate was flagged. For 245 of the 303 candidates, we determined the expressivity of the disease phenotype was not extreme enough to unambiguously categorize the candidate as completely resilient (Table 3). Another 16 candidates were excluded because the published literature could not provide sufficient evidence to support pathogenicity for the variants discovered in these individuals, although the diseases associated with the corresponding genes are generally severe enough to be considered as candidates in our list.

After reviewing available medical records for the remaining 42 candidates, 14 presented expected manifestations from the genotypes they carried, indicating that they did not meet the criteria of a 'healthy' individual. Sanger sequencing ruled out another 15 candidates because the genotypes were determined to be heterozygous, not homozygous, as originally determined from the variant data. The final 13 candidates all harbored homozygous (autosomal recessive disease) or heterozygous (autosomal dominant disease) mutations for one of eight different severe Mendelian childhood disorders that would normally be expected to cause severe disease before the age of 18 years: cystic fibrosis, Smith-Lemli-Opitz syndrome, familial dysautonomia, epidermolysis bullosa simplex, Pfeiffer syndrome, autoimmune polyendocrinopathy syndrome, acampomelic campomelic dysplasia and atelosteogenesis (Tables 4 and 5 and Supplementary Fig. 5). The severity of the expected phenotypes makes it highly unlikely that such an individual would have manifested the disease without it being clearly annotated in their health records. A review of the individual health information for six candidates was performed, and no evidence of the indicated disease was uncovered. Genotypes for 5 of the 13 candidates were confirmed by Sanger sequencing to be true homozygotes, whereas the remaining 8 candidates from the UK10K²², 23andMe, Sequencing Initiative Suomi or SISu (http://www.sisuproject.fi/), and BGI cohorts could not be validated owing to insufficient remaining DNA for these samples.
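The successive exclusions described in this section can be tallied as a simple funnel. The short Python sketch below uses only the counts reported in the text; the step labels are paraphrases.

```python
from typing import List, Tuple


def apply_funnel(initial: int, steps: List[Tuple[str, int]]) -> List[Tuple[str, int]]:
    """Return (reason, candidates remaining) after each exclusion step."""
    remaining = initial
    trail = []
    for reason, removed in steps:
        remaining -= removed
        trail.append((reason, remaining))
    return trail


# Exclusion counts as reported in the text.
steps = [
    ("low-confidence genotype call", 6_667),
    ("high population frequency or no access to individual data", 8_627),
    ("expressivity not extreme enough", 245),
    ("insufficient published evidence of pathogenicity", 16),
    ("expected manifestations found in medical records", 14),
    ("heterozygous on Sanger sequencing", 15),
]
trail = apply_funnel(15_597, steps)
for reason, remaining in trail:
    print(f"{remaining:>6}  after excluding: {reason}")
```

The intermediate totals of 303 and 42 and the final total of 13 match the figures quoted in the text.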

Table 4 13 Candidates identified in the Resilience Project

Table 5 Status codes for different levels of support identified during follow up of candidate resilient individuals

We estimated the number of resilient individuals expected in our study cohort for all autosomal recessive alleles in CAP, based on allele frequencies in the ExAC²⁴, DIVAS²⁵ and related databases and penetrance information (Supplementary Table 3). Under this model, we would expect to identify 9 or 10 individuals with the indicated genotype out of all of those screened, which is not significantly different from the number of candidates we identified (P > 0.05).
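As a rough illustration of this kind of estimate, the expected number of homozygotes for a recessive allele of frequency q among N screened individuals is N·q² under Hardy-Weinberg proportions, and an observed count can be compared with the expectation using a Poisson tail probability. The sketch below is built on those two textbook assumptions; it omits the penetrance and per-cohort coverage information used in the study, and the allele frequencies and cohort size are hypothetical.

```python
import math
from typing import Dict


def expected_homozygotes(n_individuals: int, allele_freq: float) -> float:
    """Expected homozygote count under Hardy-Weinberg proportions: N * q^2."""
    return n_individuals * allele_freq ** 2


def poisson_upper_tail(observed: int, expected: float) -> float:
    """P(X >= observed) for X ~ Poisson(expected)."""
    below = sum(
        math.exp(-expected) * expected ** k / math.factorial(k)
        for k in range(observed)
    )
    return 1.0 - below


# Hypothetical allele frequencies and cohort size, for illustration only.
allele_freqs: Dict[str, float] = {"allele_1": 0.002, "allele_2": 0.001}
n_screened = 500_000
total_expected = sum(expected_homozygotes(n_screened, q) for q in allele_freqs.values())
print(f"expected homozygotes: {total_expected:.2f}")
print(f"P(X >= 5): {poisson_upper_tail(5, total_expected):.3f}")
```

Because the expectation scales with q², the rarest alleles contribute almost nothing, and a handful of comparatively common founder mutations dominate the total.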

Attempted recontact of candidate resilient individuals

We were unable to recontact any of the 13 candidate resilient individuals identified in this study, often due to the absence of a recontact clause in the original informed consent forms used for the studies from which these individuals were identified. Although recontact was possible for some cohorts in this study (e.g., Mount Sinai School of Medicine Biobank), no candidates were identified from those cohorts. Given this, we were unable to perform additional critical preprocessing steps to further confirm the resilient status of these individuals. Such steps would include confirming that the analyzed DNA matched the correct medical records for each individual, that they had not been diagnosed with the indicated Mendelian disorder, and that they were not mosaics. We consider these preprocessing steps as critical in order to formally characterize candidates as truly resilient.

Searching for simple explanations of resilience

Although in-depth decoding of candidate resilient individuals requires unfettered access to the individual and their medical records, we searched for counterbalancing variants occurring in the same gene region as the pathogenic one in an attempt to uncover simple explanations for the putative resilience. Among the 13 candidates we identified, 2 from the UK10K cohort had WES data (Table 4) and both had the pathogenic variant in the DHCR7 gene. These two individuals had 14 and 17 additional DHCR7 variants, respectively. Only five of these variants were annotated in the ClinVar, HGMD, and/or OMIM databases (Supplementary Table 4). All five were annotated as benign by ClinVar. Interestingly, both of these resilient candidates share the same homozygous alternative genotypes across all five variants. None of the variants identified clearly explains putative resilience in these two individuals. The pathogenic variant in these two individuals alters the splice site acceptor for the last exon (c.964-1G>C). Therefore, in explaining the resilience to this mutation, WGS data would provide a way to search for variants that could lead to the last exon being retained. For the remaining 11 candidates, either the raw sequencing data were inaccessible or only genotype data were available. In these cases the interrogated sites in the implicated gene regions were too sparsely covered to draw conclusions.
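Looking for variants that two candidates share in the same gene reduces to intersecting their genotype calls. The sketch below assumes calls are stored as a mapping from a variant identifier to a VCF-style genotype string ("1/1" for homozygous alternate); the identifiers and calls are hypothetical and do not correspond to the DHCR7 variants in Supplementary Table 4.

```python
from typing import Dict, List


def shared_homozygous_alt(calls_a: Dict[str, str], calls_b: Dict[str, str]) -> List[str]:
    """Variants at which both individuals are homozygous for the alternate allele."""
    return sorted(
        variant
        for variant, genotype in calls_a.items()
        if genotype == "1/1" and calls_b.get(variant) == "1/1"
    )


# Hypothetical genotype calls within one gene region.
individual_1 = {"var1": "1/1", "var2": "0/1", "var3": "1/1", "var4": "1/1"}
individual_2 = {"var1": "1/1", "var3": "0/1", "var4": "1/1", "var5": "1/1"}
print(shared_homozygous_alt(individual_1, individual_2))
```

Shared genotypes of this kind are only a starting point: as noted above, variants annotated as benign that happen to be common to both candidates do not by themselves explain resilience.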

Lowering filtering stringency to retrieve more candidates

Given the small number of resilient candidates identified using our high-stringency filters, we attempted to lower their stringency to expand our search. Specifically, we broadened the disease and allele selection criteria to include conditions with more variable or milder clinical manifestations, reduced (but still very high) penetrance, phenotypes that can be managed, and a lower evidence level. These criteria resulted in the identification of 111 additional, second-tier candidates (Supplementary Table 5). However, the larger number of candidates resulted in a dramatic increase in the complexity of evaluating their legitimacy compared to that of the first-tier candidates. For example, 33 candidates were associated with conditions with known incomplete penetrance or milder clinical manifestations, 43 harbored variants that were more likely to be polymorphic based on evidence available in the genome variation databases, 7 harbored variants that have been reported only once or in a limited number of patients from the literature, and the remaining 28 candidates had mutations associated with conditions that are known to be strongly influenced by environmental factors. The number of candidates identified was still not large enough to employ statistical genetics techniques to identify modifier loci, and the complexity of the genetic variance component may be significantly increased, making it more challenging to employ variant-specific, or even individual-specific, study designs to elucidate the complexity of resilience (Fig. 2).

**Figure 2: Different strategies for identifying genetic variants buffering human disease.**

Discussion

The primary objective of this study was to construct a screening panel to identify individuals who did not have clinical manifestations of severe childhood-onset diseases despite harboring causal mutations believed to be completely penetrant. The multi-tier panel design was driven by technological limitations regarding the characterization of disease mutations, a desire to allow for customization of a screening panel, and by financial considerations in carrying forward a prospective screen for resilient individuals. Although WGS/WES of all participants in such a study would theoretically maximize coverage of genetic information, the associated cost ($300–$1,500/sample) would greatly reduce the number of individuals that could be screened relative to a targeted sequencing panel (<$50/sample).
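The cost argument is simple arithmetic: for a fixed budget, the number of individuals screened is the budget divided by the per-sample cost. The sketch below uses the per-sample costs quoted above and a hypothetical budget of $1,000,000; because the panel cost is given only as an upper bound (<$50), the panel figure is a lower bound on the number of samples.

```python
def samples_per_budget(budget: int, cost_per_sample: int) -> int:
    """Whole number of samples that a budget covers at a given unit cost."""
    return budget // cost_per_sample


BUDGET = 1_000_000  # hypothetical budget in US dollars

wgs_low_cost = samples_per_budget(BUDGET, 300)     # WGS/WES at $300/sample
wgs_high_cost = samples_per_budget(BUDGET, 1_500)  # WGS/WES at $1,500/sample
panel = samples_per_budget(BUDGET, 50)             # targeted panel at $50/sample

print(f"WGS/WES: {wgs_high_cost:,} to {wgs_low_cost:,} samples")
print(f"Targeted panel: at least {panel:,} samples")
```

At these prices a targeted panel screens roughly 6 to 30 times as many individuals as WGS/WES for the same outlay, which matters when the individuals being sought are as rare as the candidates reported here.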

The utility of a high-impact screening panel depends directly on rigorous informatics processes and clinical review. Less than 1% of the candidates we initially identified from the screening panel survived our filtering criteria. More than 75% of the initial candidates identified were filtered out due to errors in variant calls resulting from low coverage that made it difficult to reliably call homozygous genotypes, high GC or AT content known to lead to higher sequencing-error rates, or from repetitive sequences known to lead to alignment errors that in turn lead to false small insertion or deletion calls. The remaining false positives represented candidates that failed to pass our established clinical presentation criteria, harbored mutations that were inaccurately represented in the mutation databases, or for which there was insufficient scientific evidence to support the predicted phenotypic impact of the mutation.

Of the identified candidate resilient individuals, two individuals from the UK10K project were homozygous carriers of a splicing consensus acceptor mutation for Smith-Lemli-Opitz syndrome (SLOS). This is a well-known mutation leading to a null allele of the delta-7-sterol reductase gene, which accounts for up to one-third of mutant alleles of SLOS patients in populations of European descent. Homozygotes of this splicing mutation are rarely seen in SLOS patients despite the high carrier frequency, and all manifest at the severe end of the SLOS phenotypic spectrum and are not known to survive through childhood^26,27. Four other well-characterized recessive diseases were represented in our final list of candidates. The CFTR mutation c.1558G>T is associated with classic cystic fibrosis in combination with other disease alleles, but no homozygous cases have been described to the best of our knowledge. In vitro analysis has demonstrated that the mutated form of the CFTR receptor traffics to the cell surface but has severely impaired function²⁸. The IKBKAP mutation is an Ashkenazi Jewish founder mutation observed in nearly all cases of familial dysautonomia, a debilitating childhood-onset disorder²⁹. The Finnish/European c.769C>T mutation in AIRE has been associated with autoimmune polyendocrinopathy-candidiasis-ectodermal dystrophy syndrome (APECED)³⁰, a childhood-onset disorder characterized by chronic mucocutaneous candidiasis, hypoparathyroidism and Addison's disease. The p.R279W is a common SLC26A2 mutation. Compound heterozygotes or homozygotes of this mutation usually manifest severe skeletal dysplasia, although patients with milder phenotypes have been reported³¹.

Three autosomal dominant disorders are represented in our final list of candidates. The KRT14 c.373C>T mutation has been associated with the severe Dowling-Meara subtype of epidermolysis bullosa simplex (MIM131760)³². The recurrent c.755C>G mutation in FGFR1 has been associated with Pfeiffer syndrome, a craniosynostosis disorder with manifestations in the distal extremities³³. The SOX9 nonsense mutation p.Y440* is recurrently seen in patients with acampomelic campomelic dysplasia (MIM114290)^34,35,36, a severe form of skeletal dysplasia. Variable survival time of patients with this same mutation and lack of clear genotype-phenotype correlation among patients suggest that genetic modifiers that affect phenotypic variability may exist.

During our screening of the existing data sets, we identified a GBA compound-heterozygous (affecting amino acid positions p.N409S and p.L483P in the protein sequence) individual who had undergone routine carrier screening at Mount Sinai, but who had never been diagnosed with Gaucher disease. Upon clinical review, it was demonstrated that this individual exhibited subclinical manifestations of this disease. This patient's diagnosis was subsequently confirmed by acid β-glucosidase assay, which was in the affected range (0.7 nmol/h/mg; normal range 3.6–18.2 nmol/h/mg). Her medical record showed a history of easy bruising and bleeding since childhood; she was subsequently misdiagnosed with idiopathic thrombocytopenic purpura. The patient currently receives enzyme replacement therapy, which has resulted in improvement with respect to thrombocytopenia. Her story is an example of the complexity of genetic conditions such as Gaucher disease, which can exhibit a broad range of expressivity, leading to subclinical manifestations and misdiagnoses.

Given that most of the candidate resilient individuals were unavailable for recontacting, we cannot exclude straightforward explanations for their candidacy status. With the exception of disorders with hematologic manifestations, somatic mosaicism for deleterious mutations could explain the absence of phenotypic expression. The 589,306 individuals analyzed in this study were recruited from 12 large study cohorts, where the sample types were mixed with respect to ethnicity and health status, providing for the possibility that one or more of the candidates in our final list was an affected individual who harbors a homozygous deleterious mutation that may explain their diagnosed condition. The lack of metadata and the unavailability for recontacting of those participating in this study present perhaps the biggest obstacles for leveraging data retrospectively to identify resilient individuals, and speak to the advantage of carrying out a prospective search for resilient individuals where participants can be appropriately consented for recontacting, and relevant metadata can be collected.

Despite the difficulties in getting traction on decoding the 13 individuals we identified, a number of findings demonstrate the utility of carrying out this type of comprehensive screen. First, we found mutations for severe early-onset diseases that are annotated as being completely penetrant, in putative nonpenetrant individuals, providing for the possibility that genetic modifiers may be more common than believed. Therefore, identification of resilient individuals may enhance our understanding of Mendelian disease etiology and how we counsel others regarding such conditions. Second, our screening panel provides a fully curated list of variants and their disease implications that goes beyond what is covered by currently available commercial screening panels. Finally, our study suggests that genotype calling and disease variant curation and annotation are still a challenge for deriving meaningful interpretations from large-scale genomic data.

The extreme rarity of candidate resilient individuals in this retrospective study supports the intuitive notion that securing larger numbers of candidates would require analyzing all data worldwide being generated by genotyping and next-generation sequencing methods. A number of existing projects, such as the Human Knockout Project³⁷, The Million Veterans Program³⁸ and the large UK Biobank Project³⁹, all stand to contribute considerably to this type of effort. Whereas the penetrance, disease severity and allele-frequency parameters employed in our study restricted our screen to those mutations thought to be completely penetrant with very severe childhood manifestations of disease phenotypes, a broader net could be cast by relaxing these conditions, and allowing, for example, mutations that are not completely penetrant, but still highly penetrant (Fig. 2). Although this would result in an increase in the number of candidate resilient individuals, it would come at the expense of increasing the complexity of the factors buffering disease. We observed a sharp increase in the number of candidates by slightly loosening our stringency filters (Supplementary Table 5), but this increase was accompanied by an increase in the complexity of interpretation, annotation and subsequent follow-up analyses for these additional candidates. It is worth trying to understand the complex tradeoffs between sample size, penetrance, the genetic complexity of the disease as well as resilience to disease, and our ability to identify factors buffering the disease (Fig. 2).

In prospective searches for resilient individuals, more appropriate consenting will be needed to link participants to their medical records and to allow for appropriate recontacting that enables follow-up characterizations, validation of their resilient condition and decoding to uncover the causes of the resilience. In cases where the buffering effect is itself a highly penetrant Mendelian trait, even with a small sample size (even a sample size of 1, referred to as “N of 1” cases), there is a reasonable probability of identifying the genetic cause. For example, a number of studies using whole-exome sequencing to provide diagnoses for undiagnosed, suspected genetic conditions reported a roughly 25% success rate, with a significant proportion of these successes resulting in the identification of mutations that had not been previously characterized⁴⁰. In “N of 1” cancer cases for both retrospective⁴¹ and prospective studies⁴², actionable mutations that can affect treatment choices are found in well over 50% of the cases, with a high percentage of the actionable mutations identified as being de novo. We anticipate that future searches for individuals resilient to various genetic defects will be most effective when combining the traditional searches for positive outliers in known extended families with very broad searches for positive outliers in the general population.

Methods

Curating a mutation database of severe childhood Mendelian disorders.

The first step in our workflow for interrogating existing large-scale sequence and genotype data (Supplementary Fig. 1) is the construction of a comprehensive gene panel comprising genes that harbor completely penetrant mutations for severe childhood Mendelian disorders. We consolidated gene and mutation information for such disorders from eight independent databases that contained complementary and supporting data for genes and mutations involved in disease: (i) the Online Mendelian Inheritance in Man (OMIM) database (http://www.omim.org/)¹; (ii) the Human Gene Mutation Database (HGMD; http://www.hgmd.cf.ac.uk)²; (iii) GeneReviews (http://www.ncbi.nlm.nih.gov/books/NBK1116/)¹⁸; (iv) Genetics Home Reference (GHR; http://ghr.nlm.nih.gov/); (v) ClinVar (http://www.clinvar.com/)⁵³; (vi) Orphanet (http://www.orpha.net)⁵⁴; (vii) the Leiden Open Variation Database (LOVD; http://www.lovd.nl/3.0/home)⁵⁵; and (viii) Reference Variant Store (RVS)⁵⁶.

Criteria for including diseases and alleles in our database. To restrict attention to severe childhood Mendelian disorders, we required a disease to have certain features to be represented on our panel. First, we required the disease to be a Mendelian disorder with known pathogenic mutation(s) and a clear mode of inheritance: autosomal recessive, autosomal dominant or X-linked recessive. Disorders arising from mitochondrial DNA variants or from the many different types of structural variants, as well as digenic and complex diseases, were not considered. Second, we restricted our attention to diseases that were not exceptionally rare, defined as having a prevalence higher than one in one million individuals or an increased incidence in specific subpopulations. Third, we restricted attention to diseases in which patients manifest severe, obvious phenotypes that lead to significantly increased mortality or are debilitating early in life. Fourth, we required that the clinical manifestation of the disease most typically occur before 18 years of age. Finally, we required that the diseases be caused by (nearly) completely penetrant mutations (Supplementary Table 6 and Supplementary Fig. 6).

For the set of diseases represented in our screening panels, there may be many mutations that can cause them, but the expressivity of these mutations can vary widely with respect to age of onset, severity and penetrance. We focused on those mutations that were completely penetrant and that led to the most severe forms of disease. Therefore, we constructed a filter that ensured the mutations on our panel met these different criteria. First, we required the mutation to be recurrent (a 'hotspot'), seen in multiple patients or reported several times in the literature, or to be a known founder mutation in a given subpopulation. Second, we required that the mutation be fully penetrant or nearly completely penetrant. Third, we required the mutations to be associated with severe phenotypes, having significantly increased mortality or debilitation before adulthood. Fourth, we required that the mutations lead to a significant loss of production or function compared to normal mRNAs or proteins (nonsense mutations, frameshift mutations that lead to premature stop codons or missense mutations known to affect important protein domains). Finally, we restricted attention to those mutations that could be more easily detected by standard genotyping or sequencing assays. Mutations that involve gross genomic rearrangement, copy number abnormality, large deletion/insertion and tandem repeats, although highly interesting, were excluded from consideration because the DNA variant information available for our study did not include these types of calls, and most of the data used in this study were generated by technologies and protocols that were not optimized to routinely assay structural variants in a high-throughput fashion. For example, more than half of the samples examined in this study relied on existing genotype data sets from which these types of mutations cannot be reliably called.
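The five mutation-level criteria above can be read as a conjunction of simple checks. The following Python sketch is only an illustration of that reading, not the authors' curation procedure, which involved manual review. The `Mutation` class, its field names and the category labels are all hypothetical and were chosen for this example.

```python
from dataclasses import dataclass

# Hypothetical labels for illustration; the panel's real annotation
# categories are described in Supplementary Table 7 of the paper.
LOSS_OF_FUNCTION_TYPES = {"nonsense", "frameshift", "missense_in_key_domain"}
HIGH_PENETRANCE = {"complete", "near_complete"}


@dataclass(frozen=True)
class Mutation:
    name: str
    mutation_type: str   # e.g. "nonsense", "large_deletion"
    recurrent: bool      # hotspot seen in multiple patients or reports
    founder: bool        # known founder mutation in a subpopulation
    penetrance: str      # "complete", "near_complete" or "reduced"
    severe: bool         # increased mortality or debilitation before adulthood


def passes_allele_filter(m: Mutation) -> bool:
    """Return True only if all five criteria from the text hold."""
    return (
        (m.recurrent or m.founder)
        and m.penetrance in HIGH_PENETRANCE
        and m.severe
        # Criteria four and five: a small loss-of-function change that
        # standard assays can call. Structural classes (rearrangements,
        # copy number changes, large indels, tandem repeats) are not in
        # the allowed set, so they are rejected here.
        and m.mutation_type in LOSS_OF_FUNCTION_TYPES
    )
```

In this sketch a recurrent, completely penetrant, severe nonsense mutation passes, whereas the same record annotated as a large deletion or as having reduced penetrance does not.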

Deriving a screening panel to identify individuals resilient to severe childhood Mendelian disorder.

From the set of rare Mendelian childhood diseases, genes and associated mutations assembled above, we derived a gene panel and two allele panels to employ in our screen. The gene panel comprised curated genes associated with early-onset severe disease, and the two allele panels comprised disease-causing mutations that were identified at different confidence levels. For the gene panel, we compiled a list of genes associated with the highly penetrant, early-onset, severe Mendelian disorders identified above. The clinical significance for the diseases and corresponding mutations was annotated based on information from public human genetics disease phenotype databases (OMIM, GeneReviews, Genetic Testing Repository, GHR, ClinVar, Orphanet), the literature and published carrier-screening panels^5,6,7 (Supplementary Fig. 7a). We also used a pre-existing in-house set (maintained by R.C.) of more than 20,000 full-text articles curated for risk alleles and gene-disease associations. Each disease and the corresponding genes harboring mutations were annotated using published data on mode of inheritance, severity, penetrance, prevalence and age of onset. We grouped the values for each of these annotation types into discrete categories to enable more efficient sorting and filtering (Supplementary Table 7). For example, “age of onset” ranges from 1 (prenatal, congenital or infantile, <2 years old) to 4 (late onset, >18 years old), with 5 indicating that the age of onset is unknown.

The two allele panels were developed from the same sources but using different stringencies. The first panel, the core allele panel (CAP), contained only recurrent or founder mutations that had been well documented and were associated with the most severe phenotype as represented in the above gene panel. Genotype-phenotype correlations and recurrence of mutations were determined based upon the genomic phenotype databases, including OMIM, GeneReviews, ClinVar and LOVD. The CAP was also annotated with respect to a mutation-based clinical significance score assigned to each variant using the same scoring system indicated above (Supplementary Table 7). The CAP comprised only the most heavily curated, highest-confidence alleles that are well established as causing severe childhood disorders. Most of the alleles in the CAP are routinely assayed on carrier screening panels. However, to better leverage the vast number of discoveries made in the last couple of decades, we constructed a second “expanded allele panel” (EAP) that included all disease-associated variants in HGMD classified as disease causing, “DM”, and with overall minor allele frequency (MAF) < 0.5% according to the 1000 Genomes and ESP databases, for those genes contained within the gene panel defined above. The rationale for the EAP in addition to the CAP was to broaden coverage by leveraging the extensive HGMD resource, accepting the increased noise present in this database for the initial screen, then applying more in-depth curation and clinical review to those variants in the EAP identified as hits. In this way, the significant informatics and clinical resources needed to curate disease alleles were restricted to those identified in our study population. The CAP overlaps significantly with the EAP, but given the extensive curation of the CAP, there are alleles in the CAP not represented in the EAP (Supplementary Fig. 7b).
Both allele panels include variant-specific information such as genomic coordinates; dbSNP rs-number; cDNA and protein level change in Human Genome Variation Society nomenclature⁵⁷; literature references; and, most importantly, observation frequencies obtained from several public databases such as 1000 Genomes, ESP6500 and TCGA (normal samples).
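The EAP inclusion rule described above (HGMD class “DM”, MAF < 0.5% in 1000 Genomes and ESP, gene on the gene panel) can be sketched as a simple filter. This is a minimal Python sketch under stated assumptions, not the authors' pipeline: the record keys (`hgmd_class`, `gene`, `maf_1000g`, `maf_esp`) are hypothetical, and treating a variant that is absent from a frequency database as passing that database's threshold is an assumption made here, since the text does not say how missing frequencies were handled.

```python
MAF_CUTOFF = 0.005  # 0.5%, the threshold stated in the text


def in_expanded_panel(record: dict, gene_panel: set) -> bool:
    """Decide whether an HGMD-style record would enter the EAP.

    `record` uses hypothetical keys: "hgmd_class", "gene",
    "maf_1000g" and "maf_esp" (None when the variant is unobserved).
    """
    if record["hgmd_class"] != "DM":
        return False
    if record["gene"] not in gene_panel:
        return False
    frequencies = (record.get("maf_1000g"), record.get("maf_esp"))
    # Assumption: an unobserved variant (None) counts as rare.
    return all(f is None or f < MAF_CUTOFF for f in frequencies)


def build_expanded_panel(records: list, gene_panel: set) -> list:
    """Keep only the records that satisfy the EAP rule."""
    return [r for r in records if in_expanded_panel(r, gene_panel)]
```

Requiring the threshold in both databases is the stricter of two plausible readings of “overall” MAF; a pooled frequency across databases would be an alternative.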

Samples analyzed in the Resilience Project.

All study subjects in the current retrospective study were from 12 past and ongoing genetic studies worldwide (Table 2). Many of these studies provide open, unrestricted access or restricted access through data access committees to the genetic variant data generated in the study, including the 1000 Genomes Project²⁰, ESP²¹, matched normal samples from The Cancer Genome Atlas (TCGA) Project, the UK10K project²², the SWE-SCZ exome sequencing project, and SISu, whereas others represent private databases that are available through collaboration with the corresponding investigator, such as the Finnish study cohort (which includes the FINRISK cohort, EUFAM, the Finnish Twin Study and the Migraine Study), the Mount Sinai BioBank, 23andMe, BGI exome sequencing database and the Children's Hospital of Philadelphia (CHOP) BioBank.

A wide variety of assays were leveraged in these different studies to score DNA variants, from genotyping of comprehensive SNP panels capturing all common single-nucleotide variation in the genome, to whole exome and genome sequencing (Table 2). For imputation of genotyping data sets (Mount Sinai BioBank and CHOP), we used 1000 Genomes Project Phase 1 (b37) as the reference panel. For other genotyping data sets (23andMe and FINN), original assayed genotypes were used. A total of 589,306 individuals' variant data sets were analyzed, including 518,721 genotyping data sets and 70,585 whole exome or whole genome sequencing data sets.

The search for resilient individuals.

The union of the CAP and EAP was input into Search Your Genome, a software tool we developed to screen genotype and sequence data for disease-causing alleles. Our scanning tool takes Variant Call Format (VCF) files as well as GFF and tab-delimited files, stored either as data summarized across a study or as single-sample data sets. The input files were preprocessed by compressing and indexing them using SAMtools bgzip and tabix, respectively⁵⁸, with preliminary annotations assigned using snpEff⁵⁹ for genes (HGNC symbol or Entrez Gene ID) and nucleotide changes for variants. For VCF files, a set of common markups referring to features such as genotypes, allele frequencies and zygosity was identified for each sample and each variant of interest as defined in our panels, in addition to searching for de novo variants in genes represented in our panels. For other input formats, depending on the details provided in the corresponding data files, our tool interrogates the files for homozygotes and compound heterozygotes for alleles in the combined CAP and EAP, as well as for de novo variants leading to premature stop codons, given that such variants are likely to lead to the same effects as the known deleterious mutations represented in our allele panels. The Search Your Genome tool is written in Java to ensure maximum portability to any platform running a Java Virtual Machine version 6.0 or above. On a typical desktop computer, interrogating the 1000 Genomes data (more than 37 million genetic variants) for resilient individuals from the CAP takes roughly one minute. The software is available at https://bitbucket.org/rongchenlab/resilience and http://rongchenlab.org/software/the-resilience-project-software/.
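The core of the screen described above, flagging homozygotes and compound heterozygotes for panel alleles, can be illustrated with a short Python sketch. This is not the Search Your Genome code (which is written in Java) and it does not parse VCF files. It assumes genotypes have already been reduced to alternate-allele counts per variant, and the function name, the panel layout and the `"AR"`/`"AD"` labels are hypothetical. Because unphased genotypes cannot show whether two heterozygous alleles lie on different chromosomes, compound heterozygotes are reported only as candidates.

```python
from collections import defaultdict


def screen_sample(genotypes: dict, panel: dict) -> list:
    """Flag panel hits in one sample.

    genotypes: variant_id -> alternate allele count (0, 1 or 2);
               variants missing from the dict are treated as 0.
    panel:     variant_id -> (gene, mode), where mode is "AR"
               (autosomal recessive) or "AD" (autosomal dominant).
    Returns a list of (gene, hit_type, variant_ids) tuples.
    """
    hits = []
    het_by_gene = defaultdict(list)
    for variant_id, (gene, mode) in panel.items():
        count = genotypes.get(variant_id, 0)
        if count == 0:
            continue
        if mode == "AD":
            # One copy of a dominant allele is enough to flag the sample.
            hits.append((gene, "dominant", (variant_id,)))
        elif count == 2:
            hits.append((gene, "homozygous", (variant_id,)))
        else:
            # Single recessive allele: keep it to look for a second one.
            het_by_gene[gene].append(variant_id)
    for gene, variant_ids in het_by_gene.items():
        if len(variant_ids) >= 2:
            # Phase is unknown, so this is only a candidate.
            hits.append((gene, "compound_het_candidate",
                         tuple(sorted(variant_ids))))
    return hits
```

A sample carrying a single heterozygous recessive allele produces no hit in this sketch, which matches the expectation that simple carriers are not resilience candidates.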

Manual review and annotation of candidates.

For each candidate that passed the high-throughput sequencing and/or genotyping QC pipeline, manual review was performed in small batches by two to five reviewers working independently. At least one of the reviewers was a specialist in the disease area associated with the candidate's mutation. Any candidate that was categorized consistently by the different reviewers either went directly to the final candidate table (if it passed clinical QC) or was removed from the CAP/EAP. For any inconsistent annotations, a group meeting was called, a deep literature review was done and an extensive discussion was held on clinical significance to ensure that all candidates in the final resilient individual table had solid evidence of being real candidates. If the group discussion could not reach a unified categorization for a candidate, the candidate was rejected from the final candidate table.
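The review procedure above amounts to a small decision rule, sketched below in Python. This is one possible reading of the text rather than a documented algorithm: the category label `"resilient"`, the outcome strings and the `group_verdict` parameter (set to `None` when the group meeting reaches no unified categorization) are all hypothetical names introduced for illustration.

```python
from typing import Optional, Sequence


def triage_candidate(reviews: Sequence[str],
                     passed_clinical_qc: bool,
                     group_verdict: Optional[str] = None) -> str:
    """Return "final_table", "removed" or "rejected" for one candidate.

    reviews:       independent categorizations from two to five reviewers.
    group_verdict: outcome of the group discussion, used only when the
                   reviewers disagree; None means no agreement was reached.
    """
    if not 2 <= len(reviews) <= 5:
        raise ValueError("expected two to five independent reviews")
    if len(set(reviews)) == 1:
        verdict = reviews[0]          # reviewers agree
    elif group_verdict is None:
        return "rejected"             # discussion could not unify
    else:
        verdict = group_verdict       # discussion settled the category
    if verdict == "resilient" and passed_clinical_qc:
        return "final_table"
    return "removed"
```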

Change history

21 April 2016
In the version of this article initially published, in Table 3, the row labeled “Individual clinical review,” the number of mutations should have read 10, not 6; the number of diseases, 9, not 5; and the number of individuals, 14, not 10. The errors have been corrected for the print, PDF and HTML versions of this article.

References

1. McKusick, V.A. Mendelian Inheritance in Man and its online version, OMIM. Am. J. Hum. Genet. 80, 588–604 (2007).
2. Stenson, P.D. et al. The Human Gene Mutation Database: building a comprehensive mutation repository for clinical and molecular genetics, diagnostic testing and personalized genomic medicine. Hum. Genet. 133, 1–9 (2014).
3. Dietz, H.C. New therapeutic approaches to mendelian disorders. N. Engl. J. Med. 363, 852–863 (2010).
4. Topol, E.J. Individualized medicine from prewomb to tomb. Cell 157, 241–253 (2014).
5. Bell, C.J. et al. Carrier testing for severe childhood recessive diseases by next-generation sequencing. Sci. Transl. Med. 3, 65ra4 (2011).
6. Lazarin, G.A. et al. An empirical estimate of carrier frequencies for 400+ causal Mendelian variants: results from an ethnically diverse clinical sample of 23,453 individuals. Genet. Med. 15, 178–186 (2013).
7. Tanner, A.K. et al. Development and performance of a comprehensive targeted sequencing assay for pan-ethnic screening of carrier status. J. Mol. Diagn. 16, 350–360 (2014).
8. Hartman, J.L. IV. Buffering of deoxyribonucleotide pool homeostasis by threonine metabolism. Proc. Natl. Acad. Sci. USA 104, 11700–11705 (2007).
9. Hartman, J.L. IV., Garvik, B. & Hartwell, L. Principles for the buffering of genetic variation. Science 291, 1001–1004 (2001).
10. Hartman, J.L. IV. & Tippery, N.P. Systematic quantification of gene interactions by phenotypic array analysis. Genome Biol. 5, R49 (2004).
11. Louie, R.J. et al. A yeast phenomic model for the gene interaction network modulating CFTR-ΔF508 protein biogenesis. Genome Med. 4, 103 (2012).
12. Philpott, S. et al. CCR5 genotype and resistance to vertical transmission of HIV-1. J. Acquir. Immune Defic. Syndr. 21, 189–193 (1999).
13. Galarneau, G. et al. Fine-mapping at three loci known to affect fetal hemoglobin levels explains additional genetic variation. Nat. Genet. 42, 1049–1051 (2010).
14. Cohen, J. et al. Low LDL cholesterol in individuals of African descent resulting from frequent nonsense mutations in PCSK9. Nat. Genet. 37, 161–165 (2005).
15. Pasmooij, A.M. et al. Revertant mosaicism due to a second-site mutation in COL7A1 in a patient with recessive dystrophic epidermolysis bullosa. J. Invest. Dermatol. 130, 2407–2411 (2010).
16. Ikeda, H. et al. Genetic reversion in an acute myelogenous leukemia cell line from a Fanconi anemia patient with biallelic mutations in BRCA2. Cancer Res. 63, 2688–2694 (2003).
17. Flannick, J. et al. Loss-of-function mutations in SLC30A8 protect against type 2 diabetes. Nat. Genet. 46, 357–363 (2014).
18. Vieira, N.M. et al. Jagged 1 rescues the Duchenne muscular dystrophy phenotype. Cell 163, 1204–1213 (2015).
19. Friend, S.H. & Schadt, E.E. Translational genomics. Clues from the resilient. Science 344, 970–972 (2014).
20. Abecasis, G.R. et al. An integrated map of genetic variation from 1,092 human genomes. Nature 491, 56–65 (2012).
21. Tennessen, J.A. et al. Evolution and functional impact of rare coding variation from deep sequencing of human exomes. Science 337, 64–69 (2012).
22. Kaye, J. et al. Managing clinically significant findings in research: the UK10K example. Eur. J. Hum. Genet. 22, 1100–1104 (2014).
23. Purcell, S.M. et al. A polygenic burden of rare disruptive mutations in schizophrenia. Nature 506, 185–190 (2014).
24. Lek, M. et al. Analysis of protein-coding genetic variation in 60,706 humans. bioRxiv Preprint at http://biorxiv.org/content/early/2015/10/30/030338 (2015).
25. Cheng, W.Y., Hakenberg, J., Li, S.D. & Chen, R. DIVAS: a centralized genetic variant repository representing 150 000 individuals from multiple disease cohorts. Bioinformatics 32, 151–153 (2016).
26. Jira, P.E. et al. Novel mutations in the 7-dehydrocholesterol reductase gene of 13 patients with Smith–Lemli–Opitz syndrome. Ann. Hum. Genet. 65, 229–236 (2001).
27. Nowaczyk, M.J. et al. Smith-Lemli-Opitz (RHS) syndrome: holoprosencephaly and homozygous IVS8-1G-->C genotype. Am. J. Med. Genet. 103, 75–80 (2001).
28. Sosnay, P.R. et al. Defining the disease liability of variants in the cystic fibrosis transmembrane conductance regulator gene. Nat. Genet. 45, 1160–1167 (2013).
29. Shohat, M. & Hubshman, M.W. in GeneReviews (eds. Pagon, R.A. et al.) (University of Washington, Seattle, 1993–2016) (updated December 18, 2014).
30. Nagamine, K. et al. Positional cloning of the APECED gene. Nat. Genet. 17, 393–398 (1997).
31. Ballhausen, D. et al. Recessive multiple epiphyseal dysplasia (rMED): phenotype delineation in eighteen homozygotes for DTDST mutation R279W. J. Med. Genet. 40, 65–71 (2003).
32. Letai, A. et al. Disease severity correlates with position of keratin point mutations in patients with epidermolysis bullosa simplex. Proc. Natl. Acad. Sci. USA 90, 3197–3201 (1993).
33. Muenke, M. et al. A common mutation in the fibroblast growth factor receptor 1 gene in Pfeiffer syndrome. Nat. Genet. 8, 269–274 (1994).
34. Ebensperger, C. et al. No evidence of mutations in four candidate genes for male sex determination/differentiation in sex-reversed XY females with campomelic dysplasia. Ann. Genet. 34, 233–238 (1991).
35. Wagner, T. et al. Autosomal sex reversal and campomelic dysplasia are caused by mutations in and around the SRY-related gene SOX9. Cell 79, 1111–1120 (1994).
36. Meyer, J. et al. Mutational analysis of the SOX9 gene in campomelic dysplasia and autosomal sex reversal: lack of genotype/phenotype correlations. Hum. Mol. Genet. 6, 91–98 (1997).
37. MacArthur, D.G. et al. A systematic survey of loss-of-function variants in human protein-coding genes. Science 335, 823–828 (2012).
38. Roberts, J.P. Million veterans sequenced. Nat. Biotechnol. 31, 470 (2013).
39. Palmer, L.J. UK Biobank: bank on it. Lancet 369, 1980–1982 (2007).
40. Lee, H. et al. Clinical exome sequencing for genetic identification of rare Mendelian disorders. J. Am. Med. Assoc. 312, 1880–1887 (2014).
41. Schwaederle, M. et al. On the road to precision cancer medicine: analysis of genomic biomarker actionability in 439 patients. Mol. Cancer Ther. 14, 1488–1494 (2015).
42. Beltran, H. et al. Whole-exome sequencing of metastatic cancer and biomarkers of treatment response. JAMA Oncol. 1, 466–474 (2015).
43. Rajakumar, C. et al. Carnitine palmitoyltransferase IA polymorphism P479L is common in Greenland Inuit and is associated with elevated plasma apolipoprotein A-I. J. Lipid Res. 50, 1223–1228 (2009).
44. Fluharty, A.L. in GeneReviews (eds. Pagon, R.A. et al.) (University of Washington, Seattle, 1993–2016) (updated February 6, 2014).
45. Bienvenu, T. et al. Spectrum of CFTR mutations on Réunion Island: impact on neonatal screening. Hum. Biol. 77, 705–714 (2005).
46. Ensenauer, R. et al. A common mutation is associated with a mild, potentially asymptomatic phenotype in patients with isovaleric acidemia diagnosed by newborn screening. Am. J. Hum. Genet. 75, 1136–1142 (2004).
47. Samstad, S.O., Rossvoll, O., Torp, H.G., Skjaerpe, T. & Hatle, L. Cross-sectional early mitral flow-velocity profiles from color Doppler in patients with mitral valve disease. Circulation 86, 748–755 (1992).
48. Thiadens, A.A. et al. Comprehensive analysis of the achromatopsia genes CNGA3 and CNGB3 in progressive cone dystrophy. Ophthalmology 117, 825–30.e1 (2010).
49. Pace, J.M., Kuslich, C.D., Willing, M.C. & Byers, P.H. Disruption of one intra-chain disulphide bond in the carboxyl-terminal propeptide of the proalpha1(I) chain of type I procollagen permits slow assembly and secretion of overmodified, but stable procollagen trimers and results in mild osteogenesis imperfecta. J. Med. Genet. 38, 443–449 (2001).
50. Okano, Y. et al. Molecular basis of phenotypic heterogeneity in phenylketonuria. N. Engl. J. Med. 324, 1232–1238 (1991).
51. Riazuddin, S.A. et al. Novel SIL1 mutations in consanguineous Pakistani families mapping to chromosomes 5q31. Mol. Vis. 15, 1050–1056 (2009).
52. Christodoulou, J., Grimm, A., Maher, T. & Bennetts, B. RettBASE: The IRSA MECP2 variation database-a new mutation database in evolution. Hum. Mutat. 21, 466–472 (2003).
53. Landrum, M.J. et al. ClinVar: public archive of relationships among sequence variation and human phenotype. Nucleic Acids Res. 42, D980–D985 (2014).
54. Rath, A. et al. Representation of rare diseases in health information systems: the Orphanet approach to serve a wide range of end users. Hum. Mutat. 33, 803–808 (2012).
55. Fokkema, I.F. et al. LOVD v.2.0: the next generation in gene variant databases. Hum. Mutat. 32, 557–563 (2011).
56. Hakenberg, J. et al. Integrating 400 million variants from 80,000 human samples with extensive annotations: towards a knowledge base to analyze disease cohorts. BMC Bioinformatics 17, 24 (2016).
57. Horaitis, O. & Cotton, R.G. The challenge of documenting mutation across the genome: the human genome variation society approach. Hum. Mutat. 23, 447–452 (2004).
58. Li, H. Tabix: fast retrieval of sequence features from generic TAB-delimited files. Bioinformatics 27, 718–719 (2011).
59. Cingolani, P. et al. A program for annotating and predicting the effects of single nucleotide polymorphisms, SnpEff: SNPs in the genome of Drosophila melanogaster strain w1118; iso-2; iso-3. Fly (Austin) 6, 80–92 (2012).

Acknowledgements

We thank S. Sieberts (Sage Bionetworks) and L. Mangravite (Sage Bionetworks) for critical review of our manuscript. The authors would like to thank the Exome Aggregation Consortium and the group that provided exome variant data for comparison.

Author information

Brian Naughton
Present address: Boolean Biotech Inc., Mountain View, California, USA.
Rong Chen and Lisong Shi: These authors contributed equally to this work.

Authors and Affiliations

Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, New York, USA
Rong Chen, Lisong Shi, Jörg Hakenberg, Pamela Sklar, Wei-yi Cheng, Hardik Shah, Menachem Fromer, Jason R Bobe, Elissa Levin, George A Diaz, Lisa Edelmann, Eric E Schadt & Stephen H Friend
Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, New York, USA
Rong Chen, Lisong Shi, Jörg Hakenberg, Pamela Sklar, Wei-yi Cheng, Hardik Shah, Menachem Fromer, Jason R Bobe, Elissa Levin, George A Diaz, Lisa Edelmann, Eric E Schadt & Stephen H Friend
23andMe, Mountain View, California, USA
Brian Naughton & Anne Wojcicki
Friedman Brain Institute and Department of Psychiatry, Icahn School of Medicine at Mount Sinai, New York, New York, USA
Pamela Sklar & Menachem Fromer
BGI-Shenzhen, Shenzhen, China
Jianguo Zhang, Hanlin Zhou, Wanting Chen & Yulan Shen
Center for Applied Genomics, The Children's Hospital of Philadelphia, Philadelphia, Pennsylvania, USA
Lifeng Tian, Patrick Sleiman, Matthew A Deardorff, Elaine Zackai & Hakon Hakonarson
Department of Clinical Sciences, Diabetes & Endocrinology, Lund University Diabetes Center, Skåne University Hospital, Lund University, Malmö, Sweden
Om Prakash & Leif Groop
Ontario Institute for Cancer Research, Toronto, Ontario, Canada
Mathieu Lemire & Thomas J Hudson
Sage Bionetworks, Seattle, Washington, USA
Larsson Omberg & Stephen H Friend
iCarbonX, Shenzhen, China
Jun Wang

Contributions

R.C., E.E.S. and S.H.F. contributed to the conception and study design. L.S. and R.C. built the disease gene and mutation panels. J.H. and R.C. curated databases and built bioinformatics pipelines. J.H. and R.C. performed bioinformatics analysis. L.S., R.C., J.H., B.N., M.A.D., E.Z., G.A.D., L.E. and S.H.F. performed QC and clinical review of all candidates. B.N., P. Sklar, J.Z., H.Z., L.T., O.P., M.L., P. Sleiman, W.-y.C., W.C., H.S., Y.S., M.F., L.O., J.R.B., E.L., T.H., L.G., J.W., H.H. and A.W. contributed the data and analysis. L.S., R.C., J.H., E.E.S. and S.H.F. wrote the manuscript.

Corresponding authors

Correspondence to Rong Chen, Eric E Schadt or Stephen H Friend.

Ethics declarations

Competing interests

B.N. worked for 23andMe at the time this study was carried out, A.W. works for 23andMe, J.Z., H.Z., W.C., and Y.S. work for BGI-Shenzhen, and J.W. works for iCarbonX.

Integrated supplementary information

Supplementary Figure 1 Workflow of the retrospective Resilience Project to build allele and gene panels and interrogate existing sequencing data

Supplementary Figure 2 Distribution of mutation types in the Resilience Project core allele panel (CAP)

Supplementary Figure 3 Comprehensive annotation of 674 mutations in the Core Allele Panel (CAP).

From the outer to the inner layers, the annotations include disease category (outermost layer, one color per gene); average read depth per allele in a representative full exome sequencing study (red bar chart: SWE-SCZ; N=5,092; Agilent SureSelect Human All Exon v2, covering 33 Mb); relative number of known disease-causing variants in CAP per gene (light blue) and gene name; name of the mutation; number of samples screened in our study (blue bar chart); age of onset, penetrance, severity, and evidence (orange/red heat map, with red or 1 depicting earliest onset variants, fully penetrant, most severe, and the most reliable evidence; consult Supplementary Table 7 for more details). A cross-section with respect to the CFTR p.F508del mutation is highlighted to illustrate each of these different layers. The average read depth at this site in the SWE-SCZ exome sequencing study was 68 (red bar chart). The minimum read depth across the CFTR exons was 21, the maximum 279, with one intronic variant not covered in any of the 5,092 samples. This p.F508del variant is marked as "mostly early age of onset (<18y)"; complete penetrance with a known exception; and severe disease manifestation plus variable expressivity (all three in dark orange). The evidence comes from a large cohort and is therefore marked by a red bar.

Supplementary Figure 4 Number of samples analyzed for the alleles in the core allele panel (CAP)

Since ESP6500 provides only variant calls, we define coverage of a core allele by ESP6500 based on the existence of any variant within a -/+100bp window around each core allele.

Supplementary Figure 5 Radar plots showing clinical manifestations of 8 disorders for the 13 identified resilient candidate individuals

The axes of the radar plots indicate the quantitative scores used for clinical annotation of alleles, with lower scores (category 1) on the outside and higher scores (category 4) on the inside. A larger radar map suggests a more severe phenotype.

Supplementary Figure 6 Disease selection criteria plot

Numbers on the axes are quantitative scores defined in Supplementary Table 1. A smaller number indicates a more severe score. The radar plot shows the disease selection criteria in the current study: a disease with complete or near-complete penetrance, early onset (symptoms developing before 18 years of age) and significantly reduced mobility or increased mortality.

Supplementary Figure 7 Overlaps of current gene and allele panels with published carrier screening panels

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–7 and Supplementary Tables 4–7 (PDF 4200 kb)

Supplementary Table 1

Core gene panel (XLSX 76 kb)

Supplementary Table 2

Core allele panel (XLSX 69 kb)

Supplementary Table 3

Power estimates using model of all autosomal recessive alleles in CAP (XLSX 114 kb)

About this article

Cite this article

Chen, R., Shi, L., Hakenberg, J. et al. Analysis of 589,306 genomes identifies individuals resilient to severe Mendelian childhood diseases. Nat Biotechnol 34, 531–538 (2016). https://doi.org/10.1038/nbt.3514

Received: 29 July 2015
Accepted: 12 February 2016
Published: 11 April 2016
Issue Date: May 2016
DOI: https://doi.org/10.1038/nbt.3514
