Abstract
Loss-of-function mutations cause many mendelian diseases. Here we aimed to create a catalog of autosomal genes that are completely knocked out in humans by rare loss-of-function mutations. We sequenced the whole genomes of 2,636 Icelanders and imputed the sequence variants identified in this set into 101,584 additional chip-genotyped and phased Icelanders. We found a total of 6,795 autosomal loss-of-function SNPs and indels in 4,924 genes. Of the genotyped Icelanders, 7.7% are homozygotes or compound heterozygotes for loss-of-function mutations with a minor allele frequency (MAF) below 2% in 1,171 genes (complete knockouts). Genes that are highly expressed in the brain are less often completely knocked out than other genes. Homozygous loss-of-function offspring of two heterozygous parents occurred less frequently than expected (deficit of 136 per 10,000 transmissions for variants with MAF <2%, 95% confidence interval (CI) = 10–261).
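The transmission deficit is measured against Mendelian expectation: a child of two heterozygous carriers is homozygous for the LoF allele with probability 1/4. The Python sketch below illustrates that comparison on hypothetical counts. `HetByHetOffspring` and `deficit_per_10k` are illustrative names. Scaling the shortfall to 10,000 offspring is a simplification and does not reproduce the paper's per-transmission statistic or its confidence interval.

```python
from dataclasses import dataclass

# Under Mendelian inheritance, a child of two heterozygous carriers is
# homozygous for the LoF allele with probability 1/4.
MENDELIAN_HOMOZYGOUS_RATE = 0.25


@dataclass
class HetByHetOffspring:
    """Hypothetical tally of children born to two heterozygous parents."""
    n_offspring: int
    n_homozygous_lof: int


def expected_homozygotes(tally: HetByHetOffspring) -> float:
    """Number of homozygous LoF children expected under Mendelian ratios."""
    return tally.n_offspring * MENDELIAN_HOMOZYGOUS_RATE


def deficit_per_10k(tally: HetByHetOffspring) -> float:
    """Shortfall of homozygous LoF children relative to expectation,
    scaled to 10,000 offspring (simplified, hypothetical metric)."""
    if tally.n_offspring == 0:
        raise ValueError("no offspring in tally")
    shortfall = expected_homozygotes(tally) - tally.n_homozygous_lof
    return shortfall * 10_000 / tally.n_offspring


# Made-up counts, for illustration only.
tally = HetByHetOffspring(n_offspring=400, n_homozygous_lof=90)
print(expected_homozygotes(tally))  # 100.0
print(deficit_per_10k(tally))       # 250.0
```

A positive value means fewer homozygous LoF children than Mendelian ratios predict, which is the direction reported in the abstract.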
Acknowledgements
We thank all the participants in this study. This study was performed in collaboration with Illumina.
Author information
Contributions
P.S., H. Helgason, A.O., U.T., G.M., D.F.G. and K.S. designed the experiment. H.S., H. Holm and U.T. collected the samples. Adalbjorg Jonasdottir, Aslaug Jonasdottir, A.S. and O.T.M. performed the sequencing experiments. P.S., H. Helgason, S.A.G., F.Z., E.H., G.T.S., A.K., G.M. and D.F.G. analyzed the data. P.S., H. Helgason, A.H., D.F.G. and K.S. wrote the first draft of the manuscript. All authors contributed to the final version of the manuscript.
Ethics declarations
Competing interests
The authors affiliated with deCODE Genetics are employed by the company: P.S., H. Helgason, A.O., H.S., S.A.G., F.Z., E.H., G.T.S., Adalbjorg Jonasdottir, Aslaug Jonasdottir, A.S., O.T.M., A.K., A.H., H. Holm, U.T., G.M., D.F.G. and K.S.
Integrated supplementary information
Supplementary Figure 1 The probabilities of a variant being seen once and five times as a function of the number of individuals sequenced by minor allele frequency (MAF).
Variants that are seen at least five times, corresponding to an observed allelic frequency of 0.095%, are likely to be imputed with good quality.
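The curves in Supplementary Figure 1 can be approximated with a binomial model. This sketch assumes that "seen" means the minor allele is observed at least k times on the 2N chromosomes of N sequenced individuals, sampled independently. The function name `prob_seen_at_least` is hypothetical. The sketch also checks the 0.095% figure, which is five copies among 2 × 2,636 chromosomes.

```python
from math import comb


def prob_seen_at_least(k: int, maf: float, n_individuals: int) -> float:
    """Probability that a variant with minor allele frequency `maf` is observed
    at least k times among the 2*n_individuals chromosomes of n_individuals
    sequenced people (independent binomial sampling assumed)."""
    n_chrom = 2 * n_individuals
    # Probability of observing fewer than k copies.
    p_fewer = sum(comb(n_chrom, i) * maf**i * (1.0 - maf) ** (n_chrom - i)
                  for i in range(k))
    return 1.0 - p_fewer


N_SEQUENCED = 2_636
# Observed allele frequency of a variant seen exactly five times.
print(f"{5 / (2 * N_SEQUENCED):.3%}")  # 0.095%

for maf in (0.0001, 0.0005, 0.001, 0.005):
    print(maf,
          round(prob_seen_at_least(1, maf, N_SEQUENCED), 3),
          round(prob_seen_at_least(5, maf, N_SEQUENCED), 3))
```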
Supplementary Figure 2 The fraction of the 104,220 individuals with imputed genotypes who have one or more genes completely knocked out by LoF variants with MAF below the given threshold.
The second panel shows a magnified view of MAF below 3%.
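A complete knockout requires a LoF variant on both copies of a gene. The two copies can carry the same variant (homozygote) or two different variants on opposite haplotypes (compound heterozygote), which is why phased genotypes are needed. The sketch below assumes a hypothetical data model in which each individual is a pair of sets of LoF variant IDs. It computes the quantity plotted in Supplementary Figure 2: the fraction of individuals with a knocked-out gene at a given MAF threshold. All variant IDs, genes and frequencies are made up.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical data model: one individual = two phased haplotypes,
# each given as the set of LoF variant IDs it carries.
Haplotypes = Tuple[Set[str], Set[str]]


def knocked_out_genes(hap: Haplotypes, variant_gene: Dict[str, str],
                      variant_maf: Dict[str, float], maf_max: float) -> Set[str]:
    """Genes in which BOTH haplotypes carry at least one LoF variant with
    MAF below maf_max (homozygote or compound heterozygote)."""
    genes_on_hap = [
        {variant_gene[v] for v in h if variant_maf[v] < maf_max} for h in hap
    ]
    return genes_on_hap[0] & genes_on_hap[1]


def fraction_with_knockout(people: List[Haplotypes], variant_gene: Dict[str, str],
                           variant_maf: Dict[str, float], maf_max: float) -> float:
    """Fraction of individuals with one or more completely knocked-out genes."""
    n_ko = sum(1 for hap in people
               if knocked_out_genes(hap, variant_gene, variant_maf, maf_max))
    return n_ko / len(people)


# Made-up example data.
variant_gene = {"v1": "GENE_A", "v2": "GENE_A", "v3": "GENE_B"}
variant_maf = {"v1": 0.010, "v2": 0.005, "v3": 0.030}
people = [
    ({"v1"}, {"v2"}),        # compound heterozygote in GENE_A
    ({"v3"}, {"v3"}),        # homozygote in GENE_B (MAF 3%)
    ({"v1", "v2"}, set()),   # both variants in cis: GENE_A keeps one intact copy
    (set(), set()),          # no LoF variants
]
for maf_max in (0.008, 0.02, 0.05):
    print(maf_max, fraction_with_knockout(people, variant_gene, variant_maf, maf_max))
```

An individual carrying two LoF variants of one gene on the same haplotype (in cis) is not counted, because the other copy of the gene remains intact.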
Supplementary Figure 3 The number of genes observed to have at least one LoF variant with MAF below 2%, as a function of the number of sequenced individuals, and the number of genes completely knocked out in at least one individual by LoF variants with MAF below 2%, as a function of the number of chip-genotyped individuals.
The curves are derived from the allele frequency distributions and the number of imputed complete knockouts.
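Here is one plausible way to derive the first curve from allele frequencies; it is an assumption, not taken from the paper. Treat LoF variants as independent, and sum over genes the probability that at least one of the gene's variants appears among 2N chromosomes. Gene names and frequencies in the example are hypothetical.

```python
from typing import Dict, List


def expected_genes_observed(gene_mafs: Dict[str, List[float]],
                            n_individuals: int) -> float:
    """Expected number of genes with at least one LoF variant observed among
    n_individuals, assuming independent variants and 2*n_individuals
    independently sampled chromosomes."""
    n_chrom = 2 * n_individuals
    total = 0.0
    for mafs in gene_mafs.values():
        p_none = 1.0  # probability that none of this gene's variants is seen
        for p in mafs:
            p_none *= (1.0 - p) ** n_chrom
        total += 1.0 - p_none
    return total


# Hypothetical genes and their LoF allele frequencies.
toy = {"GENE_A": [0.01], "GENE_B": [0.001, 0.0005], "GENE_C": [0.00001]}
for n in (100, 1_000, 10_000, 100_000):
    print(n, round(expected_genes_observed(toy, n), 3))
```

The curve rises steeply while common LoF variants are being discovered and flattens as only very rare ones remain unobserved.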
Supplementary Figure 4 The cumulative number of genes by the number of completely knocked out individuals.
The total number of genes is 1,171.
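This sketch reads "cumulative" as the number of genes knocked out in at least k individuals, which is an assumption. Under that reading, the k = 1 value is the total number of knocked-out genes, which is 1,171 in this study. A minimal sketch with hypothetical per-gene counts:

```python
from collections import Counter
from typing import Dict, List


def cumulative_genes_by_ko_count(ko_per_gene: Dict[str, int]) -> List[int]:
    """Return a list whose entry k-1 is the number of genes completely
    knocked out in at least k individuals (genes with zero are ignored)."""
    tally = Counter(c for c in ko_per_gene.values() if c > 0)
    max_k = max(tally, default=0)
    result = []
    running = 0
    # Accumulate from the largest count downwards, then reverse.
    for k in range(max_k, 0, -1):
        running += tally[k]
        result.append(running)
    return result[::-1]


# Hypothetical knockout counts per gene.
example = {"GENE_A": 1, "GENE_B": 1, "GENE_C": 3, "GENE_D": 0, "GENE_E": 2}
print(cumulative_genes_by_ko_count(example))  # [4, 2, 1]
```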
Supplementary Figure 5 The frequency distribution of the 6,795 LoF variants among sequenced and imputed individuals.
The leftmost bin includes all variants with a frequency below 1.5 divided by twice the number of sequenced individuals (1.5/(2 × 2,636)); this bin therefore counts the variants seen only once in the sequenced set.
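The 1.5/(2N) cutoff lies halfway between the frequency of a variant seen once (1/(2N)) and one seen twice (2/(2N)), so the leftmost bin captures exactly the singletons. A short check, assuming N = 2,636 sequenced individuals and allele counts measured on their 2N chromosomes:

```python
N_SEQUENCED = 2_636
N_CHROM = 2 * N_SEQUENCED

# Cutoff halfway between the frequencies of allele counts 1 and 2.
LEFTMOST_CUTOFF = 1.5 / N_CHROM


def in_leftmost_bin(allele_count: int) -> bool:
    """True if a variant with this allele count in the sequenced set
    falls in the leftmost bin."""
    return allele_count / N_CHROM < LEFTMOST_CUTOFF


print([c for c in range(1, 5) if in_leftmost_bin(c)])  # [1]
```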
Supplementary Figure 7 Transcriptome effect of synonymous SNPs by exon rank.
The allele-specific expression of the non-reference allele was calculated for each variant in a set of 262 individuals with blood RNA sequencing data. The top, middle and bottom of each box show the upper quartile, median and lower quartile, calculated over the set of variants. The whiskers extend to the lowest and highest data points within 1.5 times the interquartile range (IQR) from the median, and the dots indicate data points more than 1.5 times the IQR from the median. The n values give the number of variants in each class.
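A common way to quantify allele-specific expression at a heterozygous site is the fraction of RNA sequencing reads carrying the non-reference allele. The caption does not give the formula, so this definition is an assumption. The sketch computes that fraction per variant from hypothetical read counts. It then computes the lower quartile, median and upper quartile drawn as each box, using Python's `statistics.quantiles`. Whiskers are not computed here.

```python
from statistics import quantiles
from typing import List, Tuple


def alt_allele_fraction(ref_reads: int, alt_reads: int) -> float:
    """Fraction of RNA reads at a heterozygous site that carry the
    non-reference allele (assumed ASE measure; 0.5 means balanced)."""
    total = ref_reads + alt_reads
    if total == 0:
        raise ValueError("no RNA reads cover this site")
    return alt_reads / total


def box_summary(values: List[float]) -> Tuple[float, float, float]:
    """Lower quartile, median and upper quartile over a class of variants
    (statistics.quantiles default 'exclusive' method)."""
    q1, q2, q3 = quantiles(values, n=4)
    return q1, q2, q3


# Hypothetical (reference, non-reference) read counts for five variants.
reads = [(10, 10), (12, 8), (15, 5), (9, 11), (20, 0)]
fractions = [alt_allele_fraction(r, a) for r, a in reads]
print(box_summary(fractions))
```

Quartile conventions differ between tools, so values near the box edges may not match a given plotting library exactly.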
Supplementary information
Supplementary Text and Figures
Supplementary Figures 1–7, Supplementary Note and Supplementary Tables 1–3 and 5–11. (PDF 3027 kb)
Supplementary Table 4
The observed list of 6,795 LoF mutations in 4,924 genes. (XLSX 959 kb)
About this article
Cite this article
Sulem, P., Helgason, H., Oddson, A. et al. Identification of a large set of rare complete human knockouts. Nat Genet 47, 448–452 (2015). https://doi.org/10.1038/ng.3243
DOI: https://doi.org/10.1038/ng.3243