  • Letter

Identification of a large set of rare complete human knockouts

Abstract

Loss-of-function mutations cause many mendelian diseases. Here we aimed to create a catalog of autosomal genes that are completely knocked out in humans by rare loss-of-function mutations. We sequenced the whole genomes of 2,636 Icelanders and imputed the sequence variants identified in this set into 101,584 additional chip-genotyped and phased Icelanders. We found a total of 6,795 autosomal loss-of-function SNPs and indels in 4,924 genes. Of the genotyped Icelanders, 7.7% are homozygotes or compound heterozygotes for loss-of-function mutations with a minor allele frequency (MAF) below 2% in 1,171 genes (complete knockouts). Genes that are highly expressed in the brain are less often completely knocked out than other genes. Homozygous loss-of-function offspring of two heterozygous parents occurred less frequently than expected (deficit of 136 per 10,000 transmissions for variants with MAF <2%, 95% confidence interval (CI) = 10–261).
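The definition of a complete knockout used above (homozygosity or compound heterozygosity for loss-of-function variants with MAF below 2%) can be illustrated with a minimal Python sketch. It assumes phased data, so that each individual has one set of loss-of-function alleles per parental haplotype of a gene. The function names, the variant identifiers and the dictionary layout are hypothetical and are not taken from the study's pipeline.

```python
from typing import Dict, Set

MAF_THRESHOLD = 0.02  # cutoff stated in the abstract: MAF below 2%


def rare_lof(variants: Set[str], maf: Dict[str, float],
             threshold: float = MAF_THRESHOLD) -> Set[str]:
    """Keep only the loss-of-function variants whose MAF is below the threshold."""
    return {v for v in variants if maf[v] < threshold}


def knockout_kind(paternal_lof: Set[str], maternal_lof: Set[str],
                  maf: Dict[str, float]) -> str:
    """Classify one gene in one individual from its two phased haplotypes.

    Hypothetical helper: a gene counts as completely knocked out when BOTH
    haplotypes carry at least one rare loss-of-function allele.
    """
    pat = rare_lof(paternal_lof, maf)
    mat = rare_lof(maternal_lof, maf)
    if not pat or not mat:
        return "none"  # at least one haplotype is intact
    if pat & mat:
        return "homozygote"  # the same rare variant on both haplotypes
    return "compound heterozygote"  # different rare variants, one per haplotype


# Illustrative (made-up) allele frequencies for three variants in one gene.
example_maf = {"v1": 0.001, "v2": 0.015, "v3": 0.10}
print(knockout_kind({"v1"}, {"v2"}, example_maf))
```

Phase matters for the compound heterozygote case: two different rare variants knock out a gene only if they sit on opposite haplotypes, which is why the sketch takes one set of alleles per haplotype.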

Figure 1: Transmission probabilities from carrier parents.
Figure 2: A histogram of the number of frameshift and stop-gain variants by percentage position within the affected protein sequence and the fraction of rare variants within each bin (derived allele frequency (DAF) < 0.5%).
Figure 3: Transcriptome effect of stop-gain SNPs by exon rank.

Acknowledgements

We thank all the participants in this study. This study was performed in collaboration with Illumina.

Author information

Contributions

P.S., H. Helgason, A.O., U.T., G.M., D.F.G. and K.S. designed the experiment. H.S., H. Holm and U.T. collected the samples. Adalbjorg Jonasdottir, Aslaug Jonasdottir, A.S. and O.T.M. performed the sequencing experiments. P.S., H. Helgason, S.A.G., F.Z., E.H., G.T.S., A.K., G.M. and D.F.G. analyzed the data. P.S., H. Helgason, A.H., D.F.G. and K.S. wrote the first draft of the manuscript. All authors contributed to the final version of the manuscript.

Corresponding authors

Correspondence to Patrick Sulem or Kari Stefansson.

Ethics declarations

Competing interests

The authors affiliated with deCODE Genetics are employed by the company: P.S., H. Helgason, A.O., H.S., S.A.G., F.Z., E.H., G.T.S., Adalbjorg Jonasdottir, Aslaug Jonasdottir, A.S., O.T.M., A.K., A.H., H. Holm, U.T., G.M., D.F.G. and K.S.

Integrated supplementary information

Supplementary Figure 1 The probabilities of a variant being seen once and five times as a function of the number of individuals sequenced by minor allele frequency (MAF).

Variants that are seen at least five times, corresponding to an observed allelic frequency of 0.095%, are likely to be imputed with good quality.
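The relationship between sample size, MAF and the chance of observing a variant can be sketched as follows. This is a minimal illustration that assumes each of the 2n sequenced chromosomes carries the minor allele independently with probability equal to the MAF (a binomial model); the function name is hypothetical, and the model actually used for the figure is not stated here. The last lines check the caption's arithmetic: five copies among 2 × 2,636 chromosomes is about 0.095%.

```python
from math import comb

N_SEQUENCED = 2636  # whole-genome sequenced Icelanders (from the abstract)


def prob_seen_at_least(k: int, n_individuals: int, maf: float) -> float:
    """P(minor allele observed on at least k of the 2n chromosomes).

    Assumes independent chromosomes, i.e. a Binomial(2n, maf) copy count.
    """
    n_chrom = 2 * n_individuals
    p_fewer = sum(
        comb(n_chrom, i) * maf**i * (1 - maf) ** (n_chrom - i)
        for i in range(k)
    )
    return 1.0 - p_fewer


# Observed allelic frequency of a variant seen exactly five times.
freq_five_copies = 5 / (2 * N_SEQUENCED)
print(f"{freq_five_copies:.3%}")

# Chance of seeing a variant at least once or at least five times at a given MAF.
for maf in (0.0005, 0.001, 0.002):
    once = prob_seen_at_least(1, N_SEQUENCED, maf)
    five = prob_seen_at_least(5, N_SEQUENCED, maf)
    print(f"MAF={maf:.4f}  P(>=1)={once:.3f}  P(>=5)={five:.3f}")
```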

Supplementary Figure 2 The fraction of individuals among 104,220 individuals with imputed genotypes that have genes completely knocked out by LoF variants with MAF below the given threshold.

The second panel shows a magnified view of MAF below 3%.

Supplementary Figure 3 The number of genes that are observed to have at least one LoF variant with MAF below 2% as a function of the number of sequenced individuals and the number of genes that are completely knocked out in at least one individual by LoF variants with MAF below 2% as a function of the number of chip-genotyped individuals.

The curves are derived from the allele frequency distributions and the number of imputed complete knockouts.

Supplementary Figure 4 The cumulative number of genes by the number of completely knocked out individuals.

The total number of genes is 1,171.

Supplementary Figure 5 The frequency distribution of the 6,795 LoF variants among sequenced and imputed individuals.

The leftmost count includes all variants with a frequency of less than one and a half divided by twice the number of sequenced individuals, which corresponds to the number of variants seen only once in the sequenced set.
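The cutoff for the leftmost bin can be checked with a short calculation. The sketch below assumes the 2,636 sequenced individuals from the abstract and uses hypothetical helper names. A variant carried on one chromosome has frequency 1/(2n) and one carried on two has frequency 2/(2n), so a cutoff of 1.5/(2n) falls between the two and separates singletons from everything else.

```python
N_SEQUENCED = 2636  # whole-genome sequenced Icelanders (from the abstract)


def allele_frequency(copies: int, n_individuals: int) -> float:
    """Frequency of an allele seen on `copies` of the 2n chromosomes."""
    return copies / (2 * n_individuals)


def leftmost_bin_cutoff(n_individuals: int) -> float:
    """One and a half divided by twice the number of sequenced individuals."""
    return 1.5 / (2 * n_individuals)


cutoff = leftmost_bin_cutoff(N_SEQUENCED)
once = allele_frequency(1, N_SEQUENCED)
twice = allele_frequency(2, N_SEQUENCED)

# The cutoff lies strictly between the one-copy and two-copy frequencies.
print(f"one copy: {once:.4%}  cutoff: {cutoff:.4%}  two copies: {twice:.4%}")
```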

Supplementary Figure 6 A histogram of the number of meioses between the parents of the 104,220 Icelanders in our study and the fraction of individuals that have at least one gene completely knocked out by rare LoF variants.

Supplementary Figure 7 Transcriptome effect of synonymous SNPs by exon rank.

The allele-specific expression of the non-reference allele was calculated for each variant for a set of 262 individuals with blood RNA sequence data. The top, middle and bottom of the boxes are the top quartile, median and bottom quartile values calculated over the set of variants. The whiskers show the lowest and highest data points within 1.5 times the interquartile range (IQR) from the median. The dots indicate data points more than 1.5 times the IQR from the median. The n values given are the number of variants in each class.

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–7, Supplementary Note and Supplementary Tables 1–3 and 5–11. (PDF 3027 kb)

Supplementary Table 4

The observed list of 6,795 LoF mutations in 4,924 genes. (XLSX 959 kb)

About this article

Cite this article

Sulem, P., Helgason, H., Oddson, A. et al. Identification of a large set of rare complete human knockouts. Nat Genet 47, 448–452 (2015). https://doi.org/10.1038/ng.3243

  • DOI: https://doi.org/10.1038/ng.3243
