Analysis

Human gene essentiality

Published online:

Abstract

A gene can be defined as essential when loss of its function compromises viability of the individual (for example, embryonic lethality) or results in profound loss of fitness. At the population level, identification of essential genes is accomplished by observing intolerance to loss-of-function variants. Several computational methods are available to score gene essentiality, and recent progress has been made in defining essentiality in the non-coding genome. Haploinsufficiency is emerging as a critical aspect of gene essentiality: approximately 3,000 human genes cannot tolerate loss of one of the two alleles. Genes identified as essential in human cell lines or knockout mice may be distinct from those in living humans. Reconciling these discrepancies in how we evaluate gene essentiality has applications in clinical genetics and may offer insights for drug development.

Journal: Nature Reviews Genetics

Acknowledgements

The authors thank Drs Ewen Kirkness and Michael Hicks for valuable comments. The authors are employees of Human Longevity, Inc.

Author information

Affiliations

  1. Human Longevity Inc., San Diego, California 92121, USA.

    • István Bartha
    • Julia di Iulio
    • J. Craig Venter
    • Amalio Telenti
  2. J. Craig Venter Institute, Capricorn Lane, La Jolla, California 92037, USA.

    • J. Craig Venter
    • Amalio Telenti

Contributions

All authors substantially contributed to discussion of content and to reviewing/editing the manuscript before submission. I.B., J.d.I. and A.T. researched data for the article and contributed to writing the manuscript.

Competing interests

The authors are employees of Human Longevity, Inc. There is no commercial interest or intellectual property associated with this work.

Corresponding authors

Correspondence to J. Craig Venter or Amalio Telenti.

Supplementary information

PDF files

  1. Supplementary information S1 (box)

Excel files

  1. Supplementary information S2 (table)

Text files

  1. Supplementary information S3 (table)

Glossary

Minimal genome

A genome limited to the essential genes for life.

Robustness

The ability of a biological system to keep its behaviour unchanged under perturbation.

Redundancy

The possibility of having a function encoded by more than one gene.

Evolvability

The degree to which an organism can generate adaptive solutions to future environments through heritable phenotypic variation.

Exome

The subset of the genome that is part of mature RNAs and translated into proteins.

Protein truncation

A truncated, incomplete and usually nonfunctional protein product. Generally, the result of stop-gain, frameshift or splice-donor genetic variants.

Loss-of-function variants

Genetic variants that severely disrupt the function of a protein. These can be missense (a change of the codon resulting in a change in the amino acid) or nonsense and protein-truncating variants.

Haploinsufficiency

In a diploid organism, having only a single functional copy of a gene (with the other copy inactivated by mutation), which is insufficient to maintain proper gene function.

Stop-gain variants

Also known as nonsense variants, changes in the genetic material that result in premature termination of the translated protein.

Saturate

When referring to the generation of gene variants genome-wide, to reach the sample size at which every position in the genome has been observed to vary at least once.

Frameshift variants

Deletions or insertions in the protein-coding region, the lengths of which are not divisible by three, thus disrupting the reading frame of the gene.
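The divisibility rule in this definition can be written as a short check. The following is a minimal sketch, assuming the variant lies entirely within a protein-coding region and is given as VCF-style reference and alternate allele strings; the function name `is_frameshift` is hypothetical and not part of the article.

```python
def is_frameshift(ref: str, alt: str) -> bool:
    """Return True when an insertion or deletion shifts the reading frame.

    Assumption: the whole variant sits inside a coding region, so only the
    net change in length matters.
    """
    length_change = len(alt) - len(ref)
    # A substitution (no length change) or an in-frame indel (a multiple
    # of three bases) leaves the downstream codons in register.
    return length_change != 0 and length_change % 3 != 0
```

For example, a single-base insertion (`"A"` to `"AT"`) is a frameshift under this check, whereas a three-base deletion (`"ATCG"` to `"A"`) is not.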

Synonymous variants

A change of nucleotide that does not lead to changes in the amino-acid sequence of a protein.

Neutral variation

Genetic variants that are not subject to natural selection.

ROC curve

(Receiver operating characteristic curve). A visual and quantitative method of evaluating the performance of binary classifiers. The true-positive rate of a classifier is plotted against its false-positive rate as the decision threshold is varied.
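As an illustration of how such a curve is built, the sketch below ranks items by score and records one (false-positive rate, true-positive rate) point per distinct threshold, then integrates the curve with the trapezoidal rule. The names `roc_points` and `auc` are hypothetical, and the label convention (1 for a known essential gene, 0 otherwise) is an assumption for the example rather than the article's procedure.

```python
from typing import List, Sequence, Tuple


def roc_points(scores: Sequence[float], labels: Sequence[int]) -> List[Tuple[float, float]]:
    """Return (false-positive rate, true-positive rate) pairs, one per distinct threshold."""
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative label")
    # Highest scores first: lowering the threshold admits items in this order.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last item sharing this score (handles ties).
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / negatives, tp / positives))
    return points


def auc(points: Sequence[Tuple[float, float]]) -> float:
    """Area under the curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area
```

With toy scores `[0.9, 0.8, 0.7, 0.6, 0.55, 0.4]` and labels `[1, 1, 0, 1, 0, 0]`, the area under the curve is 8/9: eight of the nine positive-negative pairs are ranked correctly.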

Expression quantitative trait loci

(eQTLs). Loci where variation is associated with differential expression of a gene.

Haploid

Of cells, containing a single set of chromosomes.

Ploidy

The number of sets of chromosomes in a cell.

Hemizygosity

The absence of one copy of a gene in diploid cells.

Compound heterozygosity

The state in which both alleles of a gene carry a (deleterious) variant, but those variants are different.

Nonsense-mediated mRNA decay

(NMD). A cellular pathway that serves to recognize and degrade mRNAs with translation termination codons that are positioned in abnormal contexts.

Haplotype phasing

The assignment of an allele to one of the two copies of the chromosomes (maternal and paternal).
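Phasing is what distinguishes compound heterozygosity from two variants carried on the same chromosome copy. The sketch below makes that distinction explicit, assuming biallelic variants in the same gene with VCF-style phased genotype strings such as `0|1`, where the position before or after the bar identifies the haplotype. The function name `is_compound_heterozygous` is hypothetical.

```python
def is_compound_heterozygous(gt_a: str, gt_b: str) -> bool:
    """True when two phased heterozygous variants sit on opposite haplotypes.

    Assumption: both variants are biallelic and fall in the same gene;
    genotypes use '0' for the reference allele and '1' for the alternate.
    """
    if "|" not in gt_a or "|" not in gt_b:
        raise ValueError("both genotypes must be phased, e.g. '0|1'")
    a = gt_a.split("|")
    b = gt_b.split("|")
    het_a = sorted(a) == ["0", "1"]
    het_b = sorted(b) == ["0", "1"]
    # Opposite haplotypes: the alternate allele occupies a different slot
    # in each genotype, so neither chromosome copy is left intact.
    return het_a and het_b and a != b
```

Under these assumptions, `0|1` paired with `1|0` is compound heterozygous, whereas `0|1` paired with `0|1` places both variants on the same haplotype and leaves one copy of the gene intact.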