Abstract

Genetic differences that specify unique aspects of human evolution have typically been identified by comparative analyses between the genomes of humans and closely related primates [1], including more recently the genomes of archaic hominins [2,3]. Not all regions of the genome, however, are equally amenable to such study. Recurrent copy number variation (CNV) at chromosome 16p11.2 accounts for approximately 1% of cases of autism [4,5] and is mediated by a complex set of segmental duplications, many of which arose recently during human evolution. Here we reconstruct the evolutionary history of the locus and identify bolA family member 2 (BOLA2) as a gene duplicated exclusively in Homo sapiens. We estimate that a 95-kilobase-pair segment containing BOLA2 duplicated across the critical region approximately 282 thousand years ago (ka), one of the latest among a series of genomic changes that dramatically restructured the locus during hominid evolution. All humans examined carried one or more copies of the duplication, which nearly fixed early in the human lineage, a pattern unlikely to have arisen so rapidly in the absence of selection (P < 0.0097). We show that the duplication of BOLA2 led to a novel, human-specific in-frame fusion transcript and that BOLA2 copy number correlates with both RNA expression (r = 0.36) and protein level (r = 0.65), with the greatest expression difference between human and chimpanzee in experimentally derived stem cells. Analyses of 152 patients carrying a chromosome 16p11.2 rearrangement show that more than 96% of breakpoints occur within the H. sapiens-specific duplication. In summary, the duplicative transposition of BOLA2 at the root of the H. sapiens lineage about 282 ka simultaneously increased copy number of a gene associated with iron homeostasis and predisposed our species to recurrent rearrangements associated with disease.

Data deposits

Clone sequences, haplotype contig sequences and MIP data are available at the NCBI BioProject database under accession number PRJNA325679. RNA-seq data for neural progenitor cells and neurons are available at NCBI Gene Expression Omnibus under accession numbers GSE47626 and GSE83638. Patient WGS and MIP data are available at SFARI Base (https://sfari.org/resources/sfari-base) under accession numbers SFARI_SVIP_WGS_1 and SFARI_SVIP_MIPS_1.

References

  1. Evolution at two levels in humans and chimpanzees. Science 188, 107–116 (1975)

  2. The complete genome sequence of a Neanderthal from the Altai Mountains. Nature 505, 43–49 (2014)

  3. A high-coverage genome sequence from an archaic Denisovan individual. Science 338, 222–226 (2012)

  4. Association between microdeletion and microduplication at 16p11.2 and autism. N. Engl. J. Med. 358, 667–675 (2008)

  5. Recurrent 16p11.2 microdeletions in autism. Hum. Mol. Genet. 17, 628–638 (2008)

  6. Reconstructing complex regions of genomes using long-read sequencing technology. Genome Res. 24, 688–696 (2014)

  7. Positive selection of a gene family during the emergence of humans and African apes. Nature 413, 514–519 (2001)

  8. A 600 kb deletion syndrome at 16p11.2 leads to energy imbalance and neuropsychiatric disorders. J. Med. Genet. 49, 660–668 (2012)

  9. Mirror extreme BMI phenotypes associated with gene dosage at the chromosome 16p11.2 locus. Nature 478, 97–102 (2011)

  10. Genetic evidence for complex speciation of humans and chimpanzees. Nature 441, 1103–1108 (2006)

  11. A global reference for human genetic variation. Nature 526, 68–74 (2015)

  12. Diversity of human copy number variation and multicopy genes. Science 330, 641–646 (2010)

  13. Ancient human genomes suggest three ancestral populations for present-day Europeans. Nature 513, 409–413 (2014)

  14. Genome sequence of a 45,000-year-old modern human from western Siberia. Nature 514, 445–449 (2014)

  15. The genetic structure and history of Africans and African Americans. Science 324, 1035–1044 (2009)

  16. The projection of a test genome onto a reference population and applications to humans and archaic hominins. Genetics 198, 1655–1670 (2014)

  17. Generating samples under a Wright-Fisher neutral model of genetic variation. Bioinformatics 18, 337–338 (2002)

  18. MSMS: a coalescent simulation program including recombination, demographic structure and selection at a single locus. Bioinformatics 26, 2064–2065 (2010)

  19. Excavating Neandertal and Denisovan DNA from the genomes of Melanesian individuals. Science 352, 235–239 (2016)

  20. Transcriptome and genome sequencing uncovers functional variation in humans. Nature 501, 506–511 (2013)

  21. Differential L1 regulation in pluripotent stem cells of humans and apes. Nature 503, 525–529 (2013)

  22. The evolution of gene expression levels in mammalian organs. Nature 478, 343–348 (2011)

  23. Simons VIP Consortium. Simons Variation in Individuals Project (Simons VIP): a genetics-first approach to studying autism spectrum and related neurodevelopmental disorders. Neuron 73, 1063–1067 (2012)

  24. Rapid and accurate large-scale genotyping of duplicated genes and discovery of interlocus gene conversions. Nature Methods 10, 903–909 (2013)

  25. Genetic signatures of strong recent positive selection at the lactase gene. Am. J. Hum. Genet. 74, 1111–1120 (2004)

  26. Human glutaredoxin 3 forms [2Fe-2S]-bridged complexes with human BolA2. Biochemistry 51, 1687–1696 (2012)

  27. Monothiol CGFS glutaredoxins and BolA-like proteins: [2Fe-2S] binding partners in iron homeostasis. Biochemistry 51, 4377–4389 (2012)

  28. Identification of FRA1 and FRA2 as genes involved in regulating the yeast iron regulon in response to decreased mitochondrial iron-sulfur cluster synthesis. J. Biol. Chem. 283, 10276–10286 (2008)

  29. Crucial function of vertebrate glutaredoxin 3 (PICOT) in iron homeostasis and hemoglobin maturation. Mol. Biol. Cell 24, 1895–1903 (2013)

  30. Elucidating the molecular function of human BOLA2 in GRX3-dependent anamorsin maturation pathway. J. Am. Chem. Soc. 137, 16133–16143 (2015)

  31. Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data. Nature Methods 10, 563–569 (2013)

  32. The 1000 Genomes Project Consortium. An integrated map of genetic variation from 1,092 human genomes. Nature 491, 56–65 (2012)

  33. Global diversity, population stratification, and selection of human copy-number variation. Science 349, aab3761 (2015)

  34. Great ape genetic diversity and population history. Nature 499, 471–475 (2013)

  35. Single molecule molecular inversion probes for targeted, high-accuracy detection of low-frequency variation. Genome Res. 23, 843–854 (2013)

  36. Near-optimal RNA-seq quantification. Preprint at (2015)

  37. Resolving genomic disorder-associated breakpoints within segmental DNA duplications using massively parallel sequencing. Nature Protocols 9, 1496–1513 (2014)

  38. Palindromic GOLGA8 core duplicons promote chromosome 15q13.3 microdeletion and evolutionary instability. Nature Genet. 46, 1293–1302 (2014)

  39. The Simons Simplex Collection: a resource for identification of autism genetic risk factors. Neuron 68, 192–195 (2010)

Acknowledgements

We thank families at the participating Simons Variation in Individuals Project (Simons VIP) and Simons Simplex Collection sites, as well as the Simons VIP Consortium. Approved researchers can obtain the Simons VIP data set, the Simons Simplex Collection data set and/or biospecimens by applying at https://base.sfari.org. We thank M. Chaisson for single-molecule, real-time WGS data, B. Vernot for archaic introgression data, B. J. Nelson and K. Munson for technical assistance, M. L. Gage for editorial comments and T. Brown for assistance with manuscript preparation. This work was supported by the Paul G. Allen Foundation (grant 11631 to E.E.E.), the Simons Foundation Autism Research Initiative (SFARI 303241 to E.E.E. and 274424 to A.R.), the US National Institutes of Health (NIH grant 2R01HG002385 to E.E.E.), the Swiss National Science Foundation (31003A_160203 and CRSII33-133044 to A.R.) and funds from NIH TR01 MH095741, the Helmsley Charitable Fund, the Mathers Foundation and the JPB Foundation (to F.H.G.). X.N. was supported by a US National Science Foundation Graduate Research Fellowship under grant DGE-1256082. G.G. was awarded a Pro-Women Scholarship from the Faculty of Biology and Medicine, University of Lausanne. M.H.D. is supported by US National Institute of Mental Health grant 1F30MH105055-01. O.P. is a recipient of a Human Frontier Science Program postdoctoral fellowship. L.B. is supported by EC grant N653706, project iNEXT. S.C.B. and F.C. were supported by an Ente Cassa di Risparmio grant (2013/7201). E.E.E. is an investigator of the Howard Hughes Medical Institute. The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript.

Author information

Author notes

    • Peter H. Sudmant

    Present address: Program in Computational and Systems Biology, Massachusetts Institute of Technology, Cambridge, Massachusetts 02142, USA.

    • Xander Nuttle & Giuliana Giannuzzi

    These authors contributed equally to this work.

Affiliations

  1. Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195, USA

    • Xander Nuttle, Michael H. Duyzend, Joshua G. Schraiber, Peter H. Sudmant, Osnat Penn, Maika Malig, John Huddleston, Holly A. F. Stessman, Laura Denman, Lana Harshman, Carl Baker, Archana Raja, Kelsi Penewit, Nicolette Janke, Joshua M. Akey & Evan E. Eichler

  2. Center for Integrative Genomics, University of Lausanne, 1015 Lausanne, Switzerland

    • Giuliana Giannuzzi & Alexandre Reymond

  3. Laboratory of Genetics, The Salk Institute for Biological Studies, 10010 North Torrey Pines Road, La Jolla, California 92037, USA

    • Iñigo Narvaiza, Chris Benner, Maria C. N. Marchetto & Fred H. Gage

  4. Dipartimento di Biologia, Università degli Studi di Bari ‘Aldo Moro’, Bari 70125, Italy

    • Giorgia Chiatante, Mario Ventura & Francesca Antonacci

  5. Howard Hughes Medical Institute, Seattle, Washington 98195, USA

    • John Huddleston, Archana Raja & Evan E. Eichler

  6. Department of Chemistry, University of Florence, Via della Lastruccia 3, 50019 Sesto Fiorentino, Florence, Italy

    • Francesca Camponeschi, Simone Ciofi-Baffoni & Lucia Banci

  7. Magnetic Resonance Center CERM, University of Florence, Via Luigi Sacconi 6, 50019, Sesto Fiorentino, Florence, Italy

    • Simone Ciofi-Baffoni & Lucia Banci

  8. Benaroya Research Institute at Virginia Mason, Seattle, Washington 98101, USA

    • W. Joyce Tang & Chris T. Amemiya

  9. Center for Academic Research and Training in Anthropogeny (CARTA), 9500 Gilman Drive, La Jolla, California 92093, USA

    • Fred H. Gage

Authors

  1. Xander Nuttle
  2. Giuliana Giannuzzi
  3. Michael H. Duyzend
  4. Joshua G. Schraiber
  5. Iñigo Narvaiza
  6. Peter H. Sudmant
  7. Osnat Penn
  8. Giorgia Chiatante
  9. Maika Malig
  10. John Huddleston
  11. Chris Benner
  12. Francesca Camponeschi
  13. Simone Ciofi-Baffoni
  14. Holly A. F. Stessman
  15. Maria C. N. Marchetto
  16. Laura Denman
  17. Lana Harshman
  18. Carl Baker
  19. Archana Raja
  20. Kelsi Penewit
  21. Nicolette Janke
  22. W. Joyce Tang
  23. Mario Ventura
  24. Lucia Banci
  25. Francesca Antonacci
  26. Joshua M. Akey
  27. Chris T. Amemiya
  28. Fred H. Gage
  29. Alexandre Reymond
  30. Evan E. Eichler

Contributions

X.N., G.G., M.H.D., A.Re. and E.E.E. designed the study. X.N., G.G., M.H.D., M.M., J.H., L.D., L.H., C.Ba., A.Ra. and K.P. contributed to sequencing and assembly of haplotypes. X.N. developed the evolutionary model, with input from G.G. P.H.S. genotyped aggregate copy number from WGS data. X.N. and M.H.D. performed MIP experiments and analysed WGS data to genotype paralogue-specific copy number and refine rearrangement breakpoints. N.J. performed massively parallel sequencing. J.G.S., M.H.D. and X.N. performed population genetic simulations, with input from J.M.A. G.G. analysed RNA-seq data from LCLs, performed western blots and assessed the correlation of expression with copy number. I.N., C.Be. and M.C.N.M. performed and analysed RNA-seq experiments over in vitro differentiation of experimentally derived primate stem cells, with supervision from F.H.G. O.P., G.G. and X.N. analysed RNA-seq data from different human and nonhuman primate tissues. J.H. performed inversion density simulations using data provided by F.A. and M.V. G.C. and F.A. performed fluorescence in situ hybridization (FISH) experiments. F.C., S.C.B., H.A.F.S. and L.B. performed functional experiments and provided insights into potential effects of increased BOLA2 dosage. W.J.T. and C.T.A. constructed a bacterial artificial chromosome library. X.N. and E.E.E. wrote the paper, with input and approval from all co-authors.

Competing interests

E.E.E. is on the scientific advisory board of DNAnexus, Inc., and is a consultant for the Kunming University of Science and Technology as part of the 1000 China Talent Program.

Corresponding authors

Correspondence to Alexandre Reymond or Evan E. Eichler.

Reviewer Information

Nature thanks D. Conrad, D. Haussler, C. Tyler-Smith and the other anonymous reviewer(s) for their contribution to the peer review of this work.

Supplementary information

PDF files

  1. Supplementary Information

    This file contains Supplementary Text (see contents page for details).

Excel files

  1. Supplementary Data

    This file contains Supplementary Tables 1-19.

About this article

DOI

https://doi.org/10.1038/nature19075
