Gibbon genome and the fast karyotype evolution of small apes

Journal name: Nature
Volume: 513
Pages: 195–201
DOI: 10.1038/nature13679

Abstract

Gibbons are small arboreal apes that display an accelerated rate of evolutionary chromosomal rearrangement and occupy a key node in the primate phylogeny between Old World monkeys and great apes. Here we present the assembly and analysis of a northern white-cheeked gibbon (Nomascus leucogenys) genome. We describe the propensity for a gibbon-specific retrotransposon (LAVA) to insert into chromosome segregation genes and alter transcription by providing a premature termination site, suggesting a possible molecular mechanism for the genome plasticity of the gibbon lineage. We further show that the gibbon genera (Nomascus, Hylobates, Hoolock and Symphalangus) experienced a near-instantaneous radiation ~5 million years ago, coincident with major geographical changes in southeast Asia that caused cycles of habitat compression and expansion. Finally, we identify signatures of positive selection in genes important for forelimb development (TBX5) and connective tissues (COL1A1) that may have been involved in the adaptation of gibbons to their arboreal habitat.

At a glance

Figures

  1. Figure 1: Geographic distribution of gibbon species used in the study.

    We sequenced two individuals from each gibbon genus and two different species (H. moloch and H. pileatus) for the genus Hylobates. The extant geographic localization for each genus is illustrated on the map. Individuals in the photos are the ones sequenced in this study. The asterisk symbol indicates a deceased animal.

  2. Figure 2: Analysis of gibbon–human synteny and breakpoints.

    a, Oxford plots for human chromosomes (y axis) vs. chimpanzee, gorilla, orangutan, gibbon, rhesus macaque and marmoset chromosomes (x axis). Each line represents a collinear block larger than 10 Mb. The gibbon genome displays a significantly larger number of large-scale rearrangements than all the other species. In the gorilla plot, chromosomes 4 and 19 stand out as the product of a reciprocal translocation between chromosomes syntenic to human chromosomes 5 and 17. b, The graph shows the number of collinear blocks in primate genomes with respect to the human genome. The number of collinear blocks is a proxy for the number of rearrangements and decreases as the size of the blocks becomes larger. The gibbon genome has undergone a greater number of large-scale rearrangements; however, the number of small-scale rearrangements is comparable with the other species. The extremely low number of large rearrangements in the gorilla genome (dotted green line) is a reflection of the use of the human genome as a template in the assembly process. c, Examples of gibbon–human synteny breakpoints. The first two are class I breakpoints (that is, base-pair resolution) originated through non-homology based mechanisms. NLE12_1 is the result of an inversion in human chromosome 1 and NLE18_6 is the result of a translocation between human chromosomes 16 and 5 with an untemplated insertion in the gibbon sequence shown in purple; in both cases, micro-homologies in the human sequences are shown in red. The last example (NLE9_4) is a class II breakpoint (3.2 kb) containing a mixture of repetitive sequences.

  3. Figure 3: The LAVA element and evidence for LAVA-mediated early transcription termination.

    a, Schematic view of the LAVA element highlighting the main components that originated from common repeats (L1, Alu, VNTR and Alu-like). Target-site duplications (TSDs) and the poly(A) tail are also indicated. b, Luciferase reporter constructs used to assay for LAVA-mediated early transcriptional termination (left panel) and results of the luciferase reporter assay (right panel), showing luciferase activity increased by ~50% relative to the background for pmiRGlo_LA_F (*P = 0.0013; see Supplementary Information section S7.8). n = 5 biological replicates, from five independent transfections done for each experimental condition tested. The experiment shown was replicated twice in the laboratory. Statistics were carried out using a two-sided Student’s t-test; P values for the pairwise comparisons LA_F vs. LA_E, ΔPA vs. LA_F and ΔPA vs. LA_E (with 95% CI) were adjusted for multiple comparisons according to the Bonferroni method. Centre values show the average; error bars indicate standard deviation. c, A median-joining network showing the relationships among the 22 LAVA subfamilies generated by comparing the 3′ intact LAVA elements. Coloured circles represent subfamilies and their size is proportional to the number of elements in the subfamily (numbers inside each circle). Black dots represent hypothetical sequences connecting adjacent subfamilies. All possible relationships are shown. Branch lengths are not drawn to scale.

  4. Figure 4: Gibbon phylogeny and demography.

    a, The three most frequently observed UPGMA gene trees (numbers at the top) constructed across the genome in 100-kb sliding windows, and posterior probabilities (numbers at the bottom) for the same species topologies from a coalescent-based ABC analysis. The relatively low numbers observed suggest the presence of substantial incomplete lineage sorting (ILS) among the gibbon genera. b, Parameter estimates describing gibbon population demography assuming an instant radiation for all four genera (left) and the most probable bifurcating species topology (right). Black, green and red numbers indicate divergence times and Ne as calculated by ABC, BEAST and G-PhoCS analysis, respectively (Supplementary Information section S9). c, PSMC analysis estimating changes in historical Ne. The large increase in Ne observed in our PSMC plot for SSY in recent times is probably exaggerated due to higher sequencing error and mapping biases in non-NLE samples (see details in Supplementary section S8). A generation time of 10 years (refs 45, 46) was used to obtain a per generation mutation rate of 1 × 10⁻⁸ per year.

  5. Extended Data Fig. 1: The gibbon assembly statistics and quality control.

    a, The table compares the gibbon assembly statistics to those of other primates sequenced with a similar strategy. b, The plot represents the percentage of the 10,734 single-copy gene HMMs (hidden Markov models) for which just one gene (blue) is found in the different mammalian genomes in Ensembl 70. Other HMMs match more than one gene (red). The missing HMMs (cyan) either do not match any protein or the score is within the range of what can be expected for unrelated proteins. The remaining category (green) represents HMMs for which the best matching gene scores better than unrelated proteins but not as well as expected. See Supplementary Information section 1.4 for more details.

  6. Extended Data Fig. 2: Analysis of gibbon–human synteny blocks and identification and validation of gibbon segmental duplications.

    a, The image shows a representative gibbon-only whole-genome shotgun sequence detection (WSSD) call by Sanger read depth. The duplication identified in this case overlaps with the gene CHAD, which codes for a cartilage matrix protein. b, Examples of fluorescence in situ hybridizations on gibbon metaphases using duplicated human fosmid clones that were identified by the WGS detection strategy (red signals). Left, interchromosomal duplication. Middle, interspersed intrachromosomal duplication. Right, intrachromosomal tandem duplication confirmed using co-hybridization with a single control probe (blue signals). c, Megabases of lineage-specific and shared duplications for primates based on GRCh37 read depth analysis. Copy-number corrected values by species are shown below.

  7. Extended Data Fig. 3: Analysis of LAVA element insertion in genes and early termination of transcription.

    a, The histogram shows the results of permutation analyses. We find a significant association between LAVA elements and genes. Moreover, insertions are significantly enriched in introns and depleted in exons, most probably as a result of selection against insertions in exons. b, Schematic representation of the mechanism through which LAVA intronic insertions in antisense orientation might cause early termination of transcription. The truncated transcript is indicated on the diagram as A and normal transcript indicated on the diagram as B (pA = polyadenylation site). c, We calculated the distance to the nearest exon for each intronic LAVA and compared this to what would be expected for random insertions (that is, background). We found fewer insertions than expected by chance within 1 kb of the nearest exon. d, Identification of pmiRGlo_LA_F polyadenylation sites by 3′ RACE. Alignment of thirteen 3′ RACE PCR clone sequences and the pmiRGlo_LA_F sequence. LAVA_F 3′ TSD is highlighted by dark green background; the major antisense LAVA_F polyadenylation signal (MAPS) is highlighted by red background. The termination sites are marked with arrows on the LAVA_F sequence. Poly(A) tails of the identified transcripts are in red text.

  8. Extended Data Fig. 4: Evolution of the LAVA element.

    a, Screenshots from the Integrative Genomics Viewer (IGV) browser for loci MAP4, RABGAP1 and BBS9. Each column shows portions of the IGV visualization of a LAVA insertion locus identified in Nleu1.0 and its flanking sequence. Red rectangles indicate the margins of each LAVA insertion. Read pairs are coloured red when their insert size is larger than expected, indicating the presence of an unshared LAVA insertion. MAP4 is a shared LAVA insertion, whereas RABGAP1 and BBS9 are Nomascus specific. b, LAVA elements containing at least 300 bp of the LA section of LAVA were selected and reanalysed using RepeatMasker to determine subfamily affiliation and divergence from the consensus sequence. LAVA elements are grouped based upon their subfamily affiliations (see legend top right for colour scheme). The x axis shows the per cent divergence from the respective consensus sequence and the y axis shows the number of elements with a certain per cent divergence from the consensus sequence.

  9. Extended Data Fig. 5: Analysis of the phylogenetic relationships between gibbon genera.

    a, Neighbour-joining trees for gibbons using non-genic loci. b, UPGMA trees for 100 kb non-overlapping sliding windows moving along the gibbon genome reporting the top 15 topologies (see also Supplementary Table ST8.3). The percentage of total support for each topology is given within each subpanel.

  10. Extended Data Fig. 6: Analysis of the relationship between gibbon accelerated regions (gibARs) and genes.

    a, Intergenic regions are enriched in gibARs. Different sequence types are shown on the x axis and the y axis displays the fraction of gibARs and candidate regions annotated to the respective class. gibARs are significantly enriched in intergenic regions (P = 4.7 × 10⁻⁶) and significantly depleted in exons (P = 7.3 × 10⁻⁶). P values for each class were calculated with Fisher’s exact test. Introns are comparably prevalent in candidates and gibARs, whereas in the UTR and flanking region, counts are too low to draw meaningful conclusions (data not shown). b, TreeMap from REVIGO for GOslim Biological Process terms with a Benjamini–Hochberg false discovery rate of 5%. Each rectangle is a cluster representative; larger rectangles represent ‘superclusters’ including loosely related terms. The size of the rectangles reflects the P value.
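
The enrichment test named in the Extended Data Fig. 6a caption can be illustrated with a minimal Python sketch of a two-sided Fisher's exact test on a 2 × 2 table, using only the standard library. The function name and the counts in the usage example are hypothetical and are not taken from the study; the software and the actual counts behind the reported P values are not given in this section.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test P value for the 2x2 table [[a, b], [c, d]].

    Illustrative helper (hypothetical name), not the authors' implementation.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2
    denom = comb(total, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lowest = max(0, col1 - row2)   # smallest feasible top-left count
    highest = min(row1, col1)      # largest feasible top-left count

    # Sum the probabilities of every table that is no more likely than the
    # observed one; the small relative tolerance guards against float ties.
    p_value = sum(
        p
        for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_obs * (1 + 1e-7)
    )
    return min(1.0, p_value)


if __name__ == "__main__":
    # Hypothetical counts: rows = gibARs / other candidate regions,
    # columns = intergenic / not intergenic.
    print(fisher_exact_two_sided(8, 2, 1, 5))
```

With the hypothetical counts above the P value is approximately 0.035. One such test per sequence class (intergenic, exon, intron and so on) would give one P value per class, as the caption describes.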

Tables

  1. Extended Data Table 1: Genes from the ‘microtubule cytoskeleton’ GO category with LAVA insertions

Accession codes

Primary accessions

GenBank/EMBL/DDBJ: ADFV00000000.1

Sequence Read Archive: SRP043117

References

  1. Mittermeier, R. A., Rylands, A. B. & Wilson, D. E. Handbook of the Mammals of the World Vol. 3 (Lynx Edicions, 2013)
  2. Carbone, L. et al. A high-resolution map of synteny disruptions in gibbon and human genomes. PLoS Genet. 2, e223 (2006)
  3. Locke, D. P. et al. Comparative and demographic analysis of orang-utan genomes. Nature 469, 529–533 (2011)
  4. Gibbs, R. A. et al. Evolutionary and biomedical insights from the rhesus macaque genome. Science 316, 222–234 (2007)
  5. Girirajan, S. et al. Sequencing human–gibbon breakpoints of synteny reveals mosaic new insertions at rearrangement sites. Genome Res. 19, 178–190 (2009)
  6. Carbone, L. et al. Evolutionary breakpoints in the gibbon suggest association between cytosine methylation and karyotype evolution. PLoS Genet. 5, e1000538 (2009)
  7. Bailey, J. A. & Eichler, E. E. Primate segmental duplications: crucibles of evolution, diversity and disease. Nature Rev. Genet. 7, 552–564 (2006)
  8. Yan, C. T. et al. IgH class switching and translocations use a robust non-classical end-joining pathway. Nature 449, 478–482 (2007)
  9. Hastings, P. J., Ira, G. & Lupski, J. R. A microhomology-mediated break-induced replication model for the origin of human copy number variation. PLoS Genet. 5, e1000327 (2009)
  10. Merkenschlager, M. & Odom, D. T. CTCF and cohesin: linking gene regulatory elements with their targets. Cell 152, 1285–1297 (2013)
  11. Schwalie, P. C. et al. Co-binding by YY1 identifies the transcriptionally active, highly conserved set of CTCF-bound regions in primate genomes. Genome Biol. 14, R148 (2013)
  12. Carbone, L. et al. Centromere remodeling in Hoolock leuconedys (Hylobatidae) by a new transposable element unique to the gibbons. Genome Biol. Evol. 4, 648–658 (2012)
  13. Luan, D. D., Korman, M. H., Jakubczak, J. L. & Eickbush, T. H. Reverse transcription of R2Bm RNA is primed by a nick at the chromosomal target site: a mechanism for non-LTR retrotransposition. Cell 72, 595–605 (1993)
  14. Huang, D. W., Sherman, B. T. & Lempicki, R. A. Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nature Protocols 4, 44–57 (2009)
  15. Mostafavi, S., Ray, D., Warde-Farley, D., Grouios, C. & Morris, Q. GeneMANIA: a real-time multiple association network integration algorithm for predicting gene function. Genome Biol. 9 (Suppl. 1), S4 (2008)
  16. Kamburov, A., Wierling, C., Lehrach, H. & Herwig, R. ConsensusPathDB—a database for integrating human functional interaction networks. Nucleic Acids Res. 37, D623–D628 (2009)
  17. Baker, D. J., Jin, F., Jeganathan, K. B. & van Deursen, J. M. Whole chromosome instability caused by Bub1 insufficiency drives tumorigenesis through tumor suppressor gene loss of heterozygosity. Cancer Cell 16, 475–486 (2009)
  18. Samora, C. P. et al. MAP4 and CLASP1 operate as a safety mechanism to maintain a stable spindle position in mitosis. Nature Cell Biol. 13, 1040–1050 (2011)
  19. Leber, B. et al. Proteins required for centrosome clustering in cancer cells. Sci. Transl. Med. 2, 33ra38 (2010)
  20. Schuyler, S. C., Wu, Y. F. & Kuan, V. J. The Mad1–Mad2 balancing act—a damaged spindle checkpoint in chromosome instability and cancer. J. Cell Sci. 125, 4197–4206 (2012)
  21. Maia, A. R. et al. Cdk1 and Plk1 mediate a CLASP2 phospho-switch that stabilizes kinetochore-microtubule attachments. J. Cell Biol. 199, 285–301 (2012)
  22. Haraguchi, K., Hayashi, T., Jimbo, T., Yamamoto, T. & Akiyama, T. Role of the kinesin-2 family protein, KIF3, during mitosis. J. Biol. Chem. 281, 4094–4099 (2006)
  23. Han, J. S., Szak, S. T. & Boeke, J. D. Transcriptional disruption by the L1 retrotransposon and implications for mammalian transcriptomes. Nature 429, 268–274 (2004)
  24. Wheelan, S. J., Aizawa, Y., Han, J. S. & Boeke, J. D. Gene-breaking: a new paradigm for human retrotransposon-mediated gene evolution. Genome Res. 15, 1073–1078 (2005)
  25. Damert, A. et al. 5′-Transducing SVA retrotransposon groups spread efficiently throughout the human genome. Genome Res. 19, 1992–2008 (2009)
  26. Wojtasz, L. et al. Meiotic DNA double-strand breaks and chromosome asynapsis in mice are monitored by distinct HORMAD2-independent and -dependent mechanisms. Genes Dev. 26, 958–973 (2012)
  27. Marchani, E. E., Xing, J., Witherspoon, D. J., Jorde, L. B. & Rogers, A. R. Estimating the age of retrotransposon subfamilies using maximum likelihood. Genomics 94, 78–82 (2009)
  28. Gronau, I., Hubisz, M. J., Gulko, B., Danko, C. G. & Siepel, A. Bayesian inference of ancient human demography from individual genome sequences. Nature Genet. 43, 1031–1034 (2011)
  29. Wall, J. D. et al. Incomplete lineage sorting is common in extant gibbon genera. PLoS ONE 8, e53682 (2013)
  30. Drummond, A. J. & Rambaut, A. BEAST: Bayesian evolutionary analysis by sampling trees. BMC Evol. Biol. 7, 214 (2007)
  31. Durand, E. Y., Patterson, N., Reich, D. & Slatkin, M. Testing for ancient admixture between closely related populations. Mol. Biol. Evol. 28, 2239–2252 (2011)
  32. Hirai, H., Hirai, Y., Domae, H. & Kirihara, Y. A most distant intergeneric hybrid offspring (Larcon) of lesser apes, Nomascus leucogenys and Hylobates lar. Hum. Genet. 122, 477–483 (2007)
  33. Li, H. & Durbin, R. Inference of human population history from individual whole-genome sequences. Nature 475, 493–496 (2011)
  34. Prabhakar, S. et al. Human-specific gain of function in a developmental enhancer. Science 321, 1346–1350 (2008)
  35. Pollard, K. S. et al. An RNA gene expressed during cortical development evolved rapidly in humans. Nature 443, 167–172 (2006)
  36. Yang, Z. PAML 4: phylogenetic analysis by maximum likelihood. Mol. Biol. Evol. 24, 1586–1591 (2007)
  37. Michilsens, F., Vereecke, E. E., D’Août, K. & Aerts, P. Functional anatomy of the gibbon forelimb: adaptations to a brachiating lifestyle. J. Anat. 215, 335–354 (2009)
  38. Browne, M. L. et al. Evaluation of genes involved in limb development, angiogenesis, and coagulation as risk factors for congenital limb deficiencies. Am. J. Med. Genet. A 158A, 2463–2472 (2012)
  39. Marini, J. C. et al. Consortium for osteogenesis imperfecta mutations in the helical domain of type I collagen: regions rich in lethal mutations align with collagen binding sites for integrins and proteoglycans. Hum. Mutat. 28, 209–221 (2007)
  40. Masuda, A. et al. hnRNP H enhances skipping of a nonfunctional exon P3A in CHRNA1 and a mutation disrupting its binding causes congenital myasthenic syndrome. Hum. Mol. Genet. 17, 4022–4035 (2008)
  41. Hessle, L. et al. The skeletal phenotype of chondroadherin deficient mice. PLoS ONE 8, e63080 (2013)
  42. Cane, M. A. & Molnar, P. Closing of the Indonesian seaway as a precursor to east African aridification around 3–4 million years ago. Nature 411, 157–162 (2001)
  43. Xu, J.-X., Ferguson, D. K., Li, C.-S. & Wang, Y.-F. Late Miocene vegetation and climate of the Lühe region in Yunnan, southwestern China. Rev. Palaeobot. Palynol. 148, 36–59 (2008)
  44. Woodruff, D. S. & Turner, L. M. The Indochinese–Sundaic zoogeographic transition: a description and analysis of terrestrial mammal species distributions. J. Biogeogr. 36, 803–821 (2009)
  45. Harvey, P. H., Martin, R. D. & Clutton-Brock, T. H. Life histories in comparative perspective. In Primate Societies (eds Smuts, B. B. et al.) 181–196 (Chicago Univ. Press, 1987)
  46. Kim, S. K. et al. Patterns of genetic variation within and between Gibbon species. Mol. Biol. Evol. 28, 2211–2218 (2011)

Author information

Affiliations

  1. Oregon Health & Science University, Department of Behavioral Neuroscience, 3181 SW Sam Jackson Park Road, Portland, Oregon 97239, USA.

    • Lucia Carbone &
    • Thomas J. Meyer
  2. Oregon National Primate Research Center, Division of Neuroscience, 505 NW 185th Avenue, Beaverton, Oregon 97006, USA.

    • Lucia Carbone,
    • Kimberly A. Nevonen,
    • Elizabeth Terhune &
    • Larry J. Wilhelm
  3. Oregon Health & Science University, Department of Molecular & Medical Genetics, 3181 SW Sam Jackson Park Road, Portland, Oregon 97239, USA.

    • Lucia Carbone
  4. Oregon Health & Science University, Bioinformatics and Computational Biology Division, Department of Medical Informatics & Clinical Epidemiology, 3181 SW Sam Jackson Park Road, Portland, Oregon 97239, USA.

    • Lucia Carbone,
    • Nathan H. Lazar &
    • Kemal Sonmez
  5. Baylor College of Medicine, Department of Molecular and Human Genetics, One Baylor Plaza, Houston, Texas 77030, USA.

    • R. Alan Harris
  6. Nabsys, 60 Clifford Street, Providence, Rhode Island 02903, USA.

    • Sante Gnerre
  7. University of Arizona, ARL Division of Biotechnology, Tucson, Arizona 85721, USA.

    • Krishna R. Veeramah,
    • Laurel M. Johnstone,
    • Fernando L. Mendez,
    • August E. Woerner &
    • Michael F. Hammer
  8. Stony Brook University, Department of Ecology and Evolution, Stony Brook, New York 11790, USA.

    • Krishna R. Veeramah
  9. IBE, Institut de Biologia Evolutiva (UPF-CSIC), Universitat Pompeu Fabra, PRBB, Doctor Aiguader, 88, 08003 Barcelona, Spain.

    • Belen Lorente-Galdos,
    • Marcos Fernandez-Callejo,
    • Jessica Hernandez-Rodriguez,
    • Javier Quilez &
    • Tomas Marques-Bonet
  10. Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195, USA.

    • John Huddleston,
    • Carl Baker &
    • Evan E. Eichler
  11. Howard Hughes Medical Institute, 1705 NE Pacific Street, Seattle, Washington 98195, USA.

    • John Huddleston &
    • Evan E. Eichler
  12. European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK.

    • Javier Herrero,
    • Bronwen Aken,
    • Daniel Barrell,
    • Kathryn Beal,
    • Paul Flicek &
    • Matthieu Muffato
  13. The Genome Analysis Centre, Norwich Research Park, Norwich NR4 7UH, UK.

    • Javier Herrero
  14. Leibniz Institute for Primate Research, Gene Bank of Primates, German Primate Center, Göttingen 37077, Germany.

    • Christian Roos,
    • Markus Brameier &
    • Lutz Walter
  15. European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK.

    • Bronwen Aken,
    • Daniel Barrell,
    • Duncan T. Odom,
    • Stephen Searle &
    • Simon White
  16. University of Bari, Department of Biology, Via Orabona 4, 70125, Bari, Italy.

    • Fabio Anaclerio,
    • Nicoletta Archidiacono,
    • Oronzo Capozzi,
    • Giorgia Chiatante,
    • Mariano Rocchi &
    • Mario Ventura
  17. Louisiana State University, Department of Biological Sciences, Baton Rouge, Louisiana 70803, USA.

    • Mark A. Batzer,
    • Miriam K. Konkel &
    • Jerilyn A. Walker
  18. University of Paul Sabatier, Toulouse 31062, France.

    • Antoine Blancher
  19. The Johns Hopkins University School of Medicine, Department of Oncology, Division of Biostatistics and Bioinformatics, Baltimore, Maryland 21205, USA.

    • Craig L. Bohrson &
    • Sarah J. Wheelan
  20. University of Utah, Salt Lake City, Utah 84112, USA.

    • Michael S. Campbell &
    • Mark Yandell
  21. Texas A&M University, Department of Ecosystem Science and Management, College Station, Texas 77843, USA.

    • Claudio Casola
  22. Human Genome Sequencing Center, Department of Molecular and Human Genetics, Baylor College of Medicine, One Baylor Plaza, Houston, Texas 77030, USA.

    • Andrew Cree,
    • Sandra L. Lee,
    • Lora R. Lewis,
    • Yue Liu,
    • Lynne V. Nazareth,
    • Donna M. Muzny,
    • Kim C. Worley,
    • Jeffrey Rogers &
    • Richard A. Gibbs
  23. Babes-Bolyai-University, Institute for Interdisciplinary Research in Bio-Nano-Sciences, Molecular Biology Center, Cluj-Napoca 400084, Romania.

    • Annette Damert,
    • Bianca Ianc &
    • Cornelia Ochis
  24. Children’s Hospital Oakland Research Institute, BACPAC Resources, Oakland, California 94609, USA.

    • Pieter J. de Jong,
    • Boudewijn ten Hallers &
    • Baoli Zhu
  25. University of Colorado School of Medicine, Department of Biochemistry and Molecular Genetics, Aurora, Colorado 80045, USA.

    • Laura Dumas,
    • Anis Karimpour-Fard,
    • Majesta O’Bleness &
    • James M. Sikela
  26. Max Delbrück Center for Molecular Medicine, Berlin 13125, Germany.

    • Nina V. Fuchs &
    • Zsuzsanna Izsvák
  27. Centro Nacional de Análisis Genómico (CNAG), Parc Científic de Barcelona, Barcelona 08028, Spain.

    • Ivo Gut,
    • Marta Gut &
    • Tomas Marques-Bonet
  28. Indiana University, School of Informatics and Computing, Bloomington, Indiana 47408, USA.

    • Matthew W. Hahn &
    • Gregg W. C. Thomas
  29. The Genome Center at Washington University, Washington University School of Medicine, 4444 Forest Park Avenue, Saint Louis, Missouri 63108, USA.

    • LaDeana W. Hillier,
    • Devin P. Locke,
    • Arian Smit,
    • Lucinda Fulton,
    • Catrina Fronick,
    • Wesley C. Warren &
    • Richard K. Wilson
  30. Institute for Systems Biology, Seattle, Washington 98109-5234, USA.

    • Robert Hubley
  31. The Pennsylvania State University, Department of Anthropology, University Park, Pennsylvania 16802, USA.

    • Nina G. Jablonski
  32. University of Pittsburgh School of Medicine, Department of Developmental Biology, Department of Computational and Systems Biology, Pittsburgh, Pennsylvania 15261, USA.

    • Dennis Kostka
  33. Harvard Medical School, Department of Genetics, Boston, Massachusetts 02115, USA.

    • Swapan Mallick &
    • David Reich
  34. University of Cambridge, Cancer Research UK-Cambridge Institute, Cambridge CB2 0RE, UK.

    • Duncan T. Odom &
    • Michelle C. Ward
  35. University of California, Gladstone Institutes, San Francisco, California 94158-226, USA.

    • Katherine S. Pollard
  36. Institute for Human Genetics, University of California, San Francisco, California 94143-0794, USA.

    • Katherine S. Pollard &
    • Jeffrey D. Wall
  37. Division of Biostatistics, University of California, San Francisco, California 94143-0794, USA.

    • Katherine S. Pollard &
    • Jeffrey D. Wall
  38. Paul Ehrlich Institute, Division of Medical Biotechnology, 63225 Langen, Germany.

    • Gerald G. Schumann
  39. Gibbon Conservation Center, 19100 Esguerra Rd, Santa Clarita, California 91350, USA.

    • Gabriella Skollar
  40. Oregon Health & Science University, Center for Spoken Language Understanding, Institute on Development and Disability, Portland, Oregon 97239, USA.

    • Kemal Sonmez &
    • Christopher W. Whelan
  41. Louisiana State University, School of Electrical Engineering and Computer Science, Baton Rouge, Louisiana 70803, USA.

    • Brygg Ullmer
  42. USTAR Center for Genetic Discovery, University of Utah, Salt Lake City, Utah 84112, USA.

  43. Present addresses: Bill Lyons Informatics Center, UCL Cancer Institute, University College London, London WC1E 6DD, UK (J.He); Seven Bridges Genomics, Cambridge, Massachusetts 02138, USA (D.P.L.); Department of Genetics, Stanford University School of Medicine, Stanford, California 94305, USA (F.L.M.); BioNano Genomics, San Diego, California 92121, USA (B.t.H.); University of Chicago, Department of Human Genetics, Chicago, Illinois 60637, USA (M.C.W.); Stanley Center for Psychiatric Research, Broad Institute, Cambridge, Massachusetts 02138, USA (C.W.W.); The CAS Key Laboratory of Pathogenic Microbiology and Immunology, Institute of Microbiology, Chinese Academy of Sciences, Beijing 100101, China (B.Z.).

    • Javier Herrero,
    • Devin P. Locke,
    • Fernando L. Mendez,
    • Boudewijn ten Hallers,
    • Michelle C. Ward,
    • Christopher W. Whelan &
    • Baoli Zhu

Contributions

L.C. led the project and the manuscript preparation. L.C., W.C.W., K.C.W., J.R., E.E.E., T.M.-B., R.A.H., K.R.V. and M.F.H. supervised the project and contributed to overall organization of the manuscript. L.C. and T.J.M. prepared the figures. Sanger data production, assembly construction and testing was carried out by: L.F., C.F., D.M.M., L.V.N., A.C., S.L.L., L.R.L., D.P.L., W.C.W., K.C.W., J.R., S.G., L.D.W.H., D.R. and S.M. Mitochondrial genome assembly was done by Y.L. Illumina sequencing production and submission: L.C., T.M.-B., J.D.W., M.F.H., E.T., L.J.W., M.G., I.G., A.B. and J.H.-R. Samples were provided by G.S. Gene set and validation of gene models: D.B., S.W., S.S., B.A., M.M., J.He., P.F., M.S.C. and M.Y. Assembly validation: B.L.-G., J.He. and T.M.-B. BAC library generation: P.J.dJ., B.tH. and B.Z. Cytogenetic analyses: M.R., N.A. and O.C. Segmental duplications and structural variations: J.Hu., C.B., B.L.-G., J.Q., M.F.-C., G.C., F.A., M.V., T.M.-B. and E.E.E. cDNA Array CGH: L.D., M.O’B., A.K.-F. and J.M.S. Comparative analysis of gibbon chromosomal rearrangements was carried out by J.He. Breakpoint analysis: L.C., C.W.W. and L.J.W. LAVA analysis: L.C., R.A.H., T.J.M., N.H.L., L.J.W., K.A.N., K.S., A.D., M.A.B., M.K.K., J.A.W., B.U., A.S. and R.H. Luciferase assay and 3′ RACE: A.D., B.I., C.O., G.G.S., N.V.F. and Z.I. RNA-seq analysis for early transcription termination: S.J.W. and C.L.B. Short-read alignments, SNP calling and population genetics analysis (autosomal DNA): L.M.J., F.L.M., A.E.W., L.J.W., K.R.V., M.F.H. and J.D.W. Population genetics analyses (mtDNA): C.R., L.W., M.B. and T.M.-B. Positive selection analyses: G.W.C.T. and M.W.H. Gene family evolution analyses: M.W.H. and C.C. Gibbon accelerated region analyses: K.S.P. and D.K. CTCF-binding analyses: M.C.W., D.T.O., P.F., E.T., C.W.W., L.J.W., J.He. and K.B. Biogeography analysis: N.G.J. and C.R. Principal investigators: R.K.W. and R.A.G.

Competing financial interests

E.E.E. is on the scientific advisory board (SAB) of DNAnexus and was an SAB member of Pacific Biosciences (2009–2013) and SynapDx (2011–2013).

Corresponding author

Correspondence to:

The N. leucogenys WGS project has been deposited in GenBank under the project accession ADFV00000000.1. All short-read data have been deposited into the Short Read Archive (http://www.ncbi.nlm.nih.gov/sra) under the accession number SRP043117. Resources for exploring the gibbon genome are available at UCSC (http://genome.ucsc.edu), Ensembl (http://ensembl.org), NCBI (http://ncbi.nlm.nih.gov), and the Baylor College of Medicine Human Genome Sequencing Center (https://www.hgsc.bcm.edu/non-human-primates/gibbon-genome-project). This paper is dedicated to the memory of Alan R. Mootnick (1951–2011).

Extended data figures and tables

Extended Data Figures

  1. Extended Data Figure 1: The gibbon assembly statistics and quality control. (453 KB)

    a, The table compares the gibbon assembly statistics to those of other primates sequenced with a similar strategy. b, The plot represents the percentage of the 10,734 single-copy gene HMMs (hidden Markov models) for which just one gene (blue) is found in the different mammalian genomes in Ensembl 70. Other HMMs match more than one gene (red). The missing HMMs (cyan) either do not match any protein or the score is within the range of what can be expected for unrelated proteins. The remaining category (green) represents HMMs for which the best matching gene scores better than unrelated proteins but not as well as expected. See Supplementary Information section 1.4 for more details.

  2. Extended Data Figure 2: Analysis of gibbon–human synteny blocks and identification and validation of gibbon segmental duplications. (352 KB)

    a, The image shows a representative gibbon-only whole-genome shotgun sequence detection (WSSD) call by Sanger read depth. The duplication identified in this case overlaps with the gene CHAD that codes for a cartilage matrix protein. b, Examples of fluorescence in situ hybridizations on gibbon metaphases using duplicated human fosmid clones that were identified by the (WGS) detection strategy (red signals). Left, interchromosomal duplication. Middle, interspersed intrachromosomal duplication. Right, intrachromosomal tandem duplication confirmed using co-hybridization with a single control probe (blue signals). c, Megabases of lineage-specific and shared duplications for primates based on GRChr37 read depth analysis. Copy-number corrected values by species are shown below.

  3. Extended Data Figure 3: Analysis of LAVA element insertion in genes and early termination of transcription. (316 KB)

    a, The histogram shows the results of permutation analyses. We find a significant association between LAVA elements and genes. Moreover, insertions are significantly enriched in introns and depleted in exons, most probably as a result of selection against insertions in exons. b, Schematic representation of the mechanism through which LAVA intronic insertions in antisense orientation might cause early termination of transcription. The truncated transcript is labelled A and the normal transcript is labelled B on the diagram (pA = polyadenylation site). c, We calculated the distance to the nearest exon for each intronic LAVA and compared this to what would be expected for random insertions (that is, background). We found fewer insertions than expected by chance within 1 kb of the nearest exon. d, Identification of pmiRGlo_LA_F polyadenylation sites by 3′ RACE. Alignment of thirteen 3′ RACE PCR clone sequences and the pmiRGlo_LA_F sequence. The LAVA_F 3′ TSD is highlighted by a dark green background; the major antisense LAVA_F polyadenylation signal (MAPS) is highlighted by a red background. The termination sites are marked with arrows on the LAVA_F sequence. Poly(A) tails of the identified transcripts are in red text.

  4. Extended Data Figure 4: Evolution of the LAVA element. (582 KB)

    a, Screenshots from the Integrative Genomics Viewer (IGV) browser for loci MAP4, RABGAP1 and BBS9. Each column shows portions of the IGV visualization of a LAVA insertion locus identified in Nleu1.0 and its flanking sequence. Red rectangles indicate the margins of each LAVA insertion. Read pairs are coloured red when their insert size is larger than expected, indicating the presence of an unshared LAVA insertion. MAP4 is a shared LAVA insertion, whereas RABGAP1 and BBS9 are Nomascus specific. b, LAVA elements containing at least 300 bp of the LA section of LAVA were selected and reanalysed using RepeatMasker to determine subfamily affiliation and divergence from the consensus sequence. LAVA elements are grouped based upon their subfamily affiliations (see legend top right for colour scheme). The x axis shows the per cent divergence from the respective consensus sequence and the y axis shows the number of elements with a certain per cent divergence from the consensus sequence.

  5. Extended Data Figure 5: Analysis of the phylogenetic relationships between gibbon genera. (340 KB)

    a, Neighbour-joining trees for gibbons using non-genic loci. b, UPGMA trees for 100 kb non-overlapping sliding windows moving along the gibbon genome reporting the top 15 topologies (see also Supplementary Table ST8.3). The percentage of total support for each topology is given within each subpanel.

  6. Extended Data Figure 6: Analysis of the relationship between gibbon accelerated regions (gibARs) and genes. (247 KB)

    a, Intergenic regions are enriched in gibARs. Different sequence types are shown on the x axis and the y axis displays the fraction of gibARs and candidate regions annotated to the respective class. gibARs are significantly enriched in intergenic regions (P = 4.7 × 10⁻⁶) and significantly depleted in exons (P = 7.3 × 10⁻⁶). P values for each class were calculated with Fisher’s exact test. Introns are comparably prevalent in candidates and gibARs, whereas in the UTR and flanking region, counts are too low to draw meaningful conclusions (data not shown). b, TreeMap from REVIGO for GOslim Biological Process terms with a Benjamini–Hochberg false discovery rate of 5%. Each rectangle is a cluster representative; larger rectangles represent ‘superclusters’ including loosely related terms. The size of the rectangles reflects the P value.

Extended Data Tables

  1. Extended Data Table 1: Genes from the ‘microtubule cytoskeleton’ GO category with LAVA insertions (193 KB)

Supplementary information

PDF files

  1. Supplementary Information (18.5 MB)

    This file contains Supplementary Sections 1-6 – see Supplementary Contents for details.

  2. Supplementary Data (1.7 MB)

    This file contains Supplementary Data 3.

  3. Supplementary Data (2 MB)

    This file contains Supplementary Data 9.

Excel files

  1. Supplementary Data (43 KB)

    This file contains Supplementary Data 1.

  2. Supplementary Data (164 KB)

    This file contains Supplementary Data 2.

  3. Supplementary Data (175 KB)

    This file contains Supplementary Data 4.

  4. Supplementary Data (1.5 MB)

    This file contains Supplementary Data 5.

  5. Supplementary Data (6.4 MB)

    This file contains Supplementary Data 7.

  6. Supplementary Data (44 KB)

    This file contains Supplementary Data 8.

Other

  1. Supplementary Data (4.9 MB)

    This file contains Supplementary Data 6.

Additional data