The genomic substrate for adaptive radiation in African cichlid fish

Journal name: Nature
Volume: 513
Pages: 375–381
DOI: 10.1038/nature13726

Abstract

Cichlid fishes are famous for large, diverse and replicated adaptive radiations in the Great Lakes of East Africa. To understand the molecular mechanisms underlying cichlid phenotypic diversity, we sequenced the genomes and transcriptomes of five lineages of African cichlids: the Nile tilapia (Oreochromis niloticus), an ancestral lineage with low diversity; and four members of the East African lineage: Neolamprologus brichardi/pulcher (older radiation, Lake Tanganyika), Metriaclima zebra (recent radiation, Lake Malawi), Pundamilia nyererei (very recent radiation, Lake Victoria), and Astatotilapia burtoni (riverine species around Lake Tanganyika). We found an excess of gene duplications in the East African lineage compared to tilapia and other teleosts, an abundance of non-coding element divergence, accelerated coding sequence evolution, expression divergence associated with transposable element insertions, and regulation by novel microRNAs. In addition, we analysed sequence data from sixty individuals representing six closely related species from Lake Victoria, and show genome-wide diversifying selection on coding and regulatory variants, some of which were recruited from ancient polymorphisms. We conclude that a number of molecular mechanisms shaped East African cichlid genomes, and that amassing of standing variation during periods of relaxed purifying selection may have been important in facilitating subsequent evolutionary diversification.


Figures

    Figure 1: The adaptive radiation of African cichlid fish.

    Top left, map of Africa showing lakes in which cichlid fish have radiated. Right, the five sequenced species: Pundamilia nyererei (endemic to Lake Victoria); Neolamprologus brichardi (endemic to Lake Tanganyika); Metriaclima zebra (endemic to Lake Malawi); Oreochromis niloticus (from rivers across northern Africa); Astatotilapia burtoni (from rivers connected to Lake Tanganyika). Major ecotypes are shown from each lake: a, pelagic zooplanktivore; b, rock-dwelling algae scraper; c, paedophage (absent from Lake Tanganyika); d, scale eater; e, snail crusher; f, reef-dwelling planktivore; g, lobe-lipped insect eater; h, pelagic piscivore; i, ancestral river-dweller also found in lakes (absent from Lake Tanganyika). Bottom left, phylogenetic tree illustrating relationships between the five sequenced species (red), major adaptive radiations and major river lineages. The tree is from ref. 4, pruned to the major lineages. Upper timescale from ref. 4, lower timescale from ref. 32. Photos by Ad Konings (Tanganyika a, b, d, e, g, h; Malawi a, c, d, e, f, g, h, i), O.S. (Victoria a–g, i; Malawi b), Frans Witte (Victoria h), W.S. (Tanganyika f), Oliver Selz (Victoria f, A. burtoni), Marcel Haesler (O. niloticus).

    Figure 2: Gene duplication in the ancestry of East African lake cichlids.

    Black numbers represent species divergence, calculated as neutral genomic divergence between the sequenced species using ~2.7 million fourfold-degenerate sites from the alignment of nine teleost genomes. This neutral substitution model suggests ~2% pairwise divergence among the three haplochromines and ~6% divergence between them and N. brichardi. Red numbers represent duplicated genes. Asterisks indicate branches excluded owing to incomplete lineage sorting in haplochromines or weak support for the consensus species tree.

    Figure 3: Novel cichlid microRNAs.

    a–f, Complementary expression of mir-10029 (b, d, f) and its predicted target gene bmpr1b (a, c, e) in stage 18 (6 days post-fertilization) Metriaclima zebra embryos. c–f are 18-μm sagittal sections. In c and d, arrows point to expression (black) or lack of expression (white) in the somites, presumptive cerebellum and optic tectum (from left to right). In e and f, arrows point to expression and lack of expression in the somites (dorsal) and the gut (ventral). In all panels, anterior is to the right.

    Figure 4: Genomic divergence stems from incomplete lineage sorting (ILS) and both old and novel coding and noncoding variation.

    a, Coalescence times and trees supporting ILS among the genomes of allopatric East African cichlid lineages, inferred with CoalHMM. The most common genealogy matches the known species tree and represents an M. zebra–P. nyererei coalescence that falls between the two speciation times, Tzn (speciation of M. zebra and P. nyererei) and Tznb (speciation of the M. zebra–P. nyererei lineage and A. burtoni). In genealogies 1 (dashed line), 2 and 3, all coalescence events are ancient and occur before time Tznb. b, Phylogenetic analysis of RAD-sequence data showing well-supported differentiation among young Lake Victoria species. The complete data set (top) renders the genus Mbipia non-monophyletic; excluding the top 1% most divergent loci (bottom) supports the monophyly of each genus. c, Genomic divergence in paired comparisons of Lake Victoria cichlids (per-site FST; black and grey alternate between chromosomes). From top: the sister species Pundamilia nyererei/P. pundamilia and Mbipia lutea/M. mbipi differ in male breeding coloration but have conserved morphology; the sister species Neochromis omnicaeruleus/N. sp. “unicuspid scraper” and the distant relatives P. pundamilia/M. mbipi and P. nyererei/M. lutea have similar coloration but differ in morphology. Red-highlighted SNPs indicate sites that are significantly divergent between colour-contrasting species but not between same-colour species. Bar plots show the proportion of SNPs in four annotation categories, exons (orange), introns (dark blue), 25-kb regions flanking genes (turquoise) and none of the above (grey), for increasing FST thresholds. In the “All sites” and “Ancient variant sites” analyses, symbols indicate an excess of SNPs in a given annotation category compared with expectations from the full data set or from all non-ancient variant sites, respectively (FDR q-values: *q < 0.05; q = 0.05) (Supplementary Information, Data Portals, Supplementary Population Genomics FTP files).

    Extended Data Fig. 1: Genome assembly and evolutionary rates.

    a, Genome assembly and annotation. b, Genome-wide dN/dS. Rates are calculated from 20 resampled sets of 200 orthologous genes. Gene annotations from interspecies projections (see Methods in Supplementary Information) were excluded from the data set.

    Extended Data Fig. 2: Rapid Evolution of EDNRB1.

    a, Alignments of EDNRB1 in cichlids with human (HS), zebrafish (DAR) and medaka (ORL). The black star denotes a site shown to be required for activating SRF in human through interaction with the G protein Gα13 (ref. 40). The red star denotes a site that may affect the anchoring of the C terminus of EDNRB1 to the transmembrane domain (ref. 41). Highlighted are amino acid substitutions in the ancestor of haplochromines and lamprologines (blue) and in the ancestor of haplochromines (red). b, Location of substitutions on a seven-transmembrane-domain representation (adapted from ref. 42, Science 318, 1453–1455; reprinted with permission from AAAS). c, Sites (spheres) on the crystal structure of the human kappa opioid receptor complex (PDB 4DJH). Only the right homodimer is annotated.

    Extended Data Fig. 3: Duplication in the cichlid genomes.

    a, Number of recently duplicated genomic regions identified by the read-depth method in the five East African cichlid genomes. Red numbers are the numbers of duplicated genes and black numbers the corresponding branch lengths. b, Summary of duplicated regions in the five African cichlid genomes. c, Venn diagram of the duplicated genes detected by aCGH across cichlid species relative to O. niloticus. d, Expression patterns of duplicate genes. The matrix represents the expression level, in the specified tissues, of duplicate genes retained from the cichlid common ancestor. Expression is shown as an inverse logit function of log2-transformed relative sequence fragment numbers. Uncoloured fields designate missing expression data or the absence of either paralogue copy from the annotation set.

    Extended Data Fig. 4: Cichlid OR and TAAR genes.

    a, Cichlid OR and TAAR genes identified in this study. b, PhyML tree based on fish TAAR and OR amino acid sequences. The phylogenetic tree was constructed with all cichlid OR and TAAR proteins identified in this study (n = 503 + 119) plus 229 OR and 173 TAAR genes identified in zebrafish, fugu, tetraodon, medaka and stickleback. Amino acid sequences were aligned with MAFFT version 7, and the tree was constructed with PhyML and visualized with FigTree (version 1.3.1). TAAR branches are in pink and OR branches in blue or yellow; colours indicate branch composition. Dark blue branches contain cichlid ORs only, and light blue branches contain both cichlid and model fish ORs. Yellow branches contain model fish ORs only. Light pink branches contain both cichlid and model fish TAARs, and dark pink branches contain model fish TAARs only. Letters correspond to family names.

    Extended Data Fig. 5: Comparison of TEs among cichlids and other vertebrate genomes.

    a, Repeat content of selected vertebrate genomes (table). b, Proportions of TEs in the genomes. c, Proportions of each TE class among all TEs. TE proportions are much lower in cichlid genomes than in zebrafish.

    Extended Data Fig. 6: A comparison of TEs in the African cichlid and medaka genomes.

    a–f, For each TE family, the x-axis indicates divergence from the consensus sequence and the y-axis the percentage of the genome occupied by copies at that divergence.

    Extended Data Fig. 7: Cichlid transposable elements.

    a, Association between TE insertions and expression levels of orthologous genes. All five cichlids are merged into a single data set. Groupings are based on whether one gene copy lies within 20 kb upstream or downstream of a TE. b, Orientation bias of transposable elements within or near non-duplicated genes: TE orientation bias in intron sequences of the five cichlid species, shown as log2(sense/antisense) of TE counts. c, Orientation bias of transposable elements in introns of protein-coding genes. The x-axis denotes the maximum age of the TEs as divergence from the consensus sequence; the y-axis shows the proportion of TE insertions in the sense orientation of transcription. Data points with confidence intervals exceeding the display range are omitted. d, Orientation bias of LINE insertions in introns, in windows 4% divergence wide, in O. niloticus, N. brichardi and the combined haplochromines. The y-axis shows the proportion of sense-oriented LINEs in introns; the x-axis shows age as percent divergence from the TE consensus.

    Extended Data Fig. 8: Reporter gene expression of a selected O. niloticus hCNE–P. nyererei aCNE pair in transgenic zebrafish.

    a, O. niloticus Pbx1a locus showing the conservation track and the alignment of an hCNE (LG18.20714) in O. latipes and East African cichlids. b, Reporter gene expression in G1 transgenic zebrafish at 72 hours post-fertilization (hpf). Expression is shown for the hCNE in O. niloticus and the corresponding aCNE in P. nyererei. The P. nyererei aCNE also shows expression in circulating blood cells.

    Extended Data Fig. 9: Reporter gene expression of selected hCNE–aCNE pairs in transgenic zebrafish (G0).

    a, b, Comparison of expression patterns driven by the O. niloticus (tilapia) hCNE and N. brichardi aCNE #911 (UNCX locus) in zebrafish embryos at 72 hours post-fertilization (hpf). c, d, Comparison of expression patterns driven by the tilapia hCNE and N. brichardi aCNE #7012 (SERPINH1 locus) in 72 hpf zebrafish embryos. e, f, Comparison of expression patterns driven by the tilapia hCNE and N. brichardi aCNE #1649 (TBX2 locus) in 72 hpf zebrafish embryos. g, h, Comparison of expression patterns driven by the tilapia hCNE and M. zebra aCNE #26432 (FOXP4 locus) in 72 hpf zebrafish embryos. i, j, Comparison of expression patterns driven by the tilapia hCNE and A. burtoni aCNE #5509 (PROX1 locus) in 72 hpf zebrafish embryos.

    Extended Data Fig. 10: Cichlid microRNAs.

    a, Novelty in microRNAs mapped onto the phylogenetic tree of the five cichlid species. b–g, Complementary expression of the novel cichlid miRNA mir-10032 (c, e, g) and its predicted target gene neurod2 (b, d, f) in stage 23 (9–10 days post-fertilization) Metriaclima zebra embryos. d and e are 18-μm sagittal sections. In d and e, arrows point to expression in the medulla (left), cerebellum (middle) and optic tectum (right). neurod2 is expressed in the neural tube (b, f), whereas mir-10032 is expressed in the surrounding somites (c, g). In all panels, anterior is to the right.
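Figure 2 reports neutral divergence estimated from fourfold-degenerate sites. The sketch below is illustrative only: it computes an uncorrected pairwise divergence (p-distance) over aligned sites, assuming the fourfold-degenerate columns have already been extracted into equal-length strings. The paper's neutral substitution model is not reproduced, and the function name `p_distance` and the toy sequences are hypothetical.

```python
def p_distance(seq_a: str, seq_b: str) -> float:
    """Fraction of differing sites between two aligned sequences.

    Columns where either sequence has a gap ('-') or an ambiguous base ('N')
    are skipped, so only sites comparable in both sequences are counted.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    different = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "-N" or b in "-N":
            continue  # site not informative for this pair
        compared += 1
        if a != b:
            different += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return different / compared


# Toy fourfold-degenerate sites (hypothetical data, not from the study).
site_a = "ACGTACGTAC-GTACGTACG"
site_b = "ACGTACGTACNGTACGTACC"
print(f"{p_distance(site_a, site_b):.3f}")
```

A substitution model (for example Jukes–Cantor) would correct p for multiple hits at the same site; at divergences of a few per cent the correction is small.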
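Figure 3 and Extended Data Fig. 10 pair novel microRNAs with predicted target genes (bmpr1b, neurod2). The prediction method is not described in these captions; a common approach, shown here purely as an assumption, scans 3′ UTRs for sites complementary to the miRNA seed (nucleotides 2–8, a "7mer-m8" site). The miRNA and UTR sequences below are hypothetical and are not mir-10029, mir-10032 or their targets.

```python
# Watson-Crick pairing for RNA bases.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def seed_site(mirna: str) -> str:
    """Target-site motif complementary to miRNA positions 2-8 (7mer-m8 site)."""
    seed = mirna.upper().replace("T", "U")[1:8]  # 1-based positions 2..8
    # The target pairs antiparallel with the miRNA, so reverse the complement.
    return "".join(COMPLEMENT[base] for base in reversed(seed))


def find_seed_matches(mirna: str, utr: str) -> list[int]:
    """0-based start positions of seed-complementary sites in a 3' UTR."""
    site = seed_site(mirna)
    utr = utr.upper().replace("T", "U")
    return [i for i in range(len(utr) - len(site) + 1)
            if utr[i:i + len(site)] == site]


hypothetical_mirna = "UAGCACCAUUUGAAAUCAGUGUU"
hypothetical_utr = "AAAUGGUGCUAAACCCUGGUGCUG"
print(seed_site(hypothetical_mirna))
print(find_seed_matches(hypothetical_mirna, hypothetical_utr))
```

Real target predictors typically also weigh site type, conservation and sequence context; a bare seed scan over-predicts targets, which is one reason complementary in situ expression patterns such as those in Figure 3 are informative.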
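Figure 4c plots per-site FST and summarizes how SNPs above increasing FST thresholds fall into annotation categories. The caption does not name the FST estimator; as an assumption, the sketch uses a Hudson-type per-site estimator computed from sample allele frequencies and numbers of sampled chromosomes, then tabulates category proportions for each threshold. The category labels mirror the caption; the function names and SNP records are hypothetical.

```python
from collections import Counter

# Annotation categories from the Figure 4c bar plots.
CATEGORIES = ("exon", "intron", "flank_25kb", "other")


def hudson_fst(p1: float, p2: float, n1: int, n2: int) -> float:
    """Per-site Hudson-type FST from allele frequencies p1, p2 and
    numbers of sampled chromosomes n1, n2 (each must be > 1)."""
    numerator = ((p1 - p2) ** 2
                 - p1 * (1 - p1) / (n1 - 1)
                 - p2 * (1 - p2) / (n2 - 1))
    denominator = p1 * (1 - p2) + p2 * (1 - p1)
    if denominator == 0:
        return float("nan")  # both samples fixed for the same allele
    return numerator / denominator


def category_proportions(snps, thresholds):
    """For each FST threshold, the proportion of SNPs with FST >= threshold
    in each annotation category. NaN FST values never pass a threshold."""
    result = {}
    for t in thresholds:
        counts = Counter(cat for fst, cat in snps if fst >= t)
        total = sum(counts.values())
        result[t] = {c: (counts[c] / total if total else 0.0) for c in CATEGORIES}
    return result


fixed = hudson_fst(1.0, 0.0, 20, 20)   # fixed difference between species
shared = hudson_fst(0.5, 0.5, 20, 20)  # same frequency in both species
snps = [(0.9, "exon"), (0.8, "intron"), (0.1, "other"), (0.95, "exon")]
props = category_proportions(snps, [0.0, 0.5])
```

Because sampling noise is subtracted in the numerator, sites with equal frequencies in both samples get slightly negative values; such values are expected under this kind of estimator rather than being an error.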
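Figure 4c also reports FDR q-values for annotation-category enrichment. Assuming these are Benjamini–Hochberg adjusted p-values (the caption does not name the procedure), they can be computed as in this minimal sketch; the function name and the p-values in the usage line are made up for illustration.

```python
def bh_qvalues(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    # Indices of p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvals[i])
    q = [0.0] * m
    running_min = 1.0  # also caps q-values at 1
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        q[i] = running_min
    return q


example_q = bh_qvalues([0.01, 0.04, 0.03, 0.5])
```

The running minimum guarantees that a smaller p-value never receives a larger q-value than a bigger one, which is what makes thresholds such as q < 0.05 interpretable as a false discovery rate bound.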
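Extended Data Fig. 1b summarizes genome-wide dN/dS over 20 resampled sets of 200 orthologous genes. A minimal sketch of that resampling scheme follows, assuming per-gene counts of substitutions and sites are already available, that genes are drawn without replacement within each set, and that rates are pooled across the genes of a set. All three are assumptions; the actual pipeline is described in the Supplementary Information, and `GeneCounts` is a hypothetical input format.

```python
import random
from dataclasses import dataclass


@dataclass
class GeneCounts:
    """Per-gene substitution and site counts (hypothetical input format)."""
    nonsyn_subs: float
    nonsyn_sites: float
    syn_subs: float
    syn_sites: float


def pooled_dnds(genes):
    """Pooled dN/dS: (sum Nd / sum N) / (sum Sd / sum S)."""
    dn = sum(g.nonsyn_subs for g in genes) / sum(g.nonsyn_sites for g in genes)
    ds = sum(g.syn_subs for g in genes) / sum(g.syn_sites for g in genes)
    return dn / ds


def resampled_dnds(genes, n_sets=20, set_size=200, seed=0):
    """dN/dS for n_sets random subsets of set_size genes.

    Genes are sampled without replacement within each subset; a fixed seed
    makes the resampling reproducible.
    """
    rng = random.Random(seed)
    return [pooled_dnds(rng.sample(genes, set_size)) for _ in range(n_sets)]


# Hypothetical orthologues, all with dN = 0.02 and dS = 0.08.
toy_genes = [GeneCounts(2, 100, 4, 50) for _ in range(500)]
estimates = resampled_dnds(toy_genes)
```

The spread of the 20 estimates gives a rough sense of how much the genome-wide value depends on which genes are included.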
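Extended Data Fig. 3a counts recently duplicated regions detected by a read-depth method. The caption gives no parameters, so the following is only a sketch of the general idea under assumed settings: reads are counted in fixed-size windows, windows at or above 1.5 times the genome-wide median depth are flagged, and adjacent flagged windows are merged into candidate duplications. `window_size`, `fold` and the function name are illustrative, not the study's values.

```python
from statistics import median


def depth_duplications(window_depths, window_size=1000, fold=1.5):
    """Merge consecutive windows whose read depth is >= fold x median depth.

    Returns (start, end) coordinates, end exclusive, for each merged run.
    """
    threshold = fold * median(window_depths)
    regions = []
    start = None
    for i, depth in enumerate(window_depths):
        if depth >= threshold and start is None:
            start = i  # open a new elevated-depth run
        elif depth < threshold and start is not None:
            regions.append((start * window_size, i * window_size))
            start = None
    if start is not None:  # close a run that reaches the last window
        regions.append((start * window_size, len(window_depths) * window_size))
    return regions


toy_depths = [10, 10, 10, 10, 20, 21, 10, 10, 22]  # hypothetical window depths
candidates = depth_duplications(toy_depths)
```

Using the median rather than the mean keeps the baseline from being inflated by the very duplications the method is trying to find.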
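Extended Data Fig. 3d displays expression as an inverse logit of log2-transformed relative fragment numbers. Writing r for a relative fragment count, the plotted value is 1/(1 + e^(−log2 r)), which maps r = 1 to 0.5 and keeps every value between 0 and 1. How r is normalized is not specified in the caption; the function name below is hypothetical.

```python
import math


def inverse_logit_log2(relative_fragments: float) -> float:
    """Map a relative fragment count r > 0 into (0, 1) via 1 / (1 + exp(-log2(r)))."""
    if relative_fragments <= 0:
        raise ValueError("relative fragment count must be positive")
    x = math.log2(relative_fragments)
    return 1.0 / (1.0 + math.exp(-x))
```

Because log2(1/r) = −log2(r), reciprocal values such as 2 and 0.5 map to points symmetric around 0.5, so equal fold increases and decreases look equally large in the matrix.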
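Extended Data Fig. 7b–d quantify TE orientation bias in introns, both as log2(sense/antisense) counts and as the proportion of sense-oriented insertions in windows 4% divergence wide. A minimal sketch, assuming each intronic insertion is recorded as (TE strand, host-gene strand, percent divergence from the TE consensus); the record format and function names are hypothetical.

```python
import math
from collections import defaultdict


def is_sense(te_strand: str, gene_strand: str) -> bool:
    """A TE is 'sense' when it lies on the same strand as the host gene's transcript."""
    return te_strand == gene_strand


def log2_sense_bias(insertions):
    """log2(sense / antisense) over (te_strand, gene_strand, divergence) records."""
    sense = sum(1 for te, gene, _ in insertions if is_sense(te, gene))
    antisense = len(insertions) - sense
    if sense == 0 or antisense == 0:
        raise ValueError("need at least one sense and one antisense insertion")
    return math.log2(sense / antisense)


def sense_fraction_by_age(insertions, window=4.0):
    """Fraction of sense-oriented insertions per divergence window (percent).

    Keys are window lower bounds, e.g. 0.0 covers [0, 4) percent divergence.
    """
    totals = defaultdict(int)
    senses = defaultdict(int)
    for te, gene, divergence in insertions:
        key = math.floor(divergence / window) * window
        totals[key] += 1
        senses[key] += is_sense(te, gene)  # True counts as 1
    return {k: senses[k] / totals[k] for k in sorted(totals)}


toy_insertions = [("+", "+", 1.0), ("-", "+", 2.0), ("+", "+", 5.0), ("-", "-", 6.5)]
bias = log2_sense_bias(toy_insertions)
by_age = sense_fraction_by_age(toy_insertions)
```

A value of 0 for the log2 ratio, or 0.5 for the sense fraction, corresponds to no orientation bias; binning by divergence lets one ask whether the bias differs between young and old insertions.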

References

  1. Darwin, C. On the Origin of Species 6th edn (John Murray, 1859)
  2. Simpson, G. G. Tempo and Mode in Evolution (Columbia Univ. Press, 1944)
  3. Kocher, T. D. Adaptive evolution and explosive speciation: the cichlid fish model. Nature Rev. Genet. 5, 288–298 (2004)
  4. Wagner, C. E., Harmon, L. J. & Seehausen, O. Ecological opportunity and sexual selection together predict adaptive radiation. Nature 487, 366–369 (2012)
  5. Meyer, A. Morphometrics and allometry in the trophically polymorphic cichlid fish, Cichlasoma citrinellum: alternative adaptations and ontogenic changes in shape. J. Zool. 221, 237–260 (1990)
  6. Cohen, A. S., Soreghan, M. J. & Scholz, C. A. Estimating the age of formation of lakes: an example from Lake Tanganyika, East African Rift system. Geology 21, 511–514 (1993)
  7. McCune, A. How Fast is Speciation: Molecular, Geological and Phylogenetic Evidences from Adaptive Radiations of Fish pp. 585–610 (Cambridge Univ. Press, 1997)
  8. Joyce, D. A. et al. Repeated colonization and hybridization in Lake Malawi cichlids. Curr. Biol. 21, R108–R109 (2011)
  9. Loh, Y.-H. E. et al. Origins of shared genetic variation in African cichlids. Mol. Biol. Evol. 30, 906–917 (2013)
  10. Albertson, R. C., Streelman, J. T., Kocher, T. D. & Yelick, P. C. Integration and evolution of the cichlid mandible: the molecular basis of alternate feeding strategies. Proc. Natl Acad. Sci. USA 102, 16287–16292 (2005)
  11. Muschick, M., Barluenga, M., Salzburger, W. & Meyer, A. Adaptive phenotypic plasticity in the Midas cichlid fish pharyngeal jaw and its relevance in adaptive radiation. BMC Evol. Biol. 11, 116 (2011)
  12. Fernald, R. D. Vision and behavior in an African cichlid fish. Am. Sci. 72, 58–65 (1984)
  13. Hofmann, C. M. et al. The eyes have it: regulatory and structural changes both underlie cichlid visual pigment diversity. PLoS Biol. 7, e1000266 (2009)
  14. Maan, M. E. et al. Intraspecific sexual selection on a speciation trait, male coloration, in the Lake Victoria cichlid Pundamilia nyererei. Proc. R. Soc. Lond. B 271, 2445–2452 (2004)
  15. Parnell, N. F. & Streelman, J. T. Genetic interactions controlling sex and color establish the potential for sexual conflict in Lake Malawi cichlid fishes. Heredity 110, 239–246 (2013)
  16. Roberts, R. B., Ser, J. R. & Kocher, T. D. Sexual conflict resolved by invasion of a novel sex determiner in Lake Malawi cichlid fishes. Science 326, 998–1001 (2009)
  17. Huber, R., van Staaden, M. J., Kaufman, L. S. & Liem, K. F. Microhabitat use, trophic patterns, and the evolution of brain structure in African cichlids. Brain Behav. Evol. 50, 167–182 (1997)
  18. Sylvester, J. B. et al. Competing signals drive telencephalon diversity. Nat. Commun. 4, 1745 (2013)
  19. Jones, F. C. et al. The genomic basis of adaptive evolution in threespine sticklebacks. Nature 484, 55–61 (2012)
  20. Fan, S., Elmer, K. R. & Meyer, A. Positive Darwinian selection drives the evolution of the morphology-related gene, EPCAM, in particularly species-rich lineages of African cichlid fishes. J. Mol. Evol. 73, 1–9 (2011)
  21. Terai, Y., Morikawa, N. & Okada, N. The evolution of the pro-domain of bone morphogenetic protein 4 (Bmp4) in an explosively speciated lineage of East African cichlid fishes. Mol. Biol. Evol. 19, 1628–1632 (2002)
  22. Parichy, D. M. et al. Mutational analysis of endothelin receptor b1 (rose) during neural crest and pigment pattern development in the zebrafish Danio rerio. Dev. Biol. 227, 294–306 (2000)
  23. Lynch, M. & Conery, J. S. The evolutionary fate and consequences of duplicate genes. Science 290, 1151–1155 (2000)
  24. Plenderleith, M., van Oosterhout, C., Robinson, R. L. & Turner, G. F. Female preference for conspecific males based on olfactory cues in a Lake Malawi cichlid fish. Biol. Lett. 1, 411–414 (2005)
  25. Taylor, J. S., Braasch, I., Frickey, T., Meyer, A. & Van de Peer, Y. Genome duplication, a trait shared by 22,000 species of ray-finned fish. Genome Res. 13, 382–390 (2003)
  26. Taylor, J. S., Van de Peer, Y. & Meyer, A. Genome duplication, divergent resolution and speciation. Trends Genet. 17, 299–301 (2001)
  27. Medstrand, P., van de Lagemaat, L. N. & Mager, D. L. Retroelement distributions in the human genome: variations associated with age and proximity to genes. Genome Res. 12, 1483–1495 (2002)
  28. Siepel, A. et al. Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res. 15, 1034–1050 (2005)
  29. Berezikov, E. Evolution of microRNA diversity and regulation in animals. Nature Rev. Genet. 12, 846–860 (2011)
  30. Ro, S., Park, C., Young, D., Sanders, K. M. & Yan, W. Tissue-dependent paired expression of miRNAs. Nucleic Acids Res. 35, 5944–5953 (2007)
  31. Salzburger, W., Meyer, A., Baric, S., Verheyen, E. & Sturmbauer, C. Phylogeny of the Lake Tanganyika cichlid species flock and its relationship to the Central and East African haplochromine cichlid fish faunas. Syst. Biol. 51, 113–135 (2002)
  32. Genner, M. J. et al. Age of cichlids: New dates for ancient lake fish radiations. Mol. Biol. Evol. 24, 1269–1282 (2007)
  33. Scally, A. et al. Insights into hominid evolution from the gorilla genome sequence. Nature 483, 169–175 (2012)
  34. Johnson, T. C. et al. Late Pleistocene desiccation of Lake Victoria and rapid evolution of cichlid fishes. Science 273, 1091–1093 (1996)
  35. Wagner, C. E. et al. Genome-wide RAD sequence data provides unprecedented resolution of species boundaries and relationships in the Lake Victoria cichlid adaptive radiation. Mol. Ecol. 22, 787–798 (2012)
  36. Barrett, R. D. & Schluter, D. Adaptation from standing genetic variation. Trends Ecol. Evol. 23, 38–44 (2008)
  37. Stern, D. L. & Orgogozo, V. The loci of evolution: how predictable is genetic evolution? Evolution 62, 2155–2177 (2008)
  38. Thigpen, A. E. et al. Molecular genetics of steroid 5 alpha-reductase 2 deficiency. J. Clin. Invest. 90, 799–809 (1992)
  39. Magalhaes, I. S., Lundsgaard-Hansen, B., Mwaiko, S. & Seehausen, O. Evolutionary divergence in replicate pairs of ecotypes of Lake Victoria cichlid fish. Evol. Ecol. Res. 14, 381–401 (2012)
  40. Liu, B. & Wu, D. The first inner loop of endothelin receptor type B is necessary for specific coupling to Gα13. J. Biol. Chem. 278, 2384–2387 (2003)
  41. Okamoto, Y. et al. Palmitoylation of human endothelin B. Its critical role in G protein coupling and a differential requirement for the cytoplasmic tail by G protein subtypes. J. Biol. Chem. 272, 21589–21596 (1997)
  42. Lalueza-Fox, C. et al. A melanocortin 1 receptor allele suggests varying pigmentation among Neanderthals. Science 318, 1453–1455 (2007)


Author information

  1. These authors contributed equally to this work.

    • David Brawand,
    • Catherine E. Wagner &
    • Yang I. Li

Affiliations

  1. Broad Institute of MIT and Harvard, Cambridge, Massachusetts 02142, USA

    • David Brawand,
    • Jason Turner-Maier,
    • Jeremy Johnson,
    • Hyun Ji Noh,
    • Jessica Alföldi,
    • Aaron Berlin,
    • Leslie Gaffney,
    • Sante Gnerre,
    • David B. Jaffe,
    • Marcia Lara,
    • Iain MacCallum,
    • Dariusz Przybylski,
    • Filipe J. Ribeiro,
    • Ted Sharpe,
    • Ross Swofford,
    • Louise Williams,
    • Sarah Young,
    • Shuangye Yin,
    • Eric S. Lander,
    • Kerstin Lindblad-Toh &
    • Federica Di Palma
  2. MRC Functional Genomics Unit, University of Oxford, Oxford OX1 3QX, UK

    • David Brawand,
    • Yang I. Li,
    • Wilfried Haerty,
    • Luis Sanchez-Pulido &
    • Chris P. Ponting
  3. Department of Fish Ecology and Evolution, Eawag Swiss Federal Institute of Aquatic Science and Technology, Center for Ecology, Evolution & Biogeochemistry, CH-6047 Kastanienbaum, Switzerland

    • Catherine E. Wagner,
    • Lucie Greuter,
    • Salome Mwaiko &
    • Ole Seehausen
  4. Division of Aquatic Ecology, Institute of Ecology & Evolution, University of Bern, CH-3012 Bern, Switzerland

    • Catherine E. Wagner,
    • Irene Keller,
    • Lucie Greuter &
    • Ole Seehausen
  5. Gurdon Institute, Cambridge CB2 1QN, UK

    • Milan Malinsky &
    • Eric A. Miska
  6. Wellcome Trust Sanger Institute, Hinxton CB10 1SA, UK

    • Milan Malinsky,
    • Bronwen Aken,
    • Thibaut Hourlier &
    • Steve Searle
  7. Department of Biology, University of Konstanz, D-78457 Konstanz, Germany

    • Shaohua Fan,
    • Oleg Simakov &
    • Axel Meyer
  8. European Molecular Biology Laboratory, 69117 Heidelberg, Germany

    • Oleg Simakov
  9. Institute of Molecular and Cell Biology, A*STAR, 138673 Singapore

    • Alvin Y. Ng,
    • Zhi Wei Lim,
    • Alison P. Lee &
    • Byrappa Venkatesh
  10. Department of Biology, Reed College, Portland, Oregon 97202, USA

    • Etienne Bezault &
    • Suzy C. P. Renn
  11. Biology Department, Stanford University, Stanford, California 94305-5020, USA

    • Rosa Alcazar &
    • Russell D. Fernald
  12. Division of Biology and Biological Engineering, California Institute of Technology, Pasadena, California 91125, USA

    • Pamela Russell
  13. Benaroya Research Institute at Virginia Mason, Seattle, Washington 98101, USA

    • Chris Amemiya
  14. Institut Génétique et Développement, CNRS/University of Rennes, 35043 Rennes, France

    • Naoual Azzouzi,
    • Frederique Barloy-Hubler,
    • Francis Galibert,
    • Richard Guyon &
    • Michaelle Rakotomanga
  15. CIRAD, Campus International de Baillarguet, TA B-110/A, 34398 Montpellier cedex 5, France

    • Jean-François Baroiller &
    • Helena D'Cotta
  16. School of Biology, Georgia Institute of Technology, Atlanta, Georgia 30332-0230, USA

    • Ryan Bloomquist,
    • Natalie S. Haddad &
    • J. Todd Streelman
  17. Department of Biology, University of Maryland, College Park, Maryland 20742, USA

    • Karen L. Carleton,
    • Matthew A. Conte &
    • Thomas D. Kocher
  18. Animal Genetics, Institute of Animal Science, ARO, The Volcani Center, Bet-Dagan, 50250 Israel

    • Orly Eshel,
    • Gideon Hulata &
    • Micha Ron
  19. Zoological Institute, University of Basel, CH-4051 Basel, Switzerland

    • Hugo F. Gante,
    • Walter Salzburger &
    • M. Emilia Santos
  20. Department of Integrative Biology, Center for Computational Biology and Bioinformatics; The University of Texas at Austin, Austin, Texas 78712, USA

    • Rayna M. Harris &
    • Hans A. Hofmann
  21. Department of Biological Sciences, Tokyo Institute of Technology, Tokyo, 226-8501 Yokohama, Japan

    • Masato Nikaido,
    • Hidenori Nishihara &
    • Norihiro Okada
  22. Systématique, Adaptation, Evolution, National Museum of Natural History, 75005 Paris, France

    • Catherine Ozouf-Costaz
  23. Institute of Aquaculture, University of Stirling, Stirling FK9 4LA, UK

    • David J. Penman
  24. Carnegie Institution of Washington, Department of Embryology, 3520 San Martin Drive, Baltimore, Maryland 21218, USA

    • Frederick J. Tan
  25. National Cheng Kung University, Tainan City, 704 Taiwan

    • Norihiro Okada
  26. Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, 751 23 Uppsala, Sweden

    • Kerstin Lindblad-Toh
  27. Vertebrate and Health Genomics, The Genome Analysis Centre, Norwich NR18 7UH, UK

    • Federica Di Palma

Contributions

T.D.K., R.D.F., A.M., O.S., J.T.S., K.L.C., N.O., J.-F.B., D.J.P. and H.A.H. conceived the original tilapia white paper. F.D.P., K.L.-T. and E.S.L. revised, planned and oversaw the genome project. D.J.P., W.S., H.F.G., M.E.S., O.S., K.L.C., T.D.K., G.H., O.E. and H.A.H. provided tissues and RNAs for sequencing. C.A. prepared the high-molecular-weight tilapia DNA. M.L. extracted genomic DNA for sequencing. L.W. prepared 40-kb libraries (Fosill) for Illumina sequencing. R.S. performed quality control of RNA. J.A., J.J. and F.D.P. oversaw the sequencing and assembly of genomes and transcriptomes as well as data submissions. J.T.M. and P.R. performed quality control of genome assemblies and alignments. J.M.T. performed de novo assembly of transcriptomes. M.C. performed quality control of the tilapia and M. zebra assemblies. A.B., Sa.Y., I.M., S.G., D.P., F.J.R., T.S., Sh.Y. and D.B.J. assembled the genome. F.G., R.G., M.R., J.-F.B., H.D’C. and C.O.-C. contributed to the tilapia radiation hybrid map. F.B.-H. and N.A. analysed the OR and TAAR gene families. B.A., T.H. and S.S. annotated the tilapia genome. D.B. and Y.I.L. annotated the N. brichardi and lake cichlid genomes. D.B. performed gene expression, genome evolution, gene duplication and TE insertion analyses. Y.I.L. and L.S.-P. performed quality control of RNA-seq data and assemblies, and gene evolution, incomplete lineage sorting and ancient variant analyses. S.F., Oleg S., A.M., N.O., M.N. and H.N. analysed the TE landscape of cichlid genomes. S.F., Oleg S. and A.M. performed the TE burst history analysis and analysed copy number variants using read depth. E.B. and S.C.P.R. analysed duplications by array comparative genomic hybridization (aCGH). H.A.H. and R.M.H. performed PCR to validate the transcriptome. A.Y.N., Z.W.L., A.P.L. and B.V. performed the conserved CNE analysis and functional assays of cichlid CNEs. M.M. and E.M. performed microRNA sequencing and annotation from embryos of cichlid species as well as target identification. R.A., F.J.T. and R.D.F. annotated adult brain microRNAs in A. burtoni. R.B., N.S.H. and J.T.S. performed microRNA and target gene in situ hybridization. O.S. designed and oversaw the population genomics data analysis of Lake Victoria species; L.G., S.M. and I.K. generated the data; C.E.W., I.K., H.J.N. and O.S. analysed the data. F.D.P., K.L.-T. and O.S. wrote the manuscript with input from D.B., C.E.W., Y.I.L., I.K., J.T.S., W.H. and C.P.P., as well as additional authors. L.G. assisted with figure preparation and coordination.

Competing financial interests

The authors declare no competing financial interests.

Corresponding authors

Correspondence to:

Genome assemblies and transcriptomes have been deposited in GenBank. The BioProject identifiers are as follows. Genome sequencing: PRJNA59571 (SRP004171) for O. niloticus; PRJNA60365 (SRP004799) for N. brichardi; PRJNA60367 (SRP004869) for P. nyererei; PRJNA60369 (SRP004788) for M. zebra; and PRJNA60363 (SRP004787) for A. burtoni. Transcriptome sequencing (mRNAs): PRJNA78915 for O. niloticus; PRJNA77747 for N. brichardi; PRJNA83153 for P. nyererei; PRJNA77743 for M. zebra; and PRJNA78185 for A. burtoni. Additional SRA information for each tissue can be found in the Supplementary Information. Transcriptome sequencing (microRNAs): PRJNA221867 (SRS489376) for O. niloticus; PRJNA222491 (SRS491903) for N. brichardi; PRJNA222489 (SRS491906) for P. nyererei; PRJNA221871 (SRS491904) for M. zebra; and PRJNA222490 (SRS491905) for A. burtoni. Cichlid microRNAs were deposited in miRBase.

Supplementary information

PDF files

  1. Supplementary Information (1.9 MB)

    This file contains Supplementary Text, Supplementary References and links to FTP Data Portals – see Supplementary contents for details.

  2. Supplementary Data (14.6 MB)

    This file contains Supplementary FTP Figures 1–13 and Supplementary FTP Tables 1–4.
