Cichlid fishes are famous for large, diverse and replicated adaptive radiations in the Great Lakes of East Africa. To understand the molecular mechanisms underlying cichlid phenotypic diversity, we sequenced the genomes and transcriptomes of five lineages of African cichlids: the Nile tilapia (Oreochromis niloticus), an ancestral lineage with low diversity; and four members of the East African lineage: Neolamprologus brichardi/pulcher (older radiation, Lake Tanganyika), Metriaclima zebra (recent radiation, Lake Malawi), Pundamilia nyererei (very recent radiation, Lake Victoria), and Astatotilapia burtoni (riverine species around Lake Tanganyika). We found an excess of gene duplications in the East African lineage compared to tilapia and other teleosts, an abundance of non-coding element divergence, accelerated coding sequence evolution, expression divergence associated with transposable element insertions, and regulation by novel microRNAs. In addition, we analysed sequence data from sixty individuals representing six closely related species from Lake Victoria, and show genome-wide diversifying selection on coding and regulatory variants, some of which were recruited from ancient polymorphisms. We conclude that a number of molecular mechanisms shaped East African cichlid genomes, and that amassing of standing variation during periods of relaxed purifying selection may have been important in facilitating subsequent evolutionary diversification.
References
- On the Origin of Species 6th edn (John Murray, 1859)
- Tempo and Mode in Evolution (Columbia Univ. Press, 1944)
- Adaptive evolution and explosive speciation: the cichlid fish model. Nature Rev. Genet. 5, 288–298 (2004)
- Ecological opportunity and sexual selection together predict adaptive radiation. Nature 487, 366–369 (2012)
- Morphometrics and allometry in the trophically polymorphic cichlid fish, Cichlasoma citrinellum: alternative adaptations and ontogenic changes in shape. J. Zool. 221, 237–260 (1990)
- Estimating the age of formation of lakes: an example from Lake Tanganyika, East African Rift system. Geology 21, 511–514 (1993)
- How Fast is Speciation: Molecular, Geological and Phylogenetic Evidences from Adaptive Radiations of Fish pp. 585–610 (Cambridge Univ. Press, 1997)
- Repeated colonization and hybridization in Lake Malawi cichlids. Curr. Biol. 21, R108–R109 (2011)
- Origins of shared genetic variation in African cichlids. Mol. Biol. Evol. 30, 906–917 (2013)
- Integration and evolution of the cichlid mandible: the molecular basis of alternate feeding strategies. Proc. Natl Acad. Sci. USA 102, 16287–16292 (2005)
- Adaptive phenotypic plasticity in the Midas cichlid fish pharyngeal jaw and its relevance in adaptive radiation. BMC Evol. Biol. 11, 116 (2011)
- Vision and behavior in an African cichlid fish. Am. Sci. 72, 58–65 (1984)
- The eyes have it: regulatory and structural changes both underlie cichlid visual pigment diversity. PLoS Biol. 7, e1000266 (2009)
- Intraspecific sexual selection on a speciation trait, male coloration, in the Lake Victoria cichlid Pundamilia nyererei. Proc. R. Soc. Lond. B 271, 2445–2452 (2004)
- Genetic interactions controlling sex and color establish the potential for sexual conflict in Lake Malawi cichlid fishes. Heredity 110, 239–246 (2013)
- Sexual conflict resolved by invasion of a novel sex determiner in Lake Malawi cichlid fishes. Science 326, 998–1001 (2009)
- Microhabitat use, trophic patterns, and the evolution of brain structure in African cichlids. Brain Behav. Evol. 50, 167–182 (1997)
- Competing signals drive telencephalon diversity. Nat. Commun. 4, 1745 (2013)
- The genomic basis of adaptive evolution in threespine sticklebacks. Nature 484, 55–61 (2012)
- Positive Darwinian selection drives the evolution of the morphology-related gene, EPCAM, in particularly species-rich lineages of African cichlid fishes. J. Mol. Evol. 73, 1–9 (2011)
- The evolution of the pro-domain of bone morphogenetic protein 4 (Bmp4) in an explosively speciated lineage of East African cichlid fishes. Mol. Biol. Evol. 19, 1628–1632 (2002)
- Mutational analysis of endothelin receptor b1 (rose) during neural crest and pigment pattern development in the zebrafish Danio rerio. Dev. Biol. 227, 294–306 (2000)
- The evolutionary fate and consequences of duplicate genes. Science 290, 1151–1155 (2000)
- Female preference for conspecific males based on olfactory cues in a Lake Malawi cichlid fish. Biol. Lett. 1, 411–414 (2005)
- Genome duplication, a trait shared by 22000 species of ray-finned fish. Genome Res. 13, 382–390 (2003)
- Genome duplication, divergent resolution and speciation. Trends Genet. 17, 299–301 (2001)
- Retroelement distributions in the human genome: variations associated with age and proximity to genes. Genome Res. 12, 1483–1495 (2002)
- Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res. 15, 1034–1050 (2005)
- Evolution of microRNA diversity and regulation in animals. Nature Rev. Genet. 12, 846–860 (2011)
- Tissue-dependent paired expression of miRNAs. Nucleic Acids Res. 35, 5944–5953 (2007)
- Phylogeny of the Lake Tanganyika cichlid species flock and its relationship to the Central and East African haplochromine cichlid fish faunas. Syst. Biol. 51, 113–135 (2002)
- Age of cichlids: New dates for ancient lake fish radiations. Mol. Biol. Evol. 24, 1269–1282 (2007)
- Insights into hominid evolution from the gorilla genome sequence. Nature 483, 169–175 (2012)
- Late Pleistocene desiccation of Lake Victoria and rapid evolution of cichlid fishes. Science 273, 1091–1093 (1996)
- Genome-wide RAD sequence data provides unprecedented resolution of species boundaries and relationships in the Lake Victoria cichlid adaptive radiation. Mol. Ecol. 22, 787–798 (2012)
- Adaptation from standing genetic variation. Trends Ecol. Evol. 23, 38–44 (2008)
- The loci of evolution: how predictable is genetic evolution? Evolution 62, 2155–2177 (2008)
- Molecular genetics of steroid 5 alpha-reductase 2 deficiency. J. Clin. Invest. 90, 799–809 (1992)
- Evolutionary divergence in replicate pairs of ecotypes of Lake Victoria cichlid fish. Evol. Ecol. Res. 14, 381–401 (2012)
- The first inner loop of endothelin receptor type B is necessary for specific coupling to Gα13. J. Biol. Chem. 278, 2384–2387 (2003)
- Palmitoylation of human endothelinB. Its critical role in G protein coupling and a differential requirement for the cytoplasmic tail by G protein subtypes. J. Biol. Chem. 272, 21589–21596 (1997)
- A melanocortin 1 receptor allele suggests varying pigmentation among Neanderthals. Science 318, 1453–1455 (2007)
Extended data figures and tables
Extended Data Figures
- Extended Data Figure 1: Genome assembly and evolutionary rates.
a, Genome assembly and annotation. b, Genome-wide dN/dS. Rates are calculated from 20 resampled sets of 200 orthologous genes. Gene annotations from interspecies projections (see Methods in Supplementary Information) were excluded from the data set.
- Extended Data Figure 2: Rapid evolution of EDNRB1.
a, Alignments of EDNRB1 in cichlids with human (HS), zebrafish (DAR) and medaka (ORL). The black star denotes a site shown to be required to activate SRF in human by interacting with the G protein G13 (ref. 40). The red star denotes a site that may affect the anchoring of the C terminus of EDNRB1 to the transmembrane domain (ref. 41). Highlighted are amino acid substitutions in the ancestor of haplochromines and Lamprologini (blue) and in the ancestor of haplochromines (red). b, Location of substitutions on a seven-transmembrane-domain representation (adapted from ref. 42, Science 318, 1453–1455; reprinted with permission from AAAS). c, Sites (spheres) on the structure of the human kappa opioid receptor in complex (PDB 4DJH). Only the right homodimer is annotated.
- Extended Data Figure 3: Duplication in the cichlid genomes.
a, The number of recently duplicated genomic regions identified by the read depth method in the five East African cichlid genomes. Numbers in red are the numbers of duplicated genes; numbers in black are the corresponding branch lengths. b, Summary of duplication regions in the five African cichlid genomes. c, Venn diagram of the duplicated genes detected by aCGH across cichlid species relative to O. niloticus. d, Expression patterns of duplicate genes. The matrix represents the expression level of retained duplicate genes from the cichlid common ancestor in the specified tissues. Expression is shown as an inverse logit function of log2-transformed, relative sequence fragment numbers. Uncoloured fields designate missing expression data or absence of either of the paralogue copies in the annotation set.
- Extended Data Figure 4: Cichlid OR and TAAR genes.
a, Cichlid OR and TAAR genes identified in this study. b, PHYML tree based on the fish TAAR and OR amino acid sequences. A phylogenetic tree was constructed with all cichlid OR and TAAR proteins identified in this study (n = 503 + 119) plus 229 OR and 173 TAAR genes identified in zebrafish, fugu, tetraodon, medaka and stickleback. The amino acid sequences were aligned with MAFFT version 7, and a tree was constructed with PHYML and visualized with FigTree (version 1.3.1). The TAAR branches are in pink and the OR branches in blue or yellow. Colours indicate the composition of the branches. Dark blue branches are made of cichlid OR proteins only; light blue indicates the presence of both cichlid and model fish OR proteins. Yellow indicates branches made of model fish OR proteins only. Light pink branches correspond to cichlid and model fish TAAR proteins and dark pink to model fish TAAR proteins only. Letters correspond to family names.
- Extended Data Figure 5: Comparison of TEs among cichlids and other vertebrate genomes.
a, Repeat content of selected vertebrate genomes (table). b, Proportions of TEs in the genomes. c, Proportions of each TE class among all TEs. The TE proportions are much lower in the cichlid genomes than in the zebrafish genome.
- Extended Data Figure 6: A comparison of TEs in the African cichlid and medaka genomes.
a–f, The x-axis indicates a specific TE family at a given divergence from the consensus sequence and y-axis indicates its percentage of the genome.
- Extended Data Figure 7: Cichlid transposable elements.
a, Association between TE insertions and gene expression levels of orthologous genes. All five cichlids are merged into the same data set. Groupings are based on whether one gene copy lies within 20 kb upstream or downstream of a TE. b, Orientation bias of transposable elements within or near non-duplicated genes. TE orientation bias in intron sequence of five cichlid species. Bias is shown as log2(sense/antisense) of TE counts. c, Orientation bias of transposable elements in introns of protein-coding genes. The x-axis denotes the maximum age of the TEs as divergence from the consensus sequence. The y-axis shows the proportion of TE insertions in the sense of transcription. Data points with large confidence intervals (exceeding the display range) are omitted. d, Orientation bias of LINE insertions in introns, in windows 4% divergence wide, in O. niloticus, N. brichardi and combined haplochromines. The proportion of sense-oriented LINEs in introns is shown on the y-axis. Age is shown on the x-axis as percent divergence from the TE consensus.
- Extended Data Figure 8: Reporter gene expression of a selected O. niloticus hCNE–P. nyererei aCNE pair in transgenic zebrafish.
a, O. niloticus Pbx1a locus showing the conservation track and alignment of an hCNE (LG18.20714) in O. latipes and East African cichlids. b, Reporter gene expression in G1 transgenic zebrafish at 72 hours post-fertilization (hpf). Expression is shown for the hCNE in O. niloticus and the corresponding aCNE in P. nyererei. The P. nyererei aCNE also shows expression in circulating blood cells.
- Extended Data Figure 9: Reporter gene expression of selected hCNE–aCNE pairs in transgenic zebrafish (G0).
a, b, Comparison of expression pattern driven by O. niloticus and N. brichardi aCNE #911 (UNCX locus) in zebrafish embryos at 72 hours post-fertilization (hpf). c, d, Comparison of expression pattern driven by tilapia and N. brichardi aCNE #7012 (SERPINH1 locus) in 72 hpf zebrafish embryos. e, f, Comparison of expression pattern driven by tilapia and N. brichardi aCNE #1649 (TBX2 locus) in 72 hpf zebrafish embryos. g, h, Comparison of expression pattern driven by tilapia and M. zebra aCNE #26432 (FOXP4 locus) in 72 hpf zebrafish embryos. i, j, Comparison of expression pattern driven by tilapia and A. burtoni aCNE #5509 (PROX1 locus) in 72 hpf zebrafish embryos.
- Extended Data Figure 10: Cichlid microRNAs.
a, Novelty in microRNAs mapped on the phylogenetic tree of the five cichlid species. b–g, Complementary expression of the novel cichlid miRNA mir-10032 (c, e, g) and its predicted target gene neurod2 (b, d, f) in stage 23 (9–10 days post-fertilization) Metriaclima zebra embryos. d, e are 18-μm sagittal sections. In d and e, arrows point to expression in the medulla (left), cerebellum (middle) and optic tectum (right). neurod2 is expressed in the neural tube (b, f), while mir-10032 is expressed in the surrounding somites (c, g). In all panels, anterior is to the right.
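The display transforms named in the legends of Extended Data Figures 3d and 7b, c can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: the function names are hypothetical, the inputs are assumed to be plain counts or relative fragment numbers, and the Wilson score interval is an assumption, since the legend does not state how the confidence intervals were computed.

```python
import math


def inverse_logit(x: float) -> float:
    """Map a real value into (0, 1); written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def expression_display_value(relative_fragments: float) -> float:
    """Inverse logit of the log2-transformed relative fragment number (wording of Fig. 3d)."""
    if relative_fragments <= 0:
        raise ValueError("relative fragment number must be positive")
    return inverse_logit(math.log2(relative_fragments))


def orientation_bias(sense: int, antisense: int) -> float:
    """log2(sense/antisense) of TE counts (wording of Fig. 7b); 0 means no bias."""
    if sense <= 0 or antisense <= 0:
        raise ValueError("both counts must be positive")
    return math.log2(sense / antisense)


def sense_proportion_ci(sense: int, antisense: int, z: float = 1.96):
    """Proportion of sense-oriented insertions with a Wilson score interval.

    The interval method is an assumption made for this sketch.
    """
    n = sense + antisense
    if n <= 0:
        raise ValueError("need at least one insertion")
    p = sense / n
    denom = 1.0 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return p, centre - half, centre + half
```

With these definitions, equal sense and antisense counts give a bias of 0 and a proportion of 0.5, and a relative fragment number of 1 maps to a display value of 0.5.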