Genome-enabled discovery of evolutionary divergence in brains and behavior

Lake Malawi cichlid fishes exhibit extensive divergence in form and function built from a relatively small number of genetic changes. We compared the genomes of rock- and sand-dwelling species and asked which genetic variants differed among the groups. We found that 96% of differentiated variants reside in non-coding sequence but these non-coding diverged variants are evolutionarily conserved. Genome regions near differentiated variants are enriched for craniofacial, neural and behavioral categories. Following leads from genome sequence, we used rock- vs. sand-species and their hybrids to (i) delineate the push–pull roles of BMP signaling and irx1b in the specification of forebrain territories during gastrulation and (ii) reveal striking context-dependent brain gene expression during adult social behavior. Our results demonstrate how divergent genome sequences can predict differences in key evolutionary traits. We highlight the promise of evolutionary reverse genetics—the inference of phenotypic divergence from unbiased genome sequencing and then empirical validation in natural populations.


Results
The genomic signature of rock-sand divergence. We compared the genomes of 8 rock dwellers and 14 sand dwellers to uncover the genomic signature of rock-versus sand-evolutionary diversification. We aligned sequence data to a reference genome of nearly 1 gigabase 34 and identified approximately 22 million Single Nucleotide Polymorphisms (SNPs) and 200,000 Insertion-Deletions (InDels). We calculated F ST per variant, and averaged across 10 kb windows, to quantify divergence between rock and sand species. We found that 0.06% of SNPs and 0.44% of InDels are alternately fixed between rock-and sand-groups. When these divergent variants and genome regions (2.5% FDR) are mapped to linkage groups (chromosomes), it is apparent that the signature of rock-vs. sand-divergence is distributed relatively evenly across the chromosomes (Fig. 1B). Among fixed variants, 3.5% were found in coding regions and 96.5% were predicted to be non-coding; ~ 17% in intergenic regions, 38% in introns, 38% in flanking regions (within 25 kb up-or down-stream of a gene), and 3% in annotated UTRs. Rock vs. sand fixed coding variants were more likely to be missense/loss-of-function (72.6%) than silent (27.3%).
We next generated whole-genome alignments of five published cichlid reference genomes from across East Africa 35 and estimated an evolutionary conservation score for each nucleotide position. Akin to phylogenetic footprinting, this approach allows inference of function for regions that are slower to change than others due to the long-term effect of purifying selection. For both coding and non-coding portions of the genome, we found that rock-sand divergence correlates positively with evolutionary conservation scores (Fig. 1C), suggesting that differentiated rock-sand variants, including many non-coding variants, are enriched for function. 4484 genes lie within 25 kb of either an alternately fixed variant or a highly divergent 10 kb window (2.5% FDR). Pathway enrichment analysis 36 of human homologs/analogs for these genes reveals categories spanning early embryonic development, craniofacial morphogenesis, brain development, synaptic transmission and neuronal function (Supplementary Table S2). In particular, rock-sand divergent genes are enriched for GO Biological Process terms 'telencephalon development' (p < 1.7e−18), 'adult behavior' (p < 2e−14), 'synaptic plasticity' (p < 1.4e−12), 'odontogenesis' (p < 3.7e−11), 'response to BMP' (p < 3.2e−09), 'gastrulation' (p < 5.6e−06), 'face morphogenesis' (p < 8.9e−08), 'neural crest cell differentiation' (p < 4.3e−13), and 'eye development' (p < 1.3e−15). Over-represented gene families included nuclear hormone receptors (p < 3.0e−08), HOXL subclass homeoboxes (p < 1.4e−07), TALE class homeoboxes (p < 4.2e−04) and Forkhead boxes (p < 9.33e−04; for novel expression domains in cichlid foxp2 see Supplementary Fig. S2). We observed enrichment for the mouse phenotypes 'abnormal cognition' (p < 3.3e−15), 'abnormal learning and memory' (p < 2.6e−15), 'abnormal craniofacial morphology' (p < 4.9e−11) and 'abnormal social/conspecific interaction' (p < 7.1e −14). We used the list of differentiated genes to query an Allen Brain Atlas dataset that reports gene expression in hundreds of brain regions 37 . Rock-sanddivergent genes were enriched for the basomedial nucleus of the amygdala, a sub-region of the telencephalon (p adj = 0.001) that regulates fear, anxiety, and physiological responses to territorial intruders in rodents 38,39 , and has been linked to Social Anxiety Disorder in humans 40 . Finally, we queried databases of genes involved in human disease. Genes near divergent variants are significantly enriched for factors implicated in neurological disease like Autism Spectrum Disorder (SFARI 41 , Fisher's exact test p value < 2e−16) and disorders related to the neural crest 42 , (Fisher's exact test p value < 2e−16).
Given the prevalence of evolutionarily conserved, non-coding, divergent rock-sand variants and genomewide enrichment for craniofacial and neural crest biology, we examined overlap with published datasets of Figure 1. The genomic substrate for rock vs. sand evolution. (A) A maximum likelihood phylogeny of eight rock-and 14 sand-dwelling species, based on variable sites (informative SNP and InDels) identified throughout the genome. "Sand" species contain representatives of the lineages "shallow benthic, deep benthic and utaka" from Ref. 32 , while the "rock" species correspond to the "mbuna" lineage. (B) A plot of Z-F ST (F ST normalized using Fisher's Z-transformation) across the genome (ggplot2 v3.3.3https:// ggplo t2. tidyv erse. org/), plotting genomic divergence between rock-vs. sand-dwelling groups. Single nucleotide polymorphisms (SNPs) summed over 10 kb bins and insertion-deletion mutations (InDels) are shown on the same scale. Numbers along the x-axis refer to linkage groups (i.e., chromosomes) and threshold lines indicate 2.5% FDR. (C) Evolutionary conservation (PhastCons) scores were calculated for each nucleotide across the genome, subdivided by genome annotation and plotted by bins of increasing F ST . The PhastCons score for each genome category is significantly higher for increasing bins of F ST (pairwise Wilcoxon rank sum p value < 2e−16 for each pair of bins). www.nature.com/scientificreports/ mammalian neural crest and craniofacial enhancers 43,44 . These data allow us to identify craniofacial and cranial neural crest cell (CNCC) enhancers conserved between mammals and cichlids and fixed variants between rock and sand species within these conserved regulatory elements. A total of 275 craniofacial enhancer elements and 234 human CNCC enhancers are evolutionarily conserved between mammals and cichlids. We found divergent rock-sand mutations within the enhancer elements of key genes integral to CNCC specification and migration (Supplementary Table S2). Notably, from both datasets, fixed rock-sand variants were found within the enhancer region of the gene nr2f2, a nuclear receptor and master neural crest regulator 45 . Rock-sand divergent variants were similarly located within craniofacial enhancers of three genes (yap1, fat4, rere) that function in the Hippo signaling pathway, as well as within enhancers of irx3 and axin2. These data linking rock-sand fixed SNP/InDels to evolutionarily conserved, experimentally verified enhancers further underscore the importance of non-coding variation in the craniofacial evolution of rock-and sand-lineages 46 . Genome-wide divergence between rock vs. sand Malawi cichlids involves a relatively small percentage of genetic variants. Divergent variants are (a) predominantly non-coding, (b) in long-term evolutionarily conserved loci (c) enriched for genes and pathways involved in embryonic development, brain development, brain function and behavior, and craniofacial morphogenesis. Given these strong patterns of enrichment, we used the experimental power of the Malawi cichlid system to interrogate features of early development and adult behavior that differ between rock-and sand-groups.
A gastrula-stage map of forebrain diversification. Rock-vs. sand-dwelling Malawi cichlids exhibit divergence in or near genes enriched for BMP signalling, gastrulation, eye and telencephalon development, as well as the TALE (Irx) gene family. To explore the developmental consequences of this differentiation, we investigated early forebrain specification in rock-and sand-embryos, building upon our previous studies and interest in irx1b and early brain development 25,26 . During development, the complexity of the vertebrate brain is first laid out in the neural plate, a single-cell thick sheet of cells that forms between non-neural ectoderm and the germ ring at gastrulation. Irx genes act as transcriptional repressors of BMP signal in gastrulation, and function to specify the neural plate 47 . BMPs, in turn, are protective of the anterior-most region of the neural plate, which will ultimately give rise to the telencephalon, and suppress the eye field 48 . Given alternatively fixed variants in the irx1b gene, expected interactions between Irx and BMP signaling in the early embryo and known telencephalon vs. eye size differences between rock-vs. sand-species 25,26 , we examined and quantified the early activity of irx1b and BMP in rock-vs. sand-embryos.
We used a custom device to orient and image cichlid embryos in toto at gastrula and neurula stages 49 . In early gastrula (EG), irx1b (red) and BMP signal (green, PSMAD) delineate complementary dorsal and ventral domains of the embryo ( Fig. 2A). By mid-gastrula (MG), irx1b shows two expression domains, one in the posterior portion of the developing neural plate (np) and the second co-expressed with PSMAD activity around its anterior border (white arrowheads). By late gastrula (LG), the domains of irx1b expression and PSMAD activity sharpen around the leading edge of the neural plate but remain overlapping around the periphery. Notably, irx1b expression is expanded in the anterior domain of sand-dwellers (S) compared to rock-dwellers at EG and MG, and then defines the boundary of the neural plate earlier in sand-dwellers (S) in LG (arrowheads). As a consequence, BMP signal should have a longer-lasting influence on the neural plate in rock-dwelling species. Based upon manipulative experiments in zebrafish 48 , this is predicted to result in a relatively larger presumptive telencephalon and smaller eye field.
We developed a panel of rock-x sand-hybrid crosses to formally evaluate the role of irx1b in forebrain diversification. First, we used quantitative RT-PCR to measure allele-specific expression (ASE) in heterozygous rockx sand-F 2 hybrids, across the stages of gastrulation. We observed that the sand-irx1b allele was expressed at significantly higher levels (average of 2.5-fold; p = 4.5e−13; Student's t-test) and that this difference was largely confined to MG (Fig. 2B). Next, we used hybrid embryos to chart the development of the telencephalon and the eye field. Rock-x sand-F 2 hybrids, indexed for irx1b genotype, were raised to neurula and somitogenesis stages and we examined the expression of shh (which induces foxg1 and the ventral forebrain), foxg1 (a marker of the telencephalon), and rx3 (a marker of the eye field), by in situ hybridization. F 2 individuals homozygous for rock-irx1b alleles exhibited a larger and more rostral domain of shh expression, an earlier and larger domain of foxg1 and a smaller rx3 domain (Fig. 2C,D). These differences between rock-vs. sand-irx1b genotypes match expression divergence observed amongst rock-vs. sand-species 25,26 . Finally, when we compared the relative size of the telencephalon among irx1b genotypes, individuals homozygous for rock-alleles exhibited larger telencephala (p < 0.0001, Tukey's test, Supplementary Fig. S3). We conclude that genetic variants in and around the irx1b gene contribute to divergent specification of the Malawi cichlid forebrain, likely via spatial, temporal and quantitative variation in the expression of irx1b itself.
Our genome sequencing revealed a near-fixed InDel in the 3′ UTR of the Malawi cichlid irx1b gene (Supplementary Fig. S4). Rock-species possess an 85 bp insertion, compared to cichlid species from outside of the Malawi lineage. Sand-dwellers largely lack the insertion and exhibit a 6 bp deletion compared to outgroups. The insertion shows strong genetic similarity to a fragment from the Rex1 family of non-LTR retrotransposons 50 . Given the likelihood that Astatotilapia calliptera populations surrounding Lake Malawi may have seeded the Malawi evolutionary radiation and contributed to rock-and sand-dwelling lineages 21,32 , we explored the presence/ absence of this InDel in Astatotilapia samples. We found that most Astatotilapia individuals and populations had the rock-irx1b allele (the insertion), but that an individual from Chizumulu Island was fixed for the sandallele and two individuals sampled from Itupi were heterozygous. Because the 85 bp insertion in rock-species is a partial Rex1 fragment, and sand-species carry a 6 bp deletion compared to outgroups, we speculate that the current rock-and sand-divergent alleles were generated by at least two imperfect excision events of an element that invaded the genome of the Malawi + Astatotilapia ancestor. Rex1/Babar retrotransposons have been active www.nature.com/scientificreports/ in African cichlid genomes, and are known to influence gene expression when inserted in 5′ and 3′ UTRs 35 . Future experiments will determine whether this Rex1 insertion causes the differences in irx1b gene expression and forebrain specification we observed.

Genomics of divergent social challenge and opportunity. Rock-and sand-dwelling Malawi cichlids
live in strikingly different social and physical environments. Rock-dwelling males tend to be more aggressive than sand-dwellers 27 and defend territories year-round as sites for feeding and breeding. By contrast, sanddwellers are more exploratory than rock-dwellers 29 and only breeding males tend to be territorial, often building sand bowers to attract females and mitigate male-male aggression. Given these observations and genome-wide enrichment for categories related to adult behavior and social interaction, we designed an experimental paradigm to investigate brain gene expression profiles associated with divergent rock-vs. sand-social behaviors. We evaluated social challenge and opportunity amongst males using a large tank with a 'rock' habitat at one end and 'sand' at the other, separated by glass bottom (Fig. 3A). When parental rock-species are placed in this tank paradigm, males court females on the rock side of the tank. Males of sand-species court females over sand and construct species-appropriate bowers. When single hybrid rock-x sand-F 1 males are placed in this arena with hybrid F 1 females, males invariably court females over the 'rock' habitat. However, when two rock-x sandhybrid F 1 males (brothers) were allowed to compete for gravid hybrid F 1 females in this tank paradigm, we observed something different. One male, typically the larger, courted females over the rock habitat, and the other simultaneously constructed bowers to court females over the sand. We found no difference in gonadal-somatic index (GSI), an established biological metric of reproductive status and maturity 51 , between F 1 males behaving as 'socially rock' vs. 'socially sand. ' (Supplementary Fig. S5). Our observation of divergent behavior between F 1 brothers in the same tank suggests an interaction between the genome and the social environment.
We used RNA-seq to investigate gene expression profiles associated with behavior of rock-x sand-F 1 hybrid males that were actively courting females over rock vs. sand. Whole brains of F 1 males tested singly (n = 2 lone) as well as F 1 brothers assayed in dyads (n = 4 dyads) were collected during courtship, and interrogated by RNA-seq. Strikingly, gene expression profiles clustered not by fraternal relatedness, but rather by behavioral context (Fig. 3B). Males from dyads that courted females over rocks had expression profiles similar to single males (who also courted over rocks) but distinct from their brothers that built bowers and courted females over sand in the same tank. Genes were considered significantly differentially expressed between 'social rock' and 'social sand' brains if they exhibited both a fold change ≥ 2 and crossed the threshold of p adj < 0.05. Based on this criterion, we found 832 genes differentially expressed between rock-vs. sand-behaving males (Fig. 3B, Supplementary Table S3). Among differentially expressed genes, we observed significant functional enrichment for GO Biological Process categories 'synaptic signaling' (p < 2.3e−21), 'synaptic plasticity' (p < 3.6e−09), 'visual behavior' (p < 2.09e−06); mouse phenotypes 'abnormal learning/memory/conditioning' (p < 5.9e−07), 'abnormal telencephalon morphology' (p < 3.95e−07), 'abnormal spatial learning' (p < 9.9e−07) and pathways 'axon guidance' (p < 3.3e−05), 'oxytocin signaling' (p < 5.2e−05) and 'estrogen signaling' (p < 1.8e−04) (Supplementary Table S3). Matches against the Allen Brain Atlas database of gene expression yielded enrichment for sub-regions of the telencephalon: CA3, hippocampus (p adj = 4.6e−6), CA2, hippocampus (p adj = 7.3e−5), CA4, hippocampus (p adj = 0.001), claustrum (p adj = 0.001), subiculum (p adj = 0.003), dentate gyrus (p adj = 0.03) and the basomedial nucleus of the amygdala (p adj = 0.03). The hippocampus encodes episodic memory and spatial representations of the environment 52 , and more recently its subregions have been shown to play critical roles in anxiety, social interaction, and social memory formation [53][54][55] . Roughly 38% of differentially expressed genes also contained genetically differentiated SNP/InDels between rock-and sand-species (p-value < 2e−6, Fisher's exact test), implying considerable cis-acting genetic variation. Enrichment of numerous categories related to brain function and synaptic plasticity showed greater overlap than expected (Fig. 3C). These context-dependent differences suggest concerted changes in brain gene expression as males experienced and responded to different social challenges and opportunities 28,56 .

Discussion
A fundamental problem in evolutionary biology is understanding the cellular, developmental and genetic basis of how traits change. This is a challenge because we lack sufficient information about how genes work in outbred genomes from nature and we do not fully comprehend the causal role of noncoding variation in specifying form and function. This problem is especially difficult for traits that are only observed in particular contexts, like development and behaviour. To make progress, we and others have focused on study systems exhibiting abundant phenotypic diversity built from a relatively small number of genetic changes. Here we identify and characterize the genetic variants that demarcate one of the deepest evolutionary splits amongst Lake Malawi cichlid groups, that between rock-and sand-dwelling species thought to have diverged in the last one million years. We found a small percentage (less than 0.1%) of genetic variants to be differentiated between rock-and sand-groups, and that the majority of differentiated variants (> 96%) were noncoding. Differentiated non-coding variants were more likely to be in an evolutionarily conserved locus as a function of genetic differentiation, suggesting that divergent rock-vs. sand-noncoding changes are functional. To support this idea, we identified alternately fixed rock-vs. sand-noncoding variants within experimentally verified, vertebrate-conserved craniofacial and cranial neural crest cell enhancers. The latter observation is similar in type to the discovery of human-specific deletions within mammal-conserved regulatory sequence 18 .
Recently we surveyed genome-wide divergence between sand-dweller sub-groups that construct pit vs. castle bowers, sand-made structures to attract females for mating 28 . Mapping those variants to the same genome reference, we expected distinct patterns of diversification because rock-sand and pit-castle divergence likely occurred at different times, along different trait axes, under the control of different evolutionary forces 31 Fig. S6), while all chromosomes carry the signature of rock-sand diversification (Fig. 1B) Table S4). This result may imply that evolutionary diversification in Lake Malawi is limited, or constrained, by chromosomal location.
Overall, genes in proximity to rock-sand divergent variants were enriched for functional categories related to early forebrain and craniofacial development, neuronal function and social behavior. This list of variants, coupled with consistent patterns of functional and pathway enrichment, motivated follow up experiments focused on early brain development and adult social behavior (Fig. 4). It is apparent from our work here and previously 25,26 , www.nature.com/scientificreports/ that Malawi cichlid brains and nervous systems begin to differ during gastrulation in pathways that can be predicted from divergent genome sequences. This is interesting for at least two reasons. First, this observation runs counter to the 'late equals large' textbook example 57 of how brains evolve differences in relative proportions of their parts 26 , and along with other reports 58 suggests that variation in early brain development may be a significant factor in brain evolution. Overall, such early variation in development is not thought to be a driving force in evolution, precisely because early changes can have global and ramifying effects. Collectively, our findings provide a partial description of the conditions wherein variation during the earliest stages of development can contribute to evolutionary diversification. In each case we have examined, variation in gene expression is quantitative, heterochronic and limited to a precise stage or time period. Sydney Brenner recognized the relationship between the genetic specification of nervous systems and the behavioral output of the brain 59 . However, because these events take place so far apart in the lifespan of a vertebrate, they are rarely studied simultaneously. Here, the genome connects the two phenomena: rock-vs. sanddivergent gene sets and strong functional enrichment indicate that both brain development and social behavior have contributed to the evolutionary diversification of these groups. To evaluate social behavior in rock-vs. sand-dwelling Malawi cichlids, we constructed a social context arena. The presence of sand and simulated rocky caves was sufficient to elicit species-appropriate male behavior when rock-or sand-males were tested with rockor sand-gravid females. When rock-x sand-F 1 hybrid males were tested, one per tank, with F 1 hybrid females, males courted females in the rock quadrant of the tank. Notably, when dyads of F 1 brothers were tested in this tank paradigm with gravid F 1 females, we observed simultaneous 'social rock' and 'social sand' behavior. Brain gene expression profiles from behaving males clustered by social context (social rock vs social sand), and not by fraternal relationships. Differentially expressed genes were enriched for brain regions and pathways implicated in social interaction and overlapped significantly with rock-vs. sand-divergent genetic variants.
Social context is known to influence the brain. For instance, our clustering results are similar to those of Whitfield and colleagues (Whitfield et al. 2003) who showed that brain gene expression in honey bees was predictive of behavior. Likewise, changes in brain morphology and gene expression predictably accompany the ascent to dominance in the cichlid fish Astatotilapia burtoni 60 . Our data seem not to fit the model of dominant-subordinate however. In our experiments, the gonado-somatic index (GSI) did not differ between social rock vs. social sand brothers within dyads. Both males exhibited nuptial coloration, courted females and in cases with multiple gravid females, both brothers reproduced. Body size was associated with divergent social rock vs. social sand behavior of F 1 males; the social rock brother was always larger (mean mass was 26.96 g ± 3.4 [SE] compared to 19.45 g ± 2.2).
Our experiments demonstrate that F 1 hybrid male brains can express both social rock-and social sandbehavioral programs, and that social context determines which program is executed. This observation is similar to, but also different than, pit-digging × castle-building F 1 male Malawi cichlids who carry out parental bower behaviors in a specific sequence 28 . Notably, in both cases, the hybrid males exhibit one of the two different parental behaviors at any one time-there is no intermediate behavior. In the pit-vs. castle-case, we think that the bower structure itself and/or a threshold signal from females might lock the hybrid male brain into a behavioral state. In the rock-vs sand-case here, it appears that other social cues (i.e., the presence and size of a rival male) lock the hybrid male brain into a behavioral state. These context-dependent behaviors, accompanied by changes in brain gene expression, are compelling examples of interaction between the genome and the social environment. The cellular and genetic basis of these behaviors and their plasticity deserves further attention. Here, our www.nature.com/scientificreports/ comparative genomic and brain gene expression data, combined with enrichment testing and experimental approaches, highlight that the Malawi cichlid telencephalon will be central to this work. The telencephalon is similarly likely to be important for other Malawi cichlid social behaviors, including aggression and bower building 27,28 . Overall, our data add to the reappraisal of the telencephalon as an anatomical nexus of evolutionary innovation across vertebrate lineages 61 .

Materials and methods
Genome sequencing. We extracted genomic DNA, from fin clips of 22 male individuals 62 (Qiagen DNeasy, Cat #69504), from 8 rock dwelling and 14 sand dwelling Lake Malawi species (Supplementary Table S2) representing broad diversity across the rock and sand lineages in Lake Malawi. We made libraries using the Illumina Nextera Library prep kit and performed paired-end sequencing on the Illumina Hi-Seq 2500 at Georgia Tech. The Metriaclima zebra reference genome version MZ_UMD2a 34 was used for genome alignment, variant discovery and annotation using standard BWA 63 and GATK practices 64 . The maximum likelihood tree in Fig. 1A was constructed using SNPhylo 65 , from variant data.
Genetically divergent regions. Vcftools 66 was used to calculate F ST (-weir-fst-pop) between the 8 rock and 14 sand species excluding SNPs with less than half variants called across species (-max-missing 0.5). Variants with F ST = 1 were noted to be alternately fixed between rock and sand lineages in our dataset. F ST was also measured across 10 kb windows (-fst-window-size). Significance thresholds were marked using the fdrtool 67 package in R. All variants were annotated using Snpeff 4.3i 68 . We tested the genes within 25 kb of significantly differentiated variants for enrichment of functional categories. The cichlid gene names were converted to human analogs using Treefam based mapping 69 and functional enrichment was determined using the TOPPFUN webbrowser interface 70 .
PhastCons analysis. Pairwise alignments were generated using lastz v1.02 71  Vertebrate-conserved enhancer elements. A comparative genomic approach was used to identify putative craniofacial and neural crest CNEs in mammals that segregate SNPs between rock-sand cichlid species. Experimentally verified and published genome-wide craniofacial and neural crest enhancers active during early embryonic stages that play a role in shaping the development of neural crest and craniofacial structures in mammals were identified from published literature 43,44 . We used the liftOver tool 75 , which maps orthologous genomic regions between species to convert genomic coordinates from one species to another. Using a Human to Oreochromis niloticus to Metriaclima zebra mapping and a Mouse to Oreochromis niloticus to Metriaclima zebra mapping, we identified the orthologous genomic locations of the published craniofacial and neural crest enhancers in cichlids. We designated any alternately fixed variant (variant with F ST = 1) that was also within an orthologous CNE as putatively involved in the rock-sand divergence (Supplementary Table S2).
Brain region enrichment analysis. We identified 10,391 cichlid genes with human homologues and generated an expression matrix for each gene across 250 human brain structures spanning telencephalon, diencephalon, mesencephalon, and metencephalon using adult human brain microarray data collected by the Allen Brain Institute 37 . Cortical regions and gyri for which fish do not have putative homologues were excluded from the analysis (100/350, leaving 250 regions for subsequent analysis). The expression matrix was generated using the get_expression function in the ABAEnrichment Bioconductor package in R 76 . We then calculated the specificity of expression for each gene in each of these brain regions using the specificity.index function in the pSI package for R. This function calculates a matrix of gene expression specificity indices, and corresponding p-values, as described previously 77,78 . We then tested whether (1) genes within 25 kb of rock vs. sand significantly differentiated variants (described above under "Genetically Divergent Regions"), and (2) genes that were differentially expressed between rock-vs. sand-behaving F1 hybrid males, were enriched for transcriptional markers of specific brain regions using the fisher.iteration function with Benjamini-Hochberg correction, again using the pSI package for R. For enrichment testing of differentially expressed genes, we restricted analysis to genes that met the following criteria: (1) transcripts for the gene were detected in all eight paired behaving males, and (2) at least 6 transcripts were detected in each subject.
Staging during gastrulation. Cichlid gastrulation was split into three sub stages within the gastrula stage 9 79 . Gastrulation lasts 8 to 12 h, depending on the species, and is defined as after the shield (as described in www.nature.com/scientificreports/ zebrafish) stage until the presence of the first somite at the beginning of neurula (stage 10). Embryos were classified as early gastrula (EG) by an asymmetry in epiboly after shield stage until the formation of a ridge that is analogous to the anterior neural ridge (ANR) in chick and mouse and the anterior neural border (ANB) in zebrafish. At that point embryos were classified as mid gastrula (MG). MG lasts until the formation of the dorsal-ventral axis, defined by further lengthening of one side of the embryo, which begins to thicken as epiboly progresses. This is the dorsal side of the embryo, and the side opposite the ANR is classified the ventral side of the embryo. At this point the embryos are defined as late gastrula (LG).
LG ends with the specification of the neural plate, which appears as a portion of the dorsal embryo that is raised relative to ventral side, usually in line with the ANR.
Immunohistochemical staining. Embryos were harvested at 24 h post fertilization (hpf) from each of the rock-dwelling cichlids Metriaclima patricki, Metriaclima zebra and Labeotropheus fuelleborni as well as the sand-dwellers Aulonocara jacobfreibergi, Copadichromis virginalis, Copadichromis borleyi and Tramitichromis intermedius. The embryos were cultured until they reached gastrula stage, approximately 36 to 40 hpf, then fixed at intervals throughout gastrula until neurula. The embryos were then treated with auto-fluorescence reducer (1.55 mL 5 M NaCl, 250 uL Tris-HCl, pH 7.5, and 95 mg NaBH4) overnight, and 10% 2-mercaptoethanol for 1 h. Next, whole mount in situ hybridization was done, using a modification methods we published previously 80 .
irx1b was visualized using Fast Red (naphthol chromogen, Roche Diagnostics), which fluoresces at near red wavelengths (500-650 nm). After in situ hybridization, embryos were immunostained for pSMAD 1,5,8 protein, using published protocols 81 . Embryos were then bathed in Vectashield (Vector Labs) containing DAPI and placed in a specially built mold 49 that accommodates the large yolk and holds the embryo upright. Embryos were then scanned using a Zeiss LSM 700-405 confocal microscope and processed using LSM 700 software (Zeiss) and Image J 82 . For experiments in Fig. 2A, multiple embryos of multiple rock-and sand-species were examined (n = 20 embryos total); and in Fig. 2C,D, multiple embryos per gene, genotype and stage were used (n = 110 embryos total).

Rock-sand hybridization and genotyping. Two rock-sand crosses, one between Copadichromis borleyi
(CB, sand-dweller sire) and Metriaclima zebra (MZ, rock-dweller dam) and another between Mchenga conophoros (MC, sand-sire) and Petrotilapia sp. 'thick bar' (PT, rock-dam), were artificially generated by taking the eggs from the dam just prior to spawning and mixing with sperm from the sire. The resultant F 1 were grown in tanks and allowed to spawn normally to generate F 2 . Several F 2 broods were taken from multiple F 1 females for each cross, a total of 355 individuals for the CB × MZ cross and 608 for the MC × PT cross. The embryos were fixed at every stage starting at gastrula (stage 9) until early pharyngula (stage 14). The F 2 embryos were RNAextracted at stage 9. DNA extraction was performed by fixing the embryos (stage [11][12][13][14] in 70% ethanol, then removing the tail from each individual and extracting the DNA using an extraction kit (Qiagen). Following extraction, the F 2 embryos were genotyped using custom probes (CAA ATC TCCC[C/T]CCG CGG C, Taqman custom probes, Invitrogen) designed to identify a SNP in irx1b using RT-PCR. A subset of the embryos was also sequenced at a 900 bp interval around the irx1b SNP to verify the custom probes.
Quantitative F 2 analysis. We quantified irx1b in F 2 at stage 9 and separated by genotypic class. The 74 heterozygous rock X sand F 2 embryos were dissected to remove most of the yolk and the total RNA was extracted from each individual using an RNA Extraction Kit (Qiagen). The amount of mRNA specific to each allele of irx1b was quantified by using the RNA-to-Ct kit (Invitrogen) and the custom probes. The delta Ct for each heterozygote was generated with the equation, 2 (allele from dam − allele from sire) . We used a Student's t-test to compare the expression of rock-vs sand-alleles in heterozygous F 2 .
Forebrain and eye measurements. The telencephalon and forebrain were measured by integrating the area of transverse sections in embryos of rock-and sand-dweller cichlid species, using previously published methods 25 . The rock-dweller species included Cynotilapia afra (CA, planktivore), Labeotropheus fuelleborni (LF, algivore) and Metriaclima zebra (MZ, generalist); sand-dweller species included Aulonocara jacobfreibergi (AJ, 'sonar' hunter), Copadichromis borleyi (CB, planktivore) and Mchenga conophoros (MC, insectivore/generalist). Embryos from each species, as well as the F 2 individuals, were measured starting from the earliest the telencephalon can be differentiated from the forebrain (mid-somitogenesis, stage 12) and at each subsequent stage until the forebrain has defined prosomeres (early pharyngula, stage 14) 26 . To keep measurements standardized across stages, all measurements were defined by forebrain morphology at the earliest timepoint (stage 12). The 'eye' measurement remains consistent at all stages, the 'anterior' measurement includes the telencephalon and presumptive olfactory bulb, and the 'posterior' measurement includes the diencephalon and each of its constitutive prosomeres (dorsal and ventral thalamus and hypothalamus). To facilitate measurements, we used gene expression of rx3 (for stage 12 embryos) and pax6 (stage 13 and 14) to identify the different structures of the forebrain and eye.
RNA extraction and sequencing, adult social behavior. Two adult F 1 hybrid males from rock-x sandinterspecific crosses (Supplementary Table S5) were introduced to an assay tank containing 3-5 hybrid females of the same cross with simulated rock habitat on one side and simulated sand habitat on the other side separated by empty tank space (Fig. 3A). Males were observed over a period of four days for courtship behavior, either similar to a rock parent (courting females over the rocky parts of the tank), here designated "social rock"-or to a sand parent (bower building in the sand side of the tank while courting females), here designated "social sand". When courtship behavior of both males was observed, we primed experiments the next morning by resetting the www.nature.com/scientificreports/ tanks-knocking down or filling in the sand bower, and adjusting the rock habitat. Both males were allowed to exhibit courtship behavior over a period of 90 min. Males were sacrificed immediately after this 90-min period by rapid decapitation and whole brains were immediately stored in RNAlater (Thermo Fisher Cat# AM7020) within 20 min of decapitation. Two tanks with one F 1 male and 3-5 F 1 females were also observed. These males exhibited social rock behavior and were designated as "lone" in the behavior assay; their brains were collected in the exact same way as the other samples. Tissues were frozen in liquid nitrogen, homogenized using a mortar and pestle and placed in trizol. Following standard chloroform extraction, RNeasy mini columns (Qiagen Cat No./ID: 74104) were utilized to purify RNA for sequencing. Total RNA was quantified using Qubit (Molecular Probes) and quality analyzed using the Agilent 2100 Bioanalyzer System for RNA library preparation. RNA input was normalized to 1 µg and libraries were prepared using the TruSeq Stranded mRNA Sample Prep Kit (Illumina-Kit A). Libraries were again quantified, quality assessed, and normalized for sequencing on the HiSeq 2500 Illumina Sequencing System (Georgia Tech Genomics Core, standard practices). Experimental design and raw files can be accessed on the NCBI Gene Expression Omnibus database under the accession number GSE122500.
Differential gene expression analysis. Raw sequence reads from whole brain transcriptomes were quality controlled using the NGS QC Toolkit 83 . Raw reads with an average PHRED quality score below 20 were filtered out. Filtered reads were also trimmed of low-quality bases at the 3′ end. High quality sequence reads were aligned to the M. zebra reference genome MZ_UMD2a 34 using TopHat v2.0.9 84 . On average, across all samples, over 95% of reads mapped to the reference genome. The resulting TopHat2 output bam files were sorted and converted to sam files using samtools v0.19 85 . Sorted sam files were used as input for the HTSeq-count v0.6.1 program to obtain fragment counts for each locus 86 . Fragment counts were scale-normalized across all samples using the calcNormFactors function in the edgeR package v3.6.8 87 . Relative consistency among replicates and samples was determined via the Multidimensional scaling (MDS) feature within the edgeR package in R. The native R function hclust(dist) used to cluster samples. Scale-normalized fragment counts were converted into log 2 counts per million reads mapped (cpm) with precision weights using voom and fit to a linear model using limma v3.20.9 88 . Pairwise contrasts were constructed between socially rock and socially sand samples. After correcting for multiple comparisons using the Benjamini-Hochberg method 89 , genes were considered differentially expressed between socially rock and socially sand samples if they exhibited both a fold change ≥ 2 and P adj < 0.05. Using Treefam based mapping 69 the cichlid gene names were converted to human analogs and functional enrichment was determined using the TOPPFUN web-browser 70 .
Ethics statement. All of the animals were handled according to methods approved by the Georgia Tech Institutional Animal Care and Use Committee (IACUC) protocols (permit A100029) with every effort made to minimize suffering and is in compliance with the ARRIVE guidelines.