Epigenetic divergence during early stages of speciation in an African crater lake cichlid fish

Epigenetic variation can alter transcription and promote phenotypic divergence between populations facing different environmental challenges. Here, we assess the epigenetic basis of diversification during the early stages of speciation. Specifically, we focus on the extent and functional relevance of DNA methylome divergence in the very young radiation of Astatotilapia calliptera in crater Lake Masoko, southern Tanzania. Our study focuses on two lake ecomorphs that diverged approximately 1,000 years ago and a population in the nearby river from which they separated approximately 10,000 years ago. The two lake ecomorphs show no fixed genetic differentiation, yet are characterized by different morphologies, depth preferences and diets. We report extensive genome-wide methylome divergence between the two lake ecomorphs, and between the lake and river populations, linked to key biological processes and associated with altered transcriptional activity of ecologically relevant genes. Such genes differing between lake ecomorphs include those involved in steroid metabolism, hemoglobin composition and erythropoiesis, consistent with their divergent habitat occupancy. Using a common-garden experiment, we found that global methylation profiles are often rapidly remodeled across generations but ecomorph-specific differences can be inherited. Collectively, our study suggests an epigenetic contribution to the early stages of vertebrate speciation.

samples were sequenced for each population. There is clearly a trade-off between the number of CpGs analyzed and the statistical power to detect true DMRs, and choosing WGBS based on very low sample size over RRBS based on good sample size needs to be justified.
Lines 112-118: Using genetic divergence result from a previous study is OK, but it would be better to directly extract SNPs and estimate genetic divergence from bisulfite sequencing data. Also, I do not agree that a distant distance between HDR and DMR suggests no association. It could be due to trans-acting genetic variants controlling methylation variation, so the statement between these lines are not well supported, especially considering the association between SNPs and DMRs were not specifically tested in this study.
Lines 119-130: There is a flaw in the genomic feature analysis. The authors did not give a precedence when features were overlapped, so fig. 1i could be significantly biased due to repeatedly counting DMRs overlapping with multiple features.
Lines 143-144: Implying adaptation directly from the genomic locations of DMRs is not suitable, especially without measuring fitness. I would suggest the authors to soften language here.  fig. 2a, One of the littoral fish is clustered with riverine fish, but there is no explanation for this. I would exclude this fish in downstream analysis. In addition, there are two different read lengths (100 vs. 150 bp PE) used in RNA-seq. Is there any batch effect due to read length? What is the correlation between samples from the same ecomorph but were sequenced in different read length? Finally, the color spectrum in fig. 2b is qualitative and not very informative, it would be good to show absolute or scaled FC instead of 'lowest'/'highest'.
Lines 183-203: Do the genes, epoR, cHbB, Scl/Tal1, etc., have hypomethylated promoters? Because the authors emphasized the overrepresentation of DMRs in DEGs, I guess the examples in this paragraph are genes with decreased methylation in promoters but increased transcription activities, but I do not see any clear statement about such information. Also, it would be good to report or depict the correlation between methylation levels and transcription activities, either using genome-wide data or DMRs-DEG pairs.
Line 235: I do not understand the process to call 'reset DMRs' and 'fixed DMRs'. I believe some DMRs identified between riverine and benthic/littoral populations will have similar methylation level difference in common garden environment, which can be called as 'fixed', but how about methylated regions with decreased methylation difference between, say, benthic and riverine populations? Do you also consider them as 'fixed DMRs'? Finally, it seems that methylation levels also changed between wild and common garden environments within the same population, so even for some DMRs you called 'fixed' between populations, they can be 'reset' or 'plastic' within populations.
Extended Fig. 8: following my last comment, in wild samples, the variation among riverine samples is higher than littoral or benthic samples, but lower in common garden environments than littoral or benthic samples, suggesting a large number methylated CpGs were plastic or responsive to environmental transition, especially in riverine populations, which adds more complexity in defining reset and fixed DMRs. One possible way to identify transgenerationally stable CpGs is to compare methylation levels of F1 and F2 samples under common garden environment, similar to Heckwolf et al. 2020. Sci. Adv. andHu et al. 2021. Genetics. And to identify reset/plastic and fixed DMRs, it should be better to reciprocally transplant wild samples to opposite environments, e.g., transplant riverine samples to benthic or littoral environment, and identify CpG sites with altered methylation levels that are explained by ecomorph (fixed) or environments (reset/plastic).
Line 238: What is the biological meaning or explanations for reset DMRs having longer length than fixed DMRs? Are you suggesting longer length correlates with higher epimutation rate?
Line 253: While I agree that, to some extent, the fixed DMRs represent population-specific methylation patterns, because only liver tissue was used in methylation analysis, it is hard to say such patterns are also tissue-independent.
Line 355: Why sequencing WGBS samples in two different platforms? How did you incorporate and correct data from these two platforms?
Line 369: CpGs with only at least 4 reads to be included in downstream analysis seem less favorable due to low coverage. Can you provide justifications for such low depth cut-off? Also, the coverage for genome needs to be provided for WGBS.
Reviewer #2 (Remarks to the Author): In the current study Vernaz et al present compelling evidence that DNA methylation might act as a major driver of vertebrate speciation. The study is a timely follow up of authors' recent work, which put forward the exciting hypothesis that DNA methylation participated in the phenotypic diversification of lake Malawi cichlid fish. The study is well-written, the analyses are properly executed and overall, my comments for improvements are minor.
1) It is not clear from the current data how DMRs were associated to genes for the purpose of GO analyses. Did the authors employ the "nearest gene" approach or was more elaborate methodology used for this (such as GREAT for example)? 2) How many DNMTs / TETs were identified in cichlids? Given the enrichment of DMRs in CpG-rich regions, it would be interesting to explore if these differences could be caused by expression differences of DNA methylation machinery components? 3) Also, TET proteins which actively remove DNA methylation are oxygenases and thus require oxygen for their catalytic activity. Could the difference in oxygen concentration therefore act as a driver for such epigenetic diversification? i.e less, oxygen --> less TET activity --> less capacity for 5mC removal --> more hypermethylated regions? 4) Instead of speculating about the possibility of fixation of epigenetic differences, could the authors perform RRBS / WGBS on sperm (or eggs, but more likely sperm given the current literature) on fish form their "garden experiment"? Given the strong enrichment for CpG-rich regions even RRBS might do. Such an experiment would demonstrate the potential for inter-or trans-generational inheritance of acquired methylation differences. Even if only a small subset of DMRs is picked up in sperm, this would be an extremely valuable piece of data. 5) Reference 24 is wrong -the authors probably refer to Skvortsova et al, 2019Nature Comms, rather than Skvortsova et al 2018, where the inheritance of paternal methylome patterns in the zebrafish germline is not discussed.
Reviewer #3 (Remarks to the Author): NatEcolEvo-210814388-T This work by Vernaz and colleagues assess variation in DNA methylation in benthic versus littoral populations of A. calliptera cichlids from river and Lake populations. The role of such epigenetic variation in an ongoing sympatric speciation event like this is largely unexplored. This work assesses overall patterns of methylation, potential relationship to gene expression, and plastic potential in a common garden experiment. While a strong work, there are a number of instances of missing logic of analyses or weak explanations that would strengthen conclusions. These are needed to bring this work from statistical connections in large-scale genomic analyses to the biological relevance and importance for evolutionary processes as pitched.
Major comments: A. Biological relevance of methylation differences A1. The suggestion in the paper is that differences in metabolism due to methylation are driving benthic versus littoral adaptation. However, samples were taken from these 2 environments that are very different in terms of oxygen, etc, which will cause metabolism differences. Is it that methylation differences drive this habitat divergence or that methylation differences in the liver become distinct once animals diverge in habitat usage?
A2. L63 notes that system is addressing sympatric speciation. However, the populations used do not live in sympatry, but are already distinct in their habitat usage.
A3. The connection between differential methylation patterns and transcription factor binding was not clearly defined. Is the suggestion that differential methylation is happening in a way that only impacts certain TFs? How would this work biologically? Or is it that those genes active in the liver are enriched for those TF binding sites in their promoters? See also point C5.
A4. I love the common garden experiment as it begins to address considerations such as that in point A1. However, details of this seem unclear. Were the conditions in this experiment similar to any of the natural environments? For example, were the tanks similar in oxygen and temperature of the littoral, but not benthic fish? How would this bias analysis or alter conclusions? How long was the experiment conducted, what is the logic of that choice, what is the developmental age of the fish at start and end of the experiment, and how does that affect the data obtained? Again, is it that you are catching all the regions of the genome that are responsive to environmental changes or those that are driving habitat changes and speciation? Some of the analysis in this section also highlights limitations of drawing conclusions from large-scale enrichment analyses. For example, GO analysis shows enrichment for fixed DMRs in developmental genes, but genes associated with neuron projection, neural tube development, and mastoid bone, none of which make sense given the tissue analyzed was liver. While interesting, the section about effect of fixed DMGs and brain development in L250-261 thus reads as examples picked from a list rather than making a functional connection between liver activity and brain development.
B. Choice of samples B1. There was no explanation of why liver tissue was used for this analysis. What biases and limitations of conclusions does this introduce into the analysis?
B2. Also not discussed is why males only were used. What is the logic of choosing males with bright nuptial coloring, and could this add any bias to your analysis or our understanding of evolutionary processes?
C. Analyses C1. PCA of data sets is stated as having segregation in L78-79, but this is not supported with statistical analysis such as an ANOVA. Does hierarchical clustering of all regions, not just differentially with your specific cutoffs, also show segregation of populations? C2. What is the logic of cutoffs used for analysis? For example, differentially methylated regions were called based on >25% mCG difference and >4 CpG sites. What difference would using 5 sites make in the conclusions? Why 4 sites? This is stated as "significantly" different in L93 and a p-value is given, but there is not information on how this pvalue is calculated in methods L368-370.
C3. Assigning non-coding regions to gene function is a critical assumption that can bias downstream assessment of biological processes. What is the quality of the annotation of genes in the genome used for this (a critical factor for these analyses)? Why did you chose 1kb for a promoter versus 5kb? How were genes assigned to a specific gene for GO analysis? C4. Analysis for transcription factor binding motifs appears to only be conducted on DMRs in one particular region (outside the gene body). How does this compare to the groupings presented in Figure  1I? How do conclusions change when different genomic features are used? Is there a reason only 2 of 3 combinations are shown (river vs littoral is omitted in extended data Figure 3F)? C5. What is the directionality of relationships between DMRs and DEGs? Given that DNA methylation has a predictable effect on gene expression, are those regions with high methylation associated with low gene expression? This is explored for eklf and epoR in L183-203. However, the cHbB locus is exclusively expressed in benthic fish that hypermethylation, and not expressed in fish populations with low to no methylation. Thus, further explanation is needed to clarify clear how the biological function of methylation relates to gene expression. Without it, the text in L166-213 reads more like anything differentially expressed in the methylation data and transcriptional data were simply compared for overlap, regardless if the biology lines up.
Minor comments: D. Abbreviations. I encourage authors to critically evaluate if all abbreviations are necessary. Three specific examples, though I encourage a careful read for this throughout: (1) HDR and DMR get confusing for readers and are not always clearly defined in figure legends. (2) oCGI, pCGI, and LCR abbreviations are only used once, so there does not seem to be a utility to adding more abbreviations here. (3) GO categories MF, BP, CC, and KEGG in Figure 3 are never defined in the legend or the text.
Reviewer #4 (Remarks to the Author): In this paper, the authors have investigated divergence in DNA methylation pattern of the liver between littoral and benthic ecomorphs of an African crater lake cichlid fish. They found that these two ecomorphs diverge in DNA methylation patterns in several genes related to ecological adaptation and also report that some of the variations may be associated with gene expression variation. They also claim that the majority of the variations are reset during a common garden experiment, but some epigenetic variations are genetically determined.
I agree that the topics of this study, namely the role of DNA methylation in incipient speciation, is one of the important topics in evolutionary biology. However, I found several weaknesses in the paper.
First, they used only 2-3 wild-caught fish for the the whole genome bisulfite sequencing (WGBS). Although they used more individuals for reduced representation bisulfite sequencing (RRBS), the majority of the analysis are based on WGBS. I wonder whether they can validate this small sample size by comparing the results between WGBS and RRBS. Some of the RRBS sites should be included in WGBS data. I suggest that they should test how consistent the results of WGBS with N=2-3 and RRBS with N=11-12 are in such overlapping sites.
Second, I found no evidence for supporting that the DNA methylation differences they found are adaptive. I agree that DNA methylation differences in physiologically important genes are consistent with their claim, but they are not evidence for the adaptive roles of DNA methylation. Do the differentially methylated sites show any signatures of divergent selection? Third, they found that some of the DNA methylation differences detected in the wild-caught fish were reset in the lab-raised fish. However, I am afraid that this may simply result from the chance effect due to the small sample size of the wild fish (i.e., morph-specific DNA methylation in the wild fish may be an artifact).
Overall, I think that this paper tries to address an important topic, but the conclusions on the adaptive significance and the inheritance of DNA methylation are not very solid.
Additionally, given the fact that there are already several papers reporting DNA methylation differences between young species or ecomorphs in darter fishes (eg. Smith et al. 2016 Mol Ecol) and stickleback fishes (Hu et al 2021 Genetics), I have difficulty in finding novelty of the work.

Minor comments
How did you treat the sex? Is there any difference in sex difference in DNA methylation and gene expression? Sometimes, the sex explains the largest amount of gene expression variation. L181-182: Is there any correlation between the direction of DNA methylation and gene expression (i.e., high methylation is associated with low expression etc)? L408: Is it common to exclude the 1 kb downstream of TSS for defining the gene body? ******************* **Although we cannot publish your paper, it may be appropriate for another journal in the Nature Portfolio. If you wish to explore the journals and transfer your manuscript please use our <a href="https://mts-natecolevol.nature.com/cgibin/main.plex?el=A4Cn3FeF3A1EmB4X7A9ftdjXfwlz35q4KkId3135sAZ">manuscript transfer portal</a>. If you transfer to Nature journals or the Communications journals, you will not have to re-supply manuscript metadata and files. This link can only be used once and remains active until used.
All Nature Portfolio journals are editorially independent, and the decision on your manuscript will be taken by their editors. For more information, please see our <a href="http://www.nature.com/authors/author_resources/transfer_manuscripts.html?WT.mc_id=EMI_NP G_1511_AUTHORTRANSF&WT.ec_id=AUTHOR">manuscript transfer FAQ</a> page.
Note that any decision to opt in to In Review at the original journal is not sent to the receiving journal on transfer. You can opt in to <i><a href="https://www.nature.com/nature-research/for-authors/inreview">In Review</a></i> at receiving journals that support this service by choosing to modify your manuscript on transfer. In Review is available for primary research manuscript types only. ** For Nature Research general information and news for authors, see http://npg.nature.com/authors.

Reviewer #1 (Remarks to the Author):
Our response In this study, Vernaz et al. focused on the early stage of cichlid fish species radiation by analyzing DNA methylation and gene expression divergence between the ancestral riverine population and two derived lake populations. While the cichlid system is a well-studied system, results from this study add to the small body of studies focusing on the relationship between epigenetics and speciation, and provide some new insights into how an environmentally sensitive and rapid changing molecular mechanism contributes to the macroevolutionary event. That said, although the topic covered in this study is of immediate interest to the general community of ecology and evolution, I have both general comments on experimental design and specific comments on methods that prevent this manuscript to be published in its current shape. I would suggest the authors to thoroughly revise their MS before resubmission or submitting to a different journal.
We thank the reviewer for highlighting the timely importance and value of our work to ecology and evolution research community.
1. The inclusion of riverine population is a little bit weird. It makes reader confusing on the focus of this study: are you focusing on the speciation between riverine and lake populations, or are you focusing on the divergence Two major evolutionary steps characterise the Lake Masoko cichlid radiation: first, the colonisation of the lake by an ancestral riverine population, forming the littoral population. Second, the colonisation of the between the benthic and littoral ecomorphs? I think the authors may need to make the focus of this paper more clear.
benthic habitat by the shallow population, forming the benthic population. It is therefore highly relevant and important to include the ancestral riverine population (phenotype and genotype) to fully study this instance of speciation. We will revise the text to clarify the importance of studying the ancestral riverine population.
2. The combination of WGBS and RRBS is very confusing (also see my specific comments below). It seems that the data from RRBS were only used in one PCA, with its result quite different from the PCA from WGBS data.
RRBS enriches for certain CG-rich loci (limited overall genome coverage) but enables affordable populationscale methylome analysis of larger sample size. WGBS enables us to characterise and validate the methylome landscape genome-wide. In this case, both RRBS and WGBS datasets strongly indicate population-specific methylome divergence, reaching the same conclusion and adding support to our core hypothesis of functional epigenetic diversity during the speciation.
We will revise the text to clarify the rationale using these two complementary approaches. We will also undertake further cross-validation analyses of the RRBS and WGBS datasets (using DMRs) and quantify biologically relevant overlap.
3. Following my last comment, the methylation-gene expression correlation analysis also can be improved. The samples used in WGBS/RRBS were not the same samples or only a subset of samples used in RNA-seq. So, the discrepancy of samples used in methylation analysis vs. gene expression analysis can potentially bias the results.
Like any biological sampling of individuals in a population, we assume that each sample analysed here is representative of a broader population. Our analytical approach does not depend on the same samples being used. We will clarify this in the text.
4. The choice of male fish for this study needs more explanations. I can understand that methylation and gene expression are gender-specific, but to totally exclude females from analysis needs good justification. Is this because males show more phenotypic/physiological divergence between ectomorphs? Also, because it has been suggested that epigenetic patterns at early developmental stages can often reprogram to reflect the paternal state, identifying transgenerationally stable CpGs in male fish seems not unexpected but makes me wondering the magnitude of methylation stability in females.
There are three reasons we focus on males: 1) Practically, we are more confident in our assignment to male individuals to ecomorphs.
2) In our sampling, to test our hypotheses, we aimed to maximise between-population variability, while minimising within-population variability, within the constraints of resources available.
3) Epigenetic inheritance / reprogramming varies substantially among species (Potok et al. 2013;Skvortsova et al. 2019; Wang and Bhandari 2019), yet the nature of this in cichlid species was hitherto unknown. We surmised that if epigenetic inheritance is relevant to speciation, then it must be contributing to between-morph differences in traits possessed by both sexes (habitat occupancy, diet). Since we see differences in males, then the same may well apply to females, and this is a clear hypothesis that can be tackled with future work. We will clarify the rationale for our choice of samples to analyses in the revised text and highlight the limitations.

5.
A lot of test statistics are missing (e.g., lines 139-142), which needs to be improved.
Some full test statistic reports are missing, and we will ensure complete reporting in the revised manuscript.
Specific comments: Lines 37-38: Do you refer to methylation or expression variation between genes involved in key biological processes? This sentence sounds like you are talking about two things, with transcriptional changes in 'ecologically-relevant genes' as the result of 'methylome divergence between populations'? If so, I would suggest adding a logical link between the methylome variation part and the transcriptional change part in this sentence.
This sentence will be reformulated accordingly to better clarify the link between methylation and transcription.
Also, it seems some key information are missing in the Summary section, which are (1) a brief description of how the habitat differs between ecomorphs. It makes readers hard to understand how the genes and their related functions, e.g., steroid metabolism, erythropoiesis, are ecologically relevant to speciation without a context of environmental difference between habitats.
(2) Definition of early stage speciation. Why are you interested in early stage but not other stages, and how do you know the ectomorphs were at the early stage of speciation?
We will amend the abstract and introduction to more comprehensively describe key (published) background information related to ecomorph habitats and the evidence that the ecomorphs are at the early stages of speciation.
We are interested in these early stages as this is the most challenging phase to study. It is easy to identify long-diverged species pairs, but identifying ongoing divergence where we know the precise ecological and historical context of the system is extremely rare. We have "caught" these populations undergoing speciation, and that is what excites us (and many other researchers who cite our work).
Line 48: I agree that the role of epigenetic processes in speciation is less understood, but there are some recent studies focusing on epigenetic difference between ecotypes/populations in a wide range of taxa Nat. Ecol. Evol.). So, I would suggest the authors to cite at least the above papers to acknowledge the recent advances, and emphasize more on the knowledge gap of the role of epigenetic processes in early stage of speciation.
We will cite those references.
Lines 55-58: While Fst is one of the key indicators of divergence or speciation stage, more results such as gene flow may better support the statement of 'early stages of divergence'. In addition, the relatively low overall genetic divergence can be caused by relatively Our evidence of the nature and timescale of divergence does not solely rely on FST. Key evidence is demographic modelling by Malinsky et al. (2015), which shows ecomorph divergence 500-1000 years ago (200-350 generations). Moreover, more recent work has high gene flow between ectomorphs, so it may be more appropriate to report gene flow levels rather than Fst.
revealed the presence of 'hybrid' individuals of varying levels of admixture, demonstrating incomplete reproductive isolation (Munby et al. 2021, preprint). We will describe this evidence consistent with the early stage of speciation in a revision.
Lines 76-77: It is evident that the mapping rate is different between RRBS and WGBS. It would be good to have some explanations for this difference, and test for if such difference have an impact on downstream analysis.
Differences in mapping rates between RRBS vs WGBS datasets are expected and derive from the distinct techniques used: the length of sequencing reads (single end 75bp vs paired end 150 bp reads) and CG-rich DNA fragment bias of RRBS (Krueger and Andrews 2011). The mapping efficiencies in our analysis (~80%/~62% for RRBS/WGBS respectively) are comparable to other published methylome analyses. Such differences have limited or null impact on downstream analyses, as we have stringent quality and coverage filters in place to avoid any technical biases. We will provide the technical explanations for the differences in mapping rates in a supplementary text in the revision, citing relevant studies.
In addition, the choice of liver tissue needs to be justified. Liver is a key organ involved in dietary metabolism and hormone production, and is involved in haematopoiesis. We know that the ecomorphs differ in their diet ( Fig.1d and (Malinsky et al. 2015) and live in habitats substantially differing in oxygen levels (Fig.1b). Therefore, liver is a highly relevant organ to investigate the role of epigenetic in phenotypic diversification. Moreover, the liver is a largely homogenous organ, composed at >70% of hepatocytes, which limits cofounding variation associated with heterogenous cell populations when performing methylation studies. In the revision we will explain the rationale behind the use of liver tissue.
fig. 1d: Maybe I am missing something, but I do not find any text related to fig. 1d. This panel suggests that there is a clear difference between food choice between littoral or river population vs. benthic population, but no difference between littoral and river populations. However, this panel is somewhat contradicted to the PCA results in fig. 1e-f, where the three populations are separated. In addition, data used for fig. 1d were generated from muscle tissue, which is different from the tissue used for RRBS and WGBS, so it is hard to crossvalidate or combine results between different panel. Figure 1D shows differences in stable isotope ratios, that provide evidence for different food source/diets. We will describe the differences more clearly in the manuscript.
Muscle tissue is commonly used as a time-integrated indicator of the diet of the fish as whole, and it is simply used here to illustrate the contrasting diets of the three focal populations to provide the reader with ecological context. Our analyses do not depend on the usage of the same tissue as in the RRBS/WGBS samples. We will clarify this point in the revision.
For fig. 1e-f: It would be good to provide explanations for why PC1 and PC2 explain different variation in the same tissue between RRBS and WGBS.
The two PCA plots (Fig 1e and 1f) reveal remarkably similar methylome patterns of variation, given that we are dealing with two very different datasets. Roughly 24 million CpG sites are analysed in the case of WGBS, which is reduced to 3-4 million in the case of RRBS. In both datasets, the major source of variation is among different populations [PC1]. As suggested already above, in the revision we will explain the main technical differences between RRBS/WGBS, including references.
Lines 103-105: It is quite confusing why WGBS data were chosen for DMR analysis, considering only 2-3 samples were sequenced for each population. There is clearly a trade-off between the number of CpGs analyzed and the statistical power to detect true DMRs, and choosing WGBS based on very low sample size over RRBS based on good sample size needs to be justified.
In the revised manuscript we will clearly mention the rationale behind the use of only the WGBS dataset for DMR prediction. Both RRBS and WGBS datasets show the same pattern of methylation variation segregating primarily among populations. Therefore, we chose to focus on WGBS as it enables DMR prediction genomewide, without the length or DNA fragment biases that are associated with RRBS. Although DMRs predicted using the RRBS dataset will be based on a bigger sample size, only ~10% of the genome will be analysed compared to WGBS, which we believe would be a substantive limitation for our analysis.
Nevertheless, to address this point and clearly highlight the high level of similarity between the two datasets, we will expand on the DMR analysis, and test if the same DMRs are found using WGBS and RRBS. Based on results of previous analyses, we expect to be able to show DMRs found using RRBS are as present in the WGBS dataset, justifying our choice of using WGBSderived DMRs.
Lines 112-118: Using genetic divergence result from a previous study is OK, but it would be better to directly extract SNPs and estimate genetic divergence from bisulfite sequencing data. Also, I do not agree that a distant distance between HDR and DMR suggests no association. It could be due to trans-acting genetic variants controlling methylation variation, so the statement between these lines are not well supported, especially considering the association between SNPs and DMRs were not specifically tested in this study.
In the revision we will explain that our analysis cannot exclude trans-acting effects of SNPs on methylation variation. Nevertheless, our analysis suggests that, locally, DMRs are not a key force driving HDRs -which we believe is an important result.
Lines 119-130: There is a flaw in the genomic feature analysis. The authors did not give a precedence when features were overlapped, so fig. 1i could be significantly biased due to repeatedly counting DMRs overlapping with multiple features.
Genes were counted only once even when associated with multiple DMRs (per genomic features). We will clarify this point in the method section. This is not a flaw, but key detail regarding our methodology was missing.
Lines 143-144: Implying adaptation directly from the genomic locations of DMRs is not suitable, especially without measuring fitness. I would suggest the authors to soften language here.
We agree with the reviewer and will rephrase the sentence, removing the term adaptation. We will include a PCA plot to better visualise overall RNAseq variation in two dimensions (rather than a single dimension of hierarchical clustering). We have confirmed an absence of confounding variation due to sequencing read length and batch effects. We will length? What is the correlation between samples from the same ecomorph but were sequenced in different read length? Finally, the color spectrum in fig. 2b is qualitative and not very informative, it would be good to show absolute or scaled FC instead of 'lowest'/'highest'.
include such graphs and discuss the associated results in supporting text. The colour gradient in Fig 2c was incorrectly labelled -it represents Z-score associated with gene expression (i.e., scaled gene expression for each gene). This will be corrected.
Lines 183-203: Do the genes, epoR, cHbB, Scl/Tal1, etc., have hypomethylated promoters? Because the authors emphasized the overrepresentation of DMRs in DEGs, I guess the examples in this paragraph are genes with decreased methylation in promoters but increased transcription activities, but I do not see any clear statement about such information. Also, it would be good to report or depict the correlation between methylation levels and transcription activities, either using genomewide data or DMRs-DEG pairs.
We will include a plot showing the correlation between methylation and transcription, and clarify in the text the interpretation of this result.
Line 235: I do not understand the process to call 'reset DMRs' and 'fixed DMRs'. I believe some DMRs identified between riverine and benthic/littoral populations will have similar methylation level difference in common garden environment, which can be called as 'fixed', but how about methylated regions with decreased methylation difference between, say, benthic and riverine populations? Do you also consider them as 'fixed DMRs'? Finally, it seems that methylation levels also changed between wild and common garden environments within the same population, so even for some DMRs you called 'fixed' between populations, they can be 'reset' or 'plastic' within populations.
We will clarify the text explaining differences between fixed and reset methylome patterns.
We will more clearly emphasise how methylation is typically plastic (reprogrammed) within populations, as we see in the transition from wild and common garden conditions in both Lake Masoko populations.
Extended Fig. 8: following my last comment, in wild samples, the variation among riverine samples is higher than littoral or benthic samples, but lower in common garden environments than littoral or benthic samples, suggesting a large number methylated CpGs were plastic or responsive to environmental transition, especially in riverine populations, which adds more complexity in defining reset and fixed DMRs. And to identify reset/plastic and fixed DMRs, it should be better to reciprocally transplant wild samples to opposite environments, e.g., transplant riverine samples to benthic or littoral environment, and identify CpG sites with altered methylation levels that are explained by ecomorph (fixed) or environments (reset/plastic).
We found that from our common garden experiment that within one generation methylome variation in Lake Masoko ecomorphs is drastically reprogrammed compared to river methylome patterns. Therefore, it is the methylome of river fish that appears most robust to environmental perturbation (see PCA Ext Dat Fig 8c  and heatmap Fig 3a), We will rewrite the text reporting this result to better highlight how the methylome of Lake Masoko ecomorphs is largely reprogrammed to resemble the ancestral-related river methylome patterns.
We agree with the reviewer that our work lays groundwork for further multigenerational experimentation. The reciprocal transplant experiment proposed would be technically and ethically challenging. Although this further work would be highly informative, it is beyond the scope of this current paper.
Line 238: What is the biological meaning or explanations for reset DMRs having longer length than fixed DMRs? Are you suggesting longer length correlates with higher epimutation rate?
That the two DMR classes are characterised by a least one distinct property (length) implies there may be biological/functional differences associated with them. Further work will be required to characterise any differences. We do not make any suggestion of association between DMR class and epimutation rates.
Line 253: While I agree that, to some extent, the fixed DMRs represent population-specific methylation patterns, because only liver tissue was used in methylation analysis, it is hard to say such patterns are also tissueindependent.
We agree with the reviewer and will rewrite the sentence.
Line 355: Why sequencing WGBS samples in two different platforms? How did you incorporate and correct data from these two platforms?
Samples were sequenced on different platforms due to logistic reasons (discontinuation of the first machine used), however the technology remains the same (Illumina HiSeq4000 and NovaSeq). Only one wild shallow and one wild benthic were sequenced with NovaSeq (on the same flow cell to minimise technical variation). We could not identify any evident bias in methylome variation associated with the two Illumina machines used, using PCA or hierarchical clustering analyses (see Fig.1f and Ext Data Fig 8c). We will specifically discuss the difference in platform used in supplementary information, and clearly label samples sequenced on NovaSeq in supplementary figures.
Line 369: CpGs with only at least 4 reads to be included in downstream analysis seem less favorable due to low coverage. Can you provide justifications for such low depth cut-off? Also, the coverage for genome needs to be provided for WGBS.
We used 4 as the minimum number of sequence read (coverage) required at any single CpG site -which is a widely used threshold for most methylome analyses. However, the median coverage at any single CpG was ~8-14x for all WGBS samples, and it is around 40x for RRBS samples. These represent excellent coverages, for WGBS in particular, that are rarely seen in methylome studies. In the revision we will include a graph showing CpG coverage for all WGBS and RRBS samples. We will cite other methylome studies for comparisons.

Reviewer #2 (Remarks to the Author): Our response
In the current study Vernaz et al present compelling evidence that DNA methylation might act as a major driver of vertebrate speciation. The study is a timely follow up of authors' recent work, which put forward the exciting hypothesis that DNA methylation participated in the phenotypic diversification of lake Malawi cichlid fish. The study is well-written, the analyses are properly executed and overall, my comments for improvements are minor.
We thank the reviewer for their positive overall comments and for highlighting the quality and timely importance of our work.
1) It is not clear from the current data how DMRs were associated to genes for the purpose of GO analyses. Did the authors employ the "nearest gene" approach or was more elaborate methodology used for this (such as GREAT for example)?
The method section will be modified to include this detail. In short, DMRs were associated with genes when located in their promoter regions (TSS ± 1kbp), gene body or intergenic regions (1-5kbp away from closest genes).
2) How many DNMTs / TETs were identified in cichlids? Given the enrichment of DMRs in CpG-rich regions, it would be interesting to explore if these differences could be caused by expression differences of DNA methylation machinery components?
We believe it would be difficult to link any new cichlidspecific copies of DNMT/TETs to the specific DNA methylation patterns we observe. Previous work in mammals have highlighted that duplication of certain DNMT3 enzyme (for example DNMT3C) could be associated with targeted methylation of certain classes of transposable elements (TE), consisting of an arm race between host DNMTs and TE sequence evolution (see Barau et al. 2016). However, the methylome divergence we observed between Lake Masoko cichlids is not exclusively localised in one genomic feature (such as cichlid-specific TEs for example), therefore we do not believe this can result from novel DNMT/TET enzymes. From our previous work , Lake Malawi cichlids are believed to have one copy of each of the three Tet enzymes, one copy of DNMT1 and multiple copies of DNMT3s, just like in zebrafish (Campos, Valente, and Fernandes 2012;Seritrakul and Gross 2014;Shimoda et al. 2005; Smith, Collins, and McGowan 2011) -however we have not characterised or identified new DNMTs/TETs in Lake Masoko cichlids. This could be the subject of further work, but would require extensive RNAseq, proteomics analyses in different tissues, together with full genomic scan, in addition to lab experiments to elucidate the sequence-specificity of such proteins, which we believe is beyond the scope and the main hypothesis of this manuscript.
3) Also, TET proteins which actively remove DNA methylation are oxygenases and thus require oxygen for their catalytic activity. Could the difference in oxygen concentration therefore act as a driver for such epigenetic diversification? i.e less, oxygen --> less TET activity --> less capacity for 5mC removal --> more hypermethylated This hypothesis is interesting, but our current work is not able to address this question and it is beyond the scope of this study. Nonetheless, it is important to note that we observe a widespread increase in methylation in both the shallow and benthic populations (Fig. 1G). This is despite differences in environmental oxygen levels (the benthic habitat of the lake is almost anoxic; the shallow habitat regions? remains well oxygenated; Fig.1D). Therefore, blood oxygen levels might be sufficient in both ecomorphs for an efficient TET enzyme activity.

4)
Instead of speculating about the possibility of fixation of epigenetic differences, could the authors perform RRBS / WGBS on sperm (or eggs, but more likely sperm given the current literature) on fish form their "garden experiment"? Given the strong enrichment for CpG-rich regions even RRBS might do. Such an experiment would demonstrate the potential for inter-or trans-generational inheritance of acquired methylation differences. Even if only a small subset of DMRs is picked up in sperm, this would be an extremely valuable piece of data.
We agree with the reviewers that our study also lays the groundwork for further experiments, especially regarding the assessment of the multigenerational inheritance of methylome patterns. Our lab is currently initating work on this aspect, but such experiments represent a significant amount (several years) of work, and are beyond the scope of this current paper. This will be corrected.

Reviewer #3 (Remarks to the Author): Our response:
This work by Vernaz and colleagues assess variation in DNA methylation in benthic versus littoral populations of A. calliptera cichlids from river and Lake populations. The role of such epigenetic variation in an ongoing sympatric speciation event like this is largely unexplored. This work assesses overall patterns of methylation, potential relationship to gene expression, and plastic potential in a common garden experiment. While a strong work, there are a number of instances of missing logic of analyses or weak explanations that would strengthen conclusions. These are needed to bring this work from statistical connections in large-scale genomic analyses to the biological relevance and importance for evolutionary processes as pitched.
We are pleased the reviewer recognises the novelty of the work, and we would be pleased to clarify the sections of the paper the reviewer highlights.
Major comments: A. Biological relevance of methylation differences We agree that we could be clearer on when methylome divergence takes place and the relevance to speciation, and we will expand and discuss this point further in the A1. The suggestion in the paper is that differences in metabolism due to methylation are driving benthic versus littoral adaptation. However, samples were taken from these 2 environments that are very different in terms of oxygen, etc, which will cause metabolism differences. Is it that methylation differences drive this habitat divergence or that methylation differences in the liver become distinct once animals diverge in habitat usage? revised manuscript.
Our hypothesis is that: i) methylation differences occur after the fish arrive in the specific habitats (e.g., deep water). They may have low initial fitness, which does not matter initially given the abundance of food and lack of competition. After a period of acclimation due in part to DNA methylation affecting expression of key ecologically important genes, they are able to thrive. The positive traits are present in offspring, through inheritance, and through exposure to the same environment from young.
ii) any new coloniser from the shallow environment will now be outcompeted in the deep water, and relative fitness will matter now due to competition. Equally, a fish of a deep-water origin will struggle in shallow water environment because they will not have been reared in such shallow environment, and/or possess the beneficial epialleles.
iii) This process becomes amplified when genetic variants start to become fixed. This is important to discuss in the paper, however it does not highlight any flaws in our analysis or question our conclusions.
A2. L63 notes that system is addressing sympatric speciation. However, the populations used do not live in sympatry, but are already distinct in their habitat usage.
The exact definition of sympatric speciation is highly debated. We will replace any reference to "sympatric speciation" with "ecological speciation".
A3. The connection between differential methylation patterns and transcription factor binding was not clearly defined. Is the suggestion that differential methylation is happening in a way that only impacts certain TFs? How would this work biologically? Or is it that those genes active in the liver are enriched for those TF binding sites in their promoters? See also point C5.
In the revised manuscript we will expand on the relationship between DNA methylation and binding affinity of transcription factor in the text, and further discuss how changes in DNA methylation can impact binding affinity of certain TFs (using examples we show in the manuscript). We will also discuss how changes in TF activity can also result in methylation changes, adding relevant references.
A4. I love the common garden experiment as it begins to address considerations such as that in point A1. However, details of this seem unclear. Were the conditions in this experiment similar to any of the natural environments? For example, were the tanks similar in oxygen and temperature of the littoral, but not benthic fish? How would this bias analysis or alter conclusions? How long was the experiment conducted, what is the logic of that choice, what is the developmental age of the fish at start and end of the experiment, and how does that affect the data obtained? Again, is it that you are catching all the regions of the genome that are responsive to environmental changes or those that are driving habitat changes and speciation? Some of the analysis in this section also highlights limitations of drawing conclusions from large-scale enrichment analyses. For example, GO analysis shows enrichment for fixed DMRs in developmental genes, but genes associated with neuron We thank the reviewer for the positive comment. We will include all the requested experimental detail in the methods, including a rationale for the design. In brief, to minimise confounding variation, all experiments involving common garden fish were performed in the same laboratory conditions, at the same time and for the same duration. Fish were euthanised at the same biological stages (i.e., adult males).
With regard to GO analysis, we have previously shown that fully differentiated organs, such as liver or muscle, can harbour epigenetic memories of early life developmental stages, carried over across cell divisions ; see also Bogdanović et al. 2016). Approximately half of liver methylome divergence among Lake Malawi cichlid fishes has been associated with genes with functions related to embryonic and projection, neural tube development, and mastoid bone, none of which make sense given the tissue analyzed was liver. While interesting, the section about effect of fixed DMGs and brain development in L250-261 thus reads as examples picked from a list rather than making a functional connection between liver activity and brain development.
developmental processes. While possibly functionally irrelevant in fully differentiated organs, such as the liver, it reflects that methylome divergence can be associated with developmental processes, possibly participating in core phenotypic diversification (Bogdanović et al. 2016;Vernaz et al. 2021).
We will expand on those points in the text and include relevant references. Moreover, we will focus on the gene groups that are linked functionally to liver functions.
B. Choice of samples B1. There was no explanation of why liver tissue was used for this analysis. What biases and limitations of conclusions does this introduce into the analysis?
Reviewer 1 made the same point, that we have addressed above.
In the revision we will explain the rationale behind the use of liver tissue, and highlight the limitations.
B2. Also not discussed is why males only were used. What is the logic of choosing males with bright nuptial coloring, and could this add any bias to your analysis or our understanding of evolutionary processes?
Reviewer 1 made the same point, that we have addressed above.
We will clarify the rationale for our choice of samples to analyses in the revised text, and highlight the limitations.
C1. PCA of data sets is stated as having segregation in L78-79, but this is not supported with statistical analysis such as an ANOVA. Does hierarchical clustering of all regions, not just differentially with your specific cutoffs, also show segregation of populations?
We will formally test for differences between groups using ANOVA and PC1 and PC2 as response variables.
Regarding the second point, it is important to note that PCAs (Fig.1e, f) were performed in an unbiased manner, using all mapped data. Specifically, Fig.1f shows the variance in methylome across all mapped CG sites genome-wide and shows clear segregation of populations.
C2. What is the logic of cutoffs used for analysis? For example, differentially methylated regions were called based on >25% mCG difference and >4 CpG sites. What difference would using 5 sites make in the conclusions? Why 4 sites? This is stated as "significantly" different in L93 and a p-value is given, but there is not information on how this pvalue is calculated in methods L368-37.
Reviewer 2 made a similar point. Those parameters were based on previous methylome studies and are recognised as being the ideal/default options to reliably identify DMRs (Krueger and Andrews 2011;Wu et al. 2015). Again, those are minimum cut-offs, but most DMRs consist of >4 CpG sites (median 15) and show larger differences (median difference of ~45% ; see Ext Data Fig 3C).
We will add information on statistic tests performed to identify significant DMRs in the main text and methods -DMR prediction and associated statistics were performed following a well-established and recognised pipeline (Wu et al. 2015). We will also include graphs plotting CpG site count against read coverage for all DMRs predicted between the ecomorphs.
C3. Assigning non-coding regions to gene function is a critical assumption that can bias downstream assessment of biological processes. What is the quality of the Annotation of the Maylandia zebra reference genome is of high quality, and we used the latest annotation file. We cautiously defined promoter regions (1kbp around TSS), annotation of genes in the genome used for this (a critical factor for these analyses)? Why did you chose 1kb for a promoter versus 5kb? How were genes assigned to a specific gene for GO analysis?
based on published literature, as we lacked any other annotation files (such as histone marks). We appreciate the limitations of this, as promoters can sometimes expand beyond this 1kbp region. We will also run a sensitivity analysis to see how results change when increasing to 5kbp.
To assign a gene to a DMR in intergenic regions, we use the closest-to-gene approach, as used in most methylome studies. We were cautious, however, only assigning genes to DMRs within 5kbp. We will include a detailed rationale in the Methods, citing other methylome work.
C4. Analysis for transcription factor binding motifs appears to only be conducted on DMRs in one particular region (outside the gene body). How does this compare to the groupings presented in Figure 1I? How do conclusions change when different genomic features are used? Is there a reason only 2 of 3 combinations are shown (river vs littoral is omitted in extended data Figure  3F)?
We will repeat this analysis for gene bodies as well, enabling a more complete comparison. The text will be reformulated accordingly.
C5. What is the directionality of relationships between DMRs and DEGs? Given that DNA methylation has a predictable effect on gene expression, are those regions with high methylation associated with low gene expression? This is explored for eklf and epoR in L183-203. However, the cHbB locus is exclusively expressed in benthic fish that hypermethylation, and not expressed in fish populations with low to no methylation. Thus, further explanation is needed to clarify clear how the biological function of methylation relates to gene expression. Without it, the text in L166-213 reads more like anything differentially expressed in the methylation data and transcriptional data were simply compared for overlap, regardless if the biology lines up.
We will discuss this link in detail and include a graph on how expression and methylation covary along the gene (see Fig.1e in Vernaz et al. 2021). In brief, we observe a negative correlation between methylation at promoters and gene expression, however this link is not always obvious/predictable. We will briefly discuss the reasons for varying responses of gene expression to methylation in the revised manuscript.
Minor comments: D. Abbreviations. I encourage authors to critically evaluate if all abbreviations are necessary. Three specific examples, though I encourage a careful read for this throughout: (1) HDR and DMR get confusing for readers and are not always clearly defined in figure legends. (2) oCGI, pCGI, and LCR abbreviations are only used once, so there does not seem to be a utility to adding more abbreviations here.
(3) GO categories MF, BP, CC, and KEGG in Figure 3 are never defined in the legend or the text.
We will limit the use of abbreviations throughout the text.

Reviewer #4 (Remarks to the Author): Our response:
In this paper, the authors have investigated divergence in DNA methylation pattern of the liver between littoral and benthic ecomorphs of an African crater lake cichlid fish. They found that these two ecomorphs diverge in DNA methylation patterns in several genes related to ecological adaptation and also report that some of the variations may be associated with gene expression variation. They also claim that the majority of the variations are reset during a common garden experiment, but some epigenetic variations are genetically determined. I agree that the topics of this study, namely the role of DNA methylation in incipient speciation, is one of the important topics in evolutionary biology. However, I found several weaknesses in the paper.
We are pleased that the reviewer recognises the importance of the work, and will be pleased to address the points made in a revision.
First, they used only 2-3 wild-caught fish for the the whole genome bisulfite sequencing (WGBS). Although they used more individuals for reduced representation bisulfite sequencing (RRBS), the majority of the analysis are based on WGBS. I wonder whether they can validate this small sample size by comparing the results between WGBS and RRBS. Some of the RRBS sites should be included in WGBS data. I suggest that they should test how consistent the results of WGBS with N=2-3 and RRBS with N=11-12 are in such overlapping sites.
Reviewer 1 made a similar point, that we respond to in full above. In brief, we will re-analyse both WGBS and RRBS to test if the same DMRs are found using WGBS and RRBS. This will validate the use of the WGBS dataset for subsequent analyses. We believe this point will strengthen the conclusions of the paper.
Second, I found no evidence for supporting that the DNA methylation differences they found are adaptive. I agree that DNA methylation differences in physiologically important genes are consistent with their claim, but they are not evidence for the adaptive roles of DNA methylation. Do the differentially methylated sites show any signatures of divergent selection?
Reviewer 1 made a similar point, and we agree. We cannot unequivocally conclude the methylome differences we observe are adaptive, without preforming fitness experiments, which would be beyond the scope of this study. We will modify the text to clarify this point.
Third, they found that some of the DNA methylation differences detected in the wild-caught fish were reset in the lab-raised fish. However, I am afraid that this may simply result from the chance effect due to the small sample size of the wild fish (i.e., morph-specific DNA methylation in the wild fish may be an artifact).
With respect, we do not agree with the reviewer's comment here. The suggestion that ecomorph-specific DNA methylation differences in wild fish are an artefact is NOT supported by the RRBS dataset (Fig.1e). This is the reason why we included both the WGBS and RRBS dataset (genome-wide and population levels).
We will reanalyse the RRBS and WGBS datasets to cross-validate the relevance of using wild WGBS data for the common garden experiment. Specifically, we will assess the numbers of wild RRBS-DMRs that are reprogrammed/fixed in WGBS data from common garden reared fish. We expect to see similar amount of reset/reprogrammed wild DMRs as we have seen in the WGBS DMRs.
Overall, I think that this paper tries to address an important topic, but the conclusions on the adaptive significance and the inheritance of DNA methylation are not very solid.
The adaptive significance is not tested in our study, and we do not claim that it has been tested. For that we would need to associate patterns of DNA methylation with differential fitness of individuals across different habitats.
Here we show a) patterns of DNA methylation do differ between environments, b) those differences are associated with gene expression and c) there is evidence for cross-generational inheritance of epigenetic markers.
Each of these patterns is known from other systems, but we are unaware of any other study that has included them collectively in a speciation context. Our evidence is collectively supportive of a role for epigenetic divergence in speciation, but we agree the direct association between epigenomic variation and fitness needs to be studied, and that will be highlighted in the revised discussion.
Additionally, given the fact that there are already several papers reporting DNA methylation differences between young species or ecomorphs in darter fishes (eg. Smith et al. 2016 Mol Ecol) and stickleback fishes (Hu et al 2021 Genetics), I have difficulty in finding novelty of the work.
There are striking differences between our work and that highlighted by the reviewer. The darter study used MSAP loci (a rather antiquated and imprecise method) and did not consider the variation in relation to gene expression or inheritance. It focussed on allopatric methylome variation in one species (which might have nothing to do with speciation) and variation among long-diverged species pairs (3-18Mya).
The stickleback study focussed exclusively on RRBS data. It showed evidence of divergence of allopatric marine and freshwater ecotypes, and cross-generation inheritance. Thus, the results support those of our study. However, the stickleback study does not quantify variation at the genome scale, or in relate methylome variation to gene expression. Notably, the study suggests epigenetic differences among populations across a geographic distance of 150 kilometres, while our studyremarkably -suggests divergence across only 20 metres! We will cite and discuss this work in more detail in the revised version of the manuscript.
How did you treat the sex? Is there any difference in sex difference in DNA methylation and gene expression? Sometimes, the sex explains the largest amount of gene expression variation.
Reviewer 1 made the same point, that we address above.
We will clarify the rationale for our choice of samples to analyses in the revised text, and highlight the limitations.
L181-182: Is there any correlation between the direction of DNA methylation and gene expression (i.e., high methylation is associated with low expression etc)?
Reviewer 3 made a similar point. We will include a graph showing the correlation between methylation and gene expression, and will discuss this in the text.
L408: Is it common to exclude the 1 kb downstream of TSS for defining the gene body?
Reviewer 3 commented on this, and we have responded above. We excluded 1kbp downstream of TSS to avoid overlap between TSS and gene body categories. We will provide a full rationale for our definitions of genomic annotations//regions in the revised manuscript.

Decision Letter, first revision:
2nd November 2021 Dear Professor Miska, Thank you for your letter asking us to reconsider our decision on your Article entitled "Epigenetic Divergence during Early Stages of Speciation in an African Crater Lake Cichlid Fish". After careful consideration we have decided that we would be willing to consider a revised version of your manuscript.
Please note that while we agree with your plan for revision, we feel that we will not be able to make a final decision on whether to resume the review process until we see the revisions themselves. We understand that some of the concerns raised by the reviewers (particularly regarding heritability and adaptation) is beyond the scope of the current work and these issues can just be discussed. However, we feel that the outcome of the planned analyses, particularly regarding the several concerns raised on the RRBS and WGBS findings, is still uncertain at this stage and will be critical to the decision on whether to send the revised manuscript back to review.
Along with your revised manuscript, you should also submit a separate point-by-point response to all of the concerns raised by the reviewers, in each case describing what changes have been made to the manuscript or, alternatively, if no action has been taken, providing a compelling argument for why that is the case. If we feel that a substantial and likely successful attempt has been made to address the reviewers' comments, this response will be sent back to the reviewers -along with the revised manuscript -so that they can judge whether their concerns have been addressed satisfactorily or otherwise.
As noted above, however, we will be reluctant to trouble our reviewers again unless we believe that their comments have been resolved in full.
When revising your paper: -ensure it complies with our format requirements for Articles as set out in our guide to authors at www.nature.com/natecolevol/authors/index.html -state in a cover note the length of the text, methods and legends; the number of references and the number of display items.
Please ensure that all correspondence is marked with your Nature Ecology & Evolution reference number in the subject line.
Please use the following link to submit your revised manuscript: Decision Letter, second revision: 25th April 2022 *Please ensure you delete the link to your author homepage in this e-mail if you wish to forward it to your co-authors.
Dear Professor Miska, Your manuscript entitled "Epigenetic Divergence during Early Stages of Speciation in an African Crater Lake Cichlid Fish" has now been seen again by our 3 reviewers, whose comments are attached. While two of the reviewers are now fully satisfied, Reviewer 1 has a couple of further concerns that need addressing before we can reach a final decision regarding publication.
We therefore invite you to revise your manuscript again. Please highlight all changes in the manuscript text file.
We are committed to providing a fair and constructive peer-review process. Do not hesitate to contact us if there are specific requests from the reviewers that you believe are technically impossible or unlikely to yield a meaningful outcome.
When revising your manuscript: * Include a "Response to reviewers" document detailing, point-by-point, how you addressed each reviewer comment. If no action was taken to address a point, you must provide a compelling argument. This response will be sent back to the reviewers along with the revised manuscript.
* If you have not done so already please begin to revise your manuscript so that it conforms to our Article format instructions at http://www.nature.com/natecolevol/info/final-submission. Refer also to any guidelines provided in this letter.
* Include a revised version of any required reporting checklist. It will be available to referees (and, potentially, statisticians) to aid in their evaluation if the manuscript goes back for peer review. A revised checklist is essential for re-review of the paper.
Please use the link below to submit your revised manuscript and related files:

[REDACTED]
<strong>Note:</strong> This URL links to your confidential home page and associated information about manuscripts you may have submitted, or that you are reviewing for us. If you wish to forward this email to co-authors, please delete the link to your homepage.
We hope to receive your revised manuscript within four to eight weeks. If you cannot send it within this time, please let us know. We will be happy to consider your revision so long as nothing similar has been accepted for publication at Nature Ecology & Evolution or published elsewhere. We look forward to seeing the revised manuscript and thank you for the opportunity to review your work.

[REDACTED]
Reviewers' comments: Reviewer #1 (Remarks to the Author): This is the second time I read this MS, and I find it greatly improved after revision. I only have two more comments below that I would like to see the responses from authors before the publication of this MS. Figure 4), properly justifying the choice of proceeding with WGBS for downstream analysis, the small sample size in WGBS will intrinsically bring higher methylation variance and result in larger number of DMRs than RRBS. In addition, methylation levels of DMRs derived from WGBS are sometime quite difference from the same DMRs derived from RRBS in panel B of the Extended Data Figure 4. I may miss it, but I think the caveat of using small sample size should be noted. Also, the overlap vs. non-overlap of the number of regions identified as DMRs between WGBS and RRBS, i.e., how many DMRs out of the total DMRs identified using WGBS are also called as DMRs using RRBS, and vice versa, could be reported in addition to only cross-validating the methylation levels at DMRs shared by the two techniques in Extended Data Figure 4A.

While the authors have demonstrated a correlation between WGBS and RRBS by quantifying methylation levels at DMRs (Extended Data
2. For fixed DMRs, it seems the current definition only considers the significance of methylation change in pairwise comparison, but misses the direction of methylation change. For example, it is possible that a DMR can be significantly hypermethylated in littoral fish relative to benthic fish in wild populations, and at the same time significantly hypomethylated in littoral fish relative to benthic fish in common garden populations. Such DMR fits into the current definition of fixed DMRs but apparently does not reflect evolutionary divergence between ecomorphs. It might be good to double check the direction of methylation change, and only select those had concordant direction of change. Reviewer #2 (Remarks to the Author): The authors did an excellent job in answering the Reviewers' comments and in my opinion the manuscript is ready for publication. This is a very interesting study that provides further evidence that epigenetics and speciation might be linked. Moreover, the study lays important groundwork for future multigenerational epigenome inheritance studies in cichlids that could help to further our understanding of the function of epigenetic inheritance in vertebrates.

Decision Letter, third revision:
8th June 2022 Dear Dr. Miska, Thank you for submitting your revised manuscript "Epigenetic Divergence during Early Stages of Speciation in an African Crater Lake Cichlid Fish" (NATECOLEVOL-210814388C). It has now been seen again by the original reviewers and their comments are below. The reviewers find that the paper has improved in revision, and therefore we'll be happy in principle to publish it in Nature Ecology & Evolution, pending minor revisions to satisfy the reviewers' final requests and to comply with our editorial and formatting guidelines.
If the current version of your manuscript is in a PDF format, please email us a copy of the file in an editable format (Microsoft Word or LaTex)--we can not proceed with PDFs at this stage.
We are now performing detailed checks on your paper and will send you a checklist detailing our editorial and formatting requirements in about a week. Please do not upload the final materials and make any revisions until you receive this additional information from us.
Thank you again for your interest in Nature Ecology & Evolution. Please do not hesitate to contact me if you have any questions.

[REDACTED]
Reviewer #1 (Remarks to the Author): The authors have well addressed my last comments. I do not have further comments, and recommend publication of the current version of manuscript. This is a very interesting and timing study that demonstrates the role of epigenetics in speciation. Thank you for your patience as we've prepared the guidelines for final submission of your Nature Ecology & Evolution manuscript, "Epigenetic Divergence during Early Stages of Speciation in an African Crater Lake Cichlid Fish" (NATECOLEVOL-210814388C). Please carefully follow the step-by-step instructions provided in the attached file, and add a response in each row of the table to indicate the changes that you have made. Please also check and comment on any additional marked-up edits we have proposed within the text. Ensuring that each point is addressed will help to ensure that your revised manuscript can be swiftly handed over to our production team. **We would like to start working on your revised paper, with all of the requested files and forms, as soon as possible (preferably within two weeks). Please get in contact with us immediately if you anticipate it taking more than two weeks to submit these revised files.** When you upload your final materials, please include a point-by-point response to any remaining reviewer comments.
If you have not done so already, please alert us to any related manuscripts from your group that are under consideration or in press at other journals, or are being written up for submission to other journals (see: https://www.nature.com/nature-research/editorial-policies/plagiarism#policy-onduplicate-publication for details).
In recognition of the time and expertise our reviewers provide to Nature Ecology & Evolution's editorial process, we would like to formally acknowledge their contribution to the external peer review of your manuscript entitled "Epigenetic Divergence during Early Stages of Speciation in an African Crater Lake Cichlid Fish". For those reviewers who give their assent, we will be publishing their names alongside the published article.
Nature Ecology & Evolution offers a Transparent Peer Review option for new original research manuscripts submitted after December 1st, 2019. As part of this initiative, we encourage our authors to support increased transparency into the peer review process by agreeing to have the reviewer comments, author rebuttal letters, and editorial decision letters published as a Supplementary item. When you submit your final files please clearly state in your cover letter whether or not you would like to participate in this initiative. Please note that failure to state your preference will result in delays in accepting your manuscript for publication.

Cover suggestions
As you prepare your final files we encourage you to consider whether you have any images or illustrations that may be appropriate for use on the cover of Nature Ecology & Evolution.
Covers should be both aesthetically appealing and scientifically relevant, and should be supplied at the best quality available. Due to the prominence of these images, we do not generally select images featuring faces, children, text, graphs, schematic drawings, or collages on our covers.
We accept TIFF, JPEG, PNG or PSD file formats (a layered PSD file would be ideal), and the image should be at least 300ppi resolution (preferably 600-1200 ppi), in CMYK colour mode.
If your image is selected, we may also use it on the journal website as a banner image, and may need to make artistic alterations to fit our journal style.
Please submit your suggestions, clearly labeled, along with your final files. We'll be in touch if more information is needed.
Nature Ecology & Evolution has now transitioned to a unified Rights Collection system which will allow our Author Services team to quickly and easily collect the rights and permissions required to publish your work. Approximately 10 days after your paper is formally accepted, you will receive an email in providing you with a link to complete the grant of rights. If your paper is eligible for Open Access, our Author Services team will also be in touch regarding any additional information that may be required to arrange payment for your article.
Please note that <i>Nature Ecology & Evolution</i> is a Transformative Journal (TJ). Authors may