On the post-glacial spread of human commensal Arabidopsis thaliana

Recent work has shown that Arabidopsis thaliana contains genetic groups originating from different ice age refugia, with one particular group comprising over 95% of the current worldwide population. In Europe, relicts of other groups can be found in local populations along the Mediterranean Sea. Here we provide evidence that these ‘relicts' occupied post-glacial Eurasia first and were later replaced by the invading ‘non-relicts', which expanded through the east–west axis of Eurasia, leaving traces of admixture in the north and south of the species range. The non-relict expansion was likely associated with human activity and led to a demographic replacement similar to what occurred in humans. Introgressed genomic regions from relicts are associated with flowering time and enriched for genes associated with environmental conditions, such as root cap development or metal ion trans-membrane transport, which suggest that admixture with locally adapted relicts helped the non-relicts colonize new habitats.

1. The interpretation seems too ready to assume selection favouring the outlier regions from the relicts without exploring alternative possibilities. For example, presumably, recombination rate affects the degree of introgression or at least the blockiness of introgression, and there are hotspots and coldspots in the genome. It therefore seems possible that the GO term analysis identified genes and GO-terms that co-vary with cold spots or hot-spots? Jumping to adaptive explanations without considering alternatives seems premature. Similarly, I think the outlier-based trends (GWAS + introgression) could plausibly be explained by pure demography rather than introgression + selection, and more care needs to be taken to recognize this possibility (see #5, below).
2. Assuming the introgression hypothesis is true, what evidence actually pegs the expansion of non-relicts to the original expansion into Europe? There have been many waves of human movement back and forth over the ages, and it seems just as likely that in increase in trade during, say, the middle ages or the industrial revolution, was responsible for the observed introgression patterns (or more likely, very complex patterns of human-associated movement). The authors seem inordinately eager to link this with the Homo sapiens vs. Neanderthal introgression story (e.g. line 170-173) and post-glacial expansion, when there is really only conjecture to support the case one way or the other. I think a lot of reframing and toning down in the Discussion is necessary to be more scientifically rigourous here. In particular, I find the paragraph from lines 336-353 is WAY too speculative. For example, saying that the "northward expansion certainly happened in Sicily and Lebanon relict" is putting too much certainty in these results. This statement arises from the result that K = 5 and is based on genotyped accessions, but other relatives that were not included in this panel could have been the source of such northern populations, and if K != 5 other patterns might be seen. Overall, the paper explores one hypothesis that is consistent with the data, but this does not mean that it is the best explanation, as many other possible hypotheses have not been tested and much data is not included here. More caveats are necessary with observational studies such as this. you identify regions of the genome that are outliers in terms of genetic distance, then count the number of outliers, you see more outliers in extreme northern and southern regions. As noted on line 200, the same pattern could be seen with isolation-by-distance, but the authors discount this because the geographic distance is longer on the east-west axis than the north-south axis. I do not find this discounting entirely convincing, because presumably as a human associated plant, it has been dispersing with human migration, which may have gone much more often east-west than north-south. Geographic distance is less appropriate than "human migration distance" (or something like that). As I mentioned above, human migration is complicated and the simplest story may not be the most accurate one. I find that there is a bit of "just-so story" here and lack of testing against other hypotheses, and I would like to see much more hedging. 4. I am not a fan of the "methods last" style of journal article that has become more popular in the past 10 or so years, but this is hardly the authors' fault. However, I think it would be helpful to have at least one paragraph summarizing the approaches that are being used, perhaps at the end of the introduction. The results section feels like a very abrupt transition. In some cases there are results that are introduced with little context for how the method work. For for example, what are "outlier haplotypes" (line 179)? I can guess, but going to the methods should only be necessary to understand the details of the method, not to understand roughly how it is working. Please try to put more context for the methods in the actual paper, as I don't think page counts are very restrictive for this journal.
5. How was the GWAS done? The methods are very sparse. What program was used? Was there a correction for population structure? If not, some consideration must be made for how co-variation between environment and underlying population structure may bias the results. The lack of detail here made it difficult to assess some of the findings. If population structure covarying with environment is driving the GWAS patterns, then presumably this could also be driving many of the patterns of covariation with blocks of extreme introgression. I think this is unlikely to explain the particularly strong single locus effect, but could be much more important for the other results. For example, on lines 150-157, "the top 25% of SNPs in terms of introgression are more likely to be associated with flowering time than are random SNPs". Assuming there has been no correction for population structure in the GWAS, couldn't this be driven by drift? The phenotype varies along a spatial gradient, as does introgression. Therefore, spurious associations between neutral alleles and phenotypes could occur just due to drift, and these would also be likely to be outliers in terms of introgression. GWAS without correction for population structure will just identify those loci that are most strongly structured in such case, so of course the same ones would be found in each analysis. I couldn't see any way that the analysis had accounted for this very important issue.
Minor comments: -line 22: the genome is a product of a dynamic equilibrium, but is not itself an equilibrium. It is a physical thing.
-this use of "genome" seems strange, as it represents all variation in the arabidopsis species. I think of a genome being a thing that each individual has, and that the species can be thought of as having a meta-genome, or just a bunch of genomic variation, but not a singular "genome".
There are several places where grammar needs polishing, for example: -line 25: evidences -> evidence -line 45: that -> those -line 66: change colon to semicolon -line 88-95: this reads more like a discussion paragraph, summarizing findings before they have even been presented (i.e. line 94-95). -I would appreciate a bit more description of how to interpret the ABBA-BABA and f3 statistics and significance, either in the results or methods. These methods are not standard enough that the statistics will be clear to a wide range of readers. What is the cutoff for significance with these tests? -line 159: define "high relict introgression" quantitatively here.
-line 302-303: "The other relicts belonged to a group that also contained non-relict accession the Mediterranean region and Sweden": Don't the other non-relicts belong to one of the other 4 groups? Which group is this referring to? This paragraph is generally kind of confusing.

Reviewer #2 (Remarks to the Author)
This manuscript from Lee et al is a beautifully pitched and presented contribution to our understanding of an interesting demographic phenomenon in an important, broadly-studied species, Arabidopsis thaliana that is rapidly becoming an interesting evolutionary model. The work focuses on demographic turnover and admixture of populations of this human commensal as (most likely) human-associated populations moved though more natural 'relict' populations in the Iberian Peninsula. The work is pitched strongly as an strong example reminiscent of early important demographic turnover events in humans and for this reason, as well as the fact that thaliana is such a broadly studied model, will surely be of broad interest. This is an important, highly original contribution and overall I have few concerns about it as presented. Overall, the data and methodology are very sound and some very creative and interesting novel analysis approaches are employed, especially the assignment of outlier haplotypes to unknown groups (ca ln 222). The conclusions are very robust and tested by several independent methods. References to earlier work are appropriate and judiciously used.
Major point: The use of QQ plots Figure 2 and S3D should be better explained for those unfamiliar, at least in the methods. It seems surprising that more commonplace enrichment testing methods are not reported for this showcased category (flowering time) and instead an approach that seems relatively obscure to this reader is provided. Can the authors better explain this? In particular, in figure 2, the bend downward in the grey significance threshold may seem very strange to the nonspecialist. Why is flowering time not showing up in Table 1? Whatever the relative powers of these two enrichment methods are, it would help the reader to understand what looks like a discrepancy. If in fact flowering time is not enriched by the analysis in Table 1, I would see this as not detracting from the important main points of the MS; it would simply help for the nonspecialist reader to understand this piece of the story.

Response to reviewers
Reviewer #1 (Remarks to the Author): The authors have presented a study of patterns of genomic variation and introgression among relict and nonrelict populations of Arabidopsis thaliana. The subject matter and findings are interesting, and the authors have used some creative methods to study these patterns. But I find that the interpretation seems to go beyond what the data warrants in several important areas, as I describe below in "Major comments" #1 & #2. I like the study of the large effect locus that is highly differentiated with high LD and the description of how this region is introgressed, but there are some issues with respect to the broader use of GWAS that made it difficult to interpret the results (#5, below). The supporting evidence from the chromosomal transposition is quite interesting and it's nice to see that it supports some of the other patterns of introgression. In light of these comments, I think the manuscript requires extensive reframing, but could be suitable for Nature Communications following such revision. It seems that many of the findings, however, are mainly confirming and extending previous results, so the novelty may not be significant enough for this journal (e.g. references 2325 previously studied the major effect locus; relict vs. nonrelict populations and humanassociated spread studied in references 10, 19, 26; identification of the transposition in references 18,29).
Thanks for this positive and constructive review. We respond to the specific comments below. Regarding novelty, the main point of our study is to investigate the detailed population history after nonrelict expansion across Eurasia, and we bring in relevant observations (some of which have indeed been published) as needed. For example, refs 2325 contain GWAS of flowering time, while we provide an evolutionary explanation for the origin of a major polymorphism identified. Similarly, we use the spatial distribution of a previously identified transposition (18, 29) to shed light on history. The observations from human (19,26) are only included to frame our findings, and have nothing to do with this study. Ref 10, finally, did look at migration history, but with very limited data, and without knowledge of the existence of relict populations.
Major comments: 1. The interpretation seems too ready to assume selection favouring the outlier regions from the relicts without exploring alternative possibilities. For example, presumably, recombination rate affects the degree of introgression or at least the blockiness of introgression, and there are hotspots and coldspots in the genome. It therefore seems possible that the GO term analysis identified genes and GOterms that covary with cold spots or hotspots? Jumping to adaptive explanations without considering alternatives seems premature. Similarly, I think the outlierbased trends (GWAS + introgression) could plausibly be explained by pure demography rather than introgression + selection, and more care needs to be taken to recognize this possibility (see #5, below).
Certainly alternative explanations should be considered, but only if they are plausible. As far as we are aware, there is no evidence that recombination is correlated with GO, and also no reason to expect it to be (in this or any other organism). There is evidence that some R genes are recombinational hotspots (Choi et al. 2016), but this is not a general pattern. We are well aware of the problems inherent in these analyses, which is why we base all conclusions on intersection between results (that are as independent as possible in population genetics). For example, the experimental GWAS results (which were of course corrected for population structure; see point 5 below) overlap with the climate associations, as well as with evidence for admixture. Of course these are not strictly independent (the same SNP data are used in all analyses), but the explanation proposed by us seems a lot simpler than any alternative that we are aware of. We have added a note to the Discussion clarifying this.
2a. Assuming the introgression hypothesis is true, what evidence actually pegs the expansion of nonrelicts to the original expansion into Europe? There have been many waves of human movement back and forth over the ages, and it seems just as likely that in increase in trade during, say, the middle ages or the industrial revolution, was responsible for the observed introgression patterns (or more likely, very complex patterns of humanassociated movement). The authors seem inordinately eager to link this with the Homo sapiens vs. Neanderthal introgression story (e.g. line 170173) and postglacial expansion, when there is really only conjecture to support the case one way or the other. I think a lot of reframing and toning down in the Discussion is necessary to be more scientifically rigorous here.
We are certainly not linking the expansion of nonrelict A. thaliana to the original expansion of anatomically modern humans at the expense of Neanderthals -the latter took place well over 50,000 years ago, much earlier than the nonrelict expansion, which is estimated to have occurred during the last 10,000 years (i.e., postglacially, as in the title). As stated in several places, we are proposing that the nonrelict expansion is linked to the spread of agriculture (or, possibly, Indoeuropeans), based on timing and the estimated pattern of spread. This is of course mere conjecture, but it is labeled as such.
The Neanderthal introgression story is simply an analogy; a device used to explain the rationale for our study. It is also clearly labeled as such.
2b. In particular, I find the paragraph from lines 336353 is WAY too speculative. For example, saying that the "northward expansion certainly happened in Sicily and Lebanon relict" is putting too much certainty in these results. This statement arises from the result that K = 5 and is based on genotyped accessions, but other relatives that were not included in this panel could have been the source of such northern populations, and if K != 5 other patterns might be seen. Overall, the paper explores one hypothesis that is consistent with the data, but this does not mean that it is the best explanation, as many other possible hypotheses have not been tested and much data is not included here. More caveats are necessary with observational studies such as this.
This paragraph and the accompanying figure (Fig. 8) are in the "Discussion" rather than "Results" for a reason: they are meant to summarize our explanation for lot of very complex data and propose the most likely model. However, we agree that the tone is wrong, and we have rewritten it to more clearly label it as one model, and make sure readers understand that other models are possible.
In this study, we have clearly considered other hypotheses and tried our best to rule them out. For example, line 213 to 235, and the whole analysis for Figure 4 was designed to answer whether northern outlier haplotypes descended from southern relicts or from unknown northern relicts. The results support the former. Regarding the different K values, one may increase the K value until one finds north is different from south, but in Materials and Methods we have stated why K=5 is a good choice. As we added in the manuscript, K = 6 and 7 have similar pattern as K = 5.
3. While the explanation for the pattern observed in Figure 3C seems consistent with the data (line 186195), other explanations might also make sense. The fundamental pattern here is that when you identify regions of the genome that are outliers in terms of genetic distance, then count the number of outliers, you see more outliers in extreme northern and southern regions. As noted on line 200, the same pattern could be seen with isolationbydistance, but the authors discount this because the geographic distance is longer on the eastwest axis than the northsouth axis. I do not find this discounting entirely convincing, because presumably as a human associated plant, it has been dispersing with human migration, which may have gone much more often eastwest than northsouth. Geographic distance is less appropriate than "human migration distance" (or something like that). As I mentioned above, human migration is complicated and the simplest story may not be the most accurate one. I find that there is a bit of "justso story" here and lack of testing against other hypotheses, and I would like to see much more hedging.
We feel that we have tried to rule out the most obvious alternative hypotheses. For example, while it is correct that simple northsouth isolation by distance can cause the pattern that more outlier haplotypes are observed in the northern and southern end of species range, such a model would also lead to major population structure separating north from south, rather than east from west as observed by all studies. Furthermore, under this northsouth isolationbydistance scenario one would expect outlier haplotypes in the north to be very genetically distant from southern outlier haplotypes. But as our further analyses showed ( Figure  4), a large proportion of northern outlier haplotypes are similar to those identified in the south. These results do not support the northsouth isolationbydistance scenario. We have added this to the manuscript.
On general level, we agree that you can never be certain about these historical models, and we have made changes throughout to clarify this. However, we think we have considerable evidence for a very interesting model that will likely guide further research, and that is the purpose of models. 4. I am not a fan of the "methods last" style of journal article that has become more popular in the past 10 or so years, but this is hardly the authors' fault. However, I think it would be helpful to have at least one paragraph summarizing the approaches that are being used, perhaps at the end of the introduction. The results section feels like a very abrupt transition. In some cases there are results that are introduced with little context for how the method work. For for example, what are "outlier haplotypes" (line 179)? I can guess, but going to the methods should only be necessary to understand the details of the method, not to understand roughly how it is working. Please try to put more context for the methods in the actual paper, as I don't think page counts are very restrictive for this journal.
We agree, and have added descriptions in the appropriate parts of the results. 5. How was the GWAS done? The methods are very sparse. What program was used? Was there a correction for population structure? If not, some consideration must be made for how covariation between environment and underlying population structure may bias the results. The lack of detail here made it difficult to assess some of the findings. If population structure covarying with environment is driving the GWAS patterns, then presumably this could also be driving many of the patterns of covariation with blocks of extreme introgression. I think this is unlikely to explain the particularly strong single locus effect, but could be much more important for the other results. For example, on lines 150157, "the top 25% of SNPs in terms of introgression are more likely to be associated with flowering time than are random SNPs". Assuming there has been no correction for population structure in the GWAS, couldn't this be driven by drift? The phenotype varies along a spatial gradient, as does introgression. Therefore, spurious associations between neutral alleles and phenotypes could occur just due to drift, and these would also be likely to be outliers in terms of introgression. GWAS without correction for population structure will just identify those loci that are most strongly structured in such case, so of course the same ones would be found in each analysis. I couldn't see any way that the analysis had accounted for this very important issue.
We used a mixed linear model with a kinship term (Korte et al., Nature Genetics , 2012) to correct for population structure in all GWAS. This information has been added. We apologize for the omission: correcting for population structure is absolutely essential, and we always do it. Evidently it was so obvious to us that we neglected to mention it! We believe this correction addresses the remaining comments.
Minor comments: 6. line 22: the genome is a product of a dynamic equilibrium, but is not itself an equilibrium. It is a physical thing. this use of "genome" seems strange, as it represents all variation in the arabidopsis species. I think of a genome being a thing that each individual has, and that the species can be thought of as having a metagenome, or just a bunch of genomic variation, but not a singular "genome".
We have rephrased this part.
7. There are several places where grammar needs polishing, for example: line 25: evidences > evidence line 45: that > those line 66: change colon to semicolon All corrected.
8. line 8895: this reads more like a discussion paragraph, summarizing findings before they have even been presented (i.e. line 9495).
Here we mention North American accessions being introduced by humans in historical times (in the last few hundred years). This is the result from other studies, not our conclusion from this paper. We simply mention this to justify why we only use samples from Eurasia. In this paper we deal with prehistorical patterns of dispersal in Eurasia, not recent human assisted migrations to other continents. 9. Figure  Arabidopsis thaliana is highly selfing and there is essentially no heterozygosity in the analyzed individuals.
10. I would appreciate a bit more description of how to interpret the ABBABABA and f3 statistics and significance, either in the results or methods. These methods are not standard enough that the statistics will be clear to a wide range of readers. What is the cutoff for significance with these tests?
We added some explanation in the Results section while leaving detailed information in Materials and Methods. 11. line 159: define "high relict introgression" quantitatively here.
We used the same criteria (top 25% windows with highest relict introgression) as the SNPbased analysis in the previous paragraph. We added this information. 12. line 302303: "The other relicts belonged to a group that also contained nonrelict accession the Mediterranean region and Sweden": Don't the other nonrelicts belong to one of the other 4 groups? Which group is this referring to? This paragraph is generally kind of confusing.
We are referring to the orange haplogroup in Figure 6A. We have modified this paragraph to make the context more clear.

Reviewer #2 (Remarks to the Author):
This manuscript from Lee et al is a beautifully pitched and presented contribution to our understanding of an interesting demographic phenomenon in an important, broadlystudied species, Arabidopsis thaliana that is rapidly becoming an interesting evolutionary model. The work focuses on demographic turnover and admixture of populations of this human commensal as (most likely) humanassociated populations moved though more natural 'relict' populations in the Iberian Peninsula. The work is pitched strongly as an strong example reminiscent of early important demographic turnover events in humans and for this reason, as well as the fact that thaliana is such a broadly studied model, will surely be of broad interest. This is an important, highly original contribution and overall I have few concerns about it as presented. Overall, the data and methodology are very sound and some very creative and interesting novel analysis approaches are employed, especially the assignment of outlier haplotypes to unknown groups (ca ln 222). The conclusions are very robust and tested by several independent methods. References to earlier work are appropriate and judiciously used.
Major point: The use of QQ plots Figure 2 and S3D should be better explained for those unfamiliar, at least in the methods. It seems surprising that more commonplace enrichment testing methods are not reported for this showcased category (flowering time) and instead an approach that seems relatively obscure to this reader is provided. Can the authors better explain this? In particular, in figure 2, the bend downward in the grey significance threshold may seem very strange to the nonspecialist. Why is flowering time not showing up in Table 1? Whatever the relative powers of these two enrichment methods are, it would help the reader to understand what looks like a discrepancy. If in fact flowering time is not enriched by the analysis in Table 1, I would see this as not detracting from the important main points of the MS; it would simply help for the nonspecialist reader to understand this piece of the story.
QQ plots are a standard way of comparing two distributions. There are of course many ways of testing for difference, but most rely on a single summary (e.g. difference in mean) and furthermore assume independent observations. Because of linkage disequilibrium, these pvalues are anything but (they are spatially autocorrelated along the chromosomes). For this reason we use a permutation scheme that maintains the SNP positions, but "rotates" the annotation vectors (GWAS pvalues vs high/low introgression statues). We suspect that these autocorrelations may be responsible for the downturn in significance threshold, but confess that we have not researched this further. Similar deviations are often seen in GWAS QQplots. We have added this explanation to the Methods -thanks for pointing it out.
As for why flowering time does show up in Table 1, the simplest explanation is that the category as a whole is not overrepresented, just small subset that actually harbors polymorphism involved in flowering time variation. Our analyses used standard GO categories, not ad hoc lists of known flowering time genes.