Reply to: Revisiting the origin of octoploid strawberry

1Department of Horticulture, Michigan State University, East Lansing, MI, USA. 2Ecology, Evolutionary Biology and Behavior, Michigan State University, East Lansing, MI, USA. 3Department of Biological Sciences, University of Alabama, Tuscaloosa, AL, USA. 4Department of Plant Biology, Michigan State University, East Lansing, MI, USA. 5Department of Plant Sciences, University of California–Davis, Davis, CA, USA. 6School of Agriculture, Yunnan University, Kunming, China. 7College of Chinese Material Medica, Yunnan University of Chinese Medicine, Kunming, China. 8These authors contributed equally: Patrick P. Edger, Michael R. McKain. *e-mail: edgerpat@msu.edu; qiaoqin@ynu.edu.cn; zhangticao@mail.kib.ac.cn

The origin of octoploid strawberry has been the focus of several phylogenetic studies over the past decade (for example, refs. [1–3]). Our previous study, using the octoploid genome and transcriptomes of every extant diploid Fragaria species, provided support for four species (Fragaria vesca, Fragaria iinumae, Fragaria viridis and Fragaria nipponica) as the closest extant relatives of the diploids that contributed to the origin of octoploid strawberry [4]. In a response paper [5], Liston et al. stated "that only two extant diploids were progenitors", with one subgenome contributed by F. vesca and three by F. iinumae-like ancestors. Our reanalysis of the transcriptome data and comparative genomic analyses of a chromosome-scale F. iinumae genome support our previous model for the origin of octoploid strawberry [4].
Liston et al. [5] raised a concern regarding one of the steps in the phylogenetic analysis of the subgenome tree-searching algorithm (PhyDS) tool we developed to identify extant relatives of diploid progenitors of allopolyploids. Specifically, they argue that we may have incorrectly identified F. viridis and F. nipponica as extant relatives because in-paralogs were excluded from our previous phylogenetic analysis [4]. Our reanalysis of the data using PhyDS, now including  (Fig. 2). PhyDS searches a set of gene trees to identify the sequences most closely related to a set of user-provided paralogs (or homoeologs in polyploids). Homoeologs are orthologous genes that were brought back into the same nucleus by allopolyploidization [6]. For our analyses, we used syntenic (that is, positionally conserved) homoeologs that were present on all subgenomes in octoploid strawberry. Gene trees were estimated using RAxML [7] based on orthologs identified using established orthogrouping approaches [8] applied to de novo assembled transcriptomes for each diploid Fragaria species [4]. PhyDS performs a relatively simple and straightforward analysis of gene trees. First, it identifies the user-provided paralog present in a gene tree and moves to the direct ancestral node of that paralog. Second, it returns to the user the direct descendants of that ancestral node (that is, the sequence identities, including the paralog) together with the node's bootstrap support value (Fig. 1).
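The two steps described above can be illustrated with a minimal Python sketch. This is not the PhyDS implementation (which is available from the repository cited under Data analysis); the `Node` class, the function names and the toy gene tree with its labels and support values are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass(eq=False)
class Node:
    """Minimal rooted-tree node (hypothetical; not the PhyDS data model)."""
    name: Optional[str] = None        # leaf label; None for internal nodes
    support: Optional[float] = None   # bootstrap support of an internal node
    children: List["Node"] = field(default_factory=list)
    parent: Optional["Node"] = field(default=None, repr=False)

    def add(self, child: "Node") -> "Node":
        child.parent = self
        self.children.append(child)
        return self


def leaves(node: Node) -> List[str]:
    """Collect the labels of all leaves below (or at) a node."""
    if not node.children:
        return [node.name]
    found: List[str] = []
    for child in node.children:
        found.extend(leaves(child))
    return found


def find_leaf(node: Node, name: str) -> Optional[Node]:
    """Depth-first search for the leaf carrying a given label."""
    if not node.children:
        return node if node.name == name else None
    for child in node.children:
        hit = find_leaf(child, name)
        if hit is not None:
            return hit
    return None


def sister_clade(root: Node, paralog: str) -> Optional[Tuple[List[str], Optional[float]]]:
    """Step 1: locate the paralog and move to its direct ancestral node.
    Step 2: return that node's descendants (including the paralog)
    together with the node's bootstrap support."""
    leaf = find_leaf(root, paralog)
    if leaf is None or leaf.parent is None:
        return None
    ancestor = leaf.parent
    return sorted(leaves(ancestor)), ancestor.support


# Toy gene tree (invented labels and support values):
# ((octoploid_homoeolog_B, F_iinumae)95, (F_vesca, F_viridis)80)
clade_a = Node(support=95).add(Node("octoploid_homoeolog_B")).add(Node("F_iinumae"))
clade_b = Node(support=80).add(Node("F_vesca")).add(Node("F_viridis"))
root = Node().add(clade_a).add(clade_b)

members, support = sister_clade(root, "octoploid_homoeolog_B")
print(members, support)
```

In this toy tree the query homoeolog groups with the F_iinumae sequence, so the returned clade contains those two labels and the support value of their shared node.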
We have two major concerns regarding the methods used in refs. [2,5]. First, phylogenetic analyses aimed at estimating species relationships rely first on the correct identification of orthologs [9]. These authors identified putative orthologs using a sequence similarity-based approach, which has relatively high error rates [10]. Furthermore, pangenome studies have shown that up to one-half of gene content exhibits presence-absence variation at the species level in plants [11]. In other words, many genes are individual- or population-specific. Thus, many of the putative ortholog predictions in their studies may be inaccurate. Second, Liston et al. [5] performed analyses of 100-kb windows across each of the seven base chromosomes. This could be problematic because chromosomal regions from one parental species can be replaced with chromosomal regions from the other parental species during meiosis in polyploids (referred to as homoeologous exchanges [12]). Homoeologous exchanges can range in size from large megabase-sized regions to single genes (see a recent review on their impact on subgenome assignment in ref. [13]). We identified extensive homoeologous exchanges throughout the octoploid strawberry genome [4]. Thus, the 100-kb windows used by Liston et al. may contain genes with different evolutionary histories, reflecting each of the different progenitor species. This could result in inaccurate estimates of species relationships.
Here we present a chromosome-scale genome of F. iinumae with a scaffold N50 (the minimum scaffold length needed to cover 50% of the genome) of 33.98 Mb and 23,665 protein-coding genes (see Supplementary Information). This genome was used to calculate the synonymous substitution (Ks) divergence between F. iinumae and each of the four subgenomes (Fig. 2a). This revealed that only one of the subgenomes of octoploid strawberry is F. iinumae-like, which does not support the model presented by Liston et al. [5] that the origin of octoploid strawberry involved three F. iinumae-like and one F. vesca-like progenitor species. Instead, these results are consistent with our phylogenetic estimates supporting more than two diploid progenitors (Fig. 2b-d). The F. viridis (Fig. 2c) and F. nipponica (Fig. 2d) subgenomes are not F. iinumae-like.
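For readers unfamiliar with the scaffold N50 statistic reported above, the following short Python sketch shows how it is conventionally computed from a list of scaffold lengths. The function name and the example lengths are illustrative assumptions and are unrelated to the F. iinumae assembly.

```python
def n50(lengths):
    """Return the N50: the length of the shortest scaffold in the smallest
    set of longest scaffolds that together cover at least 50% of the
    total assembly length."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("no scaffold lengths given")


# Invented example: four scaffolds totalling 100 units.
# The two longest (40 + 30 = 70) are the first to reach half of the total.
example_lengths = [40, 30, 20, 10]
print(n50(example_lengths))
```

With these invented lengths the longest scaffold alone covers 40 of 100 units, so the second-longest scaffold (30) is the one that brings the cumulative length past the halfway point.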
Our new phylogenetic analyses support four distinct progenitor species, which is consistent with our previous results [4] and those of other groups [3]. The conflicting results obtained by Liston et al. [5] are probably due to differences in methodology. As pointed out above, establishing gene orthology is crucial for molecular phylogenetics. Our pipeline started by identifying high-confidence syntenic 1:1 homoeologs present on each of the subgenomes. This step alone filtered out 82.1% of genes from the octoploid strawberry genome [4]. The number of genes analyzed in our study was further reduced by absence across transcriptome data, stringent orthogroup filtering and bootstrap value filtering. In short, more data are not always better if they introduce 'phylogenetic noise'. It is unclear to us how Liston et al. [5] obtained high unique mapping rates (~89% alignment) across the F. vesca genome, which consists of ~31% transposable elements and hundreds of duplicate genes. Furthermore, many genes are species-specific based on previous pangenome studies.
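The bootstrap value filtering mentioned above can be sketched as follows. This is a hypothetical illustration only: the record layout, the subgenome labels, the 80% threshold and the toy records are assumptions made for the example and are not values or results from our study.

```python
from collections import Counter


def tally_sisters(records, min_support=80.0):
    """Count, per subgenome, how often each diploid taxon is recovered as
    the closest relative, ignoring gene trees whose relevant node falls
    below the bootstrap threshold (threshold value is an assumption)."""
    counts = {}
    for subgenome, sister_taxon, support in records:
        if support < min_support:
            continue  # poorly supported node: discard as phylogenetic noise
        counts.setdefault(subgenome, Counter())[sister_taxon] += 1
    return counts


# Toy records (invented): (subgenome label, sister taxon, bootstrap support)
toy_records = [
    ("sub1", "F_vesca", 98.0),
    ("sub1", "F_vesca", 91.0),
    ("sub1", "F_viridis", 55.0),   # dropped by the support filter
    ("sub2", "F_iinumae", 87.0),
    ("sub2", "F_nipponica", 82.0),
]

for subgenome, tally in sorted(tally_sisters(toy_records).items()):
    print(subgenome, dict(tally))
```

Filtering before counting means that weakly supported groupings do not contribute to the per-subgenome tallies, at the cost of analysing fewer gene trees.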
As pointed out by Liston et al. [5], incomplete lineage sorting can impact phylogenetic inferences. However, that is far more likely to impact within-species than between-species estimates. This is exactly what was observed in our study. Other F. vesca subspecies were identified as contributors but were present at notably lower levels than F. viridis and F. nipponica (Fig. 1a). These patterns provide further support for F. viridis and F. nipponica as extant relatives of the progenitors that contributed to the origin of the intermediate hexaploid ancestor. Lastly, we did state that F. moschata may be an extant relative of the intermediate hexaploid ancestor. Given the high frequency of polyploid formation in Fragaria [14] and birth-death dynamics of polyploids [15], we agree it is possible that the hexaploid ancestor may be extinct. This remains to be properly evaluated using robust phylogenetic approaches and datasets.

Online content
Any methods, additional references, Nature Research reporting summaries, source data, extended data, supplementary information, acknowledgements, peer review information; details of author contributions and competing interests; and statements of data and code availability are available at https://doi.org/10.1038/s41588-019-0544-2.


Software and code
Data collection
All commercial DNA and RNA sequencing platforms used in this study are fully described.

Data analysis
All software used in this study for data analysis is fully described, including the versions used. All custom software developed for this study has already been deposited on GitHub with weblinks (e.g. PhyDS; https://github.com/mrmckain/PhyDS/).

Data
Raw sequencing data are available under BioProject accessions PRJNA544784 and PRJNA508389 in the Sequence Read Archive at NCBI (https://www.ncbi.nlm.nih.gov/sra). Genome assemblies and annotations are available on NCBI GenBank (https://www.ncbi.nlm.nih.gov/genome) under the same BioProjects and on the Genome Database for Rosaceae (https://www.rosaceae.org/).