Strawberry is an emerging model for studying polyploid genome evolution and rapid domestication of fruit crops. Here we report haplotype-resolved genomes of two wild octoploids (Fragaria chiloensis and Fragaria virginiana), the progenitor species of cultivated strawberry. Substantial variation is identified between species and between haplotypes. We redefine the four subgenomes and track the genetic contributions of diploid species by additional sequencing of the diploid F. nipponica genome. We provide multiple lines of evidence that F. vesca and F. iinumae, rather than other described extant species, are the closest living relatives of these wild and cultivated octoploids. In response to coexistence with quadruplicate gene copies, the octoploid strawberries have experienced subgenome dominance, homoeologous exchanges and coordinated expression of homoeologous genes. However, some homoeologues have substantially altered expression bias after speciation and during domestication. These findings enhance our understanding of the origin, genome evolution and domestication of strawberries.
All the raw genome sequencing data have been submitted to the National Genomics Data Center (https://ngdc.cncb.ac.cn/), and the accession number is CRA005392. All the genome assemblies reported in this paper have been deposited in the Genome Warehouse of the National Genomics Data Center (https://ngdc.cncb.ac.cn/gwh), and the accession numbers are GWHDEDQ00000000 (F. chiloensis), GWHDEDR00000000 (F. virginiana) and GWHDEDN00000000 (F. nipponica). All the genome assembly and annotation files are also available in the Genome Database for Rosaceae (GDR) (https://www.rosaceae.org/Analysis/16216791,16216792,16216793).
Soltis, P. S. & Soltis, D. E. Polyploidy and Genome Evolution (Springer, 2012).
Chen, J. Z. & Birchler, J. A. Polyploid and Hybrid Genomics (Wiley-Blackwell, 2013).
Ye, C. Y. et al. The genomes of the allohexaploid Echinochloa crus-galli and its progenitors provide insights into polyploidization-driven adaptation. Mol. Plant 13, 1298–1310 (2020).
Osborn, T. C. et al. Understanding mechanisms of novel gene expression in polyploids. Trends Genet. 19, 141–147 (2003).
Comai, L. The advantages and disadvantages of being polyploid. Nat. Rev. Genet. 6, 836–846 (2005).
Michael, T. P. & VanBuren, R. Building near-complete plant genomes. Curr. Opin. Plant Biol. 54, 26–33 (2020).
Koren, S. et al. De novo assembly of haplotype-resolved genomes with trio binning. Nat. Biotechnol. 36, 1174–1182 (2018).
Campoy, J. A. et al. Gamete binning: chromosome-level and haplotype-resolved genome assembly enabled by high-throughput single-cell sequencing of gamete genomes. Genome Biol. 21, 306 (2020).
Wenger, A. M. et al. Accurate circular consensus long-read sequencing improves variant detection and assembly of a human genome. Nat. Biotechnol. 37, 1155–1162 (2019).
Hon, T. et al. Highly accurate long-read HiFi sequencing data for five complex genomes. Sci. Data 7, 399 (2020).
Mascher, M. et al. Long-read sequence assembly: a technical evaluation in barley. Plant Cell 33, 1888–1906 (2021).
Zhou, Q. et al. Haplotype-resolved genome analyses of a heterozygous diploid potato. Nat. Genet. 52, 1018–1023 (2020).
Sun, X. et al. Phased diploid genome assemblies and pan-genomes provide insights into the genetic history of apple domestication. Nat. Genet. 52, 1423–1432 (2020).
Chen, H. et al. Allele-aware chromosome-level genome assembly and efficient transgene-free genome editing for the autotetraploid cultivated alfalfa. Nat. Commun. 11, 2494 (2020).
Folta, K. M. & Davis, T. M. Strawberry genes and genomics. Crit. Rev. Plant Sci. 25, 399–415 (2006).
Hummer, K. E. & Hancock, J. Strawberry genomics: botanical history, cultivation, traditional breeding, and new technologies. In Genetics and Genomics of Rosaceae (eds Folta, K. M. & Gardiner, S. E.) 413–435 (Springer, 2009).
Qiao, Q. et al. Evolutionary history and pan-genome dynamics of strawberry (Fragaria spp.). Proc. Natl Acad. Sci. USA 118, e2105431118 (2021).
Liston, A., Cronn, R. & Ashman, T. L. Fragaria: a genus with deep historical roots and ripe for evolutionary and ecological insights. Am. J. Bot. 101, 1686–1699 (2014).
Njuguna, W., Liston, A., Cronn, R., Ashman, T. L. & Bassil, N. Insights into phylogeny, sex function and age of Fragaria based on whole chloroplast genome sequencing. Mol. Phylogenet. Evol. 66, 17–29 (2013).
Whitaker, V. M. et al. A roadmap for research in octoploid strawberry. Hortic. Res. 7, 33 (2020).
Moyano-Cañete, E. et al. FaGAST2, a strawberry ripening-related gene, acts together with FaGAST1 to determine cell size of the fruit receptacle. Plant Cell Physiol. 54, 218–236 (2013).
Gaston, A. et al. The FveFT2 florigen/FveTFL1 antiflorigen balance is critical for the control of seasonal flowering in strawberry while FveFT3 modulates axillary meristem fate and yield. N. Phytol. 232, 372–387 (2021).
Hirakawa, H. et al. Dissection of the octoploid strawberry genome by deep sequencing of the genomes of Fragaria species. DNA Res. 21, 169–181 (2014).
Hirsch, C. N. & Buell, C. R. Tapping the promise of genomics in species with complex, nonmodel genomes. Annu. Rev. Plant Biol. 64, 89–110 (2013).
Hardigan, M. A. et al. Genome synteny has been conserved among the octoploid progenitors of cultivated strawberry over millions of years of evolution. Front. Plant Sci. 10, 1789 (2020).
Edger, P. P. et al. Origin and evolution of the octoploid strawberry genome. Nat. Genet. 51, 541–547 (2019).
Liston, A. et al. Revisiting the origin of octoploid strawberry. Nat. Genet. 52, 2–4 (2020).
Feng, C. et al. Tracing the diploid ancestry of the cultivated octoploid strawberry. Mol. Biol. Evol. 38, 478–485 (2021).
Zhang, X., Zhang, S., Zhao, Q., Ming, R. & Tang, H. Assembly of allele-aware, chromosomal-scale autopolyploid genomes based on Hi-C data. Nat. Plants 5, 833–845 (2019).
Edger, P. P. et al. Single-molecule sequencing and optical mapping yields an improved genome of woodland strawberry (Fragaria vesca) with chromosome-scale contiguity. GigaScience 7, 1–7 (2018).
Rhie, A., Walenz, B. P., Koren, S. & Phillippy, A. M. Merqury: reference-free quality, completeness, and phasing assessment for genome assemblies. Genome Biol. 21, 245 (2020).
Abou Saada, O. et al. nPhase: an accurate and contiguous phasing method for polyploids. Genome Biol. 22, 126 (2021).
Hardigan, M. A. et al. Unraveling the complex hybrid ancestry and domestication history of cultivated strawberry. Mol. Biol. Evol. 38, 2285–2305 (2021).
Tennessen, J. A., Govindarajulu, R., Ashman, T. L. & Liston, A. Evolutionary origins and dynamics of octoploid strawberry subgenomes revealed by dense targeted capture linkage maps. Genome Biol. Evol. 6, 3295–3313 (2014).
Session, A. M. & Rokhsar, D. S. Transposon signatures of allopolyploid genome evolution. Nat. Commun. 14, 3180 (2023).
Mitros, T. et al. Genome biology of the paleotetraploid perennial biomass crop Miscanthus. Nat. Commun. 11, 5442 (2020).
Edger, P. P. et al. Reply to: Revisiting the origin of octoploid strawberry. Nat. Genet. 52, 5–7 (2020).
Zhang, J. et al. The high-quality genome of diploid strawberry (Fragaria nilgerrensis) provides new insights into anthocyanin accumulation. Plant Biotechnol. J. 18, 1908–1924 (2020).
Wei, N., Tennessen, J. A., Liston, A. & Ashman, T. L. Present-day sympatry belies the evolutionary origin of a high-order polyploid. N. Phytol. 216, 279–290 (2017).
Zhang, X., Wu, R., Wang, Y., Yu, J. & Tang, H. Unzipping haplotypes in diploid and polyploid genomes. Comput. Struct. Biotechnol. J. 18, 66–72 (2019).
Della Coletta, R., Qiu, Y., Ou, S., Hufford, M. B. & Hirsch, C. N. How the pan-genome is changing crop genomics and improvement. Genome Biol. 22, 3 (2021).
Hancock, J. F. & Bringhurst, R. S. Evolution of California populations of diploid and octoploid Fragaria (Rosaceae): a comparison. Am. J. Bot. 68, 1–5 (1981).
Harrison, R. E., Luby, J. J., Furnier, G. R. & Hancock, J. F. Morphological and molecular variation among populations of octoploid Fragaria virginiana and F. chiloensis (Rosaceae) from North America. Am. J. Bot. 84, 612–620 (1997).
Qu, M. et al. Karyotypic stability of Fragaria (strawberry) species revealed by cross-species chromosome painting. Chromosome Res. 29, 285–300 (2021).
Chen, Z. J. et al. Genomic diversifications of five Gossypium allopolyploid species and their impact on cotton improvement. Nat. Genet. 52, 525–533 (2020).
Hancock, J. F. et al. Reconstruction of the strawberry, Fragaria × ananassa, using genotypes of F. virginiana and F. chiloensis. HortScience 45, 1006–1013 (2010).
Nakashima, K. & Yamaguchi-Shinozaki, K. ABA signaling in stress-response and seed development. Plant Cell Rep. 32, 959–970 (2013).
Li, J. et al. Research advances of MYB transcription factors in plant stress resistance and breeding. Plant Signal. Behav. 14, 1613131 (2019).
Marçais, G. & Kingsford, C. A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics 27, 764–770 (2011).
Vurture, G. W. et al. GenomeScope: fast reference-free genome profiling from short reads. Bioinformatics 33, 2202–2204 (2017).
Ranallo-Benavidez, T. R. et al. GenomeScope 2.0 and Smudgeplot for reference-free profiling of polyploid genomes. Nat. Commun. 11, 1432 (2020).
Cheng, H. et al. Haplotype-resolved assembly of diploid genomes without parental data. Nat. Biotechnol. 40, 1332–1335 (2022).
Alonge, M. et al. RaGOO: fast and accurate reference-guided scaffolding of draft genomes. Genome Biol. 20, 224 (2019).
Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows–Wheeler transform. Bioinformatics 25, 1754–1760 (2009).
Durand, N. C. et al. Juicebox provides a visualization system for Hi-C contact maps with unlimited zoom. Cell Syst. 3, 99–101 (2016).
Koren, S. et al. Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation. Genome Res. 27, 722–736 (2017).
Vaser, R., Sović, I., Nagarajan, N. & Šikić, M. Fast and accurate de novo genome assembly from long uncorrected reads. Genome Res. 27, 737–746 (2017).
Walker, B. J. et al. Pilon: an integrated tool for comprehensive microbial variant detection and genome assembly improvement. PLoS ONE 9, e112963 (2014).
Burton, J. N. et al. Chromosome-scale scaffolding of de novo genome assemblies based on chromatin interactions. Nat. Biotechnol. 31, 1119–1125 (2013).
Dierckxsens, N., Mardulyn, P. & Smits, G. NOVOPlasty: de novo assembly of organelle genomes from whole genome data. Nucleic Acids Res. 45, e18 (2017).
Nurk, S. et al. HiCanu: accurate assembly of segmental duplications, satellites, and allelic variants from high-fidelity long reads. Genome Res. 30, 1291–1305 (2020).
Rhie, A. et al. Towards complete and error-free genome assemblies of all vertebrate species. Nature 592, 737–746 (2021).
Ou, S., Chen, J. & Jiang, N. Assessing genome assembly quality using the LTR Assembly Index (LAI). Nucleic Acids Res. 46, e126 (2018).
Ou, S. & Jiang, N. LTR_retriever: a highly accurate and sensitive program for identification of long terminal repeat retrotransposons. Plant Physiol. 176, 1410–1422 (2018).
Ou, S. & Jiang, N. LTR_FINDER_parallel: parallelization of LTR_FINDER enabling rapid identification of long terminal repeat retrotransposons. Mob. DNA 10, 48 (2019).
Ellinghaus, D., Kurtz, S. & Willhoeft, U. LTRharvest, an efficient and flexible software for de novo detection of LTR retrotransposons. BMC Bioinform. 9, 18 (2008).
Langdon, Q. K., Peris, D., Kyle, B. & Hittinger, C. T. sppIDer: a species identification tool to investigate hybrid genomes with high-throughput sequencing. Mol. Biol. Evol. 35, 2835–2849 (2018).
Li, H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34, 3094–3100 (2018).
Guan, D. et al. Identifying and removing haplotypic duplication in primary genome assemblies. Bioinformatics 36, 2896–2898 (2020).
Benson, G. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res. 27, 573–580 (1999).
Simão, F. A., Waterhouse, R. M., Ioannidis, P., Kriventseva, E. V. & Zdobnov, E. M. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics 31, 3210–3212 (2015).
Tarailo‐Graovac, M. & Chen, N. Using RepeatMasker to identify repetitive elements in genomic sequences. Curr. Protoc. Bioinform. 25, 4.10.1–4.10.14 (2009).
Hoff, K. J., Lomsadze, A., Borodovsky, M. & Stanke, M. Whole-genome annotation with BRAKER. Methods Mol. Biol. 1962, 65–95 (2019).
Chen, S., Zhou, Y., Chen, Y. & Gu, J. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics 34, i884–i890 (2018).
Haas, B. J. et al. De novo transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis. Nat. Protoc. 8, 1494–1512 (2013).
Cheng, C. Y. et al. Araport11: a complete reannotation of the Arabidopsis thaliana reference genome. Plant J. 89, 789–804 (2017).
Li, Y., Pi, M., Gao, Q., Liu, Z. & Kang, C. Updated annotation of the wild strawberry Fragaria vesca V4 genome. Hortic. Res. 6, 61 (2019).
Raymond, O. et al. The Rosa genome provides new insights into the domestication of modern roses. Nat. Genet. 50, 772–777 (2018).
Zhang, L. et al. A high-quality apple genome assembly reveals the association of a retrotransposon and red fruit colour. Nat. Commun. 10, 1494 (2019).
Holt, C. & Yandell, M. MAKER2: an annotation pipeline and genome-database management tool for second-generation genome projects. BMC Bioinform. 12, 491 (2011).
Eddy, S. R. Accelerated profile HMM searches. PLoS Comput. Biol. 7, e1002195 (2011).
Marçais, G. et al. MUMmer4: a fast and versatile genome alignment system. PLoS Comput. Biol. 14, e1005944 (2018).
Goel, M., Sun, H., Jiao, W. B. & Schneeberger, K. SyRI: finding genomic rearrangements and local sequence differences from whole-genome assemblies. Genome Biol. 20, 277 (2019).
McKenna, A. et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20, 1297–1303 (2010).
Danecek, P. et al. The variant call format and VCFtools. Bioinformatics 27, 2156–2158 (2011).
Purcell, S. et al. PLINK: a tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 81, 559–575 (2007).
Price, M. N. et al. FastTree: computing large minimum evolution trees with profiles instead of a distance matrix. Mol. Biol. Evol. 26, 1641–1650 (2009).
Jia, K. H. et al. SubPhaser: a robust allopolyploid subgenome phasing method based on subgenome-specific k-mers. N. Phytol. 235, 801–809 (2022).
Emms, D. M. & Kelly, S. OrthoFinder: phylogenetic orthology inference for comparative genomics. Genome Biol. 20, 238 (2019).
Buti, M. et al. The genome sequence and transcriptome of Potentilla micrantha and their comparison to Fragaria vesca (the woodland strawberry). GigaScience 7, 1–14 (2018).
Edgar, R. C. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 32, 1792–1797 (2004).
Suyama, M., Torrents, D. & Bork, P. PAL2NAL: robust conversion of protein sequence alignments into the corresponding codon alignments. Nucleic Acids Res. 34, W609–W612 (2006).
Castresana, J. Selection of conserved blocks from multiple alignments for their use in phylogenetic analysis. Mol. Biol. Evol. 17, 540–552 (2000).
Nguyen, L. T., Schmidt, H. A., von Haeseler, A. & Minh, B. Q. IQ-TREE: a fast and effective stochastic algorithm for estimating maximum-likelihood phylogenies. Mol. Biol. Evol. 32, 268–274 (2015).
Mirarab, S. et al. ASTRAL: genome-scale coalescent-based species tree estimation. Bioinformatics 30, i541–i548 (2014).
Fan, H., Ives, A. R., Surget-Groba, Y. & Cannon, C. H. An assembly and alignment-free method of phylogeny reconstruction from next-generation sequencing data. BMC Genomics 16, 522 (2015).
Kim, D., Paggi, J. M., Park, C., Bennett, C. & Salzberg, S. L. Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype. Nat. Biotechnol. 37, 907–915 (2019).
Pertea, M. et al. StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Nat. Biotechnol. 33, 290–295 (2015).
Ramírez-González, R. H. et al. The transcriptional landscape of polyploid wheat. Science 361, eaar6089 (2018).
Cantalapiedra, C. P. et al. eggNOG-mapper v2: functional annotation, orthology assignments, and domain prediction at the metagenomic scale. Mol. Biol. Evol. 38, 5825–5829 (2021).
This study was financially supported by the National Key Research and Development Program of China (grant no. 2018YFD1000107), CAS Pioneer Hundred Talents, and the open research project of the ‘Cross-Cooperative Team’ of the Germplasm Bank of Wild Species to A.Z., by the University of Nebraska–Lincoln to J.P.M., and by the National Science Foundation of China (grant no. 31860534) to J.R.
The authors declare no competing interests.
Peer review information
Nature Plants thanks Andrew H. Paterson, John Lovell and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
(a) Morphological features of F. chiloensis, F. virginiana, F. nipponica and the cultivar ‘Camarosa’. Scale bar, 1 cm. (b) Seedlings of F. chiloensis, F. virginiana and F. nipponica in the greenhouse.
Smudge plots showing the ploidy level estimation for the sequenced F. chiloensis, F. virginiana and F. nipponica plants.
F. chiloensis (a), F. virginiana (b) and diploid F. nipponica (c). Note: low to high densities of interaction signals are scaled with colours from orange to deep red.
Extended Data Fig. 4 Graphical alignment of the F. vesca genome with the F. chiloensis, F. virginiana and F. nipponica genomes.
Macrosynteny between the F. vesca genome and the genomes of F. chiloensis (a) and F. virginiana (b). Syntenic gene pairs are denoted by black points. (c) Macrosynteny between the F. nipponica and F. vesca genomes. Syntenic gene pairs are denoted by grey lines.
Extended Data Fig. 5 Mapping-based subgenome assignments of the F. chiloensis and F. virginiana chromosomes.
Note: The bottom and top of each box represent the 25th and 75th percentiles, the centre line is the median and whiskers span the full data range. Different lowercase letters indicate significant differences in mapping rates among subgenomes (one-way ANOVA with Duncan’s multiple range test; df = 27; P < 0.05).
Extended Data Fig. 6 Identification of subgenome-specific k-mers (k = 13, frequency = 50) in F. virginiana (hap1) based on the subgenome assignment originally proposed for the ‘Camarosa’ genome by Edger et al. (2019) and Hardigan et al. (2021).
The chromosomes whose assignments differ (2C, 2D, 5C, 5D, 6C, 6D) are shown. Note: n = number of specific k-mers on each subgenome.
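The caption above identifies subgenome-specific k-mers (k = 13, frequency = 50) given a fixed assignment of chromosomes to subgenomes. The sketch below shows one way such k-mers could be selected; it is not the authors' pipeline, and it is not the algorithm of SubPhaser (Jia et al. 2022, in the reference list). Reading "frequency = 50" as a minimum total count and requiring at least two-fold enrichment over every other subgenome are assumptions, as are the hypothetical function names.

```python
from collections import Counter
from typing import Dict, List


def count_kmers(seq: str, k: int) -> Counter:
    """Count forward-strand k-mers in seq, skipping any window containing N."""
    seq = seq.upper()
    counts: Counter = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if "N" not in kmer:
            counts[kmer] += 1
    return counts


def subgenome_specific_kmers(
    chrom_seqs: Dict[str, str],
    assignment: Dict[str, str],
    k: int = 13,
    min_freq: int = 50,
    min_fold: float = 2.0,
) -> Dict[str, List[str]]:
    """Return, per subgenome, k-mers that are frequent and enriched in it.

    Hypothetical criteria: total count >= min_freq, and the count in the top
    subgenome is at least min_fold times the count in any other subgenome.
    """
    # Pool k-mer counts over all chromosomes assigned to the same subgenome.
    per_sub: Dict[str, Counter] = {}
    for chrom, seq in chrom_seqs.items():
        per_sub.setdefault(assignment[chrom], Counter()).update(count_kmers(seq, k))

    specific: Dict[str, List[str]] = {sub: [] for sub in per_sub}
    for kmer in set().union(*per_sub.values()):
        counts = {sub: c[kmer] for sub, c in per_sub.items()}
        if sum(counts.values()) < min_freq:
            continue  # too rare to be informative
        top = max(counts, key=counts.get)
        runner_up = max((v for s, v in counts.items() if s != top), default=0)
        # Treat an absent k-mer as a count of 1 so the fold test stays finite.
        if counts[top] >= min_fold * max(runner_up, 1):
            specific[top].append(kmer)
    return {sub: sorted(kmers) for sub, kmers in specific.items()}


if __name__ == "__main__":
    toy = {"1A": "AAAAGGG", "1B": "CCCCGGG"}
    print(subgenome_specific_kmers(toy, {"1A": "A", "1B": "B"}, k=3, min_freq=2))
```

On the toy input, GGG occurs once in each subgenome and is rejected, while AAA and CCC are each kept for their own subgenome. Real chromosomes would need a dedicated k-mer counter such as Jellyfish (cited in the reference list) rather than an in-memory Counter.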
Extended Data Fig. 7 Identification of subgenome-specific LTR-RTs in F. chiloensis (hap1), F. chiloensis (hap2), F. virginiana (hap1) and F. virginiana (hap2) based on the new subgenome assignment and on the assignment by Edger et al. (2019) and Hardigan et al. (2021).
Note: n = number of specific LTR-RTs on each subgenome.
Extended Data Fig. 8 Genetic distance matrix between diploid species and each subgenome based on 21-mers.
Homoeologous exchange regions were filtered out.
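The caption above does not say which k-mer distance underlies the matrix. As an illustration only, the sketch below swaps in a Mash-style distance, D = -ln(2J / (1 + J)) / k, computed from the exact Jaccard index J of canonical 21-mer sets; the published analysis may have used a different estimator (for example the alignment-free approach of Fan et al. 2015, in the reference list). Function names are hypothetical, and returning 1.0 when no k-mers are shared is a convention assumed here.

```python
import math
from itertools import combinations
from typing import Dict, List, Set, Tuple

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical_kmers(seq: str, k: int = 21) -> Set[str]:
    """Distinct canonical k-mers (lexicographic min of a k-mer and its reverse complement)."""
    seq = seq.upper()
    kmers: Set[str] = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):  # skip windows with N or other symbols
            rc = kmer.translate(_COMPLEMENT)[::-1]
            kmers.add(min(kmer, rc))
    return kmers


def mash_distance(a: Set[str], b: Set[str], k: int) -> float:
    """Mash-style distance from the exact (unsketched) Jaccard index of two k-mer sets."""
    union = len(a | b)
    if union == 0:
        raise ValueError("both k-mer sets are empty")
    j = len(a & b) / union
    if j == 0:
        return 1.0  # no shared k-mers: report the maximum distance
    return -math.log(2 * j / (1 + j)) / k


def distance_matrix(seqs: Dict[str, str], k: int = 21) -> Tuple[List[str], List[List[float]]]:
    """Symmetric pairwise distance matrix with zeros on the diagonal."""
    names = list(seqs)
    sets = {n: canonical_kmers(s, k) for n, s in seqs.items()}
    dist = [[0.0] * len(names) for _ in names]
    for i, j in combinations(range(len(names)), 2):
        d = mash_distance(sets[names[i]], sets[names[j]], k)
        dist[i][j] = dist[j][i] = d
    return names, dist
```

In use, each diploid genome and each octoploid subgenome (with homoeologous exchange regions masked beforehand) would be one entry in seqs. Exact sets are only practical for small inputs; Mash itself estimates J from MinHash sketches instead.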
(a) A total of 6,345 single-copy genes were identified; the 122 located in homoeologous exchange regions (HEs) were filtered out. (b) Coalescent-based analysis of the remaining 6,223 genes from four diploid species and each subgenome of the F. virginiana genome. (c) Summary of the phylogenetic positions of the four octoploid subgenomes. Colours indicate the number of retained homologous gene clades grouping with each diploid species.
The distribution of homoeologue expression bias (HEB) across all gene pairs in the red fruits of F. chiloensis, F. virginiana and ‘Camarosa’.
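The caption above summarises HEB over homoeologous gene pairs. A minimal sketch of how such a distribution could be tallied is shown below, assuming each homoeologous group is a mapping from subgenome to TPM, bias is measured as a log2 ratio for every pair, a pseudocount of 1 is added, and a two-fold difference counts as biased. All of these choices and the function names are hypothetical, not the authors' stated method; Ramírez-González et al. (2018), in the reference list, instead normalised triad expression to relative proportions.

```python
import math
from collections import Counter
from itertools import combinations
from typing import Dict, Iterable, Tuple


def pairwise_log2_bias(expr: Dict[str, float], pseudocount: float = 1.0) -> Dict[Tuple[str, str], float]:
    """log2((TPM_a + pc) / (TPM_b + pc)) for every homoeologue pair (a, b) with a < b."""
    return {
        (a, b): math.log2((expr[a] + pseudocount) / (expr[b] + pseudocount))
        for a, b in combinations(sorted(expr), 2)
    }


def classify(log2_ratio: float, threshold: float = 1.0) -> str:
    """Call a pair biased when expression differs at least 2**threshold-fold."""
    if log2_ratio >= threshold:
        return "first-biased"
    if log2_ratio <= -threshold:
        return "second-biased"
    return "unbiased"


def heb_distribution(groups: Iterable[Dict[str, float]], threshold: float = 1.0) -> Counter:
    """Tally bias categories over all pairs from all homoeologous groups."""
    tally: Counter = Counter()
    for expr in groups:
        for ratio in pairwise_log2_bias(expr).values():
            tally[classify(ratio, threshold)] += 1
    return tally


if __name__ == "__main__":
    quartet = {"A": 15, "B": 7, "C": 3, "D": 0}  # hypothetical TPM values
    print(pairwise_log2_bias(quartet))
    print(heb_distribution([quartet, {"A": 10, "B": 10, "C": 10, "D": 10}]))
```

For the hypothetical quartet, every one of the six pairs differs at least two-fold (the A/B ratio is exactly log2(16/8) = 1), whereas the equally expressed quartet contributes six unbiased pairs. A four-copy group yields six pairs, which is why pairwise tallies grow faster than the number of genes.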
Cite this article
Jin, X., Du, H., Zhu, C. et al. Haplotype-resolved genomes of wild octoploid progenitors illuminate genomic diversifications from wild relatives to cultivated strawberry. Nat. Plants 9, 1252–1266 (2023). https://doi.org/10.1038/s41477-023-01473-2