Abstract
Assembly of complete genomes can reveal functional genetic elements missing from draft sequences. Here we present the near-complete telomere-to-telomere and contiguous genome of the cotton species Gossypium raimondii. Our assembly identified gaps and misoriented or misassembled regions in previous assemblies and produced 13 centromeres, with 25 chromosomal ends having telomeres. In contrast to satellite-rich Arabidopsis and rice centromeres, cotton centromeres lack phased CENH3 nucleosome positioning patterns and probably evolved by invasion from long terminal repeat retrotransposons. In-depth expression profiling of transposable elements revealed a previously unannotated DNA transposon (MuTC01) that interacts with miR2947 to produce trans-acting small interfering RNAs (siRNAs), one of which targets the newly evolved LEC2 (LEC2b) to produce phased siRNAs. Systematic genome editing experiments revealed that this tripartite module, miR2947–MuTC01–LEC2b, controls the morphogenesis of complex folded embryos characteristic of Gossypium and its close relatives in the cotton tribe. Our study reveals a trans-acting siRNA-based tripartite regulatory pathway for embryo development in higher plants.
Data availability
All raw sequencing data (ONT sequencing, PacBio HiFi and MGI reads) are available from the National Center for Biotechnology Information (NCBI) BioProject database (accession no. PRJNA812351) and the China National Center for Bioinformation (CNCB) (accession no. PRJCA008338). The genome sequence and gene annotation files for G. raimondii are available at the CNCB genome warehouse (https://ngdc.cncb.ac.cn/gwh) under accession no. GWHBISS00000000 or at NCBI under accession nos. CP156731–CP156743 (BioProject PRJNA812351). The genome sequence, gene and transposable element annotation files for G. raimondii are also available at GitHub (https://github.com/huanggai/T2T-Cotton-DD5.git) and figshare (https://doi.org/10.6084/m9.figshare.25771302.v1; ref. 80). The RNA-seq, full-length Nanopore cDNA-seq, small RNA-seq and bisulfite sequencing data for G. raimondii were deposited under NCBI project nos. PRJNA1128012, PRJNA1127988, PRJNA1127990 and PRJNA1128006 or CNCB project nos. PRJCA016685, PRJCA016686, PRJCA016683 and PRJCA016687. The small RNA-seq and degradome sequencing data for G. hirsutum are available under NCBI project no. PRJNA1128045 or CNCB project nos. PRJCA016684 and PRJCA016814, respectively. The following databases were used: the Seed Information Database (https://ser-sid.org); the Seed Biology Place (http://seedbiology.eu/); and the Plant Genomes Database (https://www.plabipd.de). Source data are provided with this paper.
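For readers who want to script retrieval of the NCBI records, the accession range CP156731–CP156743 quoted above can be expanded into individual accessions. The following is a minimal sketch, not part of the authors' code: the helper name `expand_accession_range` is hypothetical, and it assumes the range is contiguous and that both endpoints share the same letter prefix and digit width.

```python
import re


def expand_accession_range(first: str, last: str) -> list[str]:
    """Expand an inclusive accession range such as CP156731 to CP156743.

    Hypothetical helper: assumes both endpoints consist of an uppercase
    letter prefix followed by digits, and that the range is contiguous.
    """
    pattern = r"([A-Z]+)(\d+)"
    m_first = re.fullmatch(pattern, first)
    m_last = re.fullmatch(pattern, last)
    if m_first is None or m_last is None:
        raise ValueError("accessions must look like letters followed by digits")
    if m_first.group(1) != m_last.group(1):
        raise ValueError("endpoints must share the same letter prefix")

    prefix = m_first.group(1)
    width = len(m_first.group(2))  # preserve any zero padding
    start, end = int(m_first.group(2)), int(m_last.group(2))
    if end < start:
        raise ValueError("last accession must not precede the first")

    return [f"{prefix}{number:0{width}d}" for number in range(start, end + 1)]


# Endpoints taken from the Data availability statement.
accessions = expand_accession_range("CP156731", "CP156743")
print(len(accessions), accessions[0], accessions[-1])
```

The expanded list contains 13 accessions, which matches the 13 centromeres reported in the abstract. Which accession corresponds to which chromosome is not stated here and should be checked against the NCBI records.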
Code availability
All software used in the study is publicly available as described in the Methods and Reporting Summary. The custom code for the centromere analysis is available at Zenodo (https://zenodo.org/records/11115400; ref. 81) and GitHub (https://github.com/huanggai/T2T-Cotton-DD5.git).
References
Linkies, A., Graeber, K., Knight, C. & Leubner-Metzger, G. The evolution of seeds. New Phytol. 186, 817–831 (2010).
Martin, A. C. The comparative internal morphology of seeds. Am. Midl. Nat. 36, 513–660 (1946).
Seelanan, T., Schnabel, A. & Wendel, J. F. Congruence and consensus in the cotton tribe (Malvaceae). Syst. Bot. 22, 259–290 (1997).
Fryxell, P. A. A redefinition of the tribe Gossypieae. Bot. Gaz. 129, 296–308 (1968).
Finch-Savage, W. E. & Leubner-Metzger, G. Seed dormancy and the control of germination. New Phytol. 171, 501–523 (2006).
Fryxell, P. A. The Natural History of the Cotton Tribe (Malvaceae, Tribe Gossypieae) (Texas A & M Univ. Press, 1978).
Wendel, J. F. & Grover, C. E. in Cotton Vol. 57 (eds Fang, D. D. & Percy, R. G.) 25–44 (ASA-CSSA-SSSA, 2015).
Huang, G., Huang, J.-Q., Chen, X.-Y. & Zhu, Y.-X. Recent advances and future perspectives in cotton research. Annu. Rev. Plant Biol. 72, 437–462 (2021).
Huang, G. et al. Genome sequence of Gossypium herbaceum, and genome update of G. arboreum and G. hirsutum provide insights into cotton A-genome evolution. Nat. Genet. 52, 516–524 (2020).
Viot, C. R. & Wendel, J. F. Evolution of the cotton genus, Gossypium, and its domestication in the Americas. Crit. Rev. Plant Sci. 42, 1–33 (2023).
Wendel, J. F., Brubaker, C., Alvarez, I., Cronn, R. & Stewart, J. M. in Genetics and Genomics of Cotton (ed. Paterson, A. H.) 3–22 (Springer-Verlag, 2009).
Wang, K. et al. The draft genome of a diploid cotton Gossypium raimondii. Nat. Genet. 44, 1098–1103 (2012).
Wang, M. et al. Comparative genome analyses highlight transposon-mediated genome expansion and the evolutionary architecture of 3D genomic folding in cotton. Mol. Biol. Evol. 38, 3621–3636 (2021).
Paterson, A. H. et al. Repeated polyploidization of Gossypium genomes and the evolution of spinnable cotton fibres. Nature 492, 423–427 (2012).
Udall, J. A. et al. De novo genome sequence assemblies of Gossypium raimondii and Gossypium turneri. G3 9, 3079–3085 (2019).
Wen, X. et al. A comprehensive overview of cotton genomics, biotechnology and molecular biological studies. Sci. China Life Sci. 66, 2214–2256 (2023).
Song, J. M. et al. Two gap-free reference genomes and a global view of the centromere architecture in rice. Mol. Plant 14, 1757–1767 (2021).
Naish, M. et al. The genetic and epigenetic landscape of the Arabidopsis centromeres. Science 374, eabi7489 (2021).
Chen, J. et al. A complete telomere-to-telomere assembly of the maize genome. Nat. Genet. 55, 1221–1231 (2023).
Nurk, S. et al. The complete sequence of a human genome. Science 376, 44–53 (2022).
Wlodzimierz, P. et al. Cycles of satellite and transposon evolution in Arabidopsis centromeres. Nature 618, 557–565 (2023).
Allen, E., Xie, Z. X., Gustafson, A. M. & Carrington, J. C. microRNA-directed phasing during trans-acting siRNA biogenesis in plants. Cell 121, 207–221 (2005).
Liu, Y. L., Teng, C., Xia, R. & Meyers, B. C. PhasiRNAs in plants: their biogenesis, genic sources, and roles in stress responses, development, and reproduction. Plant Cell 32, 3059–3080 (2020).
Zhan, J. P. & Meyers, B. C. Plant small RNAs: their biogenesis, regulatory roles, and functions. Annu. Rev. Plant Biol. 74, 21–51 (2023).
Catlin, N. S. & Josephs, E. B. The important contribution of transposable elements to phenotypic variation and evolution. Curr. Opin. Plant Biol. 65, 102140 (2022).
Fueyo, R., Judd, J., Feschotte, C. & Wysocka, J. Roles of transposable elements in the regulation of mammalian transcription. Nat. Rev. Mol. Cell Biol. 23, 481–497 (2022).
Hawkins, J. S., Kim, H., Nason, J. D., Wing, R. A. & Wendel, J. F. Differential lineage-specific amplification of transposable elements is responsible for genome size variation in Gossypium. Genome Res. 16, 1252–1261 (2006).
Chang, X. et al. High-quality Gossypium hirsutum and Gossypium barbadense genome assemblies reveal the centromeric landscape and evolution. Plant Commun. 5, 100722 (2024).
Wang, M. et al. Evolutionary dynamics of 3D genome architecture following polyploidization in cotton. Nat. Plants 4, 90–97 (2018).
McCartney, A. M. et al. Chasing perfection: validation and polishing strategies for telomere-to-telomere genome assemblies. Nat. Methods 19, 687–695 (2022).
Gan, Y. et al. Chromosomal locations of 5S and 45S rDNA in Gossypium genus and its phylogenetic implications revealed by FISH. PLoS ONE 8, e68207 (2013).
Rautiainen, M. et al. Telomere-to-telomere assembly of diploid chromosomes with Verkko. Nat. Biotechnol. 41, 1474–1482 (2023).
Bzikadze, A. V., Mikheenko, A. & Pevzner, P. A. Fast and accurate mapping of long reads to complete genome assemblies with VerityMap. Genome Res. 32, 2107–2118 (2022).
Yang, Y., Wen, X., Wu, Z., Wang, K. & Zhu, Y. Large-scale long terminal repeat insertions produced a significant set of novel transcripts in cotton. Sci. China Life Sci. 66, 1711–1724 (2023).
Han, J. et al. Rapid proliferation and nucleolar organizer targeting centromeric retrotransposons in cotton. Plant J. 88, 992–1005 (2016).
Lanciano, S. & Cristofari, G. Measuring and interpreting transposable element expression. Nat. Rev. Genet. 21, 721–736 (2020).
Axtell, M. J., Jan, C., Rajagopalan, R. & Bartel, D. P. A two-hit trigger for siRNA biogenesis in plants. Cell 127, 565–577 (2006).
Dai, X. B., Zhuang, Z. H. & Zhao, P. X. psRNATarget: a plant small RNA target analysis server (2017 release). Nucleic Acids Res. 46, W49–W54 (2018).
Yang, Z., Qanmber, G., Wang, Z., Yang, Z. & Li, F. Gossypium Genomics: trends, scope, and utilization for cotton improvement. Trends Plant Sci. 25, 488–500 (2020).
Su, H. D. et al. Centromere satellite repeats have undergone rapid changes in polyploid wheat subgenomes. Plant Cell 31, 2035–2051 (2019).
Wolfgruber, T. K. et al. Maize centromere structure and evolution: sequence analysis of centromeres 2 and 5 reveals dynamic loci shaped primarily by retrotransposons. PLoS Genet. 5, e1000743 (2009).
Gong, Z. et al. Repeatless and repeat-based centromeres in potato: implications for centromere evolution. Plant Cell 24, 3559–3574 (2012).
Perumal, S. et al. A high-contiguity Brassica nigra genome localizes active centromeres and defines the ancestral Brassica genome. Nat. Plants 6, 929–941 (2020).
Ahmed, H. I. et al. Einkorn genomics sheds light on history of the oldest domesticated wheat. Nature 620, 830–838 (2023).
Dawe, R. K. Centromere renewal and replacement in the plant kingdom. Proc. Natl. Acad. Sci. USA 102, 11573–11574 (2005).
Talbert, P. B. & Henikoff, S. What makes a centromere? Exp. Cell Res. 389, 111895 (2020).
Liu, P., Cuerda-Gil, D., Shahid, S. & Slotkin, R. K. The epigenetic control of the transposable element life cycle in plant genomes and beyond. Annu. Rev. Genet. 56, 63–87 (2022).
Cvetkovic, T. et al. Phylogenomics resolves deep subfamilial relationships in Malvaceae s.l. G3 11, jkab136 (2021).
Areces-Berazain, F. & Ackerman, J. D. Phylogenetics, delimitation and historical biogeography of the pantropical tree genus Thespesia (Malvaceae, Gossypieae). Bot. J. Linn. Soc. 181, 171–198 (2016).
Lunardon, A. et al. Integrated annotations and analyses of small RNA-producing loci from 47 diverse plants. Genome Res. 30, 497–513 (2020).
Borges, F. & Martienssen, R. A. The expanding world of small RNAs in plants. Nat. Rev. Mol. Cell Biol. 16, 727–741 (2015).
Liang, M. et al. Taxon-specific, phased siRNAs underlie a speciation locus in monkeyflowers. Science 379, 576–582 (2023).
Cheng, H. Y., Concepcion, G. T., Feng, X. W., Zhang, H. W. & Li, H. Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm. Nat. Methods 18, 170–175 (2021).
Sedlazeck, F. J. et al. Accurate detection of complex structural variations using single-molecule sequencing. Nat. Methods 15, 461–468 (2018).
Jiang, T. et al. Long-read-based human genomic structural variation detection with cuteSV. Genome Biol. 21, 189 (2020).
Manni, M., Berkeley, M. R., Seppey, M. & Zdobnov, E. M. BUSCO: assessing genomic data quality and beyond. Curr. Protoc. 1, e323 (2021).
Rhie, A., Walenz, B. P., Koren, S. & Phillippy, A. M. Merqury: reference-free quality, completeness, and phasing assessment for genome assemblies. Genome Biol. 21, 245 (2020).
Xu, Z. & Wang, H. LTR_FINDER: an efficient tool for the prediction of full-length LTR retrotransposons. Nucleic Acids Res. 35, W265–W268 (2007).
Ellinghaus, D., Kurtz, S. & Willhoeft, U. LTRharvest, an efficient and flexible software for de novo detection of LTR retrotransposons. BMC Bioinformatics 9, 18 (2008).
Ou, S. & Jiang, N. LTR_retriever: a highly accurate and sensitive program for identification of long terminal repeat retrotransposons. Plant Physiol. 176, 1410–1422 (2018).
Nussbaumer, T. et al. MIPS PlantsDB: a database framework for comparative plant genome research. Nucleic Acids Res. 41, D1144–D1151 (2013).
Ou, S. J. et al. Benchmarking transposable element annotation methods for creation of a streamlined, comprehensive pipeline. Genome Biol. 20, 275 (2019).
Kim, D., Langmead, B. & Salzberg, S. L. HISAT: a fast spliced aligner with low memory requirements. Nat. Methods 12, 357–360 (2015).
Li, H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34, 3094–3100 (2018).
Shumate, A., Wong, B., Pertea, G. & Pertea, M. Improved transcriptome assembly using a hybrid of long and short reads with StringTie. PLoS Comput. Biol. 18, e1009730 (2022).
Haas, B. J. et al. Automated eukaryotic gene structure annotation using EVidenceModeler and the program to assemble spliced alignments. Genome Biol. 9, R7 (2008).
Kumar, S., Stecher, G., Li, M., Knyaz, C. & Tamura, K. MEGA X: molecular evolutionary genetics analysis across computing platforms. Mol. Biol. Evol. 35, 1547–1549 (2018).
Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012).
Vollger, M. R., Kerpedjiev, P., Phillippy, A. M. & Eichler, E. E. StainedGlass: interactive visualization of massive tandem repeat structures with identity heatmaps. Bioinformatics 38, 2049–2051 (2022).
Zhao, H. et al. Gene expression and chromatin modifications associated with maize centromeres. G3 6, 183–192 (2015).
Magoč, T. & Salzberg, S. L. FLASH: fast length adjustment of short reads to improve genome assemblies. Bioinformatics 27, 2957–2963 (2011).
Vainshtein, Y., Rippe, K. & Teif, V. B. NucTools: analysis of chromatin feature occupancy profiles from high-throughput sequencing data. BMC Genomics 18, 158 (2017).
Sun, L. et al. Heat stress-induced transposon activation correlates with 3D chromatin organization rearrangement in Arabidopsis. Nat. Commun. 11, 1886 (2020).
Ramirez, F. et al. deepTools2: a next generation web server for deep-sequencing data analysis. Nucleic Acids Res. 44, W160–W165 (2016).
Liao, Y., Smyth, G. K. & Shi, W. featureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics 30, 923–930 (2014).
Chen, C. J. et al. TBtools: an integrative toolkit developed for interactive analyses of big biological data. Mol. Plant 13, 1194–1202 (2020).
Langmead, B., Trapnell, C., Pop, M. & Salzberg, S. L. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 10, R25 (2009).
Wen, X. et al. Molecular studies of cellulose synthase supercomplex from cotton fiber reveal its unique biochemical properties. Sci. China Life Sci. 65, 1776–1793 (2022).
Shi, Y.-H. et al. Transcriptome profiling, molecular biological, and physiological studies reveal a major role for ethylene in cotton fiber cell elongation. Plant Cell 18, 651–664 (2006).
Huang, G. Telomere-to-telomere Gossypium raimondii genome (final version). figshare https://doi.org/10.6084/m9.figshare.25771302.v1 (2024).
Huang, G. A telomere-to-telomere cotton genome assembly reveals centromere evolution and a Mutator transposon-linked module regulating embryo development. Zenodo https://doi.org/10.5281/zenodo.11115400 (2024).
Acknowledgements
This work was supported by grants from the National Natural Science Foundation of China (grant nos. 32388101 to Y.Z. and 32201747 to G.H.).
Author information
Contributions
Y.Z. and G.H. conceived and designed the project. G.H. and Z.B. conducted the T2T genome assembly. G.H. analyzed the data and performed the experiments. L.F. and J.Z. assisted in the small RNA analysis. X.C. and J.F.W. provided input for the discussions. Y.Z. and G.H. wrote and revised the paper, with assistance from J.F.W.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Genetics thanks Daniel Peterson, Qian-Hao Zhu and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Supplementary Information
Supplementary Figs. 1–16 and Tables 1–13, excluding Tables 8 and 10.
Supplementary Table
Supplementary Tables 8 and 10 in Excel format.
Source Data Fig. 1
Values plotted in Fig. 1b.
Source Data Fig. 4
Values plotted in Fig. 4c,j.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Huang, G., Bao, Z., Feng, L. et al. A telomere-to-telomere cotton genome assembly reveals centromere evolution and a Mutator transposon-linked module regulating embryo development. Nat Genet 56, 1953–1963 (2024). https://doi.org/10.1038/s41588-024-01877-6
DOI: https://doi.org/10.1038/s41588-024-01877-6