Genomic innovation and regulatory rewiring during evolution of the cotton genus Gossypium

Abstract

Phenotypic diversity and evolutionary innovation ultimately trace to variation in genomic sequence and rewiring of regulatory networks. Here, we constructed a pan-genome of the Gossypium genus using ten representative diploid genomes. We document the genomic evolutionary history and the impact of lineage-specific transposon amplification on differential genome composition. The pan-3D genome reveals evolutionary connections between transposon-driven genome size variation and both higher-order chromatin structure reorganization and the rewiring of the chromatin interactome. We linked changes in chromatin structures to phenotypic differences in cotton fiber and identified regulatory variations that decode the genetic basis of fiber length, the latter enabled by sequencing 1,005 transcriptomes during fiber development. We showcase how pan-genomic, pan-3D genomic and genetic regulatory data serve as a resource for delineating the evolutionary basis of spinnable cotton fiber. Our work provides insights into the evolution of genome organization and regulation and will inform cotton improvement by enabling regulome-based approaches.

Fig. 1: Evolution and demographic history of diploid Gossypium genomes.
Fig. 2: Pan-genome and transposon amplification for diploid Gossypium species.
Fig. 3: Evolutionary genomics of TADs in Gossypium.
Fig. 4: Rewiring of chromatin loop-mediated regulatory networks.
Fig. 5: eQTL mapping and discovery of candidate genes.
Fig. 6: Genomic and regulatory innovation during fiber evolution.

Data availability

All raw sequencing data generated in this paper have been deposited into the National Center for Biotechnology Information database (BioProject ID: PRJNA788082, PRJNA788661, PRJNA788322). All genome assemblies are uploaded into the NCBI database (PRJNA788082). Genome assemblies, annotations, pan-gene, pan-genome, pan-3D genomes and eQTL dataset are available at the Figshare website (https://figshare.com/projects/Gossypium_pan-genomes/128336; https://figshare.com/projects/Gossypium_Pan-3D/128924; https://figshare.com/projects/Regulatory_dataset/128927). Source data are provided with this paper.

Code availability

All software used in this study is publicly available online, as described in the Methods and Reporting Summary.

Acknowledgements

This work was supported by the National Key Research and Development Program of China (2021YFF1000100 to X.Z., 2021YFF1000900 to M.W.) and the National Natural Science Foundation of China (32170645 and 31922069 to M.W.). This work was also supported by the Foundation of Hubei Hongshan Laboratory (2021hszd014 to M.W.) and the Young Elite Scientists Sponsorship Program (2021QNRC001 to J.L.) by China Association for Science and Technology. We thank the National Wild Cotton Nursery (Sanya, China) for providing seeds of wild cotton species. We thank the high-performance computing platform at National Key Laboratory of Crop Genetic Improvement in Huazhong Agricultural University.

Author information

Contributions

X.Z., M.W. and J.F.W. designed the experiments and managed the project. J.L., M.W. and X. He performed genome assembly, annotation and pan-genome construction. M.W., Z.Q., Y.M., Z.L., R.W. and X.C. performed genome resequencing, RNA-seq and data analysis. Y.L., X. Huang and J.Y. performed 3D genome data analysis. L.P., P.W. and X.T. performed Hi-C experiments. L.P. and C.X. performed ATAC-seq experiments. M.W. and J.L. proposed a framework of cotton fiber evolution. X.D. provided seeds of G. arboreum. Y.S. provided fiber RNA-seq data of wild cotton. D.F., L.T., S.J. and L.Z. contributed to project discussion. M.W., J.L., Y.L. and Z.Q. wrote the manuscript draft, and X.Z., J.F.W. and C.E.G. revised it.

Corresponding authors

Correspondence to Jonathan F. Wendel or Xianlong Zhang.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Genetics thanks Michael Bevan and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Assembly assessment and chromosome characterization of the 7 Gossypium genomes.

a, BUSCO assessment in 7 genomes. b, LAI evaluation for each assembled chromosome in 3 Mb windows with a 300 Kb step. c, Circos plot showing chromosome-level features of the Gossypium genomes. Tracks represent chromosome length (a), gene coverage (b), TE coverage (c), LTR/non-LTR coverage (d), Gypsy, DIRS, LARD, Copia coverage (e), GC content (f), gene expression (g), A/B compartment (h), and syntenic blocks (i) in 1 Mb windows with a 200 Kb step.
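The windowing scheme named in panels b and c (for example, 3 Mb windows advancing by 300 Kb) can be sketched as follows. This is only an illustration and not the authors' code: the function name `sliding_windows`, the half-open coordinate convention and the truncation of the final window at the chromosome end are all assumptions.

```python
def sliding_windows(chrom_length, window, step):
    """Yield (start, end) half-open intervals that tile a chromosome.

    Assumption (not stated in the caption): the last window is
    truncated at the chromosome end instead of being dropped.
    """
    if chrom_length <= 0 or window <= 0 or step <= 0:
        raise ValueError("chrom_length, window and step must be positive")
    start = 0
    while start < chrom_length:
        end = min(start + window, chrom_length)
        yield start, end
        if end == chrom_length:
            break  # the chromosome end has been reached
        start += step


# Hypothetical 3.6 Mb chromosome, 3 Mb windows, 300 Kb step.
for start, end in sliding_windows(3_600_000, 3_000_000, 300_000):
    print(start, end)
```

A per-window statistic such as LAI or TE coverage would then be computed on each yielded interval.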

Extended Data Fig. 2 Composition and evolutionary history of diploid Gossypium genomes.

a, Genomic components in 7 Gossypium species. b, Phylogenetic relationships based on 3,928 single-copy orthologs from 12 diploid cotton genomes, 5 tetraploid cotton genomes, G. kirkii and T. cacao. c, Estimation of whole-genome duplication across 10 Gossypium genomes, G. kirkii and T. cacao. The distribution of Ks (synonymous substitution rate) was calculated. d, Estimation of divergence times among species.

Extended Data Fig. 3 Inferred genomic history of Gossypium.

a, Phylogenetic tree and population structure with different numbers of clusters (K = 2, 3, and 4) in 70 Gossypium accessions. Gossypioides kirkii was used as an outgroup for Gossypium population analysis. b, Nucleotide diversity (π) and fixation index (Fst) across 8 karyotype groups. c, The degree of gene flow (introgression) on 13 chromosomes, with green and orange lines indicating regions of ancient genome introgression in A1 and A2 species compared to other genomes. d, The expression level of genes located in introgression and non-introgression genome regions (two-sided Wilcoxon signed-rank test, P < 2.2 × 10^−16). center line, median; box limits, upper and lower quartiles; whiskers, 1.5× interquartile range. e, Demographic history of representative Gossypium species reconstructed from genome resequencing data. The red line represents the estimated effective population size (Ne), and the gray curves represent bootstrap estimates from 100 PSMC resamplings of the original sequence. The DNA resequencing data are shown in Supplementary Table 9.

Extended Data Fig. 4 Structural variations among Gossypium species.

a, Genomic components of structural variation and syntenic regions in 10 genomes. b, The length of inversion (INV), deletion (DEL), duplication (DUP) and translocation (TRA). c, Identification of large inversions in Chr02 and Chr04 between A1 and A1a genomes. The two heatmaps of the first panel show a chromatin interaction matrix, including mapping of Hi-C data in A1a against the A1 genome (A1a_map_A1), mapping of Hi-C data in A1 against A1 genome (A1_map_A1). The second panel shows genome sequence alignment between A1 and A1a. The validated inversions are shown by arrows. The third panel represents mapping of Hi-C data in A1 against the A1a genome (A1_map_A1a), and mapping of Hi-C data in A1a against A1a genome (A1a_map_A1a). The fourth panel shows long-read mapping across the inversion breakpoints (bk) and their flanking 10 Kb regions, which supports the correct assembly of the A1a, A1 and A2 genomes. The read counts are shown for each breakpoint. In the bottom panel, gel images show PCR validation of these inversions. For c and d, the PCR experiment was performed twice independently. d, Identification of large inversions and validation in Chr04 and Chr10 between A1a, A1 and A2.

Extended Data Fig. 5 Gene-based pan-genome feature for Gossypium.

a, The number of gene families in Gossypium pan-genes. The histogram shows the number of gene families in 10 subgenomes or 12 diploid genomes with different frequencies. b, The number of core, softcore, variable and specific genes in the pan-gene analysis. c, Proportion of core, softcore, variable and specific genes with InterPro domains. d-h, Gene expression, CDS length, exon number, TE coverage of genes, and nonsynonymous/synonymous (Ka/Ks) rate. center line, median; box limits, upper and lower quartiles; whiskers, 1.5× interquartile range. For d-h, two-sided Wilcoxon rank sum test, ***P < 2.2 × 10^−16.
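The box plot convention repeated throughout these captions (center line at the median, box limits at the quartiles, whiskers at 1.5× the interquartile range) can be made concrete with a short sketch. The captions do not say how quartiles were computed or how the whiskers were drawn, so two assumptions are made here: quartiles use the "inclusive" method of Python's `statistics.quantiles`, and whiskers follow the common Tukey convention of extending to the most extreme data points inside the 1.5× IQR fences. The function name `box_plot_summary` is hypothetical.

```python
from statistics import quantiles


def box_plot_summary(values, whisker_factor=1.5):
    """Return the numbers a box plot would display for `values`."""
    data = sorted(values)
    if len(data) < 2:
        raise ValueError("at least two values are required")
    # Assumption: inclusive quartile method; plotting tools differ here.
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - whisker_factor * iqr
    upper_fence = q3 + whisker_factor * iqr
    # Assumption: whiskers stop at the most extreme points inside the fences.
    inside = [v for v in data if lower_fence <= v <= upper_fence]
    outliers = [v for v in data if v < lower_fence or v > upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whiskers": (inside[0], inside[-1]),
        "outliers": outliers,
    }


summary = box_plot_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 30])
print(summary)
```

With this toy input the value 30 lies beyond the upper fence and is reported as an outlier, while the upper whisker stops at 9.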

Extended Data Fig. 6 Characterization of shared and specific flLTR transposable elements.

a, Correlation of gene number and TE content with genome size variation in Gossypium. b, Insertion time of flLTR amplification across 15 Gossypium genomes and 2 outgroup genomes. c, Insertion time of Gypsy, Copia, DIRS and LARD. d, Phylogeny of full-length Ty3-Gypsy, DIRS, and Copia in 10 genomes. e, The number of shared and specific flLTRs in 10 genomes. f, The gene expression levels with shared and specific flLTRs. center line, median; box limits, upper and lower quartiles; whiskers, 1.5× interquartile range. Two-sided Wilcoxon rank sum test, ***P < 2.2 × 10^−16. g, GO enrichment of shared and specific flLTR inserted genes.

Extended Data Fig. 7 Characterization of compartment and TADs.

a-c, TE coverage (a), expression correlation (b) and ATAC-seq signals (c) of orthologous genes located in changed and conserved compartments, respectively. d and e, The number (d) and length (e) of TADs in 8 genomes. f, The number of conserved TADs between 8 genomes. g, Percentage of core, variable and specific genes in core, variable and specific TAD boundaries. Numbers indicate gene counts. Fisher’s exact test was used to assess significance. h, TAD-separation score in TAD boundaries with different conservation degrees and random regions. i and j, The number of genes (i), and enrichment of ATAC-seq signals (j) in TAD boundaries with different conservation degrees. For a, b and i, two-sided Wilcoxon rank sum test, ***P < 2.2 × 10^−16. For a, b, e and i, center line, median; box limits, upper and lower quartiles; whiskers, 1.5× interquartile range.

Extended Data Fig. 8 Characterization of chromatin loops.

a, Chromatin loops with different conservation degrees (specific, shared in 2 genomes, shared in 3 genomes …, shared in 8 genomes). b, TE coverage of loop anchors with different conservation degrees. c, Two examples show specific TE insertions leading to changes in distance between genes of B1 and G1 genomes accompanied by the formation of specific loops and differential expression of genes. The gray bands connect the orthologous segments. d-f, Differences in related loop numbers (d), expression of regulated genes (e) and distance of related loops (f) between conserved and specific CREs in 8 species. For b and d-f, two-sided Wilcoxon rank sum test, ***P < 2.2 × 10^−16; center line, median; box limits, upper and lower quartiles; whiskers, 1.5× interquartile range. g, Proportion of specific, variable, and conserved CREs with and without TE. h, Proportion of conserved and specific CREs with and without PAV. Two-sided Fisher’s exact test, ***P < 0.001. i, GO enrichment of genes involved in highly (blue) and lowly (yellow) conserved networks.

Extended Data Fig. 9 GWAS and eQTL identification.

a, Manhattan plot of GWAS for fiber length (FL). The dashed line represents the significance threshold (−log10(P) = 6.15). b, QQ plot for the GWAS on fiber length. c, Box and dot plots show the effects of allelic combinations for FL. Diploid cotton accessions were categorized according to the number of loci carrying favorable alleles. The QTLs considered are listed on the right. center line, median; box limits, upper and lower quartiles; whiskers, 1.5× interquartile range. d-h, Dot plot of eQTLs and the corresponding eGenes at 4 DPA (d), 8 DPA (e), 12 DPA (f), 16 DPA (g) and 20 DPA (h). X-axes show the SNP position in each chromosome and y-axes show gene position in each chromosome. The color of each dot represents the significance (P-value) of each eQTL-eGene association.
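For readers who want the GWAS threshold in panel a on the raw P-value scale, the conversion is a one-liner. The sketch below only restates the threshold value given in the caption; the helper names `p_from_neg_log10` and `is_significant` are hypothetical, and how the authors derived the threshold is not stated here.

```python
import math

# Threshold on the -log10(P) scale, taken from the caption of panel a.
THRESHOLD_NEG_LOG10_P = 6.15


def p_from_neg_log10(neg_log10_p):
    """Convert a -log10(P) value back to a raw P-value."""
    return 10.0 ** (-neg_log10_p)


def is_significant(p_value, threshold=THRESHOLD_NEG_LOG10_P):
    """Return True if a raw P-value reaches the -log10(P) threshold."""
    if not 0.0 < p_value <= 1.0:
        raise ValueError("p_value must lie in (0, 1]")
    return -math.log10(p_value) >= threshold


print(f"{p_from_neg_log10(THRESHOLD_NEG_LOG10_P):.3e}")
print(is_significant(1e-7), is_significant(1e-6))
```

On this scale the threshold corresponds to a raw P-value of roughly 7.08 × 10^-7.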

Extended Data Fig. 10 Characterization of eQTL and fine-mapping.

a, The number of cis-eGenes that were shared by eQTLs in at least two developmental stages or specifically found in only one stage. b, Distribution of the distance between cis-eQTL and transcription start site (TSS) of the regulated gene. c, Genomic annotations of cis-eQTLs at each developmental stage. d, The number of trans-eGenes at each developmental stage. The bars are colored by the number of trans-eQTLs for each gene. e, Functional enrichment of independent cis-eQTLs and trans-eQTLs. Odds ratio larger than 1 means that eQTLs are associated with higher odds of variants in specific genomic regions. Bars indicate mean ± SD across 5 stages. f, The number of stage-specific cis-eQTLs. g, Pearson correlation between cis-eQTL effect size and cis-eGene expression level across stages. ‘Uninterpretable’ means the directions of effect sizes of cis-eQTL are different among stages. ‘Low expr.’ means not used for correlation test owing to low expression. h, A representative example of chromatin loop-supported genetic regulation between eQTL and eGene. i, Comparison of fiber length (FL) between cotton accessions in Group-1 and Group-2, related to Fig. 5j. Two-sided Wilcoxon rank sum test, P-value = 0.047. center line, median; box limits, upper and lower quartiles; whiskers, 1.5× interquartile range. j, Manhattan plot of transcriptome-wide association analysis (TWAS) of FL at 4 DPA (circle), 8 DPA (square), 12 DPA (diamond), 16 DPA (up-triangle) and 20 DPA (down-triangle). The red dashed line represents Bonferroni-corrected P-value of 0.05 and the black dashed line represents FDR threshold of 0.05. k, Colocalization of FL-associated GWAS loci and cis-eQTLs. PP.H4 from COLOC is shown. Red color means the gene expression level is positively correlated with fiber length, while blue color represents a negative correlation. l, The Hi-C interaction matrices at 5 Kb resolution. Small purple boxes indicate chromatin loops that connect the eQTL and target gene and yellow boxes indicate other loops, related to Fig. 5n.
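The two dashed lines in panel j correspond to two different multiple-testing corrections, and the difference is easy to see on a toy example. The sketch below is illustrative only: the caption does not name the FDR procedure, so the Benjamini-Hochberg step-up rule is assumed, and the P-values and function names are made up for the example.

```python
def bonferroni_threshold(n_tests, alpha=0.05):
    """Per-test P-value cutoff that controls the family-wise error rate."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def benjamini_hochberg_cutoff(p_values, alpha=0.05):
    """Largest P-value declared significant under Benjamini-Hochberg.

    Assumption: the FDR line in the caption uses this procedure.
    Returns None when no test passes.
    """
    m = len(p_values)
    cutoff = None
    for rank, p in enumerate(sorted(p_values), start=1):
        # Step-up rule: keep the largest p with p <= (rank / m) * alpha.
        if p <= rank / m * alpha:
            cutoff = p
    return cutoff


# Made-up P-values for ten hypothetical genes.
toy_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(bonferroni_threshold(len(toy_p)))
print(benjamini_hochberg_cutoff(toy_p))
```

In this toy set only the smallest P-value passes the Bonferroni cutoff, whereas the two smallest pass the FDR cutoff, which is why an FDR line typically sits below a Bonferroni line on a Manhattan plot.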

Supplementary information

Supplementary Information

This file contains Supplementary Figures 1–11, Results and Methods.

Reporting Summary

Peer Review File

Supplementary Table 1

This file contains Supplementary Tables 1–30.

Source data

Source Data Extended Data Fig. 4

Unprocessed gels for Extended Data Fig. 4.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Wang, M., Li, J., Qi, Z. et al. Genomic innovation and regulatory rewiring during evolution of the cotton genus Gossypium. Nat Genet 54, 1959–1971 (2022). https://doi.org/10.1038/s41588-022-01237-2

  • DOI: https://doi.org/10.1038/s41588-022-01237-2
