The ancestors of Gossypium arboreum and Gossypium herbaceum provided the A subgenome for the modern cultivated allotetraploid cotton. Here, we upgraded the G. arboreum genome assembly by integrating different technologies. We resequenced 243 G. arboreum and G. herbaceum accessions to generate a map of genome variations and found that they are equally diverged from Gossypium raimondii. Independent analysis suggested that Chinese G. arboreum originated in South China and was subsequently introduced to the Yangtze and Yellow River regions. Most accessions with domestication-related traits experienced geographic isolation. Genome-wide association study (GWAS) identified 98 significant peak associations for 11 agronomically important traits in G. arboreum. A nonsynonymous substitution (cysteine-to-arginine substitution) of GaKASIII seems to confer substantial fatty acid composition (C16:0 and C16:1) changes in cotton seeds. Resistance to fusarium wilt disease is associated with activation of GaGSTF9 expression. Our work represents a major step toward understanding the evolution of the A genome of cotton.


Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.




This work was supported by funding from the National Natural Science Foundation of China (grants 31621005 to F. Li and 90717009 to Y.Z.), the National Key Technology R&D Program, the Ministry of Science and Technology (2016YFD0100203 to X.D. and 2016YFD0100306 to S. He), the National Science and Technology Support Program, the Ministry of Agriculture (2013BAD01B03 to X.D.), the Agricultural Science and Technology Innovation Program of the Chinese Academy of Agricultural Sciences (CAAS-ASTIP-IVFCAAS to S. Huang), and the leading talents of Guangdong Province Program (00201515 to S. Huang).

Author information

Author notes

  1. These authors contributed equally: Xiongming Du, Gai Huang, Shoupu He, Zhaoen Yang, Gaofei Sun, Xiongfeng Ma, Nan Li.


  1. Research Base, Anyang Institute of Technology, State Key Laboratory of Cotton Biology, Anyang, China

    Xiongming Du, Gaofei Sun & Fuguang Li

  2. Institute of Cotton Research of the Chinese Academy of Agricultural Sciences, Anyang, China

    Xiongming Du, Shoupu He, Zhaoen Yang, Xiongfeng Ma, Xueyan Zhang, Junling Sun, Yinhua Jia, Zhaoe Pan, Wenfang Gong, Heqin Zhu, Lei Ma, Daigang Yang, Qian Gong, Zhen Peng, Liru Wang, Xiaoyang Wang, Shuangjiao Xu, Haihong Shang, Cairui Lu & Fuguang Li

  3. State Key Laboratory of Protein and Plant Gene Research, College of Life Sciences, Peking University, Beijing, China

    Gai Huang, Zhaohui Liu & Yuxian Zhu

  4. Institute for Advanced Studies and College of Life Sciences, Wuhan University, Wuhan, China

    Gai Huang & Yuxian Zhu

  5. Genome Analysis Laboratory of the Ministry of Agriculture, Agricultural Genomics Institute at Shenzhen, Chinese Academy of Agricultural Sciences, Shenzhen, China

    Nan Li, Wei Fan, Sanwen Huang & Tao Lin

  6. Biomarker Technologies Corporation, Beijing, China

    Min Liu, Fuyan Liu, Fan Wang & Hongkun Zheng




F. Li, Y.Z., X.D., and T.L. conceived and designed the research. F. Li and S. Huang managed the project. T.L., N.L., M.L., F. Liu, F.W., H. Zheng, and G.S. performed the genome sequencing, assembly, and bioinformatics. X.D., S. He, J.S., Z.Y., X.M., X.Z., Y.J., Z. Pan, W.G., Z.L., H. Zhu, L.M., D.Y., Q.G., Z. Peng, L.W., S.X., and X.W. prepared the samples, performed phenotyping, and contributed to data analysis. Y.Z. designed the molecular experiments, and Z.Y. and G.H. performed the molecular experiments and led interpretation of the molecular-data analysis. S. He, Z.Y., and G.H. prepared the figures and tables. Y.Z., S. He, G.H., Z.Y., T.L., S. Huang, H.S., C.L., and W.F. wrote and revised the manuscript.

Competing interests

The authors declare no competing interests.

Corresponding authors

Correspondence to Tao Lin or Yuxian Zhu or Fuguang Li.

Supplementary information

  1. Supplementary Text and Figures

    Supplementary Figures 1–15 and Supplementary Tables 1–6

  2. Reporting Summary

  3. Supplementary Tables

    Supplementary Tables 7–18