Resequencing a core collection of upland cotton identifies genomic variation and loci influencing fiber quality and yield



Upland cotton is the most important natural-fiber crop. The genomic variation in diverse germplasm, and the alleles underpinning fiber quality and yield, remain to be explored extensively. Here, we resequenced a core collection of 419 accessions at 6.55-fold coverage depth and identified approximately 3.66 million SNPs with which to evaluate genomic variation. We phenotyped the collection in 12 environments and conducted a genome-wide association study (GWAS) of 13 fiber-related traits. In total, 7,383 unique SNPs were significantly associated with these traits and were located within or near 4,820 genes. More associated loci were detected for fiber quality than for fiber yield, and more fiber genes were detected in the D subgenome than in the A subgenome. Several previously undescribed causal genes for days to flowering, fiber length, and fiber strength were identified. Phenotypic selection for these traits increased the frequency of elite alleles during domestication and breeding. These results provide targets for molecular selection and genetic manipulation in cotton improvement.
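
The abstract does not specify the association model or the significance threshold. The sketch below is a rough illustration of what a single-trait GWAS scan does. It regresses a phenotype on allele dosage (0, 1 or 2 copies of the counted allele) one SNP at a time and flags SNPs that fall below a Bonferroni-corrected threshold.

This is a simplified stand-in and not the study's method. It fits ordinary least squares with no correction for population structure or kinship, although mixed models are normally used for diverse germplasm panels. It also approximates the t distribution with a normal distribution, which is reasonable only for sample sizes in the hundreds. The function names `snp_association`, `bonferroni_threshold` and `scan` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def snp_association(dosages: Sequence[int], phenotype: Sequence[float]) -> Tuple[float, float]:
    """Regress a phenotype on allele dosage (0/1/2) at one SNP.

    Returns (effect_per_allele, two_sided_p). Deliberately simplified:
    ordinary least squares with no covariates or kinship matrix, and a
    normal approximation to the t distribution (adequate only for large n).
    """
    n = len(dosages)
    if n != len(phenotype) or n < 3:
        raise ValueError("need matching dosage/phenotype vectors with n >= 3")
    mean_x = sum(dosages) / n
    mean_y = sum(phenotype) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    if sxx == 0:
        return 0.0, 1.0  # monomorphic SNP: nothing to test
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, phenotype))
    beta = sxy / sxx
    intercept = mean_y - beta * mean_x
    rss = sum((y - intercept - beta * x) ** 2 for x, y in zip(dosages, phenotype))
    se = math.sqrt(rss / (n - 2) / sxx)
    if se == 0:
        return beta, 0.0  # perfect fit (zero residual error): report p = 0
    t_stat = beta / se
    p_value = math.erfc(abs(t_stat) / math.sqrt(2))  # two-sided normal tail
    return beta, p_value


def bonferroni_threshold(n_tests: int, alpha: float = 0.05) -> float:
    """Per-SNP threshold that controls the family-wise error rate at alpha."""
    return alpha / n_tests


def scan(
    snps: Sequence[Sequence[int]], phenotype: Sequence[float], alpha: float = 0.05
) -> List[Tuple[int, float, float]]:
    """Test every SNP; return (index, effect, p) for SNPs below the Bonferroni threshold."""
    threshold = bonferroni_threshold(len(snps), alpha)
    hits = []
    for index, dosages in enumerate(snps):
        beta, p_value = snp_association(dosages, phenotype)
        if p_value < threshold:
            hits.append((index, beta, p_value))
    return hits
```

Suppose all of the roughly 3.66 million SNPs were tested with alpha = 0.05. In that case `bonferroni_threshold(3_660_000)` gives about 1.37e-8, or -log10 P of about 7.86. The threshold the study actually applied, and the number of SNPs that remained after filtering, are not stated in this excerpt.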

Published in Nature Genetics.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.




We thank the National Mid-term Gene Bank for Cotton at the Cotton Research Institute, Chinese Academy of Agricultural Sciences, for providing the original collection seeds. We thank T. Zhang for releasing resequencing data for wild cotton accessions. This work was supported by the Fund of the China Agriculture Research System (CARS18-08) and the Science and Technology Support Program of Hebei Province (16226307D) to Z.M.; the National Major Science and Technology Program (2016ZX08005003-005) to X.W.; the National Key Research and Development Program (2016YFD0100203) to X.D., (2016YFD0101405) to Y.Z., and (2016YFD0100306) to S.H.; and the National Science and Technology Support Program (2013BAD01B03) to X.D.

Author information

Author notes

  1. These authors contributed equally: Zhiying Ma, Shoupu He, Xingfen Wang, Junling Sun, Yan Zhang, Guiyin Zhang, Liqiang Wu, Zhikun Li, Zhihao Liu.


  1. North China Key Laboratory for Crop Germplasm Resources of Education Ministry, Hebei Agricultural University, Baoding, China

    Zhiying Ma, Xingfen Wang, Yan Zhang, Guiyin Zhang, Liqiang Wu, Zhikun Li, Yuanyuan Yan, Jun Yang, Qishen Gu, Zhengwen Sun, Zhengwen Liu, Jinhua Wu, Huifeng Ke, Guoning Wang & Nan Wang

  2. State Key Laboratory of Cotton Biology, Institute of Cotton Research of the Chinese Academy of Agricultural Sciences, Anyang, China

    Shoupu He, Junling Sun, Yinhua Jia, Zhaoe Pan, Panhong Dai, Wenfang Gong, Jun Peng, Liru Wang, Baoyin Pang, Zhen Peng & Xiongming Du

  3. Novogene Bioinformatics Institute, Beijing, China

    Zhihao Liu, Ruiqiang Li & Shilin Tian

  4. Anyang Institute of Technology, Anyang, China

    Gaofei Sun

  5. Xinjiang Academy of Agricultural Sciences, Urumchi, China

    Xueyuan Li & Junduo Wang

  6. Yangtze University, Jingzhou, China

    Panhong Dai & Mi Wang

  7. Suzhou University of Science and Technology, Suzhou, China

    Hengwei Liu

  8. Gansu Academy of Agricultural Sciences, Lanzhou, China

    Keyun Feng & Hongyu Lan




Z.M., X.W., X.D., and S.T. designed the analyses. Z.M., X.W., X.D., S.H., Y.Z., Zhihao Liu, and R.L. performed sequencing, genomic-variant, and GWAS analyses. X.W., G.Z., L. Wu, J.P., and S.T. managed the project. J.S., L. Wu, Z. Li, G.Z., J.Y., Y.J., Q.G., Z. Pan, X.L., Z.S., P.D., Zhengwen Liu, W.G., J. Wu, M.W., H. Liu, K.F., H.K., J. Wang, H. Lan, G.W., L. Wang, B.P., and Z. Peng performed field experiments and phenotyping. X.W., G.S., Y.J., Z.S., Zhengwen Liu, and N.W. performed data integration. Y.Z., Zhengwen Liu, and Z.S. performed transcriptome analyses. J.S., L. Wang, Y.J., and H.K. prepared the population material. Y.Z., Y.Y., and X.W. conducted gene expression analysis and functional validation. X.W. and Z.M. designed the research and wrote the manuscript. S.H., Y.Z., S.T., and X.D. designed the research and revised the manuscript. Z.M. and X.D. conceived the research.

Competing interests

The authors declare no competing financial interests.

Corresponding authors

Correspondence to Zhiying Ma, Xingfen Wang, Shilin Tian, or Xiongming Du.

Supplementary information

  1. Supplementary Tables and Figures

    Supplementary Figures 1–23 and Supplementary Tables 3, 6, 8, 9, 11–13 and 15

  2. Reporting Summary

  3. Supplementary Table 1

    List of the 419 cotton accessions used in this study and their sequencing information

  4. Supplementary Table 2

    Statistics of different SNP mutation types for 419 accessions

  5. Supplementary Table 4

    Tracy-Widom statistics of eigenvalues from principal component analysis (PCA) of 419 accessions

  6. Supplementary Table 5

    Ancestry proportion estimates for each accession when the number of ancestral populations was set to three

  7. Supplementary Table 7

    Number of SNP variants in different genes between the core collection and wild races

  8. Supplementary Table 10

    List of the associated SNPs and genes for 13 traits

  9. Supplementary Table 14

    SNPs and elite alleles for 13 traits, and their frequencies in wild races, early varieties and modern varieties
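
Supplementary Table 14 reports elite alleles and their frequencies in wild races, early varieties and modern varieties. The abstract states that selection raised elite-allele frequencies during domestication and breeding. The sketch below shows one way such per-group frequencies could be computed.

It makes several assumptions. Genotypes are coded as allele dosage: 0, 1 or 2 copies of the counted allele, with None for a missing call. Each accession carries one group label. The function names and the labels "wild", "early" and "modern" are hypothetical. The study's own procedure may differ, for example in how it handles missing calls or heterozygotes.

```python
import math
from typing import Dict, List, Optional, Sequence


def allele_frequency(dosages: Sequence[Optional[int]]) -> float:
    """Frequency of the counted allele from dosage codes 0/1/2; None marks a missing call."""
    called = [d for d in dosages if d is not None]
    if not called:
        return math.nan  # no genotyped accessions in this group
    return sum(called) / (2 * len(called))


def elite_frequency_by_group(
    dosages: Sequence[Optional[int]],
    groups: Sequence[str],
    elite_is_counted_allele: bool = True,
) -> Dict[str, float]:
    """Elite-allele frequency per group (e.g. wild races vs early vs modern varieties)."""
    if len(dosages) != len(groups):
        raise ValueError("one group label is needed per accession")
    by_group: Dict[str, List[Optional[int]]] = {}
    for dosage, label in zip(dosages, groups):
        by_group.setdefault(label, []).append(dosage)
    freqs: Dict[str, float] = {}
    for label, group_dosages in by_group.items():
        f = allele_frequency(group_dosages)
        # If the dosage counts the other allele, the elite frequency is the complement.
        freqs[label] = f if elite_is_counted_allele else 1.0 - f
    return freqs


if __name__ == "__main__":
    # Toy data for one SNP across ten hypothetical accessions.
    toy_dosages = [0, 0, 1, 2, 2, 1, 2, 2, 2, None]
    toy_groups = ["wild"] * 3 + ["early"] * 3 + ["modern"] * 4
    print(elite_frequency_by_group(toy_dosages, toy_groups))
```

With the toy data, the elite-allele frequency is 1/6 in the "wild" group, 5/6 in "early" and 1.0 in "modern", because the missing call is excluded. A rising trend of this kind across groups is the pattern the abstract describes. Real data would also need a test of whether the shift exceeds what drift alone would produce.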