Genomic analyses in cotton identify signatures of selection and loci associated with fiber quality and yield traits

Abstract

Upland cotton (Gossypium hirsutum) is the most important natural fiber crop in the world. The overall genetic diversity among cultivated species of cotton and the genetic changes that occurred during their improvement are poorly understood. Here we report a comprehensive genomic assessment of modern improved upland cotton based on the genome-wide resequencing of 318 landraces and modern improved cultivars or lines. We detected more associated loci for lint yield than for fiber quality, which suggests that lint yield has stronger selection signatures than other traits. We found that two ethylene-pathway-related genes were associated with increased lint yield in improved cultivars. We evaluated the population frequency of each elite allele in historically released cultivar groups and found that 54.8% of the elite genome-wide association study (GWAS) alleles detected were transferred from three founder landraces: Deltapine 15, Stoneville 2B and Uganda Mian. Our results provide a genomic basis for improving cotton cultivars and for further evolutionary analysis of polyploid crops.

Figure 1: Formation and improvement of the allotetraploid cottons.
Figure 2: Phylogenetic relationships and genetic diversity of 318 cotton accessions.
Figure 3: GWAS for yield and identification of the candidate gene AIL6 on chromosome A02.
Figure 4: GWAS for yield and identification of the candidate gene EIL.
Figure 5: GWAS for fiber quality and identification of the candidate gene GhXI-K.
Figure 6: Identification of IBD regions in the cotton breeding pedigree.

Accession codes

Primary accessions

BioProject

Sequence Read Archive

References

  1. Kohel, R.J. & Lewis, C.F. (eds.) Cotton (American Society of Agronomy, Madison, Wisconsin, USA, 1984).

  2. Stephens, S.G. Evolution under domestication of the new world cottons (Gossypium spp.). Cienc. Cult. 19, 118–134 (1967).

  3. Zhang, T. et al. Sequencing of allotetraploid cotton (Gossypium hirsutum L. acc. TM-1) provides a resource for fiber improvement. Nat. Biotechnol. 33, 531–537 (2015).

  4. Fryxell, P.A. (ed.) Natural History of the Cotton Tribe (Texas A&M University Press, College Station, Texas, USA, 1979).

  5. May, O.L., Bowman, D.T. & Calhoun, D.S. Genetic diversity of US upland cotton cultivars released between 1980 and 1990. Crop Sci. 35, 1570–1574 (1995).

  6. Wendel, J.F., Brubaker, C.L. & Percival, A.E. Genetic diversity in Gossypium hirsutum and the origin of upland cotton. Am. J. Bot. 79, 1291–1310 (1992).

  7. McGarry, R.C. et al. Monopodial and sympodial branching architecture in cotton is differentially regulated by the Gossypium hirsutum SINGLE FLOWER TRUSS and SELF-PRUNING orthologs. New Phytol. 212, 244–258 (2016).

  8. Brubaker, C.L. & Wendel, J.F. Reevaluating the origin of domesticated cotton (Gossypium hirsutum; Malvaceae) using nuclear restriction fragment length polymorphisms (RFLPs). Am. J. Bot. 81, 1309–1326 (1994).

  9. Wang, G.L., Dong, J.M. & Paterson, A.H. The distribution of Gossypium hirsutum chromatin in G. barbadense germ plasm: molecular analysis of introgressive plant breeding. Theor. Appl. Genet. 91, 1153–1161 (1995).

  10. Lacape, J.M., Dessauw, D., Rajab, M., Noyer, J.L. & Hau, B. Microsatellite diversity in tetraploid Gossypium germplasm: assembling a highly informative genotyping set of cotton SSRs. Mol. Breed. 19, 45–58 (2007).

  11. Tyagi, P. et al. Genetic diversity and population structure in the US Upland cotton (Gossypium hirsutum L.). Theor. Appl. Genet. 127, 283–295 (2014).

  12. Huang, X. et al. Genome-wide association studies of 14 agronomic traits in rice landraces. Nat. Genet. 42, 961–967 (2010).

  13. Huang, X. et al. A map of rice genome variation reveals the origin of cultivated rice. Nature 490, 497–501 (2012).

  14. Meyer, R.S. et al. Domestication history and geographical adaptation inferred from a SNP map of African rice. Nat. Genet. 48, 1083–1088 (2016).

  15. Chia, J.M. et al. Maize HapMap2 identifies extant variation from a genome in flux. Nat. Genet. 44, 803–807 (2012).

  16. Hufford, M.B. et al. Comparative population genomics of maize domestication and improvement. Nat. Genet. 44, 808–811 (2012).

  17. Jiao, Y. et al. Genome-wide genetic changes during modern breeding of maize. Nat. Genet. 44, 812–815 (2012).

  18. Wang, X. et al. Genetic variation in ZmVPP1 contributes to drought tolerance in maize seedlings. Nat. Genet. 48, 1233–1241 (2016).

  19. Li, Y.H. et al. De novo assembly of soybean wild relatives for pan-genome analysis of diversity and agronomic traits. Nat. Biotechnol. 32, 1045–1052 (2014).

  20. Zhou, Z. et al. Resequencing 302 wild and cultivated accessions identifies genes related to domestication and improvement in soybean. Nat. Biotechnol. 33, 408–414 (2015).

  21. Lin, T. et al. Genomic analyses provide insights into the history of tomato breeding. Nat. Genet. 46, 1220–1226 (2014).

  22. Fang, L. et al. Genomic insights into divergence and dual domestication of cultivated allotetraploid cottons. Genome Biol. 18, 33 (2017).

  23. Wang, M. et al. Asymmetric subgenome selection and cis-regulatory divergence during cotton domestication. Nat. Genet. 49, 579–587 (2017).

  24. Calhoun, D.S., Bowman, D.T. & May, O.L. Pedigrees of Upland and Pima cotton cultivars released between 1970 and 1990 (Mississippi Agricultural & Forestry Experiment Station, 1994).

  25. Cavanagh, C.R. et al. Genome-wide comparative diversity uncovers multiple targets of selection for improvement in hexaploid wheat landraces and cultivars. Proc. Natl. Acad. Sci. USA 110, 8057–8062 (2013).

  26. Wei, X. et al. Genetic discovery for oil production and quality in sesame. Nat. Commun. 6, 8609 (2015).

  27. Jia, G. et al. A haplotype map of genomic variations and genome-wide association studies of agronomic traits in foxtail millet (Setaria italica). Nat. Genet. 45, 957–961 (2013).

  28. Chen, H., Patterson, N. & Reich, D. Population differentiation as a test for selective sweeps. Genome Res. 20, 393–402 (2010).

  29. Si, L. et al. OsSPL13 controls grain size in cultivated rice. Nat. Genet. 48, 447–456 (2016).

  30. Yano, K. et al. Genome-wide association study using whole-genome sequencing rapidly identifies new genes influencing agronomic traits in rice. Nat. Genet. 48, 927–934 (2016).

  31. Horstman, A., Willemsen, V., Boutilier, K. & Heidstra, R. AINTEGUMENTA-LIKE proteins: hubs in a plethora of networks. Trends Plant Sci. 19, 146–157 (2014).

  32. Chao, Q. et al. Activation of the ethylene gas response pathway in Arabidopsis by the nuclear protein ETHYLENE-INSENSITIVE3 and related proteins. Cell 89, 1133–1144 (1997).

  33. Chen, Y.F., Etheridge, N. & Schaller, G.E. Ethylene signal transduction. Ann. Bot. 95, 901–915 (2005).

  34. Shi, Y.H. et al. Transcriptome profiling, molecular biological, and physiological studies reveal a major role for ethylene in cotton fiber cell elongation. Plant Cell 18, 651–664 (2006).

  35. Peremyslov, V.V., Prokhnevsky, A.I., Avisar, D. & Dolja, V.V. Two class XI myosins function in organelle trafficking and root hair development in Arabidopsis. Plant Physiol. 146, 1109–1116 (2008).

  36. Peremyslov, V.V., Prokhnevsky, A.I. & Dolja, V.V. Class XI myosins are required for development, cell expansion, and F-actin organization in Arabidopsis. Plant Cell 22, 1883–1897 (2010).

  37. Serna, L. & Martin, C. Trichomes: different regulatory networks lead to convergent structures. Trends Plant Sci. 11, 274–280 (2006).

  38. Zhang, W. et al. QTL analysis on yield and its components in recombinant inbred lines of Upland cotton. Acta Agronomica Sinica 37, 433–442 (2011).

  39. Lacape, J. et al. Mapping QTLs for traits related to phenology, morphology and yield components in an inter-specific Gossypium hirsutum × G. barbadense cotton RIL population. Field Crops Res. 144, 256–267 (2013).

  40. Ma, X. et al. QTL mapping in A-genome diploid Asiatic cotton and their congruence analysis with AD-genome tetraploid cotton in genus Gossypium. J. Genet. Genomics 35, 751–762 (2008).

  41. Hutchison, C.E. et al. The Arabidopsis histidine phosphotransfer proteins are redundant positive regulators of cytokinin signaling. Plant Cell 18, 3073–3087 (2006).

  42. Zhao, J. et al. Moderately enhancing cytokinin level by down-regulation of GhCKX expression in cotton concurrently increases fiber and seed yield. Mol. Breed. 35, 60 (2015).

  43. Huang, Z. (ed.) Cotton Varieties and Their Genealogy in China (China Agriculture Press, Beijing, China, 1996).

  44. Paterson, A.H., Brubaker, C. & Wendel, J.F. A rapid method for extraction of cotton (Gossypium spp.) genomic DNA suitable for RFLP or PCR analysis. Plant Mol. Biol. Report. 11, 122–127 (1993).

  45. Felsenstein, J. PHYLIP: phylogeny inference package (version 3.2). Cladistics 5, 164–166 (1989).

  46. Huson, D.H. et al. Dendroscope: an interactive viewer for large phylogenetic trees. BMC Bioinformatics 8, 460 (2007).

  47. Browning, B.L. & Browning, S.R. A unified approach to genotype imputation and haplotype-phase inference for large data sets of trios and unrelated individuals. Am. J. Hum. Genet. 84, 210–223 (2009).

  48. Price, A.L. et al. Principal components analysis corrects for stratification in genome-wide association studies. Nat. Genet. 38, 904–909 (2006).

  49. Pritchard, J.K., Stephens, M. & Donnelly, P. Inference of population structure using multilocus genotype data. Genetics 155, 945–959 (2000).

  50. Li, M.X., Yeung, J.M., Cherny, S.S. & Sham, P.C. Evaluating the effective numbers of independent tests and significant p-value thresholds in commercial genotyping arrays and public imputation reference datasets. Hum. Genet. 131, 747–756 (2012).

  51. Shin, J.-H., Blay, S., McNeney, B. & Graham, J. LDheatmap: an R function for graphical display of pairwise linkage disequilibria between single nucleotide polymorphisms. J. Stat. Softw. http://dx.doi.org/10.18637/jss.v016.c03 (2006).

  52. Trapnell, C. et al. Transcript assembly and quantification by RNA-seq reveals unannotated transcripts and isoform switching during cell differentiation. Nat. Biotechnol. 28, 511–515 (2010).

  53. Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25, 1754–1760 (2009).

  54. Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).

  55. McKenna, A. et al. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 20, 1297–1303 (2010).

  56. Talevich, E., Shain, A.H., Botton, T. & Bastian, B.C. CNVkit: genome-wide copy number detection and visualization from targeted DNA sequencing. PLoS Comput. Biol. 12, e1004873 (2016).

Acknowledgements

This work was financially supported in part by the NSFC (grant U1503284 to T.Z.), the National Key R&D Program for Crop Breeding in China (grant 2016YFD0100203 to X.D. and B.Z.), the Distinguished Discipline Support Program of Zhejiang University, the Priority Academic Program Development of Jiangsu Higher Education Institutions, the 111 project (grant B08025 to Nanjing Agricultural University) and the JCIC-MCP project (to Nanjing Agricultural University). We thank the National Medium-term Gene Bank of Cotton in the Institute of Cotton Research of the Chinese Academy of Agricultural Sciences (Henan, China) for providing some of the cotton germplasm resource seeds and permitting us to harvest the leaves of seven G. hirsutum races so we could isolate DNA for the present study.

Author information

Contributions

T.Z. conceptualized the research program. T.Z. and X.D. designed the experiments and coordinated the project. T.Z., X.D., B.Z., Y.J., J.S., Z.P., S.H., S.X., W.S., W. Gong, J.L., J.M., X.Z. and W. Guo collected the 318 cotton samples and worked on the phenotype. Y.H., L.F., Q.W., J.C., B.L. and G.M. extracted the high-quality DNA. L.F., Y.H., S.C., C.C. and B.L. constructed DNA-sequencing libraries and carried out genome sequencing. L.F., Q.W., J.C., Y.H., Z.Z. and X.G. performed the genotyping and bioinformatics analyses. T.Z. and L.F. analyzed all of the data and wrote the manuscript. All authors discussed the results and commented on the manuscript.

Corresponding authors

Correspondence to Xiongming Du or Tianzhen Zhang.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Integrated supplementary information

Supplementary Figure 1 The pedigree information for American upland cotton breeding.

The integrated figure was modified from Figs. 1–10 in Calhoun, Bowman & May (1994). The accessions shown in blue were collected and analyzed in our study.

Supplementary Figure 2 Phylogenetic tree of all resequenced accessions.

A neighbour-joining tree of all accessions was constructed using whole-genome SNPs; the outgroup, landraces and improved modern cultivars are marked with light blue, red and dark blue lines, respectively. All of these cultivars could be classified into three clades: a special clade I, the Stoneville 2B clade (subclade 1) and the Deltapine 15 clade (subclade 2).

Supplementary Figure 3 Phylogenetic relationships of 318 cotton accessions.

(a) Principal component analysis of all cotton accessions using whole-genome SNP data. The outgroups clustered together. Clade I mainly contained recently domesticated founders and the cultivars developed from them, such as Acala, Burling’s Mexican and the Maryland Green seed. Clade II included most Upland cotton cultivars, which were mainly derived from landraces originating from the same ancestral resource. Subclade I included the American landraces Stoneville 2B (STV2B), Auburn 56 and Paymaster 54, and the modern cultivars that were developed from STV2B and planted mainly in the Yellow River cotton-growing area in China. Subclade II contained American landraces such as Deltapine 15 (DPL15), Lankart 57 and Dixie king, and the cultivars that were developed mainly from DPL15 and planted in the Yellow River, Yangtze River and Northwest (Xinjiang) cotton-growing areas in China. (b) Population structure of cotton accessions determined using STRUCTURE. When K was set to 4, the outgroup component was present. The components of clade I and clade II, including landraces and modern cultivars, were difficult to distinguish from K = 2 to K = 4.

Supplementary Figure 4 Frequency distribution of phenotypic variation of ten agronomic traits in 258 accessions.

The yield traits included lint percentage, seed index, boll weight and boll number. The fiber quality traits included fiber length, fiber elongation, fiber strength, micronaire and fiber uniformity. Verticillium wilt resistance represented biotic disease resistance.

Supplementary Figure 5 The overlapping regions between selective sweeps and GWAS associated loci.

Four selective sweeps were located around loci associated with fiber length (FL) (A11:63389724), fiber strength (FS) (A11:68009704) and lint percentage (LP) (D02:5454307 and D03:35858092). The horizontal dashed line indicated the genome-wide threshold (2.5) defining the top 1% of πlandrace/πcultivar values. The horizontal line in the Manhattan plot indicated the GWAS threshold (1 × 10⁻⁶).

Supplementary Figure 6 The independent assortment between two elite alleles of GhLYI-A02 and GhLYI-D08.

The two alleles segregated independently and united randomly at fertilization. Together with crossing over, independent assortment could bring them together and introduce them into some cultivars, leading to an increased number of bolls per plant and a higher lint percentage. The horizontal boxes indicated chromosomes A02 and D08. The vertical lines in the chromosomes indicated the two elite alleles.

Supplementary Figure 7 Manhattan plots for lint percentage in nine environments, determined with EMMAx.

Negative log₁₀ (P-value) from a genome-wide scan was plotted against position on each of the 26 chromosomes. The horizontal line indicated the threshold (10⁻⁶).
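The transformation behind such a Manhattan plot can be sketched in a few lines of Python. This is only an illustration: the SNP identifiers and P values below are hypothetical, not data from the study, and treating a SNP as significant only when its P value is strictly below the threshold is an assumption.

```python
import math

# Hypothetical (SNP id, P value) pairs for illustration; not data from the study.
snps = [
    ("A02:100", 3.2e-8),
    ("D03:200", 4.0e-5),
    ("D08:300", 1.0e-6),
]

THRESHOLD_P = 1e-6  # genome-wide threshold quoted in the caption


def neg_log10(p):
    """Return -log10(p), the y-axis value of a Manhattan plot."""
    return -math.log10(p)


def significant(snps, threshold_p=THRESHOLD_P):
    """Return ids of SNPs whose P value is strictly below the threshold (assumed rule)."""
    return [snp_id for snp_id, p in snps if p < threshold_p]


for snp_id, p in snps:
    print(snp_id, round(neg_log10(p), 2))
print("above threshold line:", significant(snps))
```

On the plot, the horizontal threshold line sits at the height given by `neg_log10(THRESHOLD_P)`, and the SNPs returned by `significant` are those plotted above it.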

Supplementary Figure 8 Two significant associated loci for lint percentage.

(a) Manhattan plot for lint percentage. Negative log₁₀ (P-value) from a genome-wide scan was plotted against position on chromosomes D03 and D11. The horizontal line indicated the threshold (10⁻⁶). The arrows indicated the associated signal peaks D03:35858092 and D11:60519747. (b) Local Manhattan plot and LD heatmap. The candidate region lay between the dashed lines. The arrow indicated the SNP in the candidate gene.

Supplementary Figure 9 Manhattan plots for seed index in nine environments, generated with EMMAx.

Negative log₁₀ (P-value) from a genome-wide scan was plotted against position on each of the 26 chromosomes. The horizontal line indicated the threshold (10⁻⁶).

Supplementary Figure 10 The candidate gene AHP5 within the most strongly associated locus for seed index.

(a) Negative log₁₀ P-values for association with seed index (SI) were plotted against SNP positions (x axis). The genome-wide significance threshold (10⁻⁶) was indicated by a horizontal blue line. The arrow indicated the signal peak containing the candidate gene (AHP5). (b) Transcription level of AHP5 in different tissues, measured as FPKM from a single experiment. (c) Comparison of AHP5 expression levels during ovule development in two accessions, TM-1 and ZMS12. The asterisk indicated a significant difference at P < 0.01 (two-sided t-test, three independent biological replicates).

Supplementary Figure 11 GWAS of Verticillium wilt resistance in nine environments, done with EMMAx.

(a) Manhattan plot for Verticillium wilt resistance. Negative log₁₀ (P-value) from a genome-wide scan was plotted against position on chromosome D06. The horizontal line indicated the threshold (10⁻⁶). The arrow indicated the associated signal peak D06:11815621. (b) Local Manhattan plot and LD heatmap. The candidate region lay between the dashed lines. The red arrow indicated the SNP in the candidate gene.

Supplementary Figure 12 The pedigrees of ten deep-sequenced cultivars and landraces extensively planted in China.

The accessions for deep sequencing included three founder landraces, DPL15, STV2B and UGDmian (UDGM). The remaining seven grown cultivars for deep sequencing included SM2, SM3, 86-1, Shiyuan321, ZMS12, Jumian 1 and XLZ42.

Supplementary Figure 13 The detection of in-frame indels in the NAC gene in accessions.

This gene was located in two GWAS associated loci, D02:7014970 and D02:1486735, which were associated with boll weight (BW) and number of bolls per plant (BN), respectively. A 7-bp indel insertion was detected in the founders compared to the reference genome (TM-1).

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–13 and Supplementary Tables 2–5, 7, 8, 12, 13, 15, 17, 20, 22, 23, 27 and 29–32 (PDF 3539 kb)

Supplementary Table 1

Summary of 318 cotton samples and sequencing. (XLS 95 kb)

Supplementary Table 6

Detection of SNP quality by PCR sequencing. (XLSX 15 kb)

Supplementary Table 9

Cotton QTLs overlapping with improvement sweeps. (XLSX 20 kb)

Supplementary Table 10

Genome-wide association signals of yield, fiber quality and disease resistance. (XLS 43 kb)

Supplementary Table 11

The associated loci that overlapped with QTLs. (XLS 40 kb)

Supplementary Table 14

Identification of candidate gene in associated locus A02_79153947 for lint yield. (XLSX 14 kb)

Supplementary Table 16

Identification of candidate gene in associated locus D08_3040023 for lint yield. (XLSX 22 kb)

Supplementary Table 18

The SNP information in associated loci for fiber quality. (XLSX 28 kb)

Supplementary Table 19

Identification of candidate gene in associated loci for fiber quality. (XLSX 28 kb)

Supplementary Table 21

Identification of candidate gene in associated locus D03_35858092 for LP. (XLSX 12 kb)

Supplementary Table 24

The summary of indels identified in ten cultivars by deep sequencing. (XLS 290 kb)

Supplementary Table 25

The indels that overlapped with GWAS loci. (XLSX 23 kb)

Supplementary Table 26

The genes involved in copy-number variation. (XLS 152 kb)

Supplementary Table 28

Identity-by-descent (IBD) regions for Chinese cultivars compared with the traditional landraces. (XLSX 239 kb)

About this article

Cite this article

Fang, L., Wang, Q., Hu, Y. et al. Genomic analyses in cotton identify signatures of selection and loci associated with fiber quality and yield traits. Nat Genet 49, 1089–1098 (2017). https://doi.org/10.1038/ng.3887

  • DOI: https://doi.org/10.1038/ng.3887
