Genome sequence of cultivated Upland cotton (Gossypium hirsutum TM-1) provides insights into genome evolution

Journal name: Nature Biotechnology
Published online


Gossypium hirsutum has proven difficult to sequence owing to its complex allotetraploid (AtDt) genome. Here we produce a draft genome using 181-fold paired-end sequence coverage assisted by fivefold BAC-to-BAC sequences and a high-resolution genetic map. In our assembly, 88.5% of the 2,173-Mb scaffolds, which cover 89.6–96.7% of the AtDt genome, are anchored and oriented to 26 pseudochromosomes. Comparison of this G. hirsutum AtDt genome with the already sequenced diploid Gossypium arboreum (AA) and Gossypium raimondii (DD) genomes revealed conserved gene order. Repeated sequences account for 67.2% of the AtDt genome, and transposable elements (TEs) originating from Dt seem more active than those from At. The AtDt genome decreased in size after allopolyploidization. The A or At genome may have undergone positive selection for fiber traits. Concerted evolution of different regulatory mechanisms for cellulose synthase (CesA) and 1-aminocyclopropane-1-carboxylic acid oxidase 1 and 3 (ACO1,3) may be important for enhanced fiber production in G. hirsutum.
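The size figures in the abstract can be cross-checked with simple arithmetic. The sketch below assumes, without this being stated explicitly, that "88.5% anchored" refers to scaffold length and that the 89.6–96.7% coverage range is the scaffold length divided by independent genome-size estimates; the variable names are illustrative only.

```python
# Back-of-envelope check of the assembly statistics quoted in the abstract.
SCAFFOLD_MB = 2173.0               # total scaffold length (Mb)
ANCHORED_FRACTION = 0.885          # assumed to be a fraction of scaffold length
COVERAGE_RANGE = (0.896, 0.967)    # reported coverage of the AtDt genome

# Length placed on the 26 pseudochromosomes, under the assumption above.
anchored_mb = SCAFFOLD_MB * ANCHORED_FRACTION

# If coverage = assembly length / estimated genome size, each coverage value
# implies a genome-size estimate of assembly length / coverage.
implied_genome_mb = [SCAFFOLD_MB / c for c in COVERAGE_RANGE]

print(f"anchored: {anchored_mb:.0f} Mb")
print("implied genome size: " + ", ".join(f"{g:.0f} Mb" for g in implied_genome_mb))
```

Under these assumptions, roughly 1,923 Mb is anchored, and the coverage range corresponds to genome-size estimates of about 2,247 to 2,425 Mb.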

At a glance


    Figure 1: Evolution and syntenic analysis of the G. hirsutum genome.

    (a) G. hirsutum and six other genomes descended from a common ancestral eudicot genome. Colored blocks within the modern chromosomes of each species represent chromatin originating from seven ancestral chromosomes. Numbers denote predicted divergence times (million years ago, MYA), and each red dot represents one whole-genome duplication. (b) Syntenic blocks between the At subgenome of G. hirsutum and the diploid A genome of G. arboreum. (c) Syntenic blocks between the Dt subgenome of G. hirsutum and the diploid D genome of G. raimondii.
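The syntenic blocks in panels (b) and (c) are runs of homologous gene pairs that keep the same order on both chromosomes. The caption does not name the synteny tool used; the following is only a simplified, greedy sketch of the idea, assuming each anchor is a hypothetical pair of gene ranks (position in gene order) on the two chromosomes. Dedicated synteny software typically uses scored dynamic-programming chaining instead.

```python
from typing import List, Tuple

# Hypothetical anchor format: (rank of gene on chromosome X, rank of its homolog on chromosome Y)
Anchor = Tuple[int, int]


def collinear_blocks(anchors: List[Anchor], max_gap: int = 5,
                     min_size: int = 4) -> List[List[Anchor]]:
    """Greedily chain anchors whose ranks increase on both chromosomes.

    Only forward-oriented blocks are found; inverted blocks would need a
    second pass that accepts decreasing ranks on chromosome Y.
    """
    blocks: List[List[Anchor]] = []
    current: List[Anchor] = []
    for x, y in sorted(anchors):
        if current:
            px, py = current[-1]
            # Extend the run if both ranks advance by at most max_gap genes.
            if 0 < x - px <= max_gap and 0 < y - py <= max_gap:
                current.append((x, y))
                continue
            # Otherwise close the current run, keeping it only if long enough.
            if len(current) >= min_size:
                blocks.append(current)
        current = [(x, y)]
    if len(current) >= min_size:
        blocks.append(current)
    return blocks
```

For example, anchors `(1,10)` to `(4,13)` and `(20,5)` to `(23,8)` form two blocks, while an isolated pair such as `(50,90)` is discarded as too short. The `max_gap` and `min_size` values are arbitrary placeholders.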

    Figure 2: Characterization of copia and gypsy TEs in the G. hirsutum genome.

    (a) Statistics for these two types of TEs in the Dt (left half) and At (right half) subgenomes. The outer circle shows the percent coverage of copia (green histogram) and gypsy (purple histogram) in nonoverlapping 500-kb windows. The next two inner circles indicate copia and gypsy transcript levels, estimated by averaging log10-transformed read values from different tissues in nonoverlapping 500-kb windows. The links in the center indicate collinearity between the At and Dt subgenomes; only syntenic blocks longer than 1 Mb are shown. (b) Estimated insertion times for copia and gypsy LTR retrotransposons. (c) Distances from individual TEs to their nearest gene.
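The three quantities in Figure 2 can be sketched as follows. This is not the authors' pipeline, which the caption does not describe. It assumes half-open `[start, end)` coordinates in bp, takes the LTR divergence `K` as already computed, and uses the widely applied relation T = K/(2r) for LTR insertion age. The substitution rate `r = 7e-9` per site per year is a placeholder assumption, not a value given here.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) in bp on one chromosome (assumed convention)


def window_coverage(tes: List[Interval], chrom_len: int, window: int = 500_000) -> List[float]:
    """Percent of each nonoverlapping window covered by TE intervals."""
    # Merge overlapping TEs so nested or overlapping elements are not double counted.
    merged: List[List[int]] = []
    for start, end in sorted(tes):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])

    n_windows = (chrom_len + window - 1) // window
    covered = [0] * n_windows
    for start, end in merged:
        pos, w = start, start // window
        while pos < end:  # split an interval that spans window boundaries
            chunk_end = min((w + 1) * window, end)
            covered[w] += chunk_end - pos
            pos, w = chunk_end, w + 1

    # The last window may be shorter than `window`.
    return [100.0 * c / min(window, chrom_len - i * window) for i, c in enumerate(covered)]


def ltr_insertion_time(k: float, rate: float = 7e-9) -> float:
    """Insertion age in years, T = K / (2r); `rate` is an assumed placeholder."""
    return k / (2.0 * rate)


def distance_to_nearest_gene(te: Interval, genes: List[Interval]) -> int:
    """Gap in bp between a TE and the closest gene (0 if they overlap)."""
    a, b = te
    return min(max(0, c - b, a - d) for c, d in genes)
```

The divisor 2r reflects the fact that both LTRs of an element start out identical and accumulate substitutions independently after insertion. The brute-force nearest-gene search is linear per TE. A genome-wide analysis would normally sort genes and use a binary search or an interval tree instead.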

    Figure 3: Evolution of gene models, DNA fragments and syntenic blocks among G. hirsutum (AtDt) and two diploid cotton genomes, G. arboreum (A) and G. raimondii (D).

    (a) Scenarios and statistics of gene conservation. Solid lines indicate currently observed genes, and dotted lines indicate lost genes. The numbers beneath each drawing give the number of gene pairs among the three cotton genomes that fit each model. From left to right: genes present in all four genomes; genes not observed in At; genes not observed in Dt; genes observed in neither A nor At; genes observed in neither D nor Dt; genes not observed in A; genes not observed in D. (b) Homoeologous exchange (HE) of genomic segments between the At and Dt subgenomes in a region of G. hirsutum chromosome 9 (Gh9). The curves in the upper panel show homologous gene pairs between At and A or between At and D. The lower panel shows the distribution of Ks (synonymous substitutions per synonymous site) for syntenic blocks, which indicates HE in tetraploid cotton. The dot plots show the distribution of Ks values, and the boxplots show the variation in Ks values between the A and D genomes. Some low-probability outlying points are not included in the boxplots. (c) Distribution of Ks values between the four cotton genomes and T. cacao (upper panel) and single-nucleotide variation (SNV) rate (lower panel) among the cotton genomes.
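The seven scenarios in panel (a) amount to a lookup keyed by which of the four genomes (A, D, At, Dt) lack a gene. A minimal sketch, assuming presence or absence has already been called for each ortholog group. The labels and the `"other pattern"` fallback are hypothetical and are not taken from the study.

```python
from typing import Iterable

GENOMES = frozenset({"A", "D", "At", "Dt"})

# Hypothetical labels for the seven models in Figure 3a, keyed by the set of genomes lacking the gene.
SCENARIOS = {
    frozenset(): "present in all four genomes",
    frozenset({"At"}): "lost in At",
    frozenset({"Dt"}): "lost in Dt",
    frozenset({"A", "At"}): "lost in A and At",
    frozenset({"D", "Dt"}): "lost in D and Dt",
    frozenset({"A"}): "lost in A",
    frozenset({"D"}): "lost in D",
}


def classify(observed: Iterable[str]) -> str:
    """Map the genomes in which a gene is observed to one of the seven scenarios."""
    obs = frozenset(observed)
    unknown = obs - GENOMES
    if unknown:
        raise ValueError(f"unknown genome labels: {sorted(unknown)}")
    # Patterns outside the seven models (e.g. lost in both At and Dt) are not counted in the figure.
    return SCENARIOS.get(GENOMES - obs, "other pattern")
```

For instance, `classify(["D", "Dt"])` returns `"lost in A and At"`. Tallying the results over all ortholog groups with `collections.Counter` gives counts analogous to the numbers shown beneath each drawing.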

    Figure 4: Ethylene production and its regulatory mechanisms in three cotton species (G. raimondii, G. arboreum and G. hirsutum).

    (a) Comparison of ethylene production by ovules collected at 1 day post-anthesis (DPA) and cultured for 14 d, with air samples collected at the time points shown. Data are mean ± s.e.m. from three independent ovule culture experiments, with triplicate measurements for each sample. (b,c) Electrophoretic mobility shift assays (EMSA) showing the specific binding complex on the P6 fragment of the ACO1 (b) and ACO3 (c) promoters. ³²P-labeled probes were incubated with nuclear protein samples prepared from 10-DPA G. hirsutum ovules. Dotted lines at the top of each panel indicate sequences lost in the corresponding genome, and red boxes represent MYB binding sites. In each panel, a representative EMSA obtained with probes from the Dt or At subgenome of G. hirsutum is shown in the middle, and data obtained with probes from G. arboreum (Ga) or G. raimondii (Gr) are shown at the bottom. (d) Comparison of binding activity on P6 from the ACO1 and ACO3 promoter regions of the three cotton species. Data were obtained with nuclear proteins prepared from 0-, 3-, 5-, 10-, 15- and 20-DPA ovules and incubated with P6 from G. raimondii, from the Dt copy of G. hirsutum and from G. arboreum. Error bars, mean ± s.e.m. from three independent EMSA experiments. (e) Phylogenetic and evolutionary analysis of ACO1 and ACO3 promoter regions from G. raimondii, G. arboreum and T. cacao. Scale bars, 100 bp. Statistical significance was determined by one-way analysis of variance (ANOVA). *P < 0.05, **P < 0.01, ***P < 0.001.
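The summary statistics named in this caption, mean ± s.e.m. and a one-way ANOVA, can be sketched with the Python standard library. The caption does not say which software was used; this only illustrates how the F statistic is formed from between-group and within-group variation. It assumes every group has at least two values.

```python
import math
import statistics
from typing import List, Sequence, Tuple


def mean_sem(values: Sequence[float]) -> Tuple[float, float]:
    """Mean and standard error of the mean (sample s.d. / sqrt(n))."""
    return statistics.mean(values), statistics.stdev(values) / math.sqrt(len(values))


def one_way_anova_f(groups: List[Sequence[float]]) -> Tuple[float, int, int]:
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_values = [v for g in groups for v in g]
    grand_mean = statistics.mean(all_values)
    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of values around their own group mean.
    ss_within = sum((v - statistics.mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within
```

The P value is the upper tail of the F distribution with `(df_between, df_within)` degrees of freedom. With SciPy installed, for example, it can be obtained as `scipy.stats.f.sf(f, df_between, df_within)`. For two groups `[1, 2, 3]` and `[4, 5, 6]`, F = 13.5 with (1, 4) degrees of freedom.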

    Figure 5: Fiber growth, expression and potential regulatory mechanisms of genes important for cell wall biosynthesis.

    (a) Comparison of fiber lengths of the three cotton species (left) and growth-rate analysis for G. hirsutum and G. arboreum (right). Error bars, mean ± s.d. (b–d) qRT-PCR analysis of primary and secondary cell wall biosynthesis genes using the At-originated (blue lines) or A-originated (red lines) copy as the template. UER, UDP-4-keto-6-deoxy-D-glucose 3,5-epimerase 4-reductase; UGP, UDP-D-glucose pyrophosphorylase; UGD, UDP-D-glucose dehydrogenase. See Supplementary Table 19 for gene-specific primers. Error bars, mean ± s.d. (e) Evolution of regulatory mechanisms for fiber-specific and highly expressed genes from the At subgenome.
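The caption does not state how the qRT-PCR data in panels (b–d) were normalized. One common choice is the comparative 2^(−ΔΔCt) method, sketched below as an assumption rather than as the authors' procedure. It presumes near-100% amplification efficiency, a reference gene, and a calibrator sample such as an early time point. All parameter names are hypothetical.

```python
from statistics import mean, stdev
from typing import Sequence, Tuple


def fold_change_ddct(ct_target: float, ct_ref: float,
                     ct_target_cal: float, ct_ref_cal: float) -> float:
    """Relative expression by the comparative 2^(-ΔΔCt) method.

    Assumes near-100% amplification efficiency for both target and reference gene.
    """
    delta_sample = ct_target - ct_ref              # normalize target to reference gene in the sample
    delta_calibrator = ct_target_cal - ct_ref_cal  # same normalization in the calibrator
    return 2.0 ** -(delta_sample - delta_calibrator)


def summarize(values: Sequence[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation, matching 'mean ± s.d.' error bars."""
    return mean(values), stdev(values)
```

A target that crosses threshold two cycles earlier, relative to the reference gene, than it does in the calibrator gives a fold change of 4. Applying `summarize` to replicate fold changes yields values suitable for mean ± s.d. error bars.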

Accession codes

Primary accessions


Sequence Read Archive



Author information

  1. These authors contributed equally to this work.

    • Fuguang Li,
    • Guangyi Fan,
    • Cairui Lu,
    • Guanghui Xiao,
    • Changsong Zou,
    • Russell J Kohel,
    • Zhiying Ma,
    • Haihong Shang,
    • Xiongfeng Ma,
    • Jianyong Wu &
    • Xinming Liang


  1. State Key Laboratory of Cotton Biology, Institute of Cotton Research of the Chinese Academy of Agricultural Sciences, Anyang, China.

    • Fuguang Li,
    • Cairui Lu,
    • Changsong Zou,
    • Haihong Shang,
    • Xiongfeng Ma,
    • Jianyong Wu,
    • Kun Liu,
    • Weihua Yang,
    • Xiongming Du,
    • Youlu Yuan,
    • Wuwei Ye,
    • Xueyan Zhang,
    • Hengling Wei,
    • Shoujun Wei,
    • Jinjie Cui,
    • Guoli Song,
    • Kunbo Wang &
    • Shuxun Yu
  2. BGI-Shenzhen, Shenzhen, China.

    • Guangyi Fan,
    • Xinming Liang,
    • Wenbin Chen,
    • Chengcheng Shi,
    • Xin Liu,
    • Weiqing Liu,
    • Guodong Huang,
    • He Zhang,
    • Fengming Sun,
    • Jie Liang,
    • Jiahao Wang,
    • Qiang He,
    • Leihuan Huang,
    • Jun Wang &
    • Xun Xu
  3. State Key Laboratory of Protein and Plant Gene Research, College of Life Sciences, Peking University, Beijing, China.

    • Guanghui Xiao,
    • Gai Huang &
    • Yuxian Zhu
  4. Institute for Advanced Studies and College of Life Sciences, Wuhan University, Wuhan, China.

    • Guanghui Xiao,
    • Gai Huang &
    • Yuxian Zhu
  5. Crop Germplasm Research Unit, Southern Plains Agricultural Research Center, US Department of Agriculture–Agricultural Research Service (USDA-ARS), College Station, Texas, USA.

    • Russell J Kohel,
    • Richard G Percy &
    • John Z Yu
  6. Key Laboratory for Crop Germplasm Resources of Hebei, Agricultural University of Hebei, Baoding, China.

    • Zhiying Ma &
    • Xingfen Wang
  7. National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, China.

    • Xianlong Zhang
  8. Department of Agronomy, Zhejiang University, Hangzhou, China.

    • Shuijin Zhu


F.L., G.F., C.L., Z.M., R.J.K., X.X., J.Z.Y., Y.Z. and S.Y. designed the analyses. G.F., X.m.L., W.C., C.S., X. Liu, W.L., G.d.H., H.Z., J.L., J.W., Q.H., L.H., F.S., J.h.W. and X.X. performed sequencing, assembly and genome annotation. F.L., X.m.L., K.W., G.S., J.y.W., J.C., X.X., J.Z.Y. and S.Y. managed and coordinated the project. C.L., C.Z., H.S., G.X., J.Z.Y. and G.d.H. performed the genome analysis and physical map integration. F.L., C.L., C.Z., Z.M., H.S., X.M., K.W., G.S., J.y.W., J.C., K.L., W.Y., X.D., Y.Y., W.Ye, X.l.Z., H.W., S.Y., G.X., G.H., X.W., S.W., X.Z. and S.Z. prepared DNA/RNA samples and performed PCR analysis. Y.Z., G.X., H.S., C.Z., C.L. and G.H. performed transcriptome and lineage-specific gene functional analyses. R.J.K., R.G.P. and J.Z.Y. conceived the project, provided the homozygous seeds and revised the manuscript. Y.Z., C.L., C.Z., G.X. and H.S. wrote the manuscript. S.Y., Y.Z. and F.L. conceived and directed the project.

Competing financial interests

The authors declare no competing financial interests.

Corresponding authors

Correspondence to:


Supplementary information

PDF files

  1. Supplementary Text and Figures (4,577 KB)

    Supplementary Figures 1–11 and Supplementary Tables 1–12 and 14–19

Excel files

  1. Supplementary Table 13 (4,922 KB)

    Orthologous gene pairs of G. hirsutum, G. arboreum, and G. raimondii.
