Repeated polyploidization of Gossypium genomes and the evolution of spinnable cotton fibres

Polyploidy often confers emergent properties, such as the higher fibre productivity and quality of tetraploid cottons relative to diploid cottons bred for the same environments (ref. 1). Here we show that an abrupt five- to sixfold ploidy increase approximately 60 million years (Myr) ago, and allopolyploidy reuniting divergent Gossypium genomes approximately 1–2 Myr ago (ref. 2), conferred about 30–36-fold duplication of ancestral angiosperm (flowering plant) genes in elite cottons (Gossypium hirsutum and Gossypium barbadense), genetic complexity equalled only by Brassica (ref. 3) among sequenced angiosperms. Nascent fibre evolution, before allopolyploidy, is elucidated by comparison of spinnable-fibred Gossypium herbaceum A and non-spinnable Gossypium longicalyx F genomes to one another and the outgroup D genome of non-spinnable Gossypium raimondii. The sequence of a G. hirsutum AtDt (in which ‘t’ indicates tetraploid) cultivar reveals many non-reciprocal DNA exchanges between subgenomes that may have contributed to phenotypic innovation and/or other emergent properties such as ecological adaptation by polyploids. Most DNA-level novelty in G. hirsutum recombines alleles from the D-genome progenitor native to its New World habitat and the Old World A-genome progenitor in which spinnable fibre evolved. Coordinated expression changes in proximal groups of functionally distinct genes, including a nuclear mitochondrial DNA block, may account for clusters of cotton-fibre quantitative trait loci affecting diverse traits. Opportunities abound for dissecting emergent properties of other polyploids, particularly angiosperms, by comparison to diploid progenitors and outgroups.
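The 30–36-fold figure can be read as the product of successive whole-genome multiplication events. The sketch below is an illustrative reading, not a calculation taken from the paper: it assumes the pan-eudicot paleohexaploidy (named in the Figure 1 caption) counts as a threefold step and the A + D allopolyploidy as a twofold step, and it ignores gene loss after each event. The names `events` and `duplication_range` are hypothetical.

```python
from math import prod

# Fold increase contributed by each event, as (low, high) bounds.
# The 3x and 2x factors are assumptions based on the usual meaning of
# hexaploidy and of merging two genomes; the 5-6x range is from the text.
events = {
    "pan-eudicot paleohexaploidy": (3, 3),
    "Gossypium-lineage ploidy increase": (5, 6),
    "A + D allopolyploidy": (2, 2),
}


def duplication_range(events):
    """Multiply the per-event bounds to get the cumulative fold range."""
    low = prod(lo for lo, _ in events.values())
    high = prod(hi for _, hi in events.values())
    return low, high


print(duplication_range(events))  # (30, 36)
```

Under the same assumptions, dropping the allopolyploidy step gives a 15–18-fold range for a diploid cotton genome.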

    Figure 1: Evolution of spinnable cotton fibres.

    Paleohexaploidy in a eudicot ancestor (red, yellow and blue lines) formed a genome resembling that of grape (bottom right). Shortly after divergence from cacao (bottom left), the Gossypium lineage experienced a five- to sixfold ploidy increase. Spinnable fibre evolved in the A genome after its divergence from the F genome, and was further elaborated after the merger of A and D genomes ~1–2 Myr ago, forming the common ancestor of G. hirsutum (Upland) and G. barbadense (Egyptian, Sea Island and Pima) cottons.

    Figure 2: Syntenic relationships among grape, cacao and cotton.

    a, Macro-synteny connecting blocks of >30 genes (grey lines). Highlighted regions (pink and red) trace to a common ancestor before the pan-eudicot hexaploidy (ref. 7), with the Gossypium lineage five- to sixfold ploidy increase forming multiple derived regions. Inferred duplication depth in cotton varies (top). b, Micro-synteny of grape chromosome (Chr) 3, cacao chromosome 2 and five cotton chromosomes. Rectangles represent predicted genes, with connecting grey lines showing co-linear relationships. An example (1 grape, 1 cacao, 5 cotton) is highlighted in red.

    Figure 3: Paleo-evolution of cotton gene families.

    a, Myb subgroup 9 (ref. 12) originated from a gene on the progenitor of cacao chromosome 2 that formed two adjacent copies after Malvales–Brassicales divergence and then triplicated in cotton, with subsequent loss of one chromosome 8 and two chromosome 12 paralogues. One extant paralogue that traces to pan-eudicot hexaploidy, Tc04_g009420, has reduplicated in cotton (Gorai.012G052500.1 and Gorai.011G122800.1) and in Arabidopsis (ref. 8; At3g01140 and At5g15310). The other, Tc01_g036330, has reduplicated in cotton (Gorai.004G157600.1 and Gorai.001G169700.1). Asterisk indicates increased gene expression in elite versus wild tetraploids (Supplementary Table 5.3). b, The most NBS-rich region of T. cacao, on chromosome 7, corresponds to regions of G. raimondii chromosome triplets 2/10/13 and 7/9/4. Cacao chromosome 7 NBSs form a single branch, indicating lineage-specific expansion. G. raimondii chromosome 7 and 13 NBSs form distinct branches, indicating cluster/tandem duplication (gene numbers also reflect physical proximity of genes to one another).

    Figure 4: Allelic changes between A- and D-genome diploid progenitors and the At and Dt subgenomes of G. hirsutum cultivar Acala Maxxa.

A.H.P., D.S.R., J.S. and D.G.P. conceived the study. J.S., J.G., D.S.R., K.C.S., S.D., M.V.D., C.L., L.N.R., B.E.S. and J.A.S. performed sequencing and associated clone manipulations. A.H.P., J.F.W., D.L., E.S.D., J.U., E.R., Z.W., H.A., L.V.H., R.H., D.M.S., A.V.-D. and T.Z. contributed unpublished data. A.H.P., J.F.W., H.Guo, H.Gundlach, J.J., D.J., D.L., S.S., J.U., M.-j.Y., R.B., W.C., A.D.-F., L.G., C.G., K.G., G.H., T.-h.L., J.L., L.L., T.L., B.S.M., J.T.P., A.W.R., E.R., E.S., X.T., H.T., C.X., J.W., Z.W., D.Z., L.Z., F.B., C.H.H., D.H., L.V.H., R.H., S.M., M.F.S.V., S.A.W., T.Z., E.S.D., D.S.R., X.W. and J.S. analysed data. A.H.P., J.F.W., D.L., E.S.D., K.F.X.M., D.G.P. and J.S. wrote the manuscript. All authors discussed results and commented on the manuscript.

The authors declare no competing financial interests.

Sequences have been deposited in NCBI for G. raimondii (BioProject accession PRJNA171262), G. longicalyx (accession F1-1, SRA061660), G. herbaceum (accession A1-97, SRA061243) and G. hirsutum (cultivar Acala Maxxa, SRS375727) genomes; G. hirsutum (SRA061240) and G. barbadense (SRA061309) fibre transcriptomes; G. hirsutum (SRA061456) seed transcriptomes; and G. hirsutum microRNAs (SRA061415).

