Introduction

TCP proteins, named after four founding members, TB1 (TEOSINTE BRANCHED 1) from maize (Zea mays), CYC (CYCLOIDEA) from snapdragon (Antirrhinum majus), and PCF1 and PCF2 (PROLIFERATING CELL FACTORS 1/2) from rice (Oryza sativa), are plant-specific transcription factors (TFs). All of them contain a highly conserved TCP domain, and they are widely distributed in higher plants, including monocots and dicots. The TCP domain is a 59-amino-acid region that forms a basic helix-loop-helix (bHLH)-type DNA-binding domain1. Based on sequence similarity within the TCP domain, Arabidopsis TCP proteins are divided into two classes: Class I (also called the PCF subgroup) and Class II (comprising the CYC/TB1 and CIN subgroups)2. Class I TCPs have been reported to promote plant growth and cell proliferation, whereas the CIN subgroup plays a key role in lateral organ development and the CYC/TB1 subgroup (also called CYC/DICH) contributes to shoot branching and axillary meristem development2.

TCP proteins usually form homodimers or heterodimers with each other to regulate the expression of their target genes. Promoters of TCP target genes typically contain the conserved DNA motif G(T/C)GGNCCCAC, in particular the core motifs TGGGCC, GCCCR, or GG(A/T)CCC3,4,5,6,7,8,9. TCPs also interact with other transcriptional regulators, such as DELLAs, AS2, ABI4, MYBs, and bHLHs, and thereby promote flavonoid biosynthesis, activate effector-triggered immunity, respond to abiotic stress, and mediate responses to salicylic acid (SA), jasmonate (JA), auxin, cytokinin (CK), abscisic acid (ABA), and gibberellin (GA)10,11,12,13,14,15,16,17,18,19,20,21.
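To make the motif definitions above concrete, the Python sketch below converts the consensus G(T/C)GGNCCCAC and the three core motifs into regular expressions (N = any base, R = A or G) and scans both strands of a promoter sequence. It is an illustrative sketch only, not the method used in the cited studies; the function `find_tcp_sites` and the example sequence are hypothetical.

```python
import re

# Motifs from the text as regular expressions.
# N -> any base (.), R -> A/G, (T/C) and (A/T) -> character classes.
TCP_MOTIFS = {
    "consensus": "G[TC]GG.CCCAC",
    "TGGGCC": "TGGGCC",
    "GCCCR": "GCCC[AG]",
    "GG(A/T)CCC": "GG[AT]CCC",
}

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_tcp_sites(promoter: str) -> list[tuple[str, str, int]]:
    """Scan both strands; return (motif, strand, 0-based start on + strand)."""
    promoter = promoter.upper()
    rc = reverse_complement(promoter)
    n = len(promoter)
    hits = []
    for name, pattern in TCP_MOTIFS.items():
        # A zero-width lookahead lets overlapping occurrences be reported.
        regex = re.compile(f"(?=({pattern}))")
        for m in regex.finditer(promoter):
            hits.append((name, "+", m.start()))
        for m in regex.finditer(rc):
            # Map a reverse-strand start back to + strand coordinates.
            hits.append((name, "-", n - m.start() - len(m.group(1))))
    return sorted(hits, key=lambda h: (h[2], h[0], h[1]))


print(find_tcp_sites("AAGTGGGCCCACTT"))
```

Because GTGGGCCCAC is its own reverse complement, a site like the one in the example is reported on both strands. A real promoter scan would also need to handle IUPAC ambiguity codes other than N.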

Allotetraploid upland cotton (G. hirsutum), which accounts for more than 90% of cultivated cotton worldwide, is the most important fiber-producing crop22, 23. Cotton fibers are single-celled trichomes derived from the seed epidermis. Fiber development proceeds through four distinct but overlapping stages: initiation (from −2 to 5 days post-anthesis, −2–5 DPA), elongation (3–20 DPA), secondary cell wall deposition (16–40 DPA), and maturation (40–50 DPA)24. Fiber cell differentiation is thought to resemble Arabidopsis leaf trichome development24,25,26,27,28. In Arabidopsis, trichome cell fate is controlled by the positive regulators GL1 (GLABRA1), GL3 (GLABRA3), EGL3 (ENHANCER OF GL3), and TTG1 (TRANSPARENT TESTA GLABRA1). GL1 is an R2R3 MYB protein that is partially redundant with MYB23; GL3 and its homolog EGL3 are bHLH TFs; and TTG1 is a WD40-repeat protein. These proteins assemble into a trimeric MYB–bHLH–WD complex that promotes the expression of GL2 (encoding a homeodomain/leucine zipper TF) and TTG2 (encoding a WRKY TF), thereby controlling trichome formation27, 29. Similarly, GhMYB2/GhMYB23 (GL1 homologs), two further R2R3 MYBs (GhMYB25 and GhMYB25L), GhDEL65 (a GL3 homolog), GhTTG1/GhTTG3, and GhHD1/GhHOX3 (GL2 homologs) have been reported to regulate fiber initiation and differentiation in cotton25, 26, 28, 30,31,32. In addition, overexpression of GhTCP14 in Arabidopsis enhances trichome initiation and elongation, with GhTCP14 binding to the promoters of auxin-related genes33, whereas silencing of GbTCP (a homolog of AtTCP15) in cotton produces shorter fibers and is associated with decreased expression of JA biosynthesis genes34. These data indicate that GhTCP14 and GbTCP play important roles in fiber development through phytohormone signaling pathways.

Recently, 38 and 36 TCPs were identified in the diploid cotton species Gossypium raimondii (DD genome) and Gossypium arboreum (AA genome), respectively35, 36. However, no genome-wide characterization of the TCP family has yet been reported in an allotetraploid cotton species such as upland cotton. Meanwhile, the genome sequence and annotation of upland cotton (G. hirsutum TM-1) have recently been completed22, 23, making it possible to identify TCP TFs in an allotetraploid cotton species. In the present study, we identified 74 TCP genes in upland cotton and analyzed their gene and protein architecture, conserved domain profiles, physical properties, chromosomal locations, and phylogenetic relationships. We also examined the expression dynamics of these TCP genes in cotton tissues (especially developing fibers), the capacity of cotton TCP proteins to form homodimers and heterodimers, and their interactions with several fiber-related transcription factors. These data provide valuable information for understanding the classification and putative functions of GhTCPs, and shed light on the molecular mechanisms by which TCP proteins participate in fiber development.

Results

Identification of TCP genes in upland cotton

To identify all TCP family members in the upland cotton (G. hirsutum) genome, we performed a BLASTp search against the upland cotton protein database (https://www.cottongen.org/tools/blast/blast) using the TCP sequences of G. raimondii and G. arboreum as queries. All candidate proteins were then submitted to the MotifScan and SMART databases to annotate their domain structure, and only candidates containing a TCP domain were regarded as “true” TCP proteins. After redundant and partial sequences were manually discarded, 64 GhTCPs remained in the CGP-BGI assembly of the Gossypium hirsutum (AD1) genome22 and 72 in the NAU-NBI assembly23. Protein sequence alignment showed that 62 of these GhTCPs were identical between the two assemblies, while the remaining 12 differed between them. In total, 74 non-redundant TCP genes were identified in the upland cotton genome (Table 1). The number of GhTCPs is about 3.1 times that of AtTCPs, slightly higher than the average ratio of putative cotton homologs per Arabidopsis gene22, 23, 37. Because upland cotton is an allotetraploid containing both A and D subgenomes, we named the 74 putative TCP genes GhTCP1-A/D to GhTCP25-A/D following the nomenclature of Arabidopsis TCPs.
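The final count follows from inclusion-exclusion: the union of two overlapping sets has |A| + |B| − |A ∩ B| members, so 64 + 72 − 62 = 74. The sketch below assumes, as the counts imply, that each of the 12 non-shared GhTCPs was found in only one assembly (2 unique to CGP-BGI, 10 unique to NAU-NBI). The identifiers are hypothetical placeholders, not real gene IDs.

```python
# Hypothetical placeholder IDs standing in for distinct TCP protein sequences.
shared = {f"shared_{i}" for i in range(62)}     # identical in both assemblies
bgi_only = {f"bgi_only_{i}" for i in range(2)}  # assumed unique to CGP-BGI
nau_only = {f"nau_only_{i}" for i in range(10)} # assumed unique to NAU-NBI

cgp_bgi = shared | bgi_only   # 64 TCPs in the CGP-BGI assembly
nau_nbi = shared | nau_only   # 72 TCPs in the NAU-NBI assembly

# Non-redundant set = union; its size obeys inclusion-exclusion.
non_redundant = cgp_bgi | nau_nbi
differing = cgp_bgi ^ nau_nbi  # present in only one assembly

print(len(cgp_bgi), len(nau_nbi), len(non_redundant), len(differing))  # 64 72 74 12
```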

Table 1 TCP gene family in upland cotton (Gossypium hirsutum L. acc. TM-1)a.

Phylogenetic relationship of the cotton TCP family

To reveal the evolutionary relationships of the identified cotton TCP proteins, a phylogenetic tree was constructed by the Neighbor-Joining (NJ) method using 298 full-length TCP protein sequences from G. hirsutum, G. arboreum, G. raimondii, Theobroma cacao, Vitis vinifera, Arabidopsis thaliana, Solanum lycopersicum, Oryza sativa, and Brachypodium distachyon. As shown in Fig. 1, the TCP family is divided into 11 groups, designated Group A to Group K. GhTCPs in Groups A–G belong to the PCF clade, Group H belongs to the CYC/TB1 clade, and Groups I–K belong to the CIN clade (Table 2)2, 35, 36. Group A, the largest group, contains 12 GhTCP members (16.2% of all GhTCPs), whereas Group E, the smallest, contains only 2. Of the 74 GhTCPs, 48 belong to class I and the remaining 26 to class II, compared with 13 class I and 11 class II TCPs in Arabidopsis. The expansion of TCPs in the G. hirsutum genome is therefore biased toward class I (about 3.7-fold), whereas class II is only about 2.4 times the size of its Arabidopsis counterpart (Fig. 1, Table 2). In addition, Group E is specific to eudicots, and among the eudicot genomes examined, only Vitis vinifera lacks Groups E, F, and G. This may imply that these species diverged after the expansion of the TCP transcription factor family.
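The class-level ratios quoted above follow directly from the group sizes. The short sketch below recomputes them from the counts given in the text; no additional data are involved.

```python
# Family sizes quoted in the text: (G. hirsutum, Arabidopsis thaliana).
CLASS_COUNTS = {
    "class I": (48, 13),
    "class II": (26, 11),
}


def fold_expansion(cotton: int, arabidopsis: int) -> float:
    """Size of the cotton gene set relative to the Arabidopsis gene set."""
    return cotton / arabidopsis


ratios = {cls: fold_expansion(*n) for cls, n in CLASS_COUNTS.items()}
for cls, ratio in ratios.items():
    print(f"{cls}: {ratio:.2f}-fold")  # class I: 3.69-fold, class II: 2.36-fold
```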

Figure 1

Phylogenetic analysis of the upland cotton (G. hirsutum) TCP family. The phylogenetic tree was constructed from 298 protein sequences from the G. hirsutum A subgenome (37) and D subgenome (37), G. arboreum (36), G. raimondii (38), Arabidopsis thaliana (24), Solanum lycopersicum (36), Oryza sativa (23), Brachypodium distachyon (21), Theobroma cacao (31), and Vitis vinifera (15) using the Neighbor-Joining method in MEGA 6.06 with 1,000 bootstrap replicates. Arabidopsis TCPs are highlighted in red text.

Table 2 Number of TCPs in upland cotton (G. hirsutum), G. arboreum, G. raimondii, Arabidopsis thaliana, Solanum lycopersicum, Oryza sativa, Brachypodium distachyon, Theobroma cacao, and Vitis vinifera.

Chromosomal distribution and gene duplication

Of the 74 GhTCPs, 69 are located on 22 chromosomes, and the remaining five are located on four unmapped scaffolds (scaffold4574_D12, scaffold4706_D13, scaffold2345_A09, and scaffold4070_D05). GhTCP genes are unevenly distributed across the chromosomes, with 0 to 7 TCP genes per chromosome. At_Chr12 and Dt_Chr12 each contain seven genes, whereas no TCP genes are found on At_Chr2, Dt_Chr3, At_Chr6, or Dt_Chr6 (Fig. 2). The distribution of TCP genes on G. hirsutum chromosomes resembles that in G. raimondii but is more uneven than that in G. arboreum35, 36.

Figure 2

Physical locations and gene duplication status of TCP genes on upland cotton (G. hirsutum) chromosomes. TCP genes are positioned according to the G. hirsutum genome NAU-NBI Assembly v1.1 and Annotation v1.1 in CottonGen (https://www.cottongen.org/find/genes); possible gene duplication events are indicated by gray lines.

To reveal the mechanism underlying expansion of the TCP gene family in G. hirsutum, we further investigated gene duplication events. As shown in Fig. 2, 14 pairs of duplicated genes were identified in the A subgenome and 15 pairs in the D subgenome, involving about 70% of the cotton TCP gene family. Because the five genes on unmapped scaffolds also show high identity to other GhTCPs, the actual number of duplication events may be even larger. Moreover, except for GhTCP15b and GhTCP15c, all paralogous gene pairs are located on different chromosomes, suggesting that they arose from segmental rather than tandem duplication.
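The reasoning in the last sentence, where paralogs on different chromosomes point to segmental rather than tandem duplication, can be expressed as a simple positional rule. The sketch below is illustrative only: the 100 kb window for calling a same-chromosome pair "tandem" is an assumed cutoff not stated in the text, and all gene names and coordinates are hypothetical.

```python
from typing import NamedTuple


class Locus(NamedTuple):
    gene: str
    chrom: str
    start: int  # start coordinate in bp


def duplication_type(a: Locus, b: Locus, max_distance_bp: int = 100_000) -> str:
    """Classify a paralogous pair by position.

    Assumed rule, for illustration: paralogs on the same chromosome within
    max_distance_bp of each other are called tandem; all others segmental.
    """
    if a.chrom == b.chrom and abs(a.start - b.start) <= max_distance_bp:
        return "tandem"
    return "segmental"


# Hypothetical loci; names and coordinates are invented for illustration.
pairs = [
    (Locus("paralog_1", "chrA", 1_000_000), Locus("paralog_2", "chrA", 1_030_000)),
    (Locus("paralog_3", "chrA", 2_000_000), Locus("paralog_4", "chrB", 2_500_000)),
]
for a, b in pairs:
    print(a.gene, b.gene, duplication_type(a, b))
```

Published analyses often define tandem duplicates by gene adjacency or a fixed number of intervening genes rather than a base-pair window; the window here is just the simplest version of the idea.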

Genomic structure of GhTCP genes and domain analysis of their protein products

To better understand the diversification of GhTCP genes, we analyzed their exon/intron organization. As shown in Fig. 3B, most GhTCP genes (64 of 74) contain no intron in the open reading frame (ORF), and 7 contain a single intron. Two genes (GhTCP18a-A and GhTCP25-D) consist of five exons and four introns, and one gene (GhTCP25-A) has seven exons and six introns. GhTCP genes within the same phylogenetic subfamily have similar exon/intron structures (Fig. 3B).

Figure 3

Characterization of upland cotton (G. hirsutum) TCPs. (A) Phylogenetic analysis of GhTCP proteins. The phylogenetic tree was generated using the Neighbor-Joining (NJ) method implemented in the MEGA 6.0 software with JTT model and pairwise gap deletion option. The bootstrap analysis was conducted with 1000 iterations. (B) exon/intron organization of GhTCP genes. Exons and introns are indicated with yellow boxes and gray lines, respectively. (C) Motif composition of GhTCP proteins. Conserved motifs in the GhTCP proteins are indicated by colored boxes.

To further characterize the diversification of the cotton TCP family, conserved motifs in cotton TCP proteins were predicted using MEME with the maximum number of motifs set to 20 (Fig. 3C, Supplementary Fig. S1, and Supplementary Table 1). Based on motif composition, the GhTCP proteins can be classified into 11 groups, matching the groups in Figs 1 and 3A. Motif 1 corresponds to the conserved TCP domain and is present in every G. hirsutum TCP protein, which further supports the reliability of our identification (Fig. 3C, Supplementary Fig. S1, and Supplementary Table 1). GhTCPs within a subclade usually share a similar motif composition, whereas motif composition differs markedly between clades, suggesting possible functional redundancy within subclades and functional divergence between them (Fig. 3C).

Expression profiling of TCP genes in cotton

To investigate functional divergence among cotton TCP genes, we analyzed their expression in different organs and tissues (roots, stems, leaves, ovules, and fibers) by quantitative RT-PCR (qRT-PCR). Because GhTCP-A and GhTCP-D cDNAs are highly similar, a single common primer pair was designed to detect both the A and D homeologs of each gene. As shown in Fig. 4, GhTCP7a, GhTCP9b, GhTCP11, GhTCP19a, and GhTCP23 were broadly expressed, with relatively high transcript levels in all tissues examined. Most of the remaining genes showed clear tissue preference. For example, GhTCP2, GhTCP3, GhTCP4, GhTCP5, GhTCP6a/6b/6c, GhTCP7a/7b, GhTCP9a/9b, GhTCP10, GhTCP11, GhTCP12, GhTCP13a/13b, GhTCP14b, GhTCP15b/15c, GhTCP16, GhTCP17, GhTCP18a, GhTCP20b, GhTCP23, and GhTCP24 were specifically or preferentially expressed in leaves. These genes are homologs of Arabidopsis class I and CIN TCPs, which are involved in regulating leaf morphology4, 38,39,40,41,42,43, suggesting that they may be involved in the developmental regulation of cotton leaves. Transcripts of other genes, such as GhTCP1, GhTCP6a, GhTCP14c, and GhTCP20a, accumulated predominantly in stems. These differing expression patterns suggest functional divergence among GhTCP genes during cotton development.

Figure 4

Quantitative RT-PCR analysis of TCP gene expression in upland cotton tissues. 0o and 9f indicate 0 DPA (days post-anthesis) ovules and 9 DPA fibers, respectively. Error bars indicate ± SD of triplicate experiments; three biological replicates were used for calculation. The y-axis shows expression relative to GhUBI1 (%).

We were particularly interested in the roles of TCP genes in fiber development. qRT-PCR showed that GhTCP2, GhTCP7a/7b, GhTCP8, GhTCP9b, GhTCP10, GhTCP11, GhTCP19a/19b, GhTCP20b, GhTCP23, and GhTCP24 were expressed at relatively high levels in 0 DPA ovules, whereas GhTCP5, GhTCP7a, GhTCP9b, GhTCP10, GhTCP14a, GhTCP15a/15b/15c, GhTCP19b, GhTCP21, and GhTCP22 were expressed at relatively high levels in 9 DPA fibers. Because 0 DPA ovules and 9 DPA fibers represent the fiber initiation and rapid elongation stages, respectively, genes expressed at relatively high levels in either sample were selected as candidates, and their expression was followed throughout fiber development. As shown in Fig. 5C, class I members, including GhTCP7a, GhTCP14a, GhTCP15a/15b/15c, GhTCP21, and GhTCP22, were preferentially expressed in rapidly elongating fibers (6–12 DPA); Group A members (GhTCP14a and GhTCP15a/15b/15c) in particular were predominantly expressed at this stage (Fig. 5C). These results imply that class I TCP genes, especially those in Group A, may be involved in cotton fiber elongation. GhTCP2, GhTCP8, GhTCP9b, GhTCP19a, GhTCP23, and GhTCP24 were preferentially expressed during fiber initiation, and GhTCP2, GhTCP10, GhTCP11, GhTCP19a, and GhTCP24 were highly expressed during secondary cell wall deposition (Fig. 5C).

These expression patterns were further verified using transcriptome data from developing fibers. The RPKM (reads per kilobase per million mapped reads) values of TCP genes in −3, 0, and 3 DPA ovules and in 5, 10, 20, and 25 DPA fibers were used to generate a heat map of TCP expression (Table S2). As shown in Supplementary Fig. S2, GhTCP7a, GhTCP14a, GhTCP15a/15b/15c, GhTCP20b, GhTCP21-D, GhTCP22, and GhTCP25-A were preferentially expressed in rapidly elongating fibers; GhTCP1-A, GhTCP3, GhTCP4-D, GhTCP5, GhTCP6a/6b/6c, GhTCP10, GhTCP11, GhTCP12-D, GhTCP13a, and GhTCP20a-D during secondary cell wall deposition; and GhTCP2, GhTCP7b, GhTCP8, GhTCP9a/9b, GhTCP14b/14c, GhTCP12-A, GhTCP16, GhTCP19a/19b, GhTCP20a-A, GhTCP23, GhTCP24-A, and GhTCP25-D during fiber initiation. The transcriptome data were consistent with the qRT-PCR results (Fig. 5C, Supplementary Fig. S2). Together, these results suggest that GhTCP expression is developmentally regulated in cotton fibers.

Figure 5

Quantitative RT-PCR analysis of GhTCP gene expression in developing fibers. (A) Cotton boll and fiber development: bolls at successive developmental stages were partially dissected to show ovules. (B) Cotton fiber development over time. Red arrows indicate fiber cells. All scale bars = 1 cm. (C) Expression of GhTCP genes in developing fibers, shown as a percentage of GhUBI1 expression. −2o and 0o represent −2 and 0 DPA ovules; 3o + f represents 3 DPA ovules with fibers; 6f–21f represent 6 DPA to 21 DPA fibers. Error bars indicate ± SD of triplicate experiments; three biological replicates were used for calculation. DPA, days post-anthesis. The y-axis shows expression relative to GhUBI1 (%).

Differential expression of GhTCPs in cotton Xuzhou 142 and its natural fuzzless-lintless mutant (fl)

To determine whether GhTCPs are involved in fiber initiation, we analyzed the expression of six GhTCP genes (GhTCP2, GhTCP7a, GhTCP8, GhTCP9b, GhTCP22, and GhTCP24) in early developing ovules/fibers of wild-type cotton (cv. Xuzhou 142) and its fuzzless-lintless mutant (fl). As shown in Fig. 6, GhTCP8 and GhTCP22 were highly expressed in 0–1 DPA fl ovules and in −1 DPA Xuzhou 142 ovules. GhTCP7a expression was higher in Xuzhou 142 ovules than in fl ovules. Interestingly, GhTCP2 and GhTCP24 showed opposite expression profiles in Xuzhou 142 and fl ovules. GhTCP2 expression was higher in −2 to 0 DPA Xuzhou 142 ovules than in fl ovules, but declined in 1 DPA Xuzhou 142 ovules to below the level in fl ovules. GhTCP9b was relatively highly expressed in −2 DPA Xuzhou 142 ovules, whereas its expression in −1 to 1 DPA ovules differed only slightly between Xuzhou 142 and fl.

Figure 6

Comparison of GhTCP gene expression in upland cotton Xuzhou 142 and its fuzzless-lintless mutant (fl). Quantitative RT-PCR was performed to analyze TCP gene expression in early developing ovules of wild-type Xuzhou 142 and fl. 1, 2, 3, and 4 represent ovules at −2, −1, 0, and 1 DPA (days post-anthesis), respectively. Error bars indicate ± SD; three biological replicates were used for calculation. * and ** indicate significant (P < 0.05) and highly significant (P < 0.01) differences in expression between Xuzhou 142 and fl, respectively. The y-axis shows expression relative to GhUBI1 (%).

Interactions among GhTCP proteins and several regulators related to cotton fiber development

TCP proteins tend to form homodimers or heterodimers, which may be required for their DNA-binding activity3, 9. To understand how GhTCP proteins interact with each other, we used the yeast two-hybrid system to analyze interactions among them. The coding sequences of GhTCP genes were cloned as translational fusions to the DNA-binding domain (BD) or activation domain (AD) of the yeast transcription factor GAL4, and all pairwise combinations were co-transformed into yeast and recovered on DDO medium (Supplementary Fig. S3). As shown in Fig. 7, all class I GhTCPs could form both homodimers and heterodimers. The class II protein GhTCP2 interacted with all GhTCPs tested, whereas another class II TCP, GhTCP18b, interacted with GhTCP2, GhTCP7a/7b, and GhTCP14a/15c. GhTCP10 and GhTCP18b showed autoactivation in yeast on both selection media, while GhTCP22 showed weak autoactivation only on TDO medium containing 1 mM 3-AT. In addition, Group F GhTCPs (GhTCP9a, GhTCP9b, and GhTCP19a) did not interact with GhSLR1 (Supplementary Fig. S4).

Figure 7

Interactions among GhTCP proteins. Coding sequences of GhTCP genes were cloned into pGADT7 and pGBKT7 vectors. Interactions among the GhTCP proteins were analyzed by yeast two-hybrid assay. Transformants were assayed for growth on QDO nutritional selection medium.

We also tested whether GhTCP14a and GhTCP22 interact with TFs related to fiber development. As shown in Fig. 8 and Supplementary Fig. S5, GhTCP14a interacted with GhSLR1, GhARF6, GhBZR1, GhEIN3, and members of the GL1-GL3-TTG1 complex (GhGL3, GhMYB23, GhMYB25, GhMYB25L, and GhTTG1), while GhTCP22 interacted with GhSLR1, GhARF6, and GL1-GL3-TTG1 members (GhGL3, GhMYB23, GhMYB25, and GhTTG1) in yeast cells.

Figure 8

Interactions between GhTCP14a/GhTCP22 and several TFs related to cotton fiber development. Interactions between the GhTCP proteins and the candidate TFs were analyzed by yeast two-hybrid assay. Transformants were assayed for growth on TDO nutritional selection medium.

Discussion

Plant TCP TFs are ancient proteins. The TCP family has expanded from 5–6 members in multicellular algae and mosses to more than 20 members in Arabidopsis thaliana, rice, and poplar2, 44, 45. Recent genome-wide analyses revealed that segmental duplication may be the predominant duplication mode for TCP genes and a major contributor to expansion of the TCP family in the diploid cotton species G. raimondii and G. arboreum35, 36. In this study, we identified 74 GhTCP genes in the allotetraploid upland cotton genome (AADD). These GhTCPs fall into two classes (class I and class II), and class II is further split into the TB1/CYC and CIN clades (Fig. 3A). The TCP domain enables TCP proteins to bind DNA and mediates protein-protein interactions1, 46. Our sequence analysis showed that TCP domains are highly conserved within each GhTCP group, suggesting that GhTCPs in the same group may share similar DNA-binding capacities and protein interaction patterns. Based on phylogenetic relationships and motif distribution patterns, upland cotton TCPs are classified into eleven groups (Figs 1 and 3). GhTCPs within a subclade usually have similar motif compositions, whereas motif composition differs markedly between clades, and some motifs are present only in particular clades. The G. hirsutum genome has been reported to contain about 70,000–76,000 protein-coding genes22, 23, compared with 27,029 in Arabidopsis37, so G. hirsutum has about 2.6–2.8 times as many protein-coding genes as Arabidopsis. With a ratio of about 3.1 (74 vs. 24 genes), the TCP family has therefore expanded slightly more than the genome-wide average in G. hirsutum. Furthermore, as in G. arboreum and G. raimondii, class I TCP genes have expanded more (about 3.7-fold) than class II genes (about 2.4-fold) during evolution (Table 2).
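The comparison between the genome-wide gene ratio and the TCP family ratio is simple arithmetic on the figures quoted above; the sketch below recomputes both, using only those numbers.

```python
# Figures quoted in the text.
GH_PROTEIN_CODING = (70_000, 76_000)  # reported range for G. hirsutum
AT_PROTEIN_CODING = 27_029            # Arabidopsis thaliana
GH_TCP, AT_TCP = 74, 24               # TCP family sizes

# Genome-wide ratio of G. hirsutum to Arabidopsis protein-coding genes.
genome_ratio = [n / AT_PROTEIN_CODING for n in GH_PROTEIN_CODING]
# The same ratio restricted to the TCP family.
tcp_ratio = GH_TCP / AT_TCP

print("genome-wide ratio: " + " to ".join(f"{r:.1f}" for r in genome_ratio))  # 2.6 to 2.8
print(f"TCP family ratio: {tcp_ratio:.1f}")                                    # 3.1
```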

Previous studies showed that GhTCP14 (named GhTCP14a in this paper) and GbTCP (a homolog of GhTCP15a), which are expressed predominantly in initiating and elongating fibers, play critical roles in cotton fiber development33, 34. In our study, GhTCP14a and GhTCP15a were predominantly expressed in rapidly elongating fibers (6–12 DPA). In addition, several class I GhTCPs, including GhTCP7a, 9b, 15b/c, 21, and 22, were co-expressed with GhTCP14a and GhTCP15a during fiber development, suggesting that class I TCPs may act redundantly in regulating fiber development. Similarly, many class I TCPs act redundantly to control plant growth and development in Arabidopsis8, 15, 41, 43. Moreover, AtTCP8/14/15/22 interact with DELLA proteins to mediate GA signaling15. In our study, GhTCP7a, GhTCP14a, GhTCP15a/15b/15c, and GhTCP22 formed homodimers and heterodimers and interacted with GhSLR1. These data suggest that a GA-regulated DELLA–TCP interaction may also operate in cotton fibers to regulate fiber elongation. qRT-PCR also showed that several GhTCPs were differentially expressed between Xuzhou 142 and its natural fuzzless-lintless mutant (fl) during fiber initiation (Figs 5C and 6). However, no GhTCPs were among the 865 DEGs (differentially expressed genes) previously identified between Xuzhou 142 and fl ovules at −3 and 0 DPA47. This discrepancy may arise because the transcriptome study reported only genes whose expression differed by more than 3-fold47, whereas all of the GhTCP genes we examined differed by less than 3-fold between Xuzhou 142 and fl ovules (Fig. 6). In addition, GhTCP11 is preferentially expressed in fibers during secondary cell wall biosynthesis, suggesting that it may be involved in secondary cell wall formation in fibers.
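The proposed explanation hinges on a fold-change filter: a gene that differs 2.5-fold between genotypes is real but invisible to a ">3-fold" cutoff. The sketch below shows such a filter on a log2 scale so that up- and down-regulation are treated symmetrically. The expression values are hypothetical, not data from this study or the cited one.

```python
import math

CUTOFF_FOLD = 3.0  # the ">3-fold" criterion attributed to the cited study


def exceeds_fold_cutoff(expr_wt: float, expr_mut: float,
                        cutoff: float = CUTOFF_FOLD) -> bool:
    """True if two expression values differ by more than `cutoff`-fold,
    in either direction (compared on a log2 scale)."""
    return abs(math.log2(expr_wt / expr_mut)) > math.log2(cutoff)


# Hypothetical relative expression values (wild type, mutant).
examples = {"gene_up_2.5x": (2.5, 1.0), "gene_down_4x": (1.0, 4.0)}
for name, (wt, mut) in examples.items():
    print(name, exceeds_fold_cutoff(wt, mut))  # False, then True
```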
Many GhTCPs are also preferentially expressed in leaves, suggesting that, like their Arabidopsis homologs, these genes may be involved in cotton leaf development4, 38,39,40,41,42,43, 48. Previous studies showed that CYC/TB1 TCPs contribute to shoot branching and control the growth and development of axillary buds2, 49,50,51,52,53. Antirrhinum CYC and DICH are expressed in the dorsal domain of early floral meristems49, LjCYC2 is expressed in floral meristems and the dorsal organs of developing flowers52, and OsTB1 and AtTCP18 (AtBRC1) are expressed in axillary buds50, 53. Our results showed that all eight G. hirsutum CYC/TB1 members (CYC/DICH clade) are expressed at very low levels in the five cotton tissues examined (Fig. 4). Their expression in axillary tissues and developing flowers therefore needs further investigation.

TCP proteins in Arabidopsis, tomato, and rice have been reported to interact preferentially with TCPs of the same class to form homodimers or heterodimers8, 9. Similarly, our data revealed that some GhTCP proteins, especially class I TCPs, can form homodimers and heterodimers. GhTCP10 and GhTCP18b showed autoactivation, and GhTCP22 weak autoactivation, in yeast cells (Supplementary Fig. S4), whereas the other class I GhTCPs showed no self-activation when used as baits in the yeast two-hybrid assay. It is therefore likely that at least some TCP TFs are not transcriptional activators per se and need to interact with other proteins to control transcription. Several recent studies showed that TCPs interact with other regulators, such as DELLAs, AS2, ABI4, MYBs (TT2, PAP1, PAP2, MYB113, and MYB114), bHLHs (TT8), and TOC1, to regulate plant growth and development11, 13, 15, 16, 18. We found that GhTCP14a and GhTCP22 interact with GhMYB23/GhMYB25, GhGL3, and GhTTG1, homologs of the GL1-GL3-TTG1 triplet that controls Arabidopsis trichome initiation27. Because GhMYB23/GhMYB25, GhGL3, and GhTTG1 are preferentially expressed in initiating fibers and promote cotton fiber initiation26, 31, 54, GhTCP14a and GhTCP22 may play important roles in regulating fiber initiation. In addition, GhTCP14a interacted with GhSLR1, GhBZR1, and GhARF6, and GhTCP22 with GhSLR1 and GhARF6. These results suggest that GhTCP14a and GhTCP22 may participate in controlling cotton fiber elongation via GA and auxin signaling, and GhTCP14a possibly also via BR signaling.

In summary, this study presents a systematic analysis of the TCP gene family in upland cotton. Our results lay the foundation for functional characterization of GhTCP genes and for a better understanding of the structure-function relationships among TCP family members. They also provide comprehensive information on, and new insights into, the evolution and divergence of TCP genes in upland cotton.

Materials and Methods

Plant materials

Seeds of upland cotton (G. hirsutum cv. Coker312, Xuzhou 142, and its natural fuzzless-lintless mutant fl) were surface-sterilized with 70% (v/v) ethanol for 1 min and 10% hydrogen peroxide for 2 h, and then washed with sterile water. The sterilized seeds were germinated on half-strength Murashige and Skoog (MS) medium (12-h light/12-h dark cycle, 28 °C), and the sterile seedlings were transplanted into soil and grown to maturity. Roots, stems (near the shoot apical meristem), and leaves were harvested from plants at the four-leaf stage for RNA extraction. Ovules and fibers at different developmental stages were also collected for RNA extraction.

Identification of GhTCP genes and proteins

The genome sequence of G. hirsutum was downloaded from the Cotton Genome Project (CGP; http://cgp.genomics.org.cn/page/species/index.jsp) and CottonGen (http://www.cottongen.org/)22, 23. In order to identify all members of TCPs in G. hirsutum genome, a BLASTP search was performed against G. hirsutum protein database in CottonGen using the TCP sequences of G. raimondii and G. arboreum as queries. The candidate TCP genes were further aligned to remove redundant sequences. Subsequently, the TCP sequences were manually inspected with MotifScan (http://myhits.isb-sib.ch/cgi-bin/motif_scan) and SMART (http://smart.embl-heidelberg.de/) databases to confirm the presence of the conserved TCP domain. The TCP gene and protein sequences from Arabidopsis thaliana, Theobroma cacao, Vitis vinifera, Solanum lycopersicum, Oryza sativa, and Brachypodium distachyon were retrieved from PlantTFDB plant transcription factor database (http://planttfdb.cbi.pku.edu.cn/), while the GrTCP and GaTCP sequences were obtained from previous studies35, 36.
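The screening logic above (keep BLAST hits that carry a TCP domain, then collapse redundant sequences) can be sketched as a small filter. This is a simplified illustration under assumed inputs: the record layout, the domain label "TCP", and the toy IDs and sequences are all hypothetical, and partial sequences, which the authors removed by manual inspection, are not detected here.

```python
def filter_tcp_candidates(candidates: dict[str, dict]) -> dict[str, str]:
    """Keep candidates annotated with a TCP domain, dropping duplicates.

    `candidates` maps a protein ID to {"seq": str, "domains": set[str]}.
    Identical sequences (ignoring a trailing stop '*') are kept once,
    under the alphabetically first ID.
    """
    kept: dict[str, str] = {}
    seen: set[str] = set()
    for pid in sorted(candidates):
        record = candidates[pid]
        if "TCP" not in record["domains"]:
            continue  # no TCP domain: not counted as a "true" TCP protein
        seq = record["seq"].upper().rstrip("*")
        if seq in seen:
            continue  # redundant sequence already kept under another ID
        seen.add(seq)
        kept[pid] = seq
    return kept


# Hypothetical toy input; IDs, sequences, and domain labels are invented.
toy = {
    "cand_1": {"seq": "MKRAGRK*", "domains": {"TCP"}},
    "cand_2": {"seq": "MKRAGRK", "domains": {"TCP"}},
    "cand_3": {"seq": "MSSLLQE", "domains": {"bHLH"}},
}
print(filter_tcp_candidates(toy))  # {'cand_1': 'MKRAGRK'}
```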

DNA and protein sequence analysis

DNA and protein sequences were analyzed using DNASTAR software (DNAStar, MD, USA). Phylogenetic analysis was performed to determine the evolutionary relationships among protein sequences. Phylogenetic trees were generated using the Neighbor-Joining (NJ) method implemented in Clustal X and displayed with MEGA 6.06 (http://www.megasoftware.net/). Conserved motifs in GhTCP proteins were identified with the online MEME (Multiple Em for Motif Elicitation) program (http://meme-suite.org/, version 4.11.0) using the following parameters: any number of repetitions per sequence, motif width of 6 to 50 residues, a maximum of 20 motifs, and at least 4 sites per motif.
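The study used the MEME web server, but the same settings can be expressed for a local MEME Suite installation. The sketch below assembles such a command line; the mapping to the `meme` options `-mod anr`, `-minw`, `-maxw`, `-nmotifs`, and `-minsites` is our reading of the parameters above, and the input file name and output directory are hypothetical.

```python
import shlex


def build_meme_command(fasta: str, out_dir: str = "meme_out") -> list[str]:
    """Assemble a local MEME command mirroring the parameters listed above."""
    return [
        "meme", fasta,
        "-protein",          # protein alphabet
        "-oc", out_dir,      # output directory (overwritten if present)
        "-mod", "anr",       # any number of repetitions per sequence
        "-nmotifs", "20",    # maximum number of motifs
        "-minw", "6",        # minimum motif width
        "-maxw", "50",       # maximum motif width
        "-minsites", "4",    # minimum sites per motif
    ]


cmd = build_meme_command("GhTCP_proteins.fasta")  # hypothetical input file
print(shlex.join(cmd))
# subprocess.run(cmd, check=True) would execute it if MEME is installed locally.
```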

Expression pattern analysis

For qRT-PCR analysis, total RNA was extracted from roots, stems, leaves, ovules, and fibers and purified using the Qiagen RNeasy kit according to the manufacturer’s instructions. First-strand cDNA was synthesized from the purified RNA using Moloney murine leukemia virus reverse transcriptase (Promega) according to the manufacturer’s instructions. Quantitative PCR was performed with the fluorescent intercalating dye SYBR Green (Toyobo) on a real-time detection system (MJ Research; Opticon 2), using a cotton polyubiquitin gene (GhUBI1, GenBank accession no. EU604080) as the internal reference. A two-step PCR procedure was used in all experiments as described previously55. Relative target gene expression was determined using the comparative cycle threshold (Ct) method. To achieve optimal amplification, PCR conditions for each primer pair were optimized for annealing temperature and Mg2+ concentration, and PCR products were confirmed on agarose gels. qRT-PCR data are presented as the mean and standard deviation of three biological replicates, each with three technical replicates, using gene-specific primers (Supplementary Table 2).
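The figures report expression as a percentage of GhUBI1. A minimal sketch of that calculation with the comparative Ct method is shown below, assuming roughly 100% amplification efficiency for both primer pairs (so each cycle doubles the product) and that technical replicates have already been averaged into one Ct per biological replicate. The Ct values are hypothetical.

```python
from statistics import mean, stdev


def relative_expression_percent(ct_target: float, ct_ref: float) -> float:
    """Target expression as a percentage of the reference gene (GhUBI1).

    Assumes ~100% efficiency for both primer pairs: 100 * 2**-(dCt).
    """
    delta_ct = ct_target - ct_ref
    return 100 * 2 ** (-delta_ct)


def summarize(replicates: list[tuple[float, float]]) -> tuple[float, float]:
    """Mean and SD across biological replicates of (Ct target, Ct reference)."""
    values = [relative_expression_percent(t, r) for t, r in replicates]
    return mean(values), stdev(values)


# Hypothetical Ct values for three biological replicates.
m, sd = summarize([(26.0, 20.0), (25.0, 20.0), (27.0, 20.0)])
print(f"{m:.2f} ± {sd:.2f} % of GhUBI1")
```

If primer efficiencies differ noticeably from 100%, an efficiency-corrected model (for example, the Pfaffl method) would be more appropriate than the simple power of two.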

Heat-map analysis of gene expression

RPKM (reads per kilobase per million mapped reads) values representing the expression levels of TCP genes were extracted from a comprehensive TM-1 transcriptome dataset (SRA accession: PRJNA248163)23, 56, downloaded from http://www.ncbi.nlm.nih.gov/sra/?term=PRJNA248163. Heat-map analysis was performed using Genesis57.
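For reference, RPKM normalizes a gene's read count by both transcript length (in kilobases) and sequencing depth (in millions of mapped reads). The sketch below shows that definition plus a log2(RPKM + 1) transform, a common choice for heat maps that keeps zero values finite. The text does not state which transform, if any, was applied in Genesis, so that step is an assumption, and the read counts are hypothetical.

```python
import math


def rpkm(gene_reads: int, gene_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of transcript per million mapped reads."""
    return gene_reads * 1e9 / (gene_length_bp * total_mapped_reads)


def log2_for_heatmap(values: list[float]) -> list[float]:
    """log2(RPKM + 1): an assumed display transform that keeps zeros finite."""
    return [math.log2(v + 1) for v in values]


# Hypothetical counts for one 1 kb gene in three libraries of different depth.
reads = [500, 100, 0]
library_sizes = [10_000_000, 20_000_000, 15_000_000]
gene_length = 1_000
row = [rpkm(r, gene_length, n) for r, n in zip(reads, library_sizes)]
print(row)  # [50.0, 5.0, 0.0]
print(log2_for_heatmap(row))
```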

Yeast two-hybrid assay

The coding sequences of GhTCP and TF genes were amplified by PCR using Pfu DNA polymerase and gene-specific primers (Supplementary Table 3) and cloned into restriction sites of the yeast two-hybrid vectors pGBKT7 (bait vector) and pGADT7 (prey vector), creating fusions to the DNA-binding domain and activation domain of the yeast transcriptional activator GAL4, respectively. All constructs were verified by sequencing. Corresponding constructs were co-transformed into the Y2HGold yeast strain using the high-efficiency lithium acetate transformation procedure according to the manufacturer’s instructions (Clontech). Transformants were selected on double drop-out (DDO) medium lacking Leu and Trp after incubation at 30 °C for 3–4 days. Positive interactions were identified on quadruple drop-out (QDO) medium lacking Leu, Trp, His, and Ade, or on triple drop-out (TDO) medium lacking Leu, Trp, and His and supplemented with 1 mM 3-amino-1,2,4-triazole (3-AT). The pGADT7 empty vector and pGADT7-GhSLR1 were co-transformed with pGBKT7 constructs as negative and positive controls, respectively.
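The way these plates and controls combine into a call can be summarized as a small decision procedure: DDO growth confirms co-transformation, growth of the bait with empty pGADT7 on selective medium flags autoactivation, and only then does growth with a prey on QDO (or TDO + 3-AT) count as an interaction. The sketch below is a simplified reading of that logic; the record type and labels are hypothetical, and the positive control is not modeled.

```python
from dataclasses import dataclass


@dataclass
class Y2HPlate:
    """Growth observations for one bait/prey combination (hypothetical record)."""
    grows_on_ddo: bool        # co-transformants recovered on -Leu/-Trp
    grows_on_selective: bool  # growth on QDO, or on TDO + 1 mM 3-AT
    bait_alone_grows: bool    # bait + empty pGADT7 on the same selective medium


def call_interaction(plate: Y2HPlate) -> str:
    """Score one yeast two-hybrid combination using the controls described above."""
    if not plate.grows_on_ddo:
        return "failed transformation"
    if plate.bait_alone_grows:
        # The bait activates the reporters on its own, so growth with a
        # prey says nothing about an interaction.
        return "bait autoactivates"
    return "interaction" if plate.grows_on_selective else "no interaction"


print(call_interaction(Y2HPlate(True, True, False)))  # interaction
```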