Genome-wide identification and characterization of TCP transcription factor genes in upland cotton (Gossypium hirsutum)

TCP proteins are plant-specific transcription factors (TFs), and perform a variety of physiological functions in plant growth and development. In this study, 74 non-redundant TCP genes were identified in upland cotton (Gossypium hirsutum L.) genome. Cotton TCP family can be classified into two classes (class I and class II) that can be further divided into 11 types (groups) based on their motif composition. Quantitative RT-PCR analysis indicated that GhTCPs display different expression patterns in cotton tissues. The majority of these genes are preferentially or specifically expressed in cotton leaves, while some GhTCP genes are highly expressed in initiating fibers and/or elongating fibers of cotton. Yeast two-hybrid results indicated that GhTCPs can interact with each other to form homodimers or heterodimers. In addition, GhTCP14a and GhTCP22 can interact with some transcription factors which are involved in fiber development. These results lay solid foundation for further study on the functions of TCP genes during cotton fiber development.


Identification of TCP genes in upland cotton.
To identify all members of TCPs in upland cotton (G. hirsutum) genome, we performed a BLASTp search against upland cotton protein database (https://www.cottongen.org/ tools/blast/blast) using the TCP sequences of G. raimondii and G. arboreum as queries. All potential upland cotton proteins were then submitted to MotifScan and SMART databases for annotation of the domain structure. Only the candidates containing TCP domains were regarded as "true" TCP proteins. Discarding the redundant and partial sequences manually, there are 64 GhTCPs in CGP-BGI assembled Gossypium hirsutum (AD1) Genome 22 , and 72 GhTCPs in NAU-NBI assembled Gossypium hirsutum (AD1) Genome 23 . Among all identified GhTCPs, 62 members were identical, while the rest 12 GhTCPs are different in above two Genome databases through protein sequence alignment. Totally, 74 non-redundant TCP genes were identified in upland cotton genome ( Table 1). The number of GhTCPs is about 3.1 folds of AtTCPs, which is slightly higher than the ratio of putative cotton homologs to each Arabidopsis gene 22,23,37 . Considering upland cotton is an allotetraploid cotton species which contains A and D genomes, we named the 74 putative TCP genes as GhTCP1-A/D to GhTCP25-A/D according to the nomenclature system applied to Arabidopsis TCPs.
Phylogenetic relationship of the cotton TCP family. To reveal the evolutionary relationship of the identified cotton TCP proteins, a phylogenetic tree was constructed by Neiboring-Joining (NJ) method using the full length 298 TCP protein sequences from G. hirsutum, G. arboreum, G. raimondii, Theobroma cacao, Vitis vinifera, Arabidopsis thaliana, Solanum lycopersicum, Oryza sativa, and Brachypodium distachyon. As shown in Fig. 1, the TCP family is divided into 11 groups designated Group A to Group K. GhTCPs in Group A−G belong to PCF clade, while Group H belongs to CYC/TB1 clade and Group I−K belong to CIN clade ( Table 2) 2,35,36 . Group A, the largest clade among all groups, contains 12 GhTCP members, accounting for 16.2% of total GhTCPs; Group E, the smallest clade, only contains 2 members. Out of the 74 GhTCPs, 48 members belong to class I and the rest 26 fall into class II. In Arabidopsis, there are 13 class I TCPs and 11 class II TCPs. Compared with Arabidopsis TCPs, the expansion of TCPs in G. hirsutum genome is biased, which occurs mainly in class I (about 3.7 folds expansion). The class II remains about 2.5-fold size as that in Arabidopsis (Fig. 1, Table 2). In addition, we found that Group E is specific for eudicots species. And among the eight chosen species, only Vitis vinifera lacks Group E, F, G. This may imply that the divergence of these species took place after the TCP transcription factor family expansion.
Chromosomal distribution and gene duplication. Among the 74 GhTCPs, 69 members are located at the 22 chromosomes, and the else five genes are located in 4 unmapped scaffolds (scaffold4574_D12, scaf-fold4706_D13, scaffold2345_A09, and scaffold4070_D05). The distribution of GhTCP genes on the chromosomes is uneven, with the number of TCP genes per chromosome ranging from 0 to 7. Chromosomes At_Chr12 and Dt_ Chr12 contain seven genes, while no TCP gene is found on At_Chr2, Dt_Chr3, At_Chr6 and Dt_Chr6 (Fig. 2). The distribution patterns of TCP genes in G. hirsutum chromosomes are similar to that in G. raimondii, but more uneven than that in G. arboreum 35,36 .
Additionally, the gene duplication events were further investigated to reveal the expansion mechanism of the TCP gene family in G. hirsutum. As shown in Fig. 2, 14 pairs of duplicated genes in A-genome and 15 pairs of duplicated genes in D-genome were identified, accounting for about 70% of cotton TCP gene family. In fact, as the five genes located in unmapped scaffolds also show high identity to other genes, there could be even more duplication events. Further, except GhTCP15b and GhTCP15c, all the paralogous gene pairs are located on different chromosomes, suggesting that they result from segment duplications rather than tandem duplications.  To further reveal the diversification of cotton TCP family, putative motifs of cotton TCP proteins were predicted by program MEME choosing 20 motifs' mode ( Fig. 3C, Supplementary Fig. S1, and Supplementary Table 1). Based on the composition of motifs, the GhTCP proteins can be classified into 11 groups, just the same as that in Figs 1 and 3A,C). Motif 1 was identified as the conserved TCP domain which is present in every G. hirsutum TCP protein, providing further support for the reliability of our identification (Fig. 3C, Supplementary  Fig. S1, and Supplementary Table 1). GhTCPs members within a sub-clade usually exhibit similar motif composition, while the motif composition among GhTCPs members from distinct clades shows significant difference, It indicates that there is possible intra-subclade functional redundancy and inter-subclade functional divergence (Fig. 3C).    GhTCP20b, GhTCP23 and GhTCP24 were specifically or preferentially expressed in leaves. These genes are homologs of class I and CIN AtTCPs which are involved in regulating leaf morphology 4,[38][39][40][41][42][43] . This indicates that these genes may be associated with developmental regulation of cotton leaves. The transcripts of some other genes, such as GhTCP1, GhTCP6a, GhTCP14c and GhTCP20a, were predominantly accumulated in stems. The different expression patterns of GhTCPs in cotton suggest the functional divergence of these GhTCP genes in cotton development.

Differential expressions of GhTCPs in cotton Xuzhou 142 and its natural fuzzless-lintless mutant (fl).
To determine whether GhTCPs are involved in fiber initiation, we analyzed the expressions of six GhTCP genes (GhTCP2, GhTCP7a, GhTCP8, GhTCP9b, GhTCP22, and GhTCP24) in early developing ovules/ fibers of wild type cotton (cv. Xuzhou142) and its fuzzless-lintless mutant (fl). As shown in Fig. 6, GhTCP8 and GhTCP22 showed high expression levels in 0-1 DPA fl ovules and in -1 DPA Xuzhou 142 ovules. The expression of GhTCP7a in Xuzhou 142 ovules was higher than that in fl ovules. Interestingly, GhTCP2 and GhTCP24 showed opposite expression profiles in ovules of Xuzhou 142 and its fl mutant. The expression of GhTCP2 in -2 to 0 DPA Xuzhou 142 ovules was higher than that in fl ovules, while its expression declined in 1 DPA Xuzhou 142 ovules Interactions among GhTCP proteins and several regulators related to cotton fiber development. TCP proteins tend to form homodimers or heterodimers that may be required for their DNA-binding activity 3,9 . To understand how GhTCP proteins interact with each other, yeast two-hybrid technique was employed to analyze the interactions among these GhTCP proteins. The coding sequences of GhTCP genes were cloned as translational fusions with the yeast GAL4 TF binding (BD) or activation (AD) domain, and all combinations were tested in a DDO medium (Supplementary Fig. S3). As shown in Fig. 7, all the class I GhTCPs could form both homodimers and heterodimers. GhTCP2, belonging to class II, can interact with all the GhTCPs, while GhTCP18b, another class II TCP, can interact with GhTCP2, GhTCP7a/7b and GhTCP14a/15c. Additionally, GhTCP10 and GhTCP18b have autoactivation activity in yeast on both selection media, while GhTCP22 shows weak autoactivation activity only on TDO medium with 1 mM 3-AT, and group F GhTCPs (GhTCP9a, GhTCP9b and GhTCP19a) can not interact with GhSLR1 ( Supplementary Fig. S4).

Discussion
Plant TCP TFs are ancient proteins. The number of TCP proteins is expanded from 5~6 members in pluricellular algae/moss to more than 20 members in Arabidopsis thaliana, rice, and poplar 2,44,45 . Recently, genome-wide identification revealed that segmental duplication may be a predominant duplication event for TCP genes and a major contributor to expansion of TCP gene family in two diploid cotton species G. raimondii and G. arboreum 35,36 . In our study, 74 GhTCP genes were identified in allotetraploid upland cotton genome (AADD). These GhTCPs can be divided into two classes (class I and class II), and class II can be further split into two clades (TB1/CYC clade and CIN clade) (Fig. 3A). TCP domain allows TCP proteins to bind to DNA and to mediate protein-protein interaction 1,46 . In this study, sequence analysis revealed that TCP domains are highly conserved in each group of GhTCP family, suggesting that the GhTCPs in the same group may share similar DNA binding capacity and protein interaction pattern. Upland cotton TCPs are classified into eleven groups based on their phylogenetic relationship and motif distribution patterns (Figs 1 and 3). GhTCPs members within a sub-clade usually exhibit   Error bars indicate ± SD. Three biological replicates were used for calculation. *. There was significant difference in gene expression level between Xuzhou 142 and fl (P < 0.05). **. There was very significant difference in gene expression level between Xuzhou 142 and fl (P < 0.01). Y-axis represents the relative expression value (%) to GhUBI1 gene. similar motif composition, while the motif composition among GhTCPs members from distinct clades shows significant difference. Some special motifs are only present in certain clade. Recent studies reported there are about 70,000~76,000 protein-coding genes existing in G. hirsutum genome 22,23 , and 27,029 protein-coding genes in Arabidopsis genome 37 . This means that there are about 2.6~2.8 times duplication of protein coding genes in the G. hirsutum genome compared with Arabidopsis. Thus, the duplication ratio of TCP genes is slightly higher than other gene families in G. hirsutum. Furthermore, we found the duplication ratio of class I TCP genes (3.7 fold) is higher than that of Class II (2.5 fold) during evolution, likely to G. arboretum and G. raimondii (Table 2).
Previous studies showed GhTCP14 (named as GhTCP14a in this paper) and GbTCP (homolog of GhTCP15a) play critical roles in cotton fiber development which are expressed predominantly in initiating and elongating fibers 33,34 . In our study, GhTCP14a and GhTCP15a were predominantly expressed in fast elongating fibers (6-12 DPA). In addition, several class I GhTCPs, including GhTCP7a, 9b, 15b/c, 21, and 22, were coexpressed with GhTCP14a and GhTCP15a during cotton fiber development, suggesting that class I TCPs may function redundantly in regulating fiber development. Similarly, many class I TCPs function redundantly to control plant grow and development in Arabidopsis 8,15,41,43 . Additionally, AtTCP8/14/15/22 interact with DELLA proteins mediating GA signaling 15 . In our study, GhTCP7a, GhTCP14a, GhTCP15a/15b/15c, and GhTCP22 proteins can form homodimer and hetrodimers, and can interact with GhSLR1. These data suggest a GA-regulated DELLA-TCP interaction may also exist in cotton fiber for regulating fiber elongation. The qRT-PCR results also showed several GhTCPs were differentially expressed between Xuzhou142 and its natural fuzzless-lintless mutant (fl) during cotton fiber initiation (Figs 5C, 6). However, no differentially expressed GhTCPs was found in the identified 865 DEGs (differentially expressed genes) between the Xuzhou 142 and fl in ovules at −3 and 0 DPA 47 . The reason for this conflict may be that the differential expression levels of the DEGs exhibited in the transcriptome data are over 3 folds 47 , but our results have shown that the differential expression levels of all selected GhTCPs genes are less than 3 times between Xuzhou 142 and fl ovules (Fig. 6). Additionally, GhTCP11 is preferentially expressed in  Interactions between GhTCP proteins and the TF condidates were analyzed by yeast two-hybrid assay. Transformants were assayed for growth on TDO nutritional selection medium.
fibers at the stage of secondary cell wall biosynthesis, suggesting that this gene may be involved in secondary cell wall formation of fibers. Except that, many GhTCPs are preferentially expressed in leaves suggesting these genes may be involved in cotton leaf development, similar to their homologs in Arabidopsis 4,[38][39][40][41][42][43]48 . Previous studies showed CYC/TB1 TCPs contribute to shoot branching, as well as control the growth and development of axillary buds 2,[49][50][51][52][53] . Antirrhinum CYC and DICH were expressed in dorsal domain of early floral meristems 49 . LjCYC2 was expressed in floral meristems and the dorsal organs of developing flowers 52 . OsTB1 and AtTCP18 (AtBRC1) are expressed in axillary buds 50,53 . Our results showed that the expression activities of all 8 G. hirsutum CYC/TB1 members (CYC/DICH clade) are very low in the 5 selected cotton tissues (Fig. 3). Hence, their expression patterns in the axillary tissues or developing flowers need to be further investigated.
It has been reported that TCP proteins interact preferentially with those TCP proteins from the same class to form homodimer or heterodimer in Arabidopsis, tomato and rice 8,9 . Similarly, our data revealed that some GhTCP proteins, especially class I TCPs, have the ability to form homodimer and heterodimer. Furthermore, GhTCP10 and GhTCP18b have autoactivation activity, while GhTCP22 showed weak autoactivation in yeast cells (Supplementary Fig. S4). In contrast, other class I GhTCPs did not show any self-activation activities when they were used as baits in yeast two-hybrid assay. Therefore, it is likely that at least some TCP TFs are not transcriptional activators per se, and need to interact with other proteins for controlling transcription. Recently, several studies showed that TCPs interact with some TFs, such as DELLAs, AS2, ABI4, MYBs (TT2, PAP1, PAP2, MYB113 and MYB114), and bHLHs (TT8, TOC1), suggesting that TCPs are involved in regulating plant growth and development 11,13,15,16,18 . Our studies showed GhTCP14a and GhTCP22 interact with GhMYB23/ GhMYB25-GhGL3-GhTTG1, the homologs of triplet GL1-GL3-TTG1 that control Arabidopsis trichome initiation 27 . GhMYB23/GhMYB25, GhGL3 and GhTTG1 are preferentially expressed in initiating fibers, and promote fiber initiation of cotton 26,31,54 . Thus, GhTCP14a and GhTCP22 may play an important role in regulating cotton fiber initiation. Additionally, GhTCP14a and GhTCP22 have the ability to interact with GhSLR1, GhBZR1 and GhARF6. These results suggest that GhTCP14a/22 may participate in controlling cotton fiber elongation via GA, BR and auxin signaling pathways.
In brief, the data presented in this study systematically analyzed TCP gene family of upland cotton. Our results lay the foundation for functional characterization of GhTCP genes and will lead to further understanding of the structure-function relationship among these TCP members. Additionally, our study also provides comprehensive information and novel insights into evolution and divergence of TCP genes in upland cotton.

Materials and Methods
Plant materials. Upland cotton (G. hirsutum cv. Coker312, Xuzhou142 and its natural fuzzless-lintless mutant fl) seeds were surface sterilized with 70% (v/v) ethanol for 1 min and 10% hydrogen peroxide for 2 h, followed by washing with sterile water. The sterilized seeds were germinated on one-half strength Murashige and Skoog (MS) medium (12-h-light/12-h-dark cycle, 28 °C), and sterile seedlings were transplanted in soil for further growing to maturation. The roots, stems (near the shoot apical meristem) and leaves of four leaves period cotton plants were harvested for RNA extraction. The ovules and cotton fibers in different developmental stage were collected for RNA extraction.
Identification of GhTCP genes and proteins. The genome sequence of G. hirsutum was downloaded from the Cotton Genome Project (CGP; http://cgp.genomics.org.cn/page/species/index.jsp) and CottonGen (http://www.cottongen.org/) 22,23 . In order to identify all members of TCPs in G. hirsutum genome, a BLASTP search was performed against G. hirsutum protein database in CottonGen using the TCP sequences of G. raimondii and G. arboreum as queries. The candidate TCP genes were further aligned to remove redundant sequences. Subsequently, the TCP sequences were manually inspected with MotifScan (http://myhits.isb-sib.ch/cgi-bin/ motif_scan) and SMART (http://smart.embl-heidelberg.de/) databases to confirm the presence of the conserved TCP domain. The TCP gene and protein sequences from Arabidopsis thaliana, Theobroma cacao, Vitis vinifera, Solanum lycopersicum, Oryza sativa, and Brachypodium distachyon were retrieved from PlantTFDB plant transcription factor database (http://planttfdb.cbi.pku.edu.cn/), while the GrTCP and GaTCP sequences were obtained from previous studies 35,36 .
DNA and protein sequence analysis. DNA and protein sequences were analyzed using DNASTAR software (DNAStar, MD, USA). Phylogenetic analysis was performed to determine evolutionary relationship among protein sequences. The phylogenetic tree was generated using the Neighbor-Joining (NJ) method implemented in the Clustal X, and output by MEGA 6.06 software (http://www.megasoftware.net/). GhTCP protein sequences were submitted to online Multiple Expectation maximization for Motif Elicitation (MEME) program (http:// meme-suite.org/, Version 4.11.0) for identification of conserved protein motifs. The optimized MEME parameters are as follows: any number of repetitions, the optimum width: 6 to 50, maximum number of motifs: 20, and minimum sites per motif: 4.
Expression pattern analysis. For the qRT-PCR analysis, total RNA was extracted from roots, stems, leaves, ovules and fibers. RNA was purified using Qiagen RNeasy kit according to the manufacturer's instructions. First strand of cDNA was reversely synthesized from the purified RNA using Moloney murine leukemia virus reverse transcriptase (Promega) according to the manufacturer's instructions. Quantative PCR was performed using the fluorescent intercalating dye SYBR-Green (Toyobo) in a detection system (MJ Research; Option 2), and a cotton polyubiquitin gene (GhUBI1, GenBank accession no. EU604080) was used as a standard control. A two-step PCR procedure was performed in all experiments using a method described earlier 55 . The relative target gene expression was determined using the comparative cycle threshold method. To achieve optimal amplification, PCR conditions for every primer combination were optimized for annealing temperature and Mg 2+ concentration. PCR products were confirmed on an agarose gel. Data presented in the qRT-PCR analysis are mean and standard deviation of three biological replicates of plant materials and three technical replicates in each biological sample using gene-specific primers (Supplementary Table 2).
Heat-map analysis of gene expression. The RPKM (reads per kb per million reads) values denoting the expression levels of TCP genes were isolated from a comprehensive profile of the TM-1 transcriptome data (Accession codes, SRA: PRJNA248163) 23, 56 , downloaded from http://www.ncbi.nlm.nih.gov/sra/?term=PR-JNA248163. A heat-map analysis was performed using Genesis 57 .
Yeast two-hybrid assay. The coding sequences of GhTCP and TF genes amplified by PCR using Pfu DNA polymerase and gene-specific primers (Supplementary Table 3) were cloned into the different restriction sites of yeast two-hybrid vectors pGBKT7 (bait vector) and pGADT7 (prey vector), creating fusions to the binding domain and activation domain of the yeast transcriptional activator GAL4, respectively. All these constructs were checked by sequencing. The corresponding constructs were co-transformed into Y2HGold yeast strain using the high-efficiency lithium acetate transformation procedure following the manufacturer's instructions (Clontech). Successfully transformed cell colonies were identified on yeast double drop-out (DDO) medium lacking Leu and Trp after the transformants were incubated on DDO medium at 30 °C for 3-4 days. The positive interactions were identified on yeast quadruple dropouts (QDO) lacking Leu, Trp, His and Ade or on yeast drop-out triple dropouts (TDO) lacking Leu, Trp, and His with 1 mM 3-amino-1,2,4-triazole (3-AT). The pGADT7 empty vector and pGADT7-GhSLR1 were also co-transformed with pGBKT7 constructs as negative and positive controls, respectively.