Introduction

Regulation of gene transcription, one of the most complex activities in cells, plays a significant role in a wide variety of biological processes, such as cell growth, cell cycle control, signal transduction, metabolic and physiological balance and response to environmental stimuli1,2,3,4,5. Among many mechanisms of transcriptional regulation of gene expression, transcription factors are considered to be the most important. Transcription factors are a diverse family of regulatory proteins with specific DNA-binding domains involved in the regulation of many cellular processes by either stimulating or repressing transcription of the related genes2,3,6. Up to now, more than 60 transcription factor families have been identified in plants7.

The TCP proteins are a family of transcription factors exclusive to higher plants and involved in the regulation of cell growth and proliferation8,9. This class of transcription factors is characterized by a highly conserved ~60-residue DNA-binding motif at the N-terminus called the TCP domain, which is named after its four founding members: TB1 (TEOSINTE BRANCHED 1) in Zea mays, CYC (CYCLOIDEA) in Antirrhinum majus, and PCF1 and PCF2 (PROLIFERATING CELL FACTORS 1 and 2) in Oryza sativa8,9. The TCP domain contains a non-canonical basic-Helix-Loop-Helix (bHLH) structure involved in DNA binding, protein-protein interaction and protein nuclear localization8,10. The two amphipathic helical motifs are rich in hydrophobic Ala, Leu and Trp residues, while the disordered linking loop region contains acidic, polar and non-charged amino acids. By comparison, the most conserved basic region is rich in positively charged Lys and Arg residues11. The TCP transcription factor family can be further divided into two subfamilies, class I and class II, based mainly on amino acid sequence differences, especially in the basic region of the TCP domain8. According to bioinformatic analyses, several class II TCP members also share an arginine-rich R domain outside the conserved TCP domain; its function is unknown, but it is speculated to facilitate protein-protein interaction8,12. DNA-binding site selection assays revealed that the two TCP classes specifically recognize and bind slightly different but partly overlapping GC-rich DNA sequences, which act as cis-elements in a large number of plant genes. The DNA-binding sequence for class I is GGNCCCAC, whereas class II prefers the motif G(T/C)GGNCCC10,13,14,15.
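As an illustration of how the two consensus sites quoted above could be located in a promoter, the following minimal Python sketch converts each consensus into a regular expression and scans one strand of a sequence. The promoter fragment and the function names are hypothetical and were invented for this example; the sketch is not the site-selection procedure used in the cited studies, and it does not scan the reverse complement.

```python
import re

# Consensus sites as quoted in the text (N = any base).
CLASS_I = "GGNCCCAC"
CLASS_II = "G(T/C)GGNCCC"

def consensus_to_regex(consensus: str) -> re.Pattern:
    """Translate the consensus notation into a regular expression."""
    pattern = consensus.replace("N", "[ACGT]")
    # "(T/C)" style alternatives become a character class such as "[TC]".
    pattern = re.sub(r"\((\w)/(\w)\)", r"[\1\2]", pattern)
    # A lookahead is used so that overlapping sites are all reported.
    return re.compile(f"(?=({pattern}))")

def find_sites(sequence: str, consensus: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched site) pairs on the given strand only."""
    regex = consensus_to_regex(consensus)
    return [(m.start(), m.group(1)) for m in regex.finditer(sequence.upper())]

# Hypothetical promoter fragment, invented for illustration only.
promoter = "ATTGGTCCCACGTAGTGGACCCTT"
class_i_hits = find_sites(promoter, CLASS_I)
class_ii_hits = find_sites(promoter, CLASS_II)
print(class_i_hits, class_ii_hits)
```

In this invented fragment the class I site is found at position 3 and the class II site at position 14, which shows how the shared GGNCCC core lets the two patterns be told apart only by their flanking bases.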

Widely cultivated in more than 100 countries, cotton is considered one of the most important fiber-producing and economic crops in the world, providing fiber for the textile industry and cooking oil, extracted from its oil-rich seeds, for the food industry. The cotton industry is estimated to produce $133 billion in products and services annually, creating about 350 million jobs on farms or in the industry sectors. Despite the economic and social importance of cotton and the critical role of TCP transcription factors in the control of plant cell proliferation and development, research on the cotton TCP family lags far behind that on other plant species. In a recent study, Hao et al. (2012) reported the functional characterization of a cotton TCP transcription factor, GbTCP. According to this research, GbTCP was expressed in cotton fiber at a much higher level than in the other tissues tested. Overexpression of GbTCP in Arabidopsis facilitated the initiation and elongation of root hairs, which share developmental mechanisms with cotton fibers, whereas RNAi silencing of GbTCP in cotton led to shorter fibers and lower fiber quality, indicating that GbTCP plays a significant role in fiber elongation16. Up to now, however, no genome-wide characterization of TCP family members has been performed in cotton. The recent availability of the completed genome sequence of Gossypium raimondii5, a diploid cotton species, provides a great opportunity to identify and characterize TCP transcription factors in the cotton genome.

In the present study, we performed the first comprehensive analysis of the TCP transcription factor family in G. raimondii. A total of 38 non-redundant TCP transcription factor encoding genes were identified in the genome of G. raimondii and were subsequently subjected to a systematic analysis, including phylogenetic relationships, chromosomal location, gene duplication status, substitution rates, gene structure, conserved motifs and expression profiling. On the basis of the expression profiles of TCP members in G. raimondii and the phylogenetic analysis of the TCP domain proteins in Arabidopsis, rice and G. raimondii, the functions of GrTCPs were predicted. Notably, the expansion of the TCP family in G. raimondii appears to have been caused mainly by segmental duplication and is not associated with tandem duplication. Overall, our genome-wide analysis of the TCP gene family will contribute to future studies on the functional characterization of TCP proteins in G. raimondii as well as to the identification and comprehensive analysis of the TCP transcription factor family in other species.

Results

Identification of TCP genes

To identify the TCP transcription factor coding genes of G. raimondii, the HMM profile of the TCP domain (PF03634) was used as the query in a HMMER search against the G. raimondii genome (http://www.phytozome.net/cotton). Initially, 62 candidate TCP genes were identified in G. raimondii. Among them, 24 redundant sequences were discarded from further analysis based on their sequence similarity. Subsequently, to verify the reliability of the initial results, the presence of the conserved TCP domain was confirmed with InterproScan18. The results showed that all of the 38 putative TCP genes contained the conserved TCP domain. Because no standard annotation had been assigned to the 38 TCP genes in G. raimondii, we named them GrTCP1 to GrTCP25 according to the Arabidopsis TCP proteins with the highest sequence similarity, following the nomenclature system applied to Arabidopsis. The length of the 38 newly identified TCP transcription factors varied from 196 to 549 amino acids, with an average of 353.5 amino acids. Other characteristics of TCP transcription factors in G. raimondii, including isoelectric point (pI), molecular weight (Mw) and chromosome location, are listed in Table 1.

Table 1 TCP gene family in Gossypium raimondii

Phylogenetic analysis

To better understand the evolutionary history and phylogenetic relationships of the TCP transcription factor family in G. raimondii, an unrooted phylogenetic tree was constructed with the Neighbor-Joining (NJ) method on the basis of a multiple sequence alignment of the 38 G. raimondii TCP protein sequences with all TCP sequences from Arabidopsis and rice, comprising 24 Arabidopsis TCP protein sequences and 22 rice TCP protein sequences (Figure 1). The bootstrap values for some nodes of the NJ tree were low as a result of the relatively large number of sequences, as was also observed in previous reports9,11. Therefore, we sought other evidence to verify the reliability of our phylogenetic tree. The phylogenetic trees of the TCP transcription factor family were reconstructed with the Maximum Likelihood, Minimum Evolution and PhyML methods. The trees produced by these three methods were almost identical to the NJ tree, with only minor differences at some branches, suggesting that the four methods were largely consistent with each other. In addition, analyses of gene structure, conserved motif structure and expression profiles were also used to confirm the validity of the phylogenetic tree. Considering the great similarity among these tree topologies as well as previous studies9,11, the NJ tree was employed for further analysis.

Figure 1

Phylogenetic relationships of TCP transcription factors from Gossypium raimondii, Arabidopsis and rice.

The unrooted phylogenetic tree was constructed using MEGA 6.0 by Neighbor-Joining method and the bootstrap test was performed with 1,000 iterations. The eleven subclades are indicated with different colors.

According to the NJ phylogenetic tree (Figure 1), the TCP transcription factor family was divided into eleven subgroups, designated Group A to Group K. According to their sequence features within and outside the TCP domain, GrTCPs in Groups A, B, C, D, E, F and G belong to the class I subfamily, while GrTCPs in the remaining groups belong to the class II subfamily8. Group A, the largest clade, contained 12 members, representing 14.3% of the total TCP genes; Group E constituted the smallest clade, containing 3 members. In general, the TCP genes from the three species were interspersed in most clades, indicating that the TCP family expanded before the divergence of the lineages. However, the TCP genes of G. raimondii, Arabidopsis and rice were not evenly distributed in some clades. Many Arabidopsis TCP genes had two or more counterparts in G. raimondii, suggesting that GrTCP genes duplicated after the divergence of G. raimondii and Arabidopsis. For example, Group A contained seven G. raimondii TCPs but only three Arabidopsis members, and Group D contained four G. raimondii TCPs but only two Arabidopsis TCPs. In particular, Group F contained five G. raimondii TCPs but only one Arabidopsis TCP and no rice TCP gene, which implies that this group was either acquired after the divergence of monocots and dicots or lost in rice. By comparison, the rice TCP genes were overrepresented in Group K, which contained six rice TCP genes but only two G. raimondii members and two Arabidopsis members. Some groups, such as Groups B, C and E, had almost equal numbers of TCP genes from the three species (Figure 1).

Many Arabidopsis TCP genes with similar functions tended to cluster into the same clade, which may imply that TCP genes within the same clade have similar functions in G. raimondii. For example, all AtTCPs in Group A and Group B (AtTCP8, AtTCP14, AtTCP15, AtTCP22, AtTCP23), which clustered with ten GrTCPs (Figure 1), play an important role in the regulation of leaf development by modulating gene networks involved in cell-cycle control and shoot apical meristem (SAM) maintenance27. AtTCPs in Group H function in lateral branching, which determines shoot architecture28,29. All AtTCP genes in Group I and Group K (AtTCP2, AtTCP3, AtTCP4, AtTCP10, AtTCP24), which clustered with five GrTCPs (Figure 1), are down-regulated by miRNA319 and act as negative regulators of cell proliferation in the control of leaf margins30.

Chromosomal location and gene duplication

To determine the chromosomal distribution of the TCP genes in G. raimondii, the physical locations of all GrTCP genes were obtained through BLASTN searches against the G. raimondii genome database in Phytozome (http://www.phytozome.net/cotton.php). Among the 38 GrTCP genes, 36 were distributed across 11 of the 13 G. raimondii chromosomes, while the remaining two (GrTCP15b and GrTCP16) were anchored on unmapped scaffolds (Figure 2). The number of GrTCP genes per chromosome was uneven, ranging from 0 to 8. Chromosome 8 contained the highest number, 8 GrTCPs, accounting for 21.1% of the total GrTCP genes, followed by 4 GrTCPs on each of chromosomes 1, 6, 9 and 13. Relatively few GrTCP genes were found on several other chromosomes, including 2 genes on each of chromosomes 2 and 11 and one gene on each of chromosomes 4 and 5. By contrast, no GrTCP genes were observed on chromosomes 3 and 10 (Figure 2).

Figure 2

Chromosomal distribution and gene duplication of TCP genes in G. raimondii.

The scale is in megabases (Mb). The chromosome numbers are indicated at the top of each chromosome. The paralogous TCP genes are connected with a red line.

Given the importance of gene duplication in the amplification of gene families, potential duplication events involved in the evolution of the G. raimondii genome were analyzed to shed light on the mechanism behind the expansion of the GrTCP gene family. On the basis of protein sequence identities, 19 pairs of putative paralogous GrTCP genes were identified, accounting for more than 70% of the entire GrTCP gene family and thereby supporting the hypothesis that gene duplication events are the main cause of the expansion of the GrTCP gene family. The members of each gene pair fall in the same clade of the phylogenetic tree and share a high degree of protein sequence identity. For instance, the sequence of GrTCP15a covers 100% of that of GrTCP15b after alignment and the identity of the aligned region is 98%, while the protein sequence identity of GrTCP7a and GrTCP7b is 88%. Among these paralogous gene pairs, 18 pairs are located on different chromosomes, suggesting a high number of segmental duplication events, whereas the type of duplication could not be determined for the remaining pair because one of its genes was anchored on an unmapped scaffold (Figure 2). Interestingly, six genes (GrTCP3, GrTCP6, GrTCP7a, GrTCP7b, GrTCP7c and GrTCP20c) participated in two segmental duplication events (e.g. GrTCP20a/GrTCP20b/GrTCP20c, GrTCP7a/GrTCP7b/GrTCP7c and GrTCP3/GrTCP4/GrTCP10). In contrast, no tandem duplication events were observed among these duplicated pairs (Figure 2).
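The paralog criteria stated in the Methods (the shorter sequence covers over 70% of the longer one after alignment, and the aligned region has at least 70% identity) can be expressed as a simple filter. The sketch below is a minimal illustration under our own reading of those criteria: coverage is taken as the aligned length divided by the length of the longer sequence, and all sequence lengths are hypothetical values invented for the example, not data from this study.

```python
def alignment_coverage_pct(len_a: int, len_b: int, aligned_len: int) -> float:
    """Aligned length as a percentage of the longer sequence (an assumed definition)."""
    return 100.0 * aligned_len / max(len_a, len_b)

def is_putative_paralog(len_a: int, len_b: int, aligned_len: int,
                        identity_pct: float,
                        min_coverage_pct: float = 70.0,
                        min_identity_pct: float = 70.0) -> bool:
    """Apply the two thresholds: coverage over 70% and identity of at least 70%."""
    coverage = alignment_coverage_pct(len_a, len_b, aligned_len)
    return coverage > min_coverage_pct and identity_pct >= min_identity_pct

# Hypothetical pairs; lengths are invented, only the thresholds come from the text.
full_cover_pair = is_putative_paralog(400, 400, 400, identity_pct=98.0)
partial_cover_pair = is_putative_paralog(300, 500, 300, identity_pct=85.0)
low_identity_pair = is_putative_paralog(400, 420, 400, identity_pct=65.0)
print(full_cover_pair, partial_cover_pair, low_identity_pair)
```

A pair with 100% coverage and 98% identity, as reported for GrTCP15a and GrTCP15b, passes both thresholds, whereas a pair that fails either the coverage or the identity test is rejected.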

We further estimated the approximate dates of the duplication events with the DnaSP program. The results showed that segmental duplications of GrTCP genes occurred between 11.28 and 36.51 Mya (million years ago), with an average of 19.83 Mya.

Gene structure and conserved motifs

To gain further insight into the evolutionary relationships among GrTCP genes, we investigated the exon/intron structures of individual GrTCP genes by aligning the cDNA sequences with the corresponding genomic DNA sequences. As illustrated in Figure 3b, 32 of the 38 GrTCP genes had no intron, while the other GrTCP genes possessed one intron, with the exception of GrTCP25, which contained four introns. Additionally, an unrooted phylogenetic tree was constructed with the GrTCP protein sequences to determine whether the exon/intron organization of GrTCP genes is consistent with the phylogenetic subfamilies (Figure 3a). As expected, most GrTCP genes within the same subfamily showed very similar exon/intron distribution patterns in terms of exon length and intron number. For example, most GrTCP genes in subfamilies A, B, C, D and K had only one exon of similar length and no intron, whereas members of subfamily H contained one intron, except for GrTCP12, which possessed no intron. By comparison, GrTCP genes in subfamily G showed great variability in exon length and intron number (Figure 3a and 3b).

Figure 3

Phylogenetic analysis, gene structure and conserved motifs of the TCP family in Gossypium raimondii.

(a). The phylogenetic tree of all TCP transcription factors in G. raimondii was constructed using Neighbor-Joining method and the bootstrap test was performed with 1,000 iterations. Bootstrap values higher than 50% support are displayed. (b). The exon/intron organization of TCP genes of G. raimondii. The blue lines represent 5′-UTR or 3′-UTR, green boxes represent exons and black lines indicate introns. (c). The conserved protein motifs in the TCP family were identified using MEME program. Each motif is indicated with a specific color.

We further searched for conserved motifs in GrTCP proteins with the MEME program to gain more insight into the diversity of motif compositions among GrTCPs. As shown in Figure 3c, a total of 20 conserved motifs, designated motif 1 to motif 20, were identified. Most GrTCP proteins within the same subfamily shared similar motif compositions, while high divergence was observed among different subfamilies, implying that the GrTCP members within the same subfamily may perform similar functions and that some motifs may play an important role in subfamily-specific functions. For example, all GrTCPs in subfamily C possess motifs 1, 2, 4, 8, 12 and 15, while all members of subfamily I contain motifs 1, 2, 9, 14, 17 and 18 (Figure 3c). In addition, some motifs were exclusively present in a particular subfamily, suggesting that these motifs may contribute to the specific function of that subfamily; examples are motif 20 in subfamily A, motif 6 in subfamily F and motifs 14, 17 and 18 in subfamily I (Figure 3c). Moreover, the program ScanProsite was employed to annotate the 20 identified motifs. However, few of them matched motifs in the PROSITE database (release 20.103), so the functions of most motifs remain unknown. The only motif that matched an entry in the database was motif 1, which was annotated as the conserved TCP domain and was observed in all GrTCP proteins. Overall, the consistency of the motif compositions of GrTCP proteins and the exon/intron structures of most GrTCP genes with the phylogenetic subfamilies further supported the close evolutionary relationships among GrTCPs as well as the reliability of our phylogenetic analysis.

Expression profiles of TCP genes in G. raimondii

To investigate the tissue-specific expression profiles of TCP genes in G. raimondii, quantitative real-time PCR (qRT-PCR) was performed for different organs, including leaf, flower bud, shoot and sepal. As indicated in Figure 4 and Figure 5, some GrTCP genes were differentially expressed in the four tissues tested while other GrTCP genes showed similar expression patterns in different tissues, which may indicate functional divergence of GrTCP genes during plant development. For example, GrTCP2, GrTCP3, GrTCP13b, GrTCP15c, GrTCP19a and GrTCP23 were constitutively expressed at a very high level in every tissue tested, implying that these genes may play regulatory roles at multiple developmental stages, whereas GrTCP6, GrTCP9a, GrTCP20a and GrTCP24 were expressed at a very low level in all tissues examined, which suggests that they may be primarily expressed in other organs not tested or under special conditions (Figure 5). In contrast, the expression levels of GrTCP20a, GrTCP20b and GrTCP20c were very high in leaf and bud and relatively low in shoot and sepal, indicating that they may play an important role in the development of leaf and bud. A similar expression profile was also found for GrTCP1 and GrTCP25. In addition, some genes were highly expressed only in a specific tissue. For example, GrTCP10, GrTCP12, GrTCP14b, GrTCP18b and GrTCP20d were relatively highly expressed in leaf, while GrTCP7a, GrTCP7c, GrTCP9b and GrTCP14a were expressed exclusively in shoot at a very high level, implying specific roles in the corresponding tissues (Figure 4). In general, the GrTCP genes that are highly expressed in specific tissues may be involved in the regulation of plant development. For instance, GrTCP15c was relatively highly expressed in leaf and bud, suggesting that it may play a role in the development of these two tissues. According to previous studies, AtTCP15, the homologous counterpart of GrTCP15c, is involved in the regulation of leaf development31, which supports our hypothesis. However, further studies are still needed to unravel the divergent roles of GrTCP genes.

Figure 4

Heatmap representation for expression patterns of G. raimondii TCP genes across different tissues.

The expression profile data of GrTCP genes in leaf, bud, shoot and sepal were obtained through quantitative real-time PCR.

Figure 5

Expression profiles of 20 GrTCP genes across different tissues.

The y-axis represents the relative expression levels of GrTCP genes against the reference gene TuA11. Error bars indicate the standard deviation of three replicates.

Discussion

TCP transcription factors play important roles in plants

TCP transcription factors are a class of plant-specific transcription factors that play versatile roles in multiple biological processes during plant growth and development (Figure 6). Many TCP transcription factors have been reported to participate in the regulation of multiple aspects of plant development, such as gametophyte development32,33,34, hormone signal transduction29,35,36, mitochondrial biogenesis37, regulation of the circadian clock38,39, lateral branching28,29,40, flower development31,41,42, seed germination43,44 and leaf development14,31. Mutation studies of multiple class II TCP members indicate that they function in a similar manner, mainly by preventing plant growth and cell proliferation9,30,40,45,46,47,48,49, whereas class I members are predicted to promote plant growth and cell proliferation10,50. In Arabidopsis, mutation of the AtTCP18 (BRC1) gene led to a significant increase in the number of rosette branches, while up-regulation of AtTCP18 inhibited lateral branching, suggesting that AtTCP18 plays a critical role in axillary bud outgrowth29. AtTCP4 has been shown to influence early embryo development, and recent evidence revealed that pollen grains produced by a transgenic Arabidopsis line expressing hyper-activated AtTCP4 cannot yield viable seeds, indicating that AtTCP4 may regulate plant reproduction32,33. Functional analysis of AtTCP1 showed that it is involved in the regulation of the brassinosteroid hormone signaling pathway by positively controlling the expression of a key enzyme, DWARF435. In a recent study, AtTCP8 was proposed to be associated with mitochondrial biogenesis based on evidence that AtTCP8 is able to bind to the promoter region of PNM1, a gene encoding a newly identified pentatricopeptide repeat protein that functions in mitochondrial gene expression37. Yeast two-hybrid assays revealed interactions between some TCP transcription factors (AtTCP2, AtTCP3, AtTCP11 and AtTCP15) and several regulatory components of the circadian clock, suggesting that TCP proteins may control or influence the circadian networks38. AtTCP14 and AtTCP15 were reported to regulate floral organ development, and reduced expression of the two transcription factors resulted in phenotypic abnormalities in the three outer whorls and the gynoecia31,41. In addition, AtTCP14 and AtTCP15 were also found to regulate leaf development: mutation of AtTCP14 and AtTCP15 led to leaves that were broader towards the base and petioles that were shorter than in the wild type31.

Figure 6

TCP transcription factors play an important role in multiple biological processes during plant growth and development.

TCP transcription factors are widely present in cotton

In the present study, a total of 38 TCP genes were identified in G. raimondii, more than in Arabidopsis (24) and rice (22)9. The number of TCP genes in G. raimondii is approximately 1.58 times that in Arabidopsis, which agrees well with the fact that the number of protein-coding genes in the G. raimondii genome (40,976 genes) is about 1.6 times that in Arabidopsis (25,498 genes)5,51. Many TCP genes in Arabidopsis have two or more counterparts in G. raimondii, suggesting that the expansion of the TCP family in G. raimondii may have been caused by duplication events such as segmental duplication, tandem duplication and transposition.
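The comparison of family size with genome size above is simple arithmetic, and it can be reproduced with the short Python sketch below. All counts are those quoted in the text; the variable names are ours.

```python
# Gene counts quoted in the text.
tcp_genes = {"G. raimondii": 38, "Arabidopsis": 24, "rice": 22}
protein_coding_genes = {"G. raimondii": 40976, "Arabidopsis": 25498}

# Ratio of TCP family sizes and of total protein-coding gene numbers.
tcp_ratio = tcp_genes["G. raimondii"] / tcp_genes["Arabidopsis"]
genome_ratio = (protein_coding_genes["G. raimondii"]
                / protein_coding_genes["Arabidopsis"])

print(f"TCP family ratio: {tcp_ratio:.2f}")
print(f"Protein-coding gene ratio: {genome_ratio:.2f}")
```

The two ratios are close (about 1.58 and 1.61), which is the basis for the statement that the TCP family has expanded roughly in proportion to the overall gene complement.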

The high number of TCP genes in G. raimondii was most likely caused by gene duplication. Gene duplication, a prominent feature of genomic architecture, plays a significant role in plant genomic and organismal evolution, generating the raw genetic material necessary for mutation, genetic drift and selection and contributing to the origin of new gene functions and the evolution of gene networks52,53. It has been demonstrated that the expansion of gene families is mainly attributable to gene duplication events on various scales, including tandem duplication, segmental duplication, transposition events and whole-genome duplication52,53. Our results indicate that segmental duplication is the predominant duplication event for TCP genes and the major contributor to the expansion of the TCP gene family in G. raimondii. It has been reported that the G. raimondii genome has undergone at least two rounds of genome-wide duplication, an ancient paleohexaploidization event at approximately 130.8 Mya and a recent whole-genome duplication event at around 13.3–20.0 Mya5. The average duplication date of GrTCP genes is very close to the date of the recent whole-genome duplication of G. raimondii, suggesting that large-scale genome duplication events may also have contributed to the expansion of the GrTCP family. In addition, according to a recent study, G. raimondii split from Arabidopsis and from T. cacao approximately 82.3 and 33.7 million years ago, respectively5. Since the duplications of GrTCPs date from 11.28 to 36.51 million years ago, most of the GrTCP genes duplicated after the divergence of G. raimondii from Arabidopsis and from T. cacao. In summary, our results indicate that both segmental duplication and whole-genome duplication contributed to the expansion of the TCP family in G. raimondii. This duplication may contribute to the unique functions of TCP transcription factors in cotton, for example in controlling cotton fiber initiation and development.

Evolutionary conservation and divergence of the TCP family in G. raimondii

According to recent studies, G. raimondii and Arabidopsis diverged from a common ancestor approximately 82.3 million years ago, and this divergence was followed by paleopolyploidy events in both species, contributing to evolutionary innovation5,54. Our study showed that many TCP genes in Arabidopsis have two or more counterparts in G. raimondii with high protein sequence similarity, implying that the TCP genes may have undergone differential expansion in G. raimondii and Arabidopsis. By comparing the exon/intron organization of individual TCP family members in the two species, we observed that their gene structures were highly similar (Figure 3b and Supplementary Fig. S1 and 2). Among ten pairs of TCP genes with high protein sequence identity and query coverage, seven pairs exhibited conserved gene structure in terms of exon length and intron number. While most pairs showed similar exon/intron structure, a few displayed some degree of divergence. For example, GrTCP12 contained one exon and no intron, whereas its counterpart in Arabidopsis, AtTCP12, possessed one exon and one intron. Such variation may be caused by the loss or gain of a single intron during structural evolution. In addition, we analyzed the gene structure and conserved motifs of the paralogous pairs of TCPs in G. raimondii to shed light on the diversification of TCP genes. Of the 19 paralogous pairs, 14 pairs of GrTCPs shared conserved exon/intron organization (Figure 3b). As with gene structure, most paralogous pairs of GrTCPs also exhibited conserved motif composition, with only a few unique motifs observed in some GrTCP members, such as motif 20 in GrTCP15a, motif 3 in GrTCP9b and motif 9 in GrTCP18b and GrTCP14a (Figure 3c). These specific motifs may contribute to the neofunctionalization (in which one paralogous member obtains a new function after gene duplication) or subfunctionalization (in which each paralog retains part of the original ancestral function) of duplicated genes through a series of synonymous and/or non-synonymous mutations during evolution55. Overall, the majority of GrTCPs are evolutionarily conserved, while the variation in exon/intron distribution and motif composition in certain paralogous pairs of GrTCPs suggests that some members of the TCP family in G. raimondii have functionally diversified through differential expansion. The diversification of GrTCPs may contribute to the unique functions of TCP transcription factors in cotton, for example in controlling cotton fiber initiation and development. It has been reported that the cotton transcription factor TCP14 was predominantly expressed in cotton fiber cells, particularly at the stages of fiber initiation and elongation56. Another study demonstrated that a cotton TCP transcription factor regulated fiber and root hair development by regulating jasmonic acid biosynthesis and response as well as the ethylene signaling pathway16.

Methods

Sequence retrieval for TCP proteins

The Hidden Markov Model (HMM) profile of the conserved TCP DNA-binding domain (PF03634) was obtained from the Pfam protein family database (http://pfam.sanger.ac.uk/). To identify the TCP transcription factor coding genes of G. raimondii, the HMM profile of the TCP domain was then used as the query in a HMMER search (http://hmmer.janelia.org/) against the G. raimondii genome obtained from Phytozome (http://www.phytozome.net/), with an E-value cutoff of 0.01. All redundant sequences were discarded from further analysis based on ClustalW17 alignment results, sequence identification numbers and chromosome location. Furthermore, to verify the reliability of the initial results, all non-redundant candidate TCP sequences were analyzed with the InterproScan program18 to confirm the presence of the conserved TCP domain. The sequences of TCP family members in the genomes of Arabidopsis and Oryza sativa were retrieved from the PlantTFDB plant transcription factor database (http://planttfdb.cbi.pku.edu.cn/, v3.0).

Phylogenetic analysis

Multiple sequence alignments of the amino acid sequences of TCP proteins from the G. raimondii, Arabidopsis and rice genomes were conducted using ClustalX19 with default settings. Subsequently, MEGA 6.0 software20 was employed to construct an unrooted phylogenetic tree from the alignments using the Neighbor-Joining (NJ) method with the following parameters: JTT model, pairwise deletion of gaps and 1,000 bootstrap replicates. Furthermore, the Maximum Likelihood, Minimum Evolution and PhyML methods were also applied in tree construction to validate the results from the NJ method. Additionally, a separate phylogenetic tree was constructed with all the TCP protein sequences of G. raimondii for further analysis.
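To make the "pairwise deletion" gap treatment concrete, the sketch below computes a distance between two aligned sequences while ignoring only those columns in which one of the two sequences has a gap. For simplicity it uses the uncorrected p-distance (the proportion of differing sites) rather than the JTT model applied in this study, and the three short aligned sequences are hypothetical; it is an illustration of the gap handling only, not a reimplementation of MEGA.

```python
def p_distance_pairwise_deletion(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Proportion of differing sites, skipping columns gapped in either sequence."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue  # pairwise deletion: drop this column for this pair only
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no ungapped columns shared by the two sequences")
    return differences / compared

# Hypothetical aligned fragments, invented for illustration only.
alignment = {"seqA": "MKR-LST", "seqB": "MKRALSA", "seqC": "M-RALTT"}
d_ab = p_distance_pairwise_deletion(alignment["seqA"], alignment["seqB"])
d_ac = p_distance_pairwise_deletion(alignment["seqA"], alignment["seqC"])
d_bc = p_distance_pairwise_deletion(alignment["seqB"], alignment["seqC"])
print(d_ab, d_ac, d_bc)
```

Because gapped columns are removed per pair rather than from the whole alignment, each pair is compared over a different number of sites (six, five and six in this example), which retains more data than complete deletion when many sequences are aligned.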

Analysis of chromosomal location and gene duplication

Information about the physical locations of all GrTCP genes on chromosomes was obtained through BLASTN searches against the G. raimondii genome database in Phytozome (http://www.phytozome.net/cotton.php). All GrTCP genes were then mapped on the chromosomes using the software MapInspect. TCP gene duplication events were also investigated. Paralogous TCP gene pairs in G. raimondii were identified on the basis of alignment results, adopting the criteria described in previous studies21,22: the shorter sequence covers over 70% of the longer sequence after alignment and the minimum identity of the aligned regions is 70%. In addition, to further analyze gene duplication events, the synonymous substitution rate (Ks) and non-synonymous substitution rate (Ka) were calculated using the software DnaSP23. The date of each duplication event was subsequently estimated according to the equation T = Ks/(2λ), where the approximate clock-like rate (λ) for G. raimondii was 1.5 × 10^−8 synonymous substitutions per site per year5.
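As a worked illustration of the dating equation T = Ks/(2λ), the minimal Python sketch below converts a Ks value into a duplication date in million years, assuming λ = 1.5 × 10^−8 synonymous substitutions per site per year. The two Ks values in the example are back-calculated from the youngest and oldest dates reported in the Results and are shown only to demonstrate the arithmetic; they are not Ks values reported in this study.

```python
# Assumed clock-like rate: synonymous substitutions per site per year.
LAMBDA = 1.5e-8

def duplication_time_mya(ks: float, rate: float = LAMBDA) -> float:
    """Estimate the duplication date in million years ago from T = Ks / (2 * lambda)."""
    if ks < 0 or rate <= 0:
        raise ValueError("Ks must be non-negative and the rate must be positive")
    years = ks / (2.0 * rate)
    return years / 1e6

# Illustrative Ks values back-calculated from the reported date range.
youngest = duplication_time_mya(0.3384)
oldest = duplication_time_mya(1.0953)
print(f"{youngest:.2f} Mya, {oldest:.2f} Mya")
```

The factor of 2 reflects that substitutions accumulate independently along both duplicated copies after the duplication event, so a Ks of about 0.34 corresponds to roughly 11.28 Mya and a Ks of about 1.10 to roughly 36.51 Mya under this rate.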

Gene structure analysis and conserved motif identification

The TCP genomic sequences and CDS sequences extracted from Phytozome were compared with the Gene Structure Display Server24 to infer the exon/intron organization of TCP genes. TCP protein sequences of G. raimondii were submitted to the online Multiple Expectation Maximization for Motif Elicitation (MEME) program25 for identification of conserved protein motifs. The optimized MEME parameters were as follows: any number of repetitions, an optimum motif width from 6 to 250 and a maximum of 20 motifs. The identified protein motifs were further annotated with ScanProsite26.

RNA isolation and Real-time quantitative RT-PCR analysis

To detect the expression profiles of TCP genes in G. raimondii, total RNA was extracted from plant leaves, buds, shoots and sepals using the mirVana™ miRNA Isolation Kit (Ambion, Austin, TX, USA) according to the manufacturer's instructions. The total RNA quantity and purity were assessed with a NanoDrop ND-1000 Spectrophotometer (NanoDrop Technologies, Wilmington, DE, USA). One microgram of total RNA isolated from each tissue was reverse transcribed into cDNA using the TaqMan® MicroRNA Reverse Transcription Kit (Applied Biosystems, Foster City, CA, USA) and a poly-T primer. Real-time RT-PCR was then performed with a 7300 Real-Time PCR System (Applied Biosystems, Foster City, CA, USA) according to the supplier's protocols. Each reaction mixture contained 2 μL of DNase/RNase-free water, 5 μL of Real-Time SYBR Green PCR master mix, 1 μL of diluted cDNA product from the reverse transcription reaction and 2 μL of gene-specific primers. Three biological replicates were analyzed for each tissue and each biological replicate was technically repeated three times. The thermal cycling program was as follows: 95°C for 10 min, followed by 45 cycles of denaturation at 95°C for 15 s and annealing and elongation at 60°C for 60 s. The expression values of the TCP genes tested were normalized to an internal reference gene, TuA11. The relative expression level (R) was calculated using the following equation: R = 2^(−(Ct1 − Ct2)), where Ct1 is the Ct value of the TCP gene and Ct2 is the Ct value of the reference gene. A heatmap of gene expression patterns was generated with the software MultiExperiment Viewer (MeV).
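The normalization described above, R = 2^(−(Ct1 − Ct2)), can be sketched in a few lines of Python. The Ct values below are hypothetical and were chosen only to make the arithmetic easy to follow, and averaging the Ct values of replicates before applying the formula is our assumption; the text does not state how replicates were combined.

```python
from statistics import mean

def relative_expression(ct_target: float, ct_reference: float) -> float:
    """Relative expression R = 2^(-(Ct1 - Ct2)), with Ct1 the target and Ct2 the reference."""
    return 2.0 ** -(ct_target - ct_reference)

def relative_expression_from_replicates(target_cts, reference_cts) -> float:
    """Average replicate Ct values first (an assumed choice), then apply the formula."""
    return relative_expression(mean(target_cts), mean(reference_cts))

# Hypothetical Ct values for one TCP gene and the reference gene in one tissue.
target_cts = [24.0, 24.5, 23.5]
reference_cts = [20.0, 20.5, 19.5]
r = relative_expression_from_replicates(target_cts, reference_cts)
print(r)
```

Here the target gene crosses the threshold four cycles later than the reference, so its relative expression is 2^−4 = 0.0625; each additional cycle of delay halves the estimated expression level, which assumes roughly 100% amplification efficiency for both primer pairs.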