Genome-wide characterization of the β-1,3-glucanase gene family in Gossypium by comparative analysis

The β-1,3-glucanase gene family is involved in a wide range of plant developmental processes as well as pathogen defense mechanisms. Comprehensive analyses of β-1,3-glucanase genes (GLUs) have not been reported in cotton. Here, we identified 67, 68, 130 and 158 GLUs in four sequenced cotton species, G. raimondii (D5), G. arboreum (A2), G. hirsutum acc. TM-1 (AD1), and G. barbadense acc. 3–79 (AD2), respectively. Cotton GLUs can be classified into the eight subfamilies (A–H), and their protein domain architecture and intron/exon structure are relatively conserved within each subfamily. Sixty-seven GLUs in G. raimondii were anchored onto 13 chromosomes, with 27 genes involved in segmental duplications, and 13 in tandem duplications. Expression patterns showed highly developmental and spatial regulation of GLUs in TM-1. In particular, the expression of individual member of GLUs in subfamily E was limited to roots, leaves, floral organs or fibers. Members of subfamily E also showed more protein evolution and subgenome expression bias compared with members of other subfamilies. We clarified that GLU42 and GLU43 in subfamily E were preferentially expressed in root and leaf tissues and significantly upregulated after Verticillium dahliae inoculation. Silencing of GLU42 and GLU43 significantly increased the susceptibility of cotton to V. dahliae.

have been independently domesticated for the production of economically valuable fibers. Previous studies have shown that allotetraploid Gossypium species were formed by a polyploidization event that occurred 1-2 million years (Myr) ago, which involved a maternal A-genome species resembling G. arboreum and a paternal D-genome species resembling G. raimondii 15 . With its high yield properties, G. hirsutum (commonly called Upland cotton) contributes over 90% of the annual global commercial cotton production. Nevertheless, cotton production is limited by diverse biotic and abiotic stresses. Verticillium wilt is a widespread disease that occurs in a range of cotton producing areas. More than 50% of the cotton acreage is influenced by Verticillium wilt in some years, significantly reducing the fiber quality and yield (National Cotton Council, 2012 http://www.cotton. org/). β -1,3-Glucanases have been proposed to play important roles in physiological and developmental processes, as well as in the response of plants to microbial pathogens, and show great functional diversity between the members of this large gene family 7 . In order to obtain an integrated image of the evolutionary characteristics and possible functions of the β -1,3-glucanase gene family in Gossypium, it is necessary to develop genome-wide analyses of the gene family at various levels. The publically available genomic information from four sequenced cotton species, G. raimondii 16 , G. arboreum 17 , G. hirsutum 18 and G. barbadense 19 , provides us with a data source to identify candidate GLUs at a genome-wide level in Gossypium, and to mine key genes for the genetic improvement of yield and fiber quality, as well as disease resistance.
In this study, we individually identified GLUs from four sequenced cotton species, and analyzed their chromosomal location, sequence phylogeny, genomic structure and expansion pattern. We carried out a genome-wide analysis of the temporal and spatial expression profiles of GLUs in G. hirsutum, and investigated their molecular evolution and subgenome expression divergence. Further, specific focus was placed on the GLUs in subfamily E and virus-induced gene silencing (VIGS) analysis confirmed that silencing of the two GLUs in subfamily E significantly increased the susceptibility of cotton to V. dahlia. Our study provides a basis for systematically elucidating the evolutionary and functional characteristics of GLUs in cotton, for the effective clarification of the precise biological roles of GLUs and their utilization in future cotton-breeding programs.

Results
Genome-wide identification of the β-1,3-glucanase gene family in Gossypium. The whole genome sequence scaffolds of four sequenced cotton species (G. raimondii, G. arboreum, G. hirsutum acc. TM-1 and G. barbadense acc.  were used for the genome-wide exploration of the β -1,3-glucanase gene family in Gossypium. With data from a query on the glycoside hydrolase family 17 domain (PF00332), we searched the protein databases of four cotton species using HMMER v3.0 software. In total, 67 GLUs in G. raimondii were obtained after confirming the "Glyco_hydro_17" domain with a blastp program and these were named GrGLU1 through GrGLU67 by combining their chromosome order (D1 to D13) 20 with their location on the chromosome (Fig. 1). The chromosomal distribution patterns of these GrGLUs were uneven. Chrs D5 and D6 contained the most GLUs (10 genes), while Chr D3 contained the fewest (one gene). Further, we identified 68 GLUs in G. arboreum. As these had an orthologous relationship to the GLUs in G. raimondii, the 67 GLUs in G. arboreum were named GaGLU1-67, corresponding to GrGLU1-67. Notably, GaGLU54 had undergone a tandem duplication event in G. arboreum, and the two tandem duplication genes were named GaGLU54a and GaGLU54b.
From a phylogenetic point of view, there were two homologous genes (homeologs from the A and D subgenomes) in the tetraploid cotton species. In total, we identified 130 GLUs in G. hirsutum acc. TM-1 (named GhGLU1-67A/D; Supplementary Fig. 1) and 158 in G. barbadense acc. 3-79, and these candidate GLUs in G. barbadense experienced more tandem duplications at the subgenome level. More than 90% of the GLUs had homoeologous genes in the two tetraploid cotton species, indicating the independent evolution of the A-and D-subgenomes after polyploid formation. Other inconsistencies might result from different sequencing methods, assembly error in partial chromosomal regions, or different degrees of colonization during the evolutionary process of Gossypium, and need to be further confirmed. In G. hirsutum acc. TM-1, the chromosomal location of GhGLUs in the D subgenome shows good collinearity with that in the D genome (G. raimondii), however, two reciprocal translocations between the A2 (Chr2) and A3 (Chr3) and between the A4 (Chr4) and A5 (Chr5) Figure 1. Chromosomal distribution of β-1,3-glucanase genes in G. raimondii. The chromosome numbers were consistent with the interspecific genetic map (D1 to D13) in allotetraploid cultivated cotton species 20 and the scaffolds (Chr.1 to Chr.13) in the genomic data of G. raimondii 16 . The nomenclature of GLUs was based on the order of the chromosomes in G. raimondii. Lines were drawn to indicate the tandem duplicated genes.
Classification and structural analysis of β-1,3-glucanases. Each ortholog from the four surveyed cotton species had a similar protein structure. With the information on the GrGLUs in G. raimondii as an example of the further analysis carried out, all 67 GrGLUs contained an N-terminal signal peptide and a glycoside hydrolase family 17 domain. In detail, five types of protein domain architectures (type I to type V) were observed in the GrGLUs (Fig. 2), which was consistent with that in Arabidopsis 3 . The carbohydrate-binding modules family 43 (CBM43) domain, which has the ability to bind β -1,3-glucan 21 , was found in 39 of the 67 GrGLUs. A C-terminal hydrophobic sequence encoding a transient transmembrane domain was present in 42 GrGLUs. The transient transmembrane domain may be a vacuolar targeting peptide 22 or a glycosylphosphatidylinisotol (GPI)-anchor attachment 23,24 . Further, we compared the number of GrGLUs with that in A. thaliana, Theobroma cacao and Vitis vinifera for each protein domain architectural type (Table 1). Type III was the smallest, with only one member in G. raimondii and A. thaliana, and none in T. cacao or V. vinifera. Type I was the largest, with 23 members in G. raimondii, 19 in A. thaliana and 16 in T. cacao. It is reasonable to suggest that the ancestral GLUs in plants may have the protein domain architecture as type I, with a CBM43 domain and a C-terminal hydrophobic sequence after the core glycosyl hydrolase family 17 domain, and these genes may play an important role in cell division or cell wall remodeling as they have abundant expression in a variety of tissues and organs. During the evolutionary process, the loss of the C-terminal region, including the GPI anchor, and the alteration of expression patterns of duplicated genes by acquiring or losing of regulatory cis-elements, led to the formation of new GLUs, which were classified as types II to V, and resulting in their functional diversity 3 .
Phylogenetic analysis of β-1,3-glucanase genes. Systematic classifications of GLUs at a genome-wide level have been reported previously in Arabidopsis 3 and tobacco 5 . To gain further insights into the evolutionary relationships, we employed MEGA v5.2 software to construct an unrooted phylogenetic tree of GLUs from G. raimondii, A. thaliana and tobacco. The phylogenetic tree clearly showed that the 123 GLUs could be clustered into eight subfamilies (A-H) ( Supplementary Fig. 2), with At5g67469 and GrGLU58 (which contain a partial "Glyco_hydro_17" domain) as outgroup. All five classes of tobacco GLUs fall into subfamily E. Arabidopsis clades β and γ were consistent with subfamilies E and F, respectively, and the GLUs in clade α fall into the other subfamilies. The GrGLUs were classified into eight subfamilies, and this phylogenetic classification was consistent with the protein domain architecture and exon-intron organization (Fig. 3). Subfamilies A, C, D, E and F each contained more GrGLUs, with the number of members ranging from 9 to 13, while subfamilies B, G and H each contained only 3, 4 and 2 GrGLUs, respectively.
The CBM43 domain and the hydrophobic C-terminal sequence were found to be subfamily dependent. All members of subfamilies B, D, F and G contained the CBM43 domain, whereas it was absent from the members of subfamily E. The C-terminal hydrophobic sequence was present in members of subfamilies A, F and H, but was absent from members of subfamilies B and G. With the exception of subfamily C, and some members of subfamily D, such as GrGLU11 and GrGLU12, and subfamily E, such as GrGLU20 and GrGLU23, the introns in most  of the GrGLUs were highly subfamily-specific (Fig. 3), indicating that GLUs in different subfamilies may have undergone independent gene duplication in G. raimondii.

β-1,3-Glucanase gene family expansion in the Gossypium genus. Previous studies have suggested
that there was at least one additional whole genome duplication (WGD) after the pan-eudicot triplication in the Gossypium genome, however, the frequency and timing of the event are still being debated 16,25 . To investigate the β -1,3-glucanase gene family expansion pattern in cotton, we download the syntenic data of G. raimondii, A. thaliana, V. vinifera and T. cacao from the Plant Genome Duplication Database (PGDD). A. thaliana has undergone at least three WGD events, including two doublings (α and β events) since its divergence from other members of the Brassicales clade 26 and one tripling (γ event) which was shared with most if not all eudicots 27 . The V. vinifera genome has experienced no WGD since the pan-eudicot triplication, and it is regarded as an excellent reference for determining the numbers of WGDs in eudicots 28 . Of the whole genome sequenced plants, T. cacao is most closely related to cotton and had no WGD after the ancient hexaploidization event 16,29 . The WGD and tandem duplication of GLUs were analyzed to elucidate the gene family expansion events in G. raimondii. Among the 67 GrGLUs in G. raimondii, we identified 27 genes within 17 pairs of syntenic blocks ( Fig. 4; Supplementary Table 2). The Ks values for each pair of genes within a syntenic block were used to interpret duplication events ( Table 2). For two pairs of paralog genes, GrGLU6 vs GrGLU17 and GrGLU30 vs GrGLU50, the Ks values were 2.260 and 2.034, respectively, and these two paralog gene pairs may be derived from the ancient hexaploidization event. For the remaining 15 paralog gene pairs, the Ks values ranged from 0.436 to 0.806, implying that these paralog gene pairs originated from the Gossypium lineage WGD events 16,25 . We also analyzed the adjacent genes to investigate whether tandem duplication had taken place. 13 of the 67 GrGLUs showed tandem duplication, with a cluster of 4 members (GrGLU20, GrGLU21, GrGLU22 and GrGLU23) present in Chr D6 (Fig. 1).
Furthermore, we analyzed the gene duplication pattern of GLUs from different phylogenetic subfamilies. Inter-genomic synteny analysis showed that GLUs with syntenic relationships were detected in all subfamilies with the exception of subfamily E, and this indicated that the gene orders of the blocks in subfamily E were less conserved during the evolution of diploid cotton. In total there were 12 pairs of genes with syntenic relationships in subfamilies A, B, C, G and H, but these were not generated by tandem duplication events. In subfamily E, which contained 12 GLUs, there were no genes with syntenic relationships but 8 tandem duplicated genes were present. In subfamilies D and F, there were 7 genes within different syntenic blocks and 5 tandem duplicated genes (Figs 1 and 4; Table 2). These results show that GLUs from different phylogenetic subfamilies have different expansion patterns. In summary, Gossypium lineage WGD event contributed to the expansion of subfamilies A, B, C, G and H, while subfamily E expanded through tandem duplication events. WGD and tandem duplication worked together to promote subfamily D and F expansion.
It has been suggested that allotetraploid cotton species appeared in the last 1-2 Myr through hybridization and subsequent polyploidization events that involved a maternal A-genome species and a paternal D-genome species. As a relatively young polyploidy species, genes from parental genomes were mostly preserved in the G. hirsutum subgenomes 18 . Here, we identified 66 and 64 GLUs in the A and D subgenomes of G. hirsutum acc. TM-1, respectively ( Fig. 5; Supplementary Table 1). The number of GLUs in allotetraploid cotton has nearly doubled when compared with that in diploid cotton, although a handful of homologs have been lost.    Table 2. Ks values of gene pairs in syntenic blocks. * Numbers of gene pairs in syntenic blocks. different organs and developmental stages. The FPKM method was employed to normalize the total short read sequences. Among the 130 GhGLUs, there were 98 genes with an FPKM > 1 in at least one of the 12 investigated organs and developmental stages, and these 98 genes were used to gauge the relative expression of each β -1,3-glucanase gene (Fig. 6). The remaining 32 GhGLUs may be pseudogenes, or are only expressed at specific developmental stages, or under special environmental conditions that were not analyzed in this study. We observed that the accumulation of the 98 GhGLUs transcripts was associated with different organs and developmental stages, and that the expression patterns differed among each subfamily (Fig. 6). Notably, the expression of the individual member of GLUs in subfamily E was limited to roots, leaves, floral organs or cotton fibers, not all tissues we analyzed.
A hierarchical clustering analysis of expression profiles in the 12 tissues produced 8 clusters (Cluster I-VIII) comprising a total of 94 GhGLUs, and the remaining 4 genes could not be divided into any of the above clusters (Fig. 7, Supplementary Table 3). Thirty-eight GhGLUs were highly expressed in − 3, 0 and 3 days post anthesis (DPA) ovules and fell into Cluster I. Members in this cluster also had high expression in stems but low expression in stamens. There were 14 GhGLUs in Clusters II and III and these were highly expressed in leaves. The GhGLUs in Cluster III were also highly expressed in stems. The expression of 26 genes was highest in petals or stamens  and these fell into Clusters IV and V, respectively. The remaining 16 genes were highly expressed in the fibers and these fell into Clusters VI, VII and VIII. Expression of the GhGLUs in Clusters VI and VII was highest in the 20 DPA fibers, although transcripts of the genes in Cluster VI were also detected in vegetative and reproductive organs.
To confirm the expression profiles above, qRT-PCR was conducted for 26 genes in 8 organs and developmental stages of TM-1 (Fig. 8). The qRT-PCR primers used (Supplementary Table 3) were gene-specific but not homoeolog-specific. Firstly, 11 GhGLUs showed expression peaks in 0 DPA ovules, and these were divided into two groups. GhGLU33, GhGLU3, GhGLU41, GhGLU3, GhGLU35 and GhGLU26 were predominantly found to accumulate in ovules at 0 DPA ovules, and GhGLU37, GhGLU16, GhGLU24, GhGLU28 and GhGLU66 were also detected in vegetative organs and fibers, with low expression levels in petals and anthers. Secondly, four GhGLUs, GhGLU42, GhGLU43, GhGLU21 and GhGLU22, were predominantly found to accumulate in roots and leaves. Thirdly, the expression peaks of four GhGLUs, GhGLU45, GhGLU59, GhGLU17 and GhGLU65, occurred mainly in petals or anthers. Lastly, seven GhGLUs were found to accumulate at high levels in fibers at 10 and 20 DPA. GhGLU63 showed preferential expression in fibers at 10 DPA, and GhGLU30 had similar expression levels in 10 and 20 DPA fibers, while GhGLU9, GhGLU18, GhGLU19, GhGLU1 and GhGLU34 were highly expressed in 20 DPA fibers (Fig. 8). The results of our qRT-PCR analysis of GhGLUs are consistent with the RNA-Seq data. Molecular evolution and subgenome expression divergence of β-1,3-glucanase genes. The whole genome sequencing of G. raimondii, G. arboreum, G. hirsutum, and G. barbadense has provided us with an opportunity to explore the molecular evolutionary properties of GLUs in diploids and their allotetraploid derivatives. We calculated the nonsynonymous substitution rate (Ka), synonymous substitution rate (Ks) and Ka/Ks ratios within and between species 30 to explore the divergence and selection pressures that have accumulated in GLUs (Table 3). Average Ka and Ks values were higher in inter-genomic contrasts (A vs. D or At vs. Dt) than in intra-genomic contrasts (A vs. At or D vs. Dt), and this is consistent with the hypothesis that the A-D genome divergence occurred well before the formation of the allotetraploid species. To our surprise, the average Ka of GLUs in subfamily E was higher than that in other subfamilies, indicating that members of subfamily E experienced quicker protein evolution. The overall low Ka/Ks ratios in the inter-and intra-genomic contrasts suggested negative or purifying selection of GLUs in both diploid and polyploid cotton species, however, subfamily E had a higher Ka/Ks ratio in At vs. Dt than other subfamilies, with A vs. D as the control. Therefore, GLUs in subfamily E experienced more protein evolution in both diploid and polyploid cotton, and they also underwent more selection pressures when compared to other subfamilies.
Through the genome-wide analysis of GLU expression levels in 12 TM-1 (G. hirsutum) tissues (53 homoeologs × 12 tissues = 636 combinations), we found that 147 (23.1%) homoeologous gene pairs showed similar expression levels, 340 (53.5%) pairs were undetectable, and 149 (23.4%) pairs were subgenome-biased (Fig. 9). Of the 149 subgenome-biased pairs, 52 pairs showed one gene expressed while the other was undetectable, 45 pairs showed higher levels in A than D subgenome (A subgenome bias), and 52 pairs showed higher levels in D than A subgenome (D subgenome bias). In particular, only 19 homoeologous gene pairs in subfamily E were found to have transcripts in the tested tissues, of them, there were 15 (78.9%) pairs showed subgenome bias expression, implying specific roles for members in subfamily E during the evolutionary process.
Potential function of two tandem duplicated β-1,3-glucanase genes in Verticillium dahliae resistance. Four tandem duplicated GLUs from subfamily E, GLU42 and GLU43, and GLU21 and GLU22, were found to be predominantly accumulated in leaves, and GLU42 and GLU43 were also detected in roots (Fig. 8). qRT-PCR showed that GLU42 and GLU43 were significantly induced after Verticillium dahliae inoculation, with the highest expression observed at 96 h post-inoculation (Fig. 10a).
To investigate the function of GLU42 and GLU43 in Verticillium resistance, we constructed TRV2:GLU42 and TRV2:GLU43 vectors to silence the endogenous genes in Hai7124, with TRV:00 as the mock treatment and TRV2:GhCLA1 (Cloroplastos alterados 1) as an indicator. An Agrobacterium culture containing all recombinant pTRV vectors was infiltrated into the cotyledons of cotton cultivar Hai7124. Three weeks later, the plants infiltrated with CLA1 exhibited the photobleaching phenotype ( Supplementary Fig. 4a), and GLU42 and GLU43 were silenced in TRV2:GLU42-and TRV2:GLU43-infiltrated plants when compared to no-infiltration control (CK) and mock treated plants (Fig. 10b). Further, we found that GLU42 was silenced in TRV2:GLU43-silenced plants, and GLU43 was also silenced in TRV2:GLU42 silenced plants. This may be due to the 90.98% nucleotide identity between the two tandem duplicated genes ( Supplementary Fig. 5). We also performed qRT-PCR to examine the expression of other GLUs in subfamily E. Only two GLUs, GLU21 and GLU22, were detected to express in leaf tissue, however their expression had not been interfered in GLU42/43 silenced plants ( Supplementary  Fig. 6). After the GLU42/43 were silenced, we inoculated the cotton seedlings by dip-infection with V. dahliae isolate V991, with the final concentration adjusted to 1 × 10 7 spores per milliliter. Eleven days after inoculation, yellow leaf veins and wilting were found in cotyledons of GLU42/43-silenced plants. Similar phenotypes were observed in susceptible plants (G. hirsutum cv. Junmian 1), but not in no-infiltration control and mock treated plants. Twenty days later, the true leaves of GLU42/43-silenced plants became withered and etiolated, and thirty days later, most of the true leaves were defoliated (Fig. 10c). In general, no-infiltration control and mock treated plants exhibited a weak partial leaf wilting phenotype. However, silencing of GLU42/43 enhanced the plants' susceptibility to V. dahlia: over 83% of GLU42/43 silenced plant leaves exhibited severe wilting symptoms inoculated thirty days later, it's higher than that of mock treated plants. Further, more than 96% GLU42/43 silenced plant inoculated thirty-five days later were severely infected, similar to the results observed in susceptible control plants (G. hirsutum cv. Junmian 1) (Fig. 10c,d, Supplementary Table 4). No-inoculation control with silenced GLU42/43 showed the normal growth and development ( Supplementary Fig. 4b). The VIGS results indicated that GLU42 and GLU43 play important roles in resistance to V. dahliae infection in cotton, and have a potential utilization in cotton disease-resistant breeding.

Discussion
Whole-genome duplication, followed by massive silencing and elimination of duplicated genes, has long been recognized as a pervasive force in plant evolution 31 . Complete genome analyses support the idea that two polyploidy events (ρ and σ ) occurred in monocots, and one triplication event (γ ) was probably shared by all core eudicots 26,27,32 . The Gossypium lineage underwent a WGD after the γ event, although the frequency and timing of this event are still being debated 16,25 . Meanwhile, the fate of the duplicated genes derived from WGD events is still poorly understood, both evolutionarily and functionally. Several studies have shown that some duplicate genes were more prone to retention than others. For example, transcription factors, signal proteins, and membrane proteins were preferentially retained after duplication in Arabidopsis 33 . The duplicate genes retained after an ancestral WGD event enhanced root nodule symbiosis in the Papilionoideae 34 .
In this study, we identified 67 GrGLUs in G. raimondii. Of them, 27 GrGLUs were detected within 17 pairs of syntenic blocks ( Fig. 4; Table 2), with two paralogs derived from the ancient hexaploidization, and 15 paralogs from the Gossypium lineage WGD events. These paralogs were retained after the WGD events. We also detected 21 GrGLUs that showed there was only one β -1,3-glucanase gene remaining in the syntenic blocks (Supplementary Table 2), and further, their corresponding orthologs were also found in A. thaliana (Fig. 4a), V. vinifera (Fig. 4b) or T. cacao (Supplementary Table 2). This indicated the duplicated genes in the syntenic blocks were lost in Gossypium after WGD event. The retained paralogs might have gained new functions (neofunctionalization), lost functions (pseudogenization), or partitioned aggregate ancestral function (subfunctionalization) 35,36 . In G. raimondii, 99.4% of the duplicated genes were reported to exhibit differential expression in at least one of three tissues (petal, leaf, and seed) 37 . In this study, we also found that GhGLU41 and GhGLU26 showed different expression to GhGLU11, indicating the functional diversity of these paralogs. However, evolutionary mechanisms underlying the preservation and diversification of duplicated genes followed by WGD remain largely unknown.
According to parsimony reconstruction, the ancestral GLUs most probably displayed a somewhat nonspecific expression pattern and were involved in cell division or cell wall remodeling-like functions 3 . After several rounds of ancient WGD events in seed plants and angiosperms, the GLUs with ancestral functions may have diverged in expression; leading to the production of new functions in pathogen resistance, pollen development and tube growth, seed germination, and other processes 3,31 .  Table 3. Molecular evolutionary rates of β-1,3-glucanase genes in various comparisons among Gossypium taxa. a Genome codes are as follows: G. arboreum (A), G. raimondii (D), G. hirsutum "A homoelog" (At), G. hirsutum "D homoeolog" (Dt). "* ": t test at P < 0.05.
The sequence-based phylogeny of GLUs is not entirely consistent with the expression divergence in Arabidopsis 3 and cotton in our study, possibly because duplicated genes acquired or lost tissue or developmental cis-regulatory elements during the evolutionary process. However, the GLUs in subfamily E were specifically Figure 9. Ratios of β-1,3-glucanase homoeologs transripts in G. hirsutum. Nosignal: FPKM of both homoeologs was less than one. Similar: No expression divergence detected between homoeologs. Ahigh: A homoeolog expression was higher than D. Dhigh: D homoeolog expression was higher than A. Ahigh.Dsilent: A homoeolog expression was higher than D, and FPKM of D homoeolog was less than one. Dhigh.Asilent: D homoeolog expression were higher than A, and FPKM of A homoeologs was less than one. The higher expression level of homoelogs were detected through pairwise t-test (P value < 0.01, FDR < 0.05 and at least 1.5-fold difference in expression levels). expressed in particular tissues or organs. For example, the expression of GhGLU18 (Accession number: D88416.1) specific to fibers at 20 DPA played an important role in reopening plasmodesmata by degrading callose during cotton fiber development 38 . An Arabidopsis β -1,3-glucanase gene, AtBG_pap, is considered as a plasmodesmal gate keeper for intercellular communication 12,13 . Due to the temporary deposition of callose, a transient closure of plasmodesmata occurs during the rapid phase of cotton fiber elongation. The expression of GhGLU18 highly correlates with the timing of plasmodesmata opening, and could play a role in cotton fiber development by degrading callose in plasmodesmata [38][39][40] . The expression of GhGLU21, GhGLU22, GhGLU42 and GhGLU43 was highly specific to roots and leaves. The expression peaks of GhGLU45 occurred mainly in anthers, which may indicate a role in cotton flowering. At the onset of meiosis, a secondary callose wall composed of β -1,3-glucan is deposited between the primary wall and the plasma membrane of the pollen mother cells 41 . It is reported that the timing of the callose wall degradation by β -1,3-glucanase is important for proper microsporegenesis, and the premature dissolution and delayed degradation of the callose walls caused male sterility in tobacco 9 and rice 4 , respectively.
The protein domain architecture and intron/exon structure of the genes in subfamily E were highly conserved (Fig. 3). Most of these genes lost the C-terminal region, including the CBM43 domain and the hydrophobic C-terminal sequence, and this change may enable them to be secreted extracellularly. Furthermore, GLUs in subfamily E experienced quicker protein evolution in both diploid and polyploid cotton, and showed subgenome-biased expression. Our findings suggest that the specific expression, loss of the C-terminal region, fast evolution and subgenome expression bias of these members together led to the functional divergence of GLUs in subfamily E in cotton.
As a member of the pathogenesis-related (PR) protein group, the interest in β -1,3-glucanases has focused primarily on their antifungal activity. This glucanohydrolase can degrade β -1,3-glucan in the cell walls of pathogens, and the release of cell-wall derived elicitors can further stimulate defense reactions 7,8 . In soybean, β -1,3-glucanases have been demonstrated to release active elicitors from fungal cell walls, and then induce the accumulation of the phytoalexin glyceollin 42 . Constitutive expression of a β -1,3-glucanase gene in tobacco plants led to reduced symptoms when infected with the glucan-containing fungi Peronospora tabacina and Phytophthora parasiticavar nicotiana 43 . Further, expression of the β -1,3-glucanase gene in combination with chitinase transgenes led to a remarkable synergic effect. Transgenic tomato plants that were overexpressing β -1,3-glucanase in class I and chitinase from tobacco, showed strong antifungal activity against Fusarium oxysporum 44 . Tobacco plants expressing both β -1,3-glucanase and chitinase showed reduced susceptibility to infection by Fusarium solani 45 . Verticillium wilt is a serious disease that significantly reduces the yield and quality of cotton. The availability of the whole genome sequences of Gossypium species provide us with an opportunity to identify certain GLUs that are involved in V. dahliae resistance. In this study, based on the genome-wide analyses, we found two tandem duplicated GLUs in subfamily E, GLU42 and GLU43, which were significantly induced in cotton roots after V. dahliae inoculation. VIGS assays showed that silencing of these two genes significantly increased the susceptibility of cotton plants to V. dahlia, while the control and mock treated plants exhibited a weak and partial leaf wilting phenotype (Fig. 10). Our finding implies that GLU42 and GLU43 play crucial roles in resistance to V. dahliae in cotton, and that overexpression of GLU42/43 will help defend cotton against fungal infection.

Materials and Methods
Identification and chromosomal mapping. The gene files of G. raimondii, A. thaliana, V. vinifera, and T. cacao were downloaded from Phytozome v9.0 (http://www.phytozome.net/). The gene information of G. arboreum, G. hirsutum acc. TM-1, and G. barbadense acc. 3-79, were downloaded from http://cgp.genomics.org.cn, http://mascotton.njau.edu.cn and http://cotton.cropdb.org, respectively. The Hidden Markov Model (HMM) profile of the glycoside hydrolase family 17 domain (PF00332) was obtained from the Pfam website (http://pfam.xfam.org/), and was employed as a query to identify all possible GLUs using HMMER (V3.0) software 46 . To validate the HMM search, all candidate sequences were used as queries to search the NCBI non-redundant (nr) protein database with the blastp program, and the results with the best hits of "Glyco_hydro_17" were retained for further analysis. For prediction of GPI-anchor attachment sites, the BGI-PI 47 and GPI-SOM 48 algorithms were used.

Phylogenetic and exon-intron structural analysis.
A multiple alignment of the sequences encoding the conserved glycosyl hydrolase family 17 domain were constructed with ClustalX (version 2.0) 49 , and gaps and poorly aligned sections were removed. A phylogenetic tree was generated using the maximum likelihood method and WAG model in MEGA v5.2 50 software, and the reliability of interior branches was assessed with 1000 bootstrap resamplings.
The gene structure of the GLUs was parsed from the General Feature Format (GFF) files, and diagrams of the exon-intron structures were drawn using the online program Gene Structure Display Server (GSDS; http://gsds. cbi.pku.edu.cn/). Genome synteny and gene duplication. The syntenic information of G. raimondii, A. thaliana, V. vinifera and T. cacao was downloaded from the Plant Genome Duplication Database (PGDD; http://chibba.agtec. uga.edu/duplication/). GLUs were mapped to the syntenic blocks for intra-and inter-genomic comparison. The syntenic diagram was drawn using Circos software 51 .
The timing of segmental duplication events can be estimated by computing mean Ks values for all anchor points located in the corresponding syntenic block 16,25 , and all the Ks values were parsed from PGDD syntenic data. Genes separated by five or fewer genes within a 100-kb region on a chromosome may have resulted from tandem duplication 52 .
Plant materials and treatments. The genetic standard line, G. hirsutum acc. TM-1, was used for tissue/ organ expression analysis. Roots, stems and leaves were collected from two-week-old seedlings grown in a greenhouse. Petals, anthers and ovules were collected from plants grown under standard field conditions on the day of flowering, and fibers were excised from developing bolls on selected days post anthesis (DPA). The tissue/organ materials were quick-frozen in liquid nitrogen and stored at − 70 °C before RNA extraction.
G. barbadense cv. Hai7124 with Verticillium resistance, was used for fungal pathogen (V. dahliae) inoculation. We hand-inoculated three-week-old Hai7124 seedlings with conidial suspensions carrying 1 × 10 7 spores of V. dahliae strain V991 through dip-infection. The roots were harvested, with three repeats at different time points (0, 24, 48, 96 and 144 hours) after V991 treatment, and then quick-frozen in liquid nitrogen and stored at − 70 °C before RNA extraction. G. hirsutum cv. Junmian 1 with Verticillium susceptibility was used as a susceptible control.
RNA isolation and quantitative reverse transcription PCR. Total RNA was isolated using the CTAB-acidic Phenolic method 53 . Each RNA sample was treated with DNase I to remove the genomic DNA. Total RNA samples (2 μ g per reaction) from different tissues/organ were reversely transcribed into cDNA by M-MLV reverse transcriptase.
The expression of GLUs was analyzed using an ABI 7500 real-time PCR system with the HiScript Q RT SuperMix (Vazyme, Nanjing, China). Gene-specific primers were designed based on the β -1,3-glucanase gene sequences using Oligo 6.0. Cotton histone3 (AF024716) was used as the reference gene 54 . The amplification parameters were as follows: 95 °C hold for 10 min, followed by 40 cycles at 95 °C for 15 s, 58 °C for 15 s and 72 °C for 15 s. For the melting curve stage, the default settings were chosen. Nonspecific products were identified by inspecting melting curves. The primer pairs used for real-time PCR were listed in Supplementary Table 5.
Estimation of the evolution rates of β-1,3-glucanase genes. Estimation of the rates of nonsynonymous substitutions per nonsynonymous site (Ka) and synonymous substitutions per synonymous site (Ks) was performed within and between Gossypium species using DnaSP version 5 55 . Based on the definition of Ka/Ks, a value of 1 represented neutral evolution, and a value less than 1 indicated negative or purifying selection, whereas a value greater than 1 indicated positive selection acting on amino acids.