Species-specific duplications of NBS-encoding genes in Chinese chestnut (Castanea mollissima)

The disease resistance (R) genes play an important role in protecting plants from infection by diverse pathogens in the environment. The nucleotide-binding site (NBS)-leucine-rich repeat (LRR) class of genes is one of the largest R gene families. Chinese chestnut (Castanea mollissima) is resistant to Chestnut Blight Disease, but relatively little is known about the resistance mechanism. We identified 519 NBS-encoding genes, including 374 NBS-LRR genes and 145 NBS-only genes. The majority of Ka/Ks were less than 1, suggesting the purifying selection operated during the evolutionary history of NBS-encoding genes. A minority (4/34) of Ka/Ks in non-TIR gene families were greater than 1, showing that some genes were under positive selection pressure. Furthermore, Ks peaked at a range of 0.4 to 0.5, indicating that ancient duplications arose during the evolution. The relationship between Ka/Ks and Ks indicated greater selective pressure on the newer and older genes with the critical value of Ks = 0.4–0.5. Notably, species-specific duplications were detected in NBS-encoding genes. In addition, the group of RPW8-NBS-encoding genes clustered together as an independent clade located at a relatively basal position in the phylogenetic tree. Many cis-acting elements related to plant defense responses were detected in promoters of NBS-encoding genes.

species for the Fagaceae that dominate the hardwood forests of the northern hemisphere 22 and have significant economic and ecological value. Chestnut can be infected with pathogens that cause diseases such as chestnut blight 23,24 , ink disease 25 or bark disease 26,27 . Furthermore, many disease resistance genes belonging to the NBS-LRR gene families have been identified 9,[28][29][30][31][32][33][34][35] .
Chestnuts (Castanea spp.) are important nuts and forest trees, and play important roles in ecosystem and generate nuts for wildlife and specialty nuts for human consumption. Chinese chestnut is resistant to chestnut blight, caused by fungal pathogen Cryphonectria parasitica (formerly Endothia parasitica), which, when it was accidentally introduced to North America around 1904 from Japan with nursery stocks, has almost wiped out the American chestnut tree (Castanea dentata) in the early 1900's, once plentiful tree in the eastern United States. However, the resistance mechanism to chestnut blight in Chinese chestnut is not clear. The whole genome sequence recently released 22,36 provides the opportunity to undertake a whole-genome analysis of NBS-encoding R genes in Chinese chestnut to obtain insight into the evolutionary development of this gene family. We used poplar (Populus tricocarpa) 21,37 as a reference, which was the most closely related to chestnut among all the sequenced species, to identify chestnut-specific duplications after the divergence between Chinese chestnut and poplar. Results of this genome-wide analysis suggests that ancient and species-specific duplications have contributed to the expansion of NBS-encoding genes in Chinese chestnut. This research lays a foundation for further characterizing these R genes and helps identifying R genes that may be involved in resistance to chestnut blight.

Results
Total number of NBS-encoding genes in Chinese chestnut. A total of 519 NBS-encoding genes were identified in the C. mollissima genome, including 374 NBS-LRR genes and 145 NBS-only genes ( Table 1). The NBS-encoding genes comprised 1.36% of expressed genes in Chinese chestnut compared with 0.91% in poplar. Accordingly, the proportion of NBS-LRR genes in Chinese chestnut (0.98%) was also higher than that in poplar (0.72%). Among NBS-LRR and NBS-only genes, two types of genes could be subclassified based on their N-terminal structures: TIRs and non-TIRs. Among the NBS-encoding genes, 27 TIR genes (22 TNLs and 5 TNs) and 492 non-TIR genes (352 non-TNLs and 140 non-TNs) were found. These results demonstrated that the number of non-TIR genes was greater than that of TIR genes, which was similar to the numbers of the two types of NBS-encoding genes in poplar. Furthermore, the proportions of non-TNLs were greater than these of TNLs in both Chinese chestnut and poplar genomes. In the Chinese chestnut genome, 0.06% and 0.92% of genes detected were TNLs and non-TNLs, respectively, compared to proportions of 0.17% and 0.55% for those gene categories in poplar. However, although the number of NBS-encoding genes in Chinese chestnut was greater than that in poplar, the number of TIR genes in Chinese chestnut was smaller than that in poplar. Moreover, the non-TIR genes could also be further divided into 32 CNLs, 96 CNs, 320 XNLs, and 44 XNs (Table 1).
The average number of exons identified in NBS-LRR genes from Chinese chestnut was 4.37, which was greater than the average of 2.35 exons for NBS-LRR genes in poplar. In addition, the average number of exon in TNL genes was less than that in non-TNL genes in Chinese chestnut, which was different from the average exon numbers in these genes in grape and poplar 21 . The average exon number in TNL, non-TNL, CNL, NBS-encoding, and NBS-LRR genes was 2.50, 2.93, 3.32, 2.66, and 2.90, respectively, which were all less than for all genes predicted in the Chinese chestnut genome. Duplication of NBS-encoding genes in the Chinese chestnut genome. According to criteria for both coverage ≥70% and identity ≥70%, 273 genes were detected in 64 NBS gene families in Chinese chestnut. Therefore, 246 of these genes were singletons in the Chinese chestnut genome ( Table 2). The percentage of multiple NBS-encoding genes in Chinese chestnut (52.60%) was much lower than that in poplar (78.13%). The average number of members per family was 4.27 in chestnut and 5.33 in poplar, and the maximal number members within a family of these genes in Chinese chestnut (19) was less than that in poplar (23), which indicated fewer NBS gene duplications and multi-gene families in Chinese chestnut than in poplar.
If coverage and identity criteria were changed to 80%, the proportion of multiple genes among all NBS-encoding genes decreased in both Chinese chestnut and poplar. The proportions of multiple NBS-encoding genes were 41.43% and 74.52% between Chinese chestnut and poplar, respectively, which could allow the inference that more relatively recent duplications have occurred in the poplar genome. When coverage and identity criteria were increased to 90%, 19.46% of the multiple genes were still identified in Chinese chestnut, indicating that recent duplication events partly contributed to the expansion of NBS-encoding genes.
The multi-gene families and number of multiple genes encoding TNLs and non-TNLs were diverse in Chinese chestnut. The numbers of both multi-gene families and multiple genes in non-TNLs were greater than those in TNLs, indicating that duplication of NBS-encoding genes occurred primarily among non-TNL genes ( Table 2).

Selective pressure on NBS-encoding genes in Chinese chestnut. Plant disease resistance genes
have been shown to be subject to positive selection 38 . To better understand the evolutionary fate of NBS-encoding duplicates in Chinese chestnut, we used site and branch models in PAML4 to detect positive selection patterns. Because this analysis requires comparison of at least three genes, TIR-NBS-encoding genes were not evaluated and 34 non-TIR genes families with greater than three members each were analyzed.
We calculated the ratios of nonsynonymous to synonymous nucleotide substitutions (ω ), a molecular evolutionary measure of selection pressure 39 , to detect positive selection in NBS-encoding genes. A value of ω greater than 1 indicates that a gene is evolving with more constraint on nonsynonymous substitutions than on synonymous substitutions, which is evidence of positive selection 40 . In contrast, value of ω less than 1 indicate purifying selection. A value for ω of 1 means that a gene is under neutral selection. As Fig. 2 shows, most of these gene families (30/34) have undergone purifying selection. Meanwhile, four gene families had values for ω of greater than 1, which demonstrated that some NBS-encoding gene families were under positive selection.
Subsequently, the result of the LR test, 2Δ ln detected positive selection of significant differences between the M7 and M8. It was important to note that 85.29% (29/34) ( Fig. 2) of non-TIR gene families had some sites under highly significant positive selection pressure, showing positive selection on gene families played a certain role in the evolution of non-TIR-NBS-encoding genes. Specifically, the analysis revealed that two gene families (Family 30 and 33) (Fig. 2) were driven by significant positive selection. Further, Bayes Empirical Bayes (BEB) analysis was performed to detect amino acid sites that have been under positive selection 41 . The sites inferred to be under positive selection at the 95% (*) and 99% (**) confidence intervals are shown in Table S1 (Additional file 1), which suggested that these positively selected sites possessed relatively high substitutions compared with others among NBS-LRR genes.
Taken together, 530 amino acid sites from 22 families were revealed to be under positive selection in non-TIR gene families, which might have driven the evolution of function in NBS-encoding genes in Chinese chestnut (Additional file 1: Table S1).

Phylogenetic analysis of NBS-encoding genes.
To confirm the phylogenetic relationships among the NBS-encoding genes, the NBS domains of 519 NBS-encoding genes were analyzed and compared with those in poplar, and a phylogenetic tree was constructed. A few genes had longer branch lengths but the majority had relatively shorter branches (Additional file 2: Figure S1), which indicated two different evolutionary patterns occurred in NBS-encoding genes.
In general, the majority of the NBS-encoding genes were clustered according to species, which suggested species-specific duplication during the evolution of Chinese chestnut. In the phylogenetic tree, when the clade was defined by bootstrap values of greater than 50%, the clades resulting from species-specific duplication were counted. Specifically, 81 clades, including 401 NBS-encoding genes were identified with 4.95 R paralogs per clade. Additionally, the proportion of species-specific duplicated genes in Chinese chestnut was 77.26%. This result clearly indicated that NBS-encoding genes expanded into multiple gene family members after the divergence of Chinese chestnut and poplar. The N-terminal RPW8 domain (RESISTANCE TO POWDERY MILDEW8) in the NBS-encoding genes, medicates broad-spectrum resistance in Arabidopsis 12 . The analysis of RPW8 genes helps explain the origin and relationships of RPW8 genes to other genes. In the present study, a total of 15 Chinese chestnut RPW8-NBS-encoding genes are marked with solid circle in phylogenetic tree (Additional file 2: Figure S1). For comparison, five NBS-encoding genes from poplar containing the RPW8 domain are marked with triangle in poplar (Additional file 2: Figure S1). Interestingly, the genes carrying the RPW8 domain were clustered together (Additional file 2: Figure S1). Additionally, the 15 RPW8 genes from Chinese chestnut formed a relatively independent and monophyletic group, that was not phylogenetically embedded within other clades. Similar results were obtained for RNL genes from M. truncatula, potato, soybean, common bean and pigeon pea genomes [15][16][17] . Notably, the position of RPW8 genes of Chinese chestnut were located at a relatively basal, but not the most basal position on the phylogenetic tree.
The cis-element analysis of NBS-encoding genes promoter sequences. Plant defense is controlled by cis-regulatory elements corresponding to key genes involved in defense, and pathogen-specific responses 42 . Therefore, the investigation and identification of cis-acting elements in the promoters of NBS-encoding genes will help us understand the function in plant defense responses. In this analysis, we performed the cis-elemtne analysis of all NBS-encoding genes in the PLACE. It is noted that many cis-regulatory elements associate with plants responding to pathogens, including DOFCOREZM, EECCRCAH1, GT1GAMSCAM4, GT1CONSENSUS, and AGCBOXNPGLB 43,44 in promoter regions of NBS-encoding genes (Table 3 and Additional file 3), which might demonstrate that NBS-encoding genes involve in response to pathogen infections. Moreover, the results (Additional file 3) point a way to identify candidate genes that might be used for conferring disease resistance. Discussion A small number of TIR-NBS-encoding genes in Chinese chestnut. Surveys for NBS-encoding genes in the sequenced genomes of many species, including Arabidopsis, rice, grapevine, and poplar have found variable numbers of NBS-encoding genes 2,19-21 . Based on the structure of the N-terminal domain, NBS-encoding genes have been categorized into two classes, TIR type and non-TIR type. TNLs were absent in monocots, such as rice 19,20 , the dicot Aquilegia coerulea 14 , and the dicotyledonous order Lamiales , but were found in most dicots 10,45,46 . However, only 27 TIR-type genes including 22 TNL genes and five TN genes, were identified in Chinese chestnut. The proportion of TNLs in entire genome of Chinese chestnut was only 0.0577%, which was much lower than that in Arabidopsis (0.348%), grapevine (0.319%), or poplar (0.171%) 2,21 . Furthermore, the percentage of TNLs among the NBS-LRR genes of poplar (23.6%) was four times to that in Chinese chestnut (5.88%), which also demonstrated that the number of TNLs was lower in Chinese chestnut. In Chinese chestnut, the average number of exons estimated for TNLs (2.50) was lower than that in whole genome genes and non-TNLs, a result that differed from those of previous studies in Arabidopsis, poplar, grapevine, apple, pear, and peach 2,21,47 .
Furthermore, because of the small number of TIR-type NBS-encoding genes, only two Ks values (0.4781 and 0.1623) were estimated for the two TIR gene families. The average Ka/Ks ratio for TIR-type genes, with Ks ranging from 0 to 1, was lower than that for non-TIR-type genes (0.4080 compared with 0.5871). This result was consistent with those of previous studies of TNLs and non-TNLs in Arabidopsis, grapevine, poplar, apple, pear, and peach 2,21,47 . The lower Ka/Ks values for TIR gene families than for non-TIR gene families indicates greater functional constraints during the evolution of TIR gene subfamilies 48 and stronger diversifying selection in non-TIR gene families, which could provide variation that would allow plants to adapt to different pathogens in their environment 47 .

Duplication events of NBS-encoding genes in Chinese chestnut. Gene duplication plays a crit-
ical role in the generation of new R genes, increasing the number of these genes and dispersing them in the genome 7 . The number of NBS-encoding genes in Chinese chestnut was 3-and 1.2-fold that in Arabidopsis and poplar, respectively. To identify duplicated gene pairs, we defined a gene family according to three criteria. Using 70% criterion of both cutoff of coverage and sequence identity of not less than 70%, 64 gene families were detected, including 273 NBS-encoding genes in Chinese chestnut (53.60%) ( Table 2). The percentage of NBS-encoding genes occurring in multi-gene families in Chinese chestnut was nearly the same as that in the rice genome (53.7%) and lower than that in the poplar genome (78.1%). Using our 80% criterion of both cutoff of coverage and sequence identity of not less than 80%, 41.43% of NBS genes were found to belong to multi-gene families in Chinese chestnut, while 74.52% of NBS genes occurred in multi-gene families in the poplar genome. However, the proportion of genes in multi-gene families sharply declined to 19.46% when the 90% criterion was used, which showed that relatively recent duplications resulted in a small portion of the multi-gene families in Chinese chestnut.
The number of synonymous substitutions per synonymous site and the frequency distribution of Ks values could be used to infer the age of genome duplications 21 . The peak and distribution of Ks values was distinct in different species. Ks peaks varied greatly, occurring at Ks values from 0 to 0.1 in grape and poplar, and at Ks values from 0.1 to 0.2 in Arabidopsis, apple, pear, peach, and Prunus mume, but at Ks values from 0.3 to 0.4 in rice and strawberry 21,47 . In Chinese chestnut, Ks peaked in the range of values from 0.4 to 0.5 (Fig. 1), a much higher Ks value than that in poplar (Ks = 0-0.1) or in other species, indicating that the gene expansions of Chinese chestnut were more ancient. Furthermore, the Ks values for Chinese chestnut were mainly distributed in a range from 0.2 to 0.5 and were rarely greater than 0.5 (Fig. 1). The fact that 63.94% of multiple NBS-encoding genes were present in the range of Ks values from 0.2 to 0.5, demonstrating that relatively ancient duplications played an important role in the expansion of NBS-encoding genes in Chinese chestnut.

Species-specific duplication driving in the expansion of NBS-encoding genes in Chinese chestnut.
Gene duplication has supplied raw genetic material for evolution 49 and has been a major force for generating biological novelties that can lead to adaptation to environments 50 . To elucidate the evolutionary pattern of NBS-encoding genes in Chinese chestnut, a phylogenetic tree of conserved NBS domain from Chinese chestnut and poplar genomes was constructed (Fig. 3). The most distinct characteristic was species-specific duplication events, that might be responsible for the evolution of recognition of species-specific pathogens, and that responsed to selective pressure imposed by species-specific pathogens 21 .
The 401 species-specific NBS-encoding genes detected in Chinese chestnut were classified into 81 clades with bootstrap values were greater than 50%. The proportion of the species-specific duplicates reached 77.26%, suggesting large-scale genes expansion after the divergence of Chinese chestnut and poplar. A similar result has also been reported for four gramineous plants (Zea mays, Sorghum bicolor, Brachypodium distachyon, and Oryza sativa) 51  protein encoded by the polymorphic powdery mildew resistance locus RPW8 12 . In previous studies, proteins containing the RPW8 domain have been categorized into the non-TIR-NBS-encoding genes subgroup 13,14 . However, 15 RPW8 domain-encoding genes (1 XNs, 2 TNLs, 3 CNLs and 9 XNLs) clustered into an independent lineage in phylogenetic relationships (Additional file 2: Figure S1). Also, five NBS-encoding-genes from poplar encoding the RPW8 domain clustered together with 15 RPW8-NBS-encoding genes from Chinese chestnut. The clustering of RPW8-NBS-encoding genes observed in Chinese chestnut also occurred with RPW8-NBS-LRR genes from M. truncatula 15 , potato 16 , soybean, common bean, and pigeon pea 17 , which illustrated that RPW8-NBS-encoding genes did not comprise a special group of non-TIR genes but formed a separate group independent from the non-TIR genes. Moreover, the group of RPW8-NBS-encoding genes located at a relatively basal but not the most basal position in this phylogenetic tree of Chinese chestnut (Additional file 2: Figure S1). This result is similar to the study in Arabidopsis that a monophyletic group (the CNL-A group) consisted of RNL genes 2,17 . Thus, the RPW8-NBS-encoding genes were phylogenetically separate from the non-TIR genes.

Materials and Methods
Identification of NBS-encoding genes in Chinese chestnut and classification of gene family members. The entire genome sequence and annotation V1.0 of Chinese chestnut (C. mollissima) were downloaded from the hardwood genomics project (http://www.hardwoodgenomics.org/chinese-chestnut-genome). To identify NBS-encoding genes in Chinese chestnut, we used the amino acid sequence of the NB-ARC domain (PF00931) as blastp query against all known protein sequences with the threshold expectation value set to 1.0. All hits were further submitted to Pfam analysis (http://pfam.xfam.org/) to verify the presence of the NBS (NB-ARC) domain. Furthermore, the identified NBS-encoding genes were examined to detect whether they encode the LRR or TIR domain using the merged results from Pfam analysis and SMART protein motif analysis (http://smart.embl-heidelberg.de/). Finally, all identified genes were examined to detect the presence of CC domain using COILS (http://embnet.vital-it.ch/ software/COILS_form.html) databases with a threshold of 0.9 52 in Chinese chestnut and RPW8 domain using Pfam analysis in Chinese chestnut and poplar.
Genes were grouped into gene families according to two criteria, including the cutoff of coverage (aligned sequence lengths/gene lengths), and the sequence identity of not less than 70%. Likewise, the  stricter criteria for both coverage and sequence identity of not less than 80% and 90%, respectively, were used to detect the relatively recent duplications among NBS-encoding genes.

Sequence alignment and phylogenetic analyses. The NBS domain sequences of all identified
NBS-encoding genes were aligned in MEGA 5.0 using the MUSCLE program 53 . A neighbor-joining (NJ) method was then applied to construct a phylogeny of NBS-encoding genes using ClustalW 2.0 with default options and 1000 bootstrap replications 54 . Data for NBS-encoding genes from poplar (Populus trichocarpa) were obtained from a previous study 21 .
The ratio of nonsynonymous substitutions to synonymous substitutions. To detect the mode of selection, we evaluated the ratio of nonsynonymous substitutions to synonymous substitutions. Firstly, based on the protein sequences, the CDSs (nucleotide coding sequences) of NBS-encoding genes in each gene family were aligned using ClustalW 2.0 54 . Subsequently, nonsynonymous substitutions (Ka), package 55 was used for the site model and branch model test to determine selective pressure on NBS-LRR genes in Chinese chestnut. A single dN/dS ratio (model = 0), in addition to models M7 (beta) and M8 (beta-ω ) (NS site = 7 8) were used for the site model in all gene families with at least three members. Subsequently, the LR test between model M7 and M8 was carried out with critical criteria of chi-square 5.991 (p < 0.05, df = 2) and 9.210 (p < 0.01, df = 2), respectively. For the branch model, a single dN/dS ratio (model = 0) and model 0 (NS site = 0) were applied in the codeml program.

Promoter regions analysis.
To characterize the cis-acting element(s) of NBS-encoding genes, we isolated an approximately 1000 bp promoter sequence of each NBS-encoding gene. Analysis of promoter sequences was conducted using SIGNALSCAN program available in Plant cis-acting regulatory DNA Elements (PLACE) (http://www.dna.affrc.go.jp/PLACE/), a database containing mainly plant motifs extracted from the published reports 56 .

Conclusions
NBS-LRR genes, as one of the largest families of R genes, were analyzed in Chinese chestnut (Castanea mollissima), a model species for the Fagaceae, to determine their pattern of evolution. In the present study, several TNLs were identified in Chinese chestnut that were absent from monocot genomes. In addition, we found a relatively ancient duplication in Chinese chestnut compared with poplar. The expansion of NBS-encoding genes could be attributed to such species-specific duplications during the evolution of Chinese chestnut. The values for Ka/Ks in all TIR and most non-TIR gene families were less than 1, indicating purifying selection as a leading force in the evolution of NBS-encoding genes. However, the Ka/Ks values for four non-TIR gene families were greater than 1, demonstrating that their evolution was driven by positive selection. Furthermore, the relationship between Ka/Ks and Ks illustrated higher selective pressure on the newer and older genes compared with genes in the critical range of Ks from 0.4 to 0.5. Interestingly, RPW8-NBS-encoding genes clustered into an independent clade at a relatively basal, but not most basal, position in this phylogenetic analysis. Finally, many cis-elements in NBS-encoding genes promoter were related to disease resistance, which demonstrated the function in responsing pathogens and laid the foundation of identifying candidate R genes.