Identification and characterisation of Dof transcription factors in the cucumber genome

Cucumber is vulnerable to many foliar diseases. Recent studies have reported the cloning of candidate genes associated with several cucumber diseases; however, the underlying defence mechanisms remain unclear. Dof genes, which encode plant-specific transcription factors, have been shown to play significant roles in plant growth, development, and responses to biotic and abiotic stresses, and they can promote large-scale expression of defence-related genes at the whole-genome level. Members of this family have been identified and characterised in several plant species, but not in cucumber. In the present study, we identified 36 CsDof members in the cucumber draft genomes, which could be classified into eight groups. The proportions of CsDof family genes, duplication events, chromosomal locations, cis-elements and miRNA target sites were comprehensively investigated. Furthermore, we analysed the expression patterns of CsDof genes in specific tissues and their responses to two biotic stresses (watermelon mosaic virus and downy mildew). These results indicate that CsDof genes may be involved in resistance to biotic stresses in cucumber.

development 25,26 . In potato, StDof1 is expressed specifically in guard cells, where it interacts with the promoter of the KST1 gene 13 . Rice OsDof3 was reported to be involved in gibberellin-regulated gene expression 27 . Maize Dof1 and Dof2 were shown to regulate genes associated with carbohydrate metabolism, including the key C4 photosynthetic gene phosphoenolpyruvate carboxylase (PEPC) 28 .
Dof transcription factors have been functionally characterised in Arabidopsis 22,23 and in a number of crops such as tomato 18 , rice 27 and soybean 29 . However, no such information is available for cucumber. The objective of the present study was to conduct a genome-wide characterisation of the Dof gene family in cucumber. We identified 36 CsDof members in the cucumber draft genome, which were classified into eight subgroups. The proportions of CsDof family genes, duplication events and chromosomal locations were investigated. We also analysed the expression patterns of CsDof members in various tissues and their responses to inoculation with watermelon mosaic virus (WMV) and the downy mildew (DM) pathogen. Our study provides novel insights into the responses of CsDof genes to these two important cucumber pathogens and deepens our understanding of the structure and function of CsDof genes in the cucumber genome.

Results
Identification of CsDof homologues in the cucumber genome. To identify the Dof transcription factor coding genes in the cucumber genome, we used the HMM profile of the Dof domain (PF02701) as a query to perform an HMMER search (http://hmmer.janelia.org/) against the 9930 and Gy14 cucumber draft genomes. Thirty-six Dof gene sequences were identified with hmmsearch (http://www.ebi.ac.uk/Tools/hmmer/search/hmmsearch). A SMART search (http://smart.embl-heidelberg.de/) then confirmed the presence of the conserved Dof domain in all 36 putative CsDof genes. For convenience, the 36 CsDof genes were named CsDof01 to CsDof36 according to their chromosomal locations (Table 1). The CsDof genes encode predicted peptides of 150 to 503 aa, with pI values from 4.8 to 9.6 and molecular weights from 16.9 to 55.2 kDa. All CsDof proteins were predicted to localise to the nucleus, except for CsDof30, which was predicted to be extracellular (Table 1). In addition, BLASTN comparisons showed that nine CsDof genes (CsDof02, CsDof04, CsDof10, CsDof11, CsDof13, CsDof16, CsDof18, CsDof28 and CsDof36) carried polymorphisms between 9930 and Gy14; the SNPs and their locations within each gene are presented in Supplementary Table S1.
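The HMMER step can be pictured as post-processing of the per-sequence table written by `hmmsearch --tblout hits.tbl PF02701.hmm proteins.fa`. This is a minimal sketch, not the authors' pipeline: it assumes the standard HMMER3 `--tblout` layout (target name in column 1, full-sequence E-value in column 5), and the E-value cutoff of 1e-5, the file names and the gene IDs are hypothetical, since no threshold is reported. Hits passing the filter would still need domain confirmation (e.g. by SMART) as described above.

```python
def parse_tblout(lines, evalue_cutoff=1e-5):
    """Return {target_name: full-sequence E-value} for hits passing the cutoff.

    Assumes the HMMER3 --tblout layout: whitespace-delimited columns with the
    target name in column 1 and the full-sequence E-value in column 5;
    lines starting with '#' are comments. The cutoff is a hypothetical choice.
    """
    hits = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        # 18 fixed columns, then a free-text description that may contain spaces
        fields = line.split(maxsplit=18)
        target, evalue = fields[0], float(fields[4])
        if evalue <= evalue_cutoff:
            # Keep the lowest E-value if a target is listed more than once
            hits[target] = min(evalue, hits.get(target, evalue))
    return hits


# Hypothetical two-line table: one strong Dof hit and one weak hit
example = [
    "# target name  accession  query name  accession  E-value ...",
    "geneA - Dof PF02701.15 1.2e-30 105.3 0.5 1.5e-30 105.0 0.5 1.0 1 0 0 1 1 1 1 Dof zinc finger protein",
    "geneB - Dof PF02701.15 0.8 9.1 0.0 0.9 8.8 0.0 1.0 1 0 0 1 1 1 0 weak hit",
]
print(parse_tblout(example))
```

Only `geneA` survives the filter; the weak hit is discarded before any domain check.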

Chromosomal locations and duplications of Dof members. As shown in
In this study, CsDof gene duplications were analysed using the CoGe website. Two pairs of tandemly duplicated genes were identified (CsDof06/CsDof07 and CsDof34/CsDof35), and six pairs of segmentally duplicated genes were observed (CsDof01/CsDof03, CsDof02/CsDof04, CsDof03/CsDof25, CsDof07/CsDof35, CsDof09/CsDof36 and CsDof20/CsDof24) (Fig. 2). To date the duplication blocks, we estimated Ka and Ks values and Ka/Ks ratios. The segmental duplications of CsDof genes in cucumber originated from approximately 4.639 Mya (million years ago, Ks = 0.6031) to 11.66 Mya (Ks = 1.5161), with an average of 9.079 Mya (Ks = 1.180). The Ks of the tandem duplication of CsDof06 and CsDof07 was 1.8735, dating that duplication event to 14.41 Mya; that of CsDof34 and CsDof35 was 1.3398, dating it to 10.31 Mya (Supplementary Table S2).
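The dates above follow from the relation T = Ks/(2λ). As a minimal sketch, the snippet below converts the quoted Ks values into dates; the rate λ = 6.5 × 10^-8 substitutions per synonymous site per year is an assumption inferred from the numbers themselves, because it is the value that reproduces every date reported in this section.

```python
# Assumed rate: the value that reproduces the Ks/date pairs reported in the text
LAMBDA = 6.5e-8  # synonymous substitutions per synonymous site per year


def divergence_time_mya(ks, lam=LAMBDA):
    """Divergence time T = Ks / (2 * lambda), converted to million years."""
    return ks / (2 * lam) / 1e6


# Ks values quoted in the text for the tandem pairs and the segmental range
reported_ks = {
    "CsDof06/CsDof07 (tandem)": 1.8735,
    "CsDof34/CsDof35 (tandem)": 1.3398,
    "youngest segmental pair": 0.6031,
    "oldest segmental pair": 1.5161,
}

for pair, ks in reported_ks.items():
    print(f"{pair}: Ks = {ks} -> {divergence_time_mya(ks):.2f} Mya")
```

Run as written, this prints 14.41, 10.31, 4.64 and 11.66 Mya, and the mean of the two tandem dates is 12.36 Mya.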
Gene structure and conserved motif analysis. To investigate exon-intron structure, all CsDof genes were analysed using the Gene Structure Display Server. As shown in Fig. 3, the CsDof genes contained relatively few exons, ranging from one to three: 17 members had one exon, 14 had two, and five (CsDof05, CsDof15, CsDof20, CsDof27 and CsDof35) had three. Furthermore, CsDof genes within the same subgroup tended to share similar exon-intron structures in terms of exon number. For instance, none of the CsDof genes in subgroup A contained introns. The two-exon CsDof genes were mainly found in group C (four in C1, C2.1 and C2.2) and group D (five in D1 and one in D2), whereas most of the three-exon genes belonged to group B (B1 and B2). These shared structural features may be related to the functions of these genes (Fig. 3).
To explore the diversity of motif composition in CsDof proteins, the proteins were analysed using the MEME programme. A total of 15 conserved motifs were identified (Supplementary Table S3; Supplementary Fig. S2), each shown at its relative position within the protein. Motif 1 was present in all CsDof proteins and corresponds to the conserved Dof domain. Moreover, the CsDofs in each subgroup shared several subgroup-specific motifs in their C-terminal regions, suggesting that members of the same subgroup may have similar functions (Supplementary Fig. S2). Groups A, C2.2 and D2 contained only one conserved motif (motif 1). Group D1 members contained motifs 4 and 12, and group B1 members contained motifs 2 and 7.

Cis-element analysis of CsDof genes. Many key defence-related cis-elements (such as ACGTABOX, ASF1MOTIFCAMV and WBOXATNPR1) were identified in the CsDof promoter regions. Other key elements were involved in hormone signalling, such as ARFAT (auxin), ASF1MOTIFCAMVT (auxin), ERELEE4 (ethylene), GAREAT (gibberellin), LECPLEACS2 (ethylene), MYBGAHV (gibberellin), NTBBF1ARROLB (auxin) and TATCCAOSAMY (gibberellin). Some CsDof genes harboured binding sites for other transcription factors, e.g., MYB1AT, MYBGAHV, MYBCOREATCYCB1 and WRKY71OS (Supplementary Table S4). All CsDof gene promoters contained the Dof core element DOFCOREZM, suggesting that CsDof genes may be regulated by Dof proteins, possibly including themselves. Additionally, in silico analysis showed that the CsDof gene promoters contained several tissue-specific elements, including root-specific (ROOTMOTIFTAPOX1, RAV1AAT and SP8BFIBSP8BIB), leaf-specific (CCA1ATLHCB1, DOFCOREZM, GATABOX, GT1CONSENSUS, IBOXCORE and RAV1AAT) and flower-specific (CARGCW8GAT) elements (Supplementary Table S4).
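The promoter scan was run with PLACE's online tool. The underlying idea, searching both DNA strands for IUPAC consensus strings, can be sketched briefly. This is a minimal illustration, not PLACE itself: the three consensus sequences (DOFCOREZM = AAAG, WBOXATNPR1 = TTGAC, ARFAT = TGTCTC) are our transcription of the PLACE entries and should be checked against the database, and the promoter string is a toy example.

```python
import re

# IUPAC nucleotide codes translated into regex character classes
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "[AG]", "Y": "[CT]",
         "W": "[AT]", "S": "[CG]", "K": "[GT]", "M": "[AC]", "N": "[ACGT]"}
COMPLEMENT = str.maketrans("ACGTRYWSKMN", "TGCAYRWSMKN")

# Assumed consensus sequences (verify against PLACE before reuse)
MOTIFS = {"DOFCOREZM": "AAAG", "WBOXATNPR1": "TTGAC", "ARFAT": "TGTCTC"}


def to_regex(consensus):
    """Turn an IUPAC consensus string into a regular expression."""
    return "".join(IUPAC[base] for base in consensus)


def scan_promoter(seq, motifs=MOTIFS):
    """Return (motif, strand, 1-based start on the given strand's coordinates) hits."""
    seq = seq.upper()
    hits = []
    for name, consensus in motifs.items():
        # The minus-strand site, read on the plus strand, is the reverse complement
        rc = consensus.translate(COMPLEMENT)[::-1]
        for strand, site in (("+", consensus), ("-", rc)):
            # A zero-width lookahead reports overlapping occurrences too
            for m in re.finditer(f"(?={to_regex(site)})", seq):
                hits.append((name, strand, m.start() + 1))
    return sorted(hits, key=lambda h: (h[2], h[0], h[1]))


print(scan_promoter("CCAAAGTTGACCTTTGG"))
```

On the toy sequence the scan reports a plus-strand Dof core at position 3, a W-box at position 7 and a minus-strand Dof core (CTTT) at position 12. Palindromic consensus sequences would be reported once per strand with this approach.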

Responses to DM and WMV pathogen inoculations. Of the 36 CsDof genes, 22 were responsive to DM pathogen inoculation (Fig. 5). Of these, four CsDof genes (CsDof07, CsDof17, CsDof19 and CsDof34) had lower expression than the control, whereas two genes (CsDof27 and CsDof29) showed higher expression than the control after inoculation. Additionally, five members (CsDof03, CsDof18, CsDof28, CsDof35 and CsDof36) were first up- and then down-regulated. Moreover, four genes (CsDof10, CsDof12, CsDof19 and CsDof31) showed their highest expression on the first day after DM inoculation, whereas the expression of seven genes (CsDof02, CsDof04, CsDof18, CsDof25, CsDof28, CsDof35 and CsDof36) peaked on the second day after inoculation. Quantitative real-time PCR (qRT-PCR) was conducted for all 36 CsDof genes to study their expression after inoculation with the WMV viral pathogen (Fig. 6). Of these, 34 were up- or down-regulated after WMV inoculation, whereas no changes were observed for CsDof08 or CsDof16. Three genes (CsDof02, CsDof14 and CsDof35) were initially up-regulated but down-regulated by 24 d. The expression of ten genes (CsDof02, CsDof09, CsDof12, CsDof14, CsDof15, CsDof19, CsDof20, CsDof22, CsDof26 and CsDof32) increased on the first day after inoculation. Overall, 20 CsDof genes showed higher expression on the third day after WMV inoculation. CsDof28, CsDof30 and CsDof33 were highly expressed at 3, 9 and 24 d after inoculation, respectively, whereas the expression of CsDof04 and CsDof13 peaked at 18 d after inoculation.

Discussion
In recent years, gene family analysis has become an important approach to understanding gene structure, function and evolution. Dof genes encode plant-specific transcription factors that are involved in various biological processes and are ubiquitous in plants. The function and evolution of Dof genes have been thoroughly studied in Arabidopsis (36 AtDof genes), rice (30 OsDof genes) 30 , soybean 29   we conducted a comprehensive analysis of the CsDof family in cucumber to determine their potential functions in response to biotic stresses.
As in rice and Arabidopsis 30 , the cucumber Dof genes contained few introns (0-2 per gene) (Fig. 3). Motif analysis showed that motif 1, the Dof domain, was present in all CsDof proteins (Supplementary Fig. S2). Consistent with results from Arabidopsis, rice 30 and tomato 18 , this suggests that the Dof domain of CsDof transcription factors is evolutionarily conserved in plants.
Whole-genome duplication, segmental and tandem duplication, and transposition events are major drivers of gene family expansion. Cucumber has not experienced a recent whole-genome duplication 1 ; thus, the other duplication modes are likely to have played key roles in gene family expansion. In this study, we found eight pairs of CsDof genes that had undergone duplication (Fig. 2): six pairs of segmental duplications and two pairs of tandem duplications. This indicates that segmental duplication was the predominant mode of CsDof expansion in cucumber, with tandem duplication also contributing 18 . The duplication divergence of monocots  32 . In this study, the mean date of the CsDof tandem duplication events was 12.36 Mya, whereas that of the segmental duplication events was 9.079 Mya. This indicates that both segmental and tandem duplications occurred after the monocot-dicot split and that, on average, the tandem duplications of cucumber CsDof genes preceded the segmental duplications.
Gene expression patterns provide important clues to gene function; thus, we conducted a digital gene expression analysis of the duplicated CsDof genes in root, leaf, stem, tendril and flower tissues using public RNA-seq data 33 . The Ka/Ks ratio analysis suggested that the CsDof genes diverged functionally after the duplication events. All eight duplicated pairs showed distinct expression patterns, both across tissues and in response to biotic stress. For example, in the segmentally duplicated pair CsDof07/CsDof35, CsDof07 was highly expressed in roots, whereas CsDof35 was highly expressed in female flowers (Fig. 4). CsDof07 was expressed at a relatively low level and peaked on the third day after WMV inoculation, whereas CsDof35 was expressed at a higher level and peaked on the sixth day after WMV inoculation (Fig. 6). The tandemly duplicated CsDof genes also showed markedly different expression. CsDof34 and CsDof35 were specifically expressed in leaves and female flowers, respectively; under DM inoculation, CsDof34 showed its highest expression at day zero (mock treatment), whereas CsDof35 peaked at 2 d after inoculation (Fig. 5). These results indicate that the duplicated CsDof genes may play important and diversified roles in plant development.
Dof TFs have been shown to play crucial roles in the regulatory networks of plant defence, including responses to diverse biotic and abiotic stresses 15,34,35 . In tobacco, the Sar8.2b gene, which is involved in systemic acquired resistance, can be activated by a Dof transcription factor 36 . In tomato, a Dof transcription factor was found to regulate fungal resistance through the ACBP3 gene, which promoted autophagy-mediated leaf senescence and conferred resistance to Pseudomonas syringae pv. tomato DC3000 37 . In barley seeds, Dof transcription factors play a role in biotic stress tolerance through their association with the cystatin gene 38 . Dof genes may also act indirectly in biotic stress responses, and we speculate that CsDof transcription factors contribute to resistance either directly or indirectly, possibly by activating different target resistance genes in cucumber. In this study, we investigated the response patterns of CsDof genes to DM and WMV inoculations. We found that 19 and 34 Dof genes were up-regulated after DM and WMV inoculations, respectively, suggesting that most of these Dof genes may play positive roles in host defence responses; however, additional work is needed to confirm their roles.
To summarise, we performed a comprehensive analysis of the CsDof transcription factor genes in the cucumber genome. The 36 CsDof genes were categorised into eight subgroups, and the structural and functional properties of each CsDof member were characterised. Most CsDof genes were responsive to biotic stresses. Our work will help in understanding the roles of these CsDof transcription factors in response to biotic stresses and their potential interactions with defence-related genes in the disease resistance network.

pfam.sanger.ac.uk/). To identify the Dof transcription factor coding genes of Cucumis sativus, we used the HMM profile of the Dof domain as a query to perform an HMMER search (http://hmmer.janelia.org/) against the cucumber genome databases (http://www.icugi.org/; http://cucumber.genomics.org.cn/; http://wenglab.horticulture.wisc.edu/). All non-redundant sequences encoding complete Dof domains were considered putative Dof genes, and each was double-checked for the presence of the conserved Dof domain using a SMART search (http://smart.embl-heidelberg.de/). The candidate CsDof genes were named according to their positions on the seven cucumber chromosomes. The ExPASy server (http://web.expasy.org/compute_pi/) 39 was used to compute the pI and molecular weight of the identified CsDof proteins. The subcellular localisation of the CsDof proteins, including nuclear localisation, was predicted using CELLO (http://cello.life.nctu.edu.tw/).
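For readers who want to recompute the Table 1 properties offline, the two quantities reported by ExPASy Compute pI/Mw can be approximated as follows. This is a hedged sketch, not ExPASy's method. The molecular weight uses standard average residue masses plus one water molecule. The pI is found by bisection on the net charge, but it uses an illustrative textbook pKa set (an assumption): ExPASy uses the Bjellqvist pK scale, so pI values from this sketch will differ slightly. The example peptide is hypothetical.

```python
# Average residue masses (Da) of amino acids within a peptide chain
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01524  # added once for the free N- and C-termini

# Illustrative textbook pKa values (assumption; ExPASy uses Bjellqvist values)
PKA_POS = {"Nterm": 9.69, "K": 10.53, "R": 12.48, "H": 6.00}
PKA_NEG = {"Cterm": 2.34, "D": 3.86, "E": 4.25, "C": 8.33, "Y": 10.07}


def molecular_weight(seq):
    """Average molecular weight (Da) of an unmodified peptide."""
    return sum(RESIDUE_MASS[aa] for aa in seq.upper()) + WATER


def net_charge(seq, ph):
    """Henderson-Hasselbalch net charge of the peptide at a given pH."""
    seq = seq.upper()
    counts = {aa: seq.count(aa) for aa in "KRHDECY"}
    counts["Nterm"] = counts["Cterm"] = 1
    pos = sum(counts[g] / (1 + 10 ** (ph - pka)) for g, pka in PKA_POS.items())
    neg = sum(counts[g] / (1 + 10 ** (pka - ph)) for g, pka in PKA_NEG.items())
    return pos - neg


def isoelectric_point(seq, tol=1e-3):
    """Bisection for the pH at which the net charge crosses zero."""
    lo, hi = 0.0, 14.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if net_charge(seq, mid) > 0:
            lo = mid  # still positively charged, so the pI lies at a higher pH
        else:
            hi = mid
    return (lo + hi) / 2


seq = "MDQSKKTAESGVRAPY"  # hypothetical peptide, not a CsDof sequence
print(f"MW = {molecular_weight(seq) / 1000:.2f} kDa, pI = {isoelectric_point(seq):.2f}")
```

Dividing the molecular weight by 1000 gives kDa, the unit used in Table 1.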

Phylogenetic characterisation of CsDof homologs.
For phylogenetic analysis of the plant Dof gene family, Arabidopsis nucleotide and protein sequences were obtained from previous studies 30 ; information on the Arabidopsis Dof genes is presented in Supplementary Table S6. Multiple sequence alignment of the cucumber and Arabidopsis Dof protein sequences was performed using ClustalW with default settings. An unrooted phylogenetic tree was then constructed from the alignment in MEGA 6.0 using the neighbour-joining (NJ) method with the JTT model, pairwise deletion of gaps and 1000 bootstrap replicates 40 . Maximum likelihood, minimum evolution and PhyML methods were also applied to validate the NJ tree.
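The neighbour-joining step that MEGA performs can be illustrated on a precomputed distance matrix. This minimal sketch implements the standard Saitou-Nei algorithm (Q-criterion, branch-length and distance updates) and returns a Newick string. It is not MEGA's implementation: it skips the JTT distance calculation and bootstrapping. The input is the classic five-taxon textbook matrix (taxa a-e), not Dof data.

```python
from itertools import combinations


def neighbor_joining(labels, matrix):
    """Build an unrooted tree (Newick string) by neighbour joining."""
    D = {a: {b: matrix[i][j] for j, b in enumerate(labels)} for i, a in enumerate(labels)}
    nodes = list(labels)
    while len(nodes) > 2:
        n = len(nodes)
        # Row sums r_i = sum over k of d(i, k)
        r = {a: sum(D[a][b] for b in nodes if b != a) for a in nodes}
        # Q-criterion: Q(i, j) = (n - 2) d(i, j) - r_i - r_j; join the minimising pair
        i, j = min(combinations(nodes, 2),
                   key=lambda p: (n - 2) * D[p[0]][p[1]] - r[p[0]] - r[p[1]])
        dij = D[i][j]
        li = 0.5 * dij + (r[i] - r[j]) / (2 * (n - 2))  # branch from i to new node
        lj = dij - li                                   # branch from j to new node
        u = f"({i}:{li:.3f},{j}:{lj:.3f})"
        D[u] = {u: 0.0}
        for k in nodes:
            if k not in (i, j):
                # Distance from the new internal node to every remaining node
                D[u][k] = D[k][u] = 0.5 * (D[i][k] + D[j][k] - dij)
        nodes = [k for k in nodes if k not in (i, j)] + [u]
    a, b = nodes
    return f"({a}:{D[a][b]:.3f},{b}:0.000);"


labels = ["a", "b", "c", "d", "e"]
matrix = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]
print(neighbor_joining(labels, matrix))
```

The first join pairs a and b with branch lengths 2 and 3, as in the textbook worked example. Because the tree is unrooted, the final connecting branch is attached to one side arbitrarily.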
Chromosomal locations and duplications. Chromosomal locations of the CsDof genes were determined via BLASTP searches against the cucumber genome databases with default settings. Tandem and segmental duplications in the cucumber genome were analysed using the CoGe web tool (https://genomevolution.org/CoGe/) 41 , and the duplicated genes were linked with coloured lines drawn using Circos 42 (http://circos.ca/). Synonymous (Ks) and non-synonymous (Ka) substitution rates were estimated with DnaSP 43 . The divergence time T (in million years ago, Mya) of each duplicated CsDof gene pair was estimated as $T = K_s/(2\lambda)$, where $\lambda$ is the synonymous substitution rate per synonymous site per year, with $\lambda = 6.5 \times 10^{-8}$ 33 .
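Ka and Ks are proportions of differing sites corrected for multiple hits. Nei-Gojobori-type estimators, as offered in DnaSP, commonly apply the Jukes-Cantor correction to these proportions. The sketch below shows only that correction and the resulting Ka/Ks ratio. It assumes the synonymous and non-synonymous site and difference counts have already been obtained, and the counts used here are hypothetical. DnaSP's own site-counting and options may differ from this simplification.

```python
import math


def jukes_cantor(p):
    """Jukes-Cantor corrected distance from a proportion p of differing sites."""
    if p >= 0.75:
        raise ValueError("p >= 0.75: distance is saturated and undefined under JC69")
    return -0.75 * math.log(1 - 4 * p / 3)


def ka_ks(nonsyn_diffs, nonsyn_sites, syn_diffs, syn_sites):
    """Return (Ka, Ks, Ka/Ks) from counted differences and sites."""
    ka = jukes_cantor(nonsyn_diffs / nonsyn_sites)
    ks = jukes_cantor(syn_diffs / syn_sites)
    return ka, ks, ka / ks


# Hypothetical counts for one duplicated gene pair
ka, ks, ratio = ka_ks(nonsyn_diffs=30, nonsyn_sites=600, syn_diffs=60, syn_sites=200)
print(f"Ka = {ka:.4f}, Ks = {ks:.4f}, Ka/Ks = {ratio:.3f}")
```

A Ka/Ks ratio below 1, as in this toy case, is conventionally read as purifying selection, and Ks is the quantity used in the divergence-time formula above.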
Gene structure analysis and conserved motif identification. The exon-intron organisation of the CsDof genes was determined using the Gene Structure Display Server (http://gsds.cbi.pku.edu.cn) by comparing their full-length cDNA or predicted coding sequences (CDS) with the corresponding genomic sequences 44 . Conserved motifs in the Dof protein sequences were identified using the MEME programme (http://meme-suite.org/tools/meme) with a motif width of 6-100 residues and 2-120 sites per motif. The maximum number of motifs was set to 15, the site distribution was set to "any number of repetitions", and the strand option was "search given strand only".
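GSDS derives exon-intron structure by comparing CDS with genomic sequence. The exon counts summarised in Fig. 3 can also be cross-checked from a genome annotation. This minimal sketch is an alternative approach, not the GSDS procedure: it counts GFF3 `exon` features per parent transcript (intron number is exon number minus one), assumes standard nine-column GFF3 with `Parent=` attributes, and uses hypothetical transcript IDs.

```python
from collections import Counter


def exon_counts(gff_lines):
    """Count exon features per parent transcript in GFF3 lines."""
    counts = Counter()
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "exon":
            continue
        attrs = dict(f.split("=", 1) for f in cols[8].split(";") if "=" in f)
        # An exon may be shared by several transcripts (comma-separated Parent)
        for parent in attrs.get("Parent", "").split(","):
            if parent:
                counts[parent] += 1
    return counts


# Hypothetical annotation: a two-exon and a single-exon (intronless) gene
gff = [
    "\t".join(["chr1", "demo", "exon", "100", "400", ".", "+", ".", "ID=e1;Parent=CsDofX.1"]),
    "\t".join(["chr1", "demo", "exon", "600", "900", ".", "+", ".", "ID=e2;Parent=CsDofX.1"]),
    "\t".join(["chr2", "demo", "exon", "50", "800", ".", "-", ".", "ID=e3;Parent=CsDofY.1"]),
]
for transcript, n_exons in exon_counts(gff).items():
    print(transcript, "exons:", n_exons, "introns:", n_exons - 1)
```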
Cis-element and miRNA target analysis. Mature miRNA sequences were downloaded from miRBase v20.0 (http://www.mirbase.org) and PMRD (http://bioinformatics.cau.edu.cn/PMRD). Known cucumber miRNAs were used to identify miRNA target genes in the CsDof family. Target prediction was conducted with the Plant Small RNA Target Analysis Server (psRNATarget: http://plantgrn.noble.org/psRNATarget) using default parameters, and alignments between known plant miRNAs and their potential CsDof targets were evaluated using previously described parameters 45 . To identify putative cis-acting regulatory elements in the promoter regions of CsDof genes, the 2000-bp sequences upstream of the translational start codon (ATG) were retrieved from the PGSC database. In silico promoter analysis was carried out using the PLACE database (http://www.dna.affrc.go.jp/PLACE/signalscan.html).
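psRNATarget scores miRNA-target complementarity with its own position-weighted expectation scheme. The basic bookkeeping behind such scores can be sketched as follows: align the miRNA (5'→3') antiparallel to a target site of equal length and penalise non-Watson-Crick positions. This is a deliberately simplified, hypothetical scoring (1 per mismatch, 0.5 per G:U wobble). It is not psRNATarget's algorithm, and it ignores seed-region weighting, bulges and target accessibility.

```python
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def pairing_score(mirna, target_site):
    """Simplified complementarity penalty for a miRNA and an equal-length site.

    Both sequences are given 5'->3'; miRNA position i pairs with target
    position len - 1 - i (antiparallel). DNA input is converted to RNA.
    Penalties (1.0 mismatch, 0.5 G:U wobble) are illustrative assumptions.
    """
    if len(mirna) != len(target_site):
        raise ValueError("sequences must have equal length")
    mirna = mirna.upper().replace("T", "U")
    target = target_site.upper().replace("T", "U")[::-1]  # read target 3'->5'
    score = 0.0
    for m, t in zip(mirna, target):
        if (m, t) in PAIRS:
            continue
        score += 0.5 if (m, t) in WOBBLE else 1.0
    return score


# Toy example: a perfectly complementary site scores 0.0
print(pairing_score("GGAACUUC", "GAAGUUCC"))
```

Lower penalties indicate stronger complementarity; a real prediction would apply psRNATarget's cut-offs rather than this toy score.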
Tissue-specific expression. The cucumber genome-wide digital gene expression was assessed as described by Baloglu et al. 33 . Illumina sequencing reads from RNA-Seq studies were retrieved from a public repository database (SRA, Sequence Read Archive) with the following accession numbers: SRR351499 (cucumber root tissue), SRR351905 (cucumber stem tissue), SRR351906 (cucumber leaf tissue), SRR351908 (cucumber male flower tissue), SRR351912 (cucumber female flower tissue) and SRR351910 (cucumber tendril tissue). All transcript data were analysed using Gene-E 3.0.240 (www.broadinstitute.org/cancer/software/GENE-E).
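The normalisation behind the digital expression values is not stated here. One common choice for comparing genes across RNA-seq libraries is RPKM (reads per kilobase of transcript per million mapped reads), sketched below as an assumption rather than as the procedure of Baloglu et al. The read counts, gene length and library sizes are hypothetical.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of gene length per million mapped reads."""
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and library size must be positive")
    return read_count / (gene_length_bp / 1_000) / (total_mapped_reads / 1_000_000)


# Hypothetical counts for one gene in two tissue libraries (not data from this study)
libraries = {"root": (500, 20_000_000), "leaf": (120, 16_000_000)}
gene_length = 1_500
for tissue, (reads, total) in libraries.items():
    print(tissue, round(rpkm(reads, gene_length, total), 2))
```

Dividing by gene length and library size makes values comparable across genes and across the six tissue libraries, which is what a tissue-level heatmap requires.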
Response to biotic stresses. To investigate the expression profiles of CsDof genes in response to downy mildew inoculation, GEO data reported in a previous study 46 were downloaded from NCBI (http://www.ncbi.nlm.nih.gov/pubmed/). The data were used to draw heatmaps with Gene-E 3.0.240 (www.broadinstitute.org/cancer/software/GENE-E).
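Expression heatmaps are often drawn after per-gene (row) standardisation, so that each gene's time course is shown relative to its own mean. Whether such scaling was applied to these data is not stated. The sketch below shows this common preprocessing step as an assumption, with hypothetical values, and uses the population standard deviation.

```python
import statistics


def row_zscores(matrix):
    """Standardise each row (gene) to mean 0 and SD 1 across its columns."""
    out = []
    for row in matrix:
        mean = statistics.fmean(row)
        sd = statistics.pstdev(row)
        # A constant row carries no relative information; map it to zeros
        out.append([0.0 if sd == 0 else (x - mean) / sd for x in row])
    return out


# Hypothetical expression values for two genes at three time points
print(row_zscores([[1.0, 2.0, 3.0], [5.0, 5.0, 5.0]]))
```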
We also experimentally tested the responses of the CsDof genes to WMV inoculation using the cucumber line 'Europe 8', which is susceptible to WMV. The original WMV isolate was kindly provided by Prof. Rosario Provvidenti of Cornell University (Ithaca, New York, USA) and was maintained in Cucurbita pepo plants. For inoculum preparation, diseased leaves were washed with ddH2O, and the suspension was diluted in 0.2 mol·L−1 phosphate buffer (pH 7.0) at 1:3 (w/v). Leaves were inoculated mechanically by rubbing (friction inoculation). Seeds of 'Europe 8' were disinfected by soaking in 55 °C water for 4 h, germinated in an incubator at 28 °C for 18 h and sown in 4 × 8 plug trays in a greenhouse. When the second true leaf was fully expanded, leaves were dusted with a small amount of 600-800 mesh emery, rub-inoculated and then rinsed with clean water; healthy leaves mock-inoculated with phosphate buffer served as the control. Three replicates of 10 seedlings each were used. Both inoculated and control seedlings were maintained in an insect-free growth chamber at 25-30 °C and 100% relative humidity (RH). Leaf samples were harvested at 0, 1, 3, 6, 9, 12, 18 and 24 days post-inoculation (dpi) for intercellular fluid extraction or gene expression analysis.
To investigate CsDof gene expression, total RNA was extracted using a Huayueyang Quick RNA Isolation Kit (Cat. No.: ZH120, Huayueyang Biotechnology, Beijing, China) following the manufacturer's protocol. To remove trace DNA contamination, the total RNA samples were treated with DNase (Cat. No.: D2270A, TaKaRa Biotechnology, Dalian, China). RNA quality and quantity were assessed by agarose gel electrophoresis and with a NanoDrop ND-2000 spectrophotometer (Thermo Fisher Scientific Inc., USA). For cDNA synthesis, 1 μg of high-quality total RNA was reverse-transcribed with oligo(dT) and random primers using SuperScript III Reverse Transcriptase (TaKaRa, Dalian, China) according to the manufacturer's instructions.
For qRT-PCR analysis, gene-specific primers for each CsDof gene were designed with Primer3 online (http://primer3.ut.ee/) (Supplementary Table S7). The cucumber ubiquitin extension protein gene (primer sequences: 5′-GGCAGTGGTGGTGAACATG-3′ and 5′-TTCTGGTGATGGTGTGAGTC-3′) was used as the reference gene 47 . qRT-PCR reactions were performed using the SYBR Premix Ex Taq™ kit (TaKaRa, Dalian, China) on a Roche LightCycler 480. All qRT-PCR experiments included three biological and three technical replicates. Relative gene expression was calculated using the 2^−ΔΔCt method, and the data were compiled into a heatmap with Gene-E 3.0.240 (www.broadinstitute.org/cancer/software/GENE-E).
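The 2^−ΔΔCt calculation can be written out in a few lines. In this minimal sketch, ΔCt normalises the target gene to the reference gene (here, the ubiquitin extension protein gene) within each condition. ΔΔCt is the difference between inoculated and control samples, and the fold change is 2^−ΔΔCt. Averaging replicate Ct values before differencing is an assumption about how replicates were combined, and the Ct values are hypothetical. The method assumes close to 100% amplification efficiency for both the target and the reference gene.

```python
import statistics


def relative_expression(ct_target_treated, ct_ref_treated, ct_target_control, ct_ref_control):
    """Fold change of a target gene (treated vs control) by the 2^-ddCt method."""
    # Delta Ct: target minus reference gene, within each condition
    d_ct_treated = statistics.fmean(ct_target_treated) - statistics.fmean(ct_ref_treated)
    d_ct_control = statistics.fmean(ct_target_control) - statistics.fmean(ct_ref_control)
    dd_ct = d_ct_treated - d_ct_control
    return 2 ** (-dd_ct)


# Hypothetical replicate Ct values for one CsDof gene at one time point
fold = relative_expression(
    ct_target_treated=[24.0, 24.2, 23.8],
    ct_ref_treated=[18.0, 18.1, 17.9],
    ct_target_control=[26.0, 26.1, 25.9],
    ct_ref_control=[18.0, 18.0, 18.0],
)
print(f"fold change = {fold:.2f}")
```

Here the target gene's ΔCt drops from 8 to 6 after inoculation, so ΔΔCt = −2 and the gene is about four-fold up-regulated. Values of this kind, one per gene and time point, are what the heatmap displays.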