Functional divergence and origin of the DAG-like gene family in plants

The nuclear-encoded DAG-like (DAL) gene family plays critical roles in organelle C-to-U RNA editing in Arabidopsis thaliana. However, the origin, diversification and functional divergence of DAL genes remain unclear. Here, we analyzed the genomes of diverse plant species and found that: DAL genes are specific to spermatophytes, all DAL genes share a conserved gene structure and protein similarity with the inhibitor I9 domain of subtilisin genes found in ferns and mosses, suggesting that DAL genes likely arose from I9-containing proproteases via exon shuffling. Based on phylogenetic inference, DAL genes can be divided into five subfamilies, each composed of putatively orthologous and paralogous genes from different species, suggesting that all DAL genes originated from a common ancestor in early seed plants. Significant type I functional divergence was observed in 6 of 10 pairwise comparisons, indicating that shifting functional constraints have contributed to the evolution of DAL genes. This inference is supported by the finding that functionally divergent amino acids between subfamilies are predominantly located in the DAL domain, a critical part of the RNA editosome. Overall, these findings shed light on the origin of DAL genes in spermatophytes and outline functionally important residues involved in the complexity of the RNA editosome.

interact selectively with diverse PPR proteins by the binding of the DAL domain to PPR motifs 17 , and moreover, DAL proteins can connect to form hetero-and homodimers 16 . A variation of the DAL gene (ORRM1) was identified and functionally analyzed; it harbors a pair of truncated RIP domains (RIP-RIP) at its N terminus and an RRM domain at its C terminus 28 . ORRM1 is an essential plastid editing factor that can interact selectively with PPR proteins via its RIP-RIP domain, and the ORRM1 protein can also bind to sequences near at least some of its RNA targets in vitro 28 . Furthermore, the RRM domain can rescue the editing defect in orrm1 protoplasts independent of RIP domains, and three other RRM-containing proteins were identified because of their roles in organelle RNA editing, suggesting that the RRM domain participates in the RNA editosome [18][19][20] . Together, DAL proteins may be connectors between the site-specific PPRs and the as-yet-unknown deaminase or other components in the RNA editosome, such as RRM-containing proteins, PPO1 21 and OZ1 22 .
Compared with the RNA editosomes responsible for C-to-U or A-to-I (deamination of adenosine to inosine) RNA editing in mammals, the plant organelle RNA editosomes have more diverse components 15 . In addition to the interpretation that more RNA-edited sites in plant organelles require more trans-acting editing factors, the diverse composition of the organelle RNA editosome in plants probably overcomes the deficiency in RNA editing caused by the mutation of PPR protein or changes in the cis-acting sequences of edited sites 15 , especially for those edited sites in plant mitochondrial genomes which evolve much more quickly. Thus, the origin, classification and evolution analysis of trans-acting factors is important for understanding the evolution and molecular mechanism of the RNA editosome in plants. In the RNA editosome, DYW-PPR genes undergo purifying selection at sites targeted for RNA editing because they are important for recognizing cis-element sequences 1,29,30 . However, the functional evolution and origin of the DAL gene family is unknown.
In this study, we identified the DAL proteins in various plant lineages, including green algae, moss, ferns, gymnosperms and flowering plants, to investigate functional divergence and origin of the DAL gene family in plants. The result indicated DAL genes are specific to spermatophytes other than to lower plants. Plant DAL genes shared a strong conserved gene structure and appear to have evolved from the I9-containing proprotease via exon shuffling. Functional divergence analysis revealed that there was significant functional divergence between different DAL clades which may be associated with differences in the roles different DAL genes play in RNA editing and RNA metabolism. The evolutionary and functional divergence analysis of the DAL genes in plants presented here provides useful information for further probing the molecular mechanism by which DAL proteins contribute to the RNA editosome.

Results
Identification and sequence analysis of DAL genes in maize. To identify putative DAL genes in the maize genome, we searched the maize genome annotation data with known plant DAL proteins as a query. In total, we obtained 7 putative DAL genes in maize named ZmDAL1-ZmDAL7 based on their order on the chromosomes ( Fig. 1a and Supplementary Table S1). ZmDALs were distributed on 5 of 10 maize chromosomes, and chromosomes 9 and 10 both had two ZmDAL genes ( Supplementary Fig. S1). The gene model of ZmDAL1 was reannotated correctly by analyzing the similarity between ZmDAL genes and their orthologs ( Supplementary  Fig. S2). The veracity of each gene model of ZmDAL genes was assessed using reverse transcription polymerase chain reaction (RT-PCR) assays with the gene-specific primers listed in Supplementary Table S2, as 4 of 7 ZmDAL genes had more than one transcript for each ZmDAL gene in the MaizeGDB database (http://www.maizegdb. org/). The RT-PCR results indicated that seven ZmDAL genes were expressed in maize seedlings and only a single transcript was found for each ZmDAL gene ( Supplementary Fig. S3). All identified maize DAL genes encoded proteins ranging from 215 (ZmDAL1) to 412 amino acids (aa) (ZmDAL2), and their isoelectric points (Ip) were similar (>8.0).
No known motif was found in the maize DAL proteins by screening the PFAM and INTERPRO databases, except the MORF box (called the DAL domain in this study) which had been identified previously 16,17 . Novel putative motifs were explored using the MEME server with different motif lengths. By selecting a motif length between 10 and 50 aa, we identified 4 conserved motifs, and all 4 motifs were located in the DAL domain (Fig. 1b,c,d), suggesting that the DAL domain is a conserved sequence among Arabidopsis and maize DAL proteins. To obtain an intact motif containing the DAL domain, we enlarged the MEME motif length and identified one motif containing 114 aa ( Supplementary Fig. S4). Like their homologs in Arabidopsis 16 , maize DAL proteins were predicted using TargetP and Predotar to enter mitochondria or chloroplasts. Of them ZmDAL1 and ZmDAL6 were also detected in the plastid nucleoid proteome by searching the maize organelle proteomics database (PPDB, http://ppdb.tc.cornell.edu/) (Fig. 1a).
The gene structures of the ZmDAL genes were constructed by aligning the extracted genomic sequences to predicted cDNA sequences of maize DAL genes. This showed that ZmDAL genes have a conserved gene structure (Fig. 2); each of the ZmDAL genes has 3 introns with the intron phases 2, 1 and 1 separating DAL domain-encoding exons 1, 2, 3 and 4 (Fig. 2b). Motifs 1 and 2 are encoded by exon 1; motif 3 is encoded by exons 2 and 3; and motif 4 is located in exon 4 ( Fig. 2c and d). Furthermore, the length of exons 2 (98 basepairs, bp) and 3 (66 bp) is conserved among all five ZmDAL genes, even though the size of the introns between the exons varies between different ZmDAL genes.
Identification and phylogenetic analysis of plant DAL genes. To (Fig. 3). The numbers of DAL genes of higher plants used here are comparable, ranging from 6 (in A. coerulea) to 11 (in A. lyrata). In total, 79 DAL genes were identified in 10 plant genomes   Table S3). In addition, we identified ORRM1-like genes in this study that were also specific to seed plants, and these genes encoded two tandem truncated DAL domains at the N terminus and one RNA recognition motif (RRM) at the C terminus, except the MA_10436715g0010 protein found in P. abies, which had no C-terminal RRM domain ( Supplementary Fig. S5).
To investigate the phylogenetic relationship among plant DAL genes, an unrooted neighbor-joining (NJ) tree containing all 79 DAL proteins was generated based on the conserved DAL domain alignment (Figs 4 and S5). On the basis of the phylogeny, the DAL gene family in plants was subdivided into five groups, named group I to group RNA editing sites (organelle C-to-U RNA editing), DYW-type PPR genes, DAG-like genes, ORRM genes, and Inhibitor I9 genes were identified in the plants listed on the left. The checkmark in the box denotes that the above genes can be found in the corresponding genomes, and the cross in the box indicates none of above genes are found in these genomes.  (Fig. 4). In the NJ tree shown in Fig. 4, DAL genes of each group were all from diverse plant species. In groups I, II and V, species-specific gene duplication events occurred after the lineages diverged, resulting in the inclusion of more than one DAL gene per species (Fig. 4). Since the DAL genes were found to be specific to spermatophytes, we can infer that the ancestral DAL gene appeared after the divergence of seed plants and ferns.
The exon-intron organization analysis of 79 plant DAL genes indicated that plant DAL genes all share a conserved gene structure, with the 2-1-1 intron phase pattern separating DAL domain-encoding exons, as observed in maize DAL genes, except Al_477997 and MA_489006g0010, which have intron phase patterns 0-1-1 and 0-2-2, respectively ( Supplementary Fig. S6).
The source of plant DAL genes. Putative genes or gene fragments homologous to DAL genes were identified in lower plants to identify the origin of DAL genes in higher plants by lowering the HMMER search threshold (E-value of full sequence <0.01). Peptidase S8 propeptide/proteinase inhibitor I9 domain of subtilisins were identified as putative homologs of DAL proteins in P. patens and S. moellendorffii but not in C. reinhardtii (Supplementary Table S4). The proteinase inhibitor I9 domain is the propeptide of the serine peptidase family S8A (subtilisin family) and is responsible for the modulation of folding and activity of these proenzymes 32 . In addition to the protein similarity of the inhibitor I9 domain and the DAL domain, inhibitor I9 domain-encoding genes or gene fragments have conserved gene structure with DAL genes, including the 2-1-1 intron phase pattern and the 98-bp exon (Figs 5 and S7), which suggests that DAL genes probably originated from inhibitor I9 domain-encoding DNA sequences. The combination of inhibitor I9 domain-encoded exons and other exons, such as RRM-encoding exons, could be responsible for the appearance of DAL genes and ORRM1-like genes in higher plants.

Functional divergence evaluation between plant DAL subfamilies. As reported previously,
Arabidopsis DAL genes play inequable roles in RNA editing for different RNA sites by binding diverse DYW-PPR proteins 27 . To investigate the different functional constraints between these members, we conducted a maximum likelihood test of functional divergence using DIVERGE v3.0 33 . The unrooted NJ tree was generated with complete amino acid sequences of plant DAL proteins excluding those of P. abies and V. vinifera ( Supplementary  Fig. S8). Two types of functional divergence (type I and type II) between gene clusters of the DAL gene family in flowering plants were examined. The theta (θ) ML values were calculated, and the results demonstrated that the coefficients of type I (θ I ) for 6 of 10 pairwise comparisons between DAL subfamilies were significantly greater than zero (Bonferroni corrected P < 0.05), and only one pairwise subfamily comparisons showed significant divergence with the coefficients of type II (θ II ) test (Bonferroni corrected P < 0.05) ( Table 1). Functional divergence-related sites were identified based on the posterior probabilities with a cut-off value of 0.85, and most were located in the DAL domain ( Supplementary Fig. S9). These observations indicate that there were significant site-specific shifted selective constraints on most members of the DAL gene family. Furthermore, we also observed that the values of θ I were much larger than the estimates of θ II in each pairwise comparison ( Table 1), indicating that type I functional divergence predominantly contributed to the diversified evolution of plant DAL genes. In addition, we checked the functional divergence of intragroup members of the DAL genes, such as Sub. Ia vs. Sub.Ib ( Supplementary Fig. S8), but there was no significant functional divergence in any intragroup comparison (Supplementary Table S5), suggesting that intragroup DAL genes might play similar conserved roles in different plant lineages.
GC content of DAL genes in monocots and dicots. The GC content, an important genomic feature, plays a critical role in determining the physical properties of DNA molecules and genome regulation by providing substrates for DNA methylation 34 . The base composition analysis of plant DAL genes revealed that the DAL genes of monocots (grass) have a higher GC content than those of dicots (Fig. 6a). To investigate DNA methylation of the GC-rich DAL genes, CpG islands of ZmDAL6 and At1g11430 as representatives were predicted, and there were two CpG island regions identified in ZmDAL6 but not in At1g11430 (Fig. 6b). We analyzed the DNA methylation of the first CpG island of ZmDAL6, which was located at the exon 1-harboring region, using bisulfite sequencing and observed that 7 cytosines within a 19-bp region (nucleotides 1,222-1,240 of the ZmDAL6 DNA sequence in Fig. 6c) were methylated, including three CHH sites, two CHG sites and one CG site (Fig. 6c). In the 607-bp CpG island, however, most cytosines were not substantially modified by DNA methylation. Moreover, the codon base composition of the DAL genes was analyzed, and the result showed that the GC content of each base of one codon in monocot DAL genes was higher than that of dicot DAL genes (Student t-test, P < 0.01). For each group, only the codon second base (GC2) of group III DAL genes showed no significant difference in GC content between dicots and monocots. The largest difference of GC content was found in the codon third base (GC3) of groups I and IV between dicots and monocots (Fig. 6d), and in these two groups, more than 75% of the monocot DAL GC3 bases were guanine or cytosine nucleotides. Given that no significant functional divergence was observed between intragroup DAL genes, we inferred that the higher GC content of DAL genes in monocots could be caused by GC-biased gene conversion because the base composition of DAL genes was consistent with the high GC content of monocot genomes 34-36 .

Expression analysis of ZmDAL genes by RT-PCR.
Given the important roles of DAL genes in plant organelle RNA editing, the preferential expression patterns of ZmDAL genes were analyzed. To analyze the expression pattern of ZmDAL genes, a NimbleGen maize microarray data (ZM37) 37 was performed for 60 tissues representing 11 major organ systems and various developmental stages of the B73 maize inbred line using ZmDAL probes. The median log signal values for all 7 ZmDAL genes were extracted. Four ZmDAL genes (ZmDAL2, ZmDAL3, ZmDAL4, and ZmDAL5) showed a constitutive expression pattern in 60 different tissues, with a CV value <5% (Supplementary Fig. S10). ZmDAL1 had a much higher expression level in the leaves than in other tissues, while ZmDAL6 showed a lower expression level in the roots and first internode compared with that in other tissues or organs. The predominant expression levels of ZmDAL7 were observed in the developing seed, embryo and endosperm. Of these DAL genes, ZmDAL1 and ZmDAL6 were predicted to localize to chloroplasts and were preferentially expressed in the leaves, despite different expression patterns of the two genes ( Supplementary Fig. S10). The expression pattern was similar for paralogous genes (ZmDAL3 and ZmDAL4), indicating they were formed by segmental duplication and retained their function (Supplementary Fig. S1).
To confirm the organ-specific expression of ZmDAL genes shown by microarray data, RT-PCR was performed with total RNA isolated from the roots, leaves, ears, and immature tassels. The RT-PCR analysis revealed that the results for ZmDAL2, ZmDAL3, ZmDAL4, and ZmDAL5 match with the DNA chip data but that the other ZmDAL genes do not (Fig. 7). However, the expression levels of ZmDAL2, ZmDAL3, ZmDAL4, and ZmDAL5 were abundant in the ears and tassels, where more biological energy from mitochondria is required 38 . ZmDAL1 was expressed little in the four tissues. ZmDAL7 showed higher expression levels in the leaves, ears, and tassels in comparison to the roots. ZmDAL6 was predominately expressed in tassels but showed little or no expression in roots, leaves and ears, according to the RT-PCR analysis (Fig. 7).

Discussion
RNA editing of a single nucleotide, such as C-to-U and A-to-I substitution, requires trans-factors to recognize the nucleotide to be edited and remove the amino group. In mammals, A-to-I RNA editing is catalyzed by a family of enzymes called adenosine deaminases that act on RNA (ADARs) 39 , while C-to-U editing of apolipoprotein B mRNAs is performed by the zinc-dependent RNA-editing enzyme apolipoprotein B editing catalytic subunit 1 (APOBEC-1), which interacts with APOBEC-1 complementation factor (ACF) for site-specific editing 40 .
In plant organelles, C-to-U RNA editing is mediated by site-specific DYW-PPR proteins and by several other nuclear-encoded factors, including DAL proteins, PPO1 21 and OZ1 22 . However, the precise roles of each component of the plant RNA editosome in the complicated editing machinery are not yet well known. In this study, we performed a comprehensive analysis of DAL genes in plants and uncovered their seed plant-specific distribution, origin and the potential role of functional divergence. Using a custom-built HMM file derived from multiple DAL domain alignments, we screened a plant annotation database for putative DAL genes and found that DAL genes were specific to seed plants. The absence of DAL genes in lower plants, in which there are thousands of organelle RNA-edited sites 41,42 , indicates that the RNA editosome differs between higher and lower plants 2,16,17 . Given that the presence of DYW-PPR genes in plants is associated with C-to-U RNA editing events 42 and that DAL proteins interact with PPR proteins 15 , PPR genes may not be necessary for the emergence of DAL genes in plants (Fig. 3), although DAL proteins can interact with P. patens PPR proteins 43 . Furthermore, the homologs of OZ1, ORRM and PPO1 proteins, which have been proven to interact with DAL proteins, were found in P. patens and S. moellendorffii 15 (Fig. 3), and in particular,  Supplementary Table S2, and the data were displayed using Kismeth software (http://katahdin.mssm.edu/kismeth/revpage.pl). (d) Codon base composition of DAL genes in flowering plants. The significant differences of codon base composition between dicot and monocot DAL genes were statistically analyzed according to Student's t-test. *P < 0.05; **P < 0.01. GC1, the GC contents at the first base of one codon; GC2, the GC contents at the second base; GC3, the GC contents at the third base.
Scientific RepoRts | 7: 5688 | DOI:10.1038/s41598-017-05961-2 putative PPO1 proteins in P. patens (Pp1s28_304V6.1) and S. moellendorffii (Sm_82264) all have the 22-aa regions that are essential for their interaction with DAL proteins 21 , suggesting that these additional RNA editing factors are also not required for the evolution/function of DAL genes, although there is no evidence for the roles of these proteins in RNA editing in lower plants. However, we cannot exclude the possibility that unidentified alternative genes aside from those of the DAL gene family may play similar roles in RNA editing in lower plants. In addition to the regulation of RNA editing via bridging multiple subunits in the RNA editosome, DAL proteins probably have other functions required in higher plants.
When we searched DAL proteins with the dal.hmm information as a query, propeptide inhibitor I9 domains of plant subtilisins were found to show high similarity to part of the DAL domain. In addition, the inhibitor I9 domain-encoding genes and DAL genes have a conserved gene structure, including the intron phases 2, 1, and 1 and the 98-bp exon ( Supplementary Fig. S7), and inhibitor I9 genes were present prior to the divergence of seed plants from ferns (Fig. 3). Therefore, it is inferred that the DAL genes originated from inhibitor I9 genes by combining I9 domain-encoding exons with another unidentified sequence via exon shuffling. Additional evidence for this hypothesis stems from the fact that the ORRM1-like genes encode two tandem DAL domains followed by an RRM domain at the C terminus 28 , which could have arisen from the combination of two I9 domain-encoding exons and an extra RRM domain-encoding sequence. In addition to the protein sequence similarity and conserved gene structure between the DAL genes and I9 domain-encoding genes, they also have similar molecular function, as the DAL domain mediates the protein-protein interaction of DAL proteins with other RNA editing factors 15 , while the I9 domain inhibits proenzymes by hiding substrate-binding domains 32 .
Plant DAL genes were assigned to five distinct subfamilies based on their phylogenetic relationships (Fig. 4). In Arabidopsis, DAL/RIP genes play nonequivalent roles in RNA editing. The major factor is RIP1 (At3g15000), which belongs to group II and controls 75% of RNA-edited sites in mitochondria and 20% of RNA-edited sites in chloroplasts 27 . RIP2 (At2g33430), RIP3 (At3g06790), RIP9 (At1g11430) and RIP8 (At4g20020) have moderate effects on RNA editing and are in groups I, III, IV and V, respectively (Fig. 4). As minor genes, RIP4 (At5g44780), RIP5 (At1g32580), RIP6 (At2g35240) and RIP7 (At1g72530) each shares clades with the above major or moderate factors, suggesting that these DAL genes have recently duplicated and have evolved to only control few RNA editing sites. We analyzed the functional divergence between DAL groups to identify the sites distinguishing the different DAL members. Five groups resulted in 10 pairwise comparisons; of these comparisons, 6 showed significant type I functional divergence, and one showed significant type II functional divergence (Table 1). In addition, we analyzed the functional divergence between intragroup members. These intragroup DAL genes were divided into two subgroups: a dicot subgroup (a) and a monocot subgroup (b) (Supplementary Fig. S8). No significant functional divergence was observed between intragroup members of DAL genes, although the GC contents of DAL genes in monocots and dicots were different. The sites involved in functional divergence between DAL groups were predominantly localized to the DAL domains, and relaxed selection on these sites would serve to increase the complexity for determination of RNA editing because DAL proteins act on RNA editing in the form of heterodimers in addition to homodimers. Also, these sites probably account for the interaction of each DAL with different PPR proteins in the RNA editosome. Therefore, it is understandable that different dimers formed with homogenous or heterogeneous DAL proteins, which confer RNA editing to corresponding sites, have increased the diversity of RNA editing regulation in higher plants. However, further studies on the biochemical character of DAL proteins and the crystal structure of the RNA editosome are required to parse the roles of Figure 7. RT-PCR analysis of maize DAL genes in the four different tissues of maize. The total RNA of four tissues including seedling roots, seedling leaves, 5-cm immature ears and non-emerged immature tassels was isolated and used to perform the semi-quantitative RT-PCR of ZmDAL genes. Actin1 was used for internal controls to normalize the RNA contents in each sample. Primers used are shown in Supplementary Table S2. DAL proteins with the functionally diverged sites. In addition, the putative effects of DAL proteins on other RNA processing events in addition to RNA editing should be further investigated.

Identification and sequence analysis of putative DAG-like genes in plants. Known MORF/RIP
genes At4g20020, At2g33430, At3g06790, At5g44780, At1g32580, At2g35240, At1g72530, At3g15000, and At1g11430 from A. thaliana 16,17,26,44 and DAG (NCBI Protein ID: Q38732) from A. majus 25 were used to query the maize filtered gene set (ZmB73_5b_FGS_translations.fasta downloaded from www.maizesequence.org) using a local BLASTP program with an E-value <1e-10 and a bit score >100. ZmDAL protein sequences were analyzed using ExPASy tools available at http://us.expasy.org/tools/. Multiple sequence alignments of ZmDAL proteins and the above known DAL proteins were performed using ClustalW (http://www.ebi.ac.uk/Tools/msa/clustalw2/) 45 . To mine the conserved domain, the alignment results (Supplementary Data File S1) were used to build a protein HMM file, dubbed dal.hmm (Supplementary Data File S2) by the hmmbuild program in HMMER 3.0 package 31 .
To investigate the evolution of DAL genes in the plant kingdom, the dal.hmm information was used as a query to search the genome annotation data of the following representative species from Phytozome v8.0 (http:// www.phytozome.net/), except those of P. abies, which were from ConGenIE (http://congenie.org/), in HMMER 3.0 package: C. reinhardtii, P. patens, S. moellendorffii, P. abies, B. distachyon, O. sativa Japonica, S. bicolor, A. formosa, V. vinifera, A. lyrata and P. trichocarpa. Protein hits with an e-value <1e-10 and sequence score of "best 1 domain" >100 were collected. The homologs of the OZ1, PPO1 and ORRM genes in the above plants were identified using a local BLASTP program with the protein sequences of known Arabidopsis OZ1, PPO1, ORRM2, ORRM3 and ORRM4 proteins as queries 15 .
The PFAM (http://pfam.sanger.ac.uk/) and INTERPRO (http://www.ebi.ac.uk/interpro/) databases were screened to detect known motifs in ZmDAL proteins and the DAL proteins of other plants. The MEME program (http://meme.nbcr.net/meme/cgi-bin/meme.cgi) was used to investigate the putative conserved motifs among these ZmDAL proteins with the following parameters: length between 10 and 50 aa, maximum number of motifs to find = 5, and one per sequence. To obtain the intact conserved DAL domain, different limits for length of each motif were taken that were between 100 and 120 aa. Gene structures of plant DAL genes. The DNA and transcript sequences of ZmDAL genes obtained from the maize sequence annotation database MaizeGDB (http://www.maizegdb.org/) were used to analyze the gene structures of ZmDAL genes. Several ZmDAL genes had more than one gene model annotated in MaizeGDB. To confirm the putative alternative splicing transcripts, transcript-specific primers (Supplementary Table S2) were designed to amplify corresponding DNA isolated from B73 seedlings and cDNA derived from B73 seedling RNA. Gene structures producing validated transcripts were drawn and displayed using the online GSDraw program of the PIECE server (http://wheat.pw.usda.gov/piece/GSDraw.php) 46 . Conserved DAL domains were also displayed using the GSDraw program. The gene structures of DAL genes from other plant species were obtained from the Phytozome v8.0 annotation database (http://www.phytozome.net/) and displayed using the GSDraw program 46 .

Subcellular localization prediction of ZmDAL proteins. Two in silico programs, Predotar 47 and
TargetP 48 , were used to predict the putative organelle localization of ZmDAL proteins. The maize organelle proteomics database (PPDB, http://ppdb.tc.cornell.edu/) 49 was screened to detect the accumulation of ZmDAL proteins.
Phylogenetic dendrogram. The multiple sequence alignment analysis of conserved DAL domains collected from 79 DAL proteins identified in maize and in other higher plants was carried out using MUSCLE v3.8.31 50 , and the resulting alignment was used to build the NJ distance phylogenetic tree using MEGA v5.0 51 by applying the Poisson substitution model, 1000 bootstrap samples, and pairwise deletion for gaps/missing data. The tree was displayed using FigTree v1.4.0 (http://tree.bio.ed.ac.uk/software/figtree/).

Functional divergence analysis.
To investigate the functional alteration of duplicated DAL genes in plants, the GU99 method within DIVERGR v3.0 33 was used to calculate the coefficients of type I and type II functional divergence (θ I and θ II , respectively) between two groups after gene duplication and to predict functionally divergent amino acids based on their different evolutionary rates. Within two duplicated groups of a gene family, type I functional divergence helps identify the relaxation of functional constraint in one group relative to that of another, while type II identifies shifting patterns of functional constraint. GC content and DNA methylation analysis with bisulfite sequencing. The entire DNA sequences of plant DAL genes together with 1 kb of upstream and downstream flanking sequences were used for calculation of GC content and prediction of CpG islands in the EMBOSS CpGplot online server (http://www.ebi.ac.uk/ Tools/seqstats/emboss_cpgplot/). To confirm the DNA methylation of ZmDAL DNA sequences, leaf DNA of B73 seedlings was isolated and treated with bisulfate using the EpiTect ® bisulfite kit (QIAGEN, USA) according to the manufacturer's instructions. The primers for detection of DNA methylation were designed using MethPrimer (http://www.urogene.org/methprimer/) 52 and modified using Primer3web (http://primer3.ut.ee/). PCR products were cloned into a pGEM-T vector (Promega, USA) and subsequently sequenced using an ABI3730 DNA sequencer (Shanghai Sunny Bio., China). DNA methylation states via bisulfite sequencing were analyzed and displayed using Kismeth software (http://katahdin.mssm.edu/kismeth/revpage.pl) 53 .

Expression analysis of ZmDAL genes in different tissues.
To investigate the spatiotemporal expression patterns of ZmDAL genes, the log2-transformed and RMA-normalized data for ZmDAL genes were downloaded from PLEXdb (http://www.plexdb.org/) 54 , and cluster analysis of these expression data were performed using Cluster v3.0 55 with the hierarchic method. A heat map was produced using Java TreeView v1.1.5 56 . The coefficient of variation (CV) was calculated according to the following equation to estimate the expression variation of each ZmDAL gene in different tissues: CV = sd/mean, where sd indicates the standard deviation of a gene in different tissues and the mean represents the average expression level.

Semi-quantitative reverse transcription PCR (semiq-RT-PCR).
Total RNA was isolated from different tissues of the B73 inbred lines, including seedling roots, leaves, 5-cm ears and immature tassels, using the Trizol ® reagent (Invitrogen, USA) according to the manufacturer's protocol. First-strand cDNA was produced from 1 μg of total RNA (25 μl reaction volume) using M-MLV reverse transcriptase (Invitrogen, USA) at 37 °C for 1 h. All gene-specific primers were designed as shown in Supplementary Data Table S2. Specific primers for the maize Actin1 gene (GenBank ID: NM_001155179) were used as an internal control. Reactions were performed with Taq Polymerase (Dalian Takara Biotechnology, China) on a Bio-Rad Thermal Cycler (Bio-Rad, USA) using the following procedure: 5 min at 94 °C to start; 32 cycles of 30 s at 94 °C, 30 s at 58 °C and 1 min at 72 °C; and a final extension step of 72 °C for 10 min to complete the reaction, and the Actin1 transcript was amplified with 28 PCR cycles. Each PCR pattern was performed in triplicate, mixtures without a template were employed as negative controls, and the maize Actin1 amplicon served as an internal control for each gene investigated.