Characterization of the polyphenol oxidase gene family reveals a novel microRNA involved in posttranscriptional regulation of PPOs in Salvia miltiorrhiza

Salvia miltiorrhiza is a well-known material of traditional Chinese medicine. Understanding the regulatory mechanisms of phenolic acid biosynthesis and metabolism are important for S. miltiorrhiza quality improvement. We report here that S. miltiorrhiza contains 19 polyphenol oxidases (PPOs), forming the largest PPO gene family in plant species to our knowledge. Analysis of gene structures and sequence features revealed the conservation and divergence of SmPPOs. SmPPOs were differentially expressed in plant tissues and eight of them were predominantly expressed in phloem and xylem, indicating that some SmPPOs are functionally redundant, whereas the others are associated with different physiological processes. Expression patterns of eighteen SmPPOs were significantly altered under MeJA treatment, and twelve were yeast extract and Ag+-responsive, suggesting the majority of SmPPOs are stress-responsive. Analysis of high-throughput small RNA sequences and degradome data showed that miR1444-mediated regulation of PPOs existing in P. trichocarpa is absent from S. miltiorrhiza. Instead, a subset of SmPPOs was posttranscriptionally regulated by a novel miRNA, termed Smi-miR12112. It indicates the specificity and significance of miRNA-mediated regulation of PPOs. The results shed light on the regulation of SmPPO expression and suggest the complexity of SmPPO-associated phenolic acid biosynthesis and metabolism.

putational search and subsequent gene model prediction, a total of 12 full-length and 7 partial SmPPOs were obtained. To verify the prediction and get full-length coding sequences (CDSs) of all 19 SmPPOs, molecular cloning approaches, including 5′ and 3′ RACE and PCR amplification of coding regions, were performed. The identified SmPPOs were named SmPPO1-SmPPO19, respectively. The deduced SmPPO proteins have amino acid numbers from 564 to 625, isoelectric points (pI) from 5.23 to 8.60, and molecular weights (Mw) from 49.0 to 66.0 kDa. All of the cloned SmPPO CDSs have been submitted to GenBank under the accession numbers shown in Table 1.

Phylogenetic analysis of SmPPOs and PPOs from other plant species.
To examine the phylogenetic relationship among 70 PPO proteins from S. miltiorrhiza, P. trichocarpa, Physcomitrella patens, rice (Oryza sativa), maize(Zea mays) and soybean (Glycine max), an unrooted Neighbor-Joining (NJ) tree was constructed ( Fig. 1). PPO proteins can be divided into six subgroups (Fig. 1). Phylogenetic analysis reveals species-specific PPO subgroups, which is consistent with previous observation 16 . In P. patens, S. miltiorrhiza and P. trichocarpa, PPO expansion is largely a consequence of lineage-specific gene duplication and subsequent divergence. 12 PPO sequences from P. patens are clustered into a group. 18 of 19 SmPPOs belong to single group. 14 PtPPOs form a monophyletic group. It indicates that the expansion and diversification of these PPOs was occurred independently in different lineage. OsPPO1, OsPPO2 and PtPPO13 are clustered with ZmPPOs in group E, whereas PtPPO3, PtPPO11 and SmPPO1 are grouped with GmPPOs in group D. It indicates that these PPOs from different plant species share common ancestors.
In total, 17 SmPPO genes belonging to six paralogous groups were identified from the 19 cloned SmPPO genes ( Table 2). In order to examine the divergence of these paralogs, Ka  Conservation and divergence of SmPPO gene structures, conserved domains and motifs. Gene structure analysis showed that the intron number in the coding regions of 19 SmPPO genes varied between 0 and 2 with the majority (63%) to be 0 (Fig. 2). Consistently, most PtPPO genes have no introns in the coding regions 16 .  Only a few eudicot PPOs contain introns, such as the cherimoya PPO gene 16 and four Salicaceae PPOs 21 . It suggests the similarity of PPOs in gene structures. Further examining the position of introns showed that SmPPO8, SmPPO12, SmPPO16 and SmPPO17 contained an intron located in the region encoding CuA domain. SmPPO5 has an intron in the PPO_DWL domain-encoding region. SmPPO15 contains two introns. One is located in the CuA domain-encoding region, whereas the other one is located in the PPO_DWL domain-encoding region (Fig. 2). Plant PPO proteins typically consist of three domains: an N-terminal targeting signal, a dicopper centre, and a C-terminal region 16 . These domains are highly conserved in plants. In order to elucidate sequence features of domains and the degree of conservation of each residue, multiple sequence alignment was performed and sequence logos for these domains of SmPPOs were created using WebLogo. The results showed that the distribution of residues in domains of SmPPOs was quite similar to other plant PPOs 16 . The N-terminal region of SmPPOs, varying from 80 to 90 amino acids (Fig. 3a), harbors the chloroplast transit peptide (cTP) ( Table 3). The cTP (～ 60 aa) is rich in serine residues. All chloroplast transit peptides of SmPPOs display conserved motifs for a thylakoid transfer domain (TTD) and an alanine cleavage motif (AxA) (Fig. 3a). The dicopper centre includes two copper-binding domains, CuA and CuB (Fig. 3b,c), each of which is approximately 50 amino acids in length 16 . The highly conserved histidine residues were found in CuA and CuB domains. The CuA domain of SmPPOs contains a highly conserved HCAYC motif (HCAYC) (Fig. 3b), which also exists in other plant PPOs 16 . The second Cys in this motif is predicted to form a thioether bond with the second conserved histidine of the CuA domain. The sequences between the HxxxC motif and the second conserved histidine are highly variable. In addition to the three histidines, several other amino acids located downstream of the third conserved histidine in the CuA domain are also conserved. It includes arginine, tyrosine, leucine, phenylalanine, glutamic acid, aspartic acid, proline, and tryptophan. In the CuB domain, there are two conserved histidine residues (HxxxH motif) (Fig. 3c). The CuB domain contains several highly conserved residues (Fig. 3c). Sequence comparison suggests that CuB is more variable than CuA in SmPPOs (Fig. 3b,c). The C-terminal portion of SmPPOs shows highly conserved features. It has the PPO1_DWL domain and the PPO1_KFDV domain (Fig. 3d,e) 16 . PPO1_DWL domain is approximately of 50 amino acids in length (Fig. 3d). It contains a conserved DWL sequence motif and a tyrosine motif (YxY) (Fig. 3d). The PPO1_KFDV domain has a highly conserved sequence motif, KFDV (Fig. 3e). Additionally, the EEEEEVLVI motif enriched in glutamic acid residues and the EFAGSF motif are present in many SmPPOs

Paralogous group
Gene name  Expression profiles of SmPPO genes in S. miltiorrhiza. In order to preliminarily elucidate the physiological roles of SmPPOs, we detected the levels of SmPPO transcripts in roots, stems, leaves and flowers of S. miltiorrhiza plants. Differential expression was observed (Fig. 4). SmPPO1 and SmPPO16 were predominantly expressed in roots. SmPPO11 was predominantly expressed in stems. SmPPO3, SmPPO7, SmPPO8, SmPPO14 and SmPPO17 were strongly expressed in leaves. SmPPO6, SmPPO9, SmPPO10, SmPPO12, SmPPO13 and SmPPO15 exhibited the highest expression in flowers. The other 5 SmPPOs were mainly expressed in at least two tissues analyzed. Differentially expressed SmPPOs may be involved in different physiological processes. S. miltiorrhiza roots are well-known materials of various TCMs. The pharmacologically active phenolic acid, lithospermic acid B, accumulates mainly in the phloem and xylem of S. miltiorrhiza roots 22 . In order to elucidate the role of SmPPOs in S. miltiorrhiza roots, transcriptome-wide analysis were carried out. RNA-seq reads from periderm, phloem and xylem of S. miltiorrhiza roots were downloaded from GenBank and then mapped to SmPPOs using SOAP2.0 22,23 . The results showed that 8 SmPPOs were expressed in S. miltiorrhiza roots with RPKM value greater than 2 24 . All of them exhibited higher expression levels in the phloem and xylem than the peridem (Fig. 5), indicating their potential in lithospermic acid B biosynthesis and metabolism. Yeast extract and Ag + -responsive SmPPOs. It has been shown that PPOs are involved in plant response to stress 5 . In order to determine whether SmPPOs play a role in plant defense, we carried out a transcriptome-wide analysis of SmPPO expression in response to the treatment of yeast extract (100 μ g/ml) and Ag + (30 μ M), a combination of biotic and abiotic stresses 24 . RNA-seq data of S. miltiorrhiza hairy roots treated with or without yeast extract (100 μ g/ml) and Ag + (30 μ M) were downloaded from GenBank and then mapped to SmPPOs using the SOAP2.0 software 23,24 . Using a cutoff of RPKM value greater than 2 24 , a total of 12 SmPPOs were found to be expressed in hairy roots. Compared with the level in non-treated control, all of the 12 SmPPOs expressed in hairy roots were differentially expressed at all three time-points of yeast extract and Ag + treatment (Fig. 7). SmPPO5 was up-regulated at the time-point of 12-h-treatment, whereas down-regulated after 24-and 36-h-treatment. SmPPO1 and SmPPO14 were significantly up-regulated at all time points. The other 9 SmPPOs were significantly down-regulated at all time points (Fig. 7). It suggests that over 60% of SmPPOs are yeast extract and Ag + -responsive.

Responses of
SmPPOs are not regulated by miR1444 in S. miltiorrhiza. In Salicaceae, a subset of PPOs are regulated by a microRNA, termed miR1444 7,15,21 . This microRNA is a lineage-specific young microRNA 15 . MiR1444-mediated regulation of PtPPOs is significant in copper homeostasis and stress responses in P. trichocarpa 7 . In order to examine whether SmPPOs are regulated by miR1444, we analyzed published and our own high throughput sequencing data of small RNAs from roots, stems, leaves and flowers of S. miltiorrhiza 26,27 29 . With the maximum expectation of 3.0 applied in the target search, a total of 54 small RNAs with sequence reads greater than four were identified. These small RNAs were aligned with the whole genome sequence of S. miltiorrhiza 28 . Secondary structures were predicted for the retrieved genomic DNA sequences using mfold 30 as described previously 31 . The structures were manually checked and miRNAs were annotated by applying the criteria suggested by Meyers et al. 32 . As a result, we finally identified a miRNA stem-loop structure designated as Smi-MIR12112 (Fig. 8a). Smi-miR12112 is a novel miRNA that has not been reported previously. Using the poly(A) adaptor RT-PCR method 33 , we analyzed the expression of Smi-miR12112 in roots, stems, leaves and flowers of 2-year-old, field-grown S. miltiorrhiza Bunge (line 993) and roots, stems and leaves of two-month-old plants cultivated in vitro. Smi-miR12112 exhibited higher expression level in young roots, young stems and young leaves compared with the level in mature roots, mature stems, mature leaves and flowers (Fig. 8b). High level of expression in young tissues indicates that Smi-miR12112 plays more important roles in young tissues than mature ones. In order to determine whether Smi-miR12112 is transcriptionally regulated by MeJA, we analyzed the expression of Smi-miR12112 in leaves of S. miltiorrhiza plantlets with or without MeJA treatment using the qRT-PCR method. The results showed that the level of Smi-miR12112 was up-regulated at all four time-points (Fig. 8c). Differential response of Smi-miR12112 (Fig. 8c) and SmPPOs (Fig. 6) to MeJA treatment suggest the complexity of the miRNA and SmPPO-associated gene regulatory networks.
Computational target prediction showed that 15 of the 19 identified SmPPOs contained a sequence near-perfectly complementary to Smi-miR12112 (Fig. 9). The sequence locates in a region encoding KFDV, a conserved domain of PPO proteins. Large proportion of SmPPOs having putative miRNA target sites could be due to deep conservation of sequence in the KFDV region. Plant mature miRNAs usually guide RNA-induced silencing complexes (RISCs) to cleave target mRNAs at the tenth complementary nucleotide from the 5′ end of

. Expression of SmPPOs in roots (Rt), stems (St), leaves (Le) and flowers (Fl) of S. miltiorrhiza.
The expression levels were quantified using the quantitative RT-PCR method. Transcript levels in leaves were arbitrarily set to 1 and the levels in other tissues were given relative to this. Error bars represent standard deviations of mean value from three biological and three technical replicates. ANOVA (analysis of variance) was calculated using SPSS. P < 0.05 was considered statistically significant.

Discussion
PPOs exist in land plants, fungi and some bacteria, such as P. patens 17,35 , P. trichocarpa 1,7,8 potato 36 , tomato 37 , walnut 38 , eggplant 39 , sugarcane 17 , litchi 40 , Vicia faba 41 , and grapevine 42 . PPOs are encoded by a gene family with member numbers varied significantly among species 17 . The non-vascular moss, P. patens, contains thirteen PPO members 36 . Soybean has eleven. Sorghum (Sorghum bicolor) has eight. Maize (Zea mays) and purple false brome (Brachypodium distachyon) contain six. Fox millet (Setaria italica) has four. Rice contains two. Cucumber, cassava, castor bean and walnut have one 6,38 . No PPO was detected in Arabidopsis, Brassica napus and green algae (Chlorella, Stigeoclonium, Microspora, Ulva and Spirogyra) 17 . The lack of a PPO gene in various plant species points to the ecological or secondary metabolic functions for PPOs 16 . Consistently, the number of PPO genes in an organism is not directly related to the size of its genome. For instance, the Selaginella moellendorffii genome (~100 Mbp) is one of the smallest plant genomes known; however, the number of PPO genes, 11, in this species is relatively greater 16 . Analysis of the P. trichocarpa genome assembly v3.0 (http://www.phytozome.net/poplar.php#B) showed the existence of 15 full-length PtPPO genes. In this study, we identified a total of 19 full-length PPO genes in S. miltiorrhiza. It suggests that S. miltiorrhiza contains the largest PPO gene family in plant species with the whole genome sequence available.
It has been shown that land plant PPO genes originate in bacteria via an ancient horizontal gene transfer event 43 . The transferred PPO gene may expand through gene duplication or lost through chromosome rearrangement or fragment deletion during plant evolution, resulting in significant variation in PPO gene numbers in different plant species. The lack of PPO genes in Arabidopsis suggests that PPO genes may be lost through deletion and mutation during chromosome rearrangement. Consistently, phylogenetic and gene structure analysis indicated that PPO genes are relatively conserved across different species. For instance, most PPO proteins share similar domains and motifs across plants. The intron/exon structures of PPO genes are also largely conserved. The evolution history suggests that PPO genes from different organisms may share conserved molecular functions.
Recent studies implicate that PPOs are involved in the biosynthesis of phenolic acids. PPOs from Portulaca grandiflora and other plants in the Caryophyllales are able to hydroxylate tyrosine to 3, 4-dihydroxyphenylalanine (DOPA) and are considered to be one of the key enzymes for the biosynthesis of water-soluble pigment betalains, which replace the anthocyanins in flowers and fruits of most Caryophyllales plants [44][45][46][47][48] . A vacuole-localized RNA-seq reads were mapped to the cloned ORFs of SmPPOs. Genes with RPKM value greater than 2 were analyzed for differential expression using Fisher's exact test. P < 0.05 was considered as differentially expressed. *Indicates significant differential expression compared with the level in periderm.
PPO from snapdragon (Antirrhinum majus), named AmAS1, specifically catalyzes the formation of aurones from chalcones, a class of plant flavonoids responsible for the yellow coloration of flowers 17 . Larreatricin hydroxylase (LtLH), a typical PPO from creosote bush (Larrea tridentate), specifically hydroxylates (+ )-larreatricin to (+ )-3′ -hydroxylarreatricin and is thought to play a central role in the biosynthesis of the creosote bush 8-8′ linked lignans 49 . In walnut, PPO is involved in the phenylpropanoid pathway and the tyramine pathway and acts as an indirect regulator of cell death 38 . Phenolic acids are a group of bioactive compounds in S. miltiorrhiza. The SmPPOs invovled in phenolic acid biosynthesis are currently unknown. Further analysis of SmPPO functions through genetic transformation will definitely shed lights on the mechanism of phenolic acid biosynthesis in S. miltiorrhiza. In addition to be involved in phenolic acid biosynthesis, the diversified expression patterns indicate that SmPPOs may also play significant roles in other physiological processes. Evidence obtained in this study includes that the expression of SmPPOs exhibits apparent tissue specificity (Fig. 4), and the majority of SmPPOs are responsive to MeJA treatment (Fig. 6) and yeast extract and Ag + treatment (Fig. 7).
miRNAs are a class of small endogenous non-coding RNAs with size about 21 nucleotides. They play vital roles in multiple developmental and physiological processes in various organisms through sequence-specific regulation of target genes at the transcriptional or post-transcriptional level 50 . MiRNA-mediated posttranscriptional regulation is important for the function of a subset of PPOs in P. trichocarpa and grapevine 15,51 . In P. trichocarpa, at least 13 PtPPOs are regulated by miR1444 7 . In grapevine, VvPPO is regulated by miR058 51 . Although miRNAs have been studied extensively in the past several years, only few documents have been reported for miRNAs in S. miltiorrhiza 26,27,[52][53][54][55] . The functions of most S. miltiorrhiza miRNAs are still unknown. Analysis of small RNA data revealed the loss of miR1444 in S. miltiorrhiza. Instead, a novel PPO-targeting miRNA, designated as Smi-miR12112, exists in S. miltiorrhiza. It targets to 15 of the 19 identified SmPPOs in a region encoding the conserved KFDV domain. The target site of Smi-miR12112 is different from those of Pt-miR1444s and Vv-miR058, which locate in the regions encoding the conserved CuB domain and the thylakoid transfer domain, respectively. Since miRNA target sites generally occur outside of family-defining domains 56 , the origination and evolution of PPO conserved domain-targeting miRNAs may be under strong negative selection. It suggests the significance of miRNA-mediated posttranscriptional regulation of PPOs in plants.  57 . Plantlets treated with carrier solution were used as controls. Three independent biological replicates were carried out for each experiment.
Gene prediction and cDNA cloning. The amino acid sequences of 15 P. trichocarpa PPOs (PtPPOs) were downloaded from the P. trichocarpa genome assembly v3.0 (http://www.phytozome.net/poplar.php#B). To obtain sequencing data for all of the SmPPO genes, tBLASTn searches were performed on the genome database of S. miltiorrhiza line 99-3 (http://www.ndctcm.org/) 28 . An e-value cut-off of e −10 was applied. Gene model prediction was carried out for the retrieved genomic DNA sequences on the GENSCAN web server (http://genes.mit. edu/GENSCAN.html). The predicted gene models were examined and comparatively analyzed with the other genome database of S. miltiorrhiza (http://www.herbal-genome.cn) 58 .
To clone the full-length SmPPO cDNAs, rapid amplification of 5′ (5′ -RACE) and 3′ (3′ -RACE) cDNAs ends was performed using the SMART TM RACE cDNA amplification kit (TaKaRa Bio, Otsu, Japan). The nesting and the nested PCR amplification was performed on cDNA reverse-transcribed from total RNA using primers listed in Supplementary Tables S1 and S2. Full-length coding sequences were amplified by PCR using a combination of gene-specific forward primers and reverse primers (Supplementary Table S3). PCR products were purified, cloned and sequenced.
Bioinformatic analysis and phylogenetic tree construction. The theoretical isoelectric point (pI) and molecular weight (Mw) were analyzed using the Compute pI/Mw tool on the ExPASy server (http://web.expasy. org/compute_pi/) 59 . The conserved domain of SmPPO proteins was searched against the Conserved Domain Database (CDD, http://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi). The expected e-value threshold of 1.0 and the maximum size of hits to be 500 amino acids were applied 60 . Sequence logos were created on the WebLogo server (http://weblogo.berkeley.edu/logo.cgi) 61 .
Paralog identification and synteny analysis. Paralog groups were identified using BLASTP with the following criteria applied. It includes E value ≤ 10-40, cumulative identity percentage (CIP) ≥ 60%, and cumulative alignment length percentage (CALP) ≥ 70%. The paralog groups were recognized as homologous SmPPO gene groups among all of the SmPPO genes identified in the S. miltiorrhiza genome. cDNA sequences of each paralogous group and orthologous group were subjected to multiple sequence alignments. The numbers of nonsynonymous substitutions per nonsynonymous site (Ka) and synonymous substitutions per synonymous site (Ks) were calculated using MEGA 7.0 62 . Quantitative real-time reverse transcription-PCR (qRT-PCR). The first cDNA strand was reversely transcribed using SuperScript Ш Reverse Transcriptase (Invitrogen, Carlsbad, CA, USA). qRT-PCR analysis of The level of transcripts in leaves treated with carrier solution (CK) was arbitrarily set to 1 and the levels in leaves treated with MeJA were given relative to this. Mean values and standard deviations were obtained from three biological and three technical replicates. ANOVA (analysis of variance) was calculated using SPSS. P < 0.01(**) were considered statistically extremely significant. (d) Validation of Smi-miR12112-directed cleavage using degradome analysis. X-axis shows the nucleotide (nt) position of targets and Y-axis shows the reads obtained by degradome sequencing. Each black spot represents a degradome fragment mapped to the target gene. The red spots indicate that the products are resulted from Smi-miR12112-directed cleavage. SmPPOs in flowers, leaves, stems and roots of 2-year-old plants and in plantlets treated with MeJA was carried out as described previously 57 . Gene-specific primers were designed using the tool of IDT designing primers (http://www.idtdna.com/scitools/Applications/RealTimePCR/) (Supplementary Table S5). The length of amplicons was between 80 bp and 200 bp. SmUBQ10 was used as an internal control as previously described 54,57 . The expression of Smi-miR12112 was analyzed using the poly(A) adaptor RT-PCR method as described previously 33 . Real-time PCR was performed using 5′ -CGATCTTGATACCACCAATGG-3′ as the forward primers and 5′ -GCGAGCACAGAATTAATACGAC-3′ as the reverse primer. 5.8S rRNA gene was selected as a reference 7 . Gene expression data from three biological replicates was standardized as described previously 63 . ANOVA (analysis of variance) was calculated using SPSS (Version 19.0, IBM, USA). P < 0.05 was considered statistically significant, and P < 0.01 was considered extremely significant.
Analysis of SmPPOs expression using RNA-seq data. RNA-seq data from various S. miltiorrhiza tissues was downloaded from GenBank. The tissues include periderm, phloem and xylem of roots (SRR1640458), hairy roots, and hairy roots treated with yeast extract (100 μ g/ml) and Ag + (30 μ M) (SRR924662) 22,24 . RNA-seq reads were mapped to SmPPOs using SOAP2.0 23 and analyzed as described previously 57 . SmPPOs with the RPKM value greater than 2 were analyzed for differential expression using Fisher's exact test. P < 0.05 was considered as differentially expressed.

Identification of S. miltiorrhiza miRNAs with perfect or near-perfect complementarity to
SmPPOs. S. miltiorrhiza small RNAs with the potential to target SmPPOs for cleavage were predicted using psRNATarget 29 . The nineteen SmPPO genes identified were used as target transcript candidates. The maximum expectations of 3.0 and the target accessibility-allowed maximum energy to unpair the target site of 25.0 were applied. The identified small RNAs were aligned with the genome of S. miltiorrhiza 27 . Secondary structures of S. miltiorrhiza genomic DNA sequences with small RNAs aligned were predicted using mfold 30 . In each case, the lowest energy structure was analyzed as described previously 30 . Degradome and experimental verification of miRNA-directed cleavage of SmPPOs. Degradome analysis of miRNA-targeted SmPPOs was carried out using SOAP2.0 23 . To map miRNA cleavage sites in SmPPO targets, the modified RNA ligase-mediated rapid amplification of 5′ cDNAs method (5′ RLM-RACE) was performed using the SMART TM RACE cDNA amplification kit (TaKaRa Bio, Otsu, Japan). The nesting and the nested primers used in this experiment are listed in Supplementary Table S6. Nesting PCR amplification was performed under the following program of touchdown PCR: 94 °C for 5 min, 5 cycles of amplification 94 °C for 30 s and 72 °C for 3 min, 5 cycles of amplification at 94 °C for 30 s, 70 °C for 30 s and 72 °C for 3 min, 25 cycles of amplification at 94 °C for 30 s, 56 °C for 30 s and 72 °C for 3 min, followed by a final extension at 72 °C for 10 min. Nested PCR amplification was carried out under the following conditions: 94 °C for 5 min, 25 cycles of amplification at 94 °C for 30 s, 56 °C for 30 s and 72 °C for 3 min, followed by a final extension at 72 °C for 10 min. PCR products were purified, cloned and sequenced.