De novo assembly, transcriptome characterization, lignin accumulation, and anatomic characteristics: novel insights into lignin biosynthesis during celery leaf development

Celery of the family Apiaceae is a biennial herb that is cultivated and consumed worldwide. Lignin is essential for cell wall structural integrity, stem strength, water transport, mechanical support, and plant pathogen defense. This study discussed the mechanism of lignin formation at different stages of celery development. The transcriptome profile, lignin distribution, anatomical characteristics, and expression profile of leaves at three stages were analyzed. Regulating lignin synthesis in celery growth development has a significant economic value. Celery leaves at three stages were collected, and Illumina paired-end sequencing technology was used to analyze large-scale transcriptome sequences. From Stage 1 to 3, the collenchyma and vascular bundles in the petioles and leaf blades thickened and expanded, whereas the phloem and the xylem extensively developed. Spongy and palisade mesophyll tissues further developed and were tightly arranged. Lignin accumulation increased in the petioles and the mesophyll (palisade and spongy), and the xylem showed strong lignification. Lignin accumulation in different tissues and at different stages of celery development coincides with the anatomic characteristics and transcript levels of genes involved in lignin biosynthesis. Identifying the genes that encode lignin biosynthesis-related enzymes accompanied by lignin distribution may help elucidate the regulatory mechanisms of lignin biosynthesis in celery.

gies present an efficient and economic choice for characterizing nonmodel organisms, such as celery, without a reference genome 9 . Extensive de novo genomic or transcriptomic sequences of A. graveolens are valuable resources for gene discovery, molecular marker development, gene localization, comparative genomics, etc. 10,11 .
The aromatic heteropolymer lignin is the second most abundant biopolymer in secondarily thickened plant cell walls after cellulose. In particular, lignin comprises up to 30% of the total plant biomass 12 . Lignin polymers are primarily derived from the monolignols phydroxyphenyl (H), guaiacyl (G), and syringyl (S), which are formed by the dehydrogenation of the hydroxycinnamyl alcohols p-coumaryl, coniferyl, and sinapyl, respectively 13 . Lignin is predominantly deposited in secondarily thickened cell walls of vascular plants and plays important roles in cell wall structural integrity, stem strength, water transport, mechanical support, and plant pathogen defense 14 . Thus, studying the effects of lignin amount and composition on enduse properties and understanding the regulatory factors of lignin biosynthesis are indispensable to improve plant quality.
The main building blocks of lignin are monolignols (p-hydroxyphenyl, guaiacyl, and syringyl). Numerous studies have elucidated the mechanism of lignin formation and have identified the corresponding enzymes and genes responsible for lignin biosynthesis 15 . Monolignols are derived from phenylalanine through the phenylpropanoid pathway 16 . Cinnamic acid is initially formed by phenylalanine (PAL) deamination, followed by a series of hydroxylation (C4H, C3H, F5H, 4CL, and HCT), O-methylation (COMT and CCoAOMT), and reduction reactions (CCR and CAD). Lignin is finally formed by dehydrogenation polymerization (POD or LAC) after the synthesis of monolignols. The methoxylation levels of the three monolignols determine the amount and composition of lignin 17 .
Transcriptomes and microRNAs of three celery cultivars, namely, A. graveolens cvs 'Liuhe Huangxinqin', 'Jinnan Shiqin', and 'Ventura' are involved in the temperature stress response 18 . However, little is known about the molecular mechanism of genes involved in lignin biosynthesis in celery leaf development. Celery is a kind of low fat and high fiber food diets. Lignin content in celery has a directly influence on the development of cell wall thickness and xylem, which affect the quality of celery. Better understanding of the lignin biosynthesis mechanisms in celery should help to improve the celery quality.
In this study, celery ('Ventura') leaves at three stages were collected and analyzed using Illumina paired-end sequencing technology. The leaves were qualitatively evaluated for lignin distribution using histochemistry and autofluorescence microscopy ( Figure 1), and then anatomically characterized. We constructed a database of the transcriptome sequences in the 'Ventura' leaves and then isolated cDNA sequences with highly homologous genes that encode lignin biosynthesis-related enzymes, including PAL, C4H, 4CL, HCT, C3H, CCoAOMT, CCR, CAD, F5H, COMT, and POD. The cDNA sequences were used to analyze the expression profiles of genes involved in lignin biosynthesis in celery by quantitative real-time PCR. Profiling the expression of genes that encode lignin biosynthesis-related enzymes and understanding lignin distribution may help elucidate the regulatory mechanisms of lignin biosynthesis in celery. The results of this study may serve as a guide to develop breeding strategies that improve celery quality.

Results
Illumina paired-end sequencing and de novo assembly of celery. Transcriptome sequencing was carried out on A. graveolens cv 'Ventura' to reveal the transcriptome differences among the different stages of celery leaves. In this study, 35 512 555, 58 074 820 and 29 938 235 raw reads were obtained from leaves at Stage 1, Stage 2, and Stage 3, respectively. The average length of raw reads was 100 nucleotides. After stringent quality assessment and data filtering, reads with a base quality greater than 20 were selected for further analysis. A total of 32,477,416 quality reads were recorded for the leaves at Stage 1, 53,675,555 at Stage 2, and 27,158,566 at Stage 3, respectively. The quality reads for the leaves at the three stages were combined and used to draw the transcriptome information of 'Ventura'. The high quality reads were de novo assembled by Trinity software, and default setting K-mer value of 25 was used to construct the unique consensus sequences. A K-mer value of 25 improved the N50 and average length by a minimal percentage while avoided the loss in sensitivity. Finally, all short sequences were assembled into 33,213 unigenes with an average length of 1,478 bp, a maximum length of 17,075 bp, and an N50 of 2,060 bp. The length distribution of the unigenes is illustrated in Figure S1. The sequences ranging from 200 bp to 2,700 bp in length accounted for nearly 89.8% of the total. Up to 2,918 unigenes (8.8%) and 466 unigenes (1.4%) were 2,700 bp to 5,000 bp and .5000 bp in length, respectively.
Functional classification by evolutionary genealogy of genes: Non-Supervised Orthologous Groups (eggNOG), Gene Ontology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG). The eggNOG database consists of protein sequences and is encoded in 21 complete genomes, including proteins of bacteria, algae, and eukaryotes. This database was built on classifications according to their evolutionary relationships 19 . To further evaluate the completeness of the transcriptome library and the effectiveness of the annotation process, all annotated unigene sequences were searched against the eggNOG database for functional prediction and classification. A total of 26,804 sequences were assigned to eggNOG classifications ( Figure 2). Among the 26 eggNOG categories that were assigned to unigenes, the cluster for function unknown (6,606, 24.65%) was the largest group, followed by general function prediction only (5,123,19.11%), i.e., basic physiological and metabolic functions; signal transduction mechanisms (2,226, 8.30%); and post-translational modification, protein turnover, and chaperones (1,754, 6.54%). Undetermined (1, 0.01%) represented the smallest group. The unigenes for secondary metabolite biosynthesis, transport, and catabolism accounted for 3.67% (983) of the total functional genes, which are essential for secondary metabolites that contribute to the quality and taste of celery.
GO terms were assigned to assemble unigenes and provided defined ontologies to express gene product properties. GO terms are a dynamically structured control vocabulary that is applied to describe gene product in terms of their associated biological processes, cellular components, and molecular functions 20 . In the present study, 132,740 unigenes with known functions were assigned to one or more ontologies, and each unigene was assigned to a set of GO Slims. The GO enrichment analysis of unigenes is summarized in   Metabolic process (18,380 unigenes, 13.85%) and cellular process (16,165 unigenes, 12.18%) were the most highly represented groups under the biological process category. We also identified genes involved in other important biological processes, such as cellular component organization, multicellular organismal development, post-embryonic development, reproduction, response to abiotic stimulus, response to stress, and transport. For the cellular component category, cell and intracellular were the most highly represented groups, followed by cytoplasm and membrane. Regarding molecular function, binding was the most highly represented group, with numerous ligases, hydrolases, oxidoreductases, and transferases annotated. However, a few genes were assigned to the clusters of ''abscission'', ''fruit ripening'', ''extracellular space'' and ''nuclear envelope'' and no genes were found in the clusters of ''behavior'', ''translation regulator activity'', ''proteinaceous extracellular matrix'' or ''transcription regulator activity'' ( Figure 3). These GO annotations demonstrate that 'Ventura' expresses genes that encode diverse structural, regulatory, and stress proteins.
KEGG analysis 21 demonstrated the biological pathways in which the unigenes are involved. Assembled unigenes were compared with the KEGG database using BLASTx, and the corresponding pathways were established. Among the 10,973 unigenes assigned to KEGG pathways, 3,914 were assigned to metabolism, 2,207 to human diseases, 1,858 to genetic information processing, 1,416 to organismal systems, 962 to cellular processes, and 616 to environmental information processing ( Figure 4). Among the pathways that were assigned to the unigenes, carbohydrate metabolism (1,091 unigenes) was the largest group, followed by infectious diseases (976 unigenes), translation (746 unigenes), and energy metabolism (589 unigenes). Signaling molecules and interaction (1) represented the smallest group. These annotations provide a valuable resource for investigating specific processes, structures, functions, and pathways in celery research.
Significant differences in the number of unigenes and their transcript abundances were observed among the three stages ( Figure S3). Stage 1 and Stage 2 significantly differed in gene number on the pathways of energy metabolism, biosynthesis of other secondary metabolites, replication and repair, cell growth and death, immune diseases, and substance dependence. Meanwhile, Stage 2 and Stage 3 significantly differed in gene number on the biosynthetic pathways of other secondary metabolites as well as replication and repair. In general, the three stages significantly differed in gene number on the biosynthetic pathways of other secondary metabolites as well as replication and repair. Unigenes functioning in the biosynthesis of other secondary metabolites were further analyzed. The lignin biosynthetic pathway, which is tightly related to celery quality, was selected for further analysis.
Anatomic characteristics of leaf blades and petioles from 'Ventura' at three stages. This study comprehensively investigated the structural leaf development of 'Ventura' using resin-embedding microtomy and scanning electron microscopy (SEM). As shown in Figure 5, the leaf blade gradually thickened, and the spongy and palisade mesophyll tissues were tightly arranged at the three stages. The collenchyma and vascular bundles in the leaf vein thickened and expanded during leaf growth and development. As shown in Figure 6, the collenchyma grew, the vascular bundles further expanded, the phloem and xylem extensively developed, and the epidermal cells of the petioles expanded from Stage 1 to Stage 3.
Petiole transection structure was characterized with SEM ( Figure 7).The vascular bundles in petioles serve as important channels for material transport, and collenchyma strengthen and support   Lignin distribution in celery leaf identified by histochemistry and autofluorescence microscopy. Lignification has been studied through various microscopic techniques, such as histochemistry, microautoradiography, interference microscopy, UV absorbance, fluorescence microscopy, confocal Raman microscopy, and transmission electron microscopy 24 . In the present study, lignin content was qualitatively evaluated in different tissues during leaf development to investigate the lignin accumulation pattern in celery leaves. Lignified tissues were identified through histochemistry and fluorescence microscopy. Lignin presents a broad fluorescence emission range and is excited with UV. Tissues positive to safranin O-fast green staining as indicated by red stain contained lignin.
The UV-excited fluorescence in the leaf blades and petioles of celery is shown in Figure 8. Autofluorescence was easily observed in the leaves at the three stages. Lignin fluorescence was observed in the vascular bundles and cell walls, and a strong intensity was detected in the xylem. No difference was detected in the location or intensity of lignin autofluorescence in the cell walls of the leaf blades among the three stages ( Figure 8A, C, and E). The vascular bundles in the petioles expanded with development and were intensely located in the secondary walls of xylem tissues. The largest density was recorded at Stage 3 ( Figure 8B, D, and F). Lignin was deposited in the vascular bundles and the cell walls, and strong lignification was observed in the xylem. No difference in the lignin content in leaf blade cell walls was detected among the three stages. However, the lignin content in petiole vascular bundles increased with development and was particularly high in the secondary walls of xylem tissues. Lignin accumulated in the vascular tissues at Stage 1, and the highest lignin content was recorded at Stage 3.
Safranin O-fast green reagent was used for lignin histochemical staining to qualitatively evaluate lignin content. Leaf blades and petioles were embedded in paraffin, and cross sections were stained with safranin O-fast green. Red staining indicated lignin deposition. As illustrated in Figure 9, the collenchyma and vascular bundles in the petioles showed slight lignification at Stage 1. No significant difference was detected in lignin contents in the collenchymas and epidermis between Stage 1 and Stage 2, but strong lignification was observed at Stage 3 ( Figure 9A, C, and E). Lignin accumulation in the vascular bundles increased with the stages, and strong lignification was observed in the xylem ( Figure 9B, D, and E). Only a slight deposition in vascular bundle was observed at Stage 1, and considerable deposition was previously observed at Stage 3. As shown in Figure 10, lignin accumulation occurred and gradually increased in the palisade and spongy mesophyll tissues at the three stages. The palisade and spongy tissues showed minimal lignification at Stage 1 but demonstrated strong lignification at Stage 3. No lignin deposition was observed in the leaf vein of the three stages.
Lignin distribution in different tissues and at different leaf developmental stages was qualitatively measured by histochemistry and autofluorescence microscopy. In the leaf blade tissues of celery, lignin accumulation gradually increased in the palisade and spongy tissues at the three stages. However, no difference in the lignin content of cell walls was observed among the three stages. Lignin accumulation increased in the vascular bundles, collenchyma, and epidermis in the petioles at the three stages, and the xylem showed stronger lignification.

Isolation candidate genes involved in lignin biosynthesis in celery.
Monolignol biosynthesis, which contributes to celery taste, was selected for further analysis. The 11 genes involved in lignin biosynthesis are members of multigene families in higher plant (e.g. 9 CAD in Arabidopsis, 9 CAD in rice, 3 COMT in oat grass, 5 PAL in pine). In celery, the genes (AgPAL, AgC4H, Ag4CL, AgHCT,    AgC3H, AgCCoAOMT, AgCCR, AgCAD, AgF5H, AgCOMT, AgPOD) involved in lignin biosynthesis maybe also have multiple copies ( Figure 11). The AgPAL, AgC4H, Ag4CL, AgHCT, AgC3H, AgCCoAOMT, AgCCR, AgCAD, AgF5H, AgCOMT, AgPOD genes were identified using a BLAST-based search tool from the transcriptome data of celery. In this study, the genes involved in lignin biosynthesis in celery showing high homology with the reported genes participated in lignin biosynthesis were selected. These selected genes also showed high RNA transcript levels based on the transcriptome data of different stages of celery.
Our dataset includes annotated sequences for all genes involved in monolignol biosynthesis. These genes displayed high homology to Arabidopsis or other dicot genes, and most genes presented more than 80% similarity at the protein level (data not shown). This result suggests that these genes were highly conserved during the evolution. The 11 genes that encode lignin biosynthesis-related enzymes were cloned from 'Ventura'. The nucleotide sequences of AgPAL, AgC4H, Ag4CL, AgHCT, AgC3H, AgCCoAOMT, AgCCR, AgCAD, AgF5H, AgCOMT, and AgPOD are listed in Figure S4 to S14.
Expression profiles of candidate genes involved in lignin biosynthesis in different celery tissues. In the monolignol biosynthetic pathway, quantitative real-time PCR assays were utilized to study the expression profiles of lignin biosynthesisrelated genes in different tissues in 'Ventura'. The following 11 genes involved in lignin biosynthesis were differentially expressed in the roots, stems, petioles, and leaf blades of 'Ventura': AgPAL, AgC4H, Ag4CL, AgHCT, AgC3H, AgCCoAOMT, AgCCR, AgCAD, AgF5H, AgCOMT, and AgPOD.
The 11 genes were differentially expressed in the roots, stems, petioles, and leaf blades of 'Ventura' (Figure 12). These genes may perform different functions in different celery tissues and thus affect the growth and development of celery plants. The relative expression level of AgCCoAOMT was higher in the roots than in the other three tissues, and no significant difference in its expression was detected among the stems, petioles, and leaf blades. The transcript levels of AgCOMT were the highest in the stems, followed by the petioles and leaf blades, and then the roots. The transcript levels of AgC4H, AgF5H, and AgHCT were higher in the petioles than in the other three tissues. The relative expression levels of AgC4H and AgHCT were slightly lower in the roots and leaf blades and remained low in the stems. The expression level of AgF5H was lower in the roots and stems than in the leaf blades and petioles. The transcript levels of AgPAL, AgC3H, Ag4CL, AgCCR, AgCAD, and AgPOD were higher in the leaf blades than in the other three tissues. The relative expression levels of AgPAL, AgC3H, AgCCR, and AgCAD were lower in the roots, stems, and petioles than in the leaf blades, with no significant difference among the three former tissues. The transcript level of Ag4CL was slightly low in the stems and remained relatively low in the roots and petioles, with no significant difference between the two latter tissues. The relative expression of AgPOD was relatively low in the roots and petioles with no significant difference, and the lowest expression level was observed in the stems (Figure 12).
Expression profiles of candidate genes involved in lignin biosynthesis at different developmental stages in celery. The 11 genes mentioned above were expressed in the petioles and leaf blades of 'Ventura' at the three stages.
In petiole. The relative transcript level of AgCOMT in the petioles increased and peaked at Stage 1, decreased at Stage 2, and then continued to decrease and diminish at Stage 3. The transcript levels of AgC3H, Ag4CL, and AgPOD gradually increased among the three stages, with no significant difference in the relative expression of AgC3H and AgPOD between the two former stages. The relative expression levels of AgPAL, AgC4H, AgF5H, AgHCT, AgCCoAOMT, AgCCR, and AgCAD increased at Stage 2 and then decreased at Stage 3. The transcript levels of AgPAL, AgC4H, and AgCCoAOMT were higher at Stage 3 than at Stage 1, but that of AgHCT was lower at Stage 3 than at Stage 1. No significant difference in the transcript levels of AgF5H, AgCCR, and AgCAD was detected between Stage 1 and Stage 3 ( Figure 13).  In leaf blade. The transcript levels of AgPAL, AgC4H, and AgCOMT in the leaf blades gradually decreased at the three stages. No significant difference in AgC4H transcript was found between Stage 2 and Stage 3, and AgCOMT transcript was undetected at these two stages. A deficiency in AgCOMT reduces the quantity of S monomers and thus affects lignin composition 25 . In other words, the content of S monomers would no longer increase at the two latter stages. The expression levels of AgF5H and Ag4CL gradually increased at the three stages and peaked at Stage 3. The relative expression levels of AgC3H and AgPOD decreased at Stage 2 but increased at Stage 3. The transcript levels were slightly lower at Stage 2 than at Stage 1. The transcript levels of AgHCT, AgCCR, and AgCAD increased at Stage 2 and decreased at Stage 3. AgHCT transcript level was higher at Stage 1 than at Stage 3. No significant differences in AgCCR and AgCAD transcript levels was detected at Stage 1 and Stage 3. The relative expression of AgCCoAOMT was not significantly different among the three stages ( Figure 13).

Discussion
With the introduction of NGS, transcriptome sequencing has become an important tool in research because of its low cost and high throughput 26,27 . Short reads from Illumina high-throughput paired-end sequencing can be well assembled and used for transcriptome analysis and gene identification 28 . In the present study, larger amount raw reads were obtained from celery leaves collected at Stage 1, Stage 2, and Stage 3, respectively. The reads were assembled into 33,213 unigenes with an average length of 1,478 bp, a maximum length of 17,075 bp, and an N50 of 2,060 bp. The de novo assembly was proven to be of high quality by comparison with the reported celery databases. The unigenes obtained in the present study were basically in accordance with those reported for celery, but the average length was longer than that reported using the same platform 29 .
Considering the difficulty in estimating the number of genes and predicting the potential functions of the transcripts because of lacking reference genome, we conducted BLAST analysis using the public protein databases to indirectly identify genes. Detailed functional information is important to understand the overall expression profiles of 'Ventura' 18,30,31 . Large numbers of unigenes were assigned to various eggNOG 32 , GO 33 , and KEGG 34 classifications. To evaluate the completeness of the transcriptome library and the effectiveness of the annotation process, annotated unigene sequences were searched against the eggNOG database for functional prediction and classification. A total of 26,804 sequences were assigned to eggNOG classifications 35 . Among the 26 eggNOG categories, the cluster for function unknown represented the largest group. GO terms were assigned to assembled unigenes, and defined ontologies representing gene product properties were provided 36 . In the present study, 132,740 unigenes with known functions were assigned to one or more ontologies, and each unigene was assigned to a set of GO Slims. The gene numbers of each subcategory varied, and the variety of proteins encoded differed among the three stages. KEGG analysis can appropriately demonstrate the various biological pathways 34 . In the present study, 10,973 unigenes were assigned to KEGG pathways. The number of unigenes and their transcript abundances significantly differed among the three stages in 'Ventura'. The number of genes on the biosynthetic pathways of other secondary metabolites significantly varied among the three stages. Further analysis revealed unigenes involved in lignin biosynthesis 37 .
Lignin accumulation in different tissues and at different stages is closely related to the anatomic characteristics of 'Ventura'. Lignin is usually distributed in the cell walls of mechanical and vascular tis- sues. Lignin increases the intensity and the watertightness of cell walls, enhances the mechanical strength of stems, and improves material transport ability 38 . In the present study, lignin distribution in different tissues and at different stages was qualitatively measured by histochemistry and autofluorescence microscopy. In the petioles, the collenchyma increased in size, the vascular bundles further expanded, and the phloem and xylem extensively developed. Consistent with lignin distribution, lignin accumulation increased in the petioles, and the xylem exhibited strong lignification. In leaf blade tissues, the spongy and palisade mesophyll tissues in the leaf blades further developed and increasingly became tightly arranged at the three stages. These results were consistent with that lignin accumulation was gradually increased in the palisade and spongy tissues at the three stages.
Lignin accumulation in different tissues and at different stages is typically correlated with the activity of enzymes involved in lignin biosynthesis. This relationship has been reported in several plants, including Arabidopsis thaliana, rice (Oryza sativa), poplar (Populus tomentosa), jute (Corchorus capsularis), and eucalyptus (Eucalyptus globules) 39 . In the present study, lignin accumulation positively correlated with the transcription levels of genes that encode lignin biosynthesis-related enzymes in 'Ventura' (Figure 8-13). Lignin accumulation was gradually increased in the palisade and spongy tissues in the leaf blades at the three stages. Meanwhile, lignin accumulation increased in the vascular bundles, collenchyma, and epidermis, and the xylem exhibited strong lignification. Consistent with the observed lignin accumulation patterns, the expression levels of AgPAL, AgC4H, Ag4CL, AgHCT, AgC3H, AgCCoAOMT, AgCCR, AgCAD, AgF5H, AgCOMT, and AgPOD remained high in the petioles and leaf blades of 'Ventura' at the three stages. The transcript levels of these 11 genes were regulated during the leaf stage and remained high prior to the increase in lignin levels, suggesting the important roles of these genes in lignin biosynthesis in celery.
PAL is a rate-limiting enzyme in the phenylpropanoid pathway 40 . In the present study, PAL indirectly participated in monolignol biosynthesis and served as an intermediate link of phenolic biosynthesis in celery. The expression of AgPAL was discrepant in the different tissues, and it was higher in the leaf blades than in the other tissues. This finding suggests that this gene has different functions in celery growth and development. 4CL is another key enzyme involved in lignin biosynthesis, the last step of phenylpropanoid metabolism; this enzyme catalyzes 4-cinnamic acid into 4-cinnamoyl CoA and participates in the biosynthesis of other secondary metabolites, such as phytoalexins, flavonoids, and phenylpropanoid 41 . The transcript level of Ag4CL gradually increased in the petioles and leaf blades at the three stages; this result coincided with the increasing lignin accu- mulation at the three stages. Moreover, the transcript level of Ag4CL was higher in the leaf blades than in the petioles. Ag4CL downregulation can decrease lignin content but increase cellulose content 42 . Therefore, the ratio of lignin to cellulose maybe controlled by regulating Ag4CL expression to improve the quality of industrial crops.
CCR and CAD, which specifically catalyze the final steps of monolignol biosynthesis, may determine the total lignin content and composition in a plant. CCR catalyzes the reaction from thioesters, the products of 4CL, to their corresponding cinnamaldehydes 43 . CAD catalyzes the final reduction of hydroxyl-cinnamaldehydes to the corresponding alcohols 37 . In the present study, the transcript levels of AgCCR and AgCAD were higher in the leaf blades than in the other tissues; in addition, these levels increased at Stage 2 and decreased at Stage 3. Lignin distribution and accumulation were observed in the palisade and spongy tissues of leaves at the three stages but can only be found in the vascular bundles, collenchyma, and epidermis of the petioles. In general, lignin distribution increased in the petioles and leaf blades at the three stages. Repressing the activities of CAD and CCR, which play significant roles in plant lignification, severely affects lignification formation during plant growth and development 44 . The lignin content of celery may be controlled by regulating the transcript levels of AgCCR and AgCAD, which help improve the comprehensive utilization of celery.
HCT affects lignin content and significantly changes lignin composition. HCT downregulation greatly reduces lignin content and markedly increases the proportion of H units as compared with G and S units 45 . AgHCT transcripts were higher in the petioles than in the other tissues. The relative expression of AgHCT was higher at Stage 2 than at Stage 3, and the transcript level of AgHCT was lower at Stage 3 than at Stage 1. The proportion of H units may be higher in the leaf blades than in the petioles, and the increased proportion of H units at Stage 3 may be attributed to AgHCT downregulation.
F5H and COMT play key roles in the formation of syringyl units. F5H catalyzes the 5-hydroxylation of coniferaldehyde and coniferyl alcohol, which are then methylated by COMT 25 . The reactions either directly generate sinapyl alcohol or produce sinapaldehyde, which is reduced to sinapyl alcohol by the action of CAD 46 . AgF5H or AgCOMT downregulation strongly reduces Sunit content, whereas F5H upregulation increases S-unit content 47 . The transcript level of AgF5H gradually increased at the three stages and peaked at Stage 3. The transcript level of AgCOMT was undetected in the petioles at Stage 3 and in the leaf blades at Stage 2 and Stage 3. In celery, G units but not S units would be produced in the petioles at Stage 3 and in the leaf blades at Stage 2 and Stage 3.The lignin composition in the different tissues maybe controlled by AgF5H or AgCOMT regulation. Lignin biosynthesis is closely related to the quality of celery, a vegetable rich in various nutrients, such as vitamin C, folic acid, beta-carotene, calcium, magnesium, and potassium 48 . The taste of celery is affected by cellulose and lignin contents, whereas the quality of this vegetable is considerably influenced by lignin synthesis and related enzyme activity 49 . The main eaten part of celery were leaves, such as petioles and leaf blades. During celery growth and development, lignin content of the cell walls increased and led increased cell walls thickness. Then, tissues of celery become lignified with the increasing lignin concentration. At sufficient lignin concentration, celery leaves become lignified and inedible. Regulating lignin synthesis in celery growth and development is important to the quality of celery.
Celery is an important vegetable with low calorie and rich in dietary fiber, which will help human to keep healthy. In China, 'Ventura', 'Liuhe Huangxinqin' and 'Jinnan Shiqin' were the important three celery cultivars. 'Liuhe Huangxinqin' and 'Jinnan Shiqin' were originated from China, while 'Ventura' is originated from the United States and introduced to China. 'Ventura' and 'Jinnan Shiqin' have similar phenotypes. 'Liuhe Huangxinqin' is a local cultivar from Nanjing, which distinguishes it from the other two celeries in phenotypes 18,30 . Till now, 'Ventura' is well known for the main celery line used to genetic research 30 . Here, to investigate the lignin biosynthesis during celery leaf development, 'Ventura' was selected for de novo assembly, transcriptome characterization, lignin accumulation, and anatomic characteristics.
In this research, we elucidated the mechanism of lignin formation at the different growth and development stages of celery. The corresponding enzymes and genes involved in lignin biosynthesis were also identified. The lignin biosynthesis in celery was regulated by the genes that encode the key enzymes involved in this process ( Figure 11). The monolignols are synthesized from phenylalanine through corresponding enzymes of hydroxylation (C4H, C3H, F5H, 4CL, and HCT), O-methylation (COMT and CCoAOMT) and reduction reactions (CCR and CAD), and finally lignin is formed by the dehydrogenation polymerization enzymes (POD or LAC) 12,13,16,17 . Profiling the expression of genes that encode lignin biosynthesis-related enzymes and understanding lignin distribution may help elucidate the regulatory mechanisms of lignin biosynthesis in celery. Our study also supplied useful resources for understanding the mechanism of lignin biosynthesis and the interaction of these genes involved in plants.

Methods
Plant material preparation. Seeds of 'Ventura' were deposited in the State Key Laboratory of Crop Genetics and Germplasm Enhancement, Nanjing Agricultural University, Nanjing, China. The seeds were sown in plastic basins containing peat:vermiculite (251,v/v) and then grown in an artificial climate chamber in April 2014. The artificial climate chamber was programmed for 16 h/8 h at 25uC/15uC for day/night and 3000 lux of light intensity. Management of fertilizer and water conditions was consistent.
In this study, the leaf developmental stages of celery were defined as follows: Stage 1: leaf length was 10 cm, Stage 2: leaf length was 20 cm; and Stage 3: leaf length was 30 cm (Figure 1). Leaves at the three stages were selected for transcriptome analysis and sequenced with an Illumina Hi-seq 2000 platform (Shanghai Personal Biotechnology Co., Ltd.). The petioles and leaf blades at each stage and in different tissues (root, stems, petioles, and leaf blades) in 'Ventura' were obtained, immediately frozen in liquid nitrogen, and then stored at 280uC until use for total RNA isolation. Petioles and leaf blades at the three stages were prepared for resin-embedded sections stained with methylviolet, paraffin sections stained with safranin O-fast green, scanning electron micrographs, and fluorescence micrographs.
Transcriptome sequence processing and annotation. Illumina reads processing and assembly. A Perl script was written to remove low-quality sequences (reads with a base quality less than 20), and all sequences # 50 bp in length were discarded. Quality reads were assembled into contigs, transcripts, and unigenes with Trinity software (http://trinityrnaseq.sf.net). Multiple k-mers potentially guarantee high sensitivity, particularly against low-expressed genes 50 . De novo assembly was carried out using a k-mer value of 25 RPKM (reads per kilobase of exon model per million mapped reads) to normalize the abundance of transcripts 51 . A two fold differential was used to identify differentially expressed genes between two stages.
The orthologous gene products are classified in the eggNOG database 32 . The unigenes were aligned to the eggNOG database to predict and classify the possible functions of the unigenes (http://eggnog.embl.de/version-3.0/, http://www.ncbi.nlm. nih.gov/COG/). GO offers a dynamic-updated controlled vocabulary and comprehensively describes the properties of genes and their products in any organism. Unigenes can be classified according to GO terms within molecular functions, cellular components, and biological processes 53 . Functional category assignment was conducted using Blast2GO (http://www.blast2go.com/).
The KEGG database contains metabolic pathways, which represent molecular interactions and reaction network. Pathway assignments were carried out on the basis of the KEGG database (http://www.genome.jp/kegg). Unigenes were assigned to special biochemical pathways according to the KEGG database using BLASTx, and then a Perl script, retrieving KO (KEGG Orthology) information from BLASTx results, was developed. We established the pathway correlation between unigenes and database 54 . Isolated the genes involved in lignin biosynthesis from 'Ventura'. Genes involved in monolignol biosynthesis and celery quality were analyzed as illustrated in Figure 11. Arabidopsis or other dicot protein sequences from public databases were aligned with their corresponding genes, which were obtained from key word searches using TBLASTN (E-value threshold of 1e 25 ) 55,56 . Genes that encode lignin biosynthesis-related enzymes were cloned from 'Ventura'. Primers were designed by Primer Premier 5.0 (Table 1). cDNA sequences of 'Ventura' were used as templates for PCR amplification. The reaction volume for PCR amplification was 50 mL525 mL of Premix Ex Taq, 2 mL of template cDNA, 1 mL of each primer (20 mM), and 21 mL of sterile distilled water. The PCR reactions were run using the following program: denaturation at 94uC for 5 min; 35 cycles of 30 s at 94uC, 30 s at Tm (annealing temperature), and 90 s at 72uC; and a final step at 72uC for 10 min. The PCR products were detected and recovered by 1.2% agarose gel electrophoresis, ligated into a pMD19-T vector, and then transformed into Escherichia coli. After extracting the plasmid, PCR detection was conducted, and the bacteria liquid was sequenced. Primer synthesis and gene sequencing were accomplished by GenScript (Nanjing, China).
RNA extraction and quantitative real-time PCR analysis. The total RNA sequences were extracted by an RNAsimple Total RNA Kit (Tiangen-bio, Beijing, China) according to the manufacturer's instructions. RNA quality and purity were assessed with OD260/280 ratio and determined using a Nanodrop 2000 spectrophotometer (Thermo Scientific, Wilmington, DE). Approximately 10 mg of each sample was reverse transcribed into cDNA using a Prime Script RT reagent Kit (TaKaRa, Dalian, China).
The genes that encode lignin biosynthesis-related enzymes were subjected to quantitative real-time PCR analysis to confirm their expression. Quantitative realtime PCR was experimented to analyze the gene expression at the transcript level in an ABI 7300 Real-Time PCR System and 7300 System software. It was conducted in accordance with the manufacturer's instructions of a Real-Time PCR Kit (SYBR Green) (TaKaRa, Dalian, China). Primers were designed by Primer Premier 5.0 according to the cloned sequences derived from the genes that encode lignin biosynthesis-related enzymes in celery and selected after testing their specificity at 58uC. Agactin was used as an internal control. The primers used for quantitative real-time PCR are listed in Table 2. PCR reactions were performed as follows: 94uC for 30 s, followed by 40 cycles of 94uC for 5 s, and 58uC for 30 s. A melting curve (61cycles at 65uC for 10 s) was generated to check amplification specificity. Each reaction had three biological repeats. Ct values were represented by the mean values of three independent replicates, and the relative gene expression levels were calculated using the DD Ct method 57 . The standard errors of the mean among the replicates were calculated. Primers were synthesized by GenScript (Nanjing, China).
Resin-embedding microtomy. In this study, resin-embedding sections were prepared to analyze the inner structure of celery. Petioles and leaf blade specimens at the three stages were cut from healthy 'Ventura' plants using a razor blade. Petiole specimens were cut from the middle part, leaf blade specimens from the main leaf vein, and mesophyll cells near the vein. Specimens of approximately 2 mm 3 were prefixed with 2.5% glutaraldehyde at 4uC overnight, which was formulated with 0.05 mol?L 21 dimethyl sodium arsenate buffer (pH 7.2). The specimens were washed and dehydrated with an ethanol series of 30% to 100% and then embedded in Spurr's low-viscosity embedding medium 58 . Sections of 1 mm thickness were cut with a glass knife using a Leica Ultracut R (Germany) and then stained with 0.5% methylviolet for 10 min. The structures of the petioles and leaf blades were observed from the sections, and pictures were taken using a CCD camera under a Leica DMLS light microscope.
Scanning electron microscopy. Petiole transection structures were observed through SEM. Petiole specimens at Stage 3 were vacuum fixed in 3% glutaraldehyde at 4uC overnight and then washed four times with fresh PBS solution. The specimens were postfixed in 1% osmiun tetroxide for 2 h on ice and then washed with fresh PBS solution. The samples were dehydrated in an ethanol series from 30% to 100% and then vacuum dried. The dried samples were installed on aluminum stubs and then coated with gold palladium. Petiole transection structures were examined under a Zeiss DSM-960A SEM at an accelerating voltage of 5 kV. Photographs were collected and assembled using PHOTOSHOP (Adobe, http://www.adobe.com).
Histochemistry and autofluorescence microscopy. Lignified tissues were identified in the sections through histochemistry and fluorescence microscopy. The petioles and leaf blade specimens at the three stages were fixed in FAA (50 mL of 40% formaldehyde, 50 mL of glacial acetic acid, and 90 mL of 50% ethanol) for 24 h, dehydrated with ethanol, and then embedded in paraffin. Sections of 5 mm thickness were cut with a glass knife using a Leica Ultracut R (Germany).
For safranin O-fast green staining, sections were stained with 1% safranin O in distilled water (pH 6.7) for 10 min, washed, and then counterstained with a 0.1% solution of fast green in water for 5 min. Tissue sections with lignin were stained and observed under a light microscope. The presence of lignin was considered when the tissues were stained with red.
Lignin has a broad range of fluorescence emission and is excited with UV 59 . Xylemlignified cell walls exhibit autofluorescence 60 . To visualize lignin autofluorescence, tissue sections with lignin were visualized by fluorescence microscopy under UV excitation (330 nm to 380 nm). The fluorescence images were collected with a CCD camera.