Identification and developmental expression profiling of putative alkaloid biosynthetic genes in Corydalis yanhusuo bulbs

Alkaloids in bulbs of Corydalis (C.) yanhusuo are the major pharmacologically active compounds in treatment of blood vessel diseases, tumors and various pains. However, due to the absence of gene sequences in C. yanhusuo, the genes involved in alkaloid biosynthesis and their expression during bulb development remain unknown. We therefore established the first transcriptome database of C. yanhusuo via Illumina mRNA-Sequencing of a RNA composite sample collected at Bulb initiation (Day 0), early enlargement (Day 10) and maturation (Day 30). 25,013,630 clean 90 bp paired-end reads were de novo assembled into 47,081 unigenes with an average length of 489 bp, among which 30,868 unigenes (65.56%) were annotated in four protein databases. Of 526 putative unigenes involved in biosynthesis o f various alkaloids, 187 were identified as the candidate genes involved in the biosynthesis of benzylisoquinoline alkaloids (BIAs), the only alkaloid type reported in C. yanhusuo untill now. BIAs biosynthetic genes were highly upregulated in the overall pathway during bulb development. Identification of alkaloid biosynthetic genes in C. yanhusuo provide insights on pathways and molecular regulation of alkaloid biosynthesis, to initiate metabolic engineering in order to improve the yield of interesting alkaloids and to identify potentially new alkaloids predicted from the transcriptomic information.

Scientific RepoRts | 6:19460 | DOI: 10.1038/srep19460 a mean size of 90 bp and the Q20 percentage of 96.22% were generated and sequentially de novo assembled by SOAP into 158,101 contigs, 65,839 scaffolds and 47,081 unigenes ranging from 150 to 7973 bp with the mean length of 489 bp and an N50 length of 681 bp (Table 1). A total of 9,754,893 clean reads, accounting for 39.0% of all raw reads, were uniquely mapped to 47,058 unigenes. Assembly quality analysis revealed that 74.10% of the contigs were shorter than 200b p and only 17,292 scaffolds (26.26%) and 17,274 unigenes (36.69%) were longer than 500 bp (Fig. S1a). However, 79.44% of the scaffolds (52,300) and 98.98% of the unigenes (46,132) had no gap (Fig. S1b).
To reveal the putative functions of the unigenes expressed in C. yanhusuo developing bulbs, the BLASTx searches was made against Nr, Swiss-Prot, KEGG and COG protein databases and detected 30,850 protein-coding unigenes (65.53%). Another 2,818 unigenes (5.98%) were identified by ESTScan. The lengths and gap distributions of the predicted protein-coding unigenes were displayed in Fig. S2a,b. Almost all the BLASTx and ESTScan hits had no gap in their aligned or predicted protein-coding regions.
Overview of differentially expressed genes during C. yanhusuo bulb development. To investigate dynamic changes of transcript abundance during bulb development, we performed separate RNA-seq analyses on the bulbs Day 0, Day 10 and Day 30 and evaluated the expression abundances using RPKM-normalized read counts. By means of the Illumina Hiseq 2000 sequencing platform, 6.9-7.5 million single-end (SE) clean reads were generated in the samples (Table 2). Using our assembled C. yanhusuo unigenes as the reference transcriptome, 44.8-59.6% of the unique clean reads (3.1-4.1 million) were uniquely mapped, representing 99.8% of the total mapped reads. 45,289, 45,547 and 44,831 reference unigenes were identified in the corresponding individual samples, indicating that our sequencing depth was enough to approach its saturation. The distributions of unigene coverage were similar between the three sampling times, making them comparable (Fig. S5).
Using only uniquely-mapped SE reads, we detected a total of 46,791 assembled unigenes (99.38%) expressed during C. yanhusuo bulb development, of which 46,678 had RPKM ≥ 2 in at least one developmental stage and 37,943 had RPKM ≥ 2 at all three studied stages. However, the majority (~80%) were expressed at levels of between 2 and 100 RPKM (Fig. 3a) and 42,955 unigenes were expressed at all three developmental stages (Fig. 3b), 1,387 were co-expressed at Day 0 and Day 10, 917 were co-expressed at Day 10 and Day 30, and 662 were co-expressed in Day 0 and Day 30. About 300 genes were exclusively expressed at one specific developmental stage. With a threshold of FDR ≤ 0.001 and |log2Ratio| ≥ 1, pairwise developmental stage comparisons revealed 13,354 unigenes with RPKM values ranging from 0 to 24867 and showing expression differences in at least one bulb stage. This represented 28.36% of the bulb transcriptome (Fig. 3b) (Fig. 3c). Figure S6 showed the distribution of developmentally modulated transcripts in 24 COG functional categories. Besides General function prediction only, Carbohydrate transport and metabolism and Transcription were the next two prominent categories in both the Day 10/0 and Day 30/0 comparisons. In the Day 30/10 comparison, Transcription and Posttranslational modification, protein turnover, chaperones were dominant. KEGG pathway analysis revealed a total of 3,209 DEGs, accounting for 69.03% of all KEGG-annotated DEGs. These were assigned to 276 basic reference pathways of five modules, among which metabolic pathways and biosynthesis of secondary metabolites dominated in all of three comparisons (Table S3). Sixty-eight, 26 and 59 pathways were enriched in Day 10/0, Day 30/10 and Day 30/0, respectively (Table S4), among which, there were 19 common enriched KEGG pathways, including Biosynthesis of secondary metabolites, Starch and sucrose metabolism, Glycolysis/Gluconeogenesis, Ascorbate and aldarate metabolism in carbohydrate metabolism and Tyrosine metabolism and Phenylalanine metabolism in the amino acid metabolism group. Phytohormones such as CK, auxin and GA and the underlying genes have important roles in the initiation and development of plant organs [20][21][22] . Unigenes involved in Plant hormone signal transduction were also significantly enriched in the comparisons of Day 10/0 and Day 30/0, suggesting that phytohormones play important roles in regulating the initiation and enlargement of bulbs and should be a future topic for research. GO analysis revealed that biological processes of the DEGs including carbohydrate metabolic process (GO:0005975) and response to stimulus (GO:0050896) were mainly enriched in Day 10/0 and Day 30/0 (Table S5), indicating that the bulb enlargement and maturation was associated with storage metabolism and stimulus signaling.

Putative alkaloid biosynthetic unigenes identified by KEGG-mapping analysis.
Transcriptome-wide KEGG pathway analysis showed that 1,730 unigenes were involved in biosynthesis of secondary metabolites in C. yanshusuo bulbs (Table S2). Among them, 72 were annotated to participate in alkaloid

Figure 4. Expression patterns of BIAs biosynthetic unigenes during C. yanhusuo bulb development.
(a) Biosynthetic pathways leading to major benzylisoquinoline alkaloids (BIAs) are based on Beaudoin and Facchini 26 . Pathways: black arrows 1-benzylisoquinoline; dark blue arrows, papaverine; green arrows, morphinan; purple arrows, phthalideisoquinoline; light blue arrows, protoberberine; yellow arrows, bisbenzylisoquinoline; grey arows, aporphine; pink arrows, benzo[c]phenanthridine; Enzymes for which corresponding genes have been isolated from other plants were shown in green. Enzymes for which corresponding genes have not been isolated from any plant are shown in black. The number of identified C. yanhusuo unigenes is shown in () following the abbreviated nameof the corresponding enzyme whose full name can be found in Table S1. Intermediates or alkaloid compounds that have been identified in C. yanhusuo are shaded in gray. Although coptisine, tetrahydrojatrorrhizine, jatrorrhizine, allocryptopine, N-methyltetrahydrocolumbamine, N-methyltetrahydrocoptisine and columbamine are also present in C. yanhusuo bulbs, their final biosynthetic steps mainly involved STOX 28 and are not displayed here since STOX catalyzes the formation of many such metabolites. Two unigenes identified as Pavine N-methyltransferase are not listed here. Abbreviations unlisted in Table S1 Table S7. Unigene IDs are shown on the left. On the right of the grids, the numbers of differentially expressed BIA unigenes and DEGs with similar expression patterns are shown in (), following the abbreviated names of the corresponding enzymes. Expression of the enzymes functioning in multiple branch pathways is displayed just once. biosynthesis (Table S6). Of the five assigned alkaloid biosynthetic pathways, isoquinoline alkaloid biosynthesis (49 unigenes) followed by Tropane, piperidine and pyridine alkaloid biosynthesis (30 unigenes) and Indole alkaloid biosynthesis (18 unigenes) was well represented in the transcriptome. This was consistent with the main types of abundant alkaloids found in C. yanhusuo bulb extracts, such as glaucine, tetrahydropalmatine, dehydrocorydaline and palmatine 14,23 . Isoquinoline alkaloid biosynthesis was enriched in all bulb stages and 34 unigenes in this pathway were regulated by bulb developmental stage (Table S4). However, it should be noted that some KEGG-identified unigenes may participate indirectly in alkaloid biosynthesis, functioning well upstream pathway in the alkaloid biosynthesis pathways, or as a linkage between primary and secondary metabolisms. For example, aspartate aminotransferase was associated with the biosynthesis of tyrosine-derived isoquinoline alkaloids, tropane, piperidine and pyridine alkaloids, because the product, aspartate, can participate in the biosynthesis of tyrosine as an indirect precursor 24 . To complement KEGG mapping results, we further examined the presence and expression of C. yanhusuo unigenes homologous to known alkaloid biosynthetic genes (Table S1).
Candidate genes with homology to known BIAs biosynthetic genes. Most of the reported alkaloids in C. yanhusuo are BIAs 13 . The biosynthesis of BIAs including the routes and enzymes involved in the synthesis of each one is relatively clear, although not complete [25][26][27] (Fig. 4a). Among the known intermediates /end-products in BIAs biosynthesis pathways, only a few have been identified in C. yanhusuo bulbs, as shaded in grey in Fig. 4a. However, homologs of almost all the known enzymes were identified in the C. yanhusuo bulb transcriptome (Table S7, Fig. 4a), indicating that BIA biosynthesis among different BIA-producing plant species shared most of common steps, especially those in the upstream pathways. For example, (S)-reticuline is the key intermediary branch-point in biosynthesis of different types of BIAs and exists in a varied abundance in many BIAs-producing plants 28 . Homologs that we failed to identify in C. yanhusuo are mainly downstream in the BIAs biosynthesis pathway, and include reticuline 7-O-methyltransferase (7OMT), norreticuline 7-O-methyltransferase (N7OMT), berbamunine synthase (BS), corytuberine synthase (CTS), salutaridine synthase (SalSyn) and columbamine O-methyltransferase (CoOMT). The majority of functionally characterized BIAs biosynthetic genes (Table S1) were cloned from opium poppy (papaver somniferum), which, like C. yanhusuo, belongs to family Papaveraceae and is thought to be the only source of morphine, codeine, sanguinarine and papaverine 29 . This may indicate that synthesis of the substrates /alkaloids by the downstream enzymes (BS, CTS, N7OMT and 7OMT) are themselves species-specific and deficient or undetectable in C. yanhusuo. However, this needs to be further biochemically investigated in C. yanhusuo.
DEG analysis revealed that BIAs biosynthetic unigenes were highly regulated during bulb development (Table  S7, Fig. 4b). Among 187 putative BIA biosynthetic unigenes, 147 encoding 24 enzymes except for tyrosine aminotransferase and berberine bridge enzyme were differentially expressed across the three studied developmental stages. One hundred and fourteen DEGs were significantly up-regulated durign bulb development, reaching their maximum expression levels at early bulb enlargement (Day 10) or at maturation (Day 30) (Fig. 4b). These up-regulated unigenes were distributed in all 24 gene families, indicating that biosynthesis of various BIAs is synchronously regulated in both the early route (reticuline biosynthesis) and the late steps after branch points. In addition, 33 DEGs from gene families encoding NCS, STOX, TNMT, MSH, P6H, SanR, and COR were highly expressed at bulb initiation but less at the later stages. However, some members in these gene families underwent the increased expression with bulb development. Five DEGs of TYDC which yields tyramine or L-DOPA for tyrosine-derived BIA biosynthesis were down-regulated at bulb enlargement and then increased at maturation, showing a different expression pattern from other BIA biosynthetic DEGs. This difference may indicate that expression of TYDC is regulated by multiple factors since its products tyramine or L-DOPA serve not only as the distant precursors to tyrosine-derived BIA biosynthesis but also as immediate precursors to various amines and amides.
Candidate genes for non-isoquinoline-type alkaloids biosynthesis in C. yanhusuo bulbs. Hither to, no phytochemical work has been conducted to determine whether C. yanhusuo contains other types of alkaloids such as indoles and tropane alkaloids. Although some of these types are mainly or specifically found in certain plant families, several indole alkaloids were isolated from the isoquinoline-producing plants Corydalis saxicola 30 and papaveraceous species 31 , indicating that non-isoquinoline alkaloids may also be synthesized in C. yanhusuo. Furthermore, KEGG analysis identified the involvement of some C. yanhusuo unigenes in biosynthesis of non-isoquinoline-type alkaloids (Table S6). Therefore, we also examined the presence and expression of putative C. yanhusuo homologs involved in the biosynthesis of non-isoquinoline-type alkaloids including indole alkaloids (Fig. 5), Trapone and Pyridine alkaloids (Fig. 6) and Betalains (Fig. 7) (full information is provided in Table S8. Known genes for the indole alkaloid biosynthesis, which were mainly isolated from Catharanthus roseus (Table S1). participate in the biosynthesis of monoterpenoid indole alkaloids (MIAs). Through the two branch pathways (terpenoid-forming and indole pathways), the upstream MIAs biosynthesis pathway synthesizes strictosidine, the common final precursor for various downstream MIAs (Fig. 5a). By sequence similarity searches, we identified homologs of all the known MIAs genes in the C. yanhusuo transcrtiptome, except for 7-deoxyloganic acid-7-hydroxylase (DL7H) and secologanin synthase (SLS) involved in the strictosidine biosynthesis and downstream MIAs biosynthetic genes including two O-acetyltransferase genes (Deacetylvindoline-4-O-acetyltransferase (DAT) and Minovincinine-19-hydroxy-O-acetyltransferase(MAT)), Tetrahydroalstonine synthase and N-methyl-3-aminomethylindole N-methyltransferase (Table S8, Fig. 5a). DL7H and SLS are important enzymes in the terpenoid-forming pathway that synthesizes secologanin. Both belong to the CYP72A subfamily and share about 50% sequence identity 32 . By tBlastn searches, we identified three C. yanhusuo unigenes sharing 40% identity with DL7H and SLS, but belonging to CYP734A subfamily (data not shown). Identification of intermediates /metabolites such as secologanin in the future will further clarify whether C. yanhusuo bulbs synthesize secologanin-derived MIAs and confirm the involvement of related biosynthetic genes identified in Scientific RepoRts | 6:19460 | DOI: 10.1038/srep19460 this study. We were able to identify the C. yanhusuo homologs for 12 of 17 known trapone and pyridine alkaloid biosynthetic genes and all 7 known betalaine biosynthetic genes in our transcriptome (Figs 6 and 7). Expression analysis revealed that developmentally regulated MIAs biosynthetic unigenes were mainly found in downstream synthesis routes of specific MIAs (Fig. 5b). In upstream secologanin and strictosidine biosynthesis, only four gene families, including geraniol 10-hydroxylase, 10-hydroxygeraniol oxidoreductase, 7-deoxyloganetic acid synthase and strictosidine glucosidase had differentially expressed genes across the three developmental stages. Similar to BIAs biosynthetic unigenes, expressions of gene families involved in biosynthesis of trapone and pyridine alkaloids and betalaine alkaloids were collectively regulated by the bulb development (Figs 6b and 7b).
The representativeness of the known genes participating in the biosynthesis of acridone alkaloid, capsaicinoids, caffeine and steroidal glycoalkaloids in C. yanhuso was similar to BIAs biosynthetic genes (Table S8). That is, genes encoding enzymes involved in early or certain biosynthetic steps of these alkaloids or their derivatives were present among C. yanhuso unigenes. However, no homologs of the key enzymes especially in the branch-point or final steps of respective alkaloid biosynthesis were found in our current transcriptomic database. This finding is not surprising, because the processes in which the identified genes were involved are not only confined to alkaloid metabolism but are also connected with many other pathways in the plant metabolic network. For example, the intermediates/products of the phenylpropanoid pathway that produces phenylalanine precursor for capsaicinoid biosynthesis 33 also have roles in synthesis of lignin and phenolics 34 . These unidentified enzymes included anthranilate N-methyltransferase, acridone synthase, four caffeine biosynthetic enzymes, cocaine synthase, capsaicin synthase and so on (Table S8), indicating that C. yanhusuo bulbs may not synthesize some kinds    53 and Verma et al. 54 . Enzymes for which corresponding genes have been isolated from other plants are shown in green next to the arrows. Two or more known steps between two intermediates, whose cDNA sequences were not isolated, are omitted here but indicated by two continuous arrows. Black arrows, secologanin and strictosidine biosynthesis; blue arrows, ajmaline and raucaffricine biosynthesis after the branch-point intermediate aglycone; pink arrows, tabersonine-derived alkaloid biosynthesis after the branchpoint intermediate aglycone, represented by vindoline and vinblastine biosynthesis. Full names of enzymes represented by their abbreviated names can be found in Table S1 56 and Bedewitz et al. 57 . Black arrows, N-methyl-Δ 1 -pyrrolinium cation biosynthesis; blue arrows, various tropane alkaloids biosynthesis; pink arrows, pyridine alkaloid biosynthesis, represented by nicotine biosynthesis. Enzymes for which corresponding genes have been isolated from other plants are shown in green next to the arrows. Two or more known steps between two intermediates, whose cDNA sequences haven't been isolated, are omitted here but indicated by continuous arrows. Full names of enzymes by abbreviations are provided in Table S1.  58 . Full names of abbreviated enzymes can be found in Table S1. (b) Expression patterns of representative DEGs. Illustrations are provided in Fig. 4b.
of alkaloids isolated in other species. Information on expression of the identified candidate unigenes was shown in Table S8.

HPLC-PDA analysis.
To justify the feasibility of transcriptomic information in predicting existence or non-existence of new alkaloids in C. yanhusuo bulbs, five BIAs, including sanguinarine (SG), dihydrosanguinarine (DHSG), magnoflorine, corytuberine and THP, and betalain-type alkaloids were chosen for biochemical analysis. Except for the previously known THP and DHSG, sanguinarine(SG), magnoflorine and corytuberine were not detected in C. yanhusuo bulb (Fig. 8b), Absence of the last two was consistent with the transcriptomic information (Fig. 4a).
Betalains comprise betaxanthins and betacyanins, which respectively have UV-vis wavelength absorbance maxima at around 470 nm 35 and 540 nm 35,36 . HPLC analysis detected several peaks at 470 nm (Fig. 8c); however, further full UV-vis wavelength spectra scanning showed that the absorbance maxima were between 200 and 360 nm, rather than 470 nm (Fig. S7), indicating that no betaxanthins were present in C. yanhusuo bulb, the same as betacyanins (Fig. 8d). This result conflicted with our earlier transcriptomic analysis that revealed that C. yanhusuo contained and expressed homologs of all 7 known betalaine alkaloids biosynthetic genes (Fig. 7). This suggested that one or more step(s) in the betalain biosynthetic pathway was inactivated in C. yanhusuo. Recently, betalains synthesis was induced in non-betalain plants or anthocyanin-producing species via L-DOPA feeding 37,38 and/or the introduction of a betalain-specific tyrosine-hydroxylating gene and a 4,5-dioxygenase (DODA) gene from betalain-producing plants 38,39 . These studies demonstrated that hydroxylation of tyrosine to DOPA, and the conversion of L-DOPA to betalamic acid were crucial steps for successful betalain synthesis. Moreover, they indicated that non-betalain, anthocyanin-producing plant species including Arabidopsis and potato lack the tyrosine-hydroxylating enzyme and DOPA enzymes that are essential for betalain biosynthesis. Phylogenetic analyses of CYP76AD1 and DODA among betalainic taxa and anthocyanic taxa of the order Caryophyllale revealed that their isoforms had diverged into betalain-specific clades and anthocyan-specific clades during the evolution 40 . Anthocyanic taxa lack or have decreased DODA-α and CYP76AD1-α isoforms that catalyze L-DOPA into betalamic acid and cyclo-DOPA, respectively. Thus, no or only traces of betalains are synthesized in anthocyan-producing plants. Although sequence similarity comparison assigned 15 C. yanhusuo unigenes as homologs of the known betalain-specific CYP76AD1 and DODA genes ( Fig. 7 and Table S8), it is possibly the same reason for no betalains detected in C. yanhusuo bulb.

Discussion
Palmatine and its derivated THP are the main bioactive alkaloids in C. yanhusuo bulb. Their final biosynthetic steps involve columbamine O-methyltransferase (CoOMT) and (S)-tetrahydroprotoberberine oxidase (STOX) (Fig. 4a). Although CoOMT has been shown to participate in palmatine biosynthesis in Berberis wilsoniae and Berberis aggregate in family Berberidaceae 41 and in Coptis japonica of family Ranunculaceae 42 , however, the cDNA sequence encoding CoOMT was cloned only from Coptis japonica cultured cells 42 . Tblastn of CjCoOMT combining Nr annotation find no CoOMT homologs among our C. yanhusuo unigenes (Table S7, Fig. 4a). Several reasons could account for the failure to detect homologs of a known gene in our transcriptome database: i. low expression making it hard to detect. ii. the gene is temporally and spatially-specific and not expressed in our sampled tissue and stage; or iii. protein sequence variation between different species. We excluded the first two reasons, because THP and palmatine were the second and fourth most abundant BIAs in the bulbs 23 and therefore genes responsible for their synthesis, especially the key steps, should be highly expressed and detectable. Furthermore, C. bracteata BIAs were found not translocated between plant organs and were only de novo synthesized in studied organs 43 . In addition, THP accumulated throughout bulb growth 18 . Thus, THP/palmatine biosynthetic genes should be expressed in our samples representing three key stages of bulb development. It is therefore most likely that a protein sequence variant is responsible for non-detection of CoOMT homologs in C. yanhusuo. Many reports show that palmatine is widely distributed in Phellodendron amurense, Berberis species, Mahonia species, Enantia species, Coptis species, Stephania species and Corydalis species, all of which belong to different plant families. We further examined the presence of CoOMT homologs in other palmatine-producing medicinal plants via blast searches against their available 454 transcriptomic databases (Table S9). Ours results based on protein sequence similarity showed that CoOMT homologs were present only in plants of core Ranunculales family Ranunculaceae excluding Xanthorhiza simplicissima, Berberidaceae and Menispermaceae. They are not detected in early-diverging Ranunculale family Papaveraceae to which C. yanhusuo belongs. Among BIA-related OMTs, PsSOMT2 clustered with the same clade as CjCoOMT, 4'OMTs and 6OMTs 44 . PsSOMT2 and its homologs were identified in the early-diverging Ranunculale Papaveraceae. In-vitro enzymatic analysis showed that PsSOMT1 in the same clade as CjSMT, but not PsSOMT2, can convert THC into THP while O-methylating its main substrate (S)-scoulerine into THC 44 . This suggested that the function of PsSOMT2 homologs and their catalyzed substrates changed accordingly with their protein sequences in the evolution of core Ranunculales from early diverging Ranunculales. There were three C. yanhusuo unigenes weakly homologous to PsSOMT1 and PsSOMT2, three unigenes to CjSMT, 13 unigenes annotated as 6OMT and three as 4'OMT (Table S7, Fig. 4a).
Among six morphine-specific biosynthetic enzymes (Fig. 4a), salutaridine synthase (SalSyn) was the only unidentified enzyme in the C. yanhusuo transcriptome. SalSyn catalyzes the phenol-coupling reaction. Except for the substrate recognition sites, the overall amino acid composition is poorly conserved with other phenol-coupling enzymes such as CYP80s and the human cytochromes P450 2D6 and 3A4 which catalyze C-C phenol-coupling of reticuline 45 . However, two C. yanhusuo unigenes having low homology with Papaver somniferum Salsyn protein were either not long enough to cover the conserved substrate recognition site region or lacked conserved amino acids at substrate recognition sites.

Conclusions
Alkaloids are the major bioactive compounds in C. yanhusuo bulbs and so far BIAs are the only identified alkaloid type. Here, for the first time, we presented the transcriptomic information for C. yanhusuo bulbs via the next-generation transcriptomic sequencing and bioinformatic analysis. Through blast searches against our transcriptomic database using known alkaloid biosynthetic genes as queries, the majority of BIAs enzymes involved in the biosynthetic pathway had homologs present in C. yanhusuo bulbs. Expression analysis showed that the BIAs biosynthetic candidate genes were regulated in the overall pathway during the bulb development. We also identified C. yanhusuo unigenes involved in certain steps of biosynthetic pathways of other types of alkaloids including MIAs, tropane alkaloids and betalaine alkaloids. Genes that we failed to identify in C. yanhusuo were mainly downstream in the alkaloid biosynthesis pathway and catalyzed the formation of species-specific alkaloids, which clarified at the transcriptomic level why non-isoquinoline-type alkaloids were not detected in C. yanhusuo using HPLC-DAD and ESI-MS methods (MS data not shown in the paper). Our results not only provided knowledge on molecular mechanisms regulating the biosynthesis of known alkaloids in C. yanhusuo but also information indicating what unknown alkaloids possibly exist in the bulbs and transcriptiomic basis for undetected non-isoquinoline-type alkaloids. Our study will help determine and narrow the range of enzymes to introduce into C. yanhusuo in the future genetic engineering. RNA extraction, Illumina cDNA library construction and sequencing. Developing bulbs were harvested at Days 0, 10 and 30, frozen immediately in liquid nitrogen and stored at − 80 °C untill used. Total RNA was extracted using TRIplant reagent (BioTeke, Beijing, China) following the manufacturer's instructions, quantified spectrophotometrically at 260 nm and submitted to the Beijing Genomics Institute (BGI, Shenzhen, China) for Illumina mRNA library construction and HiSeq ™ 2000 sequencing. In order to obtain the reference transcriptome data for C. yanhusuo, equal quantities of high-quality total RNA from the above three bulb stages were pooled for Illumina paired-end cDNA library preparation. For digital expression profiling of genes during bulb development, separate RNA samples from all three stages were used for Illumina single-end cDNA library preparation. Briefly, Illumina mRNA-seq library construction was as follows: mRNA was purified from 20 μ g of DNaseI-treated total RNA with a RIN of ≥ 8 using oligo(dT) magnetic beads and then sheared into short fragments by heat treatment in fragmentation buffer. Cleaved mRNA fragments were reverse transcribed into the first cDNA strands using random hexamer primers followed by the second-strand cDNA synthesis using DNA polymerase I and RNase H. The dscDNAs were purified using a QiaQuick PCR extraction kit (Qiagen), washed with the elution buffer and followed with end repair, Poly (A) tailing addition and sequencing adapter ligation. The ligated products were run on agarose gels and suitable fragments were size-selected for PCR amplification as templates. The PCR libraries were purified, quantified and tested for quality by an Agilent 2100 Bioanalyzer and ABI StepOnePlus Real-Time PCR System and sequenced using Illumina HiSeq ™ 2000.

Methods
De novo assembly and annotation of the reference transcriptome. The 90 bp paired-end raw reads were generated on the Solexa/Illumina platform to establish a reference transcriptome of C. yanhusuo. The clean reads were produced by removing the raw reads containing the adapters, any unknown nucleotides larger than 5% and low-quality reads containing more than 50% bases with Q-values ≤ 20. Preprocessed reads with a certainty of overlapping lengths were first de novo assembled into gapless contigs (≥ 60 bp) using the short read assembler-SOAPdenovo (version 1.04) (http://soap.genomics.org.cn/). Using paired-end reads, SOAPdenovo detected the contigs from the same transcripts and joined them into scaffolds (≥ 100 bp) using Ns to represent unknown bases between adjacent contigs. Unigenes with the lowest numbers of Ns were generated by further using paired-end reads to fill gaps between scaffolds until no further extension could be achieved at either end.
Unigene annotation was performed by BLASTx against protein databases including NCBI non-redundant (Nr), Swiss-Prot, KEGG and Clusters of Orthologous Groups (COG) with an e-value threshold ≤ 1 × 10 −5 . The best hits together with a priority order of Nr, Swiss-Prot, KEGG and COG, in the case of alignment conflict among the four databases, were used to decide coding sequence regions and direction of unigenes. When a unigene could not be aligned to any one database, ESTScan 46 was used to predict its coding region and to decide the sequence direction. The Blast2GO 47 program and WEGO software 48 were used in unigene GO (Gene Ontology) annotation and functional classification.
Profiling of differentially expressed genes (DEGs) using mRNA-seq reads. Fifty bp single-end raw reads were generated from mRNA-seq of the three individual developmental stages of C. yanhusuo bulbs and preprocessed by removing the reads containing only adaptors, more than 10% of unknown bases (N) and low-quality reads containing more than 50% bases with Q-values ≤ 5. The clean reads for each stage were then independently mapped onto the Corydalis reference transcriptomic assembly using SOAPaligner/soap2 (ver-sion2.20) 49 , allowing a maximum of 2 bp mismatches per read. To estimate the expression levels of transcripts, the number of uniquely-mapped reads for each unigene in a given bulb stage was computed and normalized to RPKM (Reads Per kb per Million reads) values using the RPKM formula 50 . If there were multiple unigenes for a gene, the longest one was used to calculate its expression level and coverage.
RPKMs of unigenes with missing values in a specific sample were adjusted to 0.001. RPKM values of unigenes were then compared on a pairwise developmental stage basis, that is, Day 10 versus Day 0, Day 30 versus Day 10 and Day 30 versus Day 0. P-values were calculated using the formula described by Liu et al. 51 to identify genes expressed differentially between paired developmental stages. Significantly DEGs were identified using an FDR (false discovery rate) threshold of ≤ 0.001 and a minimum two-fold change.
Functional enrichment analysis of differentially expressed genes. Gene ontology (GO) and KEGG pathway enrichment analyses of DEGs were conducted on the three genesets obtained by pairwise stage comparisons. Hypergeometric distribution test followed by Benjamini-Hochberg correction was applied to assess the significance of each term. The cutoff of Q-values for enriched function was ≤ 0.05.
Identification of alkaloid biosynthetic genes in C. yanhusuo bulb. C. yanhusuo alkaloid biosynthetic genes were decided by a combination of direct screening of transcriptome assembly annotations (Nr and KEGG) and Tblastn searches of the C. yanhusuo unigenes using known alkaloid biosynthetic genes (Table S1).
HPLC-PDA analysis of selected alkaloids. Standards of sanguinarine, dihydrosanguinarine, magnoflorine, corytuberine and tetrahydropalmatine with 98% of HPLC purity were purchased from Herbest Bio-Tech Co. Ltd, China (http://www.herbest.cn/). THP was used for the internal quality control of experimental procedures including sample extraction and HPLC-PDA analysis.
C. yanhusuo bulbs at maturation were fast frozen in liquid nitrogen and stored at − 20 °C till use. The samples of three biological replicates were homogenized with liquid nitrogen and 200 mg of each replicate was extracted with 1 ml of 80% aqueous methanol (v/v) in an ultrasonic bath for 30 min at 30 °C. The extract was then centrifuged at 10,000 rpm for 10 min at 4 °C. The supernatant was collected, filtered through a 0.45-μ m membrane filter and stored at − 20 °C for HPLC-PDA analysis. Four microliters of extracts and standards were run at 25 °C on a Waters Acquity ultra performance liquid chromatography (UPLC, Ireland) system coupled with a Waters BEH C18 column (1.7 μ m, 2.1 × 100 mm) and a photodiode array detector. The mobile phase consisted of acetonitrile (A) and 0.2% formic acid in water (B) and was delivered at 0.2 ml/min. A linear gradient was used as follows: 0-4 min, 20% A; 4-11 min, A from 20 to 100%; 11-14 min, A from 100% to 20%. DAD spectra data were achieved via PDA scanning from 190 nm to 500 nm. Betacyanins were monitored only at 535 nm using the above HPLC conditions.
The presence of five BIAs in C. yanhusuo was determined by comparing both retention time and spectral data with that of their corresponding authentic standards. The HPLC chromatogram of 5 BIAs was finally monitored