Full-length transcriptome analysis provides new insights into the early bolting occurrence in medicinal Angelica sinensis

Angelica sinensis (Oliv.) Diels root part is an integral component of traditional Chinese medicine, widely prescribed to improve blood circulation and blood stasis. However, early bolting of A. sinensis compromises the quality of the roots and hence is a major limitation for yield of medicinal materials. To date, little information about the molecular mechanisms underlying bolting is available for this important medicinal plant. To identify genes putatively involved in early bolting, we have conducted the transcriptome analysis of the shoot tips of the early-bolting plants and non-bolting (normal) plants of A. sinensis, respectively, using a combination of third-generation sequencing and next-generation sequencing. A total of 43,438 non-redundant transcripts were collected and 475 unique differentially expressed genes (DEGs) were identified. Gene annotation and functional analyses revealed that DEGs were highly involved in plant hormone signaling and biosynthesis pathways, three main flowering pathways, pollen formation, and very-long-chain fatty acids biosynthesis pathways. The levels of endogenous hormones were also changed significantly in the early bolting stage of A. sinensis. This study provided new insights into the transcriptomic control of early bolting in A. sinensis, which could be further applied to enhance the yield of medicinally important raw materials.

To explore the full-length transcriptome of A. sinensis and transcriptomic differences between an early bolting genotype and a normal bolting genotype during bolting, the present study is performed using a combination of SMRT and NGS sequencing technologies. Differentially expressed genes (DEGs) were identified and key genes which involved in early bolting were assessed. Moreover, changes in endogenous hormone levels were also detected during bolting. The obtained transcriptome data enable further exploration into the molecular mechanism of early bolting and growth regulation of important medicinal plants in the Apiaceae family.

Results
Full-length transcriptome sequencing. Two sequencing technologies, Illumina NGS sequencing and PacBio SMRT sequencing were employed for full-length transcriptome analysis of A. sinensis (Fig. 1a), and whole shoot tips were collected before bolting (Fig. 1b). Overall, Illumina sequencing produced more than 2.61 billion clean reads (Table 1), while SMRT sequencing generated 526,679 reads. Of the total SMRT reads, 413,886 were full-length non-chimeric (flnc) with an average read length of 2024 bp (Table 2). To reduce the high error  . Approximately 98.87% of the total genes identified from SMRT sequencing were successfully annotated using these databases (Table 3), of which 12,537 were simultaneously annotated by the NR, NT, BLASTX, and BLASTP databases ( Supplementary Fig. S1). A total of 11,469 genes showed significant homologies to genes distributed across 25 categories in the KOG database ( Supplementary Fig. S2). For functional classification, genes were mapped onto the GO database (http:// www. geneo ntolo gy. org/). In total, 22,169 genes were classified into three overarching categories, namely, biological processes, cellular components, and molecular functions ( Supplementary Fig. S3). Transcription factors (TFs) regulates many morphology and biological processes, hence their identification is of interest in the sequencing data. The transcriptomic data revealed 19,164 putative genes encoding TFs, which were classified into 60 gene families within the Plant TFDB 5.0 (http:// plant tfdb. gao-lab. org) ( Supplementary Fig. S4). The top ten families with the highest representation were the basic/helix-loop-helix (bHLH), a novel MYB-like gene (MYB-related), NAM, ATAF, and CUC (NAC), WRKY, B3-like DNA binding domain (B3), FAR-RED IMPAIRED RESPONSE 1 (FAR1), Cys3His zinc finger domain (C3H), zinc finger sequence CX2-4CX3FX5LX2HX3-5H (C2H2), basic-leucine zipper (bZIP), and Ethylene-responsive factor (ERF) TF families, respectively. The identification of these TFs will allow a better understanding of the regulation of gene expression that underlines the bolting process in A. sinensis.
Differentially expressed genes (DEGs). To evaluate differential gene expression levels in response to early bolting, two groups of bolting plants (BP) and non-bolting (normal) plants (NP) Illumina clean reads were taken to assemble with the SMRT full-length transcriptome (Supplementary Table S3). Fragments per kilobase per million reads (FPKM) values of assembling unigenes were calculated with |log 2 ratio|≥ 1 and P < 0.05. In summary, 475 DEGs between the BP and NP groups were identified, of which 208 genes were up-regulated,  www.nature.com/scientificreports/ while 267 genes were down-regulated between the two groups. The corresponding genes hierarchical clustering thermogram was showed in Supplementary Fig. S5. Functional enrichment analysis was conducted to determine the biological functions of the DEGs. A total of 475 DEGs were classified into 42 functional groups using GO assignments (Fig. 3) as 18 functional groups were involved in biological processes, 14 in cellular components, and 10 are in molecular functions. Within the biological process groups the functional groups with the largest enrichment were "metabolic process" containing 136 DEGs (59.65%), "cellular process" containing 122 DEGs (53.51%), "biological regulation" containing 46 DEGs (20.18%), and "response to stimulus" containing 39 DEGs (17.11%). In the two largest functional groups within "molecular function" processes, 115 DEGs (50.44%) were assigned to "binding", and 113 DEGs (49.56%) were assigned to "catalytic activity". For the "cellular component" domain, approximately 61% of DEGs (138 total) were assigned to "cell part", while 34.65% (79 DEGs), 29.39% (67 DEGs), and 28.95% (66 DEGs) were assigned to "organelle, " "membrane part, " and "membrane", respectively.
Furthermore, 475 DEGs were successfully annotated to 133 KEGG pathways to further characterize the molecular functions and biological pathways. A KEGG scatter plot was shown in Supplementary Fig. S6. In conclusion, these results provide insight into the regulatory elements of A. sinensis, which participate in the early bolting process and will contribute to the decoding of these genes.
qRT-PCR validation of genes related to early bolting in A. sinensis. 16 candidate DEGs that were presumably related to early bolting were randomly selected for qRT-PCR analysis to validate the transcriptome data. The differential expression of each of these 16 genes between BP and NP was consistent with transcrip- www.nature.com/scientificreports/ tome data (Fig. 5), which confirmed the reliability of the gene expression values obtained from SMRT and NGS sequencing.
Endogenous hormone contents in early bolting A. sinensis. Abscisic acid (ABA), cytokinins (CKs), and jasmonic acid (JA) content were found to change significantly in BP. As shown in Fig. 6, ABA content in BP was significantly higher than NP. Also, two active forms of CKs, including kinetin and trans-Zeatin levels have changed dramatically. For instance, kinetin levels in BP are lower than NP, while trans-Zeatin levels are higher than NP. Besides, the synthesis of dihydro-jasmonic acid was significantly decreased in BP.

Discussion
A. sinensis has a long history of use as a traditional herbal medicine in China, however the early bolting of A. sinensis severely restricted its sustainability of resource utilization. Early bolting greatly reduced the accumulation of secondary metabolites contents like ferulic acid and soluble sugar in the roots of A. sinensis 50 , causing a complete loss in its medicinal value. Moreover, the genetic background of A. sinensis is still unclear, which further limits research on its cultivar improvement. Recently, high-throughput sequencing technology especially NGS has been widely used to generate large amounts of omics data of medicinal plants, however, a major limitation of NGS is the length of the short reads, which affected the accuracy of sequence assembly 51 . Single-molecule long-read sequencing offers full-length reads that reduce mis assembles of genes with high sequence identity,  www.nature.com/scientificreports/ greatly improving the accuracy of de novo transcriptome assembly 52 . Therefore, a hybrid sequencing approach combining both short and long-read sequencing technologies provides high-quality and more accurate assemblies for transcriptomic studies in non-model species 53,54 . In the present study, a valid utility was demonstrated based on a hybrid SMRT and NGS sequencing approach for determining the early bolting molecular mechanism of the alpine perennial medicinal plant A. sinensis. Overall, the research outcomes increase the understanding of early bolting in A. sinensis at the molecular level, and also provides complete transcriptome resource for A. sinensis. In future, this knowledge could be applied in the selection of high bolting-tolerant germplasm resources and molecular breeding of A. sinensis to develop bolting-tolerant A. sinensis varieties for the traditional Chinese medicine market.  Table S4), such as IAA32, AUX22, SAUR21, and the auxin response transcription activator, SHI1 were up-regulated in BP; concurrently, the negative regulator of auxin response GH3.1 was down-regulated [19][20][21][22] . Previous studies have revealed that auxin and its corresponding receptors are necessary for the initiation of flowering and floral organ identity 23,24 . It is worth noting that SHI regulates flowering time and promotes pistil development 25 , signifying its key role in promoting early bolting. Four ethylene response transcription factors (ERFs), namely RAV1, ERF4, ERF99, and RAP2-7 were down-regulated in BP. Moreover, it was earlier reported that ERFs are involved in the regulation of Arabidopsis bolting 26 . Down-regulation of RAV1 in Arabidopsis leads to an early flowering phenotype 27 . RAP2-7 negatively regulates the transition from vegetative to reproductive growth, results in a delay in flowering time 28 . The down-regulated expression of RAV1 and RAP2-7 in BP is therefore consistent with its early bolting phenotype. In the JA signaling pathway, the transcription factor MYC2 29 was downregulated in BP. MYC2 is a member of the basic helix-loop-helix transcription factor family and is a high-level transcription regulatory element in the JA signaling pathway and has been shown to participate in JA-mediated flowering inhibition in Arabidopsis 30,31 . Finally, GA signal response gene GASA11, which putatively contribute to hormone-regulated flowering was also down-regulated in BP 32 .
To validate the transcriptome analysis and further explore the effect of hormones on early bolting and flowering of A. sinensis, a total of 24 hormones in BP and NP were identified. UHPLC results showed the level of endogenous hormones, including ABA, JA, and CKs in BP were significantly changed. Genes responsible for the biosynthesis/metabolism of these endogenous hormones in BP, including ABA synthesis genes NCED1 33 , metabolic genes ABAH2 and ABAH4 34 , and JA synthesis genes 4CLL1 35 were up-regulated, whereas CTK synthesis genes ZOG and CKX7 were down-regulated 36 . Altogether, genes involved in multiple hormones signaling or biosynthesis/metabolism pathways regulate early bolting of A. sinensis were differentially expressed between the two phenotypes, suggesting early bolting was simultaneously controlled by multiple hormones. These findings also revealed that bolting and flowering in A. sinensis were regulated by the complex genetic network.
Members of the square promoter binding protein-like (SPL) family of transcription factors, including SPL5, SPL6, SPL8, and SPL14 were also up-regulated at different levels in BP, hence attracted our special attention. SPL genes regulate flowering through the photoperiod pathway and can directly activate specific genes like LFY, FUL, and AP1 to further promote flowering through the aging pathway, which is dependent on endogenous miRNA level (miR156 and miR172) 37,38 . The relationship between environment and bolting as well as genes that are directly or indirectly involved in this regulatory network, including SPL, warrant further investigation to determine their role in A. sinensis early bolting.
We noticed that a large number of genes related to pollen formation were up-regulated in BP, including MYB35(TDF1), TKPR1 and TKPR2, QRT3, LAT52, CYP704B1, CYP703A2, and CYP86A22 ( Fig. 4; Supplementary Table S4). MYB35, an R2R3 MYB transcription factor was previously identified in Arabidopsis as a putative transcription factor regulator of tapetal development and function 39 . The Arabidopsis CYP450 family, TKPR1 and TKPR2 are conserved genes in land plants that control the production of sporopollenin, a major constituent of the exine of pollen 40 . Whereas CYP704B1 and CYP703A2, CYP86A22, QRT3, and LAT52 are all key factors in both pollen and anther development [41][42][43][44] , TKPR1 and TKPR2 are associated with the early stages of anther development. The up-regulated expression of multiple genes related to pollen formation suggests that pollen production is initiated in anticipation of the plant transaction from its vegetative phase to its reproductive phase.
Interestingly, in the present study genes involved in the regulation of the cuticle and epidermal wax production are relatively up-regulated. The expression of the ethylene response factor WIN1, PAS2, HACD2, CER1 and CER26, KCS5, KCS6, KCS10, GPAT4, and GPAT6 were up-regulated in BP ( Fig. 4; Supplementary Table S4). The majority of these genes are involved in the biosynthesis of very-long-chain fatty acids (VLCFA) [45][46][47] . VLCFAs are direct precursors of wax compounds that are synthesized in the epidermis. They are essential for plant development, and have been reported for their involvement in cellular communications, mainly in pollen-stigma interactions 48 . Recent studies have shown that VLCFAs may regulate cell proliferation in the Arabidopsis shoot apex 49 . These genes were assigned to six main KEGG pathways which centered around fatty acid biosynthesis www.nature.com/scientificreports/ and metabolism ( Supplementary Fig. S6) 55 . The up-regulation of genes that promoted VLCFAs synthesis in accordance with those genes related to pollen development suggests an imminent transition to flowering in BP.
In another words, this may also accelerate the transformation of vegetative to reproductive growth phase, and consequently triggered early bolting. Further investigation on the functions of VLCFAs related to flowering may identify useful targets for the development of slow bolting A. sinensis varieties. The expression of genes controlling cell differentiation was also differentially expressed in the two genotypes, namely KANADI2 (KAN2) and HOTHEAD (HTH) (Fig. 4; Supplementary Table S4). The functions of KAN2 and HTH were reported in determining the fate of paraxial stem cells, and in maintaining the activity of the shoot apical meristem 56,57 . The stems of kan2 mutants failed to elongate during flowering, which was consistent with the phenotype of BP in Min County. Plants require additional energy when entering the reproductive growth phase to sustain reproduction, and therefore genes related to energy acquisition may be up-regulated. Indeed, genes associated with energy transport, ubiquitination, and enzyme-mediated reactions were found to be upregulated in BP, including ABC transporter G family member25 (ABCG25), GDSL esterase/lipase At5g03810 (GDL72), and E3 ubiquitin-protein ligase complex encoding gene BT4 58,59 .
Various signaling factors of bolting induction pathways regulated the expression of a group of meristemspecific genes that determines the growth characteristics of A. sinensis. A model of DEGs involved in early bolting and their potential interactions is presented in Fig. 7. Overall, our findings offer comprehensive information on A. sinensis transcriptome analysis, which could be further used to develop early bolting resistance varieties, and subsequently supports a higher yield of medicinally important A. sinensis roots.   Iso-Seq data processing with standard bioinformatics pipeline. Polymerase reads (raw sequencing data) were processed using the SMRTlink software (v7.0) with the following parameters: -min Length 50, -max Length 15,000, -min Passes 1 61 . Reads of Insert (ROI) from polymerase reads after the detection and removal of polyA tail by cDNA primers were separated into full-length, non-chimeric reads, and non-full length reads. Full-length non-chimeric ROIs were clustered and assembled into consensus isoforms by ICE (isoform-level clustering algorithm). Finally, high quality isoforms HQ (above 99% accuracy) and low-quality isoforms LQ were obtained after polished by Quiver.

Methods
Functional annotation and classification. Gene expression levels quantification. NGS data were compared with the reference sequence using the software package RSEM (v1.3.1). We compare the read-count of each isoform in each sample, and convert the FPKM value to obtain the expression level of each isoform. FPKM considering both the depth of sequencing and the influence of isoform length on fragments is a general method for estimating isoforms expression level.

Analysis of differentially expressed genes (DEGs).
The DEGSeq R package (v1.28.0) was used to identify DEGs between the BP and NP samples 62 . Clustering patterns of DEGs between BP and NP were determined by the Euclidean distance cluster analysis method, and heatmaps were drawn by pheatmap R package (v1.0.12) 63 . Genes with an absolute log2 (BP/NP) value ≥ 1 and P values ≤ 0.05 were identified as significant DEGs. GO functional enrichment analysis of the DEGs was performed using the GOseq R package (v1.26.0) based on the Wallenius non-central hyper-geometric distribution 64 . The software KOBAS (v2.0) was used to test the statistical enrichment of DEGs in KEGG pathways 65 . After multiple testing corrections, a KEGG scatter plot was drawn by adjusted P value ≤ 0.05.

Validation of DEGs using qRT-PCR.
Quantitative RT-PCR (qRT-PCR) was used to validate 16 candidate DEGs associated with early bolting. The AsACTIN gene was used as an internal control 66 . Three technical repeats were used for each gene, and the data shown are representative of three independent experiments. RNA for validation is based on previous sequencing samples. All reactions were performed with the CFX96 Real-Time PCR System (Bio-Rad, CA, USA) using HiScript® II Q RT SuperMix for qPCR (+ gDNA wiper) (Vazyme, Nanjing, China). The primers sequences can be found in Supplementary Table S5.
Determination of endogenous hormone levels. The endogenous hormonal levels of A. sinensis were determined by BIOTREE Biotechnology Co., Ltd in Shanghai. The young developing leaves from actively growing shoots of A. sinensis before bolting were grinded into powder in liquid nitrogen and each sample was precisely weighed for 20 mg aliquot, then put into extract solution (50% acetonitrile in water, precooled at − 40 °C, containing isotopically-labelled internal standard mixture), thereafter further purified with SPE 67 . The purified product was then subjected to ultra-high performance liquid chromatography-tandem mass spectrometry (UHPLC-MS/MS) analysis. The UHPLC separation was carried out using an EXIONLC System (Sciex), equipped with a Waters ACQUITY UPLC CSH C18 column (150 × 2.1 mm, 1.7 μm, Waters). Mobile phase A contains 0.01% formic acid in water, and the mobile phase B was 0.01% formic acid in acetonitrile. The column temperature was set at 50℃ and the auto-sampler temperature was set at 4 °C. The injection volume was 5 μL. A SCIEX 6500 QTRAP + triple quadrupole mass spectrometer (Sciex), equipped with an IonDrive Turbo V electrospray ionization (ESI) interface was applied for assay development. Typical ion source parameters were: