Introduction

Strawberry (Fragaria) is considered one of the most economically important soft fruits in the world. Fragaria belongs to the family Rosaceae, which contains Duchesnea and Potentilla as close relatives1,2. This genus consists of approximately 24 species representing different ploidy levels, in which the wild members are mainly distributed in the Northern Hemisphere as well as western South America3. Fruit development and ripening of Fragaria are closely correlated with its economic value. Among other factors, plant hormones have crucial roles in regulating these processes. Numerous advances have been made in understanding the impact of auxin and gibberellins (GAs) on Fragaria fruit development and ripening4,5,6. It has been confirmed that the interaction between auxin and GA signaling pathways is essential for the promotion of fruit set and the early stage of fruit development6. The levels of indole-3-acetic acid (IAA) and bioactive GAs rose rapidly during the early phases of fruit development and then declined as ripening progressed, whereas abscisic acid (ABA) showed the opposite pattern in strawberry7. Auxin is already known to have an important impact on fruit growth and size as well as anthocyanin synthesis and, hence, the fruit ripening process8,9. Transcriptome profiling further revealed that auxin suppressed anthocyanin biosynthesis by downregulating genes involved in anthocyanin and flavonoid synthesis and transportation into the vacuole. In contrast, genes related to the auxin signaling pathway were upregulated, including Aux/IAA and auxin response factor (ARF)10. Recent results reveal that the upregulation of the FveCYP707A4a gene by GAs and auxin is a key cross-talk signal in the early fruit development stage to repress the elevation of ABA, which mainly acts in the ripening phase11. Furthermore, differentially expressed genes encoding proteins for phytohormones at various developmental stages provide evidence that other hormones, such as cytokinin, brassinosteroid (BR), and ethylene, are also involved in the fruit development and ripening of Fragaria12. Physiological and molecular evidence suggests that BR is associated with the early fruit development and ripening of Fragaria13. In addition, silencing of a 9-cis-epoxycarotenoid dioxygenase gene (FaNCED1), which functions in ABA biosynthesis, resulted in a significant decrease in ABA level and uncolored fruits14.

Fruit ripening is a complex biological process involving sugar accumulation, cell wall hydrolysis, flavor and aroma production as well as color changes15. Color change, which can be directly observed, is often used as a marker for ripening. Anthocyanins are the most prominent pigments in strawberry that not only contribute to fruit color modification but also have potential health benefits, which subsequently affect the aesthetic and commercial value of strawberry. It has been revealed that anthocyanins, flavonols, proanthocyanidins (PAs), and lignin are all derived from the phenylpropanoid pathway16. Moreover, miscellaneous enzyme families, transport proteins as well as transcription factors (TFs) construct complex and intricate biological networks that regulate anthocyanin synthesis. Some flavonoid biosynthetic pathway genes, including cinnamate 4-hydroxylase (C4H), chalcone synthase (CHS), chalcone isomerase (CHI), flavanone 3-hydroxylase (F3H), dihydroflavonol-4-reductase (DFR), and anthocyanidin synthase (ANS), showed lower expression abundances in yellow strawberry fruits than in red strawberry fruits, which were coordinated by genes encoding TFs, such as the MYB, bHLH, WD40, MADS-box, and WRKY families17,18. The members of the MYB family were previously identified and shown to play critical roles in the regulation of red pigment synthesis in F.×ananassa and F. vesca19,20. One SNP causing an amino acid change in the FveMYB10 gene was identified in red and yellow F. vesca accessions and further confirmed to be responsible for the yellow color of fruits20. In addition, pathogenesis-related protein family 10 (PR-10) has received considerable attention for its potential to control anthocyanin formation21,22. Fra a allergen, one member of the PR10 family, was directly linked to flavonoid biosynthesis and had an essential biological function in strawberry fruit color formation22.

Non-coding RNAs with a length longer than 200 nucleotides are defined as long noncoding RNAs (lncRNAs). They are involved in chromatin modification, epigenetic regulation, genomic imprinting, transcriptional control as well as pre- and post-translational mRNA processing23,24. A large number of lncRNAs have been identified and characterized in plants. It has been revealed that lncRNAs play widespread roles in diverse biological processes in plants25,26. The function of COOLAIR and COLDAIR has been studied thoroughly in regulating flowering in Arabidopsis. The floral repressor flowering locus C (FLC) is repressed by COOLAIR and COLDAIR during the process of vernalization27,28. The intergenic lncRNA LDMAR (long-day-specific-male fertility-associated RNA) is required for normal pollen development in rice under long-day conditions and regulated by Psi-LDMAR through the RNA-dependent DNA methylation (RdDM) pathway26,29. Furthermore, a number of lncRNAs have been reported to participate in stress responses, such as Mt4 in Medicago truncatula30, IPS1 (induced by phosphate starvation 1) and At4 in A. thaliana31,32,33, TPSI1 (tomato phosphate starvation-induced gene 1) in Solanum lycopersicum34 and OsPI1 (Oryza sativa phosphate starvation-induced gene 1) in O. sativa35, which were induced by phosphate starvation.

A growing number of reports have revealed that lncRNAs play vital roles in gene regulation and other biological processes in plants. However, the function of lncRNAs in strawberry fruit development and ripening processes, especially color change, is largely unexplored. Thus, we profiled the global gene expression patterns of two typical wild varieties of F. pentaphylla Lozinsk with different fruit colors at three developmental stages using high-throughput transcriptome sequencing. We identified lncRNAs from different samples using de novo assembly and genome-guided assembly. Differential expression of lncRNAs and comparative analysis between two varieties of F. pentaphylla were investigated to reveal the different regulation of key pathways. Furthermore, regulatory networks of the identified lncRNAs, mRNAs and miRNAs were constructed to elucidate the potential regulatory processes of lncRNAs. This study provides valuable molecular information for insight into the function of lncRNAs in F. pentaphylla during fruit development and ripening.

Materials and methods

F. pentaphylla fruit collection, RNA extraction, library construction, and sequencing

Two typical wild varieties of F. pentaphylla with different fruit colors (FPR, red fruit; FPW, white fruit) were collected from the field in Sichuan Province, China. Three stages of fruit development and ripening were distinguished based on the weight and color of the fruit, which can be separated as the green fruit stage, turning stage and ripe fruit stage (Fig. 1a). Three biological replicates of each fruit sample were immediately flash-frozen in liquid nitrogen after collection in the field and stored at −80 °C for later use. Total RNAs were extracted from the collected fruit samples using an RNeasy Plant Mini Kit (Qiagen, USA) according to the manufacturer’s instructions and then subjected to RNA-seq sequencing and qRT-PCR verification. The RNA quality and quantity were determined using agarose gel electrophoresis and an Agilent 2100 Bioanalyzer (Agilent, USA). A total of 18 RNA samples (FPR and FPW at three stages, each with three biological replicates) were submitted to Chengdu Life Baseline Technology Co., Ltd. (Chengdu, China) for strand-specific library preparation and sequencing. Sequencing of the RNA-Seq libraries was performed on an Illumina HiSeq 4000 platform, producing 150 bp paired-end reads. Sequence data have been uploaded to the NCBI Short Read Archive with accession number SRP114679.

Fig. 1: Transcriptome profiles of two typical wild varieties of F. pentaphylla during fruit development and ripening.
figure 1

a Color change in the fruit of two typical wild varieties of F. pentaphylla during fruit development and ripening. F. pentaphylla fruit development was divided into three stages: green fruit stage, turning stage, and ripe fruit stage. b Venn diagram of differentially expressed genes and lncRNAs during fruit development and ripening. The number of differentially expressed lncRNAs is indicated by bracket. c GO enrichment analysis of differentially expressed protein-coding genes in FPW during fruit development and ripening. The x-axis corresponds to GO terms, and the y-axis shows the log2 fold change of differentially expressed protein-coding genes. The color and size of the dot represent the enrichment degree and the number of differentially expressed protein-coding genes, respectively

Sequence processing, alignment, and transcript assembly

To obtain high-quality clean reads, the assessment of sequencing quality and read processing were performed by fastQC (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/) and FASTX-Toolkit (http://hannonlab.cshl.edu/fastx_toolkit/). The low-quality reads were removed using the same filtering criteria as described previously36. The clean reads were aligned to the F. vesca reference genome37 using HISAT2 (ref. 38) with default parameters. Stringtie38 was used to assemble transcripts. Additionally, all unmapped reads derived from the above libraries were assembled using Trinity39 under default parameters to obtain comprehensive information of the F. pentaphylla transcriptome. Novel transcripts obtained by Stringtie and Trinity were prefixed with “SN” and “TN”, respectively. Transcriptome assembly quality was assessed by examining the RNA-Seq read representation of the assembly and searching the assembled transcripts against the database of known protein sequences. All assembled transcripts were filtered by CD-HIT40 (similarity > 95%) to generate a non-redundant set of transcripts. The remaining transcripts were further filtered based on expression levels. The transcripts that had an average FPKM > 1 and occurred in two or more samples were selected for further analysis. The open reading frame (ORF) of the transcript was extracted using TransDecode39 with default settings.

Identification of lncRNAs and target prediction in F. pentaphylla during fruit development and ripening

To distinguish the lncRNAs from protein-coding transcripts, a strict filtering strategy was developed based on the criteria currently used to identify lncRNAs, comprising the following steps: (i) transcripts with lengths less than 200 bp or ORF lengths greater than 100 amino acids were excluded. (ii) Transcripts with sequence homology to existing proteins were eliminated by searching against the non-redundant protein database (NR) and the SwissProt protein database using BLASTx. (iii) The coding potential of the transcript was calculated by CPC41 and PLEK42 to remove transcripts with potential coding ability. (iv) Other classes of noncoding RNAs were removed by searching against multiple databases, including the Rfam database (http://rfam.xfam.org/), Dfam database (http://www.dfam.org/) and rRNA database (http://ssu-rrna.org/). After filtering by the above steps, the remaining transcripts were considered putative lncRNAs.

To understand the biological function of lncRNAs, the target genes were predicted based on cis-acting and trans-acting modes42. The sliding window strategy was used to search cis-acting target genes within 10 kb upstream and downstream of lncRNAs. To identify lncRNAs that may act in trans-acting mode, the expression correlations between lncRNAs and all protein-coding genes were calculated. LncRNAs with a trans-acting mode were recognized if Pearson’s correlation coefficient (r) > 0.9.

Functional annotation of protein-coding genes

The protein-coding genes were compared with sequences in the NR and SwissProt protein databases to assign potential functions to genes. The programs Blast2GO43 and InterproScan44 were used to obtain GO annotations and identify protein domains, respectively. GO enrichment analysis was performed using GOseq with the following terms: biological process, molecular function, and cellular component. The metabolic pathways were predicted using KAAS45. KOBAS3.0 (ref. 46) was used for pathway enrichment analysis. Finally, genome-wide predictions of TFs in F. pentaphylla were performed on PlantTFDB47.

Abundance estimation and differential expression analysis

The expression abundance of all genes and transcripts was estimated by RSEM48. Differentially expressed genes in each pairwise comparison were identified by DESeq2 (ref. 49). The absolute value of log2 ratio > 1 and the adjusted p-value < 0.05 were used as the thresholds to determine genes with significant expression changes. Correlation analysis among biological replicates was performed by the function corrplot in R. Expression patterns of differentially expressed genes were also analyzed using K-means and hierarchical clustering.

Regulatory network construction in F. pentaphylla during fruit development and ripening

To gain insight into the regulatory mechanism of F. pentaphylla fruit development and ripening, the mRNA–lncRNA–miRNA regulatory network was constructed based on interactions and target relationships. Protein–protein interactions were analyzed by STRING50, and a confidence score ≥ 0.7 was set as the cut-off criterion for highly confident interactions. The miRNA sequences of Fragaria were widely determined in previous studies51,52,53. The target relationships of miRNA–mRNA and miRNA–lncRNA were predicted using psRNATarget54 with default parameters. The weighted gene co-expression network analysis (WGCNA) package was then used to build a weighted gene correlation network as described in the previous publications55. For the correlation analysis between the WGCNA modules and the traits, fruit colors were manually scored as discrete numerical numbers: 1, green; 2, white; 3, red, referring to Duan et al.56 Cytoscape57 was used to integrate and visualize the regulatory network.

Validation of RNA-seq data by quantitative RT-PCR (qRT-PCR)

Ten lncRNAs and protein-coding genes were randomly selected from the present study. The cDNA synthesis, primer design (listed in Supplementary Table S1) and execution of qRT-PCR were conducted following a previous study58. qRT-PCR was performed on an ABI 7500 Real-Time System with three technical replicates. Statistical analysis of qRT-PCR data used the 2−ΔΔCt method59, and the actin gene FaACT (GenBank accession LC017712) was used as the internal reference to normalize expression levels.

Results and discussion

Transcriptome profiling and genome-wide identification of lncRNA in F. pentaphylla fruit

Two typical wild varieties of F. pentaphylla with a significant difference in fruit color were found by field collection. FPR fruit showed a marked change in color from the turning stage to the ripe stage, while anthocyanins were abundant in the ripe stage. Conversely, no significant color variation in FPW fruit was observed at the late developmental stage (Fig. 1a). To investigate the underlying molecular changes during fruit development and ripening in F. pentaphylla, we used strand-specific sequencing to explore the expression profiles of protein-coding genes and lncRNAs for each variety at three developmental stages. After removing low-quality reads and filtering contaminant rRNA reads, a total of 186 GB high-quality clean reads were obtained from 18 samples, with 10.33 GB per sample on average (Supplementary Table S2). A relatively low rate of clean reads (61.31% on average) was mapped to the F. vesca reference genome, suggesting a large genomic difference between F. pentaphylla and F. vesca. Genome-guided transcriptome assembly was carried out based on uniquely mapped reads. Through the above analysis, 66,323 transcripts derived from 33,443 genes were obtained, of which 38,381 novel transcripts were detected on the basis of mapped reads. Furthermore, the unmapped reads were used to construct the full-length transcripts via de novo assembly. The two assembled transcript sets as described above were reduced to 279,129 transcripts after removing redundancy and low-expression transcripts. The remaining transcripts constituted a comprehensive transcript set and were used in subsequent analyses. The number of transcripts in F. pentaphylla was greater than that in the diploid strawberry model plant F. vesca (139,997)60, whereas it was close to half of the number of transcripts in F.×ananassa (octoploid)61. The relatively large difference in the number of transcripts between F. pentaphylla and F. vesca might be due mainly to genomic differences and alternative splicing. The total length of the transcripts was 207,450,409 bp, and the average length of transcripts was approximately 750 bp (Supplementary Figure S1). Among them, 56,003 transcripts had lengths longer than 1000 bp, representing 20.06% of the total transcripts (Supplementary Figure S1). The correlation analysis and dendrogram clustering illustrated the global relative relationships among the 18 samples (Supplementary Figures S2 and S3). All biological replicates clustered together with high correlation coefficients (r > 0.94, Supplementary Figure S2).

To predict potential lncRNAs in F. pentaphylla fruit, the comprehensive set of transcripts was then subjected to a specific pipeline to identify lncRNAs (see Materials and methods). As a result, 25,699 transcripts from 25,107 loci were identified as putative lncRNAs (Supplementary Figure S4a). The length of lncRNAs ranged from 201 to 2992 bp with an average length of 313 bp (Supplementary Figure S4b), which is consistent with previous analyses in plants62,63,64. In addition, lncRNAs showed lower expression levels and shorter transcript lengths than protein-coding genes (Supplementary Figure S4c). A Venn diagram revealed that only 5786 lncRNAs were expressed among all fruit samples (Supplementary Figure S4d), and a much larger proportion of lncRNAs were expressed in genotype- and stage-specific manners (Supplementary Figure S4d).

Differential gene expression during fruit development and ripening in F. pentaphylla

To better understand the dynamic processes of fruit development and ripening, we evaluated the time course of development of F. pentaphylla fruit. We identified 882 and 2625 differentially expressed (DE) protein-coding genes in the comparisons of FPW1 vs. FPW2 and FPW1 vs. FPW3, respectively (Fig. 1b, Supplementary Table S3). Compared with the first fruit development stage, the number of DE protein-coding genes in the late stage was larger than that in the middle stage (Fig. 1b). GO enrichment analysis revealed that these DE genes in FPW were significantly enriched in important primary and secondary metabolisms, such as cell wall organization or biogenesis, photosynthesis, and response to hormone and phenylpropanoid catabolic processes (Fig. 1c). It is noteworthy that the vast majority of genes related to cell wall biogenesis and photosynthesis were downregulated in FPW2 and FPW3 compared to FPW1 (Fig. 1c). In accordance with GO enrichment analysis, the results of pathway enrichment analysis also revealed that the DE protein-coding genes were mainly involved in photosynthesis, plant hormone signal transduction and phenylpropanoid biosynthesis (Supplementary Figure S5). Furthermore, some differentially expressed genes were involved in the pigment biosynthesis pathway, including flavonoid biosynthesis (10 protein-coding genes) and carotenoid biosynthesis (13 protein-coding genes, Supplementary Figure S5). A larger number of DE genes was detected in the comparisons of FPR1 vs. FPR2 (1744) and FPR1 vs. FPR3 (3991), which reflected more intensive developmental changes in FPR (Fig. 1b). Functional annotation revealed that genes associated with photosynthesis were predominantly downregulated (Supplementary Figure S6), which was consistent with that in FPW. The development and ripening of strawberry fruit are accompanied by massive and coordinated changes in metabolic and physiological traits. Photosynthetic changes of developing strawberry fruit were different from those of the strawberry leaves, reflecting the scarcity of stomata, CO2 efflux, scarcity of chlorophyll, and closer chlorophyll a:b ratio65. Immunohistochemical and transcriptional studies revealed that the abundance of several genes involved in photosynthesis declined during fruit development in grape berry66,67. In this study, many protein-coding genes related to photosynthesis were repressed during fruit development and ripening (Fig. 1c). As expected, the protein-coding genes related to chlorophyll binding were also downregulated along with the fading of the green color of fruit (Fig. 1c). Numerous publications reported that changes in the cell wall occurred in strawberry during ripening68,69,70, which was consistent with what we observed in this study (Fig. 1c).

To further explore the role of lncRNAs in fruit development and ripening of F. pentaphylla, we investigated the lncRNA expression patterns and predicted their potential functions. As shown in Fig. 1b, more DE lncRNAs were detected in FPR when comparing FPR2 and FPR3 to FPR1, consisting of 42 and 128 DE lncRNAs. Moreover, 20 and 81 lncRNAs showed differential expression in the comparisons of FWR1 vs. FWR2 and FWR1 vs. FWR3, respectively. It is noteworthy that the number of DE lncRNAs was remarkably larger in comparison to the ripe fruit stage and green fruit stage, indicating that the late developmental stage is more complex. The function of lncRNAs in regulating fruit development and ripening has been elucidated62,71. Silencing of two lncRNAs in tomato resulted in an obvious delay of fruit ripening71. To predict the biological function of the lncRNAs, we identified lncRNA target genes and assigned them to different types of functional terms. This process identified a total of 306 DE protein-coding genes comediated by DE lncRNAs in fruit development and ripening, which were involved in important biological processes, such as photosynthesis, starch and sucrose metabolism, and diterpenoid biosynthesis (Supplementary Table S4). A considerable set of studies have demonstrated dramatic changes in the cell wall during fruit development and ripening, which directly affect fruit softening and edibility15. Fruit-specific rhamnogalacturonate lyase 1 (FaRGLyase1) has been confirmed to play an important role in the fruit ripening-related softening process, which is involved in the degradation of cell-wall middle lamellae and ultimately reduces strawberry firmness72. In the present study, SN.315 encoding rhamnogalacturonate lyase was significantly induced at the ripe fruit stage in both of the F. pentaphylla varieties, whereas no lncRNA was determined to target this gene (Supplementary Table S3). Intriguingly, 47 target genes related to cell wall organization or biogenesis were co-mediated by DE lncRNAs in FPR and FPW. Of these genes, 42 (89.3%) were repressed in the late developmental stage (Supplementary Table S4). The above results suggested that lncRNAs may be involved mainly in the control of cell wall changes via repressing the expression of target genes related to cell wall organization or biogenesis. Photosynthetic pathways were also found to be repressed during fruit ripening73. In the present study, 32 target genes annotated as being related to photosynthesis were downregulated in the late developmental stage. Fifteen of these genes belong to the chlorophyll a-b binding protein family, and two of these genes encode a ribulose bisphosphate carboxylase small chain (Supplementary Table S4). Taken together, lncRNAs may act as negative regulators of photosynthesis and cell wall organization or biogenesis during fruit development and ripening in F. pentaphylla.

Increasing evidence has suggested that multiple hormones coordinate and interact with each other, forming a complex network to control early stages of fruit development and late stages of ripening. Although the entire molecular mechanisms that coordinate each hormone and the regulation of this network are largely unknown, existing investigations show that genes related to the biosynthesis and signaling of auxin and GA displayed dramatic changes through fruit development and ripening. Examples include genes related to auxin biosynthesis such as YUCCA (YUC) flavin monooxygenases, tryptophan aminotransferase (TAA), and GA-related genes such as GA receptor (GID), GA 3‐oxidase (GA3ox), and GA 2-oxidase (GA2ox)5,6. Seven YUC genes and five TAA genes exhibited different expression patterns during fruit development and ripening in both F. pentaphylla accessions (Supplementary Table S3), which may directly control auxin levels in F. pentaphylla fruits, thus permitting ripening. GID, GA3ox, and GA2ox are crucial components of the GA signaling pathway. Three transcripts encoding GID (TN118649_c0_g1, SN.8344 and SN.20031) were differentially expressed during fruit development and ripening only in the FPR accessions. In contrast, two transcripts encoding GA3ox (TN117107_c0_g1 and SN.6207) and one for GA2ox (TN131209_c0_g1) altered transcription abundance during fruit development and ripening in both F. pentaphylla accessions (Supplementary Table S3). Interestingly, the expression levels of FpGA3ox showed a progressive decline during fruit development and ripening in both accessions, whereas FpGA2ox was expressed at relatively high levels in the latter developmental stages (Supplementary Table S3). This finding is consistent with previous observations showing that GA3ox and GA2ox genes showed opposite expression patterns during the fruit growth and ripening period, reflecting a strategy used by the plant to fine-tune its control of bioactive GA levels5. Existing investigations show that the ABA/IAA ratio serves as a part of the signal to trigger fruit ripening11. Disruption of 9-cis-epoxycarotenoid dioxygenase (NCED1), which encodes a rate-limiting enzyme involved in ABA biosynthesis, resulted in a significant decrease in ABA levels and uncolored fruits in strawberry14. Six genes encoding NCED1 were upregulated during fruit development and ripening in both FPW and FPR (Supplementary Table S3), which was in line with previous reports in F. vesca11. Among these genes, SN.7771, which shows high similarity to FaNCED1 (JX013944.1), may be the key player involved in the magnitude of higher transcriptional activity than others and the rapid elevated response from the green fruit stage to the ripe fruit stage in FPR (Supplementary Table S3). It is noteworthy that five genes related to auxin and GA signaling pathways were targeted by lncRNAs encoding TAA (SN.16092) and GA3ox (gene06004-v1.0-hybrid, TN117107_c0_g1, SN.6206, and SN.6207), respectively (Supplementary Table S4). All these target genes were downregulated in the ripening process, while the lncRNA genes were upregulated. These data provide evidence that auxin, GA and ABA have prominent roles in fruit development and ripening in F. pentaphylla. Additionally, lncRNAs could alter the expression of key players related to hormone biosynthesis and be involved in the regulation of fruit development and ripening in F. pentaphylla.

Identification and characterization of genes related to color changes in F. pentaphylla fruits

To investigate the differential regulation of key pathways between FPR and FPW, we compared gene expression patterns in FPR and FPW at three stages. In total, we identified 5621 DE genes from all comparisons (Supplementary Figure S7). Of these, 1416 protein-coding genes showed commonly differential expression in all comparisons (Supplementary Figure S7). Moreover, 2474 genes showed distinctive expression profiles in only one comparison (Supplementary Figure S7). DE genes were related to plant–pathogen interactions, phenylpropanoid biosynthesis and photosynthesis (Supplementary Figure S8). GO enrichment analysis identified 60 terms that were significantly enriched among DE genes, in which the three most enriched terms in the biological process category were “DNA integration”, “oxidation–reduction process” and “aminoglycan catabolic process” (Supplementary Figure S9). The high auxin concentration significantly inhibited the expression of anthocyanin structural genes and regulatory genes in callus cultures of red-fleshed apples, such as MYB and bHLH9. The auxin signaling pathway has been well studied and involves a cascade mediated through the SCFTIR1/AFB-Aux/IAA-ARF nuclear signaling module74. Among 29 DE genes involved in the auxin signaling pathway, seven genes encoding ARF, AUX/IAA, and SCF complex subunit showed distinct expression dynamics in FPR and FPW across three developmental stages (Supplementary Table S5). However, no significantly different expression was observed in these genes during the process of fruit color change, except for SN.5270, which was downregulated in FPR3 compared to FPR1 (Supplementary Tables S3 and S5). These results implied that the differential regulation of the auxin signaling pathway may not be the main factor contributing to the difference in fruit color in F. pentaphylla varieties. Additionally, transcription factors bHLH (13), MYB (12), WRKY (10) and MADS (3) were detected as significantly differentially expressed between FPR and FPW (Supplementary Table S5) and have been found to play roles in anthocyanin biosynthesis17,18. Of these, seven bHLH and six MYB transcription factors showed stage-specific differential expression in the ripe fruit stage, suggesting that these transcription factors may influence fruit reddening in F. pentaphylla (Supplementary Table S5).

The MBW complex consisting of MYB, bHLH, and WD-repeat proteins has been shown to play critical roles in regulating anthocyanin biosynthesis by controlling the expression of the anthocyanin pathway structural genes in plants75,76. Among them, FaMYB10 was previously identified and shown to play critical roles in regulating the red pigment in the fruit receptacle, which was repressed by auxin and stimulated by ABA19. Furthermore, a specific SNP (G to C, leading to W12S variant) in FveMYB10 was identified and then functionally confirmed to be responsible for causing the yellow fruit color in the woodland strawberry20. The conserved tryptophan (W) residue within the MYB protein was shown to contribute to a hydrophobic core in the DNA-binding structure77. Therefore, the W12S variant in the DNA-binding domain likely disrupts the DNA-binding function of MYB in F. vesca20. Transcriptomic analyses revealed that FpMYB10 (SN.2358), being homologous to FveMYB10, was preferentially expressed in FPR (Supplementary Table S5). FpMYB10 harbors W at position 12 and exhibits a Lys (K) to Ile (I) nonsynonymous mutation at position 31 (Supplementary Figure S10). This K31I substitution at a non-conservative site was unlikely to have much of an impact on the function of MYB. A highly homologous transcript TN118211_c0_g2 (95.54% sequence similarities to FpMYB10, Supplementary Figure S11) followed an expression trend similar to FpMYB10 (Supplementary Table S3), which might also be associated with red pigment synthesis. Since a transient functional assay clearly showed that FveMYB10 was sufficient to restore red pigments in yellow fruit accessions of F. vesca20, the relatively lower expression level of FpMYB10 in FPW might be one contributor to the white fruit variety of F. pentaphylla.

We further focused on the comparisons between FPR and FPW at three stages to characterize the specific function of lncRNAs in the color change of F. pentaphylla fruits. In total, 365 lncRNAs showed differential expression in at least one pairwise comparison (Fig. 2a, Supplementary Table S5). Differentially expressed lncRNAs showed stage-specific and genotype-specific expression trends across fruit development and ripening of F. pentaphylla (Fig. 2b). Subcluster 1 and subcluster 2, representing 172 lncRNAs, showed higher expression abundance in FPW than in FPR at all three stages (Fig. 2b). In contrast, the lncRNAs in subcluster 3 and subcluster 4 had higher expression levels in FPR than in FPW during fruit development and ripening. Previous studies have shown that lncRNAs operate not only in cis but also in trans78,79. According to the fundamental differences between these two categories, we used two strategies to identify DE genes that may be potential targets of these lncRNAs (see Materials and methods). In total, we identified 3059 differentially expressed target genes that were regulated by lncRNA via cis-acting and trans-acting modes (Fig. 2c). Functional annotation showed that these DE target genes were mainly associated with defense response, oxidoreductase activity, and ADP binding (Supplementary Figure S12a). They were also involved in diverse biological pathways, such as the biosynthesis of secondary metabolites, plant–pathogen interactions, and starch/sucrose metabolism (Supplementary Figure S12b).

Fig. 2: Identification and characterization of differentially expressed lncRNAs and target genes between FPW and FPR during fruit development and ripening.
figure 2

Three comparisons were made to quantify the differentially expressed lncRNAs and target genes at three stages. a Venn diagram of differentially expressed lncRNAs in FPW vs. FPR. b Expression patterns of differentially expressed lncRNAs in FPW vs. FPR. c Venn diagram of differentially expressed target genes in FPW vs. FPR. d Differential expression analysis of transcription factor (TF) families in comparisons of FPR and FPW. The x-axis represents different target gene sets, corresponding to all differentially expressed TFs (open circles) and stage-specific differentially expressed TFs (solid circles). The size of the circle represents the number of differentially expressed TFs in each TF family, and the color of the solid circle represents the rate of stage-specific differentially expressed TFs among all differentially expressed TFs

To validate the reliability of the RNA-seq data, the expression profiles of ten randomly chosen genes were examined by qRT-PCR. As shown in Supplementary Figure S13, similar expression patterns were observed via qRT-PCR analysis as those identified in the RNA-seq data (Supplementary Figure S13).

The phenylpropanoid biosynthetic pathway genes are responsible for anthocyanin biosynthesis in plants80. In the present study, 30 target genes related to phenylpropanoid biosynthesis were targeted by DE lncRNAs (Table 1), encoding 4-coumarate: CoA ligase (4CL), beta-glucosidase (BGLU), cinnamyl-alcohol dehydrogenase (CAD), peroxidase (POD), caffeic acid 3-O-methyltransferase (COMT), and coniferyl-aldehyde dehydrogenase (CALDH). 4CL, as an important enzyme, plays vital roles in general phenylpropanoid metabolism and catalyzes reactions at a key branching point in the phenylpropanoid pathway81. Three genes encoding 4CL were potentially targeted by a single or multiple lncRNAs and showed significant expression differences between FPW and FPR, ranging from 8-fold to 32-fold, respectively (Table 1). Glycosylation is assumed to improve anthocyanin stability and provide an immense range of color variation in anthocyanins82,83. BGLU is also involved in anthocyanin glycosylation in addition to the well-known UDP-sugar-dependent glycosyltransferases (UGTs)83,84,85. At the ripe stage, we detected three differentially expressed BGLU genes, which were predicted as target genes of DE lncRNAs (Table 1). The competition of anthocyanin biosynthesis and lignin pathways for their common precursors has been demonstrated in strawberry86. Four genes encoding CAD and five genes encoding POD showed significant differences in expression levels between FPW and FPR (Table 1), which were also potentially targeted by DE lncRNAs. Among them, CAD is a crucial enzyme in lignin biosynthesis, and POD is associated with the metabolic interaction between anthocyanin and lignin biosynthesis86,87. The members of the POD family also have a function in anthocyanin degradation88. Based on the functional annotation, we further discovered the target genes of lncRNAs involved in plant hormone signal transduction (Supplementary Figure S14). Twenty-two of the DE lncRNA target genes widely participated in auxin, cytokinin, GA, ABA, BR, and salicylic acid (SA) signal transduction, such as ARF, cytokinin receptor (CRE) and ethylene-responsive transcription factor (ERF) (Supplementary Figure S14). In addition, many transcription factors with different expression abundances in FPW and FPR were potentially targeted by DE lncRNAs, including bHLH (13), MYB (12), ERF (12), WRKY (10), C2H2 (6) and bZIP (5) (Fig. 2d). Among them, members of the MYB, bHLH, MADS, and WRKY families were known regulatory factors of anthocyanin biosynthesis89. Some DE transcription factors specific to the turning stage and ripe fruit stage are presented in Fig. 2d, which may be potential targets of DE lncRNAs and involved in the control of the fading of green fruit color and the reddening of fruit, including MYB, bHLH, Trihelix, WRKY, ERF, C2H2, bZIP, and ARF (Fig. 2d). The above results suggested that the fruit development and color change in F. pentaphylla may be largely regulated by lncRNAs via regulation of the expression of structural genes and regulatory factors involved in phenylpropanoid and anthocyanin biosynthesis.

Table 1 Differentially expressed lncRNAs and target genes in phenylpropanoid pathway modulation

Construction of regulatory network for fruit development and color formation

To obtain a global view of the regulatory mechanism underlying fruit development and color changes in F. pentaphylla, protein-coding genes and lncRNAs were used to construct a gene co-expression network by WGCNA. In total, sixteen co-expression modules were developed, containing 15,289 elements. The Pearson correlation coefficient of each co-expression module with fruit color was calculated by WGCNA and visualized using heat maps (Fig. 3a). Out of 16 modules, module ME9 was strongly positively correlated with fruit color (r = 0.92). Moreover, modules ME10 and ME11 showed a relatively high correlation with fruit color (r = 0.65 and 0.48, respectively). To gain a further understanding of the three positive correlation modules, functional analysis of genes in these modules was conducted. The results indicated that the co-expression modules were associated with specific biological processes or cellular components. The genes in ME9 were overrepresented in the fatty acid metabolic process, carboxylic acid biosynthetic process, organic acid biosynthetic process, monocarboxylic acid biosynthetic process, oxoacid metabolic process, cutin biosynthetic process, fatty acid transport, and response to abscisic acid (Fig. 3b). For the ME10 module, key functional over-representations included the oxidation–reduction process, methionine catabolic process, and steroid metabolic process. Module ME11 was shown to be enriched in biological processes related to plant stress response, including defense response, the phytoalexin biosynthetic process, the camalexin biosynthetic process, the salicylic acid-mediated signaling pathway, and the jasmonic acid mediated signaling pathway. Differences in cell component categories further reflected functional differences between modules (Fig. 3b). In the category of cell component, the genes of ME9 were enriched in the plasma membrane and acetyl-CoA carboxylase complex (Fig. 3b). Nevertheless, the genes of the ME10 module were significantly enriched in the cell wall and cell surface (Fig. 3b).

Fig. 3: Gene co-expression module construction and functional analysis.
figure 3

a Correlations between detected modules and fruit color. The Pearson correlation coefficient is shown with the corresponding p-value in brackets below (left panel). The genes in different co-expression modules have distinct expression profiles (right panel). b Functional analysis of genes in positive correlation modules. Different colors in the block represent the different significance levels of the overrepresentation

Close inspection revealed that genes in the co-expression module correlated with fruit color were significantly enriched in the biological processes of plant–pathogen interactions and stress responses. Multiple pathogenesis-related (PR) proteins showed distinct expression patterns between FPW and FPR, including PR10 proteins. The PR10 family is an important member of the disease-related protein family and has well-characterized functions in biotic and abiotic stress responses and plant growth and development90,91. Previous findings provided strong evidence that the PR10 family Fra a gene is directly linked to anthocyanin formation in strawberry fruit and control of color-producing flavonoid biosynthesis through binding to metabolic intermediates21,92,93. Silencing of Fra a genes led to significantly decreased levels of anthocyanins along with parallel downregulation of phenylalanine ammonia lyase (PAL) and, to a lesser extent, chalcone synthase (CHS)22. In this study, 29 members of the PR10 protein family were identified in F. pentaphylla, all of which contained the conserved Bet_v1-like domain (cd07816, Fig. 4). The members of the PR10 family contained the highly conserved glycine-rich loop sequence EG(D/N)GG(V/P)G(T/S)91, except for TN49653_c0_g2, SN.5876, and TN118165_c0_g1 (Fig. 4). Among these identified genes, SN.13759 was highly homologous to the Fra a member Fra a 1 in F.×ananassa (Fig. 4). The results of cDNA amplification and SNP calling showed that the transcripts of SN.13759 in FPR and FPW were entirely identical. SN.13759 showed differential expression between FPW and FPR fruits, and the expression abundance was higher in FPR. The expression fold changes ranged from 1.45 to 5.12 at the three developmental stages. Notably, the expression level of SN.13759 had a decreasing trend in FPW during fruit development and ripening, which was also observed in F.×ananassa18, whereas it decreased at first and then increased in FPR. Recently, three members of the Fra a proteins, which are essential for anthocyanin biosynthesis and red color formation in fruits, have been well studied in strawberry22. The transient RNAi-mediated silencing of Fra a resulted in significantly decreased levels of anthocyanins and upstream metabolites, which demonstrated a clear link between Fra a expression and anthocyanin formation22. Fra a seems to fine-tune the flavonoid composition by binding to diverse intermediates in flavonoid biosynthesis21. It is worth mentioning that Fra a proteins have different selectivity and affinity when binding natural metabolites in the flavonoid pathway, which might be the result of amino acid variability at certain positions21. Additionally, distinct expression patterns were detected in the members of the Fra a family in F.×ananassa22. The Fra a 1e and Fra a 3 transcription levels decreased from the green fruit stage to the red fruit stage, whereas Fra a 2 was actively transcribed in the late stages of fruit ripening22. Consistent with this observation, other members of the Fra a family showed different variation trends in expression levels during fruit development and ripening of F. pentaphylla, indicating functional differences in the members of the Fra a family.

Fig. 4: Phylogenetic analysis of the PR10 family in F. pentaphylla.
figure 4

a Phylogenetic analysis of PR10 family members in F. pentaphylla. b The highly conserved glycine-rich loop region of PR-10 proteins. c Structural features of PR-10 proteins, including Fra a 1, Fra a 3, and SN.13759

Despite great progress in the elucidation of Fra a function in anthocyanin biosynthesis, our understanding of the Fra a-dependent regulatory network in fruit color formation is very limited. We further intended to determine whether lncRNAs participate in this regulatory network. We attempted to construct a comprehensive regulatory network of fruit development and color formation by analyzing protein–protein interactions and target predictions of miRNA–mRNA and miRNA–lncRNA relationships. Based on the analysis above, a comprehensive regulatory network was established that contained 20,770 protein-coding genes, 3,036 lncRNAs, and 821 miRNAs. Within this network, multiple interaction patterns were recognized, such as one-to-one, one-to-many and many-to-many. We extracted a sub-network from the abovementioned regulatory network to further investigate Fra a-dependent regulatory relationships. A total of 24 elements were identified in this sub-network, which included nineteen protein-coding genes, three lncRNAs and two miRNAs (Fig. 5a). These protein-coding genes were associated with photosynthesis, nitrogen metabolism, and signal transduction (Fig. 5b). Coupled with fading of green and increased reddening in fruits, the expression levels of genes encoding chlorophyll a-b binding protein and ribulose-1,5-bisphosphate carboxylase oxygenase (Rubisco) were continuously downregulated in FPR and FPW fruits (Fig. 5b). Simultaneously, SN.13759 showed a downregulatory expression trend in FPW, whereas it decreased at first and then increased in FPR (Fig. 5b). At present, it is not known whether this differential expression pattern has exactly contributed to the color variation and the mechanisms of the regulation of color variation remain unclear. Interestingly, Fra a expression was upregulated in naturally occurring white-fruited genotypes of F. chiloensis and F. vesca, which was likely to compensate for the low expression levels of PAL and CHS in these mutant genotypes22. Silencing of Fra a reduced FaPAL and FaCHS transcript levels by as much as 60% (ref. 22). In this study, the CHS gene TN122753_c0_g2 was expressed at a much higher level in FPR at the ripe fruit stage than in FPW (Supplementary Table S5). It can be inferred that the deficiency of this first key enzyme leading to anthocyanin biosynthesis could contribute to colorless fruits. However, the function of the observed upregulation of SN.13759 at the ripe fruit stage in FPR and its function related to the observed upregulation of TN122753_c0_g2 CHS gene might be of particular interest in future studies. Moreover, three lncRNAs (TN126539_c0_g1, TN131414_c0_g9, and SN.15536) targeted multiple genes in this sub-network, again indicating that lncRNAs play roles in fruit color change in F. pentaphylla. Moreover, we identified one uncharacterized gene that was involved in the above process (Fig. 5b). Further studies focused on the functions of these genes would greatly help to elucidate the molecular regulatory mechanisms of fruit development and color formation in F. pentaphylla.

Fig. 5: Fra a-dependent regulatory network in F. pentaphylla.
figure 5

a Fra a-dependent regulatory network in F. pentaphylla. The red, green, and black nodes denote lncRNAs, miRNAs, and protein-coding genes, respectively. The dashed arrow line denotes a target relationship between an miRNA and a protein-coding gene or lncRNA; the dashed line represents the protein–protein interaction; the solid line represents the co-expression relationship. b Heat map of protein-coding genes in the Fra a-dependent regulatory network