Revealing the developmental dynamics in male strobilus transcriptome of Gnetum luofuense using nanopore sequencing technology

Gnetum is a pantropical distributed gymnosperm genus. As being dioecious, Gnetum species apply female and male strobili to attract and provide nutrition to insect pollinators. Due to its unique gross morphology, a Gnetum male strobilus receives much attention in previous taxonomic and evolutionary studies. However, underlying molecular mechanisms that control male strobilus development and pollination adaptation have not been well studied. In the present study, nine full-length transcriptomes were sequenced from three developmental stages of the G. luofuense male strobili using Oxford Nanopore Technologies. In addition, weighted gene co-expression network analysis (WGCNA), and RT-qPCR analysis were performed. Our results show that a total of 3138 transcription factors and 466 long non-coding RNAs (lncRNAs) were identified, and differentially expressed lncRNAs and TFs reveal a dynamic pattern during the male strobilus development. Our results show that MADS-box and Aux/IAA TFs were differentially expressed at the three developmental stages, suggesting their important roles in the regulation of male strobilus development of G. luofuense. Results of WGCNA analysis and annotation of differentially expressed transcripts corroborate that the male strobilus development of G. luofuense is closely linked to plant hormone changes, photosynthesis, pollination drop secretion and reproductive organ defense. Our results provide a valuable resource for understanding the molecular mechanisms that drive organ evolution and pollination biology in Gnetum.


Results
Assembly and functional annotation of full-length reads. In the present study, we defined three developmental stages of G. luofuense male strobilus-FA represents 1-10 days' growth, FB represents 10-15 days' growth, and FC represents 15-25 days' growth ( Fig. 1). A total of 30,261,170 clean reads were generated by Nanopore sequencing, with mean lengths ranging from 993 (FC03) to 1236 bp (FB02) ( Table S1). Among these clean reads, 22,997,187 full length (FL) reads with clear primer sequences at both ends were identified. FL reads accounted for 78.05% (1,971,235; FA02) to 79.64% (1,485,322; FA03) of the clean reads (Table S2). All FL reads were clustered and polished, yielding a total of 257,886 consensus reads, ranging from 22,119 (FB01) to 33,351 (FC03) in individual samples (Table S3). To delete redundant reads, all consensus reads were mapped against the www.nature.com/scientificreports/ reference genome of G. luofuense with the mapping rates ranging from 99.26 to 99.40% (Table S4). After mapping, a total of 132,653 non-redundant full-length reads (nFLs) were obtained, ranging from 12,066 (FB01) to 16,689 (FC03) in individual samples (Table S5). A total of 45,036 transcription were functionally annotated by searching against the NCBI non-redundant protein sequence (NR), Swiss-Prot, Clusters of Orthologous Groups of proteins (KOG/COG), Gene ontology (GO), Protein Family (Pfam) and Kyoto Encyclopedia of Genes and Genomes (KEGG) databases (Table S6).

Identification of CDS and lncRNAs.
We identified a total of 21,720 open reading frames (ORFs), among which 16,692 (76.85%) were complete ORFs with both start and stop codons. The length distribution of the complete ORFs is shown in Fig. 2A. Among the complete ORFs, the average length of the 5′ untranslated regions (UTRs) was 530 bp, and the average length of the 3′ UTR was 557 bp. A total of 15,439 coding sequences (CDS) were detected, with an average length of 394 bp. In addition, 466 lncRNAs with a mean length of 689 nt were identified using four separate methods (Fig. 2B). The lncRNAs comprised 349 lincRNAs (74.9%), 20 antisense lncRNAs (4.3%), 10 intronic lncRNAs (2.1%), and 87 sense lncRNAs (18.7%) (Fig. 2C). The differentially expressed lncRNAs are shown in Fig. 2D, of which we found the number between FA and FC is the largest (22 lncRNAs), while the smallest between FB and FC (one lncRNA). Furthermore, three hundred twenty-six genes were predicted to be regulated by 296 lncRNAs in cis, 218 genes regulated by 81 lncRNAs in trans. Networks of target genes with the regulation of lncRNAs in cis and in trans are shown in Fig. 2E. Moreover, KEGG pathway enrichment analysis was performed for those target genes (Fig. 2F). Our results show that target genes were primarily enriched in "photosynthesis (ko00195)", "plant-pathogen interaction" (ko04626), "starch and sucrose metabolism" (ko00500), "plant hormone signal transduction" (ko04075), "flavonoid biosynthesis" (ko00941), "amino sugar and nucleotide sugar metabolism" (ko00520, 25).

WGCNA analysis. Expression of all transcripts were quantified with CPM values and listed in Supplemen-
tary Dataset File 1. Based on the data, we performed weighted correlation network analysis (WGCNA), and our results show that seven modules of highly correlated TFs across the three developmental stages of G. luofuense male strobili (Fig. 3A, B). The three largest modules of enriched transcripts are shown in turquoise genes), brown (349), blue (340). At the FA stage, values of Pearson's correlation coefficient in the brown and turquoise modules were both positive, but with the q value > 0.05. At the FC stage, however, positive values of correlation coefficient were found among blue, green, and red modules with all q values < 0.05. KEGG pathway enrichment analysis was performed for transcripts in the five modules (Fig. 3C). TFs in the turquoise module were primarily enriched in "photosynthesis (ko00195)", "phenylpropanoid biosynthesis" (ko00940), and transcripts in the brown module were primarily enriched in "phenylpropanoid biosynthesis", "DNA replication" (ko03030), and "plant hormone signal transduction" (ko04075). Transcripts in the blue module were primarily enriched in "starch and sucrose metabolism", "phenylpropanoid biosynthesis", and "plant hormone signal transduction".  (Fig. 4B). Among the up-regulated genes, the set of all DETs was significantly enriched in the top three KEGG pathways ( Fig. 4C): "glutathione metabolism" (ko00480), "glycolysis /gluconeogenesis" (ko00010), and "amino sugar and nucleotide sugar metabolism" (ko00520). Among the down-regulated genes, the set of all DETs was significantly enriched in the top three KEGG pathways ( Fig. 4D): "phenylpropanoid biosynthesis", "photosynthesis", "starch and sucrose metabolism" (ko00500).

TF identification and RT-qPCR validation.
A total of 3138 transcription factors (TFs) was detected, of which MYB-related, MADS-box, and bHLH TFs constituted the most abundant TFs (Fig. 5A, Supplementary Dataset File 3). We further investigated the differentially expressed TFs among the three developmental stages of G. luofuense male strobili. We found that bHLH, MYB and MADS-box MIKC were highly expressed at FA and FC, while TFs e.g. Y-subunit B, ABI3 and CBF/NF-Y were highly expressed at FB (Fig. 5B). In addition, we performed RT-qPCR experiments to validate the eight genes of interest, which encode the differentially expressed TFs (Fig. 5C). Results of RT-qPCR are largely congruent with the results of ONT sequencing, but there are exceptions such as genes TnS001008199g01 and TnS000980857g03 at the stage of FC and TnS000498063g52 at the stage of FB. It might because the sampling of RT-qPCR and ONT-sequencing slightly differed at the developmental stages of male strobili.

Discussion
Reproductive organ development. MADS-box genes play an essential role in reproductive organ development of seed plants [28][29][30] . Transcription factors encoded by MADS-box genes consists of two types, i.e. type I (SRF-like) and type II (MEF2-like). A previous study reports that DEF/GLO-like gene GGM15 are differentially expressed during a sterile ovule development of G. gnemon 16 . Another study shows that transcription factors (TFs) encoded by DEF/GLO-like genes GGM2 and GGM15, AGL6-like genes GGM9 and GGM11 were examined to form a heterodimer that determine the initiation of male strobilus development in G. gnemon 31    In angiosperms, it has been known that Aux/IAA TFs play an important role in gynoecium morphogenesis, ovule development, and formation of primary branch in Arabidopsis 36 . Another study shows that Aux/IAA genes take effects in reproductive organ development and responses to abiotic stress in rice 37 . In Gnetum, a previous study has shown that concentration of endogenous hormones gibberellin A3 (GA 3 ) and zeatin riboside (ZR) dramatically increases while indole-3-acetic acid (IAA) declines over the course of male strobilus development in G. parvifolium 12 . Another study shows that six Aux/IAA genes are involved in the female strobilus development of G. luofuense 38 . In the present study, the results of present study that 13 DETs annotated with hormone and signal transduction are down-regulated between FA and FC (Fig. 4C). Besides, three Aux/IAA TFs were identified and validated to be differentially expressed across the male strobilus development (Fig. 5C). Thus, our results corroborate that plant hormone, e.g. indole-3-acetic acid might play an important role in the male strobilus development of G. luofuense.  shows that sepals and petals of tobacco flowers have the photosynthetic capability of fixing CO 2 and pigment biosynthesis 39 . Another example shows that photosynthetic capacity of grapevine flowers gradually decrease across the procedure of flowering and finally cease at fruit periods 40 . In gymnosperms, female strobili of Pseudotsuga menziesi (Douglas-fir) were examined to possess considerable rates of photosynthesis 41 . Another study shows that female strobili of Pinus sylvestris (Scots pine) enclosed by aluminum foil yield lower seed weight B A

Pollination drop secretion.
Pollination drops in general function as a media to capture pollen grains and further transport them to interior nucellus for fertilization [43][44][45] . Gnetum is an entemophilous tropical genus, and a few Asian species (e.g. G. gnemon, G. parvifolium and G. luofuense) are reported to be insect pollinated 13,15,46 . Sugary pollination drops in Gnetum are able to attract and provide rewards to pollinators like nectars in angiosperms [43][44][45] . It has known pollination drops of Gnetum are rich in carbohydrates (sucrose, fructose, and glucose), proteins (degradome and secrotome) and amino acids, phosphate, and minerals 6,47 . Our results of WGCNA analysis shows that co-expressed transcripts in the blue module were enriched in KEGG pathways related to carbohydrate metabolism, e.g. starch and sucrose metabolism, glycolysis/gluconeogenesis, and fructose and mannose metabolism (Fig. 3C). These genes were shown to be highly expressed at the late developmental stages of male strobili. For examples, expression of gene TnS000345359g03 that encodes beta-fructofuranosidase is up-regulated from FA to FC. These results corroborate that G. luofuense male strobilus development is closely associated with sugar reproduction; the process might be related to pollination drop production as a response to insect pollinators.
Defense mechanism. As being reproductive organs, male strobili in G. luofuense demands protection against fungi and pathogens throughout their development. An effective manner in G. luofuense is to produce a physical barrier-an involucre collar to protect sterile ovules that secrete pollination drops at anthesis. A previous study shows that pollination drops are composed of defense-related proteins, e.g. thaumatin-like proteins, xylosidase, beta-glucodiase, and chitinases secreted from sterile ovules of G. gnemon and G. luofuense 6,47 . In the present study, we found three genes, i.e. TnS000958803g05, TnS000762467g04, and TnS000052095g04 were all up-regulated from FA to FC. Of these genes, genes TnS000958803g05 and TnS000762467g04 encode thaumatinlike proteins, which were validated to participate in ovule defense in hybrid yew (Taxus × media) 48 . The other gene TnS000052095g04 encodes chitinases, which was considered to be important in protection of female ovules among various species of Ephedra 49 and Welwitschia 50 . Thus, this evidence corroborates active defense reaction against external organisms during the male strobilus development of G. luofuense. RNA extraction and nanopore sequencing. The collected male strobilus material was snap-frozen in liquid nitrogen and stored at − 20 ℃. A RNeasy Plant Mini Kit (Qiagen, Valencia, CA, USA, product No. 74903) was used to extract total RNA from the nine samples, and relic DNA was removed using RNase-free DNase (Qiagen). The concentration, purity, and integrity of extracted RNA was assessed using 1% agarose gel electrophoresis with NorthernMax gel buffer (Qiagen), a NanoDrop spectrophotometer (ThermoFisher Scientific, Wilmington, DE, USA), and an Agilent 2100 Bioanalyzer (Agilent Technologies, Palo Alto, CA, USA), respectively. The synthesis of cDNA for Nanopore sequencing was performed according to the protocol from Oxford Nanopore Technologies, UK: 1 μg total RNA was prepared for cDNA libraries using cDNA-PCR Sequencing Kit (SQK-PCS109) protocol provided by Oxford Nanopore Technologies (ONT) 51 . Libraries were then created using a sequencing library preparation kit. We added defined PCR adapters directly to both ends of the first-strand cDNA. The establishment of cDNA libraries was subject to 14 cycles of PCR amplification with LongAmp Taq (NEB). The cycling parameters were set as 94 ℃ for 3 min, followed by 35 cycles of 94 ℃ for 30 s, 56 ℃ for 45 s, 72 ℃ for 1.5 min, and a final extension step of 72 ℃ for 10 min. The PCR products were then subjected to ONT adaptor ligation using T4 DNA ligase (NEB). Agencourt XP beads was used for DNA purification according to ONT protocol. The final cDNA libraries were added to FLO-MIN109 flowcells and libraries were then sequenced using a MinION Mk1B sequencer. Nanopore sequencing data from the nine male strobilus samples were all deposited in the NCBI Sequence Read Archive (SRA) under BioProject accession number PRJNA645614.

Methods
Data processing and genome mapping. Raw sequencing reads were analyzed using MinKNOW version 2.2 (Oxford, UK). Raw reads were filtered with the following settings: read quality score ≥ 7 and read length ≥ 500 bp. Ribosomal RNA was removed by searching against the Silva rRNA database (https:// www. arb-silva. de). Fulllength reads were identified when primers were detected at both ends of the cleaned reads. All full-length reads were clustered after mapping to the G. luofuense reference genome, https:// datad ryad. org/ stash/ datas et/ 10. 5061/ dryad. 0vm37 52) using minimap2 version 2.1.7 52  www.nature.com/scientificreports/ com/ nanop orete ch/ pinfi sh), consensus reads were again mapped to the G. luofuense reference genome using minimap2. The mapped reads were further collapsed using the cDNA_Cupcake package with a minimum coverage of 85% and a minimum identity of 90%. Consensus reads with sequence differences at the 5′ ends were not considered to be redundant transcripts.
Functional annotation and classification. All  DET detection analysis. Full-length reads were mapped to the G. luofuense reference genome mentioned above, and mapped reads with coverage above five were saved. Transcript expression was quantified as counts per million (CPM), where CPM = (reads mapped to transcript/total reads mapped in one sample) × 1,000,000. Differential expression analysis between each pair of developmental stages was performed using DESeq version 1.10.1 (https:// bioco nduct or. org/ packa ges/ relea se/ bioc/ html/ DESeq. html) 68 . The t-test was used to judge the statistical significance of expression difference, and the threshold of P-value was determined with the FDR in multiple testing. Three replicates at each developmental stages were set as an independent group for pairwise comparisons. The resulting P-values were adjusted using Benjamini and Hochberg's 69 approach in order to control the false discovery rate. Genes with FDR-adjusted p-values < 0.01 and fold changes ≥ 2 were defined as DETs. KEGG enrichment analyses were performed as described above. Out of these DETs, transcript expression of three replicates at each developmental stage was averaged, and denoted FA, FB, and FC.
TF identification and RT-qPCR validation. Transcription factors were identified and their genes were assigned to different families using iTAK version 1.7 (http:// bioin fo. bti. corne ll. edu/ tool/ itak) 70 . We selected eight differentially expressed TFs for gene expression validation by RT-qPCR referred to the MIQE guidelines. Primers were designed using Primer Premier version 5.0 71 , and information on the RT-qPCR protocol is presented in Supplementary Dataset File 1. The cDNA was synthesized from 2 μl of total RNA were extracted from the nine samples using a PrimeScript RT Master Mix (Perfect Real Time) (TaKaRa, China). For each sample, three independent analyses were performed, and the mean and standard deviation of the RT-qPCR gene expression values were calculated. The entire experiment was performed in Applied Biosystems Real-Time PCR Instruments-QuantStudio 6 (ThermoFisher Scientific, China). The amplification program consisted of 2 min of initial denaturation at 95 ℃, followed by 40 cycles of 20 s at 94 ℃, 20 s at 58 ℃, 20 s at 72 ℃. When producing melting curves, the amplification program was set at 30 s at 94 ℃, followed by 30 s at 65 ℃ and 30 s at 94 ℃. The G. luofuense actin gene was used as an endogenous control to estimate the relative expression of TFs using the ΔΔCt method 72 . Based on the slope of the standard curve, amplification efficiency was calculated in the website sever (https:// www. therm ofish er. com/ uk/ en/ home/ brands/ therm oscie ntific/ molec ularb iology/ molec ular-biolo gy-learn ing-center/ molec ular-biolo gy-resou rce-libra ry/ therm oscie ntific-web-tools/ qpcr-effic iency-calcu lator. html).When amplification efficiency was all close to 100%, relative expression values were calculated as −2∆∆C t , of which ∆∆C t = ∆C t -the average values of three replicates of ∆C t , and ∆C t = C t (genes of our interest) -C t (actin gene).