Pan-transcriptome identifying master genes and regulation network in response to drought and salt stresses in Alfalfa (Medicago sativa L.)

Alfalfa is an important legume forage grown worldwide and its productivity is affected by environmental stresses such as drought and high salinity. In this work, three alfalfa germplasms with contrasting tolerances to drought and high salinity were used for unraveling the transcriptomic responses to drought and salt stresses. Twenty-one different RNA samples from different germplasm, stress conditions or tissue sources (leaf, stem and root) were extracted and sequenced using the PacBio (Iso-Seq) and the Illumina platforms to obtain full-length transcriptomic profiles. A total of 1,124,275 and 91,378 unique isoforms and genes were obtained, respectively. Comparative analysis of transcriptomes identified differentially expressed genes and isoforms as well as transcriptional and post-transcriptional modifications such as alternative splicing events, fusion genes and nonsense-mediated mRNA decay events and non-coding RNA such as circRNA and lncRNA. This is the first time to identify the diversity of circRNA and lncRNA in response to drought and high salinity in alfalfa. The analysis of weighted gene co-expression network allowed to identify master genes and isoforms that may play important roles on drought and salt stress tolerance in alfalfa. This work provides insight for understanding the mechanisms by which drought and salt stresses affect alfalfa growth at the whole genome level.

www.nature.com/scientificreports/ to generate gene regulation networks of highly correlated genes 12 . The circRNA is a form of non-coding RNA generated by 3′-5′ head-to-tail back-splicing and is highly stable in comparison with linear RNAs. It has been reported that circRNA is involved in many biological processes including abiotic stress response 13 .
In alfalfa, transcriptomic analysis were used to identify differentially expressed genes under different conditions including salt stress 14,15 , freezing stress 16 , and bacterial pathogen infection 17 . Most recently, Duan et al. 18 identified hub genes and modules closely related to floral pigmentation in two alfalfa cultivars. Luo et al. 19 generated a full-length transcriptome of alfalfa root tips of 'Zhongmu No. 1' under osmotic and ionic stresses. However, only parts of genes and isoforms were captured in these studies as only a single tissue source and single variety was used. Furthermore, no study has yet identified circRNAs in autotetraploid alfalfa. A robust transcriptome in alfalfa using more tissue sources and varieties with extreme responses to specific stresses is required to generate a more complete transcriptomic atlas that can be used in alfalfa genomics-assisted breeding programs.
In this research we used three germplasms, Wilson (drought resistant), Saranac (nontolerant to drought and salt), PI467895 (salt resistant) and generated 21 full-length alfalfa transcriptomes from three tissue sources (leaf, stem and root) of plants subjected to salt (SS), drought (DS) and control (CK) treatments. Our goal was to generate more complete, rigorous, and comprehensive isoform expression atlas in alfalfa response to drought and high salinity. Here we report the discovery of ASE, NMD, transcription factors and regulators, ncRNAs in alfalfa and the construction of gene regulation networks to better understand molecular mechanisms by which drought and high salinity affect plant growth and biomass production in alfalfa.

Results
Generation of full-length non-concatemers reads. RNA libraries were generated from 21 samples for Iso-Seq and RNA-seq. More than 5.5 terabytes of raw reads were obtained from Iso-Seq and 108.5 gigabytes of circular consensus sequences (CCS) files were generated from reads that passed at least two times through the insert. The CCS files produced a total of 19,405,930 zero-mode waveguides (ZMWs) and 17,007,966 (87.64%) ZMWs passed all filters (Supplementary Table 1). The demultiplexing process was done by the LIMA barcode demultiplexing 20 to produce 21 BAM files with a total of 16,594,967 QC-passed reads. The demultiplexed reads were refined with Isoseq3 to remove concatemers and polyA tails, resulting in a total of 16,419,731 (98.94%) full length non-concatemers (FLNC) reads (Supplementary Table 2).
Redundant isoforms from SAM files were collapsed with TAMA collapse 9 using the option exon cascade collapse generating 21 collapsed transcriptomes in BED format. Comparing the number of corrected-trimmed reads with collapsed isoforms showed a clear positive Pearson's correlation of R = 0.81. Saranac-SS-Stem had a relatively low number of corrected-trimmed reads (456,574) but produced a high number of collapsed isoforms (49,646). Saranac-CK-Leaf had a high number of corrected-trimmed reads (1,061,656) but produced a similar number of collapsed isoforms (49,687) to Saranac-SS-Stem (Supplementary Figure 1).

Characterization of alfalfa isoforms.
To evaluate the quality of the libraries and to confirm the number of detected genes in the pan-transcriptome, TAMA-GO was used to produce a saturation curve for gene discovery by Iso-Seq using read support levels for each gene/transcript collected during the clustering and merging process. Saturation curves in 21 transcriptomes intend to capture all gene repertoires and the curves became a plateau when the number of reads reached > 300,000 for most of libraries. (Fig. 1a-c). Some libraries such as Wilson-DS-Root, Wilson-CK-Stem or Wilson-CK-Root showed lower read counts, associated with low number of FLNC (Supplementary Table 2) which caused the count curve stopped before reaching plateau. When all libraries were merged, the saturation curve occurred after 3 million reads (Fig. 1d).
To unify the gene and isoform IDs among transcriptomes, all 21 BED files were merged using the TAMA merge tool 9 . A pan-transcriptome with 1,124,275 non-redundant isoforms and 91,378 non-redundant genes was obtained. To discover the individual contribution of each treatment to the pan-transcriptome, non-redundant isoforms and genes were identified in Wilson-DS, Wilson-CK, PI467895-SS, PI467895-CK, Saranac-CK, Saranac-DS and Saranac-SS. Wilson-CK has the lowest number of isoforms and genes (162,837 and 41,200, respectively) and PI467895-SS has the highest number of isoforms (352,824) and PI467895-CK has the highest number of genes (62,064) (Fig. 1e). Non-redundant isoforms and genes were identified by germplasm where Saranac had a higher number of isoforms and genes (641,942 and 73,438 respectively), while Wilson had lowest number of isoforms and genes (349,909 and 56,395 respectively).
Core and tissue-specific transcriptomes were obtained from PI467895, Saranac and Wilson identifying shared isoforms or genes by the intersection (∩) of leaf, stem and root by germplasm and conditions (Supplementary Table 3). PI467895 was the germplasm with more core isoforms and genes (14,860 and 17,018, respectively) while Wilson was the germplasm with lowest number of isoforms and genes (9,140 and 11,597, respectively). Isoforms and genes from same germplasm and tissue were compared by treatments (Supplementary Table 4). Stress-specific isoforms were higher compared with control-specific isoforms. For example, in Wilson > 50% of isoforms were unique in drought stress in all tissue sources. Conversely, core genes were higher than stress or control genes. In Wilson > 48% of genes were core between control and drought stress (DS ∩ CK).
Transcriptome annotation and nonsense-mediated mRNA decay. Isoforms were annotated against Uniprot100 21 Medicago truncatula 22 , iTAK 23 , PlantRegMap 24 , and pfamA 25 databases (Fig. 2). Uniprot100 and M. truncatula protein databases retrieve 827,313 and 445,462 isoforms annotated with > 90% of amino acid identity, respectively. 697,742 isoforms were annotated by pfamA and domain information was used to retrieve GO terms. 17,084 isoforms were annotated as transcription regulators (TR) and most common families were SNF2 (2,613), PHD (2,123), SET (1,368) Table 5). All 21 transcriptomes showed similar numbers of TF, TR, or PK (Supplementary Table 6). The most frequent PK and TF were RLK-Pelle and bHLH, respectively and the most frequent TR was SNF2 in all treatments, except in Saranac-DS-Stem (PHD). Isoforms annotated as TF, TR and PK were compared in control vs stressed conditions from same germplasm and same tissue source. The isoforms only present in stressed conditions were considered upregulated, and isoforms only present in control conditions were considered downregulated (Supplementary Table 7). For example, isoforms annotated as RLK-Pelle shows differences in up and downregulated isoforms among treatments. PI467895-SS-Leaf and PI467895-SS-Root had more RLK-Pelle isoforms upregulated while PI467895-SS-Stem had more RLK-Pelle isoforms downregulated.
TAMA GO: ORF and NMD predictions tool 9 was used to predict NMD and their exon distribution. NMD was predicted in 989,464 (64.2%) isoforms and it was most frequent in the first exons. The percentage of transcripts without predicted NMD were similar among treatments, with a total of 550,985 (35.8%) transcripts. The highest percentage of NMD was in Saranac-CK-Leaf (69.59%) and Wilson-CK-Stem (68.86%) while the lowest percentage was in Wilson-DS-Stem (66.44%) (Supplementary Table 8). Since NMD affects gene expression levels, we correlated NMD with expression levels in log 2 TPM obtained from RNA-seq data. We were able to identify transcripts without NMD having higher expression mean values compared with transcripts with NMD (Supplementary Figure 2). The log 2 TPM values were lower with NMD in isoforms with more than 10 exons. Among them, Wilson-CK-Root had the highest log 2 TPM values with a maximum value of 1.00 in isoforms without NMD. SQANTI3 isoforms classification. SQANTI3 structural classification allowed us to classify isoforms into eight groups: full splice match (FSM), incomplete splice match (ISM), novel in catalog (NIC), novel not in cata- www.nature.com/scientificreports/ log (NNIC), antisense, fusion, genic and intergenic. The most common categories in all treatments were NNIC (mean 42.87%) followed by FSM (mean 26.96%) and the most uncommon categories were Fusion (mean 1.94%) and Antisense (mean 1.12%) (Fig. 3a). Structural classifications were compared by exon distribution and found that genic and antisense groups had the lowest exon number compared with other groups (Fig. 3b). The transcripts were subclassified into subcategories of 5' fragment (18,433), 3' fragment (52,388), internal fragment (4,735) and intron retention (IR) (36,684). IR was present in fusion, ISM NIC and NNIC structural classifications. All antisense, genic, and intergenic transcripts were classified into the multi-exon subcategory (Fig. 3c). We did not identify differences in length distribution according to structural classification (Fig. 3d). Structural classification distribution was plotted against the alfalfa allele aware genome finding high number of antisense, genic, and intergenic transcripts at the end of chromosomes 3.1 and 3.4 and lowest isoform density on chromosomes 6.1, 6.2, 6.3 and 6.4 (Fig. 3e,f).
Noncoding RNAs in alfalfa. Noncoding RNAs, are group of transcripts involved in post-transcriptional regulation. Long noncoding RNAs (lncRNA) were predicted using the plant long noncoding RNA prediction (PlncPRO) 26 . A total of 11,677 non-redundant lncRNAs were obtained in all 21 Iso-Seq libraries. PI467895-SS-Stem had the highest lncRNAs (1016), while lowest lncRNAs (164) were predicted in Wilson-CK-Root (Supplementary Table 9). The mean length of lncRNAs was 2,081 bp and maximum length was 8,817 bp. Additionally, there was a high frequency of lncRNAs between 200 and 300 bp, possibly due to that 200 bp was the minimum threshold to detect lncRNAs (Fig. 4a). LncRNAs with two and three exons were the most frequent with 6,875 and 2276, respectively (Fig. 4b). LncRNAs were classified according to structural classification using SQANTI3, where the most important categories were identified as follows: intergenic (55.1%), NNIC (16.8%)  Table 10). LncRNAs can regulate the expression of up or downstream genes ( Fig. 4c upper part). We compared expression levels of lncRNAs vs up and downstream genes and identified 2,105 lncRNAs differentially upregulated and 1,894 lncRNAs differentially downregulated compared to adjacent genes ( Fig. 4c lower part). Gene ontology enrichment analysis of adjacent genes to lncRNAs identified several terms related with drought or salt stress like hydrotropism, oxidoreductase activity, response to auxin, response to stimulus (Fig. 4d, Supplementary Table 11). www.nature.com/scientificreports/ CircRNAs were predicted from the RNA-seq data using circexplorer2 27 . Only high confidence circRNAs were analyzed and merged to generate the first circRNA genome-wide profiling in alfalfa. We identified 2,108 nonredundant circRNAs highly specific by treatment, tissue source or germplasm. The highest number of circRNA were found in Saranac-DS-Root with 260 circRNAs, while Wilson-DS-Root had 34 circRNAs (Supplementary Table 9). The mean length of circRNAs was 317 bp and maximum length was 3,613 bp (Fig. 4a). CircRNAs were mainly mono-exonic and the maximum number of exons detected were 18 (Fig. 4b). Gene ontology enrichment analysis of host genes of circRNAs shows how these ncRNAs are targeting to key genes related with drought or salt stress (Fig. 4d). Several parental genes associated with circRNAs were annotated as important regulators: 62 circRNAs were associated with genes annotated as TF, 60 circRNAs were associated with genes annotated as PK. We identified 13 and 22 specific GO terms related to lncRNAs and circRNAs, respectively. We identified common GO terms between lncRNAs and circRNAs involved in auxin response, signal transduction, UDPglycosyltransferase activity, or catalytic activity (Supplementary Table 11). To carry out a weighted correlation of gene network analysis (WGCNA), the WGCNA R package was used 12 . Adjacency matrix was calculated using the soft-thresholding power 8. Hierarchical clustering was used to identify 72 modules using the dynamicTreeCut function (Fig. 5a). Isoforms for each module were analyzed according to alfalfa germplasm and stress conditions. The analysis identified several significant modules-trait associations based on the threshold of Pearson correlation > 0.5 and p value < 0.05. PI467895-SS, Wilson-DS, Saranac-SS and Saranac-DS had seven, five, seven, and six significant modules, respectively. Additionally, blue module with 988 isoforms was significantly correlated with both Saranac-SS and Saranac-DS. All significant modules identified by treatment were exported to Cytoscape to generate transcript networks. The PI467895-SS network contained 3,026 isoforms including 28 lncRNA and 86 TF. The Wilson-DS network had 2,613 isoforms including 18 lncRNA and 48 TF. The Saranac-SS network had 2614 isoforms including 29 lncRNA and 78 TF. The Saranac-DS network had 2,895 isoforms including 40 lncRNA and 121 TF. In total 105 and 307 isoforms were identified as lncRNA and TF respectively. A list of TFs and lncRNAs identified by modules and treatment are in Supplementary Table 14. A total of 11,148 isoforms in 26 modules were analyzed in the Uniprot database to retrieve Uniprot ID, protein names and metabolic pathways. In total, 279 out of 11,148 isoforms were annotated in some metabolic pathways (Supplementary Table 15). Among metabolic pathways, protein ubiquitination was the most common pathway with 41 genes in 18 modules followed by glycolysis with 36 genes in 11 modules and protein glycosylation with  Figure 9). Finally, to reduce complexity of the networks, subnetworks were constructed independently selecting hub isoforms, lncRNAs and TF (Fig. 6).

Discussion
Comparative transcriptomic analysis using PacBio and RNA-seq. In this study, we used both Iso-Seq and RNA-seq platforms and generated genome-scale transcriptomes from 21 samples of three tissue sources (leaf, stem and root) in three germplasms (Saranac, PI467895 and Wilson) under three stress conditions (SS, DS and CK). Iso-Seq allows to obtain full-length transcripts without assembly steps, helping in obtaining completed alternative splicing events (ASE). Isoforms were aligned to allele-aware chromosomes of the autotetraploid alfalfa genome (M. sativa cv XinJiangDaYe) which was assembled in all 32 chromosomes (2n = 2n = 4× = 32) and 9,789 scaffolds with a total of 164,632 protein-coding genes 28 . It has been reported that the use of single genotype for transcriptome analysis in response to abiotic stress may not represent the genome-scale diversity of transcripts in a particular species 29 . Therefore, in the present study, we used three germplasms of alfalfa and analyzed 21 transcriptomes to generate a merged pan-transcriptome with 1,124,275 isoforms in 91,378 genes. This approach made a significant improvement in isoforms and gene discovery detecting almost seven times more transcripts than those reported in the reference genome of alfalfa cv XinJiangDaYe 28 . We identified differences in numbers of isoforms and genes among germplasms, which might be caused by structural variation 30 or incomplete capture of genes in an individual 19 . For example, PI467895 has more than 5,000 core isoforms in comparison with Wilson and Saranac (Supplementary Table 3). All transcriptomes had a similar average gene length (~ 2,400 bp) with a maximum length of 13,424 bp, which is similar to a previous www.nature.com/scientificreports/ report 19 . However, in this work we discovery a vast number of new isoforms. Only 27.2% isoforms were classified as FSM; isoforms with the same splicing pattern as predicted in the reference genome. Isoforms belong to NNIC, isoforms with novel splice donors or acceptors represent 42.7%, demonstrating the usefulness of Iso-Seq to discover new isoforms in alfalfa. It has been reported that NMD is an important process to identify premature termination codons in controlling gene expression 31 . Previous work with NMD in alfalfa focused on heat shock transcription factor 1 (MsHSF1) which is related with nodule formation by inoculation with Sinorhizobium meliloti 32 . In this work, we were able to predict NMD in 62.4% of isoforms and revealed that this process was a common post-transcriptional modification in alfalfa, which helped in understanding the complexity of gene expression. Additionally, the frequency of NMD and expression mean values decreased as exon predicted NMD increased, thus, higher expression levels occurred in isoforms without NMD than those with NMD. These results agree with the previous Iso-Seq results in maize and sorghum 33 36,37 . In this work we predicted 159 isoforms annotated as VOZ TFs with different expression patterns according to tissue or treatments. Additionally, in this work, we identified 46 isoforms of signal transducer and activator of transcription (STAT) TFs in alfalfa. STATs are very common in non-plant species, and involved in the activation of the Janus tyrosine kinase (JAK)/ STAT pathway 38 . It has been suggested that the GRAS TFs fulfil the function of STAT TFs in plants 39 . However, in this work, we identified both GRAS and STAT TFs which opens the question of whether STAT and GRAS TFs have redundant functions in alfalfa.
RLK-Pelle, CAMK, CK1 and CMGC were the most frequent PK families among all transcriptomes analyzed in this study and this is the first report of genome-wide level of PK in alfalfa. In M. sativa, calcium/calmodulindependent protein kinase (CAMK) is involved in the rearrangement of intramolecular hydrophobic interactions and in calcium oscillations 40 . In rice, the PK CAMK OsDMI3, confers saline-alkaline tolerance by modulating the Na + and H + influx in root 41 . Here we reported 243 genes and 3,610 isoforms annotated as CAMK, with the higher number of isoforms up and downregulated in Saranac-SS-root.
Transcription regulators (TR) identified in the present study were classified in 23 families and the most abundant families were SNF2 and mitochondrial transcription termination factor (mTERF). SNF2 is involved in positive and negative regulation of gene expression of a large number of genes. mTERF is involved in organellar gene expression. In Chlamydomonas, mTERF protein MOC1 is involved in controlling the stromal redox and played roles in the communication between chloroplast and mitochondrion, organelles for redox homeostasis 42 . Additionally, 855 isoforms in 79 genes were annotated as Auxin/Indole-3-Acetic Acid (Aux/IAA) transcriptional regulator family and 1,986 isoforms in 71 genes were annotated as auxin response factor (ARF) transcription factor family which are involved in auxin early response. In Medicago truncatula Aux/IAA family is involved in nodule formation during infection with Sinorhizobium meliloti 43 . Additionally, we were able to differentiate the low frequent families of TF in different germplasm and tissue sources. For example, 6 genes and 22 isoforms were annotated as HRT-like transcription factor in PI467895-stem. Among those, however, the isoform G77245.10 was only present in PI467895-SS-Stem, while isoform G77245.6 was only present in PI467895-CK-Stem (Supplementary Figure 10).
The importance of non-coding RNA a in response to drought and salt stresses. NcRNAs play a crucial role in abiotic stress tolerance through five different pathways, including target mimicry, sRNA precursors, DNA methylation, antisense transcription, and histone modification. The mechanism of ncRNAs in response to drought or salt stress may be through abscisic acid (ABA) signaling pathways and modulation of ion channels (Supplementary Table 16). For example, in M. truncatula, the expression of 12 lncRNAs involved in osmotic and salt stress have been validated and lncRNA TCONS_00020253 is upregulating Medtr1g081900 a Na+/H+ex-changer (NHX) gene which improves salt tolerance 44 .
In this work, we identified a diversity of ncRNAs which is in agreement with the previous reports 45 . This is the first report of lncRNAs in regulation of plant response to drought and salt stress in M. sativa, although previous reports highlighted the importance of lncRNAs in response to osmotic and salt stress and cold stress in M. truncatula 11,44 . Additionally, we were able to identify 565 lncRNA with a length between 200 and 300 bp with abnormal distribution. The high frequency of transcripts between 200 and 300 bp could be affected by ncRNAs classification. ncRNAs can be classified as small nucleolar RNAs (snoRNAs) if the size is between 60 and 300 bp or as lncRNA if the size is > 200 bp. The overlap of 200-300 bp may cause a redundancy between snoRNAs and lncRNA. We predicted a lower number of lncRNAs compared with previous studies of M. truncatula. Wang et al. 44 predicted 23,324 lncRNAs under osmotic and salt stress, and Zhao et al. 11 predicted 24,368 lncRNAs under cold stress. This inconsistency could be due to the genomic differences between M. truncatula and M. sativa, or different methodologies used for lncRNA prediction. In the present study, we used plncPRO 26 , a tool designed specifically for predicting lncRNAs in plants based on random forest algorithm, which could be more stringent but more accurate than the method used by Wang et al., where lncRNAs were predicted based on lack of coding potential 44 . Most of lncRNA were intergenic, and many of them showed an inverse expression compared with their adjacent genes. Our result agreed with the role of ncRNAs in transcriptional silencing of adjacent genes.
CircRNAs are important class of ncRNAs with multiple roles including tolerance to drought and salt stress previously reported in cucumber, maize and Arabidopsis 13 www.nature.com/scientificreports/ in resistant and non-resistant alfalfa genotypes under drought and salt stress. This is the first report of circRNA in response to drought and high salinity in alfalfa. Several circRNAs involve in regulation of growth process, ubiquitin activity and signaling pathways, and participating SAGA (Spt-Ada-Gcn5 Acetyltransferase) transcriptional complex.
Gene regulation networks. WGCNA has been used to identify correlated genes under low temperature in M. falcata 38 and to identify potential regulators of flower color pigmentation in M. sativa 47 . In this work we obtained a dynamic tree with 26 significatively associated modules related to salt or drought stress in PI467895, Wilson or Saranac (Fig. 5). Blue module with 988 isoforms in 922 genes was significatively associated in the nontolerant genotype Saranac under drought and salt stress. This result could be relevant in plant breeding, because blue module isoforms co-regulated in drought and salt stress can be used as molecular markers. Yellow4 module, significatively associated with PI467895-SS, had 189 isoforms in 127 genes present at a high correlation (0.91). We identified 341, 210 and 98 isoforms predicted as TF, PK and TR respectively. Blue, green and yellow were the modules with high number of TFs (42,36,27). Additionally, we found 170 isoforms predicted as lncRNAs. This finding helped in generating the regulatory network in response to drought and salt stress in alfalfa. The analysis of gene regulatory networks allowed identifying highly connected isoforms (hubs) and the interaction of TF and lncRNA with other genes.
Hierarchical layout algorithm from Cytoscape was used for illustrating the main direction or "flow" within a network arranged by the layer and the order of the isoforms to identify master regulators in each network. Characterization of master genes identified in gene networks in response to drought and salt stresses are showed in Table 1. The PI467895-SS subnetwork was composed by 55 isoforms identifying a glutaredoxin-dependent peroxiredoxin (G63340.1) as an important hub (Fig. 6a). Glutaredoxin-dependent peroxiredoxin has been reported to be involved in salt stress adaptation by redox signaling 48 . Additional reports showed downregulation of glutaredoxin-dependent peroxiredoxin by salt stress in Arabidopsis 49 . In this work we discovered glutaredoxindependent peroxiredoxin that was regulated by multiple genes such as ubiquitin and HD-ZIP transcription factors. Also, the TIFY domain protein was identified as another important hub. TIFY domain proteins are TFs involved in growth and plant development. In watermelon, TIFY genes (ClTIFY1 and ClTIFY2) were highly induced by salt stress 50 .
Wilson-DS subnetwork was composted by 26 isoforms (Fig. 6b). Among them, Isoform G47352.5, a C2H2type domain-containing protein, was annotated as a master regulator of abiotic stress tolerance. In Arabidopsis the C2H2 family involved in drought stress response 51 . Proteins of this family, Csa1G085390 and Csa7G071440 were upregulated in response to salt and drought stresses in Cucumis sativus 52 . Isoform G31078.4 1 was annotated as an expansin, which was involved in cell wall extension and cell enlargement and is induced in drought in tobacco 53 . Whereas in the Wilson-DS network, the high number of ribosomal, histone and uncharacterized www.nature.com/scientificreports/ proteins caused difficulty in the assumptions about the network. Nevertheless, we identified two isoforms as central hubs. One was annotated as homeobox-leucine zipper protein (HD-ZIP) HDG2 and another was annotated as S4-RNA binding domain protein. They were interacted with other genes in the Wilson-DS network. Previous reports suggested that HD-ZIP ATHB-12, increased salt tolerance by regulating Na + ion homeostasis in yeast 54 while the MtBH2, a HD-ZIP II of M. truncatula was negatively regulated in response to abiotic stress in Arabidopsis 55 .
In Saranac-SS network (Fig. 6c) isoform G10416.2 (vacuolar (H + )-ATPase) was a central hub. Previous report suggested its importance in response to salt stress 56 . Isoform G33123.7 (nerol synthase) is a terpene synthase involved in nerol production, an acyclic monoterpene alcohol. The role of terpene synthases in M. truncatula is related with response to insect herbivory 57 ; however, it has been reported that 20 terpene synthases were suppressed by salt stress in Camellia sinensis 58 . In this study, we identified isoform G17260.6, annotated as a MYB transcription factor, regulating 15 isoforms including nerol synthase and lncRNA-G55925.1. Isoforms G16125.2 and G3736.3 were annotated as zinc finger A20 and AN1 domain-containing stress-associated protein 8 which are involved in salt stress tolerance 59 . Two isoforms, G30392.2 and G30392.4, were annotated as dehydrationresponsive element-binding protein (DREB6), which was involved in drought and salt tolerances in soybean 60 .
The Saranac-DS subnetwork (Fig. 6d) had a high number of isoforms that were downregulated by drought. Isoform G9711.2, annotated as 1-aminocyclopropane-1-carboxylate oxidase-like 4 (ACO) was a hub node. ACO is a key enzyme involved in ethylene production, which plays essential roles in mediating plant responses to biotic and abiotic stresses 61 . Isoform G28222.9 annotated as HVA22-like protein that was induced by abscisic acid with responses to ABA, drought, cold, and salt stresses 62 . Additionally, isoform G1134.5 annotated as EREBP protein (Ethylene-responsive transcription factor) regulated 33 isoforms including isoform ACO and HVA22-like genes. Two isoforms (G63271.4 and G64916.4) were annotated as dehydrin proteins. Dehydrin proteins have roles as protectors of membrane integrity in plant cells under different abiotic stresses including drought stress. In alfalfa, dehydrins has been related with freezing tolerance 63 .

Similarities and differences between drought and high salinity responses. Although drought
and high salinity are different abiotic factors affecting alfalfa production, it has been reported that they share some physiological responses during early stages 64 . Drought and high salinity trigger the production and accumulation of ABA, which in turn induce many stress response genes. Several transcription factors such as MYC/ MYB or nuclear factor-Y (NF-Y) are known to regulate the ABA-responsive gene expression. Both drought and salt stresses can damage cellular components and cause metabolic dysfunction, nutrient imbalance, and oxidative stress.
In alfalfa it has been reported that MsMYB2L was induced by multiple abiotic stress factors including drought and salt 65 and MsMYB4 increase the salinity tolerance ability in an ABA dependent manner 15. In the current work we identified 1,850 isoforms in 274 genes annotated as MYB TF. There is an increase in isoforms annotated as MYB TF in PI467895 and Wilson in stressed treatments (Supplementary Table 6). We found that isoforms G64834.42 and G69600.7 were upregulated by drought and salt stress in Wilson and PI467895, respectively (Supplementary Table 13) and isoform G80387.8 was identified as a master gene in PI467895-SS subnetwork (Fig. 6a).
Dehydrins play important roles in drought tolerance, as protective proteins. For example, dehydrin MtCAS31 from M. truncatula is a positive regulator of drought response 64 . In our work we identified 30 isoforms annotated as dehydrins. Dehydrin isoforms distribution show that most of isoforms were detected in Saranac-DS-Leaf with 17 isoforms and isoforms were not detected in any tissue of Saranac-SS or Saranac-CK in stem and leaf (Supplementary Figure 11). Additionally, isoforms G63271.4 and G64916.4 were present as key regulators in Saranac-DS subnetwork.
Plants use a calcium-dependent protein kinase pathway known as the SOS pathway for salt stress signaling and Na + tolerance. SOS1 is a Na + /H + antiporter that can extrude Na + into the soil solution and load Na + into the xylem for long-distance transport to leaves 66 . We identified 97 isoforms annotated as SOS1. The highest number of SOS1 isoforms were identified in salt tolerant germplasm PI467895-SS-Root and PI467895-CK-Root with 22 and 23 isoforms, respectively, while in salt susceptible variety Saranac-SS-Root and Saranac-CK-Root there were just 12 and 8 isoforms, respectively (Supplementary Figure 12). A high isoform diversity of SOS1 in roots of salt tolerant alfalfa suggest an active of role of Na + detoxification mechanism and they agree with previous reports 19 . Conclusion and remark. Our study described transcriptomes at the whole genome-scale in three alfalfa germplasms with contrasting tolerance to drought and high salinity. The use of the autotetraploid alfalfa genome as a reference allowed us to identify extensive isoform heterogeneity and differences in allele distributions. The complexity of alfalfa transcriptome has been underestimated and RNA-seq do not show all isoform complexity, therefore the results presented in this work illuminates the isoform diversity in polyploids crop. We used Iso-Seq to capture rare/unstable transcripts in each condition without need of transcripts reconstruction based on short read sequences and predictive algorithms. The use of combination of Iso-Seq to generate the full-length transcripts and RNA-seq to correct the transcript sequences allowed us to identify full length DEGs and isoforms in alfalfa with high accuracy.
The pan-transcriptome atlas revealed specific expressed isoforms in leaves, stems and roots by salt and drought stress treatments in tolerant and non-tolerant alfalfa germplasm. Additionally, this is the first report of circRNAs identified in alfalfa with different profiles depending on tissue sources, genotypes and stress treatments. Finally, gene regulation networks allowed us to identify key candidate genes involved in regulation of plant response to salt and drought stresses, including 170 predicted lncRNAs, 341 transcription factors and several non-annotated genes. This work provides new insights into the mechanisms by which drought and salt www.nature.com/scientificreports/ stress affect transcriptomes at the whole genome scale. The master regulatory genes can be used for developing molecular markers to improve tolerance in alfalfa to drought and salt stress.

Materials and methods
Plant material and growth conditions. Two alfalfa varieties, Wilson (drought tolerant) and Saranac (drought susceptible) 67 , were used for the drought stress experiment. Plants were clonally propagated by stem cuttings which were transplanted into 5 cm diameter pots filled with sand and grown for eight weeks in a greenhouse with a 14 h light period and a temperature of 25 °C. They were irrigated daily with Hoagland's solution 68 .
Six clones of each germplasm were divided into two treatments, three control plants and three receiving drought stress. For the drought stress group, water was withheld for three days until visible signs of drought stress (leaf wilting) appeared in the susceptible variety Saranac, but not in the resistant variety Wilson. Control plants were watered daily. After 3 days of drought stress, roots, stems, and leaves of drought stressed and control plants were collected and frozen in liquid nitrogen until used for RNA extraction. For salinity stress, a salt tolerant accession (PI467895) and the same susceptible variety (Saranac) were used. Plants were grouped in the same way as for drought stress. The control group was irrigated daily with Hoagland's solution and the salinity stress group was irrigated with Hoagland's solution containing 100 mM NaCl. After seven days of salinity stress, roots, stems and leaves were collected and frozen in liquid nitrogen until RNA was extracted.
RNA extraction, Iso-Seq and RNA-seq library preparation. Total  cDNA was amplified using the amplification module and Iso-Seq Express Oligo Kit including 12 µM of 16-mer barcoded primers. Amplified cDNA was purified using a standard workflow and three samples in equal molar quantities were pooled for each SMRTbell. 21 libraries were sequenced in a total of seven SMRTbells using a PacBio Sequel II system and a P6-C4 chemistry. Same samples used for Iso-Seq were used for RNA-seq. Sample quality control was checked to ensure RIN ≥ 7.0 and a concentration of ≥ 50 ng/µl of RNA. Libraries were prepared using TruSeq RNA Library Prep Kit v2 and they were sequenced using the Illumina HiSeq sequencing platform at GENEWIZ (South Plainfield, NJ, USA). Sequencing configuration was 2 × 150 bp, single index per lane. Twenty-one flow cells were used, and each cell contained a single type of tissue source from a germplasm with a specific treatment.
Generation of FLNC. The PacBio subreads obtained from a single movie in PacBio format were classified into circular consensus sequences (CCS) with ccs command from IsoSeq Version 3.4.0 pipeline 70 to produce seven CCS.BAM files. The seven CCS.BAM files were demultiplexed using lima command from IsoSeq Version 3.4.0 70 with default parameters (lima-isoseq-dump-clips-peek-guess), resulting in 21 BAM files corresponding to tissue source-germplasm-treatment specific transcriptomes. Demultiplexed transcriptomes were refined with the function isoseq3 refine (− -require-polya) of IsoSeq Version 3.4.0 to remove polyA tails and to detect full length non-concatemer (FLNC) reads.
Hybrid error correction and mapping. The FLNC files were converted into FASTA format with SAMtools Version 1.10 71 . The Iso-Seq data were corrected with an hybrid error correction approach using RNA-Seq short sequence reads with the program LoRDEC Version 0.6 72 with the following parameters: k-mer size of 21 and abundance threshold of 2. The corrected and polished reads were aligned to the M. sativa allele aware reference genome 28 using the alignment program Minimap2 Version 2.17 73 with the following parameters: -ax splice which assumes that the read orientation of the transcript strand is unknown, outputs without secondary alignments and present options configured to splice HQ (-C5 -O6,24 -B4). Finally, the aligned BAM files were sorted to subsequent isoform characterization. Isoform characterization. TAMA collapse was used to collapse isoforms using following parameters: collapse common exons ends flags, coverage: 99, identity: 85, 5′ threshold: 10 bp, exon/Splice junction threshold: 10 bp, 3′ threshold: 10 bp, and specifying no capped flag option 9 . After TAMA collapse, 21 BED files were obtained. All BED files were merged to identify unique genes based on tissue source, treatment and germplasm using TAMA merge 9 with the option of merge duplicate groups to obtain a pan-transcriptome in BED format with unified gene and isoforms IDs.
Pan-transcriptome was annotated and NMD was predicted with the tool TAMA GO: ORF and NMD predictions 9 . The amino acid sequences were aligned against the Uniprot100 protein database 21 and M. truncatula 22 using diamond BASTP 74  www.nature.com/scientificreports/ predicted using plant transcription factor database PlantRegMap 24 , transcription regulators and protein kinases were predicted using iTAK database 23 and domain proteins were identified using PfamA database 25 . Finally, all annotations information was combined in the BED file to compare and summarize the results. The isoforms of collapsed BED files were classified in eight structural categories according to SQANTI3 75 using the M. sativa allele aware reference genome and annotation 28 . Junction files (SJ.out.tab) were obtained from RNA-seq short read data using STAR version 2.7 76 with minimum intron length set to 20 bp, maximum intron length set to 50 kb and default settings. The SQANTI3 outputs were filtered with sqanti3_RulesFilter.py for intra priming 0.6 and short read junction support to obtain the sqanti_filter.gtf. Isoform densities were plotted by structural categories using circlize Version 0.4.13 R package 77 .
NcRNAs predictions. Long non-coding RNAs (lncRNAs) were predicted with plncPRO 26 . First, a specific model was built using the mRNA and lncRNA data of M. truncatula Mt4.0v1 from Phytozome 78 and GREENC 79 databases, respectively using the function of plncpro build. Second, high confidence lncRNAs of the 21 transcriptomes of M. sativa were predicted using predict function following parameters: minimum length to define a lncRNA > 200 bp and BLAST database = Uniprot100 DB.
CircRNAs were predicted with Circexplorer2 27 which identify back-splice junctions and mapped this information of multiple split alignments. The RNA-seq pair-end data was aligned with STAR version 2.7 76 to obtain the file Chimeric.out.junction. The file Chimeric.out.junction was parsed and circRNAs were annotated with CIRCexplorer2 parse and CIRCexplorer2 annotate respectively. Gene ontology. GO enrichment analysis was done with topGO R package Version 80 2.44.0. An enrichment test was done with the algorithm "weight01" and Kolmogorov-Smirnov test to obtain a list of the top ten GO enriched terms in BP, MF and CC.

RNA-seq quantification.
Illumina datasets of RNA-seq were evaluated for quality control with FastQC Version 0.11.9 81 and then quantified with Salmon Version 0.11.3 82 using as reference pan-transcriptome FASTA sequences. Transcript per million (TPM) mapped reads of each isoform was obtained to generate an expression matrix. Absolute TPM values were imported using with tximport Version 3.13 R package 83 using the option lengthScaledTPM and TPM were normalized following the script in Bioconductor to create a DEGList for the use in edgeR 84 . DEG were detected among germplasm under different conditions and were plotted using pheatmap Version 1.0. 8 R package 85 . It is important to emphasize that it was necessary to group different tissue sources from the same germplasm and treatment to perform statistical analysis with limma using p-value = 0.05, adjust method of false discovery rate and log2 fold change = 1.5.
Weighed gene co-expression network analysis (WGCNA). The WGCNA R package 12 was used to identify modules of highly differentially expressed genes based on normalized expression data using leaf, stem and root tissue sources as replicates for the same germplasm ~ condition. Function goodSamplesgenes of WGCNA R package was used to remove unqualified genes and pickSoftThreshold function was used to choose an appropriate soft-thresholding power (β) based on a scale-free topology criterion. Adjacency matrix was generated based on the criterion of approximate scale-free topology using the soft thresholding power 8. The adjacency matrix was transformed into Topological Overlap Matrix (TOM) and dissimilarity matrix (1-TOM). Coexpression network of significative modules were exported as .graphml and analyzed using Cytoscape Version 3.8.2 86 . Only interconnected genes were exported in cys and in jpeg formats.

Data availability
Our study complies with relevant institutional, national, and international guidelines and legislation. Permissions were obtained from USDA-ARS Western Regional Plant Introduction Station for collecting and using alfalfa plants for this study. All plant materials used in this study were provided by the USDA-ARS Western Regional Plant Introduction Station and they are public available upon request. The PacBio demultiplexed reads and RNAseq fastq files were deposited in NCBI Sequence Read Archive (SRA) under accession number PRJNA667169. All 21 individual transcriptomes and pan-transcriptome in BED format, isoform annotation file, metadata information and alfalfa network files in CYS format are available in https:// figsh are. com/ artic les/ datas et/ Sup_ trans cript ome/ 13692 103.