Gene expression data support the hypothesis that Isoetes rootlets are true roots and not modified leaves

Rhizomorphic lycopsids are the land plant group that includes the first giant trees to grow on Earth and extant species in the genus Isoetes. Two mutually exclusive hypotheses account for the evolution of terminal rooting axes called rootlets among the rhizomorphic lycopsids. One hypothesis states that rootlets are true roots, like roots in other lycopsids. The other states that rootlets are modified leaves. Here we test predictions of each hypothesis by investigating gene expression in the leaves and rootlets of Isoetes echinospora. We assembled the de novo transcriptome of axenically cultured I. echinospora. Gene expression signatures of I. echinospora rootlets and leaves were different. Furthermore, gene expression signatures of I. echinospora rootlets were similar to gene expression signatures of true roots of Selaginella moellendorffii and Arabidopsis thaliana. RSL genes which positively regulate cell differentiation in roots were either exclusively or preferentially expressed in the I. echinospora rootlets, S. moellendorffii roots and A. thaliana roots compared to the leaves of each respective species. Taken together, gene expression data from the de-novo transcriptome of I. echinospora are consistent with the hypothesis that Isoetes rootlets are true roots and not modified leaves.

www.nature.com/scientificreports/ ficient RNA from these plants because of the challenge in isolating viable spores, getting the spores to germinate, effecting fertilisation and getting sporophytes to develop in axenic culture. However, we extracted 1 technical replicate of rootlets and 2 technical replicates of corm and leaves. A sporophyte transcriptome was generated for rootlets corms and leaves. The assembled transcriptome comprised 113,464 transcripts with a mean sequence length of 940 base pairs (bp). There were 35,564 sequences over 1 kilobases (Kb) in the assembly, with an N50 of 1313 bp. Proteins were successfully predicted for c. 95% of the transcripts. To investigate the completeness our transcriptome we next performed a BUSCO 35 analysis to investigate the number of conserved BUSCO 35 groups in our transcriptome. BUSCO 35 groups are near-universal single-copy orthologs. Identifing the percentage of BUSCO 35 groups present in our de-novo transcriptome therefore provides a metric for the completeness of our transcriptome. Of the 430 total BUSCO 35 groups searched for in the Viridiplantae dataset 35 , 318 (74.0%) were found complete, 87 (20.2%) were found fragmented and only 25 (5.8%) were missing. These metrics indicate that the transcriptome assembly was high quality. We next mapped the reads extracted from each of the three different organs; leaves, corms, and rootlets to calculate the abundance levels for each transcript in each of the three organs (Supplementary Table S1).
Gene expression profiles are significantly different in Isoetes echinospora rootlets and leaves. If I. echinospora rootlets were modified leaves, as predicted by the modified shoot hypothesis, we might expect gene expression signatures to be similar in the rootlets and leaves. To test this hypothesis, we compared gene expression in rootlets, leaves and corms using a principal coordinate analysis (PCoA), because similar approaches have proved successful for the investigation of lycophyte root transcriptomes 36,37 . The two leaf replicates, two corm replicates and the single rootlet sample were plotted on the first two PCoA axes [which together account for 98.6% of the variance in the sample (Fig. 2b)]. The three tissue types are clearly distinct and separated in gene expression space. The first PCoA axis accounts for 75.4% of the variance in gene expression and it distinguishes leaves and corms from rootlets (Fig. 2b). The second PCoA axis accounts for 23.2% of the variance in gene expression and distinguishes all three tissues from each other (Fig. 2b). The PCoA indicated that gene expression profiles of rootlets and leaves are distinct and does not support the hypothesis that I. echinospora rootlets are modified leaves.

Gene expression profiles of Isoetes rootlets clusters with gene expression of Selaginella and
Arabidopsis roots. If the rootlets of I. echinospora are true roots we expected similarities in gene expression between rootlets and true roots of other land plant species such as the lycophyte Selaginella moellendorffii and the seed plant Arabidopsis thaliana. To compare gene expression between these species we first defined orthologous relationships between the genes of the three species using the OrthoFinder software 38,39 (Supplementary  Table S1). This analysis identified 1737 single copy orthologs in common between these species. We focussed our analysis on orthologs with a 1-1-1 orthology relationship between the three species as expression patterns of single copy orthologs are more likely to be conserved than those of dublicated genes 40 . Using these 1737 orthologs we compared gene expression between the different species. We compared average gene expression between I. echinospora rootlets and leaves (this study) with the published gene expression in roots and leaves of S. moellendorffii 36 and roots and "aerial parts" of Arabidopsis thaliana (based on EMBL-EBI accession E-GEOD-53197) (Supplementary Table S1). To compare gene expression between these different species and organs we subjected the gene expression dataset to a PCoA. The first three principal coordinates accounted for 95.7% of the variance in the dataset. Axis 1 accounted for 43.6% of the variance and separated the samples by species (Fig. 3a,b). Axis 2, accounted for 35.9% of the variance and distinguished the two lycophyte transcriptomes (I. echinospora and S. moellendorffii) from that of the seed plant A. thaliana (Fig. 3a,c). PCoA axes one and two therefore indicate that the majority of the differences in gene expression is accounted for by differences between species rather than between roots and leaves. PCoA axis 3 accounted for 16.2% of the variance and distinguished between leaves and roots in all species (Fig. 3b,c). Leaf samples clustered in the positive values and root samples clustered in the negative values of PCoA axis 3 (Fig. 3b,c). The clustering of the I. echinospora rootlet sample with both the roots of S. moellendorffii and A. thaliana on axis 3 (Fig. 3b,c) indicates that the gene expression signature of the rootlets of I. echinospora is similar to the the gene expression signature of both S. moellendorffii and A. thaliana. It is possible that the clustering of I. echinospora rootlets with root gene expression profiles of S. moellendorffii and A. thaliana resulted from the absence of photosynethetic-related genes in these rooting organs. To test if this accounted for the clustering that we observed, we carried out a further investigation of gene expression after removing genes encoding photosynethetic functions. Of the 1737 orthologs used in the analysis only 47 encoded for photosynthetic functions or were encoded in the chloroplast as assigned by MapMan 41 to the term "photosynthesis". We subjected the gene expression datasets for the remaining 1,690 orthologs to a PCoA. The results of the PCoA ( Supplementary Fig. S1) were analagous to the result using all 1737 orthologs allowing us to rule out that the clustering of I. echinospora rootlets with the roots of S. moellendorffii and A. thaliana in gene expression space was due to genes encoding photosynthetic functions. These gene expression data are consistent with the hypothesis that rootlets of I. echinospora are roots.
The RSL root cell differentiation genes are expressed in Isoetes echinospora rootlets. To verify our findings that gene expression of I. echinospora rootlets were similar to those of the true roots of S. moellendorffii and A. thaliana we next determined the expression of the root-specific ROOT HAIR DEFECTIVE SIX-LIKE (RSL) genes in I. echinospora. ROOT HAIR DEFECTIVE SIX-LIKE (RSL) genes positively regulate the development of root hairs in euphyllophytes including A. thaliana [42][43][44][45] and are expressed in S. moellendorffii roots 37,46 . RSL genes are markers for vascular plant roots because they are expressed at a much higher level in roots of A. thaliana (EMBL-EBI accession E-GEOD-53197) and S. moellendorffii 36 Table S1). We searched the I. echinospora transcriptome for RSL genes using the BLAST algorithm with RSL-specific queries. RSL sequences were identified in the I. echinospora transcriptome. A gene tree was generated and defined four I. echinospora RSL genes in two monophyletic groups (Fig. 4). There were three transcripts in the RSL Class I clade (106204; 101034; 092963) and a single transcript in the RSL Class II clade (095243). We next investigated the expression of the I. echinospora RSL genes in the transcriptome, generated with single biological rootlet replicate. Average expression of the four RSL genes in rootlets was 4.24 transcripts per million (TPM) (Fig. 4). The average root expression was 5.78 TPM for the six RSL genes of A. thaliana (EMBL-EBI accession E-GEOD-53197), demonstrating similarities in expression of RSL genes between I. echinospora and A. thaliana. In I. echinospora, expression of each RSL Class I transcript was higher in rootlets than in leaves (Fig. 4). Furthermore, the single I. echinospora Class II RSL gene transcript (095243) was expressed in rootlets and no expression was detected in the corm or leaves. The analysis of our transcriptome demonstates that RSL genes are preferentially expressed in I. echinospora rootlets and not in leaf tissue; their expression is the same in I. echinospora as in S. moellendorffii of A. thaliana (Supplementary Table S1). The analysis of RSL gene expression is based on a single biological replicate because of the technical limitations of working with I. echinospora in axenic culture; we would have preferred to present data from replicated data of root transcriptomes. However, these results are presented here because they are consistent with the other data presented above that support the hypothesis that I. echinospora rootlets are roots.
To verify that RSL genes are markers of vascular plant roots we investigated the RSL genes in Azolla filiculoides, a fern that develops roots with root hairs 30,48 , and Salvinia cucullata a fern that has secondarily lost roots with root hairs and instead modified leaves perform rooting functions 30,49,50 . We searched the S. cucullata and A. filiculoides genomes and proteomes 51 for RSL genes using the BLAST algorithm with RSL-specific queries. A gene tree was constructed with the retrieved sequences and allowed us to identify 3 RSL Class I genes and a single RSL Class II gene in the A. filiculoides genome (Fig. 4, Supplementary Fig. S2). Consistant with their role in root development in A. filiculoides the RSL genes were expressed in the roots 48 . However, there were no RSL genes in the S. cucullata genome. S. cucullata sequences were identified in closely related basic-helix-loop-helix transcription factor subfamily XI 52,53 but none were identified in the RSL clade (Fig. 4, Supplementary Fig. S2). We conclude that the loss of RSL genes accompanied the evolutionary loss of roots with root hairs in Salvinia cucullata, which is consistent with RSL genes being markers of vascular plant roots. Furthermore, if the rootlets of I. echinospora were modified leaves, similar to the root-like modified leaves of S. cucullata, we might have expected that RSL genes would have also been lost from the I. echinospora genome. Instead, RSL genes are preferentially expressed in I. echinospora rootlets just as they are in S. moellendorffii roots. These data are consistent with the hypothesis that the I. echinospora rootlet is a root and not a modified leaf.
Taken together these data-the distinct gene expression profiles of the rootlets and leaves of I. echinospora, the similarity in expression profiles of orthologous gene preferentially expressed in rootlets of I. echinospora and roots of S. moellendorffii and A. thaliana, and the expression of the RSL genes in the rootlets of I. echinospora and roots of S. moellendorffii and A. thaliana -support the lycopsid root hypothesis which posits that Isoetes rootlets are roots and not modified leaves.

Discussion
The homology of the rootlets of both extinct and extant rhizomorphic lycopsids had been contentious for the past 150 years, with two competing hypotheses. The first, interprets the rootlets as true roots similar to the roots of other lycopsids. The second, interprets rootlets as modified leaves. Despite the second hypothesis that posits that rootlets are modified leaves being widely accepted over the past 30 years 1,3,54 , there is a growing body of evidence 2,34 that suggests that rootlets should be interpreted as true roots. Here we report the de novo transcriptome of I. echinospora that we used to test predictions of the two competing hypotheses. We discovered that expression profiles in I. echinospora rootlets and leaves were different. We showed that gene expression profiles of I. echinospora rootlets and S. moellendorffii and A. thaliana roots were similar. Finally, RSL genes involved in root cell differentiation are preferentially expressed in I. echinospora rootlets as they are in S. moellendorffii roots and the roots of euphyllophytes (A. thaliana, Oryza sativa and Brachypodium distachyon [42][43][44][45] ). Taking these three pieces of evidence together, we conclude that Isoetes rootlets are true roots, like those of extinct and extant lycopsids and not modified leaves.
The new evidence presented here adds to the growing and extensive list of similarities between the rootlets of rhizomorphic lycopsids-Isoetes species and extinct taxa such as Stigmaria-and the roots of other lycopsids 2,20,34 . This growing body of evidence supports the hypothesis that rootlets are roots and not modified leaves. The rootlets of the rhizomorphic lycopsids and roots of all extant lycopsids are indeterminate radially symmetric axes that branch by isotomous dichotomy, develop endogenously within specialised structures, develop a root   www.nature.com/scientificreports/ meristem with root cap and produce root hairs 31,32,34,55 . If the modified shoot hypothesis were correct it would have required the direct modification of a determinate leaf that did not branch, developed exogenously and was characterised by a ligule, stomata and dorsiventral symmetry into a rootlet. Each of these leaf characters would have had to be lost and all of the rootlet characters, which are shared among the lycopsids, would have had to evolve independently. By contrast, if the lycopsid root hypothesis is correct and rootlets are roots then there is no requirement for this large suite of character state changes. Instead, the only character transitions required to account for rootlet character states were the collateral positioning of the phloem, the regular rhizotaxy and rootlet abscission 4 . Although these three characters (collateral positioning of the phloem, the regular rhizotaxy and rootlet abscission) are predominately leaf characters, they are not exclusive to leaves; each has been described in the roots of other species of land plants. The collateral position of the phloem is found in Lycopodium roots including, Lycopodium lucidulum, Lycopodium clavalum, Lycopodium obscurum and Lycopodium complanalum 56 , regular rhizotaxy develops in Ceratoptertis thalictroides, Cucurbita maxima and Pontederia cordata [57][58][59] and roots abscise in Oxalis esculenta, Abies balsamea, Pinus strobus, Tsuga canadensis and Azolla species [60][61][62][63][64][65] . Based on character transitions alone we suggest that the hypothesis that rootlets of the rhizomorphic lycopsids are roots, similar to other lycopsid roots, is a more parsimonious hypothesis than interpreting rootlets as modified leaves.
Our new evidence from the transcriptome of I. echinospora adds to the numerous traits that are common between the rootlets of rhizomorphic lycopsids and the roots of other lycopsids. It is not possible to rule out the hypothesis that all of these similarities in antomy, development and now gene expression may be the product of convergent evolution. However, we suggest that it is more parsimonious to interpret the rootlets of the rhizomophic lycopsids as true roots than modified leaves.
The gene expression data from the de novo I. echinospora transcriptome are consistent with the hypothesis that the rootlets of the rhizomorphic lycopsids are roots and not modified leaves. We therefore interpret the rootlets of the rhizomophic lycopsids as roots developing from a unique root bearing organ; the rhizomorph 21,55,66 . This conclusion suggests that the dichotomously branching rooting axis is conserved among all lycopsids and a distinguishing character of the group. The dichotomous branching of these rooting axes has been conserved for over 400 million years and our comparative transcriptomic analysis suggest that the RSL genes function during root development in Selaginella and Isoetes has been conserved since these species shared a common ancestor at least 375 million years ago 67 . Our comparative analysis of the transcriptomes of extant lycophytes supports the hypothesis that the rooting systems of extant Isoetes species and their extinct giant ancestors are homologous. These data also suggest that the development of the large rooting systems of the lycopsid trees that were an important component of the Palaeozoic flora and played a key role in changing the Earth's Carbon Cycle were controlled by the same genes that regulate root development in their extant herbaceous descendants. Growth of I. echinospora in axenic culture. RNA was extracted from plants grown in axenic culture to ensure that there was no RNA contamination from other organisms. A procedure was developed to surface sterilise I. echinospora spores and germinate a population of axenically grown plants, based on previously developed procedures [69][70][71] . Sporophylls were removed from the mature plant population growing in aquaria in September (2013 and 2014) when sporangia were mature 69 . Using forceps (under a Leica M165 FC stereo microscope) mega-and micro-sporangia were isolated from sporophylls. Intact sporangia were washed in 1% (v/v) sodium dichloroisocyanurate (NaDCC) for 5 min. Sporangia were broken and loose spores were washed in 0.1% NaDCC for a further 5 min. Following the NaDCC washes, loose spores were rinsed for 5 min three times in ddH 2 O. Microspores were centrifuged for 5 min at 5000 rpm between washes with NaDCC. Once sterilised, mega and micro-spores were mixed together in ddH 2 O in a Petri dish. Petri dishes were sealed with parafilm, and incubated in darkness at 4 °C for 2 week. After 2 weeks, Petri dishes were moved to a 16 h light:8 h dark photoperiod at 18 °C. Approximately 30% of surface sterilised megasporangia contained megaspores that germinated, and within these megasporangia c. 25% of the total megaspore population germinated. It was possible to identify germinating megaspores because cracking of the megaspore wall was visible and the presence of archegonia on the megagametophyte. Once fertilisation occurred, developing sporophytes were identified by the presence of the first leaf. Sporophytes were left to continue to grow in ddH 2 O water until the two leaf two rootlet stage when they were moved to magenta boxes containing; ½ Gamborg's medium 72 , supplemented with 1% phytogel (Sigma). Plants were embedded in Gamborg media and submerged in liquid Bold's Basal Medium (Sigma, UK).

Materials and methods
RNA extraction and sequencing. Total RNA was extracted from root, corm and leaf tissues from c. 50 I. echinospora plants. Total RNA from leaves (two independent replicates), corm (two independent replicates) and rootlets (one replicate) was extracted with the RNeasy plant mini kit (Qiagen). On-column DNase I treatment was performed with RNase-free DNase I (Qiagen), according to the manufacturer's instructions. cDNA was synthesised with ProtoScript II reverse transcriptase (New England Biolabs) according to the manufacturer's instructions, using oligo(dT) primer. Total cDNA samples were quantified with a Nanodrop ND-1000 spectrophotometer. RNA purity and quality were checked with an Agilent 2100 Bioanalyzer. Library preparation and sequencing was carried out by the High-Throughput Genomics Group at the Wellcome Trust Centre De novo transcriptome assembly, protein predictions and expression analysis. Raw reads were quality trimmed using Trimmomatic-0.32 73 , to remove remaining Illumina adaptors and low quality tails. Ribosomal RNA was filtered out using Sortmerna-1.9 74 and error corrected using BayesHammer (SPAdes-16 3.5.0) 75 (with setting -only-error-correction) and Allpaths-LG-4832 76 (with setting PAIRED_SEP = − 20 and ploidy = 2). Reads were normalised using Khmer-0.7.1 with a khmer size of 21. Before assembly, paired end reads were stitched together using Allpaths-LG-4832 76 . A de novo transcriptome assembly was made with the cleaned, stitched reads using SGA 77 , SSPACE-v3 78 , and CAP3 79 . Finally assembled scaffolds were corrected using Pilon-1.6 80 . The Transcriptome Shotgun Assembly project has been deposited at DDBJ/EMBL/GenBank under the accession GGKY00000000. The version described in this paper is the first version, GGKY01000000. Proteins were predicted from the de novo transcriptome assembly using GeneMarkS-T 81 , Prodigal 82 and Transdecoder (part of the Trinity assembly program 83  Comparison of gene expression between I. echinospora organs. Using the sporophyte transcriptome assembly we next mapped the reads from the three organ libraries-leaves, corms, and rootlets-to the transcriptome to measure the expression levels of each transcript in the three tissues using Salmon 84 . To investigate the similarities between gene expression in the different organs we carried out a PCoA on the three organ types. Euclidean distances were derived from the expression of all transcripts (TPM) in each organ and were subjected to PCoA in PAST 85 using a transformation exponent of 2. To compare gene expression between I. echinospora, S. moellendorffii and A. thaliana we identified single copy orthologs between these species based on the OrthoFinder 38,39 analysis. In total, 1737 single copy orthologs were found between the three species. Using these 1737 orthologs we contrasted gene expression between the different species. We investigated average genes expression between I. echinospora rootlets and leaves (this study) with the published average gene expression between roots and leaves of S. moellendorffii 36 and A. thaliana. A. thaliana gene expression was based on average gene expression in "aerial part" and "root" of 17 different natural accessions (EMBL-EBI accession E-GEOD-53197). To investigate similarities in gene expression between these 1737 orthologs we carried out a PCoA in PAST 85 . Euclidean distances were derived from the Log10 transformed gene expression of the 1737 orthologs (Supplementary Table S1). Euclidean distances were subjected to a PCoA in PAST 85 , using a transformation exponent of 4. To rule out that the results were not influenced by genes encoding photosynthetic fucntions, we investigated how many of the 1737 genes were assigned to the MapMan 41 term "photosynthesis". Of the 1737 only 47 were photosynthesis related or encoded in the chloroplast (Supplementary  Table S1). We ran a PCoA on these 1,690 using the rsame methods described above.

Phylogenetic analyses.
Phylogenetic analyses were carried out on the RSL genes. BLAST queries were assembled based on previously published gene trees of RSL genes 52 . Sequences were used to BLAST the protein databases of the; Marchantia polymorpha "primary" (proteins) (version 3.1, November, 2015), Physcomitrella patens "primary" (proteins) (version 3.0, January 12, 2014), Selaginella moellendorffii "primary" (proteins) (version 1.0, January 12, 2014), Amborella trichopoda (proteins) (version 1.0, 2013) and Arabidopsis thaliana "primary" (proteins) (TAIR10) on the http://march antia .info/blast / server. Two fern protein databases were also searched; Azolla filiculoides protein v1.1 and Salvinia cucullata proteins v1.2 51 as well as the predicted proteins from the I. echinospora transcriptome generated in this study. All proteins were aligned in MAFFT 86,87 , manually edited in Bioedit 88 . Maximum likelihood gene trees were generated in PhyML 3.0 47 , using Jones, Taylor and Thorton (JTT) amino acid substitution model. To verify the absence of RSL genes in the S. cucullata genome and proteomes the genomes of A. filiculoides and S. cucullata were searched by BLAST using the A. thaliana protein sequence RSL1 (AT5G37800) using an E-value cut off of 1E-15. A gene tree was generated as described above including the addition of A. thaliana protein sequences from subfamilies VIIIb and XI 52