Introduction

Understanding how organisms have adapted to different environments is a long-standing goal of evolutionary biology. With advances in molecular methods, the identification of candidate genes under selection has progressed from examining a few genes to examining thousands in non-model organisms1,2. RNA-Sequencing (RNA-seq) is a recently developed approach to transcriptome profiling that uses second-generation sequencing technologies. Besides its wide application for studying patterns of gene expression, it provides unprecedented and powerful opportunities to address comparative genomic-level questions for non-model organisms3. Given the premise that positive selection is linked to potential selective pressures, the ratio of the number of nonsynonymous substitutions per nonsynonymous site (dN) to the number of synonymous substitutions per synonymous site (dS) is an indicator for detecting positively selected ‘candidate’ genes4. The dN/dS ratio, as applied to transcriptomes, has recently uncovered candidate genes related to environmental adaptation in many non-model organisms5,6,7.

Aquatic angiosperms constitute only 1–2% of extant angiosperms8, but are found in approximately 17% of angiosperm families, representing about 100 evolutionarily independent origins9. Compared with terrestrial angiosperms, aquatic plants occupy a distinctive and in some ways more stressful ecological environment, characterized by low light levels, reduced carbon availability, sediment anoxia and mechanical damage through wave exposure10. Aquatic plants have adopted various life forms requiring different degrees of physical change from terrestrial plants; they are generally divided into emergent, floating-leaved and submersed forms, the last being the most extreme. Given these challenges and physical changes, the adaptive strategies of aquatic plants have long intrigued scientists11. There is now a strong understanding of the adaptive traits of aquatic angiosperms, including reproductive systems, light requirements, phenotypic plasticity and leaf economics.

However, there is still limited understanding of the molecular mechanisms of genetic adaptation in aquatic plants. Recent advances in aquatic plant genomics include whole chloroplast genome sequencing for Nuphar advena12, Najas flexilis13, Spirodela polyrhiza, Wolffiella lingulata and Wolffia australiana14, whole mitochondrial genome sequencing of Butomus umbellatus15 and whole genome sequencing of Spirodela polyrhiza16. The identification of genes for adaptation to aquatic life has only recently been explored: Wissler et al.17 identified candidate genes for adaptation to marine life by comparing orthologous genes from two seagrasses and eight terrestrial species. However, candidate genes for adaptation to freshwater habitats have not been investigated. In contrast to the limited studies on aquatic plants, genomic approaches to better understand adaptation to different environments have been carried out in terrestrial plants6,7 and in animals including cichlid fishes18 and dolphins19.

Genes associated with adaptations to aquatic environments are valuable genetic resources that could be used to enhance plant resistance to waterlogging in the future. This may have particular significance given the rise in global sea levels over the past decades due to increased temperature20. Rising sea levels would aggravate flood risk and wetland shifts, which impose waterlogging pressure on crops and natural plant communities21.

Ranunculus L. (Ranunculaceae) serves as a well-studied lineage that can be used as a genomic model for the study of plant adaptations from terrestrial to aquatic habitats. The genus is cosmopolitan with approximately 360 species (Plant List: http://www.theplantlist.org/) and recent phylogenetic hypotheses that include aquatic taxa suggest a single shift of a subclade to the aquatic habitat22. In the present study, three species were included. (1) Ranunculus bungei Steud. (also treated as Batrachium bungei (Steud.) L. Liou, 2n = 16 or 24 with x = 823) is a typical aquatic perennial herb with submerged vegetative organs and emergent flowers (Fig. 1). The plant is distributed in the temperate to sub-boreal zones of the Northern Hemisphere and used as an indicator of good water quality. (2) Ranunculus cantoniensis DC. (also treated as Ranunculus chinensis Bunge in China), is a terrestrial herb widely distributed in Asia (eFlora of China, http://www.efloras.org/). Its karyotype is 2n = 16, x = 8 in mainland China24. (3) Ranunculus brotherusii var. tanguticus (Maxim.) is also a terrestrial herb distributed in Xinjiang, Qinghai (China), Central Asia and Russia (eFlora of China), 2n = 32 with x = 824. The latter two terrestrial species are used for herbal medicine in China25. The split between R. bungei and its terrestrial relatives is relatively recent, with an age of no more than 20 Ma26.

Figure 1

Phylogenetic relationship and photographs of Ranunculus species used in this study. The divergence times are given in millions of years. The plant and habitat photographs were taken by Ling-Yun Chen.

In the current study, we sequenced the transcriptomes of the aquatic species R. bungei and the two terrestrial species R. cantoniensis and R. brotherusii using Illumina paired-end sequencing technology in order to (1) increase the genetic resources and obtain orthologous genes of the three species for statistical assessment, and (2) identify candidate genes involved in the adaptive transition from terrestrial to aquatic habitat.

Results

Divergence time estimation

The ITS sequences obtained in this study were deposited in GenBank (no. KP336398–KP336400). Divergence time estimation using ITS suggested that R. bungei and R. brotherusii split from their most recent common ancestor at c. 11.3 (95% CI: 6.2–17.3) Ma. These two species split from their most recent common ancestor with R. cantoniensis at c. 19.7 (95% CI: 12.6–29.4) Ma (Fig. 1 & Supplementary Fig. S1).

De novo assembly and annotation of unigenes

We generated 102–106 million clean reads per species, yielding c. 9.3–9.6 Gb of RNA-seq data per species (Supplementary Table S1). The clean reads were submitted to the NCBI Sequence Reads Archive (no. SRR1822558, SRR1822529, SRR1737526). De novo assembly yielded 114,753–140,218 contigs, with mean length at 342–357 bp and N50 at 649–727 bp. The unigenes, which were assembled by using the contigs, are 637–688 bp on average with N50 at 1,132–1,187 bp.
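The N50 statistic quoted above can be illustrated with a minimal sketch. The function name and the toy lengths are hypothetical and are not taken from this study; the sketch only shows the usual definition of N50 (the length L such that sequences of length at least L contain half or more of all assembled bases).

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths (in bp)."""
    if not lengths:
        raise ValueError("no sequence lengths given")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    # Walk from the longest sequence down until half of all bases are covered.
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical toy lengths, not values from this study.
toy_lengths = [100, 200, 300, 400, 500]
print(n50(toy_lengths))
```

Because N50 weights long sequences more heavily than the mean does, it is normally larger than the mean length, which is consistent with the contig and unigene figures reported above.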

All the unigenes were annotated on the basis of similarity to the public NCBI non-redundant protein database (NR), Swiss-Prot protein database (Swiss-Prot, http://www.expasy.ch/sprot), Kyoto Encyclopedia of Genes and Genomes (KEGG, http://www.genome.jp/kegg/), Cluster of Orthologous Groups database (COG, http://www.ncbi.nlm.nih.gov/COG/), Gene ontology database (GO) and NCBI nucleotide database (NT). The results indicated that 41,111 R. bungei (54%), 37,427 R. brotherusii (66%) and 43,565 R. cantoniensis (51%) unigenes have a significant match (E-value < 10^-5) to public databases (Supplementary Table S1). NR has the highest proportion of successful annotations, while COG has the lowest proportion. The two top-hits for the three species in the NR database are Vitis vinifera and Amygdalus persica (Fig. 2). GO functional classification divided all the unigenes into three categories: cellular component, molecular function and biological process (Supplementary Fig. S2).

Figure 2

Summary of the unigenes of R. bungei, R. brotherusii and R. cantoniensis annotated to NCBI NR database with BLASTX.

Orthologous genes and dN/dS analyses

The Bidirectional Best Hit (BBH) method27 with E-value < 10^-15 recovered 11,362 putative 1:1:1 orthologous genes, while OrthoMCL28 recovered 8,174 putative 1:1:1 orthologous genes. The median length of orthologs (with alignment gaps) inferred from BBH and OrthoMCL is c. 930 bp and 770 bp, respectively. After filtering out the orthologous pairs with dS < 0.01, dS > 1.0, dN > 1.0 and pairs with aligned length < 150 bp, the two methods yielded c. 11,000 and 7,600 orthologous pairs for each species pair, respectively (Table 1).

Table 1 Summary of the Orthologous genes and dN/dS analyses.

With the BBH orthologs and the maximum likelihood (ML) method, the mean values of dN, dS and dN/dS for the three pair-wise comparisons were 0.033–0.054, 0.226–0.390 and 0.150–0.160, respectively. Only 1–3 orthologous pairs with dN/dS > 1 were found for each comparison. Taking dN/dS > 0.5 and P < 0.05 as indicators of positive selection, 60–69 orthologous pairs were found for each comparison. In the two aquatic – terrestrial comparisons, viz. R. bungei–R. cantoniensis and R. bungei–R. brotherusii, 69 and 64 orthologous pairs, respectively, were recovered as positively selected (Table 1, species pair 1–3).

With the OrthoMCL orthologs and the approximate method, the mean values of dN, dS and dN/dS for the three pair-wise comparisons were 0.036–0.059, 0.219–0.376 and 0.170–0.184, respectively. Only 1–2 orthologous pairs with dN/dS > 1 were found for each comparison. For each comparison, 48–68 orthologous pairs with dN/dS > 0.5 (P < 0.05) were found (Table 1, species pair 10–12). The comparisons R. bungei–R. cantoniensis and R. bungei–R. brotherusii suggested that 54 and 41 orthologous pairs, respectively, were under positive selection. More information is shown in Supplementary Table S2, and sequences for all the orthologous pairs with dN/dS > 0.5 and P-value < 0.05 are provided in Supplementary sequence data.

Positively selected genes (PSGs) of R. bungei

We counted the PSGs of R. bungei for which positive selection was indicated in both aquatic – terrestrial comparisons but not in the terrestrial – terrestrial comparison. Nine PSGs of R. bungei were suggested by the BBH orthologs and ML analysis; 4–6 PSGs were recovered in the other three analyses (Fig. 3). In total, 12 PSGs were recovered, of which 2 were shared by all the analyses, viz. Unigene28957 and Unigene25660 (Table 2). In addition, 2 genes of R. bungei that do not satisfy our strict criterion for PSGs, but were identified as PSGs in at least one aquatic – terrestrial comparison, might also participate in adaptation during the transition into aquatic habitats, viz. CL2856.Contig2 and Unigene36323 (Table 2).

Table 2 Genes of R. bungei recognized as candidates for adaptation to aquatic habitats. The genes are identified as PSGs in comparison to terrestrial taxa.
Figure 3

Numbers of PSGs shared among species-pairs. The numbers in red, for example ‘bun 9’, indicate that 9 PSGs of R. bungei are shared by R. bungei–R. cantoniensis and R. bungei–R. brotherusii, but the orthologs in R. cantoniensis–R. brotherusii are not positively selected. The number in the centre of each Venn diagram indicates the clusters that are positively selected in all three species-pair comparisons. These data were counted using Supplementary Table S2; orthologous pairs with possible alignment problems were excluded.

The 3 orthologs within the cluster that includes Unigene28957 were matched to the same protein in both the TAIR 10 Transcripts database (https://www.arabidopsis.org/) and the Vitis vinifera NR database, supporting that these are orthologs and not paralogs. Similar results were found for 9 of the 13 remaining clusters (see Table 2). The species phylogeny inferred from the cluster that includes Unigene28957 is congruent with that from ITS (Fig. 1). Congruent phylogenies were likewise found for 9 of the 13 remaining clusters (see Supplementary Fig. S3 for phylogenies of all the clusters).

According to TAIR 10, the best Arabidopsis protein match for Unigene28957 is AT1G22060.1, a Leucine Rich Repeat domain-containing protein, which is located in the vacuole and expressed during growth stages. Unigene32998 was best matched to AT1G34050.1, a member of the Ankyrin repeat family. The Vitis vinifera NR database also identified the gene as a member of the Ankyrin repeat family (Table 2; sequences for the 12 PSGs are provided in Supplementary sequence data).

Discussion

The relatively high mean dN and dS values and low mean dN/dS ratio obtained in the present study might be due to the relatively ancient splits among R. bungei, R. cantoniensis and R. brotherusii (11.3–19.7 Ma). As the divergence time between two sequences increases, so too do the dN and dS values29. The orthologs of closely related species usually show low dN and dS. For example, the mean dN and dS of Primula poissonii–P. wilsonii (split c. 0.9 Ma) were 0.007 and 0.027, respectively30, while the values for Arabidopsis–poplar (split c. 110 Ma) were 0.202 and 2.18431. The dN/dS ratio might decrease over time29, e.g. the ratio for cattle–human is even lower than that in common human polymorphisms32. This is likely, in part, why we obtained a limited number of orthologous pairs with dN/dS > 1 and P < 0.05. However, the split between R. bungei and its terrestrial relatives might be the most recent split between a terrestrial life form and a submersed aquatic life form among aquatic angiosperms. This was assessed using the Angiosperm Phylogeny Website (http://www.mobot.org/MOBOT/research/APweb/), TimeTree (http://www.timetree.org/) and an extensive literature search. For example, a comparable split within the Eudicots between terrestrial Haloragis/Gonocarpus and aquatic Myriophyllum/Laurembergia (Haloragaceae) occurred c. 35 Ma33. Several deeper splits of well-known aquatic lineages are much older, including the split between the submersed Ceratophyllum and its terrestrial relatives (c. 148 Ma34) and the split between the submersed Alismatales (e.g. Potamogeton) and their terrestrial relatives (c. 124 Ma35). Therefore, the recency of the split between R. bungei and its two relatives makes this a good model system for identifying candidate genes for adaptation to aquatic habitat.

In the present study, each pair-wise comparison identified more than 40 orthologous pairs that were positively selected. For example, 69 pairs for R. bungei–R. cantoniensis were recovered by the BBH and ML method. However, only the orthologs of R. bungei that satisfied the criterion of positive selection in the two aquatic – terrestrial comparisons (R. bungei–R. brotherusii and R. bungei–R. cantoniensis), but did not show positive selection in the terrestrial – terrestrial comparison (R. cantoniensis–R. brotherusii), were identified as being involved in adaptation to the aquatic habitat. Compared with some previous studies3,30, which used only one species pair to explore candidate genes for adaptive evolution, our study provides a more conservative approach that improves the reliability of the methodology.

Aquatic habitats can be characterized by low carbon and oxygen availability, shaded conditions, sediment anoxia, wave exposure and sometimes also osmotic stress and limited nutrient supply10, which makes re-colonization of aquatic habitats by terrestrial angiosperms a challenge13. Plant ethylene, ROS (reactive oxygen species), low NO and O2 are important signals and/or regulators for water adaptation36. Some plants can adopt strategies to temporarily avoid or reduce problems associated with submergence36. These include (1) an ‘escape’ strategy, whereby organs elongate above floodwaters, and (2) a quiescent strategy, whereby the plant limits carbohydrate consumption and growth and protects the meristem. Some of the genes that regulate these strategies have been determined, including SUB1A, SNORKEL1 and SNORKEL2 in rice37 and HRE1 and HRE2 in Arabidopsis thaliana. Submerged aquatic plants usually have adaptations such as degraded cell walls in xylem and roots, well-developed aerenchyma and hydathodes38, all of which can be found in R. bungei39.

Model terrestrial/amphibious plants such as rice and A. thaliana do not possess most of these characters. Aquatic plants and terrestrial plants likely share some common mechanisms of water adaptation, but aquatic plants also have some distinct mechanisms.

Among the 14 genes of R. bungei identified as PSGs in adaptation to aquatic habitat, Unigene28957 was identified by gene ontology as coding for an expressed protein located in the vacuole. The vacuole has an important role in regulating osmotic pressure (e.g. by accumulating proteins40) and this regulatory function can be extreme for freshwater aquatic plants41. Thus, Unigene28957 might participate in this osmotic regulation. This is supported by the result that, in the terrestrial – terrestrial comparison, no orthologous pairs assigned to the vacuole were identified as being under positive selection. CL2856.Contig2 was defined as part of solute carrier family 50 (sugar transporter), which is involved in water transport, response to fructose stimulus, sugar transmembrane transporter activity and root development42. This gene could thus play a role in regulating osmotic stress and/or directly affect the development of roots of these aquatic plants (Table 2). Unigene36323, a putative DNA repair protein RAD50, which participates in microtubule cytoskeleton organization, mitotic recombination43, vernalization response and seed germination44, could be active in the structural modifications (e.g. of the cell wall) that distinguish R. bungei from its terrestrial relatives.

The other 11 PSGs were identified as members of the Haloacid dehalogenase-like hydrolase (HAD) superfamily, Ankyrin repeat family, DUF724 protein family, etc. (see Table 2). Functional information on these genes is very limited, so their direct relation to aquatic adaptation cannot yet be determined.

Some genes that are postulated to be involved in re-colonization of aquatic habitats were not recovered as being under positive selection in this study. These include the homolog of LESION SIMULATING DISEASE1, which controls the formation of lysigenous aerenchyma in Arabidopsis45, the homolog of VACUOLELESS1, which is an essential gene for vacuole formation in Arabidopsis46, and the homolog of the group VII Ethylene Response Factor. In the molecular adaptation study of seagrasses17, 51 genes were identified as being under positive selection. Most of them were involved in translation, metabolism and photosynthesis, such as the genes for utilizing CO2 and light. Some of the PSGs of R. bungei in the present study are involved in translation and metabolism. None are involved in photosynthesis, although submersed aquatic plants usually live in environments with low dissolved oxygen and low light levels.

Of course, there are still limitations in our ability to compare our results with those from other studies. (1) Although we have more than 9 Gb of clean data for each of the three Ranunculus species, this likely provides only limited coverage of the genome. Similarly, genes that are known to facilitate salt tolerance, such as the SOS genes, were absent from the comparative genomic analysis of seagrasses17. (2) The pair-wise comparison method estimates dN/dS between two sequences; genes that experienced strong positive selection at some nucleotide sites but have a low average dN/dS ratio may be overlooked47, so genes of importance may not always be recognized. (3) We are also limited in recognizing all genes that relate to specific functions for aquatic adaptation, as some orthologs cannot be annotated using public databases such as NR and GO. This is a problem even for plants with completed whole genome sequences, e.g. Populus3.

Conclusions

In this study, we obtained transcriptomes of three Ranunculus species and carried out statistical assessment of non-synonymous and synonymous substitution rates with pair-wise comparisons. In total, we detected 14 candidate genes that may be involved in the adaptation from terrestrial habitats to aquatic habitats. As this study did not have complete transcriptome coverage for these three species, our ability to identify genes involved in recolonization of aquatic habitats by angiosperms will benefit from analyzing more genomic data. Also, including more aquatic plant lineages and their terrestrial relatives, especially lineages with higher levels of genome annotations already available, for comparative analyses will be necessary to identify candidate genes important for aquatic adaptation in plants. The ultimate goal will be to verify the function of these candidate genes and studies such as this provide a starting point for investigation. These 14 candidate genes provide a valuable resource to begin to understand the molecular mechanism of plant adaptation from terrestrial to aquatic habitats.

Methods

Plant material

Ranunculus bungei (36°56'56.60"N, 100°53'09.92"E; 3096 m alt.) and R. brotherusii (37°11'57.42"N, 101°32'18.33"E; 2820 m alt.) were sampled from Qinghai province, China on 5 Sep. 2013. Ranunculus cantoniensis was sampled from MangShan National Forest Park (24°58'57.67"N, 112°53'11.09"E; 770 m alt.), Hunan province, China on 20 Oct. 2013. Living plants of the three species were brought to the greenhouse in Wuhan Botanical Garden for cultivation.

Phylogenetic inferences

The phylogenetic relationships and divergence times among R. bungei, R. brotherusii and R. cantoniensis were estimated by including these taxa within a larger Ranunculus dataset. Genomic DNA of the three species was isolated from fresh leaves using an Ezup Column Plant Genomic DNA Purification Kit (Sangon Biotech, Shanghai, China). The internal transcribed spacer regions (ITS1, ITS2) and 5.8S gene of the nuclear-encoded ribosomal DNA were amplified following Chen et al.48. Sequencing using the PCR primers was carried out on an ABI 3730 automated sequencer at Tsingke Biotech Co. (Beijing, China). An ITS data matrix was created with the 3 sequences we generated and 81 Ranunculus sequences from GenBank (accession numbers are provided in Supplementary Fig. S1). Sixteen species representing 14 genera, such as Krapfia clypeata and Laccopetalum giganteum, were selected as outgroups following Emadzade & Horandl26. We then used BEAST v. 1.7.549 with four independent Markov chain Monte Carlo (MCMC) runs of 15 million generations each, sampling every 10,000 generations. The first 10% of trees were discarded as burn-in and the remaining trees were combined. We applied a lognormal relaxed clock and calibrated the phylogeny with two fossil calibration points according to Emadzade & Horandl26: one is the split between Ranunculus and Clematis and the other is the minimum age of Myosurus.

RNA extraction and sequencing

A mix of tissues from leaves, stems and roots was collected at 12 am and 12 pm. One individual of each species was sampled, as the intra-species variation is low compared with the inter-species variation50 and the life-form of the three species is stable. Total RNA was isolated using RNAisoTM Plus (Takara, Qingdao, China) and then treated with RNase-free DNase I (Takara, Qingdao, China) for 45 min according to the manufacturer’s protocols. The quality of total RNA was checked using 2% agarose gel electrophoresis. The RNA samples were then delivered to Beijing Genomics Institute (BGI, Shenzhen, China) and concentrations were checked with an Agilent Technologies 2100 Bioanalyzer instrument (Agilent Technologies, Santa Clara CA, USA). The cDNA preparation and Illumina sequencing were performed at BGI. The entire process followed a standardized procedure monitored by BGI’s Quality Control System. The mRNA was isolated from total RNA using oligo (dT) magnetic beads, following the manufacturer’s instructions for cDNA library construction. Double-stranded cDNA was sequenced using the Illumina HiSeq™ 2000 sequencer (90 bp paired-end). Image data from the sequencer were transformed by base calling into raw sequence data, which formed the raw reads.

De novo assembly and annotation

Raw reads were cleaned by removing adaptor sequences, reads with more than 5% unknown base calls (N) and low quality reads (>20% of the bases with a quality score ≤10) using Filter_fq (an internal program of BGI). De novo assembly was carried out with the short reads assembling program Trinity v. 2013022551 using default parameters except for the following: min contig length 100 bp, min glue 3, group pairs distance 250, path reinforcement distance 85 and min kmer cov 3. Contigs were assembled by Trinity into unigenes using paired-end information. The unigenes were then processed by the TGI Clustering Tool (TGICL) v. 2.152 to remove redundancies and further assembled to acquire non-redundant unigenes that were as long as possible. We changed the default parameters of TGICL to join sequences linked to other sequences by overlaps of at least 40 bp and at most 20 bp overlap distance of sequence ends.
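The two read-level thresholds described above can be sketched as follows. This is not Filter_fq, which is an internal BGI program; the function name, argument names and toy reads are hypothetical, adaptor removal is omitted, and quality scores are assumed to be already decoded to integers.

```python
def keep_read(sequence, qualities, max_n_frac=0.05, low_q=10, max_low_q_frac=0.20):
    """Return True if a read passes both thresholds described in the text.

    A read is discarded when more than 5% of its bases are N, or when
    more than 20% of its bases have a quality score <= 10.
    """
    if not sequence or len(sequence) != len(qualities):
        return False  # malformed record: treat as failing
    n_frac = sequence.upper().count("N") / len(sequence)
    low_q_frac = sum(q <= low_q for q in qualities) / len(qualities)
    return n_frac <= max_n_frac and low_q_frac <= max_low_q_frac


# Hypothetical 20-bp toy reads, not data from this study.
good = ("ACGTACGTACGTACGTACGT", [30] * 20)
too_many_n = ("ACGTNNNTACGTACGTACGT", [30] * 20)          # 15% N
low_quality = ("ACGTACGTACGTACGTACGT", [5] * 6 + [30] * 14)  # 30% low-quality bases
for seq, quals in (good, too_many_n, low_quality):
    print(keep_read(seq, quals))
```

For paired-end data, a pipeline would typically drop both mates when either one fails; whether Filter_fq does so is not stated in the text.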

In order to get descriptive annotation, all of the unigenes were annotated based on similarity to the NR, Swiss-Prot, KEGG and COG databases by BLASTX (E-value < 10^-5). The unigenes were also annotated against NT by BLASTN (E-value < 10^-5). With the results of NR annotation, Blast2GO53 was used to get Gene Ontology functional annotation. After that, WEGO54 was used to determine functional classification for all unigenes and to understand the distribution of gene functions for the species at the macro level.

Sequence direction of the unigenes was determined using the best aligning results between the unigenes and the protein databases. Incongruent results from different databases were settled by a priority order of NR, Swiss-Prot, KEGG and COG. Coding region sequences (CDS) of the unigenes were predicted by first aligning unigenes to NR, then Swiss-Prot, then KEGG and finally COG with BLASTX. Unigenes aligned to a higher priority database were not aligned to a lower priority database. The CDS were then translated to amino acid sequences with the standard genetic code using custom Perl scripts.
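The translation step can be illustrated with a minimal Python sketch. The authors used custom Perl scripts that are not shown in the text, so this is only a stand-in under stated assumptions: the function names are hypothetical, the input is assumed to be an in-frame CDS on the forward strand, and codons containing ambiguous bases are written as X.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds):
    """Translate an in-frame CDS; a trailing partial codon is ignored."""
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3
    return "".join(
        CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, usable, 3)
    )


def has_premature_stop(protein):
    """True if a stop codon (*) occurs before the final position."""
    return "*" in protein[:-1]


# Hypothetical toy sequence, not a unigene from this study.
print(translate("ATGGCCTAA"))
```

A check such as has_premature_stop could be used to flag sequences with internal stop codons before downstream analyses.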

Identification of orthologous genes

Identification of orthologous genes is critical to this study. To find clusters of orthologs among the three species, we adopted two strategies. Firstly, we applied a BBH method27 and ‘stringent filters’ to exclude paralogs6. The predicted amino acid sequences of each species were used as queries and targets separately to search against those of the other two species (BLASTP). The best hits of the longest isoforms with E-value < 10^-6 or 10^-15 were retrieved. Orthologous pairs with identity < 60% were excluded and only 1:1:1 orthologous genes in all three species were retained. BLASTP with E-value < 10^-15 reduced the number of putative 1:1:1 orthologous genes by only 2%. Clusters that contained two or more unigenes from the same species were excluded from further analyses to eliminate potential paralogs. Clusters that included stop codons were also excluded from further analyses.
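The bidirectional best hit logic described above can be sketched as follows. This is a simplified illustration, not the authors' pipeline: the function names and the tuple layout (query, subject, E-value, percent identity, bit score) are hypothetical, and ranking hits by bit score is an assumption, since the text does not state how the best hit was chosen. The E-value and identity thresholds are those given in the text.

```python
def best_hits(hits, max_evalue=1e-15, min_identity=60.0):
    """Map each query to its best-scoring subject after applying the thresholds."""
    best = {}
    for query, subject, evalue, identity, bitscore in hits:
        if evalue >= max_evalue or identity < min_identity:
            continue
        # Assumption: the "best" hit is the one with the highest bit score.
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_pairs(a_vs_b, b_vs_a, **thresholds):
    """Return (a, b) pairs in which each sequence is the other's best hit."""
    forward = best_hits(a_vs_b, **thresholds)
    backward = best_hits(b_vs_a, **thresholds)
    return {(a, b) for a, b in forward.items() if backward.get(b) == a}


def triplets(ab, ac, bc):
    """Combine three pairwise BBH sets into 1:1:1 clusters (a, b, c)."""
    b_of, c_of, bc_set = dict(ab), dict(ac), set(bc)
    return sorted(
        (a, b_of[a], c_of[a])
        for a in b_of
        if a in c_of and (b_of[a], c_of[a]) in bc_set
    )


# Hypothetical toy BLASTP hits, not data from this study.
a_vs_b = [("a1", "b1", 1e-50, 90.0, 200.0), ("a2", "b2", 1e-30, 80.0, 150.0)]
b_vs_a = [("b1", "a1", 1e-50, 90.0, 200.0), ("b2", "a1", 1e-20, 70.0, 100.0)]
print(reciprocal_pairs(a_vs_b, b_vs_a))
```

Requiring all three pairwise relations to agree, as in triplets, is what makes a cluster 1:1:1; a cluster is dropped as soon as any one pair is not reciprocal.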

Secondly, orthologous gene clusters were constructed using OrthoMCL28 with default settings according to the methods in Wissler et al.17. Only clusters with at least one sequence per species were used in our analyses17. If a cluster contained more than one sequence from any species, all sequences of that species were removed except for the one sequence that showed the highest similarity to all other sequences of the cluster. The putative orthologous pairs were then aligned by MUSCLE55 with default parameters.
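The reduction of a multi-copy cluster to one sequence per species can be sketched as below. The function name, data layout and toy scores are hypothetical, and summing pairwise similarity over all other cluster members is an assumption about how "highest similarity to all other sequences" was scored; the text does not give the exact measure.

```python
def reduce_cluster(cluster, similarity, required_species):
    """Keep one sequence per species from an ortholog cluster.

    cluster: dict mapping sequence ID -> species name.
    similarity: dict mapping frozenset({id1, id2}) -> similarity score.
    Returns dict species -> retained sequence ID, or None if any
    required species is missing from the cluster.
    """
    if not set(required_species) <= set(cluster.values()):
        return None

    def total_similarity(seq_id):
        # Assumption: score a sequence by its summed similarity to all others.
        return sum(
            similarity.get(frozenset((seq_id, other)), 0.0)
            for other in cluster
            if other != seq_id
        )

    kept = {}
    for seq_id, species in cluster.items():
        if species not in kept or total_similarity(seq_id) > total_similarity(kept[species]):
            kept[species] = seq_id
    return kept


# Hypothetical toy cluster with two R. bungei copies, not data from this study.
cluster = {"bun1": "bungei", "bun2": "bungei", "can1": "cantoniensis", "bro1": "brotherusii"}
similarity = {
    frozenset(("bun1", "can1")): 90.0, frozenset(("bun1", "bro1")): 88.0,
    frozenset(("bun2", "can1")): 70.0, frozenset(("bun2", "bro1")): 75.0,
    frozenset(("bun1", "bun2")): 80.0, frozenset(("can1", "bro1")): 85.0,
}
print(reduce_cluster(cluster, similarity, ["bungei", "cantoniensis", "brotherusii"]))
```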

dN/dS analyses

The dN and dS values and the dN/dS ratio were estimated in KaKs_Calculator v. 1.256 using two methods, a ML method with model averaging57 and an approximate method with the YN model47. Both methods were applied to the BBH orthologs and the OrthoMCL orthologs (Table 1). The proportion of synonymous and nonsynonymous substitution sites and the maximum-likelihood score for each ortholog pair were calculated. Fisher’s exact test was performed to assess the statistical validity of the dN and dS values. Three independent dN/dS analyses were performed for each orthologous cluster, viz. (1) aquatic R. bungei – terrestrial R. brotherusii; (2) aquatic R. bungei – terrestrial R. cantoniensis; (3) terrestrial R. brotherusii – terrestrial R. cantoniensis (Table 1). Ortholog pairs with dN > 1.0, those saturated for synonymous substitutions (dS > 1.0) and those with dS < 0.01 were removed. In addition, ortholog pairs with P-value (Fisher test) > 0.05 or aligned length < 150 bp were also excluded. dN/dS > 0.5 was applied as the threshold for positively selected genes. Swanson et al.58 estimated that more than 80% of genes with dN/dS > 0.5 were under selection, and the threshold value of dN/dS > 0.5 has become widely accepted in recent studies3,30.
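The filters and the threshold described above can be written as a short sketch. The class and function names are hypothetical, the values would in practice come from KaKs_Calculator output, and applying the P-value cut-off inside the candidate test rather than as a separate earlier filter is an assumption about ordering that does not change which pairs are finally called.

```python
from dataclasses import dataclass


@dataclass
class PairEstimate:
    """Estimates for one ortholog pair (hypothetical container)."""
    dn: float
    ds: float
    p_value: float       # Fisher's exact test
    aligned_length: int  # in bp


def passes_filters(e):
    """Drop pairs with dN > 1.0, dS > 1.0, dS < 0.01 or aligned length < 150 bp."""
    return e.dn <= 1.0 and 0.01 <= e.ds <= 1.0 and e.aligned_length >= 150


def is_positively_selected(e, ratio_threshold=0.5, alpha=0.05):
    """Candidate PSG pair: passes the filters, dN/dS > 0.5 and P < 0.05."""
    return passes_filters(e) and e.p_value < alpha and e.dn / e.ds > ratio_threshold


# Hypothetical toy values, not estimates from this study.
pairs = {
    "cluster_A": PairEstimate(dn=0.12, ds=0.20, p_value=0.01, aligned_length=600),
    "cluster_B": PairEstimate(dn=0.02, ds=0.20, p_value=0.01, aligned_length=600),
    "cluster_C": PairEstimate(dn=0.004, ds=0.005, p_value=0.01, aligned_length=600),
}
print(sorted(name for name, e in pairs.items() if is_positively_selected(e)))
```

The lower bound on dS matters because a ratio computed from a near-zero denominator, as in cluster_C, can exceed the threshold without reflecting real signal.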

One additional filter was used to exclude paralogs: the aligned DNA sequences of candidate PSG pairs were checked manually to eliminate results due to poor alignment. We counted the number of ortholog pairs under positive selection between two species and the clusters under positive selection in all three species-pair comparisons. Finally, we accepted as PSGs of R. bungei those genes that were identified as being under positive selection in both the aquatic R. bungei – terrestrial R. brotherusii and aquatic R. bungei – terrestrial R. cantoniensis comparisons (dN/dS > 0.5, P < 0.05), but not in the terrestrial R. brotherusii – terrestrial R. cantoniensis comparison (dN/dS < 0.5 or P > 0.05). We think this conservative method can exclude pseudo-PSGs of R. bungei.
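The final selection rule is a simple set operation, sketched below. The function names and cluster identifiers are hypothetical; each input is assumed to be the set of cluster IDs already called positively selected in one pair-wise comparison.

```python
def aquatic_specific_psgs(bun_bro, bun_can, bro_can):
    """Clusters selected in both aquatic-terrestrial comparisons
    but not in the terrestrial-terrestrial comparison."""
    return (bun_bro & bun_can) - bro_can


def selected_in_all(bun_bro, bun_can, bro_can):
    """Clusters selected in all three comparisons (centre of a Venn diagram)."""
    return bun_bro & bun_can & bro_can


# Hypothetical cluster IDs, not results from this study.
bun_bro = {"c1", "c2", "c3", "c7"}
bun_can = {"c1", "c2", "c4", "c7"}
bro_can = {"c2", "c5"}
print(sorted(aquatic_specific_psgs(bun_bro, bun_can, bro_can)))
print(sorted(selected_in_all(bun_bro, bun_can, bro_can)))
```

Clusters that appear in only one aquatic-terrestrial comparison (c3 and c4 in the toy data) are excluded by this rule, which is what makes it conservative.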

Annotation and phylogenetic inference of the PSGs of R. bungei

In order to get more detailed annotation, orthologs within the clusters that include the 14 PSGs of R. bungei from the last step were annotated to TAIR10 using BLASTP. The orthologs were also annotated to the NCBI NR database of Vitis vinifera using BLASTP, as more than 30% of our unigenes were matched to the species (Fig. 2). In total, 14 clusters were annotated.

Phylogenetic relationships among the 14 PSGs of R. bungei and their orthologs in the two other species were estimated using protein sequences. Based on the BLAST results against TAIR10 and the NR database from the previous step, the hit sequence with the lowest E-value was used as the outgroup for each cluster. Each cluster was aligned by MUSCLE55 with default parameters and ambiguously aligned regions were manually deleted. ML and maximum parsimony (MP) analyses were performed using MEGA v.6.0.659. ML was performed with the Jones-Taylor-Thornton (JTT) model and Gamma Distributed (G) rates; MP was performed with the Subtree-Pruning-Regrafting (SPR) search method. Gaps and missing characters were completely deleted and branch support values were estimated by 100 bootstrap replicates. All other parameters were default values.

Additional Information

How to cite this article: Chen, L.-Y. et al. Transcriptome sequencing of three Ranunculus species (Ranunculaceae) reveals candidate genes in adaptation from terrestrial to aquatic habitats. Sci. Rep. 5, 10098; doi: 10.1038/srep10098 (2015).