Skip to main content

Thank you for visiting You are using a browser version with limited support for CSS. To obtain the best experience, we recommend you use a more up to date browser (or turn off compatibility mode in Internet Explorer). In the meantime, to ensure continued support, we are displaying the site without styles and JavaScript.

Comparative transcriptomics with self-organizing map reveals cryptic photosynthetic differences between two accessions of North American Lake cress


Because natural variation in wild species is likely the result of local adaptation, it provides a valuable resource for understanding plant-environmental interactions. Rorippa aquatica (Brassicaceae) is a semi-aquatic North American plant with morphological differences between several accessions, but little information available on any physiological differences. Here, we surveyed the transcriptomes of two R. aquatica accessions and identified cryptic physiological differences between them. We first reconstructed a Rorippa phylogeny to confirm relationships between the accessions. We performed large-scale RNA-seq and de novo assembly; the resulting 87,754 unigenes were then annotated via comparisons to different databases. Between-accession physiological variation was identified with transcriptomes from both accessions. Transcriptome data were analyzed with principal component analysis and self-organizing map. Results of analyses suggested that photosynthetic capability differs between the accessions. Indeed, physiological experiments revealed between-accession variation in electron transport rate and the redox state of the plastoquinone pool. These results indicated that one accession may have adapted to differences in temperature or length of the growing season.


Recent studies involving non-model plant species have provided knowledge unobtainable from using only model plants1. Many of these studies have described molecular mechanisms underlying interspecific differences in morphology, physiology, and ecology2,3,4. In addition to interspecific differences, natural genetic variation within a population of a single species is garnering increasing attention from researchers5,6. For instance, accessions of Arabidopsis thaliana (L.) Heynh. (hereafter “Arabidopsis”) vary in traits such as leaf morphology, flowering time, and drought response6, suggesting the effect of local adaptation. Several studies have addressed the evolutionary processes underlying this variation through identifying genes or miRNAs responsible for between-accession differences, prompting increased attention on accessions as experimental material6. Accessions are particularly powerful for studying non-model species that do not have the genetic resources (e.g., mutants) seen in model organisms. Additionally, accessions are useful for understanding how local adaptation processes may have sculpted morphological and physiological differences among populations.

Rorippa Scop. (Brassicaceae or Cruciferae) comprises 86 species7 distributed on all continents except Antarctica8. The within-genus diversity has resulted in considerable attention, with R. aquatica (Eaton) E.J.Palmer & Steyermark, R. amphibia (L.) Besser, and R. sylvestris (L.) Besser being particularly well studied9. Rorippa aquatica, also known as lake cress, is a semi-aquatic North American plant distributed east of the 95th meridian from eastern Wisconsin into Quebec and southern Vermont into Florida10,11. This species is well adapted to the aquatic environment and exhibits heterophylly12, which is leaf-form variation on a single plant in response to surrounding environmental cues. In nature, deeply dissected leaves develop when plants grow in submerged conditions, whereas simple leaves with entire or toothed margins develop when grown on land12. Previously, we showed that R. aquatica leaf shape changes dramatically in response to varying ambient temperatures and submergence underwater13: an ambient temperature of 25 °C induced leaves with simpler forms compared with 20 °C. Additionally, we found that environmental variation (e.g., in ambient temperature and water levels) altered the expression levels of KNOTTED1-LIKE HOMEOBOX (KNOX1) orthologs; moreover, gibberellin accumulation, thought to be regulated by KNOX1 genes, also changed in leaf primordia.

Rorippa aquatica accessions14 from northern and southern United States clearly differed in leaf forms (Fig. 1a,b) under the same conditions. For instance, the northern sample (hereafter “accession N”) develops leaves with more complex forms than the southern sample (hereafter “accession S”). In addition to the morphological difference, accession N flowers later than accession S (Fig. 1c)15. In Populus angustifolia, it is known that northern and southern populations differ in photosynthetic physiology corresponding to latitude across the North American continent16. Therefore, there is a possibility that Rorippa accessions have a difference in photosynthetic activity. However, little is known about physiological differences between these accessions except for flowering time. Depending on environmental conditions, gene expression would be expected to vary across accessions, and these cryptic physiological differences can be uncovered with comparative transcriptome analysis using RNA-seq technology17.

Figure 1

Comparison of leaf morphology in Rorippa aquatica accessions and Phylogenetic trees constructed using cpDNA sequences. (a) Top view of shoots in accession N (left) and S (right). Plants were cultivated in a growth chamber for a month at 20 °C and under continuous illumination (light intensity of 60 µmol photons m−2 s−1). (b) Comparison of morphology in accession N (left) and S (right). Plants were grown under the same conditions described in (a). (c) Comparison of flowering time between accessions. Side view of shoots in accession N and S. These plants were grown for three months under each listed condition (light intensity of 60 µmol photons m−2 s−1). (d) Global distribution of Rorippa species. Numbers within each country correspond to the species used in the phylogenetic analysis. The map was generated by using Illustrator CS4 (Adobe Systems). (e) Evolutionary history was inferred using the neighbor-joining method. The bootstrap values are indicated on branches (only those > 50% are indicated on the tree). The tree is drawn to scale, with branch lengths in the same units as the evolutionary distances used to infer the phylogeny.

In this study, we aimed to understand how local adaptation processes may have sculpted physiological differences between R. aquatica accessions. We performed large-scale RNA-seq, de novo assembly, and transcriptome annotation in addition to phylogeny reconstruction in Rorippa. Moreover, we variance-scaled transcriptome data separately by two accessions and compared them using principal component analysis (PCA) and self-organizing map (SOM) analysis. These methods provide more details on difference in expression pattern between accessions among different conditions than simple analyses of differential gene expression levels, because the scaling procedure allows focus on genes that exhibit between-accession variation in expression patterns. Then, based on SOM clustering results, we focused on genes with differential expression patterns between accessions. This comparative transcriptome analysis revealed cryptic differences between accessions, specifically in photosynthetic activity (e.g., electron transport rate) and the redox state of the plastoquinone pool.


Accessions are closely related

Despite the attention paid to various Rorippa species, relatively little is known about their phylogenetic relationships. In particular, there was no report on phylogenetic relationship among Rorippa accessions. Sequences of cpDNA were determined from 46 samples of Rorippa species distributed worldwide and two samples from outgroups Nasturtium officinale W.T.Aiton and Cardamine africana L. (Fig. 1d; Table 1). In the NJ phylogenetic tree generated, all Rorippa samples (including R. aquatica accessions N and S) formed a monophyletic group, with the two accessions being the most closely related (Fig. 1e). These relationships were also confirmed in the ML phylogenetic tree (see Supplementary Fig. S1). The NJ phylogeny also suggested that R. aquatica is close to the European R. pyrenaica (L.) Reichenb., but the latter is not heterophyllous18. However, heterophylly is well documented in R. amphibia, a widespread Eurasian species naturalized in North America19. The latter species is placed in an entirely different clade from R. aquatica within Rorippa (Fig. 1e). Therefore, it seems to likely that heterophylly evolved independently at least twice within the genus.

Table 1 List of species, voucher numbers, and accession numbers of plant materials. Herbarium acronyms follow Index Herbariorum Part I.

Transcriptome sequencing, de novo assembly, and defining differentially expressed genes

Rorippa aquatica plants (two accessions, N and S) were planted in soil and grown at three temperatures (20 °C, 25 °C, 30 °C) in a growth chamber under continuous illumination, with a light intensity of 60 or 120 µmol photons m−2 s−1. Total RNA was extracted from the shoot apical meristem with subtending P1–P3 leaf primordia.

For de novo assembly, single-end sequencing of libraries with GAIIx (Illumina) resulted in 935,152,744 reads, and sequencing of longer reads was obtained through RNA-seq with MiSeq (Illumina) to yield 68,782,820 paired-end reads (Table 2). All reads from N and S were used for de novo assembly, because Trinity tries to generate a consensus transcript even if there is allelic variation. De novo assembly using all reads from N and S resulted in 132,566 transcript contigs, with N50 and average lengths of 1,031.06 nt and 1,903 nt, respectively (Table 2). Based on the N50 length, which is an indicator for assembly quality, we confirmed that the de novo assembly has enough quality. Approximately half of the transcripts were ≤500 nt (Fig. 2a; Table 2). Assembled sequences were annotated against the GO database. This procedure allows us to perform GO enrichment analysis, later. After annotation, the most predominant GO terms under the “biological process” category were as follows: cellular (GO: 0009987), metabolic (GO: 0008152), and single-organism (GO: 0044699), followed by response to stimulus (GO: 0050896) and developmental processes (GO: 0032502). Under “molecular function,” binding (GO: 0005488) and catalytic activity (GO: 0003824) were the most enriched terms. Under the “cellular component” category, cell (GO: 0005623), cell part (GO: 0044464), and organelle (GO: 0043226) were the most prominent (Fig. 2b). Reported RNA-seq data are available in the DDBJ Sequenced Read Archive under accession number DRA005242.

Table 2 Transcriptome sequencing and summary statistics of de novo assembly.
Figure 2

Transcripts, gene lengths, and gene ontology (GO) assignments for the Rorippa aquatica transcriptome. (a) Transcript and gene length distributions defined through de novo assembly in Trinity. (b) GO assignments predicting gene involvement. Top (green): biological processes; middle (blue): molecular function; bottom (yellow) cellular component. These assignments were generated in Blast2GO.

For defining differentially expressed genes (DEGs) between accessions, we used only RNA-seq data from plants grown at 60 µmol photons m−2 s−1. Because, decreasing the number of environmental factors that similarly affect leaf form13, leaving only ambient temperature to vary. This reduced data complexity and facilitated further analysis. EdgeR was used to define 8,809 DEGs between the accessions (FDR < 0.01) based on a generalized linear model (GLM) at the gene level using temperature and accession as factors.

Principal components analysis reveals differences in transcriptome profile between accessions

To compare expression profiles between accessions, we performed PCA. Major sources of variance in the transcriptome were investigated with a PCA that considered all DEGs between accessions. The eigenvalues of two components were greater than 1 (Fig. 3a). The first component (PC1) explained 72.3% of the variation and discriminated clearly between accessions. The second component (PC2) explained 16.8% of the variation and discriminated between temperatures (Fig. 3a,b). Thus, the PCA results indicated that accessions differ in transcriptome profiles even under identical conditions. Indeed, a heatmap using all DEGs confirmed the PCA, showing clear differences in the expression patterns between accessions (Fig. 3c).

Figure 3

Principal component analysis (PCA) of gene expression. (a) Eigenvalues and cumulative contribution ratio (%) in PCA. Bars and open circles represent eigenvalues and cumulative contribution ratio, respectively. (b) The global expression profile of each transcript is represented as PC1 and PC2. Note distinct dissimilarities between the two accessions in PC1. (c) Expression profiles of genes that are differentially expressed between accessions.

Visualization and assessment of SOM clustering

We performed SOM for further understanding the difference in the expression patterns. SOM allows us to identify a subset of genes with similar expression profiles. We constructed a SOM to extract genes linked to between-accession physiological differences from DEGs between the accessions. We then used PCA to partition the resulting 20 SOM clusters following previous study20 (5 × 4, rectangular; Supplementary Fig. S2). The genes in each cluster exhibited distinct expression patterns along each condition, suggesting successful clustering (Supplementary Figs S2 and S3).

Expression patterns between accessions were similar in all clusters, differing mainly in degree even under the same conditions (Supplementary Figs S2 and S3). For instance, expression levels in cluster 10 decreased across both accessions as temperature increased, although the accessions differed in expression amount under identical temperatures. Therefore, it appears that each cluster contains genes showing different expression level and similar expression pattern between accessions. For further characterization of each cluster, we performed a GO enrichment analysis with the 20 clustered gene sets. “Response to stress” and “response to abiotic stress” GO terms were enriched in many clusters (q < 0.05), with the former being the top term in cluster 1 (see Supplementary Table S1). The strong representation of this term is likely a reflection of plant response to changes in ambient temperature, as expression levels of cluster 1 genes from both accessions increased with increasing temperature (Supplementary Figs S2 and S3). Moreover, the GO terms “post-embryonic development,” “multicellular organismal development,” “cell differentiation,” “anatomical structure morphogenesis,” and “cell growth” were enriched in cluster 10 (q < 0.05; see Supplemental Table S1). In this cluster, genes from accessions N and S decreased as temperature increased (Supplementary Figs S2 and S3), possibly reflecting a known relationship between temperature and leaf complexity13. These GO terms may be responsible for leaf-form differences across accessions, which exist even under the same environmental conditions (Fig. 1a,b). Furthermore, the “flower development” term was enriched in some clusters (q < 0.05), corresponding to between-accession differences in flowering time (Fig. 1c).

Overall, these results suggest that SOM clustering successfully identified distinct transcriptome differences between accessions. However, the large number of enriched GO terms prevented us from determining which gene types played a more critical role in influencing between-accession physiological differences.

The use of SOM clustering on accession-scaled transcriptome data is sufficient for investigating cryptic differences between accessions

We next performed PCA and SOM clustering (3 × 3, rectangular) on count data of DEGs scaled separately by accession. Gene expression values from the accessions were mean-centered and variance-scaled separately to measure differences caused by changes in accession-specific expression patterns, allowing the focus to fall on differences in expression pattern instead of expression magnitude. Using such data allows separate treatment of genes from each accession and uncovers genes that cluster differently between accessions. As a result, genes from each accession were assigned to clusters irrespective of the accessions. Nine clusters were successfully obtained (Fig. 4a,b), based on box and line plots showing genes in each cluster with distinct, non-redundant expression patterns (Fig. 4c).

Figure 4

SOM clustering of gene expression in differentially expressed genes (DEGs) and their expression profiles. (a) Results of SOM clustering. Line plots indicate representative expression patterns at 20 °C, 25 °C, and 30 °C in each cluster. For SOM and diagrams, the 3 × 3 rectangular topology is shown. (b) Number of genes assigned to each SOM cluster. Red and white indicate low and high counts, respectively. (c) Scaled expression between accessions plotted under 20 °C, 25 °C, and 30 °C are shown. Box plot explanation: upper horizontal line of box, 75th percentile; lower horizontal line of box, 25th percentile; horizontal bar within box, median; upper horizontal bar outside box, 90th percentile; lower horizontal bar outside box, 10th percentile.

Next, we focused on genes with different between-accession expression patterns based on SOM clustering results (Fig. 5a). Such displaced gene sets between accessions among clusters exhibited certain tendencies (Fig. 5b; all directions from accession N to S). Pre- and post-displacement differences in expression pattern occurred primarily at 25 °C (Fig. 5c). GO enrichment analysis with these displaced gene sets between accessions among clusters showed that the GO term “photosynthesis” was significantly enriched in the displacements 3 → 6 (q value: 0.0346), 6 → 3 (0.0149), and 9 → 6 (0.000005), as were other photosynthesis-related GO terms, such as “thylakoid” (Table 3). Among the enriched genes were putative Arabidopsis orthologs of photosystem I subunit H-1 (AT3G16140), photosystem II subunit Q-2 (AT4G05180), CURVATURE THYLAKOID 1 C (AT1G52220), and NAD(P)H-quinone oxidoreductase subunit 2 A (ATCG00890) (see Supplementary Table S2). We confirmed that expression levels varied between accessions (see Supplementary Fig. S4). These results suggest that R. aquatica accessions differ physiologically in photosynthetic activity.

Figure 5

Displacement of orthologs to different clusters under the SOM clustering scheme. (a) A diagram demonstrating SOM clustering. N and S orthologs can be assigned to different clusters. (b) A network representation of ortholog assignment to different SOM clusters. Arrows represent displacement from accession N to S. Arrow sizes are proportional to the number of displaced orthologs. (c) Major displacement directions after SOM clustering of data that were scaled separately by accessions. Line plots indicate representative expression patterns in each cluster.

Table 3 Result of GO enrichment analysis using displacement of orthologs to different clusters under SOM clustering scheme.

As the q value of “photosynthesis” was the lowest in 9 → 6 compared with other displacements such as 3 → 6 and 6 → 3 (Table 3), we then constructed an enrichment map focused on GO terms in 9 → 6. The results showed that communities 1, 2, and 3 were represented by “Biological process,” “Cellular component,” and “Molecular function,” respectively (Fig. 6a). Community 2 comprised the enrichment of terms such as “thylakoid” and “cytoplasm.” In community 3, “nucleotide binding” was enriched (see Supplementary Fig. S5). In contrast, “photosynthesis” was significantly enriched under the “metabolic process” and “cellular process” GO terms in community 1 (Fig. 6b). Therefore, we investigated photosynthetic activity to verify the presence of between-accession differences.

Figure 6

GO enrichment map with differentially expressed genes (DEGs) displaced from cluster 9 to cluster 6. (a) Three distinct communities (generated by Cytoscape) are on the map. (b) GO enrichment map of community 1 from (a). The red to blue scale indicates high to low q values, or P values adjusted with Benjamini-Hochberg.

Electron transport rate (ETR) and redox state of the plastoquinone (PQ) pool are different between accessions

Chlorophyll fluorescence parameters were analyzed to evaluate photosynthetic activity. In accessions N and S grown at 20 °C and 25 °C, PSII activity was high, with a maximum quantum yield (Fv/Fm) greater than 0.8 (Fig. 7a), indicating that photoinhibition was not observed. Under all light intensities, both accessions grown at 20 °C showed similar ETR (Fig. 7b), an indicator of the relative electron flow rate through PSII during steady-state photosynthesis. In contrast, accession N’s ETR values were lower than accession S at 25 °C and were saturated at a lower light intensity (Fig. 7b). To analyze electron transport in more detail, the 1-qL parameter, which reflects the redox state of the PQ pool, was measured. When grown at 25 °C, accession N had higher 1-qL than accession S, indicating a more electron-reduced PQ pool in the former (Fig. 7c). These results indicated that accession S had higher photosynthetic activity than accession N at 25 °C, but not at 20 °C. This is unsurprising because pre- and post-displacement differences in expression pattern occurred primarily at 25 °C (Fig. 5C). Additionally, we measured NPQ and observed no difference in NPQ induction between accessions (Fig. 7d).

Figure 7

Measurements of photosynthetic parameters in two accessions. (a) Maximum quantum efficiency of photosystem II (Fv/Fm). (b) Light-intensity dependence of the electron transport rate (ETR). The ETR was calculated as ΦPSII × light intensity (μmol photons m−2 s−1). (c) Light-intensity dependence of the redox state of plastoquinone (1-qL). (d) Light-intensity dependence of the non-photochemical quenching (NPQ) of chlorophyll fluorescence. All data are the means of five replicates; vertical bars represent SE. *p < 0.05 based on Welch’s t-tests.

Together, our data showed that between-accession differences in the expression of photosynthesis-related genes might contribute to the more active photosynthetic electron transfer system in accession S at warmer temperatures.


To investigate physiological differences between two R. aquatica accessions, we used phylogenetic, transcriptomic, bioinformatic, and physiological approaches. First, we reconstructed a phylogeny of Rorippa to confirm the relationship between two accessions with different habitats. Next, we performed large-scale RNA-seq, de novo assembly, and transcriptome annotation of the two accessions. We then compared these transcriptomes using PCA and SOM construction. We focused especially on genes with different between-accession expression patterns, based on comparisons of results from SOM clustering (Supplementary Fig. S6). The results suggested that photosynthetic capability, as measured by ETR and 1-qL, differs between the accessions. This difference may be an adaptive response to variation in growing season length or temperature. Overall, this study demonstrated that combining RNA-seq and clustering methods can reveal cryptic physiological differences between closely related accessions.

Previous studies showed that clustering methods combining PCA and SOM are effective in extracting gene subsets associated with phenotypes of interest from large-scale transcriptome data between species20. Although the use of PCA and SOM on transcriptome data identified numerous enriched GO terms related to between-accession physiological differences (including in photosynthesis), the sheer number of terms hampered our ability to focus on the most likely candidates. The high-dimensional data obtained from large-scale RNA-seq often requires simplification and conversion to become more interpretable21. Therefore, we reduced data dimensionality via scaling data separately by accessions before performing another PCA and SOM clustering. This fine-tuning let us uncover enrichment of photosynthesis-related genes (GO: 0015979; Q q value: 0.000005) in gene sets displaced between accessions among clusters. Indeed, our investigation of chlorophyll fluorescence parameters demonstrated between-accession differences in ETR and 1-qL, supporting results from the GO enrichment analysis. These results indicate that RNA-seq combined with SOM is remarkably effective for investigating cryptic differences between accessions, as long as data dimensionality is reduced first.

Physiological experiments revealed that accession S has higher ETR and lower 1-qL than accession N when both were grown at 25 °C, indicating that photosynthetic activity may be higher in accession S. When grown at 20 °C, however, accessions did not differ in their chlorophyll fluorescence parameters. Therefore, accession S may have a higher carbon fixation rate than accession N at 25 °C. Thus, these data suggest that accession S may be better adapted to 25 °C or higher temperatures.

The greater photosynthetic activity in accession S compared with accession N, particularly at higher temperatures, is useful for understanding the history of these two populations. The habitats of accessions S and N are thought to be respectively southern (e.g., Florida) and northern (e.g., Ohio and New England) United States15, spanning a wide range of temperatures and day lengths. These considerable environmental gradients can lead to local adaptation. It seems that our physiological experiment on photosynthetic activity provided evidence that accession S was better adapted to 25 °C than accession N. Indeed, the annual average temperature is 22 °C–26 °C in Florida and lower in Ohio ( National Weather Service Climate Prediction Center). Similarly, northern and southern Populus angustifolia populations differ in photosynthetic physiology corresponding to latitude across the North American continent; this variation may be an adaptive response to differences in growth season length, temperature, and insulation16. This relationship between photosynthetic physiology and latitude has also been reported in other North American plant species22. Thus, observed patterns in photosynthetic activity among R. aquatica accessions may be explained by similar adaptive measures.

Our method of combining RNA-seq and SOM was successful in detecting cryptic physiological differences between R. aquatica accessions. By using this method, further work could considerably clarify the molecular mechanisms underlying heterophylly in this species. Beyond R. aquatica research, this comparative technique has broad applications that can be improved further with recent advances in software, packages, and methods for fine-tuned transcriptome analysis23,24,25. Some of these analyses include predicting co-expression networks and defining participating modules, as well as investigating differential co-expression across disparate datasets. Indeed, this comparative transcriptome method has resulted in a gene network module regulating interspecific diversity in the genus Solanum26. Thus, comparative transcriptomics will contribute largely to uncovering key regulatory mechanisms affecting variation between and within species. The knowledge obtained from comparative transcriptomics will provide fundamental insight into evolutionary and ecological developmental biology, especially on the concept of rewiring network interactions during evolution, a process that can lead to speciation and local adaptation.


Plant materials

Rorippa aquatica plants (two accessions, N and S) were planted in soil and grown at three temperatures (20 °C, 25 °C, 30 °C) in a growth chamber under continuous illumination, with a light intensity of 60 or 120 µmol photons m−2 s−1. Seedlings were watered every two days. According to previous reports, N and S accessions are thought to have representative phenotypes from northern and southern populations10,11,15. All plants were cultivated in each condition for a month except those used for the physiological experiment, which were cultivated for two months. The shoot apical meristem subtending P1–P3 leaf primordia were frozen in liquid nitrogen just after sampling, and then stored at −80 °C until needed for DNA and RNA extraction.

Phylogenetic analyses

Phylogenetic trees were reconstructed in MEGA627 with the neighbor-joining (NJ) and maximum-likelihood (ML) methods28,29. Bootstrap values were derived from 1000 replicate runs.

Sequences of the non-coding regions in the trnL intron, trnG (GCC)-trnM (CAU), and psbC-trnS (UGA) were determined from 46 samples of Rorippa species distributed worldwide and two samples from outgroups Nasturtium officinale W.T.Aiton and Cardamine africana L. (Table 1). All sequence data were deposited in the DNA Data Bank of Japan (DDBJ) (Table 1). Their lengths were 517–527 bp for trnL intron, 224–228 bp for trnG-trnM, and 205–222 bp for psbC-trnS.

The optimal NJ phylogenetic tree is shown in Fig. 1e (sum of branch lengths = 0.14081464), along with relationships between the clades and localities of individuals (see also Table 1). A bootstrap test of 1000 replicates30 was used to calculate the percentage of replicate trees in which the associated taxa clustered together.

Evolutionary distances (number of base substitutions per site) were computed using maximum composite likelihood (MCL). The analysis involved 48 nucleotide sequences. Included codon positions were 1st + 2nd + 3rd + Noncoding, while all positions containing gaps and missing data were eliminated, resulting in a final dataset of 910 positions.

The ML phylogenetic tree with the highest log likelihood (-2191.1860) is shown in Supplemental Fig. S1. Initial tree(s) for the heuristic search were obtained automatically: Neighbor-Join and BioNJ algorithms were applied to a matrix of pairwise distances estimated with MCL, and then the topology with a superior log likelihood value was selected. The tree is drawn to scale, with branch lengths measured in the number of substitutions per site. The analysis involved 48 nucleotide sequences. Codon positions included were 1st + 2nd + 3rd + Noncoding. All positions containing gaps and missing data were eliminated to result in a final dataset of 910 positions.

RNA-seq and de novo assembly

Total RNA was extracted from the shoot apical meristem with subtending P1–P3 leaf primordia and shoot with an RNeasy Plant Mini Kit (QIAGEN), for multiplex sequencing in the Illumina Genome Analyzer IIx (Illumina). RNA-seq libraries were prepared using a NEBNext mRNA Library Prep Reagent Set for Illumina (NEB). To find differentially expressed genes (DEGs), 48 libraries (two accessions, three temperatures, two light intensities, and four biological replicates) were prepared. De novo assembly was generated with RNA from several controlled growth conditions (see “Plant materials”), because changes in ambient temperature and light intensity affect leaf morphology13, and because certain transcripts may only be expressed in specific environments.

Longer reads for de novo assembly were obtained through RNA-seq with MiSeq (Illumina). Total RNA was extracted from the shoot apex subtending the leaf primordia. Libraries for MiSeq were prepared with a TruSeq Stranded Total RNA Sample Prep Guide (Illumina), and sequenced with a MiSeq Reagent Kit v3, both following manufacturer protocols.

Short single-end and long paired-end reads were assembled into transcriptome contigs using Trinity31, with default assembling settings. The minimum assembled contig length in our study is 200 bp. BlastX searches of obtained contigs against non-redundant protein sequences from GenPept, SwissProt, PIR, PDF, PDB, and NCBI RefSeq (nr) databases were conducted to find similar known protein sequences. Gene ontology (GO) information was mapped to each contig based on Blastx results with Blast2GO32.

Gene expression profiling with RNA-seq data

Single-end reads were separated by indices, then trimmed and quality-filtered. Raw reads were then mapped with BWA33 ( Contigs from de novo assembly were used as reference sequences for mapping. Transcript expression profiles and DEGs were defined with EdgeR GLMs34. After quality filtering, 93.4% (80,304,302) of the single-end reads were mapped to the reference de novo assembly data using BWA version 0.7.5 (parameters “-n 2 -e 2”). For further analysis in R (version 3.2.1), lowly expressed genes were filtered based on a minimum sum of 10 counts over all samples (genes below this threshold were considered not expressed). Libraries were subjected to trimmed mean of M-values (TMM) normalization in EdgeR. Multi-dimensional scaling was performed via calculating log-fold changes between accessions and using DEGs to compute distances in EdgeR with the “plotMDS” function. Differential expression was calculated via fitting a generalized linear model (GLM) at the gene level using temperature and accession as factors. The threshold for DEGs was a false discovery rate (FDR) of < 0.01; this yielded 8,809 genes. Bioinformatics and statistical analyses were performed on the iPLANT Atmosphere cloud server (

Principal components analysis with SOM clustering and GO analysis

We applied a gene-expression clustering method20 on all 8,809 DEGs defined with EdgeR. Scaled expression values were used for multilevel 5 × 4 and 3 × 3 rectangular SOM clusters (Supplementary Fig. S6)35,36. One hundred training interactions were used during clustering, and gene clusters were based on the final assignment of genes to winning units. To focus only on gene-expression patterns instead of expression magnitude, expression values were mean-centered and variance-scaled separately between accessions in a 3 × 3 rectangular SOM. Using such data allows separate treatment of genes from each accession and uncovers orthologs that cluster differently based on their existing groups (e.g., accessions or species20). This procedure makes it possible to focus on genes that vary in expression patterns between accessions.

The outcome was then visualized in a PCA, with PC values calculated from gene expression across samples (R stats package, prcomp function). For 3 × 3 rectangular SOM clusters, network graphics in Gephi37 were used to visualize—as a directed network—the assignment of genes from different accessions to separate clusters. Arrow direction indicates gene assignment to clusters, from accession N to accession S, with arrow size proportional to gene number represented. Clustered and displaced gene sets among clusters were then subjected to GO analysis using Cytoscape and visualized with the BinGO38 ( Resultant P values were adjusted with the Benjamini-Hochberg method to yield q values. Blast2GO results were used as annotation data.

Chlorophyll fluorescence analysis

Chlorophyll fluorescence was measured with a Mini-PAM (pulse-amplitude modulation) portable chlorophyll fluorometer (Walz). For this analysis, all plants were grown under each environmental condition for two months. Minimum fluorescence (Fo) was obtained with open Photosystem II (PSII) centers in the dark-adapted state through a low-intensity measuring light (wavelength 650 nm, 0.05–0.1 μmol photons m−2 s−1). A saturating pulse of white light was applied to determine the maximum fluorescence with closed PSII centers in the dark-adapted state (Fm) and during actinic light (AL) illumination (Fm′). The steady-state fluorescence level (Fs) was recorded during AL illumination (17–1184 μmol photons m−2 s−1). The quantum yield of PSII (ΦPSII) was calculated as (Fm′ − Fs)/Fm39. The relative rate of electron transport through PSII (ETR) was calculated as ΦPSII × light intensity (μmol photons m−2 s−1). The fraction of the open PSII center (qL) was calculated as [ΦPSII/(1 – ΦPSII)] × [(1 − Fv/Fm)/(Fv/Fm)] × (NPQ + 1)40. Non-photochemical quenching (NPQ) was calculated as (Fm − Fm′)/Fm′; this parameter is roughly indicative of excess absorbed light dissipation as heat to minimize oxygen radical formation in angiosperms. To analyze light-intensity dependence of fluorescence parameters, AL intensity was increased in a step-wise manner every two minutes after applying a saturating pulse.


  1. 1.

    Tsukaya, H. Comparative leaf development in angiosperms. Curr Opin Plant Biol 17, 103–109, (2014).

    Article  PubMed  Google Scholar 

  2. 2.

    Iida, S. et al. Molecular adaptation of rbcL in the heterophyllous aquatic plant Potamogeton. PLoS One 4, e4633, (2009).

    ADS  Article  PubMed  PubMed Central  Google Scholar 

  3. 3.

    Nakayama, H., Yamaguchi, T. & Tsukaya, H. Acquisition and diversification of cladodes: leaf-like organs in the genus Asparagus. Plant Cell 24, 929–940, (2012).

    CAS  Article  PubMed  PubMed Central  Google Scholar 

  4. 4.

    Vlad, D. et al. Leaf shape evolution through duplication, regulatory diversification, and loss of a homeobox gene. Science 343, 780–783, (2014).

    ADS  CAS  Article  PubMed  Google Scholar 

  5. 5.

    Alonso-Blanco, C. & Koornneef, M. Naturally occurring variation in Arabidopsis: an underexploited resource for plant genetics. Trends Plant Sci 5, 22–29 (2000).

    CAS  Article  PubMed  Google Scholar 

  6. 6.

    Weigel, D. Natural variation in Arabidopsis: from molecular genetics to ecological genomics. Plant Physiol 158, 2–22, (2012).

    CAS  Article  PubMed  Google Scholar 

  7. 7.

    Al-Shehbaz, I. A. A generic and tribal synopsis of the Brassicaceae (Cruciferae). Taxon 61, 931–954 (2012).

    Google Scholar 

  8. 8.

    Appel, O. & Al-Shehbaz, A. In The Families and Genera of Vascular Plants (ed. Kubitzki, K.) 75–174 (Springer Verlag, 2003).

  9. 9.

    Stift, M., Luttikhuizen, P. C., Visser, E. J. & van Tienderen, P. H. Different flooding responses in Rorippa amphibia and Rorippa sylvestris, and their modes of expression in F1 hybrids. The New phytologist 180, 229–239, (2008).

    Article  PubMed  Google Scholar 

  10. 10.

    La Rue, C. Regeneration in Radicula aquatica. Michigan Academian 28, 51–56 (1943).

    Google Scholar 

  11. 11.

    Al-Shehbaz, I. A. & Bates, V. Armoracia lacustris (Brassicaceae), the correct name for the North American lake Cress. Journal of the Arnold Arboretum 68, 357–359 (1987).

    Article  Google Scholar 

  12. 12.

    Fassett, N. C. A Manual of Aquatic Plants. (University of Wisconsin Press, 1930).

  13. 13.

    Nakayama, H. et al. Regulation of the KNOX-GA gene module induces heterophyllic alteration in North American lake cress. Plant Cell 26, 4733–4748, (2014).

    CAS  Article  PubMed  PubMed Central  Google Scholar 

  14. 14.

    Nakayama, N., Nakayama, N., Nakamasu, A., Sinha, N. & Kimura, S. Toward elucidating the mechanisms that regulate heterophylly. Plant Morphology 24, 57–63 (2012).

    Article  Google Scholar 

  15. 15.

    Gabel, J. D. & Les, D. H. Neobeckia aquatica Eaton (Greene) North American Lake Cress. (New England Wild FlowerSociety, Framingham, MA., 2000).

  16. 16.

    Kaluthota, S. et al. Higher photosynthetic capacity from higher latitude: foliar characteristics and gas exchange of southern, central and northern populations of Populus angustifolia. Tree physiology 35, 936–948, (2015).

    CAS  Article  PubMed  Google Scholar 

  17. 17.

    Bushman, B. S., Amundsen, K. L., Warnke, S. E., Robins, J. G. & Johnson, P. G. Transcriptome profiling of Kentucky bluegrass (Poa pratensis L.) accessions in response to salt stress. BMC Genomics 17, 48, (2016).

    Article  PubMed  PubMed Central  Google Scholar 

  18. 18.

    Anchev, M. E. & Tomsovic, P. The Rorippa pyrenaica group (Brassicaceae) in the Balkan peninsula. Folia Geobotanica 34, 261–276 (1999).

    Article  Google Scholar 

  19. 19.

    Jonsell, B. In Flora Helenica (eds Strid, A. & Tan, K.) (Gartner Verlag, 2002).

  20. 20.

    Chitwood, D. H., Maloof, J. N. & Sinha, N. R. Dynamic Transcriptomic Profiles between Tomato and a Wild Relative Reflect Distinct Developmental Architectures. Plant Physiology 162, 537–552, (2013).

    CAS  Article  PubMed  PubMed Central  Google Scholar 

  21. 21.

    Sinha, N. R., Rowland, S. D. & Ichihashi, Y. Using gene networks in EvoDevo analyses. Curr Opin Plant Biol 33, 133–139, (2016).

    CAS  Article  PubMed  Google Scholar 

  22. 22.

    McKown, A. D. et al. Geographical and environmental gradients shape phenotypic trait variation and genetic structure in Populus trichocarpa. The New phytologist 201, 1263–1276, (2014).

    CAS  Article  PubMed  Google Scholar 

  23. 23.

    Fukushima, A. et al. Exploring tomato gene functions based on coexpression modules using graph clustering and differential coexpression approaches. Plant Physiol 158, 1487–1502, (2012).

    CAS  Article  PubMed  PubMed Central  Google Scholar 

  24. 24.

    Fukushima, A. DiffCorr: an R package to analyze and visualize differential correlations in biological networks. Gene 518, 209–214, (2013).

    CAS  Article  PubMed  Google Scholar 

  25. 25.

    Mohamed, A., Hancock, T., Nguyen, C. H. & Mamitsuka, H. NetPathMiner: R/Bioconductor package for network path mining through gene expression. Bioinformatics (Oxford, England) 30, 3139–3141, (2014).

    CAS  Article  Google Scholar 

  26. 26.

    Ichihashi, Y. et al. Evolutionary developmental transcriptomics reveals a gene network module regulating interspecific diversity in plant leaf shape. Proceedings of the National Academy of Sciences of the United States of America 111, E2616–2621, (2014).

    ADS  CAS  Article  PubMed  PubMed Central  Google Scholar 

  27. 27.

    Tamura, K., Stecher, G., Peterson, D., Filipski, A. & Kumar, S. MEGA6: Molecular Evolutionary Genetics Analysis version 6.0. Mol Biol Evol 30, 2725–2729, (2013).

    CAS  Article  PubMed  PubMed Central  Google Scholar 

  28. 28.

    Saitou, N. & Nei, M. The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol Biol Evol 4, 406–425 (1987).

    CAS  PubMed  Google Scholar 

  29. 29.

    Tamura, K. & Nei, M. Estimation of the number of nucleotide substitutions in the control region of mitochondrial DNA in humans and chimpanzees. Mol Biol Evol 10, 512–526 (1993).

    CAS  PubMed  Google Scholar 

  30. 30.

    Felsenstein, J. Phylogenies and the Comparative Method. The American Naturalist 125, 1–15 (1985).

    Article  Google Scholar 

  31. 31.

    Grabherr, M. G. et al. Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nature biotechnology 29, 644–652, (2011).

    CAS  Article  PubMed  PubMed Central  Google Scholar 

  32. 32.

    Götz, S. et al. High-throughput functional annotation and data mining with the Blast2GO suite. Nucleic Acids Res 36, 3420–3435, (2008).

    Article  PubMed  PubMed Central  Google Scholar 

  33. 33.

    Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25, 1754–1760 (2009).

    CAS  Article  PubMed  PubMed Central  Google Scholar 

  34. 34.

    Robinson, M. D., McCarthy, D. J. & Smyth, G. K. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics (Oxford, England) 26, 139–140, (2010).

    CAS  Article  Google Scholar 

  35. 35.

    Kohonen, T. Self-Organized Formation of Topologically Correct Feature Maps. Biol Cybern 43, 59–69, (1982).

    MathSciNet  Article  MATH  Google Scholar 

  36. 36.

    Wehrens, R. & Buydens, L. M. C. Self- and super-organizing maps in R: The kohonen package. J Stat Softw 21, 1–19 (2007).

    Article  Google Scholar 

  37. 37.

    Bastian, M., Heymann, S. & Jacomy, M. Gephi: an open source software for exploring and manipulating networks. In Proceedings of the Third International Conference on Weblogs and Social Media. AAAI Press, Menlo Park, CA, 361–362 (2009).

  38. 38.

    Maere, S., Heymans, K. & Kuiper, M. BiNGO: a Cytoscape plugin to assess overrepresentation of gene ontology categories in biological networks. Bioinformatics 21, 3448–3349 (2005).

    CAS  Article  PubMed  Google Scholar 

  39. 39.

    Genty, B., Briantais, J. M. & Baker, N. R. The Relationship between the Quantum Yield of Photosynthetic Electron-Transport and Quenching of Chlorophyll Fluorescence. Biochim Biophys Acta 990, 87–92 (1989).

    CAS  Article  Google Scholar 

  40. 40.

    Miyake, C., Amako, K., Shiraishi, N. & Sugimoto, T. Acclimation of tobacco leaves to high light intensity drives the plastoquinone oxidation system–relationship among the fraction of open PSII centers, non-photochemical quenching of Chl fluorescence and the maximum quantum yield of PSII in the dark. Plant Cell Physiol 50, 730–743, (2009).

    CAS  Article  PubMed  Google Scholar 

Download references


We thank Dr. Kaoru O. Yoshiyama for helpful discussions throughout our study. This research was partially supported by a Grant-in-Aid for Scientific Research on Innovative Areas (JP16H01472), JSPS KAKENHI (JP16K07408), and the MEXT-Supported Program for the Strategic Research Foundation at Private Universities (S1511023) to S.K., as well as by a Research Fellowship from JSPS (13J00161) to H.N. H.N. and N.S. were supported by a National Science Foundation grant (1558990). This work used computational resources and cyberinfrastructure provided by the iPlant Collaborative (, which is funded by NSF Grant DBI-0735191.

Author information




H.N. and S.K. conceived and designed the research. H.N., T.S., Y.O., K.K., M.F., Y.I., T.K., and S.K. performed the experiments. H.N., T.S., Y.O., Y.I., K.M., I.A., N.S., and S.K. wrote the article.

Corresponding author

Correspondence to Seisuke Kimura.

Ethics declarations

Competing Interests

The authors declare no competing interests.

Additional information

Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit

Reprints and Permissions

About this article

Verify currency and authenticity via CrossMark

Cite this article

Nakayama, H., Sakamoto, T., Okegawa, Y. et al. Comparative transcriptomics with self-organizing map reveals cryptic photosynthetic differences between two accessions of North American Lake cress. Sci Rep 8, 3302 (2018).

Download citation

Further reading


By submitting a comment you agree to abide by our Terms and Community Guidelines. If you find something abusive or that does not comply with our terms or guidelines please flag it as inappropriate.


Quick links

Nature Briefing

Sign up for the Nature Briefing newsletter — what matters in science, free to your inbox daily.

Get the most important science stories of the day, free in your inbox. Sign up for Nature Briefing