Comparative transcriptome analysis of field- and chamber-grown samples of Colobanthus quitensis (Kunth) Bartl, an Antarctic flowering plant

Colobanthus quitensis is one of the two vascular plants inhabiting the Antarctic. In natural habitats, it grows in the form of a cushion or mats, commonly observed in high latitudes or alpine vegetation. Although this species has been investigated over many years to study its geographical distribution and physiological adaptations to climate change, very limited genetic information is available. The high-throughput sequencing with a de novo assembly analysis yielded 47,070 contigs with blast-hits. Through the functional classification and enrichment analysis, we identified that photosynthesis and phenylpropanoid pathway genes show differential expression depending on the habitat environment. We found that the known ‘plant core environmental stress response (PCESR)’ genes were abundantly expressed in Antarctic samples, and confirmed that their expression is mainly induced by low-temperature. In addition, we suggest that differential expression of thermomorphogenesis-related genes may contribute to phenotypic plasticity of the plant, for instance, displaying a cushion-like phenotype to adapt to harsh environments.

Colobanthus quitensis (Caryophyllaceae) is a self-fertilizing species that grows in the form of a cushion or mats. It is distributed broadly from Mexico (17°N) to the Antarctic Peninsula (68°S) and at altitudes ranging from 0 to 4200 meters [10][11][12] and is known to exhibit considerable morphological variation depending on their habitats [10][11][12][13][14] . The morphology analysis of the C. quitensis populations along an Antarctic-subantarctic latitudinal gradient revealed that populations at higher latitudes have smaller and thicker leaves with higher mesophyll thickness, narrower adaxial surfaces, increased pigments and reduced epidermis 15,16 .
Structural and physiological characteristics responsible for the high tolerance of C. quitensis to extreme conditions were described in terms of the difference in leaf anatomical traits and ultracellular structures and varied photosynthetic responses [16][17][18][19][20][21][22] . In laboratory controlled conditions, the leaves of C. quitensis had anatomical and physiological changes when exposed to abiotic stressors such as low-temperature, high light or ultraviolet light [18][19][20][21]23 and the photosynthetic efficiency was primarily regulated by temperature 21,23 . In addition, field studies have shown that the OTC (Open Top Chamber) warming effects affect plant growth by influencing plant morphoanatomical traits, cellular chemical composition, and photosynthetic parameters 17,22 . These suggest that the developmental changes induced by temperature ultimately determine photosynthetic efficiency.
On the other hand, the adaptation of Antarctic C. quitensis to the extreme environment has been explained by population-specific genetic evolution [18][19][20] . The studies with two geographically separated populations have shown that the Antarctic ecotype of C. quitensis undergoes less photoinhibition than an alpine ecotype because it has better recovery rates of PSII than the alpine ecotype 19,20 . This suggests that the Antarctic ecotype of C. quitensis might have developed a unique stress response mechanism operating at the molecular level to survive in a harsh environment.
Despite previous studies on morphological differences and stress resistance of C. quitensis, there is no comprehensive report available on genetic regulation using plants growing in a natural habitat. To elucidate the regulatory networks involved in gene expression associated with environmental stress-driven tolerance responses and morphological plasticity of C. quitensis, we constructed a de-novo transcriptome assembly and compared the transcriptome profiles on plant samples collected from low-temperature environment of Antarctic natural habitat versus those grown with the mild growth condition in the laboratory. We classified the genes according to functional categories through GO and KEGG analysis, identified genes that differ in their expression level with the habitat, and deduced their potential functions through ortholog analysis with the model plant. Our findings may provide new insights into the genetic control system developed by plants adapted to the extreme environment.

Morphology of ANT and LAB samples of C. quitensis. C. quitensis plants have shown changes in mor-
phology and physiological responses with respect to geographical distribution and microclimate differences 18,20,24 . The transcriptome profile of extremophile plants in their natural habitats provides biological information as to how they optimize gene regulation to overcome environmental stresses. Therefore, to investigate the molecular and genetic mechanisms associated with stress tolerance and morphological plasticity due to environmental stress in C. quitensis, we compared the transcriptome profiles between plants inhabiting in Antarctic Barton Peninsula and plants cultivated in the mild growth condition in the laboratory. The Antarctic field plants designated ANT were collected in the Baton Peninsula in January 2013 (Fig. 1a). The laboratory cultures designated as LAB were cultivated at 16 °C, a temperature within the known optimal leaf temperature range (14~18 °C) for photosynthesis of C. quitensis 21 . The morphological differences between the LAB and ANT samples were observed during growth. While the LAB plants have longer leaf blades and elongated hypocotyls, the ANT plants have a denser structure consisting of short and thick leaves ( Fig. 1b-d).
Deep sequencing, de novo assembly, and functional annotation of the C. quitensis transcriptome.
Total 160,264,674 reads were generated from Illumina HiSeq. 2500 run. Trimming of raw sequences provided 143,972,486 high-quality reads (Q > 30) comprising 13.7 Gb of sequences. Using CLC Genome Workbench assembly module, the reads were assembled into 101,690 contigs averaging 764 bp and N50 value of 1,058 bp. The assembled contigs were clustered into 95,010 representative contigs using the CD-HIT-EST program (see Supplementary Table S1).
To annotate the assembled unigenes, the sequences were blast searched against GenBank nr or UniProt databases using the BLASTX algorithm. When we blasted them against the nr database, 47,070 contigs had at least one significant alignment to an existing protein data with an E-value of 1 × 10 −5 and 36,387 unigenes (38.3%) had mapping results (Supplementary Table S1). When searched in UniProt databases, 45,037 genes have orthologs (47.4%) with existing protein sequences in the database. Among those, 10,456 genes were predicted as complete or putative complete genes (11%), 11,837 genes were predicted as putative new genes (12.5%) and 38,116 genes remained as unknown (40.1%) (Supplementary Table S1). The species distribution of BLAST matches to the UniProt plant databases is shown in Supplementary Fig. S1. The majority of the genes were matched to the dicot genes, predominantly with A. thaliana (42%) and Vitis vinifera (11%).
To classify the functions of the predicted genes of C. quitensis, GO terms were assigned to total contigs. Based on classification of GO terms, a total of 26,346 contigs were assigned with at least one GO term and classified into different categories using the PlantGO slim terms ( Supplementary Fig. S2, Table S2). In terms of 'biological processes' , the top 2 GO terms were 'metabolic process' and 'cellular Process' . In terms of 'molecular function' , 'catalytic activity' and 'binding' were top 2 GO terms. To infer the functions and the utilities of the genes in the biological system, total contigs were blast queried against the KEGG databases 25 (organism code: ath) with E-value < 10 −10 . As a result, 11,950 contigs were mapped to KEGG pathways with hierarchy and they are classified into 5 functional categories ( Supplementary Fig. S3 specific pathways of the lower hierarchy, the top 3 pathways were 'ribosome' , 'protein processing in endoplasmic reticulum' and 'oxidative phosphorylation' (Supplementary Table S3).

SSR Identification.
Simple-sequence repeats (SSRs) are well established and have increasingly become the marker of choice for population genetic analyses due to their codominant, highly polymorphic and highly reproducible nature 26 . C. quitensis is an important species for studying plant ecology with geographic parameters. Studies about physiological differences involving genetic diversity have been performed among C. quitensis populations 18,19,27 . Given the research interests and the importance of C. quitensis, molecular markers for this species can be valuable resources. We identified EST-SSRs from assembled contigs during the transcriptome analysis of C. quitensis. A total of 8,619 distinct loci containing motifs between one and six nucleotides in size were discovered in 7,749 contigs. Approximately 60.5% of SSRs were mononucleotides, 28.3% were trinucleotides, 9.3% were dinucleotides and the remaining ~2% were tetra-, penta-, and hexanucleotides. The most frequent dinucleotide SSR was AG/CT (382/4.43%), however, for trinucleotide SSRs, AAC/GTT, AAT/ATT, ACC/GGT, AAG/CTT, and ATC/ATG show similar frequency in the range of 4.2-4.9% (Supplementary Table S4).

Identification of differentially expressed genes in ANT vs. LAB samples.
To compare the transcriptome of ANT and LAB samples, we remapped the reads generated from each library to the assembled 95,010 contigs and counted the read numbers that mapped to each contig. Statistical analysis revealed totally 3,902 differentially expressed genes (DEGs) in which 2,127 transcripts were upregulated and 1,775 transcripts were downregulated in ANT samples compared to LAB sample with a cutoff of FDR corrected p-value < 0.05 (beta-binomial test) (Supplementary Tables S5 and S6). Comparative GO enrichment analysis was conducted on the subsets of 3,900 DEGs, 2,127 upregulated DEGs, and 1,775 downregulated DEGs, respectively with the complete set The temperature of the site was measured from a temperature sensor installed 5 cm above the soil surface. During the period, the average temperature was 4.3 °C and 1.8 °C for day and night, respectively. Luminosity was close to zero from midnight to 04:00 AM. For ANT samples, plants were harvested between 11:00 and 13:00 during mid-to-late January (shaded). (b,c) Wild plants from the natural habitat of the Antarctic, and (d,e) a plant grown in a climate chamber. Side-bars indicate 1 cm. The C. quitensis found on the coast of the Baton Peninsula are often found around rocks and their overall shape is hemispherical or similar to moss carpets. When young plants with a diameter of 2 cm or less were transferred to a climate chamber maintained at a temperature of 16 °C and grown in nutrient medium, the density of leaves became much looser than those of similar sized Antarctic plants. of transcripts sets, with Fisher's exact test (FDR < 0.05). Among the GO terms included in 'biological process' category, the 'response to stress (GO: 0006950)' and the 'photosynthesis (GO: 0015979)' categories were the most enriched GO terms in the differentially expressed gene group (Fig. 2a). To elaborate, GO terms of 'response to stress' , 'response to (abiotic, biotic or external) stimulus' and 'extracellular region' were significantly enriched in both upregulated and downregulated gene subsets. In ANT compared to LAB transcriptomes, however, the GO terms of 'secondary metabolic process' , 'response to endogenous stimulus' , 'plasma membrane' , 'vacuole' , 'ribosome' and 'nucleolus' and the GO terms of 'photosynthesis' and 'carbohydrate metabolic process' , 'cellular homeostasis' and 'plastid' were enriched only in the upregulated and downregulated gene subsets, respectively ( Regarding the 'photosynthesis' pathway, it is noteworthy that putative PGR5 (PROTON GRADIENT REGULATION 5, Contig18861) gene, known as an essential protein for photoprotection 28,29 , was upregulated, while many photosynthesis-related genes were downregulated in ANT samples (Supplementary Tables S7-S10).
The increased expression of the 'plant core stress responsive genes' in ANT samples. When plants are treated with abiotic stress, a group of stress genes is commonly expressed at the early stage. This gene response is called 'plant core environmental stress response (PCESR)' , which is well established in Arabidopsis thaliana 31,32 . PCESR is not limited to Arabidopsis but is also found in other plant species, suggesting that PCESR to various abiotic stimuli is conserved in plants 31,[33][34][35] . We hypothesized that the plants growing in Antarctic fields would display high expression of abiotic stress tolerance genes to cope with the extreme environmental stress. Among the 8530 reciprocal-best-hit genes between two species ( Supplementary Fig. S4), 35 C. quitensis orthologs to the Arabidopsis PCESR genes were found (Table S11), and their expression was compared in LAB and ANT samples. As a result, about 40% (17/35) of the PCESR orthologs showed significantly high levels of transcripts (p < 0.05) in ANT samples (Table 2).
We verified their expression levels by quantitative RT-PCR (qPCR) and confirmed if the observed levels are induced by abiotic stimuli including low-temperature, salinity, or drought. Plants were subjected to each abiotic stress for 24 hours and the RNA expression of the genes between treated and non-treated samples was compared. Among the genes tested, 14/17 showed higher expression levels in ANT samples and were induced by abiotic  (Fig. 5). These included the genes with various biological functions such as transcription factors including an APETALATA2 (AP2)/ethylene response factor (ERF) protein (Contig14494), a NAC27-like (Contig38763), a MYB44-like protein Contig16997); C2H2 type zinc finger proteins including ZAT10 (Contig22074) and ZAT12 (Contig28814); a CCR4-associated factor (Contig45609); a calcium-binding protein (Contig38763); a DUF246-containing protein (Contig7995); a sugar transporter 13 (STP13)-like gene (Contig21564); a PUB23 E3 ligase family protein (Contig99760); a major facilitator superfamily gene (Contig15740); ring finger proteins (Contig34911, Contig20878); and a polygalacturonase inhibitor protein (Contig42834). It is noteworthy that these genes respond more specifically to low-temperatures among various abiotic stressors, suggesting that the temperature is the most important factor to induce stress signaling in plants.
Transcription factors associated with morphological changes in different conditions. C. quitensis is a representative cushion plant, which forms a dense hemisphere with short leaves, inhabiting the alpine and high latitude. When the field plants were transferred and grown in a 16 °C climate chamber, we noticed that the compact structure of C. quitensis loosened and the leaves elongated (Fig. 1c,d). Recent reports have shown that some PHYTOCHROME INTERACTING FACTORS (PIFs), bHLH transcription factors, are associated with thermomorphogenesis [36][37][38][39] . In particular, in the case of Arabidopsis PIF4 (AT2G43010), gene expression increases with increasing temperature, while gene expression decreases with decreasing temperature 36 . The length of hypocotyl and leaf petiole was longer than that of WT when these genes were overexpressed, and the opposite when they were knocked out 36,38,39 . These expression patterns and phenotype of transgenic plants suggest that they participate in leaf elongation caused by change in temperature 36,39,40 .
In this regard, we hypothesized that expression of PIF4 and its associated growth-promoting genes would be decreased in ANT samples that have smaller leaves. To prove the hypothesis, first, we tried to determine if the PIF genes of C. quitensis were altered in expression under field vs chamber conditions. A TBLASTN search using Arabidopsis PIF4 as a query yielded five hits (Contig8088, Contig41295, Contig35032, Contig62363, Contig64547) containing a bHLH DNA-binding domain with an E-box/N-box specificity site were found (E-value < 10 −10 ) (Fig. 6a). As expected, the expression levels of all orthologs were lower in the ANT sample, and Contig8088 showed a significant decrease (FDR corrected p-value < 0.05). In Arabidopsis, PIF4 activates the expression of growth-promoting genes such as HFR, IAA29, IAA19, ATHB, and SAUR23 37,39 . Therefore, we examined whether the expression of these gene orthologs known to be targets of PIF4 differs between ANT and LAB samples of C. quitensis. We identified the orthologs of those genes in C. quitensis transcriptome, and the expression values of ATHB2 (Contig12609), HFR (Contig41293), and SAUR23 (Contig10897) were lower in the ANT than LAB samples (Fig. 6b).
These results were reproducible in low-temperature-grown samples. Relative gene expression of orthologs of PIF4 and its downstream genes, ATHB2 and HFR were significantly reduced in both plants grown at 8 °C or 2 °C (Fig. 6C). As expected, leaf length was found to be reduced significantly in low-temperature-grown samples than those grown under control temperature conditions (Fig. 6d). Additionally, proliferation of root and shoot was observed when grown at 16 °C or above than at low-temperature conditions (data not shown). Taken together, the results suggest that the characteristic phenotype of field plants is associated with downregulation of PIF4 and its downstream genes.

Discussion
Plant growth changes due to temperature variations, hyperthermal or hypothermal, has been reported in many studies [41][42][43][44] . In the case of Arctic and Antarctic plants, there have been several reports about the phenotypic plasticity. The chilling temperatures have shown to affect the organelle ultrastructure and the organization of palisade cells in C. quitensis mesophyll 13,24 . The plants naturally growing in the Antarctic field and the cold-acclimated plants, both have deformed chloroplasts with multi-shaped protrusions and invaginations to increase the contact area between adjacent cell compartments 24,45 . The phenotypic plasticity is also evident in another Antarctic vascular plant, Deschampsia antarctica, a Poaceae flowering plant growing in Antarctic regions 46,47 . Recent in situ warming tests on C. quitensis have shown that the photosynthesis response to warming is regulated by the anatomical determinants of leaf CO 2 transfer, which enhanced mesophyll conductance and carbon assimilation, thereby promoting higher leaf carbon gain and plant growth 17,22 . This suggests that phenotypic plasticity by temperature plays a major role in plant growth and adaptation.
In this study, we observed that the leaves of C. quitensis elongated when grown in a growth chamber, the dense cushion shapes of the plants found in nature were loosened. It is expected that plants grown in Alpine/ polar habitats form dense cushions with short leaves to minimize heat loss. On the contrary, when the temperature rises, the density of leaves in a plant decreases with prolonged leaves, thus the CO 2 transfer would be increased, promoting plant growth. Therefore, we suggest that there are developmental regulatory genes that cause anatomical changes of the leaf with temperature and that the PIF transcription factors play a role in this developmental change. There is growing evidence that PIFs act as positive regulators in cell elongation and integrate multiple signaling pathways 37 . Especially, in the case of PIF4 that promotes cell growth, its gene expression is temperature-dependent. Although PIFs are well known for the mechanisms involving thermomorphogenesis associated with high temperature 36,48 , there is not much discussion of the ecological benefits derived from their regulatory roles at low-temperature. In this regard, we observed that the expression of all PIF4 orthologs was found to be lower in the Antarctic samples and the expression of genes such as HFR, IAA19, ATHB, and SAUR23, known as downstream targets, was also found to be lower in the ANT samples than in the LAB samples. In addition, the expression of PIF4 ortholog was dependent on temperature, and the phenotype we observed was in agreement with Kumar et al. 36 .
Phenylpropanoid biosynthesis pathway produces a wide variety of secondary metabolites which contribute to all aspects of plant responses towards biotic and abiotic stimuli, with lignins and flavanones being representative products 30 . In this study, almost all genes involved in the 'phenylpropanoid biosynthesis' pathway were strongly upregulated in ANT samples implying that products and its intermediates could have crucial roles to protect plant cells from abiotic and biotic stress. In Arabidopsis, Zea mays, Eucalyptus globulus, Lotus japonicus and Miscanthus, the major enzymes of the phenylpropanoid pathway such as PAL, CCR, CAD, CHS and CHI have been shown to be strongly induced by cold, drought, UV irradiation or pathogen infection but the gene expression patterns are different according to tissue-and species-specific 33,[49][50][51][52][53][54] . In this regard, transcriptional up-regulation of phenylpropanoid biosynthesis genes in ANT samples may be the result of cross-talk responses due to various environmental factors such as cold, drought and pathogenesis, and this result is correlated with the observation that C. quitensis plants grown in open areas with relatively low ambient temperature exhibited a higher fraction of cell wall chemical components (hemicellulose, cellulose and lignin content) than plants grown inside OTC 22 .
Overall DEG analyses revealed that the transcriptome of field plants of C. quitensis has major changes in 'photosynthesis' genes and 'stress response' genes to adapt to an extreme environment. These results can be compared with transcriptome results from another extremophyte, Eutrema salsugineum, which has a very clear phenotypic plasticity in natural habitats vs. controlled chambers 55,56 . The transcriptome studies of this species have also shown that stress response genes and photosynthetic genes are correlated with phenotypic plasticity at field conditions 56 , suggesting that there may be a common mechanism for how extremophytes adapt to the environment.
In Antarctic C. quitensis, about 40% of PCESR orthologs (14 of the 35 genes), which are known to be induced early in response to abiotic stress in Arabidopsis, were expressed at high levels. In our in-silico study and qPCR expression assay, we identified 14 PCESR genes with diverse functions that were highly expressed in ANT samples and induced by abiotic stresses. The stress response is initiated by the intracellular Ca 2+ change and the generation of ROS 3,57 . This intracellular signaling induces the expression of transcriptional regulatory elements, which in turn induce the expression of downstream genes by binding to cis-elements in the regulatory regions 3,57 .  Among the 14 genes, calmodulin-like proteins and transcription factor genes such as AP2/ERF, NAC, and MYB were included. C2H2 zinc finger proteins ZAT10 and ZAT12 maintained high levels of expression in ANT samples and were specifically induced by low-temperature 58,59 . ZAT12 is activated by different abiotic stresses and is required for systemic H 2 O 2 signaling 58 . Another C2H2 zinc finger protein, ZAT10, is known to be systemically regulated and induced by different stresses and essential for priming for systemic acquired acclimation 59 . We have confirmed that orthologs of ZAT10 and ZAT12 are expressed at high levels in ANT samples of C. quitensis Figure 5. qPCR analysis of PCESR -orthologs from C. quitensis in LAB-vs. ANT samples and the plants exposed to different abiotic stressors. The main graphs are the results of the abiotic stress treated samples and inbox graphs are the results of the LAB-vs. ANT samples. The genes were selected from the 14 PCESR orthologs which have higher expression values in ANT samples from RNA-Seq results. For stress treatment, plants at similar developmental stages were treated with low-temperature (2 °C), high salt (150 mM NaCl) or dehydration for 1 and 7 days. The left vertical axis indicates the relative ratio of transcript abundance of selected genes compared to 18S rRNA and TIM (Contig19814) which were predicted as the best internal controls by reference gene prediction tools 71,72 . Mean and standard deviation are presented (n = 3) and white, grey and black bars display the 0, 1 and 7 days of stress treatments, respectively. The primer sequences used in qPCR analysis are listed in Supplementary Table S12. experiencing various environmental stresses and presume that the expression of these genes will help maintain intracellular homeostasis in response to environmental stress. In addition, an E3-ligase, ring finger proteins, a PGIP protein and various kinds of transporter proteins, which have been reported to be involved in modulation of abiotic stress responses by posttranslational modification, inhibiting of ice-recrystallization or transport of metabolites 47,60,61 , were also highly expressed in ANT samples, and expression was induced by cold stress as well.
It should be noted that we selected genes that are known to be commonly expressed in the early stages of various stress reactions, and we found that they have higher levels of transcripts in ANT samples that have undergone environmental stress such as low-temperature, high salt, drought, or high UV. However, when we stressed plants individually in the laboratory, many genes were induced by low-temperatures, but not by high salt or dehydration. We cannot completely exclude the possibility that we did not see the earlier reaction by setting a single time point after 24 h in our experiment. However, we would expect that the main cause of gene induction by abiotic stress in field sample is low-temperature.
In this study, we found that multiple adaptations such as modulation of energy metabolism, change in PCESR gene expression pattern, and morphological adaptions are present in C. quitensis through a global analysis of field transcriptome of C. quitensis. The results suggest that plants adapted to the extreme environment have developed a modified genetic control system as a survival strategy against harsh conditions. Thus, the C. quitensis transcriptome profile broadens our understanding of how plants tolerate extreme environments and their adaptive responses to climate change.  Fig. 1A. The air temperature of the site was measured from a temperature sensor installed 5 cm above the soil surface. During the warm months, temperatures dropped below 0 °C for only a few days, and the maximum temperature was 14 °C. The average monthly temperature was ca. 2.7 °C. Maximum PAR intensity varied from ca. 1,845 μmol m −1 s −1 on a clear day (19 th January 2013) to ca. 629 μmol m −1 s −1 on a cloudy day (15 th January 2013). The luminosity was close to zero from midnight to 04:00 AM. We harvested ANT samples between 12:00 ~ 14:00 local time.
Growth conditions of laboratory plants. Some  For drought stress treatments, plants were transferred onto 3 mm filter paper, exposed to air in a clean bench for 30 min. Subsequently, the plants on filter papers were transferred back to the empty growth boxes and cultivated until harvest 32 . Plants were harvested for RNA extraction after 0, 1 and 7 days under each condition and prepared in three different biological replicates. All lab−cultured samples were harvested 8 hours after lights-on.

Leaf length measurement.
To compare morphological differences depending on temperature, plants grown at 16 °C with 1.5 ~ 2 cm in horizontal diameter were transferred to growth chambers at 2 °C and 8 °C and cultured for 4 weeks. The length of individual leaves was measured by ImageJ programs (https://imagej.nih.gov/ij/). All plants for measurements were prepared with five biological replicates. Statistical analysis was performed by student's t-test (p < 0.05).
RNA extraction and RNA-Seq library construction. Total RNAs were extracted using TRIzol reagent, treated with DNase I (QIAGEN, Hilden, Germany) to remove contaminant genomic DNA, and were subsequently purified using the RNeasy mini kit (QIAGEN) following manufacturer's protocols. Three biological replicates were prepared. RNA integrity and concentration were determined using a Bioanalyzer (RIN > 6) (Agilent Technologies, Waldbronn, Germany) and a Qubit ® RNA Broad-range Assay Kit (Thermo Fisher Scientific, Waltham, MA, USA), respectively. To construct RNA-Seq libraries, 1.5 µg of total RNA from each sample was used as input for the TruSeq RNA sample prep kit v2 (Illumina, San Diego, CA, USA). The libraries were quantified using a Bioanalyzer and the library qPCR quantification method following the Illumina guideline. After quantification, they were multiplexed in equal ratios and loaded onto a single flow cell of Illumina HiSeq Rapid SBS kit v2 (2 × 100 runs). Sequencing was performed on a HiSeq. 2500 Sequencer system (Illumina) and a total of 16.2 Gb (160M paired-end reads) of sequencing data were generated (Q30 > 93%).
De novo assembly and annotation. De novo assembly was performed using the CLC Genomics Workbench v7.5 software (QIAGEN). The reads were filtered by trimming adapter sequences, excluding low-quality sequences (quality score <0.001, ambiguity <2 bps) and removing too short sequences (length >50 bps) and duplicates. The resulting reads were assembled with following parameters (word size = 20 and bubble size = 50, length >200 bps), and then the assembled contigs were clustered using CD-HIT 63 . A total of 95,010 assembled contigs were subjected to BLASTX searches against the non-redundant protein database with an E-value threshold of 1 × 10 −5 . Gene ontology (GO) mapping and annotation were performed with an annotation cutoff of E-value < 1 × 10 −10 and using Blast2GO platform 64 . GO enrichment analysis was performed with using AgriGO 65 with Fisher's exact test (FDR < 0.05). Putative full-length cDNAs were predicted by comparison of BLASTX reports from the UniProt databases with a web-based ORF prediction tool, Full-lengther 66 . To identify orthologs, translated sequences of C. quitensis derived from ORF prediction and the Arabidopsis protein sequence datasets (TAIR version 10.0) which were obtained from the JGI Phytozome website (https://phytozome.jgi.doe.gov) were compared. The orthologs were identified by reciprocal best-hit analysis for selecting BLASTP parameters with options of soft masking and Smith-Waterman alignment (-seg yes -soft_masking true -use_sw_tback), lowest e-value, query coverage >50 and protein identity >50 as best hits 67 . To map the contigs to the metabolic pathways, the translated sequences of contigs were blasted to the pathways of Kyoto Encyclopedia of Genes and Genomes (KEGG) databases using internal annotation tool in KEGG website 25 . KEGG enrichment analysis was performed using a KOBAS web server 68  and t-tests were performed on the normalized read counts. In addition, several relevant values for analysis, such as p-values and corrected p-values for multiple corrections, were calculated in the "two-group comparison" option. Through the statistical analysis, the differentially expressed genes were determined using a cut-off value (corrected p-value of false discovery rate <0.05 and difference ≠ 0). qPCR analysis. Total RNA was extracted from leaves of plant samples and purified using the RNeasy Plant Mini Kit (QIAGEN) as described previously. cDNA was synthesized from 2 ug total RNA extracted from samples using Superscript III (Invitrogen). Gene-specific primers were designed according to the sequences of the contigs and are listed in Supplementary Table S12. To select internal control genes, we selected 7 candidate genes which have steady expressions at both LAB and ANT transcriptome data as follows: 18S rRNA (contig32901), UBC28, ubiquitin-conjugating enzyme E2 28-like (contig3602), RPB6A, DNA-directed RNA polymerase subunit (con-tig9755), TIM, triosephosphate chloroplast-like (contig19814), CHC, clathrin heavy chain (contig6535), GLX2-4, lactoylglutathione chloroplast-like (contig18505), RPL3, 50S ribosomal protein L3 (conti39479). To validate the fitness of the reference genes, we performed the amplification efficiency test and the gene stability test by RefFinder (http://leonxie.esy.es/RefFinder/) 71 . The amplification efficiency between 95-105% (R 2 > 0.97) was determined as good. And RefFinder analysis carried out using Ct values obtained from all experimental samples. As results of both analyses, ranking for reference gene was followed: TIM > 18S > GLX2-4 > UBC28 > RPL3 > RPB6A > CHC. Also, geNorm algorism 72 gave a suggestion for a combination of TIM and 18S genes, and thus we used them as references genes. The primer information is listed in Supplementary Table S12. A qPCR was performed with biological triplicates using SYBR ® Premix Ex Taq ™ DNA polymerase (Takara Bio Inc., Shiga, Japan) and the