Genotypic and tissue-specific variation of Populus nigra transcriptome profiles in response to drought

Climate change is one of the most important challenges for mankind in the far and near future. In this regard, sustainable production of woody crops on marginal land with low water availability is a major challenge to tackle. This dataset is part of an experiment, in which we exposed three genetically differentiated genotypes of Populus nigra originating from contrasting natural habitats to gradually increasing moderate drought. RNA sequencing was performed on fine roots, developing xylem and leaves of those three genotypes under control and moderate drought conditions in order to get a comprehensive dataset on the transcriptional changes at the whole plant level under water limiting conditions. This dataset has already provided insight in the transcriptional control of saccharification potential of the three Populus genotypes under drought conditions and we suggest that our data will be valuable for further in-depth analysis regarding candidate gene identification or, on a bigger scale, for meta-transcriptome analysis.


Background & Summary
Ongoing climate change, entailing more frequent and severe drought events, put biomass productivity at an increasing risk 1-3 , especially on marginal land. Considering the increasing demand for both, food production and feedstock, energy crop production systems should preferentially utilize perennial crops grown on marginal sites 4 . Consequently, there is a need to develop new germplasms of perennial biomass crops characterized by high productivity and the ability to maintain productivity under water limited growth conditions 5 . To this end, it is pivotal to understand the underlying physiological mechanisms and molecular drivers of drought stress responses and growth adjustment of trees, especially mechanisms underlying adjustments of growth in response to water deprivation.
Because of its adaptation to a broad range of habitats, its fast growth and available molecular and genomic resources, the genus Populus is considered as a model system for woody biomass plants 6 . The publication of the full genome sequence of Populus trichocarpa 7 fostered the application of systems biology approaches on a multitude of Populus species, including most widespread species like P. tremuloides and its European counterpart P. tremula. In Europe, P. nigra, European black poplar, is also widely distributed, and this is reflected in significant phenotypic variation in growth rates, tree architecture and leaf size 8 . P. nigra is a candidate for second generation biofuels that use lignocellulosic biomass 9-11 as well as a raw material for pulp and paper production 12 . As a pioneer species, P. nigra can also be used to preserve local biodiversity, as it is able to outcompete exotic poplar species. Therefore, it is used in restoration and protection of riparian forest sites 13 . In Eastern Europe, P. nigra is planted for soil protection and used for restoration areas polluted by industrial usage 14 . As a species perceived as both, economically and ecologically important, P. nigra is targeted in studies aiming to elucidate the genetic basis of variation in adaptive traits 8,15,16 .
In order to characterize traits that play a conserved role across populations adapted to contrasting natural habitats we included three P. nigra genotypes originating from areas with different water availability. The genotypes were previously shown to be genetically differentiated 8 . This genetic variance between the three subgroups originates from the last glacial period, where there were three refugia for P. nigra 8 in Europe from which the species recolonized Europe. Among the three genotypes, the Spanish genotype grows on the driest land, while the Italian genotype derives from the most humid area 8,17 .
Here, trees of each genotype were exposed to a gradually increasing, precisely controlled, moderate drought for five weeks before harvest. Transcriptome profiles of the developing xylem, fine roots, and fully-expanded young leaves were prepared in order to get a holistic insight into the transcriptional reprogramming in specific tissues and between different genotypes. Of these tissues roots sense drying soil first 18,19 ; the xylem is the water conducting system in plants 20 ; and leaves play the key role in transpiration and gas exchange 21 . Thus, this study covers all pivotal tissues involved in the water balance of plants.
Our data set opens the possibility for analyses in various directions. One could either look at conserved gene clusters across all three genotypes to identify gene families that are important for drought acclimation or other important traits. An example for this kind is the investigation of the transcriptional drought response in the developing xylem 22 , which constitutes a subset of the present data. Weighted gene correlation network analysis identified genes correlated significantly with the saccharification potential of the wood, which was enhanced under drought 22 . Interestingly, no genes involved in lignin biosynthesis were found to be correlated with drought, but polysaccharide biosynthesis genes were upregulated, underpinning the improved saccharification potential 22 . This study gave a first glimpse in the power of the data set generated in this study. In-depth analyses of the whole data set regarding the acclimation of specific tissues to drier conditions and the crosstalk between tissues has not been done yet. In addition, investigations of the intraspecies genetic variation can assist identifying genetic markers that can be used for predictive breeding. Previous studies have mostly concentrated on intraspecific variation in single tissues 23,24 . Our dataset combines data on different tissues and genotypes. We therefore believe that this data set will be of keen interest for the scientific community and can add a piece to the puzzle to understand drought adaptation of trees.

Methods
Plant material. Three genotypes of Populus nigra L., originating from natural populations in France, Italy and Spain were studied. The three genotypes represent clones of individual trees sampled from the populations Drôme 6 (FR-6), La Zelata (IT1), and Ebro 2 (SP-2) 8 . Cuttings were planted in 10-liter plastic pots filled with a 1:1 (v/v) mixture of peat and sand fertilized with a slow-release fertilizer (4 g L −1 of Nutricote T100, 13:13:13 NPK and micronutrients; FERTIL S.A.S, Boulogne Billancourt, France). Plants were grown in two chambers of a glasshouse located at Champenoux, France (48°45′09.3″N, 6°20′27.6″E), under natural light conditions. Growth conditions in the greenhouse were affected by weather conditions, but the temperature was adjusted to not exceed 28 °C, and daily maxima of irradiance ranged from 73-478 W m −2 . Plants were watered three times per day to field capacity on a custom-made automated watering system. Drought experiment. After six weeks of growth, plants of each genotype were randomly assigned to either a control or a drought treatment. The plants were allocated to the two greenhouse chambers in balanced proportions according to genotype and treatment. Plants were exposed to drought by gradually decreasing the available soil water content. The regulation of soil water availability was based on the soil relative extractable water content (REWsoil), which is defined as: REWsoil = (SWC -water content at wilting point)/(water content at field capacity -water content at wilting point), with SWC: soil volumetric water content; water content at wilting point = 3%; water content at field capacity = 32%. Control plants were watered to 85% REWsoil three times per day for the whole 5-week period of the experiment. For drought-treated plants, a target level of 20% REWsoil was defined, which was reached two weeks after starting to gradually withhold water. Plants were watered at this target level of REWsoil for the following three weeks 22,25 (Fig. 1a).
After five weeks of control or drought treatment, all plants were harvested destructively. We used four individual plants from each genotype and water level (3 genotypes × 4 plants × 2 water levels = 24 plants) for the harvest of 3 tissues ( = 72 samples for further analysis). Different tissues per plant were harvested in parallel by several persons to achieve rapid sampling times. Leaf number 10 from the top of each plant was cut off, weighed and flash frozen in liquid nitrogen. The stem was cut at the base, the bark was removed from the lower part and the developing xylem, a soft tissue, was scratched from the surface and immediately frozen in liquid nitrogen. The root system with soil was carefully removed from the pot and briefly washed under tap water to clean fine roots, which were then cut off. The surface water from the collected fine roots was quickly dabbed off by tissue paper and then roots were frozen in liquid nitrogen. The samples were stored at −80 °C. For the extraction of RNA, each tissue was milled to a fine powder keeping the sample frozen by cooling the device under liquid nitrogen. Of each sample 150 mg of frozen tissue was weighed into the extraction medium for RNA. The procedure of RNA sampling to count table is depicted in Fig. 1b. RNA extraction, library preparation and sequencing. Total RNA was extracted from homogenized samples of a young fully expanded leaf, fine roots, and developing xylem of four biological replicates per genotype and treatment using the CTAB protocol 26 with minor modifications described in 27 . After checking quality and integrity (see Technical Validation), 2 µg of total RNA were used for library preparation using the 'TruSeq mRNA Sample Prep kit v2' (Illumina, San Diego, CA, USA), following the manufacturer's instructions. Libraries were www.nature.com/scientificdata www.nature.com/scientificdata/ then processed with Illumina cBot for cluster generation on the flowcell, following the manufacturer's instructions and sequenced in 50 bp single-end mode at the 6-fold multiplex on the Illumina HiSeq2000 (Illumina, San Diego, CA, USA).

Data Records
Raw data of RNA-seq analysis are deposited at the NCBI short read archive under SRP numbers SRP095832 28 (dev. xylem samples) and SRP101711 29 (fine root and leaf samples). Each biological replicate refers to a single SRA Sample Accession (SRS accession, Online-only Table 1). Raw count data are available at figshare data

Technical Validation
RNA integrity was determined using Agilent 2100 Bioanalyzer RNA Nano assay (Agilent technologies, Santa Clara, CA, USA). Average RIN values were 7.9 ± 0.5 for the developing xylem, 7.2 ± 0.4 for fine roots and 7.2 ± 0.3 for leaf samples. qRT-PCR validation of selected genes has been performed by Wildhagen and colleagues 22 . www.nature.com/scientificdata www.nature.com/scientificdata/ Quality assessment. FastQC Version 0.11.9 was applied to the clean reads to assess the quality of the sequence reads 32 . A summary of FastQC reports was generated using MultiQC 33 . Figure 2 shows that the Mean Quality Scores and the Per Sequence Quality Scores of all sequencing results were of high quality indicated by the green color in the diagrams. Per Sequence GC content was consistent at 44-45%. www.nature.com/scientificdata www.nature.com/scientificdata/ Data filtering and processing. Raw sequence reads were processed with the Python package Cutadapt v1.4.2 34 to remove residual adapter contamination. Reads were subsequently trimmed to remove low-quality reads (option -trim_qual_left/right = 25) and reads shorter than 40 nucleotides, using the PRINSEQ software v.lite-0.20.4 35  Principal component analysis. As a first step of exploratory data analysis, a PCA across all samples of the RNA-seq data was performed. Normalization of counts was performed using the VST (variance stabilizing transformation) function of DESeq2 v1.32.0 package for R 39,40 . The plot of the first two PCs revealed a strong clustering according to tissue type, suggesting strong variation of transcript abundance among tissues (Fig. 3a). Further PCA analyses of leaf (Fig. 3b), developing xylem (Fig. 3c) and fine roots (Fig. 3d) transcriptomes revealed separate clustering of the different genotypes. This result indicates putatively interesting differential gene expression patterns between the different origins ( Fig. 3b-d). PCA analysis has been performed with R 40 using the packages DESeq2 v1.32.0 39  Initial analysis of differentially expressed genes. Analyses of the count data were done with the R package DESeq2, version 1.34.0 39 Normalisation of count tables was done based on the 'median ratio method' 42 implemented in the function 'estimateSizeFactors' . We applied an unspecific filtering 43 to keep only those genes to which at least one read per million reads of library size aligned in at least four samples.
The analysis of differential expression was carried out by fitting two-factorial negative binomial generalized linear models (function 'DESeq') to the count data. To assess the significance of a 'treatment' main effect, a full model with ´treatment´ and 'genotype' main effects was set up, against which a reduced model without 'treatment' main effect was tested with the function 'nbinomLRT' . The sets of genes showing a significant drought main effect were used for a genotype specific analysis of drought effects. For this purpose the data set was split according to tissue and genotype, and for each combination of tissue and genotype, a full model with 'treatment' main effect was set up, against which a reduced model with intercept only was tested with the function 'nbiom-LRT' . The R code for the DEG analysis is available under Figshare R Code DEG analysis 44 . Venn diagrams have been made using an online Venn diagram tool (https://bioinformatics.psb.ugent.be/webtools/Venn/). Analysis of the drought main effect, i.e. without differentiation of the genotypes revealed that the majority of differentially expressed genes (DEGs) were tissue specific under drought (92.8% in fine roots, 62% in the developing xylem and 86.3% in the leaf, Fig. 3e). Interestingly, no shared gene was found to be differentially regulated in all three tissues. When these drought related DEGs shown in Fig. 3e were further analyzed on genotype level, it was found that the majority of drought-related DEGs in fine roots (Fig. 3f) and leaves (Fig. 3h) were genotype specific. However, in the developing xylem (Fig. 3g) there were no genotype specific DEGs found for the Spanish and French genotype. This initial analysis shows, that this dataset can be used to identify conserved drought acclimation processes that are shared by all three genotypes, as well as genotype specific drought responses of P. nigra.