Datasets of seed mucilage traits for Arabidopsis thaliana natural accessions with atypical outer mucilage

The seeds of Arabidopsis thaliana become encapsulated by a layer of mucilage when imbibed. This polysaccharide-rich hydrogel is constituted of two layers, an outer layer that can be easily extracted with water and an inner layer that must be examined in situ in order to study its properties and structure in a non-destructive manner or disintegrated through hydrolysis or physical means in order to analyze its constituents. Mucilage production is an adaptive trait and we have exploited 19 natural accessions previously found to have atypical and varied outer mucilage characteristics. A detailed study using biochemical, histological and Time-Domain NMR analyses has been used to generate three related datasets covering 33 traits measured in four biological replicates. This data will be a rich resource for genetic, biochemical, structural and functional analyses investigating mucilage constituent polysaccharides or their role as adaptive traits.


Background & Summary
Seeds of a number of plant species, including Arabidopsis (Arabidopsis thaliana), become surrounded by sticky mucilage when imbibed. A range of roles has been suggested for this polysaccharide-rich hydrogel, such as aiding germination, dispersion, seedling growth or interaction with soil microorganisms (reviewed by Yang et al. 1). In the reference accession for the model plant Arabidopsis, Columbia (Col-0), the major component of mucilage is the pectin rhamnogalacturonan I (RG-I), which is organized in two distinct layers that differ in their polysaccharide composition and structure 2. This suggests that the outer water-soluble layer and the inner adherent layer could perform different ecophysiological functions 2.
Although Arabidopsis is used widely as a model for geneticists, it is a widespread weed whose native range covers most of Europe to central Asia. As mucilage is an adaptive trait, its functional advantages are likely to influence the dynamics and evolution of natural Arabidopsis populations. Natural variation in the outer water-soluble layer of Arabidopsis mucilage was recently reported for 306 natural Arabidopsis accessions 3,4. Large variations were observed in the amount and properties of the constituent polysaccharides. Nonetheless, the composition of the outer mucilage layer was stable between genotypes, with RG-I always being the major constituent of outer mucilage. Analysis of the inner mucilage layer is more complex, as the polysaccharides adhere tightly to the seed surface and hydrolysis of the biopolymers into fragments is required. To date, the detailed composition of the inner mucilage layer has only been determined for a limited number of accessions used to generate induced mutant collections (Col-0, Col-2) 2,5. As the inner mucilage layer can be observed using the cytochemical stain ruthenium red, the visual aspect of inner mucilage was previously examined for 280 accessions 6. This method identified fifty variants that differed in the size of the inner mucilage layer. Observation with ruthenium red can, however, only give an indication of major differences in the width of the inner mucilage layer, and this is not necessarily an indication of more or less polysaccharides, as the hydrophilic properties, molar mass and conformation of the pectin polymers can alter the volume they occupy 7,8. Moreover, seed size can vary between natural variants and the volume of the mucilage layer may appear bigger or smaller due to these differences. Furthermore, loss of adhesion of inner layer pectin in the muci70 mutant was recently proposed to be linked to modified macromolecular characteristics, as outer mucilage RG-I polymers were shorter in this mutant 4, which suggests that polymer length contributes to adhesion through intermolecular entanglement. This highlights that much is still unknown concerning the physicochemical requirements for the formation of the inner layer.
To study the variability in mucilage traits in more detail, we have carried out a detailed characterization of both inner and outer mucilage traits for 19 natural variants identified previously as exhibiting atypical outer mucilage macromolecular properties 4; these included the reference accession Col-0 (Table 1).
Histological, biochemical and Time-Domain NMR (TD-NMR) analyses were used to generate three datasets: dataset 1 contains 182 899 variables for 33 mucilage and seed traits, dataset 2 comprises raw NMR data files and dataset 3 comprises 4560 values measured following microscope acquisition. In addition to confirming previous values for outer mucilage composition and intrinsic viscosity (IV), the size of the hydrated inner mucilage layer, seed and mucilage width were each measured on images acquired following labeling of the seed surface with a cellulose-specific stain (DR23) and the periphery of the inner mucilage layer with an antibody recognizing RG-I epitopes (INRA-RU1 9). The amount of the major RG-I sugars in the inner layer was determined following hydrolysis of pectin polymers with rhamnogalacturonan hydrolase. Finally, to obtain information about the mobility of water in interaction with macromolecules in different compartments of seed and mucilage, TD-NMR was carried out on dry seeds and over a period of 23 h of imbibition using either intact seeds or seeds pre-treated to remove outer mucilage. The different steps in data production are summarized in Fig. 1.

Methods
Plant material and growth conditions. The 19 accessions used in this study (Table 1) were obtained from the Versailles Arabidopsis stock center (http://publiclines.inra.fr/naturalAccession/index) and are listed by their Versailles identification number in a four-digit format (e.g. 0001 for accession AV1). These were chosen as outliers from the 306 accessions analysed previously 4 and included accessions with extreme phenotypes for each of the four macromolecular traits examined, some of which also exhibited atypical mucilage amounts or composition. Plants were grown in a chamber with 65% relative humidity and 170 µmol m⁻² s⁻¹ of light, for the first three weeks with a 16 h photoperiod at 21 °C and 8 h dark at 18 °C, followed by 6 weeks at 6 °C with an 8 h photoperiod to synchronise flowering when subsequently returned to a 16 h photoperiod. Plants were grown in compost (Tref substrates) following a randomized sowing plan in two independent series of plants grown together, with twenty-four plants of each genotype per series. To differentiate these from the seed stocks produced previously to study outer mucilage traits 3, these were termed series c, grown from November 2014 to March 2015, and series d, grown from April to August 2015. Four independent biological replicates were produced for analyses by bulking seed harvested from different plants. These were assigned sample codes c1, c2, d1 and d2, corresponding to two independent lots derived from bulks of 10 to 12 plants from series c or d, respectively (Fig. 2). Ten seeds from each lot were weighed using a Sartorius M2P microbalance.

Biochemistry. Outer mucilage (sample type 0) was water-extracted (4 mL) from seeds (200 mg) and analysed as described previously 3 (Fig. 3a). Briefly, after 3 h of head-over-tail mixing at 20 °C and centrifugation (8000 g, 5 min), water extracts were filtered through a disposable glass microfilter (13 mm diameter, 2.7 μm pore size) and analysed colorimetrically for galacturonic acid (GalA) and total neutral sugar (NS) contents 10,11. Both quantification methods are based on the conversion of sugars into furfural derivatives in the presence of hot sulfuric acid. These derivatives can then condense with various phenolic compounds to produce a colored complex that can be quantified colorimetrically. Acidic sugars can be quantified specifically using meta-hydroxybiphenyl (m-phenylphenol or 3-phenylphenol) 12 while neutral sugars can be quantified using orcinol (3,5
Fig. 1 Schematic representation of the production of datasets 1, 2 and 3 for mucilage and seed traits for Arabidopsis accessions with atypical outer mucilage. Data were generated using four seed lots obtained from bulks of independent plants that had been produced at two different times, corresponding to series c or d. Analyses of the sugar composition, macromolecular properties, water mobility during imbibition, and mucilage and seed width for the 19 accessions generated raw and treated data available in three datasets.
Histochemical staining and immunolabeling of inner mucilage. Seed and inner mucilage layer size (Fig. 3c) were determined following immunolabelling and staining of seeds with an anti-RG-I antibody (INRA-RU1 9) and the cellulose-specific fluorescent dye Direct Red 23 (DR23) essentially as previously described 15, except that 1% (w/v) powdered milk was used for the blocking solution and seeds were mounted for observation directly in the DR23 counterstain. As outer mucilage is lost during the immunolabelling procedure, the seeds analysed correspond to sample type 5. The INRA-RU1 antibody labels the periphery of the inner mucilage while DR23 labels the cellulose within the inner mucilage and the cell walls on the seed surface. Observations were performed with a Zeiss LSM710 confocal microscope using 488 nm or 561 nm lasers to excite Alexa Fluor 488® or DR23, respectively. Fluorescence emission was detected between 500 and 550 nm for Alexa Fluor 488® and between 565 and 640 nm for DR23. For each seed lot, measurements were obtained from 30 seeds using Zen software (dataset 3 16) and the mean value calculated (dataset 1 17).
Time-domain NMR. Seeds were either analysed directly (sample type 3 or 6) or after removal of water-soluble mucilage (sample type 2 or 5). The latter were prepared by mixing 350 mg of seeds in 10 mL of water for 3 h at 20 °C. Extracts were then centrifuged at 8000 g for 3 min and supernatants carefully removed. Seeds were rinsed four times with 10 mL of water and freeze-dried. Dehydrated seeds with (sample type 3) or without soluble mucilage (sample type 2) were stored at room temperature before being analysed by TD-NMR in the dry state or imbibed in water (Fig. 3b).
A Time-Domain spectrometer (Minispec BRUKER, Germany) operating at 0.47 T (resonance frequency of 20 MHz) was used to measure T2 relaxation times. The temperature of samples was regulated at 20 °C with a temperature control device (±0.1 °C) connected to a calibrated optical fiber (Optoprim; France). The NMR tubes were filled with dry seeds or dry seeds and water (Fig. 3b) as previously described 18. Tubes were then weighed and hermetically sealed. Acquisitions of T2 were carried out first on dry seeds and then from 3 min (t0) to 23 h (H23) of imbibition. The FID-CPMG sequence used the following parameters: a 90° pulse close to 2.8 µs, a dwell time of 0.4 µs for a FID duration of 150 µs, 16 scans, a recycle delay of 5 s, and an echo time of 0.2 ms with 5000 or 16000 data points, depending on the genotype and/or the seed state (dry, with or without soluble mucilage). Transverse relaxation data were analyzed using a multi-component model in which T2i and T2j are the proton relaxation times of the solid phase of seeds and of the more mobile populations (water and oil protons), respectively, and Ii and Ij are the corresponding NMR signal intensities. Dry seed samples were characterized by four T2 components (T2_1, _2, _3, _4). Compared to previous studies performed on the reference accession Col-0 and two Arabidopsis insertion mutants 18,19, the present analyses, acquired with a longer FID signal, made it possible to identify an additional T2 component at around 100 µs for imbibed seeds, termed T2_2a, while the previously identified component was termed T2_2b, so that in total six T2 components (T2_1, _2a, _2b, _3, _4, _5) were identified for imbibed seeds. For seeds from which outer mucilage had previously been removed by extraction (see above), the longer T2 relaxation time T2_5 was lost and T2_3 split into T2_3a and T2_3b (Table 2).
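The explicit form of the relaxation model is not given in the text here. As a sketch only, a form commonly used for FID-CPMG signals combines Gaussian decays for the solid-phase components with exponential decays for the mobile components. The Gaussian line shape for the solid phase is an assumption, not something the source confirms; the symbols follow the definitions given above.

```latex
I(t) \;=\; \sum_{i} I_{i}\,\exp\!\left[-\left(\frac{t}{T_{2i}}\right)^{2}\right]
\;+\; \sum_{j} I_{j}\,\exp\!\left(-\frac{t}{T_{2j}}\right)
```

Under this assumed form, each fitted pair (I, T2) describes one proton population, so the six components reported for imbibed seeds would correspond to six such terms in total.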
Each of these T2 components could be assigned to populations of protons in water or oil having different mobilities and proportions 18.
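The acquisition parameters quoted for the FID-CPMG sequence can be cross-checked with simple arithmetic. The sketch below is illustrative only: the function names are hypothetical, and it assumes (the source does not say so explicitly) that each CPMG data point corresponds to one echo, so that the sampled window equals the number of points multiplied by the echo time.

```python
# Cross-check of the TD-NMR acquisition parameters quoted in the text.
# Assumption (not stated in the source): one CPMG data point per echo.

# Proton gyromagnetic ratio divided by 2*pi, in MHz per tesla.
GAMMA_H_MHZ_PER_T = 42.577


def larmor_frequency_mhz(field_tesla):
    """Proton resonance frequency (MHz) at a given field strength (T)."""
    return GAMMA_H_MHZ_PER_T * field_tesla


def fid_points(fid_duration_us, dwell_time_us):
    """Number of FID samples for a given duration and dwell time (both in µs)."""
    return round(fid_duration_us / dwell_time_us)


def cpmg_window_s(n_points, echo_time_ms):
    """Length of the sampled CPMG decay in seconds, assuming one point per echo."""
    return n_points * echo_time_ms / 1000.0


def min_recycle_time_s(n_scans, recycle_delay_s):
    """Lower bound on acquisition time, counting recycle delays only."""
    return n_scans * recycle_delay_s


if __name__ == "__main__":
    print("Larmor frequency at 0.47 T (MHz):", larmor_frequency_mhz(0.47))
    print("FID points (150 µs, 0.4 µs dwell):", fid_points(150, 0.4))
    print("CPMG window, 5000 points (s):", cpmg_window_s(5000, 0.2))
    print("CPMG window, 16000 points (s):", cpmg_window_s(16000, 0.2))
    print("Recycle time for 16 scans (s):", min_recycle_time_s(16, 5))
```

Under these assumptions the 0.47 T field is consistent with the stated 20 MHz resonance frequency, and the two data-point settings correspond to decay windows of roughly 1 s and 3.2 s.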

Data Records
All three data records use the same sample nomenclature for input and this is explained in Fig. 2 and Table 2.
Data record 1.
Table 2. Nomenclature used for data in records and indication of sample type analyzed for each trait variable. Each variable measured is listed with its corresponding code, unit of measure and the sample type analyzed (for sample code see
the sample type 6, are available in datasets described in data records 3 and 2, respectively. An overview of the dataset is shown in Table 3 with the following nine columns:
1. sample_code: the sample code (see Fig. 2)
3. cultivation_series: c or d
4. seed_lot: biological replicate 1 or 2
5. sample type: the sample type analyzed (0, 1, 2, 3, 5, 6) (see Fig. 2)
6. variable: the code of the variable (see Table 2 for the description)
7. value: the measured value
8. date: the date of acquisition in the format year-month-day, or NA if no time points
9. time: the time of acquisition in the format hour:minute:second, or NA if no time points
Data record 2. Dataset 2 consists of NMR raw data files whose name is the combination of the dataset number, the seed lot (2), the cultivation series (c or d) followed by "imb-", the imbibition time (t0 or H23) and the accession code. Data correspond to the TD-NMR raw data of intact seeds imbibed in water (code 6) at initial (imb-t0) or final (imb-24H) imbibition time. The format is supplier-imposed (.dps) but can be read by any application using tabular formats. The files comprise three columns: the first indicates the total number of recorded
Table 4. In addition to the nine columns found in dataset 1, an additional column indicates the number of the seed measured. The dataset comprises 4560 values corresponding to measurements of 30 seeds from each seed lot. The dataset is accessible on the Data INRAE site 16.
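Because the records are stored in long format (one measured value per row), a typical first step is to group rows by variable and summarise the four biological replicates. The following is a minimal sketch under stated assumptions: the rows, sample codes and values are invented for illustration, the column written above as "sample type" is named sample_type here, the variable codes GalA and IV are placeholders for the codes in Table 2, and the example holds a single accession because the second column is not described in the text above.

```python
import csv
import io
from collections import defaultdict
from statistics import mean, stdev

# Invented demonstration rows for one accession; not real data.
# Column names follow the list above; "sample_type" is an assumed spelling.
ROWS = """sample_code,cultivation_series,seed_lot,sample_type,variable,value,date,time
demo_c1,c,1,0,GalA,10.0,NA,NA
demo_c2,c,2,0,GalA,10.4,NA,NA
demo_d1,d,1,0,GalA,9.8,NA,NA
demo_d2,d,2,0,GalA,9.8,NA,NA
demo_c1,c,1,0,IV,500,NA,NA
demo_c2,c,2,0,IV,520,NA,NA
demo_d1,d,1,0,IV,480,NA,NA
demo_d2,d,2,0,IV,500,NA,NA
"""


def replicate_summary(text):
    """Return n, mean and coefficient of variation (%) per variable.

    Replicates are keyed by cultivation series plus seed lot (c1, c2, d1, d2).
    """
    groups = defaultdict(dict)
    for row in csv.DictReader(io.StringIO(text)):
        replicate = row["cultivation_series"] + row["seed_lot"]
        groups[row["variable"]][replicate] = float(row["value"])

    summary = {}
    for variable, by_replicate in groups.items():
        values = list(by_replicate.values())
        avg = mean(values)
        summary[variable] = {
            "n": len(values),
            "mean": avg,
            # Sample standard deviation relative to the mean, in percent.
            "cv_percent": 100.0 * stdev(values) / avg,
        }
    return summary


if __name__ == "__main__":
    for variable, stats in replicate_summary(ROWS).items():
        print(variable, stats)
```

As a check on the stated size of dataset 3, 19 accessions × 4 seed lots × 30 seeds gives 2280 seeds, so 4560 values would be consistent with two measurements per seed; this is an inference from the numbers given, not something the text states.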

Technical Validation
The technical quality of the dataset was validated through the use of four biological replicates of seed lots in the different analyses; replicates were produced from plants cultivated in a randomised format to ensure that any environmental effects from their position within the growth chamber, or from plant neighbours, were minimised. The reproducibility of results was examined for biochemical and histological analyses based on the variation between the four replicates, with the highest variation observed being under 7% (Table 5). Furthermore, certain analyses carried out on outer mucilage are equivalent to those previously carried out by Poulain et al. 3, notably for GalA, NS and IV, and these were compared to validate the reliability of measurements and seed lots (Fig. 4). Good proportionality between values was observed for all three variables, with R² values of 0.82, 0.80 and 0.96, respectively. For quantification of sugar concentrations, a standard curve was established using standard solutions of Rha or GalA at 20, 40, 60, 80 and 100 µg/mL, which were measured both before and after a series of samples to confirm technical rigour. HP-SEC columns were calibrated for IV using both a calibrant and a standard sample passed at the beginning, middle and end of a series of samples to check that no drift occurred over time.
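The standard-curve step described above amounts to a linear fit of signal against known concentration, which is then inverted to read off sample concentrations. The sketch below shows one way this could be done. The five standard concentrations come from the text; the absorbance readings and function names are invented for illustration, and the assumption of a linear response over this range is ours.

```python
from statistics import mean

# Standard concentrations from the text (µg/mL).
STANDARDS_UG_PER_ML = [20, 40, 60, 80, 100]
# Hypothetical absorbance readings, invented for this example.
ABSORBANCE = [0.11, 0.20, 0.31, 0.40, 0.51]


def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def r_squared(x, y, slope, intercept):
    """Coefficient of determination of the fitted line."""
    my = mean(y)
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def concentration(absorbance, slope, intercept):
    """Invert the standard curve to estimate a concentration (µg/mL)."""
    return (absorbance - intercept) / slope


if __name__ == "__main__":
    slope, intercept = fit_line(STANDARDS_UG_PER_ML, ABSORBANCE)
    print("slope:", slope, "intercept:", intercept)
    print("R^2:", r_squared(STANDARDS_UG_PER_ML, ABSORBANCE, slope, intercept))
    print("sample at A = 0.256:", concentration(0.256, slope, intercept))
```

The same r_squared helper could be applied to paired values from two studies, as in the comparison with earlier measurements, and fitting the standards measured before and after a series separately would give a simple check for drift.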
The NMR spectrometer underwent a daily control procedure in accordance with the manufacturer's recommendations. In addition, the T2 relaxation time and intensity of a reference sample (mineral oil) were controlled each day at spectrometer temperature (around 40 °C). The optical fiber used to regulate sample temperature to 20 °C was calibrated before a series of measurements. In order to validate NMR results, data processing was performed using two different methods that were expected to converge: discrete 20 and continuous maximum entropy (MEM 21,22). Each of these methods was performed using two different codes listed below (Table 6). Moreover, the T2 times and amplitudes obtained with Col-0 samples used here were compared for reproducibility with those obtained previously with different Col-0 samples 18. The reliable acquisition of images by the confocal microscope is certified through annual recalibration of the system, parfocality and light, head scan lens focalisation and collimator by Zeiss, France.

Code availability
The software and versions used to acquire and process the data presented in the datasets are summarized in Table 6.