Article

Spatially resolved transcriptome profiling in model plant species

  • Nature Plants 3, Article number: 17061 (2017)
  • doi:10.1038/nplants.2017.61

Abstract

Understanding complex biological systems requires functional characterization of specialized tissue domains. However, existing strategies for generating and analysing high-throughput spatial expression profiles were developed for a limited range of organisms, primarily mammals. Here we present the first available approach to generate and study high-resolution, spatially resolved functional profiles in a broad range of model plant systems. Our process includes high-throughput spatial transcriptome profiling followed by spatial gene and pathway analyses. We first demonstrate the feasibility of the technique by generating spatial transcriptome profiles from model angiosperm and gymnosperm microsections. In Arabidopsis thaliana we use the spatial data to identify differences in expression levels of 141 genes and 189 pathways in eight inflorescence tissue domains. Our combined approach of spatial transcriptomics and functional profiling offers a powerful new strategy that can be applied to a broad range of plant species, and is an approach that will be pivotal to answering fundamental questions in developmental and evolutionary biology.

The study of specialized tissue domains and their interactions is essential for a comprehensive understanding of the function of biological systems, including the wide variety of plant species. Functional aspects such as growth, development, biochemistry and physiology ultimately derive from underlying gene expression programmes that are actively regulated in different organs and tissues. Specific molecular conditions are present at every morphological level (whole organism, organ, tissue, cell), rendering spatially resolved, high-resolution and high-throughput analyses of tissues crucial. This need has motivated the development of several methods capable of generating such spatial information1,2,3. However, these methods have so far only been established for mammalian systems, and have been lacking for plants4. Nonetheless, RNA-sequencing5,6 (RNA-seq) or microarray profiling7 studies at the level of individual tissues or cell types have been achieved in plants. Techniques to isolate the input material for such studies include fluorescence-activated cell sorting (FACS) of fluorescently labelled cells or nuclei8,9,10, isolation of nuclei tagged in specific cell types (INTACT)11 and laser capture microdissection (LCM)12,13,14,15,16 of tissue sections. Although these techniques can provide spatially resolved or cell-type-specific gene expression data, they each have inherent limitations. The FACS and INTACT methods require transgenic plant lines expressing, respectively, fluorescent proteins or the biotinylated nuclear envelope protein in the cell type of interest. Moreover, FACS in plants requires the formation of protoplasts, which in certain species can be difficult to produce. A major drawback of LCM is the limited sample throughput because of laborious experimental procedures. Furthermore, LCM and FACS can be limited by the yield and purity of the targeted cell types.

Plant tissues and cells present a number of specific challenges compared with those of mammalian systems, including the plant cell wall, vacuoles, chloroplasts and secondary metabolites17,18. These characteristics demand considerably different operational procedures compared with mammalian tissues, and often necessitate species- or tissue-specific optimizations. Therefore, there is a pressing need to ascertain whether existing technologies developed in mammalian tissues can be adapted for use in plants. Moreover, it is of key importance to find efficient ways to mine the resulting datasets to identify genes and gene networks of interest, such as those that control developmental processes. Gene networks reflect potential interactions among genes and can provide a systematic view of the molecular mechanisms underlying targeted biological processes19. Here, we present the first targeted method for transcriptome-wide spatially explicit gene expression profiling in plants. The method is an optimized extension of a high-resolution and high-throughput strategy originally developed for mammalian systems to visualize and quantify gene expression in tissue sections, and exploits arrayed reverse transcription primers containing unique positional barcodes1.

We demonstrate the robust and broad applicability of the method in A. thaliana inflorescence meristem, Populus tremula developing and dormant leaf buds, and Picea abies20 female cones. These exemplary species and tissues represent a broad catalogue of histologically and biologically variable samples across herbaceous and woody plant model systems of the distant angiosperm and gymnosperm phylogenetic groups. Furthermore, we use the spatially resolved data to identify differences in expression patterns, both at the gene and the pathway level, in A. thaliana inflorescence tissue domains.

Results

Specific alterations to treat plant tissue sections on barcoded arrays

We employed P. tremula dormant leaf bud cross-sections, A. thaliana inflorescence and P. abies female cone longitudinal sections to systematically evaluate a large number of variations applied to the previously presented method1 (Fig. 1a) to enable exploitation of the approach in different plant systems. The technique described previously1 consists of positioning tissue cryo-sections on top of a 1,007-spot array of 6.2 × 6.6 mm². Each spot is 100 µm in diameter and contains 200 million oligonucleotides, which have a spot-specific positional barcode attached to an oligo(dT) primer (Fig. 1a) and a semi-randomized unique molecular identifier (UMI)21,22 used for the quantification of gene-specific transcripts. Tissue sections are first subjected to histological treatment (fixation, staining and high-resolution imaging). We introduced alterations in the fixation (lower concentration of formaldehyde solution; shorter incubation time) and staining processes (Toluidine blue staining) to preserve and visualize plant-specific morphological structures (cell wall) and, at the same time, to prevent an irreversible attachment of the plant section to the array surface (see Methods). Subsequently, tissue sections were permeabilized to allow the capturing of polyadenylated transcripts by the positional probes, which function as primers for reverse transcription when reagents are added onto the tissue section (Fig. 1a). For plant cryosections, we optimized controlled enzymatic permeabilization of tissue structures for each plant and tissue type. Moreover, we included steps in the protocol to capture secondary metabolites, both during permeabilization and cDNA synthesis, as these are abundant in several plant tissues and often inhibit reverse transcription23. After cDNA synthesis, the tissue is removed and the mRNA–cDNA hybrids are released from the array surface to be processed for sequencing library generation. In this phase, we developed cocktails of enzymes for targeted hydrolysis of species-specific plant cell wall components to effectively degrade plant tissue sections without affecting the captured gene expression information (Supplementary Fig. 1). Finally, tissue section morphology was combined with the barcoded gene expression information, providing spatially resolved transcriptomics data (Fig. 1a).
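The way positional barcodes and UMIs combine into spot-level expression values can be illustrated with a short sketch. The Python snippet below is only an illustration under stated assumptions, not the processing pipeline used in this study: it assumes that each sequenced read has already been reduced to a hypothetical (barcode, gene, UMI) triple, and it counts distinct UMIs per spot and gene so that amplification duplicates are collapsed. The function name, barcode labels and UMI sequences are all hypothetical.

```python
from collections import defaultdict


def count_unique_transcripts(reads):
    """Count distinct UMIs per spot barcode and gene.

    `reads` is an iterable of (barcode, gene, umi) tuples. This input format
    is an assumption made for illustration only.
    Returns {barcode: {gene: number of distinct UMIs}}.
    """
    umis = defaultdict(set)
    for barcode, gene, umi in reads:
        # A UMI seen twice for the same spot and gene is counted once.
        umis[(barcode, gene)].add(umi)

    counts = {}
    for (barcode, gene), seen in umis.items():
        counts.setdefault(barcode, {})[gene] = len(seen)
    return counts


# Hypothetical reads: two spots, with one duplicated UMI in the first spot.
example_reads = [
    ("SPOT_A", "At5g62690", "AACGT"),
    ("SPOT_A", "At5g62690", "AACGT"),  # same UMI: treated as a duplicate
    ("SPOT_A", "At5g62690", "GGTCA"),
    ("SPOT_B", "At1g75940", "TTAGC"),
]
spot_counts = count_unique_transcripts(example_reads)
```

In this sketch the barcode decides where a transcript is placed on the tissue image, while the UMI decides whether a read counts as a new transcript or as a copy of one already seen.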

Figure 1: Spatially resolved transcriptome profiling in plants.

a, The barcoded array surface presents 1,007 spots (i), each of which contains 200 million barcoded oligonucleotides to capture polyadenylated transcripts. Oligonucleotides contain UMIs to enable identification of unique transcripts per gene. Barcodes are spot-specific. ii, Plant tissue sections are positioned on top of the array surface, fixed and stained with conditions optimized to preserve and visualize plant morphological structures, and imaged. iii, Species-specific treatments for tissue permeabilization are applied to allow a controlled vertical diffusion of transcripts to the array surface where they bind to the corresponding barcoded oligonucleotides. cDNA synthesis is performed utilizing the barcoded oligonucleotides as primers. iv, Degradation of the plant tissue section is achieved using an enzyme cocktail tailored for the specific cell wall components of the tissue. Release of mRNA–cDNA hybrids is followed by library preparation and sequencing. v, Histological information and barcoded gene expression profiles are combined allowing for spatial gene expression studies. b, Visualization of spatially localized cDNA synthesis in three plant species obtained with Cy3-labelled nucleotides on top of arrays entirely covered by oligo(dT) primers (Supplementary Fig. 1). Fluorescence imaging is performed after tissue degradation. i, P. tremula dormant leaf bud cross section composed of five leaf primordia characterized by a spiral shape (ii). A. thaliana inflorescence meristem longitudinal section with different floral stages (iv), some of them containing stamens (S) and carpels (C) characterized by high levels of gene expression (v). P. abies female cone longitudinal section (vii), comprising lateral organs (LOs), parenchymal tissue (PT), and meristematic tip, a site of high gene expression (viii). iii, vi and ix are magnifications of ii, v and viii, respectively.

To assess stringent confinement of transcripts under individual plant cells effectively, we used oligo(dT) primers immobilized on a glass slide and incorporation of fluorescently labelled nucleotides during cDNA synthesis1 (Supplementary Fig. 2). The reconstruction of a morphologically defined fluorescent image of the plant cell transcripts after cDNA synthesis and subsequent tissue removal provided a solid demonstration that the introduced alterations allowed for localized capture of transcripts (Fig. 1b).

To investigate potential biases in the detection of gene expression from permeabilized tissue sections, we compared expression profiles obtained from intact P. tremula cross-sections enzymatically permeabilized on the array surface with those identified using standard tissue disruption methods of whole sections (pestle and mortar). The number of genes detected in all replicate samples of permeabilized tissue sections (17,043) was almost double the number of common detected genes in ground sections (9,558; Supplementary Fig. 3). Enzymatic treatment of tissue sections therefore allowed highly reproducible detection of unique transcripts across replicates (r = 0.91–0.97; Supplementary Fig. 3). In contrast, ground, whole cross-section samples were less consistent among replicates (r = 0.80–0.88; Supplementary Fig. 3) and resulted in the detection of substantially fewer unique transcripts at equivalent sequencing depth. The correlation between permeabilized and ground tissue sections was similar to the correlation range of ground tissue sections (r = 0.77–0.89; Supplementary Fig. 3).
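The replicate comparisons above rest on correlation coefficients between expression profiles. The sketch below shows one way such a value could be computed, assuming that r denotes Pearson's correlation coefficient (the text does not state which coefficient was used) and that each replicate is represented by a list of per-gene expression values in the same gene order. The function name and the example values are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of expression values."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical per-gene values for two replicates (same gene order).
replicate_1 = [1.0, 2.0, 3.0, 4.0]
replicate_2 = [2.0, 4.0, 6.0, 8.0]
r = pearson_r(replicate_1, replicate_2)
```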

Reproducibility in angiosperm and gymnosperm tissue sections

We examined technical reproducibility by using P. tremula cross-sections from developing and dormant leaf buds24 (Fig. 2a). These buds are small yet have distinctly defined and replicated biological structures (immature leaves), rendering their intrinsic biological properties suitable for such an assessment of within- and between-sample reproducibility. Three P. tremula cross-sections were taken from each of two summer bud samples, with the three sections from each bud positioned onto two separate arrays (one array per bud). Additionally, one or two cross-sections taken from two different winter buds were positioned onto two separate arrays. This experimental design allowed us to investigate both technical and biological variability. Hierarchical clustering of the highly expressed genes in spots under the tissue sections identified a clear and consistent pattern of gene expression among sections on the same array and between replicated bud sections on different arrays, both for the developing and dormant buds (Fig. 2b). A number of spots displayed a characteristic pattern of having consistently low expression for all of the represented genes across a number of the assayed buds (Fig. 2b). Examination of the corresponding spots revealed that they were largely located under the surrounding bud scale material rather than under leaves (Fig. 2b).

Figure 2: Method reproducibility in angiosperm and gymnosperm species.

a, P. tremula leaf buds are formed early in the growing season with each bud typically containing six developmentally arrested leaves. Immature buds become developmentally arrested early in the summer, after which they develop bud scales that protect the immature leaves from winter conditions and become dormant. Bud dormancy is broken during the winter and buds flush after a period of exposure to warmer spring temperatures, at which point the pre-formed leaves rapidly emerge and expand. b, P. tremula gene expression heat map of the 408 highly expressed genes in two developing and dormant leaf buds. Each colour bar represents one cross-section. The black arrows indicate spots with low gene expression. The tissue image represents tissue section 2, in developing bud 2, having two spots mostly localized under scale tissue. The PCA plot analyses the 160 spots represented in the heat map grouped for developing and dormant buds. c, The PCA plot of the 219 P. abies spots under three female cone tissue sections (i). ii, The PCA plot as in i with colouring based on the different tissue domains of the three replicates. Tissue image of P. abies replicate 2 with spots localized under lateral organs (LO) and parenchymal tissue (PT). d, Bean plots indicating the number of genes and transcripts per spot in A. thaliana replicates. The black line indicates the average number of genes/transcripts per spot in each replicate; the dashed line indicates the average number of genes/transcripts (2,929/8,061) per spot in all three replicates.

In addition to demonstrating the within-bud type reproducibility of the data, we also observed reproducible, differential expression between the bud types. This pattern was apparent in a principal component analysis (PCA) plot of all expressed genes detected in the two bud types (Fig. 2b). Orthogonal projections to latent structures by means of partial least squares (OPLS) analysis resulted in a model with high values for both the predictive (Q2 = 0.94) and orthogonal (R2 = 0.98) components, confirming the biological separation between the two sample types. We identified 1,253 differentially expressed genes between the two bud types (Supplementary Table 1). There were no gene ontology (GO) terms significantly enriched among genes more highly expressed in dormant buds, but those more highly expressed in developing buds included translation (GO:0006412), photosynthesis (GO:0015979) and a number of categories involved in cell division, differentiation and genome organization, including cell morphogenesis (GO:0000902; Supplementary Table 1). A number of homologues of known developmental regulators in A. thaliana were identified, including members of the GRF (Potri.001G392200, Potri.002G099800, Potri.002G103800, Potri.005G164500, Potri.007G007100, Potri.011G110900), YABBY (Potri.001G120200, Potri.001G214700, Potri.003G112800, Potri.014G066700, Potri.018G129800) and KNAT (Potri.006G190000) families, which were also previously identified as potential regulators of leaf development from systems biology analyses in Populus25,26.

The intra- and inter-experiment reproducibility of the method was also confirmed in P. abies female cone tissue sections (Supplementary Fig. 4). One or two sections of the same female cone were placed on two different arrays. A PCA plot of gene expression in the spots under the three tissue sections did not reveal any sample specific separation (Fig. 2c). Moreover, a spatial division of spots underlying lateral organs and parenchymal tissue of the female cone (Fig. 2c) within each section was evident.

Finally, we confirmed technical reproducibility in longitudinal sections of the same A. thaliana inflorescence assayed using three arrays (Supplementary Fig. 5). The average number of genes and transcripts per spot (2,929 and 8,061 respectively) was very similar within and between replicates, providing clear evidence of reproducibility (Fig. 2d).

Validation of spatially resolved transcriptomics data in plants

To validate our method, we compared the spatial gene expression detected in the three A. thaliana technical replicates with the A. thaliana microarray Atlas of Development (AtGenExpress Development)27, which contains information for most of the organs present in our samples, thus representing a suitable dataset for validation. We considered five corresponding tissue domains: the stem, meristematic area, and flowers of stages 9, 10, 11 and 12 (Fig. 3a).

Figure 3: Validation of the method on three A. thaliana tissue sections.

a, The tissue domains utilized for the validation analysis (ST, stem; MST, meristematic area; S9, S10, S11 and S12: flowers stage 9, 10, 11 and 12, respectively). b, The distribution among the three replicates of the number of spots (107 total) per tissue domain. c, The distribution of gene expression in the spots selected for the validation analysis. d, The proportion of true positives (TPs), true negatives (TNs), false positives (FPs) and false negatives (FNs) among the tissue domains in the three replicates.

We considered 20,215 genes distributed among 107 spots in total (Fig. 3b), corresponding to the five tissue domains listed above. A total of 14,009 genes were shared among all tissue domains, and 6–11% were replicate specific (Supplementary Fig. 6). Gene expression patterns showed similar normal distributions between tissue domains and replicates (Fig. 3c). The AtGenExpress Development dataset contained 20,922 genes distributed among the tissue domains considered, of which 15,648 were common to the spatial gene expression dataset.

By establishing specific gene expression thresholds in both datasets (Supplementary Fig. 6), we calculated false positives, false negatives, true positives and true negatives for all five tissue domains in each of the three replicates (Fig. 3d, Supplementary Fig. 6 and Supplementary Table 2). Overall, false positives were the least represented category in any tissue region or replicate, with true positives and true negatives often equally represented (Fig. 3d). Taken together, spatial gene expression achieved 58.5% sensitivity, 92.9% specificity, 71.2% accuracy, and 6.5% false-positive rate (FPR).
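The summary statistics above derive from counts of true and false positives and negatives. The following sketch shows the conventional textbook definitions of these quantities. It is an illustration only: the counts are hypothetical, and because the way the reported values were aggregated across tissue domains and replicates is not specified here, the sketch is not expected to reproduce them.

```python
def validation_metrics(tp, tn, fp, fn):
    """Conventional confusion-matrix summaries (assumed definitions)."""
    if min(tp, tn, fp, fn) < 0:
        raise ValueError("counts must be non-negative")
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one reference positive and one reference negative")
    return {
        # Fraction of reference-expressed genes that were also detected.
        "sensitivity": tp / (tp + fn),
        # Fraction of reference-absent genes that were also not detected.
        "specificity": tn / (tn + fp),
        # Fraction of all calls that agree with the reference.
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        # Fraction of reference-absent genes that were detected anyway.
        "fpr": fp / (fp + tn),
    }


# Hypothetical counts for one tissue domain in one replicate.
metrics = validation_metrics(tp=60, tn=90, fp=10, fn=40)
```

A large number of false negatives lowers sensitivity and accuracy while leaving specificity untouched, which matches the pattern of moderate sensitivity alongside high specificity reported above.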

Global gene expression pattern analyses

We performed two types of spatial gene expression analysis: at the per-gene level and the global level. We demonstrated precise visualization of the spatial location for two key genes involved in flowering: At5g57720, which is known to be expressed in stage three and nine flowers27,28, and ATA27 (At1g75940), which is expressed in the tapetum layer of stage 11 and 12 flowers29. In addition, we spatially visualized expression of TUB2 (At5g62690), a known housekeeping gene27 (Fig. 4a); five genes involved in the ABC model of floral organ identity: APETALA1 (AP1) and APETALA2 (AP2)30, known to be expressed in early flowers, sepals and petals; APETALA3 (AP3)31 and PISTILLATA1 (PI)32, expressed in petals and stamens; and AGAMOUS (AG)33, expressed in stamens and carpels (Supplementary Fig. 7). All of the ABC-related genes showed an evident pattern of gene expression in the floral part of our inflorescence sections. Moreover, there was a clear predominance of identified spots under the stamens for APETALA3, PISTILLATA1 and AGAMOUS compared with APETALA1 and APETALA2, as expected.

Figure 4: Types of gene expression studies allowed by spatial transcriptomics data in A. thaliana.

a, The expression profiles at the per-gene level for At5G57720, At1G75940 and At5G62690. The spots where the gene expression was detected are highlighted. b, The spots in one A. thaliana replicate coloured according to the cluster identity in the hierarchically clustered t-SNE analysis. c, Classification of tissue domains at the micro-category level employed in the linear model analysis. d, Validation of 141 genes differential in tissue micro-categories. Green points, P-values from actual data; red points, P-values after randomly permuting spot labels; vertical dotted line, 0.1% quantile of permuted P-values (roughly equals 0.001, which corroborated the correctness of the model and was used for estimating FDR); horizontal dotted line, arbitrary threshold P(H0) = 0.05. e, Examples of differentially expressed genes detected between tissue domain micro-categories by the linear model. f, Differential values of 189 pathway scores from NEA. Colour coding as in d. g, Examples of altered pathways detected by the linear model.

We investigated global gene expression patterns in the three A. thaliana inflorescence sections by applying the t-distributed stochastic neighbour embedding (t-SNE)34 machine learning algorithm for dimensionality reduction followed by hierarchical clustering. Spots clustered into four groups, which corresponded to the four main different organ types present in the tissue sections: stem, meristematic area, flower reproductive organs, and sepal and petals (Fig. 4b and Supplementary Fig. 8).

We further performed quantitative investigation of gene expression profiles in spatially defined domains within the inflorescence meristem of A. thaliana. Differential expression analyses are widely applied35,36, but these are most often based on comparisons between two, or a set of, conditions/tissue domains. The analysis of spatial gene expression data requires a more complex analysis model that includes components representing biological and technical replicates, spots and defined tissue domains within each assayed section. Therefore, we utilized a three-way linear model (see Methods) that accounted for variability: (1) between samples (technical replicates); (2) between sample-specific spots, which corresponded to the same locations in the tissue through all the samples (see Methods); (3) between different tissue domains defined as either macro-categories (large regions of the tissue section: stem, meristematic area and flowers; Supplementary Fig. 9) or micro-categories (small regions of the tissue section: stem, premeristematic area and the flower development stages 3, 7, 9, 10, 11 and 12; Fig. 4c). We applied this linear model to the gene expression values to test whether transcriptional differences specific to a tissue domain can be identified despite measurement noise. In addition to the P-values of the three-way model (green points in Fig. 4d and Supplementary Fig. 9), we controlled the false discovery rate with a permutation test (red points in Fig. 4d). The allocation of mainly green points to the left of the vertical dotted line in Fig. 4d suggests a prevalence of true discoveries over false positives. Although we applied stringent parameters in the linear model, we identified significant differences in the expression levels of 293 (false discovery rate (FDR) <0.04; see Methods) and 141 (FDR <0.14) genes at the macro- and micro-category levels, respectively, of tissue domains (Fig. 4d, Supplementary Fig. 9 and Supplementary Table 3). AtSP2 (At1g07340), which encodes a sugar transport protein, was expressed in stage 11 flowers (Fig. 4e). This is in agreement with previously published reporter gene studies and with the suggested role of AtSP2 in the uptake of glucose units from degrading callose during early stage 11 flowers37. In stage 11 and 12 flowers the tapetum, the supporting cell layer surrounding the microspores, produces proteins and lipids that are transported out of the tapetum layer and deposited onto the developing pollen grains. In line with this, we identified genes involved in lipid transport (ATA7; At4g28395), oleosin production (GRP19; At5g07550) (Fig. 4e) and exine pattern formation (RPG1; At5g41110).
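The idea of using label permutations to gauge the false discovery rate can be sketched as follows. This is a simplified illustration under stated assumptions, not the procedure described in Methods: it assumes that a list of P-values from the real data and a list of P-values obtained after randomly permuting spot labels are already available, and it estimates the FDR at a given threshold as the expected number of null discoveries divided by the number of observed discoveries. The function name and all values are hypothetical.

```python
def permutation_fdr(observed_p, permuted_p, threshold):
    """Estimate the FDR at `threshold` from permutation-derived null P-values.

    Assumed estimator: (fraction of permuted P-values <= threshold)
    * (number of tests) / (number of observed P-values <= threshold).
    """
    if not observed_p or not permuted_p:
        raise ValueError("both P-value lists must be non-empty")
    discoveries = sum(1 for p in observed_p if p <= threshold)
    if discoveries == 0:
        return 0.0  # nothing called significant, so no false discoveries
    null_fraction = sum(1 for p in permuted_p if p <= threshold) / len(permuted_p)
    expected_false = null_fraction * len(observed_p)
    return min(1.0, expected_false / discoveries)


# Hypothetical P-values: 10 tests on real labels, 100 on permuted labels.
observed = [0.0001, 0.0005, 0.0008, 0.02, 0.3, 0.6, 0.7, 0.9, 0.5, 0.4]
permuted = [0.0009] + [0.5] * 99
fdr = permutation_fdr(observed, permuted, threshold=0.001)
```

With these made-up numbers, three tests pass the threshold on the real labels while the permuted labels suggest about 0.1 false discoveries among ten tests, giving an estimated FDR of roughly 0.03.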

A limitation of differential expression analyses at the individual gene level is that they do not take into account that many genes co-participate in, for example, biosynthetic pathways. To address this, we obtained pathway scores from expression values using network enrichment analysis (NEA)38. These scores were subjected to the linear model and the permutation test described above. We identified a total of 128 and 189 (FDR <0.01 in both analyses) pathways that significantly differed between macro- and micro-categories of tissue domains, respectively (Fig. 4f, Supplementary Fig. 9 and Supplementary Table 3). We show that the auxin-signalling pathway, which is important for pattern formation, growth and development, was significantly enriched in the macro-categories meristematic area and stem compared with flowers (Supplementary Fig. 9). Interestingly, we note that the stamen filament development pathway was enriched in floral stage 11, the site of stamen filament elongation, and in the stem, where vascular channels are abundantly present and where the processes of cell elongation and vascular development are active (Fig. 4g). Moreover, we show that the pollen exine formation pathway was altered in floral stages 10 and 11; at this stage, exine, one of the major constituents of the pollen wall, is produced in the tapetum layer and deposited on the pre-pollen cells. This is in agreement with our findings at the gene level (ATA7, GRP19 and RPG1) (Fig. 4g), and is confirmed by the enrichment of the same pollen exine formation pathway in the flowers macro-category (Fig. 4g).

Overall, we identified significant differences in tissue-domain specific genes and pathways that provide a large biological reservoir for future functional and evolutionary studies, as well as being in agreement with existing literature. We present an extensive list of genes and pathways significantly altered between macro- and micro-categories in Supplementary Figs 10–13.

Visualization tools

To enable community exploitation of our datasets, we have incorporated the newly generated data into the A. thaliana (http://atgenie.org), Populus (http://popgenie.org) and conifer (http://congenie.org) subdomains of the PlantGenIE web platform39. The data are available using the exImage tool, which enables visual examination of spatial gene expression within the corresponding tissue cross-sections (http://v22.popgenie.org/spatiallyresolvedtranscriptomics/).

Discussion

The availability of transcriptome-wide, spatially resolved gene expression data represents a substantial advance in our current ability to link gene expression profiles to cellular and developmental processes. Here, we present the first such method available for plant researchers and demonstrate analytical approaches that exploit the power of the data generated. Our results demonstrate the efficacy of our workflow for the study of gene expression patterns in whole plant tissue sections. In combination with the corresponding morphological information our approach enables high-throughput and spatially resolved transcriptomics analyses in plant tissues. By modifying molecular reactions to account for differences in plant section composition, we were able to successfully apply the method to assay gene expression across a broad range of vegetative and reproductive organs of herbaceous and woody angiosperm and woody gymnosperm species, enabling future comparative studies. There are techniques, such as FACS and INTACT, that enable profiling of gene expression in plants at a histologically defined cell-type level. Despite their great depth of profiling in specific locations (cell or tissue-types), such techniques require the formation of transgenic plant lines, limiting their applicability, especially in the case of non-model species. Our method circumvents these obstacles by directly assaying the gene expression within tissue sections while preserving their histological information, enabling application to a far broader range of species. The LCM technique can also be utilized to generate spatial gene expression studies and is applicable to any plant species. However, LCM suffers from low throughput due to its laborious methodology. Moreover, it is widely known that LCM offers suboptimal yield and purity of the targeted cell types. Ståhl et al.1 found that the number of genes and transcripts captured by the spatial transcriptomics method was at least twice as high compared with another study when LCM was used to obtain large regions from mouse brain tissue.

The novelty of our method lies in the ability to combine morphological information with single gene expression patterns in plants, and to extend such studies to global level analyses, both within different domains of a plant tissue section and between several tissue sections. It is well established that most biological characteristics arise from complex interactions and coordination between the numerous and multilayered components of a cell, which interact across the genome-to-phenotype continuum40,41. The successful application of the linear model analysis approach we employ on transformed read counts and pathway scores, obtained from expression values by using NEA, demonstrates that spatial data allow functional analyses among different tissue domains of the same tissue section and among replicates. Our findings provide an extended spatial catalogue of expression patterns compared with those present in the literature, which may be employed in new functional studies to investigate new spatial regulation mechanisms acting in the A. thaliana inflorescence. Moreover, our approach may be extended to other species, enabling comparative studies between gene networks to identify conservation and divergence in developmental gene co-expression networks.

With the current array resolution, we can already generate precisely quantified gene expression and/or pathway scores in different domains of the same tissue section. Such information cannot be generated using bulk experiments, and it is technically challenging to obtain using other methods. Although comparable resolution might be achieved by combining approaches such as FACS and microdissection, as in the study of Brady et al.9 to create a high-resolution spatiotemporal gene expression map of the Arabidopsis root, or by consecutive microdissection in the three different spatial coordinates throughout a whole organ (clonal condition)42, the technical challenges associated with these methods (discussed above) remain. As such, we provide a valuable, effective and broadly applicable resource for data exploration and hypothesis generation, which could be further enhanced by the development of arrays with even higher resolution by array manufacturers. Moreover, application of our workflow on crop species would complement transgenic and genome editing studies, enabling visualization of modulated gene expression to better understand the impact of genetic modifications.

Our presented method is highly reproducible between plant tissue sections and has higher technical reproducibility than ground whole-section experiments. This might be explained by the different efficiencies of the approaches used to disrupt cells and tissues, as well as the different RNA extraction procedures utilized. Material loss is quite high when using ground sections, which increases variability between replicate samples. By contrast, our method utilizes tissue sections attached to an array surface and treated in situ, thus preventing any material depletion.

We have demonstrated the simultaneous processing of different plant tissue section domains on a single array, as well as of multiple sections of suitable size. This maximizes the ability to detect biological variation while reducing technical variability43 between samples, a problem encountered when dissected samples from the same tissue are processed in parallel.

Validation of the spatial data using the AtGenExpress Development microarray dataset clearly demonstrated the high specificity and accuracy, and the low FPR, of the method. We observed similar trends between replicate samples, confirming the high reproducibility of the method. The moderate sensitivity of the method is a consequence of the relatively high number of false negatives, which represent genes present in the AtGenExpress Development microarray dataset but not identified by our method. Such differences might be ascribed to the broader tissue domain definitions employed in the microarray study, where a whole plant organ, for example a stage 9 flower, composed of several domains (sepals, petals, reproductive structures, and so on), was homogenized, producing an average transcriptome of the sample. This hypothesis is supported by the lower number of false negatives identified in the stem organ of all three longitudinal sections of the A. thaliana inflorescence meristem. Stem anatomical structures are repetitive, so there is a higher probability of capturing the same gene expression profile in the homogenized organ and the spots distributed under its region within tissue sections. Moreover, some of the false negatives detected could be due to the sequencing depth of the spatial data.

Overall, our results demonstrate the applicability, reproducibility and accuracy of our method to simultaneously assay global expression patterns in plant tissue sections of both herbaceous and woody plants in combination with their morphological information. We believe that our workflow represents a novel approach for transcriptomics studies in plants that will facilitate future avenues of developmental and evolutionary research. To facilitate this goal, the data have been made available to the community for visual exploration, and represent the first high-resolution spatially resolved gene expression resource in plants.

Methods

Collection of the plant material

A. thaliana inflorescences harbouring floral stages 1–14 (according to Smyth et al.44) were collected from plants three weeks after germination.

P. tremula dormant leaf buds were sampled from a single mature tree on the Umeå University campus in April 2014, before bud flush had initiated. To sample developing buds, buds were collected from the same tree weekly throughout the summer (from June to August 2014) until five leaf primordia had developed.

P. abies female cones were collected in early autumn (September and October), before the buds entered winter dormancy, from trees grown outside Uppsala, Sweden (latitude 59.8, longitude 17.6). Bud scales were removed during collection; the remaining sample consisted of the basal region of the one-year shoot and the developing bud meristem.

All plant tissues were immediately snap frozen in liquid nitrogen and stored at −80 °C until processed.

Array structure

Arrays consisted of 33 × 35 spot matrices. Each matrix included frame spots for orientation purposes and 1,007 spots for capturing the gene expression information, printed on a SurModics Codelink Activated Slide surface of 6.2 mm × 6.6 mm following the manufacturer's instructions. Each gene expression spot, of 100 µm diameter and with a 200 µm distance between the centres of two consecutive spots, contained approximately 200 million oligonucleotides with unique positional barcodes1,45. Each oligonucleotide covalently immobilized on the surface had the following common 5′–3′ structure: 18-mer spot-unique positional barcode, 9-mer semi-randomized UMI21,22 and poly-20TVN capture region. UMIs were utilized to identify unique transcripts per gene. The general sequence was [AmC6]UUUUUGACTCGTAATACGACTCACTATAGGGACACGACGCTCTTCCGATCT[18mer_Spatial_Barcode_1to1007]WSNNWSNNVTTTTTTTTTTTTTTTTTTTTVN.
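The semi-randomized UMI template in the oligonucleotide sequence above (WSNNWSNNV) can be read with the standard IUPAC degenerate base codes. The following minimal Python sketch, which is illustrative only and not part of the published pipeline, checks whether an observed 9-mer is compatible with the template and counts how many distinct UMIs the template allows. The function names are hypothetical.

```python
import re

# Standard IUPAC degenerate base codes that occur in the UMI template.
IUPAC = {"W": "AT", "S": "CG", "N": "ACGT", "V": "ACG"}

# 9-mer semi-randomized UMI template taken from the oligonucleotide sequence.
UMI_TEMPLATE = "WSNNWSNNV"


def template_to_regex(template):
    """Translate an IUPAC template into a compiled regular expression."""
    return re.compile("".join("[" + IUPAC[code] + "]" for code in template))


def umi_space_size(template):
    """Number of distinct sequences that the template can generate."""
    size = 1
    for code in template:
        size *= len(IUPAC[code])
    return size


UMI_REGEX = template_to_regex(UMI_TEMPLATE)


def is_valid_umi(umi):
    """True if the 9-mer is compatible with the WSNNWSNNV template."""
    return UMI_REGEX.fullmatch(umi) is not None
```

Under this reading the template admits 2 × 2 × 4 × 4 × 2 × 2 × 4 × 4 × 3 = 12,288 distinct UMIs per spot barcode.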

To acquire fluorescence prints of the transcripts contained in the tissue section cells, a variation of the spatial arrays was obtained by uniformly immobilizing poly-20TVN oligonucleotides on the SurModics Codelink Activated Slide surface with the following general sequence: [AmC6]UUUUUGACTCGTAATACGACTCACTATAGGGACACGACGCTCTTCCGATCTNNNNNNNNTTTTTTTTTTTTTTTTTTTTVN.

All oligonucleotides were synthesized by Integrated DNA Technologies (IDT).

Cryosectioning, fixation, staining and imaging

All pre-frozen plant samples were embedded in cold OCT (Sakura) prior to cryo-sectioning. A. thaliana inflorescence meristem was longitudinally sectioned at 8 µm thickness, P. tremula developing and dormant leaf bud cross-sections at 10 µm, and P. abies female cone longitudinal sections at 12 µm.

Fixation of plant tissue sections on top of the arrays was performed at room temperature in neutral formaldehyde solution, diluted 1:20 from 36.5–38.0% stock solution (Sigma-Aldrich) in 1×PBS (Medicago). Fixation occurred for 2.5 min for A. thaliana sections, and for 3.0 min for both P. tremula and P. abies sections. Sections of all plant species were washed with 1×PBS directly after fixation, and were stained with Toluidine Blue (Sigma-Aldrich) for 1.0 min at room temperature. Sections were rinsed with Milli-Q DNase/RNase free water, air dried, mounted with 85% glycerol (Merck Millipore) and covered with a coverslip (Menzel-Gläser). Bright-field imaging was performed with the Metafer Slide Scanning Platform (MetaSystems). Collected images were stitched together using the VSlide software (MetaSystems). After imaging, the coverslip was removed from the glass slide by dipping in Milli-Q RNase/DNase free water and subsequently in 80% ethanol to remove glycerol.

Permeabilization, cDNA synthesis, tissue removal and probe release

Glass slides with arrays were put in ArrayIT mask holders to obtain reaction chambers for each array and perform on-array reactions.

A. thaliana, P. tremula and P. abies tissue sections were treated with 70 µl of 1× Exonuclease I Reaction Buffer (NEB), diluted in 2.0% PVP40 (Sigma-Aldrich), with 0.19 µg µl–1 BSA (NEB) at 37 °C for 30 min. After washing each array with 100 µl of 0.1× SSC (Sigma-Aldrich) diluted with Milli-Q DNase/RNase free water (standard washing), 70 µl of 0.1% pepsin (Sigma-Aldrich) dissolved in 0.1 M HCl (Sigma-Aldrich) was added to each array chamber and incubated at 37 °C. Pepsin incubation lasted 8 min for A. thaliana tissue sections, 10 min for P. tremula tissue sections and 11.5 min for P. abies tissue sections. Pepsin was washed away as described above, and 70 µl of reverse transcription mixture in 2.0% PVP40 was added to each of the array chambers. The reverse transcription mixture was incubated at 42 °C overnight and contained: 1× First Strand Buffer (Invitrogen), 5 mM DTT (Invitrogen), 0.5 mM of each dNTP (Fisher Scientific), 0.19 µg µl–1 BSA, 50 ng µl–1 actinomycin D (Sigma-Aldrich), 1% DMSO (Sigma-Aldrich), 20 U µl–1 Superscript III (Invitrogen) and 2 U µl–1 RNaseOUT (Invitrogen).

Before releasing the probes from the array surface, array chambers underwent standard washing steps, and tissue sections were degraded. Depending on the plant species, different tissue removal procedures were applied. A. thaliana tissue sections were degraded by first adding 70 µl of plant cell wall degradation mix in 250 mM sodium citrate buffer pH 6.6 (Sigma-Aldrich) to each array chamber. Incubation was performed at 37 °C for 1 h with interval shaking at 300 rpm. The plant cell wall degradation mix contained 0.43 U µl–1 pectate lyase (Megazyme; cleavage of (1,4)-α-d-galacturonan to give oligosaccharides with 4-deoxy-α-d-galact-4-enuronosyl groups at their non-reducing ends; https://secure.megazyme.com/Pectate-lyase-Aspergillus-sp-ammonium-sulphate), 0.14 U µl–1 xyloglucanase (Megazyme; endo-hydrolysis of 1,4-β-d-glucosidic linkages in xyloglucan; https://secure.megazyme.com/Xyloglucanase-GH5-Paenibacillus-sp), 0.26 U µl–1 xylanase 10A (Nzytech; endo-hydrolysis of a large variety of decorated and undecorated 1,4-β-d-xylans; https://www.nzytech.com/products-services/xylanases/cz0007/), 0.21 U µl–1 mannanase 26A (Nzytech; endo-hydrolysis of 1,4-β-d-mannans and galactomannans; https://www.nzytech.com/products-services/beta-mannanases/cz0205/) and 0.14 µg µl–1 cellulase (Worthington, product code CEL; it converts crystalline, amorphous and chemically derived celluloses quantitatively to glucose; http://www.worthington-biochem.com/cel/default.html). Second, after standard array washing, 70 µl of 10% Triton X-100 (Sigma-Aldrich) was added to the array chambers and incubated at 56 °C for 1 h with continuous shaking. Third, array chambers were subjected to standard washing and a 70 µl mixture containing Proteinase K (Qiagen) and PKD buffer (Qiagen) was added to the array chambers for incubation at 56 °C for 1 h with interval shaking at 300 rpm.

P. tremula tissue sections underwent the degradation process described for A. thaliana, with an additional step. Prior to the proteinase K treatment, a 70 µl mixture of 1% β-mercaptoethanol (Calbiochem) in RLT buffer (Qiagen) was added to the array chambers. This was incubated at 56 °C for 1 h with continuous shaking and subsequent standard washing of the array chamber.

P. abies tissue sections underwent the P. tremula tissue section degradation process with two modifications: (1) an incubation with 70 µl of 0.002 U µl–1 lignin peroxidase (Sigma-Aldrich) in 250 mM sodium citrate buffer pH 3 and 24 mM hydrogen peroxide (30% w/w, Sigma-Aldrich) at 30 °C for 30 min with interval shaking at 300 rpm, followed by standard washing; (2) the incubation with RLT buffer contained 3% β-mercaptoethanol.

After tissue removal treatment, glass slides were washed in 2× SSC containing 0.1% SDS (Sigma-Aldrich) at 50 °C for 10 min, and subsequently in 0.2× SSC and then 0.1× SSC at room temperature for 1 min each.

Release of probes with mRNA–cDNA hybrids from the array surface was performed by adding 70 µl of release mix to the array chambers. Release mix was composed of 1.1× Second Strand Buffer (Invitrogen), 8.75 µM of each dNTP, 0.20 µg µl–1 BSA, and 0.1 U µl–1 USER enzyme (NEB). Incubation was performed at 37 °C for 1, 2 and 3 h for A. thaliana, P. tremula and P. abies samples, respectively.

Synthesis of fluorescence cDNA

To obtain fluorescent footprints of the transcripts contained in the tissue section cells, the reverse transcription reaction was prepared and carried out as described above, but with dCTP at a concentration of 12.5 µM, and with the additional inclusion of 25 µM Cyanine 3-dCTP (PerkinElmer). Before fluorescence imaging, arrays underwent standard washing and tissue removal as described above.

Assessment of the effects of the tissue removal enzyme cocktail on RNA

An array with poly-20TVN oligonucleotides immobilized on the surface was used in preliminary tests to assess the effects of the tissue removal enzyme cocktail on the RNA. RNA (35 ng) was added to the top of each of five wells together with the fluorescence cDNA synthesis mix (see “Synthesis of fluorescence cDNA”). The reverse transcription reaction was prepared and carried out as described above. After incubation, the array underwent standard washing and the fluorescence signal was measured with InnoScan 910 (Innopsys). Subsequently, four wells were treated with the P. tremula-specific tissue removal enzyme cocktail mix for 1 h, and one well was incubated with 0.1× SSC as positive control. The array again underwent standard washing and fluorescence measurement.

Library preparation and sequencing

The reaction mix (65 µl) containing the released probes was collected from each array chamber. cDNA second strand synthesis was performed by adding 5 µl of second strand synthesis mixture to each sample. The mixture contained 2.7× First Strand Buffer, 3.7 U µl–1 DNA polymerase I (Invitrogen) and 0.18 U µl–1 RNaseH (Invitrogen). Incubation was performed at 16 °C for 2 h, after which 5 µl of T4 DNA polymerase (NEB) was added to each sample for further incubation at 16 °C for 20 additional minutes. Twenty-five microlitres of 80 mM EDTA (Invitrogen) was added to each sample before purification with Agencourt RNAClean XP beads (Beckman Coulter) following the manufacturer's instructions. Elution was performed with Milli-Q DNase/RNase-free water. The in vitro transcription mixture in a volume of 10.4 µl containing 1× T7 Reaction Buffer (Ambion), 7.5 mM of each NTP (Ambion), 1× T7 Enzyme Mix (Ambion) and 1 U µl–1 SUPERaseIN (Ambion) was added to 5.6 µl of the purified samples and incubated at 37 °C for 14 h. A second purification was performed on the samples using the Agencourt RNAClean XP beads following the manufacturer's instructions, and elution was performed in 10 µl of Milli-Q DNase/RNase-free water. A ligation reaction comprising 8 µl of amplified RNA (aRNA) and 2.5 µl of ligation adapter (IDT, [rApp]AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC[ddC]) was incubated at 70 °C for 2 min. Subsequently, 4.5 µl of ligation mixture was added to each sample, thus obtaining a final composition of 1× T4 RNA Ligase Reaction Buffer (NEB), 20 U µl–1 T4 RNA Ligase2 truncated (NEB), 4 U µl–1 RNase Inhibitor Murine (NEB) and 0.5 µM ligation adapter. Samples were incubated at 25 °C for 1 h and then purified using Agencourt RNAClean XP beads according to the manufacturer's instructions, and elution was performed in 10 µl of Milli-Q DNase/RNase-free water.
The RT primer (IDT, 5′-GTGACTGGAGTTCAGACGTGTGCTCTTCCGA-3′) was added to the eluted samples at a final concentration of 1.7 µM and 1 µl of dNTPs (0.83 mM final concentration of each dNTP). Heat was applied at 65 °C for 5 min, after which samples were placed on ice and 8 µl of reaction mixture was added (final concentration: 1× First Strand Buffer, 0.05 M DTT, 500 µM each dNTP, 1 mM RT-primer, 10 U µl–1 Superscript III, 2 U µl–1 RNaseOUT). Incubation was performed at 50 °C for 1 h. Then samples were purified using Agencourt RNAClean XP beads according to the manufacturer's instructions, and eluted in 10 µl of Milli-Q DNase/RNase-free water.

The optimal number of PCR cycles for final library amplification was estimated by qPCR in 10 µl of final volume. The qPCR reaction mixture contained 2 µl of purified sample, 1× KAPA HiFi HotStart Readymix (KAPA Biosystems), 0.5 µM PCR InPE1.0 primer (Eurofins, 5′-AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT-3′), 0.01 µM PCR InPE2.0 primer (Eurofins, 5′-GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT-3′), 0.5 µM PCR Index primer (Eurofins, 5′-CAAGCAGAAGACGGCATACGAGATXXXXXXGTGACTGGAGTTC-3′), and 1× EVA green (Biotium). The qPCR programme was as follows: 1×, 98 °C 3 min; 25×, 98 °C 20 s, 60 °C 30 s, 72 °C 30 s. The PCR was performed in a reaction volume of 25 µl with the above programme and the established optimal number of cycles per sample. A final extension at 72 °C for 5 min was included. Indexed libraries were purified by an automated MBS robot46 and eluted in 20 µl of Elution Buffer (Qiagen). The library average length was assessed using the DNA 1000 kit (Agilent) on a 2100 Bioanalyzer (Agilent), and the concentration was measured using the Qubit dsDNA HS Assay Kit (Life Technologies) following the manufacturer's instructions. Indexed libraries were diluted to 4 nM and sequenced using the Illumina NextSeq platform applying paired-end sequencing. Read 1 was sequenced for 31 bases, and read 2 for 121 bases.
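Diluting a library to 4 nM requires converting the measured mass concentration and average fragment length into molarity. A minimal sketch of this arithmetic is given here, assuming the commonly used average mass of 660 g mol–1 per base pair of double-stranded DNA; the function names and example values are hypothetical and are not taken from the study.

```python
# Assumed average molar mass of one base pair of double-stranded DNA (g/mol).
GRAMS_PER_MOL_PER_BP = 660.0


def dsdna_nanomolar(conc_ng_per_ul, mean_length_bp):
    """Convert a dsDNA concentration in ng/µl to nM for a given mean length."""
    return conc_ng_per_ul * 1e6 / (GRAMS_PER_MOL_PER_BP * mean_length_bp)


def dilution_volumes(stock_nm, target_nm, final_volume_ul):
    """Return (library volume, diluent volume) in µl for a C1*V1 = C2*V2 dilution."""
    if stock_nm < target_nm:
        raise ValueError("stock is already below the target concentration")
    library_ul = target_nm * final_volume_ul / stock_nm
    return library_ul, final_volume_ul - library_ul
```

For example, under these assumptions a library at 1.32 ng µl–1 with an average length of 500 bp corresponds to approximately 4 nM.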

Fluorescence imaging

A fluorescence signal on spatially barcoded arrays was obtained by adding 70 µl of reaction mixture containing 0.96× PBS, 0.2 µM Cy3-anti-A probe (Eurofins, [Cy3]AGATCGGAAGAGCGTCGTGT) and 0.2 µM Cy3-anti-frame probe (Eurofins, [Cy3]GGTACAGAAGCGCGATAGCAG). Incubation was performed at room temperature for 10 min. Arrays were washed with 2× SSC containing 0.1% SDS at 50 °C for 10 min, then with 0.2× SSC and 0.1× SSC at room temperature for 1 min each. Subsequently, spatially barcoded arrays were mounted with SlowFade Gold Antifade Reagent (Invitrogen) and covered with a coverslip. Fluorescence imaging, image stitching and extraction were performed following the procedure described in the section “Cryosectioning, fixation, staining and imaging”.

Fluorescence footprints of the transcripts contained in the tissue section cells on a SurModics Codelink Activated Slide with uniformly immobilized poly-20TVN oligonucleotides were also mounted with SlowFade Gold Antifade Reagent and scanned using the above-mentioned procedure.

Image alignment

Image alignment of corresponding bright field and fluorescence images was performed manually using Adobe Photoshop CC (Adobe).

Total RNA extraction and fragmentation

Single 10-µm-thick fresh-frozen tissue cross-sections of P. tremula dormant leaf buds were manually ground to a powder in 1.5 ml Eppendorf tubes containing 100 µl of Lysis/Binding Solution of the RNAqueous-Micro Total RNA Isolation Kit (Ambion). Grinding was performed using a plastic pestle. A half volume of 100% ethanol was added to the lysate. The lysate/ethanol mixture was added to the filter cartridge and total RNA was extracted using the RNAqueous-Micro Total RNA Isolation Kit following the manufacturer's instructions. RNA was eluted in 20 µl of elution solution.

Fragmentation of total RNA was performed using the NEBNext Magnesium Fragmentation Module (NEB) for 1.5 min following the manufacturer's instructions. Fragmented RNA was purified using the RNeasy MinElute Cleanup Kit (Qiagen). The average RNA fragment length was 350 bp, as established using the RNA 6000 Pico Kit on a 2100 Bioanalyzer.

The concentration of libraries obtained starting from fragmented RNA was measured with the Qubit RNA BR Assay Kit (Life Technologies).

Comparison between permeabilized and ground tissue section libraries

Four sections belonging to the same P. tremula dormant leaf bud were positioned on four separate arrays (one section per array) and were subjected to the protocol described in the sections “Cryosectioning, fixation, staining and imaging” and “Permeabilization, cDNA synthesis, tissue removal and probe release” with P. tremula dormant leaf bud specifications.

Four additional arrays, with no tissue section on top, were treated as described in the sections “Cryosectioning, fixation, staining and imaging” and “Permeabilization, cDNA synthesis, tissue removal and probe release” with P. tremula dormant leaf bud specifications; no bright field imaging was performed, but 85% glycerol was added and removed. Four fragmented RNA samples (RIN values of starting RNA samples: 7.50 and 8.00) extracted from four single sections of another P. tremula dormant leaf bud were separately added to the four reverse transcription mixtures in an amount of 12 µl each (the corresponding amount of 2.0% PVP40 was not added to the mixture). The rest of the protocol proceeded as described above.

Sequence alignment and annotation

Sequencing reads are structured as follows: read 1 (R1) contains the spatial barcode and the UMI, and read 2 (R2) contains gene information. Sequencing adaptors were initially removed from both R1 and R2, as were A/G/T/C homopolymers longer than 15 bp together with the sequences following them. The remaining reads underwent BWA quality trimming. After quality trimming, reads shorter than 28 bp were discarded. Subsequently, quality filtered reads were aligned to the SortMeRNA47 (v2.0) ribosomal RNA database for eukaryotes and reads mapping to this database were removed. The remaining reads were aligned to the reference genome (A. thaliana48, A. thaliana TAIR10; P. tremula, P. trichocarpa49 v3.0; P. abies, P. abies20 v1.0) using STAR50 (v2.4) with default settings except for the minimum and maximum intron length (A. thaliana: min. intron length = 30, max. intron length = 12,000; P. tremula: min. intron length = 50, max. intron length = 12,000; P. abies: min. intron length = 50, max. intron length = 70,000). Aligned reads were counted using HTSeq51 (htseq-count mode union) based on the following annotations: A. thaliana, TAIR10 containing any type of gene information; P. tremula, P. trichocarpa v3.0 specific for mRNA sequences; P. abies, P. abies v1.0. Non-annotated reads were discarded. Finally, annotated R2s were associated with their corresponding R1s (‘de-multiplexing’)52. R2s whose corresponding R1 carried a spatial barcode not present in the reference barcode file were removed. Moreover, duplicated R2s were removed based on the UMI information.
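The final two steps, de-multiplexing and UMI-based duplicate removal, can be illustrated with a minimal Python sketch. It is not the published pipeline: it assumes that the barcode occupies the first 18 bases of R1 and the UMI the next 9 bases, that barcodes must match the reference exactly, and that reads are duplicates when they share spot, gene and an identical UMI. The function names and the input format are hypothetical.

```python
from collections import defaultdict

BARCODE_LEN = 18  # spot-unique positional barcode (length from the array design)
UMI_LEN = 9       # semi-randomized UMI (length from the array design)


def split_read1(r1_seq):
    """Split R1 into (barcode, UMI); the base positions are an assumption."""
    return r1_seq[:BARCODE_LEN], r1_seq[BARCODE_LEN:BARCODE_LEN + UMI_LEN]


def count_unique_transcripts(records, barcode_to_spot):
    """Count unique transcripts per spot and gene.

    records: iterable of (r1_sequence, gene_id) pairs for annotated R2s.
    barcode_to_spot: dict mapping reference barcodes to spot coordinates.
    Returns {spot: {gene_id: number_of_distinct_UMIs}}.
    """
    umis_seen = defaultdict(set)
    for r1_seq, gene_id in records:
        barcode, umi = split_read1(r1_seq)
        spot = barcode_to_spot.get(barcode)
        if spot is None:
            continue  # barcode absent from the reference barcode file
        umis_seen[(spot, gene_id)].add(umi)  # identical UMIs collapse to one
    counts = defaultdict(dict)
    for (spot, gene_id), umis in umis_seen.items():
        counts[spot][gene_id] = len(umis)
    return dict(counts)
```

A production implementation would typically also tolerate sequencing errors in barcodes and UMIs, which this sketch deliberately omits.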

Data normalization

Normalization was performed differently depending on plant-specific expression profiles. For P. tremula, A. thaliana and P. abies samples, counts of unique transcripts per spot, obtained from the UMI filtering step, were normalized by the total number of unique transcripts detected in a spot and multiplied by a scaling factor of 10,000 (TP10K). A +1 pseudocount was added prior to log2 transformation.
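A minimal sketch of the TP10K normalization for a single spot follows. It assumes that the pseudocount is added to the scaled value immediately before the log2 transformation; the text does not state the exact order of operations, and the function name is hypothetical.

```python
import math


def log2_tp10k(spot_counts, scale=10_000, pseudocount=1.0):
    """Normalize the unique-transcript counts of one spot.

    spot_counts: dict mapping gene_id to the unique-transcript count in the spot.
    Returns dict mapping gene_id to log2(count / spot_total * scale + pseudocount).
    """
    total = sum(spot_counts.values())
    if total == 0:
        # An empty spot carries no expression signal; every gene maps to
        # log2(pseudocount).
        return {gene: math.log2(pseudocount) for gene in spot_counts}
    return {
        gene: math.log2(count / total * scale + pseudocount)
        for gene, count in spot_counts.items()
    }
```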

A. thaliana counts were normalized utilizing the estimateSizeFactors function in DESeq prior to t-SNE analysis.
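The estimateSizeFactors function in DESeq is based on the median-of-ratios method: each sample is compared gene by gene with a pseudo-reference formed by the geometric mean across samples, and the median ratio is taken as the size factor. The pure-Python sketch below illustrates the idea only; it is not the DESeq code, and the function name and input layout are hypothetical.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors.

    counts: list of samples (here, spots), each a list of raw counts with
    genes in the same order. Genes with a zero count in any sample are
    excluded because their geometric mean is zero.
    """
    n_genes = len(counts[0])
    log_geo_means = []
    for g in range(n_genes):
        values = [sample[g] for sample in counts]
        if all(v > 0 for v in values):
            log_geo_means.append(sum(math.log(v) for v in values) / len(values))
        else:
            log_geo_means.append(None)

    factors = []
    for sample in counts:
        log_ratios = [
            math.log(sample[g]) - lgm
            for g, lgm in enumerate(log_geo_means)
            if lgm is not None
        ]
        if not log_ratios:
            raise ValueError("no gene has non-zero counts in every sample")
        factors.append(math.exp(median(log_ratios)))
    return factors
```

Because spatial data are sparse, only genes detected in every selected spot contribute to the factors in this sketch.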

Data comparison between permeabilized and ground tissue section libraries

All spots were included in the normalization. A total number of 7,987 spots and 9,448 genes (common genes between all permeabilized and ground tissue section replicates) were considered to calculate the Pearson correlation.
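As an illustration, the Pearson correlation between two libraries restricted to their common genes could be computed as sketched below. The sketch assumes that each library has already been summarized as one expression value per gene; the exact vectors compared in the study are not specified in this section, and the function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def common_gene_correlation(library_a, library_b):
    """Correlate two {gene_id: expression} dicts over their shared genes."""
    shared = sorted(set(library_a) & set(library_b))
    return pearson([library_a[g] for g in shared], [library_b[g] for g in shared])
```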

Hierarchical clustering, PCA and differential gene expression analysis

Two different P. tremula developing and dormant leaf buds were utilized in this analysis. Replicates of the experiments were performed on different days following the standard protocol.

Spots covered by at least 10% of leaf primordia (160 in total) were selected for the analysis. Normalization was performed as described above; only spots presenting expression profiles for at least 150 genes, and genes represented by at least two counts in at least one spot were included. Hierarchical clustering, based on Euclidean distances and the ward.D2 method, was performed on the 408 highly expressed genes common to all samples (sum of normalized counts per gene ≥200).
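The spot and gene filters described above can be sketched as follows. The order in which the two filters are applied (spots first, then genes) is an assumption, as are the function name and the input format.

```python
def filter_spots_and_genes(counts, min_genes_per_spot=150, min_count=2):
    """Filter a {spot: {gene_id: count}} mapping.

    Keeps spots in which at least `min_genes_per_spot` genes are detected,
    then keeps genes that reach `min_count` counts in at least one kept spot.
    """
    kept_spots = {
        spot: genes
        for spot, genes in counts.items()
        if sum(1 for c in genes.values() if c > 0) >= min_genes_per_spot
    }
    kept_genes = {
        gene
        for genes in kept_spots.values()
        for gene, c in genes.items()
        if c >= min_count
    }
    return {
        spot: {g: c for g, c in genes.items() if g in kept_genes}
        for spot, genes in kept_spots.items()
    }
```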

Principal component analysis (PCA) was performed on the common 5,177 genes expressed in the 160 spots belonging to developing and dormant leaf buds.

Differentially expressed genes were identified by using the software SIMCA-P+ 13.0.3 x64 (Umetrics), applying the multivariate statistical tool Orthogonal Projections to Latent Structures by means of partial least squares (OPLS)53 to the normalized counts of the common 5,177 genes. The application of multivariate data analysis techniques to gene expression analyses has already been described54,55. Given X, the data matrix containing the gene expression levels, and Y, the vector containing the two developmental stages (class of belonging), OPLS separates the systematic variation in X into two parts: one that is correlated (predictive) with Y, and one that is uncorrelated (orthogonal) with Y. The systematic variation observed in X that is not linearly related to Y can therefore be removed and only the variation that is correlated to the classification of interest is studied.

Before this calculation, which was executed with default parameters, except for “Confidence level on parameters” set to 99% and “Significance level” set to 0.01, the data underwent mean-centring and unit variance scaling. Cross-validation (default setting, seven rounds) was applied to estimate the model complexity. Briefly, cross-validation works as follows: (1) a subset of observations (rows in the X matrix) is kept out of the model development; (2) the response values (Y) for the omitted observations are predicted by the model; (3) the predictions of the omitted values are compared with the actual values; (4) steps 1–3 are repeated until all subsets have been omitted once and only once. The prediction error is then calculated as the sum of squared differences between observed Y and predicted values when the observations were kept out of the model fitting.
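The cross-validation scheme can be illustrated with a minimal sketch in which an ordinary least-squares line stands in for the OPLS model. The assignment of every seventh observation to the same round is an assumption made for illustration and may differ from the partitioning used by the software; all names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares line; a simple stand-in model, not OPLS."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict_line(model, x):
    slope, intercept = model
    return slope * x + intercept


def cross_validated_press(xs, ys, n_rounds=7):
    """Sum of squared prediction errors over held-out observations."""
    n = len(xs)
    press = 0.0
    for r in range(n_rounds):
        # (1) keep one subset of observations out of model development
        held_out = set(range(r, n, n_rounds))
        train_x = [xs[i] for i in range(n) if i not in held_out]
        train_y = [ys[i] for i in range(n) if i not in held_out]
        model = fit_line(train_x, train_y)
        # (2) predict the omitted responses and (3) compare with actual values
        for i in held_out:
            press += (ys[i] - predict_line(model, xs[i])) ** 2
    # (4) every observation has now been omitted exactly once
    return press
```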

The significance of differentially expressed genes was calculated as the difference between the absolute value (ABS) of the OPLS loading value and the absolute value of the jack-knife confidence interval56: ABS(OPLS model loading value) – ABS(jack-knife standard error). A positive value of significance indicated that a gene was significantly differentially expressed. Briefly, the OPLS loading value (p) expresses the importance of the variables in approximating X in the selected component, and the jack-knife standard error derives from rounds of cross-validation.
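The significance criterion stated above reduces to a one-line calculation per gene, sketched below with hypothetical function names and inputs. The loading values and jack-knife standard errors would be exported from the OPLS model.

```python
def significance(loading, jackknife_se):
    """ABS(OPLS model loading value) - ABS(jack-knife standard error)."""
    return abs(loading) - abs(jackknife_se)


def significant_genes(loadings, jackknife_ses):
    """Gene IDs with a positive significance value, sorted alphabetically.

    loadings, jackknife_ses: dicts keyed by gene ID.
    """
    return sorted(
        gene
        for gene in loadings
        if significance(loadings[gene], jackknife_ses[gene]) > 0
    )
```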

To study the method reproducibility and spatial gene expression in P. abies samples, only spots covered by at least 10% of female cone tissue section were selected (219). One hundred and thirty-six spots were located under lateral organs, and 83 under parenchymal tissue. Only spots presenting expression profiles for at least 100 genes (109 in total), and genes represented by at least two counts in at least one spot (1,385 in total) were normalized as described above. PCA was performed on 212 shared expressed genes among the samples.

Gene ontology enrichment analysis and identification of orthologous genes

Gene ontology enrichment analysis of P. tremula genes identified using OPLS as differentially expressed in developing and dormant buds was performed using the AgriGO web resource57 (http://bioinfo.cau.edu.cn/agriGO) using a custom background gene set of all genes classified as expressed in developing and/or dormant buds (Supplementary Table 1).

Gene annotation and orthologue information for Populus homologues of A. thaliana genes were taken from the PopGenIE.org resource39.

Validation analysis

Three longitudinal sections of the same A. thaliana inflorescence (technical replicates) were considered. Tissue domains represented in the tissue sections were stem, meristematic area, flowers stage 9, flowers stage 10 and 11, and flowers stage 12. Identification of floral stages was based on morphological correspondence with the classification proposed by Smyth et al.44

Normalized data from five developmental stages of the A. thaliana microarray Atlas of Development (AtGenExpress Development)27 were selected: ATGE_28 (1st node), ATGE_29 (shoot apex, inflorescence – after bolting), ATGE_31 (flowers stage 9), ATGE_32 (flowers stage 10/11) and ATGE_33 (flowers stage 12). Each of the selected stages in the AtGenExpress Development was represented in triplicate. The mean expression value was calculated for each gene. Based on the gene expression distribution per stage and per replicate (Supplementary Fig. 6), genes with a mean expression value lower than 5.0 were considered as not expressed in the AtGenExpress Development atlas.

From the expression profile distributions of spatial data (Supplementary Fig. 6), genes without any mapped reads, which means a log2(TP10K) ≤−6, were considered as not expressed.

Gene expression profiles were compared between spatial data and the AtGenExpress Development atlas by examining data for the following pairs: stem vs ATGE_28, meristematic area vs ATGE_29, flowers stage 9 vs ATGE_31, flowers stage 10 and 11 vs ATGE_32 and flowers stage 12 vs ATGE_33. Since each selected stage in the AtGenExpress Development atlas was represented in triplicate, the average expression for each gene was calculated. Genes in the spatial transcriptomics dataset were defined as follows: true positive, average expression value in the AtGenExpress Development atlas ≥5 and log2(TP10K) in the spatial transcriptomics dataset >−6; true negative, average expression value in the AtGenExpress Development atlas <5 and log2(TP10K) in the spatial transcriptomics dataset ≤−6; false positive, average expression value in the AtGenExpress Development atlas <5 and log2(TP10K) in the spatial transcriptomics dataset >−6; false negative, average expression value in the AtGenExpress Development atlas ≥5 and log2(TP10K) in the spatial transcriptomics dataset ≤−6 (Supplementary Table 2).
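The classification of genes and the derived performance measures can be sketched as follows. The cut-offs are those given in the text; the formulas for sensitivity, specificity, accuracy and FPR are the standard confusion-matrix definitions, which are assumed here because this section does not spell them out. The function names are hypothetical.

```python
from collections import Counter

ATLAS_CUTOFF = 5.0     # mean atlas expression >= 5 counts as expressed
SPATIAL_CUTOFF = -6.0  # log2(TP10K) > -6 counts as expressed


def classify_gene(atlas_mean, spatial_log2_tp10k):
    """Return 'TP', 'TN', 'FP' or 'FN' for one gene in one tissue domain."""
    in_atlas = atlas_mean >= ATLAS_CUTOFF
    in_spatial = spatial_log2_tp10k > SPATIAL_CUTOFF
    if in_atlas and in_spatial:
        return "TP"
    if in_atlas:
        return "FN"
    if in_spatial:
        return "FP"
    return "TN"


def _ratio(numerator, denominator):
    return numerator / denominator if denominator else float("nan")


def validation_metrics(pairs):
    """pairs: iterable of (atlas_mean, spatial_log2_tp10k), one per gene."""
    tally = Counter(classify_gene(a, s) for a, s in pairs)
    tp, tn, fp, fn = tally["TP"], tally["TN"], tally["FP"], tally["FN"]
    return {
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "accuracy": _ratio(tp + tn, tp + tn + fp + fn),
        "FPR": _ratio(fp, fp + tn),
    }
```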

Per-gene and global expression analysis in A. thaliana

For expression analysis at the per-gene level, individual cut-offs for each gene were determined using the R package ‘changepoint’.

For the global gene expression analysis, 167 spots were selected among the three replicates. Only spots presenting expression profiles for at least 1,000 genes (156 spots in total), and genes represented by at least one count in at least one spot (21,000 genes in total) were normalized as described above. Dimensionality reduction of the expression data was performed using the Rtsne R package. The separated spots were plotted in two dimensions and each spot was assigned to a cluster based on Euclidean distance (k-means). The optimal number of clusters was determined using the DIndex method from the NbClust R package. A colour was assigned to each cluster and spots were plotted in the tissue showing the cluster colour.

Linear model

We anticipated three potential sources of variability of gene expression (E) and included them as factors in the three-way linear model: (1) functionally distinct tissue categories (T), (2) three longitudinal samples (technical replicates) (S) and (3) individual spots (F). Since the spots were not exactly overlapping between the sections, factor F was nested within factor S in this experimental design. The tissue categories were also classified either in macro-categories according to tissue domain of origin (stem, meristematic region and flowers; Supplementary Fig. 7); or in micro-categories (stem and premeristematic area, and the following flower stages: 3, 7, 9, 10, 11 and 12; Fig. 4c). Thus, factor T had either three or eight levels. The linear model allowed separate evaluation of the influence of these main factors: $E = \beta_1 T + \beta_2 S + \beta_3 F(S) + \varepsilon$.

Identification of corresponding spots between tissue sections

Morphologically corresponding spots between A. thaliana replicates were identified as follows:

  1. The bright field image of the tissue section on the barcoded array and its corresponding fluorescence image, after probe release, were aligned. The ‘Opacity’ option was set in Photoshop to visualize which spots were covered by the tissue section.

  2. The spatial coordinate of each spot in the matrix was associated to a known spatial barcode. By visualizing the tissue on top of the spots, it was possible to link a specific tissue area to a spatial coordinate.

  3. Tissue sections were visually annotated based on their morphological macro- and micro-categories. Spots under similar morphological areas between the replicates were utilized as input for the factor F in the linear model. Such spots were subsequently filtered based on the gene expression values they contained.

Read counts as input to linear model

A +0.5 pseudocount was added to read counts prior to transformation using the function voom from R package limma, which log-transformed read counts and then fit the mean-variance distribution. Further, we achieved homoscedasticity (absence of correlation between mean and variance) of the read count data by using squares of the log-transformed values. Apart from the independence of variance from the mean, the values appeared normally distributed.

Network enrichment scores as input to linear model

Network enrichment analysis (NEA) is a method that shares similarities with gene set enrichment analysis58,59 but also considers the context of the gene network.

NEA was performed using the latest version of the A. thaliana global network from the FunCoup resource (v.3)60 characterized by over five million edges. For the analysis we took a subset of the top one million edges, ranked by FunCoup confidence score61, between 8,039 gene nodes.

Apart from the network, the input for NEA should include experimental, altered gene sets (AGSs) and functional gene sets (FGSs). The AGSs contained the 30 most highly expressed genes in each spot. Since the total number of detected genes per spot varied and was low in some cases, 59 spots with at least 60 genes (up to 2,717) that had three or more mapped reads per gene were considered. Thus, each of the AGSs included 1–50% of total reported genes per spot.
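The construction of the AGSs can be sketched as below: spots with fewer than 60 genes supported by three or more reads are skipped, and the 30 most highly expressed genes are retained for the remaining spots. Whether the ranking was restricted to the supported genes and how ties were broken are assumptions of this sketch, and the function name is hypothetical.

```python
def build_ags(spot_counts, top_n=30, min_reads=3, min_genes=60):
    """Build one altered gene set (AGS) per eligible spot.

    spot_counts: {spot: {gene_id: mapped_reads}}.
    Returns {spot: [gene_id, ...]} with the `top_n` most highly expressed genes.
    """
    ags = {}
    for spot, genes in spot_counts.items():
        supported = {g: c for g, c in genes.items() if c >= min_reads}
        if len(supported) < min_genes:
            continue  # too few well-supported genes in this spot
        # Rank by decreasing read count; ties broken by gene ID (assumption).
        ranked = sorted(supported, key=lambda g: (-supported[g], g))
        ags[spot] = ranked[:top_n]
    return ags
```

With 60 to 2,717 eligible genes per spot, a 30-gene AGS corresponds to roughly 50% down to about 1% of the genes, consistent with the range stated above.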

All GO terms of the category ‘biological process’ with 3–1,000 member genes were used as the FGS collection. A total of 1,293 FGSs were created. We excluded five small sets in which no member genes were found in the network.

Validation of the linear model

In addition to applying Bonferroni correction to the parametric P-values, we evaluated the overall significance of the model findings with a global permutation test, in which the P-values were calculated with the same linear model on gene expression profiles with randomly permuted site labels. Obtaining local, variable-specific P-value and FDR estimates was not feasible, since it would require very deep permutation tests on each variable. The actual expression data generally reached the desired P-value levels more often than the random profiles (green versus red points in Fig. 4). More specifically, the upper left quadrant (delineated with black dotted lines in Fig. 4d,f and Supplementary Fig. 9) contained predominantly green points. Those quadrants combine genes and pathways, respectively, that were (1) consistent (not significantly different) between spots matching across the samples and (2) dissimilar (significantly different) between some tissue domains (low P-value for factor T). Thus, the prevalence of green rather than red points in those quadrants indicated that the three-way nested model controlled the FPR properly and that multiple gene and NEA score profiles were significantly different between the tissue categories. To roughly estimate the FDR from the permutation test results, we counted how many variables had a nominal P-value for factor T (functionally distinct tissue categories) below 0.001 and a P-value for factor F (individual spots) above 0.05 (Supplementary Table 3). We then compared the count from the actual data, $N_{\mathrm{actual}}$, with that obtained on randomly permuted spot labels, $N_{\mathrm{permuted}}$, which produced the global estimate $\mathrm{FDR} = N_{\mathrm{permuted}} / N_{\mathrm{actual}}$.
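The global FDR estimate reduces to counting variables that pass both thresholds in the actual and in the permuted data. A minimal sketch under those stated cut-offs follows; the P-value pairs and function names are hypothetical and are not taken from Supplementary Table 3.

```python
def count_hits(p_values, t_cutoff=0.001, f_cutoff=0.05):
    """Count variables with P(factor T) below t_cutoff and P(factor F) above f_cutoff."""
    return sum(1 for p_t, p_f in p_values if p_t < t_cutoff and p_f > f_cutoff)


def global_fdr(actual, permuted, t_cutoff=0.001, f_cutoff=0.05):
    """Estimate the FDR as N_permuted / N_actual; None if nothing passes in the actual data."""
    n_actual = count_hits(actual, t_cutoff, f_cutoff)
    n_permuted = count_hits(permuted, t_cutoff, f_cutoff)
    if n_actual == 0:
        return None
    return n_permuted / n_actual


# Hypothetical (P for factor T, P for factor F) pairs, one per gene or pathway.
actual = [(0.0001, 0.5), (0.0005, 0.2), (0.0002, 0.01),
          (0.2, 0.8), (0.0009, 0.3), (0.00001, 0.9)]
permuted = [(0.0004, 0.6), (0.3, 0.4), (0.02, 0.9),
            (0.5, 0.5), (0.7, 0.01), (0.04, 0.2)]

fdr_estimate = global_fdr(actual, permuted)
```

In this toy data four variables pass in the actual profiles and one in the permuted profiles, giving an estimate of 0.25.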

Code availability

The pipeline (v 0.7.5) used for read mapping and demultiplexing of spatial barcodes is available at: https://github.com/SpatialTranscriptomicsResearch/st_pipeline/.

Code written to analyse the comparison between the gene expression detected in permeabilized and ground tissue sections in P. tremula is available at https://github.com/stefaniagiacomello/Spatial-transcriptomics-data-analysis-in-plants.

Software used for the network enrichment analysis (creation of AGSs and pathway score calculation) is available as R package NEArender62 (https://cran.r-project.org/web/packages/NEArender/index.html).

Data availability

Raw sequence reads are available at the Sequence Read Archive under accession number SRP100428. Count matrices are available at http://www.spatialtranscriptomicsresearch.org/. Expression levels are available at http://v22.popgenie.org/spatiallyresolvedtranscriptomics/.

Additional information

How to cite this article: Giacomello, S. et al. Spatially resolved transcriptome profiling in model plant species. Nat. Plants 3, 17061 (2017).

Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

References

  1. et al. Visualization and analysis of gene expression in tissue sections by spatial transcriptomics. Science 353, 78–82 (2016).

  2. et al. Highly multiplexed subcellular RNA sequencing in situ. Science 343, 1360–1363 (2014).

  3. et al. In situ sequencing for RNA analysis in preserved tissue and cells. Nat. Methods 10, 857–860 (2013).

  4. Spatially resolved transcriptomics and beyond. Nat. Rev. Genet. 16, 57–66 (2015).

  5. RNA-Seq: a revolutionary tool for transcriptomics. Nat. Rev. Genet. 10, 57–63 (2009).

  6. et al. A transcriptome atlas of Physcomitrella patens provides insights into the evolution and development of land plants. Mol. Plant 9, 205–220 (2015).

  7. Microarray expression profiling resources for plant genomics. Trends Plant Sci. 10, 603–609 (2005).

  8. et al. A gene expression map of the Arabidopsis root. Science 302, 1956–1960 (2003).

  9. et al. A high-resolution root spatiotemporal map reveals dominant expression patterns. Science 318, 801–806 (2007).

  10. A high-resolution gene expression map of the Arabidopsis shoot meristem stem cell niche. Development 141, 2735–2744 (2014).

  11. The INTACT method for cell type-specific gene expression and chromatin profiling in Arabidopsis thaliana. Nat. Protoc. 6, 56–68 (2011).

  12. Laser microdissection of plant tissue: what you see is what you get. Annu. Rev. Plant Biol. 57, 181–201 (2006).

  13. et al. An improved procedure for isolation of high-quality RNA from nematode-infected Arabidopsis roots through laser capture microdissection. Plant Methods 12, 25 (2016).

  14. An efficient LCM-based method for tissue specific expression analysis of genes and miRNAs. Sci. Rep. 6, 21577 (2016).

  15. et al. Ontogeny of the maize shoot apical meristem. Plant Cell 24, 3219–3234 (2012).

  16. et al. A transcriptome atlas of rice cell types uncovers cellular, functional and developmental hierarchies. Nat. Genet. 41, 258–263 (2009).

  17. Growth of the plant cell wall. Nat. Rev. Mol. Cell Biol. 6, 850–861 (2005).

  18. Production of plant secondary metabolites: a historical perspective. Plant Sci. 161, 839–851 (2001).

  19. Gene networks in plant biology: approaches in reconstruction and analysis. Trends Plant Sci. 20, 664–675 (2015).

  20. et al. The Norway spruce genome sequence and conifer genome evolution. Nature 497, 579–584 (2013).

  21. et al. Quantitative single-cell RNA-seq with unique molecular identifiers. Nat. Methods 11, 163–166 (2014).

  22. Counting individual DNA molecules by the stochastic attachment of diverse labels. Proc. Natl Acad. Sci. USA 108, 9026–9031 (2011).

  23. Inclusion of polyvinylpyrrolidone in the polymerase chain reaction reverses the inhibitory effects of polyphenolic contamination of RNA. Nucleic Acids Res. 27, 915–916 (1999).

  24. Daylength mediated control of seasonal growth patterns in perennial trees. Curr. Opin. Plant Biol. 16, 301–306 (2013).

  25. et al. A cross-species transcriptomics approach to identify genes involved in leaf development. BMC Genomics 9, 589 (2008).

  26. A systems biology model of the regulatory network in Populus leaves reveals interacting regulators and conserved regulation. BMC Plant Biol. 11, 13 (2011).

  27. et al. A gene expression map of Arabidopsis thaliana development. Nat. Genet. 37, 501–506 (2005).

  28. Genome-wide analysis of gene expression during early Arabidopsis flower development. PLoS Genet. 2, 1012–1024 (2006).

  29. Identification, sequence analysis and expression studies of novel anther-specific genes of Arabidopsis thaliana. Plant Mol. Biol. 37, 607–619 (1998).

  30. Function of the apetala-1 gene during Arabidopsis floral development. Plant Cell 2, 741–753 (1990).

  31. The homeotic gene apetala 3 of Arabidopsis thaliana encodes a MADS box and is expressed in petals and stamens. Cell 68, 683–697 (1992).

  32. Function and regulation of the Arabidopsis floral homeotic gene PISTILLATA. Genes Dev. 8, 1548–1560 (1994).

  33. et al. The protein encoded by the Arabidopsis homeotic gene AGAMOUS resembles transcription factors. Nature 346, 35–39 (1990).

  34. Visualizing high-dimensional data using t-SNE. J. Mach. Learn. Res. 9, 2579–2605 (2008).

  35. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics 26, 139–140 (2009).

  36. et al. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 43, e47 (2015).

  37. A male gametophyte-specific monosaccharide transporter in Arabidopsis. Plant J. 17, 191–201 (1999).

  38. et al. Network enrichment analysis: extension of gene-set enrichment analysis to gene networks. BMC Bioinformatics 13, 226 (2012).

  39. et al. The plant genome integrative explorer resource: plantGenIE.org. New Phytol. 208, 1149–1156 (2015).

  40. Modelling and analysis of gene regulatory networks. Nat. Rev. Mol. Cell Biol. 9, 770–780 (2008).

  41. Comparative analysis of gene regulatory networks: from network reconstruction to evolution. Annu. Rev. Cell Dev. Biol. 31, 399–428 (2015).

  42. et al. Genome-wide RNA tomography in the zebrafish embryo. Cell 159, 662–675 (2014).

  43. RNA-seq: an assessment of technical reproducibility and comparison with gene expression arrays. Genome Res. 18, 1509–1517 (2008).

  44. Early flower development in Arabidopsis. Plant Cell 2, 755–767 (1990).

  45. et al. Massive and parallel expression profiling using microarrayed single-cell sequencing. Nat. Commun. 7, 1–9 (2016).

  46. Increased throughput by parallelization of library preparation for massive sequencing. PLoS ONE 5, e10029 (2010).

  47. SortMeRNA: Fast and accurate filtering of ribosomal RNAs in metatranscriptomic data. Bioinformatics 28, 3211–3217 (2012).

  48. The Arabidopsis Genome Initiative. Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature 408, 796–815 (2000).

  49. et al. The genome of black cottonwood, Populus trichocarpa (Torr. & Gray). Science 313, 1596–1604 (2006).

  50. et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15–21 (2013).

  51. HTSeq—a Python framework to work with high-throughput sequencing data. Bioinformatics 31, 166–169 (2015).

  52. TagGD: fast and accurate software for DNA tag generation and demultiplexing. PLoS ONE 8, e57521 (2013).

  53. Orthogonal projections to latent structures (O-PLS). J. Chemometrics 16, 119–128 (2002).

  54. et al. A combined proteomic and transcriptomic approach shows diverging molecular mechanisms in thoracic aortic aneurysm development in patients with tricuspid- and bicuspid aortic valve. Mol. Cell. Proteomics 12, 407–425 (2013).

  55. et al. The impact of endurance training on human skeletal muscle memory, global isoform expression and novel transcripts. PLoS Genet. 12, e1006294 (2016).

  56. Analysis of designed experiments by stabilised PLS regression and jack-knifing. Chemom. Intell. Lab. Syst. 58, 151–170 (2001).

  57. agriGO: a GO analysis toolkit for the agricultural community. Nucleic Acids Res. 38, 64–70 (2010).

  58. Bioinformatics enrichment tools: paths toward the comprehensive functional analysis of large gene lists. Nucleic Acids Res. 37, 1–13 (2009).

  59. et al. Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl Acad. Sci. USA 102, 15545–15550 (2005).

  60. FunCoup 3.0: Database of genome-wide functional coupling networks. Nucleic Acids Res. 42, 380–388 (2014).

  61. Global networks of functional coupling in eukaryotes from comprehensive data integration. Genome Res. 19, 1107–1116 (2009).

  62. NEArender: an R package for functional interpretation of ‘omics’ data via network enrichment analysis. BMC Bioinform. 18, 118 (2017).

Acknowledgements

We thank the Swedish National Genomics Infrastructure hosted at SciLifeLab, the National Bioinformatics Infrastructure Sweden (NBIS) for providing computational assistance, and the Uppsala Multidisciplinary Center for Advanced Computational Science (UPPMAX) for providing computational infrastructure. This work was supported by the Knut and Alice Wallenberg Foundation and the Swedish Research Council. N.R.S. and B.K.T. are supported by the Trees and Crops for the Future (TC4F) project. This work was also supported by a grant to N.R.S. from the Carl Tryggers Stiftelse för Vetenskaplig Forskning.

Author information

Author notes

    • Fredrik Salmén & Barbara K. Terebieniec

    These authors contributed equally to this work.

Affiliations

  1. Division of Gene Technology, School of Biotechnology, KTH Royal Institute of Technology, Science for Life Laboratory, 17165 Solna, Sweden

    • Stefania Giacomello, Fredrik Salmén, Sanja Vickovic & Joakim Lundeberg

  2. Department of Biochemistry and Biophysics, Stockholm University, Science for Life Laboratory, 17165 Solna, Sweden

    • Stefania Giacomello

  3. Umeå Plant Science Centre, Department of Plant Physiology, Umeå University, 90736 Umeå, Sweden

    • Barbara K. Terebieniec, Chanaka Mannapperuma & Nathaniel R. Street

  4. Department of Cell and Molecular Biology, Karolinska Institute, 17165 Solna, Sweden

    • José Fernandez Navarro & Patrik L. Ståhl

  5. Department of Microbiology, Tumor and Cell Biology (MTC), Karolinska Institutet, 17165 Solna, Sweden

    • Andrey Alexeyenko

  6. National Bioinformatics Infrastructure Sweden, Science for Life Laboratory, 17121 Solna, Sweden

    • Andrey Alexeyenko

  7. Science for Life Laboratory, Department of Cell and Molecular Biology, Uppsala University, 75237 Uppsala, Sweden

    • Johan Reimegård

  8. Division of Glycoscience, School of Biotechnology, KTH Royal Institute of Technology, AlbaNova University Centre, 11421 Stockholm, Sweden

    • Lauren S. McKee & Vincent Bulone

  9. ARC Centre of Excellence in Plant and Cell Walls and School of Agriculture, Food and Wine, The University of Adelaide, Waite Campus, Urrbrae, Adelaide, South Australia 5064, Australia

    • Vincent Bulone

  10. Department of Plant Biology, Uppsala BioCenter, Linnean Center for Plant Biology, Swedish University of Agricultural Sciences, 75007 Uppsala, Sweden

    • Jens F. Sundström

Contributions

S.G. and J.L. designed the project. S.G. developed the plant-specific protocol, performed and guided experiments and data analyses, prepared figures and wrote the manuscript. F.S. and P.L.S. developed the original protocol for mammalian tissue. F.S. contributed some of the code used for the analyses. B.K.T. performed experiments. S.V. developed barcoded arrays. J.F.N. developed the alignment and demultiplexing pipeline. A.A. developed the linear model and performed its computations. J.R. performed part of the data analysis and part of the figure preparation. L.S.M. and V.B. provided consultation on plant cell-wall degradation enzymes. C.M. developed the visualization and data sharing tool. J.F.S. provided A. thaliana and P. abies samples and contributed to data interpretation. N.R.S. provided P. tremula samples, contributed to data interpretation, and guided the development of the visualization tool. A.A., L.S.M., J.F.S., N.R.S. and J.L. edited the manuscript.

Competing interests

P.L.S. and J.L. are founders of a company that holds IP rights to the presented technology.

Corresponding authors

Correspondence to Stefania Giacomello or Joakim Lundeberg.

Supplementary information

PDF files

  1. Supplementary Information: Supplementary Figures 1–13.

Excel files

  1. Supplementary Table 1: Differentially expressed genes between developing and dormant Populus tremula leaf buds.

  2. Supplementary Table 2: Number of TP, TN, FP and FN per tissue domain in A. thaliana replicates.

  3. Supplementary Table 3: Linear model P-values per gene and pathway at the macro- and micro-category level.