In situ 10-cell RNA sequencing in tissue and tumor biopsy samples

Single-cell transcriptomic methods classify new and existing cell types very effectively, but alternative approaches are needed to quantify the individual regulatory states of cells in their native tissue context. We combined the tissue preservation and single-cell resolution of laser capture with an improved preamplification procedure enabling RNA sequencing of 10 microdissected cells. This in situ 10-cell RNA sequencing (10cRNA-seq) can exploit fluorescent reporters of cell type in genetically engineered mice and is compatible with freshly cryoembedded clinical biopsies from patients. Through recombinant RNA spike-ins, we estimate dropout-free technical reliability as low as ~250 copies and a 50% detection sensitivity of ~45 copies per 10-cell reaction. By using small pools of microdissected cells, 10cRNA-seq improves technical per-cell reliability and sensitivity beyond existing approaches for single-cell RNA sequencing (scRNA-seq). Detection of low-abundance transcripts by 10cRNA-seq is comparable to random 10-cell groups of scRNA-seq data, suggesting no loss of gene recovery when cells are isolated in situ. Combined with existing approaches to deconvolve small pools of cells, 10cRNA-seq offers a reliable, unbiased, and sensitive way to measure cell-state heterogeneity in tissues and tumors.


Results
Methods for profiling small quantities of cellular RNA have evolved considerably over the past decade, but they all involve the same fundamental steps: (1) cell isolation, (2) RNA extraction, (3) reverse transcription, (4) preamplification, and (5) detection 51 . The original protocol for in situ 10-cell profiling combines LCM for cell isolation followed by proteinase K digestion for RNA extraction 42 . The extracted material undergoes an abbreviated high-temperature reverse transcription with oligo(dT) 24 , and cDNA is carefully preamplified by poly(A) PCR 52 that generates sufficient 3′ ends (~500 bp in size) for microarray labeling and hybridization 42 (Fig. 1).
Unsurprisingly, the earliest steps in the procedure are the most critical for achieving the maximum amount of amplifiable starting material. To avoid losses, steps 1-4 (cell isolation through preamplification) are normally performed without intermediate purification. Therefore, buffers and reagents must be carefully tested and titrated to be mutually compatible throughout the "one-pot" protocol. Since description of the procedure 41,42 , multiple commercial providers merged or were acquired, leading to the discontinuation of multiple RNAse inhibitors, the Taq polymerase, and the BeadChip microarrays. The collective disruptions in sourcing prompted a modernization of 10-cell profiling toward RNA-seq of primary material at a biopsy scale, including how tissue-tumor samples were handled before the start of the procedure (Fig. 1).

Protein localization for LCM requires fresh cryoembedding.
To minimize extra handling steps that could degrade RNA, in situ profiling of clinical samples is ordinarily performed with rapid histological stains 41,51,53,54 (Fig. 1). LCM can also be guided by fluorescence in place of histology when using cells or animals engineered to encode genetic labels 55,56 . However, new challenges arise when seeking to preserve localization and brightness of encoded fluorophores during single-cell isolation and RNA extraction. Compared to polysome-bound mRNAs, fluorescent proteins diffuse much more readily, and chromophores may be damaged by the fixation and dehydration steps needed to preserve RNA integrity. Fluorescent-protein structure is preserved by chemical fixatives, but covalent crosslinking of biomolecules is unsuitable for extracting RNA from tissue. Fluorescence-guided profiling therefore entails a competing set of tradeoffs that must be balanced for optimal performance. We reasoned that the greatest flexibility would be afforded by reporter mice expressing tandem-dimer Tomato (tdT), a bright, high molecular-weight derivative of DsRed 57 . Key handling parameters were evaluated using Cspg4-CreER;Trp53 F/F ;Nf1 F/F ;Rosa26-LSL-tdT mice, a model of malignant glioma 58 . In these animals, administration of tamoxifen elicits sparse labeling of oligodendrocyte precursor cells (OPCs) in the brain, enabling fluorescence retention to be assessed in single cells. Extensive optimization of cryosectioning and wicking conditions was required to preclude fluorophore diffusion while ensuring reliable LCM pickup (see Methods). We found that an accelerated 70-95-100% ethanol series 41,42 maintained tdT fluorescence and localization of labeled cells through xylene clearing and dehydration (Fig. 2A).
Separately, using freshly embedded tissue from a "mosaic analysis of double markers" (MADM) animal that labels various brain lineages with enhanced green fluorescent protein (EGFP), tdT, or both 59,60 , we confirmed that EGFP fluorescence was also acceptably retained with the 70-95-100% ethanol series (Supplementary Fig. S2). Although EGFP diffusion was noticeably greater compared to tdT owing to its smaller size (~28 kDa vs. ~54 kDa), we could nonetheless reliably identify the cell bodies of single EGFP-positive cells for LCM. Surprisingly, we found that fresh-tissue embedding was critically important for preserving single-cell localization and brightness. Snap-freezing before cryoembedding caused considerable loss and delocalization of tdT fluorescence, even when prefrozen material was rapidly embedded in dry ice-isopentane (−40 °C) (Fig. 2B,C). Brightfield images of these cryosections also showed considerable tissue damage compared to freshly embedded material (Supplementary Fig. S3). For mechanically challenging tissues in which embedding support is important for cryosectioning, we conclude that fresh-tissue embedding is essential for maximum biomolecular retention and integrity.

Improving poly(A) preamplification for modern RNA-seq.
Previously, in situ 10-cell profiling was optimized for quantification by BeadChip microarray 41,42 , but microarrays have been supplanted by RNA-seq for unbiased measures of the transcriptome 61 (Fig. 1). An advantage of RNA-seq is that nucleic acids are detected regardless of origin, enabling use of exogenous RNA standards to calibrate sensitivity and quantitative accuracy when spiked into a biological sample 62-64 . The versatility of RNA-seq is also a caveat, because all nucleic acids in a sample will be sequenced, including unwanted preamplification byproducts and contaminating DNA from mitochondria or the nucleus 65-67 .
In the original scRNA-seq report that used a variant of poly(A) PCR, only 37 ± 9% of sequenced reads aligned to RefSeq transcripts 68 , and exonic alignment rates below 50% remain common 69 . Therefore, we focused improvements to poly(A) preamplification towards ensuring that most sequencing reads aligned to the 3′ ends of cellular mRNAs.
Figure 2. Fresh cryoembedding preserves tandem-dimer Tomato (tdT) fluorescence and localization better than snap-frozen alternatives. Brain samples from Cspg4-CreER;Trp53 F/F ;Nf1 F/F ;Rosa26-LSL-tdT animals were (A) freshly cryoembedded in Neg-50 medium with dry ice-isopentane (−40 °C), (B) snap-frozen in dry ice-isopentane and then cryoembedded, or (C) snap-frozen and slowly cryoembedded in a cryostat (−24 °C). Low- and high-magnification images were captured with the factory-installed color camera on the Arcturus XT LCM instrument. Images were exposure matched and are displayed with a gamma compression of 0.67. Insets have been rescaled to emphasize tdT diffusion away from the cell body. Scale bar is 25 µm. Brightfield images from the same sections are shown in Supplementary Fig. S3.

In poly(A) PCR, cDNA is 3′ adenylated and then preamplified with a universal T 24 -containing primer called AL1 52 . We previously found that the amount of AL1 strongly influenced overall sensitivity of gene detection, with improvements noted at concentrations as high as 25 µM 42 . Excess AL1 also drives nonspecific amplification of low molecular-weight primer concatemers 70 , which do not influence gene measurements by quantitative PCR or microarray but create overwhelming contamination for RNA-seq. To improve poly(A) PCR, we screened a range of commercial Taq and proofreading polymerases along with empirical blends of those that maximized the intended ~500 bp cDNA products relative to nonspecific concatemer. We obtained a better-than-additive preamplification by combining Taq and Phusion polymerases (see Methods). An equal mixture of the two enzymes dramatically increased the yield of ~500 bp preamplification products relative to nonspecific concatemer (Fig. 3A, lower).
The empirical blend also significantly improved the preamplification of both high-abundance (GAPDH) and low-abundance (PARN) targets as measured by quantitative PCR (Fig. 3A, upper). The two-enzyme blend further enabled a 10-fold decrease in AL1 primer concentration without detectable loss in preamplification efficiency (Fig. 3B). The Taq-Phusion combination was superior for a primary breast-cancer biopsy (Fig. 3) as well as two murine tissue sources: a murine small-cell lung cancer line derived from Trp53 ∆/∆ Rb ∆/∆ lung epithelium 71 and tdT-labeled OPCs ( Supplementary Figs S4 and S5), illustrating its generality. The enzyme modification created a viable starting point for combining poly(A) PCR preamplification with RNA-seq.
Figure 3 (partial caption). Data are shown as the median inverse quantification cycle (40-Cq) ± range from n = 3 amplification replicates and were analysed by two-way (A) or one-way (B) ANOVA with replication. Below: Preamplifications were analysed by agarose gel electrophoresis to separate poly(A)-amplified cDNA from nonspecific, low molecular-weight concatemer (n.s.). Qualitatively similar results were obtained separately three times. Lanes were cropped by poly(A) PCR cycles for display but were electrophoresed on the same agarose gel and processed identically. The uncropped image is shown in Supplementary Fig. S13A.

Sensitivity, accuracy, and precision of the updated poly(A) PCR approach were assessed using recombinant RNA spike-ins as internal positive controls 64 . A dilution of ERCC spike-ins was defined that did not detectably perturb the measured abundance of endogenous transcripts in RNA equivalents from 10 microdissected cells (Fig. 4A). After poly(A) PCR of the spike-in dilution plus 100 pg RNA (~10 cells), we measured the relative abundance of individual spike-ins, using quantitative PCR (qPCR) to eliminate RNA-seq read depth as a complicating factor. Purified qPCR end products served as an absolute reference of each spike-in for cross-comparison (see Methods). We observed good linearity across 22 spike-ins spanning an abundance of ~10^4 (Fig. 4B). Deviations, technical noise, and dropouts all increased considerably for spike-ins below ~250 copies per reaction, consistent with previous reports 29 . This collective measurement uncertainty restricts interpretation of single-cell data to highly expressed transcripts, but 10-cell pooling reduces the threshold to ~25 copies on average per cell. With poly(A) PCR, we did not observe qualitative dropout in more than 50% of technical replicates for spike-ins as dilute as four copies per reaction (ERCC85; Fig. 4B), indicating good sensitivity.
RNA spike-ins do not mimic the characteristics of endogenous transcripts extracted from cells, but they can provide a common reference to benchmark preamplification methods for RNA-seq 48 . These experiments indicated that the improved poly(A) preamplification was sufficiently reliable for unbiased profiling of 10-cell transcriptomes.
For RNA extraction from the LCM cap, an optimized digestion buffer is used containing proteinase K to release mRNAs from precipitated ribosomes 41 . Proteinase K also digests nucleosomes, which may cause elution of contaminating genomic DNA. In past and current analyses of human LCM samples preamplified ± reverse transcription, we never found genomic copies of genes amplified within ~0.4% of measured mRNA transcripts (∆Cq ≥ 8 for 16 genes measured in four human cell types, Supplementary Fig. S6). For mouse tissues, however, genomic copies were more prevalent and variable, with some genes measured as abundantly without reverse transcription as with it (Figs 5A and S6). Gel electrophoresis showed weak-but-detectable bands above the desired ~500 bp product in preamplifications without reverse transcription, implying nonspecific amplification (Fig. 5A, lower). Concerned that the murine genome could compete with the amplification of cDNA, we appended an intermediate purification following reverse transcription with 5′-biotin-modified oligo(dT) 24 . Biotinylated cDNA was purified on streptavidin-conjugated magnetic beads, which could be separated from contaminants in the LCM extract and used as a starting template for poly(A) preamplification. Addition of the biotin cleanup step mildly improved the amplification of cDNAs and, importantly, eliminated the confounding abundance of murine genomic DNA (Fig. 5B). We recommend biotinylated oligo(dT) 24 and bead purification for mouse samples considering the recurrent challenges with genomic DNA (Supplementary Fig. S6 and see Discussion).
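The correspondence between ∆Cq ≥ 8 and ~0.4% can be checked with a short calculation. The sketch below converts a quantification-cycle difference into a relative template abundance. It assumes ideal doubling per PCR cycle (an amplification factor of 2), which is a simplifying assumption and not a measured efficiency, and the function name is illustrative only.

```python
def relative_abundance_from_delta_cq(delta_cq: float, amplification_factor: float = 2.0) -> float:
    """Return template abundance relative to a reference that crosses threshold
    delta_cq cycles earlier, assuming a constant per-cycle amplification factor
    (2.0 = ideal doubling; an assumption, not a measured efficiency)."""
    return amplification_factor ** (-delta_cq)


# A no-reverse-transcription control lagging the mRNA signal by 8 cycles
fraction = relative_abundance_from_delta_cq(8)
print(f"{fraction:.2%} of the reverse-transcribed signal")
```

Under that assumption, a lag of 8 cycles corresponds to 1/256 of the reference signal, which rounds to the ~0.4% quoted above; a lower real-world efficiency would give a somewhat larger fraction.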
Poly(A) PCR samples are kept dilute to avoid saturating the preamplification, but aliquots can be carefully reamplified up to microgram scale for microarray hybridization 41,42 . In preparing libraries for sequencing, we pursued tagmentation using Tn5 transposase because addition of sequencing adapters is sterically impossible within the ~40 bp distal ends of a PCR amplicon 72 . The steric restrictions of Tn5 were advantageous for pruning away the long, A-repetitive universal primer from poly(A) amplicons that would otherwise be wastefully sequenced. Commercial Tn5 tagmentation kits (Nextera XT) require 1000-fold less material than past microarray hybridizations, prompting reevaluation of how the 10-cell libraries were prepared. We retained the mid-logarithmic reamplification approach described previously 41 but substituted paramagnetic Solid Phase Reversible Immobilization (SPRI) beads for library purification 73 . Two rounds of purification with 70% (vol/vol) SPRI beads eliminated ~99% of primer dimers and concatemers in 10-cell reamplifications from various sources (Figs 6 and S7). Reamplified samples yielding at least 200 ng of purified product ( Supplementary Fig. S8) were tagmented at 1-ng scale according to the Nextera XT protocol. Although poly(A) amplicon sizes are centered at ~500 bp (Fig. 6A), we found that the higher SPRI bead ratio recommended for 300-500 bp inputs (180% [vol/vol] beads) was essential for purification of tagmented libraries ( Supplementary Fig. S9). Under these conditions, both new and archival poly(A) PCR preamplifications are compatible with RNA sequencing.

Paired comparison of 10-cell transcriptomics by BeadChip microarray and RNA-seq. Poly(A) PCR provides an abundant source of material for transcript quantification, creating an opportunity to revisit 10-cell samples profiled earlier on BeadChip microarrays. In the original application of stochastic profiling, 10-cell samples were locally microdissected from 3D spheroids of a clonal human breast-epithelial cell line 41 . We sequenced 18 biological replicates from this study (6.6 ± 2.3 million reads) along with three 10-cell pool-and-split controls that assessed technical variability 33,41 . Technical correlation was as high within pool-and-split replicates measured by RNA-seq as when the same replicates were measured by microarray (R ~ 0.9; Fig. 7B,C,D,F-H). For both platforms, undetectable genes in one technical replicate were quantified up to ~10^2 = 100 transcripts per million (TPM) or ~10^3.3 = 2000 BeadChip fluorescence intensity in another replicate. Among detected genes with at least one technical replicate yielding zero measured TPM, we found that RNA-seq correlated with BeadChip intensity across replicates (R ~ 0.4, p ~ 0; Supplementary Fig. S10A). The concordance between the two platforms strongly argues that transcript losses are authentic dropout events 74 , not artifacts of RNA-seq read depth or BeadChip detection sensitivity. Combining the reliable detection limits of 100 TPM (Fig. 7B,C,F) and ~250 ERCC copies/reaction (Fig. 4B), we predict (250 copies/reaction)/(10 cells/reaction × 100 TPM) = 250,000 mRNA copies per cell, consistent with published estimates 38 .
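The per-cell estimate follows from treating TPM as a fraction of the total mRNA pool. The sketch below restates that arithmetic; the function and argument names are hypothetical, and the only inputs are the two detection limits and the pool size quoted in the text.

```python
def mrna_copies_per_cell(copies_per_reaction: float,
                         cells_per_reaction: int,
                         tpm_at_limit: float) -> float:
    """Estimate total mRNA copies per cell from two matched detection limits.

    A transcript at `tpm_at_limit` makes up tpm_at_limit / 1e6 of all mRNA.
    If that same transcript is present at copies_per_reaction / cells_per_reaction
    copies per cell, the total pool is copies-per-cell divided by that fraction.
    """
    copies_per_cell = copies_per_reaction / cells_per_reaction
    return copies_per_cell * 1e6 / tpm_at_limit


total = mrna_copies_per_cell(copies_per_reaction=250, cells_per_reaction=10, tpm_at_limit=100)
print(f"{total:,.0f} mRNA copies per cell")
```

The intermediate value is 25 copies per cell at 100 TPM, so each TPM unit corresponds to 0.25 copies per cell under these limits.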
When 10-cell transcript representation was compared, we found that RNA-seq TPM and BeadChip microarray intensities were correlated (R ~ 0.6; Fig. 7A,E,I), albeit not as strongly as reported elsewhere 50,75 . Some genes yielded background fluorescence on microarrays but moderate-to-high TPM, likely due to BeadChip probe sequences absent from the amplicons generated by poly(A) PCR. Among genes with a median TPM > 1000 by RNA-seq, we identified 27 BeadChip probes exhibiting a median fluorescence less than 10^2.5 . The median distance of the 27 probes from the 3′ end of the corresponding gene was 845 bases (interquartile range: 492-1392 bases), upstream of the distal ~500 bp 3′ ends amplified by poly(A) PCR. The probe-independent nature of RNA-seq reinforces one of its critical advantages for 10-cell transcriptomics.

Figure 5 (partial caption). Above: Data are shown as the median inverse quantification cycle (40-Cq, gray) of n = 3 independent experiments (three amplification replicates per experiment). Differences with and without bead cleanup were assessed by Wilcoxon rank sum test. Below: Preamplifications were analysed by agarose gel electrophoresis to separate poly(A)-amplified cDNA and genomic amplification. Electrophoretic traces were analysed by densitometry to the left of the image, with genomic amplicons highlighted (arrows). Lanes were cropped by the indicated conditions for display but were electrophoresed on the same agarose gel and processed identically. The uncropped image is shown in Supplementary Fig. S13D.
We also evaluated quantitative concordance of the 18 10-cell samples measured both by BeadChip microarray and RNA-seq. The variance of 7713 genes was twice their mean value measured on each platform, suggesting significant biological variation across the 18 samples (p < 0.01). For biologically variable genes, the median sample-by-sample Pearson correlation between BeadChip microarray and RNA-seq was 0.42 (interquartile range: 0.16-0.63), with 599 transcripts showing R ≥ 0.8 ( Supplementary Fig. S10B). Considering a median TPM of 17 (interquartile range: 4-49) for the 10-cell data analysed, these cross-platform correlations fall within the range reported for TCGA microarrays and RNA-seq (R ~ 0.4-0.9) 75 . Our retrospective analysis indicates that 10cRNA-seq data corroborate BeadChip microarrays and provide broader access to 3′ mRNA ends not represented on oligonucleotide probe sets.
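A sample-by-sample correlation of this kind can be sketched as follows: for each gene, the vector of values across the matched samples on one platform is correlated with the corresponding vector on the other, and the per-gene coefficients are then summarized by their median. The gene names and values below are invented toy data, the function names are hypothetical, and any transformation applied to the real intensities or TPM values before correlation is not specified here.

```python
import math
from statistics import median


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)  # undefined if either vector is constant


def per_gene_correlations(array_by_gene, seq_by_gene):
    """Correlate each shared gene across matched samples on the two platforms."""
    shared = array_by_gene.keys() & seq_by_gene.keys()
    return {gene: pearson(array_by_gene[gene], seq_by_gene[gene]) for gene in shared}


# Toy data only: three hypothetical genes measured in four matched samples
array_values = {"geneA": [1, 2, 3, 4], "geneB": [1, 2, 3, 4], "geneC": [1, 2, 3, 4]}
seq_values = {"geneA": [2, 4, 6, 8], "geneB": [4, 3, 2, 1], "geneC": [1, 3, 2, 4]}

correlations = per_gene_correlations(array_values, seq_values)
print(median(correlations.values()))
```

Reporting the median together with the interquartile range, as in the text, summarizes the whole distribution of per-gene coefficients instead of a single pooled correlation.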
Advantages of 10cRNA-seq for diverse mouse and human cell types. Last, we aggregated the intermediate revisions to 10-cell transcriptomic profiling (Fig. 1) and asked whether there were more-overarching benefits to sequencing small pools versus single cells. Different methods for scRNA-seq have already been rigorously compared by multiple groups 48,69 . Since 10-cell sampling could be adopted by many of these approaches, we focused instead on the data quality from published scRNA-seq datasets of various types relative to similar cells profiled by our 10cRNA-seq approach, including biological replicates and pool-and-split controls. We identified two scRNA-seq datasets for murine OPCs 76,77 , two for murine lung neuroendocrine cells 78 , two for human breast cancer 79,80 , and one for MCF-10A cells 81 (Supplementary Table S1). All raw data were identically processed and aligned to the transcriptome with RSEM 82 . Using transcriptome references stringently emphasized exonic read alignments, and the RSEM model for expectation maximization enabled the degeneracy of 3′-end sequences to contribute to transcript quantification. Data quality was gauged by the percentage of reads aligned, and sensitivity was assessed by the number of Ensembl genes with an estimated TPM greater than one.
For the mouse cell types, we observed significant increases in gene detection between 10cRNA-seq and certain scRNA-seq datasets (Fig. 8A). OPCs isolated by fluorescence-guided LCM showed increased gene detection with 10cRNA-seq compared to scRNA-seq of OPCs purified by fluorescence-activated cell sorting (GSE75330) 77 . Gene detection in the sorted OPCs was poorer than when OPCs were collected randomly in a cell atlas of the mouse cortex (GSE60361) 76 , emphasizing the stresses caused by non-LCM methods of enrichment. We were unable to detect a significant increase in gene detection between small-cell lung cancer cells profiled by 10cRNA-seq and single neuroendocrine cells randomly dissociated from the mouse airway and profiled by plate-based scRNA-seq 78 . However, neuroendocrine cells are so rare in this tissue that plate-based scRNA-seq was very underpowered (n = 5 cells). When droplet-based scRNA-seq was used to increase statistical power to n = 92 cells, there was a significant reduction in gene counts compared to 10cRNA-seq profiling the equivalent of 120 cells (n = 12 10-cell replicates). Results were similar but even more striking for human cell types (Fig. 8B). 10cRNA-seq of MCF-10A cells and primary breast cancer cells showed high alignment rates and routinely detected more than 10,000 Ensembl genes, the upper limit for any single cell profiled by three different scRNA-seq methods [79][80][81] . In cases where gene sensitivities were comparable, we noted dramatically improved alignment rates for 10cRNA-seq (Fig. 8C,D), reinforcing the efficiency of data collection by adopting a 10-cell approach.
Figure 6 (partial caption). [...] Supplementary Fig. S13E. (B) Contaminating low molecular-weight concatemers are significantly reduced after two rounds of SPRI bead purification. Data are shown as the mean (gray) of n = 3 independent reamplifications (circles) each purified three times (+). Differences were assessed by two-way ANOVA with replication. The uncropped gel image used for concatemer densitometry is shown in Supplementary Fig. S13F (upper).

The increased detection of transcripts in 10cRNA-seq data could arise from the accumulation of sporadic gene-expression events among single cells in the 10-cell pool. 10cRNA-seq collects 10-cell pools that are histologically indistinguishable by LCM, but it does not control for noisy transcriptional bursting or differences in cell-cycle phase. To evaluate whether the 10cRNA-seq detection statistics were consistent with those from scRNA-seq data, we randomly combined similar single-cell transcriptomes into 10-cell groups, modeling dropouts as a binomial probability for RNA-to-cDNA conversion (see Methods). We aggregated 48 random 10-cell assemblies within each of the six scRNA-seq datasets 76-81 and noted a significant increase in gene counts that was comparable to 10cRNA-seq data (Supplementary Fig. S11). On a per-cell basis, 10cRNA-seq matches the gene-recovery sensitivity of scRNA-seq and may be preferable when isolating single cells in situ is critical.
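The logic of the random 10-cell assemblies can be illustrated with a minimal simulation. This is only a sketch under stated assumptions and does not reproduce the procedure in Methods: it assumes each cell is a list of integer per-gene molecule counts before conversion, that every molecule is converted to cDNA independently with one shared probability, and that a gene counts as detected when at least one converted molecule remains. All names, the conversion probability of 0.3, and the toy count matrix are hypothetical.

```python
import random


def reverse_transcribe(counts, p_convert, rng):
    """Binomially thin per-gene molecule counts: each molecule is converted
    to cDNA independently with probability p_convert."""
    return [sum(rng.random() < p_convert for _ in range(n)) for n in counts]


def genes_detected_in_pool(cells, p_convert, rng, pool_size=10):
    """Draw pool_size cells without replacement, convert each independently,
    sum the converted counts per gene, and count genes with a nonzero total."""
    pool = rng.sample(cells, pool_size)
    totals = [0] * len(pool[0])
    for cell in pool:
        for gene, converted in enumerate(reverse_transcribe(cell, p_convert, rng)):
            totals[gene] += converted
    return sum(total > 0 for total in totals)


rng = random.Random(0)  # fixed seed so the toy run is repeatable
# Toy input: 50 hypothetical cells x 200 genes with sparse, low counts
cells = [[rng.choice([0, 0, 0, 1, 2]) for _ in range(200)] for _ in range(50)]

single = [sum(c > 0 for c in reverse_transcribe(cell, 0.3, rng)) for cell in cells]
pooled = [genes_detected_in_pool(cells, 0.3, rng) for _ in range(48)]  # 48 assemblies
print("mean genes detected per single cell:", sum(single) / len(single))
print("mean genes detected per 10-cell pool:", sum(pooled) / len(pooled))
```

Because a gene missed in one cell can still be captured from another cell in the same pool, the pooled gene count in such a model rises above the single-cell count without any change in per-molecule conversion efficiency.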

Discussion
Single-cell transcriptomics has expanded or rewritten the catalog of cell types in tissues, organs, and organisms 78,83-88 . Yet, scRNA-seq does not obviate the need for complementary approaches, which accurately profile regulatory-state changes within a given cell lineage 43 . The technical advances reported here demonstrate the immediate feasibility of 10cRNA-seq for mouse and human samples obtained in situ by LCM. We combined straightforward extensions of ERCC spike-ins and tagmentation with new approaches for fluorescence-guided LCM and cDNA purification that may prove beneficial for other applications (Fig. 1). Although small-sample RNA-seq is never fully dissociated from tissue acquisition or cell handling, our data illustrate a workflow that can be paused and restarted when LCM is used as an intermediate step.
Previous descriptions of fluorescence-guided LCM relied upon exogenous fluorophores added by lectins, antibodies, or viruses 27,55,56,89 . Through careful optimization of cryoembedding and LCM, we identified conditions that preserved the most-common fluorescent proteins used to engineer the mouse germ line. Compatibility with genomically encoded labels creates new opportunities for combining 10cRNA-seq with lineage tracing 90 to examine early regulatory-state changes in development and disease. Compared to fluorophore localization, RNA integrity was not as exquisitely sensitive to sample preparation and handling. Nevertheless, we recommend fresh cryoembedding of all samples in case other protein-guided approaches, such as immuno-LCM 91 , might be pursued. The breast core biopsies profiled here were prospectively obtained and cryoembedded during an outpatient procedure. However, a nearly identical protocol has been deployed intra-operatively for surgical pathology 92 , implying that fresh cryoembedding is not prohibitive for biobanked clinical samples.
A startling result from the revised protocol was the extent of poly(A) amplification observed in murine samples when reverse transcription was omitted. Nonspecific amplification was not as prominent in human samples obtained by LCM, pointing to specific differences in genome composition and the susceptibility to priming with AL1. A plausible explanation lies in transposable elements, specifically the distinct classes of short interspersed nuclear elements (SINEs) in rodents and humans 93 . Human-specific Alu SINEs and rodent-specific B-type SINEs both contain stretches of 10-20 As that could partially anneal to the T homopolymer sequence on the 3′ end of AL1 94 . However, to amplify during poly(A) PCR, an antisense SINE must be sufficiently nearby. The mouse genome is ~20% smaller than the human genome, and B-type SINEs are ~25% more numerous in mice compared to Alu SINEs in humans 93 . The differences reduce the expected spacing of sense-antisense SINEs from ~6 kb in humans to ~4 kb in mice, consistent with a prior analysis of sense-antisense SINEs around transcription start sites 95 . The shorter average spacing may be close enough for genomic fragments to compete with the ~500 bp cDNA amplicons generated during reverse transcription (Figs 3, 5A). Such nonspecific products were prevented from coamplifying with cDNA by using biotinylated oligo(dT) 24 and streptavidin beads, akin to the bead capture and primer extension of droplet-based approaches 34,96 . This strategy may prove useful in other non-murine settings, such as suspension cells, where genomic contamination will be more extensive than with LCM 42 .
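The change in expected SINE spacing can be reproduced from the two ratios quoted above. The sketch assumes that mean spacing scales as genome length divided by element count, which holds only if elements are spread roughly uniformly; that scaling rule and the function name are assumptions made for illustration.

```python
def scaled_spacing_kb(reference_spacing_kb: float,
                      genome_size_ratio: float,
                      element_count_ratio: float) -> float:
    """Scale a reference mean spacing by relative genome size and element count,
    assuming spacing is proportional to genome length / number of elements."""
    return reference_spacing_kb * genome_size_ratio / element_count_ratio


# Human sense-antisense spacing ~6 kb; mouse genome ~20% smaller (ratio 0.80);
# B-type SINEs ~25% more numerous than Alu SINEs (ratio 1.25)
mouse_spacing = scaled_spacing_kb(6.0, genome_size_ratio=0.80, element_count_ratio=1.25)
print(f"expected mouse spacing: ~{mouse_spacing:.1f} kb")
```

The two ratios combine to a factor of 0.64, which takes ~6 kb to just under 4 kb, in line with the ~4 kb stated in the text.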
ERCC spike-ins provide a standard to compare 10cRNA-seq against single-cell methods for transcriptomic profiling. Using the metrics of Svensson et al. 48 , we estimate a 50% detection sensitivity of 45 copies per reaction (90% nonparametric CI: ) and a Pearson product-moment correlation coefficient of R = 0.86 (90% nonparametric CI: [0.71-0.91] from n = 72 samples). The R accuracy is somewhat lower than prevailing techniques, but that may be overly pessimistic because 10cRNA-seq uses such a dilute mix of spike-ins (4 million-fold dilution of the ERCC stock). Detection sensitivity is comparable to that reported for the most popular plate-based scRNA-seq methods, including SMART-seq2 97 and CEL-seq 98 . The strength of 10cRNA-seq lies in the use of 10-cell pooling to improve the per-cell technical sensitivity beyond the best microfluidic- and droplet-based approaches for scRNA-seq 48 . LCM minimizes disruptive tissue handling and provides histologic cues for microdissecting pools of cells within the same lineage. Adopting a 10-cell approach may prove similarly beneficial for other microdissection-based approaches, such as GEO-seq 26 and the recent pairing of SMART-seq2 with LCM 28 .
When 10cRNA-seq was compared to scRNA-seq, we often observed significant improvements in exonic alignment. Methods for scRNA-seq typically yield exonic alignment rates below 50% 81 , with the remainder of aligned reads splitting equally between intronic and intergenic sequences 97 . 10cRNA-seq achieves exonic alignments of 70% or higher despite using oligo(dT)-primed reverse transcription with the same potential to prime internal A homopolymer sequence as with scRNA-seq 99,100 . Interestingly, in one instance of similarly high exonic alignment (GSE66357, Fig. 8B), the RNA-printing approach to scRNA-seq incorporated a DNase treatment absent from all other methods 81 . This study also yielded a significantly reduced gene-detection sensitivity compared to 10cRNA-seq. Commingling genomic DNA may dilute exonic alignment percentages and inflate the number of genes detected due to chance sequencing of genomic DNA from exonic loci. Multiple scRNA-seq approaches incorporate unique molecular identifiers appended to oligo(dT) 48,81,101 . The identifiers avoid redundantly counting the same product of reverse transcription, and they also retrospectively exclude sequenced reads that do not come from cDNA. The biotin cleanup approach we devised for mouse cells (Fig. 5) achieves cDNA selection prospectively in situations where genomic contamination may be problematic.
Our work illustrates that 10-cell profiling can extend beyond microarrays 45 and quantitative PCR 39,40 to compete favorably with scRNA-seq. Although ill-suited for lineage mapping of highly mixed cell populations 43 , 10cRNA-seq exploits the precision of LCM to target specific cell types in situ and define their regulatory heterogeneities. LCM is also advantageous for sequencing cells that are delicate or difficult to dissociate rapidly 28 . We anticipate immediate applications of 10cRNA-seq to cancer biology, where the initiation, progression, and diversification of tumors could be tackled in modern animal models as well as in patients.

Materials and Methods
Cell and tissue sources. The MCF10A-5E breast epithelial cell samples were described previously 41 . KP1 small-cell lung cancer cells 71 were grown as spheroids in RPMI Medium 1640 with 10% FBS, 1% penicillin-streptomycin, and 1% glutamine. KP1 spheroids were pelleted and mixed in Neg-50 (Richard-Allan Scientific) before cryoembedding. Animal housing and experimental procedures were carried out in compliance with regulations and protocols approved by the IACUC at the University of Virginia. Cspg4-CreER;Trp53 F/F ;Nf1 F/F ;Rosa26-LSL-tdT mice 58 were housed in accordance with IACUC Protocol #3955 at the University of Virginia. As per the approved protocol, animals were administered 200 mg/kg tamoxifen by oral gavage for five days, and brains were harvested at 12 days or 183 days after the last administration. A labelled glioma arising in the olfactory bulb at 165 days after the last tamoxifen administration was also used. Human sample acquisition and experimental procedures were carried out in compliance with regulations and protocols approved by the IRB-HSR at the University of Virginia. In accordance with IRB Protocol #19272, breast cancer samples were collected as ultrasound-guided core needle biopsies during diagnostic visits from participants who provided informed consent. Each core biopsy was divided into multiple pieces before cryoembedding. Unless otherwise indicated, all samples were freshly cryoembedded in a dry ice-isopentane bath and stored at −80 °C wrapped in aluminium foil.
Cryosectioning. Samples were equilibrated to −24 °C in a cryostat before sectioning. 8 µm sections were cut and wicked onto Superfrost Plus slides. To preserve fluorescence localization of tdT and EGFP, slides were precooled on the cutting platform for 15-30 sec before wicking, and the section was carefully placed atop the cooled slide with forceps equilibrated at −24 °C. Then, the slide was gently warmed from underneath by tapping with a finger until the section was minimally wicked onto the slide. All wicked slides were stored in the cryostat before transfer to −80 °C storage on dry ice. Frost build-up was minimized by storing cryosections in five-slide mailers.
Staining, dehydration, and laser-capture microdissection. For cryosections lacking fluorophores, slides were stained and dehydrated as described previously 41,42 . Briefly, slides were fixed immediately in 75% ethanol for 30-60 sec, rehydrated quickly with water, stained with nuclear fast red (Vector Labs) containing 1 U/ml RNAsin-Plus (Promega) for 15 sec, and rinsed two more times with water before dehydrating with 70% ethanol for 30 sec, 95% ethanol for 30 sec, and 100% ethanol for 1 min and clearing with xylene for 2 min. tdT- and EGFP-labelled cryosections were not stained and instead began with the 70% ethanol dehydration step that also provided solvent fixation. After air drying, slides were microdissected immediately on an Arcturus XT LCM instrument (Applied Biosystems) using Capsure HS caps (Arcturus). The smallest spot size was used, and typical instrument settings of ~50 mW power and ~2 msec duration yielded ~25 µm spot diameters capturing 1-3 cells per laser shot.
RNA extraction and first-strand synthesis. RNA extraction and first-strand synthesis were similar to earlier protocols 41,42 with some minor modifications. Capsure HS caps were eluted for 1 hr at 42 °C with 4 µl of digestion buffer containing 1.25x First-strand buffer (Invitrogen), 100 µM dNTPs (Roche), 0.08 OD/ml oligo(dT) 24 with or without 5′-biotin modification (IDT), and 250 µg/ml proteinase K (Sigma). Samples containing ERCC spike-ins included a four-million-fold dilution of ERCC spike-in mixture 1 (Ambion). Eluted samples were centrifuged into 0.5 ml PCR tubes at 560 rcf for 2 min, and the digestion buffer was quenched with 1 µl of digestion stop buffer containing 2 U/µl SuperAse-in (Invitrogen) and 5 mM freshly prepared PMSF (Sigma). 4.5 µl of the quenched extract was transferred to a 0.2 ml PCR tube, and reverse transcription was performed with 0.5 µl of SuperScript III (Invitrogen) for 30 min at 50 °C followed by heat inactivation at 70 °C for 15 min. Samples were placed on ice and centrifuged for 2 min at 18,000 rcf on a benchtop microcentrifuge.
Streptavidin bead cleanup of biotinylated first-strand products. For 5′-biotin-containing samples, streptavidin magnetic beads (Pierce) were prepared in a 0.2 ml PCR tube on a 96S Super Magnet Plate (Alpaqua). Beads (6 µl per sample) were magnetized, aspirated, and resuspended in binding buffer (5 µl per sample) containing 1x First-strand buffer (Invitrogen), 4 M NaCl, and 0.02% (vol/vol) Tween-20. 5 µl of resuspended beads were added after first-strand synthesis, and samples were incubated for 60 min at room temperature with mixing every 15 min. Beads were pelleted on the magnet plate, resuspended in 100 µl high-salt wash buffer (50 mM Tris
Poly(A) PCR re-amplification. For sequencing, poly(A) PCR cDNA samples were reamplified as before 41,42 in a 100 µl PCR reaction containing 1x High-Fidelity buffer (Roche), 3.5 mM MgCl 2 , 200 µM dNTPs (Roche), 100 µg/ml BSA (Roche), 5 µg AL1 primer, 1 µl Expand High Fidelity polymerase (Sigma), and 1 µl of poly(A) PCR sample. Each reaction was amplified according to the following thermal cycling scheme: 1 min at 94 °C (denaturation), 2 min at 42 °C (annealing), and 3 min at 72 °C (extension). The appropriate number of PCR cycles was determined by a pilot reamplification containing 20 µl of the PCR reaction above plus 0.25x SYBR Green monitored on a CFX96 real-time PCR instrument (Bio-Rad). The number of amplification cycles for each sample was selected to ensure that the reamplification remained in the exponential phase and there was sufficient cDNA for SPRI bead purification (typically 5-12 cycles).
qPCR. For detection of specific targets in poly(A) PCR samples, qPCR was performed on a CFX96 real-time PCR instrument (Bio-Rad) as previously described 102 . 0.1 µl or 0.01 µl of each preamplification was used with the qPCR primers listed in Supplementary Table S2. For relative quantification between ERCC spike-ins, qPCR amplicons were purified by gel electrophoresis, extracted, ethanol precipitated, and quantified by spectrophotometry. Purified amplicons were used to create a six-log standard curve based on ERCC amplicon copy number. All spike-ins were normalized to ERCC130 copy numbers to obtain relative abundance.
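The standard-curve quantification described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the standard copy numbers, the Cq values, the function names, and the use of an ordinary least-squares fit of Cq against log10 copy number are all hypothetical and are not taken from the protocol itself.

```python
import math

def fit_standard_curve(copy_numbers, cq_values):
    """Least-squares fit of Cq = slope * log10(copies) + intercept."""
    xs = [math.log10(c) for c in copy_numbers]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(cq_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, cq_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def copies_from_cq(cq, slope, intercept):
    """Invert the standard curve to estimate amplicon copy number from a Cq."""
    return 10 ** ((cq - intercept) / slope)

def relative_abundance(copies_by_spike_in, reference="ERCC130"):
    """Normalize each spike-in copy number to the reference spike-in."""
    ref = copies_by_spike_in[reference]
    return {name: value / ref for name, value in copies_by_spike_in.items()}

# Hypothetical dilution series spanning six logs (10^1 to 10^7 copies),
# with idealized Cq values for illustration only.
standard_copies = [10 ** k for k in range(1, 8)]
standard_cq = [40.0 - 3.32 * k for k in range(1, 8)]
slope, intercept = fit_standard_curve(standard_copies, standard_cq)
```

In use, each spike-in would need its own curve built from its purified amplicon, and the resulting copy estimates would then be passed to `relative_abundance`.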

SPRI bead purification. Re-amplified samples were purified twice with 70% (vol/vol) Agencourt AMPure XP SPRI beads. SPRI beads were equilibrated to room temperature for 30 min, and 70 µl beads were added to the 100 µl reamplification product. After a 15-min incubation at room temperature, samples were magnetized for 5 min. The supernatant was removed with a gel-loading pipette tip, leaving ~5 µl volume in the well. Beads were gently washed twice on the magnet with 200 µl freshly prepared 80% (vol/vol) ethanol and aspirated with a gel-loading pipette tip. Residual ethanol was removed after the second wash, and beads were air-dried at room temperature for 10 min before resuspension in 10 µl elution buffer (10 mM Tris-HCl [pH 8.5]). Samples were magnetized at room temperature for 1 min, and the eluted supernatant was transferred to a new 0.2 ml PCR tube. The 10 µl elution was purified a second time with 7 µl beads and the same incubation, ethanol wash, and elution conditions as the first purification.
RNA sequencing and analysis. Bead-purified cDNA libraries were quantified with the Qubit dsDNA BR Assay Kit (Thermo Fisher) using a seven-point standard curve and a CFX96 real-time PCR instrument (Bio-Rad) for detection. Samples were diluted to 0.2 ng/µl before tagmentation with the Nextera XT DNA Library Preparation Kit (Illumina) according to the manufacturer's earlier recommendation to purify libraries with 180% (vol/vol) SPRI beads (Supplementary Fig. S9). For each run, samples were multiplexed at an equimolar ratio, and 1.3 pM of the multiplexed pool was sequenced on a NextSeq 500 instrument with NextSeq 500/550 Mid/High Output v1/v2 kits (Illumina) to obtain 75-bp paired-end reads at an average depth of 4.2 million reads per sample (Supplementary Fig. S8) or 7.5 million reads per sample (all others).
Simulated read depths of 10cRNA-seq data from MCF10A-5E cells confirmed saturation of gene detection above ~5 million reads per sample (Supplementary Fig. S12). Adapters were trimmed using fastq-mcf in the ea-utils package (version ea-utils.1.1.2-537) with the following options: -q 10 -t 0.01 -k 0 (quality threshold 10, 0.01% occurrence frequency, no nucleotide skew causing cycle removal). Quality checks were performed with FastQC (version 0.11.7) and MultiQC (version 1.5). Datasets were aligned to either the human (GRCh38.84) or the mouse (GRCm38.82) transcriptome along with reference sequences for ERCC spike-ins, using RSEM (version 1.3.0) with the following options: --bowtie2 --single-cell-prior --paired-end (Bowtie2 transcriptome aligner, single-cell prior to account for dropouts, paired-end reads). RSEM read counts were converted to transcripts per million (TPM) by dividing each value by the total read count for each sample and multiplying by 10^6. Mitochondrial genes and ERCC spike-ins were not counted towards the total read count during TPM normalization. The number of genes with TPM > 1 for each sample was calculated relative to the number of unique Ensembl IDs for the organism excluding ERCC spike-ins.
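The count-to-TPM conversion and gene-detection metric described above can be illustrated with a minimal Python sketch. The function names, gene labels, and read counts are hypothetical. The sketch also assumes that excluded genes (mitochondrial genes and spike-ins) are still scaled by the same total even though they do not contribute to it, which the text does not state explicitly.

```python
def counts_to_tpm(counts, excluded=()):
    """Scale read counts to per-million values.

    Genes listed in `excluded` (e.g. mitochondrial genes, ERCC spike-ins)
    do not contribute to the total used for scaling.
    """
    excluded = set(excluded)
    total = sum(c for gene, c in counts.items() if gene not in excluded)
    if total <= 0:
        raise ValueError("no counted reads remain after exclusion")
    return {gene: c / total * 1e6 for gene, c in counts.items()}

def fraction_detected(tpm, n_annotated_genes, spike_ins=(), threshold=1.0):
    """Fraction of annotated genes (spike-ins excluded) with TPM above threshold."""
    spike_ins = set(spike_ins)
    detected = sum(
        1 for gene, value in tpm.items()
        if gene not in spike_ins and value > threshold
    )
    return detected / n_annotated_genes

# Hypothetical read counts for one sample
example_counts = {"GENE_A": 300, "GENE_B": 700, "MT-CO1": 500, "ERCC_SPIKE": 1000}
example_tpm = counts_to_tpm(example_counts, excluded={"MT-CO1", "ERCC_SPIKE"})
```

Here `n_annotated_genes` stands in for the number of unique Ensembl IDs for the organism, which would come from the annotation used for alignment.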
Analysis of public scRNA-seq datasets. FASTQ files were downloaded from GSE75330, GSE60361, GSE103354 (plate-based), GSE66357, GSE113197, and PRJNA396019. FASTQ files were not available for the droplet-based dataset of GSE103354; therefore, BAM files were downloaded from SRR7621182 and converted to FASTQ format. Adapters were trimmed using fastq-mcf with the following options: -q 10 -t 0.01 -k 0 (quality threshold 10, 0.01% occurrence frequency, no nucleotide skew causing cycle removal). To compare with the other datasets, seqtk (version 1.3) was used to clip 15 bp unique molecular identifiers from the beginning of sequences in GSE60361 and GSE75330. All RNA-seq datasets were aligned to either the human (GRCh38.84) or the mouse (GRCm38.82) transcriptome as well as reference sequences for ERCC spike-ins, using RSEM with the following options: --bowtie2 --single-cell-prior (Bowtie2 transcriptome aligner, single-cell prior to account for dropouts). GSE103354 (plate-based), GSE113197, and PRJNA396019 also used --paired-end (paired-end reads). TPM conversion and gene detection quantification were calculated as above. For post-hoc pooling (Supplementary Fig. S11), individual scRNA-seq profiles were selected at random (n = 48 per dataset) and grouped with the nine scRNA-seq profiles in the dataset that were nearest by Jaccard distance. To model dropouts, TPM values for each scRNA-seq profile were scaled to expected copies per cell assuming 250,000 mRNA copies per cell 38 and transmitted to the 10-cell pool as a binomial random variable (N = expected copies per cell, p = RNA-to-cDNA conversion efficiency = 10% for Supplementary Fig. S11). Post-hoc pooling results were similar up to a conversion efficiency of ~30%.
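The post-hoc pooling procedure can be sketched as follows. This is a minimal illustration rather than the analysis code used for Supplementary Fig. S11. It assumes that Jaccard distance is computed on the sets of genes with nonzero TPM, that expected copies are rounded to integers, and that the pool is the sum of independent binomial draws across its member cells. All function names are hypothetical.

```python
import random

MRNA_COPIES_PER_CELL = 250_000  # per-cell mRNA content assumed in the text

def jaccard_distance(profile_a, profile_b):
    """Jaccard distance between the sets of detected genes (assumed: TPM > 0)."""
    set_a = {g for g, v in profile_a.items() if v > 0}
    set_b = {g for g, v in profile_b.items() if v > 0}
    union = set_a | set_b
    if not union:
        return 0.0
    return 1.0 - len(set_a & set_b) / len(union)

def nearest_profiles(seed_index, profiles, k=9):
    """Indices of the k profiles closest to the seed profile by Jaccard distance."""
    seed = profiles[seed_index]
    others = [i for i in range(len(profiles)) if i != seed_index]
    others.sort(key=lambda i: jaccard_distance(seed, profiles[i]))
    return others[:k]

def tpm_to_copies(tpm_value, copies_per_cell=MRNA_COPIES_PER_CELL):
    """Scale a TPM value to an expected integer copy number per cell."""
    return round(tpm_value * copies_per_cell / 1e6)

def binomial_draw(n, p, rng):
    """Number of successes in n Bernoulli(p) trials."""
    return sum(1 for _ in range(n) if rng.random() < p)

def pool_profiles(profiles, efficiency=0.10, seed=0):
    """Sum binomially thinned copy numbers across the cells in one pool."""
    rng = random.Random(seed)
    pooled = {}
    for profile in profiles:
        for gene, tpm_value in profile.items():
            copies = tpm_to_copies(tpm_value)
            pooled[gene] = pooled.get(gene, 0) + binomial_draw(copies, efficiency, rng)
    return pooled
```

A 10-cell pool would be formed by passing a randomly chosen seed profile together with its nine nearest neighbours to `pool_profiles`. Raising `efficiency` toward 0.3 corresponds to the upper end of the range over which the text reports similar results.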
Paired analysis of BeadChip microarrays and 10cRNA-seq. Microarray data (GSE120030) 41 were batch processed with the lumi R package 103 using a detection threshold of 0.05 and simple scaling normalization to obtain log2-normalized values that were converted to log10-normalized values. Gene names from the BeadChip files were merged to the extent possible with Ensembl IDs from the RSEM alignments by using HUGO Gene Nomenclature synonym tables to match current and retired gene names.
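For reference, the conversion between logarithm bases mentioned above is a constant rescaling. The identity below is standard change-of-base arithmetic and is not specific to the lumi package:

```latex
\log_{10} x \;=\; \frac{\log_{2} x}{\log_{2} 10} \;=\; \log_{2} x \cdot \log_{10} 2 \;\approx\; 0.30103\,\log_{2} x
```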

Monte Carlo simulations. Simulations of stochastic-profiling experiments were performed in MATLAB using StochProfGUI 42 . Each parameter set was run 50 times to measure median p values and nonparametric confidence intervals. False positives were called when the median p value was less than 0.05 for a unimodal population (expression fraction = 0). False negatives were called when the median p value was greater than 0.05 for a bimodal population (expression fraction ≠ 0).
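The calling rule for false positives and false negatives can be written compactly. The sketch below is in Python rather than MATLAB and does not reproduce StochProfGUI. The function name and the "true positive" and "true negative" labels are introduced here for illustration only.

```python
from statistics import median

ALPHA = 0.05  # significance threshold stated in the text

def classify_parameter_set(p_values, expression_fraction, alpha=ALPHA):
    """Classify one parameter set from the p values of its repeated runs.

    expression_fraction == 0 denotes a unimodal population; any other
    value denotes a bimodal population.
    """
    median_p = median(p_values)
    if expression_fraction == 0:
        # Unimodal truth: a significant median p value is a false positive.
        return "false positive" if median_p < alpha else "true negative"
    # Bimodal truth: a non-significant median p value is a false negative.
    return "false negative" if median_p > alpha else "true positive"
```

Each call would receive the 50 p values obtained for one parameter set together with the expression fraction used to simulate it.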

Data Availability
All 10cRNA-seq data are available through the NCBI Gene Expression Omnibus (GSE120261).
Step-by-step protocols for 10cRNA-seq, including critical steps and troubleshooting, are available here as a Supplementary Note and will be maintained on the Janes Laboratory website (http://bme.virginia.edu/janes/protocols/).