Abstract
Advances in single-cell technologies have highlighted the prevalence and biological significance of cellular heterogeneity. A critical question researchers face is how to design experiments that faithfully capture the true range of heterogeneity from samples of cellular populations. Here we develop a data-driven approach, illustrated in the context of image data, that estimates the sampling depth required for prospective investigations of single-cell heterogeneity from an existing collection of samples.
Acknowledgements
We thank M. Calvert, T.D. Tlsty and P.B. Stark for helpful discussions. This work was supported by the NCI K08CA175143 (C.E.A.), P01HL088594 (J.S.M.), a Conquer Cancer Foundation Young Investigator Award from the Scopus Foundation (J.D.G.), a gift from the Edmund Wattis Littlefield Foundation (R.S.W.), NSF PHY-1545915 (S.J.A.), Stand Up To Cancer (S.J.A.), NCI R01 CA133253 (S.J.A.), NCI R01 CA185404 (L.F.W.) and NCI R01 CA184984 (L.F.W.), and the Institute of Computational Health Sciences (ICHS) at UCSF (S.J.A. and L.F.W.).
Author information
Contributions
S.R., S.J.A. and L.F.W. conceived of and designed the study. L.E.H., J.D.G., K.M.B. and A.K.W. performed the experiments and/or provided data. S.R., J.A., S.J.A., and L.F.W. developed the algorithms; and S.R. performed the analysis. R.S.W. and A.K.W. contributed samples. The manuscript was written by S.R., S.J.A. and L.F.W. with contributions from C.E.A. and J.S.M.
Ethics declarations
Competing interests
The authors declare no competing financial interests.
Integrated supplementary information
Supplementary Figure 1 KS’ improves sensitivity to detect changes at the tails of distributions.
A) The probability density functions (PDF, top), cumulative distribution functions (CDF, middle) and difference in CDF with respect to the whole tumor (bottom) are shown for two virtual TMA cores (red/blue curves) and the whole tissue (green). The x-axis is on a log scale. Vertical dotted lines were obtained by automatically thresholding intensities into low, medium and high levels. The KS statistic with respect to the whole tissue (i.e. the maximum value of the CDF difference) is indicated by the double-sided arrow and does not reflect the fact that Core 1 contains a greater proportion of high-intensity cells than the whole tissue.
B) Images of tissue showing intensities of TTF staining in the nucleus. Left: raw intensity levels; Right: classification of nuclei into low, medium and high intensity levels as used in A.
C) Subsample CDFs show reduced variation at tails of distributions. Top: in green is shown the whole tumor CDF and in black are the CDFs of the intensities of 100 cell subsamples generated using the whole tumor distribution. The magnitude of CDF variation is clearly smaller near the tails. Bottom: the standard deviations across the random CDFs above at each value of the intensity show excellent agreement with the theoretical prediction.
D) Shown are the CDF differences in the bottom row of A scaled using the theoretical scaling factor shown in C. The KS’ (maximal differences represented by double sided arrows) reflects the fact that Core 1 has a higher proportion of high intensity cells.
E) Comparison of KS and KS’: This plot is generated in essentially the same way as Fig. 1D, except with the median difference replaced by the KS. Top: Each point represents a single sampling run from the tissue in Fig. 1B, with color denoting the number of cores (as in Fig. 1). As with the median difference, the KS’ also bounds the KS. The KS’ and the KS are numerically equal when the major difference between whole tissue and sample appears close to the median (x=y diagonal). However, the KS’ is able to pick up differences at the tails more sensitively (the lower right quadrant represents samples considered more similar to the whole by the KS than by the KS’). Bottom: CDFs of the KS (dotted line) and KS’ (solid line) as a function of the number of cores (colors coded as in Fig. 1D) show that the KS and KS’ scores for this sample are reasonably close, though the KS always gives a slightly higher confidence of obtaining a lower score.
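The contrast between the KS and a tail-sensitive, scaled variant can be illustrated with a minimal sketch. It is not the authors' implementation (which is in the Supplementary Software), and the scaling used here is an assumption: the CDF difference is weighted by 0.5 / sqrt(F(1 − F)), where F is the whole-tissue CDF, motivated by the binomial standard deviation of a subsample CDF and normalized to 1 at the median. The function names, the `floor` guard and the toy data are hypothetical.

```python
import bisect
import math


def ecdf(sorted_values, x):
    """Fraction of entries in the ascending list `sorted_values` that are <= x."""
    return bisect.bisect_right(sorted_values, x) / len(sorted_values)


def ks_and_scaled_ks(sample, reference, floor=1e-3):
    """Return (KS, scaled KS) for `sample` measured against `reference`.

    The weight 0.5 / sqrt(F * (1 - F)) is an ASSUMED form of the scaling:
    it equals 1 at the reference median and grows toward the tails.
    `floor` is a hypothetical guard that keeps the weight finite where F
    is 0 or 1.
    """
    ref = sorted(reference)
    smp = sorted(sample)
    ks = 0.0
    ks_scaled = 0.0
    # Both CDFs are step functions, so checking every observed value suffices.
    for x in sorted(set(ref) | set(smp)):
        f_ref = ecdf(ref, x)
        diff = abs(ecdf(smp, x) - f_ref)
        weight = 0.5 / math.sqrt(max(f_ref * (1.0 - f_ref), floor))
        ks = max(ks, diff)
        ks_scaled = max(ks_scaled, diff * weight)
    return ks, ks_scaled


whole = list(range(1, 101))                         # toy "whole tissue" intensities
tail_core = whole + [96, 97, 98, 99, 100] * 4       # enriched in bright cells
mid_core = [v for v in whole if not 41 <= v <= 50]  # depleted just below the median

ks_tail, scaled_tail = ks_and_scaled_ks(tail_core, whole)
ks_mid, scaled_mid = ks_and_scaled_ks(mid_core, whole)
print(f"tail-enriched core:  KS={ks_tail:.3f}, scaled={scaled_tail:.3f}")
print(f"median-shifted core: KS={ks_mid:.3f}, scaled={scaled_mid:.3f}")
```

Because the assumed weight is never below 1, the scaled score in this sketch can only equal or exceed the KS, which matches the behavior described in panel E; whether the published KS’ uses exactly this normalization is not stated in the captions.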
Supplementary Figure 2 Comparison of IF staining across imaging sets
Two serial sections of the same liver cancer tissue sample were stained and imaged with the same biomarkers (DAPI/YAP) 5 months apart using different microscopes. Circles show the size of a 0.6-mm-diameter core. Areas between red lines depict automatically identified tissue edges or blurring/staining artifacts and are not used in downstream analysis.
Supplementary Figure 3 The effect of staining/imaging (i.e. Imaging Set 1 vs Set 2) on number of cores is robust to analysis parameters.
Each scatter sub-plot compares the number of cores (1-10) required to capture the heterogeneity in YAP nuclear expression across two imaging sets (similar to Fig. 2a). The number of cores depends on two analytical choices: the KS’ tolerance and desired confidence. The grid shows how different combinations of these analytical parameters (values shown on right/below) affect the resultant plot. Samples requiring more than 10 cores are not displayed. In some cases, dots are displayed on top of each other.
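How a tolerance and a confidence level jointly determine a number of cores can be sketched as follows. This is an illustrative simplification under stated assumptions, not the published pipeline: the whole tissue is taken to be the pool of all candidate cores, the plain two-sample KS stands in for the KS’, and `cores_needed`, `ks_distance` and the toy tissues are hypothetical names and data.

```python
import bisect
import random


def ks_distance(sample, reference_sorted):
    """Maximum absolute difference between the two empirical CDFs."""
    smp = sorted(sample)
    worst = 0.0
    for x in set(smp) | set(reference_sorted):
        d = abs(bisect.bisect_right(smp, x) / len(smp)
                - bisect.bisect_right(reference_sorted, x) / len(reference_sorted))
        worst = max(worst, d)
    return worst


def cores_needed(cores, tolerance=0.2, confidence=0.8,
                 max_cores=10, n_samplings=200, seed=0):
    """Smallest number of cores whose pooled cells fall within `tolerance`
    of the whole tissue in at least a `confidence` fraction of random
    samplings; returns None if no count up to `max_cores` suffices.

    `cores` is a list of lists of per-cell intensities (one list per
    candidate core); the whole tissue is ASSUMED to be their union.
    """
    rng = random.Random(seed)
    whole = sorted(v for core in cores for v in core)
    for k in range(1, min(max_cores, len(cores)) + 1):
        hits = 0
        for _ in range(n_samplings):
            pooled = [v for core in rng.sample(cores, k) for v in core]
            if ks_distance(pooled, whole) < tolerance:
                hits += 1
        if hits / n_samplings >= confidence:
            return k
    return None


# Toy tissues: every region alike versus every region at its own intensity.
homogeneous = [list(range(20)) for _ in range(10)]
heterogeneous = [[level] * 20 for level in range(10)]
n_homogeneous = cores_needed(homogeneous)
n_heterogeneous = cores_needed(heterogeneous)
print(n_homogeneous, n_heterogeneous)
```

Tightening `tolerance` or raising `confidence` can only keep the returned count the same or increase it, which is why the grid in this figure varies both parameters.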
Supplementary Figure 4 Comparison of the KS and KS’
A) Comparison of the number of cores needed to capture whole tissue heterogeneity based on the KS and KS’ for liver cancer samples stained with LKB1. Plots are constructed as in Fig. 2A and B, with numbers of cores chosen to ensure KS or KS’ < 0.2 with 80% confidence (the size of each point represents the number of specimens with the same number of suggested cores). Note that the KS’ always requires the same number of cores as the KS or more, but the disagreement (deviation from the diagonal) is sample specific, likely reflecting differences at the tails.
B & C) Comparison of the KS’ and KS scores (for whole tissue vs core distributions) arising from 1000 random 6-core samplings (points in plot) from the patient tissues indicated by the arrows. In B, the KS and KS’ are largely in numerical agreement (diagonal distribution of points), whereas in C the points in the lower right quadrant represent samplings that appear similar to the whole based on the KS but not the KS’. The numbers in red denote the fraction of points in each quadrant.
Supplementary Figure 5 Number of cores required to capture heterogeneity is biomarker (YAP/DAPI/Beta-Catenin/LKB1) dependent.
Across the panel of liver cancer patient specimens (25 in imaging set 1 and 38 in imaging set 2), the number of cores needed to capture heterogeneity (at a KS’ tolerance of 0.2 with 80% confidence) was calculated for pairs of co-stained biomarkers (x/y axes). Points on the dotted diagonal denote samples requiring the same number of cores for both biomarkers. As in Fig. 2B (the top left plot is identical to Fig. 2B), the size of each point denotes the number of specimens requiring the same number of cores.
Supplementary Figure 6 Relating the properties of single cell features (rows) to the number of replicate wells needed to capture their heterogeneity (leftmost plot, row order).
Except for the leftmost and rightmost plots, each column represents a feature trait (present – gray, absent – white). From left to right, panels indicate: 1) the number of wells needed, 2) feature class, 3) which of three biomarker images is used to calculate the feature, 4) which of 3 cellular compartments (D – nucleus, Y – cytoplasm, C – whole cell) are profiled by the feature, 5) the quantifier used in summarizing intensity features (MultiChannel – depends on 2 biomarkers, R – ratio of intensities, Ixx – xxth percentile, Iav – average intensity, Itot – sum of pixel intensities), 6) whether the feature measures off-target biomarker staining (e.g. nuclear marker in cytoplasm), 7) the extent to which a feature is discrete (a large number of cells share the same feature value) or continuous (no two cells share the same feature value). Note that features numbered 208-215 required > 20 wells (the maximum number tested) in at least one replicate plate, and the estimated number of wells required was set to 20.
Supplementary Figure 7 Robustness of TMA-based confidence estimates to specimen selection.
Confidence curves for the different markers were generated as in Fig. 2c, but by using (a random selection of) half the total number of available specimens (13/25 for imaging set 1 and 19/38 for imaging set 2). Random specimen selection was repeated 1000 times to generate a distribution of confidence values over different specimen-subsets at a given number of cores. A box-whisker plot was used to display the results. For a more conservative estimate of confidence level that includes inter-specimen effects, a user might consider using the lower bound on the box-whisker plot at a given number of cores.
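The half-cohort resampling described here can be sketched as below. The caption does not specify how per-specimen results are combined into a cohort-level confidence, so this sketch assumes a simple mean of per-specimen confidence values at a fixed number of cores; `half_cohort_confidence` and the toy input are hypothetical.

```python
import random
import statistics


def half_cohort_confidence(per_specimen_confidence, n_repeats=1000, seed=0):
    """Box-plot summary of cohort confidence over random half-cohorts.

    `per_specimen_confidence` holds one confidence value per specimen at a
    fixed number of cores. Averaging them into a cohort-level value is an
    ASSUMPTION made for illustration.
    """
    rng = random.Random(seed)
    n_half = (len(per_specimen_confidence) + 1) // 2  # 25 -> 13, 38 -> 19
    estimates = []
    for _ in range(n_repeats):
        subset = rng.sample(per_specimen_confidence, n_half)
        estimates.append(statistics.fmean(subset))
    q1, median, q3 = statistics.quantiles(estimates, n=4)
    return {"min": min(estimates), "q1": q1, "median": median,
            "q3": q3, "max": max(estimates)}


# Toy cohort of 25 specimens with confidences spread between 0.50 and 0.98.
toy_cohort = [0.5 + 0.02 * i for i in range(25)]
summary = half_cohort_confidence(toy_cohort)
print(summary)
```

Reading off the lower end of this summary (for example `summary["min"]` or `summary["q1"]`) corresponds to the more conservative estimate suggested in the caption.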
Supplementary Figure 8 Determining the number of TMA cores to capture whole tissue heterogeneity based on Immunohistochemistry (IHC) images.
A) Processing IHC images. Shown is a raw IHC image of a breast cancer tumor stained for hematoxylin and Ki67. The tumor area is outlined in green, and two virtual cores of 0.6 mm diameter are shown in red and blue. To the right are enlarged versions of the cores’ images, before and after deconvolution (to separate out DNA and Ki67 intensities).
B) Percentage of positive cells (PP) is a poor predictor of distribution. Shown are cumulative distributions of the nuclear intensity of the cores and tumor from A. The PP is the percentage of cells above a (manually) selected threshold intensity (i.e. 100 minus the y-coordinate at the threshold intensity). The PP values for the cores and tumor are shown by the triangles to the left of the y-axis. Although the cores and the tumor have very similar PP values, their distributions are clearly different.
C) KS’ tolerance bounds PP tolerance. Shown is a scatter plot of the difference in PP (with respect to whole tissue) for different virtual samplings (dots) plotted against their corresponding KS’ score. For any value of the KS’, there is a clear upper bound on the possible PP difference.
D) Determining number of cores. Similar to the results for IF (Fig. 1), our framework generates a plot of the level of confidence (y-axis) achieved by using different numbers of cores (x-axis) at different levels of KS’ (and by extension PP) tolerance (different colored curves), thereby allowing the user to balance the tradeoff between these quantities.
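The relationship between PP and a CDF-based score can be checked with a small sketch. It uses the plain KS rather than the KS’, expresses PP as a fraction rather than a percentage, and runs on hypothetical toy data. The PP difference equals the CDF difference at the chosen threshold, so it cannot exceed the maximum CDF difference; a scaled score whose weights are never below 1 would, under that assumption, inherit the same bound.

```python
import bisect


def fraction_positive(values, threshold):
    """Fraction of cells with intensity strictly above `threshold`."""
    return sum(v > threshold for v in values) / len(values)


def ks_distance(sample, reference):
    """Maximum absolute difference between the two empirical CDFs."""
    smp, ref = sorted(sample), sorted(reference)
    worst = 0.0
    for x in set(smp) | set(ref):
        d = abs(bisect.bisect_right(smp, x) / len(smp)
                - bisect.bisect_right(ref, x) / len(ref))
        worst = max(worst, d)
    return worst


# Toy data: the core is bimodal, the tumor is evenly spread.
tumor = list(range(100))
core = [10] * 51 + [90] * 49
threshold = 50

pp_gap = abs(fraction_positive(core, threshold) - fraction_positive(tumor, threshold))
ks = ks_distance(core, tumor)
print(f"PP difference at threshold {threshold}: {pp_gap:.2f}, KS: {ks:.2f}")

# The bound holds for any threshold, not only the one chosen above.
bound_holds = all(
    abs(fraction_positive(core, t) - fraction_positive(tumor, t)) <= ks + 1e-12
    for t in range(-1, 101)
)
```

In this toy case the core and tumor share the same PP at the chosen threshold while their distributions differ substantially, which mirrors the situation shown in panel B.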
Supplementary information
Supplementary Text and Figures
Supplementary Figures 1–8.
Supplementary Software
Figure Code: This file contains the MATLAB code used to generate the main figures, and an implementation of the KS' statistic in R.
Cite this article
Rajaram, S., Heinrich, L., Gordan, J. et al. Sampling strategies to capture single-cell heterogeneity. Nat Methods 14, 967–970 (2017). https://doi.org/10.1038/nmeth.4427
DOI: https://doi.org/10.1038/nmeth.4427
This article is cited by
- Optimizing multiplexed imaging experimental design through tissue spatial segregation estimation. Nature Methods (2023)
- In silico tissue generation and power analysis for spatial omics. Nature Methods (2023)
- Currently favored sampling practices for tumor sequencing can produce optimal results in the clinical setting. Scientific Reports (2020)
- Eliciting the impacts of cellular noise on metabolic trade-offs by quantitative mass imaging. Nature Communications (2019)
- A multi-modal data resource for investigating topographic heterogeneity in patient-derived xenograft tumors. Scientific Data (2019)