  • Brief Communication
Sampling strategies to capture single-cell heterogeneity

Abstract

Advances in single-cell technologies have highlighted the prevalence and biological significance of cellular heterogeneity. A critical question researchers face is how to design experiments that faithfully capture the true range of heterogeneity from samples of cellular populations. Here we develop a data-driven approach, illustrated in the context of image data, that estimates the sampling depth required for prospective investigations of single-cell heterogeneity from an existing collection of samples.

Figure 1: Sampling strategy to capture single-cell heterogeneity.
Figure 2: Effect of experimental and analysis parameters on capturing heterogeneity.

Acknowledgements

We thank M. Calvert, T.D. Tlsty and P.B. Stark for helpful discussions. This work was supported by the NCI K08CA175143 (C.E.A.), P01HL088594 (J.S.M.), a Conquer Cancer Foundation Young Investigator Award from the Scopus Foundation (J.D.G.), a gift from the Edmund Wattis Littlefield Foundation (R.S.W.), NSF PHY-1545915 (S.J.A.), Stand Up To Cancer (S.J.A.), NCI R01 CA133253 (S.J.A.), NCI R01 CA185404 (L.F.W.) and NCI R01 CA184984 (L.F.W.), and the Institute of Computational Health Sciences (ICHS) at UCSF (S.J.A. and L.F.W.).

Author information

Contributions

S.R., S.J.A. and L.F.W. conceived of and designed the study. L.E.H., J.D.G., K.M.B. and A.K.W. performed the experiments and/or provided data. S.R., J.A., S.J.A., and L.F.W. developed the algorithms; and S.R. performed the analysis. R.S.W. and A.K.W. contributed samples. The manuscript was written by S.R., S.J.A. and L.F.W. with contributions from C.E.A. and J.S.M.

Corresponding authors

Correspondence to Lani F Wu or Steven J Altschuler.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Integrated supplementary information

Supplementary Figure 1 KS’ improves sensitivity to detect changes at the tails of distributions.

A) The probability distributions (PDF, top), cumulative probability distributions (CDF, middle) and difference in CDF with respect to whole tumor (bottom) are shown for two virtual TMA cores (red/blue curves) and the whole tissue (green). X-axis is in log scale. Vertical dotted lines were obtained by automatically thresholding intensities into low, medium and high levels. The KS statistic with respect to whole tissue (i.e. max value of CDF difference) is indicated by the double-sided arrow and does not reflect the fact that Core 1 contains a greater proportion of high intensity cells than the whole tissue.

B) Images of tissue showing intensities of TTF staining in the nucleus. Left: raw intensity levels; Right: classification of nuclei into low, medium and high intensity levels as used in A.

C) Subsample CDFs show reduced variation at tails of distributions. Top: in green is shown the whole tumor CDF and in black are the CDFs of the intensities of 100 cell subsamples generated using the whole tumor distribution. The magnitude of CDF variation is clearly smaller near the tails. Bottom: the standard deviations across the random CDFs above at each value of the intensity show excellent agreement with the theoretical prediction.

D) Shown are the CDF differences in the bottom row of A scaled using the theoretical scaling factor shown in C. The KS’ (maximal differences represented by double sided arrows) reflects the fact that Core 1 has a higher proportion of high intensity cells.

E) Comparison of KS and KS’: This plot is generated in essentially the same way as Fig. 1D, except with the median difference replaced by the KS. Top: Each point represents a single sampling run from the tissue in Fig. 1B, with color denoting the number of cores (as in Fig. 1). As with the median difference, the KS’ also bounds the KS. The KS’ and the KS are numerically equal when the major difference between whole tissue and sample appears close to the median (x=y diagonal). However, the KS’ is able to pick up differences at the tails more sensitively (the lower right quadrant represents samples considered more similar to the whole by the KS than by the KS’). Bottom: CDFs of the KS (dotted line) and KS’ (solid line) as a function of the number of cores (colors coded as in Fig. 1D) show that the KS and KS’ scores for this sample are reasonably close, though the KS always gives a slightly higher confidence of obtaining a lower score.
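The relationship between the KS and a tail-sensitive variant described in panels A to E can be sketched in Python. This is a minimal illustration, not the authors' implementation (their KS’ code is in the Supplementary Software). The function names are hypothetical, and the scaling in `ks_scaled` is an assumption: it divides the CDF difference by 2·sqrt(F(1−F)), where F is the whole-tissue CDF. The term sqrt(F(1−F)/n) is the standard binomial result for the standard deviation of an n-cell empirical CDF, and the factor 2 is chosen here so that the score equals the KS at the median and is never smaller than it, the two properties stated in panel E.

```python
from bisect import bisect_right
from math import sqrt


def ecdf(sorted_values, x):
    """Fraction of values <= x; sorted_values must be sorted ascending."""
    return bisect_right(sorted_values, x) / len(sorted_values)


def ks_statistic(sample, whole):
    """Plain KS: maximum absolute difference between the two empirical CDFs."""
    s, w = sorted(sample), sorted(whole)
    # Empirical CDFs only change at observed values, so checking those suffices.
    return max(abs(ecdf(s, x) - ecdf(w, x)) for x in set(s) | set(w))


def ks_scaled(sample, whole):
    """Tail-weighted KS (ASSUMED form, see lead-in; not taken from the paper).

    Assumes the sample is drawn from the whole, so both CDFs agree wherever
    the whole-tissue CDF is exactly 0 or 1; those points are skipped.
    """
    s, w = sorted(sample), sorted(whole)
    best = 0.0
    for x in set(s) | set(w):
        f = ecdf(w, x)
        if f <= 0.0 or f >= 1.0:
            continue  # scaling factor is zero here
        diff = abs(ecdf(s, x) - f)
        best = max(best, diff / (2.0 * sqrt(f * (1.0 - f))))
    return best


# Hypothetical example: the sample over-represents the brightest cells.
whole = list(range(1, 101))
sample = list(range(1, 96)) + [100] * 5
print(ks_statistic(sample, whole), ks_scaled(sample, whole))
```

In this toy example the plain KS is roughly 0.04 while the scaled score is roughly 0.20, because the only discrepancy sits in the upper tail where the empirical CDF varies least.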

Supplementary Figure 2 Comparison of IF staining across imaging sets

Two serial sections of the same liver cancer tissue sample were stained and imaged with the same biomarkers (DAPI/YAP) 5 months apart using different microscopes. Circles show the size of a 0.6mm diameter core. Areas between red lines depict automatic identification of tissue edge or blurring/staining artifacts and are not used in downstream analysis.

Supplementary Figure 3 The effect of staining/imaging (i.e. Imaging Set 1 vs Set 2) on number of cores is robust to analysis parameters.

Each scatter sub-plot compares the number of cores (1-10) required to capture the heterogeneity in YAP nuclear expression across two imaging sets (similar to Fig. 2a). The number of cores depends on two analytical choices: the KS’ tolerance and desired confidence. The grid shows how different combinations of these analytical parameters (values shown on right/below) affect the resultant plot. Samples requiring more than 10 cores are not displayed. In some cases, dots are displayed on top of each other.
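How a tolerance and a desired confidence translate into a number of cores can be sketched as a small Monte Carlo procedure. The sketch below is an illustration under stated assumptions, not the authors' pipeline: each virtual core is modeled as a list of per-cell intensities, the whole tissue is approximated by pooling all cores, the plain KS stands in for the KS’ to keep the example short, and all function and parameter names are hypothetical.

```python
import random
from bisect import bisect_right


def ks_statistic(sample, whole_sorted):
    """Max absolute gap between the sample ECDF and the whole-tissue ECDF."""
    s = sorted(sample)
    n_s, n_w = len(s), len(whole_sorted)
    return max(
        abs(bisect_right(s, x) / n_s - bisect_right(whole_sorted, x) / n_w)
        for x in set(s) | set(whole_sorted)
    )


def confidence(cores, k, tolerance, n_runs=200, seed=0):
    """Fraction of random k-core samplings whose score is below tolerance."""
    rng = random.Random(seed)
    whole_sorted = sorted(v for core in cores for v in core)
    hits = 0
    for _ in range(n_runs):
        chosen = rng.sample(cores, k)  # k distinct cores, without replacement
        pooled = [v for core in chosen for v in core]
        if ks_statistic(pooled, whole_sorted) < tolerance:
            hits += 1
    return hits / n_runs


def cores_needed(cores, tolerance=0.2, target=0.8, max_cores=10,
                 n_runs=200, seed=0):
    """Smallest k (up to max_cores) reaching the target confidence, else None."""
    for k in range(1, min(max_cores, len(cores)) + 1):
        if confidence(cores, k, tolerance, n_runs, seed) >= target:
            return k
    return None


# Hypothetical tissue made of two very different regions.
print(cores_needed([[0] * 10, [1] * 10]))
```

Returning `None` when no tested number of cores reaches the target mirrors the caption's convention of not displaying samples that require more than 10 cores.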

Supplementary Figure 4 Comparison of the KS and KS’

A) Comparison of the number of cores needed to capture whole tissue heterogeneity based on the KS and KS’ for liver cancer samples stained with LKB1. Plots are constructed as in Fig. 2A and B, with numbers of cores chosen to ensure KS or KS’ < 0.2 with 80% confidence (size of points represents the number of specimens with the same number of suggested cores). Note that the KS’ always requires the same or more cores than the KS, but the disagreement (deviation from diagonal) is sample specific, likely denoting the differences at the tails.

B & C) Comparison of the KS’ and KS scores (for whole tissue vs core distributions) arising from 1000 random 6-core samplings (points in plot) from the patient tissues indicated by the arrows. In B, the KS and KS’ are largely in numerical agreement (diagonal distribution of points), whereas in C the lower right quadrants of points represent samplings that appear similar to the whole based on the KS but not the KS’. The numbers in red denote the fraction of points in each quadrant.

Supplementary Figure 5 Number of cores required to capture heterogeneity is biomarker (YAP/DAPI/Beta-Catenin/LKB1) dependent.

Across the panel of liver cancer patient specimens (25 in imaging set 1 and 38 in imaging set 2), the number of cores needed to capture heterogeneity (at a KS’ tolerance of 0.2 at 80% confidence) was calculated for pairs of co-stained biomarkers (x/y axes). Points on the dotted diagonal denote samples requiring the same number of cores for both biomarkers. As in Fig. 2B (top left plot is identical to Fig. 2B), the size of each point denotes the number of specimens requiring the same number of cores.

Supplementary Figure 6 Relating the properties of single cell features (rows) to the number of replicate wells needed to capture their heterogeneity (left most plot, row order).

Except for the leftmost and rightmost plots, each column represents a feature trait (present – gray, absent – white). From left to right, panels indicate: 1) the number of wells needed, 2) feature class, 3) which of three biomarker images is used to calculate the feature, 4) which of 3 cellular compartments (D – nucleus, Y – cytoplasm, C – whole cell) are profiled by the feature, 5) the quantifier used in summarizing intensity features (MultiChannel – depends on 2 biomarkers, R – ratio of intensities, Ixx – xxth percentile, Iav – average intensity, Itot – sum of pixel intensities), 6) whether the feature measures off-target biomarker staining (e.g. nuclear marker in cytoplasm), 7) the extent to which a feature is discrete (a large number of cells share the same feature value) or continuous (where no two cells share the same feature value). Note that features numbered 208-215 required > 20 wells (the maximum number tested) in at least one replicate plate, and the estimated number of wells required was set to 20.

Supplementary Figure 7 Robustness of TMA-based confidence estimates to specimen selection.

Confidence curves for the different markers were generated as in Fig. 2c, but by using (a random selection of) half the total number of available specimens (13/25 for imaging set 1 and 19/38 for imaging set 2). Random specimen selection was repeated 1000 times to generate a distribution of confidence values over different specimen-subsets at a given number of cores. A box-whisker plot was used to display the results. For a more conservative estimate of confidence level that includes inter-specimen effects, a user might consider using the lower bound on the box-whisker plot at a given number of cores.
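The half-sampling procedure described above can be sketched as follows for a single number of cores. This is a hedged illustration only: the caption does not state here how per-specimen results are combined into one confidence value, so the sketch assumes each specimen contributes its own confidence and that a subset's confidence is their mean. The function names and the input values are hypothetical.

```python
import random
from statistics import mean, quantiles


def half_sample_confidences(per_specimen_conf, n_repeats=1000, seed=0):
    """Confidence of random half-subsets of specimens (ASSUMED: subset mean)."""
    rng = random.Random(seed)
    half = (len(per_specimen_conf) + 1) // 2  # 13 of 25, 19 of 38
    return [
        mean(rng.sample(per_specimen_conf, half))  # draw without replacement
        for _ in range(n_repeats)
    ]


def box_summary(values):
    """Five-number summary of the kind shown in a box-whisker plot."""
    q1, q2, q3 = quantiles(values, n=4)
    return {"min": min(values), "q1": q1, "median": q2,
            "q3": q3, "max": max(values)}


# Hypothetical per-specimen confidences at a fixed number of cores.
per_specimen = [i / 24 for i in range(25)]
summary = box_summary(half_sample_confidences(per_specimen))
print(summary)
```

A conservative reader, as the caption suggests, would take a lower value from this summary (for example `summary["min"]` or `summary["q1"]`) rather than the median.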

Supplementary Figure 8 Determining the number of TMA cores to capture whole tissue heterogeneity based on Immunohistochemistry (IHC) images.

A) Processing IHC images. Shown is a raw IHC image of a breast cancer tumor stained for haematoxylin and Ki67. The tumor area is outlined in green, and two virtual cores of 0.6mm diameter are shown in red and blue. To the right are enlarged versions of the cores’ images, before and after de-convolution (to separate out DNA and Ki67 intensities).

B) Percentage of positive cells (PP) is a poor predictor of distribution. Shown are cumulative distributions of the nuclear intensity of the cores and tumor from A. The PP is the fraction of cells above a (manually) selected threshold intensity (i.e. 100 minus the y-coordinate at the threshold intensity). The PP values for the cores and tumor are shown by the triangles to the left of the y axis. Although the cores and the tumor have very similar PP values, their distributions are clearly different.

C) KS’ tolerance bounds PP tolerance. Shown is a scatter plot of the difference in PP (with respect to whole tissue) for different virtual samplings (dots) plotted against their corresponding KS’ score. For any value of the KS’, there is a clear upper bound on the possible PP difference.

D) Determining number of cores. Similar to the results for IF (Fig. 1), our framework generates a plot of the level of confidence (y-axis) achieved by using different numbers of cores (x-axis) at different levels of KS’ (and by extension PP) tolerance (different colored curves), thereby allowing the user to balance the tradeoff between these quantities.
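The points made in panels B and C can be illustrated with a short Python sketch. It is an illustration with made-up intensities and hypothetical function names, and it uses the plain KS rather than the KS’. Because the PP is 100·(1 − CDF at the threshold), the PP difference between a core and the whole tissue can never exceed 100 times the maximum CDF difference, yet two samples can share the same PP while having clearly different distributions.

```python
from bisect import bisect_right


def percent_positive(intensities, threshold):
    """Percentage of cells whose intensity is strictly above the threshold."""
    positive = sum(1 for v in intensities if v > threshold)
    return 100.0 * positive / len(intensities)


def ks_statistic(sample, whole):
    """Plain KS: maximum absolute difference between the two empirical CDFs."""
    s, w = sorted(sample), sorted(whole)
    return max(
        abs(bisect_right(s, x) / len(s) - bisect_right(w, x) / len(w))
        for x in set(s) | set(w)
    )


# Made-up intensities: the core holds only the dimmest and brightest cells.
whole = [1, 2, 3, 4, 5, 6, 7, 8]
core = [1, 2, 7, 8]
threshold = 4

pp_gap = abs(percent_positive(core, threshold) - percent_positive(whole, threshold))
ks = ks_statistic(core, whole)
print(pp_gap, ks)

# The PP gap is bounded by 100 * KS at every possible threshold.
for t in whole:
    gap = abs(percent_positive(core, t) - percent_positive(whole, t))
    assert gap <= 100.0 * ks + 1e-9
```

Here the core and the whole tissue have identical PP at the chosen threshold, while the KS is clearly nonzero, which is the situation panel B describes.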

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–8.

Life Sciences Reporting Summary

Supplementary Software

Figure Code: This file contains the MATLAB code used to generate the main figures, and an implementation of the KS' statistic in R.

Cite this article

Rajaram, S., Heinrich, L., Gordan, J. et al. Sampling strategies to capture single-cell heterogeneity. Nat Methods 14, 967–970 (2017). https://doi.org/10.1038/nmeth.4427

  • DOI: https://doi.org/10.1038/nmeth.4427
