Sampling strategies to capture single-cell heterogeneity

Rajaram, Satwik; Heinrich, Louise E; Gordan, John D; Avva, Jayant; Bonness, Kathy M; Witkiewicz, Agnieszka K; Malter, James S; Atreya, Chloe E; Warren, Robert S; Wu, Lani F; Altschuler, Steven J

doi:10.1038/nmeth.4427

Brief Communication
Published: 04 September 2017

Satwik Rajaram^1 (ORCID: orcid.org/0000-0001-8242-4402),
Louise E Heinrich^1 (ORCID: orcid.org/0000-0002-4394-922X),
John D Gordan^2,3,
Jayant Avva^4,
Kathy M Bonness^4,
Agnieszka K Witkiewicz^5,
James S Malter^6,
Chloe E Atreya^2,3,
Robert S Warren^3,7,
Lani F Wu^1,3 &
…
Steven J Altschuler^1,3 (ORCID: orcid.org/0000-0001-9142-0796)

Nature Methods volume 14, pages 967–970 (2017)

Abstract

Advances in single-cell technologies have highlighted the prevalence and biological significance of cellular heterogeneity. A critical question researchers face is how to design experiments that faithfully capture the true range of heterogeneity from samples of cellular populations. Here we develop a data-driven approach, illustrated in the context of image data, that estimates the sampling depth required for prospective investigations of single-cell heterogeneity from an existing collection of samples.

**Figure 1: Sampling strategy to capture single-cell heterogeneity.**

**Figure 2: Effect of experimental and analysis parameters on capturing heterogeneity.**

Acknowledgements

We thank M. Calvert, T.D. Tlsty and P.B. Stark for helpful discussions. This work was supported by the NCI K08CA175143 (C.E.A.), P01HL088594 (J.S.M.), a Conquer Cancer Foundation Young Investigator Award from the Scopus Foundation (J.D.G.), a gift from the Edmund Wattis Littlefield Foundation (R.S.W.), NSF PHY-1545915 (S.J.A.), Stand Up To Cancer (S.J.A.), NCI R01 CA133253 (S.J.A.), NCI R01 CA185404 (L.F.W.) and NCI R01 CA184984 (L.F.W.), and the Institute of Computational Health Sciences (ICHS) at UCSF (S.J.A. and L.F.W.).

Author information

Authors and Affiliations

Department of Pharmaceutical Chemistry, University of California, San Francisco, San Francisco, California, USA
Satwik Rajaram, Louise E Heinrich, Lani F Wu & Steven J Altschuler
Department of Medicine, University of California, San Francisco, San Francisco, California, USA
John D Gordan & Chloe E Atreya
Helen Diller Family Comprehensive Cancer Center, University of California, San Francisco, San Francisco, California, USA
John D Gordan, Chloe E Atreya, Robert S Warren, Lani F Wu & Steven J Altschuler
Green Center for Systems Biology, University of Texas Southwestern Medical Center, Dallas, Texas, USA
Jayant Avva & Kathy M Bonness
Department of Pathology, University of Arizona, Tucson, Arizona, USA
Agnieszka K Witkiewicz
Department of Pathology, University of Texas Southwestern Medical Center, Dallas, Texas, USA
James S Malter
Department of Surgery, University of California, San Francisco, San Francisco, California, USA
Robert S Warren

Contributions

S.R., S.J.A. and L.F.W. conceived of and designed the study. L.E.H., J.D.G., K.M.B. and A.K.W. performed the experiments and/or provided data. S.R., J.A., S.J.A., and L.F.W. developed the algorithms; and S.R. performed the analysis. R.S.W. and A.K.W. contributed samples. The manuscript was written by S.R., S.J.A. and L.F.W. with contributions from C.E.A. and J.S.M.

Corresponding authors

Correspondence to Lani F Wu or Steven J Altschuler.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Integrated supplementary information

Supplementary Figure 1: KS’ improves sensitivity to detect changes at the tails of distributions.

A) The probability distributions (PDF, top), cumulative probability distributions (CDF, middle) and difference in CDF with respect to whole tumor (bottom) are shown for two virtual TMA cores (red/blue curves) and the whole tissue (green). X-axis is in log scale. Vertical dotted lines were obtained by automatically thresholding intensities into low, medium and high levels. The KS statistic with respect to whole tissue (i.e. max value of CDF difference) is indicated by the double-sided arrow and does not reflect the fact that Core 1 contains a greater proportion of high intensity cells than the whole tissue.

B) Images of tissue showing intensities of TTF staining in the nucleus. Left: raw intensity levels; Right: classification of nuclei into low, medium and high intensity levels as used in A.

C) Subsample CDFs show reduced variation at tails of distributions. Top: in green is shown the whole tumor CDF and in black are the CDFs of the intensities of 100 cell subsamples generated using the whole tumor distribution. The magnitude of CDF variation is clearly smaller near the tails. Bottom: the standard deviations across the random CDFs above at each value of the intensity show excellent agreement with the theoretical prediction.

D) Shown are the CDF differences in the bottom row of A scaled using the theoretical scaling factor shown in C. The KS’ (maximal differences represented by double sided arrows) reflects the fact that Core 1 has a higher proportion of high intensity cells.

E) Comparison of KS and KS’: This plot is generated in essentially the same way as Fig. 1D, except with the median difference replaced by the KS. Top: Each point represents a single sampling run from the tissue in Fig. 1B, with color denoting the number of cores (as in Fig. 1). As with the median difference, the KS’ also bounds the KS. The KS’ and the KS are numerically equal when the major differences between whole tissue and sample appear close to the median (x=y diagonal). However, the KS’ is able to pick up differences at the tails more sensitively (the lower right quadrant represents samples considered more similar to the whole by the KS than by the KS’). Bottom: CDFs of the KS (dotted line) and KS’ (solid line) as a function of the number of cores (colors coded as in Fig. 1D) show that the KS and KS’ scores for this sample are reasonably close, though the KS always gives a slightly higher confidence of obtaining a lower score.
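The contrast between the KS and the KS’ described in this legend can be illustrated with a short sketch. It assumes, without confirmation from this excerpt, that the KS’ divides the CDF difference by the binomial standard deviation of an empirical CDF, sqrt(F(1 − F)), normalized so that the scaling equals 1 at the median. The function names and the `eps` floor are hypothetical and are not taken from the paper.

```python
import numpy as np


def ecdf(values, grid):
    """Empirical CDF of `values` evaluated at every point of `grid`."""
    ordered = np.sort(np.asarray(values, dtype=float))
    return np.searchsorted(ordered, grid, side="right") / ordered.size


def ks_and_scaled_ks(sample, whole, eps=1e-3):
    """Return (KS, scaled KS) of a sample relative to the whole tissue.

    ASSUMPTION: the scaled statistic divides the CDF difference by
    sqrt(F(1 - F)) / 0.5, where F is the whole-tissue CDF. The exact
    definition of KS' is not given in this excerpt.
    """
    sample = np.asarray(sample, dtype=float)
    whole = np.asarray(whole, dtype=float)
    grid = np.union1d(sample, whole)  # every jump point of either CDF
    f_whole = ecdf(whole, grid)
    diff = np.abs(ecdf(sample, grid) - f_whole)
    # sqrt(F(1 - F)) peaks at 0.5 when F = 0.5, so dividing by 0.5 gives a
    # scale of 1 at the median and below 1 toward the tails. `eps` is an
    # illustrative floor that keeps the extreme tails from dominating.
    scale = np.sqrt(np.clip(f_whole * (1.0 - f_whole), eps, None)) / 0.5
    return diff.max(), (diff / scale).max()
```

Under this assumed scaling, the scaled statistic can never be smaller than the plain KS, and the two coincide when the largest CDF difference sits at the median, which matches the behavior described for the x=y diagonal in panel E.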

Supplementary Figure 2: Comparison of IF staining across imaging sets

Two serial sections of the same liver cancer tissue sample were stained and imaged with the same biomarkers (DAPI/YAP) 5 months apart using different microscopes. Circles show the size of a 0.6 mm diameter core. Areas between red lines depict automatically identified tissue edges or blurring/staining artifacts and are not used in downstream analysis.

Supplementary Figure 3: The effect of staining/imaging (i.e. Imaging Set 1 vs Set 2) on number of cores is robust to analysis parameters.

Each scatter sub-plot compares the number of cores (1-10) required to capture the heterogeneity in YAP nuclear expression across two imaging sets (similar to Fig. 2a). The number of cores depends on two analytical choices: the KS’ tolerance and desired confidence. The grid shows how different combinations of these analytical parameters (values shown on right/below) affect the resultant plot. Samples requiring more than 10 cores are not displayed. In some cases, dots are displayed on top of each other.
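How the two analytical choices, tolerance and confidence, translate into a number of cores can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: every cell is assumed to carry a pre-assigned virtual core label (`core_id`), cores are drawn uniformly at random without replacement, and the plain KS is used as the default score for brevity (a KS’ function can be passed as `score`). All names and defaults are hypothetical.

```python
import numpy as np


def ks_distance(sample, whole):
    """Plain two-sample KS distance between `sample` and `whole`."""
    grid = np.union1d(sample, whole)
    f_s = np.searchsorted(np.sort(sample), grid, side="right") / len(sample)
    f_w = np.searchsorted(np.sort(whole), grid, side="right") / len(whole)
    return np.abs(f_s - f_w).max()


def cores_needed(intensity, core_id, tolerance=0.2, confidence=0.8,
                 max_cores=10, n_runs=200, score=ks_distance, seed=0):
    """Smallest number of cores whose pooled cells score below `tolerance`
    in at least a fraction `confidence` of random samplings, else None."""
    rng = np.random.default_rng(seed)
    intensity = np.asarray(intensity, dtype=float)
    core_id = np.asarray(core_id)
    cores = np.unique(core_id)
    for n in range(1, min(max_cores, cores.size) + 1):
        hits = 0
        for _ in range(n_runs):
            chosen = rng.choice(cores, size=n, replace=False)
            pooled = intensity[np.isin(core_id, chosen)]
            if score(pooled, intensity) < tolerance:
                hits += 1
        if hits / n_runs >= confidence:
            return n
    return None  # more than `max_cores` cores would be required
```

Returning `None` mirrors the legend's convention that samples requiring more than 10 cores are not displayed. Calling the function over a grid of `tolerance` and `confidence` values would produce one point per specimen for each sub-plot.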

Supplementary Figure 4: Comparison of the KS and KS’

A) Comparison of the number of cores needed to capture whole tissue heterogeneity based on the KS and KS’ for liver cancer samples stained with LKB1. Plots are constructed as in Fig. 2A and B, with numbers of cores chosen to ensure KS or KS’ < 0.2 with 80% confidence (the size of each point represents the number of specimens with the same number of suggested cores). Note that the KS’ always requires the same number of cores as the KS or more, but the disagreement (deviation from the diagonal) is sample specific, likely reflecting differences at the tails.

B & C) Comparison of the KS’ and KS scores (for whole tissue vs core distributions) arising from 1000 random 6-core samplings (points in plot) from the patient tissues indicated by the arrows. In B, the KS and KS’ are largely in numerical agreement (diagonal distribution of points), whereas in C the lower right quadrant of points represents samplings that appear similar to the whole based on the KS but not the KS’. The numbers in red denote the fraction of points in each quadrant.
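The quadrant fractions reported in panels B and C can be tallied with a few lines of code. The sketch below assumes that the quadrants are defined by applying the same tolerance (0.2 here) to both scores, which this excerpt does not state explicitly; the function and key names are hypothetical.

```python
import numpy as np


def quadrant_fractions(ks, ks_scaled, tolerance=0.2):
    """Fraction of samplings in each quadrant of a KS vs KS' scatter plot.

    ASSUMPTION: quadrants are split at `tolerance` on both axes.
    """
    ks_ok = np.asarray(ks, dtype=float) < tolerance
    scaled_ok = np.asarray(ks_scaled, dtype=float) < tolerance
    return {
        "both_pass": float(np.mean(ks_ok & scaled_ok)),
        # Similar to the whole by the KS but not by the KS'
        "ks_only_pass": float(np.mean(ks_ok & ~scaled_ok)),
        "scaled_only_pass": float(np.mean(~ks_ok & scaled_ok)),
        "both_fail": float(np.mean(~ks_ok & ~scaled_ok)),
    }
```

A large `ks_only_pass` fraction would correspond to the situation described for panel C, where tail differences are missed by the KS.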

Supplementary Figure 5: Number of cores required to capture heterogeneity is biomarker (YAP/DAPI/Beta-Catenin/LKB1) dependent.

Across the panel of liver cancer patient specimens (25 in imaging set 1 and 38 in imaging set 2), the number of cores needed to capture heterogeneity (at a KS’ tolerance of 0.2 and 80% confidence) was calculated for pairs of co-stained biomarkers (x/y axes). Points on the dotted diagonal denote samples requiring the same number of cores for both biomarkers. As in Fig. 2B (the top left plot is identical to Fig. 2B), the size of each point denotes the number of specimens requiring the same numbers of cores.

Supplementary Figure 6: Relating the properties of single-cell features (rows) to the number of replicate wells needed to capture their heterogeneity (leftmost plot, row order).

Except for the left and right plots, each column represents a feature trait (present – gray, absent – white). From left to right, panels indicate: 1) the number of wells needed, 2) feature class, 3) which of three biomarker images is used to calculate the feature, 4) which of 3 cellular compartments (D – nucleus, Y – cytoplasm, C – whole cell) are profiled by the feature, 5) the quantifier used in summarizing intensity features (MultiChannel – depends on 2 biomarkers, R – ratio of intensities, Ixx – xxth percentile, Iav – average intensity, Itot – sum of pixel intensities), 6) whether the feature measures off-target biomarker staining (e.g. nuclear marker in cytoplasm), 7) the extent to which a feature is discrete (a large number of cells share the same feature value) or continuous (where no two cells share the same feature value). Note that features numbered 208-215 required > 20 wells (the maximum number tested) in at least one replicate plate, and their estimated number of wells required was set to 20.

Supplementary Figure 7: Robustness of TMA-based confidence estimates to specimen selection.

Confidence curves for the different markers were generated as in Fig. 2c, but by using (a random selection of) half the total number of available specimens (13/25 for imaging set 1 and 19/38 for imaging set 2). Random specimen selection was repeated 1000 times to generate a distribution of confidence values over different specimen-subsets at a given number of cores. A box-whisker plot was used to display the results. For a more conservative estimate of confidence level that includes inter-specimen effects, a user might consider using the lower bound on the box-whisker plot at a given number of cores.
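The specimen-subsampling procedure in this legend can be sketched as below. The sketch assumes that the confidence at a given number of cores is the fraction of samplings scoring below tolerance, averaged over specimens; how Fig. 2c pools specimens is not described in this excerpt. The input layout `scores[specimen, core_count, run]` and all names are hypothetical.

```python
import numpy as np


def confidence_spread(scores, tolerance=0.2, n_repeats=1000, seed=0):
    """Spread of confidence values over random half-subsets of specimens.

    `scores` has shape (n_specimens, n_core_counts, n_runs) and holds the
    KS' value of each random sampling. Returns an array of shape
    (5, n_core_counts): min, lower quartile, median, upper quartile, max.
    """
    rng = np.random.default_rng(seed)
    passed = np.asarray(scores, dtype=float) < tolerance
    per_specimen = passed.mean(axis=2)   # confidence per specimen and core count
    n_specimens = per_specimen.shape[0]
    half = (n_specimens + 1) // 2        # 13 of 25, 19 of 38
    conf = np.empty((n_repeats, per_specimen.shape[1]))
    for i in range(n_repeats):
        subset = rng.choice(n_specimens, size=half, replace=False)
        conf[i] = per_specimen[subset].mean(axis=0)
    return np.percentile(conf, [0, 25, 50, 75, 100], axis=0)
```

A conservative user could read off one of the lower rows of the returned array at the chosen number of cores, in the spirit of the lower bound suggested in the legend.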

Supplementary Figure 8: Determining the number of TMA cores to capture whole tissue heterogeneity based on Immunohistochemistry (IHC) images.

A) Processing IHC images. Shown is a raw IHC image of a breast cancer tumor stained for haematoxylin and Ki67. The tumor area is outlined in green, and two virtual cores of 0.6 mm diameter are shown in red and blue. To the right are enlarged versions of the cores’ images, before and after deconvolution (to separate out DNA and Ki67 intensities).

B) Percentage of positive cells (PP) is a poor predictor of distribution. Shown are cumulative distributions of the nuclear intensity of the cores and tumor from A. The PP is the fraction of cells above a (manually) selected threshold intensity (i.e. 100 minus the y-coordinate at the threshold intensity). The PP values for the cores and tumor are shown by the triangles to the left of the y-axis. Although the cores and the tumor have very similar PP values, their distributions are clearly different.

C) KS’ tolerance bounds PP tolerance. Shown is a scatter plot of the difference in PP (with respect to whole tissue) for different virtual samplings (dots) plotted against their corresponding KS’ score. For any value of the KS’, there is a clear upper bound on the possible PP difference.

D) Determining number of cores. Similar to the results for IF (Fig. 1), our framework generates a plot of the level of confidence (y-axis) achieved by using different numbers of cores (x-axis) at different levels of KS’ (and by extension PP) tolerance (different colored curves), thereby allowing the user to balance the tradeoff between these quantities.
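The point of panel B, that two samples can share a PP value while having different distributions, can be shown with a toy calculation. The intensities and threshold below are made-up numbers chosen only for illustration, and `percent_positive` is a hypothetical helper name.

```python
import numpy as np


def percent_positive(intensity, threshold):
    """Percentage of cells whose intensity exceeds `threshold`,
    i.e. 100 minus the CDF (in percent) at the threshold."""
    return 100.0 * np.mean(np.asarray(intensity, dtype=float) > threshold)


# Two toy "cores" with made-up nuclear intensities (not data from the paper).
core_a = np.array([0.10, 0.20, 0.30, 0.90])
core_b = np.array([0.45, 0.46, 0.47, 0.60])
threshold = 0.5

pp_a = percent_positive(core_a, threshold)
pp_b = percent_positive(core_b, threshold)
# Both cores have one of four cells above the threshold, so the PP values
# agree even though the intensities below the threshold differ markedly.
print(pp_a, pp_b, np.median(core_a), np.median(core_b))
```

Because the PP summarizes the CDF at a single threshold, any distance that bounds the whole CDF difference also limits how far the PP can move, which is consistent with the bound described in panel C.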

About this article

Cite this article

Rajaram, S., Heinrich, L., Gordan, J. et al. Sampling strategies to capture single-cell heterogeneity. Nat Methods 14, 967–970 (2017). https://doi.org/10.1038/nmeth.4427

Received: 21 March 2017
Accepted: 11 August 2017
Published: 04 September 2017
Issue Date: 01 October 2017
DOI: https://doi.org/10.1038/nmeth.4427
