  • Brief Communication
Clonal genotype and population structure inference from single-cell tumor sequencing

Abstract

Single-cell DNA sequencing has great potential to reveal the clonal genotypes and population structure of human cancers. However, single-cell data suffer from missing values and biased allelic counts, as well as false genotype measurements owing to the sequencing of multiple cells. We describe the Single Cell Genotyper (https://bitbucket.org/aroth85/scg), open-source software based on a statistical model coupled with a mean-field variational inference method, which can be used to address these problems and robustly infer clonal genotypes.

Figure 1: Overview of the SCG model.
Figure 2: Comparison of clustering performance on real data with doublets.
Figure 3: The D-SCG3 model identified clonal cell populations in multiple samples from an HGSOC patient.

Accession codes

Primary accessions

Sequence Read Archive

Acknowledgements

We acknowledge generous long-term funding support from the BC Cancer Foundation. In addition, the S.P.S. and S.A. groups receive operating funds from the Canadian Breast Cancer Foundation, the Canadian Cancer Society Research Institute (impact grant 701584 to S.A. and S.P.S.), the Terry Fox Research Institute (PPG program on forme fruste tumors), the Canadian Institutes of Health Research (CIHR) (grant MOP-115170 to S.A. and S.P.S.), CIHR Foundation (grant FDN-143246 to S.P.S.), and a CIHR new investigator grant (MSH-261515 to J.N.M.). A.R. is supported by a Frederick Banting and Charles Best CIHR doctoral scholarship. S.P.S. and S.A. are supported by Canada Research Chairs. S.P.S. is a Michael Smith Foundation for Health Research scholar. We thank V. Earle for artwork depicting anatomic sites sampled in the study.

Author information

Contributions

A.R., project conception, algorithm development, software implementation, and data analysis; S.A., A.M., E.L., J.B., D.Y., and A.W., single-nucleus sequencing; M.A.S. and C.B.N., data visualization; J.N.M., surgery, sample acquisition, and tumor banking; A.R., S.A., A.B.-C., and S.P.S., manuscript writing; S.P.S., project oversight and senior responsible author.

Corresponding authors

Correspondence to Andrew Roth or Sohrab P Shah.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Integrated supplementary information

Supplementary Figure 1 Performance comparison using 90 synthetic data sets without doublets.

(a) Example synthetic data used for benchmarking. (b) V-measure metric used to assess clustering performance (higher is better). (c, d) Mean Hamming distance between the predicted and true genotypes of each cell in the (c) two-state and (d) three-state representations (lower is better).
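The two metrics named in this legend can be sketched in a few lines of Python. This is a minimal illustration, not the evaluation code used in the study: the function names are hypothetical, the V-measure uses equal weighting of homogeneity and completeness (an assumption), and the Hamming distance is taken as the raw count of mismatching sites per cell (also an assumption, since the legend does not say whether it is normalized).

```python
from collections import Counter
from math import log


def entropy(labels):
    """Shannon entropy (in nats) of a sequence of labels."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def conditional_entropy(target, given):
    """H(target | given) for two aligned label sequences."""
    n = len(target)
    joint = Counter(zip(given, target))
    given_counts = Counter(given)
    return -sum((c / n) * log(c / given_counts[g]) for (g, _), c in joint.items())


def v_measure(true_labels, pred_labels):
    """Harmonic mean of homogeneity and completeness (equal weights assumed)."""
    h_true, h_pred = entropy(true_labels), entropy(pred_labels)
    homogeneity = 1.0 if h_true == 0 else (
        1.0 - conditional_entropy(true_labels, pred_labels) / h_true)
    completeness = 1.0 if h_pred == 0 else (
        1.0 - conditional_entropy(pred_labels, true_labels) / h_pred)
    if homogeneity + completeness == 0:
        return 0.0
    return 2 * homogeneity * completeness / (homogeneity + completeness)


def mean_hamming(pred_genotypes, true_genotypes):
    """Mean number of mismatching sites per cell (rows are cells)."""
    dists = [sum(p != t for p, t in zip(pred, true))
             for pred, true in zip(pred_genotypes, true_genotypes)]
    return sum(dists) / len(dists)
```

A clustering that matches the truth up to a relabeling of clusters scores 1, whereas placing all cells in a single cluster scores 0 when the truth contains more than one cluster.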

Supplementary Figure 2 Performance comparison using 80 synthetic data sets with doublets.

(a) F-measure of the B-cubed metric used to assess feature allocation performance (higher is better). (b) Clone accuracy assessed by the maximum Hamming distance of a predicted clonal genotype to its nearest true clonal genotype in the three-state representation (lower is better).
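Both measures in this legend can be illustrated with a short Python sketch. The assumptions are substantial: the B-cubed score below is the extended variant for overlapping groups, chosen here because a doublet can belong to two clones, and the source does not confirm that this exact variant was used. Each cell is assumed to belong to at least one group, and all function names are hypothetical.

```python
def bcubed_f(pred, true):
    """Extended B-cubed F-measure for overlapping group assignments.

    pred and true are lists of non-empty sets, one per cell, holding the
    groups that the cell belongs to.
    """
    n = len(pred)
    prec_total = rec_total = 0.0
    for i in range(n):
        prec_terms, rec_terms = [], []
        for j in range(n):
            shared_pred = len(pred[i] & pred[j])
            shared_true = len(true[i] & true[j])
            overlap = min(shared_pred, shared_true)
            if shared_pred:  # pair shares a predicted group
                prec_terms.append(overlap / shared_pred)
            if shared_true:  # pair shares a true group
                rec_terms.append(overlap / shared_true)
        prec_total += sum(prec_terms) / len(prec_terms)
        rec_total += sum(rec_terms) / len(rec_terms)
    precision, recall = prec_total / n, rec_total / n
    return 2 * precision * recall / (precision + recall)


def hamming(a, b):
    """Number of sites at which two genotype vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def max_nearest_clone_distance(pred_clones, true_clones):
    """Worst case, over predicted clones, of the distance to the closest true clone."""
    return max(min(hamming(p, t) for t in true_clones) for p in pred_clones)
```

Taking the maximum rather than the mean penalizes a method for even one spurious predicted clone that lies far from every true clone.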

Supplementary Figure 3 Difference between the number of true clusters and number of clusters predicted by the D-SCG3 model.

Data were simulated from the D-SCG3 model with 100 data points and 10 replicate data sets per parameter setting. We simulated data across a range of doublet probabilities and numbers of clusters.
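To make the notion of a doublet probability concrete, the following toy simulator draws cells from a set of clonal genotypes. It is not the D-SCG3 generative model: it uses a two-state (presence or absence) representation, adds no measurement noise or missing values, and assumes that a doublet pairs two distinct clones and carries a variant wherever either clone does. The function name and parameters are hypothetical.

```python
import random


def simulate_cells(clone_genotypes, n_cells, doublet_prob, seed=0):
    """Draw toy cells; a doublet merges two distinct clones site by site.

    clone_genotypes: list of 0/1 vectors, one per clone (assumed encoding).
    Returns the cell genotypes and, per cell, the set of contributing clones.
    """
    rng = random.Random(seed)
    n_clones = len(clone_genotypes)
    cells, memberships = [], []
    for _ in range(n_cells):
        if n_clones > 1 and rng.random() < doublet_prob:
            a, b = rng.sample(range(n_clones), 2)
            # Assumption: a site is variant in the doublet if variant in either clone.
            genotype = [x | y for x, y in zip(clone_genotypes[a], clone_genotypes[b])]
            members = {a, b}
        else:
            a = rng.randrange(n_clones)
            genotype = list(clone_genotypes[a])
            members = {a}
        cells.append(genotype)
        memberships.append(members)
    return cells, memberships
```

For example, `simulate_cells([[1, 0, 0], [0, 1, 0], [0, 0, 1]], n_cells=100, doublet_prob=0.2)` yields 100 toy cells, some of which carry the merged genotype of two clones and could be mistaken for an additional cluster.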

Supplementary Figure 4 Copy number profile for the high-grade serous ovarian cancer data set.

Red lines indicate major copy number and blue lines indicate minor copy number. Note that this tumor likely underwent a genome doubling early in its evolutionary history.

Supplementary Figure 5 Missing data in the high-grade serous ovarian cancer data set.

Proportion of missing values per cell for SNV events in the high-grade serous ovarian cancer data set. Cells are grouped by cluster.
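The quantity plotted in this figure can be computed as in the sketch below. It assumes that each cell is a list of per-SNV genotype calls with `None` marking a missing value, and that a cluster label is available for every cell; the encoding and the function name are illustrative assumptions rather than the format of the supplementary tables.

```python
from collections import defaultdict


def missing_fraction_by_cluster(cells, clusters):
    """Map each cluster label to the per-cell fractions of missing SNV calls."""
    grouped = defaultdict(list)
    for genotype, cluster in zip(cells, clusters):
        missing = sum(value is None for value in genotype)
        grouped[cluster].append(missing / len(genotype))
    return dict(grouped)
```

Grouping the fractions by cluster makes it easy to check whether any cluster is dominated by cells with mostly missing calls.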

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–5, Supplementary Notes 1–3, Supplementary Results and Supplementary Discussion. (PDF 1790 kb)

Supplementary Table 1

Parameters used to generate synthetic data sets. (XLS 5 kb)

Supplementary Table 2

P-values from Nemenyi test comparing clustering accuracy using V-measure metric. (XLS 5 kb)

Supplementary Table 3

P-values from Nemenyi test comparing performance of genotype prediction using mean Hamming distance in two-state representation. (XLS 5 kb)

Supplementary Table 4

P-values from Nemenyi test comparing performance of genotype prediction using mean Hamming distance in three-state representation. (XLS 5 kb)

Supplementary Table 5

Clustering performance of methods on synthetic data sets without doublets. (XLS 78 kb)

Supplementary Table 6

Genotyping performance of methods on synthetic data sets without doublets. (XLS 49 kb)

Supplementary Table 7

P-values from Nemenyi test comparing feature allocation accuracy using B-cubed metric. (XLS 5 kb)

Supplementary Table 8

Feature allocation performance of methods on data sets with doublets. (XLS 70 kb)

Supplementary Table 9

P-values from Nemenyi test comparing maximum Hamming distance to nearest clone. (XLS 5 kb)

Supplementary Table 10

Accuracy of predicted clonal genotypes of methods on data sets with doublets. (XLS 21 kb)

Supplementary Table 11

Input data for CMM and SCG models for the childhood leukemia data set. (XLS 45 kb)

Supplementary Table 12

Cluster assignments predicted by CMM3 model for the childhood leukemia data set. (XLS 13 kb)

Supplementary Table 13

Cluster assignments predicted by SCG3 model for the childhood leukemia data set. (XLS 13 kb)

Supplementary Table 14

Cluster assignments predicted by D-SCG3 model for the childhood leukemia data set. (XLS 13 kb)

Supplementary Table 15

Predicted genotypes from CMM3 model of clusters with cells assigned for the childhood leukemia data set. (XLS 5 kb)

Supplementary Table 16

Predicted genotypes from SCG3 model of clusters with cells assigned for the childhood leukemia data set. (XLS 5 kb)

Supplementary Table 17

Predicted genotypes from D-SCG3 model of clusters with cells assigned for the childhood leukemia data set. (XLS 5 kb)

Supplementary Table 18

Input data for D-SCG3 model for the HGSOC data set. (XLS 162 kb)

Supplementary Table 19

Cluster assignments for the HGSOC data set using D-SCG3 model. (XLS 29 kb)

Supplementary Table 20

Predicted genotypes of clusters with cells assigned for the HGSOC data set using D-SCG3 model. (XLS 9 kb)

Supplementary Table 21

Predicted clone prevalences for the HGSOC data set using D-SCG3 model. (XLS 9 kb)

Supplementary Software

Single cell genotyper model and simulation code. (ZIP 67 kb)

About this article

Cite this article

Roth, A., McPherson, A., Laks, E. et al. Clonal genotype and population structure inference from single-cell tumor sequencing. Nat Methods 13, 573–576 (2016). https://doi.org/10.1038/nmeth.3867

  • DOI: https://doi.org/10.1038/nmeth.3867
