Clonal genotype and population structure inference from single-cell tumor sequencing

Roth, Andrew; McPherson, Andrew; Laks, Emma; Biele, Justina; Yap, Damian; Wan, Adrian; Smith, Maia A; Nielsen, Cydney B; McAlpine, Jessica N; Aparicio, Samuel; Bouchard-Côté, Alexandre; Shah, Sohrab P

doi:10.1038/nmeth.3867

Brief Communication
Published: 16 May 2016

Andrew Roth^1,2,
Andrew McPherson^1,3 (ORCID: orcid.org/0000-0002-5654-5101),
Emma Laks^1,
Justina Biele^1,
Damian Yap^1,4 (ORCID: orcid.org/0000-0002-5370-4592),
Adrian Wan^1,
Maia A Smith^1,
Cydney B Nielsen^1,4,
Jessica N McAlpine^5,
Samuel Aparicio^1,4,
Alexandre Bouchard-Côté^6 &
Sohrab P Shah^1,4,7 (ORCID: orcid.org/0000-0001-6402-523X)

Nature Methods volume 13, pages 573–576 (2016)

Abstract

Single-cell DNA sequencing has great potential to reveal the clonal genotypes and population structure of human cancers. However, single-cell data suffer from missing values and biased allelic counts as well as false genotype measurements owing to the sequencing of multiple cells. We describe the Single Cell Genotyper (https://bitbucket.org/aroth85/scg), an open-source software based on a statistical model coupled with a mean-field variational inference method, which can be used to address these problems and robustly infer clonal genotypes.

**Figure 1: Overview of the SCG model.**

**Figure 2: Comparison of clustering performance on real data with doublets.**

**Figure 3: The D-SCG3 model identified clonal cell populations in multiple samples from an HGSOC patient.**

Accession codes

Primary accessions

Sequence Read Archive

Acknowledgements

We acknowledge generous long-term funding support from the BC Cancer Foundation. In addition, the S.P.S. and S.A. groups receive operating funds from the Canadian Breast Cancer Foundation, the Canadian Cancer Society Research Institute (impact grant 701584 to S.A. and S.P.S.), the Terry Fox Research Institute (PPG program on forme fruste tumors), Canadian Institutes for Health Research (CIHR) (grant MOP-115170 to S.A. and S.P.S.), CIHR Foundation (grant FDN-143246 to S.P.S.), and a CIHR new investigator grant (MSH-261515 to J.N.M.). A.R. is supported by a Frederick Banting and Charles Best CIHR doctoral scholarship. S.P.S. and S.A. are supported by Canada Research Chairs. S.P.S. is a Michael Smith Foundation for Health Research scholar. We thank V. Earle for artwork depicting anatomic sites sampled in the study.

Author information

Authors and Affiliations

Department of Molecular Oncology, BC Cancer Agency, Vancouver, British Columbia, Canada
Andrew Roth, Andrew McPherson, Emma Laks, Justina Biele, Damian Yap, Adrian Wan, Maia A Smith, Cydney B Nielsen, Samuel Aparicio & Sohrab P Shah
Graduate Bioinformatics Training Program, University of British Columbia, Vancouver, British Columbia, Canada
Andrew Roth
School of Computing Science, Simon Fraser University, Burnaby, British Columbia, Canada
Andrew McPherson
Department of Pathology and Laboratory Medicine, University of British Columbia, Vancouver, British Columbia, Canada
Damian Yap, Cydney B Nielsen, Samuel Aparicio & Sohrab P Shah
Department of Gynecology and Obstetrics, University of British Columbia, Vancouver, British Columbia, Canada
Jessica N McAlpine
Department of Statistics, University of British Columbia, Vancouver, British Columbia, Canada
Alexandre Bouchard-Côté
Michael Smith Genome Sciences Centre, BC Cancer Agency, Vancouver, British Columbia, Canada
Sohrab P Shah

Contributions

A.R., project conception, algorithm development, software implementation, and data analysis; S.A., A.M., E.L., J.B., D.Y., and A.W., single-nucleus sequencing; M.A.S. and C.B.N., data visualization; J.N.M., surgery, sample acquisition, and tumor banking; A.R., S.A., A.B.-C., and S.P.S., manuscript writing; S.P.S., project oversight and senior responsible author.

Corresponding authors

Correspondence to Andrew Roth or Sohrab P Shah.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Integrated supplementary information

Supplementary Figure 1 Performance comparison using 90 synthetic datasets without doublets.

(a) Example synthetic data used for benchmarking. (b) V-measure metric used to assess clustering performance (higher is better). (c, d) Mean Hamming distance between the predicted and true genotypes of each cell in the (c) two-state and (d) three-state representations (lower is better).
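The two metrics named in this caption can be illustrated with a minimal Python sketch. It is not taken from the SCG software: the function names are hypothetical, the V-measure follows the standard homogeneity/completeness definition with equal weighting, and the Hamming distance is assumed to be an unnormalized count of mismatching loci (the caption does not say whether it is normalized).

```python
from collections import Counter
from math import log


def entropy(labels):
    """Shannon entropy (natural log) of a list of labels."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def conditional_entropy(target, given):
    """H(target | given) for two parallel label lists."""
    n = len(target)
    joint = Counter(zip(target, given))
    given_counts = Counter(given)
    return -sum((c / n) * log(c / given_counts[g]) for (_, g), c in joint.items())


def v_measure(true_labels, pred_labels):
    """Harmonic mean of homogeneity and completeness (equal weighting assumed)."""
    h_true = entropy(true_labels)
    h_pred = entropy(pred_labels)
    homogeneity = 1.0 if h_true == 0 else 1.0 - conditional_entropy(true_labels, pred_labels) / h_true
    completeness = 1.0 if h_pred == 0 else 1.0 - conditional_entropy(pred_labels, true_labels) / h_pred
    if homogeneity + completeness == 0:
        return 0.0
    return 2 * homogeneity * completeness / (homogeneity + completeness)


def mean_hamming(pred_genotypes, true_genotypes):
    """Mean per-cell count of loci where predicted and true genotypes differ."""
    distances = [
        sum(p != t for p, t in zip(pred, true))
        for pred, true in zip(pred_genotypes, true_genotypes)
    ]
    return sum(distances) / len(distances)
```

Under these assumptions the V-measure is invariant to relabeling of clusters, so a prediction that recovers the true partition under different cluster names still scores 1.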

Supplementary Figure 2 Performance comparison using 80 synthetic datasets with doublets.

(a) F-measure of the B-cubed metric, used to assess feature allocation performance (higher is better). (b) Clone accuracy, assessed by the maximum Hamming distance from a predicted clonal genotype to its nearest true clonal genotype in the three-state representation (lower is better).
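As a rough illustration of these two scores, the sketch below uses the extended B-cubed precision and recall for overlapping assignments, which is one common way to score cells that may belong to more than one clone (as doublets do). Whether the paper uses exactly this variant is an assumption, and the function names and the set-based input format are hypothetical.

```python
def bcubed_f(pred, true):
    """Extended B-cubed F-measure for overlapping assignments.

    pred[i] and true[i] are non-empty sets of cluster labels for cell i.
    """
    n = len(pred)

    def score(a, b):
        # Average, over cells, of the average multiplicity score against
        # every cell that shares at least one group in `a` (including itself).
        total = 0.0
        for i in range(n):
            values = []
            for j in range(n):
                shared_a = len(a[i] & a[j])
                if shared_a:
                    shared_b = len(b[i] & b[j])
                    values.append(min(shared_a, shared_b) / shared_a)
            total += sum(values) / len(values)
        return total / n

    precision = score(pred, true)
    recall = score(true, pred)
    return 2 * precision * recall / (precision + recall)


def clone_accuracy(pred_clones, true_clones):
    """Maximum, over predicted clones, of the Hamming distance to the nearest true clone."""
    def hamming(x, y):
        return sum(a != b for a, b in zip(x, y))

    return max(min(hamming(p, t) for t in true_clones) for p in pred_clones)
```

In this sketch a score of 0 from `clone_accuracy` means every predicted clonal genotype exactly matches some true clonal genotype; it does not check that every true clone was recovered.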

Supplementary Figure 3 Difference between the number of true clusters and number of clusters predicted by the D-SCG3 model.

Data were simulated from the D-SCG3 model, with 100 data points per dataset and 10 replicate datasets per parameter setting, across a range of doublet probabilities and numbers of clusters.
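The layout of such a simulation grid can be sketched as follows. This is only a sketch of the experimental design, not of the D-SCG3 generative model: it samples cluster memberships only (no genotype observations), it assumes a doublet is formed from two distinct clusters, and the grid values, seed, and function names are all hypothetical.

```python
import random
from itertools import product


def simulate_memberships(n_cells, n_clusters, doublet_prob, rng):
    """Sample the set of clusters each cell belongs to (one, or two for a doublet)."""
    cells = []
    for _ in range(n_cells):
        if n_clusters > 1 and rng.random() < doublet_prob:
            # Assumption: a doublet combines two distinct clusters.
            cells.append(frozenset(rng.sample(range(n_clusters), 2)))
        else:
            cells.append(frozenset([rng.randrange(n_clusters)]))
    return cells


def simulate_grid(doublet_probs, cluster_counts, n_cells=100, n_replicates=10, seed=0):
    """Build replicate datasets for every (doublet probability, cluster count) setting."""
    rng = random.Random(seed)
    datasets = {}
    for p, k in product(doublet_probs, cluster_counts):
        datasets[(p, k)] = [
            simulate_memberships(n_cells, k, p, rng) for _ in range(n_replicates)
        ]
    return datasets


def cluster_count_error(true_memberships, n_predicted):
    """Number of clusters actually present minus the number predicted."""
    n_true = len(set().union(*true_memberships))
    return n_true - n_predicted
```

A caller would run a clustering method on each replicate and pass its predicted number of clusters to `cluster_count_error`, then summarize the differences per parameter setting.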

Supplementary Figure 4 Copy number profile for the high-grade serous ovarian cancer dataset.

Red lines indicate major copy number and blue lines indicate minor copy number. Note that this tumor likely underwent a genome doubling early in its evolutionary history.

Supplementary Figure 5 Missing data in the high-grade serous ovarian cancer dataset.

Proportion of missing values per cell for SNV events in the high-grade serous ovarian cancer dataset. Cells are grouped by cluster.
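The quantity plotted here can be computed with a short sketch like the one below. The input format is an assumption made for illustration: each cell maps to a list of SNV genotype calls with `None` marking a missing value, and a second mapping gives each cell's cluster. The function name is hypothetical.

```python
from collections import defaultdict


def missing_proportion_by_cluster(calls, clusters):
    """Return {cluster: {cell: fraction of SNV calls that are missing}}."""
    grouped = defaultdict(dict)
    for cell, values in calls.items():
        fraction = sum(v is None for v in values) / len(values)
        grouped[clusters[cell]][cell] = fraction
    return dict(grouped)
```

For example, a cell with two missing calls out of four SNV events would be reported with a missing proportion of 0.5 under its assigned cluster.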

About this article

Cite this article

Roth, A., McPherson, A., Laks, E. et al. Clonal genotype and population structure inference from single-cell tumor sequencing. Nat Methods 13, 573–576 (2016). https://doi.org/10.1038/nmeth.3867

Received: 14 October 2015
Accepted: 06 April 2016
Published: 16 May 2016
Issue Date: July 2016
DOI: https://doi.org/10.1038/nmeth.3867
