PyClone: statistical inference of clonal population structure in cancer

Abstract

We introduce PyClone, a statistical model for inference of clonal population structures in cancers. PyClone is a Bayesian clustering method for grouping sets of deeply sequenced somatic mutations into putative clonal clusters while estimating their cellular prevalences and accounting for allelic imbalances introduced by segmental copy-number changes and normal-cell contamination. Single-cell sequencing validation demonstrates PyClone's accuracy.

Figure 1: Comparison of clustering performance for the mixture of normal-tissue data sets.
Figure 2: Joint analysis of multiple samples from high-grade serous ovarian cancer 2.

Acknowledgements

This work is funded by the Canadian Institutes of Health Research (CIHR), Genome Canada, Genome British Columbia, Canadian Cancer Society Research Institute and Canadian Breast Cancer Foundation grants to S.P.S. and S.A. S.P.S. is supported by the Michael Smith Foundation for Health Research and is the Canada Research Chair (CRC) for Computational Cancer Genomics. S.A. is the CRC for Molecular Oncology. A.R. is supported by a CIHR Banting scholarship.

Author information

Contributions

Project conception and oversight: S.P.S., S.A., A.R.; method development: A.R., A.B.-C., S.P.S.; implementation and benchmarking: A.R.; manuscript writing and editing, study design and execution: A.R., A.B.-C., S.P.S., S.A.; single-cell sequencing: J.K., D.Y., A.W., E.L., J.B.; data analysis and interpretation: G.H.

Corresponding author

Correspondence to Sohrab P Shah.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–14, Supplementary Results, Supplementary Discussion and Supplementary Note (PDF 5370 kb)

Supplementary Table 1

Allelic counts, IBBMM and PyClone PCN cellular prevalence estimates for mutations in high-grade serous ovarian cancer case 2. Copy-number predictions were inferred using PICNIC as described in the Online Methods. Cellular prevalences were computed by taking the mean of the post-burn-in trace of the cellular prevalence parameter for each method. The standard deviation of the cellular prevalence parameter, estimated from the post-burn-in trace, is also included. Cluster IDs (last two columns) were predicted from the post-burn-in trace using the MPEAR clustering criterion as described in the Online Methods and Supplementary Note. Mutation IDs list gene name, chromosome and chromosome coordinate. All coordinates are in the hg19 coordinate system. (XLS 50 kb)

Supplementary Table 2

Allelic counts, IBBMM and PyClone PCN cellular prevalence estimates for mutations in high-grade serous ovarian cancer case 1. Copy-number predictions were inferred using PICNIC as described in the Online Methods. Cellular prevalences were computed by taking the mean of the post-burn-in trace of the cellular prevalence parameter for each method. The standard deviation of the cellular prevalence parameter, estimated from the post-burn-in trace, is also included. Cluster IDs (last two columns) were predicted from the post-burn-in trace using the MPEAR clustering criterion as described in the Online Methods and Supplementary Note. Mutation IDs list gene name, chromosome and chromosome coordinate. All coordinates are in the hg19 coordinate system. (XLSX 40 kb)
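
The supplementary table descriptions summarize each mutation's cellular prevalence as the mean and standard deviation of its post-burn-in MCMC trace. The following is a minimal sketch of that summary step only, not the PyClone implementation: the function name `summarize_trace`, the list-of-rows trace layout, the use of the sample standard deviation and the toy numbers are all assumptions made for illustration.

```python
from statistics import mean, stdev


def summarize_trace(trace, burn_in):
    """Summarize a cellular prevalence trace after discarding burn-in.

    trace   -- list of MCMC iterations; each iteration is a list holding
               one sampled cellular prevalence per mutation (assumed layout)
    burn_in -- number of leading iterations to discard

    Returns (means, sds): per-mutation mean and sample standard deviation
    over the retained iterations.
    """
    if burn_in < 0:
        raise ValueError("burn_in must be non-negative")
    kept = trace[burn_in:]
    if len(kept) < 2:
        raise ValueError("need at least two post-burn-in iterations")
    # Transpose so that each tuple holds the samples for one mutation.
    per_mutation = list(zip(*kept))
    means = [mean(samples) for samples in per_mutation]
    sds = [stdev(samples) for samples in per_mutation]
    return means, sds


# Toy trace: four iterations, two mutations (hypothetical values).
toy_trace = [
    [0.9, 0.1],
    [0.5, 0.5],
    [0.4, 0.2],
    [0.6, 0.4],
]
means, sds = summarize_trace(toy_trace, burn_in=2)
```

In this sketch the first two iterations are treated as burn-in, so only the last two rows contribute to the summaries. The appropriate burn-in length and whether the sample or population standard deviation was used are not stated in the table descriptions.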

About this article

Cite this article

Roth, A., Khattra, J., Yap, D. et al. PyClone: statistical inference of clonal population structure in cancer. Nat Methods 11, 396–398 (2014). https://doi.org/10.1038/nmeth.2883
