Focus on TCGA Pan-Cancer Analysis

Enabling transparent and collaborative computational analysis of 12 tumor types within The Cancer Genome Atlas

Journal name: Nature Genetics
Volume: 45
Pages: 1121–1126
DOI: doi:10.1038/ng.2761

Abstract

The Cancer Genome Atlas Pan-Cancer Analysis Working Group collaborated on the Synapse software platform to share and evolve data, results and methodologies while performing integrative analysis of molecular profiling data from 12 tumor types. The group's work serves as a pilot case study that provides (i) a template for future large collaborative studies; (ii) a system to support collaborative projects; and (iii) a public resource of highly curated data, results and automated systems for the evaluation of community-developed models.

Figures

  1. Figure 1: Molecular profiling data sets in the Pan-Cancer project.

    Each circular plot displays the total number of samples analyzed across each of the 12 tumor types in the Pan-Cancer project. Samples are arranged in the same order in each concentric circle for each tumor type. Different circles are colored according to whether the sample was profiled using the most current platform, was profiled using a legacy platform or was not profiled. Each data set, including older versions, is available in Synapse (syn300013).

  2. Figure 2: Schematic of the Pan-Cancer analysis workflow.

    Data were aggregated and standardized from the TCGA DCC, Broad Firehose and individual analysis working groups and processed into easy-to-use tab-delimited files. Collaborators used a variety of analytic tools, such as R, Python, Unix shell and the web client, to interact with data in Synapse while also storing results, provenance records, analysis descriptions and source code. For a subset of these results (for example, patient survival predictions), Synapse carried out automated performance evaluations and displayed results on a real-time leader board, which were available to collaborators to perform comparative meta-analysis or adapt model source code to additional applications.

  3. Figure 3: Example provenance graph of a multistep workflow showing interaction between the analysis of three researchers.

    The provenance record consists of two types of nodes—activities (shown as red boxes above) performed by a researcher and input and output files of these actions (shown as file and folder icons and identified by their name and Synapse ID). In addition, every activity has metadata associated with it to further describe the details of the actions performed. This specific graph shows the workflow used to perform comparative analysis of two mutation-calling algorithms—MuSiC and MutSig. For MuSiC, the provenance of analysis is displayed from input data to derivation of mutation calls. Provenance records may be further expanded (ellipses) to trace the origin of input files to their original data source in Firehose, DCC or personal communications with AWG members. For brevity, the MutSig graph is not expanded. This graph was produced from version 2 of the data in doi:10.7303/syn1750331.
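
The two-node structure described in the Figure 3 caption (activities, plus the files they use and generate) can be sketched as a small data model. This is only an illustrative sketch, not Synapse's actual implementation or API: the class names, the `trace_origins` helper, the researcher labels and every `syn...` identifier below are hypothetical placeholders.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class FileNode:
    """An input or output file, identified by a name and a Synapse-style ID."""
    synapse_id: str
    name: str


@dataclass
class Activity:
    """An action performed by a researcher, with free-form metadata."""
    name: str
    performed_by: str
    used: list        # FileNode inputs
    generated: list   # FileNode outputs
    metadata: dict = field(default_factory=dict)


def trace_origins(target, activities):
    """Return the files with no recorded producing activity that `target` depends on."""
    producer = {out: act for act in activities for out in act.generated}
    origins, seen, stack = set(), set(), [target]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        act = producer.get(node)
        if act is None:
            origins.add(node)       # nothing recorded upstream of this file
        else:
            stack.extend(act.used)  # keep walking toward the data source
    return origins


# Hypothetical three-researcher workflow (all IDs are placeholders).
raw = FileNode("syn_example_1", "mutations_from_firehose.maf")
filtered = FileNode("syn_example_2", "filtered_mutations.maf")
music_calls = FileNode("syn_example_3", "music_calls.tsv")
mutsig_calls = FileNode("syn_example_4", "mutsig_calls.tsv")
comparison = FileNode("syn_example_5", "music_vs_mutsig.tsv")

activities = [
    Activity("filter mutations", "researcher A", [raw], [filtered]),
    Activity("run MuSiC", "researcher B", [filtered], [music_calls],
             metadata={"note": "parameters would be recorded here"}),
    Activity("compare callers", "researcher C",
             [music_calls, mutsig_calls], [comparison]),
]

origins = trace_origins(comparison, activities)
```

Here the MuSiC branch is traced back to its raw input, while the MutSig calls appear as an origin because no upstream activity is recorded for them, much as the MutSig graph is left unexpanded in the figure.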

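The automated evaluation step in the Figure 2 workflow (scoring submitted survival predictions and ranking them on a leaderboard) might look roughly like the sketch below. The source does not state which metric Synapse used, so the concordance index is an assumption here, and the function names, model names and toy data are all hypothetical.

```python
from itertools import combinations


def concordance_index(times, events, risks):
    """Fraction of comparable patient pairs ranked correctly by predicted risk.

    times  -- observed follow-up time per patient
    events -- True if death was observed, False if censored
    risks  -- predicted risk score (higher means shorter expected survival)
    """
    concordant, comparable = 0.0, 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # simplification: skip pairs with tied times
        first, second = (i, j) if times[i] < times[j] else (j, i)
        if not events[first]:
            continue  # earlier patient was censored, so the pair is not comparable
        comparable += 1
        if risks[first] > risks[second]:
            concordant += 1.0
        elif risks[first] == risks[second]:
            concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def leaderboard(submissions, times, events):
    """Score each named submission and rank from best to worst."""
    scored = [(name, concordance_index(times, events, risks))
              for name, risks in submissions.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy held-out data: four patients, the third one censored.
times = [5, 10, 15, 20]
events = [True, True, False, True]
submissions = {
    "model_b": [0.1, 0.4, 0.7, 0.9],  # ranks patients in the wrong order
    "model_a": [0.9, 0.7, 0.4, 0.1],  # ranks patients in the right order
}
board = leaderboard(submissions, times, events)
```

Scoring every submission with one shared function against the same held-out outcomes is what makes the ranked results directly comparable across collaborators.
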
Author information

  1. These authors contributed equally to this work.

    • Larsson Omberg &
    • Kyle Ellrott

Affiliations

  1. Sage Bionetworks, Seattle, Washington, USA.

    • Larsson Omberg,
    • Michael R Kellen,
    • Stephen H Friend &
    • Adam A Margolin
  2. Department of Biomolecular Engineering, University of California, Santa Cruz, Santa Cruz, California, USA.

    • Kyle Ellrott,
    • Chris Wong &
    • Josh Stuart
  3. Department of Bioinformatics and Computational Biology, University of Texas MD Anderson Cancer Center, Houston, Texas, USA.

    • Yuan Yuan &
    • Han Liang
  4. Graduate Program in Structural and Computational Biology and Molecular Biophysics, Baylor College of Medicine, Houston, Texas, USA.

    • Yuan Yuan &
    • Han Liang
  5. The Genome Institute, Washington University, St. Louis, Missouri, USA.

    • Cyriac Kandoth

Contributions

L.O., K.E. and A.A.M. wrote the manuscript with assistance from C.K., S.H.F., J.S. and H.L. L.O., K.E. and C.W. created the visuals for the manuscript. L.O. and K.E. coordinated data aggregation and sharing in Synapse. Y.Y. developed data sampling and primary models for survival predictions. C.K. performed MuSiC analysis and created the corresponding Synapse annotations. L.O. developed infrastructure for scoring and evaluations in Synapse. M.R.K. oversaw the development of Synapse. J.S. and A.A.M. conceived of and oversaw the use of Synapse to support the Pan-Cancer project. The TCGA Research Network contributed all of the results in Synapse.

Competing financial interests

The authors declare no competing financial interests.

Supplementary information

Excel files

  1. Supplementary Note (37 KB)
