To the Editor:
Genomic information about predisposing germline mutations in normal cells as well as acquired somatic lesions in cancer cells will enable the development and delivery of individualized cancer therapies. Ongoing global initiatives have shown that the spectra of somatic and germline genetic lesions in pediatric cancer are distinct from those found in adult cancer [1,2,3]. However, existing cancer genome data portals (cBioPortal [4] and the Catalogue of Somatic Mutations in Cancer (COSMIC) [5]) have focused primarily on presenting data generated by adult cancer studies. They also lack features for exploring pathogenic germline mutations, gene fusions and mutation stratification by cancer subtype, all of which are of great importance in pediatric cancer.
Here we describe ProteinPaint, a web application for simultaneously visualizing genetic lesions (including sequence mutations and gene fusions) and RNA expression in pediatric cancers. The pediatric data set consists of 27,188 validated somatic coding lesions acquired at diagnosis or relapse from 17 subtypes of pediatric cancer, 252 pathogenic or loss-of-function germline lesions detected in >1,000 pediatric patients with cancer of 21 subtypes [6] and RNA sequencing (RNA-seq) data for 928 pediatric tumors from 36 subtypes (Supplementary Note). The data were compiled from five major studies (Supplementary Note) and will be expanded with the publication of additional pediatric cancer studies.
In ProteinPaint, genetic lesions from pediatric cancers are shown on a protein panel (Fig. 1) with the option for a parallel view of a curated version of published somatic mutations in the COSMIC database (Supplementary Note). This parallel view enables the use of adult data for interpreting the significance of rare genetic lesions in pediatric cancer (Supplementary Figs. 1 and 2) and vice versa (Supplementary Fig. 3). To ensure consistency, all variants were reannotated with a modified version of ANNOVAR [7]. As an example, we show how this presentation has enabled the detection of aberrant splicing caused by recurrent 'silent' mutations in TP53. This finding also provided insight into the pathogenicity of matching germline variants found in patients with cancer predisposition syndromes (Supplementary Fig. 1 and Supplementary Note). Additionally, presentation of mutant allele fractions in DNA and RNA facilitates evaluation of tumor heterogeneity, which is related to cancer relapse (Supplementary Fig. 3), as well as detection of allelic imbalance in DNA or RNA caused by a second genetic or epigenetic event in tumors (Supplementary Fig. 1 and Supplementary Note). Loss of heterozygosity (LOH), which was computed using the CONSERTING [8] algorithm in the pediatric cancer genomes we analyzed, is shown to further facilitate the identification of double-hit mutations affecting both alleles of a gene (Supplementary Fig. 4).
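As an illustration of the mutant allele fraction mentioned above, the following minimal sketch computes the fraction from read counts and compares the DNA and RNA values at one site. The record layout, the function names and the example counts are assumptions made for illustration; they are not taken from ProteinPaint, and no threshold for calling allelic imbalance is implied.

```python
from dataclasses import dataclass


@dataclass
class AlleleCounts:
    """Read counts at one variant site (hypothetical record layout)."""
    mutant: int      # reads supporting the mutant allele
    reference: int   # reads supporting the reference allele


def mutant_allele_fraction(counts: AlleleCounts) -> float:
    """Fraction of reads at the site that carry the mutant allele."""
    total = counts.mutant + counts.reference
    if total == 0:
        raise ValueError("no reads cover this site")
    return counts.mutant / total


def rna_dna_shift(dna: AlleleCounts, rna: AlleleCounts) -> float:
    """Difference between the RNA and DNA mutant allele fractions.

    A positive value means the mutant allele is over-represented in RNA
    relative to DNA; how large a shift is meaningful is left open here.
    """
    return mutant_allele_fraction(rna) - mutant_allele_fraction(dna)


# Made-up counts: a heterozygous variant in DNA that dominates the RNA reads.
dna_counts = AlleleCounts(mutant=50, reference=50)
rna_counts = AlleleCounts(mutant=90, reference=10)
shift = rna_dna_shift(dna_counts, rna_counts)
```

In this made-up case the mutant allele accounts for half of the DNA reads but most of the RNA reads, the kind of pattern that would prompt a closer look for a second genetic or epigenetic event.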
The expression panel in ProteinPaint presents the rank and amount of gene expression for each sample, with superimposed box plots summarizing the expression range of the entire cohort or user-selected subtypes. Selecting a genetic lesion such as the PAX5-JAK2 fusion on the protein panel automatically highlights the mutated samples on the expression panel; in the case of PAX5-JAK2 fusion, the expression panel shows the aberrantly high expression of JAK2 caused by gene fusion (Fig. 1b). Conversely, examination of aberrant expression in a tumor may lead to new insight into the causal genetic lesion. We show an example of how outlier expression of FLT3 in a leukemia with a kinase activation signature led to the discovery of high-level FLT3 amplification resulting from replication of an episome formed by a complex rearrangement involving three chromosomes (Supplementary Fig. 5 and Supplementary Note).
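The two quantities shown on the expression panel, a per-sample rank and a box-plot summary of the cohort, can be sketched as follows. This is an illustrative sketch only: the sample names and expression values are invented, the function names are hypothetical, and the quartile method used here (the "inclusive" method of Python's `statistics.quantiles`) is an assumption rather than the method used by ProteinPaint.

```python
import statistics


def rank_samples(expression: dict[str, float]) -> dict[str, int]:
    """Rank samples by expression; rank 1 is the highest value."""
    ordered = sorted(expression, key=expression.get, reverse=True)
    return {sample: i + 1 for i, sample in enumerate(ordered)}


def box_summary(values: list[float]) -> dict[str, float]:
    """Five-number summary of the kind a box plot displays."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "min": min(values),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(values),
    }


# Invented expression values (arbitrary units) for five hypothetical samples.
cohort = {"s1": 2.0, "s2": 50.0, "s3": 5.0, "s4": 3.0, "s5": 4.0}
ranks = rank_samples(cohort)
summary = box_summary(list(cohort.values()))
```

Here sample `s2` receives rank 1 and lies far above the third quartile of the cohort, which is how an outlier such as the JAK2 or FLT3 examples would stand out against the box plot.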
ProteinPaint is designed to deliver a premium visualization experience with interactive and animated features. We implemented novel 'disc-on-stem' skewer graphs to depict the diverse prevalence, complex allelic alteration and temporal origin of mutations and gene fusions at a glance (Fig. 1 and Supplementary Figs. 6 and 7). Customized views include display of mutation and expression by cancer subtype or tumor tissue, dynamic zoom and integration of user-provided data, with new features implemented according to user feedback. Data in mutation annotation format (MAF) generated by studies such as The Cancer Genome Atlas (TCGA) or individual research laboratories can be uploaded to ProteinPaint to enable data visualization and cross-study comparison for the broad genetic research community (Supplementary Fig. 8 and Supplementary Tutorial). Manually curated protein domains have been incorporated for genes frequently mutated in pediatric cancer to facilitate the interpretation of mutation pathogenicity (Fig. 1 and Supplementary Fig. 6). ProteinPaint complements existing cancer genome portals by providing a comprehensive and intuitive view of pediatric cancer genomic data with advanced visualization features, as well as integration of expression and adult cancer data (Supplementary Figs. 6 and 7, and Supplementary Note). Taken as a whole, these features make ProteinPaint a powerful tool for analyzing genomic data to enhance pediatric cancer research, collaboration and clinical care.
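For readers preparing MAF data for upload, the sketch below shows one way to inspect such a file beforehand by counting mutation records per gene. It assumes a tab-delimited MAF with the standard `Hugo_Symbol` column and optional leading `#` comment lines; the function name and the example rows are invented, and the sketch does not describe how ProteinPaint itself parses uploaded files.

```python
import csv
import io
from collections import Counter


def count_mutations_per_gene(maf_text: str) -> Counter:
    """Count rows per gene symbol in tab-delimited MAF text."""
    # MAF files may begin with '#' comment lines (e.g. a version line); skip them.
    lines = [ln for ln in maf_text.splitlines() if ln and not ln.startswith("#")]
    reader = csv.DictReader(io.StringIO("\n".join(lines)), delimiter="\t")
    return Counter(row["Hugo_Symbol"] for row in reader)


# Invented example rows; real MAF files carry many more columns.
example = (
    "#version 2.4\n"
    "Hugo_Symbol\tTumor_Sample_Barcode\tVariant_Classification\n"
    "TP53\tsample_A\tMissense_Mutation\n"
    "TP53\tsample_B\tSilent\n"
    "JAK2\tsample_A\tMissense_Mutation\n"
)
counts = count_mutations_per_gene(example)
```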
URLs. ProteinPaint, https://pecan.stjude.org/proteinpaint/.
J.Z. and J.R.D. conceived the project. X.Z. and J.Z. designed the project. X.Z. implemented ProteinPaint. J.Z., G.W., M.P., A.P., J.B. and M.C.R. performed quality control checks or participated in software development. M.N.E., M.R.W., G.W., Y. Li, Z.Z. and Y. Liu generated the data. J.Z. and J.R.D. supervised the project. X.Z., M.N.E. and J.Z. wrote the manuscript.
1. Downing, J.R. et al. Nat. Genet. 44, 619–622 (2012).
2. Zhang, J. et al. Nature 481, 157–163 (2012).
3. Wu, G. et al. Nat. Genet. 44, 251–253 (2012).
4. Gao, J. et al. Sci. Signal. 6, pl1 (2013).
5. Forbes, S.A. et al. Nucleic Acids Res. 43, D805–D811 (2015).
6. Zhang, J. et al. N. Engl. J. Med. http://dx.doi.org/10.1056/NEJMoa1508054 (18 November 2015).
7. Wang, K., Li, M. & Hakonarson, H. Nucleic Acids Res. 38, e164 (2010).
8. Chen, X. et al. Nat. Methods 12, 527–530 (2015).
9. Liang, S.H. & Clarke, M.F. J. Biol. Chem. 274, 32699–32703 (1999).
10. Roberts, K.G. et al. N. Engl. J. Med. 371, 1005–1015 (2014).
We thank J. Klco, P. Northcott and C. Mullighan for helpful suggestions. We thank the reviewers of this manuscript for suggesting a new interface for custom data upload. This study was supported by the St. Jude Children's Research Hospital–Washington University Pediatric Cancer Genome Project, Cancer Center support grant P30 CA021765 from the US National Cancer Institute and the American Lebanese Syrian Associated Charities of St. Jude Children's Research Hospital.
The authors declare no competing financial interests.
Cite this article
Zhou, X., Edmonson, M., Wilkinson, M. et al. Exploring genomic alteration in pediatric cancer using ProteinPaint. Nat Genet 48, 4–6 (2016). https://doi.org/10.1038/ng.3466