Exploring genomic alteration in pediatric cancer using ProteinPaint


To the Editor:

Genomic information about predisposing germline mutations in normal cells as well as acquired somatic lesions in cancer cells will enable the development and delivery of individualized cancer therapies. Ongoing global initiatives have shown that the spectra of somatic and germline genetic lesions in pediatric cancer are distinct from those found in adult cancer1,2,3. However, existing cancer genome data portals (cBioPortal4 and the Catalogue of Somatic Mutations in Cancer (COSMIC)5) have focused primarily on presenting data generated by adult cancer studies. They also lack features for exploring pathogenic germline mutations, gene fusions and mutation stratification by cancer subtype, all of which are of great importance in pediatric cancer.

Here we describe ProteinPaint, a web application for simultaneously visualizing genetic lesions (including sequence mutations and gene fusions) and RNA expression in pediatric cancers. The pediatric data set consists of 27,188 validated somatic coding lesions acquired at diagnosis or relapse from 17 subtypes of pediatric cancer, 252 pathogenic or loss-of-function germline lesions detected in >1,000 pediatric patients with cancer of 21 subtypes6 and RNA sequencing (RNA-seq) data for 928 pediatric tumors from 36 subtypes (Supplementary Note). The data were compiled from five major studies (Supplementary Note) and will be expanded with the publication of additional pediatric cancer studies.

In ProteinPaint, genetic lesions from pediatric cancers are shown on a protein panel (Fig. 1) with the option for a parallel view of a curated version of published somatic mutations in the COSMIC database (Supplementary Note). This parallel view enables the use of adult data for interpreting the significance of rare genetic lesions in pediatric cancer (Supplementary Figs. 1 and 2) and vice versa (Supplementary Fig. 3). To ensure consistency, all variants were reannotated with a modified version of ANNOVAR7. As an example, we show how this presentation has enabled the detection of aberrant splicing caused by recurrent 'silent' mutations in TP53. This finding also provided insight into the pathogenicity of matching germline variants found in patients with cancer predisposition syndromes (Supplementary Fig. 1 and Supplementary Note). Additionally, presentation of mutant allele fractions in DNA and RNA facilitates evaluation of tumor heterogeneity, which is related to cancer relapse (Supplementary Fig. 3), as well as detection of allelic imbalance in DNA or RNA caused by a second genetic or epigenetic event in tumors (Supplementary Fig. 1 and Supplementary Note). Loss of heterozygosity (LOH), which was computed using the CONSERTING8 algorithm in the pediatric cancer genomes we analyzed, is shown to further facilitate the identification of double-hit mutations affecting both alleles of a gene (Supplementary Fig. 4).
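
As an illustration of the allele-fraction comparison described above, the following minimal sketch computes a mutant allele fraction from read counts and flags a variant whose RNA fraction departs from its DNA fraction. The function names, the read counts and the 0.25 cutoff are assumptions made for this example and are not taken from ProteinPaint.

```python
from typing import Optional


def mutant_allele_fraction(mutant_reads: int, total_reads: int) -> Optional[float]:
    """Return the fraction of reads at a site that carry the mutant allele."""
    if total_reads <= 0:
        return None  # no coverage, so the fraction is undefined
    return mutant_reads / total_reads


def shows_allelic_imbalance(dna_fraction: Optional[float],
                            rna_fraction: Optional[float],
                            min_shift: float = 0.25) -> bool:
    """Flag a variant whose RNA fraction differs from its DNA fraction.

    min_shift is an arbitrary illustrative cutoff, not a published threshold.
    """
    if dna_fraction is None or rna_fraction is None:
        return False
    return abs(rna_fraction - dna_fraction) >= min_shift


# Hypothetical read counts for one heterozygous variant.
dna = mutant_allele_fraction(48, 100)
rna = mutant_allele_fraction(90, 100)
print(dna, rna, shows_allelic_imbalance(dna, rna))
```

In this toy case the variant is close to heterozygous in DNA but dominates the RNA reads, which is the kind of pattern that would prompt a search for a second genetic or epigenetic event.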

Figure 1: Comprehensive visualization of sequence mutations, gene fusions and RNA expression using ProteinPaint.

(a) TP53 mutation profile in the pediatric data set (top) and COSMIC database (bottom). The number of samples affected by each mutation is indicated by the text within each disc, as well as by disc size. The arc outside each disc indicates the proportion of samples that are germline (filled) or relapsed tumor (open). The full legend is shown in Supplementary Figure 1. The manually curated NLS domain shows a hotspot nonsense variant, p.Arg306*, that disrupts a known nuclear localization signal9. (b) JAK2 gene fusion and expression. Left, JAK2 fusions are shown along with sequence mutations affecting the pseudokinase and kinase domains of JAK2. A half-filled disc represents a gene fusion, with the filled section representing the N or C terminus of the protein involved in the fusion. The arrow points to the PAX5-JAK2 fusion detected in seven tumors of Ph-like B cell acute lymphoblastic leukemia10. The fusion protein involves the C terminus of JAK2. Right, JAK2 expression levels in pediatric samples. The horizontal axis represents the range of FPKM (fragments per kilobase of transcript per million mapped reads) values. Gray circles represent samples in descending order of JAK2 FPKM value. Samples represented by filled red circles are those with PAX5-JAK2 fusion selected by the user. The ratios of PAX5-JAK2 fusion transcript to the overall expression of its two partner genes, PAX5 and JAK2, are labeled in red text. Box plots represent FPKM distributions in pediatric cancer cohorts, labeled by disease name and cohort size.

The expression panel in ProteinPaint presents the rank and amount of gene expression for each sample, with superimposed box plots summarizing the expression range of the entire cohort or user-selected subtypes. Selecting a genetic lesion such as the PAX5-JAK2 fusion on the protein panel automatically highlights the mutated samples on the expression panel; in the case of PAX5-JAK2 fusion, the expression panel shows the aberrantly high expression of JAK2 caused by gene fusion (Fig. 1b). Conversely, examination of aberrant expression in a tumor may lead to new insight into the causal genetic lesion. We show an example of how outlier expression of FLT3 in a leukemia with a kinase activation signature led to the discovery of high-level FLT3 amplification resulting from replication of an episome formed by a complex rearrangement involving three chromosomes (Supplementary Fig. 5 and Supplementary Note).
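
The quantities behind such an expression panel can be sketched as follows: FPKM is computed from fragment counts according to its definition (fragments per kilobase of transcript per million mapped reads), and samples are then ordered by descending value. This is a minimal sketch under stated assumptions; the function names, sample identifiers and counts are hypothetical and do not reflect how ProteinPaint computes or stores expression values.

```python
def fpkm(fragments: int, transcript_length_bp: int, total_mapped_fragments: int) -> float:
    """Fragments per kilobase of transcript per million mapped fragments."""
    if transcript_length_bp <= 0 or total_mapped_fragments <= 0:
        raise ValueError("length and library size must be positive")
    # 1e9 = 1e3 (bases per kilobase) * 1e6 (fragments per million)
    return fragments * 1e9 / (transcript_length_bp * total_mapped_fragments)


def rank_by_expression(fpkm_by_sample: dict) -> list:
    """Return sample names ordered from highest to lowest FPKM."""
    return sorted(fpkm_by_sample, key=fpkm_by_sample.get, reverse=True)


# Hypothetical samples; 'S2' stands in for a tumor carrying a fusion.
values = {
    "S1": fpkm(100, 2000, 25_000_000),
    "S2": fpkm(1775, 2000, 25_000_000),
    "S3": fpkm(355, 2000, 25_000_000),
}
fusion_samples = {"S2"}

for rank, sample in enumerate(rank_by_expression(values), start=1):
    marker = "fusion" if sample in fusion_samples else ""
    print(rank, sample, round(values[sample], 2), marker)
```

Marking the fusion-positive samples in the ranked list mirrors, in text form, the highlighting that links a lesion selected on the protein panel to its position in the expression distribution.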

ProteinPaint is designed to deliver a premium visualization experience with interactive and animated features. We implemented novel 'disc-on-stem' skewer graphs to depict the diverse prevalence, complex allelic alteration and temporal origin of mutations and gene fusions at a glance (Fig. 1 and Supplementary Figs. 6 and 7). Customized views include display of mutation and expression by cancer subtype or tumor tissue, dynamic zoom and integration of user-provided data, with new features implemented according to user feedback. Data in mutation annotation format (MAF) generated by studies such as The Cancer Genome Atlas (TCGA) or individual research laboratories can be uploaded to ProteinPaint to enable data visualization and cross-study comparison for the broad genetic research community (Supplementary Fig. 8 and Supplementary Tutorial). Manually curated protein domains have been incorporated for genes frequently mutated in pediatric cancer to facilitate the interpretation of mutation pathogenicity (Fig. 1 and Supplementary Fig. 6). ProteinPaint complements existing cancer genome portals by providing a comprehensive and intuitive view of pediatric cancer genomic data with advanced visualization features, as well as integration of expression and adult cancer data (Supplementary Figs. 6 and 7, and Supplementary Note). Taken as a whole, these features make ProteinPaint a powerful tool for analyzing genomic data to enhance pediatric cancer research, collaboration and clinical care.
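
To illustrate the kind of summary that underlies a disc-on-stem graph, the sketch below counts the distinct samples carrying each protein change for one gene in tab-delimited MAF text. The column names (Hugo_Symbol, Tumor_Sample_Barcode, HGVSp_Short) follow common MAF usage and are an assumption here, as are the function name and the toy records; the input format that ProteinPaint accepts is described in the Supplementary Tutorial and may differ.

```python
import csv
from collections import defaultdict


def count_samples_per_change(maf_text: str, gene: str) -> dict:
    """Count distinct samples carrying each protein change in one gene."""
    # Skip blank lines and '#' comment lines that may precede the header row.
    lines = [ln for ln in maf_text.splitlines() if ln and not ln.startswith("#")]
    reader = csv.DictReader(lines, delimiter="\t")
    samples = defaultdict(set)
    for row in reader:
        if row["Hugo_Symbol"] == gene:
            samples[row["HGVSp_Short"]].add(row["Tumor_Sample_Barcode"])
    return {change: len(ids) for change, ids in samples.items()}


# Toy records with invented sample identifiers.
toy_maf = "\n".join([
    "#version 2.4",
    "Hugo_Symbol\tTumor_Sample_Barcode\tHGVSp_Short",
    "TP53\tSAMPLE_A\tp.R306*",
    "TP53\tSAMPLE_B\tp.R306*",
    "TP53\tSAMPLE_C\tp.R175H",
    "JAK2\tSAMPLE_A\tp.R683G",
])

print(count_samples_per_change(toy_maf, "TP53"))
```

Each key would correspond to one disc and each count to the number shown inside it; counting distinct samples rather than rows avoids inflating a disc when a sample is listed more than once.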

URLs. ProteinPaint, https://pecan.stjude.org/proteinpaint/.

Author contributions

J.Z. and J.R.D. conceived the project. X.Z. and J.Z. designed the project. X.Z. implemented ProteinPaint. J.Z., G.W., M.P., A.P., J.B. and M.C.R. performed quality control checks or participated in software development. M.N.E., M.R.W., G.W., Y. Li, Z.Z. and Y. Liu generated the data. J.Z. and J.R.D. supervised the project. X.Z., M.N.E. and J.Z. wrote the manuscript.


  1. Downing, J.R. et al. Nat. Genet. 44, 619–622 (2012).

  2. Zhang, J. et al. Nature 481, 157–163 (2012).

  3. Wu, G. et al. Nat. Genet. 44, 251–253 (2012).

  4. Gao, J. et al. Sci. Signal. 6, pl1 (2013).

  5. Forbes, S.A. et al. Nucleic Acids Res. 43, D805–D811 (2015).

  6. Zhang, J. et al. N. Engl. J. Med. http://dx.doi.org/10.1056/NEJMoa1508054 (18 November 2015).

  7. Wang, K., Li, M. & Hakonarson, H. Nucleic Acids Res. 38, e164 (2010).

  8. Chen, X. et al. Nat. Methods 12, 527–530 (2015).

  9. Liang, S.H. & Clarke, M.F. J. Biol. Chem. 274, 32699–32703 (1999).

  10. Roberts, K.G. et al. N. Engl. J. Med. 371, 1005–1015 (2014).



Acknowledgments

We thank J. Klco, P. Northcott and C. Mullighan for helpful suggestions. We thank the reviewers of this manuscript for suggesting the implementation of a new interface for custom data upload. This study was supported by the St. Jude Children's Research Hospital–Washington University Pediatric Cancer Genome Project, Cancer Center support grant P30 CA021765 from the US National Cancer Institute and the American Lebanese Syrian Associated Charities of St. Jude Children's Research Hospital.

Author information



Corresponding author

Correspondence to Jinghui Zhang.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Supplementary information

Supplementary Figures and Text

Supplementary Figures 1–8 and Supplementary Note. (PDF 4868 kb)

Supplementary Tutorial

Supplementary tutorial for ProteinPaint. (PDF 3710 kb)


Cite this article

Zhou, X., Edmonson, M., Wilkinson, M. et al. Exploring genomic alteration in pediatric cancer using ProteinPaint. Nat Genet 48, 4–6 (2016). https://doi.org/10.1038/ng.3466
