The landscape of genomic alterations across childhood cancers

Pan-cancer analyses that examine commonalities and differences among various cancer types have emerged as a powerful way to obtain novel insights into cancer biology. Here we present a comprehensive analysis of genetic alterations in a pan-cancer cohort including 961 tumours from children, adolescents, and young adults, comprising 24 distinct molecular types of cancer. Using a standardized workflow, we identified marked differences in terms of mutation frequency and significantly mutated genes in comparison to previously analysed adult cancers. Genetic alterations in 149 putative cancer driver genes separate the tumours into two classes: small mutation and structural/copy-number variant (correlating with germline variants). Structural variants, hyperdiploidy, and chromothripsis are linked to TP53 mutation status and mutational signatures. Our data suggest that 7–8% of the children in this cohort carry an unambiguous predisposing germline variant and that nearly 50% of paediatric neoplasms harbour a potentially druggable event, which is highly relevant for the design of future clinical trials. Analyses of genomes from 914 children, adolescents, and young adults provide a comprehensive resource of genomic alterations across a spectrum of common childhood cancers. The genetic alterations that give rise to childhood cancer are less well studied than those that give rise to adult cancers. Two papers in this issue report some of the first pan-cancer analyses of childhood cancers. Stefan Pfister and colleagues studied germline and somatic genomes from 914 young cancer patients, including children, adolescents and young adults. The tumour samples comprised 24 distinct molecular cancer types, including the most frequent and clinically relevant childhood cancers. The team characterized somatic mutation frequencies, genomic alterations, including structural variations and copy-number analysis, and mutational signatures. 
They found signatures associated with deficiencies of double-stranded break repair across all cancer types. Additionally, 7.6% of patients carried a likely pathogenic germline variant in a candidate cancer predisposition gene. Jinghui Zhang and colleagues analysed the genomes, exomes and transcriptomes of 1,699 paediatric leukaemias and solid tumours. They identified 142 driver genes in paediatric cancers, over half of which were specific to a single histotype. They also characterized copy number alterations and structural variation and identified 11 mutational signatures. Together, these papers provide a comprehensive resource for genomic alterations across common paediatric tumours, and highlight differences compared with the genomic alterations seen in adult cancers.

Cure rates for childhood cancers have increased to about 80% in recent decades, but cancer is still the leading cause of death by disease in the developed world among children over one year of age 1,2. Furthermore, many children who survive cancer suffer from long-term sequelae of surgery, cytotoxic chemotherapy, and radiotherapy, including mental disabilities, organ toxicities, and secondary cancers 3. A crucial step in developing more specific and less damaging therapies is the unravelling of the complete genetic repertoire of paediatric malignancies, which differ from adult malignancies in terms of their histopathological entities and molecular subtypes 4. Over the past few years, many entity-specific sequencing efforts have been launched, but the few paediatric pan-cancer studies thus far have focused only on mutation frequencies, germline predisposition, and alterations in epigenetic regulators 4-6.
We have carried out a broad exploration of cancers in children, adolescents, and young adults, by incorporating small mutations and copy-number or structural variants on somatic and germline levels, and by identifying putative cancer genes and comparing them to those previously reported in adult cancers by The Cancer Genome Atlas (TCGA) 7 . We have also examined mutational signatures and potential drug targets. The compendium of genetic alterations presented here is available to the scientific community at http://www.pedpancan.com.
This integrative analysis includes 24 types of cancer and covers all major childhood cancer entities, many of which occur exclusively in children 8 (Fig. 1, Supplementary Table 1). Ninety-five per cent of the patients in this study were diagnosed during childhood or adolescence (aged 18 years or younger) and 5% as young adults (up to 25 years) (Extended Data Fig. 1a). This study is biased towards central nervous system tumours, and is complemented by an additional study of a non-overlapping paediatric cohort with mainly leukaemias and extracranial solid tumours 9 .
We compiled paired-end Illumina-based sequencing data for 961 tumours (914 individual patients) from previous cancer-type specific studies (see Methods and Supplementary Note 1) including 547 whole-genome sequences (WGS, median coverage 37×) and 414 whole-exome sequences (WES, 121×) partially complemented by low-coverage whole genomes (Supplementary Tables 1, 2). Tumour and matched germline samples were processed with standardized pipelines to detect single nucleotide variants (SNVs), short insertions and deletions (indels), copy-number variants (CNVs) and other structural variants. Secondary (relapse) tumours (n = 82, including 47 matched to primaries) were analysed separately from the main primary cohort (n = 879). Table 3). Relapse tumours harboured significantly more mutations than primary tumours (P = 0.0015, excluding highly mutated tumours; Extended Data Fig. 1d).
A list of authors and affiliations appears at the end of the paper.

Mutation frequencies across cancer types
Tumours with more than 10 mutations per Mb have been referred to as 'hypermutators', and are often related to deficiencies in mismatch repair (MMR) 10,11. In this cohort, hypermutation occurred exclusively in H3.3 or H3.1 K27-wildtype (K27wt) high-grade gliomas with biallelic germline mutations in MSH6 or PMS2, with an extremely high mutational burden similar to the highest among adult tumours (in POLE- or POLQ-mutated carcinomas) 7,12 (Fig. 1). Some paediatric tumours had a mutational burden below this threshold, but markedly above average (2-10 mutations per Mb, referred to as 'paediatric highly mutated'), including several K27wt high-grade gliomas with monoallelic germline variants in MSH2, MSH6 or PMS2 (Fig. 1). Whether these highly mutated tumours respond to immune checkpoint inhibitors, as described for paediatric glioblastoma, should be of clinical interest 13.
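As a minimal sketch of the thresholds described above, the following Python function assigns a tumour to a mutation-burden category from its coding mutation count and the number of sufficiently covered coding bases. The function name, the label 'other' for tumours below 2 mutations per Mb, and the treatment of values lying exactly on a threshold are assumptions made for illustration; they are not taken from the study's pipeline.

```python
def classify_mutation_burden(coding_mutations: int, covered_bases: int) -> str:
    """Label a tumour by its number of coding mutations per megabase (Mb)."""
    if covered_bases <= 0:
        raise ValueError("covered_bases must be positive")
    per_mb = coding_mutations / (covered_bases / 1_000_000)
    if per_mb > 10:
        # More than 10 mutations per Mb: 'hypermutator'
        return "hypermutator"
    if per_mb >= 2:
        # 2-10 mutations per Mb: 'paediatric highly mutated'
        return "paediatric highly mutated"
    # Label assumed here for everything below 2 mutations per Mb
    return "other"
```

For example, 90 coding mutations over 30 Mb of covered coding sequence give 3 mutations per Mb, which falls into the 'paediatric highly mutated' range.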

Mutational processes in childhood cancers
Most cancer types predominantly harboured C > T transitions (≥ 30% of SNVs in two-thirds of cancer types) linked to mutational signature 1, whose previously described age-association occurred in some paediatric brain tumours 15,16 (P < 0.05; Extended Data Figs 1g, 2a-c). Mutational signatures, possibly reflecting biochemical cellular processes, have previously been investigated for many, mainly adult, cancers 15. In this paediatric cohort (WGS, n = 503), we found evidence for major contributions of 16 out of 30 published signatures and also identified one new signature 15 (Fig. 2, Extended Data Fig. 2a, Supplementary Table 4). This 'signature P1', which is distinct from any previously documented signatures and harbours elevated C > T mutations in a CCC/CCT context, occurred in several atypical teratoid rhabdoid tumours (ATRTs) and one ependymoma (Fig. 2, Extended Data Fig. 2d, Supplementary Table 5). Its activity correlated with 'multiple nucleotide variants' (MNVs; R = 0.87, P = 1.1 × 10^−12), but no particular loci or genes were mutually altered in the affected tumours (Extended Data Fig. 2d). Notably, all ATRTs with signature P1 were in the recently defined subgroup 'SHH', and even within one proposed methylation subset of these 17.

Germline variants in cancer predisposition genes
A recent study of more than 1,000 patients estimated that about 8% of children with cancer harbour a hereditary predisposition 5. Accordingly, in our cohort (n = 914 individual patients, about 25% of samples overlapping with the previous study), 7.6% of samples were determined as being likely to be associated with a pathogenic germline variant 5,19 (162 genes investigated; Supplementary Tables 6, 7). No general age-of-onset bias was observed in patients with a predisposition; however, onset was later in germline MMR-deficient patients (P = 0.0001), even within the high-grade glioma sub-cohort (P = 0.001).
Most germline variants were related to DNA repair genes from mismatch (MSH2, MSH6, PMS2) and double-stranded break (TP53, BRCA2, CHEK2) repair (Fig. 3b, c). Both groups are clinically relevant: patients with constitutional MMR deficiency could be candidates for immune checkpoint inhibition 13 (Figs 1, 3b, c). Carriers of TP53 germline mutations (Li-Fraumeni syndrome), here most common in adrenocortical carcinomas, hypodiploid B-ALL, SHH medulloblastomas, and K27wt high-grade gliomas, are at a 50% risk for early-onset cancer compared to 1% overall, and are susceptible to treatment-induced secondary oncogenesis 2,20-22 (Fig. 3b). Correcting the predisposition frequency of 7.6% in this cohort for the relative incidence of cancer types as a whole, we find that approximately 6% of all childhood cancer patients may carry a causative germline variant (Fig. 3d).
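The incidence correction mentioned above can be illustrated as a weighted average of per-cancer-type carrier fractions. The text does not spell out the exact procedure, so the sketch below is one plausible reading rather than the study's method; the function name, the dictionary layout, and the numbers in the usage note are hypothetical.

```python
def incidence_weighted_frequency(carrier_fraction, relative_incidence):
    """Average per-type carrier fractions, weighted by relative incidence.

    carrier_fraction:   {cancer_type: fraction of cohort patients with a variant}
    relative_incidence: {cancer_type: relative population incidence (any scale)}
    Only cancer types present in both mappings contribute (an assumption).
    """
    shared = carrier_fraction.keys() & relative_incidence.keys()
    total_weight = sum(relative_incidence[t] for t in shared)
    if total_weight <= 0:
        raise ValueError("relative incidences must sum to a positive value")
    weighted = sum(carrier_fraction[t] * relative_incidence[t] for t in shared)
    return weighted / total_weight
```

With made-up inputs, a type with a 10% carrier fraction and relative incidence 1 combined with a type with a 2% carrier fraction and relative incidence 3 yields an incidence-weighted frequency of 4%. This shows how a cohort enriched for high-predisposition entities can overstate the population-wide figure.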

Significance analysis identifies cancer driver genes
Genome-wide analysis for significant mutation clusters (n = 538, WGS excluding hypermutators) identified non-coding mutations in the TERT promoter in 2.5% of tumours (Extended Data Fig. 4a, b, Supplementary Table 8). Further high-confidence clusters corresponded to coding mutations in frequently mutated genes (TP53, H3F3A, CTNNB1), and to localized hypermutation at the rearranged MYC locus in Burkitt's lymphoma, while the bulk were classified as likely technical artefacts 23 (Extended Data Fig. 4b).
MuSiC identified 77 significantly mutated genes (SMGs), which were ranked according to their pan-cancer mutation frequency 24 (Fig. 4, Supplementary Tables 9, 10). Most SMGs were mutually exclusively mutated across cancer types, demonstrating specificity of single putative driver genes in childhood cancers as compared to more frequent co-mutation in adult cancers in the TCGA study 7 (Extended Data Fig. 4c-e). None of the SMGs showed a bias towards samples with higher mutation frequencies. The allele frequencies of mutations in SMGs were higher than in non-SMGs, and ranked higher in individual tumours, suggesting an early clonal occurrence of these likely driver events (Extended Data Fig. 4f). Two additional SMGs emerged from analysis of the relapse tumours (n = 82): PRPS1 and NT5C2, both of which have been previously implicated in disease progression and chemotherapy resistance 25,26 (Extended Data Fig. 4g).
Genes linked to epigenetic modification emerged as the most common (25% of tumours, 23 of 24 cancer types) and the largest (20%) group of SMGs (Extended Data Fig. 5a). Compared to a previous study 6, for example, we also detected ARID1A and BCOR. Transcriptional regulators and MAP-kinase-associated genes accounted for 12-15% of SMGs. TP53 was the only DNA repair gene among somatic SMGs, in contrast to the multiple DNA repair-related germline mutations, and also in contrast to adult cancers (9% of SMGs, TCGA) 7. PI3K-associated SMGs are the most commonly altered (31%) genes in adult cancers, compared to only 3% in paediatric cancers, which could be related to their often late occurrence in the evolution of multi-hit adult cancers 27 (Extended Data Fig. 5a).
Forty-seven per cent of paediatric tumours harboured at least one SMG mutation, with most tumours (57%) having only one. SMG mutations were rare (< 15%) in ependymomas, hepatoblastomas, Ewing's sarcomas (driven by EWSR1 fusions instead of by point mutations 28), and pilocytic astrocytomas, and common (> 90%) in K27M high-grade gliomas, WNT medulloblastomas, and Burkitt's lymphomas. By contrast, 93% of adult cancers harbour at least one mutation in an (adult cancer-related) SMG and 76% in multiple SMGs 7 (Extended Data Fig. 5b). In line with the accompanying paediatric pan-cancer study 9, only around 30% of paediatric SMGs overlapped with adult SMGs (Extended Data Fig. 5c). On the basis of incidence-normalized mutation frequencies, TP53 is predicted to be the most common somatically mutated gene (4% of childhood tumours), followed by KRAS, ATRX, NF1, and RB1 (1-2% of tumours); in adult cancers, with similarly normalized data, TP53 is also the most commonly mutated gene, albeit ten times more frequently (Extended Data Fig. 5d).
Thirty-four regions recurrently altered by copy-number changes (17 amplified, 17 deleted) were identified using GISTIC2.0 (WGS, n = 516) 36; candidate driver genes were assigned to each based on known cancer genes and literature review. Recurrently amplified regions contained known oncogenes, including MYC, MYCN, or GLI2, with 11 regions involving high-level amplifications (at least 5-fold gain) (Extended Data Fig. 8b). Further interesting regions included 17q11.2 with 61 genes, containing NCOR1 as a potential candidate, and a region on 12q24.31 near (~ 0.1 Mb) the proposed oncogene KDM2B 37,38. Recurrently deleted regions were predominantly associated with epigenetic or cell cycle regulators, most commonly TP53, PTEN, SETD2, and CDKN2A or CDKN2B. Further potential tumour suppressors included RAD51D on 17q12 and FOXF1 on 16q24.1, with significant loss across the cohort 39.
As evidenced by recurrent structural variation outside genes (based on breakpoint clusters in 10-kb windows), rearrangements linked to enhancer hijacking were also found, involving GFI1B and DDX31 in medulloblastomas and TERT in neuroblastomas 40,41 . Together with genes directly affected by breakpoints, in total 70 structural variant-related putative cancer genes were found, many associated with cell cycle or growth (for example, the tumour suppressor PTPRD) or epigenetic regulators (such as SUZ12) 42,43 (Extended Data Fig. 8c, Supplementary Tables 18, 19). Cancer type-specific events that  Table 21). Fifty-five per cent of tumours were exclusive to one class, 27% were mixed but dominated by one type of LFE, 8% were ambiguous, and 10% had no LFEs (which may be of particular interest in assessing other tumour-driving events at the epigenetic or transcriptomic level). Germline MMR mutations were enriched in the M-class, and germline TP53 mutations in the SC-class (P = 0.0003 and P = 0.05, respectively, Fisher's exact test; Extended Data Fig. 10c). Individual cancer types displayed varying relative distributions of mutation classes (Extended Data Fig. 10d).

Drug targets in childhood cancers
To assess the status of druggability of childhood cancers, the cohort (n = 675 with full genomic information; WES-only, n = 39; see Methods) was screened for potentially druggable events 19 (PDEs, that is, alterations in 179 genes with a directly or indirectly targeted treatment currently available or under development; Supplementary Table 22). This analysis revealed 453 PDEs in 59 genes, including 3% germline events (Supplementary Table 23). Most cancer types had tumours with PDEs related to both M- and SC-type (Fig. 6a). Most commonly, PDEs occurred in Burkitt's lymphomas and pilocytic astrocytomas, while none were detected in ependymomas or hepatoblastomas (although the latter lacked information regarding CNVs or structural variants). Associated pathways included RTK/MAPK signalling, transcriptional regulation, cell cycle control, and DNA repair (Fig. 6a).
When the data are normalized for relative cancer incidence, 52% of all primary paediatric tumours may harbour a PDE (Fig. 6b); this might be an underestimate, given that some structural variants may not have been detected by this approach (for example, the common MYC translocations in Burkitt's lymphoma) 23 . After incidence adjustment, MAPK signalling and cell cycle control were most commonly affected. Notably, the PDEs often varied between primary and relapse tumours from one patient (n = 41): only 37% of primary tumours with PDEs retained these upon progression, while most of them partially or completely gained or lost events. This highlights the need for profiling of the current tumour when considering personalized therapy.

Discussion
Our analysis of this pan-cancer compendium outlines the landscape of genomic alterations across multiple childhood cancer types. Although some alteration types and rarer entities are still under-represented and significance analyses are probably limited, this dataset of nearly 1,000 tumours (which can be explored at http://www.pedpancan.com) provides an unprecedented data resource for paediatric cancer research, further complemented by the accompanying pan-cancer study 9 (https://pecan.stjude.org/proteinpaint/study/pan-target). The multiple differences found compared to previous studies of adult tumours emphasize the need to consider paediatric cancers separately, further demonstrating a need for mechanism-of-action driven drug development for paediatric indications 47. The predicted frequency of pathogenic germline variants in 6% of patients, together with previous findings, demonstrates the relevance of genetic predisposition in childhood cancer 5. Germline TP53 variants, which are clinically highly important, are estimated for 1.5% of children with cancer, and for more than 10% within individual cancer types. Genetic counselling should thus be systematically considered, particularly for patients with indicated high-risk entities.
Although stratified targeted treatment is currently incorporated only rarely into first-line therapy for paediatric cancer patients, our finding that nearly 50% of primary childhood tumours harbour a potentially targetable genetic event is encouraging. It also highlights the need for personalized profiling for each patient, both to increase diagnostic accuracy and to exploit the potential for potentially more effective and less harmful precision therapies. This may also transcend the direct targeting of genes or pathways, for example, through immune checkpoint inhibition in hypermutated tumours 13 or through PARP inhibition in genomically unstable ('BRCAness') tumours 48 . It is hoped that ongoing personalized medicine approaches for patients at relapse will give initial information on the use and effectiveness of such targeted drugs (for example, in the clinical trials pedMATCH-NCT03155620; eSMART-NCT02813135; INFORM 19 ). Additional longitudinal monitoring, for example using serial liquid biopsies, may further improve our understanding of tumour biology and the development of resistance mechanisms, and shed light on therapeutic challenges such as tumour heterogeneity.
In summary, this multi-faceted pan-cancer analysis provides a valuable resource for assessing genomic alterations across the spectrum of paediatric tumours. While there are undoubtedly more discoveries to come in terms of expanded cohorts and whole-genome and transcriptome analysis, we believe that this study provides a strong basis for functional follow-up and investigation of potential therapeutic targets in this specific patient population.
Online Content Methods, along with any additional Extended Data display items and Source Data, are available in the online version of the paper; references unique to these sections appear only in the online paper.
Somatic structural variant discovery. Somatic structural variant discovery was pursued across all whole-genome sequenced samples (high-quality structural variants available for n = 539 primary tumours) using the DELLY ICGC Pan-Cancer analysis workflow (https://github.com/ICGC-TCGA-PanCancer/pcawg_delly_workflow) 85. A high-stringency structural variant set was obtained by additionally filtering out somatic structural variants detected in 1% or more of a set of 1,105 germline samples from healthy individuals belonging to phase I of the 1000 Genomes Project and by removing somatic structural variants present in any of the paediatric germline samples of this study 86. High-stringency structural variants were further required to have at least four supporting read pairs with a minimum mapping quality of 20 and were restricted to somatic structural variant sizes from 300 bp to 500 Mb.
Copy-number calling. Copy numbers were estimated using ACEseq (allele-specific copy-number estimation from sequencing) (K. Kleinheinz et al., unpublished data), using a binned tumour-control coverage ratio and B-allele frequency (BAF). Allele frequencies were obtained for all single nucleotide polymorphism (SNP) positions recorded in dbSNP version 135 82. To improve sensitivity with regard to imbalanced and balanced regions, SNP positions in the control were phased with impute2 87. Additionally, the coverage for 10-kb windows with sufficient mapping quality and read density was recorded and subsequently corrected for GC content and replication timing.
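The high-stringency structural variant criteria stated above (population frequency, cohort germline absence, read support, and size range) can be sketched as a simple predicate. This is an illustration only: the record layout and field names are hypothetical and do not correspond to DELLY's output format or to the workflow's actual code.

```python
from dataclasses import dataclass


@dataclass
class StructuralVariant:
    # Hypothetical record layout, for illustration only
    size_bp: int                      # structural variant size in base pairs
    supporting_pairs_mapq20: int      # supporting read pairs with mapping quality >= 20
    healthy_germline_fraction: float  # fraction of the 1,105 healthy germlines carrying it
    in_cohort_germline: bool          # seen in any paediatric germline sample of the cohort


def is_high_stringency(sv: StructuralVariant) -> bool:
    """Apply the four high-stringency criteria described in the text."""
    return (
        sv.healthy_germline_fraction < 0.01       # not detected in 1% or more of healthy germlines
        and not sv.in_cohort_germline             # absent from the cohort's germline samples
        and sv.supporting_pairs_mapq20 >= 4       # at least four supporting read pairs
        and 300 <= sv.size_bp <= 500_000_000      # 300 bp to 500 Mb
    )
```

Expressing the filter as a single boolean makes each threshold visible and easy to vary when assessing how sensitive downstream results are to the chosen stringency.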
The genome was segmented using the PSCBS package incorporating structural variant breakpoints defined by DELLY 88,89. Segments were clustered based on coverage ratio and BAF using k-means and neighbouring segments in the same cluster were joined; focal segments (< 9 Mb) were stitched to the more similar neighbour. Tumour cell content and ploidy were estimated by testing how well different combinations of both explain the data. Segments with balanced BAF were assigned to even-numbered copy-number states, whereas unbalanced segments were allowed to match with uneven numbers as well. Finally, estimated tumour cell content and ploidy were used to compute the total and allele-specific copy-number for each segment. High-quality copy-number calls were available for n = 516 of the WGS samples.
Mutation statistics. The frequency of somatic mutations in coding regions was determined for each sample individually by normalizing the total number of coding mutations to the number of sufficiently covered (≥ 6×) coding bases (determined using MuSiC-bmr), to account for different data types (WGS/WES) and for different exome target enrichment kits 24. Mutation spectra were obtained by categorizing observed SNVs into base substitution types in pyrimidine context. Spearman's rank correlation test was applied to infer correlations between different types of mutation counts or between mutation counts and age. Generalized linear models were used to fit regression lines. Clusters of localized hypermutation were identified using a previously presented approach adjusted for mutation rates in human paediatric cancers 90.
Deciphering mutation signatures. Exome-sequenced tumours, except for hypermutator cases, were excluded from signature analysis owing to their low numbers of mutations. In brief, signatures are represented as probability distributions of substitution types of SNVs in pyrimidine context.
Considering the immediate sequence context of each SNV, this results in 96 possible mutation types with directly adjacent mutations (multiple nucleotide variants, MNVs) being excluded, which are counted per tumour to compile its mutational profile.
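To make the 96 mutation types concrete, the sketch below maps an SNV and its two flanking bases to the pyrimidine-context convention and counts the types per tumour. The function names and the 'A[C>T]G' label format are illustrative choices, not taken from the study's code; exclusion of MNVs is assumed to have happened before this step.

```python
from collections import Counter

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def mutation_type(five_prime: str, ref: str, alt: str, three_prime: str) -> str:
    """Return the trinucleotide mutation type with a pyrimidine (C or T) reference."""
    if ref in ("A", "G"):
        # Purine reference: report the event on the opposite strand,
        # which swaps and complements the flanking bases.
        five_prime, three_prime = COMPLEMENT[three_prime], COMPLEMENT[five_prime]
        ref, alt = COMPLEMENT[ref], COMPLEMENT[alt]
    return f"{five_prime}[{ref}>{alt}]{three_prime}"


def mutational_profile(snvs) -> Counter:
    """Count mutation types for one tumour.

    snvs: iterable of (five_prime, ref, alt, three_prime) tuples.
    """
    return Counter(mutation_type(*snv) for snv in snvs)
```

Six pyrimidine-based substitutions combined with 4 × 4 flanking-base pairs give the 96 types; a G > A change in an A-G-T context, for instance, is recorded as 'A[C>T]T'.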
As proposed by Alexandrov et al. 91, the mutational profile of a tumour is expected to reflect a superposition of mutational processes (signatures) acting on its genome, where each mutational process has a different intensity (exposure). For a cohort of tumour genomes, this is modelled as a system of matrices for signatures (P) and exposures (E) defining the observed mutational catalogue (M) 91:
$$M \approx P \times E$$
De novo deciphering of signatures was done as described 91 based on the mutational catalogues of all cancer types and of the pan-cancer cohort. All resulting signatures were compared to published signatures (available in the COSMIC database, http://cancer.sanger.ac.uk/cosmic/signatures) based on their cosine similarity 15. Signatures that did not correspond to any of the previously known signatures (cosine similarity < 0.85) were further analysed to examine their relevance for modelling the cancer genomes. First, linear independence from the known set of signatures was confirmed. Second, for each potentially novel signature, we examined whether the modelling of mutation profiles improved when compared to having used the set of known signatures: for each sample, the observed mutational profile was compared to the theoretical profiles calculated using the set of known signatures only, and using the extended set including the new candidate signature. Here, only samples with a total number of mutations over 200 were considered. Reconstruction was calculated as the difference between cosine similarity of the modelled profile and the observed profile.
On the basis of the resulting distribution of similarities in both alternatives, a signature was considered to have a relevant contribution to the model, and thus a potential new signature, if both of the following conditions were fulfilled: the reconstruction (measured as the difference of similarities) of at least one sample increased by 0.02 and that sample had a reconstruction accuracy of < 0.9 based on the known set of signatures only.
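The numerical criteria for a candidate new signature can be sketched as follows. The code assumes that per-sample reconstruction accuracies (cosine similarity between modelled and observed profiles) have already been computed with the known and with the extended signature set; the function names and argument layout are hypothetical, the comparison with 0.02 is taken as 'at least 0.02', and the linear-independence check described above is not included.

```python
import math


def cosine_similarity(a, b) -> float:
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def is_candidate_new_signature(candidate, known_signatures,
                               accuracy_known, accuracy_extended) -> bool:
    """Check the thresholds named in the text for one candidate signature.

    accuracy_known / accuracy_extended: per-sample reconstruction accuracies
    using the known set only and the set extended by the candidate.
    """
    # Cosine similarity below 0.85 to every previously known signature
    unlike_known = all(
        cosine_similarity(candidate, known) < 0.85 for known in known_signatures
    )
    # At least one sample improves by 0.02 and was below 0.9 with known signatures
    improves_model = any(
        extended - base >= 0.02 and base < 0.9
        for base, extended in zip(accuracy_known, accuracy_extended)
    )
    return unlike_known and improves_model
```

Requiring both conditions guards against two failure modes: rediscovering a known signature under a new name, and accepting a signature that does not measurably improve the fit of any poorly explained sample.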
This procedure resulted in one new candidate signature, signature P1, which was added to the set of reference signatures. In order to achieve maximum resolution per sample, a sample-wise re-extraction of exposures from the mutational profiles was performed using quadratic programming with the reference signature set used for P and the exposures in E as unknown variables. Samples with a reconstruction accuracy below 0.5 were excluded (resulting in n = 503 tumours with high-quality signature information), as these samples would not be correctly accounted for by the model, which might be due to quality issues or to contributions of unknown signatures that are not present at intensities sufficient to be identified by a de novo approach. The resulting exposures were used for further downstream analyses and visualization. Previously published signatures without validation were first included to model the mutational catalogues as precisely as possible, but then summarized as 'other' for representation.
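The sample-wise re-extraction of exposures amounts to a non-negative least-squares problem: find exposures e ≥ 0 so that the weighted sum of reference signatures approximates the observed profile. The study used quadratic programming; the dependency-free sketch below substitutes multiplicative updates for non-negative least squares, which is a different solver for the same objective. All names are illustrative, and the input is assumed to be non-negative.

```python
def fit_exposures(profile, signatures, iterations=500):
    """Estimate non-negative exposures for one sample.

    profile:    observed mutation counts per mutation type (non-negative)
    signatures: list of reference signatures, each a list of probabilities
                over the same mutation types
    Uses multiplicative updates (a stand-in for quadratic programming).
    """
    k = len(signatures)
    # P^T m: projection of the profile onto each signature
    ptm = [sum(s * m for s, m in zip(sig, profile)) for sig in signatures]
    # P^T P: pairwise inner products between signatures
    ptp = [[sum(a * b for a, b in zip(si, sj)) for sj in signatures]
           for si in signatures]
    exposures = [1.0] * k
    for _ in range(iterations):
        denom = [sum(ptp[i][j] * exposures[j] for j in range(k)) for i in range(k)]
        exposures = [
            exposures[i] * ptm[i] / denom[i] if denom[i] > 0 else 0.0
            for i in range(k)
        ]
    return exposures


def reconstruct(exposures, signatures):
    """Modelled profile: exposure-weighted sum of the signatures."""
    n_types = len(signatures[0])
    return [sum(e * sig[t] for e, sig in zip(exposures, signatures))
            for t in range(n_types)]
```

The cosine similarity between `reconstruct(...)` and the observed profile then gives a reconstruction accuracy that can be compared with the 0.5 exclusion threshold mentioned above.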
Spearman's rank correlation and two-sided Kolmogorov-Smirnov tests were used to associate exposure of signatures with numerical and categorical variables, respectively. Exposures to signatures across multiple groups were compared using ANOVA and the post hoc Tukey's test.
Identifying mutations in genes predisposing to cancers. To identify germline variants with a high likelihood of being implicated in cancer development, we investigated 162 candidate genes adapted from ref. 19 (110 genes regarded as following a dominant inheritance pattern and 52 genes with recessive inheritance) (Supplementary Table 6).
Germline SNVs and indels were subjected to a stepwise filtering approach to eventually classify them into five categories: benign, likely benign, uncertain significance, likely pathogenic, and pathogenic. First, variants reported in both the 1000 Genomes (release November 2010) and dbSNP (v.141) databases were excluded. High-quality variant calls were selected by including only positions with ≥ 15× coverage, a germline allele frequency of ≥ 0.2, and a phred-based quality score of ≥ 10. Variants with a population frequency ≥ 0.01 reported in additional common databases (esp6500siv2, X1000g2015, and exac03 included in ANNOVAR (http://annovar.openbioinformatics.org)) or with ClinVar (ftp://ftp.ncbi.nlm.nih.gov/pub/clinvar/) annotations of 'benign', 'likely benign' or 'uncertain significance' were removed.
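The initial automated steps of this germline filter can be sketched as a predicate over an annotated variant record. The record layout and field names below are hypothetical (they are not ANNOVAR column names), and later steps such as CADD or Mutation Assessor scoring, the handling of dbSNP 'precious' variants, and manual review are deliberately left out.

```python
from dataclasses import dataclass
from typing import Optional

CLINVAR_EXCLUDED = {"benign", "likely benign", "uncertain significance"}


@dataclass
class GermlineVariant:
    # Hypothetical annotation fields, for illustration only
    in_1000_genomes_2010: bool       # reported in 1000 Genomes (release November 2010)
    in_dbsnp_141: bool               # reported in dbSNP v.141
    coverage: int                    # read depth at the position
    allele_frequency: float          # germline variant allele frequency
    quality: float                   # phred-based quality score
    max_population_frequency: float  # highest frequency across the population databases
    clinvar: Optional[str] = None    # ClinVar annotation, if any


def passes_initial_filters(v: GermlineVariant) -> bool:
    """Apply the database, quality, frequency, and ClinVar filters from the text."""
    if v.in_1000_genomes_2010 and v.in_dbsnp_141:
        return False  # reported in both databases
    if v.coverage < 15 or v.allele_frequency < 0.2 or v.quality < 10:
        return False  # below the call-quality thresholds
    if v.max_population_frequency >= 0.01:
        return False  # too common in the population
    if v.clinvar is not None and v.clinvar.lower() in CLINVAR_EXCLUDED:
        return False  # annotated as benign, likely benign, or uncertain
    return True
```

Variants that pass this stage would still need the scoring and manual curation steps described in the text before being classified as likely pathogenic or pathogenic.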
Furthermore, variants with a phred-scaled CADD score ≥ 15 (http://cadd.gs.washington.edu/info) and with Mutation Assessor (http://mutationassessor.org/r3/) categories 'medium' and 'high', or no available annotation, were included. Variants with a dbSNP classification of 'precious' were not subject to these two filtering steps. As indel calling is more prone to alignment and calling errors, potentially deleterious indels were manually investigated for artefacts. For recessive tumour genes, variants were included only with an allele frequency of one or with two compound heterozygous mutations of the same gene in the same patient. In total, the filtering steps narrowed down the number of potentially pathogenic mutations to n = 433. Every variant was then manually checked and scored by the use of varied, mainly gene-specific online databases (http://p53.iarc.fr/, http://www.lovd.nl/3.0/home, https://www.ncbi.nlm.nih.gov/clinvar/, and others). Only likely pathogenic and pathogenic mutations were considered as cancer-relevant and used for representation in Fig. 3. Additionally, whole-genome sequenced samples were manually screened for copy-number losses in 13 tumour suppressor genes of the candidate list, which are known to occasionally harbour germline focal deletions (MLH1, MSH2, MSH6, NF1, PMS2, PRKAR1A, PTCH1, PTEN, RB1, SMARCA4, SMARCB1, SUFU, TP53).
Detecting genome-wide mutation clusters. To identify genomic regions with single recurrent mutations or clusters of recurrent mutations, the human genome was binned into non-overlapping windows of various sizes (50-500 bp) and the observed mutations were compared to a background model (V. A. Rudneva et al., unpublished data), which was estimated using the 'global' model: the genome was stratified into 25 evenly sized groups of genomic windows based on the combined vector of five genetic and epigenetic features (replication timing, gene expression level, GC content, H3K9me3, and open versus closed chromatin conformation).
For each region an enrichment score, binomial P value, and negative binomial test P value were computed.
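As an illustration of a per-window binomial test, the sketch below computes an upper-tail P value for observing at least k mutations in a window when n mutations fall in the relevant background group and each lands in the window with probability p. How the study defines the enrichment score and parameterizes the test is not stated in the text, so the observed-over-expected ratio and the meaning of n and p here are assumptions.

```python
import math


def enrichment_score(observed: int, n: int, p: float) -> float:
    """Observed mutation count divided by the expected count n * p (assumed definition)."""
    return observed / (n * p)


def binomial_upper_tail(k: int, n: int, p: float) -> float:
    """P(X >= k) for X ~ Binomial(n, p) with 0 < p < 1.

    Terms are evaluated in log space so that large n does not overflow.
    """
    if k <= 0:
        return 1.0
    total = 0.0
    for i in range(k, n + 1):
        log_term = (
            math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
            + i * math.log(p) + (n - i) * math.log1p(-p)
        )
        total += math.exp(log_term)
    return min(total, 1.0)
```

Summing the upper tail directly, rather than subtracting the lower tail from one, keeps very small P values from being lost to floating-point cancellation.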
Cross-validations were used to determine the significance cut-off that would provide reproducible results (with samples segregated by subgroup). A combination of the window size (500 bp), test statistics (enrichment score, mutational recurrence, binomial test P value, and gamma Poisson test P value), and a cut-off value that ensured high precision and recall values based on the precision-recall analysis (P = 10^−20) were chosen (Extended Data Fig. 4a). Recall was calculated as the number of regions that satisfied the cut-off in results obtained on both halves of the dataset; precision was calculated as a fraction of the recalled regions to the total number of regions that satisfied the cut-off in each of the datasets. The chosen parameters were then used to run the pipeline on the complete dataset and then the mutations in the resulting regions were further examined manually for potential false positives in order to identify high-confidence candidate regions (Extended Data Fig. 4b).
Significantly mutated genes. Significantly mutated genes based on somatic SNVs and indels were identified with the SMG module of the MuSiC tools suite 24, separately for each cancer type and for the pan-cancer cohort, and the results were then merged.
This kind of significance analysis often produces false positive hits (for example, very large genes), despite normalization procedures, and thus several filters were applied to the raw output 30. First, all genes of > 30,000 bp exonic length or > 10,000 bp with additional replication timing > 800 were excluded (Cancer Cell Line Encyclopedia; CCLE) 92. Genes that scored significant in three or more cancer types, or that were recurrently mutated at the same position, were manually inspected for artefacts from ambiguous alignments (for example, repetitive sequence regions). Also, genes that are probably not associated with tumour development but rather represent non-neoplastic somatic hypermutation processes in the context of immune function were removed. Furthermore, genes mutated in < 2% of the cohort were included only if they had a secondary signal from either functional impact or from localized clustering bias (Intogen modules OncodriveFM and OncodriveClust v. 3.0 beta) or from being among known cancer genes 29,93. Mutation needle plots were generated using MutationMapper 94. Biological processes were assigned to the significantly mutated genes mostly exclusively, except for a few genes with high relevance for multiple processes, as specified in Supplementary Table 9.
Genome instability. Occurrence of chromothripsis was determined by manual inspection of coverage ratio plots (tumour/control) for WGS samples based on previously proposed guidelines 95: at least ten copy-number switches on one chromosome, oscillating copy-number variation (usually with changes of +1 or −1, but also between other levels where additional large-scale copy-number changes interfere), and many more of such copy-number variations in one chromosome or chromosome arm compared to the remaining genome.
In samples with an exceptionally high degree of structural variation, several chromosomes could be affected, and some samples showed an 'amplifier' type of chromothripsis, which was classified as several high-level focal amplifications on exactly the same copy-number level that are thus likely to be connected to one single event.
Generation of copy-number profiles. Copy-number calls reported by ACEseq were converted to the 'SEG' segmentation format, similar to the output of the circular binary segmentation algorithm based on chromosomal segment borders as pseudo marker positions 96. All possible marker positions were determined from the whole cohort before assessing sample-wise copy-number profiles per marker in order to achieve identical resolution for all samples. Owing to sparse and highly oscillating sequencing coverage at centromeres, centromeric coordinates (± 3 Mb around the centre of annotated centromeres) were excluded from whole-genome segmentation, as were two likely artefact regions on chromosomes 7 and 14 with nonspecific occurrences of relative copy-number gains and losses in 28% and 30% of all analysed samples in 17 of 19 entities (14q11.2, 7p14.1), which were identified using GISTIC2.0 (as described below) with ± 1 Mb.
Identifying recurrent copy-number/structural variations. GISTIC2.0 (v.2.0.22, gene-gistic default parameter settings) was applied to the segmented copy-number data (per cancer type and pan-cancer) to identify significant copy-number alterations 36. The resulting peaks were filtered for significance (q ≤ 0.1) and size (≤ 10 Mb). Compared to array-based data, which commonly serve as inputs for copy-number significance analysis, sequencing-based copy-number profiles are more prone to artefact copy-number variations, for example, due to repetitive regions leading to ambiguous alignments.
Thus, several filtering steps were used to eliminate false-positive GISTIC peak calls and to discover potentially cancer-relevant copy-number alterations: first, peaks overlapping with common fragile genomic sites were excluded, as these are likely to be consequences of genomic instability rather than cancer-driving events 97; next, peaks within 1 Mb of chromosomal ends were removed, as sequencing coverage tends to vary frequently there; and last, peaks overlapping with copy-number variable regions 98 (regions ranked 1–100) were excluded. Additionally, some of the resulting peaks were classified as 'passengers' of variable regions that were called as separate peaks but most likely arose from one event, for example, a peak with MYCNOS as passenger peak of MYCN amplification. For overlapping peaks called in multiple entities and/or pan-cancer, the final region of each peak was determined from the analysis in which that peak had the highest significance.
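The automated part of the peak filtering described above can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the code used for the analysis: the `Peak` record, the function names and the toy coordinates are hypothetical, 'within 1 Mb of chromosomal ends' is interpreted as any overlap with the terminal 1 Mb, and fragile sites and copy-number variable regions are passed in as one list of excluded intervals. The manual 'passenger' classification is not modelled.

```python
from dataclasses import dataclass

Q_MAX = 0.1             # significance threshold (q <= 0.1)
MAX_SIZE = 10_000_000   # size threshold (<= 10 Mb)
END_MARGIN = 1_000_000  # terminal 1 Mb of each chromosome


@dataclass
class Peak:
    chrom: str
    start: int
    end: int
    q: float


def overlaps(peak, regions):
    """True if the peak overlaps any (chrom, start, end) interval in regions."""
    return any(c == peak.chrom and s < peak.end and peak.start < e
               for c, s, e in regions)


def keep_peak(peak, chrom_lengths, excluded_regions):
    """Apply the significance, size, chromosome-end and region filters."""
    if peak.q > Q_MAX or (peak.end - peak.start) > MAX_SIZE:
        return False
    if peak.start < END_MARGIN or peak.end > chrom_lengths[peak.chrom] - END_MARGIN:
        return False
    return not overlaps(peak, excluded_regions)


# Toy example with a hypothetical 100-Mb chromosome and one excluded interval.
chrom_lengths = {"chrA": 100_000_000}
excluded = [("chrA", 40_000_000, 41_000_000)]
peaks = [
    Peak("chrA", 15_000_000, 16_000_000, 0.01),  # passes all filters
    Peak("chrA", 500_000, 1_500_000, 0.01),      # touches the chromosome end
    Peak("chrA", 40_500_000, 42_000_000, 0.01),  # overlaps an excluded interval
    Peak("chrA", 60_000_000, 75_000_000, 0.01),  # larger than 10 Mb
    Peak("chrA", 80_000_000, 81_000_000, 0.25),  # q above 0.1
]
kept = [p for p in peaks if keep_peak(p, chrom_lengths, excluded)]
print([(p.chrom, p.start, p.end) for p in kept])  # [('chrA', 15000000, 16000000)]
```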
Genes with a breakpoint inside the gene borders were assumed to be altered by structural variation and, for samples without chromothripsis, were considered as recurrently altered if they had breakpoints in ≥ 5 samples in total or in ≥ 2 samples of one cancer type. For the remaining samples (those with chromothripsis), genes with breakpoints in ≥ 5 samples were included as candidates, but these were not used for further downstream analyses. Additionally, recurrent sites of structural variation outside of gene bodies were determined by clustering breakpoints in 10-kb windows.
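The recurrence rule for samples without chromothripsis, and the window-based clustering of breakpoints, can be illustrated with a short sketch. The input layouts, function names and placeholder gene and sample identifiers are assumptions made for illustration. In particular, the use of fixed, non-overlapping 10-kb bins is one possible reading of 'clustering breakpoints in 10-kb windows' and may differ from the procedure actually applied.

```python
from collections import defaultdict


def recurrent_genes(hits, min_total=5, min_per_type=2):
    """Genes hit in >= min_total samples overall or >= min_per_type samples
    of one cancer type.

    hits: iterable of (sample_id, cancer_type, gene), one entry per breakpoint
    inside a gene (hypothetical layout). Samples are counted once per gene.
    """
    samples_by_gene = defaultdict(set)
    samples_by_gene_type = defaultdict(set)
    for sample, cancer_type, gene in hits:
        samples_by_gene[gene].add(sample)
        samples_by_gene_type[(gene, cancer_type)].add(sample)
    recurrent = {g for g, s in samples_by_gene.items() if len(s) >= min_total}
    recurrent |= {g for (g, _), s in samples_by_gene_type.items()
                  if len(s) >= min_per_type}
    return sorted(recurrent)


def window_counts(breakpoints, window=10_000):
    """Number of distinct samples with a breakpoint in each fixed-size window.

    breakpoints: iterable of (sample_id, chrom, position).
    Returns {(chrom, window_index): sample_count}.
    """
    samples = defaultdict(set)
    for sample, chrom, pos in breakpoints:
        samples[(chrom, pos // window)].add(sample)
    return {key: len(value) for key, value in samples.items()}


hits = [
    ("s1", "typeX", "GENE_A"), ("s2", "typeX", "GENE_A"),  # 2 samples, one type
    ("s3", "typeY", "GENE_B"), ("s3", "typeY", "GENE_B"),  # same sample twice
    ("s4", "typeZ", "GENE_B"),
]
print(recurrent_genes(hits))  # ['GENE_A']

intergenic = [("s1", "chr1", 12_345), ("s2", "chr1", 19_999), ("s3", "chr1", 20_000)]
print(window_counts(intergenic))  # {('chr1', 1): 2, ('chr1', 2): 1}
```

Counting distinct samples rather than raw breakpoints prevents a single highly rearranged tumour from making a gene or window appear recurrent.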

Scoring of druggable mutations. To identify candidates for targeted therapy, somatic and germline mutations (SNVs and indels) were screened for variants in genes that are directly or indirectly involved in pathways with matched drugs either approved or currently being investigated in clinical trials (Supplementary Table 22a, adapted from ref. 19). The mutations were then manually assessed by experts in translational oncology and prioritized according to an internal algorithm taking into account the type of alteration, the mechanism of action of potential drugs within the pathway, the level of evidence for the specific alteration, and its role in the present cancer type (Supplementary Table 22b). Additionally, copy-number plots of whole-genome-sequenced data (including low-coverage WGS) were used to manually screen 52 druggable genes for amplifications or deletions (Supplementary Table 22a). Only focal CNVs (< 10 Mb) with at least 5 copies (log 2 ≥ 1.3) in the case of amplifications or the loss of ≥ 1 copy (log 2 ≤ − 1) for deletions were included and subsequently prioritized as described for the SNVs/indels. The data representation includes all tumours with full genomic information (WES + lcWGS or WGS; n = 675) and, additionally, tumours analysed by WES only for cancer types without any whole-genome-sequenced tumours (T-ALL, Ewing's sarcoma, HB; n = 39), but the latter were excluded from downstream analyses.
Figure legend (mutational signatures). Signatures with contributions of ≥ 5% in at least one cancer type are shown. The colour intensity reflects the relative activity of each signature per cancer type. b, Correlation of signature 1 with patient age per cancer type in this paediatric pan-cancer cohort (left, n = 503) compared to results from a global pan-cancer study on 30 cancer types (n = 7,042) 15.
Figure legend (copy-number and structural variation). a, Genome-wide copy-number profiles normalized for tumour ploidy (n = 516). Cancer types are sorted by genome instability (Fig. 5a). Regions or genes with significant CNVs are indicated (blue, deleted; red, gained or amplified) (Fig. 5b). b, Relative copy-number status (normalized for tumour ploidy to baseline 1) for regions with significant copy-number changes (top, gains or amplifications; bottom, deletions) in n = 516 tumours. Thresholds (amplified: ≥ 1.4, deleted: ≤ 0.6) are based on the overall copy-number distribution indicated on the right. c, Genes affected by breakpoints from structural variants and additional genes associated with clustered breakpoints (in square brackets). Samples are divided into sub-cohorts of tumours with (bottom, n = 73) and without (top, n = 455) chromothripsis. Genes overlapping (direct overlap or within ± 200 kb) with genes with significant copy-number changes from a are indicated (blue, deletions; red, amplifications).
Figure legend (tumour classes). Tumours with more than 50% (mixed) or 100% (unique) events from one category are considered to be members of the associated class; tumours with equal contributions from both categories are 'ambiguous', and tumours without an LFE are assigned class 'none' (not shown). Colours indicate germline mutations per tumour. d, Fraction of tumours assigned to different classes per cancer type.

Software
All software used is described in the Methods section. Publicly available software included: sambamba, SamToFastq, bwa-mem, samtools, platypus, delly, R, ACEseq, impute2, genome music, gistic2.0.

Materials and reagents
Materials availability. No unique materials were used.

Antibodies
No antibodies were used.