Neuroblastoma arises in early fetal development and its evolutionary duration predicts outcome

Körber, Verena; Stainczyk, Sabine A.; Kurilov, Roma; Henrich, Kai-Oliver; Hero, Barbara; Brors, Benedikt; Westermann, Frank; Höfer, Thomas

doi:10.1038/s41588-023-01332-y

Download PDF

Article
Open access
Published: 27 March 2023

Neuroblastoma arises in early fetal development and its evolutionary duration predicts outcome

Nature Genetics volume 55, pages 619–630 (2023)Cite this article

14k Accesses
16 Citations
112 Altmetric
Metrics details

Subjects

Abstract

Neuroblastoma, the most frequent solid tumor in infants, shows very diverse outcomes from spontaneous regression to fatal disease. When these different tumors originate and how they evolve are not known. Here we quantify the somatic evolution of neuroblastoma by deep whole-genome sequencing, molecular clock analysis and population-genetic modeling in a comprehensive cohort covering all subtypes. We find that tumors across the entire clinical spectrum begin to develop via aberrant mitoses as early as the first trimester of pregnancy. Neuroblastomas with favorable prognosis expand clonally after short evolution, whereas aggressive neuroblastomas show prolonged evolution during which they acquire telomere maintenance mechanisms. The initial aneuploidization events condition subsequent evolution, with aggressive neuroblastoma exhibiting early genomic instability. We find in the discovery cohort (n = 100), and validate in an independent cohort (n = 86), that the duration of evolution is an accurate predictor of outcome. Thus, insight into neuroblastoma evolution may prospectively guide treatment decisions.

Pan-neuroblastoma analysis reveals age- and signature-associated driver alterations

Article Open access 14 October 2020

Single-cell RNA-seq reveals that glioblastoma recapitulates a normal neurodevelopmental hierarchy

Article Open access 08 July 2020

Life histories of myeloproliferative neoplasms inferred from phylogenies

Article 20 January 2022

Main

Cancers result from the accumulation of oncogenic mutations¹. Insights into tumor evolution—and, particularly, the temporal order of driver mutations—are beginning to support diagnosis and treatment². Initially, age–incidence curves were used to estimate the number of rate-limiting mutations in carcinogenesis^3,4. Recently, the order of driver mutations has been inferred from genome sequencing data^5,6,7,8,9, and mathematical approaches based on population genetics have been used to reconstruct the clonal evolution of cancers from the statistics of somatic variants^2,10. Deep whole-genome sequencing (WGS) data are suited for such integrative analyses, because the allele frequencies of neutral somatic variants that hitchhike with driver mutations provide rich information on how drivers shape tumor growth^7,11,12,13.

Tumors of early childhood provide a paradigm for cancer evolution in the context of development. A key question is how driver mutations subvert the normal development of the tissue of origin^14,15. The most common solid tumor in infants, neuroblastoma, arises in the sympathetic nervous system. A striking feature of neuroblastoma is the wide spectrum of clinical outcomes, ranging from low-risk cases requiring light or no treatment to high-risk disease that remains fatal for ~50% of patients¹⁶. Characteristic mutations, including MYCN amplification, gain of telomere maintenance mechanisms (TMMs) and gains (17q) or losses (1p, 11q) of chromosomal segments, have been associated with high-risk disease¹⁷. Nevertheless, the prospective stratification of patients into observation and different treatment groups remains a formidable challenge. In this Article, we study how neuroblastomas originate in development and evolve genetically and ask whether this understanding can provide insights into disease severity and outcomes.

Results

Mutation patterns in a comprehensive neuroblastoma cohort

We assembled a discovery cohort of deep (~80×) WGS data of 100 neuroblastomas (Supplementary Table 1), covering all clinical stages of the disease according to the International Neuroblastoma Staging System (Fig. 1a). Sixty-seven samples were derived from initial diagnoses (including seven from metastases) and 33 from relapsed tumors. Median age at primary diagnosis was 0.7 years for stages 1, 2 and 4S, 3.5 years for stage 3 and 3.9 years for stage 4 (Extended Data Fig. 1a).

**Fig. 1: Molecular subtypes and mutation spectrum of the discovery cohort.**

Median tumor purity was high (88%), which allowed for reliable estimation of tumor ploidy (Fig. 1b) as confirmed by direct measurement of DNA index (Extended Data Fig. 1b). Tumors had baseline ploidies between two and four, with 55, 33 and 12 samples being near-diploid, near-triploid and near-tetraploid, respectively. Each ploidy class contained tumors of all stages (Extended Data Fig. 1c). Relative to baseline ploidy, we detected in 96% of the tumors characteristic segmental (1q and 17q) or whole-chromosome (2, 7 and 17) gains, as well as segmental (1p, 11q) or whole-chromosome (11) losses (Extended Data Fig. 1d), all of which are probable drivers of neuroblastoma^{18,19,20,21,22,23}. Based on tumor cell content and ploidy, the vast majority of these gains and losses were clonal and hence were present in the most recent common ancestor cell (MRCA) of the tumor (Fig. 1c). These data point to aneuploidy as an early feature of neuroblastoma.

The majority of tumors (69%) combined chromosomal gains or losses with candidate driver mutations²³ at smaller genomic scale (Fig. 1d and Supplementary Tables 2–5), including focal gene amplifications (for example, MYCN, CDK4, ALK), structural rearrangements (for example, in TERT and ATRX), large deletions (for example, CDKN2A, ATRX), small insertions/deletions (indels; for example, in ATRX and NF1) and somatic single-nucleotide variants (SSNVs which, among other genes, occurred in ALK and those coding for components of MAPK pathways: HRAS, KRAS, NRAS and BRAF). Most commonly, these additional oncogenic drivers support telomere maintenance by alternative lengthening of telomeres (ALT), MYCN amplification or rearrangement in the TERT locus, which, collectively, are a molecular predictor of poor outcome^24,25,26.

In contrast to driver mutations, neutral SSNVs and small indels are continuously accumulated and contain information on tumor evolution. Analysis of mutational signatures assigned the majority of SSNVs to clock-like signatures (SBS1, SBS5, SBS40). Overall the next most abundant signature was SBS18, associated with reactive oxygen species²⁷ and frequently found in neuroblastoma²⁸, followed by SBS3, associated with failure of homologous recombination in dividing cells (Fig. 1e). These signatures overall occurred at similar frequencies clonally and in subclones (Extended Data Fig. 1e).

Neutral SSNVs distinguish sequential driver events

To understand when and how neuroblastomas evolve, we used neutral SSNVs as a molecular clock to time key events: (1) the emergence of the MRCA, from which the clonal sweep that defines the resected tumor emerges, and (2) the acquisition of clonal chromosomal gains (which may have occurred before or coincident with the MRCA). For each tumor, we computed the frequency distribution of somatic variants on all genomic segments of a given copy number. An example tumor, classified as near-triploid (Fig. 2a), had dominant copy numbers 2 and 3 and a smaller portion of segments at copy number 4—a typical configuration in neuroblastoma. On each copy number, we found a large clonal peak (Fig. 2b) comprising mutations occurring on a single chromosomal copy in each tumor cell. In addition, on copy numbers >2, we detected clonal mutations present on the two (and in some cases more) copies of a multiplied chromosome. Collectively, both single- and multiple-copy clonal SSNVs characterize the MRCA (Fig. 2c). Indeed, the density of clonal SSNVs was similar for the different copy numbers (Extended Data Fig. 2a), thus timing the MRCA by means of a molecular clock (Fig. 2d). To evaluate a potentially biasing effect of partial tumor sampling, which may erroneously cause a subset of subclonal mutations to be classified as clonal, we analyzed pairs of primary and relapse samples and found that the vast majority (85 ± 5%) of clonal SSNVs were present in both samples, indicating that sampling did not introduce a strong bias. Nevertheless, to estimate MRCA density conservatively we performed all subsequent computations with the measured densities corrected by a factor of 0.85. In the example tumor, the mean SSNV density of the MRCA was one SSNV per 5 million base pairs (bp).

**Fig. 2: Timing of chromosomal gains using SSNVs.**

Next, we asked whether clonal chromosomal gains occurred coincident with, or ancestral to, the MRCA. To this end, we quantified SSNVs that were identical and clonal on two copies on trisomic and tetrasomic segments (termed amplified clonal SSNVs⁶; Fig. 2b). These mutations were acquired before a gain on the respective allele (Fig. 2c, dark green), and hence the number of these mutations correlates positively with the time at which the chromosomal gain occurred^6,29. To make this timing measure independent of segment length, we use SSNV density. We compared these densities to those at the MRCA based on a negative binomial distribution of SSNVs across the genome (Methods). In the example tumor, nearly all gains had a mean density of one amplified clonal SSNV per 100 Mbp, which is significantly smaller (adjusted P < 0.01) than that of the MRCA (Fig. 2e). Hence, the molecular clock places these gains ancestral to the MRCA in an early common ancestor (ECA) of the tumor. The only exceptions were gains of Chr. 9 and 20q, which had a mutation density consistent with that of the MRCA and hence occurred later than the ECA. On those segments gained ancestral to the MRCA, the densities of amplified clonal SSNVs were statistically indistinguishable (Fig. 2e). Hence, near-triploidization occurred early and as a temporally confined event during the development of this tumor. Thereafter, further genetic evolution occurred before the clonal sweep from the MRCA commenced (Fig. 2f).

Two evolutionary classes of neuroblastoma

We timed the MRCAs for all tumor samples, and the ECAs associated with chromosomal gains. First, we validated these timing approaches by comparison of mutation densities in samples taken following primary diagnosis with those of relapsed neuroblastomas. As expected, timing of the ECA, defining the putative early origin of tumorigenesis, was conserved in all sample types (Extended Data Fig. 2b). By contrast, the timing of the MRCA, defining the origin of a tumor sample, was significantly later in relapsed tumors, consistent with a bottleneck imposed by incomplete tumor resection or cytotoxic therapy (Supplementary Table 1) and, eventually, regrowth from a small number of surviving cells (Extended Data Fig. 2b,c). Interestingly, the MRCAs of metastases resected after initial diagnosis had SSNV densities indistinguishable from those of the primary tumors (Extended Data Fig. 2b,c), suggesting that metastases had originated around the time when the tumor started to grow from its MRCA.

To infer the early evolution of tumors up to the MRCA, we focused on samples taken at initial diagnosis (primary tumors and metastases, n = 67 in the discovery cohort). Remarkably, mutation densities of the MRCA showed a bimodal distribution (Fig. 3a). This finding suggests that there are two classes of neuroblastoma, one where growth of the resected tumor commenced early (early-MRCA neuroblastoma) and another where it occurred later (late-MRCA neuroblastoma); for mutational signatures, see Extended Data Fig. 2d,e. Of note, although we estimated the mutation density of the MRCA conservatively (Methods), the dichotomy between early and late MRCA emerged robustly.

**Fig. 3: Early- versus late-MRCA neuroblastoma.**

These data raise the question of whether late-MRCA tumors began to develop later or developed early and evolved for a longer period of time. To address this question in terms of the molecular clock, we analyzed SSNV densities on chromosomal/segmental gains in both tumor classes that defined an ECA. All 20 early-MRCA tumors had such gains (with 70% showing near-triploidization, 35% harboring segmental gains and 25% displaying both features; Fig. 3b). The respective mutation densities timing chromosomal gains and MRCA were statistically indistinguishable in the vast majority (90%) of cases (Extended Data Fig. 3a shows an example), suggesting that aneuploidy was not followed by acquisition of further clonal drivers in these tumors. In the remaining 10% of tumors, separate ECAs and MRCAs were timed but these were very close in regard to mutation density (0.04 SSNVs per Mb between ECA and MRCA; Extended Data Fig. 3b;), implying that they were temporally close events. Indeed, 95% of early-MRCA tumors lacked additional small-scale drivers while a single tumor harbored a MYCN amplification and a mutation in ALK. Hence early-MRCA tumors appear to be driven predominantly by aneuploidization.

By contrast, in the majority (55%) of the 47 late-MRCA tumors, we distinguished two well-separated events, ECA and MRCA (Fig. 3c, left), with average distance 0.24 ± 0.1 SSNVs per Mb. In about one-third of these cases, a single early near-triplodization event defined an ECA (Fig. 2); whereas, in the remaining two-thirds, smaller-scale chromosomal gains defined the ECA (Extended Data Fig. 3c). Hence, the majority of late-MRCA tumors showed a signature of early gains followed by further genetic evolution to the MRCA. In the remaining 45% of late-MRCA cases, we could not reliably time a separate ECA (for example, if very short fragments were gained or if gains were found at copy number >4) (Fig. 3c (right) and Extended Data Fig. 3d). Hence, for these tumors we cannot time an early genetic event which, however, leaves open the possibility that small-scale mutations, chromosomal losses or high-level amplifications (four or more copies), neither of which can be timed reliably, preceded the MRCA of the tumor. Consistent with this hypothesis, the mutation densities of the MRCA in tumors without timeable ECA were nearly as high as in those with two distinguishable events (Extended Data Fig. 3e). Moreover, near-tetraploid tumors without timeable ECA showed evidence of sequential events. Here, we found a 2:0 allelic configuration in 11 of 12 near-tetraploid tumors with 1p loss, suggesting that 1p deletion preceded genome doubling. Overall, the ECA of late-MRCA tumors had mutation density indistinguishable from the MRCAs of early-MRCA tumors (Fig. 3d), suggesting that the majority of late-MRCA tumors acquired aneuploidy as early as the early-MRCA tumors.

Finally, we analyzed the timing of gains in genome-wide events: near-triploidization of the genome or genome doubling. In all such tumors, the individual gains involved showed statistically indistinguishable timing, which is consistent with near-triploidization and genome doubling occurring as single catastrophic events (Extended Data Fig. 3f).

In sum, we find that the MRCAs in primary neuroblastoma fell into two evolutionary groups: in early-MRCA tumors, near-triploidization of the genome and/or gains of chromosomal arms or whole chromosomes coincided with the MRCA. Similarly, in more than one-half of late-MRCA tumors, an early ECA was defined by whole-chromosomal or arm-level gains. Hence, these late-MRCA neuroblastomas originated at a time similar to early-MRCA examples but then showed prolonged genetic evolution.

Long evolution of neuroblastoma predicts unfavorable outcome

The early-MRCA class contained cases of stages 1, 2 and 4 S, typically having a prognosis, but only one case of stage 4 (Fig. 3a). Hence, we asked whether MRCA timing predicts outcome. Remarkably, early MRCA timing clearly identified cases with long event-free survival (Fig. 4a) and long overall survival (Fig. 4b). To further test this idea, we assembled an independent cohort of primary tumors and metastases, which was enriched for tumors with WGS at the customary coverage of 30× (n = 86; Supplementary Tables 6–10)^22,25. We evaluated the timing of ECA and MRCA in each sample as described for the discovery cohort. The mutation density of the MRCA showed the same bimodal pattern as in the discovery cohort (Fig. 4c). Moreover, the three main patterns of ECA and MRCA occurrence were identical to the discovery cohort: coincidence of chromosomal/segmental gains and MRCA in the early-MRCA class (Fig. 4d; compare with Fig. 3b); in the late-MRCA class, chromosomal/segmental gains defining an early ECA that substantially preceded either a late MRCA (Fig. 4e, left; compare with Fig. 3c, left) or a late MRCA without timeable ECA (Fig. 4e, right; compare with Fig. 3c, right). Thus, the validation cohort corroborates the scenarios of ECA and MRCA timing found in the discovery cohort.

**Fig. 4: Timing of the MRCA predicts outcome.**

In the validation cohort, MRCA timing was an accurate predictor of both event-free and overall survival (Fig. 5a,b). Merging the two cohorts (n = 152 primary tumors and metastases, excluding one case lacking survival data), we confirmed this result (Fig. 5c,d). To compare MRCA timing with other predictors of survival, we considered clinical variables used worldwide (stage, age), gain of a TMM (which improves on the clinically used criterion, MYCN amplification²⁶) and a more recently proposed molecular predictor, the mutation status of the RAS/p53 pathway²⁵. Early MRCA timing emerged as the most informative predictor of event-free survival (Fig. 5); overall survival was best explained by both MRCA timing and mutations in the RAS/p53 pathway (Extended Data Fig. 4). In sum, survival analyses suggest that extended evolution up to the founding cell of the primary tumor predicts unfavorable outcome.

Genomic instability and telomere maintenance in late-MRCA tumors

Given that both early- and late-MRCA neuroblastomas begin to develop in the same time window (compare with Fig. 3d) but show markedly different durations of evolution and clinical outcome, we asked whether late-MRCA tumors have evolved further than early-MRCA examples simply by chance, or whether there are molecular factors that predispose to ongoing evolution in the late-MRCA class. Characteristic oncogenic events (chromosomal or segmental gains or losses, TMMs) and standard prognostic features (stage and age at diagnosis) showed strong separation between the two classes of neuroblastoma (Fig. 6a). Early-MRCA tumors had predominantly whole-chromosome aneuploidy (for example, chromosomes 17 and 7), whereas arm-level aneuploidy (including gains of 17q, 7q and 1q, as well as loss of 11q and 1p) was prevalent in the late-MRCA group. In late-MRCA tumors with timeable ECA, the vast majority of segmental gains that we could time were coincident with the ECA (Figs. 3c and 4e, dark green squares). Taken together, different processes underlie the early acquisition of aneuploidy in the two classes: mis-segregation of entire chromosomes in early-MRCA tumors vis-à-vis genomic instability in late-MRCA tumors (Fig. 6a).

**Fig. 6: Early genomic instability is associated with unfavorable outcome.**

Acquired TMMs were strongly enriched in the late-MRCA class (Fig. 6a); 12% of early-MRCA tumors but 81% of late-MRCA tumors gained telomere maintenance—via MYCN amplification, ALT or TERT rearrangement—in the combined discovery and validation cohorts (Fig. 6b). Due to their small size, these structural rearrangements cannot be timed using mutation densities and hence may have been acquired before the emergence of arm-level aneuploidy, together with aneuploidy or subsequently. Early timing of arm-level aneuploidy in more than one-half of the late-MRCA cases (that is, tumors with an ECA; left-hand panels in Figs. 3c and 4e) suggests that telomere maintenance was gained during a secondary event between ECA and MRCA in these cases, which mainly included ALT and TERT rearrangement. In the remaining cases, where arm-level aneuploidy was timed at the MRCA and no ECA was identifiable (right-hand panels in Figs. 3c and 4e), telomere maintenance could also have been acquired in an oncogenic event preceding the MRCA. Interestingly, those cases with MYCN amplification fell predominantly within this latter group (Fisher’s exact test, P = 0.006055, odds ratio = 3.9 (1.4, 11.6)), suggesting that MYCN amplification tends to occur earlier than ALT or TERT rearrangement. Indeed, MYCN-amplified tumors generally had an earlier MRCA than tumors with ALT or TERT rearrangement (Fig. 6c).

Collectively, these findings establish a link between genetic evolution of neuroblastoma and the observation that neuroblastomas with extensive arm-level aneuploidy tend to carry a poor prognosis³⁰. The typically late-emerging MRCA in these cases indicates that the underlying genomic instability predisposes these tumors to prolonged evolution, including the acquisition of TMMs.

Chromosomal gains occur during sympathetic neurogenesis

Finally, we asked whether our genetic insights into neuroblastoma development could be synthesized into an integrative model of tumor evolution. To this end, we devised population-genetic models and quantified key parameters by fitting the models to our measured data, including the time-dependent incidences of ECA and MRCA and the variant allele frequency (VAF) distribution of somatic variants. In addition, we required the models to reproduce the overall incidence of the disease in the human population. For this reason, we focused on neuroblastomas with poor prognosis (‘high-risk’, enriched in the late-MRCA class), which occur in around one in 10⁵ children (the frequency of low-risk cases, enriched in the early-MRCA class, is not reliably known due to incomplete diagnosis^31,32,33,34).

The models describe a proliferative population of putative cells of origin that are lost by either differentiation or cell death (Fig. 7). On average, µ SSNVs occur per cell division. A rare subset of mutations (including chromosomal gains/losses and smaller-scale events, such as SSNVs or localized MYCN amplification) will be oncogenic drivers and give rise to selected cell clones. As a minimal requirement for the evolution of a high-risk tumor, we accounted for two oncogenic events, defining ECA and MRCA, with mutation frequencies µ₁ and µ₂ per cell division. To determine these parameters, the model was fit to the experimental data using approximate Bayesian computation. Initially, we assumed that cells of origin are generated during fetal development and then remain available for neuroblastoma development. This simple model consistently overestimated the overall incidence of high-risk disease (Fig. 8a). By contrast, a model in which putative cells of origin were available for only a limited time window (Fig. 8b) accurately reproduced both the overall incidence and dynamics of the emergence of ECA and MRCA (Fig. 8c–e and Extended Data Fig. 5a).

**Fig. 7: Population genetics models of tumor initiation and growth.**

**Fig. 8: Evolutionary dynamics of neuroblastoma initiation.**

The inferred rate of SSNV acquisition (µ) was 3.2 ± 0.4 SSNVs per day, consistent with measurements of somatic mutation rate in the developing central nervous system (5.1 (1.5, 9.0; 95% confidence interval) SSNVs per day)³⁵. The model further inferred that oncogenic driver events occurred on average once per 1 million cell divisions (geometric mean of µ₁ and µ₂), which is consistent with a global estimate of driver mutation rate of 3.4 × 10^–5 per division in the human genome¹³, given that only a subset of all drivers will cause neuroblastoma. We used SSNV rate to calibrate the molecular clock against real time, which placed the ECA within the first trimester of pregnancy (Fig. 8c), at which time rapidly dividing sympathetic neuroblasts are developing³⁶. This first hit (ECA) sustains a subclone in which the MRCA emerges due to a further oncogenic event (Extended Data Fig. 5b) which, in the majority of cases, occurred within the first year of life (Fig. 8d). The model fit of Fig. 8 was based on all SSNVs, because our data (Extended Data Figs. 1e and 2d) suggest that mutational processes may not change markedly during neuroblastoma evolution. To test the robustness of our inference, we also performed all analyses with the subset of clock-like SSNVs as input. The inferred rate of clock-like SSNV acquisition (µ) was 2.3 (1.2, 2.3; 80% credible interval) per division (corresponding to 2.2 ± 0.3 SSNVs per day). All other inferred parameters remained practically unchanged (Extended Data Fig. 5c,d). Hence, confining the analysis to clock-like mutations corroborates the real-time calibration of ECA and MRCA.

A further insight afforded by the population-genetic model is the extent of cell loss in the growing tumor. In general, only a subset of cell lineages will support growth by continued symmetric self-renewal of malignant cells whereas other lineages will terminate by either cell differentiation into nonproliferating states or cell death. We inferred the ratio of self-amplifying tumor cell divisions among all divisions from the subclonal tail of VAF distribution (Methods and ref. ³⁷), finding that only ~10% of tumor cell divisions result in growth of late-MRCA neuroblastomas that acquired TMMs (Fig. 8f,g). This inference is consistent with the clinical observation of extensive cell death in such neuroblastomas. Moreover, the average cell division rate was lower in neuroblastomas with ALT than in those with MYCN amplification or TERT rearrangement (Fig. 8h), again in line with clinical observation; our estimated cell division rates agree quantitatively with those measured in neuroblastoma in vivo³⁸. The fraction of self-renewing cell divisions supporting tumor growth was inferred from VAF distribution also for tumors without acquired TMMs, all falling within the early-MRCA class, yielding a higher value of ~30% (Fig. 8g). Hence, cell loss appears to be a stronger selective pressure for late-MRCA neuroblastomas, the majority of which gain TMMs, than for early-MRCA neuroblastomas, which rarely acquire such mechanisms.

Discussion

In this paper we timed genetic events in the evolution of neuroblastoma using the molecular clock of SSNV accumulation and, inferring the rate of SSNV acquisition from the distribution of VAFs, related this clock to real time by factoring in the age at diagnosis. In two-thirds of cases we find that chromosomal gains implicated in the pathogenesis of the disease occurred early, and typically within the first trimester of pregnancy. With respect to further evolution, these cases fall into two distinct classes: in the early-MRCA class the early chromosomal gain event also marked clonal outgrowth of the resected tumor whereas in the late-MRCA class the tumors evolved further before clonal outgrowth commenced. Remarkably, in our cohort MRCA class is an accurate predictor of clinical outcome. This is true regardless of whether we could time an early chromosomal gain, and implies that neuroblastomas with a longer evolutionary history are more aggressive. Because the strong association between MRCA timing and outcome was also present with 30× WGS, the utility of this predictor for patient stratification may be tested in clinical trials.

Our real-time inference shows that neuroblastomas across the entire clinical spectrum acquired aneuploidy within the first trimester of pregnancy, when the adrenal medulla forms from sympathetic neuroblasts. Moreover, matching disease incidence with the population-genetic model suggests that the initial oncogenic event is limited to this time window. The transcriptomes of neuroblastomas most resemble those of sympathetic neuroblasts^15,36. In this early window, neuroblasts are highly proliferative, which may make them vulnerable to aneuploidy. Finally, the observation that aneuploidy via near-triploidization is a temporally confined event is consistent with the long-standing hypothesis that this karyotype results from endoreduplication of the genome followed by tripolar cell division and selection of the fittest daughter cell^39,40.

The molecular nature of early aneuploidy is associated with whether the tumor continues to evolve: neuroblastomas with whole-chromosome aneuploidy typically did not evolve further and were overall associated with favorable outcomes; in contrast, most tumors with early genomic instability had unfavorable outcomes. Continued evolution of such tumors has also been noted in a recent study taking multiregion biopsies⁴¹, emphasizing the potential of spatially resolved genetic and transcriptomic analysis^42,43. We did not detect specific drivers of genomic instability: in particular, p53 function was not impaired genetically. However, with prevalent 17q gains, reduced expression of TP53 (located on 17p) relative to driver genes expressed on 17q (for example, BIRC5, IGF2-BP1 and BRIP1) may favor genomic instability^44,45.

Accurate risk stratification in neuroblastoma remains a major concern. Our data suggest a link between diverse criteria—age at diagnosis, segmental versus whole-chromosome gains and losses and acquisition of TMM^25,46—based on how neuroblastomas evolve. We find that a greater age at diagnosis is often linked to longer evolution of the tumor rather than later origin. Paradoxically, this implies that low-risk tumors reach detectable size earlier than high-risk. Indeed, we infer that low-risk tumors have a substantially lower fraction of tumor cell loss than high-risk tumors and hence should grow faster. Acquired TMMs should, consequently, provide a larger selective advantage in the high-risk, late-MRCA group where they indeed are enriched. Hence, late-MRCA tumors may grow more rapidly only after gaining telomere maintenance (similar to IDH-wild-type glioblastomas⁷) and hence are diagnosed later. Interestingly, we detected five neuroblastomas with amplified MYCN in the early-MRCA class, and, remarkably, all of these patients survived until the end of the observation periods (1,447–4,177 days), in contrast to the poor survival of patients with MYCN-amplified tumors in the late-MRCA group. Our findings suggest that MRCA timing may be worth considering as a parameter for patient stratification.

Methods

Patient cohorts, tumor samples and ethical approval

Cohorts of primary and relapsed neuroblastoma tumors were retrospectively analyzed. Tumor material was collected as part of the diagnostic workflow of the German Neuroblastoma trial by the Society for Pediatric Oncology and Hematology and collected in the Neuroblastoma tumor bank. All trials were approved by the Ethics Committee of the Medical Faculty, University of Cologne, and collection and use of all tumor tissue material was approved (registry nos. NB97, NB2004 and NB2016). This study includes data of neuroblastoma tumors previously published in Hartlieb et al.²⁴ and Peifer et al.²⁶ The study by Hartlieb et al. also contains a subset of tumors from the St. Anna Kinderkrebsforschung at the Children’s Cancer Research Institute in Vienna, Austria, as well as tumors analyzed in the registry trial INFORM⁴⁷. The study office of the Neuroblastoma trial in Cologne provided clinical annotations and survival information. All patients or their legal guardian approved the use of tumor material by signed informed consent. For analysis, all resected tumors were divided into four quadrants, all of which were evaluated histologically. MYCN status was assessed as a routine clinical marker for all tumors using fluorescence in situ hybridization. A cross-sectional slice of one quadrant was used for DNA extraction for WGS; the same quadrant was used for ploidy analysis, measuring the DNA index. DNA was isolated using phenol chloroform extraction from fresh-frozen tumor material. Control DNA was isolated from whole blood using the NucleoSpin Blood DNA extraction kit (Macherey-Nagel) according to the manufacturers’ instructions. Details of the included samples are provided in Supplementary Tables 1 and 6.

WGS

WGS workflows for the previously published data are described in ref. ²⁴. For additional samples, high-coverage, WGS was performed on a patterned flowcell v.2.5 (150-bp, paired-end) with coverage of about 80× for the tumor and whole-blood control samples. All tumors had a histological tumor cell content of ≥60%. Sequencing libraries were prepared using the Truseq DNA Nano kit (Illumina) according to the manufacturers’ instructions, and size selected using SPRI beads (Beckman Coulter Genomics). Alignment and variant calling was done using the One Touch Pipeline service of the German Cancer Research Center (DKFZ)⁴⁸. Alignment was done using workflow v.1.2.73-1, available at Github (https://github.com/DKFZ-ODCF/AlignmentAndQCWorkflows). In brief, sequences were aligned to the 1000 Genomes project assembly with decoy and PhiX contigs using BWA-MEM v.0.7.15 with option ‘-T 0’. Merging and duplication marking were performed using Sambamba v.0.6.5, and bam files were filtered using Samtools v.0.1.19. Calling of SSNVs, somatic Indels, copy number variations and SVs was done using inhouse workflows, available at https://github.com/DKFZ-ODCF/SNVCallingWorkflow, https://github.com/DKFZ-ODCF/IndelCallingWorkflow, https://github.com/DKFZ-ODCF/ACEseqWorkflow and https://github.com/DKFZ-ODCF/SophiaWorkflow.

Estimates of tumor cell content were manually adjusted in one case (NBE40) following visual inspection of VAF distribution. Structural variants were excluded in the present study if they had a minimal event score of <5; focal amplifications were defined as regions with copy number ≥10; homozygous deletions were defined as regions with copy number <0.9. Deletions of (parts of) chromosomal arm 1p were defined as p-terminal regions lost relative to 1q and, moreover, with copy number ≤1 in near-diploid, ≤2 in near-triploid or ≤3 in near-tetraploid tumors. In analogy, we annotated deletions of chromosome 11q if the copy number was ≤1 in near-diploid, ≤2 in near-triploid or ≤3 in near-tetraploid tumors, and if 11q was lost relative to 11p. For gains on chromosomes 1, 2, 7 and 17 we required the copy number to be higher than the (rounded) basal ploidy of the tumor. Partial gains on 1q, 2p, 7q and 17q were defined as regions on the respective chromosomal arm that were gained relative to the other chromosomal arm.

Mutational signatures

Mutational signatures were learned de novo and thereafter decomposed into Cosmic mutational signatures v.3.1 (ref. ⁴⁹) (http://cancer.sanger.ac.uk/cosmic/signatures) using SigProfilerExtractor (v.1.1.1)⁵⁰. Only signatures contributing to ≥5% of the mutations in at least one sample and, in addition, identified in at least 10% of samples, were considered. The contributions of these signatures to each sample were then re-estimated using the R package mmsig v.0.0.0.9000 (ref. ⁵¹) (setting strandbias=F, bootstrap=F, cos_sim_threshold=0.01, force_include=c(“SBS1”, “SBS5”)). For visualization, signatures SBS1, SBS5 and SBS40 were combined into a single, clock-like mutational signature.

Stratified analysis of clonal and subclonal mutations was performed by classifying mutations as subclonal if$\mathop {\sum}\nolimits_{k = 0}^{n_{{{{\mathrm{var}}}}}} {\left( {\begin{array}{*{20}{c}} {n_{{{{\mathrm{var}}}}} + n_{{{{\mathrm{ref}}}}}} \\ k \end{array}} \right)} p^k\left( {1 - p} \right)^{n_{{{{\mathrm{var}}}}} + n_{{{{\mathrm{ref}}}}}} < 0.05;p = \frac{\rho }{{\rho {{{\mathrm{CN}}}} + 2\left( {1 - \rho } \right)}}$, where ρ is estimated tumor cell content, n_var and n_ref are the number of variant and reference reads, respectively, and CN is copy number.

Mutation timing

Estimation of numbers of amplified and non-amplified clonal mutations

We estimated mutation densities at partial and entire chromosomal gains, and at the MRCA of the tumor from the distribution of VAFs among clonal mutations. To this end we counted clonal mutations separately on each autosome, stratified by copy number (CN) state. Regions lacking a CN estimate were assigned to a specific CN state if the measured coverage ratio (CR) and B-allele frequency (BAF) matched the expected ratios within measurement error. Specifically, we required for each segment i, $\left[ {\frac{{{\mathrm{CN}}_i\rho + 2(1 - \rho )}}{{\pi \rho + 2(1 - \rho )}} - 0.1} \right] \le {{{\mathrm{CR}}}}_i \le \left[ {\frac{{{\mathrm{CN}}_i\rho + 2\left( {1 - \rho } \right)}}{{\pi \rho + 2\left( {1 - \rho } \right)}} + 0.1} \right]$ and $\left[ {\frac{{b_i}}{{{{{\mathrm{CN}}}}_i}} - 0.05} \right] \le {{{\mathrm{BAF}}}}_i \le \left[ {\frac{{b_i}}{{{{{\mathrm{CN}}}}_i}} + 0.05} \right],$ where CN_i is the copy number of segment i, ρ the tumor cell content, π the average tumor ploidy and b the number of B-alleles. States with copy number >4 or of size ≤10⁷ bp were excluded from the analysis because the statistics become ambiguous for short pieces and high CN states.

On each retained segment we estimated the number of clonal mutations, distinguishing clonal mutations present on a single allele (‘non-amplified clonal mutations’) from those present on multiple alleles (‘amplified clonal mutations’). To this end we applied a statistical framework⁵², distinguishing amplified clonal mutations acquired before a clonal gain and thus present on all A-alleles, or on all B-alleles at a given CN state, from non-amplified clonal mutations acquired either on the non-amplified allele or after clonal gain but before the MRCA and hence present on a single copy; the remaining mutations are subclonal. Accordingly, we expect to find clonal mutations at ${{{\mathrm{VAFs}}}} \in \left\{ {\frac{1}{{{{{\mathrm{CN}}}}}},\frac{{{{{\mathrm{CN}}}} - b}}{{{{{\mathrm{CN}}}}}},\frac{b}{{{{{\mathrm{CN}}}}}}} \right\}$ in a pure sample, and at ${{{\mathrm{VAFs}}}}\, \in \left\{ {\frac{\rho }{\zeta },\frac{{({{{\mathrm{CN}}}} - b)\rho }}{\zeta },\frac{{b\rho }}{\zeta }} \right\}$ in an impure sample with tumor cell content ρ, where:

$$\zeta = \rho {{{\mathrm{CN}}}} + 2\left( {1 - \rho } \right)$$

(1)

is the average copy number of a given locus in the sample. Because measured VAFs are randomly distributed around their true values, we fit a binomial mixture model to the clonal mutations. To avoid misclassification of subclonal mutations as clonal, we neglected all variants with ${{{\mathrm{VAF}}}} < \frac{\rho }{\zeta }$. By symmetry, this cutoff retains half of the non-amplified clonal mutations, introducing a subsequent correction factor of 2 (equation (4)). Defining the weights $w = (w_1,w_{{{{\mathrm{CN}}}} - b},w_b)$, for the non-amplified clonal and amplified clonal mutation peak, respectively, we computed the probability of measuring $n_{{{{\mathrm{var}},\,j}}}$ variant reads and $n_{{{{\mathrm{ref,j}}}}}$ reference (nonvariant) reads at the position of the jth SSNV to:

$$P\left( {n_{{{{\mathrm{var,}}\,j}}},n_{{{{\mathrm{ref,}}\,j}}}|{{{\mathrm{CN}}}},\rho ,w} \right) = \mathop {\sum}\limits_{k \in \{ 1,{{{\mathrm{CN}}}} - b,b\} } {B\left( {n_{{{{\mathrm{var,}}\,j}}};n_{{{{\mathrm{var,}}\,j}}} + n_{{{{\mathrm{ref,}}\,j}}},\frac{{\rho k}}{\zeta }} \right)} P(k|w),$$

(2)

where $B\left( {n_{{{{\mathrm{var,}}\,j}}};n_{{{{\mathrm{var,}}i}}} + n_{{{{\mathrm{ref,}}j}}},\frac{{\rho k}}{\zeta }} \right)$ is the binomial probability for drawing $n_{{{{\mathrm{var}},\,j}}}$ variant reads at sequencing depth $n_{{{{\mathrm{var}},\,j}}} + n_{{{{\mathrm{ref}},\,j}}}$ from the peak comprising mutations that are clonal on k chromosomal copies. Defining a uniform prior probability, P(w), for the weights, we computed, up to normalization, the posterior probability as:

$$P(w|\{ n_{{{{\mathrm{var}},\,j}}},n_{{{{\mathrm{ref}},\,j}}}\} _j^N,{{{\mathrm{CN}}}},\rho ) \propto \mathop {\prod}\limits_{j = 1}^N P \left( {n_{{{{\mathrm{var}},\,j}}},n_{{{{\mathrm{ref}},\,j}}}|{{{\mathrm{CN}}}},\rho ,w} \right)P\left( w \right),$$

(3)

where N is the total number of SSNVs under consideration. Clonal mutations were then assigned to distinct clonal peaks according to the weights at maximum a posteriori probability (MAP), yielding, on segment l, estimates for the number of clonal mutations on non-amplified chromosomes, $n_{1,l}$, on amplified b alleles, $n_{b,l}$ (if b > 1) and on amplified a alleles (with copy number ${\mathrm{CN}} - b$), $n_{{\mathrm{CN}} - b,l}$, according to:

$$n_{1,l} = 2{{{\mathrm{MAP}}}}\left( {w_{1,l}} \right)N_l,n_{b,l} = 2{{{\mathrm{MAP}}}}\left( {w_{b,l}} \right)N_l,{{{\mathrm{and}}}}\,n_{{{{\mathrm{CN}}}} - b,l} = {{{\mathrm{MAP}}}}\left( {w_{{{{\mathrm{CN}}}} - b,l}} \right)N_l.$$

(4)

Mutation timing

Timing MRCA and ECA

Mutation densities (SSNVs per bp) at the MRCA were estimated from the number of non-amplified clonal mutations and total size of the analyzed genome, $g = \mathop {\sum}\nolimits_l {g_l}$ (ref. ⁵²). The number of mutations per copy that were acquired up to the MRCA is

$$n_l = \frac{{n_{1,l} + n_{{\mathrm{CN}} - b,l}({{{\mathrm{CN}}}}_l - b_l) + n_{b,l}b_l}}{{{{{\mathrm{CN}}}}_l}}.$$

(5)

If tumor samples were well mixed, or tumors completely sampled, mutation densities at the MRCA could be directly estimated as n_l/g. In practice, however, n_l may consist of a set of true clonal mutations, acquired before tumor growth, and an additional set of mutations that appear as clonal in the tumor sample due to incomplete sampling. To correct for the latter, false-positive (FP), clonal mutations we compared primary and relapse samples from two such pairs available in our dataset (NBE11/NBE66, NBE51/NBE78). The fraction of conservative clonal mutations in the primary sample that remained undetected in the relapse sample was 15 ± 5% and was taken as the fraction of FP. With this correction, the mutation count, $m_{{{{\mathrm{MRCA}}}}}$, and mutation density, $\tilde m_{{{{\mathrm{MRCA}}}}}$, at MRCA were estimated as:

$$m_{{{{\mathrm{MRCA}}}}} = \mathop {\sum}\limits_l {n_l} \left( {1 - {{{\mathrm{FP}}}}} \right)\,{{{\mathrm{and}}}}\,\tilde m_{{{{\mathrm{MRCA}}}}} = \frac{{m_{{{{\mathrm{MRCA}}}}}}}{g},$$

(6)

respectively. Lower and upper 95% confidence bounds for $\tilde m_{{{{\mathrm{MRCA}}}}}$ were estimated by bootstrapping the genomic segments 1,000 times.

Next, we tested for each gained segment whether amplified clonal mutations were significantly less frequent than expected at the MRCA and, accordingly, assigned the segment to either the MRCA or an earlier time point (in the majority of cases the ECA; here we excluded a small number of segments in seven tumors with a higher density of amplified clonal mutations than the estimated mutation density at the MRCA, because such gains may be subclonal CN alterations erroneously classified as clonal). To this end, we modeled the number of mutations falling on each genomic segment with a negative binomial distribution, which accounts for overdispersion caused by heterogeneous mutation rates along the genome⁵³. The probability that the gain of genomic segment l coincided with the MRCA is then computed as:

$$P\left( {n_{k,l;k \in \{ b,{{{\mathrm{CN}}}} - b\} }|g_l,g,m_{{{{\mathrm{MRCA}}}}}} \right) = \mathop {\sum}\limits_{r = 0}^{n_{k,l}} {\left( {\begin{array}{*{20}{c}} {m_{{{{\mathrm{MRCA}}}}} + r - 1} \\ {m_{{{{\mathrm{MRCA}}}}}} \end{array}} \right)} p^r\left( {1 - p} \right)^{m_{{{{\mathrm{MRCA}}}}}},$$

(7)

where:

$p = \frac{{m_{{{{\mathrm{MRCA}}}}}}}{{m_{{{{\mathrm{MRCA}}}}}\left( {1 + \frac{{g_l}}{g}} \right)}}.$

Here, $n_{k,l;k \in \{ b,{{{\mathrm{CN}}}} - b\} }$ is the number of amplified clonal mutations on segment l and g_l is the respective segment size. We corrected the P values obtained with equation (7) for multiple testing using Holm correction (false discovery rate ≤ 0.01) and, accordingly, assigned each segment to either the MRCA or an earlier time point.

Finally, we computed the mutation densities at ECAs from the segments with significance level α = 0.01 to:

$$\tilde m_{{{{\mathrm{ECA}}}}} = \frac{{\mathop {\sum}\nolimits_{l,p_{{{{\mathrm{adj}}}},l} \le 0.01} {n_{b,l}} + n_{{{{\mathrm{CN}}}} - b,l}}}{{\mathop {\sum}\nolimits_{l,p_{{{{\mathrm{adj}}}},l} \le 0.01} {g_{l,b}} + g_{l,{{{\mathrm{CN}}}} - b}}}.$$

(8)

We then tested for each contributing segment whether its mutation load conformed to the ECA according to a negative binomial distribution (in analogy to equation (7)) and computed lower and upper 95% confidence bounds by bootstrapping, as before for the MRCA.

Mutation timing

Translation of mutation densities into estimated weeks p.c

We related mutation densities per haploid genome into weeks post conception (p.c.) by inferring SSNV rates per diploid genome and embryonic day (μλ) and mutation and division rates per day, using the measured VAF distributions and age at diagnosis as outlined in Real-time estimation of cell division rate (with similar results based on all SSNVs or only clock-like SSNVs). Because mutation calling was performed by comparing tumors against a matched blood control, mutation densities correlate with the time post gastrulation (at approximately 2 weeks p.c.). Thus, the mutation density per haploid genome, $\tilde m$, relates to the time p.c., t, according to $\tilde m\left( t \right) = \frac{{\mu \lambda }}{d}\frac{1}{{3.3 \times 10^9}}(t - 14\,{{{\mathrm{days}}}})$. The estimated time of birth was taken as 38 weeks after gastrulation (40 weeks p.c.).

Survival analysis

Survival analysis was performed using the R package survival v.3.1.12 (ref. ⁵⁴). We detected a clear bimodality in the histogram of SSNV density at the MRCA in the discovery cohort and took the upper bin border just before the minimum (0.05 SSNVs per Mb) as threshold to split tumors into groups of early or late MRCA. The same value was found appropriate in the validation cohort.

Modeling neuroblastoma initiation

Modeling emergence of the MRCA

We modeled neuroblastoma initiation by sequential acquisition of driver mutations in two oncogenic events, assuming that both events are infrequent and occurring in early neuroblasts with low probabilities, μ₁ and μ₂, during cell divisions. We denote the selective advantage conferred by the driver mutations acquired during the two events by r and s. Specifically, we assume that one or more first events generate a precancerous cell population in which a second event creates the MRCA. On this basis, we compute the time-dependent probability of the emergence of the MRCA. Neuroblasts initially expand rapidly, but how this cell population behaves subsequently, because sympathetic neurons differentiate from these precursors, is not known precisely. Hence, two possible scenarios were considered: (1) an initial phase of exponential growth of the neuroblast population is followed by a subsequent phase of differentiation, modeled by exponential decay; and (2) an initial phase of exponential growth is followed by a subsequent phase of precursor homeostasis (for a related model of a two-step process of tumorigenesis, but in a homeostatic tissue, see ref. ⁵⁵). The two phases are associated with distinct rates of cell division, λ₁ and λ₂, and loss, δ₁ and δ₂, where $\lambda _1 > \delta _1$ and $\lambda _2 \le \delta _2$ (with equality in the case of precursor homeostasis). Thus the population dynamics of neural precursor cells, N(t), are described by:

$$N\left( t \right) = \left\{ {\begin{array}{*{20}{l}} {{\mathrm{e}}^{\left( {\lambda _1 - \delta _1} \right)t},} \hfill & {0\le t \le T} \hfill \\ {{\mathrm{e}}^{\left( {\lambda _1 - \delta _1} \right)T}{\mathrm{e}}^{\left( {\lambda _2 - \delta _2} \right)(t - T)},} \hfill & {t > T,} \hfill \end{array}} \right.$$

(9)

where T denotes the time point at which the cell population peaks. Cells undergoing the first oncogenic event are generated at rates:

$$\begin{array}{*{20}{l}} {\mu _1\lambda _1N(t){\mathrm{d}}t,} \hfill & {0 \le t \le T} \hfill \\ {\mu _1\lambda _2N(t){\mathrm{d}}t,} \hfill & {t > T.} \hfill \end{array}$$

(10)

For simplicity, we take the selective advantage associated with the first oncogenic event into account during the contraction (or homeostasis) phase and neglect it during the initial rapid expansion; this approximation is appropriate when the selective advantage of the first event is comparatively small, which is subsequently borne out by the parameter inference. Thus, M₁ cells harboring the first mutation grow at rate λ₁ − δ₁ during tissue expansion and at rate $\lambda _2 - \delta _2/r$ during tissue contraction or homeostasis. Hence, depending on the actual value of r > 1, a clone harboring the first event may slowly shrink or expand for t > T. We now ask whether in this clone the second oncogenic event takes place that defines the MRCA of the tumor. The second event occurs at rate:

$$\begin{array}{*{20}{l}} {\mu _2\lambda _1M_1(t){\mathrm{d}}t,} \hfill & {0 \le t \le T} \hfill \\ {\mu _2\lambda _2M_1(t){\mathrm{d}}t,} \hfill & {t > T.} \hfill \end{array}$$

(11)

Using the survival probability of the supercritical birth–death process, we have a cell undergoing the second oncogenic event survive with probability:

$$\nu _{2,{{{\mathrm{E}}}}} = 1 - \frac{{\delta _1}}{{s\,\lambda _1}}$$

(12a)

during the expansion phase (E) and, provided that $\frac{{\delta _2}}{{s\lambda _2}} < 1$, with probability:

$$\nu _{2,{{{\mathrm{D}}}}} = 1 - \frac{{\delta _2}}{{s\lambda _2}}$$

(12b)

during the decay or homeostatic phase (D). We are interested in the probability that at least one surviving cell underwent both oncogenic events, $P_{{{{\mathrm{MRCA}}}}}$. There are three possible cases: (1) both oncogenic events occur during precursor expansion, associated with probability $P_{{{{\mathrm{MRCA}}}},{{{\mathrm{1}}}}}$; (2) the first oncogenic event occurs during precursor expansion, the second during precursor contraction or homeostasis, associated with probability $P_{{{{\mathrm{MRCA}}}},{{{\mathrm{2}}}}}$ ; and (3) both oncogenic events occur during precursor contraction or homeostasis, associated with probability $P_{{{{\mathrm{MRCA}}}},{{{\mathrm{3}}}}}$. For each case we assumed that the number of cells with only the first event, M₁, is small compared with the number of normal cells. We have:

$$P_{{{{\mathrm{MRCA}}}}} = \left\{ {\begin{array}{*{20}{l}} {P_{{{{\mathrm{MRCA}}}},{{{\mathrm{1}}}}},t < T} \hfill \\ {1 - \left( {1 - P_{{{{\mathrm{MRCA}}}},{{{\mathrm{1}}}}}} \right)\left( {1 - P_{{{{\mathrm{MRCA}}}},{{{\mathrm{2}}}}}} \right)\left( {1 - P_{{{{\mathrm{MRCA}}}},{{{\mathrm{3}}}}}} \right),t \ge T} \hfill \end{array}} \right..$$

(13)

We derive expressions for $P_{{{{\mathrm{MRCA}}}},{{{\mathrm{I}}}}}$, $P_{{{{\mathrm{MRCA}}}},{{{\mathrm{2}}}}}$ and $P_{{{{\mathrm{MRCA}}}},{{{\mathrm{3}}}}}$ in Supplementary Note 1a for the case of precursor expansion and decay, and in Supplementary Note 2a for the case of precursor expansion and homeostasis. If precursor expansion is followed by decay, we find:

$$P_{{{{\mathrm{MRCA}}}},{{{\mathrm{1}}}}}(t) = \mathop {\sum}\limits_{x = 1}^{N\left( t \right) - 1} {{\mathrm{e}}^{ - \mu _1\left( {x - 1} \right)}} \left( {1 - {\mathrm{e}}^{ - \mu _1}} \right)\left( {1 - {{{\mathrm{exp}}}}\left\{ { - \frac{{\mu _1\mu _2\lambda _1{\mathrm{TN}}(t)F}}{{1 - \frac{{\delta _1}}{{\lambda _1}}}}} \right\}} \right),$$

(14a)

$$P_{{{{\mathrm{MRCA}}}},{{{\mathrm{2}}}}}(t) = 1 - {{{\mathrm{exp}}}}\left( { - \frac{{\mu _1\mu _2\lambda _1\lambda _2\nu _{2,{{{\mathrm{D}}}}}T}}{{\lambda _2 - \delta _2/r}}N(T\,)\left\{ {{\mathrm{e}}^{\left( {\lambda _2 - \delta _2/r} \right)\left( {t - T} \right)} - 1} \right\}} \right),$$

(14b)

$$P_{{{{\mathrm{MRCA}}}},{{{\mathrm{3}}}}}(t) = 1 - {{{\mathrm{exp}}}}\left( { - \frac{{\mu _1\mu _2\lambda _2^2\nu _{2,{{{\mathrm{D}}}}}N\left( T \right)}}{{\delta _2\left( {\frac{1}{r} - 1} \right)}}\left\{ {\frac{{{\mathrm{e}}^{\left( {\lambda _2 - \delta _2} \right)\left( {t - T} \right)} - 1}}{{\lambda _2 - \delta _2}} - \frac{{{\mathrm{e}}^{\left( {\lambda _2 - \frac{{\delta _2}}{r}} \right)\left( {t - T} \right)} - 1}}{{\lambda _2 - \frac{{\delta _2}}{r}}}} \right\}} \right),$$

(14c)

where $F = {\int}_0^1 {\nu _{2,I}} /\left( {\nu _{2,I}z^{\,\alpha} } \right){\mathrm{d}}z$ and $\alpha = \frac{{\delta _1 - s\lambda _1}}{{\delta _1 - \lambda _1}}$. Alternatively, if precursor expansion is followed by homeostasis, $P_{{{{\mathrm{MRCA}}}},{{{\mathrm{2}}}}}$ and $P_{{{{\mathrm{MRCA}}}},{{{\mathrm{3}}}}}$ take the form (Supplementary Note 2a):

$$P_{{{{\mathrm{MRCA}}}},{{{\mathrm{2}}}}}(t) = 1 - {{{\mathrm{exp}}}}\left( { - \frac{{\mu _1\mu _2\lambda _1\nu _{2,{{{\mathrm{D}}}}}T}}{{1 - \frac{1}{r}}}N\left( T \right)\left\{ {{\mathrm{e}}^{\lambda _2\left( {1 - \frac{1}{r}} \right)\left( {t - T} \right)} - 1} \right\}} \right),$$

(14d)

$$P_{{{{\mathrm{MRCA}}}},{{{\mathrm{3}}}}}(t) = 1 - {{{\mathrm{exp}}}}\left( {\frac{{\mu _1\mu _2\lambda _2\nu _{2,{{{\mathrm{D}}}}}}}{{1 - \frac{1}{r}}}N\left( T \right)\left\{ {\frac{{1 - {\mathrm{e}}^{\lambda _2\left( {1 - \frac{1}{r}} \right)\left( {t - T} \right)}}}{{\lambda _2\left( {1 - \frac{1}{r}} \right)}} + t - T} \right\}} \right).$$

(14e)

Equations (14a–e) give the probability that a growing tumor clone has emerged up to time t for the distinct cases.

Modeling the ECA

The ECA is associated with the first oncogenic event, which may have occurred at variable time points before the MRCA. To time the ECA, we computed the conditional probability of having undergone the first oncogenic event before t₁, given that the second oncogenic event occurred at t₂, denoted by $P(t_1|t_2)$. The acquisition of the second oncogenic event is proportional to the number of cells with the first event M₁ (equation (11)). As denoted by $M_1\left( {t_2{{{\mathrm{|}}}}\tau } \right)$, the number of cells at t₂ resulting from a first event at $\tau .P(t_1|t_2)$ can hence be expressed as:

$$P\left( {t_1|t_2} \right) = \frac{{{\int}_0^{t_1} {M_1} \left( {t_2|\tau } \right){\mathrm{d}}\tau }}{{{\int}_0^{t_2} {M_1} \left( {t_2|\tau } \right){\mathrm{d}}\tau }} = \frac{{{\int}_0^{t_1} {M_1} \left( {t_2|\tau } \right){\mathrm{d}}\tau }}{{M_1(t_2)}}.$$

(15)

Distinguishing the three cases for the timing of the second event relative to the first, as before, we find for precursor expansion followed by decay for two cases (Supplementary Note 1b). When the second event occurs in the exponential growth phase, then:

$$P\left( {t_1|t_2} \right) = \frac{{t_1}}{{t_2}};t_1 < t_2 \le T.$$

(16a)

This corresponds to the classical Luria–Delbrück model. Recalling that t = 0 marks the beginning of neuroblast expansion, the first mutation happens, as expected, with uniform probability during exponential growth. When the second event occurs during the decay phase, the probability for the first event is uniform in the exponential phase and decreases in the decay phase according to:

$$\begin{array}{l}P\left( {t_1|t_2} \right) = \\\frac{{{{\Theta }}\left( {T - t_1} \right)\lambda _1\delta _2\left( {1 - \frac{1}{r}} \right)t_1 + {{\Theta }}\left( {t_1 - T} \right)\left[ {\lambda _2 + \lambda _1T\delta _2\left( {1 - 1/r} \right) - \lambda _2{\mathrm{e}}^{\delta _2(1 - 1/r)(T - t_1)}} \right]}}{{\lambda _2 + \lambda _1T\delta _2\left( {1 - 1/r} \right) - \lambda _2{\mathrm{e}}^{\delta _2(1 - 1/r)(T - t_2)}}};\\t_2 > T\end{array}$$

(16b)

where ${{\Theta }}\left( \cdot \right)$ is the Heavyside step function and, of course, $0 \le t_1 < t_2$. For precursor expansion followed by homeostasis, $P\left( {t_1|t_2} \right)$ reads (Supplementary Note 2b):

$$P\left( {t_1|t_2} \right) = \frac{{t_1}}{{t_2}};t_1 < t_2 \le T$$

(17a)

and:

$$\begin{array}{l}P\left( {t_1|t_2} \right) = \\\frac{{{{\Theta }}\left( {T - t_1} \right)\lambda _1t_1\left( {1 - 1/r} \right){\mathrm{e}}^{\lambda _2\left( {1 - \frac{1}{r}} \right)\left( {t_2 - T} \right)} + {{\Theta }}\left( {t_1 - T} \right)\left[ {\left( {\lambda _1T\left( {1 - \frac{1}{r}} \right) + 1} \right){\mathrm{e}}^{\lambda _2\left( {1 - \frac{1}{r}} \right)\left( {t_2 - T} \right)} - {\mathrm{e}}^{\lambda _2\left( {1 - \frac{1}{r}} \right)\left( {t_2 - t_1} \right)}} \right]}}{{\left( {\lambda _1T\left( {1 - \frac{1}{r}} \right) + 1} \right){\mathrm{e}}^{\lambda _2\left( {1 - \frac{1}{r}} \right)\left( {t_2 - T} \right)} - 1}};\\t_2 > T.\end{array}$$

(17b)

Relating model and data

To relate the model to the measured data, we translate time into mutation counts. Assuming that SSNVs are accumulated at a constant rate, μ, per cell division, the expected mutation count per cell is Poisson-distributed with rate μλt.

Parameter estimation

We estimated the following parameters: peak size of the neuroblast population (N), relative loss rates in the growth and decay phases ($\delta _1/\lambda _1$ and $\delta _2/\lambda _2$, respectively), rate of SSNVs (μ) and rates of first and second oncogenic events (μ₁ and μ₂) and their selective advantages (expressed as r and ν₂), using equations (9–17) and Approximate Bayesian Computation with Sequential Monte Carlo sampling (ABC–SMC) as implemented in pyABC⁵⁶. We used a population size of 1,000 parameter samples and prior probabilities as outlined in Supplementary Table 11. Evaluation was abrogated after 25 SMC generations, or if ε ≤ 0.05. We used mutation density estimates at MRCA and ECA of high-risk tumors (primary tumors and metastases) as determined by equations (6) and (8) as input data. Tetraploidization in tetraploid tumors was not included as ECA because there were probably earlier events, such as Chr. 17q gains. The experimental incidences were computed as $I_{{{{\mathrm{MRCA,ev,i}}}}} = {\sum} {\tilde m_{{{{\mathrm{MRCA,ev}}}}}} < i;i \in \tilde m_{{{{\mathrm{MRCA,ev}}}}}$, where the subscript ‘ev’ denotes the experimentally determined value. Uncertainties were estimated to ${{\Delta }}I_{{{{\mathrm{MRCA,ev,i}}}}} = {\sum} {\tilde m_{{{{\mathrm{MRCA,ev,}}}}{\mathrm{l}}}} < i - {\sum} {\tilde m_{{{{\mathrm{MRCA,ev,}}}}{\mathrm{u}}}} < i;i \in \tilde m_{{{{\mathrm{MRCA,ev}}}}}$, where $\tilde m_{{{{\mathrm{MRCA,ev,}}}}{\mathrm{l}}}$ and $\tilde m_{{{{\mathrm{MRCA,ev,}}}}{\mathrm{u}}}$ denote the lower and upper bounds of the 95% confidence interval of $\tilde m_{{{{\mathrm{MRCA,ev}}}}}$, respectively. Incidences and uncertainties of the ECA, $I_{{{{\mathrm{ECA,ev,i}}}}}$ were computed in analogy.

For each sampled parameter set we performed the following steps:

(1)
Sample for each tumor a time point of the MRCA, t₂. To this end, sample a uniform number x between 0 and 10^–5, thus accounting for the overall incidence of 10^–5. Then, from equation (9), determine the time point at which $P_t = x$. To facilitate numerical computation, we approximated the sum in equation (13a) with an integral. To exclude second hits that do not confer a selective advantage during expansion, we required $s = \max \left( {1,\frac{{\delta _1}}{{\lambda _1\left( {1 - \nu _{2,{{{\mathrm{E}}}}}} \right)}}} \right)$ and adjusted ν₂ accordingly.
(2)
Sample for each sampled time point t₂ a neutral mutation count from a Poisson distribution with mean μt₂ and determine mutation density by dividing with the haploid genome length of 3.3 × 10⁹, yielding $\tilde m_{{{{\mathrm{MRCA,sim}}}}}$.
(3)
Determine the simulated incidence of the MRCA at the experimentally determined mutation loads: $I_{{{{\mathrm{MRCA,sim,i}}}}} = {\sum} {\tilde m_{{{{\mathrm{MRCA,sim}}}}}} < i;i \in \tilde m_{{{{\mathrm{MRCA,ev}}}}}$.
(4)
Sample for each sampled t₂ a time point of the ECA, t₁. To this end, sample a uniform number x between 0 and 1; then, from equation (16) or (17) determine the time point at which $P\left( {t_1|t_2} \right) = x$.
(5)
Sample for each sampled t₂ a neutral mutation count from a Poisson distribution with mean μt₂ and divide by the haploid genome length of 3.3 × 10⁹, yielding $\tilde m_{{{{\mathrm{ECA,sim}}}}}$.
(6)
Determine the simulated incidence of the ECA at the experimentally determined mutation loads: $I_{{{{\mathrm{ECA,sim,i}}}}} = {\sum} {\tilde m_{{{{\mathrm{ECA,sim}}}}}} < i;i \in \tilde m_{{{{\mathrm{ECA,ev}}}}}$.
(7)
Determine the simulated incidence of the MRCA at age 10 years, $I_{{{{\mathrm{MRCA,sim,ten}}}}\,{{{\mathrm{years}}}}}$ using equation (9). This step was introduced to contrast the incidence at old ages with the clinically observed overall incidence of the order of 10⁻⁵, which we weighted by assuming an error of 10⁻⁴.

We computed the cost function, d, as

$$\begin{array}{l}d = \mathop {\sum}\limits_{i \in m_{{{{\mathrm{MRCA,ev}}}}}} {w_i} \left( {\frac{{\left( {I_{{{{\mathrm{MRCA,sim,i}}}}} - I_{{{{\mathrm{MRCA,ev,i}}}}}} \right)^2}}{{{{\Delta }}I_{{{{\mathrm{MRCA,ev,i}}}}}^2}}} \right) +\\ \frac{{\left( {I_{{{{\mathrm{ECA,sim,i}}}}} - I_{{{{\mathrm{ECA,ev,i}}}}}} \right)^2}}{{{{\Delta }}I_{{{{\mathrm{ECA,ev,i}}}}}^2}} + \frac{{I_{{{{\mathrm{MRCA,sim,ten}}}}\,{{{\mathrm{years}}}}} - 10^{ - 5}}}{{\left( {10^{-4}} \right)^2}}.\end{array}$$

(18)

To enforce good fits to the initial phase of the incidence curve for better comparison of the contraction and homeostasis models of neural precursor dynamics, we chose weights

$$w_i = \left\{ {\begin{array}{*{20}{c}} {10,} & {{\mathrm{if}}\,\tilde m_{{{{\mathrm{MRCA}}}}} \le 0.2/10^6} \\ {1,} & {{\mathrm{if}}\,\tilde m_{{{{\mathrm{MRCA}}}}} > 0.2/10^6} \end{array}} \right..$$

Ninety-five per cent posterior probability bounds for the model fits were estimated by simulating the model at each sampled parameter set and cutting off 2.5% at each end of the simulated distribution.

To assess the robustness of the model, we performed an additional parameter estimation on clock-like mutations only. To this end, we multiplied mutation densities at MRCA and ECA by the fraction of mutations generated by clock-like mutational signatures (SBS1, SBS5 and SBS40) and fit the model to the clock-like mutation densities.

Modeling mutation accumulation during tumor growth

Model

Denoting the rate at which tumor cells divide by λ_T and the loss rate by δ_T, we modeled the number of tumor cells, N_T, with an exponential growth model, $N_{\mathrm{T}}\left( t \right) = {\mathrm{e}}^{\left( {\lambda _{\mathrm{T}} - \delta _{\mathrm{T}}} \right)t}$. We assumed that some SSNVs are already present in the founder cell of the tumor and denote their number by $n_{{{{\mathrm{clonal}}}}}$ ; these mutations are clonally propagated to the entire tumor and will thus be found at frequency f = 1. Additional SSNVs are acquired during tumor growth, and we denote their number by $n_{{{{\mathrm{subclonal}}}}}$ ; these mutations are present in a subset of the tumor only and will thus be found at f < 1. The VAF distribution of a neuroblastoma is hence a superposition of clonal and subclonal mutations accumulated before and during tumor growth, respectively.

To model the number of subclonal mutations, we used a model of neutrally evolving tumors that accounts for the stochastic expansion of neutral subclones while assuming exponential expansion of the tumor mass overall⁵⁷. This model assumes that neutral mutations are acquired at all times at rate $\mu \lambda _{\mathrm{T}}N_{\mathrm{T}}\left( t \right)$ and drift stochastically according to a supercritical birth–death process, where⁵⁸:

$$P_{1,i}\left( {\lambda _{\mathrm{T}},\delta _{\mathrm{T}},t} \right) = \left\{ {\begin{array}{*{20}{l}} {\alpha (t),i = 0} \hfill \\ {(1 - \alpha (t))\left( {1 - \beta \left( t \right)} \right)\beta \left( t \right)^{i - 1},i \ge 1} \hfill \end{array}} \right.$$

(19)

where:

$$\alpha \left( t \right) = \frac{{\delta _{\mathrm{T}}\left( {{\mathrm{e}}^{\left( {\lambda _{\mathrm{T}} - \delta _{\mathrm{T}}} \right)t} - 1} \right)}}{{\lambda _{\mathrm{T}}{\mathrm{e}}^{\left( {\lambda _{\mathrm{T}} - \delta _{\mathrm{T}}} \right)t} - \delta _{\mathrm{T}}}},\beta \left( t \right) = \frac{{\lambda _{\mathrm{T}}\left( {{\mathrm{e}}^{\left( {\lambda _{\mathrm{T}} - \delta _{\mathrm{T}}} \right)t} - 1} \right)}}{{\lambda _{\mathrm{T}}{\mathrm{e}}^{\left( {\lambda _{\mathrm{T}} - \delta _{\mathrm{T}}} \right)t} - \delta _{\mathrm{T}}}}.$$

Together, this yields the site frequency spectrum, S(i, μ), of subclonal mutations:

$$S\left( {i,\mu } \right) = \mathop {\int}\limits_0^{t_{{{{\mathrm{end}}}}}} {P_{1,i}} (\lambda _{\mathrm{T}},\delta _{\mathrm{T}},t_{{{{\mathrm{end}}}}} - t)\mu \lambda _{\mathrm{T}}N_{\mathrm{T}}\left( t \right)dt.$$

(20)

The total number of subclonal mutations in a tumor is computed as;

$$n_{{{{\mathrm{subclonal}}}}} = \mathop {\sum}\limits_{i = 1}^{N_{\mathrm{T}}\left( {t_{{{{\mathrm{end}}}}}} \right)} {S\left( {i,\mu } \right)} = \mathop {\sum}\limits_{i = 1}^{N_{\mathrm{T}}\left( {t_{{{{\mathrm{end}}}}}} \right)} {\mathop {\int}\limits_0^{t_{{{{\mathrm{end}}}}}} {P_{1,i}} } (\lambda _{\mathrm{T}},\delta _{\mathrm{T}},t_{{{{\mathrm{end}}}}} - t)\mu \lambda _{\mathrm{T}}N_{\mathrm{T}}\left( t \right)dt,$$

(21)

where N_T(t_end) is the number of tumor cells at diagnosis. The number of mutations present in subclones of at least a and at most b cells is, in analogy to equation (21):

$$\mathop {\sum}\limits_{i = a}^b {S\left( {i,\mu } \right)} = \mathop {\sum}\limits_{i = a}^b {\mathop {\int}\limits_0^{t_{{{{\mathrm{end}}}}}} {P_{1,i}} } (\lambda _{\mathrm{T}},\delta _{\mathrm{T}},t_{{{{\mathrm{end}}}}} - t)\mu \lambda _{\mathrm{T}}N_{\mathrm{T}}\left( t \right)dt.$$

(22)

Because clone sizes a and b are large, we approximate the sum in equation (22) by an integral, yielding:

$$\begin{array}{l}\mathop {\sum}\limits_{i = a}^b S \left( {i,\mu } \right) \approx \mathop {\int}\limits_a^b {\mathop {\int}\limits_0^{t_{{{{\mathrm{end}}}}}} {P_{1,i}} } (\lambda _{\mathrm{T}},\delta _{\mathrm{T}},t_{{{{\mathrm{end}}}}} - t)\mu \lambda _{\mathrm{T}}N_{\mathrm{T}}\left( t \right)dtdi \\\qquad\qquad\ = \mathop {\int}\limits_0^{t_{{{{\mathrm{end}}}}}} \mu \lambda _{\mathrm{T}}N_{\mathrm{T}}\left( t \right)\frac{{P_{1,b}\left( {\lambda _{\mathrm{T}},\delta _{\mathrm{T}},t_{{{{\mathrm{end}}}}} - t} \right) - P_{1,a}\left( {\lambda _{\mathrm{T}},\delta _{\mathrm{T}},t_{{{{\mathrm{end}}}}} - t} \right)}}{{\log \beta \left( {t_{{{{\mathrm{end}}}}} - t} \right)}}dt. \end{array}$$

(23)

Relating model and data

To relate the model of mutation accumulation to the measured VAF distribution, we modeled mutations on each copy number state separately. To this end, we denote by $n_{f,k}$ the number of mutations at frequency f on genomic segments with copy number state k. Note that f reports the fraction of mutated cells among all tumor cells whereas VAFs report the fraction of mutated alleles in the tissue sample. The two quantities can be converted into each other using the following relation:

$${{{\mathrm{VAF}}}} = \frac{{f\rho }}{\zeta }.$$

(24)

where ζ is the average copy number of the sample as defined in equation (1). Clonal mutations are associated with frequency f = 1. The number of clonal mutations falling within genomic segments of copy number $k,n_{1,k}$, is expected to scale with g_k, the fraction of the genome at copy number k, according to

$$n_{1,k} = n_{{{{\mathrm{clonal}}}}}\frac{{kg_k}}{{\mathop {\sum}\nolimits_k {kg_k} }}.$$

(25)

To relate true with measured VAFs, we assumed that the latter are binomially distributed around the former according to

$$B\left( {\frac{{f\rho }}{\zeta },C_k} \right),$$

where $C_k \propto {{{\mathrm{Pois}}}}(\widehat {C_k})$ is the sequencing coverage on a segment with copy number k, and ρ is tumor cell content.

Data selection

Because we model mutation accumulation with neutral tumor expansion, we included all tumors with well-defined subclonal tails and absence of subclonal selection. To this end, we visually inspected the VAF histograms and excluded 29 cases with poorly resolved subclonal tails. To identify subclonal selection we ran Mobster¹², excluding mutations on sex chromosomes and computing pseudoheterozygous VAFs, in the following termed $\widetilde {{{{\mathrm{VAF}}}}}$ and defined as 50% of the mutated sample fraction, SF. The measured VAFs relate to SF according to ${{{\mathrm{VAF}}}} = \frac{k}{\zeta }{{{\mathrm{SF}}}}$, where k is the number of alleles carrying the mutation. It follows that $\widetilde {{{{\mathrm{VAF}}}}} = \frac{\zeta }{{2k}}{{{\mathrm{VAF}}}}$. We excluded mutations with $\widetilde {{{{\mathrm{VAF}}}}} < 0.1$ from the fit and ran the Mobster setting autosetup = “FAST”. This resulted in an additional exclusion of nine cases for which Mobster suggested subclonal selection. Thus we selected 62 tumors with well-resolved subclonal VAF histograms and no evidence of subclonal selection for parameter inference (Supplementary Table 1).

Parameter estimation

We estimated $n_{{{{\mathrm{clonal}}}}}$, μ and $\delta _{\mathrm{T}}/\lambda _{\mathrm{T}}$ using ABC–SMC as implemented in pyABC⁵⁶. We used a population size of 1,000 parameter samples and prior distributions as outlined in Supplementary Table 12. Termination criteria were the same as above (Modeling neuroblastoma initiation).

We stratified measured VAFs by copy number, excluding copy numbers >4 or present on, in total, <10⁸ bp. For each copy number k, we merged mutations from the clonal VAF peaks constituted by amplified and non-amplified mutations (see Mutation timing for the definition of amplified and non-amplified clonal mutations). To this end, we first assessed the average coverage $\widehat {C_k}$ from all mutations falling on segments with copy number k. Then, we classified mutations at copy number k as amplified clonal if $\frac{{Q_{l - 1}^{0.95}}}{{\widehat {C_k}}} < {{{\mathrm{VAF}}}}_k \le \frac{{Q_l^{0.95}}}{{\widehat {C_k}}}$, where $Q_l^{0.95}$ is the 95% quantile of a binomial distribution with success probability $\frac{{\rho l}}{\zeta }$ and where l is the B-allele count. These mutations were then merged with those of the non-amplified clonal peak by multiplying their frequencies by $\frac{l}{k}$ and adding them l times.

Finally, we computed the cumulative mutation counts of the measured data, $F_{k,{{{\mathrm{ev}}}}}{{{\mathrm{(f)}}}} = {\sum} {{{{\mathrm{VAF}}}}_k} > f$, where f runs from 0.05 to 1.00 in steps of size 0.05, and extrapolated the cumulative mutation counts to the whole genome with multiplication by $\frac{{\mathop {\sum}\nolimits_k {g_k} }}{{g_k}}$. At a sampled parameter set for $n_{{{{\mathrm{clonal}}}}}$, μ and δ_T, the following steps were performed for each copy number state k included in the analysis:

(1)
Sample for each clonal mutation a sequencing coverage C_k according to ${{{\mathrm{Pois}}}}(\widehat {C_k})$.
(2)
Sample for each clonal mutation a VAF according to equation (16).
(3)
Determine $n_{f,k;f \ne 1}$ from equation (13), assuming a tumor size of 10⁹ cells at diagnosis, and evaluating equation (13) in bins of size 0.05 at the lower limit:
$$n_{f,k;f \ne 1} \approx \mathop {\sum}\limits_{i = f10^9}^{\left( {f + 0.05} \right)10^9} {S\left( {i,\mu } \right)} .$$
(4)
Sample for each subclonal mutation a sequencing coverage C_k according to ${{{\mathrm{Pois}}}}(\widehat {C_k})$ and a VAF according to equation (16).
(5)
Compute the cumulative mutation counts, $F_{k,{{{\mathrm{sim}}}}}{{{\mathrm{(f)}}}} = {\sum} {{{{\mathrm{VAF}}}}_k} > f$, where f runs from 0.05 to 1.00 in steps of size 0.05.

The cost function is

$$d = \mathop {\sum}\limits_k {\mathop {\sum}\limits_f {\left( {F_{k,{{{\mathrm{sim}}}}} - F_{k,{{{\mathrm{ev}}}}}} \right)^2} } \frac{{g_k}}{{\mathop {\sum}\nolimits_{k\prime } {g_{k\prime }} }}.$$

(26)

Real-time estimation of cell division rate

In the previous sections we describe how we inferred the parameters for our models of neuroblastoma initiation and growth from mutation data. This yielded estimates for the rates at which cells are lost, as well as the rates at which neutral and oncogenic mutations are acquired in units of cell divisions. To convert these estimates to real time we estimated the actual cell division rate, using the age distribution at diagnosis and the approximate tumor size at diagnosis.

We reasoned that the time span between gastrulation and tumor diagnosis (t_D) consists of two phases: premalignancy up to the formation of the MRCA and malignant growth of the tumor thereafter. Hence,

$$t_{\mathrm{D}} = \frac{{\tilde m_{{{{\mathrm{MRCA}}}}}}}{{\lambda \mu }} + \frac{{\log N_{\mathrm{T}}(t_{\mathrm{D}})}}{{\lambda \left( {1 - \frac{\delta }{{s\lambda }}} \right)}},$$

(27)

where we assume exponential tumor growth until diagnosis. Thus the cell division rate, λ, can be expressed as

$$\lambda = \frac{1}{{t_{\mathrm{D}}}}\left( {\frac{{\tilde m_{{{{\mathrm{MRCA}}}}}}}{\mu } + \frac{{\log N_{\mathrm{T}}(t_{\mathrm{D}})}}{{1 - \frac{\delta }{{s\lambda }}}}} \right).$$

(28)

To estimate λ with equation (28), we combined the clinical information with parameter estimates from our models as follows: we know t_D (the age at diagnosis, A, plus approximately 250 days of embryogenesis after gastrulation) from the clinical data, and estimate that a tumor of a few cubic centimeters consists of the order of $N_{{{\mathrm{T}}}}(t_{\mathrm{D}}) = 10^9$ cells. As described below (Mutation timing and Modeling neuroblastoma initiation), we also have estimates for mutation density at the MRCA, $\tilde m_{{{{\mathrm{MRCA}}}}}$, and for the SSNV rate per actual cell division and haploid genome, μ. Finally, we obtained an estimate for the effective rate of acquisition of (neutral) SSNVs during tumor growth from the subclonal VAF histograms of 62 tumors (compare with Modeling mutation accumulation during tumor growth), which is related to δ and s as

$$\mu _{{{{\mathrm{eff}}}}} = \frac{\mu }{{1 - \delta _{\mathrm{T}}/\lambda _{\mathrm{T}}}} = \frac{\mu }{{1 - \frac{\delta }{{s\lambda }}}}.$$

(29)

Substituting equation (29) in equation (28), we obtain for each of the 62 tumors (labeled with the index i) an estimate for the division rate with mean, $\left\langle {\lambda _i} \right\rangle = \frac{1}{{2\left\langle \mu \right\rangle (A_i + 250\,{{{\mathrm{days}}}})}}\left( {\left\langle {2\tilde m_{{{{\mathrm{MRCA,i}}}}}} \right\rangle + \log 10^9\left\langle {\mu _{{{{\mathrm{eff,i}}}}}} \right\rangle } \right)$, and s.d. (standard deviation), $\sigma \left( {\lambda _i} \right) = \frac{1}{{2\left\langle \mu \right\rangle (A_i + 250\,{{{\mathrm{days}}}})}}$ $\left( \frac{{2\left\langle {\tilde m_{{{{\mathrm{MRCA,i}}}}}} \right\rangle + \log 10^9\left\langle {\mu _{{{{\mathrm{eff,i}}}}}} \right\rangle }}{{2\mu }}\sigma \right.$ $\left\langle {2\mu } \right\rangle + \sigma \left( {2\tilde m_{{{{\mathrm{MRCA,i}}}}}} \right) + \sigma \left( {\mu _{{{{\mathrm{eff,i}}}}}} \right) \Big)$, in actual time. Note that factor 2 accounts for the fact that μ and $\tilde m_{{{{\mathrm{MRCA,i}}}}}$ measure mutation rate and density, respectively, per haploid genome. From equation (29), we also get an estimate of $\delta _{{\mathrm{T}}{{{\mathrm{,i}}}}}/\lambda _{{\mathrm{T}}{{{\mathrm{,i}}}}}$ with expectation $\left\langle {\frac{{\delta _{{\mathrm{T}}{{{\mathrm{,i}}}}}}}{{\lambda _{{\mathrm{T}}{{{\mathrm{,i}}}}}}}} \right\rangle = 1 - \frac{{2\left\langle \mu \right\rangle }}{{\left\langle {\mu _{{{{\mathrm{eff,i}}}}}} \right\rangle }}$ and s.d. ${{{{\sigma }}}}\left( {\delta _{{\mathrm{T}}{{{\mathrm{,i}}}}}/\lambda _{{\mathrm{T}}{{{\mathrm{,i}}}}}} \right) = \frac{{\sigma \left\{ {2\mu } \right\}}}{{\left\langle {\mu _{{{{\mathrm{eff,i}}}}}} \right\rangle }} + \frac{{2\left\langle \mu \right\rangle }}{{\left( {\left\langle {\mu _{{{{\mathrm{eff,i}}}}}} \right\rangle } \right)^2}}{{{{\sigma }}}}\left( {\mu _{{{{\mathrm{eff,i}}}}}} \right)$. Fitting effective mutation rates to the VAF histograms of each of the selected 62 tumors individually hence yields tumor-specific division rates along with relative death rates, which we stratify by TMM.

Finally, one also obtains an estimate for the daily mutation rate during tumor initiation by computing $\mu \lambda _{{{{\mathrm{T}}}},{{{\mathrm{i}}}}}$ with associated uncertainty $\mu {{\Delta }}\lambda _{{{{\mathrm{T}}}},{{{\mathrm{i}}}}} + \lambda _{{{{\mathrm{T}}}},{{{\mathrm{i}}}}}\sigma (\mu )$, which relates molecular clock to real time. For this purpose, we average across the inferences from the 44 primary tumors/metastasis, excluding 18 relapsed tumors among the 62 submitted for analysis.

Statistics and reproducibility

All statistical tests were computed with R (v.3.6.0 and v.4.0.0), and details (statistical tests and whether one- or two-sided, exact sample size, P values and test statistics) are specified in the respective figures and accompanying Source data. This is a retrospective analysis of tumor material that was collected as part of the diagnostic workflow of the German Neuroblastoma trial by the Society for Pediatric Oncology and Hematology and collected in the Neuroblastoma tumor bank. Hence, no statistical method was used to predetermine sample size. All analyzed tumors had a clear copy number profile, tumor cell content ≥25% and no hypermutation genotype; no data were excluded from the analyses. The experiments were not randomized. The investigators were not blinded to allocation during experiments and outcome assessment.

Software and packages

Analysis was performed with R (v.3.6.0 and v.4.0.0) and python v.3.6.1. We used the following R packages: openxlsx v.4.1.5, ggsignif v.0.6.0, ggbeeswarm v.0.6.0, gridExtra v.2.3, RColorBrewer v.1.1-2, HDInterval v.0.2.2, cdata v.1.1.8, moments v.0.14, Hmisc v.4.4-0, scales v.1.1.1, bedr v.1.0.7, circlize v.0.4.10, ggplot2 v.3.3.2 (ref. ⁵⁹), Bioconductor v.3.15 (ref. ⁶⁰), ggbio v.1.34.0 (ref. ⁶¹), ggpubr v.0.4.0, pammtools v.0.2.2 (ref. ⁶²), ComplexHeatmap v.2.5.1 (ref. ⁶³), BSgenome.Hsapiens.UCSC.hg19 v.1.4.3, MASS v.7.3-51.6 (ref. ⁶⁴), GenomicRanges v.1.38.0 (ref. ⁶⁵), reshape2 v.1.4.4 (ref. ⁶⁶), mixtools v.1.2.0 (ref. ⁶⁷), dplyr v.1.0.0, survminer v.0.4.8, survival⁶⁸ v.3.1-12, wesanderson v.0.3.6, cowplot v.1.1.1, mobster¹² v.1.0.0, CNAqc v.1.0.0 and mmsig⁵¹ v.0.0.0.9000; and the python packages SigProfilerMatrixGenerator⁶⁹ v.1.1.26, SigProfilerExtractor⁵⁰ v.1.1.1 and pyABC⁵⁶ v.0.9.13.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability

Subsets of WGS and RNA sequencing data were part of previously published studies^24,26. Data from these studies are deposited at the European Genome-Phenome Archive (https://www.ebi.ac.uk/ega/) under accession nos. EGAS00001004349 and EGAS00001001308. Additional WGS data generated for this study are available at the European Genome-Phenome Archive under accession nos. EGAS00001004990 and EGAS00001006533. In accordance with the laws of data protection, data are deposited under controlled access. Access can be granted by contacting Frank Westermann (f.westermann@kitz-heidelberg.de) and requires a data access agreement; requests will be replied to within 4 weeks. Variant calls (SNVs, indels, SVs and copy number variations), mutational signatures, model fits and a summary of the mutation profile and modeling results for each tumor can be accessed at Mendeley (https://doi.org/10.17632/m9pwjbm7c8.1)⁷⁰. All remaining data are available in the Supplementary information. Source data are provided with this paper.

Code availability

All code developed for this study is available at https://github.com/hoefer-lab/Neuroblastoma_evolution⁷¹.

References

Stratton, M. R., Campbell, P. J. & Futreal, P. A. The cancer genome. Nature 458, 719–724 (2009).
Article CAS PubMed PubMed Central Google Scholar
Bozic, I. & Wu, C. J. Delineating the evolutionary dynamics of cancer from theory to reality. Nat. Cancer 1, 580–588 (2020).
Article PubMed Google Scholar
Nordling, C. A new theory on the cancer-inducing mechanism. Br. J. Cancer 7, 68–72 (1953).
Article PubMed PubMed Central Google Scholar
Armitage, P. & Doll, R. A two-stage theory of carcinogenesis in relation to the age distribution of human cancer. Br. J. Cancer 11, 161–169 (1957).
Article PubMed PubMed Central Google Scholar
Singer, J., Kuipers, J., Jahn, K. & Beerenwinkel, N. Single-cell mutation identification via phylogenetic inference. Nat. Commun. 9, 5144 (2018).
Article PubMed PubMed Central Google Scholar
Gerstung, M. et al. The evolutionary history of 2,658 cancers. Nature 578, 122–128 (2020).
Article CAS PubMed PubMed Central Google Scholar
Körber, V. et al. Evolutionary trajectories of IDHWT glioblastomas reveal a common path of early tumorigenesis instigated years ahead of initial diagnosis. Cancer Cell 35, 692–704 (2019).
Article PubMed Google Scholar
Yates, L. R. et al. Subclonal diversification of primary breast cancer revealed by multiregion sequencing. Nat. Med. 21, 751–759 (2015).
Article CAS PubMed PubMed Central Google Scholar
Rustad, E. H. et al. Timing the initiation of multiple myeloma. Nat. Commun. 11, 1917 (2020).
Article CAS PubMed PubMed Central Google Scholar
Graham, T. A. & Sottoriva, A. Measuring cancer evolution from the genome. J. Pathol. 241, 183–191 (2017).
Article PubMed Google Scholar
Williams, M. J. et al. Quantification of subclonal selection in cancer from bulk sequencing data. Nat. Genet. 50, 895–903 (2018).
Article CAS PubMed PubMed Central Google Scholar
Caravagna, G. et al. Subclonal reconstruction of tumors by using machine learning and population genetics. Nat. Genet. 52, 898–907 (2020).
Article CAS PubMed PubMed Central Google Scholar
Bozic, I., Gerold, J. M. & Nowak, M. A. Quantifying clonal and subclonal passenger mutations in cancer evolution. PLoS Comput. Biol. 12, e1004731 (2016).
Article PubMed PubMed Central Google Scholar
Coorens, T. H. et al. Embryonal precursors of Wilms tumor. Science 366, 1247–1251 (2019).
Article CAS PubMed PubMed Central Google Scholar
Kildisiute, G. et al. Tumor to normal single-cell mRNA comparisons reveal a pan-neuroblastoma cancer cell. Sci. Adv. 7, eabd3311 (2021).
Article CAS PubMed PubMed Central Google Scholar
Cohn, S. L. et al. The International Neuroblastoma Risk Group (INRG) classification system: an INRG task force report. J. Clin. Oncol. 27, 289–297 (2009).
Article PubMed PubMed Central Google Scholar
Sokol, E. & Desai, A. V. The evolution of risk classification for neuroblastoma. Children 6, 27 (2019).
Article PubMed PubMed Central Google Scholar
Depuydt, P. et al. Meta-mining of copy number profiles of high-risk neuroblastoma tumors. Sci. Data 5, 180240 (2018).
Article CAS PubMed PubMed Central Google Scholar
Attiyeh, E. F. et al. Chromosome 1p and 11q deletions and outcome in neuroblastoma. N. Engl. J. Med. 353, 2243–2253 (2005).
Article CAS PubMed Google Scholar
Vandesompele, J. et al. Multicentre analysis of patterns of DNA gains and losses in 204 neuroblastoma tumors: how many genetic subgroups are there? Med. Pediatr. Oncol. 36, 5–10 (2001).
Article CAS PubMed Google Scholar
Chen, Q.-R. et al. cDNA array-CGH profiling identifies genomic alterations specific to stage and MYCN-amplification in neuroblastoma. BMC Genomics 5, 70 (2004).
Article PubMed PubMed Central Google Scholar
Szewczyk, K. et al. Unfavorable outcome of neuroblastoma in patients with 2p gain. Front. Oncol. 9, 1018 (2019).
Article PubMed PubMed Central Google Scholar
Pugh, T. J. et al. The genetic landscape of high-risk neuroblastoma. Nat. Genet. 45, 279–284 (2013).
Article CAS PubMed PubMed Central Google Scholar
Hartlieb, S. A. et al. Alternative lengthening of telomeres in childhood neuroblastoma from genome to proteome. Nat. Commun. 12, 1269 (2021).
Article CAS PubMed PubMed Central Google Scholar
Ackermann, S. et al. A mechanistic classification of clinical phenotypes in neuroblastoma. Science 362, 1165–1170 (2018).
Article CAS PubMed PubMed Central Google Scholar
Peifer, M. et al. Telomerase activation by genomic rearrangements in high-risk neuroblastoma. Nature 526, 700–704 (2015).
Article CAS PubMed PubMed Central Google Scholar
Pilati, C. et al. Mutational signature analysis identifies MUTYH deficiency in colorectal cancers and adrenocortical carcinomas. J. Pathol. 242, 10–15 (2017).
Article CAS PubMed Google Scholar
Brady, S. W. et al. Pan-neuroblastoma analysis reveals age-and signature-associated driver alterations. Nat. Commun. 11, 5183 (2020).
Article CAS PubMed PubMed Central Google Scholar
Durinck, S. et al. Temporal dissection of tumorigenesis in primary cancers. Cancer Discov. 1, 137–143 (2011).
Article CAS PubMed PubMed Central Google Scholar
Janoueix-Lerosey, I. et al. Overall genomic pattern is a predictor of outcome in neuroblastoma. J. Clin. Oncol. 27, 1026–1033 (2009).
Article PubMed Google Scholar
Woods, W. G. et al. Screening for neuroblastoma in North America. 2-year results from the Quebec Project. Am. J. Pediatr. Hematol. Oncol. 14, 312–319 (1992).
Article CAS PubMed Google Scholar
Brodeur, G. M. Spontaneous regression of neuroblastoma. Cell Tissue Res. 372, 277–286 (2018).
Article CAS PubMed PubMed Central Google Scholar
Schilling, F. H. et al. Neuroblastoma screening at one year of age. N. Engl. J. Med. 346, 1047–1053 (2002).
Article PubMed Google Scholar
Bessho, F. Effects of mass screening on age‐specific incidence of neuroblastoma. Int. J. Cancer 67, 520–522 (1996).
Article CAS PubMed Google Scholar
Bae, T. et al. Different mutational rates and mechanisms in human cells at pregastrulation and neurogenesis. Science 359, 550–555 (2018).
Article CAS PubMed Google Scholar
Jansky, S. et al. Single-cell transcriptomic analyses provide insights into the developmental origins of neuroblastoma. Nat. Genet. 53, 683–693 (2021).
Article CAS PubMed Google Scholar
Williams, M. J., Werner, B., Barnes, C. P., Graham, T. A. & Sottoriva, A. Identification of neutral tumor evolution across cancer types. Nat. Genet. 48, 238–244 (2016).
Article CAS PubMed PubMed Central Google Scholar
Aherne, W. & Buck, P. The potential cell population doubling time in neuroblastoma and nephroblastoma. Br. J. Cancer 25, 691–696 (1971).
Article CAS PubMed PubMed Central Google Scholar
Kaneko, Y. & Knudson, A. G. Mechanism and relevance of ploidy in neuroblastoma. Genes Chromosomes Cancer 29, 89–95 (2000).
Article CAS PubMed Google Scholar
Turajlic, S., Sottoriva, A., Graham, T. & Swanton, C. Resolving genetic heterogeneity in cancer. Nat. Rev. Genet. 20, 404–416 (2019).
Article CAS PubMed Google Scholar
Schmelz, K. et al. Spatial and temporal intratumour heterogeneity has potential consequences for single biopsy-based neuroblastoma treatment decisions. Nat. Commun. 12, 6804 (2021).
Article CAS PubMed PubMed Central Google Scholar
Andersson, N. et al. Extensive clonal branching shapes the evolutionary history of high-risk pediatric cancers. Cancer Res. 80, 1512–1523 (2020).
Article CAS PubMed Google Scholar
Karlsson, J. et al. Four evolutionary trajectories underlie genetic intratumoral variation in childhood cancer. Nat. Genet. 50, 944–950 (2018).
Article CAS PubMed Google Scholar
Venkatachalam, S. et al. Retention of wild-type p53 in tumors from p53 heterozygous mice: reduction of p53 dosage can promote cancer formation. EMBO J. 17, 4657–4667 (1998).
Article CAS PubMed PubMed Central Google Scholar
Seifert, H. et al. The prognostic impact of 17p (p53) deletion in 2272 adults with acute myeloid leukemia. Leukemia 23, 656–663 (2009).
Article CAS PubMed Google Scholar
Pinto, N. R. et al. Advances in risk classification and treatment strategies for neuroblastoma. J. Clin. Oncol. 33, 3008–3017 (2015).
Article PubMed PubMed Central Google Scholar
Worst, B. C. et al. Next-generation personalised medicine for high-risk paediatric cancer patients–the INFORM pilot study. Eur. J. Cancer 65, 91–101 (2016).
Article PubMed Google Scholar
Reisinger, E. et al. OTP: an automatized system for managing and processing NGS data. J. Biotechnol. 261, 53–62 (2017).
Article CAS PubMed Google Scholar
Alexandrov, L. B. et al. The repertoire of mutational signatures in human cancer. Nature 578, 94–101 (2020).
Article CAS PubMed PubMed Central Google Scholar
Islam, S. A. et al. Uncovering novel mutational signatures by de novo extraction with SigProfilerExtractor. Cell Genomics 2, 100179 (2022).
Article CAS PubMed PubMed Central Google Scholar
Rustad, E. H. et al. mmsig: a fitting approach to accurately identify somatic mutational signatures in hematological malignancies. Commun. Biol. 4, 424 (2021).
Article CAS PubMed PubMed Central Google Scholar
Bolkestein, M. et al. Chromothripsis in human breast cancer. Cancer Res. 80, 4918–4931 (2020).
Article CAS PubMed Google Scholar
Martincorena, I. et al. Universal patterns of selection in cancer and somatic tissues. Cell 171, 1029–1041 (2017).
Article CAS PubMed PubMed Central Google Scholar
Therneau, T. M. Extending the Cox model. In Proc. First Seattle Symposium in Biostatistics (eds Lin, D. Y & Fleming, T. R.) 51–84 (Springer, 1997).
Durrett, R., Schmidt, D. & Schweinsberg, J. A waiting time problem arising from the study of multi-stage carcinogenesis. Ann. Appl. Probab. 19, 676–718 (2009).
Article Google Scholar
Klinger, E., Rickert, D. & Hasenauer, J. pyABC: distributed, likelihood-free inference. Bioinformatics 34, 3591–3593 (2018).
Article CAS PubMed Google Scholar
Ohtsuki, H. & Innan, H. Forward and backward evolutionary processes and allele frequency spectrum in a cancer cell population. Theor. Popul. Biol. 117, 43–50 (2017).
Article PubMed Google Scholar
Bailey, N. T. The Elements of Stochastic Processes with Applications to the Natural Sciences (John Wiley & Sons, 1991).
Wickham, H. ggplot2: Elegant Graphics for Data Analysis (Springer, 2009).
Huber, W. et al. Orchestrating high-throughput genomic analysis with Bioconductor. Nat. Methods 12, 115–121 (2015).
Article CAS PubMed PubMed Central Google Scholar
Yin, T., Cook, D. & Lawrence, M. ggbio: an R package for extending the grammar of graphics for genomic data. Genome Biol. 13, R77 (2012).
Article PubMed PubMed Central Google Scholar
Bender, A., Scheipl, F., Hartl, W., Day, A. G. & Küchenhoff, H. Penalized estimation of complex, non-linear exposure-lag-response associations. Biostatistics 20, 315–331 (2019).
Article PubMed Google Scholar
Gu, Z., Eils, R. & Schlesner, M. Complex heatmaps reveal patterns and correlations in multidimensional genomic data. Bioinformatics 32, 2847–2849 (2016).
Article CAS PubMed Google Scholar
Venables, W. N. & Ripley, B. D. in Modern Applied Statistics with S 271–300 (Springer, 2002).
Lawrence, M. et al. Software for computing and annotating genomic ranges. PLoS Comput. Biol. 9, e1003118 (2013).
Article CAS PubMed PubMed Central Google Scholar
Wickham, H. Reshaping data with the reshape package. J. Stat. Softw. 21, 1–20 (2007).
Article Google Scholar
Tatiana, B., Didier, C., David, R. & Derek, Y. mixtools: an R package for analyzing finite mixture models. J. Stat. Softw. 32, 1–29 (2009).
Google Scholar
Therneau, T. M. & Grambsch, P. M. in Modeling Survival Data: Extending the Cox Model 39–77 (Springer, 2000).
Bergstrom, E. N. et al. SigProfilerMatrixGenerator: a tool for visualizing and exploring patterns of small mutational events. BMC Genomics 20, 685 (2019).
Article PubMed PubMed Central Google Scholar
Körber, V., Westermann, F. & Höfer, T. Neuroblastoma evolution. Mendeley data. https://doi.org/10.17632/m9pwjbm7c8.1 (2022).
Körber, V. hoefer-lab/Neuroblastoma_evolution: neuroblastoma evolution. Zenodo https://doi.org/10.5281/zenodo.7437449 (2022).

Download references

Acknowledgements

We thank the members of the Höfer and Westermann groups for discussions. T.H. and F.W. are supported through ERA-CoSysmed consortium INFER-NB (Federal Ministry for Education and Research, grant agreement no. 031L0238) and DKFZ core funding. We thank the NCT Molecular Precision Oncology Program for technical support and funding through HIPO2-K09R.

Funding

Open access funding provided by Deutsches Krebsforschungszentrum (DKFZ).

Author information

Authors and Affiliations

Division of Theoretical Systems Biology, German Cancer Research Center, Heidelberg, Germany
Verena Körber & Thomas Höfer
Hopp Children’s Cancer Center, Heidelberg, Germany
Sabine A. Stainczyk, Kai-Oliver Henrich & Frank Westermann
Division of Neuroblastoma Genomics, German Cancer Research Center, Heidelberg, Germany
Sabine A. Stainczyk, Kai-Oliver Henrich & Frank Westermann
Division of Applied Bioinformatics, German Cancer Research Center, Heidelberg, Germany
Roma Kurilov & Benedikt Brors
Department of Pediatric Oncology and Hematology, University Children’s Hospital of Cologne, Medical Faculty, Cologne, Germany
Barbara Hero

Authors

Verena Körber
View author publications
You can also search for this author in PubMed Google Scholar
Sabine A. Stainczyk
View author publications
You can also search for this author in PubMed Google Scholar
Roma Kurilov
View author publications
You can also search for this author in PubMed Google Scholar
Kai-Oliver Henrich
View author publications
You can also search for this author in PubMed Google Scholar
Barbara Hero
View author publications
You can also search for this author in PubMed Google Scholar
Benedikt Brors
View author publications
You can also search for this author in PubMed Google Scholar
Frank Westermann
View author publications
You can also search for this author in PubMed Google Scholar
Thomas Höfer
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

V.K., F.W. and T.H. conceived the project. S.A.S., K.-O.H., B.H. and F.W. provided primary sequencing data and integrated these with clinical information. V.K. analyzed the data, developed mathematical models and performed all computations. B.B. and R.K. contributed to survival analysis. T.H. supervised the study. T.H. and V.K. wrote the manuscript, with input from all authors.

Corresponding authors

Correspondence to Frank Westermann or Thomas Höfer.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Genetics thanks Giulio Caravagna, Anders Valind and Angelika Eggert for their contribution to the peer review of this work. Peer reviewer reports are available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Copy number changes and mutational signatures in primary and relapsed neuroblastomas of the discovery cohort.

a, Age distribution at tumor diagnosis. b, Ploidy estimates based on WGS (inferred with ACEseq) and on DNA-index measurements (as measured by flow cytometry). Shown are 71 primary and relapsed tumors for which DNA-index was determined. c, Distribution of neuroblastoma stages among rounded ploidies. d, Overview of gains and losses across the tumor cohort. e, Comparison of mutational signatures (Cosmic v3) contributing to all SSNVs and to clonal and subclonal SSNVs (n = 100 primary and relapse tumors from the discovery cohort; Pearson’s correlation coefficients and two-sided P values are shown for each comparison).

Source data

Extended Data Fig. 2 Mutation densities at chromosomal gains in the discovery cohort.

a, Mutation densities of non-amplified clonal mutations per genomic segment, stratified by copy number for tumor NBE15. Significance was tested with a two-sided Wilcoxon rank sum test/Mann–Whitney U-test (disomic: n = 15; trisomic: n = 6; tetrasomic: n = 3). Boxes show median, 25% and 75% percentiles, whiskers extend to the smallest and largest value within 1.5x interquartile range. b, Estimated mutation densities at ECA and MRCA of primary tumors (ECA: n = 26; MRCA: n = 60), primary metastases (ECA: n = 2; MRCA: n = 7) and relapsed tumors/metastases (ECA: n = 30; MRCA: n = 33). Significance was tested with a two-sided Wilcoxon rank sum test/Mann–Whitney U-test and defining ** as P < 0.01 (exact P values are given in Source Data). c, Model scheme for neuroblastoma relapse corresponding to data in (b). d and e, Exposures of mutational signatures among clonal (d) and subclonal SSNVs (e). Signatures SBS1, SBS5 and SBS40 were combined into a single clock-like mutational signature. Bar heights correspond to the average among early-MRCA (top; n = 20 primary tumors and metastases) and late-MRCA tumors (bottom; n = 47 primary tumors and metastases); error bars show standard error of the mean.

Source data

Extended Data Fig. 3 Examples of early evolution.

a, Copy number profiles of an example tumor with near-triploid genome and without a timeable ECA. The color panels annotate genomic segments of equal copy number. Next to the copy number profile, the densities of non-amplified (green) and amplified (color-encoded) clonal mutations on segment of length ≥10⁷bp are shown, with the dashed line showing the kernel-density estimate of the distribution of non-amplified clonal mutations. Chromosomal segments of equal copy number were merged and Holm-corrected one-sided P values were computed based on a negative binomial distribution (**, p_adj < 0.01, exact P values are provided in Source Data). The bottom panel shows mutation densities at ECA and MRCA with 95% confidence bounds estimated by bootstrapping. The horizontal lines show mutation densities on gained segments with 95% confidence bounds computed from χ² distributions. b, Mutation densities at ECA and MRCA of early-MRCA tumors with (left) and without (right) timeable ECA. Solid lines represent maximum likelihood estimates and shaded areas represent 95% confidence intervals, as obtained by bootstrapping. c and d, As in (a) but for a near-diploid tumor with timeable ECA (c) and for a near-tetraploid tumor without timeable ECA (d). Holm-corrected one-sided P values were computed based on a negative binomial distribution (**, p_adj < 0.01, exact P values are provided in Source Data). e, Mutation densities at MRCA in early-MRCA (n = 20) and late-MRCA primary tumors (n = 47, thereof n = 26 with ECA and n = 21 without ECA). f, Fraction of polysomic chromosomes in near-triploid (n = 33) and near-tetraploid (n = 12) tumors that were generated in a single oncogenic event. As a control, the fraction of disomic chromosome whose mutation density matched with the MRCA is shown for near-diploid tumors (n = 55). In (e) and (f), boxes show median, 25 and 75% percentiles, whiskers extend to the smallest and largest value within 1.5x interquartile range.

Source data

Extended Data Fig. 4 Analysis of overall survival.

Multivariate analysis of overall survival with Cox-regression considering MRCA timing, acquired mechanisms of telomere maintenance, stage, age at diagnosis and mutation status in the RAS/P53 pathway. Shown are mean hazard ratio, 95% confidence intervals and p-values for each variable (two-sided Wald test).

Source data

Extended Data Fig. 5 Neuroblastoma initiation during progenitor expansion.

a, Two-dimensional projections of the posterior probability distribution of the model parameters (considering neuroblastoma initiation in a transient population of early neuroblasts). b, Predicted expansion of neuroblasts (grey) and the selected subclone upon acquisition of the first oncogenic event (dark green) in a model of transient neuroblast expansion. Colored areas show the 95% posterior probability bounds (estimated from simulations using 1,000 samples from the posterior probability distribution). Vertical line and shaded area give mean and 95% CI for the estimated time of birth (computed from n = 62 primary tumors). c, Comparison of parameter estimates if fitting the model to all SSNVs or to SSNVs generated by a clock-like mutational process only. For the latter, mutant densities were adjusted by the fraction of SSNVs explained by SBS1, SBS5 or SBS40. For each parameter, median and 80% credible intervals are shown (estimated from n = 1000 samples of the posterior distribution). d, Comparison between the estimated time point at which the MRCA (left, n = 95 primary tumors/metastases with TMM) and the ECA (right, n = 47 primary tumors/metastases with TMM) emerged if using all SSNVs or if using SSNVs that were generated by a clock-like process only. Time is measured in weeks post conceptionem (p.c.).

Source data

Supplementary information

Supplementary Information

Supplementary notes 1 and 2 and tables 11 and 12.

Reporting summary.

Peer review file.

Supplementary Table 1

Patient characteristics and driver genes in discovery and validation cohorts.

Source data

Source Data Fig. 1

Statistical source data.

Source Data Fig. 2

Statistical source data.

Source Data Fig. 3

Statistical source data.

Source Data Fig. 4

Statistical source data.

Source Data Fig. 5

Statistical source data.

Source Data Fig. 6

Statistical source data

Source Data Fig. 8

Statistical source data.

Source Data Extended Data Fig. 1

Statistical source data.

Source Data Extended Data Fig. 2

Statistical source data.

Source Data Extended Data Fig. 3

Statistical source data.

Source Data Extended Data Fig. 4

Statistical source data.

Source Data Extended Data Fig. 5

Statistical source data.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Körber, V., Stainczyk, S.A., Kurilov, R. et al. Neuroblastoma arises in early fetal development and its evolutionary duration predicts outcome. Nat Genet 55, 619–630 (2023). https://doi.org/10.1038/s41588-023-01332-y

Download citation

Received: 04 July 2022
Accepted: 06 February 2023
Published: 27 March 2023
Issue Date: April 2023
DOI: https://doi.org/10.1038/s41588-023-01332-y

This article is cited by

Altered methylation of imprinted genes in neuroblastoma: implications for prognostic refinement
- Medha Suman
- Maja Löfgren
- Helena Carén
Journal of Translational Medicine (2024)
Evolving copy number gains promote tumor expansion and bolster mutational diversification
- Zicheng Wang
- Yunong Xia
- Ruping Sun
Nature Communications (2024)
Joint single-cell genetic and transcriptomic analysis reveal pre-malignant SCP-like subclones in human neuroblastoma
- Thale K. Olsen
- Jörg Otte
- Ninib Baryawno
Molecular Cancer (2024)
A human neural crest model reveals the developmental impact of neuroblastoma-associated chromosomal aberrations
- Ingrid M. Saldana-Guerrero
- Luis F. Montano-Gutierrez
- Florian Halbritter
Nature Communications (2024)
Mathematical modeling of neuroblastoma associates evolutionary patterns with outcomes
- Giulio Caravagna
Nature Genetics (2023)

Subjects

Abstract

Similar content being viewed by others

Main

Results

Mutation patterns in a comprehensive neuroblastoma cohort

Neutral SSNVs distinguish sequential driver events

Two evolutionary classes of neuroblastoma

Long evolution of neuroblastoma predicts unfavorable outcome

Genomic instability and telomere maintenance in late-MRCA tumors

Chromosomal gains occur during sympathetic neurogenesis

Discussion

Methods

Patient cohorts, tumor samples and ethical approval

WGS

Mutational signatures

Mutation timing

Estimation of numbers of amplified and non-amplified clonal mutations

Mutation timing

Timing MRCA and ECA

Mutation timing

Translation of mutation densities into estimated weeks p.c

Survival analysis

Modeling neuroblastoma initiation

Modeling emergence of the MRCA

Modeling the ECA

Relating model and data

Parameter estimation

Modeling mutation accumulation during tumor growth

Model

Relating model and data

Data selection

Parameter estimation

Real-time estimation of cell division rate

Statistics and reproducibility

Software and packages

Reporting summary

Data availability

Code availability

References

Acknowledgements

Funding

Author information

Authors and Affiliations

Contributions

Corresponding authors

Ethics declarations

Competing interests

Peer review

Peer review information

Additional information

Extended data

Supplementary information

Source data

Rights and permissions

About this article

Cite this article

Share this article

This article is cited by

Search

Quick links