The utility of cancer whole genome and transcriptome sequencing (cWGTS) in oncology is increasingly recognized. However, implementation of cWGTS is challenged by the need to deliver results within clinically relevant timeframes, concerns about assay sensitivity, reporting and prioritization of findings. In a prospective research study we develop a workflow that reports comprehensive cWGTS results in 9 days. Comparison of cWGTS to diagnostic panel assays demonstrates the potential of cWGTS to capture all clinically reported mutations with comparable sensitivity in a single workflow. Benchmarking identifies a minimum of 80× as optimal depth for clinical WGS sequencing. Integration of germline, somatic DNA and RNA-seq data enable data-driven variant prioritization and reporting, with oncogenic findings reported in 54% more patients than standard of care. These results establish key technical considerations for the implementation of cWGTS as an integrated test in clinical oncology.
Cancer is caused by the accumulation of somatic variants, including point mutations, structural variants (SVs), and copy number alterations (CNAs) that drive oncogenesis, disease progression, and in some cases define therapeutic vulnerabilities. The introduction of next-generation sequencing (NGS)-based targeted gene-panel assays has aided disease diagnosis, guided care, and improved patient outcomes through refinement of treatment options1,2,3,4,5. However, targeted panels are optimized to assess clinical biomarkers in common cancers1,2,4. Recent studies showed the utility of whole-exome sequencing (WES) in identifying coding mutations in rare cancer genes6,7. However, for patients with pediatric or rare cancers that have low mutation burden, distinctive methylation profiles8,9 are primarily driven by SVs and fusion genes, panel/WES tests fail to identify a clinical biomarker in most cases1,2,3,4,7. This underscores an unmet need for better diagnostic workflows to guide clinical management.
Cancer whole-genome and transcriptome sequencing (cWGTS) offers the opportunity to assess the full spectrum of germline and somatically acquired mutations, SVs and CNAs, along with quantification of tumor mutation burden (TMB) and genome-wide mutational patterns10. The likely clinical utility of cWGTS in pediatric and rare cancers is increasingly evidenced in recent literature7,11,12,13,14,15,16. However, clinical implementation of WGS in oncology is challenged by cost of sequencing, complexity of laboratory, and analytical workflows to process large-scale data within clinically relevant timeframes, concerns about the sensitivity of low-coverage WGS in detecting actionable mutations captured by high-depth panel assays, and the interpretability of cWGTS findings with regard to clinical utility3. Here, we demonstrate the feasibility, analytical validity, and resolve critical technical considerations for the implementation of cWGTS in primary cancer care in the context of pediatric, adolescent, and young adult solid tumor patients with rare cancers.
The study cohort included patients presenting with primary diagnostic or relapse/refractory disease. Of 201 patient fresh frozen (FF) tumors nominated for paired cancer/normal whole-genome and transcriptome sequencing (cWGTS), 58 were excluded upon pathology review and 29 did not meet our requirement for >20% tumor purity as assessed by WGS. The majority of the excluded cases were post-therapy neuroblastoma and sarcoma samples with a predominance of necrotic disease. The final cohort included a single sample each from 114 pediatric, adolescent, and young adult patients (median age = 12.6 years, range: 4.5 months to 43.8 years) with solid tumors (Supplementary Tables 1, 2, Supplementary Fig. 1a–d).
Implementation of a cWGTS workflow for clinical decision support
To prototype a clinical cWGTS workflow, we developed an end-to-end process (Fig. 1a) that included dedicated: 1. project-management team, 2. lab operators for sample processing, 3. sequencing machines for cWGTS, 4. data-import channel, 5. Biosciences platform for automated deployment of analysis pipelines and API integration with institutional and public databases17, 6. reserved computing nodes in a high-performance computing environment, and 7. systematic pipeline for prioritization and reporting of genomic findings (Fig. 1a, Supplementary Fig. 1e). We quantified the end-to-end time from sample acquisition to the generation of an automated report. Time logs were audited starting at the time of surgical biopsy submission to report delivery for review by an interdisciplinary molecular tumor board for samples in our study with audit trails recorded through our biosciences platform (Supplementary Fig. 1e). End-to-end, this workflow was executed on average in 17 days during the developmental phase (range: 11–29, n = 59 samples), reaching a fully optimized workflow with a final turnaround of 9 days (Fig. 1b, n = 16). This is shorter than the standard turnaround time (TAT) for many clinical NGS-panel sequencing tests (2–4 weeks)1,2,4 markedly faster than the majority of WGS-processing timeframes in literature (3–8 weeks, Fig. 1b)11,12,13,14,15,16 and comparable to the TAT achieved by centralized infrastructures of scale such as the Hartwig Foundation18. This demonstrates the feasibility of implementing cWGTS profiling to support diagnosis and treatment decisions with a clinically relevant turnaround time within a comprehensive cancer care center.
Comprehensive genome characterization utilizing cWGTS
Across all mutational classes, cWGTS identified on average 7353 acquired mutations per sample, including cancer-associated alterations in 99% (n = 114) of patients (Supplementary Fig. 2). These include CNAs (n = 105 patients), germline predisposition (n = 17), mutations in cancer-associated genes (n = 77), translocations/fusion transcripts (n = 27), disease-associated SVs (n = 75), and outlier TMB or microsatellite-instability (MSI) scores (n = 7) (Supplementary Table 3). Further signals of interest included the delineation of mutation signatures19, detection of chromothripsis20 or whole- genome duplication (WGD)21, cancer-associated viral sequences (i.e., EBV)22,23, estimation of telomere length24, and gene expression signatures. SVs, most of which can only be detected using WGS, represented the third most frequent class of genomic alterations.
Concordance analysis of cWGTS to targeted DNA- and RNA-panel tests
Within our cohort, targeted DNA profiling of corresponding formalin-fixed paraffin-embedded (FFPE) biopsies by MSK-IMPACT4 detected actionable biomarkers as defined by OncoKb Levels 1–425 in 24% of patients (n = 27) (Fig. 2a–c, Supplementary Table 4). Consistent with prior findings demonstrating that patients with rare cancers do not yield clinically relevant biomarkers by panel sequencing4, most patients in our cohort (76%, n = 87) had no therapy-informing alterations. These results are representative of the expanded pediatric/young-adult patient population at MSK (Supplementary Fig. 3a, b).
We first assessed whether mutations captured by MSK-IMPACT were also detected by WGS. For all discordant samples, we performed MSK-IMPACT on the same DNA aliquots used to generate the WGS libraries. This allowed us to ascertain whether discrepant calls were owing to differences in assay sensitivity (MSK-IMPACT and WGS) or a consequence of intratumor heterogeneity (ITH)26.
Of 221 somatic mutations reported by MSK-IMPACT, 174 (79%) were called in WGS (Fig. 2d, e). This includes 68/83 (82%) mutations reported by MSK-IMPACT as oncogenic25 (Supplementary Fig. 3c). Variants called by both assays ranged from 5% to 97% variant allele frequency (VAF) with high concordance (r2 = 0.75) in VAF estimates (Fig. 2d). The majority of discordant mutations (46/47) were subclonal in MSK-IMPACT (<90% of cancer cell fraction) and 15 were classified as oncogenic (Supplementary Table 4, Supplementary Fig. 3c). Discordant mutations presented with a broad range of VAF (range: 2.2–39%, median = 8.5%) (Fig. 2d) and showed no systematic bias in effective coverage (Supplementary Fig. 3d). The 47 discordant mutations were confined to 26 samples (range 1–7 mutations per patient). Targeted resequencing of the WGS libraries by MSK-IMPACT was performed for 44 discordant variants and none were called, despite a median local depth of sequencing at 469x, supporting ITH as the basis of the discrepancies (Supplementary Fig. 3e). Further corroborating ITH, WGS and targeted resequencing data of the same FF DNA aliquots identified 10 mutations (VAF: 6–31%), which were not reported by MSK-IMPACT from the patient-matched FFPE sample. Three of the 10 additional calls were cancer-associated variants (TP53 L265Yfs*81, PPM1D S468*, and HLA-A L102Hfs*73) (Supplementary Fig. 3e).
These validation studies demonstrated that discordant calls were due to stochastic sampling of heterogeneous tumors26 (Supplementary Table 4) and concordance between WGS and MSK-IMPACT is at least 94% (43/47 total discordant) and up to 100% for all 43 mutations in evaluable samples when the same DNA aliquot was used in both assays.
Germline assessment by MSK-IMPACT27 identified predisposition variants in 13 patients and panel RNA assessment with MSK-Fusion28 identified oncogenic fusion genes in 18 (Supplementary Tables 5, 6, Fig. 2f). cWGTS captured all 13 germline-predisposition variants and 18 fusions. Importantly, fusion genes were supported by data in both WGS and RNA-seq, which offers the opportunity to orthogonally validate findings within a single workflow (Fig. 2f). These findings demonstrate that cWGTS as an integrative assay allows for the detection of germline, somatic mutations and fusion genes captured by an array of standard-of-care diagnostic tests.
Technical considerations: optimal depth of coverage for clinical sequencing
Sensitivity for somatic variant detection is directly dependent on the tumor cellularity of the biopsy and depth of sequencing coverage. The median cWGS depth was 95x (range 67–181) and tumor purity ranged from 21% to 100%, resulting in a median effective coverage of 64× (median depth * purity estimate). To evaluate optimal depth of sequencing, for each of 97 tumors with WGS coverage ≥60×, we generated 298 derivative subsampled BAM files in the range of 100×, 80×, 60×, and 30–40× (Supplementary Fig. 4a, Supplementary Table 7). De novo variant calling was performed to assess sensitivity of detection for clinically relevant findings by MSK-IMPACT and WGS (n = 220), genome-wide mutations across variant classes, and TMB (Fig. 3a, Supplementary Fig. 4b, c). Detection sensitivity correlated with effective coverage and was affected by variant class with slightly less sensitivity for SVs (Fig. 3a). Of the oncogenic findings, >91% were captured at 30–40× and >98% were recalled at 60–100× (Fig. 3a). With lower sequencing coverage, the power to detect subclones is limited (VAF range: 4.3–31%, median = 9.5%) (Fig. 3b, c). Optimal sensitivity for genome-wide mutation calling across variant classes was attained at ≥80× and increased with coverage. Figure 3c provides an overview of variant-detection sensitivity by depth of sequencing coverage, tumor purity, and variant clonal representation.
Findings of biological and clinical relevance detected by cWGTS only
The clinical relevance of cWGTS findings that were not identified by clinical panel sequencing (MSK-IMPACT4, MSK-fusion28 and panel testing of 88 cancer-predisposition genes27) was determined by a multidisciplinary molecular tumor board. Consistent with recent studies7,11,12,13,14,15,16, cWGTS analyses identified at least one additional cancer-associated oncogenic variant in 54% of patients (n = 62). Of these, 33 patients had one or more findings that were of direct clinical relevance, including 7 diagnostic (21%), 15 prognostic (45%), 5 therapy-informing (15%), 5 previously undescribed oncofusions (15%), and 6 germline (18%) biomarkers (Fig. 4a, Supplementary Table 3). Most additional relevant findings were explained by the detection of SVs and fusions and genome-wide mutation signatures (Fig. 4b).
We further inferred the portion of additional findings that would be captured by WES by masking results to the coding regions of genes. Of the 62 patients with incremental findings by cWGTS, RNA-seq and WES alone would only capture events in 10 (16%) and 8 (n = 13%) patients, respectively, or in 17 patients when combined (Fig. 4a, Supplementary Table 3). Thus, only 27% of the findings in cWGTS could be captured by WES and RNA-seq as the majority of additional findings were attributed to SVs.
Rare variants in established cancer genes
We identified seven clinically relevant findings targeting known rare cancer genes (Supplementary Tables 5, 8). Of these, three were somatically acquired and included a disease-defining mutation of KBTBD4 (p.R313_M314insPRR) in a pineal parenchymal tumor of intermediate differentiation29,30, a SETBP1 (p.D868N) mutation in a germ cell tumor, and a SIX1 mutation (p.Q177R) in a Wilms tumor31. Additionally, clinically relevant germline variants were detected in four cancer-associated genes, including an SBDS splice-site mutation (c.258 + 2T > C) in a rhabdomyosarcoma, BARD1 p.E652fs*69 in a neuroblastoma, EP300 p.A2259fs*20, and EXT2 p.W414* in two osteosarcoma patients (Supplementary Table 5). These results demonstrate the utility of cWGTS in capturing somatic and germline variants in rare cancer genes not routinely evaluated in targeted panels2,4.
Eight in-frame fusion genes were identified from WGS and RNA-seq in patients with no prior findings on clinical testing (Supplementary Fig. 5, Supplementary Table 6), 5 of which were not described before. Of diagnostic relevance, we identify: 1. a t(2;6) (PAX3-FOXO3) translocation changing diagnosis to alveolar rhabdomyosarcoma (ARMS) in a patient who was diagnosed with embryonal rhabdomyosarcoma (ERMS) in the absence of the cardinal ARMS fusions (PAX3-FOXO1 and PAX7-FOXO1)32 (Fig. 5a, b), 2. a UACA-LTK fusion in a metastatic papillary thyroid carcinoma33, and 3. a pathognomonic SH3PXD2A-HTRA1 fusion establishing a diagnosis of schwannoma in a patient evaluated for relapsed stage-IV neuroblastoma34.
Of potential therapeutic relevance, we identified an NTRK3-SLMAP fusion in a neuroblastoma patient. Activating NTRK3 fusions are promising therapeutic targets for TRK inhibitors, with activity seen across pediatric and adult cancers35. However, screening for NTRK fusions is not routinely performed across all disease indications. Additional undescribed fusions included EPC2-AFF3 and MAN1A2-ACBD6 identified in two patients with undifferentiated sarcoma, and a CITED2-MGA fusion in a round-cell sarcoma not otherwise specified.
Structural variants targeting tumor suppressor genes
Structural variations of established prognostic relevance36 were observed in our cohort providing insights of clinical relevance. cWGTS mapped events in TERT and ATRX in 8 (28%) and 5 (17%) neuroblastoma patients, respectively. Both TERT and ATRX are increasingly considered as therapy-defining risk- stratification biomarkers for neuroblastoma37. The TERT SVs could only be identified by cWGTS, and only ¼ of the ATRX deletions were reported by MSK-IMPACT4.
We also observed recurrent SVs targeting the tumor suppressor gene DLG2 in 15/29 OS patients38 and 3/29 neuroblastoma, of which 6 had homozygous deletions (Supplementary Table 9). While DLG2 has been characterized in osteosarcoma38, our findings demonstrate that DLG2 SVs are also recurrent in neuroblastoma warranting further investigation in future studies.
Integration of RNA-seq and WGS for variant annotation
Interpretation of complex SVs in noncoding regions of the genome presents a major challenge for reporting of WGS findings. cWGTS enables the concomitant detection of SVs and assessment of the transcriptomic consequences of the affected loci. For example, a chromoplexy event resulting in overexpression of the MYB oncogene39 through “hijacking” of an NFIB enhancer (Fig. 5c, d) was detected in an adenoid cystic carcinoma without informative clinical sequencing findings. While MYB overexpression is a cardinal feature of adenoid cystic carcinomas, MYB fusions are identified in only 30% of cases using conventional diagnostic assays39. Integration of gene expression data was critical to the annotation and reporting of this complex noncoding SV as the disease defining diagnostic biomarker.
Similarly, among 29 osteosarcoma patients, we identified TP53 mutations in 12 and mapped noncoding SVs targeting the TP53 locus in 13. Of these, only 3 were reported by MSK-IMPACT (Fig. 5e). Integration with RNA-seq demonstrated that TP53 SVs correlated with loss of TP53 expression, validating their functional relevance (Fig. 5f). Wild-type TP53 represents an inclusion criterion for p53 pathway modulating drug trials40. Here, we show that in the absence of cWGTS, patients with loss of TP53 by SVs, which have been described in diverse cancers, could be erroneously diagnosed as TP53 wildtype with implications for assessment of treatment options40. We did not identify germline SVs targeting the TP53 locus.
Taken together, our findings illustrate the necessity to combine RNA and DNA analyses in variant detection, annotation, and prioritization for clinical cWGTS reporting. In our automated workflow, we interrogate DNA mutations for corroborating evidence in the RNA. All recurrent fusion genes reported by panel RNA-seq assays as well as the 8 additional driver fusion genes were orthogonally detected by both WGS and RNA-seq (Supplementary Fig. 5). Furthermore, integration of gene expression data to SV findings resolved functional consequences of SVs targeting noncoding regions on the genome (e.g., MYB enhancer hijacking and TP53 inactivation).
Global gene expression signatures were further used to cluster samples by tumor type, providing further opportunity to resolve a patient’s diagnosis (Supplementary Fig. 6a). Last, in the 101 patients with RNA-seq data, we identified on average 18 gene expression biomarkers per sample (range = 1–99)16 (Supplementary Table 10). However, the evidence with regard to the clinical utility of such expression biomarkers remains to be validated. To this end, we performed a systematic interrogation of genes with aberrant expression with known tissue-specific expression patterns and SVs or fusions detected from cWGTS. Overall, only 8% of the expression biomarkers were associated with an acquired SV or fusion gene, demarcating a subset of high-confidence expression biomarkers (average per patient = 1, range = 0–10). This limits the number of patients in our cohort with an expression biomarker supported by an SV to 54% (n = 55) (Supplementary Fig. 6b). Demonstrating the utility of this integrative analysis, we identified a concomitant KRAS amplification and overexpression in two patients with no clinically relevant biomarkers (H135462 and H195916) by clinical testing. Of note, in both patients, the amplification was not reported by MSK-IMPACT pointing to the lower sensitivity to detect copy number changes by panel-based assays.
Integration of germline mutations to somatic mutation signatures for variant annotation
Annotation of germline variants is restricted to recurrent events in population databases, thus limiting interpretation for rare founder events. For each genome in our cohort, we quantified the proportion of mutations attributed to each of 73 reference mutation signatures10. Two of three patients with a germline mutation in DNA repair genes further harbored mutation profiles suggestive of DNA repair deficiency. Patient H135421 had a pathogenic variant in MUTYH and somatic loss of the second allele. About 42% of the mutations were attributed to the MUTYH signature SBS3610 (Fig. 6a). Patient H135466 had a pathogenic variant in PMS2 (c.538-1G > C) with loss of the wild-type allele by LOH. The tumor was MSI high with hypermutation (TMB = 11.23, indels = 90,246, SNVs = 17,840, and SVs = 44), enrichment of T > C mutations, and repeat-mediated indels characteristic of PMS2 deficiency10 (Fig. 6b). In contrast, patient H135073 harbored a variant of unknown significance (VUS) in PMS2, a medium MSI score (7.23), and low mutation burden (1.30 Muts/Mb) without evidence of a PMS2 signature (Fig. 6c). These findings demonstrate the utility of mutation signatures in the assessment of germline mutations in DNA repair genes.
To illustrate this point, in a 12-year-old osteosarcoma patient outside the study cohort, cWGTS characterized a hypermutated genome (TMB = 16.7, indel = 89,588, SNVs = 31,520, and SVs = 568) enriched in repeat-mediated deletions consistent with MSI high status (Fig. 6d). This observation prompted consent for germline testing, resulting in identification of a PMS2 mutation (p.D699H) annotated as likely pathogenic/VUS41 and a somatic loss of the wild-type allele. MSK-IMPACT reported an indeterminate MSI status (7.5) yet upon testing validated that the germline mutation is pathogenic.
These results demonstrate the utility of integrating composite readouts from cWGTS (germline mutation, allele-specific copy number, and genome-wide TMB) to deliver corroborating evidence for the assessment and reporting of germline-predisposition mutations with implications for family screening, diagnosis, and treatment.
Genomic alterations of emerging biological and clinical relevance
Recent studies propose telomere length as prognostic indicators in neuroblastoma among other cancers42,43,44,45. We recapitulated the established associations between ATRX and TERT mutations to telomere length46,47 (Supplementary Fig. 7a–c). ATRX mutations were also observed in 8 osteosarcomas, with similar associations to telomere length (Supplementary Fig. 7c). Given the association between adverse risk mutations and telomere length, delineation of the independent prognostic value warrants analyses of data that concomitantly map these mutations, SVs, and telomere length alongside established predictors of outcomes.
We detect chromothripsis20 in 40% of patients, most recurrently observed in sarcomas (35/58), germ cell tumors (2/4), and less frequently in neuroblastoma (6/29) (Supplementary Fig. 7d). Chromothripsis frequently led to TP53 loss (10/29)20,48, amplification of MYC, VEGFA, and MDM2. Additionally, in 2 patients, chromothripsis resulted in oncogenic fusions (MAN1A2-ACBD6 and PAX3-FOXO3) (Supplementary Fig. 7e). Previous studies have proposed an association between whole-genome duplication (WGD) and poor outcomes in cancer21. WGD was seen in 42/114 patients with an enrichment in sarcoma (24/54), carcinomas (3/7), and neuroblastoma (11/30) (Supplementary Fig. 7f).
Biological and clinical implications of tumor mutation burden across variant classes
Panel-based approaches derive estimates of TMB, MSI scores, and mutation signatures49, whereas WGS directly quantifies genome-wide mutation burden across all variant classes (Fig. 7a, b and Supplementary Fig. 8a). We observed higher overall (8.3-fold) TMB estimates in our cohort, relative to reports in pediatric cancer (Fig. 7a)10,48,50. TMB was higher in therapy-exposed compared with treatment-naive patient samples (0.1–11.2 in treated vs 0–2.7 treatment-naive, Mann–Whitney test, p = 1.892e-04) and correlated with evidence of treatment-related signatures (i.e., temozolomide, platinum) (Supplementary Fig. 8b)10,51, observed in 45/114 patients pointing to persistence of clones that were exposed to and survived cancer therapy52.
Patients H135022 (adrenocortical carcinoma) and H135462 (clear-cell carcinoma) had progressive on-treatment metastatic disease and in the absence of therapy-informing biomarkers by clinical testing were at the end of their therapeutic options. cWGTS analyses revealed a profoundly rearranged genome scoring these two patients as the highest in fusion burden and SV burden in the cohort (Fig. 7b–d). H135022 was treated with checkpoint blockade (nivolumab/ipilimumab), resulting in complete response after three cycles of therapy, and is disease-free 26 months after therapy cessation (Fig. 7c), whereas patient H135462 was treated with pembrolizumab, achieved a complete response after 6 cycles, and remains disease-free 10 months after therapy (Fig. 7d).
These findings demonstrate the value of cWGTS to fully assess the level of genomic instability across variant classes and highlight the need to further evaluate SV and fusion gene burden as biomarkers of response to immune checkpoint blockade therapies.
Derivation of comprehensive WGS profiling in cell-free DNA
Our study evaluated key technical considerations of cWGTS in FF biopsies, as an optimal source of tumor DNA. However, limited biopsies may restrict access to cWGTS for all patients. Cell-free DNA (cfDNA) from blood plasma represents an alternative source of DNA for tumor profiling53. Recently, cfDNA NGS profiling including WGS has been used to detect tumor-specific CNAs and fragmentation patterns in pediatric tumors54,55,56. However, the potential of high-depth WGS in tissue-naive identification of the genetic changes (SNVs, indels, and SVs) in a patient’s cancer genome from cfDNA has been largely unexplored.
In an exploratory analysis, we performed WGS from matched FF samples and cfDNA from seven patients collected at the time of tumor biopsy (Supplementary Table 11). cfDNA genome-wide coverage ranged from 94 to 102× (Fig. 8a) with a wide range of tumor content in cfDNA ~10–83% (Fig. 8b). To assess the suitability of cfDNA for unbiased genome-wide mutation detection, we performed de novo mutation calling in cfDNA for all variant classes (SNVs, indels, SVs, and CNA) and compared the results to FF analyses.
WGS from cfDNA did not present technical limitations in data generation or false-positive variant calling across mutation classes. Derivation of high-quality variant calls was contingent upon the quantity of circulating tumor DNA (ctDNA). In four patients with ctDNA content sufficient for CNA detection, we establish good concordance between the FF and cfDNA CNA profiles (Fig. 8c, Supplementary Fig. 9a–g). Strikingly, for patients with high ctDNA content (i.e., IH158182), we derived a near-complete picture of the genome-wide mutation patterns demonstrating that cancer genomic landscape can be fully recapitulated by cfDNA WGS (Fig. 8d). Importantly, for patient H135967, we showcase that even with an estimated ctDNA content of 20%, the same threshold used for analyses of FF material, we can detect all the known oncogenic events in the FF sample across variant classes, which include a TP53 substitution, MYC and CCNE1 amplifications, and SVs targeting ARID1A and ATRX (Supplementary Fig. 9c).
We further demonstrate the potential of cfDNA to capture a comprehensive representation of different types of variants across the tumor phylogeny compared with solid biopsies (Supplementary Table 11) through the detection of cfDNA-specific subclones (Fig. 8e, Supplementary Fig. 9a–g). These results provide the proof-of-concept for the feasibility of deriving tumor-agnostic comprehensive WGS profiling from a liquid biopsy.
We present a comprehensive technical assessment for cWGTS implementation in clinical care practice in oncology. We demonstrate that using a single integrated workflow, cWGTS captures the full spectrum of cancer-associated genomic alterations that are assessed using a diversity of standard-of-care diagnostic assays. With implementation of best laboratory and computational practices, we execute an end-to-end sample-to-report turnaround time within 9 days, which is aligned to clinical needs for diagnosis and care decisions, and is comparable to infrastructures of scale18. Despite 5–10-fold lower sequencing coverage compared with panel-based assays, we demonstrate that in matched biopsies, cWGTS recovered all clinically reported variants by high-depth targeted profiling assays. We establish >80× coverage and tumor purity of at least 20% to attain this sensitivity. However, this sets a stringent quality threshold on fresh frozen tumor specimens that are not as broadly available as FFPE. However, a major limitation of WGS in FFPE is a high error rate in genome-wide calls57. To this end, we provide proof-of-concept feasibility data demonstrating that comprehensive WGS profiling can also be leveraged in patients who have high cfDNA content in circulation at the time of diagnosis or relapse. Our findings pave the way for future studies focused on analytical validation and optimizations of comprehensive tumor-agnostic WGS profiling from cfDNA for diagnostic purposes.
To support cWGTS variant annotation and prioritization, we implemented an analytical workflow that learns from variant annotation databases and integrates signals from germline mutations, somatic DNA, and RNA-seq findings. This allows us to annotate, validate, and prioritize SVs of diagnostic (e.g., MYB enhancer hijacking), prognostic (e.g., ATRX/TERT), and therapeutic relevance (e.g., TP53 loss-of- function SVs). Consistent with recent literature for pediatric and rare cancers7,15,16,58, >50% of patients had additional findings of established biological or clinical significance. The majority of these findings were SVs in cancer genes, fusion genes, and genome-wide mutation signatures that targeted panels are not optimally designed to identify. Importantly, we demonstrate that only a minority of such additional findings would be captured by WES and RNAseq alone or in combination. Larger cohort studies are warranted to determine the incidence and prevalence of clinically relevant biomarkers captured by cWGTS.
The clinical relevance of cWGTS extends beyond that of rare cancers59. We show that by cWGTS, we detect the full spectrum of cancer-associated mutations in 99% of patients. The vision of patient-tailored medicine warrants the delivery of clinical decisions that extend beyond a single druggable biomarker and rather consider the composite readouts from a patient’s cancer genome that inform on a patient’s a priori risk of developing cancer, diagnosis, likelihood of treatment response, risk of progression, and therapeutic vulnerabilities. With increasing implementation of cWGTS on well-annotated clinical specimens15,16,58, our ability to interpret cWGTS findings will improve, and by extension, the clinical utility of cWGTS will expand. As the economic barriers to cWGTS are mitigated in time, a single comprehensive assessment of the cancer genome is positioned to replace multiple targeted diagnostic tests in prospective clinical sequencing59.
Patients who were seen within the Department of Pediatrics at Memorial Sloan Kettering Cancer Center with presumed or established solid tumor malignancies (including CNS tumors) were eligible to enroll on an institutional prospective tumor/germline-sequencing protocol (ClinicalTrials.gov number, NCT01775072) with informed consent from the patients or their guardians. This study was approved by the MSKCC Institutional Review Board/Privacy Board. Patients with newly diagnosed as well as relapsed/refractory disease were eligible. Adults with pediatric-type malignancies or rare cancers up to the age of 39 were also eligible to enroll.
DNA extracted from formalin-fixed paraffin-embedded (FFPE) tumor and blood samples (as a matched normal) was sequenced using MSK-IMPACT, an FDA-approved targeted panel used to sequence patients’ tumors at MSKCC. MSK-IMPACT captures protein-coding exons of 468 cancer-associated genes, introns of frequently rearranged genes, and genome-wide copy number (CN) probes4. Tumor and normal samples were sequenced at 800× and 600×, respectively. Established pipelines followed by manual review were used to characterize germline and somatic mutations, CN variants, and if targeted, genomic rearrangements as previously described1. Germline data for alterations in cancer-predisposition genes were analyzed in 88 genes as previously described2. For select tumor indication MSK-Fusion3, a New York State-approved RNA-capture assay that targets common RNA fusion genes in solid tumors was also performed. Clinically relevant findings were annotated using OncoKb tiers 1–44.
For 114 subjects enrolled in the study, tumor DNA was extracted from fresh frozen (FF) or OCT tissue biopsies and matched normal DNA from buffy coat using the DNeasy Blood & Tissue Kit (Qiagen catalog # 69504) according to the manufacturer’s protocol. FFPE tissue was deparaffinized using heat treatment (90 °C for 10′ in 480 μL PBS and 20 μL 10% Tween-20), centrifugation (10,000 × g for 15′), and ice chill. Paraffin and supernatant were removed, and the pellet was washed with 1 mL of 100% EtOH followed by an incubation overnight in 400 µl of 1 M NaSCN for rehydration and impurity removal. Tissues were subsequently digested with 40 µl of Proteinase K (600 mAU/ml) in 360 µl of Buffer ATL at 55 °C. DNA isolation proceeded with the DNeasy Blood & Tissue Kit (QIAGEN catalog # 69504) according to the manufacturer’s protocol modified by replacing AW2 buffer with 80% ethanol. All DNA was eluted in 0.5X Buffer AE.
After PicoGreen quantification and quality control by Agilent BioAnalyzer, 500 ng of genomic DNA were sheared using a LE220-plus Focused-ultrasonicator (Covaris catalog # 500569) and sequencing libraries were prepared using the KAPA Hyper Prep Kit (Kapa Biosystems KK8504) with modifications. Briefly, libraries were subjected to a 0.5X size select using aMPure XP beads (Beckman Coulter catalog # A63882) after post-ligation cleanup.
PCR-free libraries were pooled equivolume for sequencing. Samples were run on a NovaSeq 6000 in a 150-bp/150-bp paired-end run, using the NovaSeq 6000 SP, S1, S2, or S4 Reagent Kit (300 cycles) (Illumina). Tumors were covered to an average of 95X (range = 67–181) and normals at 50X (range = 32–159).
Tumor tissue from FF biopsies was homogenized in 1 mL TRIzol Reagent (ThermoFisher catalog # 15596018) followed by phase separation with 200 µL chloroform. RNA was extracted from the aqueous phase using the miRNeasy Micro Kit (Qiagen catalog # 217084) on the QIAcube Connect (Qiagen) according to the manufacturer’s protocol with 350 µL input. Samples were eluted in 15 µL of RNase-free water.
Whole-transcriptome RNA sequencing
After RiboGreen quantification and quality control by Agilent BioAnalyzer, 18ng–1µg of total RNA with an RNA integrity number varying from 1 to 9.9 underwent ribosomal depletion and library preparation using the TruSeq Stranded Total RNA LT Kit (Illumina catalog # RS-122-1202) according to instructions provided by the manufacturer with 8 cycles of PCR. Samples were barcoded and run on a HiSeq 2500 in Rapid Mode or HiSeq4000 at PE100 or on a NovaSeq 6000 at PE150, using the HiSeq Rapid SBS Kit v2, HiSeq 3000/4000 SBS Kit, or NovaSeq 6000 SP, S1, S2, or S4 Reagent Kit (300 cycles) (Illumina). Sequencing was performed to achieve a median of 83 million paired reads per sample.
cfDNA extraction and whole-genome sequencing
Cell-free DNA (cfDNA) was extracted from plasma using MagMAX cfDNA isolation kit. After PicoGreen quantification, 47–500 ng of cfDNA were used to make sequencing libraries using the KAPA Hyper Prep Kit (Kapa Biosystems KK8504) with 4 cycles of PCR and pooled equimolar. One sample with sufficient input was prepared PCR-free. Samples were run on a NovaSeq 6000 in a PE150 run, using the NovaSeq 6000 SBS v1 Kit and an S4 flow cell (Illumina). The average coverage per sample was 91X.
In order to achieve stable turnaround times of 9 days, dedicated resources and optimizations were needed, such as to minimize human steps in the process. In the sequencing core, lab technicians along with sequencers were needed to process and quality-control the incoming samples. A high-throughput connection was used to transfer sequencing data to the bioinformatics core with automatic notifications. An ETL cron job was developed to synchronize relevant deidentified metadata regularly from clinical systems. The data and bioinformatics analyses were tracked and automated using the Isabl platform5. In order to achieve stable algorithm turnaround times, parallelization was often split by the estimated amount of work (i.e., number of reads; https://github.com/papaemmelab/split_bed_by_reads) rather than genomic length. Processing was performed within a heavily shared internal high-performance computing (HPC) cluster with around 4000 cores. The results were automatically curated and prioritized using both cached databases and live APIs in order to reduce interpretation time.
Analysis of cWGTS data was executed using Isabl platform5 and included: 1. data QC; 2. ensemble variant calling for germline and somatically acquired mutations from at least two out of three algorithms run for each variant class; 3. signature extraction (i.e., mutation signatures, microsatellite-instability score, and gene expression); 4. variant classification; and, 5. the generation of a clinical prototype summary report. Briefly, upon completion of each sequencing run, Isabl imports paired tumor-normal FASTQ files, executes alignment, quality-control algorithms, and generates tumor purity and ploidy estimates. For samples with sufficient coverage (>60×) and tumor purity (>20%), ensembl variant calling for each variant class (substitutions, insertions and deletions, and structural variations) is performed. High-confidence somatic mutations were classified with regard to their putative role in cancer pathogenesis and statistical post-processing enables the derivation of microsatellite-instability scores and mutation signatures6. RNA-seq data were independently analyzed for acquired fusions and gene expression metrics in a subset (n = 101). For a subset of patients with consent (n = 100) for germline analyses, the normal genome was also independently analyzed.
Clinical relevance of mutations in common cancer genes was annotated using OncoKb, COSMIC, Ensembl Variant Effect Predictor, VAGrENT, gnomAD, and ClinVar databases725 (refs. 60,61,62,63). Additionally, integration of signals across data modalities (germline, somatic mutations, somatic signatures, CN segments, and gene expression profiles) was executed to further determine the significance of observed events. Population filtering, database comparison, and somatic data integration were performed using methods in accordance with the American College of Medical Genetics and ClinGen Somatic/Germline Data Integration subcommittee64,65,66. Last, the findings were automatically embedded into a single-page summary (.html) report containing high-level clinical data, quality-control metrics, genetic findings, and relevant data-visualization plots (i.e., CIRCOS plots, mutation signatures, and gene expression clustering by tSNE). Putative findings of clinical relevance identified by WGS and RNA-seq were reviewed by an interdisciplinary team of clinical oncologists, molecular pathologists, and cancer genomics experts. Typically 8–10 cases were reviewed in an hour-long tumor board meeting that was held biweekly. The findings were categorized with regard to their relevance in clinical practice as 1. diagnostic, 2. risk predisposition for germline variants, 3. prognostic, 4. therapy-informing, 5. pathogenic, 6. likely pathogenic, or 7. variant of unknown significance (VUS).
Whole-genome/transcriptome alignment and quality control
Whole-genome paired-end reads were aligned to human reference genome (GRCh37d5) using BWA-mem (v0.7.17) as a part of the pcap-core v2.18.2 wrapper (https://github.com/cancerit/PCAP-core)67. The wrapper includes marking of duplicates using Picard. Whole-transcriptome sequencing reads were aligned using Spliced Transcripts Alignment to a Reference, STAR (v2.5.4b, https://github.com/alexdobin/STAR) with Ensembl 75 for transcript information68. Upon alignment, BAM files for tumor/normal WGS and tumor RNA-seq data for each individual were compared using Conpair69 in order to detect potential sample swaps and cross-individual contamination. Genome-wide median coverage was calculated using Mosdepth70 with minimum mapping quality of 20. Tumor purity and ploidy was estimated using Battenberg (https://github.com/cancerit/cgpBattenberg) and somatic substitution calls. Additionally, a quality-control report is generated per sample using MultiQC (v1.9) (https://github.com/ewels/MultiQC) to aggregate alignment and read statistics from FastQC (v0.11.5) (https://github.com/s-andrews/FastQC), Picard (v2.25.6) (https://github.com/broadinstitute/picard), and RNA-SeQC (v188.8.131.52) (https://software.broadinstitute.org/cancer/cga/rna-seqc)70,20.
Identification of somatic mutations in whole-genome sequences
Somatic alterations were detected comparing the tumor against the matched normal for each variant type. All bioinformatic tools were launched using an in-house wrapper. Allele-specific subclonal CN changes were detected using Battenberg (cgpBattenberg v1.4.0) (https://github.com/cancerit/cgpBattenberg)71. Single-nucleotide variants (SNVs) were identified using Strelka2 (v2.9.1 with manta v1.3.1), (https://github.com/Illumina/strelka), MuTect2 (gatk:v184.108.40.206),(https://github.com/broadinstitute/gatk), and CaVEMan (cgpCavemanWrapper v1.7.5) (https://github.com/cancerit/cgpCaVEManWrapper)72,73,74 Variant post-processing was done using default flags for Strelka2 and MuTect2, while for CaVEMan, cgpCavemanPostprocessing (v1.5.2) was used filtering for sequencing artifacts with > =3 mutant alleles in at least 1% of samples within a panel of 100 unmatched blood normal (https://github.com/cancerit/cgpCaVEManPostProcessing). Small insertions and deletions (indels) were detected using Strelka2, MuTect2, and Pindel (cgpPindel v1.5.4) (https://github.com/cancerit/cgpPindel) and filtered against a panel of 100 unmatched normals75. Structural genomic variants (SVs) were identified using SvABA (~v1.0.0 commit 47c7a88) (https://github.com/walaj/svaba), GRIDSS (v2.2.2) (https://github.com/PapenfussLab/gridss), and BRASS (v4.0.5 with GRASS v1.1.6) (https://github.com/cancerit/BRASS) using a panel of 100 in-house unmatched normals76,77.
Variant consolidation and annotation
VCF files for SNVs and indels were merged with an in-house wrapper using chromosome, position, reference allele, and alternative allele. The merged VCFs were annotated with VAGrENT (v3.3.0, https://github.com/cancerit/VAGrENT) and VEP (v92, https://github.com/Ensembl/ensembl-vep)63,79 VCF files for SVs were merged using MergeSVvcfs (v1.0.2, https://github.com/papaemmelab/mergeSVvcf). High-confidence mutations were designated as those that were passed by at least 2 callers.
Designation of putative oncogenic mutations
We define a variant as oncogenic if it represents an established “driver” mutation on the basis of prior literature and recurrence in cancer genome. For SNVs, indels, and fusion genes, these annotations are derived from OncoKb. For SVs, we annotate events that target known oncogenes and tumor suppressor genes and use prior literature as reference (e.g., for TERT, ATRX, and TP53 SVs).
Identification of germline mutations
Germline single-nucleotide polymorphisms (SNPs) and indels were detected using Stelka2 and Freebayes (v1.2.0, https://github.com/ekg/freebayes) with an in-house wrapper. VCF files were merged and annotated using the same procedure used for the somatic variants80. Germline variants called by both callers were considered high-confidence. Germline variants were prioritized for review by filtering for recurrence in the current cohort, frequency in any population of 1000 genomes/Gnomad and ClinVar.
Characterization of gene fusions
Gene fusions were identified using three different callers: FusionCatcher (v1.0.0, https://github.com/ndaniel/fusioncatcher), STAR-Fusion (v1.3.1, https://github.com/STAR-Fusion/STAR-Fusion), and FuSeq (v1.1.1, https://github.com/nghiavtr/FuSeq)81,82,83. Calls were merged by gene pair and annotated using FusionCatcher’s databases. Fusions were considered confident if called by at least 2 callers. Events were visualized with the plotting functionality by Arriba (https://github.com/suhrig/arriba)84.
Gene expression analysis
Gene expression profiles were ascertained in transcripts per million (TPM) using SALMON (v0.10.0, https://github.com/COMBINE-lab/salmon)85. tSNE was performed on RNA-seq data from 101 tumors from the cohort and an in-house reference consisting of 155 pediatric tumors using python scikit-learn (v0.21.1, https://scikit-learn.org/) and visualized interactively using python bokeh (v1.2.0, https://docs.bokeh.org/)86.
Identification of gene expression biomarkers
Expression biomarkers were assessed using the methodology outlined by Horak et al.16. Only actionable genes outlined in the publication supplementary as assessed by severe overexpression, overexpression, severe underexpression, or underexpression, were evaluated. An internal reference cohort of 274 tumor RNA samples was used as a baseline. A gene was considered over-/underexpressed if expression was in the top or bottom ten percent of the reference cohort, while severe over-/underexpression was categorized as expression in the top or bottom five percent.
Calculation of tumor mutational burden
Tumor mutational burden (TMB) was calculated using high-confidence, somatic substitutions and indels that fall within coding regions. The totals for these variant classes were combined and then converted to coding TMB using a divisor of 30 to approximate the length of the human exome in Mb. Values greater than 2 coding mutations per Mb were considered pediatric high and values greater than 10 coding mutations per Mb were considered hypermutators, thresholds set by the study in Grobner et al.16.
Identification of mutation signatures of point mutations
Mutational signature analysis was performed with the MutationalPatterns package (v1.6.1, https://bioconductor.org/packages/release/bioc/html/MutationalPatterns.html) and using the COSMIC Mutational Signatures (v3.1) with the addition of Temozolomide signature from Kucab et al.87,88.
Assessment of ITH between matched MSK-IMPACT and WGS samples
ITH between the matched FFPE and FF samples that underwent MSK-IMPACT and WGS sequencing was assessed by comparing CN changes and substitutions/indels falling within 468 genes included in MSK-IMPACT. For substitutions/indels, the clonal representation in the two assays was compared by performing a proportion test comparing the VAFs reported and adjusting for assay-specific local depth and purity. Mutations with a p-value <0.05 have a statistically significant difference in clonal presentation suggesting ITH. CN profiles from MSK-IMPACT generated using FACETS89 were compared with the Battenberg output from WGS. In patients where DNA was available, resequencing of discordant mutations was performed using the MSK-IMPACT panel on the same DNA that underwent WGS sequencing at a median depth of 438X90. Tumor purity of both assays was taken into account to mitigate the effects of technical issues on mutation calling.
Inference of clonal structure
Clonal structure was analyzed using high-confidence SNVs called in each biopsy or the union of SNVs whenever multiple biopsies were available for a patient. DPClust (v0.2.2, https://github.com/Wedge-Oxford/dpclust) was used for calculation of cancer cell fraction corrected for purity and local CN, as well as clustering and assignment of mutations across samples with the exception of the Gibbs Sampling Dirichlet Process step that was optimized internally71. Clonal ordering was deduced using clonevol (v0.99.11, https://github.com/hdng/clonevol)91. Mutational signatures were computed in each cluster independently. Figures were generated with matplotlib (v3.1.0, https://matplotlib.org/).
Estimates of telomere length
Derivation of subsampled BAM files and sensitivity assessment
A total of 298 subsampled BAMs were generated using samtools (v1.11, https://github.com/samtools/samtools) view command with the subsampling option93. Median coverages were calculated for the original BAMs using Mosdepth (v0.2.5, https://github.com/brentp/mosdepth) with mapping quality >20 and then used to calculate fractions to downsample to approximately 100×, 80×, 60×, and 30×–40× where original coverage was allowed70. Mosdepth was used to verify that the median coverage of the subsampled BAM fell within +/−5× of the desired coverage. De novo variant calling and annotation was then performed independently on the subsampled BAM files using the same procedure as cWGTS as described above.
cfDNA tumor content and variant comparison
Tumor content in cfDNA specimens was estimated using Battenberg and manual inspection with the help of SNV VAF density plots. De novo variant calling was performed independently using the methods described for identification of somatic mutations in WGS. Further analysis was done to compare specific clinical variants identified by MSK-IMPACT using hileup (v1.0.0, https://github.com/brentp/hileup). Clonal structure across tissue and cfDNA samples was inferred using the same methods as described before followed by analysis of clone-specific mutational signatures. All mutations across the clonal structures were then piled up across corresponding FF WGS data to look for evidence of mutations in both specimens.
Variant curation and characterization of incremental findings from cWGTS
Variant curation of targeted NGS assays was performed as previously described4,28. To assess variants identified by cWGTS, a multidisciplinary team of disease experts, clinical geneticists, molecular pathologists, and genomics experts assembled regularly to classify molecular alterations (somatic and germline), mutation signatures, and gene expression data. Incremental findings of cWGTS were defined as established oncogenic alterations or signatures not identified by matched MSK-IMPACT somatic or germline NGS (DNA) or ArcherDx targeted NGS (RNA). Incremental findings were further classified as clinically relevant if they met one of the following criteria: (1) diagnostic finding—defined as an alteration that provided justification or an alteration in cancer diagnosis or cancer-subtype diagnosis, (2) established prognostic finding—defined as an alteration with established prognostic relevance with robust support from scientific literature, (3) likely pathogenic or known pathogenic germline- predisposition event, (4) treatment-informing finding—defined as an alteration that provides direct justification of a therapeutic modality, or (5) a driver oncogenic fusion.
Further information on research design is available in the Nature Research Reporting Summary linked to this article.
The raw data for WGS and RNA-seq data generated in this study have been deposited in the dbGAP database under accession code phs002620.v1.p. These data are available under restricted access due to individual privacy concerns. Permanent employees of an institution at a level equivalent to a tenure-track professor or senior scientist with laboratory administration and oversight responsibilities may request access through dbGAP. The requests, which are managed by NCI’s Data Access Committee, take less than 2 days for approval and access is permitted for 12 months. The processed MSK-IMPACT data are available in a study-specific dataset at cbioPortal [https://www.cbioportal.org/study/summary?id=mixed_kunga_msk_2022]. Summary and processed data for the figures are available in the source data file as well as the data repository at https://github.com/papaemmelab/Shukla_Levine_Gundem. Annotation databases included public resources such as Cancer Gene Census, OncoKb, ClinVar, 1000genomes, gnomAD, and Ensembl Variant Effect Predictor (VEP) databases. The remaining data are available within the article and Supplementary Information file.
Scripts for generating the figures are provided at https://github.com/papaemmelab/Shukla_Levine_Gundem.
Beaubier, N. et al. Integrated genomic profiling expands clinical options for patients with cancer. Nat. Biotechnol. 37, 1351–1360 (2019).
Frampton, G. M. et al. Development and validation of a clinical cancer genomic profiling test based on massively parallel DNA sequencing. Nat. Biotechnol. 31, 1023–1031 (2013).
Chakravarty, D. & Solit, D. B. Clinical cancer genomic profiling. Nat. Rev. Genet. https://doi.org/10.1038/s41576-021-00338-8 (2021).
Zehir, A. et al. Mutational landscape of metastatic cancer revealed from prospective clinical sequencing of 10,000 patients. Nat. Med. 23, 703–713 (2017).
Gong, J., Pan, K., Fakih, M., Pal, S. & Salgia, R. Value-based genomics. Oncotarget 9, 15792–15815 (2018).
Parsons, D. W. et al. Diagnostic yield of clinical tumor and germline whole-exome sequencing for children with solid tumors. JAMA Oncol. 2, 616–624 (2016).
Rusch, M. et al. Clinical cancer genomic profiling by three-platform sequencing of whole genome, whole exome and transcriptome. Nat. Commun. 9, 3962 (2018).
Koelsche, C. et al. Sarcoma classification by DNA methylation profiling. Nat. Commun. 12, 498 (2021).
Capper, D. et al. DNA methylation-based classification of central nervous system tumours. Nature 555, 469–474 (2018).
Alexandrov, L. B. et al. The repertoire of mutational signatures in human cancer. Nature 578, 94–101 (2020).
Davies, H. et al. HRDetect is a predictor of BRCA1 and BRCA2 deficiency based on mutational signatures. Nat. Med. 23, 517–525 (2017).
Wong, M. et al. Whole genome, transcriptome and methylome profiling enhances actionable target discovery in high-risk pediatric cancer. Nat. Med. 26, 1742–1753 (2020).
Wrzeszczynski, K. O. et al. Analytical validation of clinical whole-genome and transcriptome sequencing of patient-derived tumors for reporting targetable variants in cancer. J. Mol. Diagn. 20, 822–835 (2018).
Duncavage, E. J. et al. Genome sequencing as an alternative to cytogenetic analysis in Myeloid cancers. N. Engl. J. Med. 384, 924–935 (2021).
Newman, S. et al. Genomes for kids: the scope of pathogenic mutations in pediatric cancer revealed by comprehensive DNA and RNA sequencing. Cancer Discov. https://doi.org/10.1158/2159-8290.CD-20-1631 (2021).
Horak, P. et al. Comprehensive genomic and transcriptomic analysis for guiding therapeutic decisions in patients with rare cancers. Cancer Discov. https://doi.org/10.1158/2159-8290.CD-21-0126 (2021).
Medina-Martínez, J. S. et al. Isabl Platform, a digital biobank for processing multimodal patient data. BMC Bioinformatics 21, 549 (2020).
Roepman, P. et al. Clinical validation of whole genome sequencing for cancer diagnostics. J. Mol. Diagn. 23, 816–833 (2021).
Marabelle, A. et al. Association of tumour mutational burden with outcomes in patients with advanced solid tumours treated with pembrolizumab: prospective biomarker analysis of the multicohort, open-label, phase 2 KEYNOTE-158 study. Lancet Oncol. 21, 1353–1365 (2020).
Stephens, P. J. et al. Massive genomic rearrangement acquired in a single catastrophic event during cancer development. Cell 144, 27–40 (2011).
Bielski, C. M. et al. Genome doubling shapes the evolution and prognosis of advanced cancers. Nat. Genet. 50, 1189–1195 (2018).
Jonigk, D. et al. Molecular and clinicopathological analysis of Epstein-Barr virus-associated posttransplant smooth muscle tumors. Am. J. Transpl. 12, 1908–1917 (2012).
Shannon-Lowe, C. & Rickinson, A. The global landscape of EBV-associated tumors. Front. Oncol. 9, 713 (2019).
Lindrose, A. R. et al. Method comparison studies of telomere length measurement using qPCR approaches: a critical appraisal of the literature. PLoS ONE 16, e0245582 (2021).
Chakravarty, D. et al. OncoKB: annotation of the oncogenic effect and treatment implications of somatic mutations in cancer. J. Clin. Oncol. 34, 11583–11583 (2016).
Gerlinger, M. et al. Intratumor heterogeneity and branched evolution revealed by multiregion sequencing. N. Engl. J. Med. 366, 883–892 (2012).
Fiala, E. M. et al. Prospective pan-cancer germline testing using MSK-IMPACT informs clinical translation in 751 patients with pediatric solid tumors. Nat. Cancer 2, 357–365 (2021).
Benayed, R. et al. High yield of RNA sequencing for targetable kinase fusions in lung adenocarcinomas with no mitogenic driver alteration detected by DNA sequencing and low tumor mutation burden. Clin. Cancer Res. 25, 4712–4722 (2019).
Stevens, T. M. et al. NUTM1-rearranged neoplasia: a multi-institution experience yields novel fusion partners and expands the histologic spectrum. Mod. Pathol. 32, 764–773 (2019).
Lee, J. C. et al. Recurrent KBTBD4 small in-frame insertions and absence of DROSHA deletion or DICER1 mutation differentiate pineal parenchymal tumor of intermediate differentiation (PPTID) from pineoblastoma. Acta Neuropathol. 137, 851–854 (2019).
Wegert, J. et al. Mutations in the SIX1/2 pathway and the DROSHA/DGCR8 miRNA microprocessor complex underlie high-risk blastemal type Wilms tumors. Cancer Cell 27, 298–311 (2015).
Skapek, S. X. et al. Rhabdomyosarcoma. Nat. Rev. Dis. Prim. 5, 1 (2019).
Cancer Genome Atlas Research Network. Integrated genomic characterization of papillary thyroid carcinoma. Cell 159, 676–690 (2014).
Agnihotri, S. et al. The genomic landscape of schwannoma. Nat. Genet. 48, 1339–1348 (2016).
Drilon, A. et al. Efficacy of larotrectinib in TRK fusion-positive cancers in adults and children. N. Engl. J. Med. 378, 731–739 (2018).
Ackermann, S. et al. A mechanistic classification of clinical phenotypes in neuroblastoma. Science 362, 1165–1170 (2018).
Valentijn, L. J. et al. TERT rearrangements are frequent in neuroblastoma and identify aggressive tumors. Nat. Genet. 47, 1411–1414 (2015).
Chen, X. et al. Recurrent somatic structural variations contribute to tumorigenesis in pediatric osteosarcoma. Cell Rep. 7, 104–112 (2014).
Drier, Y. et al. An oncogenic MYB feedback loop drives alternate cell fates in adenoid cystic carcinoma. Nat. Genet. 48, 265–272 (2016).
Duffy, M. J. et al. p53 as a target for the treatment of cancer. Cancer Treat. Rev. 40, 1153–1160 (2014).
Arora, S. et al. Functional analysis of rare variants in mismatch repair proteins augments results from computation-based predictive methods. Cancer Biol. Ther. 18, 519–533 (2017).
Lawlor, R. T. et al. Alternative lengthening of telomeres (ALT) influences survival in soft tissue sarcomas: a systematic review with meta-analysis. BMC Cancer 19, 232 (2019).
Pezzolo, A. et al. Intratumoral diversity of telomere length in individual neuroblastoma tumors. Oncotarget 6, 7493–7503 (2015).
Ohali, A. et al. Telomere length is a prognostic factor in neuroblastoma. Cancer 107, 1391–1399 (2006).
Koneru, B. et al. Telomere maintenance mechanisms define clinical outcome in high-risk neuroblastoma. Cancer Res. 80, 2663–2675 (2020).
Hartlieb, S. A. et al. Alternative lengthening of telomeres in childhood neuroblastoma from genome to proteome. Nat. Commun. 12, 1269 (2021).
Sieverling, L. et al. Genomic footprints of activated telomere maintenance mechanisms in cancer. Nat. Commun. 11, 733 (2020).
Gröbner, S. N. et al. The landscape of genomic alterations across childhood cancers. Nature 555, 321–327 (2018).
Campbell, B. B. et al. Comprehensive analysis of hypermutation in human cancer. Cell 171, 1042–1056.e10 (2017).
Lawrence, M. S. et al. Mutational heterogeneity in cancer and the search for new cancer-associated genes. Nature 499, 214–218 (2013).
Pich, O. et al. The mutational footprints of cancer therapies. Nat. Genet. 51, 1732–1740 (2019).
Landau, H. J. et al. Accelerated single cell seeding in relapsed multiple myeloma. Nat. Commun. 11, 3617 (2020).
Bettegowda, C. et al. Detection of circulating tumor DNA in early- and late-stage human malignancies. Sci. Transl. Med. 6, 224ra24 (2014).
Klega, K. et al. Detection of somatic structural variants enables quantification and characterization of circulating tumor DNA in children with solid tumors. JCO Precis. Oncol. 2018, PO.17.00285 (2018).
Peneder, P. et al. Multimodal analysis of cell-free DNA whole-genome sequencing for pediatric cancers with low mutational burden. Nat. Commun. 12, 3230 (2021).
Andersson, D., Fagman, H., Dalin, M. G. & Ståhlberg, A. Circulating cell-free tumor DNA analysis in pediatric cancers. Mol. Asp. Med. 72, 100819 (2020).
Robbe, P. et al. Clinical whole-genome sequencing from routine formalin-fixed, paraffin-embedded specimens: pilot study for the 100,000 Genomes Project. Genet. Med. 20, 1196–1205 (2018).
Wong, M. et al. Whole genome, transcriptome and methylome profiling enhances actionable target discovery in high-risk pediatric cancer. Nat. Med. 26, 1742–1753 (2020).
Nangalia, J. & Campbell, P. J. Genome sequencing during a patient’s journey through cancer. N. Engl. J. Med. 381, 2145–2156 (2019).
Landrum, M. J. et al. ClinVar: public archive of interpretations of clinically relevant variants. Nucleic Acids Res. 44, D862–D868 (2016).
Forbes, S. A. et al. COSMIC: somatic cancer genetics at high-resolution. Nucleic Acids Res. 45, D777–D783 (2017).
Karczewski, K. J. et al. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature 581, 434–443 (2020).
McLaren, W. et al. The ensembl variant effect predictor. Genome Biol. 17, 122 (2016).
Richards, S. et al. Standards and guidelines for the interpretation of sequence variants: a joint consensus recommendation of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology. Genet. Med. 17, 405–424 (2015).
Alberts, D. & Hess, L. M. Fundamentals of Cancer Prevention (Springer Science & Business Media, 2008).
Walsh, M. F. et al. Integrating somatic variant data and biomarkers for germline variant classification in cancer predisposition genes. Hum. Mutat. 39, 1542–1552 (2018).
Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25, 1754–1760 (2009).
Dobin, A. et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29, 15–21 (2013).
Bergmann, E. A., Chen, B.-J., Arora, K., Vacic, V. & Zody, M. C. Conpair: concordance and contamination estimator for matched tumor-normal pairs. Bioinformatics 32, 3196–3198 (2016).
Pedersen, B. S. & Quinlan, A. R. Mosdepth: quick coverage calculation for genomes and exomes. Bioinformatics 34, 867–868 (2017).
Nik-Zainal, S. et al. The life history of 21 breast cancers. Cell 149, 994–1007 (2012).
Kim, S. et al. Strelka2: fast and accurate calling of germline and somatic variants. Nat. Methods 15, 591–594 (2018).
Van der Auwera, G. A. & O’Connor, B. D. Genomics in the Cloud: Using Docker, GATK, and WDL in Terra (O’Reilly Media, Incorporated, 2020).
Jones, D. et al. cgpCaVEManWrapper: simple execution of CaVEMan in order to detect somatic single nucleotide variants in NGS data. Curr. Protoc. Bioinforma. 56, 15.10.1–15.10.18 (2016).
Ye, K., Schulz, M. H., Long, Q., Apweiler, R. & Ning, Z. Pindel: a pattern growth approach to detect break points of large deletions and medium sized insertions from paired-end short reads. Bioinformatics 25, 2865–2871 (2009).
Wala, J. A. et al. SvABA: genome-wide detection of structural variants and indels by local assembly. Genome Res. 28, 581–591 (2018).
Cameron, D. L. et al. GRIDSS: sensitive and specific genomic rearrangement detection using positional de Bruijn graph assembly. Genome Res. 27, 2050–2060 (2017).
Niu, B. et al. MSIsensor: microsatellite instability detection using paired tumor-normal sequence data. Bioinformatics 30, 1015–1016 (2014).
Menzies, A. et al. VAGrENT: variation annotation generator. Curr. Protoc. Bioinforma. 52, 15.8.1–15.8.11 (2015).
Erik Garrison, G. M. Haplotype-based variant detection from short-read sequencing. arXiv https://doi.org/10.48550/arXiv.1207.3907 (2012).
Nicorici, D. et al. FusionCatcher - a tool for finding somatic fusion genes in paired-end RNA-sequencing data. https://doi.org/10.1101/011650 (2014).
Haas, B. J. et al. Accuracy assessment of fusion transcript detection via read-mapping and de novo fusion transcript assembly-based methods. Genome Biol. 20, 213 (2019).
Vu, T. N. et al. A fast detection of fusion genes from paired-end RNA-seq data. BMC Genomics 19, 786 (2018).
Uhrig, S. et al. Accurate and efficient detection of gene fusions from RNA sequencing data. Genome Res. 31, 448–460 (2021).
Srivastava, A. et al. Alignment and mapping methodology influence transcript abundance estimation. Genome Biol. 21, 239 (2020).
Pedregosa, F. et al. Scikit-learn: machine learning in python. J. Mach. Learn. Res. 12, 2825–2830 (2011).
Blokzijl, F., Janssen, R., van Boxtel, R. & Cuppen, E. MutationalPatterns: comprehensive genome-wide analysis of mutational processes. Genome Med. 10, 33 (2018).
Kucab, J. E. et al. A compendium of mutational signatures of environmental agents. Cell 177, 821–836.e16 (2019).
Shen, R. & Seshan, V. E. FACETS: allele-specific copy number and clonal heterogeneity analysis tool for high-throughput DNA sequencing. Nucleic Acids Res. 44, e131 (2016).
Cheng, D. T. et al. Memorial Sloan Kettering-integrated mutation profiling of actionable cancer targets (MSK-IMPACT): a hybridization capture-based next-generation sequencing clinical assay for solid tumor molecular oncology. J. Mol. Diagn. 17, 251–264 (2015).
Dang, H. X. et al. ClonEvol: clonal ordering and visualization in cancer sequencing. Ann. Oncol. 28, 3076–3082 (2017).
Ding, Z. et al. Estimating telomere length from whole genome sequence data. Nucleic Acids Res. 42, e75 (2014).
Danecek, P. et al. Twelve years of SAMtools and BCFtools. GigaScience 10, giab008 (2021).
The authors would like to acknowledge Drs T. Heaton, M. LaQuaglia M, and J. Gerstle for executing surgical resections, Drs N. Schultz, D. Chakravarty, and M. Nissan for assistance with OncoKb annotations, Drs. D. Solit and A. Kentsis for support and interesting discussions, and Drs. N.K. Cheung, B. Kushner, E. Basu, P. Meyers, L. Wexler, P. Kothari, I. Dunkel, S. Gilheeney, Y. Khakoo, K. Kramer, S. Prockop, and T. Trippett for participation and clinical contributions to the study. E.P. is a Josie Robertson Investigator and is supported by the European Hematology Association, American Society of Hematology, Gabrielle’s Angels Foundation, V Foundation, and The Geoffrey Beene Foundation, and a Damon-Runyon Rachleff Innovator Award recipient. Funding for this study was supported by the Olayan Fund for Precision Pediatric Cancer Medicine.
E.P., A.L.K and J.S.M.M are founders, equity holders and hold fiduciary roles in Isabl Inc. E.P., A.L.K, J.S.M.M, M.F.L.G.G, J.Z, J.E.A.O, J.G.A, N.S are inventors on intellectual property related to a software platform for genomic data analytics, a non-provisional patent application has been filed 17/629292 titled “SYSTEMS AND METHODS FOR CANCER WHOLE GENOME AND TRANSCRIPTOME SEQUENCING (CWGTS)”. G.G. is a consultant in Isabl Inc.D.G. is a consultant and a shareholder of Repare Therapeutics. D.G. is a consultant and stock-option holder in MNM Diagnostics.
Peer review information
Nature Communications thanks Edwin Cuppen and the other anonymous reviewer(s) for their contribution to the peer review of this work. Peer review reports are available.
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Shukla, N., Levine, M.F., Gundem, G. et al. Feasibility of whole genome and transcriptome profiling in pediatric and young adult cancers. Nat Commun 13, 2485 (2022). https://doi.org/10.1038/s41467-022-30233-7