Recurrent intragenic rearrangements of EGFR and BRAF in soft tissue tumors of infants

Soft tissue tumors of infancy encompass an overlapping spectrum of diseases that pose unique diagnostic and clinical challenges. We studied genomes and transcriptomes of cryptogenic congenital mesoblastic nephroma (CMN), and extended our findings to five anatomically or histologically related soft tissue tumors: infantile fibrosarcoma (IFS), nephroblastomatosis, Wilms tumor, malignant rhabdoid tumor, and clear cell sarcoma of the kidney. A key finding is recurrent mutation of EGFR in CMN by internal tandem duplication of the kinase domain, thus delineating CMN from other childhood renal tumors. Furthermore, we identify BRAF intragenic rearrangements in CMN and IFS. Collectively these findings reveal novel diagnostic markers and therapeutic strategies and highlight a prominent role of isolated intragenic rearrangements as drivers of infant tumors.

Many childhood tumors show a predilection for specific developmental stages. Tumors that predominantly occur in infancy include congenital mesoblastic nephroma (CMN), which accounts for 4% of all childhood renal malignancies and the majority of those diagnosed in children under 6 months of age 1,2 . CMN is classified histologically into classical, cellular, and mixed subtypes based primarily on degree of cellularity and mitotic activity 3 . The cellular variant is characterized by a sarcoma-like diffuse hypercellular morphology, whereas classical CMN is composed of less proliferative spindle cells 3 . Cellular CMN is driven by rearrangements involving the tropomyosin receptor kinase (TRK) gene NTRK3, most commonly a t(12;15)(p13;q25) reciprocal translocation with the ETV6 transcription factor 4,5 . Less frequent somatic aberrations include trisomies of chromosomes 8, 11, 17, and 20 6,7 and rarer TRK fusions, involving NTRK1, NTRK2, or NTRK3 8 . By contrast, the genetic changes underpinning the classical variant, accounting for >30% of cases, are unknown 9 . Cellular CMN shares its genetic and morphological hallmarks with infantile fibrosarcoma (IFS), a spindle cell tumor typically arising in the soft tissues of the extremities or abdomen 5,9,10 .
Standard treatment for CMN and IFS is complete surgical resection 9-11 . In the case of IFS, local control frequently requires cytotoxic chemotherapy 10,11 . The role for up-front chemotherapy in CMN is less clear 9 . Recently, a phase I/II clinical trial of a selective TRK inhibitor, larotrectinib, reported high response rates in diverse tumor types harboring TRK gene fusions, including IFS and other soft tissue tumors of infancy 12 . Morbidity and infrequent death result from tumor recurrence or from treatment-related complications 9-11 .
Here, we investigated the genetic basis of CMN and IFS lacking the canonical NTRK3-ETV6 fusion gene. We identify oncogenic rearrangements in MAPK signaling genes across all cases interrogated by unbiased sequencing, notably therapeutically tractable intragenic rearrangements in EGFR and BRAF.

Results
Overview of the genomic landscape of CMN. To identify the genetic basis of cryptogenic CMN, we first applied whole genome and transcriptome sequencing to a discovery cohort of ten classical CMN lacking an NTRK3 fusion (Supplementary Data 1). Somatic variants were identified by comparing tumor and matched peripheral blood sequences (see Methods). The genomic landscape was universally quiet, with a low burden of point mutations (median of 45 substitutions and 9 insertions or deletions per genome; Supplementary Data 2). The predominant mutational signatures, as defined by the trinucleotide context of substitutions, were the ubiquitous signatures 1 and 5 13 .
Internal tandem duplication of the EGFR kinase domain in CMN. Annotating all cases for potential oncogenic variants revealed a single intragenic, in-frame internal tandem duplication (ITD) of the EGFR kinase domain in all ten tumors (Table 1; Fig. 1; Supplementary Data 3). The breakpoints clustered in a narrow genomic window around the kinase domain of EGFR encoded in exons 18−25 (Fig. 1a). This rearrangement has been observed, albeit rarely, in other tumor types, including glioma and lung adenocarcinoma, and confers sensitivity to a targeted EGFR inhibitor, afatinib 14 . We validated all rearrangements by genomic copy number analysis and reconstruction of cDNA reads spanning the breakpoint junction (Fig. 1; see Methods). Of note, the same mutant cDNA junction sequence was found in every case, irrespective of the genomic location of breakpoints. A search for additional known or novel driver variants revealed no further plausible candidates in any of the EGFR-mutant tumors. We next extended this investigation to seven non-classical CMN lacking an NTRK3 fusion, including four mixed cellularity cases and three cellular tumors (Table 1; Supplementary Data 1). Two of the four mixed cellularity tumors surveyed also harbored an EGFR-ITD. Of note, for one child with EGFR-ITD-positive mixed cellularity CMN (PD37214), both primary tumor and recurrence were studied, with no additional driver events apparent at relapse.
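The observation that every tumor yielded the same mutant cDNA junction, whatever the genomic breakpoints, can be illustrated with a toy model. The sketch below assumes that the breakpoints fall within the introns flanking the duplicated exons, so that splicing joins the last duplicated exon to the first one; this is an explanatory assumption, not a mechanism stated in the text. The gene model, its coordinates, and the names `Exon`, `toy_gene`, and `itd_transcript_junction` are hypothetical and do not reflect real EGFR coordinates.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Exon:
    number: int
    start: int  # inclusive genomic coordinate (toy units)
    end: int    # inclusive genomic coordinate (toy units)


def toy_gene(n_exons: int = 28, exon_len: int = 100, intron_len: int = 1000) -> List[Exon]:
    """Build a hypothetical gene with evenly spaced exons (not real EGFR coordinates)."""
    step = exon_len + intron_len
    return [Exon(i, (i - 1) * step + 1, (i - 1) * step + exon_len)
            for i in range(1, n_exons + 1)]


def itd_transcript_junction(exons: List[Exon], dup_start: int, dup_end: int) -> Tuple[int, int]:
    """Return (donor_exon, acceptor_exon) for the novel splice junction created by a
    tandem duplication of the genomic interval [dup_start, dup_end]."""
    # Only exons lying wholly inside the duplicated interval are duplicated intact.
    duplicated = [e for e in exons if dup_start <= e.start and e.end <= dup_end]
    if not duplicated:
        raise ValueError("duplicated interval contains no complete exon")
    # The last exon of the first copy is spliced to the first exon of the second copy.
    return duplicated[-1].number, duplicated[0].number


gene = toy_gene()
# Two different pairs of intronic breakpoints around exons 18 to 25 of the toy gene.
print(itd_transcript_junction(gene, 17800, 26600))  # (25, 18)
print(itd_transcript_junction(gene, 18600, 27400))  # (25, 18)
```

In this model, any pair of breakpoints placed in the same two introns produces the same exon 25 to exon 18 junction in the transcript, which is one way a single cDNA junction sequence could arise from varying genomic breakpoints.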
BRAF rearrangements in CMN and IFS. A further striking finding was the discovery of mutations in the BRAF oncogene in 2/3 cellular histology CMNs. BRAF fusions have been implicated in a minority of IFS but not in CMN 15 . In both cases the BRAF rearrangement involved a compound deletion of conserved region 1 (CR1) and tandem duplication of exon 2 (Fig. 2; Table 1; Supplementary Data 3). CR1 encompasses the negative regulatory Ras-binding domain (RBD), loss of which is predicted to generate a constitutively active form of BRAF 16,17 . Mutated tumors displayed intense staining of phosphorylated ERK by immunohistochemistry, consistent with activated signaling downstream of BRAF (Figs. 1e and 2e). A further tumor harbored the KIAA1549-BRAF fusion, a molecular hallmark of a childhood brain tumor, pilocytic astrocytoma 18,19 . This fusion likewise results in loss of the N-terminal portion of the BRAF protein containing the RBD 17,18 .
Other TRK fusions in CMN. The remaining two cases of CMN interrogated by whole genome and transcriptome sequencing were accounted for by gene fusions involving NTRK1, an alternate kinase of the TRK family of protein kinases: TPR-NTRK1 and LMNA-NTRK1. Both of these fusions have been observed in IFS and rarely in adult cancers, but not, to our knowledge, in CMN 20-23 (Table 1). Hence, every cryptogenic CMN interrogated by whole-genome sequencing contained an oncogenic rearrangement in BRAF, EGFR, or NTRK1, all of which encode kinases involved in MAPK signaling and are amenable to inhibition with existing drugs 9,12,14,17,24 .
Table 1; Supplementary Data 1). EGFR-ITD was most prevalent in classical and mixed cellularity CMN, though it was also found in cellular CMN (2/17 cases). The frequency of EGFR rearrangement in classical tumors was lower in the validation cohort (20/35 cases) than in the initial discovery cohort (10/10 cases). None of the IFS cases, nor other childhood kidney tumors, harbored EGFR-ITD. However, we encountered three cases of IFS with intragenic BRAF deletions (BRAF-ID). Remarkably, in two cases BRAF-ID co-occurred with NTRK3 fusions, the disease-defining mutation of IFS. We were unable to accurately estimate relative allele frequencies by nested PCR (see Methods). Hence, it is possible that both fusions co-exist within the same clone or represent independent clones that evolved in parallel within the same tumor.
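The difference in EGFR-ITD frequency between the discovery cohort (10/10) and the validation cohort (20/35) of classical CMN can be put on a rough quantitative footing. The following is an illustrative calculation only, not an analysis reported by the authors: a two-sided Fisher exact test written from scratch, with the hypothetical function name `fisher_exact_two_sided`. It treats the two cohorts as independent samples, which is itself an assumption.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, col1, n = a + b, a + c, a + b + c + d
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)

    def weight(x: int) -> int:
        # Unnormalized hypergeometric probability, kept as an exact integer.
        return comb(col1, x) * comb(n - col1, row1 - x)

    observed = weight(a)
    # Sum all tables that are no more probable than the observed one.
    extreme = sum(weight(x) for x in range(lo, hi + 1) if weight(x) <= observed)
    return extreme / comb(n, row1)


# Rows: discovery cohort, validation cohort. Columns: EGFR-ITD present, absent.
p_value = fisher_exact_two_sided(10, 0, 20, 15)
print(f"discovery 10/10 = {10 / 10:.0%}, validation 20/35 = {20 / 35:.0%}, p = {p_value:.3f}")
```

Under these assumptions the test gives a p-value of roughly 0.02. Because the discovery cohort was not a random draw from the same population as the validation cohort, the figure is best read as descriptive.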

Discussion
In this exploration of infant tumors we identify ITD of the EGFR kinase domain that delineates a genetic subgroup of CMN transcending histological subtypes. Additionally, we report a novel rearrangement of BRAF present in both cellular CMN and IFS. These mutations represent diagnostic markers that can be readily integrated into routine clinical practice. Furthermore, EGFR and BRAF emerge as therapeutic targets, which may be exploited in certain clinical situations, e.g., large surgically intractable tumors, disease recurrence or metastases. It is noteworthy that an oncogenic mutation was identified in every tumor that we studied by whole-genome sequencing. Of these, 78% harbored either EGFR-ITD or BRAF-ID, while the remaining 22% presented with non-canonical mutations involving BRAF, NTRK1, or NTRK3. This suggests that less recurrent rearrangement variants, albeit implicated in the same signaling circuitry, may elude detection by targeted diagnostic assays. Moreover, our results indicate that a subset of tumors harbor multiple drivers, with important implications for targeted therapy efforts. The finding of co-mutation of NTRK3 and BRAF in IFS raises the possibility of intrinsic resistance of some tumors to TRK inhibition, regardless of whether these mutations occur in the same clone or in independent competing clones. This finding is pertinent to clinical trials of TRK inhibitors in CMN and IFS 12 . In this vein, a structurally similar BRAF fusion transcript, albeit without duplication of exon 2, has recently been implicated as a mechanism of resistance to certain BRAF/MEK inhibitors 16,17 . These considerations underscore the need for adequate genomic profiling in order to match patients to the most appropriate basket studies and to enable meaningful interpretation of treatment responses. Therefore, we would advocate extending the diagnostic work-up of refractory or relapsed CMN and IFS to whole genome sequencing, particularly in the context of clinical trials.
Biologically, our findings draw further parallels between CMN and IFS. We identify BRAF and NTRK1 as additional cancer genes operative in both malignancies, substantiating the view that these diagnoses represent variants on the same disease spectrum converging on aberrant RAS-RAF-MEK-ERK signaling 5,8,9 . Furthermore, in the wider context of the childhood cancer genome, our findings add to the growing body of studies that identify short distance intragenic rearrangements as a dominant source of oncogenic mutations in otherwise quiet genomes. We note the parallel between CMN, clear cell sarcoma of the kidney and low-grade glioma that are in large part driven by ITDs often involving kinase domains, mostly as isolated driver events 18,25-29 . Furthermore, even in acute myeloid leukemia, where FLT3-ITD is a recurrent driver event in adult disease, childhood AML demonstrates a distinct structural variant profile enriched for focal chromosomal gains and losses 30 . We can only speculate on the biological significance of this parallel, which may allude to specific mutational mechanisms operative during discrete stages of human development.

Methods
Sequencing. Tumor DNA and RNA were extracted from fresh frozen tissue that had been reviewed by reference pathologists. Normal tissue DNA was derived from blood samples. Whole genome sequencing was performed by 150-bp paired-end sequencing on the Illumina HiSeq X platform. We followed the Illumina no-PCR library protocol to construct short insert libraries, prepare flowcells, and generate clusters. Coverage was at least 30×. Messenger RNA was enriched by polyA-
Variant validation. The Cancer Genome Project (Wellcome Trust Sanger Institute) variant calling pipeline has been continually validated and benchmarked 40,41 . We confirmed variant calling quality through manual visual inspection of raw sequencing reads for 8% of all variants called. All rearrangements reported were validated by reconstruction at base pair resolution and by cDNA reads spanning the breakpoint junction.
Analysis of mutations in cancer genes. We considered variants as potential drivers if they presented in established cancer genes 42 . Tumor suppressor coding variants were considered if they were annotated as functionally deleterious by an in-house version of VAGrENT (http://cancerit.github.io/VAGrENT/) 43 or were disruptive rearrangement breakpoints or focal (<1 Mb) homozygous deletions. Mutations in oncogenes were considered driver events if they were located at previously reported canonical hot spots (point mutations) or amplified the intact gene. Amplifications also had to be focal (<1 Mb) and increase the copy number of oncogenes to a minimum of five copies for a diploid genome. To search for driver variants in novel cancer genes or in non-coding regions, we employed previously developed statistical methods that identify significant enrichment of mutations, taking into account various confounders such as overall mutation burden and local variation in the mutability of the genomic region 44 .
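The criteria listed in this paragraph can be written as a small rule set. The sketch below is a minimal illustration under stated assumptions, not the pipeline used in the study: the `Variant` record, its field names, and the string labels are hypothetical, and only the thresholds (<1 Mb for focal events, at least five copies in a diploid genome) come from the text. Structural rearrangements of oncogenes, such as the ITDs described in Results, are not covered by the criteria listed here and therefore fall outside the sketch.

```python
from dataclasses import dataclass
from typing import Optional

FOCAL_MAX_BP = 1_000_000   # "focal" means smaller than 1 Mb
MIN_AMPLIFIED_COPIES = 5   # minimum copy number, assuming a diploid genome


@dataclass
class Variant:
    gene_role: str                       # "oncogene" or "tumor_suppressor" (hypothetical labels)
    kind: str                            # "coding", "amplification", "homozygous_deletion", "rearrangement"
    in_established_cancer_gene: bool = True
    size_bp: Optional[int] = None        # event size, for copy number events
    copy_number: Optional[int] = None    # total copies, for amplifications
    canonical_hotspot: bool = False      # point mutation at a previously reported hot spot
    gene_intact: bool = True             # amplification spans the intact gene
    annotated_deleterious: bool = False  # e.g. as annotated by a tool such as VAGrENT
    disruptive: bool = False             # rearrangement breakpoint disrupts the gene


def is_focal(v: Variant) -> bool:
    return v.size_bp is not None and v.size_bp < FOCAL_MAX_BP


def meets_listed_criteria(v: Variant) -> bool:
    """Apply only the driver criteria listed in the paragraph above."""
    if not v.in_established_cancer_gene:
        return False
    if v.gene_role == "oncogene":
        if v.kind == "coding":
            return v.canonical_hotspot
        if v.kind == "amplification":
            return (v.gene_intact and is_focal(v)
                    and v.copy_number is not None
                    and v.copy_number >= MIN_AMPLIFIED_COPIES)
        return False  # other oncogene events are not covered by the listed rules
    if v.gene_role == "tumor_suppressor":
        if v.kind == "coding":
            return v.annotated_deleterious
        if v.kind == "rearrangement":
            return v.disruptive
        if v.kind == "homozygous_deletion":
            return is_focal(v)
    return False


# An 800 kb amplification of an intact oncogene to five copies passes the listed rules.
print(meets_listed_criteria(Variant("oncogene", "amplification", size_bp=800_000, copy_number=5)))
```

Keeping the thresholds as named constants makes the two numeric cut-offs from the text easy to audit or adjust, for example when the sample is not diploid.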
Targeted mutation screening. RNA from frozen tumors (1 µg) or corresponding to approximately 5 cm² of 10 µm FFPE sections was reverse transcribed using oligo-dT or random hexamer primers (RevertAid first strand cDNA synthesis kit, ThermoFisher). PCR screening was performed using primer combinations that allow amplification of candidate alterations as well as additional control fragments from the unaffected allele to assess cDNA quality. Amplified fragments were sequenced by Sanger sequencing (GATC, Konstanz, Germany) using primers detailed in Supplementary Table 1.
Code availability. The algorithms used to analyze sequencing data are available at http://cancerit.github.io/.
Data availability. All data supporting the findings of this study are available within the article and its supplementary files or from the corresponding author on reasonable request. Sequencing data have been deposited at the European Genome-Phenome Archive (http://www.ebi.ac.uk/ega/) that is hosted by the European Bioinformatics Institute (accession numbers EGAS00001002534 and EGAS00001002171).