The genomic landscape of tuberous sclerosis complex

Tuberous sclerosis complex (TSC) is a rare genetic disease causing multisystem growth of benign tumours and other hamartomatous lesions, which leads to diverse and debilitating clinical symptoms. Patients are born with TSC1 or TSC2 mutations, and somatic inactivation of wild-type alleles drives MTOR activation; however, second hits to TSC1/TSC2 are not always observed. Here, we present the genomic landscape of TSC hamartomas. We determine that TSC lesions contain a low somatic mutational burden relative to carcinomas, that a subset feature large-scale chromosomal aberrations, and that highly conserved molecular signatures exist for each type. Analysis of the molecular signatures coupled with computational approaches reveals unique aspects of cellular heterogeneity and cell origin. Using immune data sets, we identify significant neuroinflammation in TSC-associated brain tumours. Taken together, this molecular catalogue of TSC serves as a resource for understanding the origin of these hamartomas and provides a framework that unifies genomic and transcriptomic dimensions for complex tumours.

The resulting mutations were manually inspected for artifacts such as strand bias and residual bias from PCR duplication.

TSC1/TSC2 Mutation Calling
The germline or somatic origin of TSC1/TSC2 mutations in tumors lacking matched non-tumor tissue controls was predicted from features of known germline and somatic mutations as follows. First, all normal tissues had a maximum of one TSC mutation and, with the exception of two very low frequency mosaic mutations (74-MG1 and 57-UG1), these were always found at variant allelic fractions (VAF) > 40% when SNVs or INDELs (excluding mosaic mutations: median 49%; range 40%-72%). These are consistent with heterozygous events affecting an entire population of diploid cells (i.e., germline mutations). Moreover, the mutations detected in normal tissues were always also detected in tumor tissues from the same patient. Second, CN-LOH events were found exclusively in tumor tissues and always co-occurred with a germline mutation (identified in paired normal tissue). When the germline event was a point mutation, its VAF was always higher in the tumor than [...] classified as somatic if it was found at VAF < 40% and occurred with a second mutation at VAF > 40%, which was then classified as germline. If the co-occurring mutation was a large deletion (for which relative frequency could not be clearly determined), both were classified as "unclear" origin. (4) A point mutation was classified as germline if it was found at VAF > 40% and co-occurred with a second mutation at VAF < 40% (if both were > 40% VAF, they were classified as "unclear" origin). (5) If a tumor had a single mutation at VAF < 40%, it was classified as "unclear" origin (it could represent a mosaic primary mutation or a tumor-specific mutation in the absence of an identifiable primary mutation). Similarly, if a tumor contained only a single large deletion, it was classified as "unclear" origin (as the relative frequency could not be clearly determined). (6) If a mutation was observed in two independent tumors (tumors of distinct type) from the same patient, it was considered a germline event.
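The VAF-based rules that survive in the text above can be sketched as a small decision function. This is an illustrative sketch only, not the authors' code: the function name `classify_origins`, the `shared_across_tumor_types` flag, and the "not covered" label are hypothetical, and treating a VAF of exactly 40% as "high" is an assumption (the text only states < 40% and > 40%). Combinations the surviving text does not address are deliberately left unlabeled rather than guessed.

```python
from typing import List, Optional

VAF_CUTOFF = 0.40  # threshold from the text; counting exactly 0.40 as "high" is an assumption


def classify_origins(
    vafs: List[Optional[float]],
    shared_across_tumor_types: Optional[List[bool]] = None,
) -> List[str]:
    """Label each TSC1/TSC2 mutation found in one tumor without a matched normal.

    vafs: one entry per mutation; None marks a large deletion whose relative
          frequency cannot be determined.
    shared_across_tumor_types: True where the same mutation was also seen in an
          independent tumor of a distinct type from the same patient.
    """
    n = len(vafs)
    shared = shared_across_tumor_types or [False] * n
    labels = ["not covered"] * n  # hypothetical label for cases the text does not state

    if n == 1:
        v = vafs[0]
        # Single large deletion, or single low-VAF mutation: origin unclear.
        if v is None or v < VAF_CUTOFF:
            labels = ["unclear"]
    elif n == 2:
        a, b = vafs
        if a is None or b is None:
            point = b if a is None else a
            # Low-VAF point mutation co-occurring with a large deletion: both unclear.
            if point is not None and point < VAF_CUTOFF:
                labels = ["unclear", "unclear"]
        elif a >= VAF_CUTOFF and b >= VAF_CUTOFF:
            labels = ["unclear", "unclear"]
        elif a < VAF_CUTOFF <= b:
            labels = ["somatic", "germline"]
        elif b < VAF_CUTOFF <= a:
            labels = ["germline", "somatic"]

    # A mutation shared by two independent tumors is germline; giving this rule
    # precedence over the VAF rules is an assumption of this sketch.
    return ["germline" if s else lab for s, lab in zip(shared, labels)]
```

For example, `classify_origins([0.22, 0.51])` labels the first mutation somatic and the second germline, while a single mutation at VAF 0.22 is labeled "unclear" unless it is flagged as shared across tumor types.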

RNA Sequencing and Differential Gene Expression Analysis
RNA sequencing was completed at the HAIB GSL. Messenger RNA (mRNA) libraries were prepared from total RNA samples using NEBNext reagents (New England BioLabs). Samples underwent directional sequencing on the Illumina HiSeq 2500 using v4 reagents and 100 bp paired-end reads. RNA sequencing read quality was assessed using FASTQC v0.11.3 (http://www.bioinformatics.bbsrc.ac.uk/projects/fastqc/). Reads were aligned to the hg19 genome using Subread (v1.4.5) 9 with default parameters. Raw read counts to known exons were obtained using FeatureCounts v1.4.5 10 and imported into R 11 for differential expression analysis via limma (v. 3.28.14) 11. Counts per million (CPM) were calculated and log2 transformed using voom 12.
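As a rough illustration of the log2-CPM step, the sketch below reproduces in Python the transformation that limma's voom applies to raw counts, log2((count + 0.5) / (library size + 1) × 10^6). It is not the authors' pipeline, which ran in R; the function name `voom_style_log_cpm` is hypothetical, library sizes are assumed to be plain column sums with no additional normalization factors, and the precision weights that voom also estimates are omitted.

```python
import math
from typing import List, Optional


def voom_style_log_cpm(
    counts: List[List[float]],
    lib_sizes: Optional[List[float]] = None,
) -> List[List[float]]:
    """Return log2 counts per million for a genes x samples count matrix.

    counts: one row per gene, one column per sample.
    lib_sizes: per-sample library sizes; defaults to column sums (assumption).
    """
    n_samples = len(counts[0])
    if lib_sizes is None:
        lib_sizes = [sum(row[j] for row in counts) for j in range(n_samples)]
    # The 0.5 and 1 offsets avoid taking the log of zero for unexpressed genes.
    return [
        [
            math.log2((row[j] + 0.5) / (lib_sizes[j] + 1.0) * 1e6)
            for j in range(n_samples)
        ]
        for row in counts
    ]
```

The offsets keep genes with zero counts finite, which matters for lowly expressed genes in small libraries.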

SNP Arrays and Copy Number Analysis
Copy number analysis was performed using Infinium HumanOmni2.5S Arrays (Illumina) at the HAIB GSL.
From raw IDAT files, the GenomeStudio (v2011.1) Genotyping Module (v1.9) was used to call genotypes and estimate total copy number, log R ratio (LRR), and B-allele frequency (BAF) for each SNP (Illumina). Allele detection and genotype calling were performed using default parameters and the appropriate manifest file (HumanOmni2-5-8-v1-1-C.bpm or HumanOmni25-8v1-2_A1.bpm). For each tumor, total genome-wide copy number estimates were refined using tangent normalization, in which tumor signal intensities are divided by signal intensities from the linear combination of normal samples in the cohort (Tabak B. and Beroukhim R., manuscript in preparation). Individual copy number estimates then underwent segmentation using the Circular Binary Segmentation algorithm 15. As part of this process of copy number assessment and segmentation, regions corresponding to germline copy number variations (CNVs) were removed by applying filters generated from germline samples from The Cancer Genome Atlas. Samples with over-segmentation (defined as more than 1000 copy number segments after Circular Binary Segmentation with no enrichment on any particular chromosome) or low data quality were removed from further analysis. Per-sample arm-level and gene-level copy ratios were identified from segmented data using GISTIC 2.0.22 16.

[...] and calculate the beta value (Level 3 data) for each probe and sample with the R-based methylumi package.
Dye-bias normalization and normalization were performed as previously described 20. The level of DNA methylation at each CpG locus is summarized as a beta (β) value, calculated as (M/(M+U)) and ranging from 0 to 1, which represents the ratio of the methylated probe intensity to the overall intensity at each CpG locus. A p value comparing the intensity of each probe to the background level was calculated with the methylumi package at the same time, and data points with detection p values > 0.05 were deemed not significantly different from background measurements and were therefore masked as "NA" in the analyses.

[...] Green-dUTP, Orange-dUTP, or Red-dUTP (Abbott Molecular Inc., Abbott Park, IL), by nick translation. Tumor touch preparations were made by imprinting thawed tumors onto positively charged glass slides. The sample slides were fixed in methanol:acetic acid (3:1) for 30 min, air-dried, aged in 2X saline/sodium citrate (SSC) at 60 °C for 27 min, digested with 0.005% pepsin at 37 °C for 5 min, and washed with 1X PBS for 5 min. Slides were placed in 1% formaldehyde/PBS for 10 min at room temperature, washed with 1X PBS for 5 min, and dehydrated in an ethanol series (70%, 85%, 95%) for 2 min each. Slides were then denatured in 70% formamide/2X SSC at 74 °C for 3.5 min, washed in a cold ethanol series (70%, 85%, 95%) for 2 min each, and air-dried. FISH probes were denatured at 75 °C for 5 min and held at 37 °C for 10-30 min until 10 µl of probe was applied to each sample slide. Coverslips were adhered and slides hybridized overnight at 37 °C in a ThermoBrite hybridization system (Abbott Molecular Inc.). The post-hybridization wash was with 2X SSC at 73 °C for 3 min followed by a brief water rinse. Slides were air-dried and then counterstained with VectaShield mounting medium with 4',6-diamidino-2-phenylindole (DAPI) (Vector Laboratories Inc., Burlingame, CA).
Image acquisition was performed at 600x or 1000x system magnification with a COOL-1300 SpectraCube camera (Applied Spectral Imaging-ASI, Vista, CA) mounted on an Olympus BX43 microscope. Images were analyzed using FISHView v7 software (ASI) and at least 200 interphase nuclei were scored for each sample.
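Returning to the DNA methylation summary described earlier, the beta value and the detection p value mask can be illustrated with a short sketch. This is not the methylumi implementation, which runs in R; the function name `beta_values` and its arguments are hypothetical, `None` stands in for the "NA" mask, and masking probes whose total intensity is zero is an added assumption to avoid division by zero.

```python
from typing import List, Optional


def beta_values(
    meth: List[float],
    unmeth: List[float],
    det_p: List[float],
    p_cutoff: float = 0.05,
) -> List[Optional[float]]:
    """Compute beta = M / (M + U) per probe, masking unreliable probes.

    meth, unmeth: methylated (M) and unmethylated (U) probe intensities.
    det_p: detection p values comparing each probe to background.
    Probes with det_p > p_cutoff are returned as None (the "NA" mask).
    """
    out: List[Optional[float]] = []
    for m, u, p in zip(meth, unmeth, det_p):
        if p > p_cutoff or (m + u) == 0:
            # Not distinguishable from background (or no signal): mask it.
            out.append(None)
        else:
            out.append(m / (m + u))
    return out
```

For instance, a probe with M = 300, U = 100, and a detection p value of 0.001 yields β = 0.75, whereas a probe with a detection p value of 0.2 is masked regardless of its intensities.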