Tracking Subclonal Mutation Frequencies Throughout Lymphomagenesis Identifies Cancer Drivers in Mouse Models of Lymphoma

Determining whether recurrent but rare cancer mutations are bona fide driver mutations remains a bottleneck in cancer research. Here we present the most comprehensive analysis of retrovirus driven lymphomagenesis produced to date, sequencing 700,000 mutations from >500 malignancies collected at time points throughout tumor development. This enabled identification of positively selected events, and the first demonstration of negative selection of mutations that may be deleterious to tumor development indicating novel avenues for therapy. Customized sequencing and bioinformatics methodologies were developed to quantify subclonal mutations in both premalignant and malignant tissue, greatly expanding the statistical power for identifying driver mutations and yielding a high-resolution, genome wide map of the selective forces surrounding cancer gene loci. Screening two BCL2 transgenic models confirms known drivers of human B-cell non-Hodgkin lymphoma, and implicates novel candidates including modifiers of immunosurveillance such as co-stimulatory molecules and MHC loci. Correlating mutations with genotypic and phenotypic features also gives robust identification of known cancer genes independently of local variance in mutation density. An online resource http://mulv.lms.mrc.ac.uk allows customized queries of the entire dataset.


INTRODUCTION
The majority of verified human cancer genes have been identified by translocation breakpoints, recurrent exonic mutations, and focal copy number aberrations (Futreal et al. 2004). Increasing cohort sizes of human tumor sequencing has revealed large numbers of rare clonal mutations, however their contribution to disease is more difficult to prove due to a lack of statistical power, giving rise to false positives and negatives (Lawrence et al. 2013). It is similarly challenging to determine how deregulation of intact open reading frames by non-coding mutations, large-scale copy number alterations and epigenetic mechanisms contributes to disease. The data available to identify cancer drivers from tumor sequencing studies could be increased through the study of subclonal mutations in both premalignant samples as well as mature tumors, however this requires quantification of both in numbers sufficient to demonstrate that selection has taken place.
Murine leukemia virus (MuLV) induced lymphoma is an ideal model system to study the extent to which subclonal mutations can be used to identify cancer drivers.
Infection of newborn mice with MuLV causes a systemic lifelong viremia whereby viral integrations deregulate and truncate nearby genes by diverse mechanisms. Fusion transcripts, virus LTR enhancer elements acting on endogenous promoters and replacement of 3' UTR sequences can increase expression, whilst disruption of endogenous transcripts, transcriptional interference or methylation of DNA flanking integrations can decrease expression (Berns 2011). These mutations eventually give rise to hematological malignancies, and a high proportion of the recurrently mutated loci correspond to known drivers of human lymphoid malignancies, as well as regulators of hematopoietic development and lymphocyte survival (Berns 2011;Ranzani et al. 2013).
Historically, these screens focused on mutations present in clonal outgrowths as evidence of their role in malignancy, however recent pyrosequencing of MuLV lymphomas has also shown selection taking place within subclonal populations of cells (Huser et al. 2014). Cloning integration mutations by ligation mediated PCR requires a fraction of the sequencing coverage needed to identify other mutation types, allowing large numbers of subclonal integration mutations to be identified with unparalleled sensitivity. Furthermore, gamma retroviruses are not subject to remobilization, can integrate in any sequence context, and localized bias of the orientation of integrations can be used as a measure of selection that is independent of regional variation in integration density (Huser et al. 2014).
The fraction of rare mutations that drive cancer is largely unknowable. In this study, we use somatic insertional mutagenesis in mice as a model to demonstrate that subclonal mutations that are only rarely found as clonal mutations in advanced-stage disease can be effectively employed to identify known cancer drivers and differentiate rare disease causing mutations from passenger mutations. Using a novel insertion site cloning protocol, able to detect subclonal retroviral integrations with unprecedented sensitivity, we cloned more than 3000 clonal and 700,000 subclonal mutations across a spectrum of > 500 MuLV induced T-cell and B-cell lymphoid malignancies from two BCL2 transgenic models over a time course of lymphomagenesis. From these we find both positive and negative selection of subclonal events throughout all stages of lymphomagenesis and that in late-stage disease both clonal and subclonal populations identify more than 100 known cancer drivers and regions implicated in non-Hodgkin lymphoma (NHL) by coding mutations, copy number aberrations and genome wide association studies (GWAS). This resource can be used to prioritize rare but recurrent mutations from human tumors for further study.

A time course of MuLV infection quantifies the transition from premalignancy to malignant lymphoma.
To observe how subclonal mutations undergo selection during lymphomagenesis we generated a diverse set of B cell and T cell derived lymphoid malignancies, sacrificing animals with advanced-stage disease, as well as at a series of premalignant time points. Moloney MuLV typically results in a T-cell leukemia/lymphoma however subtype and mutation profile can be skewed by genetic background and predisposing germline alleles (Uren et al. 2008;Kool et al. 2010). To produce malignancies where mutation profiles are correlated with different phenotypes and predisposing genotypes, we generated lymphomas on two genetic backgrounds using both wild type animals and two BCL2 transgenic models. The t(14;18)(q32;q21) IGH/BCL2 translocation drives enforced expression of the antiapoptotic protein BCL2 and is one of the earliest and most common initiating mutations of follicular lymphoma (FL) and diffuse large B-cell lymphoma (DLBCL). Overexpression of BCL2 is also frequently observed in B-cell chronic lymphocytic leukemia (CLL).
Newborns were infected with MuLV by intraperitoneal injection. A cohort of Vav-BCL2 transgenic animals (expressing high levels of human BCL2 from a Vav promoter (Ogilvy et al. 1999)) and wild type littermates were generated on a C57BL/6 x BALB/c F 1 background. Emu-BCL2-22 transgenic cohorts (expressing moderate levels of human BCL2 from an Emu enhancer (Strasser et al. 1991)) and wild type littermates were generated on both C57BL/6 x BALB/c F 1 and C57BL/6 backgrounds. All mice developed lymphoid malignancies with latency ranging 42-300 days (Fig. 1a-c), with enlarged spleens, thymuses and lymph nodes observed in all cohorts. Disease onset was significantly accelerated by the Vav-BCL2 transgene on the F 1 background (p=0.0001) (Fig. 1a) and by the Emu-BCL2-22 transgene on a C57BL/6 background (p=0.0163) (Fig. 1c) compared with their littermate controls. The F 1 background developed lymphoma more rapidly than equivalent C57BL/6 cohorts (supplemental Both BCL2 transgenic cohorts yielded a higher proportion of CD19+ B-cell lymphomas compared to wild type mice (Fig. 1g), most notably in the Vav-BCL2 cohort. Spleen suspensions segregate into two groups, with either a majority of T cells or of B cells ( Fig. 1h). T-cell lymphomas were primarily CD4+, less frequently CD4-CD8-(typical of early T-cell precursor ALL), and only rarely CD8+ or CD4+CD8+ (Fig. S2b). B-cell lymphomas were generally immunoglobulin light chain positive indicating a mature B cell phenotype (Fig. S2c). MuLV infected Vav-BCL2 transgenic mice displayed a disproportionate outgrowth of PNA+ CD95+ germinal center B cells, and isotype switching to IgG as has been previously described in this strain (Egle et al. 2004)(Fig. S2c-d). The Emu-BCL2-22 transgene significantly reduces latency on a C57BL/6 background but not F 1 background. The Emu-Bcl2-22 C57BL/6 cohort had a significantly shorter latency than wild type C57BL/6 controls and both C57BL/6 cohorts had longer latency compared with F1 equivalents. (Supplemental Fig. 1). d-f) Stacked bar charts on the right represent the immunophenotyping of spleen suspensions from each cohort. Each row represents one spleen. Colors in each row represent the proportion of B cells (blue CD19+) and T cells (yellow CD5+ CD4-CD8-, light orange CD5+ CD8+, dark orange CD5+ CD4+ CD8+ and red CD5+ CD4+) in each sample. BCL2 transgenes increase the proportion of B cells in all cohorts and the mixture of T cell lymphoma subtypes is highly variable. g) The proportion of CD19+ B cells is increased by both BCL2 transgenes (h) Histogram of all CD19+ proportions from all cohorts combined is a bimodal distribution that can be segregated into those consisting primarily of B cells (>50%) and T cells.
To track selection of premalignant mutations we also generated cohorts sacrificed at days 9, 14, 28, 56, 84 and 128 post-infection and harvested the spleens from these animals ( Table S1). QPCR of virus transcript and copy number indicated that virus replication was detectable at day 9 and reached saturation at day 14 ( Fig. 2a & b). This suggests that a high proportion of mutagenesis occurs in the first 14 days post infection, with subsequent rare events of superinfection and selective pressure shaping the mutation profile and the eventual clonal outgrowth of late-stage lymphomas.
Retroviral integration sites from all animals were identified using a novel Illumina HiSeq based protocol (a revision of methods described in Koudijs et al. 2011 andUren et al. 2009) and summarized in Fig. S3). A series of test DNAs were used to generate multiple replicate libraries that were sequenced to saturating coverage. Measures of highly clonal mutations are reproducible when the same DNA sample is processed twice (Fig. S4a), and clonal integrations can be reproducibly detected when diluted 100 fold into a second DNA sample (Fig. S4b). Barcoded libraries were generated from the lymphoid organs of 355 diseased animals, 166 animals sacrificed at predetermined time points, and control DNAs (human and uninfected mice). Sequencing these libraries identified more than 700,000 unique integration sites, the vast majority of which are subclonal and represented by a single read. The relative clonality of integrations within each ligation was estimated using the number of individual sheared DNA fragments identified for each insert.

Quantifying lymphoma progression by clonal outgrowth.
We sought to use mutation abundance to distinguish premalignant samples from rapidly dividing diseased samples. MuLV generates tumors with 100% penetrance, resulting from independent competing clones. A single clonal outgrowth of pure tumor cells containing few mutations will yield high coverage of each mutation, whereas a clonal outgrowth with dozens of concurrent mutations alongside a large proportion of non-tumor DNA will yield low coverage for even the most clonal mutation. There may also be multiple independent clones or related subclones within each animal.
As such, for comparison between samples we generated normalized clonality values (NC values) where the most clonal integration within each sample was normalized to a value of 1. In Fig We used two approaches to compare levels of clonal outgrowth. Entropy was employed as a description of the clonality of integrations within each sample (a method based on the prior use of the Shannon entropy to estimate clonal outgrowth of T lymphoma (Brown et al. 2016) and in mathematical models of leukaemia (Baldow et al. 2016)). Entropy calculations from the 50 most clonal integrations yielded high scores for premalignant samples and low scores for advanced-stage lymphoma.
As an independent approach, we used distance measures to cluster clonality profile by their shape. Dynamic Time Warping (Giorgino 2009) and the Kolmorogov-Smirnov test were both used to measure difference in shape between all insert profiles and a distance matrix was constructed from these. Clustering on these matrices yields two clusters (Fig. 3a)

Kinetics of mutation selection throughout a time course.
MuLV has integration biases that vary substantially throughout the genome such that mutation density is insufficient to differentiate selected driver mutations from passengers. For this reason we assessed selection over the time course to define driver mutations. First, by limiting analysis to the 3051 clonal integrations of the late-stage lymphomas we identified 311 common integration sites by Gaussian In addition to selection between early and late-stage samples, we also considered other criteria as evidence for selection. A recent study of MuLV induced T-cell lymphomas used orientation of integrations as evidence that there is selection for deregulation of nearby genes (Huser et al. 2014). The red heat map of Fig. 4 indicates that the majority of the top 50 clonal CIS loci also have a significant bias for subclonal integrations on one strand or the other. We additionally explored whether integrations also demonstrate phenotypic bias (selection specific to B-cell or T-cell lymphoma, the yellow heat map) or genotypic bias (selection in cooperation with the BCL2 transgenes, the green heat map). The majority of the top 50 clonal CIS loci demonstrate biases of strand specificity, lymphoma subtype and/or genotype specificity. Importantly there is substantial overlap between all four selection criteria (stage, orientation, immunophenotype and genotype) suggesting these criteria can be used in concert to provide corroborating evidence for selection. Examining the entire genome reveals significant local biases for early versus late-stage, strand bias, genotype and immunophenotype (B cell/T cell lymphoma) (table   S3). After multiple testing correction, we identified 170 late-stage specific loci with a false discovery rate (f.d.r.) below 0.05, including dozens of windows containing only subclonal insertions. The Venn diagram in Fig. 5a illustrates that there is substantial overlap of late-stage selected loci with equivalent loci found to be strand specific (49 loci), and genotype specific (37 BCL2 loci, 15 wild type loci). This is also true for immunophenotype specific loci (19 B cell loci, 11 T cell loci). Although biases were observed for some loci between males and females, none were found to be significant after multiple testing correction (data not shown).
In regions of high insert density subclonal mutations form a high-resolution map of the selective pressures surrounding known oncogenes. The central region of chromosome 15 (chr15:62,000,000-63,000,000) with the highest concentration of latestage integrations is the Myc/Pvt1 locus (Fig. 5b). The surrounding region harbors multiple clusters of selection spanning from upstream of Myc, and extending through multiple clusters downstream as far as the Gsdmc gene family locus (Fig. S7b). This distribution of selected mutations over a 2Mb region concurs with the recent finding that copy number gains of the entire segment, incorporating Myc, Pvt1 and the Gsdmc family locus, is required to give acceleration of cancer in mouse models (Tseng et al. 2014).
Orientation bias is a unique criterion, in that it is independent of the integration biases of MuLV that may be influenced by cell type or genotype. To validate that orientation bias is indeed a function of selection we calculated bias using equal numbers of integrations from early and late-stage cohorts (i.e. 80,000 integrations). No loci from the early-stage inserts are significant after multiple testing correction, but the late-stage insert subset identifies 16 loci. This illustrates that the increased significance of strand bias in the late-stage cohort is not merely a function of greater statistical power from larger number of integrations, but rather evidence of selection for integrations that deregulate or disrupt genes (table S4).

Selection of mutations effectively identifies known cancer drivers.
Supplemental table S5 lists all candidate genes associated with one or more of our selection criteria. Table 1 lists the subset of these loci corresponding to genes from the cancer gene census (Futreal et al. 2004); in total 47 genes located at 43 loci. Of the 47 genes, 27 map within 200,000kb of a clonal CIS with a p-value < 0.05, however an additional 21 genes at 20 loci are implicated by subclonal selection criteria that were not identified by clonal CIS demonstrating subclonal mutations can provide additional statistical evidence to implicate cancer drivers for loci lacking sufficient clonal mutations to make this determination.
We also compared the list of candidate genes identified by any criteria with a set of 12 cohorts of hematological malignancies present in the cBio portal (Cerami et al. 2012). All protein coding candidates generated by KCRBM (without curation) or the curated candidate gene lists were used to identify human orthologues using BioMart (http://www.ensembl.org/biomart/martview/). The set of 78 MuLV candidate genes found mutated in 2 or more samples from any study is depicted in Fig 5a. For the majority of cohorts we find significant overlap between either the KCRBM candidates or the full set of mutated genes (Fig 5b). When limiting analysis to genes mutated twice in each study we see most overlap with a pan NHL study (consisting of DLBCL and FL) and cohorts of mature B cell derived lymphoma (DLBCL, MM, MCL). Importantly the overlap is also significant when examining the set of genes mutated only once in each study i.e. the set of most rarely mutated genes from most cohorts overlaps significantly with the candidate lists, demonstrating this dataset can be used as corroborating evidence for rarely mutated genes in human sequencing cohorts.

Selection of loci implicated by GWAS of lymphoid malignancies.
In a previous study we found overlap between CIS loci and the set of loci associated with familial chronic lymphocytic leukemia (CLL) ).

Supplemental table S9 summarizes the recent literature of GWAS studies of ALL, FL
and DLBCL and identifies overlap with the set of candidate genes. The IKZF1 and PIP4K2A/BMI1 loci are associated with ALL (Xu et al. 2013;Moriyama et al. 2015).
The second most specific locus for BCL2 transgenic mice is Cd86 (late-stage, strand bias, BCL2) which is suggestively implicated by two GWAS studies of FL and DLBCL (Skibola et al. 2014;Cerhan et al. 2014;Bassig et al. 2015). We additionally find other loci encoding co-stimulatory/co-inhibitory signaling ( Fig. S7l-p). The loci encoding inhibitors have been shown to inhibit the growth of tumor cell lines (Peserico et al. 2015). The example of the Smyd3 locus demonstrates the potential for time course mutation analysis to not only identify cancer drivers, but also potential targets that whilst not mutated, are essential for tumor cell growth.

Co-mutation analyses using subclonal mutations.
Understanding which genes cooperate in lymphomagenesis can inform the biology and subtype of disease, however co-mutation analyses of subclonal mutations is complicated by the potential presence of multiple independent subclones. To minimize these effects, we performed contingency table tests to identify co-mutation rates, limiting analysis to the most clonal integrations (NC>0.1), increasing the likelihood that mutations are present in the same clone. Rather than discard all subclonal insertions, we additionally performed analyses looking for associations of clonal mutations with the subclonal mutations, based on the assumption that subclonal mutations may be present in the same cells as clonal mutations from the same sample.
Both approaches yield overlapping results overall, but the latter greatly improves the statistical power of the associations identified (supplemental Fig.S8).
We have previously demonstrated that co-mutation analysis can be compromised by the pooled analysis of phenotypically and genotypically distinct groups, which creates false positives from genes that are co-mutated or mutually exclusive due to a primary association with tumor subtype rather than other mutations . For this reason we devised an online tool that allows subsets of tumors with restricted phenotypes, genotypes and mutation profiles to be queried http://mulv.lms.mrc.ac.uk/coocc/index.php.

DISCUSSION
In this study we present the most comprehensive analysis of MuLV-driven lymphomagenesis produced to date, identifying 700,000 mutations from 521 infected animals with an average of more than 1000 subclonal mutations per sample. By developing a framework that incorporates subclonal mutation frequencies in both premalignant cells and late-stage tumors, we enhance the statistical power enabling the identification of known driving events in cancer and further implicate dozens of novel loci in the biology of lymphomagenesis, in some cases independently of evidence of clonal expansion.
The resolution of mutation coverage illustrates considerable complexity in the position and orientation of selected integrations in the vicinity of verified cancer drivers, and suggests uncharacterized locus-specific mechanisms by which these mutations modify expression in a position dependent manner. The online repository (http://mulv.lms.mrc.ac.uk) allows researchers studying lymphoid malignancies to query custom subsets of data for genome wide associations of a gene of interest with tumor type and mutation status, and create custom tracks for the UCSC genome browser (Kent et al. 2002). Tracks on the UCSC genome browser can also be browsed to examine the selection biases at specific loci of interest http://mulv.lms.mrc.ac.uk/ucsc/index.php) and subsets of tumors can be queried to identify co-mutated genes within phenotypically/genotypically matched lymphomas.
Recent whole genome sequencing of cohorts of hundreds of patient samples illustrates the challenges of identifying driver mutations outside the exome (Puente et al. 2015;Nik-Zainal et al. 2016 (Khodabakhshi et al. 2012). This variation can confound the identification of driver mutations outside non-coding regions. The overlap we find between independent criteria as evidence of selection (disease stage, tumor type, genetic interactions and strand bias) using a mutagen that exhibits strong regional variation in distribution, demonstrates it is possible to use these criteria as a mitigant of regional variation in mutation frequencies. Furthermore, visualizing this selection as a continuum at multiple scales using multiple parameters allows intuitive differentiation of recurrent selection in non-exonic regions from mutation hotspots. and premalignant tissue, and incorporating these into a framework that combines tumor genotype and phenotype, not only provides supporting evidence for rarely mutated cancer drivers, but also potentially widens the spectrum of genes encoding therapeutic targets.

Animal work
All procedures were performed in accordance with the UK Home Office Animals Survival cohort mice were sacrificed upon developing advanced symptoms of lymphoma and lymphoid organs were harvested and snap frozen in liquid nitrogen immediately. Cell suspensions of spleen tissue were prepared in all cases using the gentleMACS Dissociator (Miltenyi Biotec) set to programme m_spleen_1.01.

Flow Cytometry
Cryopreserved spleen suspensions were defrosted and washed twice in buffer, PBS-2% FCS and incubated with 2.0µg Fc block per 10 6 cells for 15 minutes. The samples were then incubated with the antibody cocktails for 15 minutes after which they were washed. The majority of samples were processed using the Attune NxT Acoustic Focusing Cytometer, Life Technologies, and the remainder were processed on a BD LSRII. All analyses were performed using FlowJo v10.2.

Integration site cloning and GKC CIS identification
To clone virus integrations we integrated the method of Koudijs et al (Koudijs et al. 2011) and Uren et al (Uren et al. 2009) and modified these for the Illumina platform.
A detailed protocol is provided in the supplemental methods document.

Entropy quantitation
Clonality values among the early-stage samples are relatively uniform, whilst late-stage samples present few integrations with very high clonality values and most with low clonality values. To quantify this difference and thus be able to order samples from pre-malignancy to late-stage lymphoma we used Shannon entropy (Shannon 2001). The 50 highest clonality values ! , ! , … , !" were transformed into probabilities ! : The Shannon entropy over a set of probabilities ! , ! , … , ! is defined as: The entropy quantifies the spread of a distribution: it is zero when a single ! is equal to one and all others are equal to zero; and reaches its maximum value when the probabilities are uniformly distributed (p i =1/50 for every i). Probabilities from early-stage samples are closer to a uniform distribution and therefore the samples will have high entropy values, while the probabilities from late-stage samples are closer to a spike, providing low entropy values.

Clustering
Samples were clustered based on the shape of clonality profiles. The top 50 ranked clonal insert NC values of each sample were compared to all other samples using dynamic time warp (dtw package) or the Kolmorogov-Smirnov test (ks.dist() function) as a measure of difference between samples (with the R package, function dist() takes as input the list of NC values and the method chosen -"DTW"-). A distance matrix that was clustered using the "average" linkage method (using the R hclust() function). Different linkage methods gave similar results (data not shown).

Genome wide scanning for selected insertions
A scanning 100kb window is moved across the genome in increments of 10kb.
For each window the number of insertions in each class (early/late, forward strand/reverse strand, BCL2 transgenic/wild type) is counted and the likelihood of this distribution between groups is estimated using two-tailed Fisher's exact test. By comparing neighboring windows, p-value minima are identified (i.e. windows where the p-value is higher on either side). Where minima are less than 100,000bp from each other the position with the lowest p-value is kept and others discarded. To estimate false discovery rates all insert/group assignments are randomized (e.g. the same number of inserts are early/late but the assignment is random). Local p-values are calculated and p-value minima are identified. 1000 permutations are used to determine the rate at which each p-value is identified by chance.

Cohort comparisons
Survival comparison of cohorts was performed using Prism 6. Significance of the differences in the proportion of B cells in cohorts was determined using Prism 6 student's t-test.

Data Accession
Paired mapped reads for the sequencing are available from the NCBI sequence read archive https://www.ncbi.nlm.nih.gov/Traces/study/?acc=SRP110741.