Introduction

Hemizygous and homozygous deletions on the long arm of chromosome 13 are the most common genetic aberrations in B-cell chronic lymphocytic leukemia (CLL) and monoclonal B-cell lymphocytosis (MBL), occurring in ~50% of cases.1, 2, 3, 4 The reported deletions are usually large and heterogeneous in size and typically include a minimally deleted region (MDR) on 13q14.3 (~130 kb).3, 5, 6, 7 The MDR is telomeric to the retinoblastoma gene (RB1) and includes DLEU1 and DLEU2, as well as two notable microRNAs, MIR15A and MIR16-1. Loss at 13q14 as a single large abnormality can indicate a more favorable CLL clinical course;2 however, a higher percentage of cells with 13q14 loss and larger 13q14 deletions have been associated with shorter time to treatment.6, 7, 8, 9, 10, 11 Deletions at 13q14 are postulated to cause loss of tumor-suppressor activity, and the variable size of the deletion suggests that more than one tumor-suppressor gene is likely involved. Downregulation of MIR15A and MIR16-1 in CLL, together with a CLL-like phenotype in mouse models deficient for the DLEU2/MIR15A/MIR16-1 cluster, indicates that MIR15A and MIR16-1 may be important regulators of CLL pathogenesis.12, 13, 14

Somatic alterations of 13q14 have been reported in solid tumors, suggesting a possible role in the carcinogenesis of select non-hematological malignancies. Approximately 6% of retinoblastoma cases have a 13q14.3 deletion of the RB1 gene.15, 16 Sporadic 13q14 events have been reported in other solid tumors, but not with the consistency observed in CLL or retinoblastoma. For example, pancancer analyses of The Cancer Genome Atlas data show evidence for 13q14 loss in bladder, breast, colon, glioblastoma, head/neck, kidney, lung, ovarian and endometrial tumors.17 Allelic loss at 13q14 has been reported in one-third of prostate tumors.18, 19 Other studies have associated 13q14 loss with high prostate tumor grade and stage,20 and have reported increased proliferation and invasiveness of untransformed prostate cells after downregulation of MIR15A and MIR16-1.21 Loss of heterozygosity data from breast cancer studies indicate that a region including 13q12.2–14.3 is commonly deleted in breast cancer.22 Moreover, it has been suggested that 13q14 allelic imbalance may be associated with breast cancer mortality more than 5 years after diagnosis,23 but further studies are needed to confirm this finding. Additional association evidence indicates that ADP-ribosylation factor-like tumor-suppressor gene 1 (ARLTS1, also called ADP-ribosylation factor-like 11, ARL11), located on 13q14.3 ~400 kb centromeric to MIR15A and MIR16-1, may function as a tumor-suppressor gene for lung cancer,24 familial breast cancer,25, 26 melanoma,27 ovarian cancer,28 prostate cancer29 and colorectal cancer.30 Overall, the many reports of 13q14 deletions in solid cancers point towards a possible role as a contributing ‘driver’ event. However, further studies are needed to confirm the reported frequencies and, more importantly, to establish the functional implications of 13q14 deletion beyond CLL and MBL.

Genetic mosaicism is the coexistence of clonal cellular populations harboring two or more distinct genotypes in an individual.31, 32 In previous studies of detectable clonal mosaicism, an increased frequency of large structural events detected in a fraction of circulating cells was noted in prediagnostic samples from individuals who later developed a lymphoid malignancy.33, 34 One of the most common sites of large structural deletions (>2 Mb) included at least 300 kb of the 13q14.3 MDR,33, 35 the region commonly deleted in CLL. In this context, it is notable that mosaic 13q14.3 loss was observed in a fraction of DNA samples from individuals diagnosed with solid tumors or who were cancer free.33, 35 The detection algorithm previously used to screen single-nucleotide polymorphism (SNP) microarray data from cancer genome-wide association studies (GWAS) performs stably only for events >2 Mb in size. We therefore re-examined our GWAS data set of more than 80 000 individuals to identify additional, smaller 13q14.3 events. In addition, we sought to determine whether 13q14.3 events in blood DNA are related to risk of solid tumors.

Materials and methods

Study population

The Division of Cancer Epidemiology and Genetics and the Cancer Genome Research Laboratory at the NCI have conducted a series of GWASs with commercial SNP microarrays (Illumina Hap300, Hap240, Hap550, Hap610, Hap660, Hap 1, Omni Express, Omni 1, Omni 2.5 and Omni 5). The study set included 82 483 participants (46 254 non-hematological cancer cases and 36 229 cancer-free controls) with blood or buccal DNA. These participants were previously described as Total GWAS Sets I and II, and their samples were scanned in the Cancer Genome Research Laboratory.35 No retinoblastoma cases were included in either Total GWAS Set I or II. The institutional review boards of the participating study centers and the NIH approved the study protocols, and informed consent was obtained from each study participant.

13q14.3 Copy number alteration detection

B-allele frequency (BAF) and log R ratio (LRR) are the two metrics used to assess 13q14.3 mosaic loss in our study population. BAF is a measure of allelic imbalance and is used to identify stretches of heterozygous SNPs that deviate from the expected value of 0.5. LRR provides information on copy number. LRR values >0 are evidence of copy number gain, LRR values <0 indicate copy number loss, and LRR values not deviating from 0 indicate copy-neutral events. BAF and LRR values were estimated using methods described previously36 and were normalized and corrected using a framework from our previous work.33 Based on copy number status, mean heterozygous BAF bands were used to calculate the percentage of cells that contained copy number alterations at 13q14.3. This copy number alteration detection method has been previously validated for autosomes with additional laboratory techniques (e.g., short tandem repeat, multiplex ligation-dependent probe amplification and fluorescent in situ hybridization).37 It is described in greater detail in the Methods and Supplementary information of the original analysis.33 Stretches of homozygosity were detected as genomic stretches across 13q14.3 with no BAF band for heterozygous SNPs.
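The sketch below makes the BAF-to-cell-fraction step concrete. It uses a simple two-population model: normal diploid cells plus one altered clone, with no probe noise. This model is our illustrative assumption and is not the published calibration of the original analysis; `mosaic_fraction` is a hypothetical helper. Under this model, a deletion present in a fraction p of cells moves the heterozygous BAF bands to 1/(2 − p) and 1 − 1/(2 − p). The band deviation d = |BAF − 0.5| therefore equals p/(2(2 − p)), and the function inverts that relation.

```python
def mosaic_fraction(bdev: float, copy_status: str) -> float:
    """Estimate the fraction of cells carrying a mosaic event from the
    deviation of the split heterozygous BAF band, d = |BAF - 0.5|.

    Assumes an idealized mixture of normal diploid cells and one clone
    (illustrative model); copy_status is 'loss', 'neutral' (copy-neutral
    LOH) or 'gain'.
    """
    if bdev < 0:
        raise ValueError("bdev must be non-negative")
    if copy_status == "loss":
        # Altered cells keep one copy: BAF = 1/(2 - p)  =>  p = 4d / (1 + 2d)
        p = 4 * bdev / (1 + 2 * bdev)
    elif copy_status == "neutral":
        # Altered cells carry two copies of one allele: BAF = (1 + p)/2  =>  p = 2d
        p = 2 * bdev
    elif copy_status == "gain":
        # Altered cells carry three copies: BAF = (1 + p)/(2 + p)  =>  p = 4d / (1 - 2d)
        if bdev >= 0.5:
            raise ValueError("bdev must be < 0.5 for a gain")
        p = 4 * bdev / (1 - 2 * bdev)
    else:
        raise ValueError(f"unknown copy_status: {copy_status!r}")
    if p > 1 + 1e-9:
        raise ValueError("bdev is too large for this copy status")
    return min(p, 1.0)
```

For example, under this model a heterozygous band split to 0.5 ± 1/6 at a deletion corresponds to the loss being present in about half of the cells. The same split at a copy-neutral event would instead imply about one-third of cells.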

Two segmentation algorithms were used to scan genotypes of the non-hematological cancer population for mosaic alterations at the 13q14.3 locus: an LRR-based approach and a BAF-based approach. Both approaches surveyed a window around the CLL MDR on chromosome 13q14.3 between bases 49 139 793 and 50 269 706 (GRCh36).

The LRR-based approach identified individuals with evidence of 13q14.3 loss. It selected those whose MDR window had a mean LRR significantly lower than the mean LRR on chromosome 1 (used as the reference), applying t-tests with unequal variance and a significance threshold of P<1.0 × 10−10. A mean difference of −0.05 was used as an additional filter, so that statistically significant 13q14.3 LRR values also differed meaningfully from the chromosome 1 reference.

The BAF-based approach detected individuals with allelic imbalance at 13q14.3. It selected those whose MDR window had a heterozygous Bdev mean (Bdev=|BAF−0.5| for heterozygous BAF bands) that differed significantly from the Bdev mean on chromosome 1 (P<1.0 × 10−10). A mean Bdev difference of 0.005 was required in statistically significant regions so that only meaningful differences in Bdev were detected.

Results from the two approaches were merged and manually reviewed to confirm a copy number alteration at the 13q14.3 MDR. Copy number status was then assigned, and the percentage of nuclei with a 13q14.3 alteration was estimated from heterozygous BAF bands. For individuals whose 13q14.3 events extended beyond the investigated chromosome 13 window, boundaries were adjusted to include the entire 13q event.
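The two screening rules can be sketched as follows. This is an illustrative re-implementation, not the study's pipeline. The function names are hypothetical, and the inputs are assumed to be per-SNP values already restricted to the MDR window or to chromosome 1 (and, for the BAF rule, to heterozygous SNPs). The two-sided P-value uses a large-sample normal approximation to Welch's unequal-variance t-test rather than the exact t distribution. That approximation is reasonable only when each window contains many probes.

```python
import math
from statistics import fmean, variance


def welch_p(x, y):
    """Two-sided P-value for mean(x) != mean(y) from Welch's t statistic.

    Uses a normal approximation to the t distribution (assumption: many
    probes per window)."""
    se = math.sqrt(variance(x) / len(x) + variance(y) / len(y))
    if se == 0:
        return 1.0 if fmean(x) == fmean(y) else 0.0
    t = (fmean(x) - fmean(y)) / se
    return math.erfc(abs(t) / math.sqrt(2))


def lrr_loss_flag(mdr_lrr, chr1_lrr, alpha=1e-10, min_diff=-0.05):
    """LRR rule: MDR mean LRR significantly below chromosome 1 by >= 0.05."""
    diff = fmean(mdr_lrr) - fmean(chr1_lrr)
    return diff <= min_diff and welch_p(mdr_lrr, chr1_lrr) < alpha


def baf_imbalance_flag(mdr_het_baf, chr1_het_baf, alpha=1e-10, min_diff=0.005):
    """BAF rule on Bdev = |BAF - 0.5| computed over heterozygous SNPs."""
    mdr_bdev = [abs(b - 0.5) for b in mdr_het_baf]
    chr1_bdev = [abs(b - 0.5) for b in chr1_het_baf]
    diff = fmean(mdr_bdev) - fmean(chr1_bdev)
    return diff >= min_diff and welch_p(mdr_bdev, chr1_bdev) < alpha
```

The effect-size filter is checked before the P-value. With thousands of chromosome 1 probes, a tiny but non-meaningful shift can easily reach P<1.0 × 10−10.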

Breakpoint analysis

Mosaic event breakpoint regions were mapped to within 100 kb of the best estimate of the 13q14.3 deletion start and end points using the SNP intensity data. The 100 kb window was chosen as a conservative estimate of the true breakpoint. Start and end coordinates for events were defined by the nearest array tagging SNPs, an estimate that is affected by SNP density and by the quality of LRR and BAF data around these SNPs. The 60 detected 13q14.3 mosaic deletions yielded a total of 120 breakpoint regions, each roughly 200 kb in size.

In all, 1000 permutations, each consisting of 120 randomly selected 200 kb regions on chromosome 13, were generated to estimate the underlying distribution of chromosome 13 structural elements flanking the 13q14.3 region. To generate these breakpoints, 60 random events were drawn from a Gaussian distribution reflecting the observed event lengths (μ=10.80 Mb, σ=16.90 Mb), and each event was constrained to overlap the 393 kb MDR. The first and last Illumina microarray SNPs within each simulated event defined its estimated breakpoints. The 100 kb windows around these breakpoints were then calculated to model the underlying distribution of observed event breakpoints.
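One permutation of this procedure might look like the sketch below. Several details are our assumptions and are not stated in the text. Non-positive Gaussian draws are redrawn, because σ exceeds μ. Event starts are placed uniformly among positions that overlap the MDR. The chromosome 13 length constant is assumed to be the GRCh36 value. Events containing no array SNP are skipped. The function names are hypothetical.

```python
import bisect
import random

MDR_START, MDR_END = 49_590_000, 49_983_100  # minimal shared region, GRCh36
CHR13_LENGTH = 114_142_980  # assumption: GRCh36 length of chromosome 13
HALF_WINDOW = 100_000       # +/-100 kb around each estimated breakpoint


def simulate_breakpoint_windows(snp_positions, n_events=60, mu=10.80e6,
                                sigma=16.90e6, rng=None):
    """One permutation: n_events random deletions overlapping the MDR,
    each contributing two ~200 kb breakpoint windows.

    snp_positions must be a sorted list of array SNP coordinates on chr13.
    """
    rng = rng or random.Random()
    windows = []
    while len(windows) < 2 * n_events:
        length = rng.gauss(mu, sigma)
        if length <= 0:
            continue  # assumption: non-positive lengths are redrawn
        # Any start in [lo, hi] yields an event that overlaps the MDR and
        # stays within the chromosome.
        lo = max(0.0, MDR_START - length)
        hi = min(float(MDR_END), CHR13_LENGTH - length)
        if lo > hi:
            continue
        start = rng.uniform(lo, hi)
        end = start + length
        # Snap breakpoints to the first and last array SNPs inside the event.
        i = bisect.bisect_left(snp_positions, start)
        j = bisect.bisect_right(snp_positions, end) - 1
        if i > j:
            continue  # no array SNP falls inside this event
        for bp in (snp_positions[i], snp_positions[j]):
            windows.append((bp - HALF_WINDOW, bp + HALF_WINDOW))
    return windows


def run_permutations(snp_positions, n_perm=1000, seed=0):
    """Generate n_perm independent sets of 120 breakpoint windows."""
    rng = random.Random(seed)
    return [simulate_breakpoint_windows(snp_positions, rng=rng)
            for _ in range(n_perm)]
```

Snapping to array SNPs matters here. The observed breakpoints were defined the same way, so the null windows inherit the same dependence on probe density.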

University of California Santa Cruz (UCSC) and Encyclopedia of DNA Elements (ENCODE) data tracks were downloaded from the UCSC FTP data portal (ftp://hgdownload.cse.ucsc.edu/) to investigate local DNA characteristics in breakpoint regions as well as in randomly selected regions. Features of interest included gene-rich regions (RefSeq genes, CpG islands), indicators of open chromatin (OH Radical Cleavage Intensity Database (ORChID), DNaseI hypersensitivity (DNaseI HS) peaks, Formaldehyde-Assisted Isolation of Regulatory Elements (FAIRE-Seq) peaks), recombination rate (deCODE sex-averaged) and repetitive elements (short interspersed nuclear elements (SINEs), long interspersed nuclear elements (LINEs), long terminal repeats (LTRs) and segmental duplications). Comparisons between observed breakpoints and random breakpoints were made using mean differences and mean counts of feature elements. Statistical significance was assessed by permutation P-value.
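The comparison of feature counts against the permutation distribution can be sketched as below. We assume that each UCSC/ENCODE track has already been parsed into (start, end) intervals on chromosome 13, and that the statistic is the mean number of overlapping features per window. The text does not give its exact P-value convention. Here we use a two-sided permutation P-value, defined as twice the smaller tail fraction, which is one common choice. Under this convention with 1000 permutations, P is a multiple of 0.002, and a value of 0 corresponds to P<0.002. The helper names are hypothetical.

```python
def mean_feature_count(windows, features):
    """Mean number of feature intervals overlapping each window.

    windows and features are lists of half-open (start, end) intervals on
    the same chromosome, e.g. CpG islands parsed from a UCSC track.
    """
    hits = 0
    for w_start, w_end in windows:
        hits += sum(1 for f_start, f_end in features
                    if f_start < w_end and f_end > w_start)
    return hits / len(windows)


def permutation_p(observed, null_stats):
    """Two-sided permutation P-value: twice the smaller tail fraction."""
    n = len(null_stats)
    upper = sum(s >= observed for s in null_stats) / n
    lower = sum(s <= observed for s in null_stats) / n
    return min(1.0, 2 * min(upper, lower))
```

In use, `observed` would be `mean_feature_count` over the 120 observed breakpoint windows. `null_stats` would be the same statistic computed for each of the 1000 permuted window sets.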

Statistical analyses

Univariate analyses were performed to characterize the relationship of 13q14.3 loss with the following variables:

- age group (<50, 50–54, 55–59, 60–64, 65–69, 70–74 and ≥75 years);
- gender (male, female);
- estimated ancestry (% European, % African and % Asian ancestry estimated from SNP genotypes);
- non-hematologic cancer status (disease-free controls, solid tumor cases overall and substrata of cases by solid tumor type).

An additional multivariable analysis was carried out using logistic regression. It included age group, gender, estimated ancestry and non-hematologic cancer status, with adjustment for contributing study (indicator variables). Strata-specific analyses were performed within strata of non-hematological cancer type to investigate whether 13q14.3 deletion was associated with specific non-hematological cancers.
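The analyses were run in R. The sketch below is only a standard-library illustration of the same kind of maximum-likelihood logistic regression, fitted by Newton-Raphson with covariates coded as 0/1 indicator variables. It is not the study's code. The helper names and the assumption of a Wald interval exp(β ± 1.96·SE) are ours.

```python
import math


def _solve(a, b):
    """Solve a x = b (small dense system) by Gaussian elimination with pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def _score_and_information(X, y, beta):
    """Gradient of the log-likelihood and the Fisher information X'WX."""
    p = len(beta)
    score = [0.0] * p
    info = [[0.0] * p for _ in range(p)]
    for xi, yi in zip(X, y):
        mu = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, xi))))
        w = mu * (1.0 - mu)
        for j in range(p):
            score[j] += (yi - mu) * xi[j]
            for k in range(p):
                info[j][k] += w * xi[j] * xi[k]
    return score, info


def fit_logistic(X, y, max_iter=50, tol=1e-10):
    """Logistic regression by Newton-Raphson.

    Each row of X starts with 1.0 (intercept) followed by covariates, e.g.
    0/1 indicators for age group, gender and contributing study.
    Returns coefficients and their Wald standard errors.
    """
    beta = [0.0] * len(X[0])
    for _ in range(max_iter):
        score, info = _score_and_information(X, y, beta)
        step = _solve(info, score)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    _, info = _score_and_information(X, y, beta)
    se = []
    for j in range(len(beta)):
        unit = [1.0 if k == j else 0.0 for k in range(len(beta))]
        se.append(math.sqrt(_solve(info, unit)[j]))
    return beta, se


def odds_ratio(beta, se, j, z=1.96):
    """Odds ratio and Wald 95% CI for coefficient j."""
    return (math.exp(beta[j]), math.exp(beta[j] - z * se[j]),
            math.exp(beta[j] + z * se[j]))
```

With a single binary covariate this fit reproduces the cross-product odds ratio of the corresponding 2 × 2 table and its Woolf standard error, which makes a convenient sanity check.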

To compare the frequency of 13q14.3 alterations in our population with CLL prevalence in the US population, we used age- and sex-specific limited-duration prevalence estimates from the November 2011 submission of the Surveillance, Epidemiology, and End Results (SEER) program38 together with US Census data. All statistical analyses were performed in R version 2.15.1 (R Foundation for Statistical Computing, Vienna, Austria).

Results

The combined analysis included 82 483 non-hematologic cancer cases and disease-free controls, with DNA drawn from peripheral leukocytes (81%) or buccal samples (19%). We detected 60 individuals (0.073%, 95% confidence interval (CI)=0.054–0.091%) with 13q14.3 mosaic loss affecting an estimated 19.7–90.1% of cells. The distribution of events across strata of gender, age group, ancestry and non-hematologic cancer status is displayed in Table 1. Non-hematological cancer cases and controls had frequencies of 13q14.3 mosaic loss of 0.084% (95% CI=0.066–0.115%) and 0.069% (95% CI=0.038–0.089%), respectively. Most of the detected 13q14.3 mosaic losses were in DNA derived from blood; however, three were observed in buccal DNA.

In addition to the 60 individuals with mosaic loss at 13q14.3, our analysis detected one lung cancer case with copy-neutral uniparental disomy spanning the 13q14.3 region. Stretches of homozygosity within the 13q14.3 region were also observed in 13 individuals: 6 controls, 2 endometrial cancer cases, 3 lung cancer cases, 1 ovarian cancer case and 1 pancreatic cancer case. Individuals with mosaic 13q14.3 deletions were no more likely to carry additional mosaic autosomal events at other chromosomal locations than individuals with non-13q14.3 mosaic events (P=0.17).
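The overall frequency of 60 out of 82 483 is reported with a 95% CI of 0.054–0.091% here and 0.056–0.094% in the Discussion. Both intervals can be reproduced from the counts, and the difference appears to come from the interval method. The first is close to a normal-approximation (Wald) interval. The second is close to an exact (Clopper-Pearson) binomial interval. The sketch below computes both with the standard library; the function names are ours.

```python
import math


def wald_ci(k, n, z=1.96):
    """Normal-approximation (Wald) interval for a binomial proportion."""
    p = k / n
    half = z * math.sqrt(p * (1 - p) / n)
    return p - half, p + half


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), with terms computed in log space."""
    log_n_fact = math.lgamma(n + 1)
    log_p, log_q = math.log(p), math.log1p(-p)
    return sum(math.exp(log_n_fact - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                        + i * log_p + (n - i) * log_q)
               for i in range(k + 1))


def _bisect_decreasing(f, target, iters=100):
    """Find p in (0, 1) with f(p) == target for a decreasing function f."""
    lo, hi = 1e-15, 1 - 1e-15
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson_ci(k, n, alpha=0.05):
    """Exact (Clopper-Pearson) interval by inverting the binomial CDF."""
    # Lower bound solves P(X >= k) = alpha/2, i.e. P(X <= k-1) = 1 - alpha/2.
    lower = 0.0 if k == 0 else _bisect_decreasing(
        lambda p: binom_cdf(k - 1, n, p), 1 - alpha / 2)
    # Upper bound solves P(X <= k) = alpha/2.
    upper = 1.0 if k == n else _bisect_decreasing(
        lambda p: binom_cdf(k, n, p), alpha / 2)
    return lower, upper
```

For k=60 and n=82 483, the Wald interval is about 0.054–0.091% and the exact interval about 0.056–0.094%. With only 60 events, the exact interval is the more defensible choice, because the normal approximation understates the asymmetry near zero.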

Table 1 Counts and percentages for detected 13q14.3 mosaic loss across population characteristics

Detected 13q14.3 mosaic losses are plotted in Figure 1 and vary in size and estimated breakpoint location. The median mosaic loss was 1.9 Mb (interquartile range: 1.1–17.1 Mb), with the smallest 246 kb and the largest >75 Mb in size. Overall, smaller mosaic 13q14 events are more common than larger events (Figure 2a), paralleling what has been seen for large structural mosaic events across the autosomes.35 Common breakpoint regions on the centromeric side included clusters at 29.1–32.0, 40.1–43.7 and 49.3–49.6 Mb. Breakpoints on the telomeric side clustered primarily around 50.3–52.9 Mb (GRCh36).

At a minimum, mosaic 13q14.3 losses detected in peripheral blood or buccal DNA cover a ~393 kb region of chromosome 13, from base pair 49 590 000 to 49 983 100 (GRCh36). This region includes DLEU1, DLEU2 and the ST13P4 pseudogene, and it lies ~70 kb telomeric to MIR15A and MIR16-1. Because the detected 13q14.3 mosaic losses vary in size, some also encompassed other known tumor-suppressor genes. Mapping the deletions showed that they included BRCA2 in 10% of individuals, RB1 in 40%, MIR15A in 87%, MIR16-1 in 87%, DLEU1 in 95%, DLEU2 in 98% and DLEU7 in 97%.
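The minimal shared region and the per-gene inclusion percentages reduce to interval intersection and containment. In this sketch we assume that "inclusion" means the deletion fully contains the gene; the text does not say whether partial overlap counts. Gene coordinates must be supplied by the user (for example, from RefSeq on the same genome build). Both helper names are hypothetical.

```python
def shared_region(events):
    """Intersection of all (start, end) events, or None if they share no region."""
    start = max(s for s, _ in events)
    end = min(e for _, e in events)
    return (start, end) if start < end else None


def gene_inclusion(events, genes):
    """Percentage of events that fully contain each gene.

    genes maps a gene name to its (start, end) coordinates on the same
    build as the events.
    """
    return {
        name: 100.0 * sum(1 for s, e in events if s <= g_start and g_end <= e) / len(events)
        for name, (g_start, g_end) in genes.items()
    }
```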

Figure 1

Distribution of 13q14.3 losses. (a) Graphical illustration of the 60 detected mosaic losses that span the minimally deleted region at 13q14.3. (b) Zoomed view of the 13q14.3 minimally deleted region showing the overlap of detected events (top panel) and genes in the region (bottom panel).

Figure 2

Event size, log R ratio (LRR) and proportion of mosaic cells (P(Mosaic)) for detected 13q14.3 losses. (a) Violin plots of event size, LRR and P(Mosaic) distributions. (b) Relationship between LRR and P(Mosaic) for monoallelic 13q14.3 losses.

Average LRR values, an indicator of allelic copy number, show that in most individuals the 13q14.3 mosaic loss affects only one allele (Figure 2a). There was, however, one control individual with evidence of mosaic loss of both alleles at the 13q14.3 locus (LRR=−1.58). Overall, the observed mosaic proportion, a measure of the percentage of cells containing 13q14 loss, ranged from 19.7 to 90.1%, with a mean of 41.4% (Figure 2a). As expected, LRR and mosaic proportion were associated in our data (P=8.16 × 10−9, adjusted R2=0.52) (Figure 2b). This indicates that, despite the small size of many 13q14.3 mosaic losses, the two measures of mosaic percentage were correlated. No significant associations were observed between the size of the 13q14.3 deletion and either LRR or mosaic percentage (P=0.097 and 0.211, respectively).
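The reported regression summary (adjusted R2 for LRR against mosaic proportion) comes from an ordinary least-squares fit. The sketch below is a generic stdlib version of that fit, not the authors' code. Two choices in it are assumptions: which variable is treated as the response, and whether the single biallelic loss is excluded (Figure 2b shows monoallelic losses only).

```python
from statistics import fmean


def simple_ols(x, y):
    """Fit y = intercept + slope * x by least squares.

    Returns (intercept, slope, adjusted R^2) for one predictor.
    """
    mx, my = fmean(x), fmean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    r2 = 1 - ss_res / ss_tot
    n, k = len(x), 1  # k = number of predictors
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
    return intercept, slope, adj_r2
```

The adjustment penalizes R2 by the residual degrees of freedom. With about 60 events it barely differs from the raw R2.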

Analysis of the 13q14.3 deletion breakpoints using selected bioinformatic data tracks from the UCSC browser and ENCODE indicates substantial clustering of active elements in 13q14.3 breakpoint regions. Compared with randomly permuted breakpoint regions of similar size on chromosome 13, observed breakpoint regions were significantly enriched for genes and regulatory elements (Figure 3). For instance, RefSeq genes and CpG islands were significantly enriched at 13q14.3 breakpoints (P<0.002; Figure 3a), indicating that 13q14.3 breakpoint regions often occur near genes or gene promoters. Indicators of open chromatin (ORChID, DNase HS and FAIRE-Seq) were also significantly enriched (P<0.002; Figure 3b). There was no evidence of an association between recombination rate and the location of 13q14.3 mosaic breakpoints (P=0.39; Figure 3c). The distribution of repetitive elements such as SINEs, LINEs, LTRs and segmental duplications suggests clustering in 13q14.3 breakpoint regions (Figure 3d). In particular, SINEs are significantly enriched (P<0.002), whereas LTRs are less common in these regions (P=0.002). A marginally significant enrichment of segmental duplications was also observed in 13q14.3 breakpoint regions (P=0.062).

Figure 3

Breakpoint analysis of 13q14 breakpoint regions in comparison with breakpoints from random permutations of similarly sized regions spanning the 13q14.3 minimally deleted region. Features investigated include (a) gene-rich regions, (b) indicators of open chromatin, (c) recombination rate and (d) repetitive elements. Gray distributions are means across 1000 permutations. Black boxes and error bars represent the mean across detected 13q14.3 breakpoints and 95% confidence interval around the mean. Asterisks (*) indicate P<0.05.

Unadjusted associations with 13q14.3 mosaic deletions were investigated (Table 1). Mosaic loss of 13q14 was slightly more common in males than in females (0.09% versus 0.05%, P=0.051). An unadjusted positive association was also observed between mosaic 13q14.3 loss and 5-year age group (<50, 50–54, 55–59, 60–64, 65–69, 70–74, 75+; P=1.8 × 10−3). Frequency ranged from 0.02% in individuals <50 years to 0.18% in individuals 75 years and older. An unadjusted association with ancestry was also detected (P=0.0004): mosaic loss of 13q14.3 was most common in individuals of European ancestry and less common in individuals of African ancestry. Table 1 further shows the breakdown of 13q14.3 mosaic events by solid tumor subtype. Overall, no significant difference in 13q14.3 loss was observed between non-hematological cancer cases (0.08%) and controls (0.06%) (P=0.19).

In exploratory multivariable models including study, gender, 5-year age group, ancestry and non-hematologic cancer status, only age was positively associated with 13q14.3 mosaic deletion (P=0.028). The odds ratio for a 13q14.3 mosaic deletion in individuals 75 years or older, compared with those younger than 50 years, was 8.70 (95% CI=1.65–45.78). In the multivariable analysis, neither solid tumor overall (P=0.89) nor any specific tumor subtype was associated with 13q14.3 mosaic loss.
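A short arithmetic check helps interpret the width of this interval. It assumes that the CI is a Wald interval, exp(β ± 1.96·SE), and is therefore symmetric on the log scale. Under that assumption, the point estimate and the standard error of the log odds ratio can be back-calculated from the reported bounds. The recovered odds ratio (about 8.69) matches the reported 8.70 up to rounding. The standard error (about 0.85) is large, which is consistent with the small number of events in the youngest age group. The helper name is ours.

```python
import math


def log_or_from_ci(lower, upper, z=1.96):
    """Recover an odds ratio and the SE of its log from a reported Wald CI.

    Assumes the interval was computed as exp(beta +/- z * se), i.e. it is
    symmetric on the log scale.
    """
    log_lo, log_hi = math.log(lower), math.log(upper)
    return math.exp((log_lo + log_hi) / 2), (log_hi - log_lo) / (2 * z)


or_mid, se_log_or = log_or_from_ci(1.65, 45.78)
```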

To investigate whether detected 13q14 deletions could represent early, undiagnosed CLL, age- and gender-specific frequencies of 13q14.3 mosaic loss in our data set were compared with SEER population-based CLL prevalence data (Table 2). Assuming that 50% of all CLL cases have 13q14.3 loss, expected age- and gender-stratified counts of 13q14.3 loss were estimated for our sample set from SEER limited-duration prevalence data (0–<34 years) and US Census data. Slightly more 13q14.3 mosaic losses were observed among non-hematologic cancer cases and controls than expected. However, the excess did not significantly depart from the number of 13q14.3 losses attributable to undiagnosed CLL expected in a population of this size, age and gender distribution (P=0.46).
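The expected count is a weighted sum over age-sex strata. For each stratum, the number of participants is multiplied by the SEER CLL prevalence, and the total is scaled by the assumed 50% of CLL cases with 13q14.3 loss. In the sketch below, the stratum values are supplied by the user. The text does not state which test produced P=0.46. The Poisson upper-tail probability shown here is only one plausible choice and should be treated as an assumption. Both function names are hypothetical.

```python
import math


def expected_losses(strata, fraction_with_13q=0.5):
    """Expected 13q14.3 losses attributable to undiagnosed CLL.

    strata: (n_individuals, cll_prevalence) pairs, one per age-sex stratum,
    with prevalence taken from SEER limited-duration estimates.
    """
    return fraction_with_13q * sum(n * prev for n, prev in strata)


def poisson_upper_tail(observed, expected):
    """P(X >= observed) for X ~ Poisson(expected); expected must be > 0."""
    if expected <= 0:
        raise ValueError("expected must be positive")
    below = sum(math.exp(k * math.log(expected) - expected - math.lgamma(k + 1))
                for k in range(observed))
    return max(0.0, 1.0 - below)
```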

Table 2 Observed versus expected 13q14.3 mosaic deletions

Discussion

Our analysis of 13q14.3 mosaic losses in DNA isolated from blood or buccal cell samples revealed an overall frequency of 0.073% (95% CI=0.056–0.094%). This estimate is based on detection of 13q14.3 mosaicism in 60 of the 82 483 individuals included in non-hematologic cancer GWAS. No significant difference in the frequency of 13q14.3 mosaic loss was detected between non-hematologic cancer cases and controls. The prevalence of mosaic 13q14.3 loss increases with age, as observed for large mosaic events in general.35, 39 The size of 13q14.3 mosaic losses varies, but the losses almost always include an MDR on chromosome 13 from 49 590 000 to 49 983 100 (GRCh36). In more than 85% of instances, mosaic losses included the MIR15A, MIR16-1, DLEU1, DLEU2 and DLEU7 loci. We did not identify a clearly defined set of breakpoints. This suggests that the events may not localize to specific base pairs but instead to regions of open chromatin.

The enrichment of genes, promoter sites and enhancers around 13q14.3 breakpoints indicates that cellular mechanisms related to transcription and gene expression may be important in breakpoint formation. Regions of open chromatin around enhancers and gene-rich regions may also expose more of the DNA backbone to environmental mutagens, which could increase the probability of mutational events. The enrichment of SINEs in 13q14.3 breakpoint regions, together with a marginally significant enrichment of segmental duplications, suggests that genomic repeat regions could have a role in initiating the events that lead to 13q14.3 mosaic losses. It is less clear what functional role these repetitive elements could play in initiating somatic 13q14.3 loss, but transposon activity and mismatch repair are two mechanisms worth investigating further. These findings do not conclusively demonstrate a specific cellular mechanism responsible for the detected 13q14.3 mosaic losses. They do, however, suggest that transcription-coupled repair, exposure to DNA mutagens and transposon activity may be important mechanisms capable of initiating DNA breaks that lead to mosaic 13q14.3 deletions.

In an unadjusted analysis, 13q14.3 mosaic loss was associated with age (5-year age group), continental ancestry and endometrial cancer. Given the small numbers, the latter two observations are most likely false positives. Indeed, only the association with 5-year age group persisted in the adjusted multivariable analysis. An association between mosaicism and age has been observed in previous studies of overall autosomal mosaicism.33, 34, 35, 39 Notably, 13q14 deletions are among the most common large structural somatic events observed in detectable mosaicism. The frequency of MBL and CLL increases with age,38, 40, 41 which makes it challenging to determine what the detected 13q14.3 mosaic losses represent. They may be early biomarkers of MBL and CLL risk, or they may be sentinel events that frequently appear as genomic maintenance capacity deteriorates with age. Additionally, many of the contributing cancer GWASs did not screen for MBL/CLL, so there is a substantial possibility that some undiagnosed cases were present in our data set.

Our study does not provide sufficient evidence that mosaic 13q14.3 deletions in blood or buccal DNA are significantly associated with solid tumors, despite literature reports of sporadic 13q14.3 deletion events across a spectrum of cancers. Publicly available The Cancer Genome Atlas data on the cBioPortal suggest that 13q14.3 deletions are present in up to 11% of prostate tumors, 7% of bladder tumors, 4% of endometrial tumors and 3% of ovarian and colorectal tumors.42 We surmise that many of these deletions may be passengers rather than drivers of the solid tumors. However, our study of DNA obtained from blood and buccal cells provides no information about the frequency of 13q14.3 loss in other tissues, which may be a more relevant prognostic feature for tumors that develop in those tissues. In addition, even in our large survey of cancer GWAS, we had limited power to investigate the association between 13q14.3 mosaic loss and solid tumors.

Our study raises an important question regarding the implications of harboring a mosaic 13q14.3 deletion. The most likely possibility is that individuals with mosaic 13q14.3 deletions represent early, undetected MBL/CLL. The commonly deleted region in our analysis is nearly identical to that seen in MBL/CLL. The frequency we observe for 13q14.3 deletions (0.07%) is also statistically indistinguishable from that expected given the age- and sex-specific prevalence of CLL, assuming that 50% of CLL cases have a 13q14.3 deletion. Additionally, 13q14.3 deletions as the sole abnormality are generally associated with indolent clinical disease2 and often appear in early-stage CLL.43 Alternatively, these 13q14.3 deletions may be somatic changes in B- or T-cell populations that serve as sentinel events, appearing frequently as genomic maintenance capacity deteriorates with age. The last and least likely possibility is that these are inherited events passed from parents to offspring through the germline. The range of allelic proportions for heterozygous SNPs spanning these events (20–90%) deviates substantially from the 50% expectation, which makes this interpretation very unlikely. Further large prospective studies are required to capture a sufficiently large set of mosaic 13q14.3 deletions to accurately assess cancer risk.