DNA copy number motifs are strong and independent predictors of survival in breast cancer

Pladsen, Arne V.; Nilsen, Gro; Rueda, Oscar M.; Aure, Miriam R.; Borgan, Ørnulf; Liestøl, Knut; Vitelli, Valeria; Frigessi, Arnoldo; Langerød, Anita; Mathelier, Anthony; Engebråten, Olav; Kristensen, Vessela; Wedge, David C.; Van Loo, Peter; Caldas, Carlos; Børresen-Dale, Anne-Lise; Russnes, Hege G.; Lingjærde, Ole Christian

doi:10.1038/s42003-020-0884-6

Open access
Published: 02 April 2020


Communications Biology volume 3, Article number: 153 (2020)


Abstract

Somatic copy number alterations are a frequent sign of genome instability in cancer. A precise characterization of the genome architecture would reveal underlying instability mechanisms and provide an instrument for outcome prediction and treatment guidance. Here we show that the local spatial behavior of copy number profiles conveys important information about this architecture. Six filters were defined to characterize regional traits in copy number profiles, and the resulting Copy Aberration Regional Mapping Analysis (CARMA) algorithm was applied to tumors in four breast cancer cohorts (n = 2919). The derived motifs represent a layer of information that complements established molecular classifications of breast cancer. A score reflecting presence or absence of motifs provided a highly significant independent prognostic predictor. Results were consistent between cohorts. The nonsite-specific occurrence of the detected patterns suggests that CARMA captures underlying replication and repair defects and could have a future potential in treatment stratification.


Introduction

The allele-specific DNA copy number profile of a tumor is a window into its past history and its future evolutionary potential^1,2,3. In general, we may consider a copy number profile as the accumulated result of a series of genomic events^4,5,6,7. Specific DNA replication and repair errors may leave particular traces throughout the genome in the form of recurring local patterns, or motifs^8,9,10,11. We hypothesized that such motifs represent a substantial proportion of the copy number variation in a tumor, and that they partly explain the high intertumor copy number heterogeneity frequently observed in cancer. We further hypothesized that the presence or absence of specific motifs is informative of a tumor’s past and future evolutionary trajectory. Detailed characterization of such features would thus allow prediction of disease behavior and could potentially direct choice of treatment.

Here, we present an analysis of regional nonsite-specific motifs from allele-specific DNA copy number profiles in breast cancer. The core of this framework is the Copy Aberration Regional Mapping Analysis (CARMA) algorithm, which creates a compact representation of the aberration architecture. Conceptually, the algorithm represents copy number profiles as real-valued functions over the genomic domain and derives a small set of scores representing distinct regional features. The proposed method takes into account copy number amplitude, the spatial distribution of copy number breakpoints, and allelic imbalance, and it captures regional fluctuations in copy number, a signature feature of chromothripsis and chromoplexy. By generating a low-dimensional representation of the copy number data, the proposed algorithm also mitigates the curse of dimensionality.

CARMA is related to multiple algorithms designed to detect specific copy number aberration patterns in tumors. The chromosomal instability index (CINdex)¹² and the genomic instability index (GII)¹³ both quantify the total amount of genomic aberrations. Other algorithms have been proposed for detection of simplex and complex copy number events⁹, for example the complex arm-wise aberration index (CAAI), and of structural rearrangement patterns¹⁴. An algorithm identifying the presence of multiple aberration patterns, with application to ovarian cancer, was recently proposed¹¹. In addition, several methods have been proposed to identify copy number features recurring across tumors, such as GISTIC^15,16.

We applied CARMA to four breast cancer patient cohorts (METABRIC, Oslo2, OsloVal, and ICGC; see “Methods” for details). An integrated score was derived and shown to have superior prediction performance for breast cancer-specific survival compared with other available clinical and molecular stratifications. The relation between copy number motifs and established driver-gene-based classifications of breast cancer was also investigated. The analysis described in this paper is applicable to allele-specific copy number data from all types of cancer and from any platform, including SNP arrays and high-throughput sequencing.

Results

Brief outline of the analysis approach

CARMA is applicable to allele-specific copy number profiles from one or several tumors, obtained from SNP array analysis or DNA high-throughput sequencing. The algorithm extracts multiple local features, which are accumulated across genomic regions by numerical integration to form six regional scores. These scores reflect the degree of amplification (AMP), deletion (DEL), complexity (STP and CRV; e.g., patterns arising from chromothripsis and chromoplexy), loss of heterozygosity (LOH), and allelic imbalance or asymmetry (ASM). Precise mathematical definitions and further details are given in “Methods.” The analysis pipeline is depicted in Fig. 1a–d. Fig. 1e shows an application of the algorithm to three breast tumor samples from the Oslo2 cohort, using chromosome arms as regions. Distinct regional features are discernible, illustrating how CARMA can be used to compare copy number features that are not locus specific between samples.

**Fig. 1: Outline of the CARMA algorithm.**

Relation to other methods

CARMA was compared with two methods for detection of nonsite-specific copy number aberrations in single samples: CAAI⁹ and CINdex¹². The CAAI algorithm identifies chromosome arms with complex rearrangements, while CINdex detects regional gains and losses. We also compared CARMA with GISTIC, a well-established method for detection of regions with significant copy number change across multiple samples^15,16. Figure 2a shows circos plots of CARMA profiles for two selected samples in the METABRIC cohort, together with the results from GISTIC, CINdex, and CAAI.

As expected, CAAI correlates with the two CARMA complexity scores STP and CRV, but the relative sizes of STP and CRV provide additional detail (e.g., on chromosome 16 in sample MB-0010). CINdex captures both gains and losses, but in the two selected samples it correlates more strongly with DEL than with AMP. This is not unexpected, since the CINdex algorithm applies a relative weighting of gains and losses, while CARMA does not. The use of six distinct measures of copy number distortion in CARMA generally provides more detail than CINdex. For example, in a region with loss of one allele and gain of the other (i.e., uniparental disomy), such as chromosome 22 in MB-0010, CARMA reports LOH and ASM, while CINdex reports no alteration (Fig. 2a). Note also that the complex aberration on chromosome 11p in MB-0028, which CINdex reports, is positive for all six CARMA scores, including STP and CRV.

For GISTIC, regions of significant gain or loss were identified based on all METABRIC samples; a binary score was then assigned to each sample in each such region, indicating the presence or absence of a loss or gain. Regions with significant loss or gain according to GISTIC partially overlap with regions of high DEL and AMP scores, respectively. Next, we investigated the distribution of CARMA scores within each region identified by GISTIC (Fig. 2b). A strong overlap is observed between GISTIC gain and high AMP score, and between GISTIC loss and high DEL score. In addition, there is considerable diversity in the CARMA spectrum within regions called as gains or losses by GISTIC. For example, the relative contribution of LOH is highly variable across GISTIC loss regions. Similarly, the relative contribution of complex aberrations captured by STP and CRV varies across GISTIC gain regions.

Molecular subgroups have distinct CARMA signatures

We next considered the distribution of CARMA scores within established molecular stratifications of breast carcinomas (PAM50 and IntClust). PAM50^17,18 is an expression-based classification system that defines five distinct subgroups of breast tumors based on the expression of a set of 50 genes. IntClust^1,19 identifies ten subtypes based on copy number aberrations that affect gene expression in cis. The distribution of CARMA scores within these classification systems was explored in four breast cancer data sets of varying sample size (METABRIC, n = 1943; Oslo2, n = 276; OsloVal, n = 165; ICGC, n = 553). For each arm score, the percentage of tumors with a score exceeding the median was plotted separately for each PAM50 and IntClust subtype (Fig. 3a and Supplementary Figs. 1–4). The CARMA scores consistently reflect differences in genomic architecture between the biological and clinical patient groups. This visual overview of aberration patterns highlights subtype-specific features, such as frequent allelic loss on 17p and frequent gain and high complexity on 17q in IntClust1, and gain on 1q, frequent asymmetric gain and complex aberrations on 11q, and allelic loss on 16q in IntClust2. The signatures of regional CARMA scores within the PAM50 subtypes highlight known features, including whole-arm 1q gain/16q loss in luminal A tumors, the more complex copy number aberrations in luminal B tumors, the 17q alterations dominating Her2-enriched tumors, and the global instability of basal-like tumors. Three-dimensional scatter plots of CARMA scores were generated for all tumors in the Oslo2 (n = 276) and METABRIC (n = 1943) cohorts (Fig. 3b). Both the trend curves and the subtype centroids show a high degree of consistency between the two cohorts.
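The per-subtype summary behind these plots can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: it assumes a matrix of arm-wise scores with one row per tumor and takes the threshold to be the median over all tumors in the cohort, which is our reading of "exceeding the median". The function name `percent_above_median` is hypothetical.

```python
import numpy as np

def percent_above_median(scores, subtypes):
    """Percentage of tumors per subtype whose score exceeds the cohort-wide median.

    scores: (tumors, features) array, e.g. one column per arm-wise CARMA score.
    subtypes: one label per tumor (e.g. a PAM50 or IntClust call).
    """
    scores = np.asarray(scores, dtype=float)
    subtypes = np.asarray(subtypes)
    # Assumed threshold: median of each column over all tumors in the cohort.
    above = scores > np.median(scores, axis=0)
    return {str(s): 100.0 * above[subtypes == s].mean(axis=0)
            for s in np.unique(subtypes)}
```

Each dictionary entry is one row of a subtype-by-score heat map; repeating the call per cohort gives the cross-cohort comparison.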

**Fig. 3: Stratification and outcome prediction with CARMA.**

Predicting survival from regional scores

To assess the association between disease-specific survival (DSS) and genome-wide CARMA scores, a univariate Cox proportional hazards regression model was fitted with each score as the sole covariate (see Supplementary Table 1). For this purpose, we used the largest cohort (METABRIC). All scores were associated with survival (P < 10⁻⁶; Score test), and the strongest associations were found for STP and CRV (P < 10⁻¹⁸; Score test).

We next split the METABRIC cohort into a discovery cohort (n = 1295) and a test cohort (n = 648). In the discovery cohort, we fitted multivariate Cox regression models to DSS and progression-free survival (PFS) data based on six predictors, each defined as the unweighted mean of the corresponding regional (arm-wise) CARMA score across all arms (Fig. 3c). The fitted model was then applied to the test set, producing a single unweighted, continuous prognostic value per patient. Thresholds at the 1/3 and 2/3 quantiles were applied to classify samples into low-, intermediate-, and high-risk groups, coded 1, 2, and 3. This final score was termed the CARMA Prognostic Index (CPI). An alternative prognostic index was defined by using the 252 arm-wise CARMA scores directly as predictors and fitting a Cox regression model with a Lasso penalty to the discovery set. The resulting coefficients (Supplementary Fig. 5) were used as weights to calculate a weighted prognostic index, termed CPI_weighted.
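The construction of CPI can be sketched as follows. This is an illustrative Python sketch, not the authors' R implementation: it assumes an array of arm-wise scores of shape (patients, arms, 6) and Cox coefficients that have already been estimated on the discovery set (the fitting step is omitted). The text does not say whether the tertile cut-offs come from the discovery or the test set, so the hypothetical `reference` argument leaves that choice to the caller.

```python
import numpy as np

def prognostic_values(arm_scores, coefficients):
    """Continuous prognostic value per patient (Cox linear predictor).

    arm_scores: (patients, arms, 6) array of arm-wise CARMA scores.
    coefficients: 6 Cox coefficients, assumed fitted on the discovery set.
    """
    genome_wide = np.asarray(arm_scores, dtype=float).mean(axis=1)  # unweighted mean over arms
    return genome_wide @ np.asarray(coefficients, dtype=float)

def cpi_groups(values, reference=None):
    """Code values as 1 (low), 2 (intermediate) or 3 (high risk) using tertile cut-offs.

    reference: values the tertiles are computed from (defaults to `values`).
    """
    values = np.asarray(values, dtype=float)
    ref = values if reference is None else np.asarray(reference, dtype=float)
    lo, hi = np.quantile(ref, [1 / 3, 2 / 3])
    return 1 + (values > lo).astype(int) + (values > hi).astype(int)
```

Passing the discovery-set values as `reference` fixes the cut-offs before the test set is scored, which avoids tuning thresholds on the validation data.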

To compare the performance of CPI and CPI_weighted with established clinically and biologically relevant parameters, we fitted univariate Cox regression models in the METABRIC test set, using each prognostic index and each clinical parameter as a covariate (Table 1 and Supplementary Tables 2–3). For both DSS (P = 1.9 × 10⁻¹³; Score test) and PFS (P = 5.7 × 10⁻¹³; Score test), CPI had a lower P value than any of the clinical parameters, and it also outperformed CPI_weighted. CPI_weighted nevertheless remained strongly significant for both DSS (P = 5.2 × 10⁻¹⁰; Score test) and PFS (P = 3.7 × 10⁻⁷; Score test), with P values lower than those of many of the established parameters. Hazard ratios for CPI and other clinical variables from univariate Cox regression analysis are shown in Fig. 3d.

**Table 1: Prognostic value of CPI and other variables.**


Multivariate Cox regression modeling was also performed to assess the effect of the prognostic indices after adjustment for other variables (see Table 1 and Supplementary Tables 2–3). CPI consistently showed smaller P values than all other clinical variables. CPI_weighted also remained significant after adjustment for other variables (Supplementary Tables 2–3). Hazard ratios from multivariate Cox regression models in which the effect of CPI is adjusted for the effect of clinical variables are shown in Fig. 3f.

CPI was next used to stratify patients into low-, intermediate-, and high-risk groups, as described above, in the three validation cohorts with survival data available (METABRIC test set, OsloVal, and ICGC). A log-rank test was performed for the three groups in each data set (Fig. 3e). The differences were significant for both DSS (P < 10⁻¹² in the METABRIC test set, P < 10⁻⁴ in OsloVal, and P = 0.003 in ICGC) and PFS (P < 10⁻¹² in the METABRIC test set; PFS data were not available for OsloVal or ICGC).
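To show what the three-group comparison computes, here is a minimal K-sample log-rank test in Python, written from the standard textbook formulas. It is a sketch, not the R `survdiff` implementation used in the study. The helper `chi2_sf_even_df` evaluates the chi-square upper tail in closed form for even degrees of freedom, which covers the two degrees of freedom of a three-group comparison without needing a statistics library.

```python
import math
import numpy as np

def logrank_test(time, event, group):
    """K-sample log-rank test; returns the chi-square statistic and its degrees of freedom."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    group = np.asarray(group)
    labels = np.unique(group)
    k = len(labels)
    o_minus_e = np.zeros(k)          # observed minus expected events per group
    cov = np.zeros((k, k))           # null covariance of o_minus_e
    for t in np.unique(time[event]):
        at_risk = time >= t
        n = at_risk.sum()
        d = (event & (time == t)).sum()
        n_g = np.array([(at_risk & (group == g)).sum() for g in labels])
        d_g = np.array([(event & (time == t) & (group == g)).sum() for g in labels])
        o_minus_e += d_g - d * n_g / n
        if n > 1:
            p = n_g / n
            cov += d * (n - d) / (n - 1) * (np.diag(p) - np.outer(p, p))
    u = o_minus_e[:-1]               # drop one group: the K components sum to zero
    stat = float(u @ np.linalg.solve(cov[:-1, :-1], u))
    return stat, k - 1

def chi2_sf_even_df(x, df):
    """Chi-square upper tail for even df: exp(-x/2) * sum_{j < df/2} (x/2)^j / j!."""
    if df % 2:
        raise ValueError("closed form only for even df")
    half = x / 2
    return math.exp(-half) * sum(half ** j / math.factorial(j) for j in range(df // 2))
```

With three risk groups the P value is simply `chi2_sf_even_df(stat, 2)`, i.e. exp(−stat/2).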

Finally, the unweighted continuous prognostic score used to obtain the CPI was used to calculate Harrell’s C score in the METABRIC test set. The resulting C scores were 0.65 (95% CI: 0.62–0.69) for DSS and 0.64 (95% CI: 0.61–0.68) for PFS.

Discussion

Structural DNA distortions result from deregulated DNA repair and maintenance and from mutagenic processes operating in the cell. The conventional focus in studies of DNA copy number alterations in tumors is the identification of recurrently deleted and amplified genes, which may define key driver events in carcinogenesis or potential targets for treatment. We and others have previously shown that, in addition to this gene-centered or locus-centered approach, structural changes provide important information for classification and survival prediction^8,9,20. The methodology presented in this study complements gene-specific analyses by providing a systematic framework to characterize the information embedded in the copy number profile of a tumor. CARMA determines the presence and relative contributions of six distinct copy number features in genomic regions and in the genome as a whole. By focusing on pervasive patterns or motifs in the genome rather than locus-specific events, the algorithm captures footprints of past and ongoing segmental DNA alterations. Known drivers of such alterations are DNA replication and repair errors^8,9,10,11.

In this study, we used CARMA to assign scores to individual chromosome arms and to the whole genome. The CARMA algorithm is, however, not bound to any particular genomic resolution: the tool supports assignment of individual scores to whole genomes, chromosomes, chromosome arms, or genomic bins of any desired width. For a given genomic resolution, scores for individual genes can also be obtained by inheriting the score of the region that contains the gene. Irrespective of how the regions are chosen, the fact that they are identical across tumors allows CARMA scores to be used directly as features in clustering, regression, and classification. Typically, the number of features is also small, which substantially reduces statistical problems related to high dimensionality.
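Score inheritance from regions to genes amounts to a lookup. In this hypothetical Python helper (not part of the CARMA package), regions are contiguous and described by their sorted start coordinates, and each gene is represented by a single reference coordinate such as its start position; that choice is an assumption, and genes spanning a region boundary would need an explicit rule.

```python
import bisect

def inherit_region_scores(region_starts, region_scores, gene_positions):
    """Assign each gene the score of the contiguous region containing its reference position.

    region_starts: sorted start coordinates of contiguous regions on one chromosome.
    region_scores: one score per region (e.g. an arm- or bin-level CARMA score).
    gene_positions: hypothetical reference coordinate per gene (e.g. its start).
    """
    scores = []
    for pos in gene_positions:
        idx = bisect.bisect_right(region_starts, pos) - 1  # last region starting at or before pos
        if idx < 0:
            raise ValueError(f"position {pos} lies before the first region")
        scores.append(region_scores[idx])
    return scores
```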

CARMA reveals a rich spectrum of different copy number motifs across samples and also between regions within an individual sample. By combining six different measures of copy number aberration, it provides a more detailed picture of genomic architecture than GII, CAAI, and CINdex. CARMA and GISTIC represent complementary tools with different aims. Combining CARMA with GISTIC offers the possibility of providing a detailed picture of the aberration spectrum restricted to regions that are significantly altered across many samples.

Molecular taxonomy of breast cancer based on gene expression has proved important for the biological understanding of the disease¹⁷. IntClust¹ is a more recent driver-based classification of breast cancer and has also been shown to reflect the degree of chemosensitivity²¹. The CARMA scores revealed distinct aberration signatures for the ten IntClust groups, suggesting that the copy number motifs reflect a driver-based classification of tumors. As seen from the Manhattan plots, the expression signatures defining the IntClust subtypes are to a large degree correlated with focal copy number aberrations, representing driver alterations in these subtypes. The copy number aberrations in these driver regions also differ in their patterns. This is illustrated, for instance, by the different types of copy number gains found on the 1q arm in the IntClust8 subtype compared with the gains found on the 11q arm in the IntClust2 group. The first type of gain represents noncomplex, low-level whole-arm translocations captured by the AMP and ASM scores, while the latter represents more complex rearrangements with high-level gains²² captured by all of the CARMA scores. Even though both patterns represent copy number gains, the underlying mechanisms causing them are fundamentally different. The CARMA scores capture these nuances, illustrating the potential of the method to discriminate between a richer set of aberration patterns. The plot also gives an indication of the global background variation from copy number aberrations, perhaps most apparent in the IntClust10 subtype. Interestingly, the degree to which the different subtypes are affected by this background variation seems to correlate well with the fraction of TP53 mutations observed within each subtype²³. This again supports the notion that copy number motifs reflect underlying biological traits.

In order to assess the ability of the method to predict breast cancer-specific survival, univariate Cox regression models were fitted to genome-wide CARMA scores in the METABRIC cohort. All genome-wide scores showed a strong and significant association with survival. As a first step, this supports the assumption that each of the selected scores is informative and thus qualifies for use in further survival analyses. The scores were combined to produce the unweighted and weighted prognostic indices CPI and CPI_weighted. When CPI and CPI_weighted were compared with established clinical parameters through Cox regression analyses, CPI consistently outperformed all other variables in terms of the level of significance. The multivariate Cox analyses established that CPI is a strong independent predictor of survival in breast cancer. The results might point towards a role of specific aberration motifs, arising from specific types of genomic instability, as determinants of the malignant potential of a tumor. The fact that CPI outperformed GII in the above analyses supports the idea that multifaceted measurements of copy number aberrations add information.

The observation that CPI produced better prognostic predictions than CPI_weighted might stem from the somewhat strict variable selection performed by the Lasso regression model. The Lasso model excludes arm-specific scores that individually do not contribute strongly to the survival prediction. In aggregate, however, these arm-specific scores might confer additional prognostic information. CPI, which combines all arm scores in an unweighted manner, is not subject to the same kind of selection bias. The fact that this more inclusive approach performed better in our analyses suggests that all parts of the genome copy number aberration profile contribute to the real signal when assessing survival. This supports the notion that our method captures omnipresent background variation caused by underlying DNA disruptions.

In the future it would be of high interest to apply the methodology to different cancer types to compare aberration patterns across tumors at different sites, for example using The Cancer Genome Atlas Pan-Cancer data set²⁴. Translocation of genomic material is not captured by any array-based DNA analysis, and data from high-throughput sequencing would be required to fully characterize genomic architecture. The complex patterns described in this manuscript are likely to reflect specific mutational processes that could be further elucidated in future studies, linking CARMA with sequencing data. Finally, ASCAT has recently been implemented for whole genome sequencing data²⁵, and it would be interesting to apply our methodology directly to the allele-specific copy number profiles extracted from such data.

Several extensions of the current analyses are possible. One could, for example, increase the genomic resolution by partitioning the genome into a fairly large number of equal-sized regions (say 1000) and then assign separate scores to each of these. At some point, however, the regions may become too small to meaningfully assign scores, most notably for the indices reflecting complex rearrangements (STP and CRV). Another possible extension would be to consider regions harboring genes involved in specific processes or pathways, thus directly linking CARMA scores to biological function.
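A sketch of such an equal-width partition in Python. The locus-to-bin assignment is straightforward; the `min_loci` threshold used to flag bins that may be too small for STP and CRV is a hypothetical illustration, not a parameter of the CARMA software.

```python
import numpy as np

def equal_width_regions(positions, start, end, n_regions, min_loci=10):
    """Assign loci in [start, end] to n_regions equal-width bins and flag sparse bins.

    min_loci is a hypothetical threshold: bins with fewer loci are flagged as
    possibly too small to support the complexity scores (STP, CRV).
    """
    edges = np.linspace(start, end, n_regions + 1)
    idx = np.searchsorted(edges, positions, side="right") - 1
    idx = np.clip(idx, 0, n_regions - 1)  # a locus exactly at `end` goes in the last bin
    counts = np.bincount(idx, minlength=n_regions)
    return idx, counts, counts < min_loci
```

Scores would then be computed per bin from the loci assigned to it, with flagged bins either merged with a neighbor or left unscored.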

Methods

Deriving allele-specific copy number profiles

Affymetrix CEL files were preprocessed using the PennCNV libraries for Affymetrix data²⁶, which include quantile normalization, signal extraction, and summarization. All samples were normalized to a collection of around 5000 normal samples from the HapMap project²⁷, the 1000 Genomes Project²⁸, and the Wellcome Trust Case Control Consortium²⁹. The resulting LogR and BAF (B allele frequency) values were segmented with the piecewise constant fitting algorithm³⁰ and processed with the ASCAT algorithm (version 2.3)³¹ after adjusting LogR for GC binding artifacts³². ASCAT infers an allele-specific copy number profile of a tumor after correction for tumor ploidy and tumor cell fraction, and is based on allele-specific segmentation of normalized raw data³⁰ with the penalty parameter γ set to 50. The profile reflects the copy number state at m genomic loci at which two alleles are present in the germline of the general population (SNP loci), and can be represented as a sequence of pairs (n_Ai, n_Bi) (i = 1,…, m), where n_Ai and n_Bi denote the number of copies of each of the two alleles (here called A and B) present in the tumor genome at the ith locus. Pairs are ordered according to location, and since the labels A and B are arbitrary, we may assume that n_Ai ≥ n_Bi.

Calculating regional instability scores

We characterize the allele-specific copy number in a small genomic neighborhood on a chromosome arm by six features: degree of alteration in the negative direction, degree of alteration in the positive direction, degree of change, degree of oscillation, extent of LOH, and extent of allelic imbalance (see Fig. 1c). Sliding the neighborhood along the chromosome arm from one end to the other, we may regard each feature as a function of genomic position. Specifically, suppose we have measured allele-specific copy numbers (n_Ai, n_Bi) at genomic loci L_i, i = 1,…, m. We can represent these as a pair of piecewise constant functions (f_A, f_B) defined on the unit interval R = [0, 1]: each position L_i is mapped to a value t_i in R such that the ordering L₁ < ⋯ < L_m is preserved, i.e., t₁ < ⋯ < t_m. We thus have a one-to-one correspondence between t ∈ [0, 1] and genomic loci L(t), and if L_k is the measurement locus closest to L(t), then f_A(t) = n_Ak and f_B(t) = n_Bk. We assume that f_B(t) ≤ f_A(t) for all t ∈ R, i.e., B is the minor allele when allelic imbalance is present. The median-centered total copy number at t is f(t) = f_A(t) + f_B(t) − c, where c is the smallest value in the range of f_A + f_B that satisfies µ({t : f_A(t) + f_B(t) ≤ c}) ≥ 1/2, and µ is the Lebesgue measure. Informally, c is the observed total copy number with the property that half the genome has a total copy number less than or equal to c. We define the change in total copy number, Df(t), as the derivative of the first-order spline interpolating the center points of the segments of f; i.e., Df(t) is the slope of the line segment connecting the pair of segment centers immediately to the left and right of position t. Note that Df is also a piecewise constant function. We define the oscillation in total copy number as D²f(t) = D(Df)(t), which is likewise piecewise constant. This process can in principle be repeated to define higher-order properties of f, such as D³f = D(D²f); in practice, however, further levels add little additional information.
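The local features can be sketched in Python as operations on piecewise constant functions, each stored as piece boundaries on R = [0, 1] plus one value per piece. This is an illustration under assumptions not stated in the text, not the CARMA package itself: loci are rescaled linearly so that the first and last locus map to 0 and 1, the first-order spline is held constant beyond the outermost segment centers (so Df is 0 there), and the centering value defaults to the median over the supplied loci, with a genome-wide value accepted through the `center` argument. All function names are hypothetical.

```python
import numpy as np

def loci_to_pieces(positions, n_a, n_b):
    """Map sorted loci onto R = [0, 1]; each locus owns the part of R closest to it.

    Assumption: positions are rescaled linearly so the first and last locus map to 0 and 1.
    Returns piece boundaries and per-piece major (f_A) and minor (f_B) copy numbers.
    """
    pos = np.asarray(positions, dtype=float)
    t = (pos - pos[0]) / (pos[-1] - pos[0])
    breaks = np.concatenate(([0.0], (t[:-1] + t[1:]) / 2, [1.0]))
    f_a = np.maximum(n_a, n_b).astype(float)  # labels are arbitrary, so enforce f_B <= f_A
    f_b = np.minimum(n_a, n_b).astype(float)
    return breaks, f_a, f_b

def median_level(breaks, total):
    """Smallest observed total copy number c with measure{t : total(t) <= c} >= half."""
    widths = np.diff(breaks)
    for c in np.unique(total):  # np.unique returns values in ascending order
        if widths[total <= c].sum() >= 0.5 * widths.sum() - 1e-12:
            return float(c)

def merge_runs(breaks, values):
    """Collapse adjacent pieces with equal values into maximal constant segments."""
    keep = np.concatenate(([True], values[1:] != values[:-1]))
    return np.append(breaks[:-1][keep], breaks[-1]), values[keep]

def derivative(breaks, values):
    """D: slopes of the first-order spline through the segment centers.

    Assumption: the spline is constant outside the outermost centers, so D is 0 there.
    """
    breaks, values = merge_runs(breaks, values)
    centers = (breaks[:-1] + breaks[1:]) / 2
    slopes = np.diff(values) / np.diff(centers)
    new_breaks = np.concatenate(([breaks[0]], centers, [breaks[-1]]))
    return new_breaks, np.concatenate(([0.0], slopes, [0.0]))

def local_features(positions, n_a, n_b, center=None):
    """Pieces of f, f_A, f_B plus (breaks, values) pairs for Df and D^2 f."""
    breaks, f_a, f_b = loci_to_pieces(positions, n_a, n_b)
    total = f_a + f_b
    c = median_level(breaks, total) if center is None else center
    f = total - c
    d1 = derivative(breaks, f)
    d2 = derivative(*d1)
    return breaks, f, f_a, f_b, d1, d2

# Five loci; the two middle loci carry one extra copy, and the fourth has lost its minor allele.
breaks, f, f_a, f_b, d1, d2 = local_features(
    [0, 10, 20, 30, 40], [1, 1, 2, 3, 1], [1, 1, 1, 0, 1])
```

In this example f is 0 on the outer loci and 1 on the two middle loci, so Df is positive where the gain begins and negative where it ends; a region with constant total copy number yields f, Df and D²f that are identically zero.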

Regional instability scores are next defined by integrating the above local scores over the desired region (e.g., over a chromosome arm). To assess the degree of positive or negative deviation within a region, we define two scores:

$$J_1 = \int_R \{f(t)_+\}^2\,dt \quad \text{and} \quad J_2 = \int_R \{f(t)_-\}^2\,dt,$$

where z₊ = z if z > 0 and z₊ = 0 otherwise, and z₋ = z if z < 0 and z₋ = 0 otherwise. For example, in a region with total copy number equal to the median, we have J₁ = J₂ = 0, while in a region with some gains and no losses relative to the median, we have J₁ > 0 and J₂ = 0. The regional degree of change and oscillation in copy number are captured by the following two scores:

$$J_3 = \int_R \{Df(t)\}^2\,dt \quad \text{and} \quad J_4 = \int_R \{D^2 f(t)\}^2\,dt.$$

In a region with constant total copy number, we have J₃ = J₄ = 0. In a region with gradually increasing (or decreasing) copy number, J₃ > 0 while J₄ is close to zero, and in a region with fluctuations between smaller and larger copy numbers we have J₃ > 0 and J₄ > 0. LOH and allelic asymmetry are captured by the last two scores:

$$J_5 = \int_R 1_0\big(f_B(t)\big)\,dt \quad \text{and} \quad J_6 = \int_R \big(f_A(t) - f_B(t)\big)^2\,dt,$$

where 1₀(z) = 1 if z = 0 and 1₀(z) = 0 otherwise. In a region with only one allele present we have J₅ > 0 and the magnitude of the score reflects the proportion of the region with LOH. In a region with allelic imbalance, we have J₆ > 0. Further computational details can be found in Supplementary Materials.
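Because f, Df, D²f, f_A and f_B are all piecewise constant, the integrals J₁ to J₆ reduce to width-weighted sums over pieces. The Python sketch below assumes each function is given as piece boundaries on R plus one value per piece (the function names are hypothetical). Labelling J₁ and J₂ as AMP and DEL follows from their definitions; reading J₃ as STP and J₄ as CRV is our assumption.

```python
import numpy as np

def integral_of_square(breaks, values):
    """Exact integral of values(t)^2 for a piecewise constant function on the given partition."""
    return float(np.sum(np.diff(breaks) * np.asarray(values, dtype=float) ** 2))

def carma_j_scores(breaks, f, f_a, f_b, d1, d2):
    """J1..J6 for one region.

    breaks, f, f_a and f_b share one partition of the region; d1 and d2 are
    (breaks, values) pairs for Df and D^2 f, which live on their own partitions.
    """
    f = np.asarray(f, dtype=float)
    f_a = np.asarray(f_a, dtype=float)
    f_b = np.asarray(f_b, dtype=float)
    widths = np.diff(breaks)
    return {
        "J1": integral_of_square(breaks, np.maximum(f, 0)),  # gains over the median (AMP)
        "J2": integral_of_square(breaks, np.minimum(f, 0)),  # losses below the median (DEL)
        "J3": integral_of_square(*d1),                        # change (assumed STP)
        "J4": integral_of_square(*d2),                        # oscillation (assumed CRV)
        "J5": float(np.sum(widths * (f_b == 0))),             # extent of LOH
        "J6": integral_of_square(breaks, f_a - f_b),          # allelic imbalance (ASM)
    }

scores = carma_j_scores(
    [0, 0.125, 0.375, 0.625, 0.875, 1],
    f=[0, 0, 1, 1, 0], f_a=[1, 1, 2, 3, 1], f_b=[1, 1, 1, 0, 1],
    d1=([0, 0.1875, 0.625, 0.9375, 1], [0, 1 / 0.4375, -1 / 0.3125, 0]),
    d2=([0.0, 1.0], [0.0]),  # placeholder: a real call would pass the computed D^2 f
)
```

Here the gain contributes J₁ = 0.5 (half the region at one copy above the median), the locus without a minor allele gives J₅ = 0.25, and the imbalanced loci give J₆ = 0.25·1² + 0.25·3² = 2.5.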

Calculating CARMA scores in sex chromosomes

The top-level function in the accompanying software does not currently support calculation of CARMA scores for the Y chromosome. Such scores can still be obtained by using the included low-level function for calculating scores on a single chromosome. Calculation of CARMA scores for the X chromosome is supported, but it requires information about the sex of the patient for correct calculation of AMP and DEL.

Statistics and reproducibility

Three-dimensional scatter plots: Subtype centroids were calculated by averaging over all the three-dimensional vectors representing samples from a particular PAM50 subtype. Trend curves are principal curves³³ and were calculated with the R package princurve using default parameter values.

Survival analysis: To assess the association between survival (DSS or PFS) and CPI risk groups, a log-rank test was applied, and survival estimates were obtained using the Kaplan–Meier estimator. The functions survdiff and survfit in the R package survival were used for this purpose. All other associations between survival and covariates were assessed using univariate or multivariate Cox regression, as appropriate. A score test was applied to test the significance of individual covariates in the Cox models. Models were fitted by maximization of the Cox partial likelihood, with the exception of the model containing all 252 arm-wise CARMA scores as covariates, for which a Cox partial likelihood with an L₁ (lasso) penalty³⁴ was used. The lasso is a regularization method that shrinks regression coefficients towards zero by enforcing an upper bound on the L₁-norm of the coefficients (i.e., $\sum_{j=1}^{p} |\beta_j| \le \lambda$) in the maximization of the partial log-likelihood.

The amount of shrinkage is determined by a tuning parameter λ. Leave-one-out cross-validation was used to determine the value of λ. Cox regression with a Lasso penalty was performed using the functions cv.glmnet and glmnet in the R package glmnet^35,36. All other Cox regressions were performed using the function coxph in the R package survival.
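The Kaplan–Meier estimator mentioned above is a product over event times of the fraction of at-risk patients who survive each time. This Python sketch implements only that product-limit formula; it is not a reimplementation of `survfit` and omits confidence intervals.

```python
import numpy as np

def kaplan_meier(time, event):
    """Distinct event times and the Kaplan-Meier survival estimate just after each of them."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    event_times = np.unique(time[event])
    surv, s = [], 1.0
    for t in event_times:
        n_at_risk = np.sum(time >= t)           # still under observation just before t
        deaths = np.sum(event & (time == t))
        s *= 1.0 - deaths / n_at_risk
        surv.append(s)
    return event_times, np.array(surv)
```

Censored patients (event = 0) do not produce a step but still count in the risk set until their censoring time.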

Assessment of risk-score model: The discriminative ability of the continuous CPI risk score was assessed using Harrell’s C score. For every pair of observations, it is determined whether the pair is concordant (the observation with the lower risk score has the longer survival), discordant (the observation with the lower risk score has the shorter survival), or cannot be compared due to censoring. Harrell’s C score is then the number of concordant pairs divided by the total number of concordant and discordant pairs.
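The pairwise definition translates directly into code. In this Python sketch, a pair is comparable when the shorter observed time ends in an event and the other time is strictly longer. Counting tied risk scores as one half is a common convention that we assume here, and pairs with tied times are simply skipped.

```python
import numpy as np

def harrell_c(time, event, risk):
    """Harrell's C: share of comparable pairs in which the higher risk score fails first."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    risk = np.asarray(risk, dtype=float)
    concordant, comparable = 0.0, 0
    for i in range(len(time)):
        if not event[i]:
            continue  # a censored time cannot be the earlier member of a comparable pair
        for j in range(len(time)):
            if time[j] > time[i]:  # j is known to have survived longer than i
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0   # higher risk, shorter survival: concordant
                elif risk[i] == risk[j]:
                    concordant += 0.5   # tied risk scores (assumed convention)
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A score of 0.5 corresponds to random ordering and 1.0 to perfect ranking.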

The weighted prognostic index was calculated as $\mathrm{CPI}_{\mathrm{weighted},i} = x_i^{\mathsf{T}} \hat{\beta}$, where $x_i$ represents the CARMA arm scores for patient i in the validation data set and $\hat{\beta}$ denotes the coefficients estimated by the survival prediction model fitted to the discovery set.

Materials

The data material in this study was obtained from four patient cohorts: METABRIC (n = 1943), Oslo2 (n = 276), OsloVal (n = 165), and ICGC (n = 553). Only female patients were included. The distribution of clinical parameters within each data set can be found in Supplementary Tables 4–5. For model validation, the METABRIC cohort was randomly split in a 2:1 ratio into a discovery set (n = 1295) and a test set (n = 648). For detailed information on which samples belong to the discovery and test sets, please contact the authors. For more details about the four cohorts, see Supplementary Material and Methods. Survival data were not available for the Oslo2 cohort.

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability

Genomic copy number and gene expression information, as well as clinical data, for the OsloVal cohort have been described previously³⁷ and are available at the Synapse platform, https://doi.org/10.7303/syn1688370. Gene expression information for the Oslo2 cohort has been described previously^38,39 and is available at Gene Expression Omnibus under accession GSE81002. The SNP 6.0 copy number data from the Oslo2 cohort are available upon request. Molecular-subtype information and segmented copy number profiles for the OsloVal and Oslo2 cohorts are available from the corresponding author on reasonable request. Genomic copy number, gene expression and molecular-subtype information for the METABRIC cohort have been described previously¹ and are available at the European Genome-phenome Archive under accession EGAS00000000083, while clinical data are available from ref. 19. Gene expression data, segmented copy number profiles and clinical information for the ICGC breast cancer cohort have been described previously⁴⁰ and are available from the Supplementary Tables in that publication. Raw data are available at the European Genome-phenome Archive under the overarching accession number EGAS00001001178.

Code availability

Software with detailed instructions and test data is available as an R package at the website http://heim.ifi.uio.no/bioinf/Projects/. The software is open source and released under the MIT license.

References

1. Curtis, C. et al. The genomic and transcriptomic architecture of 2,000 breast tumours reveals novel subgroups. Nature 486, 346–352 (2012).
2. Beroukhim, R. et al. The landscape of somatic copy-number alteration across human cancers. Nature 463, 899–905 (2010).
3. Yates, L. R. et al. Genomic evolution of breast cancer metastasis and relapse. Cancer Cell 32, 169–184.e7 (2017).
4. Yi, K. & Ju, Y. S. Patterns and mechanisms of structural variations in human cancer. Exp. Mol. Med. 50, 98 (2018).
5. McClintock, B. The stability of broken ends of chromosomes in Zea mays. Genetics 26, 234–282 (1941).
6. Stephens, P. J. et al. Massive genomic rearrangement acquired in a single catastrophic event during cancer development. Cell 144, 27–40 (2011).
7. Baca, S. C. et al. Punctuated evolution of prostate cancer genomes. Cell 153, 666–677 (2013).
8. Hicks, J. B. et al. Novel patterns of genome rearrangement and their association with survival in breast cancer. Genome Res. 16, 1465–1479 (2006).
9. Russnes, H. G. et al. Genomic architecture characterizes tumor progression paths and fate in breast cancer patients. Sci. Transl. Med. 2, 38ra47 (2010).
10. Nik-Zainal, S. & Morganella, S. Mutational signatures in breast cancer: the problem at the DNA level. Clin. Cancer Res. 23, 2617–2629 (2017).
11. Macintyre, G. et al. Copy number signatures and mutational processes in ovarian carcinoma. Nat. Genet. 50, 1262–1270 (2018).
12. Song, L. et al. CINdex: a Bioconductor package for analysis of chromosome instability in DNA copy number data. Cancer Inform. 16, 1176935117746637 (2017).
13. Chin, S. F. et al. High-resolution aCGH and expression profiling identifies a novel genomic subtype of ER negative breast cancer. Genome Biol. 8, R215 (2007).
14. Nik-Zainal, S. et al. Landscape of somatic mutations in 560 breast cancer whole-genome sequences. Nature 534, 47–54 (2016).
15. Beroukhim, R. et al. Assessing the significance of chromosomal aberrations in cancer: methodology and application to glioma. Proc. Natl Acad. Sci. USA 104, 20007–20012 (2007).
16. Mermel, C. H. et al. GISTIC2.0 facilitates sensitive and confident localization of the targets of focal somatic copy-number alteration in human cancers. Genome Biol. 12, R41 (2011).
17. Perou, C. M. et al. Molecular portraits of human breast tumours. Nature 406, 747–752 (2000).
18. Parker, J. S. et al. Supervised risk predictor of breast cancer based on intrinsic subtypes. J. Clin. Oncol. 27, 1160–1167 (2009).
19. Rueda, O. M. et al. Dynamics of breast-cancer relapse reveal late-recurring ER-positive genomic subgroups. Nature 567, 399–404 (2019).
20. Vollan, H. K. et al. A tumor DNA complex aberration index is an independent predictor of survival in breast and ovarian cancer. Mol. Oncol. 9, 115–127 (2014).
21. Ali, H. et al. Genome-driven integrated classification of breast cancer validated in over 7,500 samples. Genome Biol. 15, 431 (2014).
22. Glodzik, D. et al. Mutational mechanisms of amplifications revealed by analysis of clustered rearrangements in breast cancers. Ann. Oncol. 29, 2223–2231 (2018).
23. Silwal-Pandit, L. et al. TP53 mutation spectrum in breast cancer is subtype specific and has distinct prognostic relevance. Clin. Cancer Res. 20, 3569–3580 (2014).
24. Zack, T. I. et al. Pan-cancer patterns of somatic copy number alteration. Nat. Genet. 45, 1134–1140 (2013).
25. Raine, K. M. et al. ascatNgs: identifying somatically acquired copy-number alterations from whole-genome sequencing data. Curr. Protoc. Bioinforma. 56, 15.9.1–15.9.17 (2016).
26. Wang, K. et al. PennCNV: an integrated hidden Markov model designed for high-resolution copy number variation detection in whole-genome SNP genotyping data. Genome Res. 17, 1665–1674 (2007).
27. International HapMap Consortium. The International HapMap Project. Nature 426, 789–796 (2003).
28. The 1000 Genomes Project Consortium. A map of human genome variation from population-scale sequencing. Nature 467, 1061–1073 (2010).
29. Burton, P. R. et al. Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature 447, 661–678 (2007).
30. Nilsen, G. et al. Copynumber: efficient algorithms for single- and multi-track copy number segmentation. BMC Genom. 13, 591 (2012).
31. Van Loo, P. et al. Allele-specific copy number analysis of tumors. Proc. Natl Acad. Sci. USA 107, 16910–16915 (2010).
32. Cheng, J. et al. Single-cell copy number variation detection. Genome Biol. 12, R80 (2011).
33. Hastie, T. & Stuetzle, W. Principal curves. J. Am. Stat. Assoc. 84, 502–516 (1989).
34. Tibshirani, R. The lasso method for variable selection in the Cox model. Stat. Med. 16, 385–395 (1997).
35. Friedman, J., Hastie, T. & Tibshirani, R. Regularization paths for generalized linear models via coordinate descent. J. Stat. Softw. 33, 1–22 (2010).
36. Simon, N., Friedman, J. H., Hastie, T. & Tibshirani, R. Regularization paths for Cox’s proportional hazards model via coordinate descent. J. Stat. Softw. 39, 1–13 (2011).
37. Margolin, A. A. et al. Systematic analysis of challenge-driven improvements in molecular prognostic models for breast cancer. Sci. Transl. Med. 5, 181re1 (2013).
38. Aure, M. R. et al. Integrated analysis reveals microRNA networks coordinately expressed with key proteins in breast cancer. Genome Med. 7, 21 (2015).
39. Aure, M. R. et al. Integrative clustering reveals a novel split in the luminal A subtype of breast cancer with impact on outcome. Breast Cancer Res. 19, 44 (2017).
40. Nik-Zainal, S. et al. Landscape of somatic mutations in 560 breast cancer whole-genome sequences. Nature 534, 47–54 (2016).

Acknowledgements

The authors thank all the women who have contributed to this study by donating tumor tissue and blood. We thank Hans Kristian Moen Vollan for vital input and support in the development of an early version of the CARMA algorithm. We thank Sandra Jernstrøm for her assistance in preparing the gene expression data on Oslo2, Einar Rødland for his assistance in normalization of the gene expression data on Oslo2, and Phoung Vu, Veronica Skarpeteig, Inger Riise Bergheim, and Anja Valen for the TP53 sequencing of Oslo2. David Wedge is supported by the Li Ka Shing Foundation and the National Institute for Health Research Oxford Biomedical Research Centre. Peter Van Loo is supported by the Francis Crick Institute, which receives its core funding from Cancer Research UK (FC001202), the UK Medical Research Council (FC001202), and the Wellcome Trust (FC001202). Peter Van Loo is a Winton Group Leader in recognition of the Winton Charitable Foundation’s support towards the establishment of The Francis Crick Institute.

Author information

These authors contributed equally: Arne V. Pladsen, Gro Nilsen.
These authors jointly supervised this work: Hege G. Russnes, Ole Christian Lingjærde.

Authors and Affiliations

Department of Cancer Genetics, Institute for Cancer Research, Oslo University Hospital, Ullernchausseen 70 N-0310, Oslo, Norway
Arne V. Pladsen, Miriam R. Aure, Anita Langerød, Anthony Mathelier, Anne-Lise Børresen-Dale, Gry Aarum Geitvik, Vessela Kristensen, Ole Christian Lingjærde, Hege G. Russnes & Therese Sørlie
Centre for Bioinformatics, Department of Informatics, University of Oslo, Gaustadalléen 23 B N-0373, Oslo, Norway
Gro Nilsen, Knut Liestøl & Ole Christian Lingjærde
Cancer Research UK, Cambridge Research Institute, Li Ka Shing Centre, Robinson Way, Cambridge, CB2 0RE, UK
Oscar M. Rueda & Carlos Caldas
Department of Mathematics, University of Oslo, Moltke Moes vei 35 N-0851, Oslo, Norway
Ørnulf Borgan
Institute of Basic Medical Sciences, Faculty of Medicine, University of Oslo, Domus Medica, Sognsvannsveien 9 N-0372, Oslo, Norway
Valeria Vitelli & Arnoldo Frigessi
Centre for Molecular Medicine Norway, University of Oslo, Forskningsparken, Gaustadalléen 21 N-0349, Oslo, Norway
Anthony Mathelier
Department of Pathology, Oslo University Hospital, POB 4953 Nydalen N-0424, Oslo, Norway
Elin Borgen, Øystein Garred & Hege G. Russnes
Institute for Clinical Medicine, University of Oslo, Kirkeveien 166 N-0450, Oslo, Norway
Anne-Lise Børresen-Dale, Olav Engebråten, Rolf Kåresen & Bjørn Naume
Department of Oncology, Oslo University Hospital, POB 4953 Nydalen, N-0424, Oslo, Norway
Olav Engebråten & Bjørn Naume
KG Jebsen Centre for B-cell malignancies, Institute for Clinical Medicine, University of Oslo, Ullernchausseen 70 N-0372, Oslo, Norway
Ole Christian Lingjærde
Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Old Road Campus, Headington, Oxford, OX3 7FZ, UK
David C. Wedge
NIHR Biomedical Research Centre, Warneford Ln, Headington, Oxford, OX3 7JX, UK
David C. Wedge
The Francis Crick Institute, 1 Midland Road, London, NW1 1AT, UK
Peter Van Loo
Norwegian University of Science and Technology, N-7491, Trondheim, Norway
Tone F. Bathen
Østfold Hospital, POB 300 N-1714, Grålum, Norway
Britt Fritzman
Akershus University Hospital, Sykehusveien 25, Lørenskog, Norway
Jürgen Geisler & Torill Sauer
Cancer Registry of Norway, Ullernchausseen 64 N-0379, Oslo, Norway
Solveig Hofvind
Department of Tumor Biology, Institute for Cancer Research, Oslo University Hospital, Ullernchausseen 70 N-0310, Oslo, Norway
Gunhild Mari Mælandsmo
Vestre Viken Hospital Trust, POB 800 N-3004, Drammen, Norway
Kristine Kleivi Sahlberg & Helle Kristine Skjerven
Section for Breast and Endocrine Surgery, Division of Surgery, Cancer and Transplantation Medicine, Oslo University Hospital, N-0424, Oslo, Norway
Ellen Schlichting

Authors

Arne V. Pladsen
Gro Nilsen
Oscar M. Rueda
Miriam R. Aure
Ørnulf Borgan
Knut Liestøl
Valeria Vitelli
Arnoldo Frigessi
Anita Langerød
Anthony Mathelier
Olav Engebråten
Vessela Kristensen
David C. Wedge
Peter Van Loo
Carlos Caldas
Anne-Lise Børresen-Dale
Hege G. Russnes
Ole Christian Lingjærde

Consortia

OSBREAC

Tone F. Bathen, Elin Borgen, Anne-Lise Børresen-Dale, Olav Engebråten, Britt Fritzman, Øystein Garred, Jürgen Geisler, Gry Aarum Geitvik, Solveig Hofvind, Vessela Kristensen, Rolf Kåresen, Anita Langerød, Ole Christian Lingjærde, Gunhild Mari Mælandsmo, Bjørn Naume, Hege G. Russnes, Kristine Kleivi Sahlberg, Torill Sauer, Helle Kristine Skjerven, Ellen Schlichting & Therese Sørlie

Contributions

A.V.P., G.N., and O.C.L. performed the statistical and bioinformatics analyses, with contributions from O.M.R., M.R.A., Ø.B., K.L., V.V., A.F., A.M., O.E., D.C.W., P.V.L., and H.G.R. V.V. and A.F. performed the IntClust subtyping in the Oslo2 cohort. A.V.P., G.N., H.G.R., and O.C.L. developed the CARMA method, with contributions from M.R.A., O.E., V.K., and C.C. A.L., V.K., and A.L.B.D. planned and performed the laboratory experiments. OSBREAC and A.L.B.D. provided patient materials. A.V.P., G.N., H.G.R., and O.C.L. conceived the study and wrote the manuscript. All authors critically revised the manuscript and have read and accepted the final version.

Corresponding author

Correspondence to Ole Christian Lingjærde.

Ethics declarations

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Description of Additional Supplementary Files

Supplementary Data 1

Reporting Summary

Peer Review File

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Pladsen, A., Nilsen, G., Rueda, O.M. et al. DNA copy number motifs are strong and independent predictors of survival in breast cancer. Commun Biol 3, 153 (2020). https://doi.org/10.1038/s42003-020-0884-6

Received: 11 October 2019
Accepted: 05 March 2020
Published: 02 April 2020
DOI: https://doi.org/10.1038/s42003-020-0884-6
