A catalogue of molecular aberrations that cause ovarian cancer is critical for developing and deploying therapies that will improve patients’ lives. The Cancer Genome Atlas project has analysed messenger RNA expression, microRNA expression, promoter methylation and DNA copy number in 489 high-grade serous ovarian adenocarcinomas and the DNA sequences of exons from coding genes in 316 of these tumours. Here we report that high-grade serous ovarian cancer is characterized by TP53 mutations in almost all tumours (96%); low prevalence but statistically recurrent somatic mutations in nine further genes including NF1, BRCA1, BRCA2, RB1 and CDK12; 113 significant focal DNA copy number aberrations; and promoter methylation events involving 168 genes. Analyses delineated four ovarian cancer transcriptional subtypes, three microRNA subtypes, four promoter methylation subtypes and a transcriptional signature associated with survival duration, and shed new light on the impact that tumours with BRCA1/2 (BRCA1 or BRCA2) and CCNE1 aberrations have on survival. Pathway analyses suggested that homologous recombination is defective in about half of the tumours analysed, and that NOTCH and FOXM1 signalling are involved in serous ovarian cancer pathophysiology.
Ovarian cancer is the fifth-leading cause of cancer death among women in the United States; 21,880 new cases and 13,850 deaths were estimated to have occurred in 2010 (ref. 1). Most deaths (∼70%) are of patients presenting with advanced-stage, high-grade serous ovarian cancer (HGS-OvCa; refs 2, 3). The standard treatment is aggressive surgery followed by platinum–taxane chemotherapy. After therapy, platinum-resistant cancer recurs in approximately 25% of patients within six months (ref. 4), and the overall five-year survival probability is 31% (ref. 5). Approximately 13% of HGS-OvCa is attributable to germline mutations in BRCA1/2 (refs 6, 7), and a smaller percentage can be accounted for by other germline mutations. However, most ovarian cancer can be attributed to a growing number of somatic aberrations (ref. 8).
The lack of successful treatment strategies led the Cancer Genome Atlas (TCGA) researchers to comprehensively measure genomic and epigenomic abnormalities in clinically annotated HGS-OvCa samples, with the aim of identifying molecular abnormalities that influence pathophysiology, affect outcome and constitute therapeutic targets. Microarray analyses produced high-resolution measurements of mRNA expression, microRNA (miRNA) expression, DNA copy number and DNA promoter methylation for 489 HGS-OvCa tumours, and massively parallel sequencing coupled with hybrid affinity capture (refs 9, 10) provided whole-exome DNA sequence information for 316 of these samples.
Samples and clinical data
This Article reports the analysis of 489 clinically annotated stage-II–IV HGS-OvCa samples and corresponding normal DNA (Supplementary Methods, section 1, and Supplementary Table 1.1). Patients reflected the age at diagnosis, stage, tumour grade and surgical outcome of individuals typically diagnosed with HGS-OvCa. Clinical data were current as of 25 August 2010. HGS-OvCa specimens were surgically resected before systemic treatment but all patients received a platinum agent and 94% received a taxane. The median progression-free survival and overall survival of the cohort are similar to those in previously published trials (refs 11, 12). Twenty-five per cent of the patients remained free from disease and 45% were alive at the time of last follow-up, whereas 31% experienced disease progression within six months of completing platinum-based therapy. The median follow-up time was 30 months (range, 0–179 months). Samples for TCGA analysis were selected to have >70% tumour cell nuclei and <20% necrosis.
Coordinated molecular analyses using multiple molecular assays at independent sites were carried out as listed in Table 1. The data set analysed here is available at the TCGA website (http://tcga-data.nci.nih.gov/docs/publications/ov_2011), in two tiers: open access and controlled access. Open-access data sets are publicly available, whereas controlled-access data sets, which include clinical or genomic information that could identify an individual, require user certification as described on the aforementioned website.
We performed exome capture and sequencing on DNA isolated from 316 HGS-OvCa samples and from matched normal samples for each individual (Supplementary Methods, section 2). Capture reagents targeted ∼180,000 exons from ∼18,500 genes totalling ∼33 megabases of non-redundant sequence. Massively parallel sequencing on the Illumina GAIIx platform (236 sample pairs) or ABI SOLiD 3 platform (80 sample pairs) yielded ∼14 gigabases per sample (∼9 × 10^12 bases in total). On average, 76% of coding bases were covered in sufficient depth in both the tumour and the matched normal samples to allow confident mutation detection (Supplementary Methods, section 2, and Supplementary Fig. 2.1). We annotated 19,356 somatic mutations (∼61 per tumour); these are classified in Supplementary Table 2.1. Mutations that may be important in HGS-OvCa pathophysiology were identified by searching for non-synonymous or splice site mutations present at significantly increased frequencies relative to background, by comparing mutations in this study to those in the Catalogue of Somatic Mutations in Cancer and Online Mendelian Inheritance in Man, and by predicting the mutations’ impacts on protein function.
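As a quick consistency check, the totals quoted above can be recomputed from the per-sample figures. This is only a back-of-the-envelope sketch: the variable names are illustrative, and it assumes that the ∼14 gigabases applies to each tumour and each matched normal sample separately, which the text does not state explicitly.

```python
# Figures quoted in the text above
n_gaiix_pairs = 236        # sample pairs sequenced on Illumina GAIIx
n_solid_pairs = 80         # sample pairs sequenced on ABI SOLiD 3
bases_per_sample = 14e9    # ~14 gigabases per sample
n_somatic_mutations = 19_356

n_pairs = n_gaiix_pairs + n_solid_pairs

# Assumption (ours, not stated in the text): every pair contributes two
# sequenced samples, one tumour and one matched normal.
total_bases = bases_per_sample * n_pairs * 2
mutations_per_tumour = n_somatic_mutations / n_pairs

print(f"pairs sequenced:      {n_pairs}")
print(f"total bases:          {total_bases:.1e}")
print(f"mutations per tumour: {mutations_per_tumour:.1f}")
```

Under that assumption the total rounds to the quoted ∼9 × 10^12 bases, and the mutation count divided by the number of tumours rounds to the quoted ∼61 per tumour.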
Two different algorithms (Supplementary Methods, section 2) identified nine genes (Table 2) for which the number of non-synonymous or splice site mutations was significantly greater than that expected on the basis of mutation distribution models. Consistent with published results (ref. 13), TP53 was mutated in 303 of 316 samples (283 by automated methods and 20 after manual review), and BRCA1 and BRCA2 had germline mutations in 9% and 8% of cases, respectively, and showed somatic mutations in a further 3% of cases. We identified six other statistically recurrently mutated genes: RB1, NF1, FAT3, CSMD3, GABRA6 and CDK12. CDK12 is involved in RNA splicing regulation (ref. 14) and was previously implicated in lung and large-intestine tumours (refs 15, 16). Five of the nine CDK12 mutations were either nonsense or indel, suggesting potential loss of function, and the four missense mutations (Arg882Leu, Tyr901Cys, Lys975Glu and Leu996Phe) were clustered in its protein kinase domain. GABRA6 and FAT3 both appeared as significantly mutated but did not seem to be expressed in HGS-OvCa (Supplementary Fig. 2.1) or fallopian tube tissue, so it is less likely that mutation of these genes has a significant role in HGS-OvCa.
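The idea of testing whether a gene carries more mutations than expected can be illustrated with a deliberately simplified model. The sketch below is not either of the two algorithms used in this study. It assumes a single uniform background mutation rate per base and computes a one-sided binomial tail probability. The gene size, background rate and mutation count in the example are hypothetical.

```python
import math

def binom_log_pmf(k, n, p):
    """Log of the Binomial(n, p) probability mass at k (requires 0 < p < 1)."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))

def binom_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed directly over the tail."""
    total = 0.0
    for i in range(k, n + 1):
        term = math.exp(binom_log_pmf(i, n, p))
        total += term
        if term < total * 1e-17:   # remaining terms no longer change the sum
            break
    return min(total, 1.0)

# Hypothetical example: a gene with 4,500 covered coding bases in each of
# 316 tumours, an assumed background rate of 1.5e-6 mutations per base,
# and 9 observed non-synonymous mutations.
covered_bases = 4_500 * 316
background_rate = 1.5e-6
observed = 9

p_value = binom_upper_tail(observed, covered_bases, background_rate)
print(f"expected mutations: {covered_bases * background_rate:.2f}")
print(f"one-sided P value:  {p_value:.2e}")
```

A realistic analysis would replace the uniform rate with context-specific rates for each class of sequence lesion and would correct for testing many genes.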
We compared mutations from this study with mutations in the Catalogue of Somatic Mutations in Cancer (ref. 17) and Online Mendelian Inheritance in Man (ref. 18) databases to identify more HGS-OvCa genes that are less commonly mutated. These comparisons yielded 477 and 211 matches, respectively (Supplementary Table 2.4), including mutations in BRAF (Asn581Ser), PIK3CA (Glu545Lys and His1047Arg), KRAS (Gly12Asp) and NRAS (Gln61Arg). These mutations have been shown to have transforming activity, so we believe that these mutations are rare but important drivers in HGS-OvCa.
We combined evolutionary information from sequence alignments of protein families and whole vertebrate genomes, predicted local protein structure and selected human SwissProt protein features (Supplementary Methods, section 3) to identify putative driver mutations using CHASM (refs 19, 20) after training on mutations in known oncogenes and tumour suppressors. CHASM identified 122 missense mutations predicted to be oncogenic (Supplementary Table 3.1). Mutation-driven changes in protein function were deduced from evolutionary information for all confirmed somatic missense mutations by comparing protein family sequence alignments and residue placement in known or homology-based three-dimensional protein structures using MutationAssessor (Supplementary Methods, section 4). Twenty-seven per cent of missense mutations were predicted to affect protein function (Supplementary Table 2.1).
Copy number analysis
Somatic copy number alterations (SCNAs) present in the 489 HGS-OvCa genomes were identified and compared with glioblastoma multiforme data (Fig. 1a). SCNAs were divided into regional aberrations that affected extended chromosome regions and smaller focal aberrations (Supplementary Methods, section 5). A statistical analysis of regional aberrations (ref. 21; Supplementary Methods, section 5) identified eight recurrent gains and 22 losses, all of which have been reported previously (ref. 22; Fig. 1b and Supplementary Table 5.1). Five of the gains and 18 of the losses occurred in more than 50% of the tumours.
We used GISTIC (refs 21, 23; Supplementary Methods, section 5) to identify recurrent focal SCNAs. This yielded 63 regions of focal amplification (Fig. 1c; Supplementary Methods, section 5; and Supplementary Table 5.2), including 26 that encoded eight or fewer genes. The most common focal amplifications encoded CCNE1, MYC and MECOM (Fig. 1c; Supplementary Methods, section 5; and Supplementary Table 5.2), each of which was highly amplified in more than 20% of tumours. New tightly localized amplification peaks in HGS-OvCa encoded the receptor for activated C-kinase, ZMYND8; the p53 target gene IRF2BP2; the DNA-binding protein inhibitor ID4; the embryonic development gene PAX8; and the telomerase catalytic subunit, TERT. Three data sources, Ingenuity Systems (http://www.ingenuity.com/), ClinicalTrials.gov (http://clinicaltrials.gov) and DrugBank (http://www.drugbank.ca), were used to identify possible therapeutic inhibitors of amplified, overexpressed genes. From this search, we found that 22 genes that are therapeutic targets, including MECOM, MAPK1, CCNE1 and KRAS, are amplified in at least 10% of the cases (Supplementary Table 5.3).
GISTIC also identified 50 focal deletions (Fig. 1c). The known tumour suppressor genes PTEN, RB1 and NF1 were in regions of homozygous deletions in at least 2% of the tumours. Notably, RB1 and NF1 also were among the significantly mutated genes. One deletion contained only three genes, including the essential cell cycle control gene CREBBP, which has five non-synonymous and two frameshift mutations.
mRNA and miRNA expression and DNA methylation analysis
We combined expression measurements for 11,864 genes from three different platforms (Agilent, Affymetrix HuEx and Affymetrix U133A) for subtype identification and outcome prediction. Individual platform measurements suffered from limited, but statistically significant, batch effects, whereas the combined data set did not (Supplementary Methods, section 11, and Supplementary Fig. 11.1). Analysis of the combined data set identified ∼1,500 intrinsically variable genes (ref. 24; Supplementary Methods, section 6) that were used for non-negative matrix factorization consensus clustering. This analysis yielded four clusters (Fig. 2a and Supplementary Methods, section 6). The same analytic approach applied to a publicly available data set from ref. 25 also yielded four clusters. Comparison of these two sets of four clusters showed a clear correlation (Supplementary Methods, section 6, and Supplementary Fig. 6.3). We therefore conclude that at least four robust expression subtypes exist in HGS-OvCa.
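The consensus step of consensus clustering can be sketched independently of the factorization itself. The example below assumes that repeated clustering runs have already produced a cluster label for each sample (here supplied by hand as hypothetical toy data rather than by an actual non-negative matrix factorization), and it computes the fraction of runs in which each pair of samples falls in the same cluster. It is an illustration of the general idea, not the pipeline described in the Supplementary Methods.

```python
def consensus_matrix(runs, n_samples):
    """Fraction of runs in which each pair of samples shares a cluster.

    runs: list of dicts mapping sample index -> cluster label. A run may
    omit samples (for example, if it was fitted on a resampled subset).
    """
    together = [[0] * n_samples for _ in range(n_samples)]
    sampled = [[0] * n_samples for _ in range(n_samples)]
    for labels in runs:
        members = list(labels)
        for a in members:
            for b in members:
                sampled[a][b] += 1          # both samples present in this run
                if labels[a] == labels[b]:
                    together[a][b] += 1     # and assigned to the same cluster
    return [
        [together[i][j] / sampled[i][j] if sampled[i][j] else 0.0
         for j in range(n_samples)]
        for i in range(n_samples)
    ]

# Hypothetical labels from three clustering runs on four samples
runs = [
    {0: "A", 1: "A", 2: "B", 3: "B"},
    {0: "A", 1: "A", 2: "A", 3: "B"},
    {0: "B", 1: "B", 3: "A"},   # sample 2 left out of this run
]
matrix = consensus_matrix(runs, n_samples=4)
for row in matrix:
    print([round(value, 2) for value in row])
```

Entries close to 0 or 1 indicate pairs that are stably separated or stably grouped across runs. Final subtypes are typically obtained by clustering this matrix.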
We termed the four HGS-OvCa subtypes ‘immunoreactive’, ‘differentiated’, ‘proliferative’ and ‘mesenchymal’ on the basis of gene content in the clusters (Supplementary Methods, section 6) and previous observations (ref. 25). T-cell chemokine ligands CXCL11 and CXCL10 and the receptor CXCR3 characterized the immunoreactive subtype. High expression of transcription factors such as HMGA2 and SOX11, low expression of ovarian tumour markers (MUC1 and MUC16) and high expression of proliferation markers such as MCM2 and PCNA defined the proliferative subtype. The differentiated subtype was associated with high expression of MUC16 and MUC1 and with expression of the secretory fallopian tube marker SLPI, suggesting a more mature stage of development. High expression of HOX genes and markers suggestive of increased stromal components such as for myofibroblasts (FAP) and microvascular pericytes (ANGPTL2 and ANGPTL1) characterized the mesenchymal subtype.
Increased DNA methylation and reduced tumour expression implicated 168 genes as epigenetically silenced in HGS-OvCa samples compared with fallopian tube controls (ref. 26). DNA methylation was correlated with reduced gene expression across all samples (Supplementary Methods, section 7). AMT, CCL21 and SPARCL1 were noteworthy because they showed promoter hypermethylation in the vast majority of the tumours. Unexpectedly, RAB25, previously reported to be amplified and overexpressed in ovarian cancer (ref. 27), also seemed to be epigenetically silenced in a subset of tumours. The BRCA1 promoter was hypermethylated and silenced in 56 of 489 (11.5%) tumours, as previously reported (ref. 28; Supplementary Fig. 7.1). Consensus clustering of variable DNA methylation across tumours identified four subtypes (Supplementary Methods, section 7, and Supplementary Fig. 7.2) that were significantly associated with differences in age, BRCA inactivation events and survival (Supplementary Methods, section 7). However, the clusters demonstrated only modest stability.
Survival duration did not differ significantly for transcriptional subtypes in the TCGA data set. The proliferative group showed a decrease in the rate of MYC amplification and RB1 deletion, whereas the immunoreactive subtype showed an increased frequency of 3q26.2 (MECOM) amplification (Supplementary Table 6.2 and Supplementary Fig. 6.4). A moderate, but significant, overlap between the DNA methylation clusters and gene expression subtypes was noted (P < 2.2 × 10^−16, chi-squared test, adjusted Rand index of 0.07; Supplementary Methods, section 7, and Supplementary Table 7.6).
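The adjusted Rand index quoted above measures agreement between two groupings of the same samples, corrected for chance: values near 0 indicate chance-level agreement and 1 indicates identical partitions. A minimal sketch of the standard formula follows. The function name and the toy labels are illustrative and are not taken from the study.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two labelings of the same samples."""
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must cover the same samples")
    n = len(labels_a)
    if n < 2:
        raise ValueError("need at least two samples")

    # Pairs of samples that share a cluster in both, in A only, in B only
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())

    expected = sum_a * sum_b / comb(n, 2)   # agreement expected by chance
    maximum = (sum_a + sum_b) / 2
    if maximum == expected:                 # degenerate case: trivial partitions
        return 1.0
    return (sum_cells - expected) / (maximum - expected)

# Hypothetical subtype labels for six samples
methylation = ["M1", "M1", "M2", "M2", "M3", "M3"]
expression = ["E1", "E1", "E2", "E3", "E3", "E3"]
print(adjusted_rand_index(methylation, expression))
```

With several hundred samples, a chi-squared test can yield a very small P value even when the index is close to zero, because the test addresses whether any association exists, whereas the index reflects how strong the agreement is.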
A 193-gene transcriptional signature predictive of overall survival was defined using the integrated expression data set from 215 samples. After univariate Cox regression analysis, we found that 108 genes were correlated with poor survival and that 85 were correlated with good survival (P-value cut-off of 0.01; Supplementary Methods, section 6, and Supplementary Table 6.4). We validated the predictive power of this gene expression signature on an independent set of 255 TCGA samples (Fig. 2b) as well as on three independent expression data sets (refs 25, 29, 30). Each of the validation samples was assigned a prognostic gene score, reflecting the similarity between its expression profile and the prognostic gene signature (ref. 31; Supplementary Methods, section 6). Kaplan–Meier survival analysis of this signature showed statistically significant association with survival in all validation data sets (Fig. 2c and Supplementary Methods, section 6).
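One simple way to express how closely a sample resembles a two-sided signature is to compare, within that sample, the expression of the poor-prognosis genes with that of the good-prognosis genes. The sketch below uses a Welch-type t-statistic for this purpose. It is an assumption made for illustration and is not necessarily the scoring procedure of ref. 31 or the Supplementary Methods. The gene names and values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

def prognostic_score(profile, poor_genes, good_genes):
    """Welch-type t-statistic: poor-prognosis versus good-prognosis genes.

    profile: dict mapping gene symbol -> centred expression value for one
    sample. Higher scores mean the sample looks more like the
    poor-survival pattern.
    """
    poor = [profile[g] for g in poor_genes if g in profile]
    good = [profile[g] for g in good_genes if g in profile]
    if len(poor) < 2 or len(good) < 2:
        raise ValueError("need at least two measured genes in each set")
    standard_error = sqrt(variance(poor) / len(poor) + variance(good) / len(good))
    if standard_error == 0:
        raise ValueError("no variation within the gene sets")
    return (mean(poor) - mean(good)) / standard_error

# Hypothetical sample with two genes from each side of the signature
sample = {"POOR_1": 2.0, "POOR_2": 1.0, "GOOD_1": -1.0, "GOOD_2": -2.0}
score = prognostic_score(sample, ["POOR_1", "POOR_2"], ["GOOD_1", "GOOD_2"])
print(f"prognostic score: {score:.2f}")
```

Samples can then be ranked or split by score before a survival comparison between the resulting groups.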
Non-negative matrix factorization consensus clustering of miRNA expression data identified three subtypes (Supplementary Fig. 6.5). Notably, miRNA subtype 1 overlapped the mRNA proliferative subtype and miRNA subtype 2 overlapped the mRNA mesenchymal subtype (Fig. 2d). Survival duration differed significantly between miRNA subtypes: patients with miRNA subtype-1 tumours survived significantly longer (Fig. 2e).
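Survival comparisons of this kind rest on Kaplan–Meier estimates of the survival function for each group. The following sketch shows the product-limit estimator on hypothetical follow-up data, with times in months and an event flag of 1 for death and 0 for censoring. It illustrates the estimator only, not the significance test between groups.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of survival.

    times:  follow-up time for each patient
    events: 1 if the patient died at that time, 0 if censored
    Returns a list of (time, survival probability) at each death time.
    """
    records = sorted(zip(times, events))
    at_risk = len(records)
    survival = 1.0
    curve = []
    i = 0
    while i < len(records):
        t = records[i][0]
        deaths = 0
        leaving = 0
        # Group all patients whose follow-up ends at the same time
        while i < len(records) and records[i][0] == t:
            deaths += records[i][1]
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= leaving   # deaths and censored patients leave the risk set
    return curve

# Hypothetical follow-up data for one subtype
months = [6, 14, 14, 22, 30, 41]
died = [1, 1, 0, 1, 0, 1]
for t, s in kaplan_meier(months, died):
    print(f"{t:>3} months: S(t) = {s:.3f}")
```

Censored patients contribute to the number at risk up to their last follow-up but never cause a drop in the curve.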
Pathways influencing disease
Several analyses integrated data from the 316 fully analysed cases to identify biology that contributes to HGS-OvCa. Analysis of the frequency with which known cancer-associated pathways harboured one or more mutations, copy number changes or changes in gene expression showed that the RB1 and PI3K/RAS pathways were deregulated in 67% and 45% of cases, respectively (Fig. 3a and Supplementary Methods, section 8). A search for altered subnetworks in a large protein–protein interaction network (ref. 32) using HOTNET (ref. 33) identified several known pathways (Supplementary Methods, section 9) including the NOTCH signalling pathway, which was altered in 22% of HGS-OvCa samples (ref. 34; Fig. 3b).
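The pathway frequencies above count a case as deregulated if at least one gene in the pathway is altered by any mechanism. A minimal sketch of that counting rule follows. The sample identifiers and per-sample alterations are hypothetical, and the two-gene set stands in for a full pathway definition rather than reproducing the gene set of Fig. 3a.

```python
def pathway_alteration_frequency(alterations, pathway_genes):
    """Fraction of samples with at least one altered gene in the pathway.

    alterations: dict mapping sample id -> iterable of altered gene symbols
                 (mutation, copy number change or expression change).
    """
    if not alterations:
        raise ValueError("no samples supplied")
    pathway = set(pathway_genes)
    n_altered = sum(1 for genes in alterations.values() if pathway & set(genes))
    return n_altered / len(alterations)

# Hypothetical per-sample alteration calls
calls = {
    "sample_1": {"RB1", "TP53"},
    "sample_2": {"TP53"},
    "sample_3": {"CCNE1", "MYC"},
    "sample_4": set(),
}
rb_like_pathway = ["RB1", "CCNE1"]   # illustrative subset, not the full pathway
frequency = pathway_alteration_frequency(calls, rb_like_pathway)
print(f"{frequency:.0%} of samples altered")
```

Because a single hit in any member gene is enough, the pathway-level frequency can be much higher than the frequency of any individual gene.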
Published studies have shown that cells with mutated or methylated BRCA1 or mutated BRCA2 have defective homologous recombination and are highly responsive to PARP inhibitors (refs 35–38). Fig. 3c shows that 20% of our studied HGS-OvCa samples had germline or somatic mutations in BRCA1/2, that 11% lost BRCA1 expression through DNA hypermethylation and that epigenetic silencing of BRCA1 was mutually exclusive of BRCA1/2 mutations (P = 4.4 × 10^−4, Fisher’s exact test). Univariate survival analysis of BRCA1/2 status (Fig. 3c) showed better overall survival for BRCA1/2 mutated cases than BRCA1/2 wild-type cases. Notably, epigenetically silenced BRCA1 cases had survival similar to BRCA1/2 wild-type HGS-OvCa tumours (respective median overall survivals of 41.5 and 41.9 months, P = 0.69, log-rank test; Supplementary Methods, section 8, and Supplementary Fig. 8.13b). This suggests that BRCA1 is inactivated by mutually exclusive genomic and epigenomic mechanisms and that patient survival depends on the mechanism of inactivation. Genomic alterations in other homologous recombination genes that might render cells sensitive to PARP inhibitors (ref. 39) discovered in this study (Supplementary Methods, section 8, and Supplementary Fig. 8.12) include amplification or mutation of EMSY (also known as C11orf30) (8%), focal deletion or mutation of PTEN (7%), hypermethylation of RAD51C (3%), mutation of ATM or ATR (2%), and mutation of Fanconi anaemia genes (5%). Overall, homologous recombination defects may be present in approximately half of all HGS-OvCa cases, providing a rationale for clinical trials of PARP inhibitors targeting tumours with these homologous-recombination-related aberrations.
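Mutual exclusivity of two events can be tested with Fisher's exact test on the 2 × 2 table of samples carrying both, one or neither event. The sketch below computes a one-sided P value for observing as few or fewer co-occurrences than seen. Whether the test reported above was one- or two-sided is not stated in this section, so the one-sided choice is an assumption, and the counts in the example are hypothetical rather than the study's data.

```python
from math import comb

def fisher_exclusivity_p(n_both, n_a_only, n_b_only, n_neither):
    """One-sided Fisher's exact P value for mutual exclusivity.

    Returns the probability of seeing n_both or fewer samples with both
    events, given the observed totals for each event (hypergeometric model).
    """
    n_a = n_both + n_a_only
    n_b = n_both + n_b_only
    n_total = n_both + n_a_only + n_b_only + n_neither
    smallest_overlap = max(0, n_a + n_b - n_total)
    denominator = comb(n_total, n_b)
    return sum(
        comb(n_a, k) * comb(n_total - n_a, n_b - k) / denominator
        for k in range(smallest_overlap, n_both + 1)
    )

# Hypothetical counts: event A = mutation, event B = promoter methylation
p_value = fisher_exclusivity_p(n_both=0, n_a_only=60, n_b_only=35, n_neither=205)
print(f"one-sided P value for exclusivity: {p_value:.2e}")
```

A small P value indicates that the two events co-occur less often than expected if they arose independently.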
Comparison between the complete set of BRCA inactivation events and all recurrently altered copy number peaks revealed an unexpectedly low frequency of CCNE1 amplification in cases with BRCA inactivation (8% of BRCA altered cases had CCNE1 amplification whereas 26% of BRCA wild-type cases did; Q = 0.0048, adjusted for false-discovery rate). As previously reported (ref. 40), overall survival tended to be lower for patients with CCNE1 amplification than for patients in all other cases (P = 0.072, log-rank test; Supplementary Methods, section 8, and Supplementary Fig. 8.14a). However, no survival disadvantage for CCNE1-amplified cases (P = 0.24, log-rank test; Supplementary Methods, section 8, and Supplementary Fig. 8.14b) was apparent when looking only at BRCA wild-type cases, suggesting that the previously reported CCNE1 survival difference can be explained by the higher survival of BRCA-mutated cases.
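When many copy number peaks are tested against BRCA status at once, raw P values are converted to Q values that control the false-discovery rate. The text does not name the adjustment used, so the sketch below assumes the widely used Benjamini–Hochberg procedure. The input P values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted P values (Q values), in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    q_values = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so each Q value is the minimum of
    # p * m / rank over its own rank and all larger ranks (keeps Q monotone).
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        q_values[index] = running_min
    return q_values

# Hypothetical P values from testing four copy number peaks
raw = [0.01, 0.04, 0.03, 0.005]
for p, q in zip(raw, benjamini_hochberg(raw)):
    print(f"P = {p:<6} Q = {q:.3f}")
```

A Q value of 0.0048 would then mean that, among all associations called at that threshold, fewer than about 0.5% are expected to be false discoveries.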
Finally, we used a probabilistic graphical model (PARADIGM; ref. 41) to search for altered pathways in the US National Cancer Institute Pathway Interaction Database (ref. 42), and found that the FOXM1 transcription factor network (Fig. 3d) is significantly altered in 87% of cases (Supplementary Methods, section 10, and Supplementary Figs 10.1–10.3). FOXM1 and its proliferation-related target genes, AurB (AURKB), CCNB1, BIRC5, CDC25 and PLK1, were consistently overexpressed but not altered by DNA copy number changes, indicative of transcriptional regulation. TP53 represses FOXM1 after DNA damage (ref. 43), suggesting that the high rate of TP53 mutation in HGS-OvCa contributes to FOXM1 overexpression. In other data sets, the FOXM1 pathway is significantly activated in tumours relative to adjacent epithelial tissue (refs 44–46; Supplementary Methods, section 10, and Supplementary Fig. 10.4) and is associated with HGS-OvCa (ref. 22; Supplementary Methods, section 10, and Supplementary Fig. 10.5).
This TCGA study provides a large-scale integrative view of the aberrations in HGS-OvCa. Overall, the mutational spectrum was surprisingly simple. Mutations in TP53 predominated, occurring in at least 96% of HGS-OvCa samples; and BRCA1 and BRCA2 were mutated in 22% of tumours, owing to a combination of germline and somatic mutations. Seven other significantly mutated genes were identified, but only in 2–6% of HGS-OvCa samples. By contrast, HGS-OvCa demonstrates a remarkable degree of genomic disarray. The frequency of SCNAs stands in striking contrast to previous TCGA findings in glioblastoma (ref. 47), where there were more recurrently mutated genes with far fewer chromosome arm-level or focal SCNAs (Fig. 1a). A high prevalence of mutations and promoter methylation in putative DNA repair genes, including homologous recombination components, may explain the high prevalence of SCNAs. The mutation spectrum marks HGS-OvCa as completely distinct from other ovarian cancer histological subtypes. For example, clear-cell ovarian cancer tumours have few TP53 mutations but have recurrent ARID1A and PIK3CA mutations (refs 48–50); endometrioid ovarian cancer tumours have frequent CTNNB1, ARID1A and PIK3CA mutations and a lower rate of TP53 mutations (refs 49, 50); and mucinous ovarian cancer tumours have prevalent KRAS mutations (ref. 51). These differences between ovarian cancer subtypes probably reflect a combination of aetiological and lineage effects, and represent an opportunity to improve ovarian cancer outcomes through subtype-stratified care.
Identification of new therapeutic approaches is a central goal of the TCGA. The ∼50% of HGS-OvCa tumours with homologous recombination defects may benefit from PARP inhibitors. Beyond this, the commonly deregulated pathways, RB, RAS/PI3K, FOXM1 and NOTCH, provide opportunities for therapeutic treatment. Finally, inhibitors already exist for 22 genes in regions of recurrent amplification (Supplementary Methods, section 5, and Supplementary Table 5.3), warranting assessment in HGS-OvCa cases where the target genes are amplified. Overall, these discoveries set the stage for approaches to the treatment of HGS-OvCa in which aberrant genes or networks are detected and targeted with therapies selected to be effective against these specific aberrations.
All specimens were obtained from patients with appropriate consent from the relevant institutional review board. DNA and RNA were collected from samples using the Allprep kit (Qiagen). We used commercial technology for capture and sequencing of exomes from whole-genome-amplified tumour DNA and normal DNA. DNA sequences were aligned to NCBI Build 36 of the human genome; duplicate reads were excluded from mutation calling. Validation of mutations occurred on a separate whole-genome amplification of DNA from the same tumour. Significantly mutated genes were identified by comparing them with expectation models based on the exact measured rates of specific sequence lesions. CHASM (ref. 20) and MutationAssessor (Supplementary Methods, section 4) were used to identify functional mutations. GISTIC analysis of the circular-binary-segmented Agilent 1M feature copy number data was used to identify recurrent peaks, which were compared with the results from the other platforms to flag likely platform-specific artefacts. Consensus clustering approaches were used to analyse mRNA, miRNA and methylation subtypes as well as predictors of outcome using previous approaches (ref. 47). HOTNET (ref. 33) was used to identify portions of the protein–protein interaction network that have more events than are expected by chance. Networks that had a significant probability of being valid were evaluated for increased fraction of known annotations. PARADIGM (ref. 41) was used to estimate integrated pathway activity, to identify portions of the network models differentially active in HGS-OvCa.
Sequence information reported here has been submitted to dbGaP under accession number PHS000178.
We thank J. Palchik, A. Mirick and Julia Zhang for administrative coordination of TCGA activities. This work was supported by the following grants from the USA National Institutes of Health: U54HG003067, U54HG003079, U54HG003273, U24CA126543, U24CA126544, U24CA126546, U24CA126551, U24CA126554, U24CA126561, U24CA126563, U24CA143840, U24CA143882, U24CA143731, U24CA143835, U24CA143845, U24CA143858, U24CA144025, U24CA143882, U24CA143866, U24CA143867, U24CA143848, U24CA143843 and R21CA135877.