Molecular stratification of endometrioid ovarian carcinoma predicts clinical outcome

Endometrioid ovarian carcinoma (EnOC) demonstrates substantial clinical and molecular heterogeneity. Here, we report whole exome sequencing of 112 EnOC cases following rigorous pathological assessment. We detect a high frequency of mutation in CTNNB1 (43%), PIK3CA (43%), ARID1A (36%), PTEN (29%), KRAS (26%), TP53 (26%) and SOX8 (19%), a recurrently mutated gene previously unreported in EnOC. POLE and mismatch repair protein-encoding genes were mutated at lower frequency (6% and 18%, respectively) and showed significant co-occurrence. A molecular taxonomy is constructed, identifying clinically distinct EnOC subtypes. Cases with TP53 mutation demonstrate greater genomic complexity, are commonly FIGO stage III/IV at diagnosis (48%), are frequently incompletely debulked (44%) and demonstrate inferior survival. Conversely, cases with CTNNB1 mutation, which is mutually exclusive with TP53 mutation, demonstrate low genomic complexity and excellent clinical outcome, and are predominantly stage I/II at diagnosis (89%) and completely resected (87%). Moreover, we identify the WNT, MAPK/RAS and PI3K pathways as promising candidate targets for molecular therapeutics in EnOC.


1A. Immunohistochemistry for WT1
Immunohistochemistry (IHC) for Wilms' tumour 1 (WT1) was performed on the Leica BOND III Autostainer using protocol F, with a 1:1000 dilution of the mouse monoclonal anti-human WT1 antibody clone 6F-H2 (DAKO). Tumours with nuclear WT1 expression in tumour cells were recorded as WT1 positive, and those with complete absence of nuclear staining as WT1 negative. Positive nuclear staining of vascular endothelial cells served as an internal control.

1B. Immunohistochemistry for CK7 and CK20
Cytokeratin 7 (CK7) staining was performed using a 1:100 dilution of the mouse monoclonal anti-human CK7 antibody clone RN7 (Leica). A WT1-positive high-grade serous ovarian carcinoma tissue section was used as a positive control. Nuclear staining in tumour cells was considered CK7 positive.
Cytokeratin 20 (CK20) staining was performed using a 1:50 dilution of the mouse monoclonal anti-human CK20 antibody clone KS20.8 (Leica). Normal stomach tissue was used as a positive control. Nuclear staining in tumour cells was considered CK20 positive.

1C. Immunohistochemistry for p53
IHC for tumour protein p53 (p53) was performed on the Leica BOND III Autostainer using protocol F, with a 1:50 dilution of the mouse monoclonal anti-human p53 antibody clone DO-7 (DAKO). p53 staining was recorded as aberrant (diffuse nuclear overexpression, or a null pattern with complete absence of nuclear staining) or wild-type (variable nuclear expression). Stromal cells served as an internal control.
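The binary p53 call can be written as a small decision rule. This is a minimal sketch: the enum and function names are hypothetical, and treating a null pattern with an unstained stromal control as uninterpretable is our assumption about how the internal control would be used, not something the text states.

```python
from enum import Enum


class P53Pattern(Enum):
    """Hypothetical labels for the p53 staining patterns named in the text."""
    DIFFUSE_OVEREXPRESSION = "diffuse nuclear overexpression"
    NULL = "complete absence of nuclear staining"
    VARIABLE = "variable nuclear expression"


def score_p53(pattern: P53Pattern, stromal_control_stained: bool) -> str:
    """Collapse a p53 pattern into the binary aberrant / wild-type call.

    Assumption (not stated in the source): a null pattern is only read as
    aberrant when the stromal internal control stained; otherwise the
    slide is reported as uninterpretable.
    """
    if pattern is P53Pattern.NULL and not stromal_control_stained:
        return "uninterpretable"
    if pattern in (P53Pattern.DIFFUSE_OVEREXPRESSION, P53Pattern.NULL):
        return "aberrant"
    return "wild-type"


print(score_p53(P53Pattern.NULL, stromal_control_stained=True))      # aberrant
print(score_p53(P53Pattern.VARIABLE, stromal_control_stained=True))  # wild-type
```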

1D. Immunohistochemistry for β-catenin
β-catenin IHC was performed on a tissue microarray constructed from 0.8 mm cores taken from EnOC tumour regions. IHC used a 1:100 dilution of the mouse monoclonal anti-human β-catenin antibody (Agilent, M353901-2) on the Leica BOND III Autostainer. Normal tonsil tissue was used as an external control. β-catenin staining was recorded as aberrant (abnormal nuclear accumulation in tumour cells) or wild-type (membranous staining only). Stromal cells served as an internal control.

2A. Generation of sequence libraries and exome sequencing
Libraries were prepared from tumour DNA using the Illumina TruSeq Exome Library Prep kit (Illumina, #FC-150-1002) according to the manufacturer's protocol, with modifications for formalin-fixed paraffin-embedded (FFPE) material.
A total of 200 ng of DNA was end-repaired to convert 3' and 5' overhangs into blunt ends, and fragment length was optimised using sample purification beads. A single 'A' nucleotide was added to the 3' ends of the blunt fragments to prevent them from ligating to one another during adapter ligation; a corresponding single 'T' nucleotide on the 3' end of the adapter provided a complementary overhang for ligating the adapter to the fragment. Indexing adapters were then ligated to the ends of the DNA fragments to prepare them for hybridisation onto a flow cell, and 12 cycles of polymerase chain reaction (PCR) were used to selectively enrich adapter-bound DNA fragments and increase library yield. Libraries were quantified using the Qubit 2.0 Fluorometer and the Qubit DNA High Sensitivity (HS) assay (ThermoFisher, #Q32854), and the fragment size distribution was assessed using the Agilent Bioanalyser with the DNA HS Kit (Agilent, #5067-4626).
DNA libraries with unique indexes were combined into pools of six, and target regions were hybridised to biotinylated capture probes. Streptavidin magnetic beads were then used to capture the probes hybridised to the targeted regions, and a series of washes removed non-specific binding from the beads. This hybridisation and capture step was repeated to increase the specificity of the captured regions. The capture-enriched library was then purified, amplified with 8 cycles of PCR and purified again to remove unwanted products.
Exome-captured library pools were quantified using the Qubit 2.0 Fluorometer and the Qubit DNA HS assay (ThermoFisher, #Q32854), and the fragment size distribution was assessed using the Agilent Bioanalyser with the DNA HS Kit (Agilent, #5067-4626). Fragment size and quantity measurements were used to calculate the molarity of each library pool.
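The molarity conversion itself is not spelled out above. A common formula is molarity (nM) = concentration (ng/µL) × 10^6 / (660 × mean fragment length in bp), which assumes an average mass of 660 g/mol per double-stranded base pair; that constant is a widely used approximation rather than a value given in this document. A minimal sketch:

```python
# Assumed average mass of one double-stranded DNA base pair (g/mol);
# a common approximation, not a value stated in the methods.
AVG_MASS_PER_BP = 660.0


def library_molarity_nM(conc_ng_per_ul: float, mean_fragment_bp: float) -> float:
    """Convert a Qubit concentration and Bioanalyser mean fragment size to nM."""
    if conc_ng_per_ul < 0 or mean_fragment_bp <= 0:
        raise ValueError("concentration must be >= 0 and fragment size > 0")
    # ng/uL -> g/L is a factor of 1e-3 and mol/L -> nM is 1e9: net factor 1e6
    return conc_ng_per_ul * 1e6 / (AVG_MASS_PER_BP * mean_fragment_bp)


print(round(library_molarity_nM(6.6, 400), 2))  # 25.0
```

Because the mean size comes from the Bioanalyser trace, a broad or skewed size distribution makes this an estimate; qPCR-based quantification is sometimes used instead when a more exact loading concentration is needed.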

2B. Mapping of sequenced reads
Base-call quality was assessed using FastQC. Data were processed with the bcbio-nextgen Python toolkit for fully automated high-throughput sequencing analysis (see https://github.com/bcbio/bcbio-nextgen for full documentation and informatic pipelines). Raw sequence reads were mapped to the hg38 genome build using the Burrows-Wheeler Aligner (BWA) version 0.7.17 [1].

2C. Variant calling
Variant calling was carried out on mapped BAM files using a majority-vote approach across three variant-calling algorithms: VarDict [2], Mutect2 [3] and FreeBayes [4]. Filtering for FFPE and oxidation artefacts was applied using the Genome Analysis Toolkit (GATK) tools CollectSequencingArtifactMetrics and FilterByOrientationBias. The resulting variant call format (VCF) files were analysed in R using the maftools package [5]. Datasets were filtered to remove common population variants using the 1000 Genomes […]. Microsatellite instability (MSI) scores were assigned based on the number of InDels detected in each sample. Transitions and transversions were quantified using the titv function in maftools.
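The majority-vote step and the transition/transversion split can be illustrated as follows. This is a minimal sketch, assuming each caller's output has already been normalised to (chromosome, position, ref, alt) tuples so that the same variant compares equal across callers; it is not the bcbio-nextgen ensemble code or the maftools `titv` implementation, and the toy coordinates are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Set, Tuple

# (chromosome, position, ref allele, alt allele); assumes callers' outputs were
# normalised (e.g. left-aligned) so the same variant yields an identical tuple
Variant = Tuple[str, int, str, str]


def majority_vote(calls: Dict[str, Iterable[Variant]], min_support: int = 2) -> Set[Variant]:
    """Return variants reported by at least `min_support` of the callers."""
    support: Counter = Counter()
    for variants in calls.values():
        support.update(set(variants))  # each caller votes at most once per variant
    return {v for v, n in support.items() if n >= min_support}


PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
BASES = PURINES | PYRIMIDINES


def ti_tv_counts(variants: Iterable[Variant]) -> Tuple[int, int]:
    """Count transitions (purine<->purine, pyrimidine<->pyrimidine) and transversions among SNVs."""
    ti = tv = 0
    for _chrom, _pos, ref, alt in variants:
        if ref not in BASES or alt not in BASES or ref == alt:
            continue  # skip InDels, multi-base or ambiguous alleles
        if (ref in PURINES) == (alt in PURINES):
            ti += 1
        else:
            tv += 1
    return ti, tv


# Toy calls with hypothetical coordinates, one list per caller
calls = {
    "VarDict": [("chr1", 1000, "C", "T"), ("chr2", 2000, "G", "T")],
    "Mutect2": [("chr1", 1000, "C", "T"), ("chr3", 3000, "C", "A")],
    "FreeBayes": [("chr1", 1000, "C", "T"), ("chr3", 3000, "C", "A"), ("chr4", 4000, "G", "GA")],
}
consensus = majority_vote(calls)
print(sorted(consensus))        # [('chr1', 1000, 'C', 'T'), ('chr3', 3000, 'C', 'A')]
print(ti_tv_counts(consensus))  # (1, 1)
```

With three callers, `min_support=2` is a simple majority; raising it to 3 keeps only unanimous calls, trading sensitivity for specificity.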

SECTION 3: ONCOGENIC PATHWAY ANALYSIS
Pathway analysis was performed using the OncogenicPathways function in the R package maftools.
This analysis highlighted the PI3K-AKT, WNT, RAS and NOTCH pathways as the most commonly altered networks (Figure S3).
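A pathway-level summary of this kind asks, for each pathway, what fraction of samples carry a mutation in at least one member gene. The sketch below computes that fraction; the three gene sets are deliberately small, hypothetical subsets chosen for illustration, not the curated pathway definitions bundled with maftools, and the sample data are invented.

```python
from typing import Dict, Set

# Hypothetical, deliberately small gene subsets for illustration only
PATHWAYS: Dict[str, Set[str]] = {
    "WNT": {"CTNNB1", "APC", "AXIN1"},
    "PI3K": {"PIK3CA", "PTEN", "PIK3R1"},
    "RTK-RAS": {"KRAS", "NRAS", "BRAF"},
}


def fraction_samples_affected(sample_genes: Dict[str, Set[str]]) -> Dict[str, float]:
    """For each pathway, the fraction of samples with at least one mutated member gene."""
    n = len(sample_genes)
    if n == 0:
        raise ValueError("no samples supplied")
    return {
        pathway: sum(1 for mutated in sample_genes.values() if mutated & genes) / n
        for pathway, genes in PATHWAYS.items()
    }


# Invented mutated-gene sets per sample
samples = {
    "S1": {"CTNNB1", "PIK3CA"},
    "S2": {"TP53", "KRAS"},
    "S3": {"PTEN", "ARID1A"},
    "S4": {"CTNNB1"},
}
print(fraction_samples_affected(samples))
# {'WNT': 0.5, 'PI3K': 0.5, 'RTK-RAS': 0.25}
```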

SECTION 4: TUMOUR COMPLEXITY SCORING
Variant allele frequency (VAF) densities across all genes were plotted for each sample to assess tumour genomic complexity. Low-complexity specimens, with a single driver event and associated outgrowth, were expected to display a single VAF peak. Conversely, highly complex tumours, with multiple driver events, branched evolution and cell population expansion, were expected to show multiple VAF peaks.
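Peak counting is described qualitatively above. One way to make it reproducible is to estimate a Gaussian kernel density of each sample's VAFs and count its local maxima. The bandwidth, grid size and minimum relative peak height below are hypothetical tuning choices, and the VAF values are toy data.

```python
import math
from typing import List, Sequence


def kde(vafs: Sequence[float], grid: Sequence[float], bandwidth: float) -> List[float]:
    """Gaussian kernel density estimate of the VAFs, evaluated on `grid`."""
    norm = 1.0 / (len(vafs) * bandwidth * math.sqrt(2.0 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in vafs)
        for x in grid
    ]


def count_vaf_peaks(vafs: Sequence[float], bandwidth: float = 0.05,
                    n_grid: int = 201, min_rel_height: float = 0.1) -> int:
    """Count local maxima of the VAF density on [0, 1] above a relative height cut-off."""
    grid = [i / (n_grid - 1) for i in range(n_grid)]
    dens = kde(vafs, grid, bandwidth)
    cutoff = min_rel_height * max(dens)
    return sum(
        1 for i in range(1, n_grid - 1)
        if dens[i - 1] < dens[i] >= dens[i + 1] and dens[i] >= cutoff
    )


clonal = [0.38, 0.39, 0.40, 0.41, 0.42, 0.43, 0.44, 0.45]
subclonal = [0.09, 0.10, 0.11, 0.12, 0.13]
print(count_vaf_peaks(clonal))              # 1
print(count_vaf_peaks(clonal + subclonal))  # 2
```

The bandwidth controls the trade-off: too small and sampling noise produces spurious peaks, too large and genuine subclonal peaks merge into the clonal one.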
Analysis was carried out using the inferHeterogeneity function in the R package maftools [5,9]. The resulting mutant-allele tumour heterogeneity (MATH) scores represent the width of the VAF distribution.

Figure S1. Oncoplot of the 100 most frequently mutated genes from whole exome analysis.

Figure S2. Unsupervised clustering of endometrioid ovarian carcinomas by patterns of mutation, annotated to highlight cases with a concurrent endometrial cancer diagnosis or serous-like morphological features. Pearson product-moment correlation scores between samples were calculated using binary matrices representing the mutation status of the most frequently mutated genes (1 = mutant, 0 = wild-type), yielding a matrix of quantified genomic correlation. These data were subjected to hierarchical clustering using Euclidean distance and Ward's linkage.
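The MATH score and the sample-by-sample correlation used for Figure S2 can both be sketched with the standard library. The MATH formula below follows the usual definition (100 × median absolute deviation of the VAFs, scaled by the 1.4826 normal-consistency factor, divided by their median); it is a sketch and may differ in detail from the maftools implementation. The gene order and sample profiles are hypothetical.

```python
import math
import statistics
from typing import Sequence


def math_score(vafs: Sequence[float]) -> float:
    """Mutant-allele tumour heterogeneity: 100 * scaled MAD / median of the VAFs."""
    med = statistics.median(vafs)
    mad = 1.4826 * statistics.median(abs(v - med) for v in vafs)
    return 100.0 * mad / med


def pearson(x: Sequence[int], y: Sequence[int]) -> float:
    """Pearson product-moment correlation of two equal-length 0/1 mutation vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return float("nan")  # undefined when a profile is all-mutant or all-wild-type
    return cov / math.sqrt(vx * vy)


print(round(math_score([0.2, 0.3, 0.4, 0.5, 0.6]), 3))  # 37.065

# Hypothetical binary profiles over genes (CTNNB1, PIK3CA, TP53, KRAS)
profiles = {"S1": [1, 1, 0, 0], "S2": [1, 1, 0, 1], "S3": [0, 0, 1, 1]}
names = list(profiles)
corr = [[pearson(profiles[a], profiles[b]) for b in names] for a in names]
print(corr[0][2])  # -1.0
```

The rows of such a correlation matrix could then be clustered with Euclidean distance and Ward's linkage, for example with `scipy.cluster.hierarchy.linkage(corr, method="ward")`, which uses Euclidean distances between the row vectors.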