Non-coding somatic mutations converge on the PAX8 pathway in ovarian cancer

The functional consequences of somatic non-coding mutations in ovarian cancer (OC) are unknown. To identify regulatory elements (RE) and genes perturbed by acquired non-coding variants, here we establish epigenomic and transcriptomic landscapes of primary OCs using H3K27ac ChIP-seq and RNA-seq, and then integrate these with whole genome sequencing data from 232 OCs. We identify 25 frequently mutated regulatory elements, including an enhancer at 6p22.1 which associates with differential expression of ZSCAN16 (P = 6.6 × 10-4) and ZSCAN12 (P = 0.02). CRISPR/Cas9 knockout of this enhancer induces downregulation of both genes. Globally, there is an enrichment of single nucleotide variants in active binding sites for TEAD4 (P = 6 × 10-11) and its binding partner PAX8 (P = 2×10-10), a known lineage-specific transcription factor in OC. In addition, the collection of cis REs associated with PAX8 comprise the most frequently mutated set of enhancers in OC (P = 0.003). These data indicate that non-coding somatic mutations disrupt the PAX8 transcriptional network during OC development. The role of non-coding somatic mutations in ovarian cancer is unclear. Here, the authors integrate genomic and epigenomic data from patient samples to show that these mutations frequently converge on the PAX8 transcriptional network.

E pithelial ovarian cancer (OC) is a heterogeneous disease, comprising several different histological subtypes that differ in their underlying genetics, epidemiologic risk factors, clinical characteristics, and cellular origins [1][2][3] . The most common subtype is high-grade serous OC (HGSOC), which accounts for around two-thirds of all invasive OCs, and likely arises from fallopian tube secretory epithelium 4 . Other subtypes of invasive OC are rarer and include clear cell and endometrioid OCs (CCOC and EnOC), which are strongly associated with the benign precursor lesion endometriosis 5 , and mucinous OC (MOC), which may derive from appendiceal tissues or primordial germ cells 6 .
Molecular analyses of primary OCs have so far focused on characterizing somatic genomic variation in protein-coding genes, which have implicated distinct genetic alterations and biological pathways in the development of each histotype. TP53 mutations are ubiquitous in primary HGSOCs 7 . Loss-of-function mutations in DNA double-strand break repair genes (e.g. BRCA1, BRCA2), which confer high-penetrance genetic susceptibility to OC, are also relatively common 8 and predicate a genomic instability phenotype that results in an accumulation of gross structural genomic changes as tumors develop 9 . CCOCs and EnOCs often harbor somatic pathogenic mutations in ARID1A, a member of the SWI/SNF family of chromatin remodelers 10 . TERT promoter mutations are specific to CCOCs 11 , and coding mutations and promoter methylation in DNA mismatch repair genes are relatively common in the EnOC subtype 12 . Alterations in the MAPK pathway are present in~70% of MOCs, with KRAS hotspot mutations (amino acid 12 or 13) the most frequent genetic change 13 .
There is currently a lack of understanding of the role of noncoding mutations in cancer pathogenesis. Whole genomesequencing (WGS) studies shows that~96% of all somatic mutations identified in primary tumors lie in non-protein coding DNA regions. A proportion of non-coding somatic mutations likely represent functional drivers of cancer disease development [14][15][16] , mediating their effects by modifying the function of regulatory elements (REs) that modify the expression of target genes that contribute to neoplastic development. The architecture of gene regulation is highly tissue and cell-type specific 17 . Somatic mutations within disease-specific REs are therefore expected to affect gene expression in a disease-specific manner.
The goals of the current study are: (1) To characterize the histotype-specific regulatory landscapes of the different OC histotypes and (2) using WGS data from primary ovarian tumors, to identify frequently mutated REs (FMREs) that may play a role in disease pathogenesis (Fig. 1a).

Results
Epigenomic and transcriptomic landscapes of OCs. We characterized the histotype-specific landscapes of active chromatin in primary ovarian tumors using chromatin immunoprecipitationsequencing (ChIP-seq) for acetylated lysine 27 of histone H3 protein (H3K27ac). H3K27ac ChIP-seq was performed in 20 primary tumors, 5 each for the different histotypes, HGSOC, CCOC, EnOC, and MOC (Supplementary Table 1). We identified a union set of 295,243 non-overlapping ChIP-seq peaks across all tumors, comprising 11.6% of the genome (Fig. 1b and Supplementary Fig. 1). The number of peaks identified plateaued at 16 tumors, suggesting we have identified the majority of active REs in OCs. The 12,954 peaks (4.4%) shared across all tumor types were enriched at gene promoters (odds ratio = 14, P < 0.001, Fisher's exact test, when compared genome-wide) (Fig. 1c). For each tumor type, we also identified a set of histotype-specific H3K27ac peaks: 6,583 in HGSOCs, 5,401 in CCOCs, 2,134 in MOCs, and 20 in EnOCs. These peaks fall predominantly in enhancers (odds ratio = 40, P < 0.001, Fisher's exact test, histotype-specific H3K27ac peaks versus common H3K27ac peaks in OCs) (Fig. 1c).
We performed RNA sequencing (RNA-seq) in 19 of these tumors. Consistent with H3K27ac ChIP-seq data, we identified histotype-specific patterns of gene expression for each tumor type. There were 1,214 differentially expressed genes (DEGs) specific to CCOC, 519 DEGs for HGSOC, 371 DEGs for MOC, and 16 DEGs for EnOC ( Supplementary Fig. 2). Patterns of chromatin activity were consistent with patterns of gene expression; genes flanking histotype-specific peaks of active chromatin were consistently expressed at higher levels in tumors of the same histotype. Conversely, lower H3K27ac signal was associated with lower expression of nearby genes (Fig. 1d). As an example, we observed elevated H3K27ac ChIP-seq signal in the promoter of WFDC2 associated with higher expression of the WFDC2 gene in HGSOC and EnOC samples compared to CCOCs and MOCs (Spearman's rho = 0.78, P = 5.5 × 10 −5 ) ( Fig. 1e-g). WFDC2 encodes for human epididymis protein 4 (HE4), a biomarker overexpressed in serous and endometrioid OCs 18 . HE4 is used clinically to monitor OC recurrence. Taken together, these data indicate that histotype-specific enhancers regulate gene expression in cis in a histotype-type specific manner.
Predicting REs and target genes interactions. We integrated H3K27ac ChIP-seq and RNA-seq data to map REs, including typical enhancers and promoters, to putative target genes. We calculated all correlations between gene expression and RE activity to identify all gene-RE pairs within topologically associating domains (TADs) (Spearman's rho > 0.4, P < 0.05, distance <500 kbp) (Fig. 1h). We use the term 'CREAG' (collection of 'cis-Regulatory Elements Associated with a Gene'), to define the collection of predicted enhancers for a gene of interest (a CREAG is conceptually similar to the previously described gene 'plexi' 19 ). This defined a catalog of 15,380 RE-gene associations between 6,197 genes and 11,371 REs across all histotypes, with a median of 2.5 enhancers per CREAG (Fig. 1i) and 1.4 genes per enhancer (Fig. 1j). We compared enhancer-gene associations with the GeneHancer database to infer target genes for 285,000 human enhancers 20 . Forty-four percent of our predicted enhancer-gene associations were annotated to the same gene in GeneHancer, seven times more than expected by chance (P < 0.001). We then mapped all histotype-specific REs to their putative target gene(s) (Supplementary Data 1) and performed pathway enrichment analysis to identify biological mechanisms associated with each histotype (Fig. 1k and Supplementary Fig. 3). Genes associated with histotype-specific REs function in pathways known to be involved the development of each histotype. RE-associated genes for CCOC (e.g. AKT2, ITGA5, LAMC1, MET, PPP2R3A, and SGK1) are involved in PI3K-Akt signaling (P = 0.0017) 3,21,22 ; and active REs specific to MOCs are associated with genes (e.g. GCNT3, MUC12, GALNT5 and B3GNT5) involved in O-glycan processing (P = 0.0001), likely reflecting upregulated mucin production and O-glycosylation activity in this histotype 23 . Pathways common across histotypes were associated with cell proliferation and mitosis ( Supplementary Fig. 3).
Super-enhancer landscapes in OCs. Large enhancer domains, termed super-enhancers (SEs) or stretch enhancers 24,25 , typically regulate genes critical to cell identity. We asked whether histotype-specific SEs were present, and whether these were associated with genes that may contribute to phenotypic heterogeneity in OCs. We characterized between 653 and 1,945 SEs per tumor (mean = 1245, sd = 321) (Fig. 2a, Supplementary Table 2). By assigning SEs to the highest expressed overlapping gene we identified a total of 5,338 SE-associated genes, of which 1,123 were common to all OC histotypes. PAX8, a transcription factor and known essential gene in OC 26 , was associated with SEs in 17 (out of 20) tumors, with the lowest signal in MOCs ( Fig. 2b and Supplementary Fig. 4). MUC16, which encodes CA125, another clinical biomarker used to diagnose and monitor OC, coincided with an SE in 13/20 tumors; again the lowest enhancer activity was in MOCs, which are known to have the lowest CA125   Supplementary Fig. 3). LINC00963 has been implicated in prostate cancer progression 28 but has not been previously associated with OC. A total of 2,094 SE-associated genes were histotype-specific: 634 for CCOC, 617 for HGSOC, 565 for MOC, and 278 for EnOC (Fig. 2c). As expected, histotype-specific SE-associated genes were more highly expressed in the histotype of interest ( Supplementary  Fig. 4); for example, SE-associated genes specific to CCOC had a median normalized gene expression above zero, but less than zero for all other histotypes. The known function of many of the SEassociated genes we identified support a histotype-specific role in OC ( Fig. 2d-f); for example the CCOC-specific gene protein phosphatase 1 regulatory subunit 3B (PPP1R3B) regulates glycogen synthesis, consistent with the observations that clear cell tumors contain large deposits of cytoplasmic glycogen. PPP1R3B is expressed more highly in CCOC cell lines than in HGSOC and normal FTSEC lines (Fig. 2g). In vitro knockdown of PPP1R3B in two CCOC models (JHOC5 and RMG-II cell lines) results in a significant decrease in glycogen content (P = 0.05, Fig. 2h and i). SE-associated genes common to all OCs are involved in pathways, such as TNFA signaling via NF-kB and response to growth factor (P = 2.7 × 10 −25 and 1.3 × 10 −14 , respectively ( Supplementary Fig. 4).
Somatically mutated REs in OCs. We collated whole genome sequencing (WGS) data from 232 primary OCs which included 169 HGSOCs, 28 CCOCs, and 35 EnOCs 3 to identify the somatic non-coding mutations occurring in active enhancers and promoters in OCs. In total, there were 1.7 million single nucleotide variants (SNVs) across all 232 tumors, with an average of 7,163 SNVs per tumor (range 480-40,576) ( Supplementary Fig. 5). Of these, 1.6 million (98.8%) SNVs lay in non-coding DNA regions ( Supplementary Fig. 5). Approximately 9.3% percent of SNVs are located in active REs. Fourteen percent of these are in active promoters and 86% in active enhancers. We looked for FMREs harboring SNVs at a frequency greater than expected by chance based on the average distribution of mutations throughout all H3K27ac positive regions (see "Methods" section). We identified 25 FMREs across all histotypes at a false discovery rate (FDR) of 0.25 ( Fig. 3a-c). Eight FMREs were unique to HGSOC, 17 were unique to endometriosis-associated OCs (CCOC and EnOC) (Supplementary Table 3). FMREs included both promoters and enhancers. To evaluate the functional consequences of FMREs we used our gene-RE maps to quantify global patterns of differential gene expression associated with somatic SNVs in promoters and enhancers. For 89 HGSOCs both WGS and RNA-seq data were available, which enabled us to quantify differential gene expression associated with RE mutation. We found that overall, genes were significantly more likely to be overexpressed in samples with RE mutations compared to wild-type samples, i.e., samples without somatic mutations in the RE of interest. For the 2,893 REs harboring at least 1 somatic SNV, 89 associated genes were significantly overexpressed (FC > 2) and 46 genes were significantly downregulated (FC < 0.5) (P-value < 0.05, binomial distribution), indicating that SNVs in enhancers are around twice as likely to activate rather than repress gene expression (Fig. 3d). At chromosome 10p15, we identified a cluster of nine somatic SNVs all located within the KLF6 promoter, within a common SE in primary OCs (Fig. 3e). Two somatic SNVs coincided with a binding site for PAX8 in the promoter, defined by H3K27ac ChIP-seq in OC cell lines 29 . SE activity is associated with KLF6 expression in all OC histotypes, but the SE is only mutated in HGSOC (P = 8.2 × 10 −8 ). KLF6 is a Krüppel-like transcription factor with tumor suppressor functions, and is associated with chemoresponse and prognosis in OC patients 30,31 .
At the 6p22.1 locus, we identified a cluster of seven SNVs in an enhancer located~9 kb centromeric to the HIST1 gene cluster (Fig. 3f). The activity of this putative enhancer correlates strongly with the expression of ZSCAN16 (Spearman's rho = 0.69, P = 6.6 × 10 −4 ) and ZSCAN12 (Spearman's rho = 0.47, P = 0.02) (Fig. 3g). We used CRISPR/Cas9 to knock out 635 bp of this enhancer in two HGSOC cell lines (UWB1.289 and SHIN3) ( Supplementary Fig. 6). Enhancer knockout induced downregulation of both ZSCAN16 and ZSCAN12, and ZKSCAN3 and HIST1H2AI, which are previously predicted targets of this enhancer 32 (Fig. 3h, i and Supplementary Fig. 6). Crucially, there was no change in expression of ZSCAN31, a gene not predicted by any method to be targeted by this enhancer. ZSCAN16, ZSCAN12, and ZKSCAN3 are TFs containing SCAN domains that mediate protein-protein interactions; little is known about their function and they have not previously been implicated in OC.
Gene-centric analysis of mutated REs. We tested the aggregate of somatic mutations in 6,197 CREAGs identified in primary ovarian Fig. 1 Epigenomic profiling in 20 epithelial OCs reveals histotype-specific REs. a Study overview-leveraging landscapes of active chromatin in ovarian cancer to identify frequently mutated regulatory elements. b Number of peaks and genome coverage as a function of number of samples. c Heatmap showing the normalized H3K27ac ChIP-seq signal for the 20 OC samples (columns) at the active REs (rows). d Gene expression averaged by OC histotype (CCOC, red; EnOC, blue; HGSOC, green; MOC, purple) around histotype-specific REs. e-g WFDC2 locus; e H3K27ac ChIP-seq signal in the promoter region (chr20:44,095,981-44,101,060) and f gene expression in different histotypes of OC. g Normalized H3K27ac ChIP-seq signal versus WFDC2 gene expression. h Diagram of the enhancer-gene association strategy that computes the Spearman's correlation between enhancer activity (normalized H3K27ac ChIP-seq signal) and gene expression (normalized RNA-seq) (r ij ) between all enhancers and all genes within the same topologically associating domain (TAD). A putative enhancer-gene association is established if the correlation (r ij ) is significant (r ij > 0.4 and P-value < 0.05). i Histogram of number of associated genes per RE and j number of associated REs per gene. k Pathway enrichment analysis of genes associated with histotype-specific REs.
Based on the normalized mutational burden (P-value) and the frequency with which somatic SNVs occur in any of the REs that contribute to a CREAG, the CD47 CREAG (containing five REs) is the most significantly mutated in OC and contains 22 mutations in 21 patients (P = 6.46 × 10 −6 , Fig. 4c). Overall, the PAX8 CREAG was the most frequently mutated in HGSOC; 36/169 tumors (21%) harbored a somatic mutation in at least one of three REs in this CREAG (P = 0.003) (Fig. 4f). SNVs in the PAX8 CREAG are scattered throughout PAX8 SE, a 82 kb region overlapping the PAX8 gene locus (Fig. 4g). The PAX8 CREAG is mutated in other cancer types, but this only reached statistical significance in melanoma, kidney, and OCs (Fig. 4h). Kidney and OCs have high levels of PAX8 expression and cell viability dependency shown by CRISPR screens in cancer cell lines 40 , while skin cancer does not show a clear relationship with PAX8 dysregulation. Recent studies have shown that skin cancer has abundant number of mutations in non-coding regions and TF-binding sites, but estimate that a large number of those are random events 41 . Combining these data, we find that 90% of OCs harbor an alteration in the PAX8 pathway, either by somatic amplification (23% of cases) or deletion (13%) of the PAX8 locus, mutation of enhancers upstream of PAX8 (25%), somatic mutation in PAX8-binding site (61%) or TEAD4-binding site mutation (61%) (Fig. 4i). PAX8 target genes were among the differentially expressed genes associated with CREAG mutational state ( Supplementary Fig. 7a) including downregulated expression of HOXA10, a gene implicated in the development of endometrioid but not high-grade serous tumors 42 and upregulated expression of TMPRSS3, a gene associated with tumorigenic phenotypes in in vitro models of OC 43 .

Discussion
Different cancers have been defined by the spectrum of proteincoding mutations and target genes that drive disease pathogenesis; but little is known about the functional role of non-coding somatic mutations in cancer development which likely drive the underlying mechanisms of gene regulation through epigenomic perturbation. This study describes the histotype-specific architecture of gene regulation in OC based on H3K27ac ChIP-seq analysis of primary tumors representing the four major subtypes of invasive disease. As anticipated, different histotypes of OCs share common epigenomic features, which likely reflects a shared embryologic lineage; but we show conclusively that each subtype also has a unique signature of active enhancers that underpins the histotype-specific patterns of gene expression. Endometrioid OCs (EnOCs) represent an exception in that H3K27ac ChIP-seq analysis only identified a small number of histotype-specific REs. When we focused on the histotype-defining enhancers, two of our EnOCs resembled HGSOC, while the other three resembled CCOC. This is consistent with clinical and biological traits of these tumors; both EnOCs and CCOCs are associated with endometriosis; but late-stage EnOCs can share somatic features with high-grade serous ovarian OCs. This may partly explain the lack of specificity in defining the regulatory landscape of this histotype.
Histotype-specific REs were strongly enriched for putative enhancers. Enhancer depletion was more common than enhancer gain, consistent with previous reports that loss of activity drives cell-type-specific identity 44 . A more in-depth analysis of enhancer activity identified around 1,100 'Müllerian' SEs that are common across all OC histotypes. SEs mark genes associated with cell lineage and cell state, in normal and cancer tissues alike 45 . We identified SEs coinciding with genes that encode established biomarkers in OC, highlighting the disease-specific nature of these findings. This included SEs associated with mucin 16 (MUC16) that encodes CA125, and PAX8. MUC16/CA125 is a serum marker that is used clinically to aid in the diagnosis of OC and monitor disease progression. PAX8 is a lineage-specific transcription factor that is highly expressed in fallopian tube epithelia, a precursor of HGSOC and is commonly amplified in HGSOCs, emphasizing its essentiality in OC development 8,26 . The PAX8 SE was detected in all OC histotypes and was most active in HGSOC and EnOC. A SE proximal to PPP1R3B was unique to CCOC. PPP1R3B was most highly expressed in CCOC compared to other histotypes and knockout of PPP1R3B in CCOC cell lines indicates that these analyses have identified functionally relevant biomarkers associated with disease development. We found other histotype-specific SEs which likely regulate genes important for establishing the defining features of each OC histotype that will warrant functional validation in the future. Overall, the data indicate a role for SE landscapes as underlying drivers of histotype-specific OC pathogenesis.
Integrating mutation data from WGS of 232 primary ovarian tumors with epigenomics landscapes, we identified somatic mutations that fall into REs that are candidate non-coding functional targets of these mutations. Ovarian tumors contain several thousand somatic mutations, the vast majority of which (98.7%) lie in the non-protein coding DNA regions; but only a proportion of these are likely to have a functional impact on disease development. There are several hypotheses for the functional mechanisms of non-coding mutations in disease etiology: (1) that somatic mutations perturb specific REs, including active enhancers and SEs, to affect in cis the expression of gene(s) that are critical in disease biology; (2) that the constellation of somatic mutations across multiple REs targeting the same gene have equivalent functional effects; and (3) somatic mutations contribute to disease development by impacting either the expression Fig. 2 Histotype-specific super-enhancers. a Number of super-enhancers as a function of number of samples. b PAX8 and MUC16 loci show common OC SEs. c UpsetR plot showing the size of the all the subsets of super-enhancer associated genes by whether they are present in CCOC, EnOC, HGSOC, and MOC. d Plots of all enhancers ranked by enhancer signal for four representative OC samples, one for each histotype, showing the top 10 SEAGs, and the rank of PAX8 SE, MUC16 SE, and EPCAM SE for each sample. e and f Examples of histotype-specific SEAGs. Gene tracks e and gene expression f that show one example of histotype-specific SEAGs for each histotype (PPP1R3B, DLG5, PBX1, and TFF3 for CCOC, EnOC, HGSOC, and MOC, respectively). g PPP1R3B expression in CCOC, HGSOC, and FTSEC cell lines. Relative PPP1R3B expression h and glycogen level i in JHOC5 and RMG-II cell lines before and after PPP1R3B knockdown, error bars indicate one standard deviation of the mean values from three independent experiments (performed with technical replicates).
or activity of transcription factors (e.g. by altering TF-binding sites). Our analyses found several non-coding REs that contained clusters of somatic mutations (FMREs). FMREs showed histotype-specificity, consistent with the histotype specificity we observed from ChIP-seq and RNA-seq analysis, and were associated with genes known to be biologically important in OC development. The burden of somatic mutations in REs were also predicted to deregulate pathways that are critical in OC biology. Thus, taken together, our studies support the hypothesis that non-coding somatic mutations within tissue-specific REs perturb    AKT2  RSPRY1  GDA  HOXA2  HOXA4  PER1  HOXA10  HOXA3  HOXA5  PIM3  TSPAN14  PLXNB2  MIDN  TOB1  CSNK1E  PARD6B  SOCS3  PTPRF  PLXNB1  EPB41L1  PER2  PHLDB1  WWC1  KCNK3  HOXA9  TBL1XR1  SLC12A7  SOX17  MECOM  VWA1  ZFP92  KHSRP  WNT7A  NDRG1  SHANK2  CDYL  HOXD4  HOXD9  MRE11A  SNX24  CD47  DLG3  CTBP1  GALNT14  GSE1  KLF6  SPSB1  TPM4  PAX8  ADCY5  ARFGAP2  BCKDK  GPBP1L1  HPGD  MTX1  SSH2  ZDHHC5  C16orf87  FASTK  UBE2I  ZBBX  FAM24B  ZNF135  HNF4G  PFKP  SEPT9  DAB2IP  NCOR2  EPAS1  regulatory networks that are specifically associated with disease etiology. These analyses were based on WGS data generated for a relatively small number of primary OCs and for several histotypes. Inevitably, these studies will benefit in the future from additional WGS analyses performed in ovarian tumors for the different histotypes. The greater number of FMREs in endometriosis-associated OCs compared to HGSOCs, despite the smaller sample size, suggests somatic noncoding mutations may play a greater role in the development of clear cell and endometrioid tumors, however, further analysis is needed to properly evaluate this result, since different somatic mutation calling pipelines were used in the generation of this dataset. There are also likely to be limitations in the current analyses because not all SNVs in a RE are likely to have a similar functional impact. Neither did we evaluate other somatic events in REs (e.g. deletions, amplification, and other structural rearrangements) that may impact gene regulation by affecting RE activity.
Perhaps the most compelling finding from these studies is the convergence of several analyses on PAX8 as a major transcription factor target in the etiology of high-grade serous OC. There was a significant clustering of mutations in enhancers upstream of PAX8 in HGSOC but not in CCOC or EnOC; and PAX8-bound active enhancers were also significantly mutated in HGSOC, as were active enhancers bound by TEAD4, a known PAX8-binding partner 29,34 . When we look at the PAX8 CREAG specifically, we do not see evidence for differential expression associated with enhancer/promoter mutations ( Supplementary Fig. 7b), but this could be because of our small number of mutated samples with RNA-seq data (n = 21), passenger mutations, or heterogeneity of mutation effects. Together these data indicate a role for PAX8 as both a target and mediator of non-coding somatic mutations. This was further supported by the finding of somatic mutations disrupting the TEAD4 motif within a mutated enhancer on chromosome 6. Knockout of the frequently mutated enhancer at 6p22.1 locus validated a positive association between the activity of the enhancer and four genes predicted to be regulated by the enhancer. Further analysis is needed to determine whether there is direct contact between the RE and ZSCAN16, ZSCAN12, HIST1H2AI, and ZKSCAN3 promoters, and to determine the functional role for one or more of these genes in OC pathogenesis. Our studies also identified genes and transcription factors that warrant additional functional studies to validate their role in OC. In addition to PAX8 and TEAD4-bound enhancers, we also identified significantly increased somatic mutation rates in FOXM1 and ESR1 bound regions in HGSOC. Analysis of HGSOC TCGA data indicates FOXM1 is a frequently altered pathway in HGSOC development 8 , while ESR1 is expressed in around 80% of HGSOCs 46 .
In conclusion, through the integration of tissue-specific epigenomic and gene expression landscapes for the different histotypes of OC with somatic mutation data from WGS analyses of primary tumors, we have identified non-coding elements that may contribute to OC development. Many of the mutated REs and their associated genes provide insights into disease pathogenesis, and the lineage-specific TF PAX8 was identified a central player in the transcriptional dysregulation caused by non-coding somatic mutations in HGSOCs.

Methods
Tissue ChIP-seq. All tissues used were collected with informed consent and the approval of the institutional review boards of the University of Southern California and Cedars-Sinai Medical Center. Tissue ChIP-seq is performed as follows 47 : Briefly, one frozen 3 mm core was pulverized in a pulverization bag (TT05) using the Covaris CryoPrep system (Covaris, Woburn, MA) twice at intensity 4. The tissue was then fixed using 1% formaldehyde (Thermo fisher, Waltham, MA) in Phosphate-buffered saline solution for 10 min at room temperature with rotation and quenched with 125 mM glycine for 10 min at room temperature with rotation. After rinsing with ice-cold phosphate-buffered saline solution twice, chromatin was resuspended and lysed in ice cold lysis buffer (50 mM Tris, 10 mM EDTA, 1% SDS with protease inhibitor) for 10 min. Chromatin was sheared to 300-500 base pairs using the Covaris E210 sonicator (AFA: 5% duty cycle, 5 intensity, 200 cycles/ burst) for 10 min. 1% of chromatin was saved as input for each sample. 5 vol of dilution buffer (1% Triton X-100, 2 mM EDTA, 150 mM NaCl, 20 mM Tris-HCl pH 8.1) was added to the rest of chromatin and the sample was incubated with 1 µg H3K27ac antibody (DiAGenode, C15410196, Denville, NJ; as a ratio of 1:600) coupled with protein A and protein G beads (Life Technologies, Carlsbad, CA) at 4°C overnight. The chromatin was washed with RIPA washing buffer (0.05 M HEPES pH 7.6, 1 mM EDTA, 0.7% Na deoxycholate, 1% NP-40, 0.5 M LiCl) for five times and, rinsed with TE buffer (pH 8.0) once. The sample was resuspended in elution buffer (50 mM Tris, 10 mM EDTA, 1% SDS), treated with RNase for 30 min at 37°C, and incubated with proteinase K overnight at 65°C. Sample DNA and input were extracted using Qiagen Qiaquick columns, and sequencing libraries prepared using the ThruPLEX-FD Prep Kit (Rubicon Genomics, Ann Arbor, MI). Libraries were sequenced using 75-base pair single reads on the Illumina platform (Illumina, San Diego, CA) at the Dana-Farber Cancer Institute.
ChIP-seq data processing. The AQUAS pipeline (version 0.3.3) 48 was used to process all H3K27ac ChIP-seq data. Reads were aligned against the reference human genome hg19, filtered by quality and duplication. Several quality control metrics were computed for each individual replicate, including number of reads, percentage of duplicated reads, NSC, RSC, and FRiP ( Supplementary Fig. 1). AQUAS follows ENCODE3 guidelines to process ChIP-seq data. For histone modification ChIP-seq data, AQUAS uses macs2 as the peak caller algorithm and a naive overlap approach that selects for the final peak set the regions of the pooled replicate that overlaps 50% or more of each individual replicate. For the 20 OCs, we obtained an average of 33.7 million mapped reads (standard deviation, sd = 9,071,589), and an average of 77,346 peaks (sd = 21,020), per sample. H3K27ac peaks were, on average, 1078 bp wide (sd = 1206 bp), and covered around 83 Mbp per sample (sd = 16 Mbp) ( Supplementary Fig. 1).
There was negative correlation between the number of peaks and the average peak width (Pearson's ⍴ = −0.58, P-value = 0.007) and positive correlation between number of peaks and genome coverage (Pearson's ⍴ = 0.54, P-value = 0.007). Histotype-specific regions were identified using the R Bioconductor package DiffBind 49 , which calculates differentially bound regions from multiple ChIP-seq experiments. For each histotype, we selected sites called in at least three out of five samples in the given histotype, with absolute FC value ≥ 3, FDR ≤ 0.05, contrasting the five samples of the histotype of interest against the remaining 15 samples using the method DBA_EDGER. Common regions that were called in all 20 OC H3K27ac ChIP-seq experiments regardless of the intensity of the ChIP-seq signal. The consensus set of H3K27ac ChIP-seq regions for HGSOC were all sites present in at least three (out of five) HGSOC samples, merging overlapping peaks and we recalculated the ChIP-seq score using DiffBind with the new coordinates across all samples, to have homogenous start/end positions for all peaks across the samples.
RNA-seq. Primary OC specimens were homogenized, and total RNA was extracted using TRIzol LS (Thermo Fisher Scientific, catalog number: 10296028). Ribosomal RNA (rRNA) was depleted using RiboMinus Transcriptome Isolation Kit (Thermo Fisher Scientific, catalog number: K155002). Poly (A) + RNA was then isolated using Dynabeads Oligo (dT) 25 (Thermo Fisher Scientific, catalog number: 61002). Twenty nanograms rRNA-poly (A) + RNA was used to prepare each RNA-Seq library. External RNA Controls Consortium (ERCC) spike-ins (Thermo Fisher Scientific, catalog number: 4456740) were added as control for normalization of the samples. Strand-specific RNA-Seq libraries were constructed using the NEBNext Ultra Directional RNA Library Prep Kit (NEB, catalog number: E7420). The resulting library concentrations were quantified using the Nanodrop. Libraries were sequenced to generate paired-end 75 bp reads on NextSeq 500 platform (Illumina) in high output running mode. Sequencing was performed at the Molecular Genomics Core facility at the University of Southern California.
RE/gene mapping. We were unable to collect sufficient tumor material for one of the endometrioid tumors for RNA-seq, therefore, we only have 19 samples (out of 20) with paired ChIP-seq/RNA-seq. We calculated the pairwise Spearman's rho correlation (ρ) of the H3K27ac ChIP-seq score of a given RE against the gene expression, in counts per million (CPM), of all genes within the same TAD. We selected RE/gene pairs with ρ > 0.4 and one-sided P-value < 0.05.
Pathway enrichment analysis is performed using Metascape [https://metascape. org] using as input the list of genes associated with CCOC-specific, HGSOC-specific, and MOC-specific (enriched and depleted separately) REs against KEGG pathway.
Filtering SNVs. We used SNVs from 232 OCs (110 from the PCAWG cohort and 122 from the UBC cohort). We removed all SNVs that overlap coding regions, regions annotated as low mappability regions by ENCODE (wgEncodeDacMap-abilityConsensusExcludable.bed), as well as areas of the genome that are prone to cause false positives in ChIP-seq assays (seq.cov1.ONHG19.bed.gz). For HGSOC (n = 169), there is a total of 1,276,929 SNVs that fall into 1,276,108 individual positions (1,275,309 positions with only 1 mutated sample, 784 positions with two mutated samples, 11 positions with three mutated samples, three positions with four mutated samples, and one position (chr3:174499208-174499208) with seven mutated samples).
Identifying FMREs. To identify FMREs we used a Poisson binomial distribution (PBD) with a vector of probabilities, i.e., one background mutation rate for each sample. Let X i (X i 2 ½0; 232) be a random variable that represents the number of samples with at least one mutation in the ith RE, then X i follows a PBD with a vector of probabilities p ¼ 1 À 1 À p k ð Þ n i ½ k , where n i is the size of the ith RE in base pairs, and p k is the global background rate of sample k (k 2 ½1; 232) empirically estimated by the ratio of the total number of non-coding SNVs in sample k (n k ) over the total coverage of the regions of interest in base pairs, (the H3K27ac-positive regions) (n cov ): To determine whether the observed number of mutated samples in the ith RE (s i ), we calculate the probability of having at least s i samples mutated, i.e., P-value i = PðX i ≥ s i Þ. P-values were adjusted used the Benjamini-Hochberg method. For gene-level analysis, we combined all REs and annotated promoters associated to gene j into a pseudo-RE and estimated the probability of having s j or more mutated samples in all combined REs, i.e., P-value j = PðX j ≥ s j Þ, where X j follows a PBD with a vector of probabilities p ¼ 1 À 1 À p k ð Þ n j ½ k , where n j is the total length of the pseudo-RE, i.e., the sum of the widths of all REs and annotated promoters associated to gene j.
Gene expression changes associated with RE mutations. For 89 WGS samples (out of 110) in PCAWG, RNA-Seq was available. Analysis was restricted to genes with at least two mutated samples in the RE overlapping the promoter or an associated RE. For each gene, we calculated the FC, defined as the ratio of the median gene expression of the gene in the mutated samples (x 1 ) over the median gene expression of the gene in the non-mutated samples (x 2 ).
Enhancer deletions in OC cell lines. CRISPR/Cas9 system was used to delete the chr6 enhancer in OC cells. FUCas9Cherry (Gifts from Dr. Marco Herold, Addgene plasmid number 70182) was transfected together with lentiviral packaging plasmids pMD2.G and psPAX2 (Gifts from Dr. Didier Trono, Addgene plasmid numbers 12259 and 12260) into HEK293T cells. UWB1.289 and SHIN3 cells were then transduced with lentivirus and mCherry-positive cells (UWB1.289/Cas9 and SHIN3/Cas9) were sorted by FACS. Cas9 activity was confirmed using gRNAs targeting the RB1 locus, and Surveyor T7 Endonuclease 1 digests. UWB1.289/Cas9 and SHIN3/Cas9 were subsequently transduced by lentivirus containing either gRNA pair targeting enhancer region on chromosome 6 (Chr6) or gRNA pair targeting a control gene OR1C1 before selection with 400 ng/mL puromycin. Plasmids expressing gRNAs were obtained from Transomic Technology. Sequences Reporting summary. Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability
The RNA-sequencing and ChIP-sequencing data have been deposited in the GEO database under the accession code GSE121103. The whole-genome sequencing data is available from the PCAWG database under the search "Primary Site=Ovary" and "Software=PCAWG SNV-MNV callers" [https://dcc.icgc.org/pcawg]. All the other data supporting the findings of this study are available within the article and its supplementary information files and from the corresponding author upon reasonable request. A reporting summary for this article is available as a Supplementary Information file.