4C-seq revealed long-range interactions of a functional enhancer at the 8q24 prostate cancer risk locus

Genome-wide association studies (GWAS) have identified >100 independent susceptibility loci for prostate cancer, including the hot spot at 8q24. However, how genetic variants at this locus confer disease risk hasn’t been fully characterized. Using circularized chromosome conformation capture (4C) coupled with next-generation sequencing and an enhancer at 8q24 as “bait”, we identified genome-wide partners interacting with this enhancer in cell lines LNCaP and C4-2B. These 4C-identified regions are distributed in open nuclear compartments, featuring active histone marks (H3K4me1, H3K4me2 and H3K27Ac). Transcription factors NKX3-1, FOXA1 and AR (androgen receptor) tend to occupy these 4C regions. We identified genes located at the interacting regions, and found them linked to positive regulation of mesenchymal cell proliferation in LNCaP and C4-2B, and several pathways (TGF beta signaling pathway in LNCaP and p53 pathway in C4-2B). Common genes (e.g. MYC and POU5F1B) were identified in both prostate cancer cell lines. However, each cell line also had exclusive genes (e.g. ELAC2 and PTEN in LNCaP and BRCA2 and ZFHX3 in C4-2B). In addition, BCL-2 identified in C4-2B might contribute to the progression of androgen-refractory prostate cancer. Overall, our work reveals key genes and pathways involved in prostate cancer onset and progression.


Results
We identified 8q24 enhancer interactomes in LNCaP and C4-2B cells. To identify partners interacting with the 8q24 risk enhancer genome-wide, we applied the 4C technique (Fig. 1A), using enzymatic digestion by BglII as the primary fragmentation method as applied in many other nuclear organization studies [19][20][21][22] . Cells were fixed with paraformaldehyde, and chromosome segments in physical proximity were crosslinked. Then chromatin was digested by BglII, and the ends of interacting chromatin regions were ligated with T4 ligase. We designed primers (Fig. 1B, see Methods) to specifically amplify interacting partners with the bait for library construction. Two replicate libraries for LNCaP and C4-2B were subjected to next-generation sequencing on the Illumina platform. The sequencing reads were analyzed with our custom analysis pipeline 27 . Circos plots depicting the interaction profiles centered on the bait are shown in Supplementary Fig. S1.
We evaluated reproducibility of interactions between biological replicates by counting trans interactions in every 2 Mb genomic bin and cis interactions in every 1 Mb bin. The Pearson's correlation coefficient was 0.812 between replicates of LNCaP and 0.723 between replicates of C4-2B ( Fig. 2A), indicating good consistency between replicates. Furthermore, clustering results showed that replicates of LNCaP and replicates of C4-2B separated well into two clusters (Fig. 2B). This suggests that replicates of the same cell line have consistent 4C results and the risk enhancer interaction profiles of LNCaP and C4-2B differ to some extent.
The analysis showed that around 47%~66% of the total reads in four datasets are distributed on the same chromosome where the bait is located (Fig. 3A). This conforms to the "cis/overall ratio of > 40%" criteria proposed by Van De Werken et al. 28 , indicating good quality experiments. After merging reads aligned to the same site, our analysis revealed more than 3,000 interacting sites in LNCaP and more than 2,000 sites in C4-2B (Table 1). Of note, 2,096 sites and 897 sites were reproducibly identified in the two biological replicates of LNCaP and C4-2B, respectively (Table 1, Fig. 3B). Besides, these interactions consist of both cis and trans sites, with trans interactions accounting for the majority (~90%) ( Table 1, Fig. 3B). To show that the overlapping between replicates is not by chance, we simulated 10 random sets of equally sized and number of sites for LNCaP and C4-2B respectively, and compared the overlapping rates in true versus random datasets. The median of overlapping rate was 0.45% for LNCaP random sets and 0.24% for C4-2B random sets (Fig. 3C). In comparison, the overlapping rate between biological replicates was much higher, 46.3% and 35.4% for LNCaP and C4-2B respectively, indicating that the overlap observed was not due to chance. In addition, the overlapping rates are comparable to similar studies investigating genome organizations. A previous ChIA-PET study of CTCF-mediated chromatin interactome in pluripotent cells reported the overlapping rate between biological replicates as 38% 29 . We reason that 4C technique takes a snapshot of the chromatin interaction patterns across hundreds of thousands of cells where chromatin-chromatin interactions occur in a highly dynamic fashion. These reproducible sites were thus considered to be high-fidelity interacting sites and would be used to conduct integrative association analysis.
We further applied a binomial model based screen to call out significant domains (Fig. 4), and noticed that the distributions of sites interacting with the 8q24 enhancer were very different. This revealed that the chromosome organizations around the enhancer at 8q24 were very different in LNCaP and C4-2B cells. This in turn implies that the enhancer exerts influence over different sets of genes in LNCaP and C4-2B, leading to distinct expression profiles and cancer properties between LNCaP and C4-2B cells. The 8q24 enhancer interactomes occupy open chromosomal compartments. Following identification of 8q24 interacting regions, we speculated that the bait-interacting regions could share similar properties. We examined publicly available datasets related to prostate cancer (see Methods) and conducted association analysis following the two-compartment model proposed by Hi-C studies 30,31 , where chromosomes are distributed into open compartments enriched with active marks and closed compartments depleted of such marks.
Methylation of lysine 4 on Histone H3 -including H3K4me1, H3K4me2 and H3K4me3 -is a hallmark of open chromatin, often featuring promoters and enhancers of actively transcribed genes 32,33 . Moreover, these Bait" segment is shown in black while "capture" segments are shown in red, blue and orange. Two rounds of digestion are included, and inverse PCR is performed to amplify unknown captured sequence. (B) Upper panel shows a broad view of human 8q24 locus, and lower panel shows a detailed view of AcP10 region at 8q24 locus. AcP10 region is highlighted in cyan. The pair of primers for inverse PCR is shown in red triangles. In gene annotation track, coding gene (POU5F1B) is shown in blue, while non-coding genes (CASC8 and RP11-382A18.2) are shown in green.
The 4C sites are in close proximity to prostate cancer risk loci. We asked whether 4C sites identified in LNCaP and C4-2B are associated with annotated SNPs in the 100 independently identified prostate cancer risk loci 3,5-12 . Among the total 112 index-SNPs summarized so far (Supplementary Table S1 The 4C regions are enriched with pathways and genes involved in prostate cancer activity. We input reproducible 4C sites in LNCaP and C4-2B to GREAT 43 (Genomic Regions Enrichment of Annotations Tool) to analyze functional significance of these sites (Fig. 6, Table 2). We observed significant enrichment of positive regulation of mesenchymal cell proliferation in both LNCaP (p = 8.395e-6) and C4-2B (p = 4.542e-7). A recent study 44 applying 3C-MTS (chromosome conformation capture based multiple target sequencing) technology to investigate 8q24's physical interactions with multiple loci across the genome also showed the enrichment of this term. Other terms, including regulation of cell division (p = 9.541e-9), positive regulation of stem cell proliferation (p = 9.131e-6), phosphatidylinositol-mediated signaling (p = 6.312e-5), were found enriched in C4-2B as well. Genes in these terms might be closely involved in prostate cancer activities. In addition, we observed that response to light stimulus was shown as an enriched term in C4-2B, which is interesting considering previous studies proposing melatonin might reduce prostate cancer cell growth 45 and lower prostate cancer risk 46 .
As for pathway analysis, TFG-beta signaling pathway was enriched in LNCaP (p = 4.283e-7), while p53 pathway (p = 8.486e-13) and hypoxia response via HIF activation (p = 1.270e-3) were enriched in C4-2B. These pathways have been demonstrated or proposed as critical in the development of prostate cancer and the regulation of DNA replication, transcription and cell cycle in prostate cancer cells [47][48][49][50][51] . The 4C sites were also enriched for some miRNA motifs. For example, MIR-453 (ACAACCT) and MIR-205 (ATGAAGG) were significantly enriched in LNCaP (p = 1.925e-13) and C4-2B (p = 9.812e-6) respectively. Besides, we observed the enrichment of the transcription factor with binding motif TWSGCGCGAAAAYKR in LNCaP (p = 2.699e-12). We found this is an E2F motif in the inverted orientation (TTTSSCGC), implying the role of E2F in prostate cancer development. All the four disease ontology terms (bladder cancer: p = 9.371e-7, colon adenocarcinoma: p = 2.335e-6, large intestine adenocarcinoma: p = 3.691e-6, retroperitoneal neoplasm: p = 1.104e-3) in C4-2B were related to cancer, but no counterparts were found significantly enriched in LNCaP. Of note, under MSigDB Perturbation, the top 4 hits in LNCaP and top 2 hits in C4-2B are all related to cancer (Supplementary Table S2). Next, we retrieved a set of seed genes associated with prostate cancer from Phenolyzer 52 (Phenotype Based Gene Analyzer), a tool to discover genes associated with a disease/phenotype term. By comparing GREAT identified genes with the reference genes from Phenolyzer, we found 159 genes in LNCaP and 84 genes in C4-2B related to prostate cancer development (Table 3). 25 prostate cancer related seed genes commonly exist in LNCaP and C4-2B. Among them, MYC and POU5F1B are the highest ranked genes. Previous studies have shown that irregular chromatin looping and abnormal expression at the MYC locus could be a critical, oncogenic driving factor in prostate cancer 53,54 . Meanwhile, we noted some seed genes exclusively exist in LNCaP or C4-2B. The highest ranked genes in LNCaP are ELAC2 and PTEN; in comparison, the counterparts in C4-2B are BRCA2 and ZFHX3. As a comparison, the result of Du et al. 44 strengthens our observation of the cis-interaction with MYC and trans-interaction with FAM84B (Table 3). In addition to MYC and FAM84B, Du et al. 44 also showed CD96, PVT1, GSDMC, CXorf36, RRP12, USP14, SMIN3 as interaction genes. However, except for GSDMC in C4-2B, none of these genes were found in our 4C results for LNCaP or C4-2B. Based on Phenolyzer's analysis, none of these 7 genes is annotated as seed genes associated with prostate cancer, thus it is possible that these interactions with 8q24 locus are not as important as others.

Discussion
Prostate cancer is the second leading cause of death for men in the United States. Many studies have identified cancer risk associated loci and pathways involved in prostate cancer onset and development. In this study, we employed an unbiased "one-versus-all" technique -4C-seq to uncover genome-wide interacting partners of a risk enhancer at the 8q24 prostate cancer risk locus. Using LNCaP and C4-2B cell lines for our 4C-seq analysis, we were able to compare these representative models of androgen sensitive and insensitive prostate cancer. Our analysis provides insight into key regions and genes interacting with the risk loci, contributing to our understanding of prostate cancer progression.  As shown above, LNCaP and C4-2B share similar genes interacting with the risk enhancer at 8q24, suggesting a general signature expression landscape for prostate cancer cells. Meanwhile, we also see specific differences in each cell line, where different genes are affected in the same or different enriched pathways. This is understandable given the fact that C4-2B is an osteotrophic and androgen insensitive prostate cancer cell line, unlike its parental LNCaP cell line. The difference in function relies on different set of genes. For example, BCL-2 gene was an interacting candidate in C4-2B but not in LNCaP. BCL-2 is an oncogene encoding inner mitochondrial membrane protein that is anti-apoptotic 55 . It has been shown that the transition to a hormone unresponsive state is accompanied with elevated BCL-2 expression, increased proliferation and decreased apoptosis [56][57][58] . Besides, there is an inverse relation between androgen receptor and BCL-2 expression 59 , which supports the involvement of BCL-2 in androgen-insensitive cells. To characterize the different picture of gene expression levels in LNCaP and C4-2B, we think RNA-seq can be conducted, which combined with 4C-seq results, will add to our knowledge of how prostate cancer cells transition into hormone-refractory state.
It came to our attention that some pathways and genes related with pluripotent stem cells are enriched, for instance, pathway-positive regulation of stem cell proliferation, and genes-MYC, POU5F1B. This implies the involvement of cancer stem cells in prostate cancer. Normal pluripotent stem cells and cancer stem cells are similar in many aspects, and the most striking one might be the ability to perpetuate themselves by self-renewal. Normal stem cells can be the targets of transforming mutations and cancer stem cells can drive the cancerous cells proliferation 60 . In addition, Du et al. 44 and our results showed that positive regulation of mesenchymal cell proliferation was an enriched pathway in both LNCaP and C4-2B. Prostate cancer is one type of carcinoma, developing from epithelial cells that line the surface of glands and ducts 61 . Thus it is compelling to see that positive regulation of mesenchymal cell proliferation is an enriched pathway, given that transformed cells of mesenchymal origin will usually lead to sarcoma made of cancerous bone, cartilage, muscle, vascular, hematopoietic tissues and so on. Despite that, it has been shown that during tumorigenesis concomitant changes occur in cells surrounding epithelial cells as well 62 . These cells, called as stromal cells and composed of endothelial, fibroblastic, smooth muscle and nerve cells, play a supportive role for epithelial layer 61 . Several investigations showed that normal molecular dialogue between embryonic epithelial and mesenchymal cells form the basis of tissue/organ function [63][64][65] , while the disruption of that homeostasis might confer tumorigenesis. That sets the concept framework for the involvement of mesenchymal cells in prostate cancer.
In addition, we realized that some other cancers (for example, bladder cancer, breast cancer and lung cancer, Table 2, Supplementary Table S2) are shown enriched in our ontology analysis of genes interacting with 8q24. 8q24 was found to harbor common alleles in prostate, breast and colon cancer 14,66,67 . Besides, breast, prostate, colon and lung cancers are common type of carcinomas derived from epithelial cells. Thus it is probable that the risk locus 8q24 interact with similar genes and have common mechanism underlying the association with cancer.
We observed some miRNA motifs enrichment in 4C sites of LNCaP and C4-2B, implying that miRNAs might be implicated in prostate cancer. Over the years, miRNAs have been shown to play a role in regulating gene expression at post-transcriptional level [68][69][70] . Moreover, evidence is mounting that miRNAs are associated with various diseases, especially cancer [71][72][73] . Of note, recently an increasing number of studies are dedicated to taking computational approaches to characterize the association between miRNA and disease [74][75][76] . Our findings might contribute to the knowledge of miRNA-cancer relationship.
In our analysis, we have obtained a list of potentially biologically significant regions and genes in LNCaP and C4-2B cells. We can confirm interactions between the risk enhancer and candidate genes by performing 3C and fluorescent in situ hybridization (FISH). To test the biological importance of the interaction with the risk enhancer, we can apply CRISPR (clustered regularly interspaced short palindromic repeats)-Cas9 genome editing to knock out the enhancer region and compare RNA-seq expression profiles between edited and unmodified prostate cancer cells. Recently, Tak et al. 77 has successfully applied this strategy to examine the effect of deletion of a distal element on transcriptome in colon cancer. The genes proposed by computational analysis and experimental validation will be important targets for drug development and intervention therapy.
4C library preparation. 4C library preparation was performed following previously described protocol 28,78 with modifications. 5 million cells were trypsinized and resuspended as single cells in 0.5 ml RPMI 1640/10% FBS. For crosslinking, 9.5 ml of 2% formaldehyde/10% FBS in PBS was added. After being incubated for 10 minutes, fixation was quenched with 0.6 ml of 2.5 M glycine and cells were lysed in 1 ml lysis buffer 1 (50 mM HEPES-KOH, pH7.5, 140 mM NaCl, 1 mM EDTA, 10% Glycerol, 0.5% NP-40, 0.25% Triton X-100, protease inhibitors) for 10 minutes at 4 °C with rotating. Nuclei were pelleted by centrifugation and washed with lysis buffer 2 (10 mM Tris-HCl, pH8.0, 200 mM NaCl, 1 mM EDTA, protease inhibitors). Pelleted nuclei were resuspended in 0.5 ml of 1.2 fold restriction enzyme buffer and incubated with 0.3% SDS for 1 hour at 37 °C with shaking and further incubated with 2% Triton X-100 for 1 hour. BglII restriction enzyme were added and incubated overnight at 37 °C with shaking. Digestion was stopped after adding 1.6% SDS and incubated at 65 °C for 25 minutes. Samples were then diluted in 6.125 ml 1.15 fold ligation buffer and 100 Weiss unit of T4 DNA ligase was added. Samples were kept at 16 °C for 4 hours and then at 25 °C for 30 minutes. The ligated chromatin was digested by proteinase K and then incubated with RNaseA. Samples were purified by phenol-chloroform extraction and ethanol precipitation. Samples were further digested by CviQI and circularized using T4 DNA ligase. After purification, 8  Index primer for replicate 2 -caagcagaagacggcatacgagatCTGATCgtgactggagttcagacgtgtgctcttccgatcGGAAGTA-GAGTAGCAATTCTTG; Universial primer -aatgatacggcgaccaccgagatctacactctttccctacacgacgctcttccgatctAT-GGAAATCAAGCAGCAGATCT). The amplicons were extracted by E-gel. The bar-coded DNA libraries were sequenced as 50 bp single-end reads using the Illumina HiSeq2000 platform.
4C-seq data analysis. Reads with 5' end mapped to the forward inverse PCR primer sequence were selected.
The rest of the selected reads (including the BglII sites) were mapped to hg19 assembly by Burrows-Wheeler Aligner 79 (BWA). The ligation sites were determined in that process. To get rid of off-target aligned reads, we applied a second round alignment, where the mapped ligated BglII sites were further mapped to a library that includes the locations of genome-wide BglII sites. Sites with only one read were eliminated to filter out random noise. Reproducible sites were defined as exact match of 4C sites with coverage > 1 in both biological replicates.
To call out statistically significant regions, we applied a binomial model 19,28 . Every interacting site i on chromosome W with length L W was examined within window w with length l w . The number of interacting sites was defined as Ci w , , and a z score was assigned to the window based on the following calculation: in which μ W is the expected number of interacting sites in window w on chromosome W . A window size of 500 was used in counting the number of ligated sites to define inter-chromosomal interactions, while a window size of 200 was set to define intra-chromosomal interactions. To define intra-chromosomal interactions, we set background window size to 5,000, so as to calculate the expected number of ligated sites in a window. We used type I error rate of 0.05 as the threshold to identify non-random chromosomal interactions.
Integrative association study. We integrated publicly available datasets that are associated with prostate cancer or LNCaP cells. From the ENCODE Project Portal, we retrieved ChIP-seq data for H3K4me3 (GSM945240). We also retrieved H3K27Ac ChIP-seq data (GSE51621), NKX3-1 ChIP-seq data (GSM699633), FoxA1 ChIP-seq data (GSM699634, GSM699635), and Androgen Receptor ChIP-seq data (GSE28219). The ChIP-seq results for other histone marks were downloaded from the Cistrome database.
GREAT and Phenolyzer. Genes associated with 4C sites were retrieved by GREAT 43 (http://bejerano.stanford.edu/great/public/html/), a tool to assign functional annotation to genomic regions. The background regions were chosen as the default whole genome. We used default basal plus extension association rule (proximal: 5 kb upstream, 1 kb downstream; distal: up to 1000 kb). Curated regulatory domains were included.
Seed genes associated with prostate cancer were discovered by Phenolyzer 52 (http://phenolyzer.usc.edu/), a tool to discover genes based on user inputted disease/phenotype terms. "prostate cancer" was used as input term and a total of 964 genes with ranks were returned.  Other. Genomic feature annotations were retrieved from the UCSC genome bioinformatics site. Genomic coordinates in other assemblies were converted to GRCh37/hg19 using the liftOver tool. Statistical tests and plotting were performed using R.