Gene fusions, mainly between TMPRSS2 and ERG, are frequent early genomic rearrangements in prostate cancer (PCa). In order to discover novel genomic fusion events, we applied whole-genome paired-end sequencing to identify structural alterations present in a primary PCa patient (G089) and in a PCa cell line (PC346C). Overall, we identified over 3800 genomic rearrangements in each of the two samples as compared with the reference genome. Correcting these structural variations for polymorphisms using whole-genome sequences of 46 normal samples, the numbers of cancer-related rearrangements were 674 and 387 for G089 and PC346C, respectively. From these, 192 in G089 and 106 in PC346C affected gene structures. Exclusion of small intronic deletions left 33 intergenic breaks in G089 and 14 in PC346C. Out of these, 12 and 9 reassembled genes with the same orientation, capable of generating a feasible fusion transcript. Using PCR we validated all the reliable predicted gene fusions. Two gene fusions were in-frame: MPP5–FAM71D in PC346C and ARHGEF3–C8ORF38 in G089. Downregulation of FAM71D and MPP5–FAM71D transcripts in PC346C cells decreased proliferation; however, no effect was observed in the RWPE-1-immortalized normal prostate epithelial cells. Together, our data showed that gene rearrangements frequently occur in PCa genomes but result in a limited number of fusion transcripts. Most of these fusion transcripts do not encode in-frame fusion proteins. The unique in-frame MPP5–FAM71D fusion product is important for proliferation of PC346C cells.
Prostate cancer (PCa) is one of the most frequently diagnosed cancers and a major cause of death in men in countries with a western lifestyle.1 Throughout the past two decades, several genetic events have been revealed that are important in development and progression of PCa.2 The predominant genetic abnormalities identified so far include the formation of ETS-fusion genes,3 loss of the phosphatase and tensin homolog (PTEN) tumor suppressor gene,4 amplification of AR and amplification of the MYC oncogene.5 Most of these studies used comparative genomic hybridization/SNP (single-nucleotide polymorphism) arrays and performed genome-wide copy number variation analysis as a start to identify specific genetic alterations.6, 7 The use of gene expression arrays was important in the detection of genes with aberrant expression patterns.8 The integration of these two different sets of data, copy number variation and gene expression, reinforced and expanded this panel of genetic alterations.9, 10
Next-generation sequencing (NGS) techniques emerging over the last few years proved to be a major breakthrough in documenting novel genetic changes, resulting in a better understanding of cancer cell biology.11, 12 Both RNA and DNA can be used as templates for NGS methods and it is possible to analyze paired-end or mate-pair libraries as either short or long sequence reads, depending on the platform applied.13, 14 An exome-sequencing approach uses only 1–2% of the genomic sequences as a template through capture and enrichment before the sequencing process.15, 16 This allows for higher sequence coverage at the expense of a few disadvantages such as the uneven capture efficiency and the absence of unknown or yet to be annotated exons.17 Instead of a focused exome-sequencing approach, more challenging whole-genome sequencing provides the most complete view of genomic changes.18 A range of sequencing technologies is now available, and more are becoming available soon, that provide different approaches for library construction, clone separation and amplification, and nucleotide detection.19, 20, 21, 22 The technology utilized in this study from Complete Genomics Inc. (Mountain View, CA, USA) makes use of array technology to separate amplified DNA template organized into single-strand coils, known as DNA nanoballs. Nucleotide read-out is based on a ligation protocol.23
In PCa genomics the use of NGS has allowed significant progress in systematically cataloging all the DNA changes present in cancer.24 The use of RNA-seq has produced major insight into the identification and expression of novel long noncoding RNAs and novel gene fusions in PCa.25, 26 Exome sequencing of PCa samples was utilized for the detection of novel small mutations.27, 28, 29, 30 In the same way, a mutational landscape of patients with castration-resistant PCa was described by Tomlins et al.3 in 2012.31 In addition, deep RNA sequencing is also used to identify transcription-induced chimeras associated with human prostate adenocarcinoma such as the novel TMEM79-SMG5.32, 33 However, a full overview of all structural variations (SVs) and mutations can only be provided by whole-genome sequencing. Many studies so far have applied NGS to obtain a more thorough perspective of already known genetic alterations.34 The mechanism involving PTEN loss has been correlated with novel genetic alterations in genes located in the vicinity of PTEN.24 Moreover, the expression of constitutively active androgen receptor splice variants in castration-resistant PCa has been described.35 A few studies have been published making use of whole or focused genome NGS technologies to discover and describe new genetic alterations in PCa. The study by Berger et al.24 in 2011 provided a comprehensive approach towards both known and newly identified mutations and genomic rearrangements. Other studies have focused either on a particular genomic region, such as the 10q11.2 97-kb region comprising the MSMB gene, or on the analysis of all alterations present in a particular PCa sample.36, 37 This was the case for the new type of prostate adenocarcinoma identified that has a hybrid phenotype of both luminal and neuroendocrine cells.37, 38
In this study, we applied whole-genome paired-end sequencing in two PCa samples. Analysis of the PC346C PCa cell line and the G089 primary PCa patient sample, which were selected based on the absence of ETS-fusion genes,39 generated a wide panel of new SV events. We focused on the detection of SVs to unravel the events leading to fusion genes. The potential fusion genes were validated and the genes involved were checked for copy number alteration in other PCa samples. The in-frame fusion found in PC346C was assessed for possible functional implications.
SVs detected by NGS
Whole-genome paired-end sequencing (Complete Genomics Inc., 2071 Stierlin Court) of two PCa samples, the G089 primary tumor and the PC346C cell line, generated data with an average coverage of 61 × and 67 × , respectively (Supplementary Table S1, Supplementary Figure S1). The fully aligned genome fraction for both samples was above 93% (Supplementary Figure S2). Data showing reads mapping at a different position/orientation than expected were used to perform de novo assembly and generate a list of all SVs in the two DNAs. In the PC346C sample, a total of 3898 SVs were detected (both interchromosomal and intrachromosomal) as compared with the reference genome. For the G089 sample the total of SVs was 3837 (Table 1). As normal DNA from these two PCa samples was not available, the data were curated for polymorphisms using a data set derived from 46 normal DNA samples (Supplementary Table S2). This left us with 387 candidate cancer-related events in PC346C and 674 in G089 (Table 1).
Overall, 90.1 and 82.4% of all SVs detected in PC346C and G089, respectively, were also present in at least one of the 46 normal controls. Of the cancer-related events, the vast majority did not occur in a gene intron or exon: only 2.7 and 5.0% of SVs were inside genes for both junctions in PC346C and G089, respectively (Figures 1a and b). Additionally, we often found the same gene IDs on both sides of the junctions, which corresponded to (small) intragenic rearrangements, such as deletions, duplications and inversions. We detected 14 SV events in PC346C and 33 in G089 where each side of the junction occurred within a different gene (Figure 2).40 From these 14 and 33 SVs, we excluded the events in which the orientation of the genes was head-to-head or tail-to-tail to select for gene fusion events that could generate fusion transcripts. In the end, nine (PC346C) and 11 (G089) SVs fusing two different genes in the same orientation were identified (Table 2; Supplementary Tables S3–S5).
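The orientation filter described above can be sketched as follows. This is a minimal illustration rather than the pipeline actually used: the `JunctionSide` record, its `toward_junction` flag and the `classify_junction` helper are hypothetical names, and the placeholder genes `GENE_A` and `GENE_B` are invented for the example. The sketch assumes that a fusion transcript is feasible only when one gene is transcribed toward the breakpoint (the 5′ donor) and the other away from it (the 3′ acceptor); the head-to-head and tail-to-tail configurations are grouped as "opposing".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JunctionSide:
    """One side of an SV junction (hypothetical record layout)."""
    gene: str
    toward_junction: bool  # True if the gene is transcribed toward the breakpoint


def classify_junction(left: JunctionSide, right: JunctionSide) -> str:
    """Classify a gene-to-gene junction by gene identity and orientation."""
    if left.gene == right.gene:
        # Same gene on both sides: small deletion, duplication or inversion.
        return "intragenic"
    if left.toward_junction == right.toward_junction:
        # Both genes point at the break, or both point away from it.
        return "opposing"
    # Exactly one gene reads into the break and continues into the other gene.
    return "feasible"


# MPP5 acts as the 5' donor and FAM71D as the 3' acceptor in PC346C.
fusion = classify_junction(JunctionSide("MPP5", True), JunctionSide("FAM71D", False))
# Placeholder genes illustrating an excluded configuration.
excluded = classify_junction(JunctionSide("GENE_A", True), JunctionSide("GENE_B", True))
print(fusion, excluded)
```

Only junctions classified as "feasible" would be carried forward as candidate fusion genes under this assumption.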
Validation of fusion genes present in PC346C and G089
For each possible gene fusion event in PC346C and G089, we checked all the mapped reads, both the normal and the mispaired reads (Supplementary Figure S3). We observed that a number of fusion events were not supported by a convincing number of discordant mate pairs (Table 2). To evaluate whether the criterion of three or more mispaired reads was justified, all 9+11 fusion events were validated using (RT)–PCR both at the DNA and RNA levels (Supplementary Figures S4 and S5, Supplementary Tables S6–S9). Most of the fusion events with low mispaired read counts (<5) could not be confirmed, showing that the cutoff number of three discordant mate pairs was set too low.
For most gene fusions, the event was confirmed by sequencing on the RNA and/or DNA level. Out of the six reliable PC346C fusion events, all were validated on the DNA level, with only one that could not be verified at the transcript level (Supplementary Tables S10 and S11). The fusion in question, ITGA10–RBM8A, comprises the downstream part of the 3′-untranslated repeat sequence of RBM8A so it was not expected to be present within the fusion transcript. In G089, out of the nine reliable fusion events, eight were validated at the DNA level and five at the RNA level. The four fusions that were validated only at the DNA level were assessed for the expression levels of the genes involved using Affymetrix Exon-array analysis (Affymetrix, Santa Clara, CA, USA) of the G089 PCa sample. In general, we found a low expression pattern of these fusion genes. The OBP2A–OBP2B fusion, comprising only three mispaired reads, was verified at the RNA level and involved two homologous genes. The AATK–LAMA5 fusion was confirmed at the DNA level, although the number of discordant mate pairs was only five (Table 2).
Next, we assessed which fusion events generated an in-frame fusion protein. On the basis of the sequence of the fusion transcript we determined that MPP5–FAM71D in PC346C and ARHGEF3–C8ORF38 in G089 were in-frame and most likely produced a fusion protein (Table 2). The final set of validated gene fusions (Figures 3a and b) was assessed for discordant mate pairs on both sides of the break as well as visible copy number-related breakpoints using the Illumina 1M SNP array (Illumina, San Diego, CA, USA). The presence of discordant mate pair reads on both sides of the break corresponds to distinct genomic locations that were joined together. On the other hand, a break with discordant mate pairs on only one side will, most likely, correspond to a break with loss of DNA on the other side. Regarding our final set of 15 fusions we observe mainly the presence of discordant mate pairs on one side of the break (Figure 3b) corresponding to the candidate gene fusion we initially detected. The remaining fusions had discordant mate pairs on both sides, meaning both sides of the break fused to different genomic locations. That was the case for AATK–LAMA5 with the reciprocal LAMA5–AATK gene fusion detected, NFKB which fuses to both LRBA and FAM160A1 and CCNY, which fuses to PARD3 and WDFY2. The remaining C8ORF38, LRBA and FAM160A1 gene fragments break to ARHGEF3 and NFKB, respectively, as well as to additional intergenic genomic locations.
All the genes of the validated fusion events were checked for breakpoints in a 66-SNP array data set of 11 PCa xenografts, eight PCa cell lines and 47 PCa patient samples with six matching normal controls. We observed that a few genes had breakpoints at the SNP level in other PCa samples providing us with an impression of the frequency of the specific breaks (Supplementary Table S12, Supplementary Figure S6). We checked for the presence of the same fusion transcript (THADA–MAP4K3, ARHGEF3–C8ORF38 and LRBA–NFKB1) in the samples with copy number variations (CNVs), but never observed a positive RT–PCR using the primer pairs that proved the fusion transcripts in G089 and PC346C (data not shown).
MPP5–FAM71D fusion has functional implications in PC346C cell line
The two in-frame gene fusions were assessed for the expression level of the 3′ acceptor genes in the exon-array data of G089 and PC346C. We observed a clear upregulation of FAM71D expression from exon 4 onwards in the PC346C sample (Supplementary Figure S7) compared with normal controls and with G089. Regarding C8ORF38 we did not observe differential expression in G089 as compared with normal controls and other PCa samples (Supplementary Figure S8). MPP5 and ARHGEF3 do not show differential expression in PCa samples or in PC346C and G089.
The MPP5–FAM71D fusion was confirmed both at the RNA and DNA levels (Figure 4a) and the breakpoint was sequenced (Figure 4b). Forty-three discordant mate pairs were detected for MPP5 and FAM71D (Table 2). The two genes are located on chromosome 14, ∼52 kb apart. In the SNP array data we did not detect any copy number alteration (deletion or gain) in these two genes. The fusion transcript is predicted to code for a fusion protein of aa 1–217 of MPP5 (674 aa full-length protein) and aa 54–422 of FAM71D (Figure 4c). The expression of MPP5 is not known to be prostate-specific or androgen-regulated. Obviously, the expression of the 3′-sequence of FAM71D was highly upregulated in PC346C, now being under control of the MPP5 promoter region. The PC346P patient-derived xenograft from which the PC346C cell line originated also shows high expression of the fusion gene (Figure 4d).
In order to assess the frequency of the MPP5–FAM71D fusion and FAM71D overexpression, we performed TaqMan gene expression assays on 201 PCa samples. None of these were positive for the MPP5–FAM71D fusion transcript observed in PC346C. To extend the analysis to other genomic events that would result in FAM71D overexpression, we also examined FAM71D expression in these samples and found 10 samples with a relatively high expression of FAM71D as compared with PC346C, normal prostate and the bulk of cancer samples (Supplementary Figure S9).
Next, we addressed the functional relevance of FAM71D overexpression in the PC346C cell line by the knockdown of its expression. The small interfering RNA SMARTpool of FAM71D successfully caused downregulation in both PC346C and the normal epithelial prostate RWPE-1 cell line (Figure 5a). In the case of PC346C cells we observed a considerable decrease in proliferation capacity at all time points tested (2, 4 and 6 days after the knockdown). This decrease was significant when compared with all the transfection controls and the normal PC346C growth (Figure 5b). In contrast, RWPE-1 cells showed a slight increase in proliferation after FAM71D knockdown, which was not significant when compared with the transfection controls (Figure 5c).
In order to address whether there was induction of apoptosis by FAM71D knockdown, we measured the caspase 3/7 activity for both transfected PC346C and RWPE-1. We did not detect an increase in apoptosis in PC346C or RWPE-1 due to the knockdown of FAM71D as compared with the transfection controls (Figure 5d). The cell growth effect of overexpression of FAM71D in PC346C is most likely the result of increased proliferation and not inhibition of apoptosis. The selective decrease in proliferation only in PC346C indicates a functional beneficial role for MPP5–FAM71D in these cancer cells.
Correction for most SV polymorphisms using a control data set of 46 normal samples
In the present study, we report the SVs detected by whole-genome paired-end sequencing of two PCa samples, the primary tumor G089 and the cell line PC346C. As described above, we detected 3898 and 3837 SVs in PC346C and G089 as compared with the HG18 reference genome. As most of these SVs are polymorphisms, one needs to separate these common variants from the cancer-related events. Optimally, the sequence of a normal DNA sample from the cancer patient will do so, but the normal control DNA of the G089 patient sample and PC346C cell line is not available. This is not an uncommon problem since for many cell lines, xenografts and old cancer sample repositories, a normal blood or tissue sample has not been stored. In order to remove common polymorphisms, one can make use of databases of structural variants that are far from complete but are rapidly becoming more comprehensive.41 We decided to use a control data set of 46 normal DNA samples that were sequenced on the same Complete Genomics platform and analyzed using the identical pipeline. This resulted in the exclusion of 3511 and 3163 SVs from the initial list of SVs detected in PC346C and G089, respectively. As expected, the majority of the SVs detected initially were normally occurring variation across individuals. In the end, we anticipate that most of these common variants will not correlate with cancer, although we cannot exclude their contribution to disease susceptibility.42, 43
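The arithmetic behind this correction can be checked with a few lines of code. The sketch below uses only the SV counts reported in the text; the helper name `shared_with_normals` is illustrative and not part of any published pipeline.

```python
def shared_with_normals(total_svs: int, cancer_related: int) -> tuple[int, float]:
    """Return the number and percentage of SVs removed as polymorphisms."""
    removed = total_svs - cancer_related
    return removed, 100.0 * removed / total_svs


# (sample, SVs versus the reference genome, SVs left after correction)
samples = [("PC346C", 3898, 387), ("G089", 3837, 674)]

for name, total, kept in samples:
    removed, percent = shared_with_normals(total, kept)
    print(f"{name}: {removed} of {total} SVs removed ({percent:.1f}%)")
```

The results reproduce the 3511 and 3163 excluded SVs and the 90.1 and 82.4% of SVs shared with the normal controls.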
The list of gene-to-gene rearrangements in the 46 normal samples was used to estimate a false discovery rate of cancer-related gene fusions. We observed that 28% of all the ‘common’ gene-to-gene SVs were present in only one of the 46 normal samples (Supplementary Table S6). Such a high proportion means that many gene-to-gene SVs are rare and therefore we can assume that of our cancer-related gene fusions described, some are rare polymorphisms.
Overall, our study showed that, in the absence of normal control DNA, it is still possible to eliminate the majority of the normal polymorphisms. Correction for rare polymorphisms will improve with the use of larger sets of normal control samples particularly with samples of the same ethnic origin.
G089 and PC346C PCa samples do not show positive selection for breaks inside genes or in-frame gene fusion transcripts
The detection of novel gene fusions in G089 and PC346C aimed at investigating the relevance of these events in the initiation and/or progression of PCa. As gene rearrangements often affect the genes involved and are therefore under selection pressure, one might expect these events to be more common than random nonfunctional intergenic SVs. In order to determine whether there is a bias towards breaks inside genes, we checked whether such occurrences are more prevalent than 38.5%, the approximate percentage of our genome that consists of genes (introns and exons).44 Considering that each SV consists of two breakpoints, out of the 774 breaks in PC346C (387 SVs, Table 1), 42 and 212 (106 SVs with genes on both sides of the SV) are inside a gene, which is 32.8% of all breaks observed. For G089, out of the 1348 breaks, 96 and 384 are inside a gene, representing 35.6% of all the breaks observed. This shows there is no bias towards breaks inside genes in our two PCa samples. Apparently, the occurrence of DNA breaks is an arbitrary process and is not selective for genomic coding regions.
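The comparison with the genic fraction of the genome can be reproduced from the counts given above. The following sketch is only a worked check of that arithmetic; the function and argument names are illustrative, and no statistical test is implied.

```python
GENIC_PERCENT = 38.5  # approximate percentage of the genome covered by genes (from the text)


def percent_breaks_in_genes(n_svs: int, single_side_breaks: int, svs_both_sides: int) -> float:
    """Percentage of breakpoints inside a gene, counting two breaks per SV."""
    total_breaks = 2 * n_svs
    genic_breaks = single_side_breaks + 2 * svs_both_sides
    return 100.0 * genic_breaks / total_breaks


# (sample, cancer-related SVs, breaks from SVs with one genic side, SVs with both sides genic)
samples = [("PC346C", 387, 42, 106), ("G089", 674, 96, 192)]

for name, n_svs, single, both in samples:
    observed = percent_breaks_in_genes(n_svs, single, both)
    enriched = observed > GENIC_PERCENT
    print(f"{name}: {observed:.1f}% of breaks in genes, enriched over {GENIC_PERCENT}%: {enriched}")
```

Both samples fall below the 38.5% expected if breaks were distributed without regard to gene content.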
Most of the gene rearrangements present in PC346C and G089 corresponded to small deletions, insertions or inversions within the same gene. After the exclusion of these SVs, we were left with a total of 20 feasible gene fusions from which 17 were successfully validated. The remaining three fusions had a very low number of discordant mate pairs and are likely false-positives.
In order to verify the presence of in-frame gene fusions, we sequenced the breakpoint of the fusion transcripts. We observed that two gene fusions generated an in-frame fusion transcript, which is less than expected by random exon splicing, and conclude that there is no bias towards in-frame gene fusions in PC346C and G089. We expect that many of the gene fusions we have identified will be passenger events and not genetic drivers of PCa. However, it is certainly possible that some of the fusion genes that are inactivated by the rearrangement have tumor suppressor activity and their knockout does contribute to tumor progression. We checked whether additional mutations were identified in the fusion genes and found that with the exception of a few (THADA, C8ORF38, LRBA and LAMA5) all represent novel genes with potential implications in cancer biology.
Integration of NGS data with SNP and gene expression arrays is a powerful approach to determine the most relevant SVs in cancer samples
In our study, we used SNP and Exon-array data samples in conjunction with the NGS data to find support for the mechanism and relevance of the rearrangement using the copy number variation and transcript expression changes.
Regarding the unique gene partners identified in our study, we used the gene expression data to exclude genes that are not expressed in either PC346C or G089. This was particularly important to determine the expression change of fusion gene acceptors. In relation to the in-frame gene fusions, we observed that FAM71D is clearly upregulated in PC346C as compared with other normal and PCa samples, whereas C8ORF38 is not differentially expressed in G089 as compared with the other samples (Supplementary Figures S7 and S8). Owing to the fusion, FAM71D is regulated and expressed in a higher manner by the MPP5 promoter, and this outlier expression is a good indication that this gene fusion has a functional role in PC346C (Supplementary Figure S10).
Our final set of gene fusions comprised several candidate genes, many of which were not yet identified as cancer-related. The genes LRBA, KNF19A, PARD3 and THADA were reported previously to be involved in other genomic events in PCa and thyroid adenocarcinoma (THADA).24, 32, 33, 45 LRBA was found to be rearranged with regions downstream of the gene FSTL5, whereas KNF19A was not directly affected since the breakpoint was 3 kb upstream of the coding region.24 As for PARD3 it was found to be fused to ARHGAP10 and THADA to noncoding regions in chr3p25 and chr7p15.32, 33 SNP microarray data allowed us to check whether any of the respective gene partners of our list of fusion genes had additional breaks visible at the SNP level in an additional 64 PCa samples (Supplementary Figure S11). THADA, MAP4K3, ARHGEF3 and LRBA were found to have copy number breaks in other PCa samples, mainly PCa xenografts, cell lines and late stage tumors (Supplementary Table S12). The same fusions (THADA–MAP4K3, ARHGEF3–C8ORF38 and LRBA–NFKB1) were not present in these samples with the CNV, indicating that THADA, MAP4K3, ARHGEF3 and LRBA rearrange to diverse genomic locations, potentially different fusion partners. The CNV check using SNP array data will underestimate the number of samples with breaks inside these genes, as rearrangements without loss of DNA and small deletion or amplifications cannot be identified using this technology. Recurrent breaks strongly suggest a role for these genes in PCa progression, although further studies are needed to explore the exact frequency and functional consequences.
The integration of NGS and SNP microarray data generated a few inconsistent observations. We expected the lack of discordant mate pair reads on both sides of the break to indicate loss or gain of genetic material and therefore to be detected as a copy number change in the SNP array. This was not the case for eight of our gene fusion partners, in which only one side of the break revealed discordant mate pairs without CNV in the SNP array data. This discrepancy can be due to the lower resolution of the SNP array analyses, which have difficulties in detecting small deletions and amplifications. In the future, small-window copy number estimates from high-coverage NGS data might resolve the differences observed in our study.
MPP5–FAM71D is related to the proliferation capacity of PC346C
The MPP5–FAM71D fusion, identified in the PC346C cell line, was silenced by the knockdown of FAM71D. This resulted in decreased proliferation capacity of PC346C cells but had no effect on RWPE-1-immortalized normal epithelial cells. This indicates that MPP5–FAM71D has a functional role in the growth of PC346C, although the precise mechanism by which this fusion exerts its function is not clear. FAM71D is located on chromosome 14 in the vicinity of MPP546 and its function is largely unknown. Recent studies in African trypanosomes showed Fam71 to function as a calcium-binding protein through its EF-hand calcium-binding domain but nothing is known on the mammalian ortholog.47
Although we could not detect the MPP5–FAM71D fusion in any other of the 201 PCa samples analyzed, we did find relatively higher expression of FAM71D in 10 of these cancer samples. The basis of this higher expression is unclear and could be due to fusion events or transcriptional upregulation. We conclude that FAM71D overexpression is not a frequent event in PCa and that the fusion to MPP5 as observed in PC346C is patient-specific and extremely rare.
Nevertheless, we cannot exclude an influence of MPP5, which is disrupted by the fusion event. MPP5 is a member of the membrane-associated guanylate kinase family and contributes to the establishment of cell polarity in mammals, which is crucial for tissue organization and whose loss is a hallmark of cancer.48, 49, 50 So far, the loss of MPP5 has been shown to lead to defects in polarity and assembly of tight junctions51 as well as ineffective delivery of E-cadherin to the cell surface.52 The importance of MPP5 in several cellular functions can therefore have a role in this outcome. We can infer that the fusion will cause a disruption of normal MPP5, which is crucial for cell polarity. However, one copy of MPP5 is expressed and intact (no rearrangement or mutation), which excludes a complete MPP5 silencing mechanism. Further functional studies should be performed to determine through which mechanism this fusion directly affects cell proliferation.
Materials and methods
Samples and whole-genome sequencing data analysis
The DNA of two PCa samples, the PC346C PCa cell line53 and the G089 PCa patient were sequenced by Complete Genomics (Complete Genomics Inc.). For microarray analyses and whole-genome sequencing, freshly frozen clinical samples were obtained from the tissue bank of the Erasmus University Medical Center. Collection of patient samples has been performed according to national legislation concerning ethical requirements. Use of these samples has been approved by the Erasmus MC Medical Ethics Committee according to the Medical Research Involving Human Subjects Act (MEC-2004-261).
NCBI build 36 (hg18) was used as a reference genome during the mapping and data analysis process. Rearrangements were identified from discordant paired sequence reads (discordant mate pairs), which mapped to different chromosomes (translocations), to different positions on the same chromosome >400 bp apart (deletions, inversions and duplications) or in unexpected orientations (small inversions and tandem duplications). The events were selected based on a minimum threshold of three discordant mate pairs supporting the same event. However, potential rearrangements were removed if there were any supporting discordant pairs for the same event in a panel of additional normal genomes that had already been sequenced by Complete Genomics23 (Supplementary Methods).
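The selection rules in this paragraph can be illustrated with a short sketch. It is not the Complete Genomics pipeline: the `MatePair` record, the pre-assigned `event_id` (standing in for whatever clustering groups pairs into one event) and the function names are all assumptions made for illustration. Only the 400 bp distance and the minimum of three supporting pairs come from the text.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional

MIN_DISTANCE_BP = 400  # same-chromosome pairs further apart than this are discordant
MIN_SUPPORT = 3        # minimum number of discordant mate pairs per event


@dataclass(frozen=True)
class MatePair:
    """Hypothetical mapped mate pair, already assigned to a candidate event."""
    chrom1: str
    pos1: int
    chrom2: str
    pos2: int
    expected_orientation: bool
    event_id: str


def discordance(pair: MatePair) -> Optional[str]:
    """Return the reason a pair is discordant, or None if it is concordant."""
    if pair.chrom1 != pair.chrom2:
        return "interchromosomal"
    if abs(pair.pos2 - pair.pos1) > MIN_DISTANCE_BP:
        return "distant"
    if not pair.expected_orientation:
        return "orientation"
    return None


def candidate_events(pairs: list[MatePair], normal_event_ids: set[str]) -> list[str]:
    """Events with enough discordant support and no support in the normal panel."""
    support = Counter(p.event_id for p in pairs if discordance(p) is not None)
    return sorted(
        event for event, count in support.items()
        if count >= MIN_SUPPORT and event not in normal_event_ids
    )
```

For example, an event supported by three interchromosomal pairs would be retained, whereas an event with two supporting pairs, or one that also appears in the normal panel, would be dropped.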
Illumina 1M SNP Array analysis
SNP array data from 47 PCa patients, 11 PCa xenografts, eight PCa cell lines and six noncancer patients were collected. Genotyping was performed using the Infinium Illumina Human 1M probe BeadChip containing 1 072 820 markers, among which 206 665 are in reported CNV regions. The experiments were performed according to the manufacturer's protocol (Illumina) by an accredited service provider (ServiceXS, Leiden, The Netherlands). Data analysis was performed using Nexus Copy Number 5.0 (BioDiscovery, El Segundo, CA, USA). The Human 1M CV HapMap control set provided by the manufacturer was used as a control. The algorithm used for the analysis was SNP-FASST Rank segmentation. Standard settings were adjusted: a significance threshold of 1.0E−6, max contiguous probe spacing 1000 kb, minimal number of probes per segment 15, high gain 0.6, gain 0.2, loss −0.2, big loss −1, homozygous frequency threshold 0.9, homozygous threshold 0.85, heterozygosity imbalance threshold 0.35 and minimum length of a region with loss of heterozygosity of 5000 kb. The Log R Ratio and the B Allele Frequency plotted by Nexus Copy Number 5.0 were used to perform CNV calling. Log R Ratio is the log2-transformed ratio between the observed and the expected probe intensity. The expected probe intensity is an interpolation of the mean intensities of the surrounding probe clusters. The B Allele Frequency is a value between 0 and 1, which represents the proportion contributed by one SNP allele (B) to the total copy number.
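As a rough illustration of how the listed gain and loss settings partition segments, the sketch below computes a Log R Ratio and labels a segment by its mean value. It assumes the conventional log2 definition of the Log R Ratio and assumes that thresholds are applied inclusively to the segment mean; neither detail is stated in the text, and the function names are hypothetical rather than part of Nexus Copy Number.

```python
import math

# Thresholds taken from the settings listed in the text.
HIGH_GAIN = 0.6
GAIN = 0.2
LOSS = -0.2
BIG_LOSS = -1.0


def log_r_ratio(observed_intensity: float, expected_intensity: float) -> float:
    """Log2 of observed over expected probe intensity (assumed definition)."""
    return math.log2(observed_intensity / expected_intensity)


def call_segment(mean_lrr: float) -> str:
    """Label a segment from its mean Log R Ratio (inclusive bounds assumed)."""
    if mean_lrr >= HIGH_GAIN:
        return "high gain"
    if mean_lrr >= GAIN:
        return "gain"
    if mean_lrr <= BIG_LOSS:
        return "big loss"
    if mean_lrr <= LOSS:
        return "loss"
    return "neutral"


# A probe at half the expected intensity gives a Log R Ratio of -1.
print(call_segment(log_r_ratio(0.5, 1.0)))
```

A real analysis would first segment the genome and require the minimum number of probes per segment before applying such labels.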
Affymetrix exon-array analysis
Exon-array data from 89 PCa patient samples, 11 PCa xenografts and six PCa cell lines were collected using the GeneChip Human Exon 1.0 ST Array (Affymetrix). The staining, washing and scanning procedures were performed according to the manufacturer's protocol (Affymetrix). The processing and RMA quantile normalization of the data were performed using the R-package affy (http://www.bioconductor.org/packages/release/bioc/manuals/affy/man/affy.pdf). The list of potential fusion genes originating from the whole-genome sequencing data was analyzed to determine differential expression of exons in these genes (GSE41410). On the GeneChip Human Exon 1.0 ST Array, 5 362 207 probes (on average 40 probesets per gene) are used to analyze one million exon clusters (collections of overlapping exons). The microarray chip is based on a selection of validated and predicted gene locations. The file containing the raw intensity values for the core probes (derived from RefSeq transcripts or full-length mRNAs) was used to assess the exon expression profile.
The PC346C cell line was cultured in DMEM-F12 (BioWhittaker, Verviers, Belgium), supplemented with 2% (volume/volume) FCS (PAN Biotech, Aidenbach, Germany), 1% insulin-transferrin-selenium (GIBCO BRL, Gaithersburg, MD, USA), 0.01% BSA (Boehringer, Mannheim, Germany), 10 ng/ml epidermal growth factor (Sigma-Aldrich, Milan, Italy) and 500 U penicillin–streptomycin (BioWhittaker), 100 ng/ml fibronectin (Harbor Bio Products, Tebu-bio, The Netherlands), 20 mg/ml fetuin (ICN Biomedicals, Zoetermeer, The Netherlands), 0.1 nM R1881 (Sigma-Aldrich), 50 ng/ml cholera toxin (Sigma-Aldrich), 0.1 mM phosphoethanolamine (Sigma-Aldrich), 0.6 ng/ml triiodothyronine (Sigma-Aldrich) and 500 ng/ml dexamethasone (Sigma-Aldrich). The PC346C cell line is an androgen-sensitive PCa cell line derived from the PC346P xenograft. It expresses the wild-type androgen receptor and secretes high levels of prostate-specific antigen. RWPE-1 cells were cultured in keratinocyte medium (GIBCO BRL), supplemented with 5 ng/ml epidermal growth factor, 1% penicillin–streptomycin (BioWhittaker) and 50 mg/l bovine pituitary extract. RWPE-1 is a normal prostate epithelium cell line that is androgen-independent and expresses both the androgen receptor and prostate-specific antigen. Both cell lines were cultured at 37 °C with 5% carbon dioxide.
RNA and DNA isolation
Total RNA was isolated from PC346C, RWPE-1, PC346P and G089 using the RNeasy kit (Qiagen, Valencia, CA, USA) according to the manufacturer’s protocol. RNA was eluted in 50 μl of RNase-free water. Concentration and purity of RNA were assessed using the NanoDrop ND 1000 spectrophotometer (Nanodrop products, Wilmington, DE, USA) by absorption measurements at 260 nm. RNA was stored at −80 °C. DNA was isolated using the QIAamp DNA Blood Midi Kit (Qiagen) according to the manufacturer’s instructions. Cell pellets were resuspended in 1 ml phosphate-buffered saline (BioWhittaker) and instructions were followed according to the protocol used for 1 ml whole blood. DNA was eluted in 200 μl elution buffer. Concentration and purity of the DNA were assessed using the NanoDrop ND 1000 spectrophotometer (Nanodrop products) by absorption measurements at 260 nm. DNA was stored at −20 °C.
Purified PCR products were sequenced bidirectionally using standard Sanger sequencing. The sequencing PCR was carried out with the same forward and reverse primers as the RT–PCR, but at a different primer concentration (3 ng of primer per reaction), in a 20 μl reaction volume. The PCR product was sequenced on an ABI Model 3730 automated sequencer and analyzed using DNAMAN (Lynnon Corporation, Pointe-Claire, QC, Canada).
Complementary DNA (cDNA) was synthesized from 1 μg total RNA using the M-MLV reverse transcriptase kit (Promega, Madison, WI, USA) and an Oligo(dT) primer (Invitrogen, Grand Island, NY, USA) according to the manufacturer’s protocol. Reverse transcription was performed at 37 °C for 60 min, followed by 95 °C for 10 min. Reverse transcription PCR for multiple fusions was carried out by amplification of the cDNA or DNA samples with the HotStarTaq Kit (Qiagen). For the standard PCR reactions, 1 μl of cDNA was used in a 50-μl reaction containing 5 μl 10 × PCR Buffer, 2 μl of each primer solution (100 μM), 0.25 μl HotStarTaq (5 units/μl), 1 μl of dNTPs (10 mM) and 40.75 μl of nuclease-free water. An initial denaturation step of 15 min at 95 °C was used to activate the HotStarTaq, followed by 35 cycles consisting of a denaturation step at 95 °C for 30 s, an annealing step at 55 °C for 30 s and an elongation step at 72 °C for 1 min. A final elongation step at 72 °C for 10 min was used. PCR products were checked by electrophoresis of 20 μl of product in a 1% agarose gel. Products of the expected size were extracted from the agarose gel using the Nucleospin Extract II Kit (Macherey-Nagel, Düren, Germany), according to the manufacturer’s protocol.
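The cycling conditions described above can be written down as a small data structure, which makes it easy to check the total programmed run time. The sketch below is only an illustration: the variable and function names are hypothetical, and ramping time between temperatures is ignored.

```python
# Cycling program as described in the text: (step name, temperature °C, seconds).
INITIAL_ACTIVATION = ("activation", 95, 15 * 60)
CYCLE_STEPS = [
    ("denaturation", 95, 30),
    ("annealing", 55, 30),
    ("elongation", 72, 60),
]
N_CYCLES = 35
FINAL_ELONGATION = ("final elongation", 72, 10 * 60)


def programmed_hold_seconds():
    """Sum of all programmed hold times, ignoring ramping between temperatures."""
    per_cycle = sum(seconds for _, _, seconds in CYCLE_STEPS)
    return INITIAL_ACTIVATION[2] + N_CYCLES * per_cycle + FINAL_ELONGATION[2]


total_minutes = programmed_hold_seconds() / 60
print(f"Programmed hold time: {total_minutes:.0f} min")
```

Under these settings the hold times alone add up to 95 min; the actual run will take longer, depending on the ramp rate of the instrument.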
Quantitative PCR was performed using SYBR Green dye and TaqMan gene expression assays on an Applied Biosystems StepOne Real-Time PCR system (Applied Biosystems, Foster City, CA, USA). All reactions were performed with SYBR Green Master Mix (Applied Biosystems) and 25 ng of both the forward and reverse primer using the manufacturer’s recommended thermocycling conditions. The TaqMan gene expression assays consisted of a combination of custom primers and probes targeting MPP5–FAM71D and FAM71D. Oligo probe MF, 5′-TCAGGTCAACAGAAGAGGTGA-3′ (5′ FAM, 3′ TAMRA), and primers 5′-TCTCCAACGCACAAGATCTT-3′ (F′) and 5′-CAGTTGGCTCGGTTATGAAGG-3′ (R′) were designed to target the MPP5–FAM71D fusion. Oligo probe F, 5′-CTCCTGACATCTCCTCCTGC-3′ (5′ FAM, 3′ TAMRA), and primers 5′-CTACTGGCCCATCTGACACC-3′ (F′) and 5′-AGGCTCATGTTCTCCGCAT-3′ (R′) were designed to target FAM71D. The comparative threshold cycle (Ct) method and the quantitative standard curve method were used to quantify the expression of the target genes. Equal efficiencies of the primers were confirmed using serial dilutions of PCa cDNA. Products from the PCR were resolved by electrophoresis on 1% agarose gels and, if necessary, sequenced as described above. Quantitative PCR expression values for the TaqMan gene expression assays were normalized to the expression of the housekeeping genes GAPDH and/or PBGD.
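The comparative Ct calculation can be sketched as follows. This is a minimal illustration rather than the analysis code used in the study: it assumes close to 100% and equal amplification efficiency for all assays, averages the Ct values when more than one reference gene is used (an assumption about how "GAPDH and/or PBGD" would be combined), and uses hypothetical function names and Ct values.

```python
from statistics import mean


def delta_ct(target_ct, reference_cts):
    """Ct of the target minus the mean Ct of one or more reference genes."""
    return target_ct - mean(reference_cts)


def relative_expression(sample_target_ct, sample_reference_cts,
                        calibrator_target_ct, calibrator_reference_cts):
    """Fold change by the comparative Ct method, 2 ** -(ddCt).

    Assumes roughly 100% and equal amplification efficiency for all assays.
    """
    dd_ct = (delta_ct(sample_target_ct, sample_reference_cts)
             - delta_ct(calibrator_target_ct, calibrator_reference_cts))
    return 2 ** (-dd_ct)


# Hypothetical Ct values, for illustration only (not data from this study).
fold = relative_expression(
    sample_target_ct=25.0, sample_reference_cts=[20.0, 22.0],
    calibrator_target_ct=27.0, calibrator_reference_cts=[20.0, 22.0],
)
print(fold)
```

In this example the target reaches threshold two cycles earlier in the sample than in the calibrator at identical reference Ct values, which corresponds to a fourfold higher relative expression.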
Functional role of FAM71D in the PC346C cell line
The functional role of FAM71D in the PC346C cell line was assessed by knockdown of both FAM71D and the MPP5–FAM71D fusion gene. Proliferation assays were performed at different time points. The small interfering RNA reagent siGENOME SMARTpool, Human FAM71D (Thermo Fisher Scientific, Lafayette, CO, USA) targets a part of the FAM71D gene that is also present in the fusion. Briefly, PC346C and RWPE-1 (control) cells were plated in 96-well culture plates at a density of 2000 cells per well (100 μl) and in T25 tissue flasks. Outer wells of the culture plates were filled with phosphate-buffered saline in order to avoid culturing artifacts in the proliferation assays. After 2 days, cells were washed with phosphate-buffered saline and transfected using siGENOME SMARTpool, Human FAM71D (final concentration of 100 nM) and DharmaFECT 3 reagent according to the manufacturer’s protocol (Thermo Scientific) in medium without penicillin–streptomycin. As negative controls, medium only, DharmaFECT with water and a transfection with non-targeting small interfering RNA were included. Twenty-four hours after transfection, the medium was replaced with normal medium without penicillin–streptomycin, and cells were allowed to grow for a maximum of 4 days. Proliferation was assessed using the CellTiter-Glo Luminescent Cell Viability Assay (Promega) according to the manufacturer’s protocol. Apoptosis was assessed using the ApoLive-Glo Multiplex Assay (Promega), which measures both the number of viable cells as a marker of cytotoxicity and caspase activation as a marker of apoptosis within a single assay well. The assay was performed according to the manufacturer’s protocol. As a positive control for apoptosis, an apoptosis inducer set (Millipore, Billerica, MA, USA) was used. A mix containing 700 × dilutions of Actinomycin D (10 mM), Camptothecin (2 mM), Cycloheximide (100 mM), Dexamethasone (10 mM) and Etoposide (10 mM) was used for 24 h to induce apoptosis in both PC346C and RWPE-1 cell lines.
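To make the working concentrations of the apoptosis inducer mix explicit, the short sketch below converts each stock concentration into its final concentration after a 700-fold dilution. It assumes that the values in parentheses are the stock concentrations before dilution; the names used in the code are hypothetical.

```python
DILUTION_FACTOR = 700

# Concentrations in mM as listed in the text, assumed to be stock concentrations.
STOCKS_MM = {
    "Actinomycin D": 10,
    "Camptothecin": 2,
    "Cycloheximide": 100,
    "Dexamethasone": 10,
    "Etoposide": 10,
}


def final_concentration_um(stock_mm, dilution_factor=DILUTION_FACTOR):
    """Final concentration in µM after diluting a mM stock."""
    return stock_mm * 1000 / dilution_factor


final_um = {name: final_concentration_um(c) for name, c in STOCKS_MM.items()}
for name, conc in final_um.items():
    print(f"{name}: {conc:.1f} µM")
```

Under this assumption the 10 mM stocks end up at roughly 14 μM, Camptothecin at roughly 3 μM and Cycloheximide at roughly 143 μM in the culture medium.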
We would like to thank André Uitterlinden from the Department of Internal Medicine, Erasmus MC for microarray assistance, Arno van Leenders from the Department of Pathology, Erasmus MC for patient sample selection, Wytske van Weerden from the Department of Urology, Erasmus MC for the expertise in PC346C cell line model systems, Complete Genomics Inc. for assistance with the NGS data and the patients whose material was used for this study. This research was made possible by financial contributions from CTMM, project PCMM (project number 03O-203), the FP7 Marie Curie Initial Training Network PRO-NEST (grant number 238278) and the Foundation for Scientific Urological Research (SUWO).
The authors declare no conflict of interest.
Supplementary Information accompanies this paper on the Oncogene website
Teles Alves, I., Hartjes, T., McClellan, E. et al. Next-generation sequencing reveals novel rare fusion events with functional implication in prostate cancer. Oncogene 34, 568–577 (2015). https://doi.org/10.1038/onc.2013.591
- gene fusions
- next-generation sequencing
- prostate cancer