Systematic mutation analysis in rare colorectal cancer presenting ovarian metastases

Although colorectal cancer is one of the most lethal cancer types in the world, its metastasis to the ovary is rare, compared to metastasis to other organs. Consequently, the genomic basis for colon-to-ovary metastasis remains unstudied, due to limited available patients, and thus there have been no attempts to construct individual-specific networks. Due to its rarity, the small sample size makes common mutations difficult to find. To overcome this problem, we herein attempted to apply a biological connectivity map called a sample-specific network (SSN), to reveal common biological functions in three samples. Our three samples were compared to a clinical dataset contained in The Cancer Genome Atlas (TCGA) Colorectal Adenocarcinoma (COAD), showing different mutational spectra, compared to matched samples based on age, gender, microsatellite instability (MSI) status, and tumor, node, metastasis (TNM) stage. The SSNs for the three samples revealed significant correlations of the mutation statuses of several apoptosis genes, in contrast to the TCGA-matched samples. Further analysis of a targeted-gene panel sequencing dataset for colon-to-ovary metastasis of primary tumor samples also confirmed significant correlations of the mutational statuses among apoptosis genes. In summary, using SSN, we successfully identified a common function (apoptosis) among our three patients having colon-to-ovary metastasis, despite no common mutations in the three patients. Such computational analyses could facilitate productive study of rare cancers and other diseases.


Pathway mutations in CRC tumors.
We first examined the mutational profiles of our patients, to subsequently built SSNs, using a method to determine how mutational co-occurrences of each patient's ovarian CRC metastasis differed in samples matched for age, gender, and tumor stage 10 . Such data, of five samples selected (henceforth, "TCGA-matched samples"), was retrieved from The Cancer Genome Atlas (TCGA), allowing us to build tumor mutational co-occurrence networks. We first built such networks from mutation data of the TCGA-matched samples, using the Pearson correlation coefficient as an association index. This network, henceforth referred to as a "reference network", contained molecular-level mutational co-occurrences among genes 10 . Then, we added each patient's primary tumor mutational profile to the mutation dataset of the TCGA-matched samples, generating a "perturbed network", i.e., a network derived from a "reference network, " by adding a sample of interest 10 . Then, we used a statistical test to identify differential "edges" between the reference network and the perturbed network 10 . Here, a "node" represents a gene, while an edge represents a relationship, between two nodes, in the perturbed network. The value of an edge represents a quantitative strength of how two nodes are related. The entire framework of our method is illustrated in Fig. 1.
We identified mutational profiles of members of the WNT beta-catenin signaling pathway, including MAPK, PI3K, TGF-β, and p53 pathways, when comparing our three patients to the TCGA-matched samples (Fig. 2). In Fig. 2, along with mutational occurrences of our three patients, we juxtaposed frequencies of genetic alterations of a non-hypermutated group (defined as a mutation rate of <8.24 per 10 6 base pairs) 11 from the TCGA COlorectal ADenocarcinoma (COAD) report in 2012 11 . While only one of our three patient samples had a somatic APC mutation, the TCGA-matched samples all showed mutated APC (81% of the non-hypermutated group had APC alterations) in the WNT/beta-catenin signaling pathway (Fig. 2). Although all three of our patients had TP53 mutations, only 60% of the TCGA-matched samples had these (59% of non-hypermutated group showed TP53 alteration) (Fig. 2). For PIK3CA (a member of the PI3K signaling pathway), one of the three, and 40% of the TCGA-matched samples, had mutations, although only 15% were found altered in the non-hypermutated group (Fig. 2). The DNA double-strand break repair enzyme gene, ATM, was not mutated among our patients, although 20% of the TCGA-matched samples did possess ATM mutations, as did 7% of the non-hypermutated samples 11 (Fig. 2). Differential correlation changes, of mutation statuses, among genes, in our three patients. The apoptosis signaling pathway was a common functional context among our three patient ovarian primary tumors, when we constructed three perturbed networks, and then merged these into one network (Fig. 3a). These assessments revealed that apoptosis-related gene correlation coefficients of mutational statuses (between two neighboring genes), were changed significantly in three of our patients, #5, #8, and #9. In all our patients, the neurofilament peptide gene NEFH (not mutated in the network of the TCGA-matched samples) correlated with genes belonging to the apoptosis process (Fig. 3b). For patient #5, IL13RA2 and ROCK1 (a Rho kinase) negatively correlated with NEFH, in terms of their mutation statuses. ROCK1 was also a crosstalk gene between the apoptosis and mitotic spindle pathways. For patient #8, GSN (gelsolin), which facilitates crosstalk between the apoptosis and coagulation pathways, negatively correlated with NEFH (Fig. 3b). For patient #9, ERBB2, BCL2L10, PDCD4, and PACRG, all showed negative correlations with NEFH (Fig. 3b). ERBB2, an oncogene, and BCL2L10, belonging to the apoptosis pathway, positively correlated with each other and PDCD4, of the estrogen response late pathway (Fig. 3b). Since patient #9 had HER2 amplification, its crosstalk genes (PDCD4, BCL2L10, NEFH) should be later examined for correlation with ERBB2, and possible clinical associations. Changes in gene correlation are described in Table 2.  Table 1. Clinico-pathological data of the three patients. Figure 1. Framework of building sample-specific networks. (a) We built an association network (a "reference network"), using mutation information from TCGA-matched samples. Each of our samples was compared to the reference network, to build another association network of the sample, i.e., a "perturbed network". We then used a Pearson correlation-based statistical test to identify differentially correlated edges between the "reference network" and the "perturbed network". We applied this process to all patients' somatic mutation sample datasets. www.nature.com/scientificreports www.nature.com/scientificreports/ Figure 2. Diversity and frequency of genetic changes in our patients. APC was reported highly altered in the TCGA COAD report 11 and we sought similar mutation ratios in the TCGA-matched samples. However, in our patients, only patient #5 had an APC mutation. All our patients had mutated TP53, which was mutated in only 60% of the TCGA-matched samples. Each gene has two box rows. In the upper box row, the left cell indicates a percentage of mutation observed from the TCGA-matched samples, and the right cell represents a percentage of alterations reported in TCGA COAD. A gray-filled box means a missing value. In the lower box row, three cells represent mutation statuses of patients #5, #8, and #9, from left to right (red cells = mutation; white cells = wild type).  Table S1). Consequently, we built two mutational co-occurrence networks for the primary CRC and metastasized ovarian tumors, to compare their topological configurations. We hence observed a network structural similarity between the two networks, based on single nucleotide variations (SNVs, Supplementary Table S1). Overall, significant genes, and their correlations, were preserved in the two mutational co-occurrence networks, although there were changes of neighboring partners or correlational statuses ( Supplementary Fig. S2). For example, the gene pairs APC-CDC27, and CDC27-ROCK1 were positively connected in the CRC network, while these became negatively connected in the metastatic network.

Discussion
In this study, we studied the mutational landscape of three patient samples of rare ovarian colorectal (CRC) metastases, as compared to their primary CRC tumors. The mutational co-occurrences of our three samples showed different mutational co-occurrences, both in age-/tumor stage-matched samples in the TCGA, and in a non-hypermutated group, a TCGA COlorectal ADenocarcinoma (COAD) report from 2012 11 . All three patients had TP53 mutations, which were present in only 60% of the TCGA-matched and non-hypermutated patient samples. For the CRC-causing gene APC, in the WNT/beta-catenin signaling pathway, all the TCGA-matched, and 81% of non-hypermutated, samples, had mutations, but only one of our three ovarian metastasis-suffering patients had somatic APC mutations.
We also detected significant correlation changes of mutations statuses of genes, belonging to apoptosis, in all three SSNs derived from our patients' samples, when we applied a statistical method to identify significant differential correlation changes. Another oft-present gene, NEFH, encodes a heavy polypeptide that is assembled into neurofilaments 13 , the main cytoskeletal components of a mature neuron 14 . Aberrations in neurofilaments have been implicated in numerous neurological diseases [15][16][17] , although NEFH has been little studied in cancer 18,19 , where it was observed in human autonomic nerve tumors and central neurocytomas [20][21][22] . Hypermethylated NEFH may also confer poor prognosis in breast cancer 23 , and its dysregulation has been observed in lung, prostate, and renal cancers [24][25][26] . Our three patients showed negative correlations in NEFH with six genes, PDCD4, ERBB2, BCL2L10, ROCK1, IL3RA2, and GSN, in the hallmark apoptosis gene set (Fig. 3b). NEFH, however, had no relation with these six genes in the reference network.
In the independent targeted-gene panel sequencing analysis ( Supplementary Fig. S1b), NEFH mutation statuses directly associated with the oncogene ERBB2 mutation statuses, which was consistent with our dataset. NEFH mutation statuses also associated with CTNNB1 mutation statuses, a member of the WNT oncogenic signaling cascade, through ERBB2 and SMAD7 (Supplementary Fig. 1b; patterns 2 and 6). NEFH also associated with  Table 2. Correlation coefficient changes in sample-specific networks. W AB and ΔW AB are described in detail in the Methods section. We also added the relevant hallmark gene sets for an interaction of two genes. The p-value was obtained from the Z-value that showed a ΔW AB of an edge was significant statistically. a Pearson correlation coefficient between two gene pair in a "reference network". b Degree of Pearson correlation coefficient change between two gene pair in a "perturbed network". www.nature.com/scientificreports www.nature.com/scientificreports/ APC (another WNT pathway member, upstream of CTNNB1), through GSN, in our dataset analysis (Fig. 3b), Thus, these findings might implicate NEFH in WNT signaling in cancer 19,27,28 , even as NEFH's role in cancer is yet to be discovered. Recent studies 19,27,28 have reported NEFH protein to be linked to Akt/beta-catenin signaling 27,28 , as well as a tumor suppressor role 19 . Due to mutation status changes of the WNT genes (APC and CTNNB1), NEFH might lose its tumor suppressor role, leading to apoptosis dysregulation.
Metastasis is a complex, multi-step process that leads to the accumulation of genomic alterations in primary tumor cells, during malignancy progression, from the primary tumor, to localized invasion, to metastasis 29,30 . Genomic changes, and characteristics of metastatic-stage tumors, are complex and only partially understood. Chemotherapy-naïve primary tumors, and their paired metastases, frequently share many altered genes at both sites, especially in liver metastasis 31 . However, emerging evidence suggests that treatment of colon cancer patients with anti-EGFR agents increases intra-tumor heterogeneity, leading to the emergence of clones with different genetic alterations 32 . To assess this, we used chemo-naïve synchronous colon cancer-to-ovary metastasis samples. Also, considering well-conserved network configurations between co-occurrence networks of the primary CRC and the metastasized ovarian networks (Supplementary Fig. S2), our systematic approach discovered that the paired primary CRC and metastasized ovarian tumors share many altered genes and less heterogeneity.
Colon cancer with ovarian metastasis is less responsive to chemotherapy, compared to non-ovarian metastases 5,6 . Likewise, we observed no tumor response of ovarian metastases 5 , whereas the tumor regression rate for non-ovarian metastatic lesions was over 65%, in a series of twenty-two patients with colon cancer ovarian metastases 5 . Based on these reports, the ovary has been regarded as a "sanctuary" organ for resistance to systemic chemotherapy. Patient #5 has now survived over four years. Thus, the underlying mechanism of chemoresistance needs to be determined, including several tumor features such as volume, cystic properties, and mucin-richness, as these may contribute to poor sensitivity to chemotherapy 6 .
The observation of HER2 (ERBB2) amplification in one tumor can have major impact on treatment strategies in one patient, and for the patient #9, after 1 cycle of lapatinib, the patient was switched to palliative care, due to disease progression. Although the majority of metastatic CRC patients had HER2 (ERBB2) amplification 33 , our study only considered mutations, not amplifications, and moreover HER2 amplification in one of our patients might be insufficient to explain biology in ovarian metastasis.
In summary, although ovarian metastasis from the colon is rare 9 , precluding (due to insufficient sample sizes) most statistical methods that merely identify common mutations, we successfully applied network biology to this disease. The result of that analysis was that these patients all shared a common biological function, i.e., a perturbed network involving the process of apoptosis. We strongly assert that such approaches will allow analyses of similarly uncommon maladies, facilitating increased understanding of rare disease pathogenesis.

Materials and Methods
Patient samples. Three patients who were pathologically diagnosed with colon cancer, with ovary metastases, were included in this study. All subjects were recruited from Gachon Gil Medical Center (Gachon University, South Korea), and the study was approved by the Institutional Review Board of Gachon University Gil Medical Center (IRB#GIRBA2216). Written informed consents were obtained from the patients, in accordance with institutional guidelines. All methods were performed in accordance with the relevant guidelines and regulation. These patients were chemo-naive at the time of primary colon surgery, and were diagnosed with ovary metastasis during the first operation.
Patient #5. A 54-year old woman, presenting with ongoing lower abdominal pain, had obstructing distal sigmoid colon cancer, with synchronous right ovary metastasis and multiple lung metastases. The patient's initial cancer embryonic antigen (CEA) level was 85.31 ng/mL, and she underwent palliative anterior resection, with bilateral oophorectomy, when the primary tumor and ovary metastases were collected and freshly frozen. The pathologic features were as follows: wild type KRAS, T4a lesion, lymphovasular invasion, perineural invasion, and 9 of 20 lymph node metastases with negative resection margins. The patient underwent palliative chemotherapy with FOLFIRI (leucovorin calcium, 5-fluorouracil, and irinotecan) and cetuximab. Currently, the patient is in therapeutic remission for stable lung metastasis.
Patient #8. A 52-year-old female presented with a 3.5-cm sigmoid colon cancer with 10-cm sized right Krukenburg's tumor, and multiple lung metastases. The patient underwent palliative low anterior resection, with total hysterectomy and bilateral oophrectomy (tissue procurement of the primary tumor and ovarian metastasis). The pathologic features were as follows: adenocarcinoma, wild type KRAS, moderate differentiation, direct invasion of the uterine myometrium (T4b), lymphovascular invasion, perineural invasion, and 20 positively metastatic of 60 lymph nodes dissected. Thirteen months later, the patient developed a small recurrent tumor in her lumbar spine, and underwent radiation therapy (3600 cGy from 300 cGy x 12 fractions), and 12 cycles of oxaliplatin-based chemotherapy (FOLFOX6) thereafter. Shortly after chemotherapy, lung metastasis progressed. The patient still has slowly progressive, multiple lung metastases refractory to cetuximab and bevacizumab. The patient is still alive, four years after the initial surgery.
Patient #9. A 51-year old female presented with obstructing sigmoid colon cancer, with occult ovarian metastasis, and underwent anterior resection, with intraoperative bowel lavage with simultaneous bilateral oophorectomy. Pathologic examination revealed metastases in both ovaries, with wild-type KRAS and HER2 amplification. After 12 cycles of oxaliplatin-based chemotherapy, the regimen was switched to bevacizumab plus irinotecan-based chemotherapy. Shortly after 1 cycle of lapatinib, the patient was switched to palliative care, due to disease progression. (2019) 9:16990 | https://doi.org/10.1038/s41598-019-53182-6 www.nature.com/scientificreports www.nature.com/scientificreports/ Clinicopathological features of the three patients are listed in Table 1, and surgical specimens from the three above-described patients, their corresponding abdominopelvic CT scans, and histologies, are shown in Fig. 4. Whole-exome sequencing. We used exome sequencing, at 200X coverage, to profile somatic mutations in primary colon cancers, and synchronous ovary metastases.
DNA preparation. Genomic DNA was extracted from tumors and normal tissues, using a QIAamp DNA mini kit (Qiagen Inc, Valencia, CA, USA), according to the manufacturer's protocol. DNA quality and quantity were assessed using a Nanodrop spectrometer (Nanodrop Technologies, Wilmington, DE, USA) and a Qubit fluorometer (Life Technologies, Invitrogen, Carlsbad, CA, USA).
Experimental DNA sequencing. Whole-exome sequencing was performed using SureSelect Human All Exon V5 51 Mb (Agilent, Santa Clara, CA, USA), following the manufacturer's standard protocol. A paired-end DNA sequencing library was prepared through genomic DNA shearing, use a Covaris ™ (Woburn, MA, USA) sonicator, followed by peak detection, end repair, poly A-tailing, paired-end adaptor ligation, and amplification. After hybridization of the library with bait sequences for 24 hours, the captured library was purified and amplified with an index barcode tag, and the library quality and quantity were determined. Sequencing of the exome library was carried out using the 100-bp paired-end mode of the HiSeq SBS kit (Illumina, San Diego, CA).
Whole-exome sequencing processing and alignment. Sequence reads in FASTQ format were mapped to the human assembly UCSC hg19, using the Burrows-Wheeler Aligner (BWA, v0.7.7) 34 with 'mem' and seed value parameters '-k 45' , to create SAM files with correct mate pair information, with a read group tag that included the sample name. We then used Picard (v1.92) to convert SAM files to compressed BAM files, and sorted the BAM files by chromosome coordinates. The Genome Analysis Toolkit (v2.3.9Lite) 35 was used to locally realign the BAM files at intervals that may have insertion/deletion (indel) alignment errors. Somatic mutations and indels were "called" with Mutect 36 and GATK Somatic Indel Detector (Broad Institute), respectively. Single nucleotide variants and indels were annotated using snpEff (v4.1a) 37 to classify variants as synonymous, non-synonymous, missense, or frameshift point mutations and frameshift indels. We provide all variants detected in primary CRC tumors (single nucleotide variants in Supplementary Table S3 and indels in Supplementary Table S4) and metastasized ovarian tumors (single nucleotide variants in Supplementary Table S5 and indels in Supplementary Table S6).

TCGA sample selection for reference network construction. Dataset from The Cancer Atlas Genome
(TCGA). We downloaded a TCGA primary colon cancer 11 mutation dataset from the UCSC Cancer Genomics Browser 38 (last accessed on 02-21-2018), namely, the version of TCGA COAD mutation database 2015-02-24. From this mutation dataset, we selected five samples under the following conditions; female, age at initial pathologic diagnosis between 46 and 59 years-old, MSI, and tumor stage IV (IVA and IVB included). The age range of the five samples was approximately close to that of our three ovarian metastasis-bearing patients. These samples were referred to as "TCGA-matched samples" throughout the manuscript. Supplementary Table S2 lists the clinicomolecular characteristics of the TCGA-matched samples selected for the reference network construction.
Mutational co-occurrence network. We calculated a correlation coefficient between mutation statuses of patients for each gene pair (genes A and B) to build mutational co-occurrence profiles, using Pearson's correlation coefficient (PCC).
where cov is a covariance, and σ is a standard deviation. We then constructed a reference network of mutational co-occurrences in the TCGA-matched samples, with Pearson correlation coefficient (PCC) cutoff criteria; |PCC| > 0.46 and p-value < 0.05, according to Green et al. 39 . Correlation states refer to positive, negative, and noncorrelation. A PCC > 0.46 was considered a positive correlation, and a PCC < −0.46 a negative correlation. A PCC between −0.46 and 0.46 was considered noncorrelation. Genes of interest in calculating PCCs focused on the hallmark gene sets provided by MIT MSigDB 40 .
Detecting significant correlation changes between two genes. For SSN construction for our individual samples, Lie et al. 10 , showed that the statistical significance of an edge, (correlation of mutation statuses) between two genes (equivalently, nodes; say, nodes A and B), could be verified by the Z-test, and the Z-value could be defined as below: where W AB is a PCC value (for the TCGA-matched samples) of an edge between nodes A and B; X AB is a PCC value between nodes A and B, when the TCGA-matched samples plus our individual samples were combined; ΔW AB is X AB minus W AB .; and n is a number of samples used to build the reference network. Edges in associated networks were selected from those correlation values that changed significantly in a statistical test. We only considered gene pairs with one type of association change (ΔW AB ), gain-of-correlation, which indicates noncorrelation of W AB changed to either positive or negative correlation, of X AB . We did not consider the case where correlation direction (e.g., positive, negative) of W AB was not changed in X AB , or was changed to noncorrelation.
Targeted gene sequencing dataset of colorectal cancer patients metastasized to ovaries. To evaluate the consistency of the functional contexts of our dataset, we evaluated another colorectal cancer dataset metastasized to ovaries 12 . This dataset was a targeted gene sequencing dataset using its own custom-made