Allele-specific expression of mutated in colorectal cancer (MCC) gene and alternative susceptibility to colorectal cancer in schizophrenia

Evidence has indicated that the incidence of colorectal cancer (CRC) among patients with schizophrenia is lower than in the general population. To explore this potential protective effect, we employed an innovative strategy combining an association study with allele-specific expression (ASE) analysis of the MCC gene. We first genotyped four polymorphisms within MCC in 312 CRC patients, 270 schizophrenia patients and 270 controls. Using the MassArray technique, we then performed ASE measurements in a second sample series consisting of 50 sporadic CRC patients, 50 schizophrenia patients and 52 controls. Rs2227947 showed significant differences between schizophrenia cases and controls, and haplotype analysis revealed significant differences among the three subject groups. ASE values of rs2227948 and rs2227947 consistently differed between CRC (or schizophrenia) patients and controls. Of the three groups, the highest frequencies of ASE in MCC were consistently found in the CRC group, whereas the lowest frequencies of ASE were observed in the schizophrenia group. Similar trends were confirmed in both haplotype frequencies and ASE frequencies (i.e. CRC > control > schizophrenia). We provide a first indication that MCC might confer alternative genetic susceptibility to CRC in individuals with schizophrenia, promising to shed more light on the relationship between schizophrenia and cancer progression.


Results
Allele and genotype distributions of the genetic variants within MCC gene. Genotype distributions of all four polymorphisms showed no significant deviations from Hardy-Weinberg equilibrium in any of the three groups in the first sample series (312 CRC, colorectal cancer patients; 270 SZ, schizophrenia patients; 270 NC, normal controls). Data for all the markers assayed are summarized in Table 1 for each pair of groups. The allele frequencies of rs2227947 differed significantly between SZ and NC (p = 0.005; p = 0.020 after the FDR correction) (Table 1). The T allele and TT genotype of rs2227947 were significantly more common in the schizophrenia group than in the control group (allele, 71.6% versus 63.4%, OR 0.69, 95% CI 0.53-0.89; genotype, 51.5% versus 42.1%). We further assessed the allele frequencies of the four markers in the second sample series, which consisted of 50 CRC, 50 SZ, and 52 NC (Table 2). The distribution of the allele frequencies was similar in the two sample series, and rs2227947 again showed significant differences in allele frequencies between SZ and NC in the second sample set (p = 0.039; p > 0.05 after the FDR correction) (Table 2).
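The corrected p values quoted above are consistent with a step-up FDR adjustment over the four markers. The following is a minimal sketch, assuming the Benjamini-Hochberg procedure (the text names only "the FDR correction") and using hypothetical placeholder p values for the three markers whose values are not quoted here; the function name `bh_adjust` is illustrative.

```python
def bh_adjust(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Only the last value in each list (rs2227947, SZ vs NC) comes from the text;
# the other three are hypothetical placeholders for the remaining markers.
first_series = bh_adjust([0.30, 0.50, 0.80, 0.005])
second_series = bh_adjust([0.40, 0.60, 0.90, 0.039])

print(round(first_series[3], 3))   # adjusted p for rs2227947, first series
print(round(second_series[3], 3))  # adjusted p for rs2227947, second series
```

Under these assumptions the smallest of four p values is simply multiplied by four, which reproduces 0.005 becoming 0.020 in the first series and 0.039 exceeding 0.05 in the second.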

LD estimation and haplotype analysis.
For each pair of markers, we used SHEsis to calculate LD between groups (SZ vs NC, CRC vs NC, and CRC vs SZ) (Table 3). All four markers were in strong LD (D' > 0.7), and we therefore estimated the haplotype distributions with these markers among the three independent groups. Haplotypes were omitted from the analysis if their estimated frequencies were less than 3% in any of the three groups. Haplotypes that showed positive results were selected for presentation (Table 4). Moreover, haplotype analysis yielded several significant global p values (Table 5). The haplotype rs9122-rs2227948 was the most significant, giving a global p = 0.0002 between CRC and SZ. As its frequency was greater in CRC than in SZ (or NC), the haplotype G-C-T (rs9122-rs2227948-rs2227947) was associated with an increased odds ratio for CRC (CRC vs NC: p = 0.018, OR = 2.70, 95% CI 1.07-6.81; CRC vs SZ: p = 0.001, OR = 3.10, 95% CI 0.98-9.83), and the same pattern was observed for G-C (rs9122-rs2227948), G-T-C (rs9122-rs2112452-rs2227948), and G-T-C-T (rs9122-rs2112452-rs2227948-rs2227947).

ASE measurements of individual SNPs and the MCC gene.
Using a second sample set comprising 50 CRC patients (including cancerous and normal tissues), 50 schizophrenia patients and 52 normal controls, we studied ASE for both individual SNPs (rs9122, rs2112452, rs2227948 and rs2227947) and for the whole MCC gene (Figs 1 and 2; A, cancerous tissue; B, normal tissue from the same CRC patient; S, schizophrenia patient; N, normal control). The Wilcoxon rank sum test of rs2227948 showed significant differences between A and N, and between S and N (A vs N, p = 0.0374; S vs N, p = 0.0112). For rs2227947, significant differences were observed between B and N and between S and N (B vs N, p = 0.0228; S vs N, p = 0.0384) (Table 6).
Assessment of ASE imbalances between the three subject groups. We further tested the overall diagnostic accuracy of varied cut-off points using the ROC analysis and Youden's index as described previously 18 . Taking rs2227948 as an example, the final ASE cut-off point was achieved by maximizing Youden's index at an ASE value of 1.2 (Table 7), which means a 17% difference in expression level between the two alleles. Consequently, all the heterozygous individuals were divided into ASE group (ASE ≥ 1.2 or ASE ≤ 0.83) and non-ASE group (0.83 < ASE < 1.2).
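The grouping rule just described can be written down directly. This is a minimal sketch using the thresholds quoted in the text (1.2 and 0.83); the function name `classify_ase` is illustrative and not taken from the study.

```python
def classify_ase(ase_ratio, upper=1.2, lower=0.83):
    """Assign a heterozygous individual to the 'ASE' or 'non-ASE' group.

    upper and lower are the cut-off values quoted in the text for rs2227948.
    """
    if ase_ratio <= 0:
        raise ValueError("ASE ratio must be positive")
    if ase_ratio >= upper or ase_ratio <= lower:
        return "ASE"
    return "non-ASE"


# A ratio of 1.2 means the weaker allele is expressed at 1/1.2 of the stronger
# one, i.e. roughly 17% lower.
relative_difference = 1 - 1 / 1.2

print(classify_ase(1.35), classify_ase(1.0), classify_ase(0.52))
print(round(relative_difference * 100))
```

The lower bound 0.83 is approximately the reciprocal of 1.2, so the rule treats over-expression of either allele symmetrically.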
As all subjects involved in the present study belonged to either the ASE or non-ASE group, we compared ASE or non-ASE frequencies between CRC patients, schizophrenia patients and normal controls using the χ2 test or Fisher's exact test (Table 8). With respect to informative individuals, MCC and all the individual markers except rs2112452 showed notable ASE imbalances (i.e. discrepancies in ASE frequencies) between the CRC group and the control (or schizophrenia) group; for rs2112452, a significant discrepancy in ASE frequencies was found between CRC subjects and schizophrenia subjects. There were no statistically significant ASE imbalances between cancerous tissue and normal tissue from the same CRC patient, or between the schizophrenia group and the control group. When ASE assessment was performed in either informative subjects or all subjects (i.e. informative and non-informative subjects), the results for both the MCC gene and the individual polymorphisms showed consistently higher ASE frequencies in the CRC group (A or B) than in the schizophrenia group (S) or the control group (N). Although the differences in ASE frequencies between the schizophrenia group and the control group were not statistically significant, lower frequencies of ASE were consistently observed in the schizophrenia group (S) than in the control group (N) (Fig. 3).
Scientific Reports | 6:26688 | DOI: 10.1038/srep26688

Discussion
In an earlier study, we found that shared mechanisms underpinning cell cycle regulation and synaptic plasticity provide further support for the association between schizophrenia and colorectal cancer 23. Evidence that MCC is heavily involved in the negative regulation of the cell cycle 9, along with the other evidence described above, led us to conduct a genetic analysis of the MCC gene among paired groups (CRC vs schizophrenia, CRC vs control, schizophrenia vs control), using a strategy combining association studies with ASE measurements. We first conducted the association analysis by genotyping four SNPs, all of which were selected from the HapMap project database (http://www.hapmap.org) and dbSNP (http://www.ncbi.nlm.nih.gov/SNP/). The data, based on 852 Han Chinese subjects, provide an indication that MCC might be involved in the alternative genetic susceptibility to CRC in individuals with schizophrenia. There were statistically significant differences in allele frequencies between SZ (schizophrenia patients) and NC (normal controls) at rs2227947 that survived the FDR correction. The T allele and TT genotype of rs2227947 were more frequent in SZ than in NC. Additionally, we observed a similar distribution of allele frequencies when comparing the four markers between the first and second sample series (Tables 1 & 2). Since haplotypes constructed from closely located polymorphisms typically increase the statistical power of association studies, we performed haplotype analysis on the four markers, which were in strong linkage disequilibrium (D' > 0.7). Our results indicated that rs9122-rs2227948 and rs9122-rs2112452-rs2227948 showed consistently significant differences in global frequencies between CRC (colorectal cancer patients) and NC or between CRC and SZ (Table 5).
In addition, we found that four haplotypes, namely G-C (rs9122-rs2227948), G-T-C (rs9122-rs2112452-rs2227948), G-C-T (rs9122-rs2227948-rs2227947) and G-T-C-T (rs9122-rs2112452-rs2227948-rs2227947), were more common in CRC than in SZ or NC (Table 4). In particular, we observed a trend in haplotype frequencies, i.e. CRC > NC > SZ, which implied that MCC might confer an alternative genetic susceptibility to CRC among SZ. Because the extent of ASE for susceptibility genes might be tissue-dependent, we consulted the TiGER database to study tissue-specific gene expression levels. As shown in Fig. 4, no significant discrepancy was observed in MCC gene expression among brain tissue, colon tissue and peripheral blood, which supports the feasibility of performing ASE detection with the blood samples of schizophrenia cases and normal controls. We also tested the accuracy of the MassArray platform in ASE analysis by mixing experiments. A consistent linear correlation was found at each of the four SNPs (rs9122, rs2112452, rs2227948 and rs2227947) between the input of the two allelic variants and the resulting ratio (Fig. 5).
Our findings provide a unique perspective on genetic protection against CRC in patients with schizophrenia which might involve the MCC gene. For the heterozygous samples, ASE values of rs9122, rs2112452, rs2227948     (Fig. 1), and that of the MCC gene ranged from 0.52 to 3.79 (Fig. 2). Moreover, we were able to ensure the accuracy and repeatability of the ASE analysis based on our data by calculating each single ASE value as the average of four different ratios. We applied the Wilcoxon test to study the ASE degree of both individual SNPs and MCC, and found that the analysis of rs2227948 and rs2227947 reported concordantly positive results, implying a difference in ASE degree existed between CRC and NC, or between SZ and NC ( Table 6). Both rs2227948 and rs2227947 are synonymous markers located in the coding region of the MCC gene, and it has been implied that synonymous SNPs may play a role in regulating mRNA secondary structure and stability, and exert downstream effects on the rate of translation, folding and post-translational modifications of nascent polypeptides 24,25 . On the other hand, ASE of synonymous SNPs has been documented in recent studies 26,27 , and since ASE is regarded as a molecular mechanism capable of modulating gene expression, the differences in the gene's expression caused by ASE may further facilitate the impact of synonymous markers on gene functions, which therefore might contribute to the pathological conditions. The genomic mechanisms causing ASE have been indicated to be a combined cis and trans effect 19 . In the present study, even if there were no statistically significant differences in the allele frequencies of rs2227948 between CRC (or schizophrenia) and control group, the discrepancies in the ASE values of rs2227948 were still observed between CRC (or schizophrenia) patients and controls, which might be due to a collaboration between genomic variants in cis and trans under pathological conditions. 
On the other hand, it has been demonstrated that ASE segregates not only with the phenotypes but also with the haplotypes covering all or part of the gene 19. Indeed, we found that the haplotypes G-C-T (rs9122-rs2227948-rs2227947) and G-T-C-T (rs9122-rs2112452-rs2227948-rs2227947) were more common in the CRC group than in the schizophrenia or control group (Table 4), and similar trends were also observed in the ASE frequencies of both rs2227948 and rs2227947 (Table 8), further supporting the indication that genomic changes in cis are present.
Using ROC analysis and Youden's index, we determined the final ASE cut-off points, and all participants were thus assigned to either the ASE or the non-ASE group 18. With respect to MCC and all the single markers, ASE imbalances were notable between CRC and NC, or between CRC and SZ, and the frequencies of ASE were correspondingly higher in CRC than in SZ or NC. Additionally, the frequencies of ASE were consistently lower in SZ than in NC (Table 8). A trend similar to that observed in haplotype frequencies was also found in ASE frequencies, i.e. CRC > NC > SZ (Fig. 3). In addition, our results showed no obvious ASE imbalances in cancerous tissues in comparison with normal tissues from the same CRC patients. We further assessed the ASE imbalances by including non-informative individuals, and observed a similar pattern (Table 8). Of note, ASE analysis is more complex for the MCC gene than for individual markers, since it is more difficult to determine whether an individual is informative (heterozygous for a transcribed SNP) for the whole gene than for a single SNP. When only informative individuals were taken into account, ASE of MCC occurred in 14/27 (51.9%) CRC subjects, 1/23 (4.3%) schizophrenia subjects, and 6/25 (24.0%) controls. If none of the non-informative individuals had ASE, the ASE frequencies for MCC would be 14/50 (28.0%) in the CRC group, 1/50 (2.0%) in the schizophrenia group, and 6/52 (11.5%) in the control group. Since not all individuals are informative for the MCC gene, more coding/UTR markers will be needed to assess the ASE frequencies of MCC precisely among the three independent groups. Overall, the observations suggest that ASE might be involved in the complex relationship between schizophrenia and CRC tumorigenesis. ASE analysis has provided an effective way to explore the impact of genetic variations on gene expression 28.
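The MCC frequencies quoted above follow directly from the counts, and the same counts can be fed into a 2x2 exact test. The sketch below recomputes the percentages and includes a small pure-Python two-sided Fisher's exact test; it is an illustration only (the study reports using R), the function names are hypothetical, and no p value from Table 8 is reproduced here.

```python
from math import comb


def percent(k, n):
    """Frequency k/n as a percentage rounded to one decimal place."""
    return round(100 * k / n, 1)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum every table that is no more probable than the observed one.
    return sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))


# (individuals with ASE of MCC, informative individuals, all individuals), from the text
counts = {"CRC": (14, 27, 50), "SZ": (1, 23, 50), "NC": (6, 25, 52)}

informative = {g: percent(k, n_inf) for g, (k, n_inf, _) in counts.items()}
all_subjects = {g: percent(k, n_all) for g, (k, _, n_all) in counts.items()}

# CRC vs SZ among informative individuals: rows are groups, columns ASE / non-ASE
p_crc_vs_sz = fisher_exact_two_sided(14, 27 - 14, 1, 23 - 1)

print(informative)
print(all_subjects)
```

Both denominators preserve the ordering CRC > NC > SZ, which is the point the paragraph makes about including non-informative individuals.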
Heterozygous individuals were tested for allelic transcript levels that differed from each other, using the allelic ratios of gDNA as a control representing a 1:1 hybridization intensity 29. Because the mRNA expression levels of the two alleles of a heterozygous SNP are captured in the same cellular environment within the same sample, the alternative alleles serve as within-sample controls for each other. This eliminates genetic background and environmental influences, as well as technical noise that affects both alleles equally, and makes the ASE assay more reliable for detecting significant differences 30,31. Moreover, allele-specific expression analysis has been applied to identifying eQTLs (expression quantitative trait loci), which measure gene expression discrepancies among individuals with different genotypes, and can be integrated to facilitate the mapping of likely regulatory variants 29.
A better understanding of the source of ASE imbalances will be necessary to clarify the mechanisms underlying phenotypic diversity and disease susceptibility. It is worth noting that, as a common phenomenon found in humans, mice and maize, ASE exerts its impact on both normal development and many cellular processes, but if impaired, can lead to an increased risk of disease, as indicated in the present study 32-35. Though the allelic imbalances we describe here can be extremely subtle, they may have expanding or cumulative influences on downstream signaling pathways. With the present limitations, we are still unable to clarify why allelic imbalances were more common in CRC patients, and the cause of ASE remains unresolved; it may be due to cis or trans effects, or to a combined cis and trans effect 19. Moreover, it remains unclear how small ASE imbalances could cause a phenotype. There were some limitations in our study. Firstly, our sample size is relatively small. Because of the differences in genotype frequencies among CRC patients, schizophrenia patients and normal controls, and also because of the preponderance of individuals homozygous for the risk alleles among cases, it is difficult to obtain a balanced representation of all genotypes and a desirable sample size. Secondly, the four SNPs we selected could not cover the whole region of MCC, so additional replication studies using more SNPs in large Asian and non-Asian samples are needed. Finally, for obvious reasons, it was not possible to obtain gut tissue samples from SZ and NC, and we had to resort to blood samples from schizophrenia subjects and controls.
We have provided a first indication that ASE, as an inherited gene expression marker in MCC, is more frequently found in CRC patients than in schizophrenia patients and healthy controls, and the similar trends observed in both haplotype frequencies and ASE frequencies (i.e. CRC > NC > SZ) imply that MCC might be involved in the alternative genetic susceptibility to CRC among schizophrenia patients. Our present work and, hopefully, follow-up studies should together provide new insights into the relationship between schizophrenia and cancer progression (the abstract of our study was published at a scientific meeting). Advances in ASE analysis will no doubt shed more light on the etiopathogenesis of complex human diseases.

Methods
Participants with sporadic CRC had all undergone curative surgery in the Ruijin Hospital, Shanghai. Cancerous and normal tissue (>10 cm) samples from the same CRC patient were immediately frozen in liquid nitrogen at the time of collection and were used for ASE analysis. Pathologic tumor staging was classified according to Dukes' criteria. For ASE measurements, peripheral blood samples were collected from schizophrenia patients and normal controls.
All the schizophrenia subjects were interviewed by two independent clinicians, and were diagnosed strictly according to the DSM-III-R criteria and hospital case notes. Normal controls with no personal history of cancer or psychiatric disorders were selected randomly from the population. All the schizophrenia patients and normal controls were from Shanghai. All participants provided signed informed consent. All experimental protocols were reviewed and approved by the Ethics Committee of the Human Genetics Center in Shanghai. All experiments were carried out in accordance with the standard procedures.
Nucleic acid extraction and genotyping. Peripheral blood was drawn from schizophrenia cases and normal controls and was collected in EDTA tubes. Genomic DNA and RNA were isolated from peripheral blood using the phenol-chloroform method and the QIAamp RNA blood mini kit (Qiagen, Valencia, CA), respectively. DNA was extracted from tissue by proteinase K digestion followed by a standard phenol-chloroform procedure. Tissue samples for total RNA extraction were processed with TRIzol reagent (Invitrogen, Carlsbad, CA) and were then treated with DNase I (DNA-free™, Ambion, Austin, TX) prior to cDNA synthesis using the SuperScript™ III First-Strand Synthesis System for RT-PCR (Invitrogen, Carlsbad, CA). All four genetic markers (rs9122, rs2112452, rs2227948 and rs2227947) were selected from the HapMap project database (http://www.hapmap.org) and dbSNP (http://www.ncbi.nlm.nih.gov/SNP/), covering the MCC region. Only coding/UTR markers with a high minor allele frequency (MAF > 15%) in the Han Chinese population were included in the present study. Of the four markers, rs9122 and rs2112452 are located in the 3′-UTR, whereas rs2227948 and rs2227947 are located in the coding region and are both synonymous markers. We performed standard 5 μl PCR using the TaqMan® Universal PCR Master Mix (Applied Biosystems) reagent kit, and genotyped all the markers using the ABI 7900 DNA detection system (Applied Biosystems, Foster City, California).

Assessment of tissue-specific expression and MassArray technique.
We interrogated gene expression differences among brain tissue, colon tissue and peripheral blood taking advantage of TiGER (Tissue-specific Gene Expression and Regulation, http://bioinfo.wilmer.jhu.edu/tiger/) database 36 , which integrates tissue-specific gene expression profiles or expressed sequence tag data, cis-regulatory module data and combinatorial gene regulation data.
In addition, we used mixing experiments to assess the accuracy of MassArray technology in ASE measurements. DNA homozygous for one allele was first mixed in known proportions with DNA homozygous for the other allele 37 and we were thus able to confirm whether the peak strengths of the two alleles were comparable at different proportions of the two alleles.
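The mixing experiment amounts to correlating expected allele proportions with the proportions recovered from the instrument. Below is a minimal sketch of that check with a hand-written Pearson correlation; the mixing proportions and observed values are hypothetical illustration data, not measurements from the study.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical mixing series for one SNP: fraction of allele a in the input
# mixture versus the fraction estimated from the two allele peak areas.
expected = [0.10, 0.25, 0.50, 0.75, 0.90]
observed = [0.12, 0.27, 0.49, 0.73, 0.91]

r = pearson_r(expected, observed)
print(round(r, 3))
```

A coefficient close to 1 at every SNP would indicate that peak strengths track the input proportions, which is what the accuracy check is meant to confirm.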
Allele-specific expression (ASE) analysis. For informative individuals (i.e. heterozygotes), all four polymorphisms (rs9122, rs2112452, rs2227948 and rs2227947; MAF > 15%) within the MCC gene were subjected to MassArray detection in order to assess the expression discrepancies of the two alleles. We typed genomic DNA (gDNA) to identify heterozygotes for the markers described above, since ASE to date can only be readily determined in subjects that are heterozygous for cSNPs or UTR SNPs. The distribution of heterozygous individuals for each SNP is presented in Table 9.

Figure legend (mixing experiments). For each SNP of the MCC gene, the two genomic DNAs homozygous for the different alleles were mixed at known ratios (i.e. expected allele ratios). Observed allele ratios were obtained by MassArray analysis of the amplification products of each mixture. Pearson's correlation test was used for further analysis.
We performed allele quantification for each targeted SNP using the MassArray platform, which is based on MALDI-TOF mass spectrometry 38 and can capture sequence differences at the single-nucleotide level when combined with the iPLEX Gold reaction kit (Sequenom, Inc). The different mass signals corresponding to the two alleles can be detected by the highly accurate MALDI-TOF-based system. All the amplification and extension primers were designed with Assay Designer 4.0 (Sequenom, Inc).
The ASE ratio was measured as described previously 18, and was normalized using the formula: ASE ratio = cDNA (allele a expression/allele b expression)/gDNA (allele a expression/allele b expression). Allelic-expression ratios were represented by the measured peak area ratios in cDNA and gDNA. For each SNP of each heterozygous subject, ASE detection was performed with two independent cDNA preparations, each in duplicate, so that ASE was calculated as the average of four different ratios 18. For the MCC gene of each individual, the final ASE value was calculated as the average of the ASE values for all four polymorphisms.

Statistical analysis. Allelic and genotypic distributions, Hardy-Weinberg equilibrium and linkage disequilibrium (LD) were calculated on SHEsis (http://analysis2.bio-x.cn/myAnalysis.php) 39, an online user-friendly platform which integrates efficient analysis tools particularly suited to association studies. We estimated LD using D' as the standardized measure for all possible pairs of SNP loci. The program UNPHASED was used to perform haplotype analysis 40. The odds ratios (OR) and 95% confidence intervals (CI) were calculated using the unconditional maximum likelihood estimation (Wald) method and normal approximation. In addition, significant P values from the association analysis were subjected to a false discovery rate (FDR) controlling procedure to account for multiple testing 41.
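The ASE normalization defined in the ASE analysis paragraph above can be sketched in a few lines. This is an illustrative sketch only: the peak areas are hypothetical, the function names are not from the study, pairing each cDNA measurement with its own gDNA ratio is an assumption, and averaging the gene-level value over whichever per-SNP values are supplied is likewise an assumption about how non-heterozygous SNPs are handled.

```python
from statistics import mean


def ase_ratio(cdna_a, cdna_b, gdna_a, gdna_b):
    """cDNA allele ratio (a/b) normalized by the gDNA allele ratio (a/b).

    Inputs are peak areas for the two alleles in cDNA and in gDNA.
    """
    return (cdna_a / cdna_b) / (gdna_a / gdna_b)


def snp_ase(measurements):
    """Average ASE over the four measurements (two cDNA preparations x duplicates)."""
    return mean(ase_ratio(*m) for m in measurements)


def gene_ase(snp_values):
    """Gene-level ASE as the mean of the per-SNP ASE values supplied."""
    return mean(snp_values)


# Hypothetical peak areas: (cDNA allele a, cDNA allele b, gDNA allele a, gDNA allele b)
measurements = [
    (120.0, 100.0, 100.0, 100.0),
    (130.0, 100.0, 100.0, 100.0),
    (110.0, 100.0, 100.0, 100.0),
    (120.0, 100.0, 100.0, 100.0),
]

snp_value = snp_ase(measurements)
gene_value = gene_ase([snp_value, 1.0])

print(round(snp_value, 2), round(gene_value, 2))
```

Dividing by the gDNA ratio removes any allele-specific bias in amplification or detection, since the two alleles are present at 1:1 in a heterozygote's genomic DNA.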
We used the Wilcoxon rank sum test to compare ASE values among the three independent groups, namely CRC subjects, schizophrenia subjects, and controls. A total of 100,000 permutations were carried out to further compare the means of these three groups. Receiver operating characteristic (ROC) analysis and Youden's index were employed to assess the overall diagnostic accuracy of different cut-off points 18. Using R version 2.9.1, we performed the χ2 test or Fisher's exact test (where the expected cell count was less than 5) to compare ASE/non-ASE proportions between groups. The statistical significance level was set at p < 0.05.

Table 9. Distribution of informative individuals. SNP = single nucleotide polymorphism, informative individuals = heterozygous individuals.
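The cut-off selection by Youden's index mentioned in the statistical analysis can be illustrated as follows. This is a minimal sketch under stated assumptions: the ASE values and case/control labels are hypothetical, the candidate cut-offs are arbitrary, and applying each cut-off symmetrically (ratio ≥ c or ≤ 1/c) is an assumption suggested by the 1.2 and 0.83 thresholds reported in the Results; the function name is illustrative.

```python
def youden_cutoff(values, is_case, cutoffs):
    """Return (cutoff, J) maximizing Youden's J = sensitivity + specificity - 1."""
    if not any(is_case) or all(is_case):
        raise ValueError("need at least one case and one control")
    best = None
    for c in cutoffs:
        lower = 1.0 / c
        # An individual is called 'ASE' if the ratio is extreme in either direction.
        called = [v >= c or v <= lower for v in values]
        tp = sum(1 for k, y in zip(called, is_case) if k and y)
        fn = sum(1 for k, y in zip(called, is_case) if not k and y)
        tn = sum(1 for k, y in zip(called, is_case) if not k and not y)
        fp = sum(1 for k, y in zip(called, is_case) if k and not y)
        j = tp / (tp + fn) + tn / (tn + fp) - 1
        if best is None or j > best[1]:
            best = (c, j)
    return best


# Hypothetical ASE values: first four are cases, last four are controls.
values = [1.5, 1.22, 0.7, 1.05, 1.1, 0.95, 1.0, 1.15]
is_case = [True, True, True, True, False, False, False, False]

best_cutoff, best_j = youden_cutoff(values, is_case, [1.1, 1.2, 1.3, 1.5])
print(best_cutoff, best_j)
```

Scanning a grid of candidate cut-offs and keeping the one with the largest J is the simplest form of the procedure; a full ROC analysis would evaluate every distinct observed value.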