Delineating colorectal cancer distribution, interaction, and risk prediction by environmental risk factors and serum trace elements

The burden of colorectal cancer (CRC) is increasing worldwide, especially in developing countries. This phenomenon may be attributable to lifestyle, dietary and environmental risk factors. We aimed to determine the levels of 25 trace elements and their interaction with environmental risk factors, and subsequently to develop a risk prediction model for CRC (RPM CRC). For the discovery phase, we used a hospital-based case–control study (CRC and non-CRC patients), and in the validation phase we analysed pre-symptomatic samples of CRC patients from The Malaysian Cohort Biobank. Information on the environmental risk factors was obtained, and the levels of 25 trace elements were measured using the ICP-MS method. CRC patients had lower Zn and Se levels but higher Li, Be, Al, Co, Cu, As, Cd, Rb, Ba, Hg, Tl, and Pb levels compared to non-CRC patients. The positive interaction between red meat intake ≥ 50 g/day and Co ≥ 4.77 µg/L (AP 0.97; 95% CI 0.91, 1.03) doubled the risk of CRC. A panel of 24 trace elements can simultaneously and accurately predict high, moderate, and low risk of CRC (accuracy 100%, AUC 1.00). This study provides new input on the possible roles of various trace elements in CRC, as well as on using a panel of trace elements as a screening approach for CRC.


Linkage analysis of trace elements identifies disturbed clustering in colorectal cancer.
Our results showed a clear alteration of the TE levels in CRC samples compared to non-CRC samples. More correlations between the 25 TEs were observed in CRC patients (Table 1), with a clear clustering formed between CRC and non-CRC samples. The linkage analysis showed three clusters with a distance of 10-15 in the CRC group, and two clusters in the non-CRC group (Fig. 2). The essential TEs with antioxidant function, i.e., Se and Zn, clustered together in the non-CRC group but were in different clusters in the CRC group. PCA using 14 selected TEs clustered the CRC patients in a clear grouping compared to the non-CRC patients, and this clustering performed better than using the 25 TEs (Fig. 3). The variance explained by the three main components improved to 56.3% for the 14 TEs compared to 40.8% for the 25 TEs. Similarly, for the five main components it increased to 70.4% compared to 54.1% for the 25 TEs.
Serum TEs as biomarkers. The levels of the 14 significant TEs were analysed further to obtain the respective cut-off screening values for CRC (Table 2). The cut-off points obtained were compared with the values from the Agency for Toxic Substances and Disease Registry (ATSDR). From the comparison, only Be and As were within the normal range given by the ATSDR; the others were not. Most of the TE levels reported by the ATSDR were measured using AAS (atomic absorption spectrometry), and the reference values were derived mainly from the Western population. Only Be and Zn had AUC values ≥ 0.80; therefore, the subsequent analysis for determining TEs as biomarkers used the ratios of Be and Zn with the other 12 TEs. The Co/Zn ratio had the highest AUC, followed by the ratios of Be/Zn (0.86) and Rb/Zn (0.85) (Table 2).
The interaction between serum TE levels and environmental factors. CRC patients with red meat intake ≥ 50 g/day showed the highest contribution of risk due to interaction with Co ≥ 4.77 µg/L followed by Zn < 1103.06 µg/L and Al ≥ 95.02 µg/L. After controlling for confounding factors, the interactions contributed 97% (Co), 95% (Zn), and 88% (Al) of CRC risk (Table 3). Only Zn < 1103.06 µg/L with white meat intake < 50 g/
www.nature.com/scientificreports/
TE-environmental risk factor combination (Table 4). The ANN algorithm analysis of the training data (n = 159) yielded higher values for accuracy, sensitivity, specificity, PPV, and NPV for the CRC RPM using a panel of the 14 TEs. For the CRC RPM using environmental risk factors, the SVM algorithm yielded higher accuracy for the training data (83.0%), followed by the results using the ANN algorithm (79.8%). However, with the test data, the SVM and ANN algorithms determined 10% and 2% accuracy, respectively, for the CRC RPM. Therefore, the RMSE (root mean square error) value was used to select the best algorithm for the CRC RPM, with a lower value preferred. The ANN algorithm yielded a low RMSE value for the RPM using environmental risk factors. The best algorithm for the CRC RPM for the TE-environmental risk factor combination was LR. This model had the highest accuracy and AUC value compared to using the 14 TEs or environmental factors alone.
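The interaction contributions reported above (Table 3) correspond to the attributable proportion due to interaction (AP) quoted in the abstract. As a minimal sketch only, assuming the standard additive-interaction definitions (RERI = OR11 − OR10 − OR01 + 1 and AP = RERI / OR11), the arithmetic can be illustrated as follows. The function name and the odds ratios are hypothetical and are not taken from Table 3; the study's own estimation procedure may differ.

```python
def additive_interaction(or_both, or_factor_only, or_te_only):
    """Return (RERI, AP) for two binary exposures on the additive scale.

    or_both        : odds ratio when both exposures are present (OR11)
    or_factor_only : odds ratio for the environmental factor alone (OR10)
    or_te_only     : odds ratio for the trace element alone (OR01)
    The reference group (neither exposure) has an odds ratio of 1.
    """
    if or_both <= 0:
        raise ValueError("or_both must be positive")
    # Relative excess risk due to interaction.
    reri = or_both - or_factor_only - or_te_only + 1.0
    # Share of the joint-exposure risk that is attributable to interaction.
    ap = reri / or_both
    return reri, ap


# Hypothetical odds ratios, chosen only to illustrate the arithmetic.
reri, ap = additive_interaction(or_both=10.0, or_factor_only=2.0, or_te_only=3.0)
print(f"RERI = {reri:.2f}, AP = {ap:.2f}")
```

An AP close to 1, as reported for Co with red meat intake, would mean that nearly all of the risk in the doubly exposed group is attributed to the interaction rather than to either exposure alone.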
The CRC RPM was further tested with the ASX CRC cases and showed that the 14-TE panel (81.1%) was the best model (Fig. 4). Although the RPM using the 14-TE panel yielded higher accuracy and AUC value, it was not specific. The RPM analysis showed that the 14-TE panel could predict CRC risk among the asymptomatic population.
CRC RPM for high, moderate, and low CRC risk. The CRC RPM developed in the discovery phase showed good accuracy in predicting high and low CRC risk. However, the accuracy decreased when tested with ASX CRC samples. This might have been because the data used in RPM development were from patients with CRC, hence reflecting a late-stage pathology as compared to the asymptomatic stage. Therefore, the CRC RPM required improvement with the inclusion of data from ASX CRC itself. The improved CRC RPM could evaluate CRC risk simultaneously at three levels: high (CRC), moderate (ASX CRC), and low (non-CRC). Before the new CRC RPM was developed, it was necessary to select the TE and environmental risk factor variables that could differentiate the CRC, ASX CRC, and non-CRC groups. Only the selected variables were included in the CRC RPM development. The levels of 24 TEs (Ag, Al, As, Ba, Be, Cd, Co, Cr, Cs, Cu, Ga, Li, Mg, Mn, Ni, Pb, Rb, Se, Sr, Tl, U, V, Zn, Hg) and eight environmental risk factors (age, ethnicity, comorbidities, smoking status, physical activity, obesity, red meat intake, white meat intake) were significantly different between the three groups and were included in model development.
The CRC RPM using the 24-TE panel produced the highest accuracy (100%) after testing with the test data (Table 5). This was followed by the CRC RPM using the TE-environmental risk combination (86.5%) and environmental risk factors alone (67.3%). In addition, the LR algorithm was selected for all three CRC RPMs, with the training data yielding high accuracy. Although the SVM algorithm yielded higher accuracy than LR for the CRC RPM using environmental risk factors, the accuracy decreased by almost 25% for the SVM algorithm as compared to LR (7.1%). Therefore, LR was selected for the CRC RPM using environmental risk factors.
CRC RPM accuracy was evaluated using the validation data (n = 69). The highest accuracy for the CRC RPM was based on the 24-TE panel (Fig. 5). The findings confirm the model's consistency in predicting CRC risk with good accuracy, sensitivity, specificity, PPV, NPV, and AUC value. They showed that the CRC RPM could make more accurate predictions using the 24-TE panel than RPMs using the TE-environmental risk factor combination or environmental risk factors alone. The CRC RPM was able to predict not only high CRC risk but also moderate CRC risk in individuals who did not have any CRC symptoms.

Discussion
Determination, comparison, and classification of TE levels between CRC and non-CRC cases. In the present study, we identified a panel of 14 TEs (Li, Be, Al, Co, Cu, Zn, As, Se, Cd, Rb, Ba, Hg, Tl, Pb) that separated CRC from the non-CRC samples. We noted that 10 of the 14 TEs (Li, Be, Al, Co, Rb, Ba, As, Hg, Tl, Pb) have not been reported to be altered in patients with CRC. Be, Al, Co, Rb, Ba, As, Hg and Pb have been reported for other cancer types, but not for CRC 17-24. High or low levels of TEs can possibly contribute to CRC through various mechanisms. Among the mechanisms that have been reported are inhibition of DNA repair, inhibition of DNA methylation, increased oxidative stress, and altered gene expression 25. We found that the levels of Se, Zn, and Cd are in concordance with the results of previous studies on patients with CRC 26-28. The low levels of Se and Zn we found in the patients with CRC have also been reported previously. It is believed that Se acts through an antioxidant defence system to reduce oxidative stress and minimise DNA damage 29. Similar to Se, Zn is an important co-factor in antioxidant enzymes (superoxide dismutase [SOD], GPx) and is involved in the defence systems of the body 30. In vivo and in vitro studies have shown that Zn can prevent cancer development through apoptosis mechanisms 31. High Cd levels in serum 32 and tissue 33 have also been reported in patients with CRC. Cd is also a heavy metal and has been categorised as a human carcinogen 34. The mechanisms involved in CRC formation are oncogene activation and the inhibition of apoptosis 35.
For Cu, our findings were inconsistent when compared to previous studies. We found that patients with CRC had high Cu levels, but Milde et al. 36 37 .
In the present study, most patients with CRC were diagnosed at Dukes' stage C. This indirectly explains why the patients with CRC in the present study had high Cu levels compared to the patients without CRC. Khoshdel et al. 38 reported the same finding in a large sample of patients with CRC from Iran (n = 119), but unlike the present study, which used ICP-MS, they used AAS. Therefore, the difference in Cu level findings in patients with CRC should be investigated further in a different and larger cohort of samples.
We found more correlations between TEs in CRC than in non-CRC cases. Ba, Cs, Ga, U, Li, Ag, Mn, Sr, Pb, Tl, Be, Al, V, and Co had positive correlation values > 0.5. These TEs were all present in high quantities in the patients with CRC compared to the values recommended by the ATSDR. When there is a correlation between TEs, especially at high levels, it is likely to have a toxic effect on the human body and thus could lead to CRC formation 39. Patients with CRC also have disrupted TE distribution, especially for essential TEs such as Cu, Zn, and Se. These TEs were grouped into different clusters in patients with CRC compared to non-CRC cases. Feng et al. 40 studied patients with breast cancer and found that these three TEs are closely related to the status of oxidative stresses that can contribute to cancer formation. There are few findings on the correlation between TEs and their distribution patterns in CRC. However, there is a significant relationship between the TEs and their distribution in patients with CRC as compared to patients without CRC.
Correctly classified  146  157  147  39  37  37  132  127  126  29  31  30  155  157  159  39  38  39
Incorrectly classified  13  2  12  1  3  3  27  32  33  11  9  10  4  2  0  1
Biomarkers using TEs attracted more attention following the reporting of evidence from previous studies on TEs and disease risk 41. TEs have been used for differentiating patients with and without cancer, such as in breast cancer 42, lung cancer 43, prostate cancer 44, and CRC 45. Although TEs have attracted a lot of attention as potential cancer biomarkers, the cut-off values of the respective TEs have not been determined. Reference sources remain scarce, and the normal level values are typically taken from the ATSDR website (https://www.atsdr.cdc.gov/). The TE values on the ATSDR website are more relevant to the general population than to respective diseases, including cancers. Moreover, the reference values were established in the last decade based on the Western population. Therefore, a screening cut-off value specific to patients with CRC is much needed for identifying those with high CRC risk. Our cut-off values for differentiating the CRC population were almost the same as the ATSDR values for Be and Zn.
Correctly classified 161  141  168  46  39  47  142  110  125  31  32  35  168  157  168  42  45  45
Incorrectly classified  7  27  0  6  13  5  26  58  43  21  20  17  0  11  0  10  7
In the present study, although the levels of 14 TEs could differentiate CRC and non-CRC samples, only Be and Zn levels had AUC values of ≥ 0.8. The cut-off values for Be and Zn which we have proposed are in the range set by the ATSDR in the general population 46,47. For the other 12 TEs the AUC values were < 0.8; hence, they are less useful as individual biomarkers 48. Therefore, the cut-off values for Be and Zn we have proposed can be used as reference or screening values for patients with CRC. The cut-off values from these findings may vary between populations, but our cut-off values are in line with the ATSDR values in differentiating CRC and non-CRC.
Apart from individual TEs, it has also been suggested that TE ratios can be used as biomarkers. We found that the AUC value can be improved through the use of a TE ratio rather than a single TE. In the present study, the Co/Zn ratio yielded the highest AUC value. However, no study to date has assessed Co or even the Co/Zn ratio as a biological marker for CRC. The Cu/Zn ratio has been suggested as a biomarker, but no AUC or sensitivity values have been specified for the ratio 37. As previous studies have focused on the Cu/Zn ratio, our findings on the Co/Zn ratio require further validation of its potential use as a CRC screening test.
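To make the comparison between a single TE and a TE ratio concrete, the sketch below computes the AUC as the probability that a randomly chosen case scores higher than a randomly chosen control (the Mann-Whitney formulation, with ties counted as one half). It is an illustration under stated assumptions, not the study's analysis pipeline: the helper names `auc_rank` and `ratios` are hypothetical, and the serum values are invented for the example.

```python
def auc_rank(cases, controls):
    """AUC as P(case value > control value), counting ties as one half."""
    if not cases or not controls:
        raise ValueError("both groups must be non-empty")
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def ratios(numerators, denominators):
    """Element-wise ratio of two equally long lists (e.g. Co / Zn)."""
    return [n / d for n, d in zip(numerators, denominators)]


# Hypothetical serum levels in µg/L; these are NOT data from this study.
co_cases, zn_cases = [5.0, 6.0, 4.0, 7.0], [900.0, 1000.0, 1100.0, 950.0]
co_ctrl, zn_ctrl = [4.0, 5.0, 3.0, 4.5], [1200.0, 1150.0, 1300.0, 1000.0]

auc_co = auc_rank(co_cases, co_ctrl)
auc_ratio = auc_rank(ratios(co_cases, zn_cases), ratios(co_ctrl, zn_ctrl))
print(f"AUC for Co alone: {auc_co:.4f}")
print(f"AUC for Co/Zn:    {auc_ratio:.4f}")
```

In this toy example the ratio separates the groups better than Co alone because Co tends to be higher and Zn lower in the cases, so dividing one by the other combines both shifts into a single marker.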

The interaction between serum TEs and environmental factors.
We also showed that the interaction between excessive red meat intake with low Zn levels could increase CRC risk. Red meat is a rich source of Zn 49 . Excessive red meat intake increases Zn levels, but its bioavailability depends on homeostasis. Homeostasis is maintained in the gastrointestinal system through the absorption of exogenous Zn, and the secretion and excretion of Zn endogenously 50 . Imbalanced diet, such as food with high-phytate composition (e.g., grains and legumes) 51 and the presence of certain intestinal microbes 52 are two examples of factors that can interfere with the effectiveness of Zn homeostasis. This decreases the amount of Zn in the body even with excessive red meat intake. Low Zn levels reduce antioxidant responses for neutralising oxidative stress 53 . Also, the carcinogenic mechanisms of red meat content 54 can double CRC risk.
Cooking with utensils made from Al or wrapping food in Al foil can cause Al leaching into food 55 . Turhan 56 showed that Al content was increased by 89-378% if red meat was cooked and wrapped with Al foil. Marinating meat with a mixture of citric acid and lactic acid and wrapping it with Al foil can further enhance the Al content of the meat through leaching 57 . Red meat also has high quantities of Co compared to white meat 58 . The increased Co in red meat can occur through the provision of foods containing high quantities of Co, such as alfalfa seeds or linseed (animal food) 59 . Consequently, excessive red meat consumption indirectly increases Al and Co levels in the human body. The combination of red meat intake with high Al or Co levels stimulates carcinogenic mechanisms in CRC formation 60,61 .
We also showed that the interaction between low intake of white meat (< 50 g/day) and low Zn levels contributed to higher CRC risk. Unbalanced diets 51 and the presence of certain intestinal microbes 52 can cause decreased Zn levels due to disturbance of Zn homeostasis. White meat does not produce carcinogens as compared to red meat, but as a result of low Zn levels, oxidative stress remains uncontrolled 62 , increasing CRC risk.
Obesity combined with low Zn levels or high Co levels also increases CRC risk. Zn levels decrease with increased body mass index 63. Adipose tissue causes systemic changes in the human body, including altering the levels of insulin, insulin-like growth factor-1, leptin, adiponectin, steroids, and cytokines 64. This can interfere with Zn homeostasis and cause Zn deficiency 65. In addition, lower levels of a Zn transporter gene, ZIP14 (SLC39A14), have been reported in obese individuals 66 and result in Zn reduction in the body 67. Obesity-induced endocrine changes and altered gene expression cause low Zn levels, which can increase oxidative stress and DNA damage 30,68,69 and further contribute to CRC formation. However, the association between obesity and Co levels remains unknown 70, as does its relation to the mechanism of disease.
CRC RPM. We developed RPMs for CRC based on TEs and environmental risk factors. The model was tested on three groups of patients: high-risk (CRC), moderate-risk (ASX CRC), and low-risk (non-CRC). Early in the CRC RPM development, the addition of environmental risk factors to the TEs increased the accuracy of the CRC RPM. However, the accuracy decreased when tested on the ASX CRC group. This may have been due to inaccuracy in the environmental risk factor information, which relied heavily on the patient's memory. Hence, the environmental risk factor information obtained is more likely to be biased 71 than the quantitative measurement of TEs in the patient's blood.
The 14-TE panel (Li, Be, Al, Co, Cu, Zn, As, Se, Cd, Rb, Ba, Hg, Tl, Pb) could predict high and low CRC risk but was less precise for the moderate-risk group. The development of a new CRC RPM using a 24-TE panel (Ag, Al, As, Ba, Be, Cd, Co, Cr, Cs, Cu, Ga, Li, Mg, Mn, Ni, Pb, Rb, Se, Sr, Tl, U, V, Zn, Hg) increased the value of each performance parameter, especially accuracy. This enabled CRC risk assessment to be classified into three categories, i.e., high, moderate, and low. This risk stratification method is useful for early detection of patients with high CRC risk 72. Hence, colonoscopy and tissue biopsy for determining CRC diagnosis may be prioritised for high-risk individuals first, followed by moderate-risk individuals. Early detection of CRC can be performed through this predictive model even if the patient does not show any clinical symptoms of CRC.
To date, there is no CRC RPM using TEs. However, previous studies have shown that TEs can be used to predict the risk of other cancers and diseases. For example, Guo et al. 73
importance of knowledge of the TEs in the human body for use as a predictor of CRC risk. Further validation in other source populations may be needed before the CRC RPM can be applied in the community. The main strengths of the study are the novelty of the findings related to TEs and the contribution of a less invasive biomarker for early detection of CRC. However, further validation needs to be done for more accurate and sensitive results. Information bias is unavoidable as we mainly depend on self-reported information. We tried to reduce the bias by confirming the self-reported information with family members.
In conclusion, public awareness of healthy and balanced nutrition needs to be improved. Increased awareness of environmental risk factors in the community can reduce the risk of CRC. In Malaysia, various awareness programmes, including CRC screening, have therefore been organised. We would like to recommend the 24-TE panel developed in this study as a screening test for stratifying individuals by level of CRC risk, i.e., high, moderate, or low risk. High-risk individuals should take priority in colonoscopy and tissue biopsy procedures for determining CRC diagnosis, followed by moderate- and low-risk individuals.

Materials and methods
Participants. Discovery phase. All participants were newly diagnosed CRC patients from the Universiti Kebangsaan Malaysia (UKM) Medical Centre, Malaysia. Patients were excluded if they had more than one cancer, a history or finding of polyps or inflammatory bowel disease (IBD) during colonoscopy, or a history of toxic exposure at work. We enrolled 102 patients with CRC and 102 patients without CRC. The participants were interviewed to obtain information on environmental risk factors and underwent blood sampling for TE analysis after the histopathology result had confirmed or excluded CRC.
Validation phase. All participants from The Malaysian Cohort (TMC) 76 who were diagnosed with CRC during follow-up were included as asymptomatic (ASX) CRC cases. Initial recruitment ran from April 2006 to the end of September 2012. The information on CRC diagnosis was based on self-reporting during follow-up or on mortality data from the Malaysian National Registration Department. Based on information obtained until June 2017, 85 ASX CRC cases were included in this study.
All participants accepted the terms of the study and provided written informed consent. The study was approved by the UKM Medical Research Ethical Committee (FF-2015-380) in accordance with the guidelines set out in the Declaration of Helsinki.
Environmental risk factors. All participants completed a set of questionnaires adapted from the TMC study, which covered demographics, socioeconomic status, family history of cancer, comorbidity, smoking status, alcohol consumption, diet intake, body mass index, and physical activity. Diet intake and physical activity were assessed using the food frequency questionnaire and the International Physical Activity Questionnaire-Malaysia (IPAQ-M), respectively 77. The information gathered was self-reported, and the interviews were conducted by several enumerators. A training session on the questionnaire was conducted to minimise potential interview bias. For the validation phase, the information was extracted from the TMC database.
Quantification of trace elements. Fasting blood was processed to obtain the serum and stored at −80 °C until analysis. Samples were pre-treated with acid digestion. The multi-element analysis of
Statistical analysis. Statistical analyses were performed using STATA/SE 13.0, SPSS Modeler version 18, and MetaboAnalyst 4.0. The normality of quantitative data such as TE levels was checked by histogram and the Kolmogorov-Smirnov test. The 25 TEs were compared between the CRC, non-CRC, and ASX CRC samples using the independent t-test or analysis of variance for data with a normal distribution; the Mann-Whitney U or Kruskal-Wallis tests were used for data with a non-normal distribution. The inter-relationship between each pair of TEs was investigated using Pearson correlation analysis. The distribution pattern of circulating TEs was plotted based on principal component analysis (PCA) and cluster analysis (CA). The best cut-off value for CRC was determined using receiver operating characteristic (ROC) curve analysis and the Youden index. The significance level was established at p < 0.05.
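The Youden index mentioned above selects the cut-off that maximises J = sensitivity + specificity − 1 along the ROC curve. The following is a minimal sketch of that selection, assuming a marker that is raised in cases (a value at or above the cut-off is called positive); the function name and the sample values are hypothetical and do not come from this study. For elements that are lowered in CRC, such as Zn or Se, the same function could be applied to the negated values.

```python
def youden_cutoff(cases, controls):
    """Return (cutoff, J) maximising J = sensitivity + specificity - 1.

    A sample is called positive when its value is >= cutoff, so this
    version suits markers that are raised in cases.
    """
    if not cases or not controls:
        raise ValueError("both groups must be non-empty")
    best_cutoff, best_j = None, -1.0
    # Every observed value is a candidate threshold.
    for cutoff in sorted(set(cases) | set(controls)):
        sensitivity = sum(v >= cutoff for v in cases) / len(cases)
        specificity = sum(v < cutoff for v in controls) / len(controls)
        j = sensitivity + specificity - 1.0
        if j > best_j:  # keep the first (lowest) cut-off on ties
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j


# Hypothetical marker levels; these are NOT data from this study.
case_values = [5.0, 6.0, 7.0, 8.0, 9.0]
control_values = [1.0, 2.0, 3.0, 4.0, 6.5]

cutoff, j = youden_cutoff(case_values, control_values)
print(f"cut-off = {cutoff}, Youden J = {j:.2f}")
```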

Risk Prediction Model for CRC. The Risk Prediction Model (RPM) was developed based on machine learning (ML) algorithms. First, in the CRC RPM for the discovery phase, the data were divided into two sets by the partition node of SPSS Modeler for developing a prediction model using three common ML algorithms: logistic regression (LR), support vector machine (SVM), and artificial neural network (ANN). Of the overall data, 80% (n = 159) were used for model development; the remaining 20% (n = 40) were used for model testing. The CRC RPM was validated among the ASX CRC cases (n = 85). Next, an improved CRC RPM with the inclusion of ASX CRC was developed using the same three ML algorithms. The data were divided into three sets: model development, 60% (n = 168); model testing, 20% (n = 52); and model validation, 20% (n = 69).
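The partitioning described above was carried out with the partition node of SPSS Modeler. As a rough stand-in only, assuming a simple seeded random shuffle, the idea of a development/testing/validation split can be sketched as follows. The function name `partition`, the seed, and the sample identifiers are hypothetical, and this sketch does not reproduce the SPSS Modeler procedure or the set sizes reported in the study.

```python
import random


def partition(records, fractions=(0.6, 0.2, 0.2), seed=0):
    """Shuffle records and split them into development, testing and validation sets."""
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    shuffled = list(records)
    # A dedicated generator keeps the split reproducible for a given seed.
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    n_dev = round(n * fractions[0])
    n_test = round(n * fractions[1])
    development = shuffled[:n_dev]
    testing = shuffled[n_dev:n_dev + n_test]
    validation = shuffled[n_dev + n_test:]  # whatever remains
    return development, testing, validation


# Hypothetical sample identifiers standing in for patient records.
sample_ids = list(range(100))
dev_set, test_set, val_set = partition(sample_ids)
print(len(dev_set), len(test_set), len(val_set))
```

Keeping the validation set entirely separate from model development and testing is what allows it to serve as a check on the model's consistency.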
The independent variables had different units and therefore required data normalisation. The data were scaled to the range 0-1 79. This scaling is suitable for improving the accuracy of numeric computation by the ML algorithms. Accuracy (the percentage of testing data correctly predicted by the model), sensitivity (the proportion of patients with CRC correctly identified by the model), specificity (the proportion of patients without CRC correctly identified by the model), positive predictive value (PPV), negative predictive value (NPV), and area under the curve (AUC) were used for measuring the performance of the prediction models. Ten-fold cross-validation was used to obtain an unbiased estimate of the performance of the three prediction models for comparison.
Scientific Reports | (2020) 10:18670 | https://doi.org/10.1038/s41598-020-75760-9
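The 0-1 scaling and the performance measures defined above can be written out explicitly. The sketch below assumes min-max scaling, x' = (x − min) / (max − min), which is the usual way of mapping values onto the range 0-1, and a binary confusion matrix with CRC as the positive class. The function names, input values, and counts are hypothetical and are not results from this study.

```python
def min_max_scale(values):
    """Rescale a list of numbers linearly onto the range 0-1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column: no spread to scale
    return [(v - lo) / (hi - lo) for v in values]


def binary_metrics(tp, fp, tn, fn):
    """Performance measures from a 2x2 confusion matrix (CRC = positive)."""
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),   # CRC cases correctly identified
        "specificity": tn / (tn + fp),   # non-CRC cases correctly identified
        "ppv": tp / (tp + fp),           # predicted CRC that is truly CRC
        "npv": tn / (tn + fn),           # predicted non-CRC that is truly non-CRC
    }


# Hypothetical TE levels and confusion-matrix counts, for illustration only.
scaled = min_max_scale([2.0, 4.0, 6.0, 10.0])
metrics = binary_metrics(tp=18, fp=2, tn=17, fn=3)

print(scaled)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Scaling each variable separately puts TE concentrations and questionnaire-derived variables on a common footing, so that no variable dominates the model simply because of its unit.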

Data availability
All data generated or analysed during this study are included in this published article (and its Supplementary Information files).