Exposome approach for identifying modifiable factors for the prevention of colorectal cancer

Previous studies have shown that certain exposure factors (such as lifestyle and metabolism) are associated with colorectal cancer (CRC) events. However, the application of the exposome theoretical framework and the extent to which exposome domains can modulate the risk of CRC remain unknown. Our study aimed to construct valid exposome measurements and examine the relationship between exposome counts and the risk of CRC. This study included 335,370 individuals in the UK Biobank. We used exploratory factor analysis to identify a valid construct of exposome factors. We then summed the exposome counts within each domain. Cox proportional hazard models were used to estimate the hazard ratios and 95% confidence intervals of CRC risk related to the exposome factors and counts. During a median follow-up of 8.69 years, 10,702 CRC cases were identified. Five domains were extracted from 12 variables: ecosystem, lifestyle, tobacco and alcohol use, social economics, and social support. The Cox model results showed that a better ecosystem was associated with a reduced CRC risk (HR = 0.970; 95% CI 0.952–0.989). Similar results were found for the domains of healthy lifestyle (HR = 0.889; 95% CI 0.871–0.907) and no tobacco and alcohol use (HR = 0.892; 95% CI 0.876–0.909). The disadvantageous social economic (HR = 1.081; 95% CI 1.058–1.105) and insufficient social support domains (HR = 1.036; 95% CI 1.017–1.056) were associated with an increased risk of CRC. Similar risk trends in CRC incidence were also observed across the exposome count groups. Our findings suggest that certain exposure domains are related to the incidence of CRC. Ecosystem, lifestyle, and social factors can be incorporated into prediction models to identify individuals at high risk of CRC.

Colorectal cancer (CRC) is the third most commonly diagnosed malignancy and the second leading cause of cancer-related deaths worldwide 1 . In the UK, more than 42,000 newly diagnosed cases are reported 2 . In addition, colorectal cancer is the fourth most common cancer in the UK, accounting for 11% of all new annual cancer cases 2 . The lack of apparent symptoms in the early stages of CRC causes a heavy disease burden 3 . The majority of patients are diagnosed at a late stage, leading to a poor five-year net survival of 10% in the UK 2,4 .
Environmental factors have been increasingly recognised to play an important role in diseases 20,21 . Consequently, a new approach is needed to elucidate carcinogenesis and to inform early detection strategies that modulate the risk of CRC. The exposome approach aims to capture the diversity and range of complete environmental exposures in epidemiological studies, providing a comprehensive description of various exposures 21 . However, the influence of the full set of exposome factors on health outcomes is poorly understood 22 . Few studies have explored the whole set of exposome factors from a macro perspective to reveal the overall mechanism of modifiable factors in CRC.
Exposome factors and CRC. Previous studies have found that certain exposure factors are associated with CRC events. In this section, we divide the literature review into four parts to demonstrate the relationship between exposome factors and CRC.
First, in terms of the lifestyle domain, numerous studies have suggested that lifestyle factors are associated with the risk of CRC [10][11][12][13] . A case-control study in Germany derived five modifiable lifestyle factors, namely smoking, alcohol consumption, diet, physical activity, and body fat 10 . This study found that adherence to a healthy lifestyle was associated with a reduced risk of CRC. A study using the UKB cohort also indicated that a healthier lifestyle (including body mass index [BMI], waist-hip ratio, physical activity, sedentary time, lower processed and red meat consumption, higher vegetable and fruit intake, lower alcohol consumption, and reduced tobacco smoking) contributed to a reduced risk of CRC 12 . A similar trend can also be seen in a study by Wang et al. 25 . The study used data from the Nurses' Health Study and the Health Professionals Follow-up Study, and suggested that a healthy lifestyle was associated with a lower incidence of CRC.
Second, in terms of the social domain, few studies have focused on the influence of social determinants on the risk of CRC. Hastert et al. used data from the Vitamins and Lifestyle Study to examine the relationship between socioeconomic status (SES) and CRC incidence 26 . People living in the lowest SES areas had a higher CRC incidence than those living in the highest SES areas. One study also indicated that a disadvantaged socioeconomic position in childhood was related to an increased risk of CRC 27 . Social support was another social factor that affected CRC events; however, the findings of the studies differed. One study in Copenhagen, Denmark found no significant association between social networks and CRC incidence 28 , while Ikeda et al. found that lower social support was associated with a higher incidence of CRC among men 29 .
Third, in terms of the ecosystem domain, few studies have evaluated the effect of ecosystems on CRC incidence. However, a growing number of studies have suggested that ecosystems play a role in cancer incidence 30,31 . A study from the United States suggested that higher neighbourhood walkability was correlated with a lower incidence of multiple myeloma 32 . Green space is a significant factor in cancer research. It has been regarded as a protective factor against mouth and throat 30 , skin 33 and breast cancers 34 .
Exposome variables. Based on the conceptual framework, we initially identified 12 items related to exposome factors. These items were physical activities, diet, drinking, smoking, obesity, low income, unemployment, high school, seldom confiding in someone, feel isolated, garden percentage (300 m buffer), and natural environment percentage (300 m buffer). The definitions of the variables can be seen in Appendix Table S1.
Healthy physical activity was coded as 1 if residents had ≥ 150 min/week moderate, ≥ 75 min/week vigorous, or ≥ 150 min/week mixed (moderate + vigorous) activity. A healthy diet was coded as 1 if the respondents ate ≥ 4 ideal food groups. The ideal food groups included: fruits, ≥ 3 servings/day; vegetables, ≥ 3 servings/day; fish, ≥ 2 servings/week (counted by oily and non-oily fish); processed meat, ≤ 1 serving/week; and unprocessed meat, ≤ 1.5 servings/week. Smoking status was coded as 1 for those who had never smoked or had previously smoked. Drinking status was coded as 1 if participants drank moderately. Normal weight and waist circumference were coded as 1 if an individual's BMI was ≤ 25 kg/m² and they had a normal waist circumference (women < 88 cm; men < 102 cm), which were regarded as low-risk lifestyle factors. Low income was coded as 1 if an individual's household income was < €30,999. Unemployment was coded as 1 if respondents were not in paid employment or self-employed. High school was coded as 1 if people had a high school degree or lower. Seldom confiding in someone was coded as 1 if people were able to confide in someone close to them less than once a month or less than once every few months. Feeling isolated was coded as 1 if they felt isolated. In the UKB database, garden percentage (300 m buffer) was defined as the percentage of the home location buffer classed as 'Domestic garden' with home location data buffered at 300 m. Natural environment percentage (300 m buffer) was defined as the percentage of the home location buffer classed as 'Natural Environment' in the Land Cover Map (LCM) 2007 with home location data buffered at 300 m. Figure 2 shows the correlation between variables.
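The binary coding rules above can be illustrated with a minimal sketch for two of the items (healthy diet, and normal weight and waist circumference). The function and argument names are hypothetical and are not UKB field names; the thresholds are those stated in the text.

```python
def healthy_diet(fruit_per_day, veg_per_day, fish_per_week,
                 processed_meat_per_week, unprocessed_meat_per_week):
    """Return 1 if at least 4 of the 5 ideal food-group criteria are met, else 0."""
    criteria = [
        fruit_per_day >= 3,                # fruits, >= 3 servings/day
        veg_per_day >= 3,                  # vegetables, >= 3 servings/day
        fish_per_week >= 2,                # oily + non-oily fish, >= 2 servings/week
        processed_meat_per_week <= 1,      # processed meat, <= 1 serving/week
        unprocessed_meat_per_week <= 1.5,  # unprocessed meat, <= 1.5 servings/week
    ]
    return int(sum(criteria) >= 4)


def normal_weight_and_waist(bmi, waist_cm, sex):
    """Return 1 if BMI <= 25 kg/m^2 and waist is below the sex-specific cut-off."""
    # The "female"/"male" labels are an assumption made for this sketch.
    waist_limit = 88 if sex == "female" else 102
    return int(bmi <= 25 and waist_cm < waist_limit)
```

Each function returns 0 or 1, so the resulting indicators can be passed directly to factor analysis or summed into domain counts.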
Covariates. Sociodemographic characteristics included age at baseline and sex. Family history (including parents and siblings) of cancer events, cancer screening, and Townsend deprivation index were also controlled for in our study.
Factor analysis. We used factor analysis to determine whether these items could be combined into separate components reflecting different aspects of environmental exposure. We also performed Kaiser-Meyer-Olkin (KMO) and Bartlett's sphericity tests, which showed that these variables were suitable for factor analysis. The factors were rotated using the varimax method to achieve a simpler structure with greater interpretability. Factors with an eigenvalue > 1.0 were retained. Finally, only exposome factors with loadings of ± 0.4 were regarded as important in this study 46 . Five domains were extracted from the exposome items, according to the rule that eigenvalues were > 1.0, and varimax rotation was performed to minimise loading complexity for each component. Component 1 represented three items pertaining to lifestyle (namely healthy physical activity, healthy diet, and normal weight and waist circumference). Component 2 represented two items pertaining to less tobacco and alcohol use (namely smoking and drinking status). Component 3 represented three items pertaining to concern about social economics (namely low income, unemployment, and high school). Component 4 represented two items pertaining to social support (namely seldom confiding in someone and feel isolated). Component 5 represented two items pertaining to ecosystem (namely garden and natural environment percentage). The coefficients for the items are displayed in Table 1.
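The loading rule described above can be sketched as follows. This is an illustration only: it assumes that the ± 0.4 criterion means an absolute rotated loading of at least 0.4, and the loading values in `example` are made up for demonstration and are not the values in Table 1.

```python
LOADING_THRESHOLD = 0.4  # assumed interpretation: |loading| >= 0.4


def assign_items(loadings, threshold=LOADING_THRESHOLD):
    """Map each component number to the items that load on it.

    `loadings` maps an item name to its list of rotated loadings,
    one value per component (component numbering starts at 1).
    """
    assignment = {}
    for item, row in loadings.items():
        for component, value in enumerate(row, start=1):
            if abs(value) >= threshold:
                assignment.setdefault(component, []).append(item)
    return assignment


# Hypothetical loadings on two components (illustrative values only).
example = {
    "garden percentage": [0.05, 0.82],
    "natural environment percentage": [0.02, 0.79],
    "feel isolated": [0.71, 0.03],
    "seldom confiding in someone": [0.68, 0.10],
}
domains = assign_items(example)
```

With these made-up values, the two social support items are assigned to component 1 and the two ecosystem items to component 2.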
Statistical analysis. The 12 exposome items were aggregated into 5 components based on factor analysis (principal components method). First, we described the sociodemographic and exposome-related characteristics of the respondents, and compared between-group differences using one-way ANOVA or the χ² test when appropriate. Second, we conducted a full sample analysis to examine the relationship between all exposome variables and the risk of CRC. Third, the factors were entered into multivariable regression models, an approach known as principal component regression analysis [46][47][48][49] . Cox proportional hazard models were used to estimate the CRC risk hazard ratios (HRs) and 95% confidence intervals (CIs) related to the exposome factors. Fourth, we summed the exposome counts within each domain and examined the cumulative effect of exposome domains on CRC risk. All analyses were conducted using STATA software (version 15.0; StataCorp, TX, USA), and two-tailed p-values ≤ 0.05 were considered statistically significant.
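For reference, the general form of a Cox proportional hazards model and the way an HR and its 95% CI follow from a fitted coefficient are given below. The notation is generic and chosen for illustration (it is not taken from the study): x denotes an exposome factor score or count, z the vector of covariates, and h_0(t) the unspecified baseline hazard.

```latex
\begin{align}
  h(t \mid x, \mathbf{z}) &= h_0(t)\,\exp\!\left(\beta x + \boldsymbol{\gamma}^{\top}\mathbf{z}\right), \\
  \mathrm{HR} &= \exp(\hat{\beta}), \\
  95\%\ \mathrm{CI} &= \exp\!\left(\hat{\beta} \pm 1.96\,\mathrm{SE}(\hat{\beta})\right).
\end{align}
```

Under this form, an HR below 1 corresponds to a negative coefficient (lower hazard per unit increase in x), and an HR above 1 to a positive coefficient.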
Ethics approval. The
Full sample analysis. Figure 3 shows the effects of all exposome variables on the risk of CRC in the UKB cohort. According to Fig. 3, participants who exercised regularly, had normal weight and normal waist circumference, had never smoked, had low to moderate drinking, and lived in areas with more garden coverage and natural environment coverage had a reduced risk of CRC. Those who were low-income, unemployed, less educated, and isolated had a significantly increased risk of CRC. Table 3 shows the multivariable-adjusted HRs (95% CI) of CRC events by exposome factors among the 335,370 participants. The results from the Cox models showed that a healthy lifestyle (HR = 0.889, 95% CI 0.871-0.907; p < 0.001) and less tobacco and alcohol use (HR = 0.892, 95% CI 0.876-0.909; p < 0.001) were associated with a reduced risk of CRC. The opposite pattern was identified for the socioeconomic and social support domains. Disadvantageous socioeconomic status (HR = 1.081, 95% CI 1.058-1.105; p < 0.001) and insufficient social support (HR = 1.036, 95% CI 1.017-1.056; p < 0.001) were associated with an increased risk of CRC. A better ecosystem was associated with a reduced risk of CRC (HR = 0.97, 95% CI 0.952-0.989; p < 0.01).
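As an aid to reading the reported estimates, the following sketch shows how an approximate standard error, z statistic, and two-sided p-value can be recovered from a published HR and its 95% CI. It assumes a Wald-type interval that is symmetric on the log scale, which is an assumption of this sketch rather than something stated in the study; the function name is hypothetical.

```python
import math


def wald_summary(hr, ci_low, ci_high, z_crit=1.959964):
    """Approximate SE, z and two-sided p from an HR and its 95% CI.

    Assumes the interval is exp(log(HR) +/- z_crit * SE).
    """
    log_hr = math.log(hr)
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = log_hr / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return se, z, p


# Healthy lifestyle domain as reported: HR = 0.889, 95% CI 0.871-0.907.
se, z, p = wald_summary(0.889, 0.871, 0.907)
```

For the healthy lifestyle domain this gives a standard error of roughly 0.010 on the log scale and a z statistic of roughly -11.4, in line with the reported p < 0.001.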
Exposome counts and CRC. Based on a study by Safford et al. 50 , we used exposome factor counts to examine the cumulative effects of different exposome domains on CRC events. We used the lifestyle, social, and ecosystem domains to construct the exposome counts, which were accordingly categorised into three parts: lifestyle, social, and ecosystem counts. Lifestyle counts were developed by combining physical activities, diet, drinking, smoking, and obesity. The count range was 0-5, where a higher count indicated a healthier lifestyle. Social counts were developed by combining low income, unemployment, high school, seldom confiding in someone, and feel isolated. The count range was 0-5, where a higher count indicated that participants were more socially disadvantaged. Ecosystem counts were developed by combining the garden percentage (300 m; above the median coded as 1) and the natural environment percentage (300 m; above the median coded as 1). This count range was 0-2, with a higher count indicating a better ecosystem.
As shown in Fig. 4a, the cumulative incidence of CRC decreased with increasing lifestyle counts. Compared with participants with ≤ 1 lifestyle count, the relative risk of CRC incidence was lower in participants with ≥ 4 lifestyle counts (HR = 0.593, 95% CI 0.552-0.637; p < 0.001; Fig. 4b).
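The construction of the counts described above can be sketched as follows. The item keys are hypothetical names for the binary indicators (each already coded 0 or 1 as defined earlier), and the median split is computed on whatever sample is passed in; this is an illustration under those assumptions, not the authors' code.

```python
from statistics import median

# Hypothetical keys for the binary (0/1) indicators in each domain.
LIFESTYLE_ITEMS = ["physical_activity", "diet", "drinking", "smoking", "weight_waist"]
SOCIAL_ITEMS = ["low_income", "unemployment", "high_school",
                "seldom_confiding", "feel_isolated"]


def domain_count(record, items):
    """Sum the binary indicators of one domain for one participant (range 0-5)."""
    return sum(record[item] for item in items)


def ecosystem_counts(garden_pct, natural_pct):
    """Return one 0-2 count per participant: 1 point for each measure above its median."""
    garden_median = median(garden_pct)
    natural_median = median(natural_pct)
    return [int(g > garden_median) + int(n > natural_median)
            for g, n in zip(garden_pct, natural_pct)]


participant = {"physical_activity": 1, "diet": 0, "drinking": 1,
               "smoking": 1, "weight_waist": 1}
lifestyle = domain_count(participant, LIFESTYLE_ITEMS)
eco = ecosystem_counts([10, 20, 30], [5, 50, 40])
```

In this toy example the participant has a lifestyle count of 4, and the three ecosystem counts are 0, 1, and 1.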
We observed an increased cumulative incidence of CRC with increasing social counts (Fig. 5a). The relative risk of CRC incidence was higher in participants with ≥ 4 social domain counts (HR = 1.225, 95% CI 1.150-1.305; p < 0.001) than in those with a social domain count of ≤ 1 (Fig. 5b).
We divided participants into three groups according to their ecosystem counts (0, 1, and 2 counts; Fig. 6a). The relative risk of CRC incidence was lower in participants in the 2 counts group (HR = 0.920, 95% CI 0.856-0.988; p < 0.001) than in those in the 0 count group (Fig. 6b). In addition, we conducted an additional analysis to examine the relationship between each exposome dimension and the risk of CRC incidence independently, and the results remained robust (see Table S3).

Discussion
Numerous studies have focused on the effects of certain exposure factors on CRC events; however, few of these studies have applied the exposome theoretical framework to explore the synergy between these factors in relation to CRC development. To bridge this knowledge gap, this study used UKB data to explore the relationship between exposome factors and the risk of CRC by using evaluation measures with strong construct validity. This study has three noteworthy findings, based on the exposome theoretical framework. These results suggest that lifestyle, social, and ecosystem domains are related to the risk of CRC.
First, in terms of the lifestyle domain, we found that a healthy lifestyle and less tobacco and alcohol use were associated with a reduced risk of CRC. These results are in agreement with those of previous studies [10][11][12][13] . A healthy diet is considered to be rich in fish, fruits, and vegetables, and low in processed and red meat 12 . This dietary pattern provides sufficient nutrition and fewer fatty and carcinogenic substances, which could reduce the risk of CRC 51 . Obesity has a direct effect on certain hormone levels, such as insulin, oestrogen, and insulin-like growth factor-1, which produce a favourable environment for carcinogenesis 52 . Physical activity can reduce CRC risk by stimulating gut motility, benefiting the immune system, and elevating metabolic efficiency 51,53 . Alcohol metabolites, such as acetaldehyde, increase the risk of CRC, because acetaldehyde has been evaluated as a carcinogen 54 . Smoking induces angiogenesis and suppresses cell-mediated immunity to facilitate tumour growth 55 .
Second, in terms of the social domain, this study suggested that lower SES and less social support were associated with an increased risk of CRC. The finding of a relationship between SES and CRC risk was consistent with that of previous studies. Higher incidences of CRC are related to greater social disadvantage 26,56 . One possible reason is that low-SES people may have less rational health behaviour, know less about their symptoms, and communicate more poorly with health care staff than high-SES people 57 . In contrast, high-SES people have better access to health information 57 and are more likely to seek health services 58 ; they are also more likely to change their behaviour towards a healthy lifestyle 59 and thus reduce the risk of CRC. However, findings regarding the association between social support and the incidence of CRC have been inconsistent. One study found no significant relationships between social networks and CRC incidence 28 , while Ikeda et al. found that lower social support was associated with a higher incidence of CRC among men 29 . This study supports the finding that lower support was related to an increased risk of CRC. One possibility is that social support raises the esteem of individuals and makes them feel valued; therefore, they may take better care of themselves and be more receptive to preventative services 29,60 . Additionally, social support reduces stress and depression. Lower levels of stress hormones reduce the risk of immune dysregulation 61 , thereby suppressing the environment for tumour initiation.
Third, in terms of the ecosystem domain, a better ecosystem was associated with a reduced risk of CRC. However, limited studies have evaluated the ecosystem and CRC incidence. A growing number of studies have suggested that ecosystems play a role in the incidence of cancer 30,31 . It is unclear how ecosystems influence the risk of CRC. One possibility is that a better ecosystem reflects greater natural environment coverage, with less pollution and radiation 62 . Pollutants in water 35 , poisonous heavy metals 18 and radiation are associated with an increased risk of CRC. Furthermore, a better ecosystem may provide a greener environment, which provides a venue for physical activity 63 . Engaging in more physical activity can help reduce the risk of CRC 54 .
This study has several strengths. First, the results highlight the effects of the physical (ecosystem) and social environments (SES and social support) on CRC events. The majority of previous studies have focused on the influence of biological factors on CRC events while ignoring environmental effects. Second, because of the increased acceptance that non-genetic factors play a role in diseases, this study included the entire set of exposome factors from a holistic perspective. It also showed the overall mechanism of exposome factors on the risk of CRC at a macro level. Under the exposome framework, a broad approach for CRC prevention can be considered. Apart from promoting a healthy lifestyle among the population to prevent CRC, it is also important to take care of people with lower SES and less social support.
However, this study has several limitations. First, this study did not capture temporal changes in exposome factors (e.g. lifestyle and SES changes) due to the nature of the dataset. Future studies should use panel data to trace the impact of dynamic changes in exposome factors on CRC events. Second, due to the collinearity of variables in the physical-chemical domain, we did not conduct a full exposome analysis. Future studies should reduce the collinearity and add physical-chemical factors into the full exposome analysis. Third, there may be complex interactions between the three domains, which were not examined in detail in this study. Future studies should include mediation or pathway analysis to explore the interaction of lifestyle, social, and ecosystem effects on the risk of CRC. Fourth, although our analyses controlled for various confounding factors, we may have overlooked other factors related to CRC due to the limitation of using secondary datasets. Future studies should use mixed methods (combining semi-structured interviews and a case study with quantitative methods) to improve the robustness of the results.

Conclusion
To the best of our knowledge, this is the first study to focus on the exposome framework and CRC in social epidemiology studies. Our study included the whole set of exposome factors from a holistic perspective and employed factor analysis to improve the understanding of the relationship between exposome domains and CRC events. These findings suggest that lifestyle, social, and ecosystem domains are related to CRC events. Similar risk trends in CRC incidence were also observed across the exposome count groups. This study confirmed the relationship between exposome factors and CRC events from an empirical perspective, which may provide policy implications for future CRC prevention.

Data availability
Data are available in a public, open access repository. This research has been conducted using the UK Biobank Resource under Application Number 44430. The UK Biobank data are available on application to the UK Biobank (https://www.ukbiobank.ac.uk/).