Introduction

In 2020, the Global Cancer Observatory (GLOBOCAN) estimated that colorectal cancer (CRC) is one of the most commonly diagnosed cancers worldwide1. In Korea, cancer is the leading cause of death, accounting for 27.5% of all deaths in 2019. In particular, CRC ranks fourth in incidence and third in mortality among all cancer types2. Previous epidemiological studies have identified the modification of diet and lifestyle factors as a key strategy for the primary prevention of CRC3,4,5. Recent nutritional epidemiological studies of cancer have evaluated diet as a whole rather than considering each food or nutrient individually, an approach captured by the concept of dietary patterns6,7.

A dietary pattern can be defined as a combination of dietary components intended to encompass the total diet in a specific population8. This approach is advantageous for exploring the combined effects and interactions of nutrients and food products. Although the components of dietary patterns differ across populations, the most widely investigated patterns in relation to cancer risk, especially CRC, are the prudent/healthy and Western/unhealthy patterns9. Typically, a prudent dietary pattern is characterized by higher intakes of fruits and vegetables, and this pattern has been reported to lower the risk of developing CRC10,11. In contrast, a Western dietary pattern containing higher amounts of meat and processed foods has been shown to increase the risk of CRC12,13.

Several strategies have been applied to identify dietary patterns in relation to cancer using hypothesis-driven or data-driven methods, such as factor analysis, principal component analysis (PCA), reduced rank regression (RRR), partial least squares (PLS), and Gaussian graphical models (GGMs)14,15,16,17. Specifically, PCA aims to explain the maximum variation in dietary intake and hence reflects the actual dietary behaviors within a population18. However, a major criticism of PCA is that it does not necessarily identify a dietary pattern that is associated with the disease of interest. To determine disease-related dietary patterns, RRR was proposed as an alternative to PCA19. RRR aims to explain the maximum variation in investigator-specified intermediate response variables that are potentially relevant for a disease20, although dietary patterns derived using this method may not reflect actual dietary behavior21. Few studies have assessed the relationship between dietary patterns and CRC using the RRR method. A case‒control study revealed that an RRR-derived dietary pattern characterized by high intakes of grains, vegetables, and fruits was inversely associated with colon cancer22, whereas a prospective cohort study reported that the empirical dietary inflammatory pattern score derived from RRR was associated with an increased risk of developing CRC23. These differing results might be attributed to the selection of response variables that act as intermediates in the relationship between dietary intake and the disease of interest24.

Thus, we aimed to identify the major dietary patterns in a Korean population to determine the associations between those patterns and the risk of CRC using PCA and RRR methods. Moreover, we compared dietary patterns according to sex.

Materials and methods

Study population

This hospital-based, case‒control study recruited participants from two research centers of the National Cancer Center (NCC) of the Republic of Korea. The CRC patient group comprised people who were newly diagnosed with CRC between August 2010 and September 2020 at the Center for Colorectal Cancer of the NCC. Of the 1780 patients who agreed to participate in this study, 290 were excluded due to incomplete semiquantitative food frequency questionnaire (SQFFQ) or general questionnaire data, and 13 others were excluded due to implausible energy intake (< 500 kcal/day or > 4000 kcal/day). We also excluded 57 non-CRC patients. Thus, 1420 CRC patients were eligible for the study. The control group consisted of people visiting the Center for Cancer Prevention and Detection at the same hospital for the health check-up program provided by the National Health Insurance Corporation from October 2007 to December 2022. Of the 18,471 control participants, 5409 with incomplete SQFFQ or general questionnaire data and 196 others with implausible energy intake (< 500 kcal/day or > 4000 kcal/day) were excluded. Participants were also excluded if they were newly (n = 26) or previously (n = 1279) diagnosed with any cancer. Among the eligible participants, controls were selected by frequency matching to CRC patients by sex and 5-year age group (case:control ratio of 1:2). Therefore, 1420 CRC patients and 2840 control participants were included in the final analysis (Fig. 1).
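The participant flow described above can be checked with a short sketch. The counts are those reported in the text. The size of the eligible control pool is not stated in the text: it is derived here by subtraction, assuming the exclusion categories do not overlap. The matching function and its 5-year band boundaries (for example 50 to 54) are hypothetical illustrations, not the authors' procedure.

```python
import random
from collections import Counter, defaultdict

# Counts reported in the text.
cases_agreed = 1780
case_exclusions = {"incomplete_questionnaire": 290, "implausible_energy": 13, "non_crc": 57}
eligible_cases = cases_agreed - sum(case_exclusions.values())

controls_screened = 18471
control_exclusions = {
    "incomplete_questionnaire": 5409,
    "implausible_energy": 196,
    "new_cancer": 26,
    "previous_cancer": 1279,
}
# Derived value (assumes the exclusion categories are mutually exclusive).
eligible_controls = controls_screened - sum(control_exclusions.values())
matched_controls = 2 * eligible_cases  # case:control ratio of 1:2


def plausible_energy(kcal_per_day):
    """Energy intake criterion from the text: keep 500 to 4000 kcal/day."""
    return 500 <= kcal_per_day <= 4000


def age_band(age, width=5):
    """Lower bound of the age band (band boundaries are an assumption)."""
    return (age // width) * width


def frequency_match(cases, controls, ratio=2, seed=0):
    """Pick up to `ratio` controls per case within each sex x age-band stratum.

    cases, controls: lists of (id, sex, age) tuples. Returns selected control ids.
    """
    rng = random.Random(seed)
    needed = Counter((sex, age_band(age)) for _, sex, age in cases)
    pool = defaultdict(list)
    for cid, sex, age in controls:
        pool[(sex, age_band(age))].append(cid)
    selected = []
    for stratum, n_cases in needed.items():
        k = min(ratio * n_cases, len(pool[stratum]))
        selected.extend(rng.sample(pool[stratum], k))
    return selected
```

Frequency matching balances the sex and age distribution of the two groups as a whole, without pairing individual cases to individual controls, which is consistent with the later use of unconditional logistic regression.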

Figure 1 Flowchart of the study participants.

Ethical approval

This study was conducted according to the guidelines laid down in the Declaration of Helsinki, and all procedures involving human subjects/patients were approved by the Institutional Review Board of Korea National Cancer Center (IRB No. NCC2021-0181). All study participants provided written informed consent before participating in the study.

Outcome assessment

We classified CRC patients into three groups according to anatomical subsite based on the International Statistical Classification of Disease and Related Health Problems, 10th revision (ICD-10)25: (1) the proximal colon (including cecum, ascending colon, hepatic flexure, transverse colon, and splenic flexure); (2) the distal colon (including the descending colon, sigmoid-descending colon junction, and sigmoid colon); and (3) the rectum (including the rectosigmoid colon and rectum).

Data collection

All participants were interviewed about their sociodemographic and lifestyle characteristics, including age, sex, weight, height, first-degree family history of CRC, marital status, education level, monthly income, occupation, smoking status, alcohol consumption, and physical activity, using a structured questionnaire. Body mass index (BMI) was calculated as body weight (kg) divided by the square of height (m²). Dietary data were collected using a 106-item SQFFQ developed for Korean adults. The validity and reproducibility of the SQFFQ have been described elsewhere26. Participants were asked to provide their average food frequency (on a 9-point scale of never or rarely, 1 time per month, 2–3 times per month, 1–2 times per week, 3–4 times per week, 5–6 times per week, 1 time per day, 2 times per day, or 3 times per day) and the average portion size (on a 3-point scale of small, medium, or large) for each food item during the previous year. The food items listed in the SQFFQ were categorized into 33 food groups based on nutritional similarities and culinary usage (Supplementary Table 1).
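As an illustration of how such questionnaire responses are typically turned into numbers, the sketch below computes BMI and converts one SQFFQ answer into an estimated daily intake. The nine frequency categories are those listed above. The per-day conversion values (category midpoints, 30 days per month), the portion-size multipliers, and the reference portion weight are assumptions made for this example, since the text does not report the conversion rules actually used.

```python
def bmi(weight_kg, height_m):
    """Body mass index: weight (kg) divided by the square of height (m)."""
    return weight_kg / height_m ** 2


# Assumed conversion of the 9-point frequency scale to times per day
# (category midpoints; not taken from the source).
FREQ_PER_DAY = {
    "never or rarely": 0.0,
    "1 time per month": 1 / 30,
    "2-3 times per month": 2.5 / 30,
    "1-2 times per week": 1.5 / 7,
    "3-4 times per week": 3.5 / 7,
    "5-6 times per week": 5.5 / 7,
    "1 time per day": 1.0,
    "2 times per day": 2.0,
    "3 times per day": 3.0,
}

# Assumed multipliers for the 3-point portion-size scale (not from the source).
PORTION_FACTOR = {"small": 0.5, "medium": 1.0, "large": 1.5}


def daily_grams(frequency, portion, reference_portion_g):
    """Estimated intake in g/day for one food item.

    reference_portion_g is a hypothetical medium-portion weight for the item.
    """
    return FREQ_PER_DAY[frequency] * PORTION_FACTOR[portion] * reference_portion_g
```

For example, under these assumptions an item eaten 3-4 times per week in large portions with a 70 g reference portion contributes `daily_grams("3-4 times per week", "large", 70)` grams per day; item-level values would then be summed within each of the 33 food groups.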

Dietary pattern analysis

Dietary patterns were derived by the PCA (PROC FACTOR) and RRR (PROC PLS) methods using 33 predefined food groups. The intake of each food group was adjusted for total energy intake by the density method (g/1000 kcal). For the PCA method, orthogonal varimax rotation was applied to enhance the interpretability of the extracted components. We retained two factors based on the eigenvalue (greater than 2.0), the inflection point of the scree plot, and the interpretability of the components. For the RRR method, we used four response variables associated with CRC (n−6:n−3 fatty acid ratio, fiber, vitamin D, and calcium)27,28. We retained only the first factor, which explained most of the variation in the response variables. For each dietary pattern, we calculated a score by summing the intakes of the food groups with factor loadings ≥|0.20|, weighted by those loadings. Six food groups (light-colored vegetables, fish, mushrooms, other seafoods, eggs, and red meat) had factor loadings ≥|0.20| in both PCA-derived dietary patterns; when calculating dietary pattern scores, each of these food groups was assigned only to the pattern on which it loaded more highly. Moreover, we obtained sex-specific pattern scores by conducting the dietary pattern analysis separately for men and women. Dietary patterns were named according to the food groups showing high loadings.
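The PCA part of this procedure can be sketched in a few lines of NumPy. This is a minimal illustration on synthetic data, not a reproduction of the SAS analysis. It assumes that food-group densities are standardized before scoring, uses raw varimax without the row normalization that statistical packages may apply by default, fixes the number of factors at two instead of applying the eigenvalue and scree criteria, and interprets the handling of cross-loading food groups as assignment to the pattern with the larger absolute loading. All function names are hypothetical.

```python
import numpy as np


def energy_density(grams, kcal):
    """Convert g/day (n x groups) to g per 1000 kcal using each person's energy intake."""
    return grams / (kcal[:, None] / 1000.0)


def varimax(loadings, max_iter=100, tol=1e-6):
    """Orthogonal varimax rotation of a (groups x factors) loading matrix."""
    p, k = loadings.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        target = rotated ** 3 - rotated @ np.diag((rotated ** 2).sum(axis=0)) / p
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt
        new_criterion = s.sum()
        if criterion > 0 and new_criterion / criterion < 1 + tol:
            break
        criterion = new_criterion
    return loadings @ rotation


def pca_loadings(X, n_factors=2):
    """Standardize X, extract principal components of the correlation matrix, rotate."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    eigvals, eigvecs = np.linalg.eigh(np.corrcoef(Z, rowvar=False))
    order = np.argsort(eigvals)[::-1][:n_factors]
    unrotated = eigvecs[:, order] * np.sqrt(eigvals[order])
    return Z, eigvals[order], varimax(unrotated)


def pattern_scores(Z, loadings, threshold=0.20):
    """Loading-weighted sums over food groups with |loading| >= threshold.

    A food group passing the threshold on several patterns is kept only on the
    pattern where its absolute loading is largest (an interpretation of the text).
    """
    L = np.where(np.abs(loadings) >= threshold, loadings, 0.0)
    keep = np.abs(L) == np.abs(L).max(axis=1, keepdims=True)
    return Z @ np.where(keep, L, 0.0)


# Synthetic demonstration data (random numbers, not study data).
rng = np.random.default_rng(0)
n_people, n_groups = 500, 33
kcal = rng.uniform(1200, 3000, size=n_people)
grams = rng.gamma(shape=2.0, scale=30.0, size=(n_people, n_groups))
Z, eigvals, loadings = pca_loadings(energy_density(grams, kcal))
scores = pattern_scores(Z, loadings)
```

Because the rotation is orthogonal, the total variance captured by the two retained factors is unchanged; rotation only redistributes it so that each food group tends to load mainly on one pattern, which is what makes the patterns easier to name.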

Statistical analysis

The descriptive statistics are presented as the mean ± standard deviation for continuous variables and as numbers (percentages) for categorical variables. The generalized linear model and the chi-square test were used to compare the differences in the means and distributions of the general characteristics of the study participants, respectively. Dietary pattern scores were divided into quartiles based on their distribution among the control participants. The association between dietary pattern scores and CRC risk was assessed using unconditional logistic regression models to calculate odds ratios (ORs) and 95% confidence intervals (CIs). The lowest intake group (Q1) was used as the reference. The median value for each quartile category of the dietary pattern score was used as a continuous variable to test for trends in the regression model. The multivariable logistic regression model considered potential covariates such as age (continuous), BMI (continuous), first-degree family history of CRC (yes/no), marital status (married/others), education level (elementary school or less, middle school, high school, and college or more), monthly income (< 2, 2–4, ≥ 4 million won per month), occupation (professionals/administrative management/office jobs, sales/service, agriculture/manufacturing/mining/army service, and housekeeping/unemployment/others), smoking status (nonsmoker, former smoker, and current smoker), alcohol consumption (nondrinker, former drinker, and current drinker), and regular physical activity (yes/no). Stratified analysis based on anatomical subsites (proximal colon, distal colon, and rectal cancers) was conducted using multinomial logistic regression models. All analyses were performed using SAS version 9.4 (SAS Institute, Inc., Cary, NC, USA). Statistical significance was considered at P < 0.05.
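The quartile construction and the odds ratio calculation can be illustrated with the standard library alone. In this sketch the cut-points come from the control distribution, as described above, and the quartile medians are the values that would enter a trend test. The odds ratio function computes a crude (unadjusted) estimate with a Wald interval from a 2×2 table; it is a simplified stand-in for the multivariable unconditional logistic regression used in the study, and all function names are hypothetical.

```python
import math
from statistics import median, quantiles


def quartile_cutpoints(control_scores):
    """Three cut-points dividing the control participants' scores into quartiles."""
    return quantiles(control_scores, n=4)


def assign_quartile(score, cuts):
    """Quartile index (1 to 4) of a score, for cases and controls alike."""
    return 1 + sum(score > c for c in cuts)


def quartile_medians(scores, cuts):
    """Median score within each quartile, usable as a continuous trend variable."""
    by_quartile = {}
    for s in scores:
        by_quartile.setdefault(assign_quartile(s, cuts), []).append(s)
    return {q: median(v) for q, v in sorted(by_quartile.items())}


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Crude odds ratio and Wald 95% CI.

    a: cases in the quartile of interest, b: controls in that quartile,
    c: cases in the reference quartile (Q1), d: controls in Q1.
    """
    estimate = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(estimate) - z * se_log)
    upper = math.exp(math.log(estimate) + z * se_log)
    return estimate, lower, upper
```

Deriving the cut-points from controls only means each quartile holds about a quarter of the controls by construction, so a departure from 25% among cases in a given quartile is what drives the odds ratio away from 1.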

Results

General characteristics of the study population

The general characteristics of the study population are described in Table 1. The mean age was 58.1 ± 10.2 years in the CRC patient group and 57.6 ± 9.4 years in the control group. Among the overall population, CRC patients had higher rates of first-degree family history of CRC and married status, a greater proportion of former drinkers, and higher energy intake than control participants (P < 0.05). Moreover, participants in the CRC patient group had lower levels of education, monthly income, professional occupation status, and physical activity than those in the control group (P < 0.001). When stratified by sex, compared with control participants, male CRC patients had a significantly lower mean BMI and smoking rate, and female CRC patients had a significantly higher mean BMI (P < 0.05). The same trend was observed in the distribution of other characteristics for both men and women (P < 0.05).

Table 1 General characteristics of the study population.

Dietary patterns

The factor loadings of dietary patterns determined by the PCA and RRR methods are shown in Table 2. PCA identified two major dietary patterns among men and women: a prudent pattern and a Westernized pattern. The first dietary pattern (“prudent” pattern) derived from PCA was mainly characterized by a high consumption of green/yellow vegetables, condiments/seasonings, light-colored vegetables, tubers, seaweeds, fish, mushrooms, fruits, tofu/soymilk, other seafoods, kimchi, eggs, dairy products, nuts, pickled vegetables, and legumes for both men and women. Milk was also represented in the prudent dietary pattern among women. The second dietary pattern (“Westernized” pattern) derived from PCA was characterized by a high consumption of red meat, oil, sweets, noodles, processed meat, meat byproducts, poultry, carbonated beverages, bread/cake/pizza/hamburgers, seafood products, salted and fermented seafood, cereals, and snacks for both men and women. The factor loadings from RRR were similar to those of the prudent dietary pattern. As in the prudent dietary pattern, the RRR-derived dietary pattern (“healthy” pattern) was characterized by higher consumption of green/yellow vegetables, light-colored vegetables, fruits, eggs, and milk for both men and women. In addition, kimchi was represented in the healthy dietary pattern among men.

Table 2 Factor loading matrix for the 3 major patterns identified by factor analysis.

As expected, the percentage of variation explained by food groups or predictors was higher for the PCA-derived pattern in both men and women (men: 15.44% in PCA 1 vs. 12.67% in RRR-derived pattern; women: 16.80% in PCA 1 vs. 14.45% in RRR-derived pattern). In the RRR pattern, the explained variation in responses was 46.11% for men and 49.93% for women.
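Why this ordering is expected can be seen in a small NumPy sketch on synthetic data. The first RRR factor is the linear combination of food groups that explains the most variation in the responses, whereas the first principal component is the combination that explains the most variation in the food groups themselves. The sketch uses the textbook formulation of RRR (ordinary least squares fit of the responses followed by an eigen-decomposition of the fitted values), which is assumed here to correspond to the first factor of the procedure used in the study; explained variation is computed as the mean squared correlation between a score and each standardized variable. Function names and data are hypothetical.

```python
import numpy as np


def standardize(A):
    """Center each column and scale it to unit sample standard deviation."""
    return (A - A.mean(axis=0)) / A.std(axis=0, ddof=1)


def rrr_first_score(Xs, Ys):
    """First reduced rank regression factor score for standardized X and Y."""
    B, *_ = np.linalg.lstsq(Xs, Ys, rcond=None)  # regress each response on all food groups
    Y_hat = Xs @ B                               # part of the responses predictable from diet
    eigvals, eigvecs = np.linalg.eigh(Y_hat.T @ Y_hat)
    v = eigvecs[:, -1]                           # direction of largest fitted-response variation
    return Y_hat @ v                             # equals Xs @ (B @ v): a combination of food groups


def pca_first_score(Xs):
    """First principal component score of the correlation matrix of X."""
    eigvals, eigvecs = np.linalg.eigh(np.corrcoef(Xs, rowvar=False))
    return Xs @ eigvecs[:, -1]


def explained_variation(M, score):
    """Mean squared correlation between a score and each column of M."""
    r = np.array([np.corrcoef(M[:, j], score)[0, 1] for j in range(M.shape[1])])
    return float(np.mean(r ** 2))


# Synthetic data: 33 "food groups" and 4 "responses" (random numbers, not study data).
rng = np.random.default_rng(1)
n, p = 400, 33
X = rng.normal(size=(n, p))
W = rng.normal(size=(p, 4)) * (rng.random((p, 4)) < 0.3)
Y = X @ W + rng.normal(scale=2.0, size=(n, 4))

Xs, Ys = standardize(X), standardize(Y)
rrr, pca = rrr_first_score(Xs, Ys), pca_first_score(Xs)
predictors_explained = {"PCA": explained_variation(Xs, pca), "RRR": explained_variation(Xs, rrr)}
responses_explained = {"PCA": explained_variation(Ys, pca), "RRR": explained_variation(Ys, rrr)}
```

By construction, `predictors_explained["PCA"]` is at least as large as `predictors_explained["RRR"]`, and the reverse holds for `responses_explained`, which mirrors the pattern of percentages reported above.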

Associations between dietary patterns and CRC risk

The associations between dietary pattern scores and CRC risk are presented in Tables 3 and 4. In men, a significant inverse association between dietary pattern and risk of rectal cancer was found only for the healthy dietary pattern (ORQ4 vs. Q1 = 0.66, 95% CI 0.45–0.97, P for trend = 0.036) after adjustment for age, BMI, first-degree family history of CRC, marital status, education, monthly income, occupation, smoking status, alcohol consumption, and regular physical activity. In women, the risk of CRC tended to decrease for the highest quartile of prudent and healthy dietary patterns (prudent, ORQ4 vs. Q1 = 0.59, 95% CI 0.40–0.86, P for trend = 0.005; healthy, ORQ4 vs. Q1 = 0.62, 95% CI 0.43–0.89, P for trend = 0.007) after adjustment for confounding factors. According to an analysis stratified by anatomical subsite, a decreased risk of rectal cancer was observed for those in the highest quartile of the prudent dietary pattern (ORQ4 vs. Q1 = 0.31, 95% CI 0.15–0.65, P for trend < 0.001). A healthy dietary pattern was associated with a decreased risk of distal colon cancer (ORQ4 vs. Q1 = 0.58, 95% CI 0.35–0.97, P for trend = 0.025) and rectal cancer (ORQ4 vs. Q1 = 0.29, 95% CI 0.15–0.57, P for trend < 0.001). The Westernized dietary pattern was not significantly associated with the risk of CRC in either sex.
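Readers who wish to sanity-check reported estimates of this kind can use the fact that a Wald confidence interval is symmetric on the log scale, so the geometric mean of its limits should be close to the odds ratio, and its width gives the standard error of the log odds ratio. The sketch below applies this to the values quoted in the paragraph above; the labels are shorthand for this example, and the tolerance simply allows for rounding to two decimals.

```python
import math

# (label, OR, lower 95% limit, upper 95% limit) as quoted in the text.
reported = [
    ("healthy pattern, rectal cancer, men", 0.66, 0.45, 0.97),
    ("prudent pattern, CRC, women", 0.59, 0.40, 0.86),
    ("healthy pattern, CRC, women", 0.62, 0.43, 0.89),
    ("prudent pattern, rectal cancer", 0.31, 0.15, 0.65),
    ("healthy pattern, distal colon cancer", 0.58, 0.35, 0.97),
    ("healthy pattern, rectal cancer", 0.29, 0.15, 0.57),
]


def wald_summary(lower, upper, z=1.96):
    """Standard error of log(OR) and the log-scale midpoint implied by a 95% CI."""
    se_log = (math.log(upper) - math.log(lower)) / (2 * z)
    midpoint = math.sqrt(lower * upper)
    return se_log, midpoint


checks = []
for label, odds_ratio, lower, upper in reported:
    se_log, midpoint = wald_summary(lower, upper)
    # Consistent if the implied midpoint matches the OR up to rounding.
    checks.append((label, round(se_log, 3), round(midpoint, 3), abs(midpoint - odds_ratio) < 0.01))
```

Each entry of `checks` records the implied standard error, the implied midpoint, and whether the midpoint agrees with the reported odds ratio within rounding.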

Table 3 Colorectal cancer risk according to anatomical location and dietary intake of the identified dietary patterns (men).
Table 4 Colorectal cancer risk according to anatomical location and dietary intake of the identified dietary patterns (women).  Significant values are in bold.

Discussion

In the present study, we used both PCA and RRR to investigate the association between dietary patterns and CRC risk in Korean adults. Using PCA, two dietary patterns were derived for men and women: a prudent pattern and a Westernized pattern. The prudent dietary pattern was associated with a decreased risk of CRC after adjustment for confounding factors among women only. The Westernized dietary pattern was not significantly associated with CRC in either sex. Using RRR analysis, we identified a healthy dietary pattern that was similar to the prudent dietary pattern in both men and women. A significant inverse association was observed between the healthy pattern and rectal cancer risk in men. In women, the healthy dietary pattern was associated with a significantly lower risk of CRC overall, as well as of distal colon and rectal cancer.

Our findings showed that higher adherence to the prudent dietary pattern was associated with a lower CRC risk in women. The prudent dietary pattern was characterized by high intakes of green/yellow vegetables, condiments/seasonings, light-colored vegetables, tubers, seaweeds, fish, mushrooms, fruits, tofu/soymilk, other seafoods, kimchi, eggs, dairy products, nuts, pickled vegetables, milk, and legumes. One prospective cohort analysis from Alberta’s Tomorrow Project indicated that a PCA-derived, prudent dietary pattern was protective against combined cancer and colon cancer22. A healthy dietary pattern that was high in vegetables, fruits and fish was reported to be inversely associated with CRC risk among women in a European cohort study29. In contrast, the Dietary Patterns and Cancer Project found no association between a vegetable-based pattern and CRC risk30. Consistent with our results, a case‒control study in Korea showed that a prudent dietary pattern rich in fruit and dairy products was inversely associated with CRC risk12. Another case‒control study conducted in central and northeast Pennsylvania reported that higher scores on the fruits and vegetables pattern were associated with a reduced risk of CRC18.

The beneficial effects of the prudent dietary pattern on CRC prevention might be explained by numerous mechanisms. The prudent diet is rich in various plant-based foods, such as fruits and vegetables, which contain antioxidant vitamins, carotenoids, fiber, folic acid, and other phytochemical compounds31. Antioxidant micronutrients, including vitamin C, vitamin E, and carotenoids, trap free radicals and reactive oxygen species32. Specifically, carotenoids are efficient scavengers of reactive oxygen species and stimulate the immune system33. Vitamin C reduces nitrite, thus blocking the formation of nitrosamines and nitrosamides, which are known carcinogens that contribute to the induction of tumors in experimental animals and possibly in humans34. Several nitrosamides have been shown to induce CRC and adenomatous polyps in the large intestine of rats when applied directly to the colonic epithelium35. Dietary fiber may dilute carcinogens through increased stool bulk and decrease the contact time of carcinogens and toxins with the colonic epithelium due to reduced transit time36. Moreover, the decreased fecal pH caused by dietary fiber inhibits bacterial degradation of normal food constituents to potential carcinogens37. Fiber fermentation by fecal flora to short-chain fatty acids (SCFAs), such as acetate, propionate, and butyrate, is known to be the key factor in the suppression of colonic inflammation and carcinogenesis38. SCFAs have anticancer effects, including the promotion of apoptosis in cancer cells39 and the inhibition of chronic inflammatory processes and cancer cell migration/invasion in the colon40. Metabolites related to dietary intake may therefore both reflect the components of food and serve as markers of complex metabolic responses to dietary exposures. One study that developed metabolite profile scores correlated with dietary intake and examined the prospective associations of these scores with CRC risk showed results consistent with those of previous studies using dietary data only41. Future research could benefit from incorporating metabolomics to complement traditional dietary assessments when investigating the diet‒CRC association.

Several cohort studies have shown a protective effect of dairy product and milk intake on the risk of CRC42. A case‒control study conducted in China revealed that subjects in the highest quartile had a 68% and 48% lower risk of CRC than those in the lowest quartile of total dairy product and milk consumption, respectively43. There are several mechanisms by which dairy products and milk are related to a reduction in CRC risk. Dairy products include beneficial constituents, such as butyric acid, lactoferrin, and fermentation products44. Milk is a rich source of dietary calcium and vitamin D. Vitamin D may protect against CRC since it reduces epithelial cell proliferation and exerts anticancer effects45,46. Furthermore, the roles of calcium and vitamin D are closely linked because vitamin D is involved in the regulation of calcium bioavailability47.

In the present study, no associations were found between the Westernized dietary pattern and CRC risk in either men or women. In Western countries, most studies have shown a positive association between the Western dietary pattern and CRC48,49,50. On the other hand, a cohort study conducted in Japan reported that an animal food dietary pattern characterized by a high intake of various animal-derived foods, such as beef, pork, ham, sausage, poultry, liver, and butter, was not significantly associated with the risk of CRC51. In another cohort of Singaporean Chinese individuals, the meat-dim sum pattern, which was similar to the Western dietary pattern, was not associated with CRC52. The differences among studies may be partially attributed to differences between Western and Asian populations in the amounts and ranges of meat intake and in cooking methods, such as grilling or stir-frying53,54.

Our study showed that a healthy dietary pattern was inversely associated with CRC risk in women. A healthy dietary pattern was characterized by high consumption of green/yellow vegetables, light-colored vegetables, fruits, eggs, and milk. This pattern is somewhat similar to the prudent pattern. A previous study also used RRR analysis with nutrients as a response variable, and the dietary fiber pattern and discretionary fats pattern were shown to be protective against colon cancer22. Another study using RRR analysis showed that secondary fecal bile acid concentrations increased across the tertiles of dietary pattern, which was characterized by high intakes of processed meat, fried potatoes, bread, and margarine and low intakes of muesli, plant-based milk, vegetables, and fruit55. Previous research has indicated that an increased proportion of secondary bile acids in feces is present in patients with CRC56.

Sex differences were also observed in our study. Generally, women have been reported to engage in more health-promoting behaviors than men and to have healthier lifestyle patterns. In our prudent and healthy patterns, the loadings of plant-based foods, such as fruits and vegetables, were higher among women than among men57. Moreover, diet and dietary patterns play critical roles in obesity, which causes low-grade chronic inflammation58. Chronic inflammation is one of the major predisposing factors in cancer progression and is indicated by increased plasma levels of C-reactive protein (CRP), interleukin 6, and tumor necrosis factor-alpha. A meta-analysis revealed that prediagnostic circulating CRP concentrations were positively associated with the risk of CRC, and the association was stronger in men than in women59. These sex-dependent differences might partly explain why the reduced risk of CRC was more evident in women than in men.

Considering that dietary patterns generated by PCA reflect real-world dietary habits in a population and that dietary patterns generated by RRR are associated with the disease of interest, patterns derived from each method could be different. However, we identified two similar dietary patterns using PCA and RRR analysis. Moreover, compared to the dietary pattern determined by PCA, the RRR-derived pattern had a slightly stronger association with CRC in women. Similarly, several studies have shown that RRR patterns are more strongly associated with outcomes than PCA patterns because dietary patterns from the RRR method are driven by disease-associated response variables60,61,62. In contrast, a case‒control study investigating the associations between dietary patterns and bladder cancer showed stronger associations for PCA patterns63. Each of these methods has drawbacks. PCA cannot identify which dietary patterns are most predictive of a disease64. For RRR, the intermediate response variables are chosen on the basis of prior knowledge; relying solely on the selected response variables to derive dietary patterns may omit patterns related to nutrients that lie on the disease’s metabolic pathways but were not included among the response variables. Therefore, the PCA and RRR methods complement each other, and the application of both methods might be useful for assessing the similarity between actual dietary behavior and disease-associated patterns.

The main strength of the current study was the application of both PCA and RRR analysis to compare dietary patterns derived from those two methods in a relatively large population. Second, we conducted a risk-stratified analysis of the association between dietary patterns and distinct CRC locations for each sex. Third, we had comprehensive information on potential confounding variables based on a questionnaire administered by skilled interviewers. However, there are several limitations that need to be acknowledged when interpreting the results. Selection bias is an inherent potential limitation of case‒control studies. Control participants were recruited from those who enrolled in a health screening program. Thus, participants in the control group might have had healthier behaviors compared with individuals in the general population. In addition, there could be recall bias in reporting dietary consumption since dietary intake was assessed using the SQFFQ. However, the previously validated SQFFQ used in this study was designed to collect data regarding the usual dietary intake of the Korean population. Additionally, trained interviewers assisted participants with the structured questionnaire to minimize this recall bias. Finally, some subjective decisions were made regarding dietary pattern analyses, such as food grouping, labeling of patterns, and choosing intermediate response variables for the RRR, which might lead to the omission of relevant nutrients in the biological pathways involved in CRC.

In conclusion, we obtained two similar patterns using PCA and RRR analysis. Although both patterns were associated with a lower risk of CRC, the healthy dietary pattern showed a slightly stronger association in women. The concordant food groups in the prudent and healthy dietary patterns consisted of green/yellow vegetables, light-colored vegetables, fruits, eggs, and milk. In this particular study, the RRR-derived dietary pattern was strongly associated with CRC risk in the study population and might be more suitable for deriving the pattern that is associated with CRC. However, further investigations are needed in different populations and with different response variables and other disease outcomes.