Main

Food-liking is a complex trait that reflects the hedonic response to food for individuals1 and is considered to be the most influential factor driving food choices and intake2. With an abundance of food choices available worldwide, people naturally develop diverse dietary patterns. Recently, growing evidence has highlighted the profound impact of dietary patterns on health, including chronic medical diseases, such as cardiovascular disease3, type 2 diabetes4, metabolic syndrome5 and cancer6, as well as mental health and/or cognitive impairments7,8,9,10, such as major depressive disorder and anxiety. Understanding how diet preferences affect health, especially brain health, is critical for developing targeted dietary interventions to promote the consumption of nourishing foods and improve the landscape of brain health.

Previous evidence has demonstrated a strong link between diet and both cognitive functions and mental health. For example, a systematic review focusing on various dietary intake patterns and cognitive functions revealed associations such as increased consumption of simple carbohydrates (for example, sugars) being linked to decreased overall cognitive performance, while saturated fatty acids were associated with reduced memory and learning. Conversely, protein intake was found to potentially enhance executive function and working memory7. Furthermore, unhealthy diets have been implicated as a risk factor for a wide range of psychiatric disorders, including major depression disorders11,12,13, anxiety14, bipolar disorder15,16, stroke17, sleep problems18,19 and Alzheimer’s disease20. For instance, individuals with a ‘Western dietary pattern’ (who preferred high sweet and fatty food but not plant-based food) showed a higher incidence of depression11,12,13 relative to those following a balanced diet (including a balanced amount of vegetables, fruits, cereals, nuts, seeds, pulses, moderate dairy, eggs and fish)15,21,22,23.

The link between diet and both cognition and mental health might be related to alterations in molecular biomarkers as well as changes in brain structure and function. Nutrition research suggests that the relationship between dietary patterns and mental disorders (or cognitive functions) could be mediated by the gut–brain axis. Specific dietary patterns, such as the ‘Western diet’, have the potential to disrupt the balance of gut microbiota, leading to inflammation and oxidative stress, which can impair cognitive function and increase the risk of mental disorders24,25,26,27. Additionally, neuroimaging studies have revealed associations between dietary patterns and the function and structure of brain regions28,29,30,31, emphasizing the intricate relationship between diet and brain health. For instance, higher adherence to the ‘Mediterranean-type diet’ (characterized by high consumption of fruit, vegetables, legumes and cereals, with olive oil as the primary source of fat and a low intake of red meat and poultry), which is typically linked with a reduced risk of Parkinson’s disease and Alzheimer’s disease, was associated with a lower reduction of total brain volume over a 3 year period29, as well as with larger cortical thickness in key brain regions, such as the entorhinal cortex, posterior cingulate cortex, orbitofrontal cortex and inferior and middle temporal gyrus31. While previous research has established strong links between diet and various domains of brain health, the complex relationships and regulation mechanisms underlying different domains of brain health remain poorly understood.

Moreover, based on the quantities, variety or combination of different foods and beverages in diets and the frequency with which they are habitually consumed, several traditional dietary patterns have emerged32, such as the ‘Western dietary pattern’ and the ‘Mediterranean dietary pattern’, as described above, as well as the ‘prudent dietary pattern’ (characterized by a high intake of vegetables, fruit, legumes, whole grains and fish and other seafood)33 and the ‘vegetarian/plant-based dietary pattern’ (a dietary pattern that excludes meat, meat-derived foods and, to different extents, other animal products)34. While extensive research has explored the links between these dietary patterns and brain health, findings across studies are not consistently aligned. For example, some studies associated the vegetarian dietary pattern with higher depression and anxiety35,36, while others found the opposite effect37,38 or no effect39,40. This variation may be attributed to limited sample sizes and different scopes and criteria used for defining dietary patterns. For instance, differences may arise from considerations such as whether individuals consuming dairy products are categorized within the ‘vegetarian/plant-based dietary pattern’34,41. In addition, these studies tend to focus on specific populations adhering to a single dietary pattern, leaving a critical gap in understanding the relationship between dietary patterns and brain health in other populations. Thus, a universally recognized and reliable dietary pattern classification system within a large-scale population is warranted.

In this Article, to narrow these gaps, we leverage the large-scale dataset from the UK Biobank and employ data-driven approaches to identify naturally developed dietary patterns and their associations with cognitive function, mental health, blood and metabolic biomarkers, brain imaging and genomics. Specifically, we first utilized food-liking data from the UK Biobank participants and applied principal component analysis (PCA) and hierarchical clustering techniques to develop subtypes of food-liking. Second, through one-way analysis of covariance (ANCOVA), we assessed differences in various brain health domains among these subtypes, including mental health, cognitive functions, blood and metabolism biomarkers, and brain magnetic resonance imaging (MRI) traits. We also examined differences among these subtypes by analyzing longitudinal data on mental disorders via Cox proportional hazards models. Third, structural equation models (SEMs) were employed to explore the relationships between dietary patterns and different aspects of brain health. Fourth, genome-wide association analysis (GWAS) and gene expression and enrichment analysis were conducted to investigate the genetic underpinnings of distinct subtypes of food-liking and potential biological pathways. This study represents a pioneering large-scale exploration of food preferences and their comprehensive associations with brain health. By exploring these connections, our research lays the groundwork for further investigations and potential interventions that can significantly impact human health on a global scale.

Results

Distinct food-liking profiles of the four subtypes

A total of 181,990 participants (mean age 70.7 ± 7.7 years and 57.08% female) from the UK Biobank were included in the identification of food-liking subtypes. Supplementary Fig. 1 provides a general schema of the current study. First, 140 food and beverage items were classified into ten food categories, and PCA was performed separately for each category. Using this approach, we obtained a total of 83 principal components, which were used as input for hierarchical clustering. The dendrogram of the clustering results showed that participants could be grouped into four distinct food-liking subtypes (Fig. 1a), with proportions of 18.09%, 5.54%, 19.39% and 56.98% for subtypes 1 to 4, respectively. The demographic characteristics of the four subtypes are summarized in Supplementary Table 3.
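The clustering step described above can be illustrated with a minimal sketch. It assumes Euclidean distance and average linkage, neither of which is stated in this section, and it runs on a toy two-dimensional dataset rather than the 83 principal components; the function name `agglomerative` and the data are hypothetical.

```python
import math

def agglomerative(points, n_clusters):
    """Naive average-linkage agglomerative clustering (illustrative only).

    Starts with every point in its own cluster and repeatedly merges the
    two clusters with the smallest mean pairwise Euclidean distance until
    n_clusters remain. Returns one integer label per point.
    """
    clusters = [[i] for i in range(len(points))]

    def linkage(c1, c2):
        total = sum(math.dist(points[i], points[j]) for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = linkage(clusters[a], clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # merge b into a
        del clusters[b]                          # b > a, so index a is unaffected

    labels = [None] * len(points)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels

# Hypothetical component scores for eight participants (two per group).
toy_scores = [(0, 0), (0, 1), (10, 0), (10, 1), (0, 10), (1, 10), (10, 10), (10, 11)]
labels = agglomerative(toy_scores, 4)
print(labels)
```

Cutting the merge sequence at four clusters corresponds to the red dashed line on the dendrogram in Fig. 1a. At the scale of the study, an optimized library implementation would be needed instead of this quadratic-memory loop.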

Fig. 1: Food-liking profiles of the four subtypes.

a, The dendrogram resulting from the hierarchical clustering of food preference data from 181,990 participants, revealing four distinct subtypes. The red dashed line indicates the delineation of four subtypes. b, The radar chart depicting the preference scores of the ten food categories for each subtype. c, Comparisons between food-liking and food-consumption traits using relative scores of the four subtypes. The selected foods cover a range of categories analyzed in this study. The food-liking measures are shown to be closely related to food consumption. The four identified subtypes include subtype 1, ‘starch-free or reduced-starch dietary pattern’, subtype 2, ‘vegetarian dietary pattern’, subtype 3, ‘high protein and low fiber dietary pattern’ and subtype 4 ‘balanced dietary pattern’.

To characterize the food preferences of the four subtypes, we generated a radar chart to visualize the liking scores of the ten food categories for the four subtypes (Fig. 1b). Subtype 1 showed a higher preference for fruits, vegetables and protein foods but a lower preference for starches, which is consistent with a ‘starch-free or reduced-starch dietary pattern’. Subtype 2 displayed a stronger preference for fruits and vegetables, while showing a lower preference for protein foods, which is similar to a ‘vegetarian dietary pattern’. Subtype 3 exhibited a greater preference for snacks and protein foods but a lower preference for fruits and vegetables, resembling the ‘high protein and low fiber dietary pattern’. Finally, subtype 4 showed balanced preferences across all food categories, which can be regarded as a ‘balanced dietary pattern’. To further validate the suitability of clustering into four subtypes, we utilized the silhouette criterion42 to determine the optimal number of clusters. Our analysis encompassed cluster numbers ranging from two to seven, as visualized in Supplementary Fig. 2a. The results indicated that the most suitable numbers of food-liking subtypes did not exceed four. In addition to the four subtypes shown in Fig. 1b, we also examined a radar chart depicting three subtypes (Supplementary Fig. 2b). It is noteworthy that one of these three subtypes is a combination of two of the four subtypes (subtype 1 and subtype 2) displayed in Fig. 1b, while the other two subtypes closely resemble two subtypes from Fig. 1b. Furthermore, the radar chart of the four subtypes (Fig. 1b) exhibited distinct food-liking characteristics, indicating an intriguing and meaningful dimension to our exploration of dietary patterns within a large population.
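As a rough illustration of the silhouette criterion mentioned above, the following sketch computes the mean silhouette width for a given labeling. It assumes Euclidean distance; the function name `silhouette_score` and the toy points are hypothetical and do not come from the study data.

```python
import math

def silhouette_score(points, labels):
    """Mean silhouette width over all points (Euclidean distance)."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    def mean_dist(i, members):
        others = [j for j in members if j != i]
        return sum(math.dist(points[i], points[j]) for j in others) / len(others)

    widths = []
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            widths.append(0.0)  # usual convention for singleton clusters
            continue
        a = mean_dist(i, own)  # cohesion: mean distance within own cluster
        b = min(mean_dist(i, members)  # separation: nearest other cluster
                for other, members in clusters.items() if other != lab)
        widths.append((b - a) / max(a, b))
    return sum(widths) / len(widths)

# Two well-separated toy groups: a sensible labeling scores higher
# than an arbitrary one.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
good = silhouette_score(points, [0, 0, 0, 1, 1, 1])
bad = silhouette_score(points, [0, 1, 0, 1, 0, 1])
print(good, bad)
```

Repeating this calculation for each candidate number of clusters (two to seven in the analysis above) and comparing the mean widths is one common way to choose the number of clusters.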

Additionally, we assessed the robustness of our findings in the context of data imputation by utilizing nonimputed data from 72,419 participants for the identification of food-liking subtypes. The radar chart depicting the four subtypes identified among the 72,419 participants without imputation closely mirrored the one generated from the imputed data of 181,990 participants (Supplementary Fig. 3a). This consistency indicated the robustness of our findings with data imputation. Moreover, the food preference characteristics of the four subtypes, as determined using PCA with explained variance ratios of 70% and 90% (Supplementary Fig. 3b,c), both exhibited a strong resemblance to the subtypes identified at a variance ratio of 80% (Fig. 1b). This finding indicated that the results were robust to the choice of explained variance threshold and supported the reliability of the hierarchical clustering results based on PCA components.
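The role of the explained variance ratio can be sketched as follows: for each food category, one keeps the smallest number of principal components whose cumulative variance reaches the chosen threshold. The function name `n_components_for` and the eigenvalues below are hypothetical and serve only to show how the 70%, 80% and 90% thresholds change the number of retained components.

```python
def n_components_for(eigenvalues, target):
    """Smallest number of components whose cumulative explained
    variance ratio reaches `target` (a fraction between 0 and 1)."""
    total = sum(eigenvalues)
    running = 0.0
    for k, ev in enumerate(sorted(eigenvalues, reverse=True), start=1):
        running += ev
        if running / total >= target:
            return k
    return len(eigenvalues)

# Hypothetical eigenvalues for one food category (not study data).
eigs = [4.0, 2.0, 1.5, 1.0, 0.8, 0.4, 0.3]
k70 = n_components_for(eigs, 0.70)
k80 = n_components_for(eigs, 0.80)
k90 = n_components_for(eigs, 0.90)
print(k70, k80, k90)
```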

To investigate the potential corresponding relationship between food liking and food consumption, we further calculated the average scores for specific food traits within each subtype, covering various food items. These food traits encompassed a variety of categories, including vegetables, fruits, several types of meat and alcohol, as well as cereal and bread. The results revealed a consistent alignment between the relative scores of the food-liking and food-consumption traits across all four subtypes. This congruence in scores indicated a robust relationship between individual preferences for certain foods and their actual consumption patterns (Fig. 1c).

Subtype-specific mental health and cognitive function

Before one-way ANCOVA analyses, we conducted Levene’s tests to confirm that the data satisfied the assumption of the equality of variances (P > 0.05). After adjusting for covariates and applying the Bonferroni correction, one-way ANCOVAs with the factor of subtype revealed significant main effects on seven mental health measures, including anxiety symptoms (F = 41.5 and P = 8.9 × 10−27), depressive symptoms (F = 71.4 and P = 3.9 × 10−46), mental distress (F = 62.1 and P = 4.0 × 10−40), psychotic experience (F = 17.4 and P = 2.6 × 10−11), self-harm (F = 116.8 and P = 1.6 × 10−75), trauma (F = 155.4 and P = 1.6 × 10−100) and well-being (F = 256.8 and P = 3.8 × 10−166). Figure 2a depicts the subtype-specific patterns of mental health measures. By visual inspection, subtype 4 scored the lowest in most mental health measures and the highest in well-being, indicating better mental health conditions. Subtype 2 and subtype 3 had relatively higher scores in some mental health measures, such as anxiety and depressive symptoms, and a relatively lower level of well-being (Fig. 2a).

Fig. 2: Subtype-specific patterns of mental health measures, cognitive function and mental disorder risk.

a, The phenotypic differences in mental health (for all mental health symptoms, n = 118,616) and cognitive function, including the symbol–digit substitution test (n = 93,325), fluid intelligence (n = 96,742), pairs matching (n = 93,394), reaction time (n = 179,740) and trail making (n = 82,375), among the four subtypes: subtype 1 (S1), ‘starch-free or reduced-starch dietary pattern’, subtype 2 (S2), ‘vegetarian dietary pattern’, subtype 3 (S3), ‘high protein and low fiber dietary pattern’ and subtype 4 (S4), ‘balanced dietary pattern’. These differences were determined by ANCOVA analyses after Bonferroni correction (α = 0.05). The analysis controlled for several covariates, including age, BMI, education qualifications and Townsend deprivation index. For reaction time and trail making, a lower score indicates better cognitive function. For the other cognitive tests, a higher score indicates better cognitive performance. The data are presented using a violin plot (median point; upper and lower quartiles). b, Forest plots depicting Cox proportional hazards models for the risks of mental disorders, including anxiety (n = 152,014), depression (n = 145,350), eating disorder (n = 165,158) and stroke (n = 152,730), with subtype 4 as the reference group. The results are presented using HRs and their corresponding 95% CI. The significance of coefficients in the Cox models was evaluated using the Wald test (two-tailed P value). The analyses were adjusted for confounding factors, including sex, age, BMI, education qualifications and Townsend index, with FDR corrections for multiple comparisons (adjusted P value <0.05). The horizontal gray dashed line represents no effect (i.e., hazard ratio = 1). The gray dot represents S4 as the reference in the Cox models.

In addition, similar analyses were conducted for cognitive function, revealing significant main effects of subtype on four measures, including fluid intelligence (F = 15.0 and P = 8.9 × 10−10), pairs matching (F = 6.6 and P = 2.0 × 10−4), reaction time (F = 20.1 and P = 5.2 × 10−13) and symbol–digit substitution (F = 18.6 and P = 4.9 × 10−12). Specifically, subtype 4 had the second-highest number of correct symbol–digit matches and the lowest reaction time. Subtype 3 showed the highest number of correct symbol–digit matches and the second-lowest reaction time (Fig. 2a).

To further investigate the differences among four subtypes in the risks of mental disorders, we employed Cox proportional hazards regression models, with subtype 4 as the reference group. The Cox model results showed significant differences in the risks of four mental disorders among the four subtypes after false discovery rate (FDR) corrections (adjusted P value <0.05), particularly in anxiety, depression, eating disorder and stroke (Fig. 2b). The P values for Schoenfeld’s global test of the Cox models for anxiety and depression were both 0.2, indicating that the proportional hazards assumption was met. The Cox model for stroke satisfied the proportional hazards assumption after stratification by age (68 years). Additionally, the Cox model for eating disorder, when stratified by body mass index (BMI, ≥25 kg m−2), also met the proportional hazards assumption. Specifically, when compared with subtype 4, both subtype 2 and subtype 3 exhibited a higher risk for depression, with hazard ratios (HRs) of 1.18 (95% confidence interval (CI), 1.04–1.33 and adjusted P = 0.03) and 1.22 (95% CI 1.13–1.30 and adjusted P = 3.8 × 10−7), respectively. However, no significant difference in HRs for this mental disorder was found between subtype 1 and subtype 4. Furthermore, subtype 1 and subtype 3 displayed higher risks than subtype 4 for stroke, with HRs of 1.13 (95% CI 1.03–1.24 and adjusted P = 0.03) and 1.21 (95% CI 1.11–1.31 and adjusted P = 2.3 × 10−5), respectively. However, no significant difference in HRs for this mental disorder was observed between subtype 2 and subtype 4. 
Additionally, subtypes 1, 2 and 3 all exhibited higher risks than subtype 4 for anxiety (subtype 1, HR 1.09, 95% CI 1.0–1.17 and adjusted P = 0.03; subtype 2, HR 1.26, 95% CI 1.14–1.41 and adjusted P = 6.2 × 10−5; subtype 3, HR 1.23, 95% CI 1.15–1.31 and adjusted P = 3.2 × 10−9) and eating disorder, with subtype 2 showing a particularly elevated risk (subtype 1, HR 1.86, 95% CI 1.45–2.38 and adjusted P = 9.1 × 10−6; subtype 2, HR 2.68, 95% CI 2.00–3.58 and adjusted P = 9.7 × 10−10; subtype 3, HR 1.96, 95% CI 1.48–2.59 and adjusted P = 2.0 × 10−5).
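To make the relationship between a reported HR, its 95% CI and the Wald test concrete, the sketch below recovers an approximate log hazard ratio, standard error and two-tailed P value from the published numbers. It assumes a symmetric CI on the log scale and a normal reference distribution; the function name `wald_from_ci` is hypothetical. Because the published CIs are rounded and the reported P values are FDR-adjusted, the recomputed values are only approximate and are not expected to match the text exactly.

```python
import math

def wald_from_ci(hr, lower, upper, z_crit=1.959964):
    """Approximate log(HR), its standard error, the Wald z statistic and
    the unadjusted two-tailed P value from an HR and its 95% CI."""
    log_hr = math.log(hr)
    # A 95% CI on the log scale spans 2 * z_crit standard errors.
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = log_hr / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-tailed normal tail area
    return log_hr, se, z, p

# Depression, subtype 3 versus subtype 4: HR 1.22 (95% CI 1.13-1.30).
log_hr, se, z, p = wald_from_ci(1.22, 1.13, 1.30)
print(round(log_hr, 3), round(se, 3), round(z, 2), p)
```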

Distinctive blood and metabolic biomarker across subtypes

The one-way ANCOVA analyses revealed that 167 of 229 blood and metabolomic biomarkers (32 blood biomarkers and 135 metabolomic biomarkers) were significantly different among the four subtypes after Bonferroni correction (P < 0.05/229) (Fig. 3a). The data satisfied the assumption of the equality of variances and the results were adjusted for covariates. The top 10% of significant biomarkers included the following categories: fatty acids (for example, docosahexaenoic acid (F = 312.5 and P = 1.1 × 10−200), omega-3 fatty acids (F = 232.4 and P = 1.4 × 10−149) and omega-6 fatty acids (F = 18.4 and P = 6.1 × 10−12)), amino acids (for example, glycine (F = 122.1 and P = 1.0 × 10−80)), renal function (for example, urea (F = 114.8 and P = 5.2 × 10−74)), cholesteryl esters (for example, cholesteryl esters in large high-density lipoprotein (HDL) (F = 76.1 and P = 4.1 × 10−49)), cholesterol (for example, cholesterol in large HDL (F = 75.5 and P = 1.0 × 10−48)), lipoprotein particle concentrations (for example, concentration of large HDL particles (F = 73.5 and P = 2.0 × 10−47)), free cholesterol (for example, free cholesterol in HDL (F = 72.9 and P = 4.9 × 10−47)), total lipids (for example, total lipids in large HDL (F = 72.2 and P = 1.5 × 10−46)) and phospholipids (for example, phospholipids in large HDL (F = 70.1 and P = 3.4 × 10−45)).
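The Bonferroni threshold used above (P < 0.05/229) amounts to dividing the family-wise α by the number of tests. A minimal sketch, with hypothetical helper names `bonferroni_threshold` and `is_significant`:

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests

def is_significant(p, alpha, n_tests):
    """True if p falls below the Bonferroni-corrected threshold."""
    return p < bonferroni_threshold(alpha, n_tests)

threshold = bonferroni_threshold(0.05, 229)  # roughly 2.2e-4
# The omega-6 fatty acid P value reported above passes the threshold.
omega6_passes = is_significant(6.1e-12, 0.05, 229)
# A hypothetical P value of 0.001 would not.
hypothetical_passes = is_significant(1e-3, 0.05, 229)
print(threshold, omega6_passes, hypothetical_passes)
```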

Fig. 3: Differences in blood and metabolic biomarkers, as well as brain morphology and white matter integrity across the four subtypes.

a, Manhattan plot of the one-way ANCOVA analyses (F-tests) for 24 categories of blood and metabolic biomarkers. The height of each point represents the negative logarithm of the P value of the F-test, with the color bar indicating the different biomarker categories. The black dashed line represents the Bonferroni threshold for multiple comparisons (α = 0.05), and the top 15% of biomarkers that exhibited significant differences after Bonferroni correction (P < 0.05/229) are labeled using text annotations. The analysis was adjusted for covariates, including age, BMI, education qualifications and Townsend deprivation index. b, Brain regions that significantly differ in GMV, mean FA and MD among the four subtypes identified in the one-way ANCOVA analyses (F-tests). Multiple comparisons were corrected using FDR correction (adjusted P value <0.05). The analyses were adjusted for age, BMI, education qualifications, Townsend deprivation index, scanning sites and intracranial volume (the latter only for the analysis of GMV). The results of post hoc tests (two-tailed t-tests) on GMV, mean FA, and mean MD comparing subtype 1 (S1, ‘starch-free or reduced-starch dietary pattern’), subtype 2 (S2, ‘vegetarian dietary pattern’) and subtype 3 (S3, ‘high protein and low fiber dietary pattern’) against subtype 4 (S4, ‘balanced dietary pattern’), with FDR correction for multiple comparisons (adjusted P value <0.05). The same covariates were regressed out in the post hoc tests as in the ANCOVAs.

Post hoc analysis further revealed that 127 of 167 blood and metabolomic biomarkers were significantly different between subtype 3 and subtype 4 after Bonferroni correction (P < 0.05/(167 × 3)) (Supplementary Fig. 4c), with most of them being lower in subtype 3. The top 10% of significant biomarkers included fatty acids (for example, docosahexaenoic acid (t = −25.7, Cohen’s d = −0.3 and P = 3.7 × 10−144) and omega-3 fatty acids (t = −21.3, Cohen’s d = −0.3 and P = 6.4 × 10−100)), cholesteryl esters (for example, cholesteryl esters in HDL (t = −14.2, Cohen’s d = −0.2 and P = 6.2 × 10−46)) and cholesterol (for example, HDL cholesterol (t = −14.3, Cohen’s d = −0.2 and P = 1.7 × 10−46)).
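For readers less familiar with the effect size reported alongside each t statistic, the sketch below computes Cohen’s d using the pooled standard deviation. This is the textbook two-group definition and is an assumption here; the study’s post hoc tests were covariate-adjusted, so its exact computation may differ. The function name `cohens_d` and the toy values are hypothetical.

```python
import statistics

def cohens_d(group_a, group_b):
    """Cohen's d for two independent groups using the pooled SD.
    A negative value means group_a has the lower mean."""
    na, nb = len(group_a), len(group_b)
    va = statistics.variance(group_a)  # sample variance (n - 1 denominator)
    vb = statistics.variance(group_b)
    pooled_sd = (((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)) ** 0.5
    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd

# Toy biomarker values (not study data).
subtype_a = [1.0, 2.0, 3.0, 4.0, 5.0]
subtype_b = [2.0, 3.0, 4.0, 5.0, 6.0]
d = cohens_d(subtype_a, subtype_b)
print(round(d, 3))
```

Values of around 0.2 to 0.4 in magnitude, as reported above, are conventionally regarded as small effects, which is compatible with very small P values when sample sizes are as large as those in this study.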

Compared with subtype 4, subtype 1 also differed significantly in 49 blood and metabolomic biomarkers (Supplementary Fig. 4a), with most of them being higher in subtype 1, such as fatty acids (for example, degree of unsaturation (t = 8.7, Cohen’s d = 0.1 and P = 4.0 × 10−18) and docosahexaenoic acid (t = 6.3, Cohen’s d = 0.1 and P = 3.8 × 10−10)). Some biomarkers showed lower levels in subtype 1, such as phospholipids (for example, phospholipids in small HDL (t = −5.8, Cohen’s d = −0.1 and P = 8.9 × 10−9)), fatty acids (for example, saturated fatty acid (t = −5.3, Cohen’s d = −0.1 and P = 9.9 × 10−8)) and total lipids (for example, total lipids in small HDL (t = −5.5, Cohen’s d = −0.1 and P = 3.2 × 10−8)).

The results of comparing subtype 2 with subtype 4 differed slightly from those of comparing subtype 3 (or subtype 1) with subtype 4 (Supplementary Fig. 4b). After Bonferroni correction, 72 of the 167 blood and metabolomic biomarkers were found to be significantly different between subtype 2 and subtype 4 (P < 0.05/(167 × 3)). The top 10% of significant biomarkers were renal function (for example, urea (t = −18.7, Cohen’s d = −0.4 and P = 2.5 × 10−77)), amino acids (for example, glycine (t = 18.9, Cohen’s d = 0.4 and P = 3.1 × 10−79)) and fatty acids (for example, omega-3 fatty acids (t = −14.4, Cohen’s d = −0.3 and P = 8.7 × 10−47)). For complete statistical results of the analyses of blood and metabolic biomarkers, please refer to Supplementary Tables 6 and 7.

Differences in neuroimaging phenotypes across subtypes

In the one-way ANCOVA analyses, the gray matter volume (GMV) of 23 of the 94 brain regions of the AAL2 atlas differed significantly among the four subtypes after applying FDR correction for multiple comparisons (adjusted P value <0.05) (Fig. 3b). The assumption of the equality of variances was satisfied for conducting one-way ANCOVAs (P > 0.05). These regions included the postcentral gyrus, caudate, putamen and parahippocampal gyrus, among others. Post hoc analysis further revealed significant differences in 16 of the 23 brain regions between subtype 3 and subtype 4 after FDR correction (adjusted P value <0.05), among which 11 brain regions showed significantly lower GMV in subtype 3, such as the postcentral gyrus, parahippocampal gyrus and inferior parietal gyrus (Fig. 3b). Additionally, subtype 1 showed significant differences compared with subtype 4 in seven regions, including the putamen, caudate, pallidum and paracentral lobule (Fig. 3b). Only four brain regions (including the thalamus, precuneus and paracentral lobule) were found to be significantly different between subtype 2 and subtype 4 after FDR correction (adjusted P value <0.05), all of which showed higher GMV in subtype 2 (Fig. 3b). The complete statistical results for the GMV analyses are provided in Supplementary Tables 8 and 9.
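The FDR correction applied across brain regions can be sketched as follows. This section does not name the FDR procedure used for the imaging analyses, so the Benjamini–Hochberg step-up procedure is assumed here; the function name `bh_adjust` and the P values are hypothetical.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted P values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending P
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical unadjusted P values for five regions (not study data).
raw = [0.001, 0.008, 0.039, 0.041, 0.20]
adj = bh_adjust(raw)
n_significant = sum(1 for a in adj if a < 0.05)
print(adj, n_significant)
```

In this toy example, two of the five regions remain significant at an adjusted P value below 0.05, whereas four would have passed an uncorrected 0.05 threshold.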

Moreover, we performed analogous analyses on the diffusion tensor imaging measures of fractional anisotropy (FA) and mean diffusivity (MD) for the 48 white matter tracts within the Johns Hopkins University (JHU) ICBM-DTI-81 white-matter labels atlas. For the FA measurements, eight regions of interest (ROIs) differed significantly across the four subtypes after FDR correction (adjusted P value <0.05) (Fig. 3b). These ROIs included the medial lemniscus, uncinate fasciculus and external capsule, among others. Post hoc analyses revealed significant differences in FA measures of seven ROIs between subtype 3 and subtype 4 after FDR correction (adjusted P value <0.05), with all these ROIs exhibiting lower FA values in subtype 3. Additionally, the cingulum hippocampus was the only region that showed significant differences between subtype 1 and subtype 4, with higher FA values in subtype 1. The uncinate fasciculus was the only region that exhibited significant differences between subtype 2 and subtype 4, with higher FA values in subtype 2 (Fig. 3b).

In terms of the MD measures, we found that 11 ROIs were significantly different across the four subtypes, including the external capsule, anterior limb of the internal capsule and superior fronto-occipital fasciculus, among others (Fig. 3b). Post hoc comparison of subtype 3 with subtype 4 identified significant differences in the same ROIs after FDR correction (adjusted P value <0.05), with higher MD values in all of these ROIs in subtype 3. Also, three ROIs showed significant differences between subtype 1 and subtype 4, namely the superior fronto-occipital fasciculus, anterior limb of the internal capsule and external capsule (Fig. 3b), with higher MD values in subtype 1. Only the cerebral peduncle was found to be significantly different between subtype 2 and subtype 4, with higher MD values in subtype 2. Complete statistical results for the ANCOVA analyses and post hoc tests of FA and MD measures can be found in Supplementary Tables 10–13.

Polygenic risk scores (PRSs) for mental disorders across subtypes

After adjusting for covariates and applying the Bonferroni correction (P < 0.05/8), one-way ANCOVA analyses on eight PRSs of mental disorders revealed significant main effects across the four subtypes (Fig. 5d), including PRSs for Alzheimer’s disease (F = 8.2 and P = 1.9 × 10−5), ischemic stroke (F = 8.3 and P = 1.7 × 10−5), Parkinson’s disease (F = 6.7 and P = 1.7 × 10−4), cardiovascular disease (F = 6.0 and P = 4.2 × 10−4), bipolar disorder (F = 34.1 and P = 4.8 × 10−22), schizophrenia (F = 72.4 and P = 8.7 × 10−47), depression (F = 11.5 and P = 1.5 × 10−7) and suicide attempt (F = 6.4 and P = 2.5 × 10−4). Levene’s tests confirmed that the data satisfied the assumption of the equality of variances (P > 0.05). Specifically, subtype 2 showed a higher genetic predisposition for several mental disorders, including Alzheimer’s disease, Parkinson’s disease, bipolar disorder, schizophrenia and suicide attempts, than other subtypes, mirroring the comparisons on mental health measures (‘Subtype-specific mental health and cognitive function’ section). In addition, subtype 3 presented a high genetic susceptibility to ischemic stroke, whereas subtype 4 showed relatively lower genetic risks for most mental disorders, which was consistent with the results on mental health measures (‘Subtype-specific mental health and cognitive function’ section).

Complex interplay of food preferences and other phenotypes

To examine the complex relationships among food preference, mental health, cognitive function and brain MRI features, we constructed three SEMs with subtype 4 as the reference group. In the model that compared the food preference of subtype 3 with that of subtype 4, we selected those latent variables that were significantly different between subtype 3 and subtype 4. Specifically, the mental health measures encompassed anxiety symptoms (β = 0.63 and P < 0.001), depressive symptoms (β = 0.73 and P < 0.001), self-harm (β = 0.54 and P < 0.001), trauma (β = 0.55 and P < 0.001) and well-being (β = −0.57 and P < 0.001). The cognitive function was characterized by fluid intelligence, reaction time and symbol–digit substitution (β = 0.38 and −0.20 and 0.67, respectively; P < 0.001). The brain MRI traits included the GMV of the top ten brain regions, the mean FA of all seven white matter tracts and the mean MD of the top ten white matter tracts. Figure 4c depicts the directional association results. The food preference was significantly associated with mental health measurements (β = 0.052 and Padj = 5.5 × 10−6), brain MRI traits (β = −0.037 and Padj = 4.6 × 10−4) and cognitive function (β = 0.077 and Padj = 3.5 × 10−8). Additionally, brain MRI traits and mental health were significant predictors for cognitive functions (β = 0.098 and Padj = 9.2 × 10−11 and β = −0.117 and Padj = 1.1 × 10−12, respectively). The brain MRI traits significantly predicted mental health (β = −0.058 and Padj = 1.5 × 10−6). The root mean square error of approximation (RMSEA) of this SEM model was 0.1.

Fig. 4: Directional associations among food preference, mental health, cognitive function and brain MRI trait.

a, The results of a structural equation model comparing subtype 1 (S1, ‘starch-free or reduced-starch dietary pattern’) to subtype 4 (S4, ‘balanced dietary pattern’). The analysis revealed that food preference was significantly associated with mental health (β = 0.029 and Padj = 0.009) and brain MRI trait (β = 0.041 and Padj = 6.1 × 10−5). Mental health significantly predicted cognitive function (β = −0.061 and Padj = 0.03). b,c, The structural equation model for subtype 2 (S2, ‘vegetarian dietary pattern’) (b) and subtype 3 (S3, ‘high protein and low fiber dietary pattern’) (c) versus subtype 4, respectively. All associations were in the expected direction, and all paths in the model for subtype 3 versus subtype 4 were significant. Wald tests were utilized to derive the two-sided P values of the standardized coefficients adjusted for multiple comparisons (FDR correction). The significance levels of the standardized coefficients are indicated by *Padj <0.05, **Padj <0.01 and ***Padj <0.001. n.s., not significant.

The SEM comparing subtype 1 with subtype 4 (Fig. 4a) showed that food preference was significantly associated with mental health (β = 0.029 and Padj = 0.009) and brain MRI trait (β = 0.041 and Padj = 6.1 × 10−5). Mental health was also a significant predictor for cognitive function (β = −0.061 and Padj = 0.03). The RMSEA of this model was 0.1. In addition, the SEM comparing subtype 2 with subtype 4 showed that food preference was significantly associated with mental health (β = 0.084 and Padj = 3.7 × 10−12) and brain MRI trait (β = 0.036 and Padj = 0.002). The brain MRI traits significantly predicted mental health (β = −0.027 and Padj = 0.03). Both brain MRI trait and mental health significantly predicted cognitive function (β = 0.094 and Padj = 1.7 × 10−8 and β = −0.115 and Padj = 1.5 × 10−10, respectively). The RMSEA of this model was 0.05, which indicated a good fit (Fig. 4b).
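As a reminder of what the RMSEA values above summarize, the sketch below implements one common definition of the index from a model chi-square statistic, its degrees of freedom and the sample size. Some software uses N rather than N − 1 in the denominator, and this section does not report the chi-square values, so the function name `rmsea` and all inputs below are hypothetical.

```python
import math

def rmsea(chi_square, df, n):
    """Root mean square error of approximation:
    sqrt(max(chi_square - df, 0) / (df * (n - 1)))."""
    if df <= 0 or n <= 1:
        raise ValueError("df must be positive and n must exceed 1")
    return math.sqrt(max(chi_square - df, 0.0) / (df * (n - 1)))

# Hypothetical fit statistics (not taken from the study).
example = rmsea(chi_square=300.0, df=100, n=1001)
# When the chi-square does not exceed its degrees of freedom, RMSEA is 0.
perfect = rmsea(chi_square=80.0, df=100, n=1001)
print(round(example, 4), perfect)
```

By common convention, values at or below about 0.05 are read as a good fit, which is the interpretation applied to the subtype 2 model above.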

All observed associations in these three path models were in the expected direction, with most paths being significant after FDR correction. The loadings of the latent variables in these SEM models can be found in Supplementary Table 14.

GWAS for four subtypes

To explore the genetic underpinnings of distinct subtypes of food-liking, we performed three case–control GWAS analyses, with subtype 4 as the reference group. As depicted in Fig. 5a, the GWAS identified 1,266 single-nucleotide polymorphisms (SNPs) that were significantly different between subtype 3 and subtype 4 (P < 5 × 10−8). These SNPs were mostly located on chromosomes 2, 3, 13 and 17, such as rs36164224 (chromosome 2 (odds ratio (OR) of 1.07 and P = 8.6 × 10−10)), rs62250502 (chromosome 3 (OR of 0.93 and P = 1.3 × 10−11)), rs3124402 (chromosome 13 (OR of 0.92 and P = 2.5 × 10−11)) and rs2532387 (chromosome 17 (OR of 1.07 and P = 1.2 × 10−8)). Additionally, we found that two SNPs were significantly different between subtype 1 and subtype 4 (P < 5 × 10−8), namely rs2622068 and rs11939395, located on chromosome 4. Furthermore, no SNPs were observed to be significantly different between subtype 2 and subtype 4. The summarized GWAS results for subtype 3 versus subtype 4 are provided in Supplementary Data Table 1.
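The genome-wide significance filter and the interpretation of the odds ratios can be illustrated with a short sketch. The four SNPs and their values are those quoted above; the fifth entry, the constant name `GENOME_WIDE_P` and the helper `significant_hits` are hypothetical and included only to show a variant that fails the threshold.

```python
import math

GENOME_WIDE_P = 5e-8  # conventional genome-wide significance threshold

# (SNP, chromosome, odds ratio, P value)
snps = [
    ("rs36164224", 2, 1.07, 8.6e-10),
    ("rs62250502", 3, 0.93, 1.3e-11),
    ("rs3124402", 13, 0.92, 2.5e-11),
    ("rs2532387", 17, 1.07, 1.2e-8),
    ("rs_hypothetical", 1, 1.01, 1e-3),  # made-up entry, not from the study
]

def significant_hits(records, threshold=GENOME_WIDE_P):
    """Keep variants below the threshold and attach the log odds ratio,
    which is the coefficient a logistic regression would estimate."""
    hits = []
    for snp, chrom, odds_ratio, p in records:
        if p < threshold:
            hits.append((snp, chrom, math.log(odds_ratio), p))
    return hits

hits = significant_hits(snps)
for snp, chrom, beta, p in hits:
    direction = "more" if beta > 0 else "less"
    print(f"{snp} (chr{chrom}): effect allele {direction} common in cases")
```

An OR above 1 (positive log OR) means the effect allele is more frequent in subtype 3 (cases) than in subtype 4 (controls), and an OR below 1 means the reverse.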

Fig. 5: GWAS-identified genetic variants, distinctive gene expression patterns and enriched functions between subtype 3 (‘high protein and low fiber dietary pattern’) and subtype 4 (‘balanced dietary pattern’).

a, Manhattan plot for the case–control GWAS analysis comparing subtype 3 (cases, n = 35,178) and subtype 4 (controls, n = 103,474). Logistic regression analysis was performed, adjusting for sex, age, BMI, the top 10 ancestry principal components and genotype measurement batch. The red and blue horizontal lines indicate the conventional genome-wide significance thresholds of P < 5 × 10−8 and P < 1 × 10−5, respectively. b, Heatmap for gene expression analysis based on the GTEx (v8, 54 tissue types) dataset. EBV, Epstein–Barr virus. c, Associated biological functions from the GWAS catalog using the genes identified in the GWAS. A multiple test correction was conducted using the Benjamini–Hochberg FDR with an adjusted P value cutoff of 0.05 and a minimum of two overlapped genes. d, The diverse PRSs for mental disorders (n = 176,465) and associated conditions across the four subtypes (subtype 1 (S1), subtype 2 (S2), subtype 3 (S3) and subtype 4 (S4)), as determined by ANCOVA analyses (F-tests) following Bonferroni correction (α = 0.05). The analysis was adjusted for covariates, including age, BMI, education qualifications, Townsend deprivation index and PRS genetic principal components. The data are presented using a box plot (center line, median; box limits, upper and lower quartiles; and whiskers, 1.5 × interquartile range). SCZ, schizophrenia; BD, bipolar disorder; CVD, cardiovascular disease; PD, Parkinson’s disease; ISS, ischaemic stroke; AD, Alzheimer’s disease.

Distinct gene expression and enrichment across subtypes

To provide further biological insights into the GWAS results, the 1,266 SNPs that differed between subtype 3 and subtype 4 (P < 5 × 10−8) were mapped to 16 genes using the SNP2GENE function in FUMA. Gene expression analysis based on the Genotype-Tissue Expression (GTEx v8, 54 tissue types) dataset revealed a cluster of genes, including MAPT, MVB12B and NSF, which exhibited high expression in several brain tissues, such as the anterior cingulate cortex (BA24), frontal cortex (BA9), amygdala and hippocampus (Fig. 5b). Moreover, the CADM2, CRHR1, MEIS1, PLEKHM1 and KANSL1 genes also showed high expression in the cerebellar hemisphere and cerebellum (Fig. 5b).

Furthermore, after Benjamini–Hochberg FDR corrections (adjusted P value <0.05), the identified 16 genes were found to converge on specific biological processes associated with mental health, cognitive functions and brain tissues, particularly within the context of the gene sets derived from the GWAS catalog (Fig. 5c). For instance, the MAPT, STH, ARL17B, LRRC37A, LRRC37A2, ARL17A and WNT3 genes were most prominently enriched for handedness (Padj = 1.0 × 10−11), whereas some genes were enriched for mental disorders, such as alcohol use disorder (PLEKHM1, CRHR1, SPPL2C, MAPT, STH and WNT3; Padj = 1.8 × 10−11), Parkinson’s disease (ARHGAP27, PLEKHM1, CRHR1, SPPL2C, MAPT, STH and WNT3; Padj = 5.5 × 10−9) and Alzheimer’s disease in APOE ε4 carriers (CRHR1, MAPT and WNT3; Padj = 4.3 × 10−4). Additionally, some genes were enriched for cognitive functions, such as reaction time (MAPT, ARL17B, LRRC37A, LRRC37A2, ARL17A and WNT3; Padj = 3.5 × 10−6). Moreover, some genes converged on brain tissues, such as brain morphology (ARHGAP27, PLEKHM1, CRHR1, SPPL2C, MAPT, STH, ARL17B, LRRC37A, LRRC37A2, ARL17A and WNT3; Padj = 3.5 × 10−6), intracranial volume (CRHR1, MAPT and STH; Padj = 4.0 × 10−6), cortical surface area (ARHGAP27, CRHR1 and WNT3; Padj = 4.0 × 10−6) and subcortical region volumes (CRHR1, MAPT and STH; Padj = 3.9 × 10−5).
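For readers unfamiliar with the Benjamini–Hochberg procedure used for the enrichment results above, the following is a minimal sketch of how adjusted P values can be obtained from raw P values. It is an illustration only and not the pipeline used in this study (the enrichment itself was run in FUMA); the function name `benjamini_hochberg` and the five raw P values are hypothetical.

```python
import numpy as np

def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the original order."""
    p = np.asarray(p_values, dtype=float)
    m = p.size
    order = np.argsort(p)                         # indices of ascending raw P values
    ranked = p[order] * m / np.arange(1, m + 1)   # p_(i) * m / i
    # Enforce monotonicity from the largest rank downwards, then cap at 1.
    adjusted_sorted = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted_sorted = np.minimum(adjusted_sorted, 1.0)
    adjusted = np.empty(m)
    adjusted[order] = adjusted_sorted             # restore the input order
    return adjusted

# Hypothetical raw enrichment P values for five gene sets (illustrative only).
raw_p = [0.001, 0.008, 0.039, 0.041, 0.60]
adj_p = benjamini_hochberg(raw_p)
significant = adj_p < 0.05                        # adjusted P value cutoff of 0.05
```

In this toy example only the first two gene sets remain significant after correction, although four of the five raw P values are below 0.05.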

Discussion

In this study, we investigated naturally developed dietary patterns based on food-liking data from a large UK Biobank cohort (n = 181,990). Remarkably, this study represents a large-scale exploration of food preferences and their important implications for brain health. By employing data-driven approaches, we achieved a reliable and robust classification of dietary patterns. Our analyses identified four distinct dietary subtypes, each characterized by a unique dietary profile: subtype 1 (‘starch-free or reduced-starch dietary pattern’), subtype 2 (‘vegetarian dietary pattern’), subtype 3 (‘high protein and low fiber dietary pattern’) and subtype 4 (‘balanced dietary pattern’).

First, the current study provides a comprehensive understanding of the associations between data-driven dietary patterns and brain health, blood and metabolism and genetics. Our study has shed light on a coherent mediated pathway linking food preferences, brain MRI traits, cognition and mental health via structural equation modeling. A noteworthy finding of our study is the potential impact of food preferences on brain structure. We observed that individuals with specific food preferences displayed distinct patterns of brain MRI traits. These differential brain structural patterns may play an important role in shaping cognitive function and mental health outcomes43,44. The plasticity and adaptability of the brain, influenced by dietary choices, can lead to structural changes that influence cognitive functions and mental health43,44. Moreover, our results suggest a directional relationship between mental health and cognitive function. Mental health not only impacts cognitive abilities but is also influenced by brain structure. The intricate interplay between these factors underscores the importance of considering mental health as a crucial determinant in understanding brain health and cognitive performance45.

Second, we revealed significant differences in mental health and cognitive function across the four subtypes. Individuals in subtype 2, who consumed more vegetables and fruits, exhibited relatively higher mental health symptom scores, such as anxiety symptoms, depressive symptoms, mental distress, psychotic experience, self-harm and trauma, and a relatively lower well-being score. The association between vegetarian (or vegan) diets and mental health has been controversial in previous literature. Some investigations have reported positive associations of vegetarian and vegan diets with diverse mental health outcomes35,36,46,47,48,49,50,51, while other studies found an inverse association37,38,52,53 or no associations39,40. The conflicting findings can be attributed to differences in study designs (for example, cross-sectional, retrospective and randomized controlled trial), variations in how vegetarian and vegan diets were defined (with some of the studies also counting the consumption of fish or chicken as vegetarian), discrepancies in the duration of adopting such diets, variations in the timing and methods used to assess mental health52 and the unique characteristics of the groups studied (for example, biological sex). It should be noted that our observational study cannot draw a causal conclusion that vegetarianism leads to mental health problems. In particular, our genetic analyses showed that individuals adopting the vegetarian dietary pattern exhibited higher PRSs for mental disorders, so it is possible that the worse mental health conditions in subtype 2 are indirectly influenced by heightened genetic susceptibility. Further investigations are needed to establish a causal conclusion.

Subtype 3, which followed an unhealthy ‘high protein and low fiber dietary pattern’, had lower well-being scores than other subtypes. This finding is consistent with previous research that linked dietary quality with well-being54,55 and demonstrated that exposure to fast food images potentially impacts well-being56. In contrast, subtype 4, which followed a balanced and healthy dietary pattern, had fewer mental health problems and a higher well-being score than other subtypes, suggesting that a balanced intake of various food categories may be associated with better mental health57,58. Note that the Cox proportional hazards regression models further support the one-way ANCOVAs. Compared with subtype 4, subtype 3 had higher risks for anxiety, depression and stroke. Individuals, such as those in subtype 3, who exhibit a higher intake of fatty meat may experience elevated stress levels and a higher risk of mental disorders, as reported in previous studies on the relationship between diet, stress and mental health59. Such effects could be attributed to an upsurge in the release of inflammatory factors and the permeation of gut flora through the intestinal wall, which is caused by high-fat foods. Our findings on blood and metabolic biomarkers, which revealed higher levels of C-reactive protein and white blood cell count in subtype 3 compared with subtype 4, further support this point. These mechanisms may amplify the susceptibility to stress and depression by modifying signaling pathways leading to the brain60,61. This was also confirmed in previous studies which showed that an unbalanced diet may be associated with a higher risk of mental disorders15,62, and meat-eaters may have a higher risk for stroke63.

Interestingly, the PRSs for various mental disorders mirrored the pattern. Subtype 2 displayed a heightened genetic susceptibility to a range of mental disorders, including Alzheimer’s disease, Parkinson’s disease, bipolar disorder, schizophrenia and suicide attempt, compared with other subtypes, while subtype 4 demonstrated relatively lower PRS risks for most mental disorders and related conditions. These results provide additional insights into the elevated risks of mental disorders for subtype 2 from a genetic standpoint. In other words, the higher mental health scores (for example, self-harm) as well as the lower cognitive performance scores observed in subtype 2 individuals might potentially be linked to their elevated genetic susceptibility to mental disorders (for example, PRSs for suicide attempts). In contrast, the relatively higher mental health symptoms in subtype 3 individuals, particularly as revealed in the Cox analysis, might be more strongly associated with their dietary habits than with genetic risks. Additionally, subtype 4 had the shortest reaction time, which may be attributed to their balanced dietary pattern7.

Third, the associations between dietary patterns and brain morphology and white matter integrity are evident. Specifically, compared with subtype 4, subtype 3 had significantly lower GMV in 11 brain regions, including the postcentral gyrus, parahippocampal gyrus, inferior parietal gyrus, middle temporal gyrus and middle cingulate gyrus. These findings are consistent with previous research that linked a ‘healthier’ diet (that is, rich in vegetables, vitamins, antioxidants and omega-3 polyunsaturated fatty acids or fish intake) with higher total GMV and the volumes of the hippocampus, cingulate gyrus, entorhinal cortex, temporal lobe and parietal lobe28,29,64,65. In contrast, diets high in saturated and trans fats and protein were associated with smaller GMV66. Our study also revealed significant differences in mean FA and MD of white matter tracts between subtype 3 and subtype 4. Higher MD and lower FA values are typically indicative of impaired fiber integrity due to increased diffusion and loss of coherence in the preferred movement direction67. Subtype 4 showed higher FA than subtype 3 in seven brain regions, including the medial lemniscus, external capsule and uncinate fasciculus. These findings further complement the conclusions of previous studies, which suggested that a health-aware diet was associated with improved global white matter connectivity, as indicated by higher FA values68. Furthermore, compared with subtype 4, subtype 3 showed higher MD in 11 ROIs, including the external capsule, anterior limb of the internal capsule and hippocampal cingulum. Overall, these regions are involved in emotional, motivational, cognitive and memory functions, as well as sensory and motor systems69,70,71.
For example, the anterior limb of the internal capsule, which carries fibers from prefrontal cortical regions, is associated with emotion, motivation, cognition and decision making69,72, while the hippocampal cingulum is a major pathway connecting the cingulate gyrus to the hippocampal formation and is related to learning and memory functions73. In addition, subtype 2 had higher MD in the cerebral peduncle than subtype 4. The cerebral peduncle is associated with motor and sensory functions74,75. However, there is no conclusive evidence to establish a correlation between vegetarian diets and motor or sensory deficits.

Fourth, the blood and metabolic biomarkers examined in this study appear to be sensitive indicators of the impact of dietary patterns on the body. Our findings indicated that the four subtypes have significantly different levels of several biomarkers, such as omega-3 and omega-6 fatty acids76,77, which are important components involved in serotonin synthesis. Additionally, subtype 4, characterized by a diet that is generally considered to be healthier, exhibited higher levels of certain biomarkers than subtype 3, such as the degree of unsaturation in fatty acids, HDL cholesterol and total lipids in HDL. These results are consistent with previous investigations, which suggested that a balanced diet pattern was associated with higher HDL cholesterol levels in the elderly population78. Furthermore, in comparison with subtype 4, subtype 2 and subtype 3 both exhibited significantly lower levels of certain crucial fatty acids such as omega-3 fatty acids, which may potentially be attributed to the lack of fish consumption in the latter subtype.

Finally, the case–control GWAS identified 1,266 SNPs that differed between subtype 3 and subtype 4, and these were subsequently mapped to 16 genes. Our gene expression analysis pinpointed a cluster of genes, including notable candidates such as MAPT, MVB12B and NSF, which exhibited elevated expression across multiple brain tissues, encompassing regions such as the anterior cingulate cortex (BA24), frontal cortex (BA9), amygdala and hippocampus, among others. Intriguingly, many of these brain regions with potentially high gene expression overlap with the findings from our comparison of two subtypes based on GMV. This convergence supports the hypothesis that these genes play a crucial role in brain structure and may modulate the impact of dietary patterns on brain health28,29,64,65. Moreover, the identified genes were also found to be enriched in specific biological processes related to mental disorders and cognition, such as Parkinson’s disease, Alzheimer’s disease in APOE ε4 carriers and cognitive performance (reaction time). This further substantiates the potential link between dietary patterns and brain function and brain health79,80. Furthermore, we performed supplementary GWAS and PRS analyses to explore the predictive potential of genetics on brain MRI data and mental health within each dietary pattern. The GWAS analyses involved participants without MRI data, comparing other subtypes to subtype 4. Subsequently, PRSs were computed using genetic data from participants with MRI data, at P value thresholds of 0.01, 0.05 and 0.5. Similar analyses were conducted for mental health symptoms. However, after Bonferroni correction, no significant correlations were observed between brain MRI data and PRSs (Supplementary Fig. 5), or between mental health symptoms and PRSs (Supplementary Fig. 6).
Integrating these supplementary analyses on PRSs related to dietary patterns with the evidence from GWAS, gene expression and functional enrichment analyses in our current study, we suggest a potential association between diet-related genes, brain function and mental health, but genes demonstrate a limited capacity to directly predict brain MRI data and mental health symptoms.

There are several highlights and substantial contributions of this study that are worth discussion. A key strength of the current research lies in its pioneering application of data-driven methods to analyze food preference data and identify naturally developed dietary patterns within a large-scale population. Previous studies have often utilized predefined dietary patterns, such as the Mediterranean or Western diet based on self-reported surveys20,81,82,83. However, the adopted definition and application of dietary patterns were not consistent across studies. In contrast, our study established a reliable classification system for dietary patterns by using a data-driven approach without prior assumptions and definitions. The identified dietary patterns reflect the usual eating habits in normal life, which can lead to meaningful investigations of their associations with brain structure and health. Another key strength of our study lies in the integration of multiple-dimensional data with a large sample size (including mental health measures, cognitive function, blood and metabolic biomarkers and genomics), which provides insights into the association between naturally developed dietary patterns by food preferences and brain health.

Our study has several implications for future research and clinical practice. First, our findings underscore the importance of considering dietary factors when examining brain structure, cognitive and mental health outcomes. Future studies could explore the mechanisms underlying these relationships, such as the potential role of specific nutrients or dietary patterns. Second, our study highlights the potential utility of using food preferences as a marker for identifying individuals at risk of cognitive impairment and mental health problems, which could be useful in developing targeted interventions and personalized dietary recommendations to promote brain health. The importance of our study lies in its pioneering exploration of food preferences and their profound impact on the brain, cognition, mental health and overall well-being. To the best of our knowledge, this is the large-scale investigation of its kind, representing a novel advancement in our understanding of the intricate relationship between diet and various aspects of human health. These findings also carry practical implications for educational practices. Early-age education in schools aimed at promoting healthy food preferences can play a vital role in fostering good brain health, cognition and overall well-being throughout the life of an individual. By nurturing healthy dietary habits from an early stage, we have the potential to positively impact public health and empower individuals to lead healthier, high-quality lives.

However, our study also has several limitations. First, it is important to note that the dietary patterns identified in the current study were based on data related to food-liking rather than actual food consumption. While our results have shown that food-liking measures were closely related to food consumption, subtle differences may exist and could influence the observed relationships. Second, participants included in this study are primarily healthy individuals from the UK Biobank. Given that the UK Biobank is known to have a ‘healthy volunteer’ selection bias84, our results may not be entirely generalizable to other populations. Third, we observed demographic differences (for example, age, BMI, Townsend deprivation index and education qualifications) between respondents and nonrespondents to the food-liking questionnaire in the entire UK Biobank population. These disparities may arise from the substantial UKB sample, which may involve the selection criteria. Notably, despite the observed demographic disparities, our study demonstrates that this largest-scale food-liking dataset can effectively unveil robust and reliable food-liking subtypes. Fourth, omega-3 and omega-6 fatty acids76,77, along with tryptophan85, constitute key components in serotonin synthesis. While our findings revealed significant differences in omega-3 and omega-6 fatty acids among the four subtypes, a potential limitation is that we did not account for the actual levels of tryptophan, as well as a lack of detailed information on involvements of omega-3 and omega-6 fatty acids in these dietary patterns. Considering the pivotal role of serotonin in mood regulation and its substantial impact on overall mental health, future research should incorporate these aspects for a more comprehensive understanding. Finally, we used simplified measures to assess mental health factors, including well-being. 
Though brief measures are pragmatic for large-scale studies and have demonstrated reliability and validity in previous research86,87,88, validation of our findings employing well-designed scales is needed in further investigations.

Our study highlights that the dietary patterns of elderly individuals may have significant associations with their mental health, cognitive functions, blood and metabolic biomarkers and brain imaging. A ‘healthier’ diet with balanced preferences in various food categories is associated with better mental health status, higher levels of cognitive functions and fewer risks of mental disorders. Our findings also indicate that there are genetic associations underlying these dietary patterns, implying that specific genes may be significant in regulating brain function and promoting mental health. Overall, our study provides systematic insights into understanding naturally developed dietary patterns in elderly individuals and underscores the associations between a balanced diet and brain health. The implications of these findings highlight the potential advantages of early-age education on diet, which could promote healthy food preferences and cultivate long-term brain health across the lifespan. Future research is needed to fully comprehend the potential long-term associations between these dietary patterns and brain structure and health, particularly in adolescent and middle-aged populations.

Methods

Study population

This study used data from the UK Biobank under project 19542. The UK Biobank study was approved by the National Information Governance Board for Health and Social Care and the North West Multi-Centre Research Ethics Committee (ref. no. 11/NW/0382). All participants provided written informed consent. The risks of participants experiencing harm due to their involvement were minimal, and the UK Biobank is equipped with insurance to offer compensation for any instances of negligence resulting in harm during participation. The UK Biobank recruited more than 500,000 people aged 37–73 years from the United Kingdom between 2006 and 2010 (ref. 89). The data consisted of a wide range of phenotypic information and biological samples, including demographic characteristics, mental health, cognitive function, blood assays, multimodal neuroimaging and so on. In this study, we only included participants who completed the food-liking questionnaire and provided valid responses, resulting in a total of 181,990 participants (mean age 70.7 ± 7.7 years and 57.08% female).

Food-liking phenotypes

Food-liking data was gathered via an online questionnaire (https://biobank.ndph.ox.ac.uk/showcase/showcase/docs/foodpref.pdf) from 182,176 participants. The questionnaire comprises 150 items that assessed both sensory attributes (for example, bitter and sweet) and food preferences (for example, fruit, vegetables and meat), as well as nonfood items, such as preferences for health-related activities (for example, physical activity and watching television). Liking is measured using a 9-point hedonic scale, where 1 represents ‘extremely dislike’ and 9 represents ‘extremely like.’ This widely used scale has good statistical properties, discrimination between different points and linearity90. The UK Biobank provided two response options, ‘never tried’ or ‘do not wish to answer,’ in addition to the 9-point scale. For our analyses, we included a subset of 140 items related to food and beverages, and classified these items into ten internally reliable categories, based on a classification system of a food preference questionnaire utilized in previous research91, including: alcohol, beverages, dairy, flavorings, fruits, fish, meat, snacks, starches and vegetables. The use of these classification criteria serves to underscore our primary research objective, which is to explore the intricate association between different food categories and brain health. The individual items of each category are listed in Supplementary Table 1. We excluded 186 participants (0.1%) who responded ‘never tried’ or ‘do not wish to answer’ on more than 30% of the 140 food and beverage items, resulting in a final sample of 181,990 participants. Missing values (that is, ‘never tried’ or ‘do not wish to answer’) in the food-liking data were imputed using k-Nearest Neighbors92 with the ‘KNNImputer’ function of the scikit-learn module in Python. The default settings were used, except for the number of neighboring samples (set as 7). 
Furthermore, to assess the robustness of our findings in the context of data imputation, we also utilized nonimputed data from 72,419 participants for the identification of food-liking subtypes. The demographic characteristics of the 181,990 participants, stratified by three age groups, are summarized in Table 1. In addition, we conducted a statistical analysis (t-test) to compare demographic characteristics between individuals who completed the food-liking questionnaire and those who did not within the entire UK Biobank population (Supplementary Table 2).
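The exclusion and imputation steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's actual code: the toy response matrix, the 10% missingness rate and the variable names are hypothetical, and missing responses (‘never tried’ or ‘do not wish to answer’) are assumed to be coded as NaN. Only the 30% exclusion rule, the use of scikit-learn's `KNNImputer` and the setting of 7 neighbors come from the text.

```python
import numpy as np
from sklearn.impute import KNNImputer

rng = np.random.default_rng(0)

# Hypothetical toy matrix: 200 participants x 140 items on the 9-point hedonic scale.
scores = rng.integers(1, 10, size=(200, 140)).astype(float)

# Mark about 10% of responses as missing (assumed NaN coding).
missing = rng.random(scores.shape) < 0.10
scores[missing] = np.nan
# Make one participant mostly missing so the exclusion rule has a case to drop.
scores[0, :60] = np.nan

# Exclude participants with missing responses on more than 30% of the 140 items.
keep = np.isnan(scores).mean(axis=1) <= 0.30
retained = scores[keep]

# Impute the remaining gaps from the 7 nearest participants; other settings are defaults.
imputer = KNNImputer(n_neighbors=7)
imputed = imputer.fit_transform(retained)
```

With default settings, each missing value is replaced by the unweighted mean of that item across the nearest neighbors, so imputed values stay within the 1 to 9 range of the scale but are generally not integers.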

Table 1 Demographic characteristics of the 181,990 UK Biobank participants, stratified by age groups

Identification of subtypes based on food-liking phenotypes

To identify data-driven food-liking subtypes, we first normalized the phenotypes using a z-score transformation. Next, to enhance comparability across food categories and reduce the dimensionality of the data93, we performed a PCA for each food category. Specifically, the number of components was determined by adding the explained variance of each component until the total explained variance reached at least 80% (ref. 94). We then used the resulting principal components of the ten food categories as input for hierarchical clustering95. The hierarchical clustering used Euclidean distance between participants and the inner squared distance (minimum variance algorithm, that is, Ward's linkage) for computing the distance between clusters. The clustering results were visualized using a dendrogram. Based on the dendrogram, we found that the population could be grouped into four distinct food-liking subtypes. To assess the stability of the variance explained by the obtained components and validate the reliability of the PCA results, we further repeated the analysis with explained variance thresholds of 70% and 90%.
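The pipeline described above (z-scoring, per-category PCA retaining at least 80% of the variance, then minimum-variance hierarchical clustering) can be sketched as follows. This is an illustrative sketch under stated assumptions rather than the code used in this study: the data are random, only three hypothetical food categories are shown instead of ten, the minimum variance algorithm is taken to be Ward's linkage as implemented in SciPy, and the tree is simply cut into four clusters rather than chosen by inspecting a dendrogram.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.stats import zscore
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Hypothetical layout: three food categories with a few items each.
category_items = {"fruits": 6, "meat": 8, "vegetables": 10}
n_participants = 300
liking = {name: rng.normal(size=(n_participants, k))
          for name, k in category_items.items()}

components, explained = [], []
for name, items in liking.items():
    z = zscore(items, axis=0)                        # z-score each item
    # A float n_components keeps the fewest components reaching that variance share.
    pca = PCA(n_components=0.80, svd_solver="full")
    components.append(pca.fit_transform(z))
    explained.append(pca.explained_variance_ratio_.sum())

# Concatenate the per-category components into one feature matrix.
features = np.hstack(components)

# Ward (minimum variance) linkage on Euclidean distances, cut into at most four clusters.
tree = linkage(features, method="ward", metric="euclidean")
subtype = fcluster(tree, t=4, criterion="maxclust")
```

Running the same loop with `n_components=0.70` and `n_components=0.90` would mirror the stability check at the 70% and 90% thresholds.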

Furthermore, to characterize the food preferences of the four subtypes, we calculated the average liking score for each food category across participants of each subtype. Given the variations in the score range across the four subtypes for different food categories, we normalized the liking score of each subtype to a range of 1 to 4 within each food category. To facilitate the comparison of food preferences among subtypes, we further standardized the liking score of each food category within each subtype by dividing it by the sum of the liking scores of all food categories, yielding a relative liking score for each food category within each subtype.
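The two normalization steps in this paragraph can be illustrated with a small numerical sketch. The mean liking scores below are hypothetical, and the rescaling to the range of 1 to 4 is assumed to be a linear min-max transformation across the four subtypes within each food category; the text does not specify the exact transformation, so this is one plausible reading.

```python
import numpy as np

# Hypothetical mean liking scores: rows = 4 subtypes, columns = 3 food categories.
mean_liking = np.array([
    [6.0, 4.0, 7.0],
    [7.0, 2.0, 8.0],
    [5.0, 7.0, 4.0],
    [6.5, 5.0, 6.0],
])

# Rescale each category (column) so the lowest subtype maps to 1 and the highest to 4.
col_min = mean_liking.min(axis=0)
col_max = mean_liking.max(axis=0)
scaled = 1.0 + 3.0 * (mean_liking - col_min) / (col_max - col_min)

# Divide by each subtype's row sum to obtain relative liking scores.
relative = scaled / scaled.sum(axis=1, keepdims=True)
```

After the second step, the relative liking scores of each subtype sum to 1, so they can be read as the share of that subtype's overall liking attributed to each food category.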

Comparisons between food-liking and food-consumption traits

To examine the relationship between food-liking measures and dietary habits, we adopted an approach that accounts for potentially corresponding relationships between the food frequency questionnaire (category 100052) and the food-liking questionnaire. Specifically, we selected food items that were common (for example, fruit, beef, lamb and so on) or those that had corresponding and similar items between both questionnaires (for example, chicken, cheese, bread and so on). To quantify this relationship, we calculated the average scores for both food-liking and food consumption associated with the selected food traits within each subtype. We visualized this quantitative relationship by generating a stacked bar plot to display these comparable scores of selective food items. This approach enabled us to gain a better understanding of the potential associations between food liking and food consumption.

Mental health assessment

The UK Biobank issued an online mental health self-assessment questionnaire (MHQ) in 2016. The questionnaire aimed to comprehensively evaluate self-reported symptoms of mental health and associated major environmental factors (https://biobank.ndph.ox.ac.uk/showcase/label.cgi?id=136). Note that the MHQ in the UK Biobank is a composite questionnaire, incorporating previously existing and validated measures, which is based, in part, on the World Health Organization’s Composite International Diagnostic Interview—Short Form96, alongside complementary tools that have been widely used in mental health research and have established validity and reliability97,98. The World Health Organization’s Composite International Diagnostic Interview—Short Form forms the basis of many other major research studies, including those contributing to the work of the international Psychiatric Genomics Consortium. The self-reported diagnosed mental disorder rates in the MHQ align with population estimates from the Health Survey England97. For more details regarding the rationale and procedure for administration of the MHQ, refer to the UK Biobank website (http://biobank.ctsu.ox.ac.uk/crystal/refer.cgi?id=22). In this study, we analyzed eight mental health symptoms, including anxiety symptoms, depressive symptoms, mania symptoms, mental distress, psychotic experience, self-harm, trauma and well-being. The quantitative measures of these mental health symptoms were obtained by calculating an average score of the items used to assess each mental health symptom. The sample size of the data utilized in this study was from 118,616 participants. Specifically, the scores of items in one subcategory of the MHQ were firstly adjusted to the same direction, with higher values indicating more symptoms of mental disorder (with the exception of well-being, where a higher value indicated better well-being). 
Next, each item was normalized into a range of (0,1) using the MATLAB function ‘mapminmax’, and then the items within each category were averaged to generate an overall measure for each mental health symptom. The items used for the assessment of each mental health symptom are summarized in Supplementary Table 4.
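The scoring procedure for each mental health symptom (aligning item direction, rescaling each item to the 0 to 1 range and averaging) can be sketched in Python as follows. The study used the MATLAB function ‘mapminmax’; the NumPy code below is an assumed equivalent, and the function name `symptom_score`, the `reverse` argument and the toy responses are hypothetical.

```python
import numpy as np

def symptom_score(items, reverse=()):
    """Average of min-max scaled items; `reverse` lists columns to flip first."""
    x = np.asarray(items, dtype=float).copy()
    for col in reverse:
        # Flip the item so that higher values mean more symptoms.
        x[:, col] = x[:, col].max() + x[:, col].min() - x[:, col]
    lo, hi = x.min(axis=0), x.max(axis=0)
    scaled = (x - lo) / (hi - lo)          # each item now spans 0 to 1
    return scaled.mean(axis=1)             # one score per participant

# Hypothetical responses: 4 participants x 3 items; the last item is reverse-worded.
responses = np.array([
    [1, 0, 5],
    [2, 1, 4],
    [3, 1, 2],
    [4, 0, 1],
])
scores = symptom_score(responses, reverse=[2])
```

Because every item is rescaled before averaging, items with different response ranges (for example, binary and 5-point items) contribute equally to the symptom score.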

Cognitive assessment

Several of the cognitive function tests administered via touchscreen during the initial assessment visit were reimplemented as web-based questionnaires, and the participants were invited to complete them remotely (https://biobank.ndph.ox.ac.uk/showcase/label.cgi?id=116). Six cognitive function tests were analyzed in the current study, including fluid intelligence, trail making, symbol–digit substitution, pairs matching, reaction time and numeric memory. These tests showed moderate correlations (ranging from 0.33 to 0.64, all P < 0.001) with their respective reference test(s) that was judged to be assessing the same cognitive capability or domain, suggesting substantial concurrent validity and test–retest reliability99. For instance, slower response on the reaction time test in the UK Biobank was associated with slower responses on Deary–Liewald reaction time test simple reaction time (r = 0.52 and P < 0.001). Additional information on the validity of cognitive tests can be found in Fawns-Ritchie et al.99. Additionally, the cognitive assessment data utilized in this analysis were from a substantial sample size of 179,740 participants. Summary information and sample size for these cognitive function tests are provided in Supplementary Table 5.

Blood and metabolic biomarkers

Blood biochemistry data (category 17518) were collected from ~480,000 participants during their recruitment visits between 2006 and 2010. The detailed procedures for quality control of blood biochemistry data can be found in Supplementary Information, as well as in the open-source document (https://biobank.ndph.ox.ac.uk/showcase/ukb/docs/biomarker_issues.pdf). Blood count data (category 100081) were also collected from the same number of participants during their first visit, using Beckman Coulter LH750 instruments to analyze samples collected in 4 ml EDTA vacutainers. Additional information about the hematology analysis is provided at https://biobank.ndph.ox.ac.uk/showcase/ukb/docs/haematology.pdf. We categorized the 30 blood biochemistry biomarkers into ‘liver function’, ‘renal function’, ‘endocrine’, ‘immunometabolism’ and ‘bone and joint’, while the 31 blood cell counts were classified into ‘white blood cell’, ‘red blood cell’ and ‘platelet’.

The metabolic biomarkers were measured using a high-throughput nuclear magnetic resonance (NMR)-based metabolic biomarker profiling platform in randomly selected EDTA plasma samples collected during the first assessment, which covered ~120,000 participants. Further details on the processing and quality control of NMR metabolic biomarkers in the UK Biobank are available in Supplementary Information and Julkunen et al.100. The NMR metabolomics (category 220) provided 249 metabolic biomarkers, of which 168 were directly measured and 81 were ratios of these. For this study, only the 168 directly measured metabolic biomarkers were used and categorized into ‘amino acids’, ‘apolipoproteins’, ‘lipoprotein particle sizes’, ‘lipoprotein particle concentrations’, ‘fatty acids’, ‘triglycerides’, ‘phospholipids’, ‘cholesteryl esters’, ‘free cholesterol’, ‘cholesterol’, ‘other lipids’, ‘total lipids’, ‘ketone bodies’, ‘glycolysis-related metabolites’, ‘fluid balance’ and ‘inflammation’. The blood and metabolic biomarker data used in this study were from 42,665 participants. Supplementary Table 6 provides details of the category and sample size of these blood and metabolic biomarkers.

Brain MRI traits

The UK Biobank collected multimodal neuroimaging data from ~40,000 participants using a standard Siemens Skyra 3T scanner running VD13A SP4, with a standard Siemens 32-channel head coil. The details of the image acquisition are provided on the UK Biobank website in the form of a protocol (http://biobank.ctsu.ox.ac.uk/crystal/refer.cgi?id=2367). The UK Biobank conducted all the quality checking and data preprocessing procedures. The details of the acquisition protocols, image processing pipeline, image data files and imaging-derived phenotypes of brain structure and function are available on the UK Biobank website (http://biobank.ctsu.ox.ac.uk/crystal/refer.cgi?id=1977) and elsewhere89,101. We briefly describe the quality control steps for structural and diffusion MRI in Supplementary Information.

Structural MRI

This study utilized quality-controlled T1-weighted neuroimaging data obtained from structural MRI to investigate GMV in 32,715 participants. The T1 data were preprocessed with the Statistical Parametric Mapping software version 12 (https://www.fil.ion.ucl.ac.uk/spm/) using the CAT12 toolbox (https://neuro-jena.github.io/cat/) with default settings. The preprocessing involved high-dimensional spatial normalization with an integrated Dartel template in the Montreal Neurological Institute space, followed by nonlinear modulations and correction for the head size of each individual. Following these procedures, gray matter images (voxel size 1.5 × 1.5 × 1.5 mm3) were obtained for all participants. The AAL2 atlas with 94 cortical brain regions102 was used to extract imaging-derived phenotypes referred to as atlas regional GMV. Intracranial volume was included as a covariate in the statistical analyses of GMV.

Diffusion MRI

The diffusion MRI data in the UK Biobank were acquired with two b-values (b = 1,000 and 2,000 s mm−2) at a spatial resolution of 2 mm using a multiband acceleration factor of three, which allows three slices to be acquired simultaneously. For each diffusion-weighted shell, 50 distinct diffusion-encoding directions were acquired, giving a total of 100 directions across the two b-values. A standard (monopolar) Stejskal–Tanner pulse sequence was used for diffusion preparation, enabling a shorter echo time (TE of 92 ms) and a higher signal-to-noise ratio than a twice-refocused (bipolar) sequence, although it introduces stronger eddy current distortions. The Eddy tool was used to correct for static field distortion, head motion and eddy current distortions103,104. The corrected data were then modeled using FMRIB’s Diffusion Toolbox for diffusion modeling and tractography analysis105,106. Neurite orientation dispersion and density imaging modeling was conducted using the Accelerated Microstructure Imaging via Convex Optimization (AMICO) tool107. White matter pathways were aligned across participants for extracting imaging-derived phenotypes using tract-based spatial statistics108,109, in which a high-dimensional warp maps a standard-space white matter skeleton to each participant, followed by defining ROIs as the intersection of the skeleton with standard-space masks for 48 tracts110. Definitions of tract regions and names can be found in the JHU ICBM-DTI-81 white-matter labels atlas described at http://fsl.fmrib.ox.ac.uk/fsl/fslwiki/Atlases. The diffusion tensor imaging measures included FA, which reflects the directionality of diffusion, and MD, which measures overall diffusivity111. Statistical analyses were performed on the mean FA and MD of white matter tracts within the JHU ICBM-DTI-81 white-matter labels atlas (category 134) from 31,195 participants.

PRSs for mental disorders

The UK Biobank has released optimized PRSs for 28 diseases and 25 quantitative traits. The PRSs were generated using a Bayesian approach applied to meta-analyzed GWAS summary statistics. A principal component-based ancestry centering step was applied to center the score distributions around zero across all ancestries, and the score distributions were also standardized to have approximately unit variance within ancestry groups. More details about the initial PRS release are available at https://biobank.ndph.ox.ac.uk/showcase/refer.cgi?id=5202. For the current analyses, we used the standard PRSs of several mental disorders, including Alzheimer’s disease, bipolar disorder, Parkinson’s disease and schizophrenia, as well as ischemic stroke and cardiovascular disease. The standard PRS set was calculated for all UK Biobank individuals and was derived entirely from external GWAS data. The PRS data used in this study were from 176,465 participants. The predictive performance of the PRSs for these six diseases is shown in Thompson et al.112.

Additionally, we computed PRSs for depression and suicide attempt based on existing GWAS summary statistics113,114. We used PRSice-2 (details at http://www.prsice.info) to estimate the PRS for each participant after clumping the SNPs with an r2 threshold of 0.1 and a physical distance of 250 kb, so that only the most strongly associated SNP in each clump was retained. The PRSs for each mental disorder were then calculated using a P value threshold of 0.05. The data for the PRSs of depression and suicide attempt used in this study were both from 126,895 participants.
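
The clumping-and-thresholding logic described above can be sketched in a few lines of Python. This is a simplified illustration, not PRSice-2 itself: the `Snp` record, the precomputed pairwise `r2` lookup and the plain weighted sum of allele dosages are assumptions, and PRSice-2 offers its own scoring and clumping options that are not reproduced here.

```python
from typing import Dict, FrozenSet, List, NamedTuple


class Snp(NamedTuple):
    snp_id: str
    chrom: int
    pos: int      # base-pair position
    beta: float   # GWAS effect size of the effect allele
    p: float      # GWAS P value


def clump(
    snps: List[Snp],
    r2: Dict[FrozenSet[str], float],
    r2_max: float = 0.1,
    window_bp: int = 250_000,
) -> List[Snp]:
    """Greedy clumping: visit SNPs from most to least significant and drop
    any SNP that lies within the window of, and is in LD with, a kept SNP."""
    kept: List[Snp] = []
    for s in sorted(snps, key=lambda x: x.p):
        in_ld_with_kept = any(
            k.chrom == s.chrom
            and abs(k.pos - s.pos) <= window_bp
            and r2.get(frozenset((k.snp_id, s.snp_id)), 0.0) > r2_max
            for k in kept
        )
        if not in_ld_with_kept:
            kept.append(s)
    return kept


def polygenic_score(
    dosages: Dict[str, float], snps: List[Snp], p_max: float = 0.05
) -> float:
    """Sum of effect-allele dosages (0 to 2) weighted by GWAS effect sizes,
    restricted to clumped SNPs that pass the P value threshold."""
    return sum(
        s.beta * dosages[s.snp_id]
        for s in snps
        if s.p <= p_max and s.snp_id in dosages
    )
```

In use, `clump` would be applied once to the summary statistics, and `polygenic_score` would then be evaluated for each participant's dosage dictionary.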

Statistical analyses

One-way ANCOVA and post hoc tests

To compare measures of interest among the four subtypes, we conducted one-way ANCOVA analyses115 on mental health symptoms, cognitive functions, blood count and NMR metabolic biomarkers, brain MRI traits and PRSs of mental disorders. Levene’s test116 was conducted to assess the equality of variances before the one-way ANCOVAs. Additionally, post hoc tests117 using two-sample t-tests were performed to examine the differences between each of subtypes 1, 2 and 3 and subtype 4. We included standard covariates of no interest: sex, age, BMI, education qualifications, Townsend deprivation index118 and scanning site (the last applied only to brain MRI traits). Additional covariates were included to account for potential confounding factors: intracranial volume (regressed out in the analyses of GMV) and the PRS genetic principal components (regressed out in the analyses of PRSs of mental disorders). To correct for multiple comparisons, we used Bonferroni corrections in the analyses of mental health symptoms, cognitive functions, blood count and NMR metabolic biomarkers and PRSs of mental disorders. For the analyses of brain MRI traits, we used FDR corrections.
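
As an illustration of the two multiple-comparison corrections named above, the following Python sketch computes adjusted P values. It assumes that the FDR correction is the Benjamini–Hochberg procedure, which the text specifies only for the enrichment analysis; the function names are hypothetical.

```python
from typing import List


def bonferroni(p_values: List[float]) -> List[float]:
    """Bonferroni-adjusted P values: multiply by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def benjamini_hochberg(p_values: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending P
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so that the adjusted values are
    # monotone: each one is the minimum of p * m / rank over all larger ranks.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

A test is then declared significant when its adjusted P value falls below the chosen level, for example 0.05.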

Cox proportional hazards models

To assess the differences in survival rates of several common mental disorders among the four subtypes, we used Cox proportional hazards models with subtype 4 as the reference group. The Cox model relies on the fundamental assumption of proportional hazards, which posits that the relative hazard remains constant over time across levels of the covariates119. To verify this assumption, we applied the Schoenfeld residuals method119, which tests whether the residuals of each covariate in the Cox model show a nonzero slope over time. The analyses were adjusted for sex, age, BMI, education qualifications and Townsend deprivation index118. The analyses covered 11 mental disorders: Alzheimer’s disease (International Classification of Diseases (ICD)-10 F00 and G30), anxiety (ICD-10 F40 and F41), bipolar disorder (ICD-10 F31), depression (ICD-10 F32 and F33), dementia (ICD-10 F00, F01, F02, F03 and G30), eating disorder (ICD-10 F50), Parkinson’s disease (ICD-10 G20), stroke (ICD-10 G45, G46, I60, I61, I63 and I64), sleep disorder (ICD-10 G47), migraine (ICD-10 G43) and schizophrenia (ICD-10 F20). The duration of follow-up, defined as the time elapsed from the participants’ first occurrence of a mental disorder until their death, loss of follow-up or 19 July 2022 (whichever came first), was used as the timescale. The data were provided by 180,173 participants from the UK Biobank. The results of the models are presented as HRs and 95% CIs, representing the average hazard ratio of mental disorders for each of the other three subtypes relative to subtype 4 within 15 years of follow-up. FDR corrections were used for multiple comparisons. Multivariable Cox regression analyses were performed using the ‘survival’ package in R120.
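
For reference, the model described above can be written in the standard Cox form. The notation is ours rather than the source's: s denotes the food-liking subtype, x the vector of adjustment covariates, h_0(t) the unspecified baseline hazard, and β_k and γ the regression coefficients.

```latex
\[
h(t \mid s, \mathbf{x})
  = h_0(t)\,\exp\!\Big(\sum_{k=1}^{3} \beta_k\,\mathbb{1}[s = k]
    + \boldsymbol{\gamma}^{\top}\mathbf{x}\Big),
\qquad
\mathrm{HR}_k
  = \frac{h(t \mid s = k, \mathbf{x})}{h(t \mid s = 4, \mathbf{x})}
  = e^{\beta_k}, \quad k \in \{1, 2, 3\}.
\]
```

Because h_0(t) cancels in the ratio, each HR_k does not depend on t. This is the proportional hazards assumption that the Schoenfeld residuals check is meant to probe.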

SEM

An SEM121 was employed to investigate the associations between food-liking and three latent variables: mental health, cognitive function and brain MRI traits. We constructed three separate SEMs, one for each food-liking comparison between subtype 1, 2 or 3 and subtype 4. Food-liking was treated as a group variable indicating membership of the other subtype or of subtype 4. The three latent variables were created by combining measurements that exhibited significant differences between the other subtypes and subtype 4 in post hoc tests. Confirmatory factor analysis was used to estimate the latent variables in the model. The cognitive function latent variable was assessed on the basis of symbol–digit substitution, fluid intelligence, pairs matching and reaction time. The mental health latent variable was evaluated using anxiety symptoms, depressive symptoms, mental distress, psychotic experiences, self-harm, trauma and well-being scores, which were obtained from the MHQ. The brain MRI trait latent variables in the three SEMs were constructed from the GMV of specific brain regions, the mean FA of the white matter tracts and the MD of the white matter tracts that showed significant differences between the other subtypes and subtype 4. The RMSEA was used to assess model fit. The analyses were conducted using the ‘lavaan 0.6–14’ package in R.

Case–control GWAS

Genotype data for ~500,000 participants were obtained from the UK Biobank v3 imputation. The comprehensive genotyping and quality-control procedures from the UK Biobank are described in a previous publication122. We performed additional quality control on these genotype data, excluding SNPs with a call rate <95%, a minor allele frequency <0.1% or a deviation from Hardy–Weinberg equilibrium (P < 1 × 10−10). We included only participants who were estimated to have recent British ancestry and no more than ten putative third-degree relatives in the analyses. After quality control, a total of 337,199 participants with 8,894,431 SNPs were retained. In this study, we used the genetic data from 181,551 participants.
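
The three SNP-level filters above can be sketched as follows. This is a hedged illustration rather than the pipeline used in the study: genotypes are assumed to be coded as 0, 1 or 2 copies of one allele with None for missing calls, the function name is hypothetical, and the Hardy–Weinberg check uses a one-degree-of-freedom chi-square approximation, whereas tools such as PLINK implement an exact test.

```python
import math
from typing import Dict, List, Optional


def snp_qc(
    genotypes: List[Optional[int]],
    min_call_rate: float = 0.95,
    min_maf: float = 0.001,
    hwe_p_min: float = 1e-10,
) -> Dict[str, object]:
    """Apply call-rate, minor-allele-frequency and HWE filters to one SNP."""
    called = [g for g in genotypes if g is not None]
    n = len(called)
    call_rate = n / len(genotypes) if genotypes else 0.0
    if n == 0:
        return {"call_rate": call_rate, "maf": None, "hwe_p": None, "passed": False}

    n_aa, n_ab, n_bb = called.count(0), called.count(1), called.count(2)
    q = (n_ab + 2 * n_bb) / (2 * n)  # frequency of the counted allele
    maf = min(q, 1.0 - q)

    if maf == 0.0:
        hwe_p = 1.0  # monomorphic SNP: HWE is not testable
    else:
        observed = [n_aa, n_ab, n_bb]
        expected = [n * (1 - q) ** 2, 2 * n * q * (1 - q), n * q ** 2]
        chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
        # Upper tail of a chi-square distribution with 1 degree of freedom.
        hwe_p = math.erfc(math.sqrt(chi2 / 2.0))

    passed = call_rate >= min_call_rate and maf >= min_maf and hwe_p >= hwe_p_min
    return {"call_rate": call_rate, "maf": maf, "hwe_p": hwe_p, "passed": passed}
```

A SNP is kept only when all three criteria hold, which mirrors the exclusion rule stated in the text.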

To explore the genetic underpinnings of distinct subtypes of food-liking, we performed GWAS using logistic regression in PLINK 2.0 (ref. 123) (https://www.cog-genomics.org/plink/2.0/) on a binary phenotype contrasting each of the other subtypes (cases: subtype 1, n = 32,843; subtype 2, n = 10,056; and subtype 3, n = 35,178) with subtype 4 (controls, n = 103,474), while adjusting for sex, age, BMI, the top ten ancestry principal components and genotype measurement batch.

Gene expression and enrichment analysis

To provide further biological insights into the GWAS results, we performed gene set enrichment analysis via FUMA124. First, we employed the SNP2GENE function in FUMA to map the SNPs that differed significantly (P < 5 × 10−8) between subtype 3 and subtype 4 in the GWAS results to a set of prioritized genes based on positional, expression quantitative trait loci (eQTL) and chromatin interaction information of the SNPs. FUMA identifies independent, significant SNPs and their surrounding genomic loci based on LD structure and defines lead SNPs and genomic risk loci from the provided summary statistics. Next, we used the GENE2FUNC function to obtain information on gene expression and to test for enrichment of the mapped genes from SNP2GENE in predefined pathways. The gene expression analysis was based on the GTEx v8 dataset (54 tissue types)125 and provided averaged expression values (log2 transformed) per gene per label (for example, tissue type or developmental stage). For enrichment analysis of the mapped genes, hypergeometric tests were performed to determine whether the mapped genes were overrepresented in any of the predefined gene sets. The FUMA platform provides access to three prominent gene set resources for conducting enrichment analyses, namely the Molecular Signatures Database126, WikiPathways127 and the GWAS catalog128. Multiple testing correction was performed using the Benjamini–Hochberg FDR with an adjusted P value cutoff of 0.05 and a minimum of two overlapping genes.
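
The hypergeometric overrepresentation test mentioned above can be illustrated with a short Python sketch. The function and parameter names are hypothetical, and the choice of background gene set, which FUMA lets the user configure, is left to the caller.

```python
from math import comb


def hypergeom_enrichment_p(
    n_background: int, n_in_set: int, n_mapped: int, n_overlap: int
) -> float:
    """Upper-tail hypergeometric P value for gene set overrepresentation.

    Returns the probability of observing at least n_overlap gene-set members
    when n_mapped genes are drawn without replacement from n_background
    genes, of which n_in_set belong to the gene set.
    """
    upper = min(n_in_set, n_mapped)
    total = comb(n_background, n_mapped)
    tail = sum(
        comb(n_in_set, k) * comb(n_background - n_in_set, n_mapped - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total
```

The resulting P values, one per gene set, would then be passed through a multiple testing correction before any gene set is reported as enriched.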

Inclusion and ethics statement

This work involved a collaboration between scientists in China and the United Kingdom. All contributors have been listed as coauthors in acknowledgment of their work. This publication has considered the Global Code of Conduct. All research complies with the Declaration of Helsinki. The UK Biobank study was approved by the National Information Governance Board for Health and Social Care and the North West Multi-Centre Research Ethics Committee (ref. no. 11/NW/0382). All participants provided written informed consent.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.