The obesity epidemic has emerged as a major public health crisis nationally and internationally1,2. In addition to costing the healthcare system hundreds of billions of dollars, there are countless associated negative health outcomes including cancers, endocrinological disorders, musculoskeletal disorders, and a well-documented increase in premature mortality from cardiovascular disease3. Additionally, the distinction between overweight and obese is becoming increasingly important as many studies have demonstrated a dose-dependent relationship between excess weight or body mass index (BMI) and health outcomes4,5.

The pathophysiology of obesity remains complex, representing a derangement of energy homeostasis and gut endocrine signaling, especially within the context of aberrant insulin sensitivity and regulation, in addition to disruptions in the fine balance of pro- and anti-satiety signals in the gut6,7. In brief, gut hormones such as ghrelin produce hunger and cravings8,9, while hormones such as glucagon like peptide (GLP)-110 and peptide tyrosine tyrosine (PYY)11 trigger satiety. External factors, such as the gut microbiota, can disrupt this carefully orchestrated homeostatic energy balance. For example, spore forming microbes found in the human gut microbiome can influence enteroendocrine cells of the gut to release more or less GLP-1 in response to microbiota-derived secondary bile acids12.

Obesity, however, is just as much a disorder of the endocrine system as it is of the brain, especially with respect to the extended reward network, which is responsible for processing rewarding stimuli and food-seeking behaviors. Key regions of the extended reward network that have been implicated include those related to salience, executive control, core reward, sensorimotor, and emotional regulation-related processes13,14,15,16,17. Despite the robust body of neuroscience research on obesity, investigations have almost exclusively focused on understanding how the obese brain differs from the non-obese brain; no studies have investigated central nervous system (CNS) changes that differentiate obese from overweight individuals. In addition to identifying anatomical and functional alterations in specific brain regions, more recent efforts have focused on identifying alterations in brain network properties18. To assess brain connectivity, graph theory has been leveraged to perform complex network analysis. In this way, we can quantify the anatomic and functional contributions to information flow within the context of a global, whole brain network19,20,21,22. Previous investigations used this approach to explore CNS alterations between high BMI and normal BMI brain connectivity23,24.

Perhaps attributed to the feasibility of more computationally rigorous approaches, combined with an overwhelming amount of data, the last decade has seen an explosion of research leveraging systems-based approaches to understanding complex human disease states, such as obesity. One such example has been to view obesity as a brain-gut disorder, with the gut microbiome also likely to modulate these interactions. Most of the high quality evidence to suggest a role for these brain-gut interactions have been from pre-clinical studies, or primarily cross-sectional clinical studies25,26. It is important to note that there has been limited progress in developing effective, long-lasting treatments for obesity27,28. This is likely driven by both the complex pathophysiology of obesity, combined with the failure to view obesity through a systems biology lens, incorporating both neuroimaging and gut metabolite data. Recent advances in machine learning have allowed for the ability to build robust, predictive models that can distill very large amounts of data into a smaller number of highly impactful features. Here, we use a machine learning approach to leverage the enormous amount of neuroimaging and fecal metabolomic data to better understand key drivers of the obese compared to overweight phenotype.


Study participants

The total sample was comprised of 117 right-handed healthy adult (age ≥ 18) volunteers (36 males and 81 females). A medical exam and clinical assessment that included a modified Mini-International Neuropsychiatric Interview Plus 5.0 (MINI) (27) was administered to confirm the absence of significant medical or psychiatric conditions. Subjects were excluded from participating in the study for any of the following reasons: pregnant or lactating, substance abuse, abdominal surgery, tobacco dependence (smoked half a pack of cigarettes or more daily), extreme strenuous exercise (> 8 h of continuous exercise per week), current or past psychiatric illness, and major medical or neurological conditions. Subjects taking medications that interfere with the central nervous system (full dose antidepressants including SSRI, NSRIs, sedatives or anxiolytics) or regular use of analgesic drugs (including narcotics, opioids, and α2-δ ligands) were excluded. Since female sex hormones such as estrogen are known to effect brain structure and function, in this study we used women who were premenopausal and who were scanned during the follicular phase of their menstrual cycles as determined by self-report of their last day of the menstrual cycle.

Subjects with hypertension, diabetes, or metabolic syndrome were also excluded from the study to minimize confounding effects from our findings. For the same reason, subjects with eating disorders such as anorexia or bulimia nervosa were also excluded. For the purpose of our analyses, we used BMI cutoffs to define our groups: Overweight individuals had a BMI ≥ 25 but < 30, obese individuals had a BMI > 30. Individuals with normal BMIs or who had BMIs that would be considered underweight were excluded from our analysis (BMI < 25). No subjects exceeded 400lbs due to MRI scanning weight limits.

All procedures complied with the principles of the Declaration of Helsinki and were approved by the Institutional Review Board at UCLA’s Office of Protection for Research Subjects (approval numbers 11-000069 and 12-001802). All subjects provided written informed consent.

MRI acquisition

A 3.0 T Siemens Trio scanner was used to perform whole brain structural, and diffusion tensor (DTI) magnetic resonance imaging. Noise reducing headphones were used. Automated data processing and computational workflows for structural and diffusion tensor imaging data were designed and implemented in collaboration with the University of Southern California Laboratory of Neuroimaging (LONI) Pipeline (

Structural gray-matter

For registration purposes, a high resolution structural image was obtained from each subject using a magnetization-prepared rapid acquisition gradient-echo sequence, repetition time = 2200 ms, echo time = 3.26 ms, structural acquisition time = 5 m 12 s, slice thickness = 1 mm, 176 slices, 256*256 voxel matrix, 1 mm voxel size.

Anatomical connectivity (DTI)

Diffusion-weighted MRIs (DWIs) were acquired according to two comparable acquisition protocols. Specifically, DWIs were acquired in either 61 or 64 noncolinear directions with b = 1000 s/mm2, with 8 or 1 b = 0 s/mm2 images, respectively. Both protocols had a TR = 9400 ms, TE = 83 ms, and field of view (FOV) = 256 mm with an acquisition matrix of 128 × 128, and a slice thickness of 2 mm to produce 2 × 2 × 2 mm3 isotropic voxels.

MRI preprocessing and quality control

Structural gray-matter

Structural T1-image segmentation and regional parcellation were conducted using FreeSurfer 29,30 following the nomenclature described in Destrieux et al. 31. This parcellation results in the labeling of 165 regions, 74 bilateral cortical structures, 7 subcortical structures, the midbrain, and the cerebellum32.

Anatomical connectivity (DTI)

Diffusion weighted images (DWI) were corrected for motion and used to compute diffusion tensors that were rotationally re-oriented at each voxel. The diffusion tensor images were realigned based on trilinear interpolation of log-transformed tensors as described in Chiang et al. 33 and resampled to an isotropic voxel resolution (2 × 2 × 2 mm3). White matter connectivity for each subject was estimated between the 165 brain regions using DTI fiber tractography 32, performed via the Fiber Assignment by Continuous Tracking (FACT) algorithm 34 using TrackVis (

Anatomical MRI network construction

Connection matrix

Regional parcellation and tractography results were combined to produce a weighted, unidirected connectivity matrix. The final estimate of white matter connectivity between each of the brain regions was determined based on the number of fiber tracts intersecting each region. Weights of the connections were then expressed as the absolute fiber count divided by the individual volumes of the two interconnected regions 21.

Computing network metrics

The Graph Theory GLM toolbox (GTG) ( and in-house matlab scripts were applied to the subject-specific anatomical brain networks to compute three local weighted network metrics indexing centrality. The network metrics are described below 22,35,36,37.

Measures of centrality quantify the importance of a region’s influence on communication and information flow in large-scale brain networks. These measures include strength, betweenness centrality and eigenvector centrality. Strength represents the number of connections (fiber tracts) a given brain region has, factoring in the “weight” of each connection and reflects a brain region’s total level of impact in the network. Betweenness centrality describes degree to which a brain region lies on the shortest path between two other regions. Acting as way stations, regions with high betweenness centrality are topologically primed to control communication between other regions38. Eigenvector centrality reflects how connected a given brain region is to other brain regions with high centrality (greater number of fiber tracts) and is a measure of a region’s overall influence on the network39.

Stool collection and processing

Fecal samples were aliquoted under liquid nitrogen and shipped to Metabolon for processing and analysis as a single batch on their global metabolomics and bioinformatics platform. Data was curated by mass spectroscopy using established protocols and software as previously described40. The samples of stool were all collected within a week of the subjects’ MRIs.

Machine learning model

A balanced, binary classification label was defined using clinical information about each participant’s weight.

For both the metabolite and brain DTI network metric datasets, the number of variables (987 and 2156 respectively) greatly outnumber the sample size of 117. Machine learning models containing this many feature with a comparatively smaller sample size often contain uninformative variables that lower the model’s classification accuracy. In order to arrive at an optimal, reduced feature space that gave a superior classification, a feature selection technique known as Recursive Feature Elimination (RFE) was utilized with cross validation. Previous work has shown RFE using a Support Vector Machine Estimator (SVM-RFE) to be effective at small sample size and high dimensional data modeling41.

RFE is a wrapper feature selection technique that essentially performs backwards selection on the predictors. A linear model is initially trained on the entire set of features, and iteratively, the features with the lowest weight are removed until a certain target of features to be kept is met. We used a Support Vector Machine (SVM) with a linear kernel as our model for RFE.

In this study, two different types of cross validation were used. K-fold cross validation is a resampling procedure that can be used to evaluate a machine learning model’s performance on a limited dataset. In this process the dataset is partitioned into k different, equally sized, subsets, which are termed folds. Of these k folds, a single one is isolated as a test data set while the rest of the data are designated for training. This process is repeated k times, with a different fold designated as test data at each iteration. Here, we used ten-fold cross validation and leave-one-out (LOO) cross validation, which involves iteratively isolating a single sample as test data and using the rest as training data42.

To determine the best target number of features to keep using RFE, a candidate range from 1 feature, all the way to the entire feature space was considered. RFE was performed with LOO cross validation for each candidate. The number of features that recorded the highest average test data accuracy was determined to be the optimal number of features to keep.

Using this optimal number, RFE was then performed again with ten-fold cross validation To determine the final feature subset, a voting strategy was used43. For each fold, a different subset of features was selected by RFE. Each time a feature was included in a fold’s RFE selection, it received a count, or “vote.” If “n” were the optimal number of features to keep, then the final subset of selected features was determined to be the top “n” features by votes across folds43.

This process was performed on both the metabolite and brain DTI datasets to arrive at minimal feature sets that maximized accuracy.

To further assess the ability of these chosen variables to classify obesity, SVM with a linear kernel, ridge classifier, and logistic regression models were created and trained on just these features. In addition to the isolated variables, patient age and sex were also included as model predictors. For the metabolite model, patient diet was included as well. Logistic regression and a ridge classifier, being different models than the one used as an estimator for RFE, were used to ensure robustness and linear separability of the data. The predictive ability of each of these final models was assessed using LOO cross validation.

Finally, a combined model was created by merging the top 90 percent of features by absolute value of model weight, from each of the final brain and metabolite models. This model was also assessed using leave one out cross validation.

For each model, in addition to overall accuracy, precision and recall were calculated as well. Precision is defined as the number of true positives over the number of true positives plus the number of false positives. Recall is defined as the number of true positives over the number of true positives plus the number of false negatives. In other words, precision is the proportion of predicted trues that were actually true, and recall is the proportion of actual trues that were correctly predicted as true by the model. Precision and recall were calculated for both the very obese and obese classes.

As an additional way to ensure that overfitting was not occurring, a permutation test was performed by randomly shuffling the labels and retraining all three (brain, metabolite, and combined) models44. With the null hypothesis that there is no difference between the original model’s test accuracy versus the test accuracy after shuffling the labels, this test illustrates whether the model actually learned based on the particular relationship between the features and the label, or otherwise simply fit very closely to our high dimensional feature data with a small sample size. If the latter occurred, then we would see that the model would still have an above average test accuracy or have overfit and learned something from this random data.

Figure 1 provides a high-level overview of the methodology.

Figure 1
figure 1

Study overview. In this study, we distil over three thousand unique brain microstructural and gut metabolite data into 76 brain and 50 metabolite features. These features were then used to create a machine learning model that could predict if patients were overweight or obese with over 90% accuracy.


Sample characteristics

The total sample (N = 117) included 64 obese individuals (females = 47, males = 17), mean age = 33.203125, standard deviation 10.2457, and 53 overweight individuals (females = 34, males = 19) mean age = 31.4528, standard deviation = 10.964. Clinical characteristics are summarized in Table 1.

Table 1 Clinical characteristics.

Recursive feature elimination results

Our feature selection method identified 83 DTI features out of 2156 total variables for the brain model (Fig. 2A, SuppTable 1).

Figure 2
figure 2

Recursive feature elimination graph of brain network metrics (A) and metabolites (B). A linear model is first developed incorporating all features and iteratively, the features with the lowest weight are removed. These graphs demonstrate the relationship between model accuracy and the number of features kept.

Out of 987 total metabolites, our feature selection method identified 57 variables for the metabolite model (Fig. 2B, SuppTable 2).

Brain classifier

An SVM model trained on this subset of DTI features identified by our RFE method achieved 90.25 percent accuracy in discriminating very obese patients from obese patients (Fig. 3A). The obese class (0 class) had a precision of 0.90 and a recall of 0.88. The extremely obese (1 class) had a precision of 0.91 and a recall of 0.92.

Figure 3
figure 3

Receiver operating characteristic curve of support-vector machine model using brain features (A) exclusively, metabolite features exclusively (B).

Interestingly, the 83 isolated DTI network metric features were all either node betweenness centrality or average path length measures of various parts of the brain. Out of the 83 isolated features, all of the corresponding brain regions were unique except for nine. The left triangular part of the frontal gyrus (L_InfFGTrip), left fronto marginal gyrus (of Wernicke) and sulcus (L_FMarG_S), right occipital pole (R_OcPo), right intraparietal (interparietal sulcus) and transverse parietal sulci (R_IntPS_TrPS), right transverse frontopolar gyri and sulci (R_TrFPoG_S), left inferior segment of the circular sulcus of the insula (L_InfCirIns), right inferior temporal sulcus (R_InfTS), Transverse frontopolar gyri and sulci(L_TrFPoG_S), and left caudate nucleus (L_CaN) each had both its node betweenness centrality and average path length measures in the identified feature set.

The three features with the largest positive weight, influencing the model most towards classification as extremely obese are average path length of the left pericallosal sulcus (AvPathLength__L_PerCaS), average path length of inferior occipital gyrus and sulcus (AvPathLength__L_InfOcG_S), and node betweeness centrality of the left temporal pole (NodeBWCent__L_Tpo). On the other hand, the three features with the largest negative weight, influencing our model towards classification as just obese are average path length of the right long insular gyrus and central insular sulcus (AvPathLength__RLoInG_CInS), average path length of the left middle-posterior part of the cingulate gyrus and sulcus (AvPathLength__L_MPosCgG_S), and posterior dorsal part of the cingulate gyrus (AvPathLength__PosDCgG) (SuppFigure 1).

Metabolite classifier

An SVM model trained on this subset of metabolite features identified by our RFE method achieved 79.84 percent accuracy in discriminating very obese patients from obese patients (Fig. 3B). The obese (0 class) had a precision of 0.79 and a recall of 0.76. The extremely obese (1 class) had a precision of 0.81 and a recall of 0.83.

The three features with the largest positive weight, influencing the model towards classification as extremely obese were acesulfame, N-acetylisoleucine, and 1,2-dilinoleoyl-GPC (18:2/18:2). On the other hand, the three features with the largest negative weight, influencing the model towards classification as obese, were 1-oleoyl-GPC (18:1), pregnen-diol sulfate, and glyocholate (SuppFigure 2).

Combined classifier

Combining the top 90 percent of features by absolute value of weight from the two described models above yielded a new feature subset of 126 variables (50 metabolite variables and 76 brain dti nm variables) (Table 2). The SVM model trained on this subset of features slightly outperformed the brain classifier. It achieved 90.49 percent accuracy in discriminating very obese patients from obese patients (Fig. 4). The obese class (0 class) had a precision of 0.90 and a recall of 0.89, and the extremely obese class (1 class) had a precision of 0.91 and a recall of 0.92.

Table 2 Brain and metabolites features included in combined support-vector machine model of obesity.
Figure 4
figure 4

Receiver operating characteristic curve of support-vector machine model using combined brain and metabolite features.

In this model, the features from the brain DTI dataset had significantly greater weights than those from the metabolite dataset. The three features with the largest positive weight, influencing the model towards classification as extremely obese were average path length of the left pericallosal sulcus (AvPathLength__L_PerCaS), average path length of inferior occipital gyrus and sulcus (AvPathLength__L_InfOcG_S), and node betweeness centrality of the Superior occipital gyrus (SupOcG).

The three features with the largest negatives weights were average path length of the right long insular gyrus and central insular sulcus (AvPathLength__RLoInG_CInS), average path length of the left middle-posterior part of the cingulate gyrus and sulcus (AvPathLength__L_MPosCgG_S), and posterior dorsal part of the cingulate gyrus (AvPathLength__PosDCgG) (Fig. 5A).

Figure 5
figure 5

Top brain (A) and metabolite (B) features in combined support-vector machine model of obesity. The x-axis represents the brain and metabolite features of interest that contributed the most to the machine learning model—the magnitude of contribution is outlined on the y-axis, with a negative weight associated with the obese phenotype and a positive weight with the overweight phenotype.

The most negative metabolite features in the combined model were pregen-diol sulfate, 1-oleoyl-GPC (18:1), and R-salsolinol. Lastly, the most positive metabolite features in this model were I-urobilinogen, 1-oleoylglycerol (18:1), and genistein (Fig. 5B).

Permutation testing

For all three models, this null hypothesis was rejected. By shuffling the labels, the test accuracy with ten-fold cross validation decreased to worse than random (46 percent). This supports our claim that the models learned meaningful relationships between the features (metabolites and brain DTI) and the label (obesity level).


In this study, we demonstrate a machine learning approach that can successfully differentiate overweight from obese individuals using fecal metabolites and neuroimaging data, independently; however, combination of these two data parameters yields a more accurate classifier. Our results reveal a role for metabolites and brain regions of interest that have been previously investigated within the context of obesity, in addition to suggesting a role for previously unexplored metabolites and regions within this context. While not providing information about causality, our findings will be important for future, mechanistic studies, as this represents the first investigation of brain-gut interactions in differentiating obese from overweight individuals.

A substantial number of brain regions in the extended reward network, primarily the emotional regulation and somatosensory networks, emerged as important in differentiating obese from overweight individuals. These brain regions, most notably the inferior frontal gyrus, cingulate gyrus, and straight gyrus (emotional regulation), in addition to the postcentral gyrus, posterior insula, and paracentral lobule (somatosensory) have been previously described in multivariate analysis pattern classifiers that are able to differentiate normal weight from overweight individuals with a high degree of accuracy23. Only one region of the core reward network, the caudate nucleus, emerged as an important, albeit relatively minor, differentiator in our model. Although disruptions in core reward connectivity have been shown to be particularly important in differentiating normal from overweight phenotypes23, our results suggest that these disruptions are likely not the most important factors in differentiating overweight and obese individuals. Unexpectedly, regions outside of the extended reward network, primarily the default mode network, also emerged as important in differentiating obese from overweight individuals. Activity in the default mode network reflects baseline brain function, in addition to spontaneous self-reflection45, attention to internal stimuli45, and accounts for up to 80% of the brain’s energy use46. Research on the default mode network within the context of obesity is limited; though, our findings are consistent with a preclinical mouse model that demonstrates increased default mode network activity in overweight but not lean mice47. Taken together, our results suggest that obesity should not necessarily be interpreted as a more extreme version of the overweight phenotype, but rather as perhaps a different entity with a unique neuroimaging signature that is in many ways distinct from the overweight one.

Several amino acid derivatives (N-methylproline [from proline], imidazole propionate [histidine], agmatine [arginine], N-acetylisoleucine [isoleucine] and N-butyryl-leucine [leucine]) emerged as important in differentiating obese from overweight individuals. Previous work has linked amino acid and branched-chain amino acids in particular to obesity and related insulin resistance in both human and preclinical models, though these studies have been primarily performed using serum rather than fecal samples48,49,50,51. It is likely that the microbiome plays an important role in mediating this relationship; one study showed that mice transplanted with the microbiomes of human twin pairs discordant for obesity demonstrated differences in body composition, with the microbial communities in the obese mice showing increased metabolism of branched-chain amino acids52. Furthermore, a subset of the metabolites that emerged as some of the strongest differentiators between obese and overweight individuals in our data set are produced exclusively by gut microbes, including imidazole propionate53. L-urobilinogen, a metabolite produced in the intestine by bacterial reduction of bilirubin was also an important differentiator54. Previous studies have suggested a role for hyperbilirubinemia and obesity with respect to blood levels54.

With activity on serotonergic and glutaminergic neurotransmitters, agmatine is one of the handful of metabolites highlighted in our results that has been extensively studied with respect to its role as a neuromodulator55. Agmatine is also released from endogenous neurons in the peripheral nervous system and astrocytes in the central nervous system as a compensatory, protective mechanism in response to stress and inflammation56. Gut agmatine may interact with vagal afferents and contribute to the unique brain-gut signature of the obese and overweight phenotype. As many of the other significant metabolites have not been studies within the context of brain-gut interactions, we cannot conclude if these compounds communicate with the central nervous system in some way.

Our study focused on structural connectivity of brain regions and fecal metabolites. Future work may benefit from incorporating functional connectivity of brain regions or perhaps pairwise connectivity in characterizing key differences between overweight and obese individuals. Although we discussed the metabolites of interest within the context of the gut microbiome, we did not explicitly examine gut microbiota signatures (16S rRNA gene sequencing data) in this study. Future, more complex, machine learning models may benefit from the addition of different data sets to deepen our systems-based understanding of obesity and how it may be different from the overweight phenotype. We are unable to define the directionality and causality between fecal metabolites and alteration in brain connectivity with respect to obesity from this study; however, previous work has suggested bidirectional models for brain-gut-microbiome communication in obesity57. Machine learning tends to focus on the variability of features in order to explain the various outcomes. As a result, machine learning can eliminate features with low variance. This may not be correct in all situations as sometimes small variations can potentially drive large changes in the systems. To some extent, this bias can be eliminated by ensuring that the models being built and used are validated by subject matter experts. Although many of the fecal metabolites included in this model were not on their own able to differentiate between the overweight and obese phenotypes, the unique metabolome signature was able to significantly differentiate between the two groups, which underscores the impact of a machine learning approach to our systems understanding of obesity, which would otherwise not be captured in more traditional, strictly correlational analyses. Despite their predictive accuracy in classifying the data, machine learning algorithms cannot always entirely explain the true underlying processes, while also arriving at a final predictive model with the best set of features for optimizing accuracy. In this study, we differentiated our two groups by BMI; although BMI, which expresses the relationship between height and weight is the most widely used measure of obesity. Future studies may consider other measures of obesity such as waist–hip ratio or visceral adiposity in order to validate our current study.

To our knowledge, this is the first study to integrate microstructural neuroimaging and fecal metabolite data using a machine learning model to understand key differentiators and potential drivers of obese and overweight individuals. A growing body of evidence suggests that brain-gut interactions are important in driving the transition from being normal weight to overweight. Our results suggest that brain-directed signaling may be more important in obesity pathophysiology compared to overweight pathophysiology, as our machine learning model weighed alterations in brain connectivity as substantially more significant compared to fecal metabolites. This exploratory analysis supports previous preclinical and clinical investigations, while also revealing novel insights, which will be essential in driving discovery into previously unexplored avenues of brain-gut-microbiome interactions in obesity.