Personalized optimal nutrition lifestyle for self obesity management using metaalgorithms

Precision medicine applies machine learning methods to estimate the personalized optimal treatment decision based on individual information, such as genetic data and medical history. The main purpose of self obesity management is to develop a personalized optimal life plan that is easy to implement and adhere to, thereby reducing the incidence of obesity and obesity-related diseases. The methodology comprises three components. First, we apply catboost, random forest and lasso covariance test to evaluate the importance of individual features in forecasting body mass index. Second, we apply metaalgorithms to estimate the personalized optimal decision on alcohol, vegetable, high caloric food and daily water intake respectively for each individual. Third, we propose new metaalgorithms named SX and SXwint learners to compute the personalized optimal decision and compare their performances with other prevailing metalearners. We find that people who receive individualized optimal treatment options not only have lower obesity levels than others, but also have lower obesity levels than those who receive ’one-for-all’ treatment options. In conclusion, all metaalgorithms are effective at estimating the personalized optimal decision, where SXwint learner shows the best performance on daily water intake.

www.nature.com/scientificreports/ of IoT, blockchain is a reliable and efficient tool to protect users' privacy and secure communications between parties 11 . BeCome method applies blockchain to prevent unauthorized data access and balance workload via offloading edge computing devices (ECDs) to facilitate real-time data processing tasks 12 . Locality-Sensitive Hashing (LSH) imputes missing data of continuous, discrete, or Boolean types in medical data without sacrificing privacy 13 . LSROM-EH utilizes blockchain, fog computing, and software-defined networking (SDN) for task offloading in wireless body area networks (WBANs) based healthcare ECDs to improve computing efficiency 14 .
With IoT and ECDs, a detrended fractal dimension (DFD) feature has been proposed using the fractal dimension (FD) of detrended heart-rate signals from wearable devices, which proved effective in differentiating between regular and insomnia groups 15 . It is well known that obesity is closely related to the occurrence and development of many diseases, such as metabolic disorders 16 , diabetes 17 , arthritis 18 , cardiovascular diseases (CVD) 19 , and COVID-19 20 . In the US, approximately 67% of adults diagnosed with arthritis are overweight or obese 18 . Under the widespread lockdown caused by COVID-19, e-learning has become the primary form of education. Although electronic devices have made it more convenient to take classes remotely, e-learning could damage the mental and physical health of students, and obesity might be a problem 21 . Moreover, e-learning could lead to lower scores and more fatigue for deaf and hard of hearing (DHH) students 22 . Increased physical activity and weight-loss counseling can help reduce BMI and treat arthritis 18 . Poor diet and lack of exercise may lead to obesity, which has a significant impact on the progression of type II diabetes and the development of disease complications 17 . In Shenzhen, China, it has been found that overweight and obese male adults had three times more odds of progressing into severe COVID-19 than underweight and normal-weight male adults 20 . It has been discovered that the adipokine human Resistin (hResistin) was associated with the secretion of low-grade pro-inflammatory mediators and the development of insulin resistance in obesity-related diseases 23 . The correlation analysis was performed by a one-way ANOVA model with Bonferroni correction, and the Shapiro-Wilk test was used for comparing normally distributed subgroups 23 . A study of 1098 adults found that higher body mass index was associated with ankle systolic-blood-pressures (SBP), and the correlation was obtained by linear regression analysis 19 .
Diet-induced obesity (DIO) is related to higher intracranial pressure (ICP) and brain disorders 24 . In the US, obesity is estimated to be responsible for 7% of direct health care spending and 14% of all deaths 17 . Therefore, reducing the BMI of overweight and obese individuals leads to lower average health expenditures and lower risk of severe diseases.
Obesity research often focuses on obesity in childhood and adolescence. Nutrigenetics, epigenomics and metabolomics gather patient information to estimate individualized optimal nutritional decisions 25 . A 1035-person study using hierarchical multiple regression found that maternal obesity and household income significantly affected childhood obesity rates 26 . In a study of 400 overweight or obese women, a higher plant-based dietary index (PDI) leads to better metabolic conditions 27 . Given individual genomics information, nutrigenetics analyze the association between genes and the impact of nutrient intake on the disease status to estimate a personalized optimal diet 28 . In East Asia, people with FTO gene variants had higher BMI when they consumed less protein 16 . On the other hand, the impact of nutrient intake on athlete performance can be affected by environmental conditions, such as social and economic factors, lifestyle patterns, physical activity, and food preferences 28 . The composition of macro-nutrients such as carbohydrates, proteins and lipids can be optimized based on personal genomics and digestive conditions 16 . In an intervention study to lower post-meal blood glucose, a personalized optimal recipe calculated from exercise data and gut microbiome conditions proved effective 29 . Moreover, in a 10-week intervention trial of 82 people, using an individualized optimal nutritional regimen resulted in lower caloric intake and thus reduced individual obesity 30 .
Compared with previous studies, our research has made the following contributions. First, we estimate personalized optimal decisions on dietary lifestyle factors, which are easy to implement and stick to. We consider data on dietary habits including daily water intake and the frequency of alcohol, vegetable and high caloric food consumption. We observe that the calculated individualized optimal treatment options vary from person to person. Second, we build prediction models for body mass index and choose random forest as the base learner for metaalgorithms. We analyze the feature importance of these dietary factors in this predictive study and identify vegetable intake frequency as the most important feature. Third, personalized optimal treatment options we calculate using metaalgorithms result in lower levels of obesity than general 'one-for-all' recommendations. People who actually receive treatment options that are exactly equal to individualized optimal treatment options have lower levels of obesity. Fourth, we propose novel metaalgorithms SX and SXwint learners, which outperform other metalearners in the analysis of personalized optimal daily water intake. Compared with T and X learners, SXwint learner has the tendency to show larger distance between personalized optimal individuals and the rest. On the other hand, T and X learners tend to show greater distance between personalized optimal individuals and ones receiving 'one-for-all' treatment options.

Data.
To better curb the development of obesity-related epidemics, obesity self-management programs should be easy to implement and adhere to. Genomics and microbiome features are more expensive to measure for large populations. We use an obesity database with 2111 observations and 17 dietary or physical lifestyle features collected in Colombia, Peru and Mexico 31 . In the obesity data, MTRANS is the usual means of transportation and consists of five levels: Automobile, Motorbike, Bike, Public Transportation and Walking. CALC is a binary indicator of alcohol intake and contains two levels: Yes (positive alcohol intake) and No (zero alcohol intake). TUE is the time spent on technological devices. FAF is the frequency of physical activity. SCC is a binary indicator of food calorie monitoring and comprises two levels: Yes and No. CH2O is the amount of daily water intake in liters. SMOKE is a binary indicator of smoking and contains two levels: Yes (smoke) and No (never smoke). CAEC is the frequency of sub-meals between main meals and involves four levels: No, Sometimes, Frequently and Always. NCP is the number of main meals in a day. FCVC is the frequency of vegetable intake 31 . In the sample, BMI values range from 12 to 51. Of the 2111 individuals, 272 have BMI between 12 and 18.5 and are underweight; 287 have BMI between 18.5 and 24.9 and are normal-weight; 580 have BMI between 25.0 and 29.9 and are overweight; 351 have BMI between 30.0 and 34.9 and are type I obese; 297 have BMI between 35.0 and 39.9 and are type II obese; and 324 have BMI between 40 and 51 and are type III obese. People with BMI over 24.9 are overweight or obese, and for them, lowering BMI is good for their health.
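The BMI categories above can be sketched as a small lookup. This is an illustrative sketch only: the function name `bmi_category`, the label strings, and the choice to place the boundary between "normal" and "overweight" at 25 are assumptions made here, since the text lists the ranges as 18.5 to 24.9 and 25.0 to 29.9 without saying how values in between are handled. The counts are the ones reported in the paragraph above.

```python
def bmi_category(bmi: float) -> str:
    """Map a BMI value to the category names used in the Data section.

    Cut points follow the ranges in the text; treating each upper bound
    as exclusive is an assumption of this sketch.
    """
    if bmi < 18.5:
        return "underweight"
    if bmi < 25.0:
        return "normal"
    if bmi < 30.0:
        return "overweight"
    if bmi < 35.0:
        return "obese_I"
    if bmi < 40.0:
        return "obese_II"
    return "obese_III"


# Category counts as reported in the text.
reported_counts = {
    "underweight": 272,
    "normal": 287,
    "overweight": 580,
    "obese_I": 351,
    "obese_II": 297,
    "obese_III": 324,
}

total = sum(reported_counts.values())
# Individuals with BMI over 24.9, i.e. the population studied in the Methods.
overweight_or_obese = total - reported_counts["underweight"] - reported_counts["normal"]
```

The reported counts add up to the 2111 observations in the database, and 1552 of them fall in the overweight or obese categories.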
Based on the values of personal characteristics, the estimated personalized optimal eating and living habits are easy for the general public to implement. Before we estimate the individualized optimal nutrition lifestyle to reduce BMI, we perform a predictive study for two purposes: (1) comparing the performance of different machine learning methods in predicting BMI, and (2) pinpointing the individual features that have significant impacts on BMI. Predictive models with high accuracy can be used as base learners in metaalgorithms to estimate the personalized optimal decision 32 . The significant dietary factors that people can actively change are considered as treatments of interest in pursuit of a personalized optimal nutrition policy. Decision tree methods show higher accuracy in predicting obesity levels than Bayesian and logistic classification techniques 33 . Among all the features in the obesity data, CALC, CH2O, FCVC and FAVC measure the intake of alcohol, water, vegetables, and high caloric foods respectively. We use metaalgorithms 32 to calculate individualized optimal intake regimens for these foods and beverages.
Predictive model. Catboost 34 and random forest 35 are ensemble learning methods for categorical and continuous features. We randomly divide the original data into training and testing sets with equal sample sizes. Catboost and random forest models are estimated on the training data and BMI predictions are computed on the testing data. In catboost, we specify the number of iterations to be 200, the learning rate as 0.05 and the tree depth as 10. The mean absolute error of catboost is 1.36 on the training data, and 2.25 on the testing data. In random forest 36 , we specify the number of tree estimators to be 100, and the minimal number of observations required at each split as 7. The mean absolute error of random forest is 1.13 on the training data and 2.27 on the testing data.
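The evaluation protocol in this paragraph (a random half-and-half split, then mean absolute error on each half) can be sketched with the standard library alone. The helper names `split_half` and `mean_absolute_error` and the fixed seed are assumptions of this sketch, and the catboost and random forest models themselves are not reproduced here.

```python
import random


def split_half(rows, seed=0):
    """Randomly split rows into training and testing halves of (near) equal size."""
    rng = random.Random(seed)  # fixed seed is an assumption, used for repeatability
    shuffled = list(rows)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between observed and predicted BMI."""
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)


# 2111 row indices stand in for the observations of the obesity data.
train_idx, test_idx = split_half(range(2111))

# Toy check of the error metric on three made-up BMI values.
toy_mae = mean_absolute_error([25.0, 30.0, 35.0], [26.0, 28.0, 35.0])
```

With an odd sample size such as 2111, the two halves differ by one observation.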
Lasso penalized regression performs model estimation and variable selection simultaneously. The covariance test 37 , one of the mainstream post-selection inference methods, is conditional on the solution path of lasso penalized regression. Each time a new variable is added, the model error variance decreases, and the importance of the variable is measured by the magnitude of the decrease 37 . The covariance test is designed for high-dimensional data, but is also suitable for low-dimensional data. Moreover, the obesity data fully meet the assumptions of the covariance test 31 . We regress BMI on the features in Table 1 using lasso penalized regression. P values for variables are derived from the standard exponential distribution Exp(1) 37 . For overweight and obese people, at the 5% significance level, the following features are significant: FCVC, FHWO, FAVC, SCC, NCP and FAF.
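To make the Exp(1) reference distribution concrete: if a test statistic is compared against the standard exponential distribution, its p value is the upper tail probability exp(-T). The sketch below shows only this final conversion; the function name `exp1_p_value` and the example statistic 3.0 are hypothetical, and computing the covariance statistic itself from the lasso path is not shown.

```python
import math


def exp1_p_value(statistic: float) -> float:
    """Upper-tail probability P(E > statistic) for E ~ Exp(1)."""
    # Negative statistics are clipped to zero so the result stays in [0, 1].
    return math.exp(-max(statistic, 0.0))


# Hypothetical statistic value, used only to illustrate the 5% threshold.
p = exp1_p_value(3.0)
significant_at_5_percent = p < 0.05
```

Under this reference distribution, a statistic needs to exceed about 3.0 (more precisely, -ln 0.05) to be significant at the 5% level.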
According to Fig. 1, for overweight or obese people, both catboost and random forest models regard the following variables as important: FCVC, TUE, NCP, Age, Gender, FAF, CH2O (daily water intake), MTRANS and CALC (alcohol intake frequency), and the following variables as unimportant: SCC, SMOKE, CAEC, FAVC and FHWO. The variables that the covariance test considers important are very different from those of the catboost and random forest models. For example, the covariance test considers SCC, FAVC (high caloric food intake frequency) and FHWO to be significant, while catboost and random forest regard them as unimportant. On the other hand, catboost and random forest identify TUE, Age, Gender, CH2O, MTRANS and CALC as important, but the covariance test considers them insignificant. All three methods identify FCVC (vegetable intake frequency), NCP and FAF (physical activity) as important. Of the three methods, the covariance test yields the most intuitive set of significant features.

Methods
Based on the covariance test results in Table 1, we suggest a lower frequency of high-calorie food intake and more physical activity. However, once other individual covariates are taken into account, the solution that is better on average is not necessarily the personalized optimal one. For example, quitting alcohol works for some people, but for others, it can be counterproductive. To further reduce BMI, an individualized optimal nutritional regimen should be used instead of better-on-average recommendations. We aim to estimate the personalized optimal decision on alcohol, vegetable, high caloric food and daily water intake. We only consider overweight and obese people with BMI over 24.9 31 . Normal-weight and underweight individuals with BMI less than 24.9 do not require further BMI reductions. Our approach only considers the case of two treatment options. To demonstrate our methods, we treat CALC (alcohol intake) as the treatment T of interest. Drinking alcohol at a certain frequency corresponds to CALC = Yes and T = 1. No alcohol intake corresponds to CALC = No and T = 0. For the personalized optimal solution of the other factors, we replace CALC with the corresponding variable and then apply the same method.
We use metaalgorithms T, X and S learners to compute personalized optimal scenarios for alcohol intake frequency 32 . Under the Neyman-Rubin framework of causal inference 38 , for a treatment with binary options, the heterogeneous treatment effect is $\tau(x) = E(Y(1) - Y(0) \mid X = x)$, where $Y(1)$ is the individual BMI outcome for treatment $T = 1$, $Y(0)$ is the individual BMI outcome for treatment $T = 0$, and $X = x$ is the individual covariates that may affect treatment outcomes. All base learners are specified to be random forest models 36 , since they have decent accuracy in predicting BMI. T, X and S learners all require splitting the obesity data into training and testing data with equal sample sizes. On the training data, the personalized optimal nutrition plan is estimated, and on the testing data, each individual's personalized optimal nutrition plan can be calculated. As shown in Fig. 2, T learner 32 has the following steps.
(T1) Identify the subset of training data where the observed treatment is $T = 0$. Estimate $\mu_0(x) = E(Y(0) \mid X = x)$ using random forest model $\hat{\mu}_0(x)$, where $Y(0)$ is the BMI of individuals without any alcohol intake and $X = x$ is the observation of individual covariates.
(T2) Identify the subset of training data where the observed treatment is $T = 1$. Estimate $\mu_1(x) = E(Y(1) \mid X = x)$ using random forest model $\hat{\mu}_1(x)$, where $Y(1)$ is the BMI of individuals with positive alcohol intake and $X = x$ is the observation of personal characteristics.
(T3) On the testing data, compute $\hat{\mu}_0(x)$ and $\hat{\mu}_1(x)$ using covariates $x$ of each individual. Lower BMI is what we want, so if $\hat{\mu}_1(x) > \hat{\mu}_0(x)$, then $T = 0$ and CALC = No is the personalized optimal decision for this individual. If $\hat{\mu}_1(x) < \hat{\mu}_0(x)$, then $T = 1$ and CALC = Yes is the personalized optimal decision for this individual. If $\hat{\mu}_1(x) = \hat{\mu}_0(x)$, then CALC = No and CALC = Yes are equally beneficial for this individual.
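A minimal sketch of steps (T1) to (T3) follows. To keep it self-contained, the base learner is a one-covariate least-squares line rather than the random forest used in this study; that substitution, the helper `fit_line`, the tolerance `tol` for ties, and the toy data are all assumptions made for illustration.

```python
def fit_line(xs, ys):
    """Least-squares line y = a + b*x; a stand-in for the random forest base learner."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def t_learner(train, test_x, tol=1e-9):
    """train: list of (x, t, bmi) rows; test_x: covariate values of test individuals.

    Returns 0, 1, or None (tie) as the estimated optimal treatment per individual.
    """
    # (T1) Outcome model fitted on individuals observed with T = 0.
    control = [(x, y) for x, t, y in train if t == 0]
    mu0 = fit_line([x for x, _ in control], [y for _, y in control])
    # (T2) Outcome model fitted on individuals observed with T = 1.
    treated = [(x, y) for x, t, y in train if t == 1]
    mu1 = fit_line([x for x, _ in treated], [y for _, y in treated])
    # (T3) Pick the treatment with the lower predicted BMI.
    decisions = []
    for x in test_x:
        diff = mu1(x) - mu0(x)
        if diff > tol:
            decisions.append(0)
        elif diff < -tol:
            decisions.append(1)
        else:
            decisions.append(None)
    return decisions


# Toy data: BMI is 30 under T = 0 and 28 + x under T = 1.
train = [(float(x), 0, 30.0) for x in range(5)] + [(float(x), 1, 28.0 + x) for x in range(5)]
decisions = t_learner(train, [0.0, 2.0, 4.0])
```

In the toy data the treatment lowers BMI for small x and raises it for large x, so the estimated decision flips from T = 1 to T = 0 as x grows, with a tie at x = 2.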
As shown in Fig. 2, X learner 32 has the following steps.
(X1) Perform steps (T1) and (T2) of T learner on the training data. This gives $\hat{\mu}_0(x)$ from the training data with $T = 0$ and, using another random forest model, $\hat{\mu}_1(x)$ from the training data with $T = 1$.
(X2) On the subset of training data where the observed treatment is $T = 0$, we observe $Y(0)$ and compute $\hat{\mu}_1(x)$, which is an estimate of the potential outcome if treatment $T = 1$ were assigned. Compute the difference in outcomes $D_0 = \hat{\mu}_1(x) - Y(0)$. Estimate $\tau_0(x) = E(D_0 \mid X = x)$ using random forest model $\hat{\tau}_0(x)$.
(X3) On the subset of training data where the observed treatment is $T = 1$, we observe $Y(1)$ and compute $\hat{\mu}_0(x)$, which is an estimate of the potential outcome if treatment $T = 0$ were assigned. Compute the difference in outcomes $D_1 = Y(1) - \hat{\mu}_0(x)$. Estimate $\tau_1(x) = E(D_1 \mid X = x)$ using random forest model $\hat{\tau}_1(x)$.
(X4) On all training data, estimate the propensity score 39 $g(x) = P(T = 0 \mid X = x)$ using random forest model $\hat{g}(x)$.
(X5) On the testing data, compute $\hat{\tau}(x) = \hat{g}(x)\hat{\tau}_0(x) + \{1 - \hat{g}(x)\}\hat{\tau}_1(x)$. If $\hat{\tau}(x) > 0$, then $T = 0$ and CALC = No is the personalized optimal decision for this individual. If $\hat{\tau}(x) < 0$, then $T = 1$ and CALC = Yes is the personalized optimal decision for this individual. If $\hat{\tau}(x) = 0$, then CALC = No and CALC = Yes are equally helpful for this individual.
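The X learner steps can be sketched in the same style. As before, this is an illustration under stated assumptions rather than the implementation used in the study: a least-squares line (`fit_line`) replaces every random forest, the propensity score is approximated by the constant share of T = 0 rows in the training data, and the data are synthetic.

```python
def fit_line(xs, ys):
    """Least-squares line y = a + b*x; a stand-in for the random forest base learner."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def x_learner(train, test_x):
    """train: list of (x, t, bmi) rows. Returns the estimated effect tau(x) per test x."""
    control = [(x, y) for x, t, y in train if t == 0]
    treated = [(x, y) for x, t, y in train if t == 1]
    # (X1) Outcome models for each treatment arm, as in the T learner.
    mu0 = fit_line([x for x, _ in control], [y for _, y in control])
    mu1 = fit_line([x for x, _ in treated], [y for _, y in treated])
    # (X2) Imputed effects on the T = 0 rows: D0 = mu1(x) - Y(0).
    tau0 = fit_line([x for x, _ in control], [mu1(x) - y for x, y in control])
    # (X3) Imputed effects on the T = 1 rows: D1 = Y(1) - mu0(x).
    tau1 = fit_line([x for x, _ in treated], [y - mu0(x) for x, y in treated])
    # (X4) Propensity g(x) = P(T = 0 | X = x); a constant here (assumption of this sketch).
    g = len(control) / len(train)
    # (X5) Weighted combination of the two effect models.
    return [g * tau0(x) + (1.0 - g) * tau1(x) for x in test_x]


def decide(tau, tol=1e-9):
    """tau > 0 means treatment raises BMI, so T = 0 is preferred; None marks a tie."""
    if tau > tol:
        return 0
    if tau < -tol:
        return 1
    return None


# Toy data: BMI is 30 under T = 0 and 28 + x under T = 1, so the true effect is x - 2.
train = [(float(x), 0, 30.0) for x in range(5)] + [(float(x), 1, 28.0 + x) for x in range(5)]
effects = x_learner(train, [0.0, 4.0])
decisions = [decide(tau) for tau in effects]
```

On this toy data both imputed-effect models recover the true effect x - 2, so the weighting by the propensity score does not change the result; with unbalanced arms it would shift weight between the two models.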
As shown in Fig. 2, S learner 32 has the following steps.
(S1) On all training data, estimate the joint model $\mu(x, t) = E(Y \mid X = x, T = t)$ using random forest model $\hat{\mu}(x, t)$. Here in $X = x$, we must include all first-order interaction terms between treatment and individual covariates.
(S2) On the testing data, compute $\hat{\mu}(x, 0)$ and $\hat{\mu}(x, 1)$ using covariates $x$ of each individual. If $\hat{\mu}(x, 1) > \hat{\mu}(x, 0)$, then $T = 0$ and CALC = No is the personalized optimal decision for this individual. If $\hat{\mu}(x, 1) < \hat{\mu}(x, 0)$, then $T = 1$ and CALC = Yes is the personalized optimal decision for this individual. If $\hat{\mu}(x, 1) = \hat{\mu}(x, 0)$, then CALC = No and CALC = Yes are equally advantageous for this individual.
Based on T, X and S learners, we propose SX and SXwint learners. Both SX and SXwint learners require dividing the obesity data into training, re-training and testing data, which account for 1/4, 1/4 and 1/2 of the obesity data respectively. For datasets with large samples and few features, decomposing the original data into three parts can alleviate the overfitting problem. Figure 2 illustrates metaalgorithms T, X, S, SX and SXwint learners. The steps of SX and SXwint learners are the same, except that SXwint learner uses first-order interactions between treatment and covariates but SX learner does not use any. S learner does not split the original data into samples with observed treatment $T = 0$ and ones with $T = 1$. Instead, S learner uses the whole data for joint modeling, which can better distinguish between the personalized optimal group and the rest. However, past research results have shown that X learner performs much better than S learner. Our motivation is to combine the steps of X and S learners to create a new method that inherits the advantages of both. From Fig. 2, we can see that SX and SXwint learners are formed by splicing the steps of S learner and the steps of X learner.
As shown in Fig. 2, SXwint learner 32 has the following steps. (SXwint1) Execute step (SX1) of SX learner procedures on the training data. In $X = x$, we use first-order interaction terms between treatment and individual covariates. Estimate $\mu(x, t) = E(Y \mid X = x, T = t)$ using random forest model $\hat{\mu}(x, t)$. (SXwint2), (SXwint3), (SXwint4) and (SXwint5) are the same as steps (SX2), (SX3), (SX4) and (SX5) of SX learner procedures. In SXwint learner, $x$ contains all first-order treatment-covariates interactions, but in SX learner, $x$ does not use any interaction. In most cases, SXwint learner performs better than SX learner, as presented in Table 2.
In SX and SXwint learners, we estimate a joint model $\mu(x, t) = E(Y \mid X = x, T = t)$ rather than $\mu_1(x) = E(Y(1) \mid X = x)$ and $\mu_0(x) = E(Y(0) \mid X = x)$ separately. We use the training data to estimate the joint model $\mu(x, t)$ and the re-training data to estimate $\tau_0(x)$ and $\tau_1(x)$, rather than one training set to estimate both. The convergence rates of T, S and X learners have been demonstrated under strict assumptions 32 . These assumptions are a linear heterogeneous treatment effect, base learner estimators with high prediction accuracy under all data distributions $P$, random errors satisfying $E(\varepsilon \mid X = x) = 0$ and $Var(\varepsilon \mid X = x) \le \sigma^2$, features with finite variances, and a well-conditioned covariance matrix. The obesity data we study 31 fit these strict assumptions. The convergence rates of SX and SXwint learners are of the same scale as the convergence rate of X learner.
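The three-part data layout used by SX and SXwint learners (1/4 training, 1/4 re-training, 1/2 testing) can be sketched as below. Only the split is shown; the learner steps themselves are not reproduced. The function name `three_way_split`, the seed, and the use of row indices in place of the actual records are assumptions of this sketch, and 1552 is the number of overweight and obese individuals implied by the category counts in the Data section.

```python
import random


def three_way_split(rows, seed=0):
    """Split rows into training (1/4), re-training (1/4) and testing (1/2) parts."""
    rng = random.Random(seed)  # fixed seed is an assumption, used for repeatability
    shuffled = list(rows)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    pool, testing = shuffled[:half], shuffled[half:]
    # The half not used for testing is split again with equal sample sizes.
    quarter = len(pool) // 2
    training, retraining = pool[:quarter], pool[quarter:]
    return training, retraining, testing


training, retraining, testing = three_way_split(range(1552))
```

Keeping the testing half separate from both training parts means the joint model and the effect models are never fitted on the individuals for whom decisions are later predicted.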

Remark 1
In S learner, $x$ must contain interaction terms between treatment and individual covariates. Under the framework of linear heterogeneous treatment effects, in S learner without interactions, $\mu(x, t) = x\beta_1 + t\beta_2$, so the treatment effect is $\mu(x, 1) - \mu(x, 0) = \beta_2$, which is independent of $x$, and the estimation strategy is not personalized. However, in S learner with first-order treatment-covariates interactions, $\mu(x, t) = x\beta_1 + t\beta_2 + xt\beta_3$, so the treatment effect is $\mu(x, 1) - \mu(x, 0) = \beta_2 + x\beta_3$, which contains $x$, and the estimation strategy is indeed personalized. Hence in S learner, $x$ must contain treatment-covariates interactions in order for the estimation result to be personalized optimal.
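Remark 1 can be checked numerically with a linear S learner. The sketch below fits the joint model by ordinary least squares with and without the treatment-covariate interaction; the linear base learner (in place of a random forest), the helper names `solve`, `ols`, `features` and `s_learner_effect`, and the toy data are assumptions made for illustration.

```python
def solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[pivot] = m[pivot], m[c]
        for r in range(c + 1, n):
            factor = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= factor * m[c][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][k] * x[k] for k in range(i + 1, n))) / m[i][i]
    return x


def ols(rows, ys):
    """Ordinary least squares via the normal equations."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(p)]
    return solve(xtx, xty)


def features(x, t, interact):
    """Design row: intercept, covariate, treatment, and optionally x * t."""
    return [1.0, x, float(t)] + ([x * t] if interact else [])


def s_learner_effect(train, x, interact):
    """Estimated effect mu(x, 1) - mu(x, 0) from one joint linear model."""
    beta = ols([features(xi, ti, interact) for xi, ti, _ in train],
               [y for _, _, y in train])

    def predict(xv, tv):
        return sum(b * f for b, f in zip(beta, features(xv, tv, interact)))

    return predict(x, 1) - predict(x, 0)


# Toy data: BMI is 30 under T = 0 and 28 + x under T = 1, so the true effect is x - 2.
train = [(float(x), 0, 30.0) for x in range(5)] + [(float(x), 1, 28.0 + x) for x in range(5)]
plain = [s_learner_effect(train, x, interact=False) for x in (0.0, 4.0)]
personalized = [s_learner_effect(train, x, interact=True) for x in (0.0, 4.0)]
```

Without the interaction term the estimated effect is the same for every individual, whereas with it the effect changes sign between x = 0 and x = 4, which is the point of Remark 1.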
Remark 4
If we specify that $g(x) = P(T = 0 \mid X = x)$, then treatment effects in SX and SXwint learners are related to $x g(x) = E[X(1 - T) \mid X = x]$, which is the first-order interaction between treatment and individual covariates.

Although our methods perform well, they also suffer from the following limitations. First, metaalgorithms perform well only if the prediction accuracy of the base learner is high. In our research, we find that random forest models have high prediction accuracy and use them as base learners in metaalgorithms. If we fail to find a base learner with sufficiently high prediction accuracy, then metaalgorithms do not perform well. Second, SX and SXwint learners use training, re-training and testing data, which account for 1/4, 1/4 and 1/2 of the original data respectively. In ultra-high dimensional data where the sample size is 10 and the feature dimension is in the millions, dividing data into training and re-training data results in lower efficiency of data usage and a higher risk of inaccurate predictive models. Third, SXwint learner uses first-order interactions between treatment and covariates. When there are many categorical features that may take many values, the number of dummy variables and first-order interaction terms can become very large. The covariates then have higher dimensionality, making it more difficult to train base learners.

Results
Obesity data are randomly split into training and testing data with equal sample sizes. After we estimate T, X and S learners on the training data, we predict the personalized optimal treatment decision on the testing data. For SX and SXwint learners, the training data used in T, X and S learners are now randomly split into new training data and re-training data with equal sample sizes. We estimate SX and SXwint learners using the new training data and re-training data. Afterwards we predict the personalized optimal treatment decision for people on the testing data. In summary, when calculating individualized optimal options for each nutrient, T, X, S, SX and SXwint learners decompose the same testing data into the following parts. The personalized optimal group is formed by individuals on the testing data whose treatment observations are exactly identical to the personalized optimal decisions. The non-optimal group consists of people on the testing data whose treatment observations are different from the personalized optimal decisions. We compare BMI levels in the personalized optimal group and the non-optimal group to determine whether the estimated personalized optimal decision is effective. Furthermore, the general optimal group is composed of individuals on the testing data whose treatment observations are equal to the 'one-for-all' treatment decision. We compare BMI levels in the personalized optimal group and the general optimal group to determine whether individualized decision-making is superior to general advice.

Table 2. Two-sample Kolmogorov-Smirnov (KS) test results concerning alcohol, vegetable, high caloric food and daily water intake. KS Test 1 D and P Value 1 are the test statistic and p value of KS test 1 between the distributions of BMI in personalized optimal and non-optimal groups. KS Test 2 D and P Value 2 are the test statistic and p value of KS test 2 between the distributions of BMI in personalized optimal and general optimal groups. No. O is the sample size of the personalized optimal group. No. NO is the sample size of the non-optimal group. No. G is the sample size of the general optimal group. The largest KS test distance statistic produced under each type of food or drink is in bold.

In this section, the significance level of hypothesis test results is set to 5% by default. Figure 3 reveals the comparison results between the distributions of BMI in personalized optimal, non-optimal and general optimal groups. The Kolmogorov-Smirnov (KS) test is a nonparametric test free of distributional assumptions. The two-sample KS test is applied to determine whether two samples follow the same distribution. The KS test statistic measures the maximal distance between the empirical BMI distributions of two samples. The distributions of BMI in personalized optimal and non-optimal groups are significantly different according to KS test results in Table 2. In general, for T, X, S, SX and SXwint learners, BMI measurements in personalized optimal groups are significantly lower than BMI levels in non-optimal groups and general optimal groups. Overweight and obese individuals who obey the personalized optimal nutrition plans exhibit much lower BMI measurements than people who do not.

Learners comparison. For T, X, S, SX and SXwint learners, personalized optimal groups overlap with each other. When calculating the individualized optimal nutrition plan for alcohol intake frequency, within the personalized optimal group of T learner, 83% of people are also personalized optimal in S learner, and 90% are also personalized optimal in X learner. Among personalized optimal individuals of X learner, 91% are also personalized optimal in T learner and 83% are also personalized optimal in S learner. Among personalized optimal individuals of SXwint learner, 93% are also personalized optimal in SX learner.
Among personalized optimal individuals of SX learner, 87% are also personalized optimal in SXwint learner. When investigating the individualized optimal decisions on alcohol, vegetable, high caloric food and daily water intake, we observe that T and X learners have a greater overlap in personalized optimal groups than T and S learners. When analyzing vegetable intake, within the personalized optimal group of SXwint learner, 65% of people are also personalized optimal in SX learner. Among personalized optimal individuals of SX learner, 87% are also personalized optimal in SXwint learner. Individualized optimal nutritional regimens estimated by T, X, S, SX and SXwint learners are all effective in reducing BMI in overweight and obese people. SXwint learner is more effective than SX learner. Depending on the datasets analyzed and the research objectives, the comparison results between these methods are also different. In self obesity management, we find the best-performing metaalgorithms to predict the personalized optimal nutrition lifestyle. In Table 2, SXwint exhibits the largest distance between the distributions of BMI in personalized optimal and non-optimal groups for alcohol, vegetable and daily water intake, as shown by KS Test 1 D. On the other hand, T and X learners show the greatest distance between the distributions of BMI in personalized optimal and general optimal groups for alcohol, vegetable and high caloric food intake, as shown by KS Test 2 D. Compared to common decisions, personalized optimal decisions estimated by T and X learners tend to give the lowest BMI. In the study of daily water intake, SXwint learner brings the greatest distance between the distributions of BMI in personalized optimal and general optimal groups, and between the distributions of BMI in personalized optimal and non-optimal groups.

Personalized optimal nutrition lifestyle. On the testing data, the population with positive alcohol intake has a higher BMI level than the population with no alcohol intake, as shown in Fig. 3. To reduce BMI, the general better-on-average recommendation is zero alcohol intake for everyone. The general optimal group is the set of all people with zero alcohol intake on the testing data. The BMI distributions of personalized optimal and non-optimal groups are significantly different, where SXwint learner brings the greatest distance and SX learner yields the smallest, as illustrated in Table 2. The BMI distributions of personalized optimal and general optimal groups are also significantly different, where X learner produces the greatest distance and SX learner outputs the smallest. Figure 4 demonstrates that the personalized optimal and non-optimal groups have comparable Yes/No ratios, with the sample size of positive alcohol intake approximately 2-3 times the sample size of zero alcohol intake in both groups.

Figure 3 reveals that people with a low vegetable intake frequency have lower BMI levels, so the common decision is low-frequency vegetable intake for everyone. The general optimal group is the set of all people with a low vegetable intake frequency on the testing data. The BMI distributions of personalized optimal and non-optimal groups are significantly different, where SXwint learner yields the greatest distance and S learner produces the smallest, as demonstrated in Table 2. The BMI distributions of personalized optimal and general optimal groups are also significantly different, where X learner outputs the greatest distance and SXwint learner shows the smallest. Figure 4 indicates that the FCVC High/Low ratio is much higher in the non-optimal group than in the personalized optimal group. Compared with the non-optimal group, a higher proportion of people in the personalized optimal group consume vegetables at a low frequency.

Figure 1 and Table 1 show that catboost and random forest do not consider FAVC (high caloric food intake) important, but the covariance test identifies FAVC as significant. Intuitively, the effect of FAVC should be significant, since FAVC = Yes means higher energy intake, and FAVC = No means lower energy intake. The BMI distributions of personalized optimal and non-optimal groups are significantly different, where X learner outputs the greatest distance and SX learner produces the smallest, as shown in Table 2. The general optimal group is the set of all people with a low high-calorie food intake frequency on the testing data. Almost all people in the general optimal group have BMI below 30. Although the general advice is already very effective, the personalized optimal solutions estimated by metaalgorithms further reduce the BMI. Table 2 illustrates that the BMI distributions of personalized optimal and general optimal groups are significantly different for all learners except SX learner. Figure 4 shows that the FAVC Yes/No ratio is much higher in the non-optimal group than in the personalized optimal group, especially for T, X and S learners. In the personalized optimal group, a much smaller proportion of individuals consume high-calorie foods frequently.
On the testing data, individuals with low daily water intake show lower BMI on average, which implies that the general strategy is to drink less than 2 liters of water every day. The BMI distributions of personalized optimal and non-optimal groups are significantly different, where SXwint learner yields the greatest distance and SX learner produces the smallest, as in Table 2. The BMI distributions of personalized optimal and general optimal groups are also significantly different, where SXwint learner outputs the greatest distance and SX learner returns the smallest. Figure 4 shows that the CH2O (daily water intake) High/Low ratio is much higher in the non-optimal group than in the personalized optimal group, especially for X and SX learners. A smaller proportion of individuals consume more than 2 liters of water every day in the personalized optimal group. Figure 1 and Table 1 show that catboost and random forest consider CH2O important, but the covariance test identifies CH2O as insignificant. Intuitively, daily water intake has no effect on BMI since it does not affect energy intake or consumption processes. However, in our empirical analysis, individualized optimal nutrition regimens on daily water intake still reduce BMI, as shown in Fig. 3. In carefully designed clinical trials, water intake should have no effect on BMI. But in self obesity management, making daily water intake equal to the personalized optimal decision is beneficial for lowering BMI.
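The group construction and the KS distance used throughout this section can be sketched as follows. The function names `split_groups` and `ks_statistic` and the four-person toy data are assumptions for illustration; only the distance statistic D is computed, not the p value, for which a statistics library routine such as `scipy.stats.ks_2samp` would normally be used.

```python
from bisect import bisect_right


def split_groups(observed, optimal, bmi, general_decision):
    """Form the BMI samples of the personalized optimal, non-optimal and general optimal groups."""
    personalized = [y for t, o, y in zip(observed, optimal, bmi) if t == o]
    non_optimal = [y for t, o, y in zip(observed, optimal, bmi) if t != o]
    general = [y for t, y in zip(observed, bmi) if t == general_decision]
    return personalized, non_optimal, general


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: maximal gap between the two empirical distribution functions."""
    a, b = sorted(sample_a), sorted(sample_b)
    distance = 0.0
    for value in sorted(set(a) | set(b)):
        cdf_a = bisect_right(a, value) / len(a)
        cdf_b = bisect_right(b, value) / len(b)
        distance = max(distance, abs(cdf_a - cdf_b))
    return distance


# Toy testing data: observed treatment, estimated optimal treatment, and BMI.
observed = [0, 1, 0, 1]
optimal = [0, 0, 1, 1]
bmi = [26.0, 31.0, 33.0, 27.0]

personalized, non_optimal, general = split_groups(observed, optimal, bmi, general_decision=0)
ks_test_1_d = ks_statistic(personalized, non_optimal)  # personalized optimal vs non-optimal
ks_test_2_d = ks_statistic(personalized, general)      # personalized optimal vs general optimal
```

In the toy data the two individuals whose observed treatment matches the estimated optimum have the lowest BMI, so KS Test 1 D reaches its maximum of 1, while the general optimal group shares one member with the personalized optimal group and gives a smaller distance.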

Conclusion
For overweight and obese people, in order to reduce BMI, the general recommendation is to lower the intake of all foods and beverages: alcohol, vegetables, high caloric foods and water. However, individualized optimal nutritional regimens estimated by metaalgorithms are more effective in reducing BMI. In a personalized optimal regimen, for some populations, surprisingly, consuming more of a particular type of food or drink is beneficial for lowering BMI. Through calculations, we find that SXwint learner tends to make BMI distributions in personalized optimal and non-optimal groups more distant. In contrast, T and X learners tend to make BMI distributions in personalized optimal and general optimal groups more distant.

Data availability
The data sets analyzed during the current study are available from the corresponding author on reasonable request.