Global dietary quality in 185 countries from 1990 to 2018 show wide differences by nation, age, education, and urbanicity

Evidence on what people eat globally is limited in scope and rigour, especially as it relates to children and adolescents. This impairs target setting and investment in evidence-based actions to support healthy sustainable diets. Here we quantified global, regional and national dietary patterns among children and adults, by age group, sex, education and urbanicity, across 185 countries between 1990 and 2018, on the basis of data from the Global Dietary Database project. Our primary measure was the Alternative Healthy Eating Index, a validated score of diet quality; Dietary Approaches to Stop Hypertension and Mediterranean Diet Score patterns were secondarily assessed. Dietary quality is generally modest worldwide. In 2018, the mean global Alternative Healthy Eating Index score was 40.3, ranging from 0 (least healthy) to 100 (most healthy), with regional means ranging from 30.3 in Latin America and the Caribbean to 45.7 in South Asia. Scores among children versus adults were generally similar across regions, except in Central/Eastern Europe and Central Asia, high-income countries, and the Middle East and Northern Africa, where children had lower diet quality. Globally, diet quality scores were higher among women versus men, and more versus less educated individuals. Diet quality increased modestly between 1990 and 2018 globally and in all world regions except in South Asia and Sub-Saharan Africa, where it did not improve.

requirements related to body size, metabolic efficiency, and physical activity, and facilitate comparisons between surveys, age groups, and sexes.

Energy adjustment corrections
We initially asked that all data be shared both unadjusted and energy-adjusted to 2,000 kcal, regardless of age category, but retrospectively changed this decision to reflect the age-specific levels. When possible, energy adjustment using the residual method was repeated to reflect these changes. In some cases, this approach was not possible, and thus alternative approaches for energy adjustment correction were taken.
Energy adjustment correction of aggregate ("stratum-level") data In some cases, data were provided or accessed at the stratum level (i.e., age group, sex, education level, etc.). In these cases, energy adjustment correction depended on whether 2,000 kcal/day-adjusted values had previously been provided by the data owner. If energy-adjusted data had been provided, a simple ratio of the age-specific level to 2,000 kcal was applied post-hoc to convert the value to the correct energy level. If stratum-level data were only provided in an unadjusted format but with corresponding total energy intake, intake was adjusted to the age-specific energy level using the energy density method, in which a simple ratio of reported calorie intake to age-specific level was applied to the unadjusted value. If stratum-level data were provided in age groups which traversed more than one level of age-specific energy intake, a weighted mean daily energy intake was calculated. This weighted mean daily energy level was then used to adjust intake using the ratio readjustment method. If only unadjusted intake was available, the energy density method was used.
Energy adjustment of data without adjusted values or total energy intake In limited cases, individual-level data were not initially energy-adjusted or provided with mean caloric intake data, precluding the use of the gold standard and ratio readjustment methods. In these instances, daily per capita energy availability data from FAO Food Balance Sheets (FBS) were used to inform stratum-level caloric intake. In short, country-year-specific FBS energy data were adjusted using coefficients derived from a multivariate linear regression of GDD input data, FBS data, and both regional and survey-level covariates. Adjusted FBS energy was then corrected to the prescribed energy level by applying a factor of the energy level's proportion of 2,000 kcal. Unadjusted food and nutrient intake values were then adjusted with this corrected energy intake via the energy density method.

Quality control
Data integrity and quality were assessed at each step during survey collection, processing, harmonization, and analyses. Duplicate reviews were performed of recorded survey characteristics, demographic variables, dietary definition classifications, and unit conversions. To assess for outliers and validity (errors) in reported intakes, plausibility thresholds were defined for each dietary factor, both at the individual level and stratum (e.g., group mean) level, based on dietary reference intakes, tolerable upper limits, toxicity ranges, and existing regional data on mean intakes in populations. Any value identified as potentially implausible was reviewed for extraction errors, followed by direct correspondence with the corresponding member or public survey data owners, to detect and correct potential errors. Data remaining implausible after such steps were excluded from final datasets. Results for each dietary factor were further graphed and visually inspected by country, age, sex, dietary assessment method, representativeness, and time, reviewing survey result plausibility and consistency within and across countries.

Data finalization
After data has been finalized for inclusion, it is stored within the Access database, which houses information on all surveys, corresponding authors, and survey checking statuses. Aggregated data is collated by dietary factor and prepared for input into the GDD prediction model. Protocol for converting FFQ frequency data into GDD servings 1.
Step 1-Standardize the categorical frequency variables to a single daily serving unit a. If a range of frequencies is given, take the mean (-Avg‖) of the range b. If the frequency is presented in times/week, divide by 7 (for days in a week) c. If the frequency is presented in times/month, divide by 30.42 (average days in a month) i. Note: If the category is presented as days/week instead of times/week, assume one serving per day and treat as times/week ii. Example A) 5-7 days/week = (6 days/week) / (7days/week) = 0.857 servings/day iii. Example B) 1-3 times/month = (2 times/month) / (30.42 days/month) = 0.066 servings/day d. If the upper range is open ended, use the range of the other frequency categories in the survey to create an upper limit and then take the average of that range. i. Example: -5 or more times per week‖ where next lowest level is 2-4 times per week. Assume a range of 5-7 times per week, take the average (6 times per week)/(7 days/week) = 0.857 servings/day 2.
Step 2-Convert servings to grams a. If available, survey-specific serving sizes were used for conversions. b. If survey-specific serving sizes are not available, ask the data owner for usual, country-specific serving sizes. c. If data owner does not provide country-specific serving sizes, utilize country-specific serving sizes identified from national agencies (e.g., USDA). d. If no country-specific serving sizes are identified, use the GDD standard serving size conversions.  Includes nuts/seeds, soy protein, soy products, peanuts, and peas.
Whole grains g/day Total intake of whole grains, defined as a food with ≥1.0 grams of fiber per 10 grams of carbohydrate, in which all components of the kernel (i.e., bran, germ, and endosperm) are present in the same relative proportions as the intact grain. Examples include whole grain bread, brown rice, whole grain pasta, whole grain breakfast cereals, oats, rye, barley, millet, sorghum, and bulgur. This definition excludes corn products including corn flour, corn meal, and popcorn.
Includes wholegrain breads, cereals, rice/pasta, bread, and other products such as biscuits.
Total processed meats g/day Total intake of processed meat, defined as any meat (including poultry) that has been cured, smoked, dried, or chemically preserved. Examples include bacon, salami, sausages, hot dogs, and processed deli or luncheon meats. This definition excludes fish and eggs.
Includes sausages and unprocessed meats.
Unprocessed red meats g/day Total intake of unprocessed red meat, defined as beef, pork, lamb, mutton, or game that has not been cured, smoked, dried, or chemically preserved. This definition excludes poultry, fish, and eggs.
Includes processed red meats, poultry, fish and organ meats.
Total seafoods g/day Total intake of fish and shellfish. Examples include salmon, tuna, trout, tilapia, shrimp, crab, oysters, and cephalopods.
Includes salted fish, processed fish, and other animal products. Cheese g/day Total intake of cheese derived from the milk of livestock (e.g., cows, buffalo, yak), including hard cheese (e.g., cheddar, Includes yogurt, milk products and cheese. 8 mozzarella, Swiss), soft cheese (e.g., ricotta, cottage cheese, paneer), and processed cheese. Yogurt g/day Total intake of yogurt and fermented milk, including reduced-fat and full-fat yogurt.
Includes dairy curd, buttermilk, paneer, cheese, and milk. Sugar-sweetened beverages (SSBs) g/day Total sugar-sweetened beverage intake, defined as any beverage with added sugar having ≥50 kcal per 8 ounces (236.5 grams) serving, including commercial or homemade beverages, soft drinks, energy drinks, fruit drinks, punch, lemonade, and frescas. This definition excludes 100% fruit and vegetable juices and non-caloric artificially sweetened drinks.
Includes fruit and vegetable juices. May also include coffee, tea, and milk.
Fruit juices g/day Total intake of 100% fruit juice, excluding sugar-sweetened fruit juice and vegetable juice.
Includes fruit juices, vegetable juices and sweetened juices. Total milk g/day Total intake of dairy milk including non-fat, low-fat, skim, and whole-fat milk. This definition excludes yogurt, fermented milk, and soy or other plant derived milk (e.g., coconut milk, almond milk).
Includes yogurt, dairy drinks, cheese, and dairy products.

Covariate identification
We identified country-and time-specific covariate data from various sources to further inform our model predictions. These data supplement our individual-level dietary intake data, particularly in countries for which these inputs are limited. We consulted experts and conducted comprehensive searches of publicly available databases to identify >800 covariates. We prioritized approximately 400 covariates for testing:

Data source
Year ( We conducted principal component analysis (PCA) using the 'princomp' function in R separately for: 1) 23 grouped FAO food balance sheet (FBS) foods, beverages, and energy, 2) 142 G ENuS foods, and beverages, and 3) 19 GENuS nutrients and energy. The first four components from each PCA were considered for covariate testing.

Covariate imputation and truncation
If covariate data were missing for some (but not all) years of a given country, we used linear interpolation to fill in those years. Covariate data sources that ended before 2018 were imputed using a moving average of the three most recent values to obtain values for all covariates through the year 2018. Region -level means were assigned to countries for which entire covariates were missing. For each dietary factor, five-fold cross-validation was used to compare model fit for the three versions of the GDD model. Data were split into five partitions at the survey level: four partitions making up the training dataset, and the remaining segment as the testing data. The models were fit to the training set, and resulting outputs were compared to training set to assess model fit via calcul ating the expected log predictive density (ELPD) 1 . This was repeated five times so that each partition was used once as the training set.

Overview
The GDD prediction model aims to estimate mean intake of 53 dietary factors in 185 countries, by country/year/age/sex/urbanicity/education, by synthesizing survey mean intake data from sources of varying quality. The Bayesian multilevel framework has some advantageous properties that are appealing for our purposes. Namely, • -Shrinkage‖ of parameter estimates towards an overall mean. For example, mean estimates for data sparse countries are pulled towards the region mean, allowing for more reasonable estimates for countries with potentially unreliable data.
• Intuitive framework for predicting means (with uncertainty bounds) for countries with no available data.
• Ability to include prior knowledge about intake through priors • Allows for model flexibility and complexity often not granted in similar frequentist approaches due to difficulty in optimization.

Hierarchical nature of the data
Survey data collected across the globe have an inherently nested hierarchical structure which makes a multilevel approach to modeling the data appealing. The hierarchical structure of the data we assumed was as follows: countries were nested in super-regions, which are nested in the globe. Our model assumed that the super-region means were distributed log-normally around the global mean, and that country means were distributed log-normally around their respective super-region means. Using this structure allowed us to borrow strength across units, a concept commonly known as -partial pooling‖ in the Bayesian literature. In partial pooling, each country's mean estimate borrows from the other countries' data within the region, resulting in shrinkage of the country mean estimate towards the region mean. The less informative the data was for a particular country, the more pooling there is.

Description of model
Fundamentally, our model was a Bayesian model on the log-means of intake with a nested hierarchical structure (clusters countries within super-regions and super-regions within the globe), assuming exchangeability between countries and super-regions after accounting for covariates. To this structure, we added sex, urban/rural, education, and non-linear age effects (also within a nested hierarchical structure), survey and country-level covariates, and overdispersion on study-level variance to account for non-sampling variation. It borrowed heavily from models presented by Finucane et al. 1 and Flaxman et al. 2 . For dietary factors that were measured as proportions of energy intake, we used −log(−log(y)) as the link function instead of log(y). Below we provide a full mathematical description of the model, with detailed descriptions for each component, but first, we present some notation:  (y)) for dietary factors measured as proportions, log(y) otherwise y h,i ⇐ mean intake level for stratum h in study i a j ⇐ country-specific intercept b 1j ⇐ country-specific difference between females and males sex h,i ⇐ variable indicating whether the y h,i corresponds to an all male group (0), all female group (1), or mixed (0.5) b 2j ⇐ region-specific difference between urban and rural u h,i ⇐ variable indicating whether the y h,i corresponds to an all rural group (0), all urban group (1), or mixed (% urban) b 3j ⇐ region-specific education effect u i ⇐ two variables indicating whether y h,i corresponds to low education (defined to be 6 years or less of schooling if mixed, proportion of low education and high education γ j ⇐ non-linear age-trend for region j z h,i ⇐ midpoint age of stratum h in study i X i β ⇐ study + country level covariate effects 2 h, i ⇐ standard error of f (y h,i ) (estimated via delta method) τ 2 ⇐ overdispersion parameter for study i Intercept, sex differences, education differences, and urban/rural differences We fit a multi-level model with 3 levels (countries nested in super-regions nested in the globe) for intercepts and sex differences, and 2 levels (super-regions nested in the globe) for age pattern, education differences, and urban/rural differences. a j refers to the intercept for country j, b 1j refers to the country specific sex effect, b 2j refers to the country specific urban effect, and b 3j refers to the country specific education effect. a g and b g correspond to global intercept and effects while a s , b s denote super-region specific random effects and a c , b c denote country specific random effects. κ c and κ s are the between-country and between-super-region variance, respectively, for their respective model components. Note that the model assumes between country variance is the same across all super-regions, and that education, urban/rural differences and age patterns are assumed to be the same for countries within a super-region. Mathematically, this can be described as follows: Weakly informative priors were used for the hyper-parameters: half-Normal(0, 0.5) for the κ pa rameters, For a g , b g , b g , and b g , a prior of N(0, 0.35) was used. Input data were standardized to the standard normal scale to ensure priors were sensible for all dietary factors and to increase computational stability. 1 2 3

Covariate effects
There were two survey-level covariate effects included in the model to explain potential bias from a survey: survey type and food definition. There were four main types of diet surveys included as covariates in the model: short-term recalls (single or multiple); food frequency questionnaires (FFQs); household budget/intake surveys; and DHS (Demographic Health Survey) questionnaires. Only the recall was considered the -gold standard‖ with regards to estimating the mean unbiasedly. Likewise, not all surveys used the optimal definition for a dietary factor. For example, in the case of fruits, most surveys defined fruits as all fruits. However, some surveys only measured a suboptimal metric, such as fruits including fruit juices. Currently, we combine all sub-optimal metrics into one category for our models. We also included country-year specific predictors in the model (e.g., food availability (FAO food balance sheets or Global Expanded Nutrient Supply (GENuS) model). We assumed their relationship to f (y) was linear, and that the relationships were independent of location (not super-region dependent, or country dependent). See the covariate testing section for a list of the country-year specific predictors used for each dietary factor. Mathematically, this portion of the model can be described as follows: For survey level-covariates, we used a prior of Normal(0, 0.35). The prior for β c parameters depended on the dietary factor. For many dietary factors, we only used 1 or 2 country-level covariates, all from FAO or GENuS. For these variables, we had a very strong prior belief that the they should be strongly correlated with the outcome of interest (e.g., log(fruit availability from FAO) should be strongly positively correlated with log(fruit intake)). In these cases, we used a highly informative prior of N(1, 0.1). For other dietary factors, either no such variable exisited, or other country-year level predictors were also included and do not warrant such a high degree of certainty in a strong relationship. In these cases, we used a much weaker prior of N(0, 0.5).

Age trend
For many surveys, intake was not linearly associated with age. We modelled age using restricted cubic splines with 4 knots at k 1 , k 2 , k 3 , k 4 , corresponding to ages 5, 20, 50 and 65, respectively, after standardization: As with the urban and education effect parameters, we used 2 levels of hierarchy for the age-trend: Weakly informative distributions were used for hyper-prior parameters: Half-Normal(0, 0.5) for the κ parameters and Normal(0, 0.35) for the γ g parameters.

Overdispersion
An additional variance component was added to each study to allow the model to account for nonsampling variation due to survey-level error (from imperfect study design and quality). This additional variance component was modeled in such a way to reflect our expectation that surveys that are less likely to represent the true mean (but not necessarily biased) were more variable. Sources of this non-sampling variation accounted for included surveys not being nationally representative, surveys not being stratified by sex, urban/rural or education, and surveys that used large age groupings (>10 years). We also added an additional constraint to ensure local surveys were considered more variable than regional surveys.
Thus, τ 2 = exp(ϕ intercept + ϕ regional I(X rep = regional) + ϕ local I(X rep = local) + ϕ agerange I(X AgeRange > 10) + ϕ sex I(X sex = both) + ϕ urban/rural I((X urban/rural = both) or (X educ = all)) i i i with the constraints ϕ 2 2 loc al , and all ϕ > 0 except ϕ intercept . We used a prior of Normal(-2.5, 1) for ϕ intercept to reflect our a priori belief than an -ideal‖ survey that is both fully stratified and nationally representative should have minimal overdispersion. For all other ϕ parameters, we used a prior of Normal(0, 0.5).

Computation
We fit each model using STAN 3,4 through rstan 5 , using the No-U-turn sampler (NUTS) 6 , a variant of Hamiltonian Monte Carlo 7 . We used 4 chains of 2000 iterations each, treating the first 1000 iterations of each chain as warm up, for a total of 4000 Monte C arlo iterations to define our posterior distributions.

Predictions
The model described above was ultimately used to provide predictive distributions of mean intake for each dietary factor by country-year and subgroup. Note that the model specified g(y h,i[j] ) of subgroup h in survey i from country j as a linear combination of model parameters and survey-year-subgroup specific information: where we had posterior distributions for model parameters a j , b j , b 2s , b 3s , γ s and β. To obtain a predictive distribution for subgroup h in country j, we calculated: for each draw of our posterior distributions. Because we are interested in country-specific means, we did not use survey specific parameters in our predictions. For countries with no survey data, we did not have a posterior distribution of a j . To get the predictive distributions for these countries in such a way that properly accounts for the variation of mean intake between countries within a region, we replaced a j and b1j with a * j and b * 1j where a * Here, a k[j] and b 1k[j] are super-region-level intercepts and sex effects corresponding to country j, and κ c and κ b 1k[j] are superregion-level intercepts and sex effects corresponding to country j, and κ c a and κ c 1b are the between-country variances for intercept and sex effects, respectively. In other words, each posterior draw for the super-region-level parameter and it's corresponding betweencountry variance parameter generated a unique normal distribution for that draw, and we took a one sample draw from each of these distributions to generate the predictive distribution of that parameter for an unknown country in that region. Note that the uncertainty around the super-region level parameter and between country variance propagate into the predictive distribution for the mean. For some dietary factors, there were entire super-regions with no data. For those superregions, predictive distributions for b 2s , b 3s , and γ s were obtained in a similar way, generating a normal distribution for each draw from the global level parameter and between region variance parameter and sampling from that. For a j and b 1j , we needed to account for between-super-region variance and the between country variance. Therefore, taking the intercept as an example, for each posterior draw, we sampled from N (a g , κ c a + κ s a ). Note that this is equivalent to drawing a sample region mean from N(a g , k s a ) then using that sample as mean and k c a as variance to form a normal distribution to sample country mean from.
Varying slopes modeling structure  Our extensive work to identify surveys and model intakes led to recognition and the finding that, for certain dietary factors, the available global data and model were insufficient to accurately model differences in intakes by jointly stratified by country, age, sex, education level, and urban/rural status while also modeling differences in intakes over time.  For countries without multiple comparable dietary surveys over time (the great majority of global nations), trends over time are largely determined by the strength of the relationship between the best available covariates (often variables from FAO food balance sheets or associated GENuS variables) and the raw survey data. For certain dietary factors, this relationship was sufficiently robust to allow modeling of all joint demographic strata and time trends. By reviewing extensive time trends plots for individual dietary factors and nations, dietary factors with a model beta coefficient 0.4 with their corresponding FAO/GENuS covariate were identified as having a reasonable statistical relationship to capture both all demographic strata differences and time trends. For others (FAO/GENuS beta coefficient<0.4), time trends were modeled using a second, separate Bayesian model.  This second Bayesian model assessed the country-specific associations over time of the survey data for each dietary factor with its corresponding FAO/GENuS covariate. The model incorporated country-level intercepts and slopes, along with their correlation that is estimated across countries. Input data were the same stratified survey data as for the GDD Core model and including dietary assessment method as a covariate. This time component model did not separately estimate differences by age, sex, education, or urban/rural status, but focused on the relationships with FAO/GENuS over time. In sensitivity analyses, age and sex were included as main effects (not varying by country or region) but were found to not qualitatively alter the parameter estimates for the relationship of a country's dietary intake data with its FAO/GENuS data. Thus, including these demographics did not largely affect the time-varying predictions. This model is commonly referred as a varying slopes model structure and leverages two-dimensional partial pooling between intercepts and slopes to regularize all parameters and minimize overfitting risk [1][2][3] . Predictions with the varying slopes model take into account a country-specific intercept and slope when the country has dietary factor data and use the global intercept and slope for countries where data are not available. Time effects were predicted separately for each year including 1990, 1995, 2000, 2005, 2010, 2015, and 2018.  For each country and dietary factor, the country-specific time-trend central predictions from the varying slopes models were used to generate a country-year specific adjustment scaling factor, one for each year of 1990, 1995, 2000, 2005, 2010, 2015, and 2018, compared to the reference of one of these years as determined by the median year of that country's survey data (or 2005 if no country data). This scaling factor, determined by taking the ratio of the predicted dietary intake for that year as compared to the reference year, was multiplied by the country-year posterior predictions from the fully stratified, Core GDD model to determine a time-adjusted final estimate for each stratum.  To be conservative, this varying slopes adjustment (scaling factor) was only used for dietary factors and countries meeting all of the following criteria: at the model level, (a) FAO/GENuS beta coefficient<0.4 in the Core GDD model; and (b) availability of a closely corresponding FAO/GENuS covariate (e.g., dietary survey vitamin A intake vs. GENuS vitamin A); and at the country-level, (c) identification of a positive relationship (coefficient or slope) between the national survey data and FAO/GENuS covariate in the varying slopes model; and (d) to minimize implausible results at the country level, no more than a 3-fold difference between the ratio of the country's range of predicted intake between 1990-2018 divided by the ratio of the country's range of FAO/GENuS values over that same time period.  Among 53 evaluated dietary factors in the GDD, 29 were modeled and incorporated time adjustment using this Bayesian varying slopes model. The other dietary factors were not because (in order of criteria applied) 11 did not have any closely corresponding FAO/GENuS variable (e.g., dietary iodine), 8 had an FAO/GENuS beta coefficient in Core GDD Core Model of at least 0.4, and 4 were unable to complete sampling for the varyingslopes model (i.e., the MCMC chains did not finalize, independent of parameterization). One additional dietary factor, vitamin B9, with a borderline FAO/GENuS beta (0.34) was also not further scaled based on adequate qualitative characteristics of the observed time trends in the GDD Core Model.  To calculate the population represented, we assumed that one or more surveys for any year of data collection were representative of the national population. We summed the national populations of the countries with survey data to estimate the absolute population represented. For the estimate of the percentage of the population represented, we used 7.63 billion as the denominator (global population in 2018).  -3 fat  74  0  0  8  14  15  18  14  5  Omega-6 fat  126  0  4  14  22  25  28  30  3  Sodium  394  25  60  38  75  71  69  47 8 Any diet factor: fruit; non-starchy vegetables; whole grains; nuts; legumes; unprocessed red meat; processed meat; seafood; cheese; yogurt; milk; sugar-sweetened beverages; fruit juice; monounsaturated fat; saturated fat; seafood omega-3 fat; plant omega-3 fat; omega-4 fat; or sodium. The AHEI included 9 of the 11 AHEI components (alcohol and trans-fat were excluded): non-starchy vegetables, fruits, nuts and legumes, whole grains, red and processed meat, sugar-sweetened beverages and fruit juice, polyunsaturated fatty acids (PUFAs), long-chain n-3 PUFAs, and sodium. Each component was scored from 0 to 10, and the AHEI ranged from 0 (non-adherence) to 90 (perfect adherence) but was scaled to range from 0 to 100. The DASH included 8 components: non-starchy vegetables, fruits, nuts and legumes, whole grains, low fat dairy products, red and processed meat, sugar-sweetened beverages, and sodium. Each component was scored from 1 to 5 using sex-specific quintiles, and the DASH score ranged from 8 (non-adherence) to 40 (perfect adherence). The MED included 8 of the 9 MED components (alcohol was excluded): non-starchy vegetables, fruits and nuts, legumes, whole grains, dairy products, red and processed meat, seafood, and the ratio of monounsaturated fatty acids to saturated fatty acids. Each component was scored as 0 or 1 using sex-specific medians, and the MED score ranged from 0 (non-adherence) to 8 (perfect adherence).