Nutritional redundancy in the human diet and its application in phenotype association studies

Studying human dietary intake may help us identify effective measures to treat or prevent many chronic diseases whose natural histories are influenced by nutritional factors. Here, by examining five cohorts with dietary intake data collected on different time scales, we show that the food intake profile varies substantially across individuals and over time, while the nutritional intake profile appears fairly stable. We refer to this phenomenon as ‘nutritional redundancy’ and attribute it to the nested structure of the food-nutrient network. This network enables us to quantify the level of nutritional redundancy for each diet assessment of any individual. Interestingly, this nutritional redundancy measure does not strongly correlate with any classical healthy diet scores, but its performance in predicting healthy aging shows comparable strength. Moreover, after adjusting for age, we find that a high nutritional redundancy is associated with lower risks of cardiovascular disease and type 2 diabetes.


Introduction
Human dietary intake fundamentally affects our nutrition, energy supply, and health. A better understanding of diet patterns can help us identify measures to prevent or treat health conditions and diseases such as obesity 1,2, type 2 diabetes 3,4, and cardiovascular disease 5,6. Indeed, randomized trials have established the benefits 7 of the Mediterranean diet on clinical cardiovascular disease 8 and the Dietary Approaches to Stop Hypertension (DASH) 9 diet on blood [...]

MLVS was designed to complement WLVS. In our analysis, we focused on those MLVS participants with all four ASA24 records available (n=451 in total).

For the selected participants in each study, we first assessed the change in their food and nutrient profiles (i.e., the relative abundances of food items and nutrients in their diet) over time.

The relative abundances of nutrients are calculated by converting the amount of each nutrient to grams and then normalizing by the total grams of all nutrients for an individual. We found that the food profiles were highly dynamic for almost all individuals at different time scales: daily (Fig.1:a1), monthly (Fig.1:b1,c1), and yearly (Fig.1:d1,e1). Moreover, the food profiles are

We observed that the most abundant food items consumed by individuals are sweetened beverages and vegetables. This is partially due to the fact that the moisture content in these foods

To quantify the between-sample difference in food or nutrient profiles, we adopted the notion of beta diversity from community ecology 46. In particular, we used four different measures (Bray-Curtis dissimilarity, root Jensen-Shannon divergence, Yue-Clayton distance, and negative Spearman correlation) to quantify the beta diversity. As shown in Fig.S1, the beta diversity of nutritional profiles is significantly lower than that of food profiles at both single food (
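As a hedged illustration of three of the beta-diversity measures named above, the following Python sketch uses their standard textbook forms on relative-abundance vectors. The function names and toy profiles are hypothetical, the natural logarithm is an assumption, and the negative Spearman correlation is omitted for brevity.

```python
import math

def bray_curtis(x, y):
    """Bray-Curtis dissimilarity between two abundance vectors."""
    num = sum(abs(a - b) for a, b in zip(x, y))
    den = sum(a + b for a, b in zip(x, y))
    return num / den

def kl_divergence(x, y):
    """Kullback-Leibler divergence; terms with x_i = 0 contribute zero."""
    return sum(a * math.log(a / b) for a, b in zip(x, y) if a > 0)

def root_jsd(x, y):
    """Square root of the Jensen-Shannon divergence (natural log assumed)."""
    m = [(a + b) / 2 for a, b in zip(x, y)]
    return math.sqrt(0.5 * kl_divergence(x, m) + 0.5 * kl_divergence(y, m))

def yue_clayton(x, y):
    """Yue-Clayton dissimilarity: 1 - sum(xy) / sum((x - y)^2 + xy)."""
    xy = sum(a * b for a, b in zip(x, y))
    d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return 1 - xy / (d2 + xy)

# Hypothetical relative-abundance profiles over three items.
profile_a = [0.5, 0.5, 0.0]
profile_b = [0.0, 0.5, 0.5]
print(bray_curtis(profile_a, profile_b))
print(root_jsd(profile_a, profile_b))
print(yue_clayton(profile_a, profile_b))
```

All three functions return 0 for identical profiles and grow as the profiles share fewer items.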

A key advantage of the food-nutrient network (FNN) is that it enables us to calculate the nutritional redundancy (NR) for each diet assessment of any individual, i.e., the within-sample or personal NR. In the ecological literature 32,33,48, the functional redundancy of a local community is interpreted as the part of its taxonomic diversity that cannot be explained by its functional diversity. Similarly, we can define the NR of a dietary assessment from a particular individual as the part of its food diversity (FD) that cannot be explained by its nutrient diversity (ND), i.e., NR = FD − ND. Here we chose FD to be the Gini-

The personal NR of each diet assessment is closely related to the phenomenon of population-level NR observed over a collection of diet assessments. Let's consider highly personalized food profiles from a population. There are two extreme cases: (i) Each food has its own unique nutrient [...] features such that different foods share a few common nutrients, but some foods are specialized to include some unique nutrients (Fig.2b2). In this case, the ND and NR of each individual's diet assessment can both be quite comparable, and the nutritional profiles can be highly conserved across individuals (Fig.2c2) [...] depicted as a bipartite graph, where for visualization purposes each food node represents one of the nine highest-level food groups (based on the FNDDS food coding scheme) and each nutrient node represents a nutrient (Fig.3a). Note that the FNN associated with the diet assessment of any individual can be considered as a particular subgraph of this reference FNN.

To characterize the structure of this reference FNN, we systematically analyzed its network properties using the complete nutrient profile of all 7,618 foods. We first visualized its incidence matrix (Fig.3b), where the presence (or absence) of a link connecting a food and a nutrient is colored in green (or white). We found that some foods (e.g., bacon cheeseburger, hot ham and cheese sandwich, corresponding to the leftmost columns in Fig.3b) contribute to almost all of the nutrients, while some foods include very few nutrients (e.g., sugar substitutes and smart water, corresponding to the rightmost columns in Fig.3b). Moreover, we noticed that the incidence matrix displays a highly nested structure, i.e., the nutrients of those food items (with fewer distinct nutrients) in the right columns tend to be subsets of nutrients for those food items [...] measure 50,51, and turns out to be much higher than expected by chance (see Methods for details).

We then calculated the nutritional distances among different food items, finding a unimodal distribution with the peak centered around 0.25, indicating that most food items include very similar nutrient components (Fig.3c). Finally, we calculated the degree distributions of nutrient nodes and food nodes, respectively. Here, the degree of a nutrient node in the FNN is just the number of distinct foods that contain this nutrient. Similarly, the degree of a food node in the FNN is the number of distinct nutrients it contains. We found that the degrees of food items follow a Poisson-like distribution (Fig.3d), implying that different foods generally contain a very similar number of nutrients. Nutrient degrees show a more extreme behavior, exhibiting a probability density peak located at the higher end of the degree spectrum, indicating that [...] the network still displays a highly nested structure. However, the NODF of this raw FNN is 0.573, which is much lower than that of the original FNN (see Fig.S4).
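The degree definitions above can be sketched in a few lines of Python. The food-to-nutrient mapping below is a hypothetical toy example, not data from FNDDS, and the function names are illustrative only.

```python
from collections import Counter

# Hypothetical toy FNN: each food maps to the set of nutrients it contains.
fnn = {
    "cheeseburger": {"protein", "fat", "carbohydrate", "sodium", "calcium"},
    "yogurt": {"protein", "fat", "carbohydrate", "calcium"},
    "apple": {"carbohydrate", "fiber"},
    "sugar substitute": {"carbohydrate"},
}

def food_degrees(fnn):
    """Degree of a food node: number of distinct nutrients it contains."""
    return {food: len(nutrients) for food, nutrients in fnn.items()}

def nutrient_degrees(fnn):
    """Degree of a nutrient node: number of distinct foods containing it."""
    counts = Counter()
    for nutrients in fnn.values():
        counts.update(nutrients)
    return dict(counts)

print(food_degrees(fnn))
print(nutrient_degrees(fnn))
```

A histogram of either dictionary's values gives the corresponding degree distribution.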

We emphasize that the highly nested structure of the reference FNN is neither explained by the presence of macronutrients, i.e., broad classes of chemicals such as carbohydrates, protein, and fats that are key components of food and exhibit high degree, nor by the nutrient ontology used to annotate the databases. First, as shown in Fig.3b, the incidence matrix of the FNN still displays a highly nested structure even in the absence of high-degree nutrients (the topmost green rows).

Second, the FNN still shows a highly nested structure after excluding nutrients without an InChIKey 52, effectively removing the first hierarchical level of the nutrient ontology, and also all those nutrients that correspond to non-specific chemical mixtures (see Fig.S5). Third, if we randomize the FNN but preserve the nutrient degree distribution, the randomized FNNs have much lower nestedness than the real FNN, and the nutrient distances between different foods are significantly increased (Fig.S6). Last but not least, we adopted tools from statistical physics 53 to calculate the expected nestedness value and its standard deviation for an ensemble of randomized FNNs in which the expected food and nutrient degree distributions match those of the real FNN. We found that the expected nestedness of randomized FNNs is significantly lower than that of the real FNN (one-sample z-test yields ./012 < 10 *3 , see Methods for details).
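To make the nestedness comparison concrete, here is a minimal Python sketch of the NODF measure on a binary incidence matrix (rows as nutrients, columns as foods, scaled to 0-1), together with a nutrient-degree-preserving shuffle. The z-score here is estimated empirically from sampled randomizations, which is a swapped-in simplification and not the analytical statistical-physics calculation described above. All function names and the toy matrix are hypothetical.

```python
import random
from itertools import combinations

def _paired_nestedness(sets):
    """Sum of paired overlaps and number of pairs for one matrix dimension."""
    total, pairs = 0.0, 0
    for a, b in combinations(sets, 2):
        pairs += 1
        # Pairs with equal degree or an empty member contribute zero.
        if len(a) != len(b) and a and b:
            total += len(a & b) / min(len(a), len(b))
    return total, pairs

def nodf(matrix):
    """NODF of a binary matrix (needs at least two rows or two columns)."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    row_sets = [{c for c in range(n_cols) if row[c]} for row in matrix]
    col_sets = [{r for r in range(n_rows) if matrix[r][c]} for c in range(n_cols)]
    row_total, row_pairs = _paired_nestedness(row_sets)
    col_total, col_pairs = _paired_nestedness(col_sets)
    return (row_total + col_total) / (row_pairs + col_pairs)

def shuffle_preserving_row_degree(matrix, rng):
    """Permute each nutrient row across foods, keeping the row's degree."""
    return [rng.sample(row, len(row)) for row in matrix]

def nestedness_z_score(matrix, n_random=50, seed=0):
    """Empirical z-score of the observed NODF against shuffled matrices."""
    rng = random.Random(seed)
    null = [nodf(shuffle_preserving_row_degree(matrix, rng)) for _ in range(n_random)]
    mean = sum(null) / len(null)
    var = sum((v - mean) ** 2 for v in null) / (len(null) - 1)
    return (nodf(matrix) - mean) / var ** 0.5 if var > 0 else float("inf")

# Hypothetical, perfectly nested toy matrix.
nested = [[1, 1, 1],
          [1, 1, 0],
          [1, 0, 0]]
print(nodf(nested), nestedness_z_score(nested))
```

Shuffling columns instead of rows would give the food-degree-preserving variant; preserving both degree sequences at once requires a swap-based or ensemble approach that is not shown here.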

Personal NR calculation based on dietary intake data. We

The NR of the NHS at the population level displays a nonmonotonic decrease, indicating that the diet patterns of those participants indeed have been adjusted. To better illustrate such a diet pattern change, we projected the bipartite food-nutrient network (constructed from HFDB) into the food space, resulting in a food similarity network (see Fig.5a). In this network, each node represents a food item, and a link connecting food item-$i$ and item-$j$ represents the unweighted Jaccard similarity $s_{ij}$ of their nutrient constituents (see Methods). Here, for visualization purposes, only links with $s_{ij} \ge 0.85$ were retained. We found a clear modular structure in the food similarity network, i.e., food items from the same food group form a densely connected cluster or module (see Fig.5a). This is consistent with a previous study showing that a food network based on the foods' nutritional similarity first separates into animal-based and plant-based clusters, with fish and meats forming separate clusters within the animal-based cluster, and grains, fruits, vegetables, and nuts forming separate clusters within the plant-based cluster 54.

Then, we examined the individual food similarity network of a particular NHS participant with the largest NR reduction from year 1984 to 2010 (see Fig.5b,c). We found that the density of her food similarity network in 1984 is much higher than that in 2010, suggesting that this participant consumed foods with more overlapping nutrient constituents in 1984 (Fig.5b). Moreover, we found that the most abundant food items in 2010 were water and yogurt, which do not connect with each other, indicating that she chose foods with more distinct nutrient constituents (Fig.5c).
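The projection into food space and the density comparison described above can be sketched as follows. This is a minimal illustration under stated assumptions: the food-to-nutrient sets are hypothetical toy data, the function names are illustrative, and the example uses a lower threshold than 0.85 because the toy sets are small.

```python
from itertools import combinations

def jaccard_similarity(a, b):
    """Unweighted Jaccard similarity of two nutrient sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def food_similarity_edges(fnn, threshold=0.85):
    """Keep food pairs whose nutrient similarity reaches the threshold."""
    edges = {}
    for f, g in combinations(sorted(fnn), 2):
        s = jaccard_similarity(fnn[f], fnn[g])
        if s >= threshold:
            edges[(f, g)] = s
    return edges

def network_density(foods, edges):
    """Fraction of possible links present among the given foods."""
    foods = set(foods)
    n = len(foods)
    if n < 2:
        return 0.0
    kept = sum(1 for f, g in edges if f in foods and g in foods)
    return kept / (n * (n - 1) / 2)

# Hypothetical toy FNN; the threshold is lowered only for this small example.
fnn = {
    "beef": {"protein", "fat", "iron", "zinc"},
    "pork": {"protein", "fat", "iron", "zinc", "thiamin"},
    "yogurt": {"protein", "fat", "calcium"},
    "water": {"magnesium"},
}
edges = food_similarity_edges(fnn, threshold=0.5)
print(edges)
print(network_density(["beef", "pork"], edges))
print(network_density(["water", "yogurt"], edges))
```

Restricting `network_density` to the foods a participant reported in a given year yields that year's individual network density.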

Impact of FNN structure on personal NR. To identify key topological features of the FNN that determine the NR, we adopted tools from network science. In particular, we randomized the FNN using four randomization schemes, yielding four null models. Null-FNN-[...] nodes and nutrient nodes. Then we recalculated the NR for each diet assessment (Fig.4). We found that, for all the cohorts, all four null models yield much lower NR than that of the real FNN (Fig.4

MET-h/week) (see Table S1 for details of those characteristics).

We first assessed the correlations between NR and these host factors. We found that BMI and pack-years of smoking were negatively correlated with NR, while education, median income, total energy intake, and physical activity were positively correlated with NR (Fig.6b). In all cases,

We found that personal NR can achieve very similar error rate (i.e., the proportion of participants

We also performed the healthy aging prediction using data from a substudy of HPFS with 6,160 healthy agers and 11,534 usual agers 64. Again, we used personal NR or one of the four healthy diet scores in 1998, together with other host factors, to predict the healthy aging status. We found that NR can also achieve an error rate (or AUROC) very similar to those of the other healthy diet scores in HPFS. Moreover, the performance of NR in HPFS is comparable to that in NHS (Fig.S10).

The association between personal NR and the risks of type 2 diabetes and cardiovascular disease. To

For NHS participants, after adjusting for age, we observed that NR is associated with a lower risk of type 2 diabetes (see Table 1). In particular, NHS participants whose NR values are in tertile-2 and tertile-3 have hazard ratios of 0.86 (95% CI: 0.80-0.93) and 0.78 (95% CI: 0.72-0.85), respectively, with P for trend <0.001. To further check whether this association is robust against other confounding factors, we also adjusted for total energy intake, race (white, African American,

Again, we observed that NR is associated with a lower risk of cardiovascular disease for NHS participants (see Table 1). In particular, after adjusting for age (months) only, the P for trend < 0.001. After adjusting for a wide range of confounding factors, the P for trend = 0.006.

For HPFS participants, we observed similar results. For type 2 diabetes, after adjusting for age (months) only, the P for trend < 0.001; after adjusting for a wide range of confounding factors, the P for trend = 0.002 (see Table 1). For cardiovascular disease, after adjusting for age (months) only, the P for trend = 0.004; after adjusting for a wide range of confounding factors, the P for trend = 0.04 (see Table 2).

For both disease outcomes and both cohorts, we also repeated the above calculations using quintiles of the NR score. As shown in Tables S2-S3, we found qualitatively very similar results.
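The grouping step behind the tertile and quintile analyses can be sketched as below. This is only an illustration of rank-based binning with a hypothetical function name and made-up NR values. The hazard ratios themselves come from a survival model that is not shown here, and the tie-breaking rule (ties resolved by input order) is an assumption.

```python
def quantile_groups(values, n_groups=3):
    """Assign each value to a rank-based group 1..n_groups (1 = lowest)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    groups = [0] * len(values)
    for rank, idx in enumerate(order):
        # Ties are resolved by input order because sorted() is stable.
        groups[idx] = rank * n_groups // len(values) + 1
    return groups

# Hypothetical NR scores for six participants.
nr_scores = [0.1, 0.9, 0.5, 0.3, 0.7, 0.2]
print(quantile_groups(nr_scores, n_groups=3))  # tertiles
print(quantile_groups(nr_scores, n_groups=5))  # quintiles
```

The lowest group (T1 or Q1) would then serve as the reference category when estimating hazard ratios for the higher groups.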

Since NR is a part of FD and the two are positively correlated (see Fig.S11), we asked whether FD itself is associated with the risk of type 2 diabetes and cardiovascular disease, and performed the corresponding association analyses. Interestingly, for both NHS and HPFS participants, we found that, after adjusting for a wide range of confounding factors, FD is not associated with a lower risk of type 2 diabetes (see Table S4) or cardiovascular disease (see Table S5). This result implies that the association between NR and disease risks cannot be simply attributed to FD.

To understand the association between NR and the risk of type 2 diabetes and cardiovascular disease in the two cohorts, we analyzed the food consumption pattern of each NR tertile in those two cohorts. We found a consistent trend among the three NR tertiles for important food groups in NHS and HPFS (see Fig.7). For instance, the abundances of Fruits, Vegetables, Dairy, and Cereal Grains are much higher for T3 (i.e., high-NR participants) than for T2 and T1, while the abundances of Beverages are much lower for T3 than for T2 and T1 in both NHS and HPFS. This food consumption pattern might explain why NR is an indicator of low risk of type 2 diabetes and cardiovascular disease.

Through examining various human dietary intake datasets, we found that the food profile varies tremendously across individuals and over time, while the nutritional profile is highly conserved across individuals and over time. To quantify this nutritional redundancy, we constructed the food-nutrient network, a bipartite graph that connects foods to their nutrient constituents. This food-nutrient network also allows us to assess the NR of any dietary assessment from any individual. We found that this personal NR is not strongly correlated with any existing healthy diet scores. We emphasize that, as the difference between the food diversity and the nutrient diversity of a person's dietary assessment, the personal NR quantifies the nutrient similarity (or overlap) of two randomly chosen food items in the diet assessment. Thus, a healthy diet does not necessarily have higher or lower NR. Interestingly, we found that the personal NR can be used to predict healthy aging with performance as strong as that of those healthy diet scores.
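The relationship between NR, FD, and ND can be illustrated with a small Python sketch. It takes FD to be the Gini-Simpson index and, as an assumption that this excerpt does not confirm, takes ND to be Rao's quadratic entropy with unweighted Jaccard nutritional distances. Under these assumptions NR equals the expected nutrient similarity of two distinct foods drawn according to their relative abundances. The function names and toy diets are hypothetical.

```python
def jaccard_distance(a, b):
    """Unweighted Jaccard distance between two nutrient sets."""
    union = a | b
    return 1 - len(a & b) / len(union) if union else 0.0

def personal_nr(food_profile, fnn):
    """Return (FD, ND, NR) for one diet assessment.

    food_profile: {food: amount}; fnn: {food: set of nutrients}.
    FD is the Gini-Simpson index; ND is ASSUMED to be Rao's quadratic entropy.
    """
    foods = list(food_profile)
    total = sum(food_profile.values())
    p = {f: food_profile[f] / total for f in foods}
    fd = 1 - sum(v * v for v in p.values())
    nd = sum(jaccard_distance(fnn[f], fnn[g]) * p[f] * p[g]
             for f in foods for g in foods)
    return fd, nd, fd - nd

# Two hypothetical extreme diets with equal food abundances.
same = {"food_a": {"protein", "fat"}, "food_b": {"protein", "fat"}}
distinct = {"food_a": {"protein"}, "food_b": {"fiber"}}
diet = {"food_a": 100.0, "food_b": 100.0}
print(personal_nr(diet, same))      # identical nutrient sets
print(personal_nr(diet, distinct))  # disjoint nutrient sets
```

Both diets have the same food diversity, but only the one whose foods share nutrients has a nonzero NR, which is the sense in which NR captures overlap and not healthiness.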

Hence, the concept of personal NR offers us a completely new perspective on studying human diet. Moreover, we examined its associations with the risks of type 2 diabetes and cardiovascular disease in NHS (all female) and HPFS (all male). For both cohorts, we found a clear inverse association between NR and the two phenotypes after adjusting for age. For HPFS, the inverse association is observed even after adjusting for a wide range of confounding factors. Whether these findings can lead to practical nutritional guidance warrants further interventional studies.

Since the personal NR measure is not strongly correlated with any classical healthy diet scores, in principle we can combine the concepts of NR and those healthy diet scores to better capture the total impact of diet on health outcomes. For instance, one can leverage the food-specific subgraphs of the FNN (see Fig.S2) to calculate the NR of food groups contributing to each component of a healthy diet score. This will enable us to define an NR-aware healthy diet score. Systematically exploring this direction warrants a dedicated effort and is beyond the scope of the current work.

There are several limitations in our current framework of NR calculation. First, we did not explicitly consider the nutrient difference between different food sources. We understand that nutrient content and its fluctuations span several orders of magnitude, and different scaling transformations, as well as different selections of nutrients, could modulate nutrient diversity and redundancy across individuals 49. We anticipate that incorporating this information in our NR calculation will further improve the power of using NR to predict healthy aging or other disease risks 65.
Second, the calculation of a personal NR relies on food intake measurements, e.g., ASA24 and FFQ, which are based on self-reported dietary intake questionnaires. We understand that such food intake measurements have inherent limitations, particularly measurement error related to poor recall, which can be overcome by the use of nutritional biomarkers that are capable of objectively assessing food consumption in different biological samples without the bias of self-reported dietary assessment 66. Although nutritional biomarkers provide a more proximal measure of nutrient status than dietary intake, quantitatively studying NR using nutritional biomarkers is beyond the scope of the current study. We anticipate that our framework will trigger more research activities in this direction.

Theoretical approach. To theoretically analyze the nested structure of a given bipartite graph,

Nutritional distance measure. To avoid the influence of nutrient amount variability in foods, we used the (unweighted) Jaccard index to quantify the nutritional distance between food item-$i$ and item-$j$:

$$d_{ij} = 1 - \frac{|N_i \cap N_j|}{|N_i \cup N_j|},$$

where $N_i$ represents the set of nutrients in food $i$. $d_{ij} = 0$ indicates that food item-$i$ and food item-$j$ share exactly the same nutrient constituents; $d_{ij} = 1$ means that they have totally different nutrient constituents. The nutritional similarity between food item-$i$ and item-$j$ can be defined as

$$s_{ij} = 1 - d_{ij}.$$
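As a hedged illustration of the unweighted Jaccard distance described above, the sketch below computes all pairwise nutritional distances for a toy food-to-nutrient mapping. The data and function names are hypothetical; treating two empty nutrient sets as distance 0 is an assumption made only to avoid division by zero.

```python
from itertools import combinations

def jaccard_distance(a, b):
    """1 - |A ∩ B| / |A ∪ B| for two nutrient sets."""
    union = a | b
    return 1 - len(a & b) / len(union) if union else 0.0

def pairwise_distances(fnn):
    """Nutritional distance for every unordered pair of foods."""
    return {(f, g): jaccard_distance(fnn[f], fnn[g])
            for f, g in combinations(sorted(fnn), 2)}

# Hypothetical toy data: presence or absence only, amounts are ignored.
fnn = {
    "beef": {"protein", "fat", "iron", "zinc"},
    "yogurt": {"protein", "fat", "calcium"},
    "yogurt drink": {"protein", "fat", "calcium"},
}
distances = pairwise_distances(fnn)
print(distances)
```

Binning the values of `distances` gives a distance distribution of the kind summarized in the main text.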

Nutritional redundancy measure. In the main text, the nutritional redundancy (NR) is defined

This offers a parametric class of food diversity measures defined as follows:

Note that the Gini-Simpson index (GSI) used in the main text is related to FD + as follows:

The primary outcome measure was major cardiovascular disease (CVD), which is defined as a

and we included only events that occurred before a manifest cardiovascular event.

Strokes were confirmed if data in the medical records fulfilled the National Survey of Stroke criteria requiring evidence of a neurological deficit with sudden or rapid onset that persisted for >24 h or until death 77. We excluded cerebrovascular pathology due to infection, trauma, or malignancy, as well as "silent" strokes discovered only by radiologic imaging. Radiology reports of brain imaging (computed tomography or magnetic resonance imaging) were available in 89% of those with medical records. We classified strokes as ischemic stroke (thrombotic or embolic occlusion of a cerebral artery), hemorrhagic stroke (subarachnoid and intraparenchymal hemorrhage), or stroke of probable/unknown subtype (a stroke was documented but the subtype could not be ascertained owing to medical records being unobtainable).

Deaths were identified by reports of families, the U.S. postal authorities, and searches of the

Supplemental Information
1. NR calculation using human dietary data
1.1 Reference FNN

USDA database
To construct the food-nutrient network, we downloaded FNDDS 2011-2012 from the USDA database, which includes 7,618 foods and 65 nutrients. The USDA National Nutrient Database for Dietary Studies is the major source of food composition data in the United States.
To be consistent with the version used in the DMAS study, we chose the 2011-2012 version to construct the reference food-nutrient network (FNN). This version includes 7,618 foods, which can be classified into 9 highest-level food groups, and 65 nutrients in total.

Remark 1:
In calculating the nutritional profiles (Fig.1) as well as the nutritional redundancy (Fig.4), we excluded energy and water for the following reasons. First, energy does not have a unit of mass, and hence cannot be included in the nutritional profile, whose components represent relative abundances. Consequently, it cannot be used to calculate the nutritional redundancy either. Second, water was not considered a nutrient in the Harvard Food Composition Database (HFDB). For consistency and comparison purposes, we also removed it from FNDDS when calculating the nutritional profiles and nutritional redundancy from the DMAS, WLVS, and MLVS data.

Frida database
The Frida Food Data database (frida.fooddata.dk) was created and published by the National Food Institute, Technical University of Denmark (DTU), and includes data on the nutrient content of various foods. We used the version released on 08-02-2019, which includes 1,185 food items and 198 nutrients.

Harvard Food Composition Database
We used the Harvard food composition table of year 2015 to construct a reference food-nutrient network to calculate the nutritional redundancy of participants from the NHS. The Harvard food composition table consists of 575 foods and 182 nutrients. To calculate the nutritional redundancy, we removed calories, because they are not measured in a unit of mass, as well as total protein, total fat, and total sugar, since those nutrients overlap with some of their sub-nutrients.

Other databases
Other databases exist as well. For example, FooDB, a database representing the most comprehensive effort to integrate food composition data from specialized databases and experimental data, provides information on 26,625 distinct biochemicals in foods. PhenolExplorer and eBasis have also produced a wealth of information on food composition. Throughout our analysis, we focused on the nutrient level, not on the composition and compound levels; thus our reference FNNs were constructed from the USDA and Frida databases.

Nutrition profile analysis
The nutritional composition of foods for DMAS was determined using ASA24-2016. For both WLVS and MLVS, the nutritional composition of foods was determined using ASA24-2012. ASA24 assigns nutrient information to foods using the USDA's Food and Nutrient Database for Dietary Studies (FNDDS). Subjects reported dietary intake as food records and entered their own dietary records directly into ASA24. To calculate the relative abundance of each nutrient, we removed energy (its unit is not a unit of mass) and water (water information is not included in the NHS database, so we removed it for consistency).
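The normalization described above (convert each nutrient to grams, drop energy and water, divide by the total) can be sketched as follows. The unit labels, the nutrient names, and the intake values are hypothetical and do not reflect the actual ASA24 or FNDDS field names.

```python
# Conversion factors to grams; these unit labels are assumptions about how a
# food composition table might annotate nutrient amounts.
TO_GRAMS = {"g": 1.0, "mg": 1e-3, "mcg": 1e-6}

# Entries excluded before normalization, following the text above.
EXCLUDED = {"energy", "water"}

def nutrient_profile(intake):
    """Map {nutrient: (amount, unit)} to {nutrient: relative abundance}."""
    grams = {name: amount * TO_GRAMS[unit]
             for name, (amount, unit) in intake.items()
             if name not in EXCLUDED}
    total = sum(grams.values())
    return {name: g / total for name, g in grams.items()}

# Hypothetical one-day intake record.
intake = {
    "energy": (2000, "kcal"),
    "water": (1500, "g"),
    "protein": (60, "g"),
    "carbohydrate": (240, "g"),
    "sodium": (3000, "mg"),
}
profile = nutrient_profile(intake)
print(profile)
```

Because amounts in grams differ by orders of magnitude across nutrients, the resulting profile is dominated by macronutrients, which is one reason the scaling choice matters.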

Food Choice Analysis
Foods of DMAS, WLVS and MLVS were categorized according to their FNDDS food code and modification code as assigned by ASA24. The foods of NHS were mapped from serving data to the Harvard food composition table according to the food descriptions.

HEI-2005
HEI-2005 1 is a score that measures adherence to the USDA 2005 Dietary Guidelines for Americans. The score range is 0 to 100. Each of the 12 components has a minimum score of zero and a maximum score of 5, 10 or 20. These components are: Total vegetables, Dark green & orange vegetables, Total fruit, Whole fruit, Total grains, Whole grains, Milk, Meat and beans, Oils, Saturated fat, Sodium and SoFAAS.

AHEI-2010
AHEI-2010 1 is a score that measures adherence to a diet pattern based on foods and nutrients most predictive of disease risk in the literature. The minimum score = 0, maximum score = 110. Each of the 11 components has a minimum score of 0 and a maximum score of 10, as outlined in the table below. A score between the minimum and maximum is assigned on a continuous basis (except for sodium and alcohol). Those 11 components are: Vegetables, Fruit, Whole grains, Sugar-sweetened beverages and fruit juice, Nuts and legumes, Red meat and processed meat, Trans fat, Long-chain fats, Poly-unsaturated fatty acids, Sodium and Alcohol.

AMED
The components of AMED 2 are vegetables (excluding potatoes), fruits, nuts, whole grains, legumes, fish, ratio of monounsaturated to saturated fat (M:S ratio), red and processed meats, and alcohol. The score ranges from 0 to 9. The scoring criteria are as follows: intake above the FFQ-specific median received 1 point for vegetables, fruits, nuts, whole grains, legumes, fish, and M:S ratio; otherwise, 0 points. Red and processed meat consumption below the FFQ-specific median received 1 point; otherwise, 0 points. Alcohol intake of 5-15 g/d for women and 10-25 g/d for men received 1 point; otherwise, 0 points.

DASH
This score was created to capture the characteristics of the Dietary Approaches to Stop Hypertension diet. The DASH 3 components include fruits, vegetables (excluding potatoes), nuts and legumes, low-fat dairy products, whole grains, sodium, sweetened beverages, and red and processed meats. The score range is 8 to 40. Under the DASH scoring criteria, each food group is first classified into FFQ-specific quintiles. For fruits, vegetables, nuts and legumes, low-fat dairy products, and whole grains, the score for that food group is the quintile ranking, i.e., quintile 1 is assigned 1 point and quintile 5 is assigned 5 points. For sodium, red and processed meats, and sweetened beverages, low intake is best. Therefore, the lowest quintile was given a score of 5 points and the highest quintile 1 point.

Healthy aging prediction
We used two standard classifiers, RF (Random Forest, R package 'randomForest' 4) and XGBoost (Extreme gradient boosting decision trees, R package 'xgboost' 5), to predict the healthy aging status. The base learners of RF are decision trees. Each tree is a non-linear model constructed with many linear boundaries. A node in a decision tree is associated with a question asking about the data based on the value of a particular feature. XGBoost is a scalable end-to-end decision tree boosting system 5. Unlike RF, which applies the technique of bootstrap aggregating (i.e., bagging) to tree learners, the trees of a boosting system are built sequentially: each tree aims to reduce the error of its previous tree.
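The evaluation protocol used for healthy aging prediction in this section (class balancing by down-sampling, an 80/20 train/test split, and scoring by error rate and AUC) can be sketched in plain Python. This is an illustration only, not the R 'caret' workflow used in the study: the function names are hypothetical, the classifier itself is omitted, and the order in which balancing and splitting are applied is an assumption.

```python
import random

def down_sample(labels, rng):
    """Indices such that every class is as frequent as the minority class."""
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    n_min = min(len(idx) for idx in by_class.values())
    keep = []
    for idx in by_class.values():
        keep.extend(rng.sample(idx, n_min))
    return sorted(keep)

def train_test_split(indices, rng, train_fraction=0.8):
    """Random split of indices into training and test parts."""
    shuffled = rng.sample(indices, len(indices))
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]

def error_rate(y_true, y_pred):
    """Proportion of incorrectly classified samples."""
    return sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)

def auc(y_true, scores):
    """Probability that a random positive outranks a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical labels (1 = healthy ager) and classifier scores.
rng = random.Random(0)
labels = [1] * 3 + [0] * 7
balanced = down_sample(labels, rng)
train_idx, test_idx = train_test_split(balanced, rng)
print(len(train_idx), len(test_idx))
print(error_rate([1, 0, 1, 0], [1, 1, 1, 0]))
print(auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
```

In practice a trained RF or XGBoost model would supply the predictions and scores passed to `error_rate` and `auc`.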
For hyperparameter tuning, we used the R package 'caret' 6 (Classification And REgression Training). For RF, the number of features randomly sampled as candidates at each split ranges from 1 to 15, and the number of trees to grow is fixed at 500. The parameter ranges for XGBoost are the following: (1) [...]

To overcome the label imbalance issue, we used the downSample function in caret, which randomly samples a data set so that all classes have the same frequency as the minority class. To compare the performance of NR and the four healthy diet scores, we split the data into a training set (80% of the samples) and a test set (the remaining 20%). For each split, we used one of NR, HEI-2005, AHEI-2010, AMED, and DASH together with other confounding factors to train the model, and then validated the classifier using the test set. We used the error rate (i.e., the proportion of participants that have been incorrectly classified by the model) and the AUC (area under the ROC curve) to quantify the performance.

Supplementary figures and tables

Figure S1: Nutritional profiles are highly conserved across individuals while food profiles are highly personalized. The Bray-Curtis dissimilarity (column-1), rJSD (root Jensen-Shannon divergence, column-2), Yue-Clayton distance (column-3) and 1-Spearman correlation (column-4) between the food profiles of the same individuals at different time points (intra-individual) and the food profiles of different individuals (inter-individual) at the single food level (a1-a4) and the nine major food groups level (b1-b4), and between nutrient profiles (c1-c4). The Bray-Curtis dissimilarity (column-1), rJSD (column-2), Yue-Clayton distance (column-3) and 1-Spearman correlation (column-4) between the food (or nutritional) profiles of different individuals and different time points at the single food level (d1-d4) and the nine major food group level (e1-e4). The Bray-Curtis dissimilarity between a pair of individuals, $a$ and $b$, is defined as

$$\mathrm{BC}(a, b) \equiv \frac{\sum_i |x_i^a - x_i^b|}{\sum_i (x_i^a + x_i^b)}.$$

The rJSD dissimilarity is defined as

$$\mathrm{rJSD}(a, b) \equiv \left[ \frac{D_{\mathrm{KL}}(x^a, m) + D_{\mathrm{KL}}(x^b, m)}{2} \right]^{1/2},$$

where $m \equiv (x^a + x^b)/2$ and $D_{\mathrm{KL}}(x, y) \equiv \sum_i x_i \log (x_i / y_i)$ is the Kullback-Leibler divergence between $x$ and $y$. The Yue-Clayton dissimilarity is defined as

$$D_{\mathrm{YC}}(a, b) \equiv 1 - \frac{\sum_i x_i^a x_i^b}{\sum_i \left[ (x_i^a - x_i^b)^2 + x_i^a x_i^b \right]}.$$

In all dissimilarity definitions, $x_i^a$ represents the relative abundance of food/nutrient $i$ in individual $a$. We only chose 100 participants in NHS and HPFS due to computational complexity. The boxplot represents all pairwise dissimilarities. Boxes indicate the interquartile range between the first and third quartiles, with the central mark inside each box indicating the median. Whiskers extend to the lowest and highest values within 1.5 times the interquartile range.

[...] removing those nutrients that are not specific enough to have a SMILES or InChIKey ID, e.g., sugar, total fat, protein, total fiber, etc. We organized this matrix using the Nestedness Temperature Calculator to emphasize its highly nested structure 6.

[...] The nestedness based on the NODF measure of the real FNN (gray bar), as well as the randomized FNNs (colored bars) using four different FNN randomization schemes: Null-FNN-1, complete randomization; Null-FNN-2, food-degree preserving randomization; Null-FNN-3, nutrient-degree preserving randomization; Null-FNN-4, food- and nutrient-degree preserving randomization. For each randomization scheme, 50 realizations were generated. (2) The distribution of nutrient distances ($d_{ij}$) between different foods calculated from the real FNN (gray lines) and the randomized FNNs (colored lines) using the same randomization schemes as in row (1). We generated 50 realizations for each randomization scheme, and the bin size is 0.02. All FDR-corrected P values were found using the paired, two-sided t-test. Significance level: FDR-corrected P value <0.0001 (***).