Abstract
Current healthcare practices are reactive and use limited physiological and clinical information, often collected months or years apart. Moreover, the discovery and profiling of blood biomarkers in clinical and research settings are constrained by geographical barriers, the cost and inconvenience of in-clinic venepuncture, low sampling frequency and the low depth of molecular measurements. Here we describe a strategy for the frequent capture and analysis of thousands of metabolites, lipids, cytokines and proteins in 10 μl of blood alongside physiological information from wearable sensors. We show the advantages of such frequent and dense multi-omics microsampling in two applications: the assessment of the reactions to a complex mixture of dietary interventions, to discover individualized inflammatory and metabolic responses; and deep individualized profiling, to reveal large-scale molecular fluctuations as well as thousands of molecular relationships associated with intra-day physiological variations (in heart rate, for example) and with the levels of clinical biomarkers (specifically, glucose and cortisol) and of physical activity. Combining wearables and multi-omics microsampling for frequent and scalable omics may facilitate dynamic health profiling and biomarker discovery.
Main
Multi-omics technologies enable the quantification of thousands of molecules and can provide new insights into the molecular landscape of health and disease1,2. Despite major advances in omics technologies, upstream sample collection and processing still require travel to a clinic and access to a phlebotomist, and they involve physical and emotional discomfort. These sample-collection strategies do not offer the flexibility and non-invasiveness needed to conduct comprehensive longitudinal profiling independent of access to a clinic. Furthermore, the high sample volume needed (often 10–50 ml of venous blood) prohibits frequent collections, which precludes high-resolution analysis of dynamic metabolic and biological processes that occur on the scale of minutes or hours. Finally, high sample collection and processing costs can be prohibitive for performing large studies in remote environments.
Previous studies have investigated dried blood spot (DBS) sampling3,4,5,6 and volumetric absorptive microsampling (VAMS)7,8,9 for metabolite and protein analyses10. In principle, DBS allows individuals to collect a blood drop sample at home and return the sample by mail at room temperature. However, DBS sampling is often irreproducible since volumetric amounts can vary considerably, and, so far, the number of analytes analysed from DBS has generally been modest11.
In this Article, to circumvent these challenges, we devised a streamlined multi-omics profiling system that uses finger-prick blood drop collection, minimizes pain and enables sampling frequencies on the timescale of minutes without needing clinic access. Our method collects fixed 10 μl volumes and, following extraction, enables the simultaneous analysis of proteins, metabolites, lipids and targeted cytokines/hormones from a single sample, enabling broad analyte profiling. In two proof-of-principle studies, we first profile the dynamic response to ingestion of a mixed-meal shake and discover high heterogeneity in individual metabolic and immune responses; second, we perform high-resolution profiling of an individual over 1 week, enabling the identification and quantification of thousands of molecular changes and associations across ‘omes’ at a personal level. Our approach is scalable, enabling high-frequency molecular profiling for broad utility in research and clinical studies.
Results
Overview of the multi-omics microsampling approach
The blood microsampling and multi-omics data acquisition workflow is shown in Fig. 1a. After testing numerous methods, we settled on collecting 10 μl blood microsamples using a Mitra device, a solid matrix that collects fixed blood volumes. We tested a wide variety of extraction conditions and further developed a method for efficiently extracting proteins, a broad range of lipids, and metabolites from a single microsample using biphasic extraction with methyl tert-butyl ether (MTBE). This extraction procedure yields an organic phase containing hydrophobic metabolites and lipids, an aqueous phase containing hydrophilic metabolites and a methanol-precipitated protein pellet processed for proteomics data acquisition. Using a separate microsample, we performed an aqueous extraction for multiplexed immunoassays on the Luminex platform (Methods). Omics datasets were then processed, annotated and curated for detailed omics analysis.
a, The samples were collected using microsampling devices, and then multi-omics data (proteomics, metabolomics, lipidomics, cytokine and so on) were acquired. b, Outline of the primary microsampling analyses. c, The coefficient of variation (CV) distribution for proteins, metabolites and lipids across all the samples in the stability analysis. d, The percentage of analytes significantly affected by storage duration, temperature and their interaction (linear regression). The red line shows the expected proportion of nominally significant results at the alpha level of 5% (P = 0.05). e, The Spearman correlations between microsamples and intravenous blood samples (n = 34) for metabolites and lipids, respectively.
To evaluate the microsampling method, we first examined the stability of proteins, metabolites and lipids in microsamples under multiple conditions, including testing storage duration and temperature (Fig. 1b and Extended Data Fig. 1a). We then compared microsampling with conventional intravenous sampling methods (Fig. 1b). Finally, two pilot case studies were performed to demonstrate how microsampling can capture important health and biological perturbations in a lifestyle context (Fig. 1b).
Protein, metabolite and lipid stability in microsamples in multiple conditions
We first evaluated the stability of proteins, metabolites and lipids in the blood microsamples (Supplementary Fig. 1). In brief, blood samples were collected from two participants using the 10 μl Mitra devices. A total of 36 microsamples were collected from each participant, with the microsamples stored in duplicate at three temperatures (4, 25 and 37 °C) and for five durations at each temperature (3, 6, 24, 72 and 120 h) before storage at −80 °C until analysis. An additional set of samples was immediately stored at −80 °C. Proteomics, metabolomics and lipidomics data were acquired from the microsamples (Methods). After quality control (QC), imputation and annotation of the data, there were 66 proteomics samples with 128 proteins, 71 metabolomics samples with 1,461 annotated features and 72 lipidomics samples with 776 lipids (Supplementary Dataset 1). Each omics dataset was assessed individually to examine analyte stability concerning storage duration, storage temperature and the interaction of storage duration and temperature. The stability metrics assessed were (1) the average coefficient of variation (CV) across both participants’ samples (estimated using the formula for log-scale data12), (2) the presence of significant effects of storage conditions on analyte level using a linear regression analysis (excluding the baseline samples that were not stored at any temperature) and (3) relative importance measures (partial R2 and the Lindeman, Merenda and Gold measure, LMG1; Methods).
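The log-scale CV in metric (1) can be illustrated with a minimal sketch. It assumes the widely used log-normal relationship CV = sqrt(exp(σ²) − 1), where σ is the standard deviation of the natural-log-transformed intensities; whether this is exactly the formula applied in the study is an assumption, and the function name and the base-2 default are hypothetical.

```python
import math
import statistics


def cv_log_scale(log_values, base=2.0):
    """Estimate the CV on the original scale from log-scale intensities.

    Assumes log-normally distributed data: CV = sqrt(exp(sd_ln ** 2) - 1),
    where sd_ln is the sample SD expressed in natural-log units.
    """
    sd_log = statistics.stdev(log_values)  # sample SD on the stored log scale
    sd_ln = sd_log * math.log(base)        # convert to natural-log units
    return math.sqrt(math.exp(sd_ln ** 2) - 1.0)


# Hypothetical log2 intensities of one analyte across storage conditions.
example_cv = cv_log_scale([10.1, 10.4, 9.8, 10.0, 10.3])
```

Computing the CV this way avoids dividing the log-scale SD by the log-scale mean, which would depend on the arbitrary offset of the log intensities.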
The results revealed that, overall, the majority of analytes were quite stable to storage duration, temperature and the interaction effect (Fig. 1c,d). Proteins were the most stable (CV range 0.149–1.728, median 0.397) with few, that is, three (2.3%), eight (6.3%) and six (4.7%), associated with storage duration, temperature and the interaction effect, respectively. Metabolites were less stable (CV range 0.054–54.328, median 0.378) with 194 (13.3%), 389 (26.6%) and 193 (13.2%) associated with storage duration, temperature and the interaction effect, respectively. Finally, lipids were the least stable (CV range 0.088–2.218, median 0.335), with 150 (19.3%), 513 (66.1%) and 172 (22.1%) associated with storage duration, temperature and the interaction effect, respectively. The relative importance models gave similar results. Thus, most analytes can be reliably measured using remote sampling, and the less stable ones can be identified and potentially measured using correction models.
Comparison between microsample and intravenous plasma sample
We next examined the similarity between the molecular profiles derived from microsamples of whole blood compared with venepuncture plasma. Blood samples were collected from 34 participants using both microsampling and conventional intravenous blood draws (Supplementary Fig. 1a and Methods), and metabolomics and lipidomics data were acquired from each participant (Supplementary Dataset 2). The median intensity of every feature in the 34 participants was calculated separately in the two datasets, microsampling and intravenous plasma collection samples, and compared via correlation graphs (Fig. 1e). Interestingly, the results of the microsampling and intravenous collection methods were quite similar in that the Spearman correlations were 0.81 (P < 0.001) and 0.94 (P < 0.001) for 642 metabolites and 616 lipids, respectively. Metabolites and lipids that were not well correlated (Spearman correlation < 0.5) were enriched for amino acids and triglycerides (TAGs), respectively (Supplementary Fig. 1b,c). However, most classes of molecules were very similar between the microsampling and venous blood draw, including most of the amino acids, carbohydrates, free fatty acids (FFAs), TAGs, diglycerides, phosphatidylcholines (PCs) and other molecules.
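The comparison above (per-feature medians across participants, then a rank correlation between the two collection methods) can be sketched as follows. This is an illustration on hypothetical toy data, not the study's code; the function names are assumptions, and the P value computation is omitted.

```python
import math
import statistics


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2.0 + 1.0
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


def feature_medians(intensity_by_feature):
    """Median intensity of each feature across participants."""
    return [statistics.median(v) for v in intensity_by_feature]


# Hypothetical toy data: 4 features x 3 participants per collection method.
micro = [[5.0, 6.0, 7.0], [1.0, 2.0, 1.5], [9.0, 8.0, 10.0], [3.0, 3.5, 4.0]]
venous = [[50.0, 55.0, 70.0], [12.0, 9.0, 10.0], [400.0, 380.0, 390.0], [20.0, 25.0, 30.0]]
rho = spearman(feature_medians(micro), feature_medians(venous))
```

Because the correlation is computed on ranks, it is insensitive to the different absolute intensity scales of whole-blood microsamples and plasma.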
Case studies
As a demonstration of the power of microsampling, we performed two case studies while participants were in their native environments. The first was to examine the effect of drinking a complex mixture on metabolic profiles. The second was to perform very dense ‘24/7’ profiling (98 microsamples) across a period of just longer than 7 days.
Case study 1: metabolic phenotyping responses to Ensure shake consumption
Individuals can differ markedly in their metabolic response to food on the basis of their epigenome, microbiome, metabolome and other factors13,14,15,16, yet the heterogeneity of this response is not well understood or fully established. Determining these differences at an individual level is important to optimize diet and lifestyle changes for personalized health, weight reduction and/or management of metabolic disease. Biomarkers are typically measured at a single timepoint because of the difficulty of collecting high-frequency blood samples using a conventional blood sampling approach, but the rapid and dynamic nature of metabolism in response to food intake requires higher resolution. To follow the diversity of metabolic responses to complex dietary mixtures, we measured the multi-omics responses to a defined mix of carbohydrates, lipids, proteins and micronutrients. We analysed metabolomics, lipidomics, cytokines and hormones in 28 participants with diverse backgrounds (Fig. 2a and Supplementary Fig. 2a) and developed six metabolic response metrics: (1) carbohydrate, (2) lipid, (3) amino acid (protein), (4) insulin secretion, (5) FFA (related to insulin sensitivity) and (6) immune (cytokines).
a, The study design and overview of the Ensure shake study. b, The summary of multi-omics data from the microsamples. c, Responses of metabolites, lipids and cytokines/hormones after Ensure shake consumption (two-sided Wilcoxon rank test). d, The clustering of dysregulated molecules following Ensure shake consumption. e, Amino acid response to Ensure shake consumption. f, Response of three dysregulated carbohydrates to Ensure shake consumption. g, Acylcarnitine response to Ensure shake consumption. h, Cytokine/hormone response to Ensure shake consumption. The points are represented by mean ± s.d.
Thirty-two participants were mailed a kit containing microsampling Mitra devices, an Ensure shake and careful instructions for microsampling sample collection. Each participant collected one microsample (defined as 0 min), consumed the Ensure shake and collected additional blood microsamples at 30, 60, 120 and 240 min after consumption (Fig. 2a). Participants returned their microsamples by overnight mail on the same day of microsample collection. The microsamples were used for multi-omics data acquisition, namely, metabolomics, lipidomics and cytokines/hormones. Four subjects without metabolomics data were removed from the dataset (Methods and Fig. 2b). After data cleaning, curation and annotation, 768 analytes were detected from the microsamples, including 560 metabolites, 155 lipids and 54 cytokines/hormones for each of the 28 participants at each of the five timepoints (a total of 140 data points) (Fig. 2b and Supplementary Dataset 3).
Clustering of altered molecules
We first determined whether the microsampled multi-omics data reflected the consumption of the Ensure shake. For each timepoint post consumption, the Wilcoxon rank test was used to define the significantly dysregulated molecules compared with timepoint 0 (baseline). Interestingly, the majority of significantly increased metabolites and lipids peaked at approximately 60 min and 120 min, respectively, and then approached baseline levels by 240 min (Fig. 2c). These results indicate that many molecules substantially responded to Ensure shake in the blood, and the response kinetics differed on the basis of the classes of molecules.
To quantify the molecules that shifted their levels upon Ensure shake consumption, an analysis of variance (ANOVA) test was used. The results show that the levels of 99 of 560 metabolites (17.7%, permutation test P < 0.001), 115 of 155 lipids (74.2%, permutation test P < 0.001) and 7 of 54 cytokines/hormones (13.0%, permutation test P < 0.001) significantly shifted following Ensure shake consumption (Supplementary Dataset 4 and Methods). For the metabolites whose levels changed, the signals of analytes that differed from baseline were greater than those affected by storage duration. These results demonstrate that multi-omics analysis from microsamples can be used to measure the metabolic response to Ensure shake.
The molecules significantly affected by Ensure shake were then clustered using fuzzy c-means clustering to reveal and summarize the pattern of changes associated with consumption time (Methods). The shifted molecules were grouped into three major clusters across five timepoints (Fig. 2d). Cluster 1 contained 39 metabolites, 1 lipid and 4 cytokines that increased and then decreased with a peak at approximately 60 min following Ensure shake consumption and then returned to baseline by 240 min. Cluster 2 contained 19 metabolites and 106 lipids that increased more gradually than cluster 1, peaking at approximately 60–120 min. Molecules in cluster 3 decreased after consuming the Ensure shake and then recovered, including 23 metabolites, 8 lipids and 3 cytokines (Fig. 2d). These results demonstrate that the molecules have different patterns and kinetics of the biochemical responses to complex mixture ingestion.
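To make the clustering step concrete, the following is a minimal pure-Python sketch of fuzzy c-means applied to time profiles. It is not the implementation used in the study: the fuzzifier m = 2, the fixed iteration count, the deterministic farthest-point initialization and the toy profiles are all assumptions chosen for illustration.

```python
import math


def init_centers(profiles, n_clusters):
    """Deterministic start: the first profile, then the farthest remaining ones."""
    centers = [list(profiles[0])]
    while len(centers) < n_clusters:
        farthest = max(profiles, key=lambda p: min(math.dist(p, c) for c in centers))
        centers.append(list(farthest))
    return centers


def fuzzy_c_means(profiles, n_clusters, m=2.0, n_iter=50):
    """Return (centers, memberships); each membership row sums to 1."""
    centers = init_centers(profiles, n_clusters)
    dim = len(profiles[0])
    exponent = 2.0 / (m - 1.0)
    memberships = []
    for _ in range(n_iter):
        # Membership update: closer centers receive larger membership.
        memberships = []
        for p in profiles:
            dists = [max(math.dist(p, c), 1e-12) for c in centers]
            memberships.append(
                [1.0 / sum((d_j / d_k) ** exponent for d_k in dists) for d_j in dists]
            )
        # Center update: mean of all profiles weighted by membership ** m.
        centers = []
        for j in range(n_clusters):
            weights = [row[j] ** m for row in memberships]
            total = sum(weights)
            centers.append(
                [sum(w * p[t] for w, p in zip(weights, profiles)) / total for t in range(dim)]
            )
    return centers, memberships


# Hypothetical scaled profiles at 0, 30, 60, 120 and 240 min:
# three molecules that rise and fall, three that dip and recover.
profiles = [
    [0.0, 1.0, 2.0, 1.0, 0.0],
    [0.0, 1.1, 2.1, 0.9, 0.0],
    [0.1, 0.9, 1.9, 1.0, 0.1],
    [0.0, -1.0, -2.0, -1.0, 0.0],
    [0.0, -1.1, -1.9, -1.0, 0.1],
    [0.1, -0.9, -2.1, -0.9, 0.0],
]
centers, memberships = fuzzy_c_means(profiles, n_clusters=2)
labels = [row.index(max(row)) for row in memberships]
```

Unlike hard clustering, each molecule receives a graded membership in every cluster, so molecules with ambiguous kinetics can be filtered out with a membership threshold before summarizing cluster patterns.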
Altered metabolic pathway and physiological responses to Ensure shake
We next explored the pathways and physiological responses represented by the molecules in each cluster (Fig. 2d and Supplementary Fig. 2). Cluster 1 primarily comprised metabolites (39 metabolites, 1 lipid and 4 cytokines), and several biological pathways were evident, such as aminoacyl-tRNA biosynthesis; phenylalanine, tyrosine and tryptophan biosynthesis; and phenylalanine metabolism (Supplementary Fig. 2c). The two major chemical classes captured in cluster 1 were amino acids and carbohydrates (Fig. 2d). Both compound classes probably come directly from the Ensure shake or are metabolized quickly (Fig. 2e,f). On the other hand, for cluster 3, acetylcarnitine was the main metabolite class, which dramatically decreased upon Ensure shake consumption and then recovered gradually by 240 min (Fig. 2g). This is expected because acetylcarnitine is broken down in the blood by plasma esterases to carnitine, and carnitine helps FFAs to be transported into the mitochondria for β-oxidation and energy production, hence maintaining whole-body energy homeostasis17. Consistent with this interpretation, eight FFAs detected in cluster 3 (Supplementary Fig. 2d) decreased following Ensure shake consumption. Notably, in cluster 2, we found 106 lipid species (Fig. 2d), and most of them were TAGs (102 TAGs with chains of 48–52 carbons and 1–3 unsaturations; Supplementary Fig. 2e).
To better understand the molecules in the Ensure shake that might be directly detected in the participants’ microsamples, we also analysed the composition of the Ensure shake using the same mass spectrometry procedure. Nearly 50% of the compounds found in the Ensure shake can be detected in the blood, and most of the remainder were of low abundance (Supplementary Fig. 2f). Importantly, of 21 high-interest metabolites that changed in the blood (Fig. 2e,f,g), 17 are present in the Ensure shake. This result demonstrates that the microsampling approach is able to detect the ingested molecular signatures from blood samples.
It is well known that both connecting peptide (C-peptide) and insulin are co-secreted from the pancreas and correlate with increased carbohydrates18,19,20. As expected, C-peptide and insulin were in the same cluster with the carbohydrates (cluster 1, Fig. 2h). Moreover, we found both gastric inhibitory polypeptide (GIP) and pancreatic polypeptide (PP) in the same cluster with insulin (cluster 1) (Fig. 2h, left). GIP is an inhibiting hormone of the secretin family of hormones21, and its main role is stimulating insulin secretion22. Increased secretion of PP is reported to be associated with protein meal consumption, fasting, exercise and acute hypoglycemia23. In cluster 3, we found that leptin, interferon-γ (IFNG) and interleukin 4 (IL4) decreased quickly following Ensure shake consumption (Fig. 2h, right). The primary function of leptin is regulating adipose tissue mass through central hypothalamus-mediated effects on hunger24; its levels are expected to decrease after food consumption. IFNG and IL4 are involved in immune responses, including allergies and antibacterial responses. Interestingly, this suggests that the Ensure shake may have anti-inflammatory properties. In summary, these results demonstrate that the kinetics of the biochemical responses, including hormones, to complex mixture ingestion can be revealed using microsampling (Supplementary Dataset 5).
Metabolic phenotyping reveals unique individual responses
How individuals respond to different foods is an area of great interest. The Ensure shake is a defined yet complex mixture of many types of simple molecules that can be quickly absorbed by the small intestine. To examine how different people respond to different metabolites, we explored the diversity in the kinetics and magnitude of the molecular responses among the different participants. Analysis of the samples using a t-distributed stochastic neighbour embedding (tSNE) plot shows that the samples clustered by participant, indicating that each participant had a unique molecular profile and that the difference between participants was greater than the effect of the shake (Extended Data Fig. 2a). Nonetheless, a clear timewise separation of data points was observed (Extended Data Fig. 2b). Our results suggested that, by 240 min, the metabolic levels tended to return towards their baseline level (Extended Data Fig. 2b). We then used unsupervised consensus clustering to cluster participants into different groups. Our results suggested that there were two major groups based on the molecules altered in response to the shake consumption (Extended Data Fig. 2c and Methods). In those two groups, we calculated the level of changes in metabolic features, comparing each timepoint with the baseline (timepoint 0) for each participant (Methods). This result also suggested that participants of those two groups had different responses to the Ensure shake (Extended Data Fig. 2d): participants in group 2 responded more slowly than those in group 1 (Extended Data Fig. 2d), indicating that the kinetics of their responses were different. Interestingly, for the 13 individuals with a measure of insulin resistance (steady-state plasma glucose, SSPG; Methods), we noticed a trend, although not statistically significant, for participants with insulin resistance to fall in group 1 rather than group 2 (Wilcoxon test: P = 0.29, Extended Data Fig. 2e).
Metabolic scores based on the dynamic response to the Ensure shake
As individuals are known to vary in their response to different foods, and we found heterogeneity in response to the Ensure shake for each participant, we next examined the response of each class of molecules, carbohydrates, lipids, cytokines/hormones and proteins to shake ingestion.
We derived a ‘metabolic score’ for the degree of an individual’s carbohydrate, lipid, FFA and protein response to the Ensure shake, along with insulin secretion and inflammatory response (cytokines) (Methods). Briefly, for each molecule in each participant, after the Ensure shake consumption, the area under the curve (AUC) was used to represent its cumulative value (Fig. 3a). The AUCs of molecules for each molecular class (lipids, carbohydrates, amino acids and inflammatory molecules) were then used to calculate the response score for each participant (Fig. 3b). The final metabolic scores were normalized and ranged from 0 to 1, where 0 means the lowest relative metabolic level and 1 means the highest relative metabolic level. One participant was recognized as an outlier subject and excluded during the score calculation (Supplementary Fig. 3 and Methods). For each participant, we observed a consistent distribution pattern of the molecular species within each metabolic score indicative of similar response patterns to Ensure shake consumption. However, those patterns differed greatly across subjects demonstrating high inter-individual variability in the metabolism of nutrients (Supplementary Figs. 4 and 5a).
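The score construction described above (an AUC per molecule, aggregation per molecular class, then normalization to the 0–1 range) can be sketched as follows. This is a simplified illustration under stated assumptions: trapezoidal integration over the five sampling times, the mean AUC as the per-class aggregate and min-max scaling across participants are choices made here, and all names and values are hypothetical; the exact procedure is given in the Methods.

```python
TIMES_MIN = [0, 30, 60, 120, 240]  # sampling times after shake consumption


def trapezoid_auc(times, values):
    """Area under a sampled curve using the trapezoidal rule."""
    return sum(
        (t1 - t0) * (v0 + v1) / 2.0
        for t0, t1, v0, v1 in zip(times, times[1:], values, values[1:])
    )


def class_score(times, analyte_curves):
    """Mean AUC over the analytes of one molecular class (assumed aggregate)."""
    aucs = [trapezoid_auc(times, curve) for curve in analyte_curves]
    return sum(aucs) / len(aucs)


def min_max_scale(scores):
    """Rescale scores across participants so the lowest is 0 and the highest is 1."""
    low, high = min(scores), max(scores)
    if high == low:
        return [0.0 for _ in scores]
    return [(s - low) / (high - low) for s in scores]


# Hypothetical carbohydrate curves (two analytes) for three participants.
carbohydrate_curves = {
    "P1": [[1.0, 1.5, 2.0, 1.5, 1.0], [0.5, 1.0, 1.5, 1.0, 0.5]],
    "P2": [[1.0, 1.2, 1.4, 1.2, 1.0], [0.5, 0.6, 0.8, 0.6, 0.5]],
    "P3": [[1.0, 2.0, 3.0, 2.0, 1.0], [0.5, 1.5, 2.5, 1.5, 0.5]],
}
raw = [class_score(TIMES_MIN, curves) for curves in carbohydrate_curves.values()]
carbohydrate_scores = dict(zip(carbohydrate_curves, min_max_scale(raw)))
```

Because the sampling intervals are uneven (30 to 120 min), integrating over time weights each measurement by the interval it represents rather than treating the five timepoints equally.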
a, The visualization of the AUC metric for each analyte used in a metabolic score. b, The analytes used in calculating each of the six metabolic scores are shown. c, Participants were grouped into five groups on the basis of six metabolic scores. d, Five participant examples for each group.
The six metabolic scores were calculated for each participant. As expected, we found a negative correlation between FFA score and SSPG, a marker of insulin resistance25 (Supplementary Fig. 5b). Previous studies have demonstrated that elevated plasma levels of FFA are associated with insulin resistance26. The participants were classified into five groups on the basis of their metabolic scores using the hierarchical clustering method (Fig. 3c). We found that individuals varied considerably in their response to the shake for each of the different areas; examples selected from each of the five groups are shown in Fig. 3d. Within each group, we observed variations in the scores from the average score per metabolic class. For example, participant S30 in group 1 presented lower metabolic scores for fats and amino acids compared with the average level of the entire group. In comparison, S34 in group 5 showed higher scores for those classes (that is, carbohydrates and amino acids) than the average scores. These differences may be due to a variety of underlying mechanisms, including levels of digestive enzymes, transporters, hormones (such as incretins) and/or intestinal microbes required to process particular molecules in the Ensure shake. Such underlying causes can be investigated in the future through additional analyses (such as metabolic flux analysis). Interestingly, S29 and S35 in groups 3 and 5, respectively, had higher scores in hormones and cytokines. The latter is particularly interesting as some individuals appear to have a strong inflammatory response (for example, individual S35), whereas others have a different response to appetite-suppressing hormones. Thus, the multi-omics data from microsamples reveal the enormous heterogeneity in the biochemical responses of each individual to a complex mixture. Such information can be defined using microsampling and is important for precision nutrition diets, including inflammatory responses to food. 
Correlating these individual responses with medical phenotypes (for example, low-density lipoprotein levels and HbA1c levels) will be important for personalized nutrition management in the future.
Case study 2: 24/7 personalized whole physiome profiling using wearable and multi-omics data
Several studies have demonstrated that longitudinal individualized molecular profiles, clinical tests and digital data can monitor health and enable early disease detection at an individual level1,27,28,29. However, these studies use low-frequency/high-volume blood sampling (weekly or monthly, for instance), which does not enable the detection of detailed patterns such as circadian and many high-resolution lifestyle metabolic and other molecular changes. Higher-frequency data collection would enable monitoring of health status as well as circadian and lifestyle patterns at high resolution in real time, uncover relationships of molecules with each other and with physiological and lifestyle activities, and decipher causal associations between them at the personal level.
As a proof-of-principle study to determine whether this is feasible, we combined our microsampling approach with wearables to explore the detailed molecular and physiological changes that occur in a real-world native context in a single individual. In this ‘24/7 study’, a single participant collected blood microsamples usually every 1–2 h during waking hours over 7 days, with some samplings as short as 30 min apart (Fig. 4a and Supplementary Fig. 6a). A total of 98 samples were collected over 7 days along with wearable data from two devices: (1) a smartwatch that recorded heart rate (HR) and step count, and (2) a continuous glucose monitor (CGM)30 (Fig. 4a, Supplementary Fig. 6b and Supplementary Dataset 6). Food logging was also performed many times each day using an app.
a, One participant was closely monitored using wearable devices and high-frequency microsampling (approximately hourly) across 7 days. Microsamples were then analysed for internal multi-omics data measurements. b, Molecular information was detected from the high-frequency microsamples. c, Wearable data from the smartwatch (sleep and step count) and Dexcom (CGM glucose). Legend defines the status of sleep (note that REM = rapid eye movement), the category of consumed foods, and the day/night period at every record. The yellow background represents the daytime (6:00 to 18:00). d, The internal molecules were grouped into 11 clusters using fuzzy c-means clustering.
The 98 microsamples were used for in-depth multi-omics profiling, including untargeted proteomics, untargeted metabolomics, targeted lipidomics and targeted cytokine, hormone, total protein and cortisol assays (Fig. 4b, top). After data acquisition and annotation, we detected a total of 2,213 analytes that included 1,051 metabolites, 811 lipids, 291 proteins, 45 cytokines, 13 metabolic panels (cytokines/hormones), 1 total protein and 1 cortisol measurement (Fig. 4b, bottom) resulting in a total of 214,661 biochemical measurements in addition to wearable physiological data (Supplementary Dataset 7). Overall, the prospective collection of internal molecular and wearable data resulted in comprehensive, high-frequency and abundant longitudinal data on the human whole physiome and lifestyle (Fig. 4b,c and Supplementary Fig. 6b,c), allowing us to explore how the internal molecules and physiology change on an hourly scale and their relationships at a personal level.
To explore whether multi-omics microsampling captures real biological signatures (such as food intake), we selected 2 days on which the participant ate high-carbohydrate food (131.8 g) and low-carbohydrate food (31.9 g), respectively (Supplementary Fig. 7a). Two carbohydrate metabolites (fructose and pyruvic acid) measured in the microsamples were then extracted and analysed, as shown in the box plot in Supplementary Fig. 7b. The median values of the carbohydrate metabolites were 7.8 and 4.7, respectively, demonstrating that the omics data from microsamples roughly reflect the concentration of the food type the participant consumed.
Wearable and internal multi-omics data reflect the individual physiological status
We first explored whether wearable and high-frequency internal multi-omics data can monitor and reflect the participant’s health status and searched for general patterns in the data. The 2,213 internal molecular profiles were smoothed (Methods and Supplementary Fig. 7c,d) and then grouped into 11 clusters using fuzzy c-means clustering analysis (Fig. 4d and Extended Data Fig. 3a). Two clusters followed circadian patterns. For example, cluster 4, which is enriched in metabolites (Extended Data Fig. 3b), generally peaked during the daytime, while cluster 11, which includes mostly lipids (Extended Data Fig. 3b), peaked primarily at night (Fig. 4d). Other clusters were not necessarily tied to circadian patterns and thus may reflect other events. The components of the different clusters were unique, indicating that the molecules have different temporal patterns (Extended Data Fig. 3b). To obtain tight and distinct molecular modules from each cluster, we used the community analysis method31 (Methods and Extended Data Fig. 3c–e). Interestingly, obvious peaks were found in some modules (Extended Data Fig. 3e,f and Methods), indicating that the molecules in these modules may be triggered by specific events (Figs. 4d and 5a).
As we have detailed food (nutrition) and exercise logs, we next analysed whether and how molecular fluctuations relate to daily nutrition intake32,33 (Methods). Briefly, nutrients in the food log were classified into several major classes on the basis of their content level: amino acids, vitamins, fat, electrolytes, calories, carbs and fibre. Next, we calculated the association between those classes and the internal molecules, expressed as the Jaccard index and depicted in the heat map (Extended Data Fig. 3g). Interestingly, we captured a high association between the amino acid and fat classes and several modules highly enriched in amino acids, FFAs and lipids (Extended Data Fig. 3g), consistent with previous results34. As with the Ensure shake study, our data revealed molecular associations with daily nutrition intake. For example, the participant consumed the same meal shake every morning during the study, and we captured a clear link between daily shake consumption and a temporal increase of several compounds such as 1,2,3-benzenetriol sulfate and hydroxyphenyllactic acid, which are listed as the shake’s ingredients (Fig. 5a, top, and Extended Data Fig. 3h).
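For reference, the Jaccard index of two sets is the size of their intersection divided by the size of their union. The sketch below applies it to one hypothetical encoding of the problem, namely the sets of sampling windows in which a nutrient class was logged and in which a molecular module peaked; how the sets were actually constructed in the study is described in the Methods, so this encoding and all names are assumptions.

```python
def jaccard_index(set_a, set_b):
    """Jaccard index |A ∩ B| / |A ∪ B|; defined here as 0.0 for two empty sets."""
    a, b = set(set_a), set(set_b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


# Hypothetical sampling-window indices.
windows_with_fat_intake = {2, 5, 9, 14, 20}
windows_with_module_peak = {5, 9, 14, 21}
association = jaccard_index(windows_with_fat_intake, windows_with_module_peak)
```

The index ranges from 0 (no shared windows) to 1 (identical sets), which makes values comparable across nutrient classes and modules of different sizes when displayed in a heat map.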
a, Four molecules reflect the participant’s lifestyle. b, Heatmap to show the rhythmic molecules. c, Three clusters that have strong rhythmic patterns. d, Lipid class distributions of lipids in three clusters. Cer: ceramide; SM: sphingomyelin; DAG: diacylglycerol; LPE: lysophosphatidylethanolamine. e, Examples for each cluster. The yellow background represents the daytime (6:00 to 18:00).
Cortisol is believed to follow a circadian pattern, with levels higher in the morning that decrease towards the evening35. However, events during the day related to stress, activity and diet can impact cortisol levels36. Although morning peak levels of cortisol were evident on 3 days, we observed large day-to-day variations in cortisol patterns, demonstrating that within-day cortisol levels may not represent accurate inter-day cortisol patterns for this individual (Fig. 5a). This result suggests the importance of high-frequency sampling for monitoring health marker status.
Importantly, this study also demonstrates the potential usage of microsamples to measure the pharmacokinetics of a drug at an individual level. Our participant took a low dose of aspirin in the morning for 4 days. Microsampling accurately captured the pharmacokinetics of salicylic acid (hydrolysed product of aspirin, Extended Data Fig. 3h) and revealed a clearance period of about 24 h in this person, which is similar to previous results37 (Fig. 5a, bottom). In addition, we found a negative correlation between caffeine and sleep quality (Extended Data Fig. 4a,b). This might be expected and has been reported in other studies38,39; however, the participant always consumed coffee before noon, indicating its long-lasting effect.
Interestingly, our detailed monitoring also revealed an unidentified inflammatory event in the middle of the week, spanning 3 days, in which a number of inflammatory cytokines increased (for example, TNFα and CD40L) and several others decreased (for example, eotaxin) (Extended Data Fig. 4c,d). This event was subclinical, as no symptoms were reported, and may represent an asymptomatic infection or other stress event. Together, these results show the power of high-frequency monitoring to record daily measures and health-related events not evident to the patient. The latter is particularly important for the early detection of disease40.
Circadian rhythms of internal molecules in human blood
Circadian rhythms are endogenous oscillators in physiological and behavioural processes over a 24 h cycle, and they play a critical role in human health and diseases41. Circadian molecules participate in diverse physiological phenomena such as cell division, energy metabolism and blood pressure36,42. These have not been explored at a personal level in a real-life setting because of the low frequency and high blood volume limitations of traditional blood sampling. Using the high-frequency data collected from the microsampling method, we were able to explore and evaluate molecules associated with circadian rhythms in the human body43.
We first screened for molecules that exhibited a consistent pattern across all 7 days and removed those that lacked a consistent daily pattern (Methods and Extended Data Fig. 5a). The circadian rhythm analysis (JTK_CYCLE algorithm44) was then used for quantitative analysis of all the remaining molecules (Methods). We identified 332 circadian molecules (Benjamini–Hochberg (BH)-adjusted P values < 0.05) that show clear circadian patterns (Extended Data Fig. 5b and Supplementary Dataset 8). The circadian molecules were grouped into five major clusters using fuzzy c-means clustering (Extended Data Fig. 5c). Interestingly, all clusters, except cluster 4 (enriched in proteins), were dominated by lipids (Extended Data Fig. 5d). We focused on the molecules that exhibited a complete 1-day cycle (those in clusters 1, 2 and 3; Fig. 5b,c) and removed clusters 4 and 5, whose molecules had different levels at the beginning and end of the day (Extended Data Fig. 5e,f). Cluster 1 was dominated by PC (32.56%) and lysophosphatidylcholine (LPC, 25.58%), cluster 2 was dominated by TAGs (93.65%), and cluster 3 was dominated by both TAGs (49.15%) and phosphatidylethanolamine (PE, 22.03%) (Fig. 5d). Examples for each cluster are shown in Fig. 5e.
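The BH adjustment applied to the per-molecule P values can be sketched as follows. This is a generic implementation of the Benjamini–Hochberg step-up procedure, not the authors' code, and the P values in the example are hypothetical.

```python
def bh_adjust(p_values):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending P
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


raw_p = [0.001, 0.008, 0.039, 0.041, 0.60]  # hypothetical per-molecule P values
adj_p = bh_adjust(raw_p)
circadian = [i for i, p in enumerate(adj_p) if p < 0.05]  # indices passing the threshold
```

In this toy example only the first two molecules pass the 0.05 threshold after adjustment, although four had raw P values below 0.05.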
To explore the in-depth functions of the rhythmic molecules in each cluster, we performed lipid enrichment analysis using Lipid Mini-on45. LPC, PC, sterol and cholesterol ester (CE) were significantly enriched in cluster 1. Previous work has shown that LPC and PC have circadian rhythms with peak concentrations in the evening, consistent with our result46. For cluster 2, TAG and glycerolipid were significantly enriched, and for cluster 3, PE was significantly enriched (Extended Data Fig. 6). Thus, the different classes of lipids exhibit distinct circadian patterns. To explore whether the circadian lipids were affected by food intake, we then examined the food logging data. We found that fat intake differed across 8 days, whereas the circadian lipids kept a consistent daily pattern, suggesting that the circadian lipids are not driven by food intake. It is plausible that circadian lipids were driven by individual rhythmic kinetics or gut microbes. In summary, multi-omics analyses from the high-frequency microsamples revealed rhythmic molecules and demonstrated that lipids related to energy metabolism have distinct circadian patterns.
Wearable data reflect internal molecular changes
Over the past several years, longitudinal monitoring of physiological data has garnered considerable interest30,47,48,49,50. However, the ability of wearable data to predict clinical labs has been limited30. Several studies have demonstrated that wearable data can reflect and predict the internal molecules (multi-omics data), including laboratory clinic tests and metabolites on a weekly or monthly scale29,30. However, due to the low-frequency sampling of multi-omics data, the circadian patterns and causal relationships between digital and internal molecular data cannot be discerned50. We explored the relationship between wearable data and internal molecular changes on an hourly scale at an individual level, including building predictive models.
Because of the different sampling frequencies of wearable and internal multi-omics data, we first attempted to match the wearable and internal multi-omics data using different window sizes. The matching windows were set as 5, 10, 20, 30, 40, 50, 60, 90 and 120 min. For each wearable data type (HR, step count and CGM) in the matched windows, a feature engineering pipeline30 was used to convert different data types into eight features (for example, the standard deviation (s.d.) of HR and the maximum HR; Methods), resulting in a total of 24 wearable features. The 24 wearable features were used to predict each analyte using the random forest model. Of the 2,223 molecules, we found 447 molecules that correlated with wearable features with at least one R2 > 0.3 (Supplementary Fig. 8a). Interestingly, we also found that most molecules have higher prediction accuracy with the larger matching window, consistent with a previous study30. Most of the 447 molecules were lipids, and enrichment analysis showed that TAGs were the most predictable by wearable data (Supplementary Fig. 8b). HR-related features (for example, HR range, HR maximum and HR s.d.) contributed the most to the predictive models (Supplementary Fig. 8c). Using the random forest model, we then found that several cytokines (C-peptide, GIP, insulin and PP) could also be predicted by wearable data. The wearable features that contributed most were CGM and HR-related features (Supplementary Fig. 8d). Together, these results demonstrate that wearable data could predict our high-frequency multi-omics data from microsamples.
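The window matching and feature extraction described above can be sketched as follows. This is a minimal sketch under stated assumptions: the window is taken as the minutes immediately preceding each microsample, only five illustrative features are computed (the eight features of the actual pipeline are given in Methods), and the function name and data are hypothetical.

```python
from statistics import mean, pstdev


def window_features(times, values, sample_time, window_min):
    """Summarize wearable readings in the `window_min` minutes before `sample_time`.

    `times` are minutes since the start of the study. Returns None for an empty window.
    """
    in_window = [v for t, v in zip(times, values)
                 if sample_time - window_min <= t <= sample_time]
    if not in_window:
        return None
    return {
        "mean": mean(in_window),
        "sd": pstdev(in_window),  # population s.d.; the choice is an assumption
        "max": max(in_window),
        "min": min(in_window),
        "range": max(in_window) - min(in_window),
    }


# Hypothetical HR readings every 10 min and a microsample taken at minute 60.
hr_times = [0, 10, 20, 30, 40, 50, 60]
hr_values = [60, 62, 70, 90, 85, 72, 66]
features_by_window = {
    w: window_features(hr_times, hr_values, sample_time=60, window_min=w)
    for w in (5, 10, 20, 30, 40, 50, 60, 90, 120)
}
```

Each window size yields one feature vector per microsample, which could then be passed to any regression model to compare prediction accuracy across windows.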
Biochemical changes in the body can occur on the order of minutes and hours51, and thus low-frequency multi-omics data (weekly or monthly) can find some associations with physiological measurements but not causal relationships29,30. Using the high-frequency microsampling approach, we next explored whether we could deduce the potential causal relationships between wearable data and internal molecules through temporal relationships; causal events are expected to precede downstream effects52. We first matched each wearable data point and molecule with different lagged times. Then, the Spearman correlation and P value were calculated for matching time-series data. Only the correlations with similar shapes and lagged time were scored as significant lagged correlations (Methods). To enable this analysis, the laggedCor (lagged correlation) algorithm was developed as an R package (https://jaspershen.github.io/laggedcor/). We then used this algorithm to demonstrate that we could capture and quantify the known lagged correlation and causal relationships between step count and HR. Interestingly, we found a lagged correlation of 0.6 (BH-adjusted P value < 0.0001) with a shift time of −1 min (step count − HR, Supplementary Fig. 9a), which means that 1 min after the step count increases, the HR begins to increase. This expected result demonstrates that our lagged correlation algorithm can capture and quantify potential causal relationships. Next, a lagged correlation network between wearable and internal molecular data was generated (Extended Data Fig. 7a and Supplementary Fig. 9b), including 1,217 nodes (3 wearable data points and 1,214 molecules) and 1,895 edges (Extended Data Fig. 7b and Supplementary Dataset 9), demonstrating a high degree of association between wearable and multi-omics data. An example with the top 100 edges for each pair of wearable and omics data is provided in Extended Data Fig. 7a. 
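The core idea of a lagged correlation can be sketched as follows. This is a simplified illustration and not the laggedCor implementation: it assumes two evenly sampled series of equal length, scans integer lags, and omits the shape-similarity and significance checks described in Methods. The sign convention (positive lag means the first series leads) is a choice of this sketch and may differ from the shift-time convention reported above.

```python
def ranks(values):
    """Average ranks (1-based); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return 0.0 if vx == 0 or vy == 0 else cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def best_lag(x, y, max_lag):
    """Return (k, rho) maximizing |Spearman(x[t], y[t + k])| over integer lags.

    A positive k means changes in x precede changes in y by k samples.
    """
    best = (0, 0.0)
    for k in range(-max_lag, max_lag + 1):
        if k >= 0:
            xs, ys = x[:len(x) - k], y[k:]
        else:
            xs, ys = x[-k:], y[:len(y) + k]
        if len(xs) < 3:
            continue
        rho = spearman(xs, ys)
        if abs(rho) > abs(best[1]):
            best = (k, rho)
    return best


# Synthetic per-minute data in which HR follows step count one sample later.
steps = [0, 0, 5, 9, 3, 0, 0, 7, 2, 0]
hr = [60] + [60 + 2 * s for s in steps[:-1]]
lag, rho = best_lag(steps, hr, max_lag=3)
```

In this synthetic example the strongest correlation is found at a lag of one sample, with step count leading HR.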
Step count and HR have most of the edges (57.3% and 42.6%, respectively) in the lagged correlation network (Extended Data Fig. 7b). We also found that CGM correlates more with cytokines than HR and step count (Supplementary Fig. 9c), indicating that glucose levels strongly correlate with immune responses. This result has been demonstrated by other studies53. In addition, we also observed that step count and HR have many (669) overlapping correlations (Supplementary Fig. 9d), as expected since they have a significant positive correlation.
Interestingly, the immunity-related pathways contained some proteins that negatively correlated with CGM, which was not expected (Supplementary Fig. 9e). This demonstrates the importance of following these responses at the individual level. As expected, we also found that glucagon signalling, oxidative phosphorylation pathways and FFAs positively correlate with CGM (Supplementary Fig. 9f,g). Glucagon can raise the concentration of glucose and fatty acids in the bloodstream, and oxidative phosphorylation can oxidize nutrients to release chemical energy. We found that the caffeine metabolism pathway positively correlates with HR (Supplementary Fig. 9h), consistent with previous studies54. We also found that the blood coagulation pathway positively correlates with HR (Supplementary 14i), and the neutrophil degranulation pathway negatively correlates with HR (Supplementary Fig. 9j). To the best of our knowledge, these associations provide new biological insights that should be validated in future studies. Overall, these results demonstrate that the wearable data can reflect the physiological status of the participant and reveal useful insights at a personal level.
We extracted the CGM glucose subnetwork from the entire lagged correlation network to further explore how glucose associates with internal molecules (Fig. 6a). We observed that CGM glucose has a significant lagged correlation with α-synuclein (lagged correlation: 0.36, BH-adjusted P value < 0.05) (Fig. 6b), and the shift time is −55 min (α-synuclein − CGM), indicating that α-synuclein may directly or indirectly upregulate glucose levels in the blood. This result is consistent with previous studies55,56. Previous studies have shown that higher C-peptide levels correlate with increased CGM glucose values57. Our data showed that CGM glucose had significant lagged correlations with C-peptide (Fig. 6b and Supplementary Fig. 10a) and insulin (Supplementary Fig. 10b). The shift time between CGM and C-peptide in this individual is 10 min (lagged correlation 0.36, BH-adjusted P value < 0.05), which means that changes in CGM glucose precede changes in the concentration of C-peptide in blood by 10 min. We also observed that CGM significantly correlates with several cytokines, including TNF-β (Fig. 6b), FLT3L, IL15, IP10 and TNF-α (Supplementary Fig. 10c; time shift 0 min to 15 min), and four of them are pro-inflammatory cytokines. These results indicate that glucose can cause a rapid and specific pro-inflammatory response. In summary, our results show that high-frequency multi-omics data from microsamples can reveal potential causal associations between wearable and multi-omics data. The potential causal relationships we found using the laggedCor algorithm can be validated by future experiments.
a, The CGM glucose subnetwork from the whole network. b, Three examples are shown to represent the causal relationships between CGM glucose and internal molecules. NS, not significant.
Discussion
Current healthcare practices are reactive and based on limited physiological and/or clinical information, often collected months or years apart. In this study, we built a multi-omics microsampling approach that enables the measurement of thousands of metabolites, lipids, cytokines and proteins in frequently collected 10 μl blood samples. We demonstrated that many of the molecules from the microsamples (VAMS) are stable and reliable. In addition, most of the molecules from the microsampling are consistent with the classic blood sampling approach (Spearman correlation 0.8–0.9). Compared with DBS, VAMS can achieve good analytical performance for targeted compound and protein analysis58,59. However, for DBS, the haematocrit effect affects the resulting spot size, which can introduce variation in analysis. As the microsampling approach is less invasive and can be used remotely and without specific training, it enables high-frequency blood sample collection (approximately hourly) in a native setting, which is difficult to perform using the classic blood sampling approach. On the basis of the multi-omics microsampling workflow, we carried out two case studies to demonstrate dense in situ sampling and the analytic capability to (1) perform dynamic and individualized metabolic assessments in response to a dietary (Ensure shake) intervention and (2) reveal large-scale intra-day molecular fluctuations as well as thousands of molecular associations including those associated with intra-day variation in HR, glucose levels and activity.
It is worth noting that most analytes that we measure, particularly proteins, appear stable with regard to time, temperature and the combination of both. We also note that, because we tested for significant effects of storage conditions with a relatively small sample size, we cannot rule out additional effects that went undetected owing to limited power. This limitation was evident from a sensitivity regression analysis that analysed only one storage condition at a time (storage duration or storage temperature), and additional effects of storage duration were identified when the baseline samples were added to the analysis. Molecules that are not stable can either be discarded from the analyses or quantified from unique degradation products. Alternatively, sample collection procedures could include rapid and cold shipping to minimize potential issues with less stable molecules. Indeed, we found that most samples can be collected and stored within 24 h, thus minimizing degradation. Larger stability studies, especially in larger and more diverse populations, will help identify other potential issues. Regardless, reliable measurements can be made for thousands of molecules, including those present at very low abundance (such as cytokines).
In summary, the presented methodology achieves fully remote, scalable, high-temporal-resolution omics and sensor monitoring. It has the potential for large-scale comprehensive, dynamic molecular and digital biomarker discovery and monitoring as well as health profiling. Here we used two case studies to show the potential of multi-omics microsampling in precision medicine. Many other applications can be envisioned. Examples include: (1) Longitudinal biomarker discovery. Multi-omics microsampling is simpler and less painful than the traditional blood collection method and thus enables anyone to self-collect high-frequency and high-quality blood microsamples anywhere for longitudinal biomarker discovery. (2) Personalized health monitoring. People can collect blood samples at home without any help and then send the samples to the laboratory for data acquisition and analysis. If a notable abnormality is detected, the result is sent immediately to a physician. The physician would then be able to validate the results and respond quickly with an intervention. (3) Therapeutic drug monitoring. Patients could collect microsamples frequently and remotely to monitor the drug-related compounds or biomarkers in the blood at a known time, to guide dosage and optimize therapy. In our study, all the microsamples were prepared and run together as a batch to avoid batch effects. In the future, the microsamples collected in 1 day could be prepared and run within 1 day after sample collection. The users can receive their results within 2 days after sending their samples to the laboratory for analysis. Additionally, developing a clinical diagnostic based on microsampling requires additional validation steps for accuracy, precision, matrix effects and so on, and the use of standards such as isotopically labelled reference molecules.
In addition, presently, only proteins, metabolites, lipids and cytokines were measured using our microsampling approach, but other types of molecules can be measured, such as DNA, epigenomes and RNA. For the 24/7 study, as a pilot study, only one participant was recruited to demonstrate the power of following personalized responses. Enlargement of the cohort size will enable the measurement of more generalized patterns but will also reveal new challenges in the processing and analysis of large numbers of samples. Indeed, our simple studies generated 98 data points in a single individual.
The two pilot case studies (group study and individual study) were used to demonstrate the power and application of the approach. The molecular signatures found in our study provide a large number of testable hypotheses that should be validated using analytical and experimental approaches. We note that group analysis is usually performed to find the overall trend. However, it can potentially be used to identify individual outliers who may have underlying conditions1. When an individual profile differs greatly from the average, one needs to first check for sample mix-ups, systematic variation and batch effects. Once the data are normalized, outlier detection can be further performed. Individuals who fall outside the overall pattern can be investigated for underlying causes for their molecular shift (medical conditions, medications or lifestyle abnormalities). In addition, the confounders (such as sex, age and body mass index (BMI)) must be controlled and adjusted to find the real and expected biological variation. Similarly, we note that, when an individual profile differs greatly from the average, overall conclusions from the whole cohort may not extend to individuals33,60. For the personalized analysis, the conclusion from the individual may not extend to the group or other individuals60, which can be revealed using our approach. Overall, we believe the multi-omics microsampling approach offers a promising opportunity to integrate with wearable data to improve precision healthcare.
Methods
Microsampling blood sample collection
The Mitra device (Neoteryx) was used to collect the blood microsamples. The blood microsampling method and multi-omics data acquisition workflow were established first (Fig. 1a). We developed a method for extracting proteins, lipids and metabolites from single microsamples by biphasic extraction with MTBE. This extraction yields an organic phase processed for lipids, an aqueous phase processed for metabolites and a protein pellet processed for proteomics. Using a separate microsample, we performed an aqueous extraction for multiplexed immunoassays on the Luminex platform (Fig. 1a).
Intravenous blood sample collection
Intravenous blood from the upper forearm was drawn from overnight-fasted participants. Specimens were immediately placed on ice after collection to avoid sample deterioration. Blood was collected in a purple top tube vacutainer (BD), layered onto Ficoll media (Thermo Fisher Scientific), and spun at 2,000 r.p.m. for 25 min at 24 °C. The top-layer EDTA–plasma was pipetted off, aliquoted, and immediately frozen at −80 °C. The peripheral blood mononuclear cell (PBMC) layer was collected and counted via the cell counter, and aliquots of PBMCs were further pelleted and flash frozen.
Microsampling blood sample preparation
Mitra tip samples were thawed on ice, then prepared and analysed in randomized order. Briefly, 300 μl of methanol spiked with internal standards (provided with the Lipidyzer platform) was added to a Mitra tip and vortexed for 20 s. Lipids were solubilized by adding 1,000 μl of MTBE and incubated under agitation for 30 min at 4 °C. Phase separation was induced by the addition of 250 μl of ice-cold water. Samples were vortexed for 1 min and centrifuged at 14,000g for 5 min at 20 °C. The upper phase containing the lipids was then collected, dried down under nitrogen, reconstituted with 200 μl of methanol, and stored at −20 °C. After biphasic extraction, the Mitra tips were resuspended in 0.1 M Tris pH 8.6 buffer, along with 10% N-octyl-glucoside and 50 mM Tris(2-carboxyethyl)phosphine, followed by shaking at 60 °C for 1 h (denaturation, solubilization and reduction). The protein mixture was subsequently alkylated with 200 mM iodoacetamide and incubated at room temperature (24 °C) in the dark for 30 min. Proteins were digested with trypsin overnight at 37 °C and quenched the following day with 10% (v/v) formic acid. Three hundred microlitres of the metabolite layer was transferred, supplemented with 1,200 μl ice-cold MeOH:acetone:ACN (1:1:1) and vortexed for 10 s. The sample was incubated overnight at −20 °C. The samples were vortexed for 10 s, then centrifuged at 20,000g for 10 min at 4 °C. Then the sample was transferred to a new 2.0 ml tube and dried down. Finally, the samples were stored at −20 °C until data acquisition.
Intravenous blood sample preparation
The preparation of venous blood samples for omics data acquisition is described in our previous studies1,2,29.
Data acquisition of untargeted proteomics
Approximately 8 μg of tryptic digest were separated on a NanoLC 425 System (Sciex). A flow of 5 μl min−1 was used with trap-elute setting using a ChromXP C18 trap column 0.5 × 10 mm, 5 μm, 120 Å (catalogue number 5028898, Sciex). Tryptic peptides were eluted from a ChromXP C18 column 0.3 × 150 mm, 3 μm, 120 Å (catalogue number 5022436, Sciex) using a 43 min gradient from 4% to 32% B with 1 h total run. Mobile phase solvents consisted of 92.9% water, 2% acetonitrile, 5% dimethyl sulfoxide and 0.1% formic acid (A phase) and 92.9% acetonitrile, 2% water, 5% dimethyl sulfoxide and 0.1% formic acid (B phase). Mass spectrometry analysis was performed using Sequential Window Acquisition of all Theoretical (SWATH) acquisitions on a TripleTOF 6600 System equipped with a DuoSpray Source and 25 μm inner diameter electrode (Sciex). Variable Q1 window SWATH acquisition methods (100 windows) were built in high-sensitivity tandem mass spectrometry mode with Analyst TF Software (v1.7).
Data processing of untargeted proteomics
The spectra were analysed with OpenSWATH using an in-house spectral library made from plasma and PBMC samples. Peak groups were then statistically scored with the PyProphet tool (v2.0.1), and all runs were aligned using the TRIC strategy. A final data matrix was produced with 1% false discovery rate (FDR) at the peptide level and 5% FDR at the protein level. Several QC steps were then applied to the output from SWATH2STATS. The correlation of peptide intensities between samples was calculated, and two samples with a mean sample correlation more than 2 s.d. below the mean sample correlation were removed. An additional sample with a peptide count more than 3 s.d. below the mean was removed. Poorly identified proteins and peptides were removed according to their m-scores using a target FDR of 0.05 (m-score threshold 8.91 × 10−12). Peptides matched to an unknown protein, non-proteotypic peptides and peptides beyond the ten most intense peptides for a given protein were all removed. Protein intensities were then calculated by first summing the intensities of all transitions mapped to each peptide and then all peptides mapped to each protein. Proteins that were missing for > 50% of samples were removed, as were proteins whose CV among a separate set of three QC samples was greater than 50%. Each missing protein value was imputed using k-nearest neighbours (KNN; k = 10; using only non-imputed data; R package VIM, version 6.1.0). Protein values were then log2 transformed.
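The two protein-level filters described above (missingness above 50% and QC CV above 50%) can be sketched as follows. This is an illustrative sketch rather than the authors' R code; the function name, data layout and example intensities are hypothetical, and imputation and log2 transformation are left out.

```python
from statistics import mean, stdev


def filter_proteins(intensities, qc_intensities, max_missing=0.5, max_cv=0.5):
    """Return the proteins that pass the missingness and QC-CV filters.

    `intensities` maps protein -> sample values (None marks a missing value);
    `qc_intensities` maps protein -> values from the QC replicates.
    """
    kept = []
    for protein, values in intensities.items():
        missing_fraction = sum(v is None for v in values) / len(values)
        qc = qc_intensities[protein]
        qc_cv = stdev(qc) / mean(qc)  # CV on the raw intensity scale
        if missing_fraction <= max_missing and qc_cv <= max_cv:
            kept.append(protein)
    return kept


# Hypothetical intensities for three proteins across four samples and three QC runs.
samples = {
    "P1": [100, 120, None, 110],    # 25% missing, stable QC: kept
    "P2": [None, None, None, 50],   # 75% missing: removed
    "P3": [10, 12, 11, 9],          # complete but noisy QC: removed
}
qc_runs = {"P1": [100, 105, 95], "P2": [50, 52, 48], "P3": [10, 40, 2]}
kept_proteins = filter_proteins(samples, qc_runs)
```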
Data acquisition of untargeted metabolomics
Prepared samples were analysed four times using hydrophilic interaction liquid chromatography (HILIC) and reverse phase liquid chromatography (RPLC) separation in both positive and negative ionization modes, respectively. Data were acquired on a Q Exactive Plus mass spectrometer for HILIC and a Q Exactive mass spectrometer for RPLC (Thermo Fisher Scientific). Both instruments were equipped with an HESI-II probe and operated in full mass spectrometry scan mode. Tandem mass spectrometry data were acquired on QC samples consisting of an equimolar mixture of all samples in the study. HILIC experiments were performed using a ZIC-HILIC column 2.1 × 100 mm, 3.5 μm, 200 Å (catalogue number 1504470001, Millipore) and mobile phase solvents consisting of 10 mM ammonium acetate in 50/50 acetonitrile/water (A phase) and 10 mM ammonium acetate in 95/5 acetonitrile/water (B phase). RPLC experiments were performed using a Zorbax SBaq column 2.1 × 50 mm, 1.7 μm, 100 Å (catalogue number 827700-914, Agilent Technologies) and mobile phase solvents consisting of 0.06% acetic acid in water (A phase) and 0.06% acetic acid in methanol (B phase).
Data processing of untargeted metabolomics
Data from each mode were independently analysed using Progenesis QI software (v2.3, Nonlinear Dynamics). Metabolic features from blanks that did not show sufficient linearity upon dilution in QC samples (r < 0.6) were discarded. To reduce the number of metabolic features in the metabolome profile, only metabolic features present in > 2/3 of the samples were kept for further analysis. Next, in the study samples, metabolic features present in > 50% of those samples were kept for further analysis. Missing values were imputed using KNN with k = 10. Data were then log2 transformed. The batch effect was evaluated using the dbnorm package61. Of the several batch-removal algorithms applied, the ComBat model62 gave the best performance and was used to correct systematic variation associated with the batch. Data from each mode were merged, and metabolites were formally identified by matching fragmentation spectra and retention time to analytical-grade standards when possible or by matching experimental tandem mass spectrometry to fragmentation spectra in publicly available databases using metID63. We used the Metabolomics Standards Initiative64 level of confidence to grade metabolite annotation confidence (levels 1 and 2).
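The idea behind KNN imputation can be sketched as follows. This is a simplified illustration and not the implementation used in the study: it assumes that neighbours are features (rows), that distances are computed over samples observed in both rows, and that a missing value is replaced by the mean of its k nearest neighbours. Other KNN imputers make different choices.

```python
def knn_impute(matrix, k=10):
    """Fill None entries with the mean of the k nearest rows observed at that column.

    `matrix` is a list of rows (features) over columns (samples). Distances use
    only original, non-imputed values shared by both rows.
    """
    def distance(a, b):
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return None
        return (sum((x - y) ** 2 for x, y in shared) / len(shared)) ** 0.5

    imputed = [row[:] for row in matrix]
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for other_index, other in enumerate(matrix):
                if other_index == i or other[j] is None:
                    continue
                d = distance(row, other)
                if d is not None:
                    candidates.append((d, other[j]))
            candidates.sort(key=lambda pair: pair[0])
            nearest = [v for _, v in candidates[:k]]
            if nearest:  # leave the gap if no neighbour is available
                imputed[i][j] = sum(nearest) / len(nearest)
    return imputed


# Hypothetical feature table: the first feature is missing in the third sample.
features = [
    [1.0, 2.0, None],
    [1.0, 2.0, 3.0],
    [1.1, 2.1, 3.2],
    [9.0, 9.0, 9.0],
]
filled = knn_impute(features, k=2)
```

With k = 2, the missing value is filled from the two most similar features (3.0 and 3.2) and the dissimilar fourth feature is ignored.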
Data acquisition of semi-targeted lipidomics
Prepared samples were analysed using the Lipidyzer platform that comprises a 5500 QTRAP System equipped with a SelexION differential mobility spectrometry interface (Sciex) and a high-flow LC-30AD solvent delivery unit (Shimadzu). The detailed method can be found in our previous study65. In brief, lipid molecular species were identified and quantified using multiple reaction monitoring (MRM) and positive/negative ionization switching. Two acquisition methods were employed, covering ten lipid classes; method 1 had SelexION voltages turned on, while method 2 had SelexION voltages turned off. Lipidyzer data were reported by the Lipidomics Workflow Manager software, which calculates concentrations for each detected lipid as the average intensity of the analyte MRM/average intensity of the most structurally similar internal standard MRM multiplied by its concentration.
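The concentration calculation described above amounts to a ratio against an internal standard, which can be sketched as follows. The function name and the numbers are hypothetical, and the choice of the most structurally similar internal standard is assumed to have been made upstream.

```python
from statistics import mean


def lipid_concentration(analyte_intensities, istd_intensities, istd_concentration):
    """mean(analyte MRM) / mean(internal-standard MRM) * internal-standard concentration."""
    return mean(analyte_intensities) / mean(istd_intensities) * istd_concentration


# Hypothetical MRM intensities; the internal standard was spiked at 50 concentration units.
concentration = lipid_concentration(
    analyte_intensities=[2000, 2200, 1800],
    istd_intensities=[4000, 4100, 3900],
    istd_concentration=50.0,
)
```

Here the analyte signal is half that of the internal standard, so the estimated concentration is half the spiked amount.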
Data processing of semi-targeted lipidomics
The final datasets were generated from the Lipidyzer platform, and the lipid abundances were reported as concentrations in nmol g−1. Lipids detected in less than 2/3 of the samples were discarded, and missing values were imputed on the basis of a lipid class-wise KNN-TN (KNN truncation) imputation method66.
Cytokines and metabolic panel
Cytokines were analysed using the HCYTMAG-60K-PX41 kit or the HSTCMAG28SPMX13 kit. For metabolic hormone assays, the catalogue number was HMHEMAG-34K. These assays were performed by the Human Immune Monitoring Center at Stanford University. All kits were purchased from EMD Millipore Corporation and used according to the manufacturer’s instructions with the following modifications. Briefly, samples were mixed with antibody-linked magnetic beads on a 96-well plate and incubated overnight at 4 °C with shaking. Cold (4 °C) and room-temperature incubation steps were performed on an orbital shaker at 500–600 r.p.m. Plates were washed twice with wash buffer in a Biotek ELx405 washer. Following 1 h of incubation at room temperature with a biotinylated detection antibody, streptavidin–PE was added for 30 min with shaking. Plates were washed as described, and phosphate-buffered saline was added to wells for reading in the Luminex FlexMap3D Instrument (Thermo Fisher Scientific) with a lower bound of 50 beads per sample per cytokine. Each sample was measured in a singlet. Custom Assay Chex control beads were purchased from Radix BioSolutions and added to all wells.
Cortisol
This assay was performed by the Human Immune Monitoring Center at Stanford University using the ProcartaPlex Simplex Kit (catalogue number EPX010-12190-901, Thermo Fisher Scientific) according to the manufacturer’s instructions with modifications as described. Briefly, beads were added to a 96-well plate and washed in a BioTek ELx405 washer. Samples were added to the plate containing the mixed antibody-linked beads, and 20 μl of the competitive conjugate was added and incubated overnight at 4 °C with shaking. Cold (4 °C) and room-temperature incubation steps were performed on an orbital shaker at 500–600 r.p.m. Following overnight incubation, the plate was washed as described, and PE was added for 30 min at room temperature. The plate was washed as above, and a reading buffer was added to the wells. Each sample was measured in a single well. Plates were read using a Luminex FM3D FlexMap instrument with a lower bound of 50 beads per sample per cytokine. Custom Assay Chex control beads (Radix BioSolutions) were added to all wells.
Total protein
Total protein was determined by bicinchoninic acid assay according to kit instructions (Thermo Fisher Scientific).
Wearable data
The smartwatch (Fitbit Ionic) was used to collect the sleep, HR and step count data. The Fitbit Intraday API through the My Personal Health Dashboard app67 was used to retrieve sleep, HR and step count data for the experiment period. The Dexcom G5 device was used to collect the CGM data. CGM data were transferred directly from the G5 device51. Dietary intake was logged manually using a notebook to track approximate meal timing and composition.
Study design of stability analysis
All the microsamples were stored at −80 °C before they were prepared and analysed. The stability analysis was designed to explore whether the molecules from the microsamples are stable in different storage conditions (temperature and duration) before they are stored at −80 °C. Two individuals were enrolled under the institutional review board (IRB)-approved protocol (IRB-23602 at Stanford University) with written consent. Each individual provided 10 ml of whole blood by venepuncture (in an EDTA purple top tube). The whole blood of each participant was poured into separate plastic reservoirs. Then 10 μl Mitra devices were touched to the surface of the blood to fill the microsample sponge. Thirty-six microsamples were generated for each participant, and microsamples were stored in duplicate at three temperatures (4, 25 and 37 °C) for six durations at the given temperature (3, 6, 24, 72, 120 and 0 h (that is, put into cold storage immediately)) before being stored at −80 °C until analysis. Then all the microsamples were prepared and used to acquire proteomics, metabolomics and lipidomics data using the protocol described above. All the omics data are provided as Supplementary Dataset 1.
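The storage design above can be enumerated to check the sample counts. The participant labels and dictionary keys are hypothetical; the temperatures, durations and duplicate structure follow the description.

```python
from itertools import product

temperatures_c = [4, 25, 37]
durations_h = [0, 3, 6, 24, 72, 120]  # 0 h = placed into cold storage immediately
replicates = [1, 2]                   # stored in duplicate
participants = ["participant_1", "participant_2"]  # hypothetical labels

design = [
    {"participant": p, "temperature_c": t, "duration_h": d, "replicate": r}
    for p, t, d, r in product(participants, temperatures_c, durations_h, replicates)
]

per_participant = sum(row["participant"] == "participant_1" for row in design)
stored_only = [row for row in design if row["duration_h"] > 0]

# 3 temperatures x 6 durations x 2 replicates = 36 microsamples per participant,
# 72 in total, of which 60 had a non-zero storage duration.
print(per_participant, len(design), len(stored_only))
```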
The first metric of stability
After data generation, annotation, cleaning, imputation and transformation, each of the omic datasets (proteins, metabolite features and lipids) was assessed for analyte stability in storage. A total of 128 proteins (n = 66 samples), 1,461 metabolites (no redundant metabolite removal, n = 71 samples) and 776 lipids (n = 72 samples) were available for the stability analysis. The first metric assessed was the CV (estimated using the formula for log-transformed data12), which was calculated separately across all of the samples for each of the two participants. The mean of the two CVs (one from each participant’s samples) was used as the CV for that analyte. The distribution of CVs was plotted.
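A minimal sketch of this metric follows. It assumes the standard log-normal form of the CV for log-transformed data, %CV = 100 × sqrt(exp(s²) − 1), where s² is the variance on the natural-log scale. It also assumes the data are log2 transformed, so the variance is rescaled by ln(2)². The function names and the example values are hypothetical.

```python
import math
import statistics

def log_cv_percent(log_values, base=2.0):
    """%CV on the original scale, estimated from log-transformed values."""
    # Rescale the sample variance from log-`base` units to natural-log units.
    var_ln = statistics.variance(log_values) * math.log(base) ** 2
    return 100.0 * math.sqrt(math.exp(var_ln) - 1.0)

def analyte_cv(log_values_by_participant, base=2.0):
    """Mean of the per-participant CVs, used as the CV of one analyte."""
    cvs = [log_cv_percent(v, base) for v in log_values_by_participant.values()]
    return sum(cvs) / len(cvs)

# Hypothetical log2 intensities of one analyte in two participants.
example = {"P1": [10.1, 10.3, 9.9, 10.2], "P2": [11.0, 11.4, 10.8, 11.1]}
print(round(analyte_cv(example), 2))
```

Computing the CV per participant before averaging keeps the true between-participant difference in analyte level out of the variability estimate.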
The second metric of stability
The second stability metric was used to identify significant effects of storage conditions on analyte levels. Linear regression was performed for each analyte, in which the analyte level was regressed on storage duration, temperature, the duration × temperature interaction, and an indicator for one of the two participants (to remove the effect of the actual difference in analyte level between the participants). As the samples that had 0 storage duration were never stored at any temperature, those samples were excluded from the analysis so that the effect of storage temperature could still be estimated, leaving 54, 59 and 60 samples for the protein, metabolite and lipid analyses, respectively. The ‘lm’ function in R was used. Because the objective of the study was to identify analytes that were stable under storage, a simple significance threshold of P = 0.05 was used as the more conservative choice: smaller P-value thresholds would exclude subtler potential effects of storage. The total model R2 and the partial R2 for each regression term were calculated using the ‘rsq’ and ‘rsq.partial’ functions of the ‘rsq’ package (version 2.2). The LMG measure of variable importance1 was also calculated using the ‘calc.relimp’ function of the ‘relaimpo’ package (version 2.2-6). The proportion of statistically significant effects of storage conditions on analyte level was evaluated against the expected number of significant results at the alpha level of 0.05 to gauge the extent of signal for significant storage effects on the analytes. For each omic dataset and storage condition term, the most strongly associated analytes (according to P value) were plotted over time and coloured by storage temperature to visually examine the identified effects. As a lack of power might have prevented the identification of some storage effects, each regression analysis was repeated using two separate models, one testing only storage duration and one testing only storage temperature. The benefit of this change was that the baseline samples could be included in the models testing the effect of storage duration.
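The comparison of observed against expected significant results can be illustrated as follows. The expected count at alpha = 0.05 is simply alpha times the number of tests. The binomial tail probability is an added assumption for illustration: it treats the per-analyte tests as independent, which the text does not claim. The function names and the observed counts in the example are hypothetical.

```python
import math

def expected_significant(n_tests, alpha=0.05):
    """Number of significant results expected by chance alone."""
    return alpha * n_tests

def binomial_upper_tail(n_significant, n_tests, alpha=0.05):
    """P(K >= n_significant) for K ~ Binomial(n_tests, alpha).

    Computed in log space so that large analyte counts do not overflow.
    """
    def log_pmf(k):
        return (math.lgamma(n_tests + 1) - math.lgamma(k + 1)
                - math.lgamma(n_tests - k + 1)
                + k * math.log(alpha) + (n_tests - k) * math.log(1.0 - alpha))
    total = sum(math.exp(log_pmf(k)) for k in range(n_significant, n_tests + 1))
    return min(1.0, total)

# 776 lipids were tested; the observed counts below are hypothetical.
print(expected_significant(776))
print(binomial_upper_tail(40, 776), binomial_upper_tail(60, 776))
```

An observed count close to the expected value suggests little signal for storage effects, whereas a count far above it suggests that storage conditions affect a meaningful share of the analytes.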
Comparison between microsamples and intravenous plasma
To compare the microsampling and conventional intravenous plasma collection approaches, 34 participants were enrolled under the IRB-approved protocol (IRB-55689 at Stanford University) with written consent. One microsample and one intravenous plasma sample were collected from each participant. All the samples were immediately stored at −80 °C for subsequent sample preparation. Then all the samples were prepared and used to acquire untargeted metabolomics and lipidomics data according to the above protocols. For the metabolomics data, after data processing and data curation, 22,858 metabolic features were detected (RPLC positive mode: 7,487 features, RPLC negative mode: 4,662 features, HILIC positive mode: 6,362 features, HILIC negative mode: 4,374 features). Only 642 features with annotations (Metabolomics Standards Initiative levels 1 and 2) remained for subsequent analysis. For the lipidomics data, 616 lipids were detected. All the omics data are provided in Supplementary Dataset 2.
Ensure shake study cohort
Twenty-eight participants were enrolled in the Ensure shake study under the IRB-approved protocol (IRB-47966 at Stanford University) with written consent. Twenty-one of the 28 participants had complete demographic data (Supplementary Fig. 2). Among these participants, the median SSPG was 166, the median age was 64.2 years and the median BMI was 29.7 kg m−2; 38% were male, 14.3% were Asian, 14.3% were Black, 66.7% were Caucasian and 4.8% were Hispanic. All 28 participants were mailed a kit containing microsampling devices (Mitra device), an Ensure shake (containing 440 kcal, 66 g carbohydrate, 18 g protein and 12 g fat) and instructions for microsample collection. Each participant was instructed to collect microsamples immediately before consuming the Ensure shake (baseline, timepoint 0) and at 30, 60, 120 and 240 min after consumption (Supplementary Fig. 2b). In total, five microsamples (one per timepoint) were collected for each participant (Supplementary Fig. 2b). Participants were asked to return their microsamples by overnight mail on the day of blood sample collection. Then all the microsamples were used for multi-omics data acquisition, namely, untargeted metabolomics, targeted lipidomics and cytokine/hormone assays. Four participants (S6, S26, S31 and S37) without metabolomics data were removed from the final dataset (Supplementary Fig. 2b). After data cleaning, curation and annotation, 768 analytes were detected from the microsamples, containing 560 metabolites, 155 lipids and 54 cytokines/hormones. All the omics data are provided in Supplementary Dataset 3.
24/7 study cohort
One participant (male, 64 years old) was enrolled in the 24/7 study under the IRB-approved protocol (IRB-23602 at Stanford University) with written consent. The microsampling method enables frequent sampling on the order of minutes or hours. However, to keep the protocol acceptable and feasible, the participant was instructed to self-collect finger-prick microsamples approximately every hour during waking periods and sporadically, about every two hours, during overnight periods for 7 days (Fig. 4a and Supplementary Fig. 6a). In addition, the participant was instructed to use several wearable devices (Fitbit smartwatch, Dexcom) and food logging to acquire comprehensive digital data (wearable data), including HR, step count and CGM. The microsamples were immediately placed on dry ice upon collection by the participant and then shipped to the laboratory daily. In total, 97 microsamples were collected. They were used to perform in-depth multi-omics data acquisition, including (1) untargeted proteomics, (2) untargeted metabolomics, (3) semi-targeted lipidomics and (4) targeted assays (cytokines, hormones, total protein and cortisol). After data processing, curation and annotation, we detected a total of 2,213 analytes from the microsamples, which included 1,051 metabolites, 811 lipids, 291 proteins, 45 cytokines, 13 metabolic panels (cytokines/hormones), 1 total protein and 1 cortisol. All the data are provided as a resource in Supplementary Datasets 6 and 7.
General statistical, bioinformatics analysis and data visualization
Most statistical analyses and data visualization were performed using RStudio and the R language (version 4.1.2). Most of the R packages and their dependencies used in this study are maintained in CRAN (https://cran.r-project.org/) or Bioconductor (https://bioconductor.org/). The versions of all the packages are listed in the Supplementary Note. The main script for analysis and data visualization is provided on GitHub (https://github.com/jaspershen/microsampling_multiomics).
In general, before any statistical analysis, the data were log2 transformed and then auto-scaled. All the multiple comparisons were adjusted by the BH method using the ‘p.adjust’ function in R. The R functions ‘cor’ and ‘cor.test’ were used to calculate the Spearman correlation coefficients. The R package ‘ggplot2’ was used to perform most of the data visualization in this study. The R package ‘Rtsne’ was used for the tSNE analysis in the Ensure shake study. The icons used in figures are from iconfont.cn, which can be used for non-commercial purposes under the MIT license (https://pub.dev/packages/iconfont/license).
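The two generic preprocessing steps named here, log2 transformation followed by auto-scaling and BH adjustment of P values, can be sketched as below. This is an illustrative re-implementation in Python using only the standard library, not the R code used in the study. Auto-scaling is assumed to mean centring to zero mean and scaling to unit sample standard deviation.

```python
import math
import statistics

def log2_autoscale(values):
    """Log2-transform one analyte, then centre and scale to unit variance."""
    logged = [math.log2(v) for v in values]
    mu = statistics.mean(logged)
    sd = statistics.stdev(logged)
    return [(x - mu) / sd for x in logged]

def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    n = len(pvalues)
    # Walk from the largest P value down, enforcing monotonicity.
    order = sorted(range(n), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * n
    running_min = 1.0
    for position, i in enumerate(order):
        rank = n - position  # 1-based rank among ascending P values
        running_min = min(running_min, pvalues[i] * n / rank)
        adjusted[i] = running_min
    return adjusted

print(log2_autoscale([1.0, 2.0, 4.0]))
print(bh_adjust([0.01, 0.04, 0.03, 0.005]))
```

The running minimum ensures that an adjusted P value never exceeds the adjusted value of any larger raw P value.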
Differentially expressed molecules after consuming Ensure shake
In the Ensure shake study, timepoint 0 (before consuming the Ensure shake) was set as the baseline, and each of the other four timepoints was compared with the baseline to identify the differentially expressed molecules (metabolites, lipids and cytokines/hormones). The paired Wilcoxon signed-rank test (‘wilcox.test’ function of R) was used to obtain the P values, which were adjusted for multiple comparisons using the BH method (‘p.adjust’ function of R). Molecules with adjusted P values < 0.05 were considered significantly differentially expressed. Then the number of significant molecules whose level had changed at different timepoints was visualized using a Sankey plot (‘ggalluvial’ package of R). Next, we identified the entire set of molecules whose levels changed across all the timepoints after Ensure shake consumption. The ANOVA test (‘anova_test’ function from the ‘rstatix’ package in R) was used to calculate the P values, which were then adjusted using the BH method. To evaluate whether the significantly altered molecules could have arisen by chance, a permutation test was performed. In brief, the sample labels of the omics data were randomly shuffled to generate random datasets. Then the same method (ANOVA test) was used to find the altered molecules in each random dataset. This step was repeated 100 times to obtain a null distribution of the number of differential molecules. Then the permutation P value was calculated to evaluate whether the observed number of altered molecules exceeded that expected by chance.
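The permutation step can be sketched as follows. The text does not give the formula for the permutation P value. The sketch assumes the common estimator (b + 1)/(n + 1), where b is the number of permuted datasets whose statistic is at least as large as the observed one and n is the number of permutations. The `statistic` callable is a hypothetical stand-in for rerunning the ANOVA and counting altered molecules.

```python
import random

def permutation_null(labels, statistic, n_perm=100, seed=0):
    """Statistic values obtained after shuffling the sample labels."""
    rng = random.Random(seed)
    null_values = []
    for _ in range(n_perm):
        shuffled = list(labels)
        rng.shuffle(shuffled)
        null_values.append(statistic(shuffled))
    return null_values

def permutation_p_value(observed, null_values):
    """One-sided permutation P value with the +1 correction (assumed form)."""
    exceed = sum(1 for v in null_values if v >= observed)
    return (exceed + 1) / (len(null_values) + 1)

# Toy usage: a hypothetical statistic that depends on the label order.
labels = ["0", "30", "60", "120", "240"] * 4
toy_statistic = lambda shuffled: sum(1 for a, b in zip(shuffled, labels) if a == b)
null = permutation_null(labels, toy_statistic, n_perm=100, seed=1)
print(permutation_p_value(len(labels), null))
```

With 100 permutations, the smallest attainable P value under this estimator is 1/101.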
Consensus clustering
In the Ensure shake study, the unsupervised k-means consensus clustering of all samples was performed with the R packages ‘CancerSubtypes’ and ‘ConsensusClusterPlus’ using the significantly shifted molecules that were discovered after consuming the Ensure shake68. The data were log2 transformed first and then auto-scaled. Sample clusters were detected on the basis of k-means clustering, Euclidean distance and 1,000 resampling repetitions in the ‘ExecuteCC’ function in the range of two to six clusters. The generated empirical cumulative distribution function plot initially showed the optimal separation of two clusters for all samples. To further decide how many groups (k) should be generated, the silhouette information from clustering was extracted using the ‘silhouette_SimilarityMatrix’ function. We compared k = 2, 3, 4 and 5 and found that k = 2 gave high clustering stability (Extended Data Fig. 2c). In the consensus matrix heat maps, two groups also appeared to give the best clustering (Extended Data Fig. 2d). Therefore, all the samples were assigned to two groups.
Fuzzy c-means clustering
The R package ‘Mfuzz’ was used for fuzzy c-means clustering69. In brief, the omics data were first log2 transformed and auto-scaled, and then the minimum centroid distances were calculated for cluster numbers from 2 to 22 in steps of 1. The minimum centroid distance was used as the cluster validity index. Then the optimal cluster number was selected according to a published rule70. To get a more accurate cluster number, clusters whose centre expression profiles had correlations of more than 0.8 were merged into one cluster. Then the optimal cluster number was used to perform the fuzzy c-means clustering. For each cluster, only the molecules with memberships of more than 0.5 were retained for subsequent analysis.
Metabolic scores
Participant S18 was considered an outlier at baseline and was removed from the dataset for subsequent analysis (Supplementary Fig. 3). Then five metabolic scores and one immune response score were calculated: (1) Three carbohydrates (fructose, lactic acid and pyruvic acid) were detected and used to calculate the carbohydrate score, which represents a person’s ability to metabolize carbohydrates (Supplementary Fig. 4). (2) Nine amino acids (alloisoleucine, alanine, isoleucine, methionine, norvaline, phenylalanine, tryptophan, tyrosine and l-phenylalanine) were detected and used to calculate the amino acid (protein) score, which represents a person’s ability to metabolize proteins (Supplementary Fig. 4). (3) A total of 103 TAGs were detected and used to calculate the fat score, which represents a person’s ability to metabolize fat (Supplementary Fig. 4). (4) C-peptide and insulin were detected and used to calculate the insulin secretion score, which represents a person’s ability to secrete insulin (Supplementary Fig. 4). (5) Eight FFAs (FFA 16:0, FFA 16:1, FFA 18:1, FFA 18:2, FFA 18:3, FFA 22:2, FFA 22:5 and FFA 22:6) were detected and used to calculate the FFA (insulin sensitivity) score, which represents a person’s sensitivity to insulin (Supplementary Fig. 4). (6) All the cytokines were used to calculate the immune response score, which represents a person’s immune response (Supplementary Fig. 5a).
For each metabolic score MS, the molecules Mi (i = 1, 2, 3 … m) in this group were first defined and selected (Fig. 3b), and then the dataset was log2 transformed and auto-scaled. For each participant and molecule, the baseline value was subtracted from the intensity values at all the timepoints, so that the baseline value was 0. Then the AUC Ai,j was calculated for molecule Mi (i = 1, 2, 3 … m) and participant Pj (j = 1, 2, 3 … n). To normalize the Ai,j, the minimum min(Ai,j) was subtracted from each Ai,j and the result was divided by the range of all the AUCs (max(Ai,j) − min(Ai,j)). The normalized Ai,j is labelled as NAi,j and ranges from 0 to 1. Then, the metabolic score MSj of each participant j is calculated as the mean of the normalized AUCs over the m molecules in the group:
$$MS_j = \frac{1}{m}\sum_{i=1}^{m} NA_{i,j}$$
where MSj is the metabolic score for participant j, and NAi,j is the normalized AUC of molecule i (i = 1, 2, 3 … m) in participant j. For the carbohydrate score, amino acid (protein) score, fat score and FFA (insulin sensitivity) score, high AUCs mean that the person’s ability to metabolize the molecules is low, so the final metabolic scores were calculated as 1 − MSj. For the insulin secretion score and immune response score, the final score is the same as MSj.
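The score calculation can be sketched end to end as follows. This is a minimal illustration under stated assumptions: the AUC is computed with the trapezoid rule, the min–max normalization uses the minimum and maximum over all molecules and participants in the group, and the score is the mean of the normalized AUCs. The function names and toy profiles are hypothetical.

```python
def auc_trapezoid(times, values):
    """Area under the curve by the trapezoid rule."""
    area = 0.0
    for k in range(1, len(times)):
        area += (times[k] - times[k - 1]) * (values[k] + values[k - 1]) / 2.0
    return area

def metabolic_scores(times, profiles, invert=True):
    """profiles[molecule][participant] -> scaled intensities, one per timepoint.

    Returns one score per participant; `invert=True` returns 1 - MS_j.
    """
    auc = {}
    for molecule, by_participant in profiles.items():
        for pid, values in by_participant.items():
            shifted = [v - values[0] for v in values]  # baseline becomes 0
            auc[(molecule, pid)] = auc_trapezoid(times, shifted)
    lo, hi = min(auc.values()), max(auc.values())
    if hi == lo:
        raise ValueError("all AUCs are identical; normalization is undefined")
    norm = {key: (a - lo) / (hi - lo) for key, a in auc.items()}
    molecules = list(profiles)
    participants = sorted({pid for _, pid in auc})
    scores = {}
    for pid in participants:
        ms = sum(norm[(mol, pid)] for mol in molecules) / len(molecules)
        scores[pid] = 1.0 - ms if invert else ms
    return scores

times = [0, 30, 60, 120, 240]  # minutes after the shake
profiles = {
    "M1": {"P1": [0, 1, 1, 0, 0], "P2": [0, 0, 0, 0, 0]},
    "M2": {"P1": [1, 1, 1, 1, 1], "P2": [0, 2, 2, 0, 0]},
}
print(metabolic_scores(times, profiles))
```

Setting `invert=False` corresponds to the insulin secretion and immune response scores, for which the final score equals MSj.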
Metabolomics pathway enrichment
For the metabolomics pathway enrichment, the human KEGG pathway database was downloaded from KEGG using the R package massDatabase71. The original KEGG database has 275 pathways. We separated them into metabolic pathways and disease pathways on the basis of the ‘class’ information for each pathway. The pathways with the ‘human disease’ class were assigned to the disease pathway database, which contains 74 pathways, and the remaining 201 pathways were assigned to the metabolic pathway database. The pathway enrichment analysis used the hypergeometric distribution test from the tidyMass project72. The BH method was used to adjust P values, and the cut-off was set as 0.05 (BH-adjusted P values < 0.05).
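For readers unfamiliar with the hypergeometric test, the sketch below computes the upper-tail P value for one pathway from four counts. It illustrates the general test only and is not the tidyMass implementation. The argument names and the example counts are hypothetical.

```python
from math import comb

def hypergeom_enrichment_p(n_universe, n_pathway, n_query, n_overlap):
    """P(X >= n_overlap) when n_query molecules are drawn without replacement
    from a universe of n_universe molecules, n_pathway of which belong to
    the pathway."""
    total = comb(n_universe, n_query)
    upper = min(n_pathway, n_query)
    tail = sum(
        comb(n_pathway, k) * comb(n_universe - n_pathway, n_query - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total

# Hypothetical counts: 560 annotated metabolites, 20 in the pathway,
# 50 altered metabolites, 6 of which fall in the pathway.
print(hypergeom_enrichment_p(560, 20, 50, 6))
```

The resulting P values, one per pathway, would then be adjusted with the BH method before applying the 0.05 cut-off.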
Lipidomics data enrichment analysis
The Lipid Mini-On software was used for the lipid enrichment analysis45. In brief, the lipid names were first modified to meet the requirements of the tool. The dysregulated lipids were uploaded as query files, and all the detected lipids were uploaded as universe files. The default Fisher’s exact test was used as the enrichment test method. The category, main class, subclass, individual chains, individual chain length and number of double bonds were selected as the general parameters to test. Finally, the enrichment result containing detailed tables and networks was downloaded for subsequent analysis.
Proteomics pathway enrichment
The R package ‘clusterProfiler’ was used for proteomics pathway enrichment. We first converted the gene IDs of the proteins to ENTREZ IDs, and then the Gene Ontology (GO) database was used for GO term enrichment analysis. The P values were adjusted using the BH method, and the cut-off was set as 0.05. Only the enriched GO terms with at least five mapped proteins were retained to ensure that the enriched GO terms have enough genes. To reduce the redundancy of enriched GO terms, the similarity between GO terms was calculated using the ‘Wang’ algorithm from the R package ‘simplifyEnrichment’73. Only the connections with similarities > 0.3 were retained to construct the GO term similarity network. Then community analysis (R package ‘igraph’) was used to divide this network into different modules. For each module, the GO term with the smallest adjusted P value was selected as the representative.
LOESS smoothing data
In the 24/7 study, the microsample timepoints differ from day to day, whereas the circadian analysis requires enough timepoints on each day. We therefore used the locally estimated scatterplot smoothing (LOESS) method to smooth the multi-omics data and predict them at specific timepoints (every half hour), as described in another publication74. In brief, each molecule was fitted with the LOESS regression method for each day (‘loess’ function in R). During the fitting, the LOESS argument ‘span’ was optimized by cross-validation. As the gap between consecutive days was always more than 4 h, we did not fit across the gap between days, to keep the fitting and prediction accurate and robust. After obtaining the LOESS prediction model, we predicted each molecule’s intensity every half hour within each day.
Correlation network and community analysis
In the 24/7 study, we constructed a correlation network for each cluster obtained from the fuzzy c-means clustering. In brief, the Spearman correlation was calculated for every pair of molecules. Only the correlations with coefficient > 0.7 and BH-adjusted P values < 0.05 were retained for subsequent analysis. All the retained correlations were used to construct the correlation network. To get more accurate and distinct modules, we used community analysis to extract subnetworks (modules) from the correlation network31. Here we used the fast greedy modularity optimization algorithm (‘cluster_fast_greedy’ function from the R package ‘igraph’). Finally, 11 clusters and 83 modules were detected. The R packages ‘igraph’ and ‘ggraph’ were used to visualize the network.
Associations between molecular modules and nutrition intake
In the 24/7 study, to evaluate the associations between molecular modules and nutrition intake, peak detection (Gaussian distribution fitting) was first used to find the ‘peaks’ in each module (Extended Data Fig. 3f). For each module, a timepoint was marked as ‘1’ if a peak was present and as ‘0’ otherwise. Similarly, for each food, a timepoint was marked as ‘1’ if the participant consumed this food at that timepoint and as ‘0’ otherwise. Then, for each food and module pair, the Jaccard index was calculated, and only the pairs with a Jaccard index > 0.3 were retained for subsequent analysis (Extended Data Fig. 3g).
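The Jaccard index between a module’s peak indicator and a food’s intake indicator is the number of timepoints marked ‘1’ in both vectors divided by the number marked ‘1’ in either. A minimal sketch follows, with hypothetical indicator vectors. Returning 0 when neither vector contains a ‘1’ is an assumption.

```python
def jaccard_index(peaks, intake):
    """Jaccard index of two equal-length 0/1 indicator vectors."""
    both = sum(1 for p, f in zip(peaks, intake) if p == 1 and f == 1)
    either = sum(1 for p, f in zip(peaks, intake) if p == 1 or f == 1)
    return both / either if either else 0.0

module_peaks = [1, 0, 1, 1, 0]   # hypothetical peak indicator per timepoint
food_intake = [1, 1, 0, 1, 0]    # hypothetical intake indicator per timepoint
score = jaccard_index(module_peaks, food_intake)
print(score, score > 0.3)
```

In this toy example, the two vectors share two of the four timepoints marked in either, so the pair would pass the 0.3 threshold.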
Consistency score for molecules
In the 24/7 study, a consistency score was designed and calculated for each molecule to assess whether its daily profile is consistent across days. LOESS-smoothed data were used for the consistency score calculation. For each molecule, the Spearman correlation was calculated between every pair of days, and the median correlation value was taken as the consistency score for this molecule. Only the molecules with consistency scores > 0.6 were retained for the subsequent circadian analysis.
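A minimal sketch of the consistency score is given below. It assumes that each day’s smoothed profile is sampled on the same half-hour grid, so that profiles from different days can be compared point by point, and that no profile is constant. The helper names and the toy profiles are hypothetical.

```python
import statistics
from itertools import combinations

def average_ranks(values):
    """Ranks starting at 1, with ties receiving their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = statistics.fmean(rx), statistics.fmean(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / (sxx * syy) ** 0.5

def consistency_score(profiles_by_day):
    """Median Spearman correlation over all pairs of days."""
    rhos = [spearman(a, b) for a, b in combinations(profiles_by_day, 2)]
    return statistics.median(rhos)

days = [[1, 2, 3, 4], [2, 3, 5, 9], [1, 3, 2, 4]]  # toy smoothed profiles
print(consistency_score(days))
```

Using the median rather than the mean makes the score less sensitive to a single atypical day.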
Circadian rhythm analysis
In the 24/7 study, the R package ‘MetaCycle’ was used for the circadian rhythm analysis43. The LOESS-smoothed omics data were log2 transformed and auto-scaled. Then, the sample times were set as the timepoints in the ‘meta2d’ function. The Lomb–Scargle method was selected for the circadian rhythm analysis75. The P values were adjusted using the BH method. Only the molecules with BH-adjusted P values < 0.05 were considered statistically significant circadian molecules and retained for subsequent analysis.
Wearable data predicts internal molecules
In the 24/7 study, to evaluate whether the wearable data could be used to predict internal molecules, the method from a previous publication30 was used. As the sampling frequencies of the wearable data and the internal molecules differ, the internal molecules and wearable data first needed to be matched. The matching windows were set as 5, 10, 20, 30, 40, 50, 60, 90 and 120 min, respectively. For the wearable data points that matched with internal molecules, a feature engineering pipeline30 was used to convert the wearable data into eight features: mean, median, standard deviation, maximum, minimum, skewness, kurtosis and range. Thus, for each matched sample, each type of wearable data yielded eight features. The wearable data (HR, step count and CGM) were converted to 24 features in total, which were used as independent variables to predict each internal molecule. The random forest model (R packages ‘caret’ and ‘randomForest’), which has been shown to have the best prediction accuracy, was used30. The 24 wearable features were combined for each internal molecule to construct the prediction model. Sevenfold cross-validation was used during the prediction model construction. The importance of each wearable feature was saved for subsequent analysis.
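The feature engineering step can be sketched as follows. Several details are assumptions made for illustration: the matching window is taken as the interval immediately preceding the microsample time, skewness and kurtosis are the moment-based (non-excess) definitions, and the standard deviation is the sample standard deviation. All names and the toy data are hypothetical.

```python
import statistics

def window_features(values):
    """Eight summary features of the wearable readings in one window."""
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return {
        "mean": mean,
        "median": statistics.median(values),
        "sd": statistics.stdev(values),
        "max": max(values),
        "min": min(values),
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2,
        "range": max(values) - min(values),
    }

def trailing_window(times, values, t_sample, window_min):
    """Readings within `window_min` minutes before the sample time (assumed)."""
    return [v for t, v in zip(times, values) if t_sample - window_min <= t <= t_sample]

def wearable_feature_vector(streams, t_sample, window_min):
    """Concatenate the eight features of each wearable stream."""
    features = {}
    for name, (times, values) in streams.items():
        window = trailing_window(times, values, t_sample, window_min)
        for key, value in window_features(window).items():
            features[f"{name}_{key}"] = value
    return features

times = list(range(0, 61, 5))  # minutes
streams = {
    "hr": (times, [60 + t % 7 for t in times]),
    "steps": (times, [2 * t for t in times]),
    "cgm": (times, [100 + t for t in times]),
}
vector = wearable_feature_vector(streams, t_sample=60, window_min=20)
print(len(vector))
```

Three streams with eight features each give the 24 predictors described in the text. A window with fewer than two readings, or with constant readings, would need separate handling.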
Lagged correlation
In the 24/7 study, to calculate the lagged correlation between wearable data and internal molecules, we have developed the laggedCor algorithm (lagged correlation) and an R package named ‘laggedcor’ (https://jaspershen.github.io/laggedcor/). The laggedCor algorithm can be used to extract potential causal relationships. Let us assume that X is wearable data and Y is internal omics data. In a real biological system, if X and Y have a causal relationship (X causes Y), Y often responds to X after a certain lapse of time. Such a lapse of time is called a lag time. This means that X and Y change asynchronously. To explore whether X and Y have a potential causal relationship, we just shift the lag time between X and Y for matching and then calculate the correlation between them. Suppose the X and Y have a potential causal relationship and the lag time is T; then we can get the highest lagged correlation between X and Y at the lag time T.
Briefly, two time-series data are used as the inputs for laggedcor. The lower frequency time-series data (in the 24/7 study, the omics data) are labelled as Xt (t ∈ Ti), and the higher frequency time-series data (in the 24/7 study, the wearable data) are labelled as Yt (t ∈ Tj). To make sure that there are overlaps between Xti and Ytj, they should meet the below equation:
Then the two series data, Xt and Yt, are used to calculate the lagged correlation as described in the steps below.
Step 1: matching between Xt and Yt
Every sample point Ytj in Y is used to match the sample points in Xt. The shift time is labelled as Ts (Ts is set on the basis of the frequency of Xt and Yt), and the matching time window is labelled as Tw. So the sample points Xti in Xt that meet the below equation are labelled as matched sample points for Ytj in Y:
Then the matched sample points Xti are averaged as Xtj that matched with Ytj in Y:
Then we get the new time-series data Xt (t ∈ Tj).
Step 2: correlation calculation
Then the Spearman correlation between Xt and Yt (t ∈ Tj) is calculated with the shift time Ts. And the correlation rho and P value are recorded as Corts and pts.
Step 3: repeat step 1 and step 2 with different shift time
Then, step 1 and step 2 are repeated for a series of shift times Tsi, i = 1, 2, 3 … n; Ts1 < 0 and abs(Ts1) = abs(Tsn). Then we can get a series Corts and a series pts, ts ∈ Ts.
Step 4: evaluation of the significance of lagged correlation
The maximum correlation of Corts and related P value are extracted as the lagged correlation for time-series data Xt and Yt. To evaluate whether the lagged correlation is significant, the Gaussian distribution is used to fit the Corts, and the correlations in all the shift times are calculated using the fitted Gaussian distribution and labelled as PCorts. The quality score was then calculated as the absolute Spearman correlation score between PCorts and Corts. Only the lagged correlation with a quality score was considered a real lagged correlation and used for subsequent analysis.
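Steps 1 to 3, together with the extraction of the maximum correlation in step 4, can be sketched as follows. This is a simplified illustration and not the implementation in the ‘laggedcor’ package. The matching rule is an assumption: a point of X at time ti matches a point of Y at time tj when |ti − (tj + Ts)| ≤ Tw/2. The Gaussian fit and the quality score of step 4 are omitted, and all function names and toy data are hypothetical.

```python
import statistics

def _ranks(values):
    """Ranks starting at 1, with ties receiving their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks

def spearman(x, y):
    rx, ry = _ranks(x), _ranks(y)
    mx, my = statistics.fmean(rx), statistics.fmean(ry)
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / (sxx * syy) ** 0.5

def match_and_average(x_times, x_values, y_times, shift, window):
    """Step 1: average the X points that fall in the shifted window of each Y point."""
    matched = []
    for tj in y_times:
        centre = tj + shift
        hits = [v for ti, v in zip(x_times, x_values) if abs(ti - centre) <= window / 2.0]
        matched.append(statistics.fmean(hits) if hits else None)
    return matched

def lagged_correlation(x_times, x_values, y_times, y_values, shifts, window):
    """Steps 2 and 3: correlation at each shift; return the shift with the maximum."""
    by_shift = {}
    for shift in shifts:
        x_matched = match_and_average(x_times, x_values, y_times, shift, window)
        pairs = [(xm, yv) for xm, yv in zip(x_matched, y_values) if xm is not None]
        if len(pairs) < 3:
            continue  # too few matched points for a correlation
        xs, ys = [p[0] for p in pairs], [p[1] for p in pairs]
        if len(set(xs)) < 2 or len(set(ys)) < 2:
            continue  # correlation is undefined for a constant series
        by_shift[shift] = spearman(xs, ys)
    best = max(by_shift, key=lambda s: by_shift[s])
    return best, by_shift[best], by_shift

# Toy data: Y at time t equals X at time t + 2.
signal = lambda t: (7 * t) % 11
x_times = list(range(20))
x_values = [signal(t) for t in x_times]
y_times = list(range(3, 16))
y_values = [signal(t + 2) for t in y_times]
best_shift, best_rho, _ = lagged_correlation(
    x_times, x_values, y_times, y_values, shifts=range(-3, 4), window=0
)
print(best_shift, round(best_rho, 3))
```

In the toy data the correlation peaks at a shift of 2, which recovers the lag built into the series. The sign convention of the shift is part of the assumed matching rule.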
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Data availability
All the analysed data used in this study are provided as a supplementary dataset. Source data are provided with this paper.
Code availability
R version 4.1.2 was used with the base packages and other packages, and detailed information is provided in Supplementary Information (section Supplementary Note). All the custom scripts for data analysis and data visualization are provided open-source via https://github.com/jaspershen/microsampling_multiomics and Zenodo (https://zenodo.org/record/7393012#.Y4sEj-yZP0o). The laggedCor algorithm and package were developed for lagged correlation calculation and are available open-source via https://jaspershen.github.io/laggedcor.
References
Schüssler-Fiorenza Rose, S. M. et al. A longitudinal big data approach for precision health. Nat. Med. 25, 792–804 (2019).
Zhou, W. et al. Longitudinal multi-omics of host–microbe dynamics in prediabetes. Nature 569, 663–671 (2019).
Guthrie, R. & Susi, A. A simple phenylalanine method for detecting phenylketonuria in large populations of newborn infants. Pediatrics 32, 338–343 (1963).
Antunes, M. V., Charão, M. F. & Linden, R. Dried blood spots analysis with mass spectrometry: potentials and pitfalls in therapeutic drug monitoring. Clin. Biochem. 49, 1035–1046 (2016).
Corso, G., D’Apolito, O., Gelzo, M., Paglia, G. & Dello Russo, A. A powerful couple in the future of clinical biochemistry: in situ analysis of dried blood spots by ambient mass spectrometry. Bioanalysis 2, 1883–1891 (2010).
Bennett, M. J. et al. Newborn screening for metabolic disorders: how are we doing, and where are we going? Clin. Chem. 58, 324–331 (2012).
Volani, C. et al. Pre-analytic evaluation of volumetric absorptive microsampling and integration in a mass spectrometry-based metabolomics workflow. Anal. Bioanal. Chem. 409, 6263–6276 (2017).
van den Broek, I. et al. Application of volumetric absorptive microsampling for robust, high-throughput mass spectrometric quantification of circulating protein biomarkers. Clin. Mass Spectr. https://doi.org/10.1016/j.clinms.2017.08.004 (2017).
Molloy, M. P. et al. Proteomic analysis of whole blood using volumetric absorptive microsampling for precision medicine biomarker studies. J. Proteome Res. 21, 1196–1203 (2022).
Lei, B. U. W. & Prow, T. W. A review of microsampling techniques and their social impact. Biomed. Microdevices 21, 81 (2019).
Zhuang, Y.-J., Mangwiro, Y., Wake, M., Saffery, R. & Greaves, R. F. Multi-omics analysis from archival neonatal dried blood spots: limitations and opportunities. Clin. Chem. Lab. Med. https://doi.org/10.1515/cclm-2022-0311 (2022).
Canchola, J. A. Correct use of percent coefficient of variation (%CV) formula for log-transformed data. MOJ Proteom. Bioinform. https://doi.org/10.15406/mojpb.2017.06.00200 (2017).
Hannon, B. A. et al. Single nucleotide polymorphisms related to lipoprotein metabolism are associated with blood lipid changes following regular avocado intake in a randomized control trial among adults with overweight and obesity. J. Nutr. https://doi.org/10.1093/jn/nxaa054 (2020).
Berry, S. E. et al. Human postprandial responses to food and potential for precision nutrition. Nat. Med. 26, 964–973 (2020).
Wang, D. D. et al. The gut microbiome modulates the protective association between a Mediterranean diet and cardiometabolic disease risk. Nat. Med. 27, 333–343 (2021).
Turnbaugh, P. J. et al. The effect of diet on the human gut microbiome: a metagenomic analysis in humanized gnotobiotic mice. Sci. Transl. Med. 1, 6ra14 (2009).
Spiekerkoetter, U. Mitochondrial fatty acid oxidation disorders: clinical presentation of long-chain fatty acid oxidation defects before and after newborn screening. J. Inherit. Metab. Dis. https://doi.org/10.1007/s10545-010-9090-x (2010).
Gannon, M. C., Nuttall, F. Q., Westphal, S. A. & Seaquist, E. R. The effect of fat and carbohydrate on plasma glucose, insulin, C-peptide, and triglycerides in normal male subjects. J. Am. Coll. Nutr. 12, 36–41 (1993).
Ludwig, D. S. et al. The carbohydrate–insulin model: a physiological perspective on the obesity pandemic. Am. J. Clin. Nutr. https://doi.org/10.1093/ajcn/nqab270 (2021).
Hall, K. D. A review of the carbohydrate–insulin model of obesity. Eur. J. Clin. Nutr. https://doi.org/10.1038/ejcn.2016.260 (2017).
Meier, J. J. & Nauck, M. A. Glucagon-like peptide 1 (GLP-1) in biology and pathology. Diabetes Metab. Res. Rev. https://doi.org/10.1002/dmrr.538 (2005).
Pederson, R. A. & McIntosh, C. H. S. Discovery of gastric inhibitory polypeptide and its subsequent fate: personal reflections. J. Diabetes Investig. https://doi.org/10.1111/jdi.12480 (2016).
Katsuura, G., Asakawa, A. & Inui, A. Roles of pancreatic polypeptide in regulation of food intake. Peptides https://doi.org/10.1016/s0196-9781(01)00604-0 (2002).
Kelesidis, T., Kelesidis, I., Chou, S. & Mantzoros, C. S. Narrative review: the role of leptin in human physiology: emerging clinical applications. Ann. Intern. Med. 152, 93–100 (2010).
Kim, S. H. & Reaven, G. M. Insulin resistance and hyperinsulinemia. Diabetes Care https://doi.org/10.2337/dc08-0045 (2008).
Xin, Y. et al. Elevated free fatty acid level is associated with insulin-resistant state in nondiabetic Chinese people. Diabetes Metab. Syndr. Obes. 12, 139–147 (2019).
Chen, R. et al. Personal omics profiling reveals dynamic molecular and medical phenotypes. Cell 148, 1293–1307 (2012).
Piening, B. D. et al. Integrative personal omics profiles during periods of weight gain and loss. Cell Syst. https://doi.org/10.1016/j.cels.2017.12.013 (2018).
Gao, P. et al. Precision environmental health monitoring by longitudinal exposome and multi-omics profiling. Genome Res. 32, 1199–1214 (2022).
Dunn, J. et al. Wearable sensors enable personalized predictions of clinical laboratory measurements. Nat. Med. 27, 1105–1112 (2021).
Price, N. D. et al. A wellness study of 108 individuals using personal, dense, dynamic data clouds. Nat. Biotechnol. 35, 747–756 (2017).
Burton-Pimentel, K. J. et al. Discriminating dietary responses by combining transcriptomics and metabolomics data in nutrition intervention studies. Mol. Nutr. Food Res. 65, e2000647 (2021).
Contrepois, K. et al. Molecular choreography of acute exercise. Cell 181, 1112–1130.e16 (2020).
DiNicolantonio, J. J. & O’Keefe, J. H. Effects of dietary fats on blood lipids: a review of direct comparison trials. Open Heart 5, e000871 (2018).
Mohd Azmi, N. A. S. et al. Cortisol on circadian rhythm and its effect on cardiovascular system. Int. J. Environ. Res. Public Health 18, 676 (2021).
Stachowicz, M. & Lebiedzińska, A. The effect of diet components on the level of cortisol. Eur. Food Res. Technol. https://doi.org/10.1007/s00217-016-2772-3 (2016).
Cao, W. et al. Drug–drug interactions between salvianolate injection and aspirin based on their metabolic enzymes. Biomed. Pharmacother. 135, 111203 (2021).
Watson, E. J., Coates, A. M., Kohler, M. & Banks, S. Caffeine consumption and sleep quality in Australian adults. Nutrients 8, 479 (2016).
Clark, I. & Landolt, H. P. Coffee, caffeine, and sleep: a systematic review of epidemiological studies and randomized controlled trials. Sleep Med. Rev. 31, 70–78 (2017).
Li, X. et al. Digital health: tracking physiomes and activity using wearable biosensors reveals useful health-related information. PLoS Biol. 15, e2001402 (2017).
Fishbein, A. B., Knutson, K. L. & Zee, P. C. Circadian disruption and human health. J. Clin. Investig. https://doi.org/10.1172/jci148286 (2021).
Patke, A., Young, M. W. & Axelrod, S. Molecular mechanisms and physiological importance of circadian rhythms. Nat. Rev. Mol. Cell Biol. 21, 67–84 (2020).
Wu, G., Anafi, R. C., Hughes, M. E., Kornacker, K. & Hogenesch, J. B. MetaCycle: an integrated R package to evaluate periodicity in large scale data. Bioinformatics 32, 3351–3353 (2016).
Hughes, M. E., Hogenesch, J. B. & Kornacker, K. JTK_CYCLE: an efficient nonparametric algorithm for detecting rhythmic components in genome-scale data sets. J. Biol. Rhythms https://doi.org/10.1177/0748730410379711 (2010).
Clair, G. et al. Lipid Mini-On: mining and ontology tool for enrichment analysis of lipidomic data. Bioinformatics 35, 4507–4508 (2019).
Chua, E. C.-P. et al. Extensive diversity in circadian regulation of plasma lipids and evidence for different circadian metabolic phenotypes in humans. Proc. Natl Acad. Sci. USA 110, 14468–14473 (2013).
Alavi, A. et al. Real-time alerting system for COVID-19 and other stress events using wearable data. Nat. Med. 28, 175–184 (2022).
Mishra, T. et al. Pre-symptomatic detection of COVID-19 from smartwatch data. Nat. Biomed. Eng. 4, 1208–1220 (2020).
Marabita, F. et al. Multiomics and digital monitoring during lifestyle changes reveal independent dimensions of human biology and health. Cell Syst. 13, 241–255.e7 (2022).
Miller, I. J. et al. Real-time health monitoring through urine metabolomics. NPJ Digit. Med. 2, 109 (2019).
Hall, H. et al. Glucotypes reveal new patterns of glucose dysregulation. PLoS Biol. 16, e2005143 (2018).
Bizzarri, M. et al. A call for a better understanding of causation in cell biology. Nat. Rev. Mol. Cell Biol. 20, 261–262 (2019).
Shomali, N. et al. Harmful effects of high amounts of glucose on the immune system: an updated review. Biotechnol. Appl. Biochem. 68, 404–410 (2021).
Bitar, A., Mastouri, R. & Kreutz, R. P. Caffeine consumption and heart rate and blood pressure response to regadenoson. PLoS ONE 10, e0130487 (2015).
Rodriguez-Araujo, G. et al. Alpha-synuclein elicits glucose uptake and utilization in adipocytes through the Gab1/PI3K/Akt transduction pathway. Cell. Mol. Life Sci. 70, 1123–1133 (2013).
Wijesekara, N. et al. α-Synuclein regulates peripheral insulin secretion and glucose transport. Front. Aging Neurosci. 13, 665348 (2021).
Buckingham, B. et al. CGM-measured glucose values have a strong correlation with C-peptide, HbA1c and IDAAC, but do poorly in predicting C-peptide levels in the two years following onset of diabetes. Diabetologia 58, 1167–1174 (2015).
Paniagua-González, L. et al. Comparison of conventional dried blood spots and volumetric absorptive microsampling for tacrolimus and mycophenolic acid determination. J. Pharm. Biomed. Anal. 208, 114443 (2022).
Andersen, I. K. L., Rosting, C., Gjelstad, A. & Halvorsen, T. G. Volumetric absorptive MicroSampling vs. other blood sampling materials in LC–MS-based protein analysis—preliminary investigations. J. Pharm. Biomed. Anal. https://doi.org/10.1016/j.jpba.2018.04.036 (2018).
Lancaster, S. M. et al. Global, distinctive, and personal changes in molecular and microbial profiles by specific fibers in humans. Cell Host Microbe 30, 848–862.e7 (2022).
Bararpour, N. et al. DBnorm as an R package for the comparison and selection of appropriate statistical methods for batch effect correction in metabolomic studies. Sci. Rep. https://doi.org/10.1038/s41598-021-84824-3 (2021).
Johnson, W. E., Li, C. & Rabinovic, A. Adjusting batch effects in microarray expression data using empirical Bayes methods. Biostatistics 8, 118–127 (2007).
Shen, X. et al. metID: an R package for automatable compound annotation for LC–MS-based data. Bioinformatics https://doi.org/10.1093/bioinformatics/btab583 (2022).
Sumner, L. W. et al. Proposed minimum reporting standards for chemical analysis. Metabolomics https://doi.org/10.1007/s11306-007-0082-2 (2007).
Contrepois, K. et al. Cross-platform comparison of untargeted and targeted lipidomics approaches on aging mouse plasma. Sci. Rep. 8, 17747 (2018).
Shah, J. S. et al. Distribution based nearest neighbor imputation for truncated high dimensional data with applications to pre-clinical and clinical metabolomics studies. BMC Bioinformatics 18, 114 (2017).
Bahmani, A. et al. A scalable, secure, and interoperable platform for deep data-driven health management. Nat. Commun. 12, 5757 (2021).
Xu, T. et al. CancerSubtypes: an R/Bioconductor package for molecular cancer subtype identification, validation and visualization. Bioinformatics https://doi.org/10.1093/bioinformatics/btx378 (2017).
Kumar, L. & Futschik, M. E. Mfuzz: a software package for soft clustering of microarray data. Bioinformation 2, 5–7 (2007).
Schwämmle, V. & Jensen, O. N. A simple and fast method to determine the parameters for fuzzy c-means cluster analysis. Bioinformatics 26, 2841–2848 (2010).
Shen, X., Wang, C. & Snyder, M. P. massDatabase: utilities for the operation of the public compound and pathway database. Bioinformatics 38, 4650–4651 (2022).
Shen, X. et al. TidyMass an object-oriented reproducible analysis framework for LC–MS data. Nat. Commun. 13, 4365 (2022).
Gu, Z. & Hübschmann, D. simplifyEnrichment: a Bioconductor package for clustering and visualizing functional enrichment results. Genomics Proteomics Bioinformatics https://doi.org/10.1016/j.gpb.2022.04.008 (2022).
Lehallier, B. et al. Undulating changes in human plasma proteome profiles across the lifespan. Nat. Med. 25, 1843–1850 (2019).
Glynn, E. F., Chen, J. & Mushegian, A. R. Detecting periodic patterns in unevenly spaced gene expression time series using Lomb–Scargle periodograms. Bioinformatics 22, 310–316 (2006).
Acknowledgements
M.P.S. discloses support for the research described in this study from the National Institutes of Health (grant numbers 5RM1HG00773508 and 5R01AT01023204). Special thanks to Benjamin Rolnik for his various contributions to sample collection and the funding application for this study.
Author information
Authors and Affiliations
Contributions
R.K., D.H., X.S. and M.P.S. conceived and designed the study; D.H., B.L.-M., K.E.C., B.M., I.S., S.A., A.G., K.C. and R.K. prepared samples and acquired lipidomics, metabolomics and proteomics data; Y.R.-H. prepared samples and generated Luminex data. D.J.P., N.B. and X.S. performed the stability analysis. X.S. and R.K. analysed the data of the Ensure shake study; X.S., R.K., J.U., A.D. and C.W. analysed the data of the 24/7 study. X.S. and C.W. developed the laggedCor algorithm and built the R package. X.S., C.W. and D.J.P. prepared all the figures. X.S., R.K., N.B., D.H., D.J.P., C.W. and M.P.S. wrote the manuscript. All the authors contributed to the final version of the manuscript.
Corresponding author
Ethics declarations
Competing interests
M.P.S. is a co-founder and scientific advisor of Personalis, SensOmics, Qbio, January AI, Fodsel, Filtricine, Protos, RTHM, Iollo, Marble Therapeutics and Mirvie. He is a scientific advisor of Genapsys, Jupiter, Neuvivo, Swaza and Mitrix. D.H. has a financial interest in Seer Inc. and Prognomiq Inc. R.K. is a co-founder of RTHM Inc. All other authors declare no competing interests.
Peer review
Peer review information
Nature Biomedical Engineering thanks Wei Gao and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 Stability analysis of the microsampling approach.
a, The study design of the protein, metabolite, and lipid stability analyses in microsamples. b, The distribution of partial R² for proteins, metabolites, and lipids. c–e, The protein, metabolite, and lipid most affected by storage duration (c), temperature (d) and the interaction effect (e). The icons used in this figure are from iconfont.cn.
Extended Data Fig. 2 Metabolic phenotyping separates samples and subjects.
a, t-SNE plot using all samples from all the participants. Colors represent the participants. b, t-SNE plots for 6 participants. Colors represent the timepoints, which are also labeled on the plot. c, Silhouette plots for consensus clustering with group numbers 2, 3, 4, and 5. The silhouette width is highest when the group number is 2, so the group number is set to 2 for subsequent analysis. d, Heatmap showing differential clustering of molecular features in various samples compared to baseline (0 min) for each participant. Green represents low distance, and red represents high distance. e, The SSPG values for participants (13 participants only) in group 1 and group 2.
Extended Data Fig. 3 Wearable and multi-omics data can reflect the individual’s health status.
a, The Spearman correlations between all the clusters obtained from all molecules in the 24/7 study using fuzzy c-means clustering. b, Mosaic plot showing the molecule classes for the 11 clusters. c, The maximum modularity observed in our correlation network community analysis for cluster 1 was 0.689 at iteration 72 of community pruning. d, Module detection from the correlation network. Molecules that have more connections inside than outside are grouped as a module. e, Cluster 1 and Module 1_4 from it. f, Peak detection from the module using the peak detection algorithm. g, Heatmap showing the association between modules and nutrition. h, MS2 spectra matching plots for 1,2,3-benzenetriol sulfate, hydroxyphenyllactic acid, and salicylic acid, respectively.
Extended Data Fig. 4 Wearable and multi-omics data can reflect the individual’s health status.
a, The correlation plot between caffeine intensity and sleep score. b, The MS2 spectra matching plot for caffeine. c, Molecules that were upregulated from Wednesday to Friday. d, Molecules that were downregulated from Wednesday to Friday.
Extended Data Fig. 5 Circadian rhythm analysis for internal molecules.
a, Consistency scores versus circadian rhythm p-values (−log10). b, Heatmap showing all the circadian molecules. c, Spearman correlation plot showing the correlations between the 5 clusters of circadian molecules. d, The components of all 5 clusters. e, Cluster 4 contains 1 cytokine, 3 lipids, 2 metabolites, and 22 proteins. f, Cluster 3 contains 76 lipids, 1 metabolic panel, and 7 metabolites.
Extended Data Fig. 6 Lipid enrichment results for lipids in clusters 1–3.
Red represents cluster 1, dark green represents cluster 2 and purple represents cluster 3.
Extended Data Fig. 7 Wearable data and internal molecule causal association network.
a, Example association network between wearable data and internal molecules. b, Node and edge distribution of association network.
Supplementary information
Main Supplementary Information
Supplementary figures, tables and note.
Supplementary data
Data for the case studies.
Source data
Source data for Figs. 1–6 and Extended Data Figs. 1–7.
Source data for the main figures and the extended data figures.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Shen, X., Kellogg, R., Panyard, D.J. et al. Multi-omics microsampling for the profiling of lifestyle-associated changes in health. Nat. Biomed. Eng (2023). https://doi.org/10.1038/s41551-022-00999-8
DOI: https://doi.org/10.1038/s41551-022-00999-8