Skip to main content

Thank you for visiting You are using a browser version with limited support for CSS. To obtain the best experience, we recommend you use a more up to date browser (or turn off compatibility mode in Internet Explorer). In the meantime, to ensure continued support, we are displaying the site without styles and JavaScript.

Real-time health monitoring through urine metabolomics


Current healthcare practices are reactive and based on limited physiological information collected months or years apart. By enabling patients and healthy consumers access to continuous measurements of health, wearable devices and digital medicine stand to realize highly personalized and preventative care. However, most current digital technologies provide information on a limited set of physiological traits, such as heart rate and step count, which alone offer little insight into the etiology of most diseases. Here we propose to integrate data from biohealth smartphone applications with continuous metabolic phenotypes derived from urine metabolites. This combination of molecular phenotypes with quantitative measurements of lifestyle reflect the biological consequences of human behavior in real time. We present data from an observational study involving two healthy subjects and discuss the challenges, opportunities, and implications of integrating this new layer of physiological information into digital medicine. Though our dataset is limited to two subjects, our analysis (also available through an interactive web-based visualization tool) provides an initial framework to monitor lifestyle factors, such as nutrition, drug metabolism, exercise, and sleep using urine metabolites.


Current medical practice is reactive. Annual checkups measure only a few basic phenotypes and often fail to predict serious health threats such as cancer, dementia, or exposure to pathogens. Instead, most disease is not detected until critical symptoms present, which is often too late for meaningful or cost-effective intervention. Owing to this lack of data, the current model of healthcare is periodic and geared to manage disease symptoms at their onset rather than preventing or reversing the underlying etiology. Humans would undoubtedly benefit from integrated technology to quantify and monitor deviations from baseline wellness using physiological phenotypes.1 Yet access to actionable information on personal physiological health remains limited.

There are currently two avenues for continuous monitoring of health and disease: (1) consumer-grade wearables and (2) clinical-based precision medicine. Wearable devices such as smart watches are broadly accessible and increasingly popular as consumer products.2,3 Data from these devices has the advantage of being continuously and passively collected, triggering wide scale adoption. Many companies have since devoted significant resources to leverage tools in big data and artificial intelligence (AI) to provide actionable insights from these popular products.4 For instance, Apple (CA, USA) has recently received FDA approval to provide users with alerts to detect atrial fibrillation.5 This diagnostic capability was made possible by widespread consumer participation, which provided expansive datasets to train AI models. The Apple Heart Study involved roughly 400,000 participants and models constructed from this initial dataset were validated with a clinical trial involving approximately 600 participants.5

Given sufficiently large datasets, heart rate information alone can suggest the onset of diverse disease processes.6 However, this type of data offers little information on the origins, mechanisms, and progression of disease. For instance, while an elevated resting heart rate may indicate a number of adverse medical events, including an infection,6 such data is not able to distinguish between bacterial and viral infections. This lack of mechanistic information leaves patients and health care providers unable to implement targeted therapeutic intervention and, in this case, antibiotic stewardship.

On the other end of the spectrum of longitudinal monitoring are tools for clinically-based precision medicine. These include deep genome sequencing and integration with multidimensional clinical phenotypes such as transcriptomics, proteomics, metabolomics, and metagenomics datasets.7,8,9 There are a number of large-scale efforts underway to provide multi-omic phenotyping for large cohorts, such as the Pioneer 100 Wellness Project1,10 and the NIH All of Us program.11 While these initiatives have proven to successfully leverage diverse physiological datasets to enable meaningful intervention,1 they remain hampered by their relative inaccessibility and periodic nature. In other words, while high quality data provides clinically actionable insights, it is expensive, invasive, and difficult to collect, resulting in collections on the scale of months rather than days.1

As Leroy Hood and others have long proposed, modern medicine will only be truly effective once it is has transitioned from reactive disease care to a framework that is “predictive, preventive, personalized, and participatory”.12 To combine the accessibility of wearable devices with the robustness and quality of clinical medicine, a third option is needed; one that provides quantitative measurements of health and mechanistic insights into the origins and progression of disease. We hypothesize that real-time metabolic phenotyping (i.e., metabolomics) using urine could fill this void by providing a quantitative fingerprint of metabolic health along with information about exposure to toxins, drugs, and pathogens.13 In theory, continuous metabolic measurements could be collected at home and in the workplace, providing molecular insights into underlying disease processes, such as distinguishing between patients with related strains of infectious bacteria,14 as well as quantifying the effect of lifestyle decisions on health and disease. Lifestyle factors such as nutrition, alcohol and tobacco usage, sleep, and physical activity are well known to contribute to the risk for chronic disease, which costs the United States alone $2.97 Trillion a year, or 90% of all healthcare expenditures.15 By empowering consumer participation with actionable information and the classification of disease using a continuum of molecular phenotypes rather than discrete clinical symptoms,12 the cost and efficacy of healthcare could be dramatically improved.

While a number of biological matrices, including saliva and blood, could be used as a source of metabolic information, urine offers some key advantages as it can be easily collected passively, non-invasively, and longitudinally.16 Urine is a rich source of cellular metabolites, most stemming from filtration of blood in the kidneys, which excrete about a half cup of blood every minute.17 Urine has long been recognized as a rich fluid for medical diagnostics and presently many clinical assays are performed on this biological fluid.18,19,20 Approximately 4500 metabolites have been documented in urine,19,21 showing connections to approximately 600 human conditions18,19 including but not limited to: obesity,22 cancer,23 inflammation,24 neurological disease,25 and infectious disease.14 Further, pregnancy, ovulation, urinary tract infection, diet, and exercise induce metabolomic signatures that can be observed in urine.20 Finally, many drugs and their metabolites are readily detected from urine, presenting the opportunity for dosage tailored to the individual and monitoring compliance, as well as effective stratification for clinical trials, which can greatly reduce the cost of pharmaceutical development.12,13

Over the course of 10 days, we collected every urine sample from two healthy individuals and tracked hundreds of urine metabolites using gas chromatography and mass spectrometry (GC-MS) along with other biometric data provided by nutritional and fitness smartphone applications. Though other studies have measured the concentrations of urine metabolites from larger populations over time with other physiological phenotypes,20,26 we are unaware of any studies with the time resolution or smartphone data integration we present here. Our aim was to explore this combination of smartphone and metabolomics data as a means to understand the biological consequences of lifestyle in real time.


Urine metabolites provide distinct and continuous metabolic phenotypes

We collected 109 urine samples (50 for Subject 1 and 59 for Subject 2) over 10 days along with biometric measurements provided by smartphone and smartwatch applications, including those for nutrition, exercise, and sleep (Fig. 1a and Supplementary Table 1). Urine samples were lyopholized, resuspended, and derivatized with a solution containing N-Methyl-N-(trimethylsilyl)trifluoroacetamide, and subsequently analyzed with high resolution Gas Chromatography-Fourier Transform Mass Spectrometry (GC-FTMS). The resulting data were deconvoluted with previously described software,27,28 which detected and provided relative quantitative values for 603 metabolite features across the 109 individual urine samples. Of these 603 metabolite features, 101 metabolites were annotated/identified based on spectral matching (see Methods section). An additional 24 features were able to identified as carbohydrates, but were not assigned a specific molecular structure due to the highly similar fragmentation patterns for certain sugars, consistent with a Level 3 assignment according to the standards proposed by the Metabolomics Standards Initiative.21,29,30

Fig. 1

GC-MS metabolomics provides continuous and distinct metabolic phenotypes for two healthy individuals. a Every urine sample (109 total; Subject 1 [red], n = 50; Subject 2 [blue], n = 59), along with biohealth data from smartphone applications, was collected for 10 days. Samples were dried down, derivatized with n-methyl-n-(trimethylsilyl)trifluoroacetamide, and analyzed with high resolution GC-Fourier Transform Mass Spectrometry (FTMS). b Deconvolution and quantification with in-house software27,28 provide time series profiles for 603 metabolite features. c The log2-intensity of a metabolite feature identified as dihydroferulic acid, a known endogenous urine metabolite, revealed different baseline concentrations for Subject 1 and Subject 2, compared to a hypothetical range for the general population. d Scores plot from principal component analysis (PCA) based on log2-normalized intensity values shows clear separation between Subject 1 and Subject 2. Each point represents a sample and is colored by Subject. e PCA loadings plot where each point represents a metabolite feature. Explained variance values of PC1 and PC2 are represented as a percentage in parentheses. An interactive version of d and e are provided in the companion webtool

Each one of these 603 metabolite features provided a continuous metabolic phenotype for both subjects. Compounds, such as dihydroferulic acid—a metabolite of phenolic compounds and known endogenous compound in urine31—showed different baseline levels for Subject 1 and Subject 2 (Fig. 1b, c). Though this study was not designed for absolute quantification, the ability to detect different relative baseline metabolite levels for two health subjects (as well as deviations therefrom) offers a proof of principle for continuous urine analysis enabling personalized medicine. For instance, given the higher and tighter distribution of dihydroferulic acid concentrations for Subject 1 compared to Subject 2, deviations from either of these individual baselines would likely be more clinically meaningful than deviations from a hypothetical normal range established for the general population (Fig. 1c).

Principal component analysis of the log2-transformed normalized metabolite intensity values showed clear separation between subject samples across PC2 (Fig. 1d). An analysis of the corresponding loadings plot of PC2 (Fig. 1e) shows that among the top metabolite features by (absolute value of) rank in the loadings on PC2, the top three metabolites that were readily identified based on spectral matching included dihydroferulic acid, histidine, and phenoxyacetic acid. While the average volume of samples from Subject 1 was significantly higher than that of Subject 2 (Welch’s Two Sample t-test, t = 7.8739, dof = 81.614, p = 1.275e−11), neither PC1 nor PC2 reflect systematic differences in sample volume, run order, or batch number (Supplementary Fig. 1), suggesting that the separation in these two planes is driven by biological rather than technical variance. The clear distinction between samples derived from two healthy subjects is not necessarily surprising given that other urine metabolomics studies have shown the ability of unsupervised techniques to resolve a wide range of biological traits and clinical phenotypes.18

Analysis of literature disease associations

Each identified metabolite was searched against the synonyms in the Human Urine Metabolome Database (HMDB) ( For each metabolite with a matching synonym (75/101), features of interest such as chemical taxonomy and associated diseases (based on previous literature mining efforts19) were tabulated (Supplementary Dataset 1). These compounds covered a range of chemical classes, as defined in HMDB as chemical “Super Class,” but had a higher fraction of organic acids compared to metabolite classes represented by the entire HMDB database (Fig. 2a). This difference in chemical compositions is likely a result of minimal sample processing and the use of GC-MS,18,19 which favors volatile and typically lower molecular weight compounds (see Methods section for more details). In total, 65 metabolites with corresponding HMDB data entries had some type of literature association with diseases, 48 of which had connections to various forms of cancer and 19 with Alzheimer’s Disease (Supplementary Dataset 1).

Fig. 2

There are 4240 known urine metabolites in the human metabolome database (HMDB), 1424 of which have literature associations to a diverse set of human conditions. a Pie chart based on counts of HMDB chemical taxonomy (Super Class) for all metabolites identified in this study (n = 101). b Pie chart based on counts of each HMDB chemical taxonomy for metabolites all of HMDB. c Treemap of diseases, scaled by number of metabolites with literature associations

A broader analysis of the 4240 metabolites available in the downloadable database shows that 1424 of these compounds have disease associations in a least one piece of literature. Diseases and conditions ranged from having one (e.g., Cervical Cancer) to 586 associated metabolites (Obesity) (Fig. 2b and Supplementary Dataset 2). Although these connections are not clinically validated biomarkers of disease, they may suggest potential for applications of continuous urine analysis in digital health and personalized medicine. In fact, many of the diseases that the Center for Disease Control (CDC) lists as leading causes of death in the United States, such as cancer, diabetes, and kidney disease,32 have associations to metabolites that can be detected in urine. Collectively the conditions associated with chronic disease create trillions of dollars in cost to the health care system in the United States alone. Many of these diseases are associated with lifestyle factors, such as tobacco, and alcohol usage, and lack of physical activity.32

Metabolite levels reflect nutrition, lifestyle, physical activity, and sleep

To explore connections between urine metabolite concentrations and others measures of health and lifestyle, various biometric data were collected contemporaneously using smartphone applications (see Methods section). Both subjects collected nutritional data and Subject 1 collected further data on exercise and sleep (Supplementary Table 1 and Supplementary Dataset 3). Due to the disparate time scale for which this data was collected (urine samples were collected four to eight times per day as they were generated, while data such as sleep only provides one time point per day), and to control for diurnal variability (explored further below), daily average metabolite intensities were correlated with biometric values using repeated measures correlation33 (where data is available for both subjects) and Robust Spearman’s Correlation34 (where data is available only for Subject 1) (see Methods section, Supplementary Datasets 3 and 4). Interestingly, there are two predominant groups of metabolites that are (1) positively (see upper half of heatmap) and (2) negatively (see lower half of heatmap) correlated with caloric and nutrient intake (Fig. 3a). Perhaps the former represent compounds that are either food-derived or linked to energy metabolism, whereas the latter represent endogenous compounds that belong to metabolic pathways separate from energy metabolism. While an interactive web-based tool to further explore and visualize these correlations is available, (see Methods and Code availability sections for info on data and source code availability), here, we highlight a few putative connections between metabolite concentrations and over the counter (OTC) medication usage, coffee and alcohol consumption, exercise, and sleep (Fig. 3).

Fig. 3

Urine metabolites reflect patterns of health and lifestyle. a Heatmap whereby each cell represents the strength of correlation between daily average metabolite intensity and each biometric data collected from smartphone apps. Repeated measures correlation33 was used where data was available for both subjects whereas Robust Spearman’s Correlation34 was used for data only available for Subject 1 (i.e., exercise and sleep). b An example correlation (r = 0.812, p = 1.32e−4, q = 0.0120, dof = 14) between alcohol consumption (in kcals) and a carbohydrate compound (“carbohydrate 6”), most likely representing xylitol. An interactive version of b is provided in the companion webtool

Subject 1 drank coffee approximately twice a day (at 7 a.m. and 9 a.m.), and Subject 2 drank coffee every morning around 8 a.m. except for on August 25 and August 27, 2018. These consumption patterns are consistent with our measurements of compounds with known associations to coffee consumption (Supplementary Fig. 2). Furoylglycine, a biomarker of coffee consumption,35,36 and its corresponding intensity was consistent with notes of when coffee was consumed each day (Supplementary Fig. 2). While the repeated measures correlation coefficient was low (r = 0.617) and did not reach statistical significance (p = 0.0679, q = 0.199, dof = 14), the daily average intensity value for quinic acid, another known metabolite from coffee,36 tracked much more quantitatively with coffee consumption (Supplementary Fig. 3; r = 0.787, p = 2.93e−4, q = 0.0877, dof = 14).

A metabolite that was putatively identified by spectral database searching software (see Methods section for details on metabolite identification) as a carbohydrate compound (termed “carbohydrate 6” in Supplementary Datasets 1, 3, and 4) was well correlated with alcohol consumption (as measured in kcal) (r = 0.812, p = 1.32e−4, q = 0.0120, dof = 14, Fig. 3b). These calories from alcohol include a variety of alcohol types (beer, wine, whiskey, gin, tequila, cognac, and vermouth) and thus are more likely to reflect ethanol consumption rather than a compound specific to a certain type of beverage. Further manual inspection of the spectral matches to this unannotated metabolite feature suggested that it is most likely a sugar alcohol, with the highest dot product score to xylitol. Follow up analysis using a standard of ethyl-glucuronide, an established metabolite and biomarker of ethanol consumption,37 was then analyzed and added to our in-house spectral database (see Methods section). A separate (initially unidentified) metabolite feature at a retention time (RT) of 14.842677 and m/z 217.1075485 was subsequently identified as ethyl-glucuronide Supplementary Fig. 4). While the repeated measures correlation coefficient for ethyl-glucuronide and alcohol consumption was lower (r = 0.657, p = 5.70e−3, q = 0.0504, dof = 14) than for the putative sugar alcohol (“carbohydrate 6”), this discrepancy is likely a result of the nature of the metabolite’s pharmacokinetics; ethyl glucuronide has a longer half-life than ethanol.37

Subject 2 reported taking acetaminophen at 8:45 p.m. on August 25, 2018. A spike in an ion intensity consistent with acetaminophen shows a corresponding increase observed in next sample in sequence, which was collected at 7:15 a.m. on August 26, 2018 (Supplementary Fig. 5). No such spike was observed for Subject 1, who did not record taking any acetaminophen throughout the collection of these samples.

In addition to nutritional information collected for Subject 1 and Subject 2, data on physical activity and sleep was collected for Subject 1. Hypoxanthine, a degradative purine product resulting from ATP breakdown in muscle tissue during exercise,38,39 correlated with physical activity (r = 0.833, p= 0.0102, q = 0.471, n = 8; Supplementary Fig. 6 and Supplementary Dataset 4). Sleep was anticorrelated with hydrocaffeic acid (r = −0.857, p = 0.0137, q = 0.551, n = 8; Supplementary Fig. 7 and Supplementary Dataset 4), a metabolite of caffeic acid,36,40 which in turn has been shown to affect sleep latency in rats.40 However, neither of these metabolite correlations passed the significance threshold after multiple hypothesis correction (i.e., q >> 0.05).

Analysis and modeling considerations for high time resolution metabolomic data

Although the correlation analysis presented above is based on daily average metabolite intensities (given the disparate, non-paired time scales for the corresponding biometric/nutritional data points), a linear regression model using time of the day as a predictor of metabolite intensity (see Methods section) established diurnal effects for 268 metabolite features (Fig. 4a). Eighty three metabolites had a time of day effect for both subjects, whereas 87 and 98 metabolites exhibited a time of day effect that was specific to Subject 1 and Subject 2, respectively (Fig. 4a). Metabolites exhibiting this effect were then separated into three groups based on which time of day showed the greatest effect (calculated here by median z-scores for each time group) (Fig. 4b, c). For instance, the upper left panel of Fig. 4c shows the subset of metabolites varying the most in the morning for Subject 1. For this group of metabolites (amounting to 104/170 metabolites for Subject 1), most are higher in the morning and then decrease over the afternoon and into the evening. For Subject 2, metabolite deviations were more evenly distributed across the day (Fig. 4b). Though it is not currently clear what behaviors or underlying biology drive these distinct and time-dependent patterns, it is possible that they originate from differences in dietary habits, age, or physical activity. While Subject 2 did not collect any data on physical activity using a smartphone application, they noted a lack of organized physical activity (e.g., running, weight lifting, etc.) during the period of this study.

Fig. 4

Daily metabolite fluctuations can be shared, distinct, or subject specific. a A subset of metabolite features for Subject 1 (n = 170) and Subject 2 (n = 181) were significantly (p < 0.05) affected by time of the day based on linear regression analysis (see Methods section). b While 83 metabolites were affected by time of the day for both Subjects, 87 and 98 were uniquely affected for Subject 1 and Subject 2, respectively. The majority of metabolites for Subject 1 had the greatest deviation from baseline in the morning, whereas metabolite deviations were more evenly distributed across the day for Subject 2. c Median Z-scores of metabolite concentrations throughout the course of the day. Metabolites with the greatest deviations in the morning appear to stabilize by evening (top row), while metabolites with an evening effect (bottom row) are close to baseline concentrations before spiking up or down in the evening. Metabolites with greatest deviations in the afternoon (middle row) peak in the afternoon and tend to reverse course by evening

Because metabolite concentrations in urine are less directly subjected to the strong homeostatic forces of other biological matrices (such as blood), variance in metabolite concentrations due to diet, lifestyle, and time of the day may be more pronounced.20 Thus, in light of our analysis, as well as that of other groups,26,41 there may be better times of the day to measure the effects of factors such as sleep, exercise, or nutrition. Furthermore, optimal sampling time and frequency will likely vary by individual. Thus, at a minimum, it is worth considering diurnal effect (or temporal-behavioral effects) as a confounding variable in more sophisticated modeling approaches with larger, future datasets.

To systematically account for the effect of time dependency and to avoid averaging (and the resulting loss of information, biological variance, and statistical power), we recommend future studies use smartphone applications that record high resolution timestamps for exercise, heart rate, nutritional data, etc. that will allow for more powerful mixed effect linear models.42 Using a moving average with a smaller interval (i.e., rather than daily average) of metabolite intensity may be another viable approach. However, setting the optimal interval for such a moving average will likely depend on the type of metabolite and the various biological or environmental factors affecting its turnover.


Over the course of 10 days, we measured continuous physiological phenotypes via urine metabolites. Multivariate analysis showed a clear distinction between samples derived from two healthy subjects, suggesting a distinct baseline metabolic fingerprint for each. We observed urine metabolites with known associations to human disease, daily metabolite fluctuations that were subject specific, and saw connections between lifestyle factors such as exercise, nutrition, sleep, and OTC drug usage. Taken together, our data suggest that urine analysis offers metabolic phenotypes that are both quantitative and highly personalized. The sampling of urine compared to other biofluids has a number of important tradeoffs to consider. Perhaps the most challenging variable to control for is the relative dilution of samples depending upon factors such hydration levels and physical exertion. In this study, we controlled for dilution by normalizing to the total ion current (TIC), reasoning that this method of normalization would be less biased compared to creatinine normalization, the reuptake of which can be affected by biological factors such as differences in muscle mass.43 The advantage of sampling urine, compared to other biofluids such as blood or saliva, is the passive and non-invasive nature of sample collection. We reason that broad consumer participation will depend upon the integration of technology that imposes minimal changes to lifestyle, such as with a smart watch; a device that is capable of collecting previously unavailable data without demanding any change in behavior from its user.

While healthcare consumer access to such metabolic phenotypes offers tremendous potential for personalized and preventive medicine, both subjects noted practical challenges in participating in this study. For instance, the burden of storing and transporting urine samples using a cooler full of dry ice may explain why few if any other studies have comparable time resolution (i.e., collecting every sample). Future studies and, certainly, user-friendly consumer products would benefit from a collection system that is integrated directly into a toilet.

Of course, designing and manufacturing a consumer-grade device that could effectively and affordably measure metabolites in urine presents many challenges. In this study we elected to utilize GC-MS as an analytical platform for several reasons. First, it is a mature and robust analytical technology that is widely used for quantitative metabolite analysis. Second, the method requires minimal sample amount and can analyze a complex mixture on a relatively rapid timescale with minimal solvent or chemical usage. And, third, GC-MS systems have already been miniaturized and several portable systems—roughly the size of a backpack44,45—are commercially available and could be adapted for integration into toilets for a household product. A “smart bathroom” would likely struggle to accommodate the consumables and complex solvent handling system required for liquid chromatography mass spectrometry (LC-MS). Similarly, the fundamental requirement of a powerful magnet for nuclear magnetic resonance (NMR) would preclude its ability to integrate into a household setting. Regardless of its underlying technology, such a device should be robust enough to withstand repeated urine analysis from multiple users, sensitive enough to simultaneously quantify tens, hundreds, or even thousands of metabolites, and affordable enough to reduce instead of add cost to the already overburdened healthcare system.2,3,46 In addition to the technological and economic challenges of building such a device, there are a wide range of ethical challenges in collecting, storing, sharing, and interpreting personalized metabolic information. Though these challenges will hinder development of such a biosensor, we present this dataset and an accompanying interactive web-based visualization tool to share our optimism. We believe that continuous urine metabolite analysis offers a promising opportunity to integrate with current digital technologies as an orthogonal layer of biomedical data to make modern medicine more predictive, preventive, personalized, and participatory.


Sample collection

Urine samples were collected midstream and volume was measured using sterile 500 mL plastic beakers from which urine was then decanted into cups provided in the BD Vacutainer® Urine Complete Cup Kit. Samples were then transferred into 8 mL urinalysis plus conical urine tubes. These vacuum sealed 8 mL tubes were either immediately deposited into −80 °C freezers or temporarily stored in dry ice using portable coolers overnight before being deposited into −80 °C freezers. Quantitative values for sample volume along with sample collection times are available in Supplementary Dataset 3. Original urine samples are of limited volume and the subject of ongoing research, but may be made available upon reasonable request to the corresponding author.

Ethics approval and consent to participate

Urine obtained in this study was donated by project participants and did not involve recruitment or enrollment of human subjects as defined by the University of Wisconsin IRB.

Sample preparation and analysis

All urine samples were prepared by sampling 100 µL into Thermo Scientific 300 µL Amber Vials with inserts and subsequently evaporated to dryness using a Thermo Scientific SpeedVac® Concentrator. Samples were then derivatized for gas chromatography analysis using a 50 µL solution of 1:1 pyridine: N-Methyl-N-(trimethylsilyl)trifluoroacetamide with 1% trimethylchlorosilane (chemicals obtained from Sigma Aldrich) and incubated at 60 °C for 30 min. Samples were then injected onto a Thermo Scientific Gas Chromatography-Fourier Transform Mass Spectrometry (GC-FTMS) Orbitrap using a temperature gradient starting at 100 °C (hold time of one minute), and increasing at a rate of 8.5 °C per minute until reaching 260 °C. The temperature gradient rate was then increased to 50 °C per minute until reaching a final temperature of 320 °C (hold time of 4 min). Split ratio was set to 10:1 with a carrier gas flow of 1.200 mL/min. The MS transfer line and ion source temperatures were set to 300 °C and 250 °C, respectively. The instrument scanned in Full MS-SIM mode at 30,000 resolution. The AGC target was set to 1.0e6 with a scan range of 50 to 650 m/z. Ionization mode was set to electron ionization (EI).

Raw files were subsequently processed using previously described software27,28 for deconvolution, peak alignment, quantitation, and identification. This software is freely available at along with a detailed, step-by-step user guide including screenshots of the user interface (see: “Y3K GC Quantitation Pipeline User Guide.pdf” in the aforementioned GitHub repository). Cutoffs for peak quantitation were set to a minimum fragment count of 10, minimum observation of a given peak across all files set to 33%, and analyte/background signal set to 10. Spectra were then matched against the unit resolution library curated by the National Institute of Standards and Technology (NIST), and a high resolution library developed in house in collaboration with Thermo Scientific. The output of this software is metabolite feature relative intensity values normalized to total ion current (TIC), which are available in Supplementary Dataset 3 (sheetname “SampleData”).

Preparation and analysis of ethyl glucuronide standard

Ethyl glucuronide standard (100 ug/mL, Sigma Aldrich) was dried down in an Amber autosampler vial using a SpeedVac® Concentrator. This standard was derivatized and analyzed as described above.

Biometric data collection

Nutritional data was recorded daily using the Lose It! App. For Subject 1, active calories were recorded with an Apple Watch Series 2 (Model A1758, software version 4.3.2 (15U70)) and hours of sleep were calculated using the Sleep Cycle App. Summary statistics are available in Supplementary Table 1.

Analysis of the Human Urine Metabolome Database (HMDB)

The XML version of the human urine metabolome database (urine_metabolites.xml) was downloaded on October 7, 2018. The various aggregated files and scripts used in this analysis are accessible in the associated GitHub repository (see Code availability section) and as Supplementary Datasets 1 and 2. For the purpose of visualization, broader disease categories, such as “Cancer” and “Inflammation” were curated manually (Supplementary Dataset 2). A treemap (Fig. 2b) was generated using the ggplot247 and treemapify48 modules in R.

Statistical analysis

In order to account for differences in urine sample volume (and thus dilution levels), metabolite intensity values were normalized to the total ion current (TIC) for the RAW files for each sample using previously described software.27,28 Principal component analysis was conducted on log2-transformed TIC-normalized intensity values using the decomposition.PCA() method in Python’s skickit learn module.49 Welch’s Two Sample t-test was performed in R using the t.test(var.equal = F) function. Repeated measures correlation33 was used to correlate daily log2-transformed average metabolite intensities with biometric values where biometric data were available for both subjects using the rm_corr() function in the Pingouin50 package in Python. Skipped (robust) Spearman’s rho was used to correlate log2-transformed daily average metabolite intensities with biometric values for data exclusively available for Subject 1 (i.e., physical activity and sleep) using corr(method = ’skipped’) from Pingouin.50 P-values from the repeated measures correlation, and Spearman’s rho were adjusted for multiple hypothesis testing (where the number of tests was considered the number of metabolite features observed [603]) using the Benjamini Hochberg false discovery rate (FDR) procedure via the fdr(method = ’fdr_bh’) method in Pingouin and are presented as q-values in the manuscript. All p-values from hypothesis testing are based on two-sided tests and degrees of freedom and/or n values are presented as they appear in the Results section. Further statistical details on correlation results can be found in Supplementary Dataset 4.

Linear regression analysis was used to test for diurnal effects on metabolite concentrations. Samples were binned by “morning” (6 a.m. to 12 p.m.), “afternoon” (12 p.m. to 6 p.m.), “evening” (6 p.m. to 12 a.m.), and “late night” (12 a.m. to 6 a.m.). Only two samples (both from Subject 1) were collected between 12 a.m. and 6 a.m. and were excluded from this analysis under the assumption that they represent outliers and would create an unbalanced group for regression analysis, which was performed using the linear_regression() function in Pingouin.50 An effect was considered significant if the reported p-value associated with the coefficient for the TimeOfDay designation (see Supplementary Dataset 2) was less than or equal to an alpha of 0.05.

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.

Data availability

A companion web tool is available via the following URL:, and provides an interactive visualization of Figs 1d, e and 3b. RAW data files were uploaded to the MassIVE database under the accession number: MSV000083880 ( Processed and normalized metabolite intensity values along with other relevant metadata are provided in Supplementary Datasets 14 in this study. These datasets are further available through bioRxiv (, where this article was previously published as a preprint.51

Code availability

All source code used for analysis in this study, including for the interactive data visualization tool, is available on GitHub: The source code, compiled executables, and step-by-step user guide for the software used for GC-MS spectral deconvolution, quantification, and annotation are also freely available on GitHub:


  1. 1.

    Price, N. D. et al. A wellness study of 108 individuals using personal, dense, dynamic data clouds. Nat. Biotechnol. 35, 747–756 (2017).

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  2. 2.

    Steinhubl, S. R., Muse, E. D. & Topol, E. J. The emerging field of mobile health. Sci. Transl. Med. 7, 283rv3 (2015).

    PubMed  PubMed Central  Article  Google Scholar 

  3. 3.

    Dunn, J., Runge, R. & Snyder, M. Wearables and the medical revolution. Per. Med. 15, 429–448 (2018).

    CAS  PubMed  Article  Google Scholar 

  4. 4.

    Beam, A. L. & Kohane, I. S. Translating artificial intelligence into clinical care. JAMA 316, 2368–2369 (2016).

    PubMed  Article  Google Scholar 

  5. 5.

    ECG app and irregular heart rhythm notification available today on Apple Watch. Apple Newsroom. Accessed 17 May 2019.

  6. 6.

    Li, X. et al. Digital health: tracking physiomes and activity using wearable biosensors reveals useful health-related information. PLoS Biol. 15, e2001402 (2017).

    PubMed  PubMed Central  Article  CAS  Google Scholar 

  7. 7.

    Chen, R. et al. Personal omics profiling reveals dynamic molecular and medical phenotypes. Cell 148, 1293–1307 (2012).

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  8. 8.

    Zhou, W. et al. Longitudinal multi-omics of host–microbe dynamics in prediabetes. Nature 569, 663–671 (2019).

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  9. 9.

    Schüssler-Fiorenza Rose, S. M. et al. A longitudinal big data approach for precision health. Nat. Med. 25, 792–804 (2019).

    PubMed  Article  CAS  Google Scholar 

  10. 10.

    Hood, L. & Price, N. D. Promoting wellness and demystifying disease: the 100K project. Clin. OMICs 1, 20–23 (2014).

    Article  Google Scholar 

  11. 11.

    National Institutes of Health (NIH)—All of us. Accessed 15 May 2019.

  12. 12.

    Flores, M., Glusman, G., Brogaard, K., Price, N. D. & Hood, L. P4 medicine: how systems medicine will transform the healthcare sector and society. Per. Med. 10, 565–576 (2013).

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  13. 13.

    Wishart, D. S. Emerging applications of metabolomics in drug discovery and precision medicine. Nat. Rev. Drug Discov. 15, 473–484 (2016).

    CAS  PubMed  Article  Google Scholar 

  14. 14.

    Lv, H., Hung, C. S., Chaturvedi, K. S., Hooton, T. M. & Henderson, J. P. Development of an integrated metabolomic profiling approach for infectious diseases research. Analyst 136, 4752–4763 (2011).

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  15. 15.

    About chronic diseases | CDC. Accessed 29 March 2019 (2019).

  16. 16.

    Wald, C. Diagnostics: a flow of information. Nature 551, S48–S50 (2017).

    CAS  PubMed  Article  Google Scholar 

  17. 17.

    Your kidneys and how they work | NIDDK. National Institute of Diabetes and Digestive and Kidney Diseases. Accessed 17 May 2019

  18. 18.

    Khamis, M. M., Adamko, D. J. & El-Aneed, A. Mass spectrometric based approaches in urine metabolomics and biomarker discovery. Mass Spectrom. Rev. 36, 115–134 (2017).

    CAS  PubMed  Article  Google Scholar 

  19. 19.

    Bouatra, S. et al. The human urine metabolome. PLoS ONE 8, e73076 (2013).

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  20. 20.

    Wu, J. & Gao, Y. Physiological conditions can be reflected in human urine proteome and metabolome. Expert Rev. Proteom. 12, 623–636 (2015).

    CAS  Article  Google Scholar 

  21. 21.

    Blaženović, I. et al. Structure annotation of all mass spectra in untargeted metabolomics. Anal. Chem. 91, 2155–2162 (2019).

    PubMed  Article  CAS  Google Scholar 

  22. 22.

    Salek, R. M. et al. A metabolomic comparison of urinary changes in type 2 diabetes in mouse, rat, and human. Physiol. Genom. 29, 99–108 (2007).

    CAS  Article  Google Scholar 

  23. 23.

    Yu, L. et al. Analysis of urinary metabolites for breast cancer patients receiving chemotherapy by CE-MS coupled with on-line concentration. Clin. Biochem. 46, 1065–1073 (2013).

    CAS  PubMed  Article  Google Scholar 

  24. 24.

    Alonso, A. et al. Urine metabolome profiling of immune-mediated inflammatory diseases. BMC Med. 14, 133 (2016).

    PubMed  PubMed Central  Article  CAS  Google Scholar 

  25. 25.

    Luan, H. et al. Comprehensive urinary metabolomic profiling and identification of potential noninvasive marker for idiopathic Parkinson’s disease. Sci. Rep. 5, 13888 (2015).

    PubMed  PubMed Central  Article  Google Scholar 

  26. 26.

    Kim, K. et al. Mealtime, temporal, and daily variability of the human urinary and plasma metabolomes in a tightly controlled environment. PLoS ONE 9, e86223 (2014).

    PubMed  PubMed Central  Article  CAS  Google Scholar 

  27. 27.

    Kwiecien, N. W. et al. High-resolution filtering for improved small molecule identification via GC/MS. Anal. Chem. 87, 8328–8335 (2015).

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  28. 28.

    Stefely, J. A. et al. Mitochondrial protein functions elucidated by multi-omic mass spectrometry profiling. Nat. Biotechnol. 34, 1191–1197 (2016).

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  29. 29.

    Schymanski, E. L. et al. Identifying small molecules via high resolution mass spectrometry: communicating confidence. Environ. Sci. Technol. 48, 2097–2098 (2014).

    CAS  PubMed  Article  Google Scholar 

  30. 30.

    Sumner, L. W. et al. Proposed minimum reporting standards for chemical analysis. Metabolomics 3, 211–221 (2007).

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  31. 31.

    Rechner, A. R., Spencer, J. P., Kuhnle, G., Hahn, U. & Rice-Evans, C. A. Novel biomarkers of the metabolism of caffeic acid derivatives in vivo. Free Radic. Biol. Med. 30, 1213–1222 (2001).

    CAS  PubMed  Article  Google Scholar 

  32. 32.

    Health and economic costs of chronic disease | CDC. (2018). Accessed 15 Jan 2019.

  33. 33.

    Bakdash, J. Z. & Marusich, L. R. Repeated measures correlation. Front. Psychol. 8, 456 (2017).

    PubMed  PubMed Central  Article  Google Scholar 

  34. 34.

    Pernet, C. R., Wilcox, R. & Rousselet, G. A. Robust correlation analyses: false positive and power validation using a new open source matlab toolbox. Front. Psychol. 3, 606 (2012).

    PubMed  Article  Google Scholar 

  35. 35.

    Heinzmann, S. S., Holmes, E., Kochhar, S., Nicholson, J. K. & Schmitt-Kopplin, P. 2-Furoylglycine as a candidate biomarker of coffee consumption. J. Agric. Food Chem. 63, 8615–8621 (2015).

    CAS  PubMed  Article  Google Scholar 

  36. 36.

    Ludwig, I. A., Clifford, M. N., Lean, M. E. J., Ashihara, H. & Crozier, A. Coffee: biochemistry and potential impact on health. Food Funct. 5, 1695–1717 (2014).

    CAS  PubMed  Article  Google Scholar 

  37. 37.

    Helander, A., Böttcher, M., Fehr, C., Dahmen, N. & Beck, O. Detection times for urinary ethyl glucuronide and ethyl sulfate in heavy drinkers during alcohol detoxification. Alcohol Alcoholism. 44, 55–61 (2009).

    CAS  PubMed  Article  Google Scholar 

  38. 38.

    Ketai, L. H., Simon, R. H., Kreit, J. W. & Grum, C. M. Plasma hypoxanthine and exercise. Am. Rev. Respiratory Dis. 136, 98–101 (1987).

    CAS  PubMed  Article  Google Scholar 

  39. 39.

    Sahlin, K., Ekberg, K. & Cizinsky, S. Changes in plasma hypoxanthine and free radical markers during exercise in man. Acta Physiol. Scand. 142, 275–281 (1991).

    CAS  PubMed  Article  Google Scholar 

  40. 40.

    Shinomiya, K. et al. Effects of chlorogenic acid and its metabolites on the sleep–wakefulness cycle in rats. Eur. J. Pharmacol. 504, 185–189 (2004).

    CAS  PubMed  Article  Google Scholar 

  41. 41.

    Slupsky, C. M. et al. Investigations of the effects of gender, diurnal variation, and age in human urinary metabolomic profiles. Anal. Chem. 79, 6995–7004 (2007).

    CAS  PubMed  Article  Google Scholar 

  42. 42.

    Aarts, E., Verhage, M., Veenvliet, J. V., Dolan, C. V. & van der Sluis, S. A solution to dependency: using multilevel analysis to accommodate nested data. Nat. Neurosci. 17, 491–496 (2014).

    CAS  PubMed  Article  Google Scholar 

  43. 43.

    Mizuno, H. et al. The great importance of normalization of LC–MS data for highly-accurate non-targeted metabolomics. Biomed. Chromatogr. 31, e3864 (2017).

    Article  CAS  Google Scholar 

  44. 44.

    Shortt, B. J., Darrach, M. R., Holland, P. M. & Chutjian, A. Miniaturized system of a gas chromatograph coupled with a Paul ion trap mass spectrometer. J. Mass Spectrom. 40, 36–42 (2005).

    CAS  PubMed  Article  Google Scholar 

  45. 45.

    Chen, C.-H. et al. Design of portable mass spectrometers with handheld probes: aspects of the sampling and miniature pumping systems. J. Am. Soc. Mass Spectrom. 26, 240–247 (2015).

    CAS  PubMed  Article  Google Scholar 

  46. 46.

    Snyder, M. & Zhou, W. Big data and health. The Lancet Digital Health (2019).

  47. 47.

    Wickham, H. ggplot2: elegant Graphics for Data Analysis (Springer, 2016).

  48. 48.

    Wilkins, D. Treemapify: Draw treemaps in ‘ggplot2’. Online: CRAN. R-project. org/package=treemapify. (2017). Accessed 28 March 2018.

  49. 49.

    Pedregosa, F. et al. Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011).

    Google Scholar 

  50. 50.

    Vallat, R. Pingouin: statistics in Python. J. Open Source Softw. 3, 331 (2018).

    Article  Google Scholar 

  51. 51.

    Miller, I. J. et al. Real time health monitoring through urine metabolomics. (2019). Preprint at

Download references


The authors would like to thank Vanessa Linke for useful suggestions regarding analysis and visualization of metabolomics data, as well as feedback on the paper. The authors would also like to acknowledge Dasom Hwang for her help with refining figures. This research was supported by funding from R35GM118110 and the Morgridge Institute for Research.

Author information




I.J.M., M.S.W. and J.J.C. conceived and planned the study. S.R.P. prepared and analyzed the urine samples using the instrument described in the methods. K.A.O. and B.R.P. prepared and analyzed metabolite standards. I.J.M. and K.A.O. analyzed the data. I.J.M. wrote the code for data analysis and the companion web tool. I.J.M. drafted the paper; I.J.M. and J.J.C. edited and prepared the paper for submission. All authors had access to the data in this study and have read and approved this paper.

Corresponding author

Correspondence to Joshua J. Coon.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit

Reprints and Permissions

About this article

Verify currency and authenticity via CrossMark

Cite this article

Miller, I.J., Peters, S.R., Overmyer, K.A. et al. Real-time health monitoring through urine metabolomics. npj Digit. Med. 2, 109 (2019).

Download citation

Further reading


Quick links

Nature Briefing

Sign up for the Nature Briefing newsletter — what matters in science, free to your inbox daily.

Get the most important science stories of the day, free in your inbox. Sign up for Nature Briefing