A longitudinal big data approach for precision health


Precision health relies on the ability to assess disease risk at an individual level, detect early preclinical conditions and initiate preventive strategies. Recent technological advances in omics and wearable monitoring enable deep molecular and physiological profiling and may provide important tools for precision health. We explored the ability of deep longitudinal profiling to make health-related discoveries, identify clinically relevant molecular pathways and affect behavior in a prospective longitudinal cohort (n = 109) enriched for risk of type 2 diabetes mellitus. The cohort underwent integrative personalized omics profiling from samples collected quarterly for up to 8 years (median, 2.8 years) using clinical measures and emerging technologies including genome, immunome, transcriptome, proteome, metabolome, microbiome and wearable monitoring. We made more than 67 clinically actionable health discoveries and identified multiple molecular pathways associated with metabolic, cardiovascular and oncologic pathophysiology. We developed prediction models for insulin resistance by using omics measurements, illustrating their potential to replace burdensome tests. Finally, study participation led the majority of participants to implement diet and exercise changes. Altogether, we conclude that deep longitudinal profiling can lead to actionable health discoveries and provide relevant information for precision health.

Fig. 1: Study design and data collection.
Fig. 2: Clinical and enhanced phenotyping of glucose metabolism, insulin production and resistance.
Fig. 3: Longitudinal individual phenotyping and multi-omics of glucose metabolism and inflammation.
Fig. 4: Clinical longitudinal cardiovascular health profiling and multi-omics correlation network of adjusted ASCVD risk.
Fig. 5: Oncologic discoveries.
Fig. 6: Summary of major clinically actionable health discoveries and participant health behavior change.

Data availability

Raw omics data (transcriptome, immunome, proteome, metabolome, microbiome) included in this study are hosted on the NIH Human Microbiome 2 project site under the T2D project, along with clinical laboratory data to 2016. Data from participants who have not consented to make their data public are available on dbGaP (accession phs001719.v1.p1). Additional data unique to this manuscript have been provided in the Supplementary Data files.


  1.

    National Research Council (US) Committee on a framework for developing a new taxonomy of disease. Toward Precision Medicine: Building a Knowledge Network for Biomedical Research and a New Taxonomy of Disease (National Academies Press, 2012).

  2.

    Li, X. et al. Digital health: tracking physiomes and activity using wearable biosensors reveals useful health-related information. PLoS Biol. 15, e2001402 (2017).

  3.

    Chen, R. et al. Personal omics profiling reveals dynamic molecular and medical phenotypes. Cell 148, 1293–1307 (2012).

  4.

    Price, N. D. et al. A wellness study of 108 individuals using personal, dense, dynamic data clouds. Nat. Biotechnol. 35, 747–756 (2017).

  5.

    Perkins, B. A. et al. Precision medicine screening using whole-genome sequencing and advanced imaging to identify disease risk in adults. Proc. Natl Acad. Sci. USA 115, 3686–3691 (2018).

  6.

    Hall, H. et al. Glucotypes reveal new patterns of glucose dysregulation. PLoS Biol. 16, e2005143 (2018).

  7.

    McConnell, M. V. et al. Feasibility of obtaining measures of lifestyle from a smartphone app: the myheart counts cardiovascular health study. JAMA Cardiol 2, 67–76 (2017).

  8.

    Dinneen, S., Gerich, J. & Rizza, R. Carbohydrate metabolism in non-insulin-dependent diabetes mellitus. N. Engl. J. Med. 327, 707–713 (1992).

  9.

    Varghese, R. T. et al. Mechanisms underlying the pathogenesis of isolated impaired glucose tolerance in humans. J. Clin. Endocrinol. Metab. 101, 4816–4824 (2016).

  10.

    Zhou, W. et al. Complex host-microbial dynamics in prediabetes revealed through longitudinal multi-omics profiling. Nature (in the press).

  11.

    1000 Genomes Project Consortium. et al. A global reference for human genetic variation. Nature 526, 68–74 (2015).

  12.

    Rego, S. et al. High frequency actionable pathogenic exome variants in an average-risk cohort. Cold Spring Harb. Mol. Case Stud. 4, a003178 (2018).

  13.

    Pearson, E. R. et al. Genetic cause of hyperglycaemia and response to treatment in diabetes. Lancet 362, 1275–1281 (2003).

  14.

    Cersosimo, E., Solis-Herrera, C., Trautmann, M. E., Malloy, J. & Triplitt, C. L. Assessment of pancreatic β-cell function: review of methods and clinical applications. Curr. Diabetes Rev. 10, 2–42 (2014).

  15.

    Van Cauter, E., Mestrez, F., Sturis, J. & Polonsky, K. S. Estimation of insulin secretion rates from C-peptide levels. Comparison of individual and standard kinetic parameters for C-peptide clearance. Diabetes 41, 368–377 (1992).

  16.

    Matsuda, M. & DeFronzo, R. A. Insulin sensitivity indices obtained from oral glucose tolerance testing: comparison with the euglycemic insulin clamp. Diabetes Care 22, 1462–1470 (1999).

  17.

    Godsland, I. F., Jeffs, J. A. R. & Johnston, D. G. Loss of beta cell function as fasting glucose increases in the non-diabetic range. Diabetologia 47, 1157–1166 (2004).

  18.

    Kanat, M. et al. The relationship between β-cell function and glycated hemoglobin: results from the veterans administration genetic epidemiology study. Diabetes Care 34, 1006–1010 (2011).

  19.

    Iikuni, N., Lam, Q. L. K., Lu, L., Matarese, G. & La Cava, A. Leptin and inflammation. Curr. Immunol. Rev. 4, 70–79 (2008).

  20.

    Hamilton, J. A. GM-CSF in inflammation and autoimmunity. Trends Immunol. 23, 403–408 (2002).

  21.

    Reidy, S. P. & Weber, J. Leptin: an essential regulator of lipid metabolism. Comp. Biochem. Physiol. A 125, 285–298 (2000).

  22.

    Guasch-Ferré, M. et al. Metabolomics in prediabetes and diabetes: a systematic review and meta-analysis. Diabetes Care 39, 833–846 (2016).

  23.

    Twig, G. et al. White blood cells count and incidence of type 2 diabetes in young men. Diabetes Care 36, 276–282 (2013).

  24.

    Oliveira, A. G. et al. The role of hepatocyte growth factor (HGF) in insulin resistance and diabetes. Front. Endocrinol. 9, 503 (2018).

  25.

    Mothe-Satney, I. et al. Adipocytes secrete leukotrienes: contribution to obesity-associated inflammation and insulin resistance in mice. Diabetes 61, 2311–2319 (2012).

  26.

    Tsamardinos, I., Brown, L. E. & Aliferis, C. F. The max-min hill-climbing Bayesian network structure learning algorithm. Mach. Learn. 65, 31–78 (2006).

  27.

    Lagani, V., Athineou, G., Farcomeni, A., Tsagris, M. & Tsamardinos, I. Feature selection with the R Package MXM: discovering statistically equivalent feature subsets. J. Stat. Softw. 80, 1–25 (2017).

  28.

    McLaughlin, T. et al. Use of metabolic markers to identify overweight individuals who are insulin resistant. Ann. Intern. Med. 139, 802–809 (2003).

  29.

    Nowak, C. et al. Protein biomarkers for insulin resistance and type 2 diabetes risk in two large community cohorts. Diabetes 65, 276–284 (2016).

  30.

    Apostolopoulou, M. et al. Specific hepatic sphingolipids relate to insulin resistance, oxidative stress, and inflammation in nonalcoholic steatohepatitis. Diabetes Care 41, 1235–1243 (2018).

  31.

    Gomez-Arango, L. F. et al. Connections between the gut microbiome and metabolic hormones in early pregnancy in overweight and obese women. Diabetes 65, 2214–2223 (2016).

  32.

    Kwo, P. Y., Cohen, S. M. & Lim, J. K. ACG clinical guideline: evaluation of abnormal liver chemistries. Am. J. Gastroenterol. 112, 18–35 (2017).

  33.

    Hu, F. B. et al. Elevated risk of cardiovascular disease prior to clinical diagnosis of type 2 diabetes. Diabetes Care 25, 1129–1134 (2002).

  34.

    Goff, D. C. Jr et al. 2013 ACC/AHA guideline on the assessment of cardiovascular risk: a report of the American College of Cardiology/American Heart Association Task Force on Practice Guidelines. Circulation 129, S49–S73 (2014).

  35.

    Kuznetsova, T. et al. Additive prognostic value of left ventricular systolic dysfunction in a population-based cohort. Circ. Cardiovasc. Imag. 9, e004661 (2016).

  36.

    Wang, T. J. et al. Carotid intima-media thickness is associated with premature parental coronary heart disease: the Framingham Heart Study. Circulation 108, 572–576 (2003).

  37.

    Mitchell, G. F. et al. Arterial stiffness and cardiovascular events: the Framingham Heart Study. Circulation 121, 505–511 (2010).

  38.

    Moneghetti, K. J. et al. Applying current normative data to prognosis in heart failure: the fitness registry and the importance of exercise national database (FRIEND). Int. J. Cardiol. 263, 75–79 (2018).

  39.

    Hall, K. T. et al. Polymorphisms in catechol-O-methyltransferase modify treatment effects of aspirin on risk of cardiovascular disease. Arterioscler. Thromb. Vasc. Biol. 34, 2160–2167 (2014).

  40.

    Malik, R. et al. Multiancestry genome-wide association study of 520,000 subjects identifies 32 loci associated with stroke and stroke subtypes. Nat. Genet. 50, 524–537 (2018).

  41.

    Cross, D. S. et al. Coronary risk assessment among intermediate risk patients using a clinical and biomarker based algorithm developed and validated in two population cohorts. Curr. Med. Res. Opin. 28, 1819–1830 (2012).

  42.

    Ma, H., Calderon, T. M., Fallon, J. T. & Berman, J. W. Hepatocyte growth factor is a survival factor for endothelial cells and is expressed in human atherosclerotic plaques. Atherosclerosis 164, 79–87 (2002).

  43.

    Bell, E. J. et al. Hepatocyte growth factor is positively associated with risk of stroke: the MESA (Multi-Ethnic Study of Atherosclerosis). Stroke 47, 2689–2694 (2016).

  44.

    Chen, X. & Devaraj, S. Monocytes from metabolic syndrome subjects exhibit a proinflammatory M1 phenotype. Metab. Syndr. Relat. Disord. 12, 362–366 (2014).

  45.

    Elkind, M. S. et al. Interleukin-2 levels are associated with carotid artery intima-media thickness. Atherosclerosis 180, 181–187 (2005).

  46.

    Porez, G., Prawitt, J., Gross, B. & Staels, B. Bile acid receptors as targets for the treatment of dyslipidemia and cardiovascular disease. J. Lipid Res. 53, 1723–1737 (2012).

  47.

    Berry, C. E. & Hare, J. M. Xanthine oxidoreductase and cardiovascular disease: molecular mechanisms and pathophysiological implications. J. Physiol. 555, 589–606 (2004).

  48.

    Sane, D. C., Kontos, J. L. & Greenberg, C. S. Roles of transglutaminases in cardiac and vascular diseases. Front. Biosci. 12, 2530–2545 (2007).

  49.

    Wollert, K. C., Kempf, T. & Wallentin, L. Growth differentiation factor 15 as a biomarker in cardiovascular disease. Clin. Chem. 63, 140–151 (2017).

  50.

    Klok, M. D., Jakobsdottir, S. & Drent, M. L. The role of leptin and ghrelin in the regulation of food intake and body weight in humans: a review. Obes. Rev. 8, 21–34 (2007).

  51.

    Charbonneau, B. et al. Pretreatment circulating serum cytokines associated with follicular and diffuse large B-cell lymphoma: a clinic-based case-control study. Cytokine 60, 882–889 (2012).

  52.

    Przewoznik, M. et al. Recruitment of natural killer cells in advanced stages of endogenously arising B-cell lymphoma: implications for therapeutic cell transfer. J. Immunother. 35, 217–222 (2012).

  53.

    Haabeth, O. A. W. et al. Inflammation driven by tumour-specific Th1 cells protects against B-cell cancer. Nat. Commun. 2, 240 (2011).

  54.

    Ding, Q. et al. CXCL9: evidence and contradictions for its role in tumor progression. Cancer Med. 5, 3246–3259 (2016).

  55.

    Rolny, C. et al. HRG inhibits tumor growth and metastasis by inducing macrophage polarization and vessel normalization through downregulation of PlGF. Cancer Cell 19, 31–44 (2011).

  56.

    Johnson, L. D. S., Goubran, H. A. & Kotb, R. R. Histidine rich glycoprotein and cancer: a multi-faceted relationship. Anticancer Res. 34, 593–603 (2014).

  57.

    Go, R. S., Gundrum, J. D. & Neuner, J. M. Determining the clinical significance of monoclonal gammopathy of undetermined significance: a SEER-Medicare population analysis. Clin. Lymphoma Myeloma Leuk. 15, 177–186.e4 (2015).

  58.

    Turesson, I. et al. Monoclonal gammopathy of undetermined significance and risk of lymphoid and myeloid malignancies: 728 cases followed up to 30 years in Sweden. Blood 123, 338–345 (2014).

  59.

    Ahlqvist, E. et al. Novel subgroups of adult-onset diabetes and their association with outcomes: a data-driven cluster analysis of six variables. Lancet Diabetes Endocrinol. 6, P361–P369 (2018).

  60.

    Cauwenberghs, N. et al. Relation of insulin resistance to longitudinal changes in left ventricular structure and function in a general population. J. Am. Heart Assoc. 7, e008315 (2018).

  61.

    Piening, B. D. et al. Integrative personal omics profiles during periods of weight gain and loss. Cell Syst. 6, 157–170.e8 (2018).

  62.

    Whirl-Carrillo, M. et al. Pharmacogenomics knowledge for personalized medicine. Clinical Pharmacol. Ther. 92, 414–417 (2012).

  63.

    Li, J. et al. Decoding the genomics of abdominal aortic aneurysm. Cell 174, 1361–1372.e10 (2018).

  64.

    Douglas, P. S. et al. The future of cardiac imaging: report of a think tank convened by the american college of cardiology. JACC Cardiovasc. Imag. 9, 1211–1223 (2016).

  65.

    Buhr, S. Apple’s Watch isn’t the first with an EKG reader but it will matter to more consumers. TechCrunch (2018).

  66.

    Omer, W., Naveed, A. K., Khan, O. J. & Khan, D. A. Role of cytokine gene score in risk prediction of premature coronary artery disease. Genet. Test. Mol. Biomarkers 20, 685–691 (2016).

  67.

    Integrated Molecular Pathway Level Analysis (accessed 27 December 2018);

  68.

    Szklarczyk, D. et al. STRINGv10: protein-protein interaction networks, integrated over the tree of life. Nucleic Acids Res. 43, D447–D452 (2015).

  69.

    The Integrative Human Microbiome Project. Dynamic analysis of microbiome-host omics profiles during periods of human health and disease. Cell Host Microbe 16, 276–289 (2014).

  70.

    Cohen, S. & Williamson, G. in The Social Psychology of Health (eds Spacapan, S. and Oskamp, S.) 31–67 (Sage Publications, 1988).

  71.

    Slavich, G. M. & Shields, G. S. Assessing lifetime stress exposure using the Stress and Adversity Inventory for adults (Adult STRAIN): an overview and initial validation. Psychosom. Med. 80, 17–27 (2018).

  72.

    Lee, P. H., Macfarlane, D. J., Lam, T. H. & Stewart, S. M. Validity of the international physical activity questionnaire short form (IPAQ-SF): a systematic review. Int. J. Behav. Nutr. Phys. Act. 8, 115 (2011).

  73.

    Pei, D., Jones, C. N. O., Bhargava, R., Chen, Y.-D. I. & Reaven, G. M. Evaluation of octreotide to assess insulin-mediated glucose disposal by the insulin suppression test. Diabetologia 37, 843–845 (1994).

  74.

    Lam, H. Y. K. et al. Detecting and annotating genetic variations using the HugeSeq pipeline. Nat. Biotechnol. 30, 226–229 (2012).

  75.

    Kalia, S. S. et al. Recommendations for reporting of secondary findings in clinical exome and genome sequencing, 2016 update (ACMG SFv2.0): a policy statement of the American College of Medical Genetics and Genomics. Genet. Med. 19, 249–255 (2017).

  76.

    Trapnell, C., Pachter, L. & Salzberg, S. L. TopHat: discovering splice junctions with RNA-Seq. Bioinformatics 25, 1105–1111 (2009).

  77.

    Love, M. I., Huber, W. & Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15, 1–21 (2014).

  78.

    Luminex Multiplex Analysis (Stanford Human Immune Monitoring Core, 2018);

  79.

    Contrepois, K., Jiang, L. & Snyder, M. Optimized analytical procedures for the untargeted metabolomic profiling of human urine and plasma by combining hydrophilic interaction (HILIC) and reverse-phase liquid chromatography (RPLC)-mass spectrometry. Mol. Cell. Proteomics 14, 1684–1695 (2015).

  80.

    Contrepois, K. et al. Cross-platform comparison of untargeted and targeted lipidomics approaches on aging mouse plasma. Sci. Rep. 8, 17747 (2018).

  81.

    Lang, R. M. et al. Recommendations for cardiac chamber quantification by echocardiography in adults: an update from the American Society of Echocardiography and the European Association of Cardiovascular Imaging. J. Am. Soc. Echocardiogr. 28, 1–39.e14 (2015).

  82.

    Wilson, P. W. F. et al. Prediction of coronary heart disease using risk factor categories. Circulation 97, 1837–1847 (1998).

  83.

    Smith, D. A. In adults without CVD, the MESA score, including coronary artery calcium, predicted 10-y risk for CHD events. Ann. Intern. Med. 164, JC35 (2016).

  84.

    McClelland, R. L. et al. 10-Year coronary heart disease risk prediction using coronary artery calcium and traditional risk factors: derivation in the mesa (multi-ethnic study of atherosclerosis) with validation in the HNR (Heinz Nixdorf Recall) study and the DHS (Dallas Heart Study). J. Am. Coll. Cardiol. 66, 1643–1653 (2015).

  85.

    Lee, K. K., Cipriano, L. E., Owens, D. K., Go, A. S. & Hlatky, M. A. Cost-effectiveness of using high-sensitivity C-reactive protein to identify intermediate- and low-cardiovascular-risk individuals for statin therapy. Circulation 122, 1478–1487 (2010).

  86.

    Myers, J., Bader, D., Madhavan, R. & Froelicher, V. Validation of a specific activity questionnaire to estimate exercise tolerance in patients referred for exercise testing. Am. Heart J. 142, 1041–1046 (2001).

  87.

    Arena, R., Myers, J., Aslam, S. S., Varughese, E. B. & Peberdy, M. A. Technical considerations related to the minute ventilation/carbon dioxide output slope in patients with heart failure. Chest 124, 720–727 (2003).

  88.

    Kaminsky, L. A., Imboden, M. T., Arena, R. & Myers, J. Reference standards for cardiorespiratory fitness measured with cardiopulmonary exercise testing using cycle ergometry: data from the fitness registry and the importance of exercise national database (FRIEND) Registry. Mayo Clin. Proc. 92, 228–233 (2017).

  89.

    Hovorka, R., Soons, P. A. & Young, M. A. ISEC: a program to calculate insulin secretion. Comput. Methods Programs Biomed. 50, 253–264 (1996).

  90.

    Kamburov, A., Cavill, R., Ebbels, T. M. D., Herwig, R. & Keun, H. C. Integrated pathway-level analysis of transcriptomics and metabolomics data with IMPaLA. Bioinformatics 27, 2917–2918 (2011).

  91.

    Shannon, P. et al. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res. 13, 2498–2504 (2003).

  92.

    Fruchterman, T. M. J. & Reingold, E. M. Graph drawing by force-directed placement. Softw. Pract. Exp. 21, 1129–1164 (1991).

  93.

    Montagna, P. A. Using SAS to manage biological species data and calculate diversity indices. in 2014 SCSUG Educational Forum (South Central SAS Users Group, 2014).

  94.

    Caporaso, J. G. et al. QIIME allows analysis of high-throughput community sequencing data. Nat. Methods 7, 335–336 (2010).

  95.

    Callahan, B. J. et al. DADA2: High-resolution sample inference from Illumina amplicon data. Nat. Methods 13, 581–583 (2016).

  96.

    Bokulich, N. A. et al. Optimizing taxonomic classification of marker-gene amplicon sequences with QIIME 2’s q2-feature-classifier plugin. Microbiome 6, 90 (2018).

  97.

    Callahan, B. J., McMurdie, P. J. & Holmes, S. P. Exact sequence variants should replace operational taxonomic units in marker-gene data analysis. ISME J. 11, 2639 (2017).

  98.

    Chang, C. C. et al. Second-generation PLINK: rising to the challenge of larger and richer datasets. Gigascience 4, 7 (2015).



Our work was supported by grants from the National Institutes of Health (NIH) Human Microbiome Project (HMP) 1U54DE02378901 (G.M.W. and M.P.S.), NIH grant no. R01 DK110186-03 (T.L.M.) and an NIH National Center for Advancing Translational Science Clinical and Translational Science Award (no. UL1TR001085). This work used the Genome Sequencing Service Center by the Stanford Center for Genomics and Personalized Medicine Sequencing Center (supported by NIH grant no. S10OD020141), the Diabetes Genomics Analysis Core and the Clinical and Translational Core of the Stanford Diabetes Research Center (NIH grant no. P30DK116074). S.M.S.-F.R. was supported by a Department of Veteran Affairs Office of Academic Affiliations Advanced Fellowship in Spinal Cord Injury Medicine and an NIH Career Development Award no. K08 ES028825. G.M.S. was supported by NIH grant no. K08 MH103443. D.H. was supported by a Stanford School of Medicine Dean’s Postdoctoral Fellowship and a Stanford Center for Computational, Evolutionary and Human Genomics Fellowship. M.R.S. was supported by grant nos. P300PA_161005 and P2GEP3_151825 from the Swiss National Science Foundation (SNSF). The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH, the Department of Veteran Affairs, or the SNSF. We thank S. Chen and B. Lee for their work in metabolomics data production. A. Breschi generously shared her code for the ISR calculations. Finally, we thank the iPOP participants who generously gave their time and biological samples.

Author information




S.M.S.-F.R., M.P.S., F.H., K.C., K.M., T.M. and W.Z. contributed to the conceptualization. S.M.S.-F.R., K.C., F.H., M.P.S., T.M., K.M., S.M., W.Z. and S.R. contributed to the methodology. K.C. (ASCVD biomarkers), D.H. (Lipidomics), A.B.G. (Microbiome DADA2 processing), T.M., M.A. and W.Z. (OGTT C-peptide and insulin) contributed to omics generation and/or processing. S.M.S.-F.R., K.C., T.M., W.Z., J.D., M.A., J.W.C., E.S. and P.L. contributed to data curation. K.C., S.M.S.-F.R., T.M., K.M., F.H. and M.P.S. contributed to visualization. S.M.S.-F.R., K.C., T.M., S.M., K.M., O.D.-R., S.R., J.C. and C.R. contributed to formal analysis. S.M.S.-F.R., K.C. and M.P.S. contributed to project administration. M.P.S. and F.H. contributed to supervision. S.M.S.-F.R., F.H., K.C., K.M. and M.P.S. contributed to writing and preparing the original draft. S.M.S.-F.R., K.C., K.M., F.H., M.P.S., W.Z., A.B.G., D.H., J.D., G.M.S., T.M., M.T., D.P., T.L.M., A.J.B., M.R.S. and S.A. contributed to review and editing. K.M., F.H. and J.W.C. contributed to cardiovascular clinical data collection and investigation. W.Z., S.R., M.A., P.L., D.P., M.T., T.L.M. and S.M.S.-F.R. contributed to iPOP/iHMP clinical data collection/investigation. W.Z., S.R.L., M.P.S., T.L.M., E.S. and G.M.W. contributed to iPOP/iHMP project administration. K.C. (metabolomics), S.A. (proteomics), M.R.S. (DNA, RNA-seq), W.Z. (microbiome, cytokines, and overall omics data), Y.Z. (microbiome), T.M. and D.H. (batch correction methodology for proteomics) contributed to iPOP/iHMP omics raw data processing. M.P.S., G.M.W., T.L.M. and E.S. contributed to iPOP/iHMP funding acquisition.

Corresponding authors

Correspondence to Francois Haddad or Michael P. Snyder.

Ethics declarations

Competing interests

M.P.S. is a cofounder of Personalis, SensOmics, January, Filtricine, Qbio and Akna and an inventor on provisional patent number 62/814,746 ‘Methods for evaluation and treatment of glycemic dysregulation and applications thereof’. S.M.S.-F.R., K.C., W.Z., T.M. and S.M. are also listed as inventors. A.J.B. reports grants and non-financial support from Progenity, grants and personal fees from NIH (multiple institutes) and Genentech, and grants from L’Oreal, personal fees from NuMedii, Personalis, Lilly, Assay Depot, Geisinger Health, GNS Healthcare, uBiome, Roche, Wilson Sonsini Goodrich & Rosati, Orrick, Herrington & Sutcliffe, Verinata, 10x Genomics, Pathway Genomics, Guardant Health, Gerson Lehrman Group, Nuna Health, Samsung, Capital Royalty Group, Optum Labs, Pfizer, AbbVie, Bayer, Three Lakes Partners, HudsonAlpha, Tensegrity, Westat, FH Foundation, WuXi, FlareCapital, Helix, Roam Insights, Autodesk, Regenstrief Institute, American Medical Association, Precision Medicine World Conference, and Mars during the conduct of the study. A.J.B. has pending patent Atul J. Butte, Keiichi Kodama, Methods for diagnosis and treatment of non-insulin dependent diabetes mellitus, published August 4, 2011, WO2011094731 and US20130071408; patent Joel T. Dudley, Atul J. Butte, Method and System for Computing and Integrating Genetic and Environmental Health Risks for a Personal Genome, published April 26, 2012, US20120101736 with royalties paid to Personalis; patent Joel T. Dudley, Atul J. Butte, Method And System For Functional Evolutionary Assessment Of Genetic Variants, published April 11, 2013, US20130090909 with royalties paid to Personalis; patent Konrad Karczewski, Michael Snyder, Atul J. Butte, Joel T. Dudley, Eurie Hong, Alan Boyle, J. Michael Cherry, Method and System for Assessment of Regulatory Variants in a Genome, published May 9, 2013, US20130116931 with royalties paid to Personalis; and patent Frederick Dewey, Euan Ashley, Carlos Daniel Bustamante, Atul Butte, Jake Byrnes, Rong Chen, Phased Whole Genome Genetic Risk In A Family Quartet, published March 28, 2013, US20130080068, with royalties paid to Personalis; Stanford University pays royalties each year on licensed intellectual property.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Integrated personalized omics profiling cohort flow chart and genetic ancestry.

a, The flow chart demonstrates recruitment and enrollment of the iPOP cohort. b, PCA plot showing the ancestries of 72 participants. The reference includes 2,504 samples from the 1,000GP (ref. 11). Each filled circle is a 1,000GP sample, colored by the super-population of ancestral origin, namely African (AFR; red), admixed American (AMR; purple), East Asian (EAS; green), European (EUR; cyan) and South Asian (SAS; orange). Each black symbol is an individual from the study, categorized by self-reported ethnicity consistent with the 1,000GP super-population definitions, namely AFR (black filled circle), AMR (black filled triangle), EAS (black filled square), EUR (black plus sign) and SAS (checked box). The self-reported ancestries of the individuals in our study generally cluster with the corresponding super-populations in the 1,000GP reference panel.

Extended Data Fig. 2 Comparison of diabetic metrics in categorizing individuals when performed at the same time and HbA1C trajectories.

a, Overlap of FPG and hemoglobin A1C (HbA1C) categories when simultaneously measured. FPG impaired: 1.0 mg ml−1 ≤ FPG < 1.26 mg ml−1; diabetic range: FPG ≥ 1.26 mg ml−1; HbA1C impaired: 5.7% ≤ HbA1C < 6.5%; diabetic range: HbA1C ≥ 6.5%. b, Overlap of FPG and 2-hour OGTT when simultaneously measured. FPG ranges as above. OGTT impaired: 1.40 mg ml−1 ≤ OGTT < 2.00 mg ml−1; diabetic range: OGTT ≥ 2.00 mg ml−1. c, Longitudinal patterns of changes in HbA1C over time. Six different patterns could be characterized: 1, participants who remained in the normal range for the entire study (Group 1, n = 51); 2, participants who progressed from normal to prediabetic (Group 2, n = 5); 3, participants who went from prediabetic to normal (Group 3, n = 10); 4, participants whose HbA1C went back and forth between normal and prediabetic (Group 4, n = 21); 5, participants whose HbA1C laboratory results were predominantly in the prediabetic range (Group 5, n = 14); and 6, participants whose HbA1C crossed into the diabetic range (Group 6, n = 8). The red lines represent the overall penalized b-spline of participants’ data in each category.
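The category boundaries in this caption can be written down as a small lookup, which makes it easier to see how two measures taken at the same visit may disagree. The following is a minimal sketch only: the function and constant names are hypothetical, it is not the authors' code, and the cutoffs are simply copied from the caption in its own units (mg ml−1 for glucose, % for HbA1C).

```python
# Illustrative sketch; names are hypothetical and not taken from the study.
# Cutoffs are (impaired_min, diabetic_min), copied from the caption.
FPG_CUTOFFS = (1.00, 1.26)    # fasting plasma glucose, mg/ml
HBA1C_CUTOFFS = (5.7, 6.5)    # hemoglobin A1C, %
OGTT_CUTOFFS = (1.40, 2.00)   # 2-hour OGTT glucose, mg/ml


def categorize(value, cutoffs):
    """Return 'normal', 'impaired' or 'diabetic' for a single measurement."""
    impaired_min, diabetic_min = cutoffs
    if value >= diabetic_min:
        return "diabetic"
    if value >= impaired_min:
        return "impaired"
    return "normal"


def categorize_visit(fpg=None, hba1c=None, ogtt=None):
    """Categorize whichever measures were taken at a visit (None = not measured)."""
    measures = {
        "FPG": (fpg, FPG_CUTOFFS),
        "HbA1C": (hba1c, HBA1C_CUTOFFS),
        "OGTT": (ogtt, OGTT_CUTOFFS),
    }
    return {
        name: categorize(value, cutoffs)
        for name, (value, cutoffs) in measures.items()
        if value is not None
    }
```

For example, categorize_visit(fpg=0.95, hba1c=5.9) labels FPG as normal and HbA1C as impaired, the kind of discordance that the overlap panels summarize.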

Extended Data Fig. 3 Additional individual longitudinal trajectories for diabetic measures.

Diabetic-range metrics are indicated in red. a–e, Diabetic-range OGTT (a), diabetic-range FPG (b,c), undiagnosed DM at study entry (HbA1C) (d), initial abnormality HbA1C (e). Note this person had two HbA1C measurements on the same day at two different laboratories and was started on medication based on the higher measurement. f,g, Bouncer with diabetic-range HbA1C and OGTT (f) and SSPG decrease with lifestyle change (g). Source Data

Extended Data Fig. 4 Longitudinal microbiome trajectories in diabetes.

a,b, Longitudinal weight, gut microbial Shannon diversity and phylum proportion changes in participants ZNDMXI3 (a) and ZNED4XZ (b). c, Longitudinal changes in genus proportion (ZNDMXI3). d,e, Microbiome outliers (95th percentile) at the latest microbiome sample time point in participants ZNDMXI3 (d) and ZNED4XZ (e). Microbial abundance is scaled by row with low (blue) and high (red) abundance. Source Data

Extended Data Fig. 5 Multi-omics of glucose metabolism and inflammation.

a, Proteins and metabolites associated with HbA1C, FPG and hsCRP using healthy-baseline and dynamic linear mixed models. Healthy-baseline models (HbA1C n = 101, samples = 560; FPG n = 101, samples = 563; hsCRP n = 98, samples = 518) account for repeated measures at healthy time points. Dynamic models are similar, except that analytes are normalized across individuals to the first measurement and all time points in the study are used (HbA1C n = 94, samples = 836; FPG n = 94, samples = 843; hsCRP n = 92, samples = 777). Individual analyte P values were determined using a two-sided t-test. Multiple testing correction was performed and molecules were considered significant when Benjamini–Hochberg FDR < 0.2. Model estimates were normalized in each condition so that the maximum value equals 1 and the minimum value equals −1. b, Integrative pathway analysis using IMPaLa (ref. 67) of proteins and metabolites associated with HbA1C (n = 101, samples = 560), FPG (n = 101, samples = 563) and hsCRP (n = 98, samples = 518) as determined by the healthy-baseline models (Benjamini–Hochberg FDR < 0.2 at molecule level) that matched to known pathways. Significance of pathways for proteins and metabolites separately is determined by the hypergeometric test (one-sided), followed by Fisher’s combined probability test (one-sided) to determine combined pathway significance (Benjamini–Hochberg FDR < 0.05; n’s of proteins and metabolites for each pathway are provided in Supplementary Tables 9, 11 and 13).
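The Benjamini–Hochberg correction mentioned in this caption can be illustrated with a short sketch. This is a generic textbook implementation, not the authors' code; the function names and the example P values are made up for illustration, and only the 0.2 cutoff comes from the caption.

```python
def benjamini_hochberg(p_values: list[float]) -> list[float]:
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(p_values)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def significant(p_values: list[float], fdr: float = 0.2) -> list[bool]:
    """Flag analytes whose adjusted P value is below the FDR cutoff."""
    return [q < fdr for q in benjamini_hochberg(p_values)]


# Hypothetical P values for four analytes (not study data).
example_p = [0.01, 0.04, 0.03, 0.5]
example_q = benjamini_hochberg(example_p)
```

With these made-up inputs, the first three analytes would pass the FDR < 0.2 cutoff and the fourth would not.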

Extended Data Fig. 6 Outlier Analysis of RNA-seq data.

a, Number of outlier RNA molecules (95th percentile) in each participant. Outlier analysis was performed on Z scores calculated on the median expression level of each gene at healthy visits in individuals with at least three healthy visits (n = 63). The box spans the 25th to 75th percentiles. The upper whisker extends to 1.5 times the interquartile range from the box and the lower whisker to the lowest data point. The horizontal bar in the box is the median value. b, Selected clinical laboratory and metabolite trajectories (seven measurement time points) for participant ZJTKAE3 showing a concomitant increase of bile acids and glutamyl dipeptides with ALT (alanine aminotransferase) and AST (aspartate aminotransferase). Source Data
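One possible reading of the outlier procedure in panel a, for a single gene, is sketched below. The caption does not specify how the 95th-percentile cutoff was applied, so the two-sided cutoff on absolute Z scores, the nearest-rank percentile and all function names are assumptions made for illustration rather than the study's actual pipeline.

```python
from statistics import mean, median, stdev


def healthy_median_z_scores(healthy_values: dict[str, list[float]],
                            min_visits: int = 3) -> dict[str, float]:
    """Z-score each participant's median healthy-visit expression of one gene.

    Participants with fewer than `min_visits` healthy visits are dropped,
    mirroring the "at least three healthy visits" rule in the caption.
    """
    medians = {pid: median(vals) for pid, vals in healthy_values.items()
               if len(vals) >= min_visits}
    values = list(medians.values())
    mu, sd = mean(values), stdev(values)  # stdev needs >= 2 participants
    if sd == 0:
        return {pid: 0.0 for pid in medians}
    return {pid: (m - mu) / sd for pid, m in medians.items()}


def flag_outliers(z_scores: dict[str, float], percentile: int = 95) -> set[str]:
    """Assumed rule: flag |Z| above the nearest-rank percentile of |Z|."""
    abs_z = sorted(abs(z) for z in z_scores.values())
    rank = max(1, (percentile * len(abs_z) + 99) // 100)  # integer ceiling
    cutoff = abs_z[rank - 1]
    return {pid for pid, z in z_scores.items() if abs(z) > cutoff}
```

Counting, for each participant, the genes in which they are flagged would then give a per-participant outlier tally of the kind plotted in panel a.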

Extended Data Fig. 7 Multidimensional cardiac risk assessment.

a, Distribution of ASCVD risk scores (n = 35 participants, 36 measurements) and cardiovascular imaging and physiology measures that have been established as cardiovascular risk markers. (Abbreviations: RWT, relative wall thickness; LV GLS, left ventricular global longitudinal strain; E/e’, ratio of mitral peak velocity of early filling (E) to early diastolic mitral annular velocity (e’); PWV, pulse wave velocity.) Note that thresholds for PWV are age-related. Box plots were derived to display quartiles (Q1, median, Q3) with the upper whisker being Q3 + 1.5 × (interquartile range) and the lower whisker extending to Q1 − 1.5 × (interquartile range) or the lowest data point. b, Ultrasound of carotid plaque (6 participants out of 35 had an ultrasound finding of carotid plaque) and relative distribution of ASCVD risk score, HbA1C and LV GLS as a function of presence or absence of carotid plaque (Student’s t-test (two-sided) was used to evaluate differences between groups; n = 35, 36 measurements). (Abbreviations: CCA, common carotid artery; IJV, internal jugular vein.) Error bars represent one standard deviation from the mean (upper edge of box). c, Correlation network of selected metrics collected during cardiovascular assessment (Spearman correlation (two-sided) with q < 0.2; n = 35 participants with 36 measurements). d, Composite Z score of ZOBX723 (unstable angina with stent placement) and ZNED4XZ (mild stroke with full recovery and transition to diabetes). For ZOBX723, day 829 occurred 3 weeks post-stent placement. Day 679 was a mid-infection time point. For ZNED4XZ, day 699 was the time point before the participant’s transition to diabetes and day 846 was the first diabetic time point. The stroke occurred on day 307 for this individual. Gray dots represent Z scores of other participants (n = 101 with 859 samples). e, Violin plot showing the same data as d (n = 101 with 859 samples). The box plot shows the first (lower edge of box), median (middle line) and third (upper edge of box) quartiles. The upper whisker is the third quartile + 1.5 × (interquartile range) and the lower whisker is the lowest data point. Source Data

Supplementary information

Reporting Summary

Supplementary Tables

Supplementary Tables 0–28

Supplementary Data

Data Tables 1–24

Source data

About this article

Cite this article

Schüssler-Fiorenza Rose, S.M., Contrepois, K., Moneghetti, K.J. et al. A longitudinal big data approach for precision health. Nat Med 25, 792–804 (2019).
