  • Analysis

Longitudinal machine learning uncouples healthy aging factors from chronic disease risks

Abstract

To understand human longevity, inherent aging processes must be distinguished from known etiologies leading to age-related chronic diseases. Such deconvolution is difficult to achieve because it requires tracking patients throughout their entire lives. Here, we used machine learning to infer health trajectories over the entire adulthood age range using extrapolation from electronic medical records with partial longitudinal coverage. Using this approach, our model tracked the state of patients who were healthy and free from known chronic disease risk and distinguished individuals with higher or lower longevity potential using a multivariate score. We showed that the model and the markers it uses performed consistently on data from Israeli, British and US populations. For example, mildly low neutrophil counts and alkaline phosphatase levels serve as early indicators of healthy aging that are independent of risk for major chronic diseases. We characterize the heritability and genetic associations of our longevity score and demonstrate at least 1 year of extended lifespan for parents of high-scoring patients compared to matched controls. Longitudinal modeling of healthy individuals is thereby established as a tool for understanding healthy aging and longevity.

Fig. 1: The evolution of longevity potential through adulthood.
Fig. 2: Longevity model predictive value and clinical markers contribution throughout adulthood.
Fig. 3: Chronic diseases.
Fig. 4: The longevity gap: modeling age-related chronic disease impact on longevity.
Fig. 5: Longevity scoring in healthy individuals is robust across healthcare systems.
Fig. 6: Prioritizing longevity gene variants using multivariate disease risk modeling.

Data availability

UKBB data are available to approved researchers via the UKBB Research Analysis Platform (www.ukbiobank.ac.uk/enable-your-research/research-analysis-platform). The longevity GWAS results are available at https://gwasresults.s3.ap-south-1.amazonaws.com/gwas_longevity_age_sex_covar_extended.tsv.gz. The NHANES data can be accessed at www.cdc.gov/nchs/nhanes/index.htm. Access to the CHSDB data used for this study can be made available upon reasonable request, at the discretion of the CHS, subject to an internal review by A.T. to ensure that participant privacy is protected, and subject to completion of a data sharing agreement, approval from the institutional review board of CHS and institutional guidelines, and in accordance with the current data sharing guidelines of CHS and Israeli law. Subject to receipt of the aforementioned CHS consent and subsequent approvals, data will be shared in a secure setting, on a case-by-case basis, solely for the purpose of reproducing the analysis carried out in the research paper, as defined by the chief information security officer of CHS. Please submit such requests to A.T.

Code availability

All model training was performed by applying the newly developed mldpEHR R package (https://github.com/tanaylab/mldpEHR) on CHSDB. Laboratory normalization was conducted using the labNorm R package (http://github.com/tanaylab/labNorm). The code applied to the UKBB for model score computation, patient classification and genetic analysis is available at https://github.com/tanaylab/Mendelson_et_al_2023 and in Supplementary Software File 1.

References

  1. Kennedy, B. K. et al. Geroscience: linking aging to chronic disease. Cell 159, 709–713 (2014).

  2. Barzilai, N., Cuervo, A. M. & Austad, S. Aging as a biological target for prevention and therapy. JAMA 320, 1321–1322 (2018).

  3. Kennedy, B. K. et al. Aging: a common driver of chronic diseases and a target for novel interventions. Cell 159, 709–713 (2014).

  4. World Health Organization. World Health Statistics 2022: Monitoring Health for the SDGs, Sustainable Development Goals (2022).

  5. Niccoli, T. & Partridge, L. Ageing as a risk factor for disease. Curr. Biol. 22, R741–R752 (2012).

  6. López-Otín, C., Blasco, M. A., Partridge, L., Serrano, M. & Kroemer, G. The hallmarks of aging. Cell 153, 1194–1217 (2013).

  7. Cohen, A. A. et al. A complex systems approach to aging biology. Nat. Aging 2, 580–591 (2022).

  8. Marques, I. C. P. & Ferreira, J. J. M. Digital transformation in the area of health: systematic review of 45 years of evolution. Health Technol. 10, 575–586 (2020).

  9. Jensen, P. B., Jensen, L. J. & Brunak, S. Mining electronic health records: towards better research applications and clinical care. Nat. Rev. Genet. 13, 395–405 (2012).

  10. Shilo, S., Rossman, H. & Segal, E. Axes of a revolution: challenges and promises of big data in healthcare. Nat. Med. 26, 29–38 (2020).

  11. Cohen, N. M. et al. Personalized lab test models to quantify disease potentials in healthy individuals. Nat. Med. 27, 1582–1591 (2021).

  12. Belding, J. N. et al. The Millennium Cohort Study: the first 20 years of research dedicated to understanding the long-term health of US Service Members and Veterans. Ann. Epidemiol. 67, 61–72 (2022).

  13. Bycroft, C. et al. The UK Biobank resource with deep phenotyping and genomic data. Nature 562, 203–209 (2018).

  14. Sudlow, C. et al. UK Biobank: an open access resource for identifying the causes of a wide range of complex diseases of middle and old age. PLoS Med. 12, e1001779 (2015).

  15. Deary, I. J., Gow, A. J., Pattie, A. & Starr, J. M. Cohort profile: the Lothian birth cohorts of 1921 and 1936. Int. J. Epidemiol. 41, 1576–1584 (2012).

  16. Siggaard, T. et al. Disease trajectory browser for exploring temporal, population-wide disease progression patterns in 7.2 million Danish patients. Nat. Commun. 11, 4952 (2020).

  17. Viippola, E. et al. Data resource profile: nationwide registry data for high-throughput epidemiology and machine learning (FinRegistry). Int. J. Epidemiol. 52, e195–e200 (2023).

  18. Liu, Z. et al. A new aging measure captures morbidity and mortality risk across diverse subpopulations from NHANES IV: a cohort study. PLoS Med. 15, e1002718 (2018).

  19. Balicer, R. D. & Afek, A. Digital health nation: Israel’s global big data innovation hub. Lancet 389, 2451–2453 (2017).

  20. Singer, S. R. et al. EMR-based medication adherence metric markedly enhances identification of nonadherent patients. Am. J. Manag. Care 18, e372–e377 (2012).

  21. Centers for Disease Control and Prevention. About NHANES www.cdc.gov/nchs/nhanes/about_nhanes.htm (2023).

  22. Cheng, C. K.-W., Chan, J., Cembrowski, G. S. & van Assendelft, O. W. Complete blood count reference interval diagrams derived from NHANES III: stratification by age, sex, and race. Lab. Hematol. 10, 42–53 (2004).

  23. Avati, A. et al. Improving palliative care with deep learning. BMC Med. Inform. Decis. Mak. 18, 122 (2018).

  24. Rajkomar, A. et al. Scalable and accurate deep learning with electronic health records. NPJ Digit. Med. 1, 18 (2018).

  25. Beeksma, M. et al. Predicting life expectancy with a long short-term memory recurrent neural network using electronic medical records. BMC Med. Inform. Decis. Mak. 19, 36 (2019).

  26. Taylor, R. A. et al. Prediction of in-hospital mortality in emergency department patients with sepsis: a local big data-driven, machine learning approach. Acad. Emerg. Med. 23, 269–278 (2016).

  27. Almagro, P. et al. Mortality after hospitalization for COPD. Chest 121, 1441–1448 (2002).

  28. Asrani, S. K., Devarbhavi, H., Eaton, J. & Kamath, P. S. Burden of liver diseases in the world. J. Hepatol. 70, 151–171 (2019).

  29. Roglic, G. et al. The burden of mortality attributable to diabetes: realistic estimates for the year 2000. Diabetes Care 28, 2130–2135 (2005).

  30. Wen, C. P. et al. All-cause mortality attributable to chronic kidney disease: a prospective cohort study based on 462 293 adults in Taiwan. Lancet 371, 2173–2182 (2008).

  31. Boehme, M. W. J. et al. Prevalence, incidence and concomitant co-morbidities of type 2 diabetes mellitus in South Western Germany—a retrospective cohort and case control study in claims data of a large statutory health insurance. BMC Public Health 15, 855 (2015).

  32. Pelletier, C. et al. Diabetes in Canada: facts and figures from a public health perspective. Chronic Dis. Inj. Can. 33, 53–54 (2012).

  33. Franceschi, C. et al. Inflamm-aging: an evolutionary perspective on immunosenescence. Ann. N. Y. Acad. Sci. 908, 244–254 (2000).

  34. Furman, D. et al. Chronic inflammation in the etiology of disease across the life span. Nat. Med. 25, 1822–1832 (2019).

  35. Zahorec, R. Ratio of neutrophil to lymphocyte counts—rapid and simple parameter of systemic inflammation and stress in critically ill. Bratisl. Lek. Listy. 102, 5–14 (2001).

  36. Wey, T. W. et al. An emergent integrated aging process conserved across primates. J. Gerontol. A Biol. Sci. Med. Sci. 74, 1689–1698 (2019).

  37. D’Agostino, R. B. et al. General cardiovascular risk profile for use in primary care: the Framingham Heart Study. Circulation 117, 743–753 (2008).

  38. Hageman, S. et al. SCORE2 risk prediction algorithms: new models to estimate 10-year risk of cardiovascular disease in Europe. Eur. Heart J. 42, 2439–2454 (2021).

  39. Conroy, R. M. et al. Estimation of ten-year risk of fatal cardiovascular disease in Europe: the SCORE project. Eur. Heart J. 24, 987–1003 (2003).

  40. Fox, C. S. et al. Genomewide linkage analysis to serum creatinine, GFR, and creatinine clearance in a community-based population: the Framingham Heart Study. J. Am. Soc. Nephrol. 15, 2457–2461 (2004).

  41. Ge, T., Chen, C.-Y., Neale, B. M., Sabuncu, M. R. & Smoller, J. W. Phenome-wide heritability analysis of the UK Biobank. PLoS Genet. 13, e1006711 (2017).

  42. Poulsen, P., Ohm Kyvik, K., Vaag, A. & Beck-Nielsen, H. Heritability of type II (non-insulin-dependent) diabetes mellitus and abnormal glucose tolerance—a population-based twin study. Diabetologia 42, 139–145 (1999).

  43. Singh, P. P., Demmitt, B. A., Nath, R. D. & Brunet, A. The genetics of aging: a vertebrate perspective. Cell 177, 200–220 (2019).

  44. Timmers, P. R. et al. Genomics of 1 million parent lifespans implicates novel pathways and common diseases and distinguishes survival chances. eLife 8, e39856 (2019).

  45. Lawlor, D. A., Harbord, R. M., Sterne, J. A. C., Timpson, N. & Davey Smith, G. Mendelian randomization: using genes as instruments for making causal inferences in epidemiology. Stat. Med. 27, 1133–1163 (2008).

  46. Weissbrod, O. et al. Functionally informed fine-mapping and polygenic localization of complex trait heritability. Nat. Genet. 52, 1355–1363 (2020).

  47. Clegg, A. et al. Development and validation of an electronic frailty index using routine primary care electronic health record data. Age Ageing 45, 353–360 (2016).

  48. Farrell, S., Mitnitski, A., Rockwood, K. & Rutenberg, A. D. Interpretable machine learning for high-dimensional trajectories of aging health. PLoS Comput. Biol. 18, e1009746 (2022).

  49. Li, Q. et al. Homeostatic dysregulation proceeds in parallel in multiple physiological systems. Aging Cell 14, 1103–1112 (2015).

  50. Barzilai, N. et al. Unique lipoprotein phenotype and genotype associated with exceptional longevity. JAMA 290, 2030–2040 (2003).

  51. Schächter, F. et al. Genetic associations with human longevity at the APOE and ACE loci. Nat. Genet. 6, 29–32 (1994).

  52. Schoenmaker, M. et al. Evidence of genetic enrichment for exceptional survival using a family approach: the Leiden Longevity Study. Eur. J. Hum. Genet. 14, 79–84 (2006).

  53. Jabalameli, M. R. & Zhang, Z. D. Unravelling genetic components of longevity. Nat. Aging 2, 5–6 (2022).

  54. Lee, S. H., Yang, J., Goddard, M. E., Visscher, P. M. & Wray, N. R. Estimation of pleiotropy between complex diseases using single-nucleotide polymorphism-derived genomic relationships and restricted maximum likelihood. Bioinformatics 28, 2540–2542 (2012).

  55. Ning, Z., Pawitan, Y. & Shen, X. High-definition likelihood inference of genetic correlations across human complex traits. Nat. Genet. 52, 859–864 (2020).

  56. North, B. J. & Sinclair, D. A. The intersection between aging and cardiovascular disease. Circ. Res. 110, 1097–1108 (2012).

  57. Acosta, J. N., Falcone, G. J., Rajpurkar, P. & Topol, E. J. Multimodal biomedical AI. Nat. Med. 28, 1773–1784 (2022).

  58. Mamoshina, P. et al. Machine learning on human muscle transcriptomic data for biomarker discovery and tissue-specific drug target identification. Front. Genet. 9, 242 (2018).

  59. Pyrkov, T. V. et al. Longitudinal analysis of blood markers reveals progressive loss of resilience and predicts human lifespan limit. Nat. Commun. 12, 2765 (2021).

  60. Chen, T. & Guestrin, C. In Proc. 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 785–794 (ACM, 2016).

  61. Allen, N. et al. UK Biobank: current status and what it means for epidemiology. Health Policy Technol. 1, 123–126 (2012).

  62. Privé, F., Aschard, H., Ziyatdinov, A. & Blum, M. G. B. Efficient analysis of large-scale genome-wide data with two R packages: bigstatsr and bigsnpr. Bioinformatics 34, 2781–2787 (2018).

  63. Price, A. L. et al. Long-range LD can confound genome scans in admixed populations. Am. J. Hum. Genet. 83, 132–135 (2008).

  64. Bulik-Sullivan, B. et al. An atlas of genetic correlations across human diseases and traits. Nat. Genet. 47, 1236–1241 (2015).

Acknowledgements

We thank N. Rappaport, A. Bercovich and O. Milman for critical reading of the manuscript and all members of the Tanay laboratory for discussions. Research at the Tanay group was supported in part by the Adelis Foundation, the Kahn Foundation, the Bolton Hope Foundation and the Israel Science Foundation BRG grant and Israel Precision Medicine program.

Author information

Contributions

N.M.C., A.L., G.I.B. and A.T. conceived and designed the study. N.M.C., R.J., A.L., E.R. and A.T. developed the software and pipeline. R.B. provided access and initial context to the data. N.M.C., A.L. and E.R. analyzed the data with help from R.J., L.I.S., G.I.B. and A.T. N.M.C., A.L. and A.T. wrote the manuscript with input from all authors.

Corresponding authors

Correspondence to Gabriel I. Barbash or Amos Tanay.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Aging thanks Alan Cohen, Andrew Rutenberg, and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 CHSDB cohort longitudinal coverage.

a. Age distribution. Shown is the number of patients available for each age (x axis) and sex (grey for males, pink for females). b. Patient tracking duration distribution. Shown is the cumulative distribution of tracking duration (in years) for all patients in the age range of 30 to 85. c. Chronological patient year distribution. Shown is the number of patients with 30 < age < 85 in each calendar year. Males are shown in grey, females in pink. d. Number of lab interactions by age. Shown is the percentage of patients at each age according to the number of WBC tests performed in the previous 3 years (as a proxy for CBC).

Extended Data Fig. 2 Longevity models.

a. High longevity score survival curves. Shown are the Kaplan-Meier survival curves for the two best-scoring longevity groups of patients aged 80: top 98–100% (yellow, n = 4474) and 96–98% (turquoise, n = 4475). Longevity scores were computed from cross-validation on CHSDB. Error bars indicate 95% confidence intervals. b. Longevity model features. Heatmap of the mean value of each lab feature (y axis) for patients aged 80 by quantile-normalized longevity score (x axis). Lab values were quantile normalized per age/sex (11). c. Disease predispositions by longevity. Chronic diseases with increased mortality (11) were screened for a significant difference, as a function of longevity score, in estimated 8-year cumulative incidence (with death as a competing risk) using the cmprsk R package. Shown are the top 5 chronic diseases for patients aged 80 (n = 205839) that show a significant fold increase in disease incidence. Points (center) indicate the 8-year cumulative incidence of each disease for each longevity score bin (x axis). Error bars indicate 95% confidence intervals. d. Disease/lab-only survival model. Shown are the 5-year survival probability estimates (center points) for males (left, n = 92937) and females (right, n = 130804) aged 80, stratified by longevity score (x axis), according to a model using only disease information (red), only lab test measurements (yellow) or both lab and disease data (blue). Error bars reflect 95% confidence intervals. e. Longevity model training. The schematic describes the longevity model training process. Starting from all patients at age 80, a 5-year longevity model is trained using mortality data from patients with 3 years of clinical data and a known 5-year outcome, dead (0, purple) or alive (1, green). The cross-validation model is applied to all patients with available clinical data, including patients with a missing outcome. The top 5% scoring patients are considered positive cases (class = 1) for the age-75 longevity model. At 5-year intervals, each younger-age model is trained based on the 5-year outcome and the score from the older-age model. f. Longevity model transition matrices. Shown are the Markov model 5-year transition matrices by age (columns) and sex (rows), color coded by the log2 probability of transitioning from a score at the younger age to a score at age + 5 years. Score values were binned into 20 bins of 5% quantiles. An additional ‘no score’ bin was added for patients missing required lab data.
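The backward training scheme in panel e can be illustrated with a short Python sketch of how labels for the next-younger model might be derived from the older model's scores. This is not the mldpEHR implementation: the function name `younger_age_labels`, treating survivors outside the top fraction as negatives, and excluding patients without an outcome or an older-age score are assumptions made for illustration.

```python
from typing import List, Optional, Sequence


def younger_age_labels(
    died_within_5y: Sequence[Optional[bool]],
    older_age_score: Sequence[Optional[float]],
    top_fraction: float = 0.05,
) -> List[Optional[int]]:
    """Derive training labels for the model at age A from the model at age A + 5.

    Hypothetical sketch: 1 = survived to A + 5 and scored in the top
    `top_fraction` there; 0 = died within 5 years, or survived with a lower
    score (assumption); None = outcome or older-age score unavailable.
    """
    if not 0.0 < top_fraction <= 1.0:
        raise ValueError("top_fraction must be in (0, 1]")
    scored = sorted((s for s in older_age_score if s is not None), reverse=True)
    if not scored:
        raise ValueError("no older-age scores available")
    # Score of the last patient inside the top fraction (at least one patient).
    n_top = max(1, int(round(top_fraction * len(scored))))
    threshold = scored[n_top - 1]

    labels: List[Optional[int]] = []
    for died, score in zip(died_within_5y, older_age_score):
        if died is True:
            labels.append(0)  # died before reaching the older age
        elif died is None or score is None:
            labels.append(None)  # unknown outcome or no score: excluded (assumption)
        elif score >= threshold:
            labels.append(1)  # survived and in the top fraction (ties included)
        else:
            labels.append(0)  # survived but outside the top fraction (assumption)
    return labels
```

Repeating this rule at ages 75, 70 and so on yields the chain of models described in the caption; in the paper each step uses a trained classifier's cross-validated scores, which this sketch simply takes as input.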

Extended Data Fig. 3 Disease models.

a. Disease model training. The schematic describes the disease model training process. Starting from all patients at age 80, a 5-year disease model is trained using onset data, with death as a competing risk, from patients with 3 years of clinical data and a known 5-year outcome who were not already sick at age 80: healthy (0, blue) or sick (1, red). The cross-validation model is applied to all healthy patients (not already sick with the disease) with available clinical data, including patients with a missing outcome. The top-scoring patients, up to the expected number of patients who would become sick between ages 80 and 85 as computed from population cumulative incidence rates (Dprob(80,85)), are considered positive cases (class = 1) for the age-75 disease model. At 5-year intervals, each younger-age model is trained based on the 5-year outcome and the score from the older-age model. b. 5-year disease prevalence. Shown is the inferred 5-year (short-term) disease prevalence by age (x axis) for low disease score (bottom 5%, blue) and high disease score (top 5%, red). Note that patients with suspected but unverified T2D were excluded from this analysis. c. Alzheimer’s disease. Shown is the 10-year cumulative incidence estimate for Alzheimer’s disease (left) and death (right), stratified by quantile-normalized disease score, for all patients without the disease at age 80 (n = 171827). Error bars indicate 95% confidence intervals. d. Prostate cancer. Shown are the incidence rate and lifelong risk for the prostate cancer disease model. e. Lung cancer. Similar to d, for the lung cancer disease model. f. Model features. Shown are mean values by age for key features contributing to prediction of lifelong disease risk in high- and low-risk patients (top/bottom five percentiles). g. Relative chronic disease risks. Similar to Fig. 4c, patients were separated into high/low risk according to the disease risk listed in each column. Shown is the lifelong risk for the diseases listed in each row. h. Distribution of the age difference between T2D onset and the onset of other diseases. For each newly diagnosed T2D patient, we computed the time difference between T2D onset and the onset of each of the other modeled diseases (if these exist). Shown are boxplots of these time differences stratified by age at first T2D diagnosis. The middle line indicates the median, box limits represent quartiles, and whiskers extend to 1.5× the interquartile range.
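The rule in panel a for choosing disease-model positives, taking as many top scorers as the population incidence predicts will become sick, might look like the following Python sketch. `expected_case_positives` is a hypothetical name, and rounding the expected count to the nearest integer is an assumption; the caption does not say how fractional counts are handled.

```python
from typing import List, Sequence


def expected_case_positives(scores: Sequence[float], dprob: float) -> List[int]:
    """Label the k highest-scoring healthy patients as positives (1), the rest 0.

    k = round(dprob * n), where dprob stands in for Dprob(A + 5, A + 10), the
    population cumulative incidence over the next 5-year window (assumed input).
    """
    if not 0.0 <= dprob <= 1.0:
        raise ValueError("dprob must be a probability in [0, 1]")
    n = len(scores)
    k = int(round(dprob * n))
    # Patient indices from highest to lowest score; Python's sort is stable
    # even with reverse=True, so tied scores keep their input order.
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    positives = set(order[:k])
    return [1 if i in positives else 0 for i in range(n)]
```

For the age-75 model, `scores` would hold the age-80 disease scores of patients still free of the disease at 80, and `dprob` would be Dprob(80,85).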

Extended Data Fig. 4 MLDP disease models features.

Heatmap of mean normalized feature value for patients age 55 for each disease stratified by quantile normalized disease risk (x-axis).
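A heatmap of this kind reduces to binning patients by the rank of their risk score and averaging each feature within each bin. The Python sketch below shows that reduction for one feature; the function names, the default of 20 bins and skipping missing values are assumptions rather than details taken from the paper, and the age/sex normalization of the features is not shown.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Optional, Sequence


def quantile_bins(scores: Sequence[float], n_bins: int = 20) -> List[int]:
    """Assign each score to one of n_bins equal-count bins by rank (0 = lowest)."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    bins = [0] * n
    for rank, i in enumerate(order):
        # rank runs 0..n-1, so rank * n_bins // n runs 0..n_bins-1
        bins[i] = rank * n_bins // n
    return bins


def mean_feature_by_bin(
    scores: Sequence[float],
    feature: Sequence[Optional[float]],
    n_bins: int = 20,
) -> Dict[int, float]:
    """Mean feature value per score bin; missing (None) feature values are skipped."""
    grouped: Dict[int, List[float]] = defaultdict(list)
    for b, x in zip(quantile_bins(scores, n_bins), feature):
        if x is not None:
            grouped[b].append(x)
    return {b: mean(xs) for b, xs in sorted(grouped.items())}
```

Repeating `mean_feature_by_bin` for each feature (rows) gives one disease's heatmap matrix.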

Extended Data Fig. 5 Multivariate MLDP score UMAP projection.

Quantile normalized longevity scores and disease scores were projected via UMAP for each age separately. Each dot represents a single patient. Only patients that were not diagnosed with any of the main diseases are shown.

Extended Data Fig. 6 Strongly healthy patients.

a. Disease predispositions by longevity in strongly healthy patients. Similar to Extended Data Fig. 2c, for patients with a low disease risk score (<0.5) for all modeled diseases who were not diagnosed with cancer. n = 157239 at age 30, 134983 at age 40, 91796 at age 50, 86081 at age 60 and 53596 at age 70. Error bars indicate 95% confidence intervals. b. Model features raw values. Similar to Fig. 4g, showing raw lab values (not normalized for age and sex). c. High longevity score survival in strongly healthy patients. Shown are the Kaplan-Meier 10-year survival curves for the two best-scoring longevity groups, top 95–100% (yellow) and 90–95% (turquoise) longevity score, by patient age (x axis) in strongly healthy patients. n = 23168/ 21778/ 16058/ 12236/ 8872 patients at age 60/ 65/ 70/ 75/ 80. Error bars indicate 95% confidence intervals.
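For readers unfamiliar with the Kaplan-Meier estimator used in panel c, a minimal Python version follows. It computes only the survival step function from follow-up times and event flags; the 95% confidence intervals in the figure would need an additional variance estimate (for example Greenwood's formula), which is omitted. The function names are illustrative.

```python
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float], events: Sequence[bool]) -> List[Tuple[float, float]]:
    """Kaplan-Meier survival estimate as (time, S(t)) at each distinct event time.

    events[i] is True if patient i died at times[i], False if censored then.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    steps: List[Tuple[float, float]] = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Everyone whose follow-up ends at t (death or censoring) is at risk at t.
        while i < len(data) and data[i][0] == t:
            deaths += int(data[i][1])
            removed += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / n_at_risk
            steps.append((t, survival))
        n_at_risk -= removed
    return steps


def survival_at(steps: List[Tuple[float, float]], t: float) -> float:
    """Read S(t) off the step function (1.0 before the first death)."""
    s = 1.0
    for time, value in steps:
        if time > t:
            break
        s = value
    return s
```

A 10-year survival probability, as plotted in panel c, would be `survival_at(steps, 10)` with follow-up measured in years.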

Extended Data Fig. 7 MLDP performance using 10-year follow-ups.

a. 10-year lifelong risk models. Similar to Fig. 4c, for models trained using a 10-year follow-up time. b. 10-year model features. Similar to Extended Data Fig. 3f, for models with a 10-year follow-up time. c. Comparing 5-year to 10-year disease model predictions. Shown are the 10-year estimated cumulative incidences of the diseases (rows) for selected ages (columns), stratified by disease score (quantile), as computed from the 5-year (n = 2176596/ 1935991/ 1597164/ 1789586/ 1469513 for age 30/ 40/ 50/ 60/ 70) and 10-year (n = 2176591/ 1935234/ 1595472/ 1786253/ 1463280 for age 30/ 40/ 50/ 60/ 70) cross-validation models on CHSDB. Error bars indicate 95% confidence intervals.

Extended Data Fig. 8 CVD risk model validation.

a,b. Framingham and SCORE2 score comparison. Framingham scores and SCORE2 were computed on UKBB patients. Shown are the 10-year cumulative incidence estimates (with death as a competing risk) for CVD by age, stratified by score quantiles (x axis), for Framingham and SCORE2 (dashed lines) and for our MLDP CVD score (solid line). n = 49486/ 55673/ 63991/ 82738/ 60391 for patients at age 50/ 55/ 60/ 65/ 70. Error bars indicate 95% confidence intervals.
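Cumulative incidence "with death as a competing risk" differs from one minus a Kaplan-Meier curve: deaths end follow-up without counting as disease events, and they also stop the patient from ever developing the disease. A minimal Python sketch of the standard nonparametric (Aalen-Johansen) cumulative incidence estimator follows; the integer cause codes are an assumed encoding, and this is an illustration rather than the code used for the figure.

```python
from typing import Sequence


def cumulative_incidence(
    times: Sequence[float],
    causes: Sequence[int],
    cause_of_interest: int,
    horizon: float,
) -> float:
    """Nonparametric (Aalen-Johansen) cumulative incidence of one cause by `horizon`.

    causes[i]: 0 = censored, any other integer = event type
    (assumed coding, e.g. 1 = CVD onset, 2 = death as the competing risk).
    """
    data = sorted(zip(times, causes))
    n_at_risk = len(data)
    event_free = 1.0  # overall event-free survival just before the current time
    cif = 0.0
    i = 0
    while i < len(data) and data[i][0] <= horizon:
        t = data[i][0]
        d_any = d_cause = removed = 0
        while i < len(data) and data[i][0] == t:
            cause = data[i][1]
            if cause != 0:
                d_any += 1
                if cause == cause_of_interest:
                    d_cause += 1
            removed += 1
            i += 1
        # Probability of reaching t event-free, times the cause-specific hazard at t.
        cif += event_free * d_cause / n_at_risk
        # Any event (disease or death) ends event-free survival.
        event_free *= 1.0 - d_any / n_at_risk
        n_at_risk -= removed
    return cif
```

The incidences of all causes plus the remaining event-free probability sum to 1, which is a quick sanity check on the estimator.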

Extended Data Fig. 9 Longevity heritability.

a. Parent-child correlations of disease risk. Disease risks were rank-based inverse normal transformed per disease and age. Parent-child correlation was estimated as the slope of the linear regression of the child's value on the average of the parents' values, whereby parents were 15 years older than offspring. Shown are the estimates of parent-child correlations for each disease ± standard error, stratified by age (x axis), computed for male offspring (grey), female offspring (pink) and combined (black). n = 430347/ 376405/ 281894/ 173863/ 93739/ 45398/ 15364/ 2321 for T2D, n = 452927/ 400504/ 303569/ 189851/ 103988/ 50925/ 17291/ 2642 for CKD, n = 451686/ 398285/ 300855/ 187115/ 101900/ 49838/ 16914/ 2575 for COPD, n = 456341/ 404196/ 306780/ 192024/ 105297/ 51745/ 17672/ 2702 for CVD, and n = 452001/ 398489/ 300724/ 186539/ 101144/ 49292/ 16734/ 2559 for LD at ages 30/ 35/ 40/ 45/ 50/ 55/ 60/ 65. b. Parent-child correlations of longevity. Similar to a, for patients’ longevity scores, computed on the entire population (left, n = 536348/ 516826/ 442573/ 333617/ 238153/ 151855/ 67968/ 14587 for ages 30/ 35/ 40/ 45/ 50/ 55/ 60/ 65) and on strongly healthy offspring (right, n = 229886/ 242594/ 202371/ 141002/ 91847/ 54022/ 22286/ 4405 for ages 30/ 35/ 40/ 45/ 50/ 55/ 60/ 65). c. Longevity SNP allele frequencies. Similar to Fig. 6a (right), showing a breakdown of allele frequencies for all 15 predisposition groups as shown in Fig. 5d.
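The two steps named in panel a, a rank-based inverse normal transformation followed by regressing offspring values on the mean of both parents, can be sketched with the Python standard library. The Blom offset c = 3/8 and average ranks for ties are common conventions assumed here, since the caption does not state which were used; the 15-year parent/offspring age offset is a data-selection step and is not modeled.

```python
from statistics import NormalDist, mean
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def rank_inverse_normal(values: Sequence[float], c: float = 3 / 8) -> List[float]:
    """Rank-based inverse normal transform: Phi^-1((rank - c) / (n - 2c + 1))."""
    n = len(values)
    std_normal = NormalDist()
    return [std_normal.inv_cdf((r - c) / (n - 2 * c + 1)) for r in average_ranks(values)]


def midparent_slope(
    child: Sequence[float], mother: Sequence[float], father: Sequence[float]
) -> float:
    """OLS slope of the child's value regressed on the mean of both parents' values."""
    midparent = [(m + f) / 2 for m, f in zip(mother, father)]
    mx, my = mean(midparent), mean(child)
    sxy = sum((x - mx) * (y - my) for x, y in zip(midparent, child))
    sxx = sum((x - mx) ** 2 for x in midparent)
    if sxx == 0:
        raise ValueError("mid-parent values have zero variance")
    return sxy / sxx
```

In the setting of the caption, the transform would be applied separately per disease and age before the slope is computed.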

Extended Data Fig. 10 Longevity GWAS.

a. Alkaline phosphatase distribution by ALPL genotype. Shown is the cumulative distribution of alkaline phosphatase in patients by genotype at the ALPL rs149344982 SNP. b. Parents’ survival by ALPL variant and alk-phos values. Shown are the estimates of parents surviving to age 85, stratified by ALPL SNP variant (left, n = 181972/4942 for genotype 0/1) and by alk-phos lab values (right, n = 70668/108379 with low/high alk-phos). Error bars indicate 95% confidence intervals. c. Parents’ survival by longevity score, excluding the alk-phos contribution. Similar to b, also stratifying by longevity score, computed using a model that did not consider the alkaline phosphatase feature. n = 30945/863 for genotype 0/1 in low longevity score, n = 64518/1790 for genotype 0/1 in mid longevity score and n = 74586/1983 for genotype 0/1 in high longevity score. Error bars indicate 95% confidence intervals. d. QQ (quantile-quantile) plot of the GWAS on the longevity score with the five disease predisposition scores introduced as confounders. Shown are observed versus expected GWAS p-values on the –log10 scale. e. Variants with significant association to longevity score that were inconsistent with parents’ survival. Shown are the six SNPs, oriented by longevity score (high mortality in red), annotated by the beta estimate for the longevity model with disease scores as covariates. GWAS results for lab tests were obtained from the Neale lab (65), and significant associations (P < 5e-8) are depicted with red indicating variant high values and blue indicating low values, as inferred from GWAS beta estimates. Cox proportional hazards and Kaplan-Meier survival estimates were computed for all SNPs on mother, father and both parents combined. Also shown are Kaplan-Meier (KM) p-values.
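The expected values on the x axis of a GWAS QQ plot such as panel d come from the uniform distribution that p-values follow under the null hypothesis. The Python sketch below pairs sorted observed p-values with uniform quantiles and counts variants below the 5e-8 threshold used in panel e. Using (i - 0.5)/n as the i-th expected quantile is one common convention and an assumption here, as are the helper names.

```python
import math
from typing import List, Sequence, Tuple


def qq_points(pvalues: Sequence[float]) -> List[Tuple[float, float]]:
    """(expected, observed) -log10 p-value pairs, most significant first."""
    n = len(pvalues)
    points = []
    for i, p in enumerate(sorted(pvalues), start=1):
        if not 0.0 < p <= 1.0:
            raise ValueError(f"invalid p-value: {p}")
        expected = (i - 0.5) / n  # i-th uniform quantile under the null
        points.append((-math.log10(expected), -math.log10(p)))
    return points


def genome_wide_hits(pvalues: Sequence[float], threshold: float = 5e-8) -> int:
    """Number of variants below the conventional genome-wide significance threshold."""
    return sum(1 for p in pvalues if p < threshold)
```

Points lying on the diagonal indicate null-like behavior; systematic upward deviation across the whole range, rather than only in the tail, would suggest residual confounding.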

Supplementary information

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Cohen, N.M., Lifshitz, A., Jaschek, R. et al. Longitudinal machine learning uncouples healthy aging factors from chronic disease risks. Nat Aging 4, 129–144 (2024). https://doi.org/10.1038/s43587-023-00536-5

  • DOI: https://doi.org/10.1038/s43587-023-00536-5
