A survival model generalized to regression learning algorithms

Abstract

Survival prediction is an important problem encountered widely in industry and medicine. Despite the explosion of artificial intelligence technologies, no unified method allows any type of regression learning algorithm to be applied to a survival prediction problem. Here, we present a statistical modeling method that generalizes to all types of regression learning algorithm, including deep learning. We show its empirical advantage when it is applied to traditional survival problems. We demonstrate its expanded applications with different types of regression learning algorithm, such as gradient boosted trees, convolutional neural networks and recurrent neural networks. Additionally, we demonstrate its application to clinical informatics data, pathological images and the hardware industry. We expect that this algorithm will be widely applicable to diverse types of survival data, including discrete data types and data suited to deep learning, such as those with temporal or spatial continuity.
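The abstract describes turning a right-censored survival problem into an ordinary regression problem, but this preview does not say how censored records become regression targets. The sketch below is a hedged illustration of one plausible scheme in that spirit; it is not the authors' complete rank algorithm, whose code is in the GuanRank_All repository. It assumes data given as observed times plus event flags (1 = failure observed, 0 = censored) and independent censoring. It estimates a Kaplan–Meier curve, treats each censored subject's unknown failure time as following that curve conditional on surviving past the censoring time, and scores each subject by the average probability that it fails before every other subject. All function names (`kaplan_meier`, `surv_at`, `conditional_distribution`, `prob_earlier`, `rank_target`) are hypothetical.

```python
import numpy as np

# Hypothetical sketch, NOT the published complete rank algorithm.
# Assumption: censoring is independent, so a censored subject's unknown failure
# time follows the Kaplan-Meier curve conditional on surviving past censoring.


def kaplan_meier(times, events):
    """Return the distinct event times and the KM survival S(t) right after each."""
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=bool)
    event_times = np.unique(times[events])
    surv = np.empty(len(event_times))
    s = 1.0
    for k, u in enumerate(event_times):
        at_risk = np.sum(times >= u)            # still under observation at u
        deaths = np.sum((times == u) & events)  # failures observed exactly at u
        s *= 1.0 - deaths / at_risk
        surv[k] = s
    return event_times, surv


def surv_at(t, event_times, surv):
    """Step-function lookup of S(t); S is 1 before the first event time."""
    idx = np.searchsorted(event_times, t, side="right") - 1
    return 1.0 if idx < 0 else surv[idx]


def conditional_distribution(t, event, event_times, surv):
    """Discrete distribution (support, mass) of one subject's true failure time."""
    if event:  # observed failure: point mass at the observed time
        return np.array([float(t)]), np.array([1.0])
    s0 = surv_at(t, event_times, surv)
    mask = event_times > t
    later, s_later = event_times[mask], surv[mask]
    if s0 == 0.0 or len(later) == 0:  # nothing observed later: all mass at +inf
        return np.array([np.inf]), np.array([1.0])
    s_before = np.concatenate(([s0], s_later[:-1]))  # S just before each jump
    mass = (s_before - s_later) / s0
    tail = s_later[-1] / s0  # survival beyond the last event time goes to +inf
    return np.append(later, np.inf), np.append(mass, tail)


def prob_earlier(dist_i, dist_j):
    """P(T_i < T_j) for independent discrete distributions; ties count 1/2."""
    (ui, wi), (uj, wj) = dist_i, dist_j
    joint = np.outer(wi, wj)
    earlier = ui[:, None] < uj[None, :]
    tie = ui[:, None] == uj[None, :]
    return float(np.sum(joint[earlier]) + 0.5 * np.sum(joint[tie]))


def rank_target(times, events):
    """Map right-censored (time, event) pairs to one regression target each.

    Score = mean over the other subjects of P(this subject fails first),
    so higher values mean higher risk.
    """
    event_times, surv = kaplan_meier(times, events)
    dists = [conditional_distribution(t, e, event_times, surv)
             for t, e in zip(times, events)]
    n = len(dists)
    scores = np.zeros(n)
    for i in range(n):
        for j in range(n):
            if i != j:
                scores[i] += prob_earlier(dists[i], dists[j])
    return scores / (n - 1)
```

For example, `rank_target([1, 2, 3], [1, 1, 1])` returns scores of 1.0, 0.5 and 0.0. Because the scores lie in [0, 1], they could be passed as the target of any regressor (a gradient boosted tree, a CNN or an LSTM). The pairwise loop is quadratic in the number of subjects, which is acceptable for a sketch but would need vectorizing for large cohorts.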

Fig. 1: Generalizing right-censored data analysis to a regression problem by complete rank.
Fig. 2: Simulation experiments for complete rank in different scenarios.
Fig. 3: Building complete rank with deep learning and LightGBM to predict cancer survival using histological images and clinical information.
Fig. 4: Building complete rank with RNNs (LSTM) to predict disk failure using time-series data.

Data availability

Simulated data are available from GitHub (https://github.com/GuanLab/GuanRank_All). TCGA data (ref. 15) are third party and can be downloaded through the Genomic Data Commons (GDC) Data Portal (ref. 19). Backblaze disk failure data are third party and can be downloaded from the Backblaze hard drive data and stats website (ref. 16). Source data are provided with this paper.

Code availability

Source code is available at https://github.com/GuanLab/GuanRank_All (ref. 20). No restriction is placed on access.

References

  1. Cox, D. R. Regression models and life-tables. J. R. Stat. Soc. B 34, 187–202 (1972).

  2. Ishwaran, H. The effect of splitting on random forests. Mach. Learn. 99, 75–118 (2015).

  3. Ishwaran, H., Kogalur, U. B., Blackstone, E. H. & Lauer, M. S. Random survival forests. Ann. Appl. Stat. 2, 841–860 (2008).

  4. Ishwaran, H., Kogalur, U. B., Chen, X. & Minn, A. J. Random survival forests for high-dimensional data. Stat. Anal. Data Min. 4, 115–132 (2011).

  5. Kalbfleisch, J. D. & Prentice, R. L. in The Statistical Analysis of Failure Time Data 328–374 (Wiley, 2011); https://doi.org/10.1002/9781118032985.ch11

  6. Wei, L. J. The accelerated failure time model: a useful alternative to the Cox regression model in survival analysis. Stat. Med. 11, 1871–1879 (1992).

  7. Aitkin, M. & Clayton, D. The fitting of exponential, Weibull and extreme value distributions to complex censored survival data using GLIM. J. R. Stat. Soc. C 29, 156–163 (1980).

  8. Lee, C., Yoon, J. & van der Schaar, M. Dynamic-DeepHit: a deep learning approach for dynamic survival analysis with competing risks based on longitudinal data. IEEE Trans. Biomed. Eng. 67, 122–133 (2020).

  9. Quirós, A., de Prado, A. P., Montoya, N. & Hernández, J. Multi-state models for the analysis of survival studies in biomedical research: an alternative to composite endpoints. In Proc. 13th International Joint Conference on Biomedical Engineering Systems and Technologies (eds De Maria, E. et al.) 194–199 (BIOSTEC, 2020); https://doi.org/10.5220/0009105701940199

  10. Cui, L. et al. A deep learning-based framework for lung cancer survival analysis with biomarker interpretation. BMC Bioinf. 21, 1–14 (2020).

  11. Ren, J., Singer, E. A., Sadimin, E., Foran, D. J. & Qi, X. Statistical analysis of survival models using feature quantification on prostate cancer histopathological images. J. Pathol. Inform. 10, 30 (2019).

  12. Li, H. et al. Deep convolutional neural networks for imaging data based survival analysis of rectal cancer. Proc. IEEE Int. Symp. Biomed. Imaging 2019, 846–849 (2019).

  13. Ching, T., Zhu, X. & Garmire, L. X. Cox-nnet: an artificial neural network method for prognosis prediction of high-throughput omics data. PLoS Comput. Biol. 14, e1006076 (2018).

  14. Harden, J. J. & Kropko, J. Simulating duration data for the Cox model. Political Sci. Res. Methods 7, 921–928 (2019).

  15. Weinstein, J. N. et al. The Cancer Genome Atlas Pan-Cancer analysis project. Nat. Genet. 45, 1113–1120 (2013).

  16. Backblaze. Hard Drive Data and Stats 2013–2015; https://www.backblaze.com/b2/hard-drive-test-data.html

  17. Schuster, M. & Paliwal, K. K. Bidirectional recurrent neural networks. IEEE Trans. Signal Process. 45, 2673–2681 (1997).

  18. Swindell, W. R. Accelerated failure time models provide a useful statistical framework for aging research. Exp. Gerontol. 44, 190–200 (2009).

  19. National Cancer Institute. Genomic Data Commons Data Portal; https://portal.gdc.cancer.gov/

  20. Guan, Y. GuanRank code (version 1.0.0) (Zenodo, 2021); https://doi.org/10.5281/zenodo.4751702

Acknowledgements

Y.G. is supported by the NIH (R35-GM133346) and the NSF (#1452656).

Author information

Contributions

Y.G. conceived and implemented the complete rank algorithm, simulation and LSTM experiments and wrote the manuscript. D.Y. created the figures. H.L. and K.L. carried out cancer image experiments. D.Z., C.Y. and P.Z. performed LSTM experiments. All authors read and approved the manuscript.

Corresponding author

Correspondence to Yuanfang Guan.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information Nature Computational Science thanks the anonymous reviewers for their contribution to the peer review of this work. Handling editor: Fernando Chirigati, in collaboration with the Nature Computational Science team.

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary figures.

Source data

Source Data Fig. 2

Statistical source data for Fig. 2b–g.

Source Data Fig. 3

Statistical source data for Fig. 3c.

Source Data Fig. 4

Statistical source data for Fig. 4c.

About this article

Cite this article

Guan, Y., Li, H., Yi, D. et al. A survival model generalized to regression learning algorithms. Nat Comput Sci 1, 433–440 (2021). https://doi.org/10.1038/s43588-021-00083-2

  • DOI: https://doi.org/10.1038/s43588-021-00083-2
