Multistate models can be effectively used to characterise the natural history of cancer. Inference from such models has previously been useful for setting screening policies.
We introduce the basic elements of multistate models and the challenges of applying these models to cancer data. Through simulation studies, we examine (1) the impact of assuming time-homogeneous Markov transition intensities when the intensities depend on the time since entry to the current state (i.e., the process is time-inhomogeneous semi-Markov) and (2) the effect on precancer risk estimation when observation times depend on an unmodelled intermediate disease state.
In the settings we examined, misspecifying a time-inhomogeneous semi-Markov process as a time-homogeneous Markov process resulted in biased estimates of the mean sojourn times. When screen-detection of the intermediate disease led to more frequent future screening assessments, there was minimal bias compared with when screen-detection of the intermediate disease led to less frequent screening.
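The first kind of misspecification can be illustrated with a toy simulation. The sketch below is not the authors' simulation design: it assumes a single state with Weibull sojourn times (so the exit hazard depends on time since entry), equally spaced screens at which exit is detected without error, and a fitted model with a constant exit rate. The function names, the Weibull parameters and the screening interval are all illustrative assumptions.

```python
import math
import random


def simulate_screen_counts(n, shape, scale, interval, seed=0):
    """For each subject, return the index of the first screen at or after exit.

    Sojourn times are Weibull, so the hazard of leaving the state changes
    with time since entry (a semi-Markov feature). Illustrative only.
    """
    rng = random.Random(seed)
    counts = []
    for _ in range(n):
        # random.weibullvariate takes (scale, shape) in that order.
        sojourn = rng.weibullvariate(scale, shape)
        counts.append(max(1, math.ceil(sojourn / interval)))
    return counts


def exponential_mean_sojourn(counts, interval):
    """Mean sojourn time estimated under a constant exit rate.

    With equally spaced screens and every exit eventually observed, the
    interval-censored exponential likelihood is geometric in the screen
    index, so the maximum likelihood estimate has a closed form.
    """
    n, total = len(counts), sum(counts)
    if total == n:
        raise ValueError("all exits fall in the first interval; rate is not estimable")
    rate = -math.log(1.0 - n / total) / interval
    return 1.0 / rate


# Assumed values: shape 2 (hazard rising with time in state), scale 2,
# screens every 3 time units.
SHAPE, SCALE, INTERVAL = 2.0, 2.0, 3.0
true_mean = SCALE * math.gamma(1.0 + 1.0 / SHAPE)
screen_counts = simulate_screen_counts(20000, SHAPE, SCALE, INTERVAL, seed=1)
fitted_mean = exponential_mean_sojourn(screen_counts, INTERVAL)
print(f"true mean sojourn:   {true_mean:.3f}")
print(f"fitted mean sojourn: {fitted_mean:.3f}")
```

Under these assumed values the constant-rate fit underestimates the true mean sojourn time. The size and direction of the bias depend on the Weibull shape and on the screening interval, so this toy result should not be read as a reproduction of the paper's findings.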
Multistate models are useful for estimating parameters governing the process dynamics in cancer such as transition rates, sojourn time distributions, and absolute and relative risks. As with most statistical models, care should be taken to use appropriate specifications and assumptions to avoid incorrect inference.
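As a small worked example of how these quantities relate, the sketch below assumes a progressive three-state, time-homogeneous Markov model (hypothetically labelled healthy, precancer and cancer) with constant transition rates. The state labels, rate values and function names are illustrative assumptions, not estimates or code from the paper.

```python
import math


def mean_sojourn(rate):
    """Mean time spent in a state with a constant exit rate."""
    return 1.0 / rate


def state_probabilities(rate_12, rate_23, t):
    """Occupancy probabilities at time t for a subject who starts in state 1.

    Progressive model 1 -> 2 -> 3 with constant rates; the closed form
    below requires rate_12 != rate_23.
    """
    if rate_12 == rate_23:
        raise ValueError("closed form used here assumes distinct rates")
    p1 = math.exp(-rate_12 * t)
    p2 = rate_12 / (rate_23 - rate_12) * (
        math.exp(-rate_12 * t) - math.exp(-rate_23 * t)
    )
    p3 = 1.0 - p1 - p2  # absolute risk of reaching state 3 by time t
    return p1, p2, p3


# Assumed rates per year: healthy -> precancer 0.1, precancer -> cancer 0.5.
healthy, precancer, cancer = state_probabilities(0.1, 0.5, t=5.0)
print(f"mean precancer sojourn: {mean_sojourn(0.5):.1f} years")
print(f"5-year absolute risk of cancer: {cancer:.4f}")
```

Relative risks can then be formed as ratios of such absolute risks between groups with different rates. Semi-Markov or time-inhomogeneous processes do not have this simple closed form.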
All code used for data simulation and analysis is available on GitHub (https://github.com/liccheung/multistate.model.simulations).
This work utilised the computational resources of the NIH HPC Biowulf cluster (http://hpc.nih.gov).
This work was funded in part by the Intramural Research Program of the US National Institutes of Health (NIH)/National Cancer Institute.
The authors declare no competing interests.
Ethics approval and consent to participate
The research was carried out on simulated data. No ethics approval was necessary.
Consent to publish
Cite this article
Cheung, L.C., Albert, P.S., Das, S. et al. Multistate models for the natural history of cancer progression. Br J Cancer 127, 1279–1288 (2022). https://doi.org/10.1038/s41416-022-01904-5