Introduction

Multiple myeloma (MM) is a malignant transformation of plasma cells that accounted for an estimated 34,920 diagnoses and 12,410 deaths in the United States in 2021 alone1. MM is consistently preceded by a premalignant condition, monoclonal gammopathy of undetermined significance (MGUS)2. A prior cohort study of a predominantly white population in Olmsted County, Minnesota, suggested that MGUS progresses to MM at a rate of approximately 1% per year, with a 20-year cumulative risk of 18%3. However, because there are no population-based screening or treatment recommendations for MGUS4,5, most MGUS cases are detected incidentally6, and our understanding of MGUS and MM is therefore informed primarily by clinical studies. Further work is needed to understand the natural history of MM and how it varies across age, gender, and race/ethnicity.

One unresolved aspect of the natural history of MM concerns the observed disparities in MM incidence by gender and race/ethnicity7. The burden of MM is greater among men than among women and among non-Hispanic Black people than among non-Hispanic white people. Additionally, both men and non-Hispanic Black people have a higher prevalence of MGUS8,9,10 and develop MM earlier than women and non-Hispanic white people11,12. However, it remains unclear whether the increased incidence of MM can be attributed to an increased incidence of MGUS, to an increased rate of progression from MGUS to MM, or to a combination of both7. A better understanding of the causes of these disparities could guide future screening and treatment strategies aimed at reducing them13.

Prior studies have aimed to understand racial and ethnic disparities in MM incidence10,11. Landgren et al.9 leveraged the National Health and Nutrition Examination Survey and found that high-risk features of MGUS were more common among non-Hispanic Black people than among non-Hispanic white people, suggesting an increased rate of progression from MGUS to MM among non-Hispanic Black people. However, this study lacked long-term outcomes on progression to MM and was therefore unable to confirm this finding. Using the Veterans Health Administration database, Landgren et al.10 examined veterans diagnosed with MGUS during 1980–1996 and found no difference in the rate of progression from MGUS to MM between non-Hispanic white people and non-Hispanic Black people. They therefore concluded that the higher MM incidence among non-Hispanic Black people could be explained by an increased incidence of MGUS. A more recent analysis using the same database demonstrated that, among patients diagnosed with MGUS, progression to MM occurred at a younger age among non-Hispanic Black people than among non-Hispanic white people, which may be attributable to a higher incidence of MGUS, a higher rate of progression from MGUS to MM, or both in non-Hispanic Black people13. Reconciling the conclusions of these two studies is challenging, and, importantly, both studies relied on study populations with clinically diagnosed MGUS. Because MGUS is primarily asymptomatic, examining progression to MM only among patients with incidentally diagnosed MGUS excludes those with undiagnosed MGUS and may therefore provide an incomplete picture of the natural history of MM.

To address this, we leveraged two nationally representative U.S. data sources: the National Health and Nutrition Examination Survey (NHANES) for MGUS prevalence and the Surveillance, Epidemiology, and End Results (SEER) program for MM incidence. Importantly, NHANES tested all study participants for MGUS using serum protein electrophoresis irrespective of underlying comorbidities, providing more representative measures of MGUS prevalence than would be obtained from clinically diagnosed MGUS14,15,16. We constructed a compartmental model of the natural history of MM and fit this model to these MGUS prevalence and MM incidence data. We then used the fitted model to isolate the contributions of age, gender, and race/ethnicity to the observed disparities in MM incidence. Finally, we predicted how the preclinical dwell time, defined as the time from MGUS onset to MM onset, is likely to vary by age, gender, and race/ethnicity.

Results

Model fit

The five independent Markov chain Monte Carlo (MCMC) chains used to fit the mathematical model of the natural history of MM were well mixed (Fig. S2), and the Gelman-Rubin statistic for each parameter was 1.0 (Table S2), supporting convergence to the posterior distribution.
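For readers less familiar with this diagnostic, the Gelman-Rubin statistic compares the variance between chains with the variance within chains; values close to 1.0 indicate that the chains are sampling the same distribution. The sketch below implements the classic (non-split) potential scale reduction factor in NumPy for a single parameter. The text does not state which variant or software was used, so this formulation, the function name `gelman_rubin`, and the synthetic draws are assumptions for illustration.

```python
import numpy as np


def gelman_rubin(chains: np.ndarray) -> float:
    """Classic (non-split) potential scale reduction factor for one parameter.

    chains: array of shape (m, n) holding m chains with n post-warm-up draws each.
    """
    n = chains.shape[1]
    chain_means = chains.mean(axis=1)
    # Between-chain variance B (scaled by n) and mean within-chain variance W
    b = n * chain_means.var(ddof=1)
    w = chains.var(axis=1, ddof=1).mean()
    # Pooled estimate of the marginal posterior variance
    var_hat = (n - 1) / n * w + b / n
    return float(np.sqrt(var_hat / w))


# Hypothetical draws: five chains sampling the same distribution (well mixed)
rng = np.random.default_rng(1)
mixed = rng.normal(size=(5, 10_000))
# Five chains centred on different values (not mixed)
stuck = mixed + np.arange(5)[:, None]

print(f"mixed: {gelman_rubin(mixed):.3f}, stuck: {gelman_rubin(stuck):.3f}")
```

Chains that have not mixed inflate the between-chain variance and push the statistic well above 1.0; a common rule of thumb treats values below about 1.1 as acceptable.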

The fitted model captured the trends in MGUS prevalence and MM incidence across age, gender, and race/ethnicity (Fig. 1). Consistent with the data, the fitted model predicted higher MGUS prevalence and MM incidence among non-Hispanic Black people than among non-Hispanic white people and among men than among women. Furthermore, the fitted model reproduced the data with appropriate uncertainty: the 95% posterior prediction interval contained all but two data points, and most data points fell near the posterior median prediction. Taken together, these results show that our fitted model can explain the patterns in the data.

Fig. 1: Comparison of fitted model to SEER and NHANES data.

The model predictions for age-stratified MGUS prevalence and MM incidence are compared against the continuous NHANES 1999–2004 MGUS data (top row) and SEER 2010 MM incidence data (bottom row) for non-Hispanic white men (first column), non-Hispanic white women (second column), non-Hispanic Black men (third column), and non-Hispanic Black women (fourth column). Black points are the data used to fit the model, and the black vertical segments are the 95% confidence intervals on the data. The 95% confidence intervals were calculated for each subpopulation using the following sample sizes: n = 1,735 non-Hispanic white men, n = 1,703 non-Hispanic white women, n = 454 non-Hispanic Black men, and n = 463 non-Hispanic Black women. The solid line is the median posterior model prediction, the darker shaded area is the 50% posterior prediction interval (PPI), and the lighter shaded area is the 95% PPI.

Contributions of age, gender, and race/ethnicity to the development of MGUS and progression to MM

Independent of gender and race/ethnicity, the rates of MGUS development and progression to MM varied nonlinearly with age. The rate of MGUS development among healthy individuals increased monotonically with age (Fig. 2a, black line), nearly tripling from 0.0012 (95% credible interval (CI): 0.00099–0.0015) yr−1 at age 60 to 0.0034 (95% CI: 0.0022–0.0049) yr−1 at age 80. By comparison, the rate of progression to MM among MGUS-positive individuals increased monotonically up to age 71 (95% CI: 67–77) and then decreased monotonically. Modeling the logarithm of the rate of progression to MM as a quadratic function of age allowed us to capture the declines in MM incidence seen in the 80–84 and 85+ age groups in the SEER data (Fig. 1, bottom row). Supplementary analyses found that including the quadratic term produced a better fit to the data, on the basis of the deviance information criterion (DIC), than an alternative model without the quadratic term (see Supplementary Information for more details).
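Because the age dependence of \(\log\lambda_{\mathrm{MM}}\) is quadratic (see the expression for \(\lambda_{\mathrm{MM}}(a,s,r)\) in the Methods, where \(\gamma_{\mathrm{MM}}\) denotes its intercept), the age of peak progression follows directly from the fitted coefficients:

$$\begin{aligned}
\frac{\partial \log\lambda_{\mathrm{MM}}(a,s,r)}{\partial a} &= \beta_{\mathrm{MM},a}+2\,\beta_{\mathrm{MM},a^2}\,a = 0\\
\Longrightarrow\quad a^{*} &= -\frac{\beta_{\mathrm{MM},a}}{2\,\beta_{\mathrm{MM},a^2}},
\end{aligned}$$

which is a maximum whenever \(\beta_{\mathrm{MM},a^2}<0\). Because gender and race/ethnicity enter only as additive terms on the log scale, \(a^{*}\) is the same for every subgroup, which is consistent with a single peak age being reported; a credible interval for \(a^{*}\) can be obtained by evaluating this expression for each posterior sample.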

Fig. 2: Contributions of age, gender, and race/ethnicity to the development of MGUS and progression to MM.

a The isolated effects of age (black), age and female gender (red), and age and non-Hispanic Black race/ethnicity (blue) are shown for the rate of MGUS development (yr−1). b The posterior estimate of the MGUS multiplier is shown for female gender (red) and non-Hispanic Black race/ethnicity (blue). c The isolated effects of age (black), age and female gender (red), and age and non-Hispanic Black race/ethnicity (blue) are shown for the rate of progression to MM (yr−1). d The posterior estimate of the MM multiplier is shown for female gender (red) and non-Hispanic Black race/ethnicity (blue). In (a) and (c), the line is the median estimate, and the shaded region is the 95% credible interval. In (b) and (d), the point is the median estimate, and the vertical line segment is the 95% credible interval. Each 95% credible interval is calculated from n = 50,010 posterior samples. The horizontal dotted line is the reference multiplier of one. Values above the reference multiplier indicate that the covariate increases the rate of development/progression, whereas values below it indicate that the covariate decreases the rate of development/progression.

We estimated the extent to which gender and race/ethnicity modified the age-specific rates of MGUS development and progression to MM. For healthy individuals, female gender reduced the rate of MGUS development by a multiplier of 0.59 (95% CI: 0.43–0.79) (Fig. 2a, red line; Fig. 2b). Conversely, non-Hispanic Black race/ethnicity increased the rate of MGUS development among healthy individuals by a multiplier of 2.0 (95% CI: 1.4–2.7) (Fig. 2a, blue line; Fig. 2b). Because the 95% CIs for both multipliers exclude 1.0, female gender and non-Hispanic Black race/ethnicity significantly affect the rate of MGUS development across all age groups. By comparison, female gender and non-Hispanic Black race/ethnicity did not significantly affect the rate of progression to MM among MGUS-positive individuals (Fig. 2d): the multipliers on progression to MM were 1.1 (95% CI: 0.82–1.6) for female gender and 1.2 (95% CI: 0.84–1.8) for non-Hispanic Black race/ethnicity. Supplementary analyses that modified the relationships of the rates of MGUS development and progression to MM with age, or that used SEER data from 2004, consistently found a significant effect of female gender and non-Hispanic Black race/ethnicity on MGUS development (see the Supplementary Information for more details).
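Because gender and race/ethnicity enter the rate equations as additive terms on the log scale (Eq. (5) in the Methods), each multiplier in Fig. 2b,d is \(e^{\beta}\) for the corresponding coefficient, and its 95% credible interval can be read off the exponentiated posterior draws. A minimal sketch, assuming equal-tailed intervals and using synthetic normal draws in place of the study's posterior samples; `multiplier_summary` and `excludes_one` are hypothetical helper names.

```python
import numpy as np


def multiplier_summary(beta_draws: np.ndarray) -> tuple[float, float, float]:
    """Median and 95% equal-tailed credible interval of exp(beta).

    beta_draws: 1-D array of posterior draws of a log-scale coefficient,
    e.g. beta_MGUS,s. Exponentiating converts it to a rate multiplier.
    """
    multipliers = np.exp(beta_draws)
    lo, med, hi = np.percentile(multipliers, [2.5, 50.0, 97.5])
    return float(med), float(lo), float(hi)


def excludes_one(lo: float, hi: float) -> bool:
    """True when the interval lies entirely above or below the reference multiplier of one."""
    return hi < 1.0 or lo > 1.0


# Synthetic draws for illustration only (not the study's posterior)
rng = np.random.default_rng(0)
beta_female = rng.normal(np.log(0.59), 0.15, size=50_010)
med, lo, hi = multiplier_summary(beta_female)
print(f"{med:.2f} ({lo:.2f}-{hi:.2f}), excludes one: {excludes_one(lo, hi)}")
```

Because exponentiation is monotonic, exponentiating the percentiles of \(\beta\) gives the same interval as taking percentiles of \(e^{\beta}\).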

Duration of the preclinical dwell time

The preclinical dwell time, defined as the time from MGUS onset to MM onset, decreased with increasing age of MGUS onset (Fig. 3). For non-Hispanic white men, the expected preclinical dwell time was 16 (95% CI: 14–17) years at age 50, compared to 1.7 (95% CI: 1.6–1.8) years at age 90. As the age of onset increased, the variation in the expected preclinical dwell time similarly declined. This effect can be explained in part by the proportion of each cohort that survived to develop MM. For MGUS onset at age 50, 20% (95% CI: 16–26%) of non-Hispanic white men were expected to develop MM during their lifetime. By contrast, for MGUS onset at age 90, only 1.7% (95% CI: 1.1–2.6%) of non-Hispanic white men were expected to develop MM prior to death.

Fig. 3: Duration of the preclinical dwell time.

The posterior median (point) and the 95% credible interval (vertical line segment) are shown for the expected preclinical dwell time (i.e., the time from MGUS onset to MM onset) as a function of age for non-Hispanic white men (dark blue), non-Hispanic white women (light blue), non-Hispanic Black men (dark orange), and non-Hispanic Black women (light orange). Each 95% credible interval is calculated from n = 50,010 posterior samples.

We estimated that the preclinical dwell time was affected by the gender and race/ethnicity of each cohort. Female gender was associated with an increased dwell time across all ages of MGUS onset. For example, the expected dwell time at age 50 was 17 (95% CI: 15–19) years for non-Hispanic white women versus 16 (95% CI: 14–17) years for non-Hispanic white men and 16 (95% CI: 14–18) years for non-Hispanic Black women versus 15 (95% CI: 13–16) years for non-Hispanic Black men. Furthermore, independent of gender, non-Hispanic Black race/ethnicity was associated with a shorter preclinical dwell time. For instance, at age 70, the expected dwell times were 6.6 (95% CI: 6.1–7.1) years for non-Hispanic white men compared to 6.1 (95% CI: 5.6–6.5) years for non-Hispanic Black men and 7.5 (95% CI: 6.9–8.2) years for non-Hispanic white women compared to 7.0 (95% CI: 6.4–7.5) years for non-Hispanic Black women.

Discussion

Obtaining a detailed understanding of the drivers of MM disparities is challenging because MGUS, the preclinical state, is asymptomatic and most often detected incidentally2. Therefore, clinical studies that examine progression to MM among patients with diagnosed MGUS exclude those with undiagnosed MGUS, likely biasing their analyses towards MGUS-positive patients with the greatest number of comorbidities10,13. By leveraging nationally representative data on MGUS prevalence14,15,16, our study calibrated a discrete-time, multistate compartmental model of the natural history of MM that allowed us to assess whether a higher incidence of MGUS, a higher rate of progression from MGUS to MM, or both contributed to MM disparities across age, gender, and race/ethnicity.

Prior studies have revealed that male gender and non-Hispanic Black race/ethnicity are risk factors for MGUS and MM, independent of other factors such as age and socioeconomic status8,9,17,18. Our fitted model suggests that these disparities in MM incidence can be explained by an increased incidence of MGUS among healthy men and non-Hispanic Black people. Importantly, we found no statistically significant difference by gender or race/ethnicity in the rate of progression from MGUS to MM, and these results were robust to multiple supplementary analyses that considered alternative models, including one in which MGUS had no effect on mortality, as well as alternative years of SEER MM incidence data. That the disparities in MM incidence can be explained by differential rates of MGUS development suggests that strategies aiming to reduce disparities by gender and race/ethnicity should emphasize interventions that reduce the development of MGUS among high-risk groups. Although we identified an increased rate of MGUS development among men and non-Hispanic Black people, our study cannot provide a mechanistic explanation for this phenomenon. It has been previously suggested that greater background plasma cell activity among non-Hispanic Black people may predispose them to developing MGUS and ultimately MM19, and that mutational signatures may be detectable in the early decades of life20. Alternatively, differences by race/ethnicity may be explained by socio-contextual factors7 and differences in the distribution of known risk factors, such as obesity13. Future investigations that account for these and other hypotheses may help eliminate the practice of essentializing race/ethnicity in cancer risk prediction models21.

Independent of gender and race/ethnicity, we estimated that the rates of developing MGUS and of progressing from MGUS to MM vary nonlinearly with age. The rate of MGUS development increased monotonically with age, suggesting that the observed increase in MGUS prevalence with age reflects a concomitant increase in MGUS incidence. This confirms a finding from a prior modeling study fitted to a predominantly white cohort in Olmsted County, Minnesota, and shows that this finding holds across race/ethnicity and gender22. Furthermore, we estimated that the rate of progression from MGUS to MM peaked at approximately 71 years of age and subsequently declined, which mirrors the observed decline in MM incidence in the oldest age groups and may reflect a subset of older individuals with a more indolent presentation of MGUS and thus a lower overall risk of progression to MM. Taken together, these results suggest that, if implemented, prevention strategies for MGUS may be cost-effective in all age groups, whereas prevention of MM among MGUS-positive individuals, whether through pharmacological management such as metformin23 or aspirin24 or through non-pharmacological management such as weight loss13, may not be cost-effective beyond 71 years of age.

We estimated that the preclinical dwell time, defined as the time from MGUS onset to MM onset, declined nonlinearly with increasing age of MGUS onset. Because a preclinical dwell time is only realized when individuals with MGUS survive long enough to develop MM, we attribute this phenomenon to two competing effects: (1) the rate of progression from MGUS to MM and (2) the baseline mortality rate. The rate of progression from MGUS to MM increases nonlinearly up to age 71, resulting in a concomitant decline in the preclinical dwell time. Additionally, as individuals age, they are subject to a greater competing risk of mortality. This implies that, at older ages of MGUS onset, the individuals who survive to progress to MM do so more quickly, thereby shortening the average preclinical dwell time. The same mechanism explains the apparent differences in the preclinical dwell time between non-Hispanic Black people and non-Hispanic white people: although we identified no statistically significant difference in the rate of progression from MGUS to MM by race/ethnicity, non-Hispanic Black people are subject to higher mortality, resulting in shorter preclinical dwell times than non-Hispanic white people.
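The competing-risks argument can be made concrete. With constant hazards of progression \(\lambda\) and death \(\mu\) among MGUS-positive individuals, the time to the first event is exponential with rate \(\lambda+\mu\) regardless of which event occurs, so the expected dwell time among those who progress is \(1/(\lambda+\mu)\): raising \(\mu\) shortens it even when \(\lambda\) is unchanged. The sketch below computes this conditional expectation numerically for arbitrary age-dependent hazards. It is an illustration under stated assumptions (hazards treated as piecewise constant over small steps, hypothetical rates, and a hypothetical function `expected_dwell_time`), not necessarily how the study computed Fig. 3.

```python
import math
from typing import Callable

Hazard = Callable[[float], float]  # maps age (years) to a rate (yr^-1)


def expected_dwell_time(onset_age: float, lam_mm: Hazard, mu_mgus: Hazard,
                        max_age: float = 120.0, dt: float = 0.01) -> tuple[float, float]:
    """Return (expected MGUS-to-MM time among those who progress, lifetime MM risk).

    Hazards are treated as constant within each step of length dt, evaluated at the
    step midpoint; an MGUS-positive individual leaves the MGUS state either by
    progressing to MM (rate lam_mm) or by dying (rate mu_mgus).
    """
    surv = 1.0        # probability of still being alive with MGUS
    p_mm = 0.0        # cumulative probability of progressing to MM
    t_weighted = 0.0  # running sum of (time since onset) x P(progress in this step)
    n_steps = int(round((max_age - onset_age) / dt))
    for k in range(n_steps):
        t_mid = (k + 0.5) * dt
        lam, mu = lam_mm(onset_age + t_mid), mu_mgus(onset_age + t_mid)
        total = lam + mu
        if total <= 0.0:
            continue
        leave = surv * (1.0 - math.exp(-total * dt))  # all exits from MGUS in this step
        to_mm = leave * lam / total                     # share of exits that are MM
        p_mm += to_mm
        t_weighted += t_mid * to_mm
        surv -= leave
    dwell = t_weighted / p_mm if p_mm > 0.0 else math.nan
    return dwell, p_mm


# Constant hypothetical hazards: the result should be close to 1 / (lam + mu)
low_mu, _ = expected_dwell_time(50.0, lambda a: 0.05, lambda a: 0.02, max_age=500.0)
high_mu, _ = expected_dwell_time(50.0, lambda a: 0.05, lambda a: 0.08, max_age=500.0)
print(f"mu=0.02: {low_mu:.2f} yr (1/0.07 = {1 / 0.07:.2f}); "
      f"mu=0.08: {high_mu:.2f} yr (1/0.13 = {1 / 0.13:.2f})")
```

Passing age-dependent functions, for example a fitted \(\lambda_{\mathrm{MM}}\) and an age-specific MGUS mortality rate, yields both the conditional dwell time and the lifetime risk of MM for a given age of MGUS onset, the two quantities discussed for Fig. 3.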

Our study is subject to a number of limitations. First, we lacked data on smoldering multiple myeloma (SMM), an intermediate state between MGUS and MM, so we were unable to assess how progression to SMM varies across age, gender, and race/ethnicity25. Second, the NHANES and SEER data were not collected from a single cohort. To use these distinct data sources within our modeling framework, we assumed that both are nationally representative and thus reflect samples from the true MGUS prevalence and MM incidence within the U.S. population. Given the extensive geographic coverage and large sample sizes of these data sources14,15,16,26, we believe this assumption is reasonable. However, NHANES is subject to non-response bias, which could affect our conclusions if MGUS prevalence differed substantially between those who responded and those who did not. MGUS prevalence and MM incidence could also be underreported due to ascertainment bias. This would bias our estimates of the rates of MGUS development and progression from MGUS to MM downward and, because a lower progression rate implies a longer time to MM, cause us to overestimate the preclinical dwell time. Additionally, we considered only a single year of MGUS prevalence and MM incidence data; extensions to our modeling framework could simulate multiple cohorts to account for period effects that likely shape MGUS prevalence and MM incidence. Our results were robust to the year of MM incidence data considered, but future work could assess whether increases in MM incidence over time can be attributed to an aging population or instead reflect an increase in MM risk over time. Finally, although we identified differences in MGUS development by gender and race/ethnicity, future work is needed to determine whether these differences can be explained by differences in the distribution of known risk factors, such as obesity18, across gender and race/ethnicity.

This study found that disparities in MM incidence can be explained by an increased incidence of MGUS, not an increased rate of progression to MM, among healthy men and non-Hispanic Black people. Future studies are needed to identify whether these differences can be explained by differences in the distribution of known risk factors.

Methods

Ethical approval was obtained from Washington University School of Medicine in St. Louis (IRB 202110041). All analyses were performed in accordance with relevant guidelines and regulations.

Model overview

Compartmental model

To model the natural history of multiple myeloma22, we constructed a discrete-time, multistate compartmental model consisting of four health states: healthy (H), monoclonal gammopathy of undetermined significance (MGUS), multiple myeloma (MM), and death (D) (Fig. 4).

Fig. 4: Schematic of compartment model.

The schematic of the compartmental model of the natural history of multiple myeloma for a birth cohort is shown. Boxes represent the compartments, and arrows represent flows between compartments. \(P_H\) is the proportion of the birth cohort that is healthy, \(P_{\mathrm{MGUS}}\) is the proportion of the birth cohort that has MGUS, \(P_{\mathrm{MM}}\) is the proportion of the birth cohort that has MM, and \(P_D\) is the proportion of the birth cohort that has died. \(\lambda_{\mathrm{MGUS}}(a,s,r)\) is the rate at which a healthy individual of age a, gender s, and race/ethnicity r develops MGUS, and \(\lambda_{\mathrm{MM}}(a,s,r)\) is the rate at which an individual of age a, gender s, and race/ethnicity r with MGUS develops MM. \(\mu_H(a,s,r)\) is the mortality rate for a healthy individual of age a, gender s, and race/ethnicity r. \(\mu_{\mathrm{MGUS}}(a,s,r)\) is the mortality rate for an individual of age a, gender s, and race/ethnicity r with MGUS, and \(\mu_{\mathrm{MM}}(a,s,r)\) is the mortality rate for an individual of age a, gender s, and race/ethnicity r with MM.

For a given birth cohort of gender s and race/ethnicity r, the proportion P of the cohort in each of these states at a given age a is defined by the following set of differential equations:

$$\frac{dP_H}{da}=-\lambda_{\mathrm{MGUS}}(a,s,r)\,P_H-\mu_H(a,s,r)\,P_H, \tag{1}$$
$$\frac{dP_{\mathrm{MGUS}}}{da}=\lambda_{\mathrm{MGUS}}(a,s,r)\,P_H-\lambda_{\mathrm{MM}}(a,s,r)\,P_{\mathrm{MGUS}}-\mu_{\mathrm{MGUS}}(a,s,r)\,P_{\mathrm{MGUS}}, \tag{2}$$
$$\frac{dP_{\mathrm{MM}}}{da}=\lambda_{\mathrm{MM}}(a,s,r)\,P_{\mathrm{MGUS}}-\mu_{\mathrm{MM}}(a,s,r)\,P_{\mathrm{MM}}, \tag{3}$$
$$\frac{dP_D}{da}=\mu_H(a,s,r)\,P_H+\mu_{\mathrm{MGUS}}(a,s,r)\,P_{\mathrm{MGUS}}+\mu_{\mathrm{MM}}(a,s,r)\,P_{\mathrm{MM}}. \tag{4}$$

In Eq. (1), \(\lambda_{\mathrm{MGUS}}(a,s,r)\) is the rate that a healthy individual of age a, gender s, and race/ethnicity r develops MGUS, which we computed as

$$\lambda_{\mathrm{MGUS}}(a,s,r)=e^{\gamma_{\mathrm{MGUS}}+\beta_{\mathrm{MGUS},a}\,a+\beta_{\mathrm{MGUS},s}+\beta_{\mathrm{MGUS},r}}. \tag{5}$$

In Eq. (5), \(\gamma_{\mathrm{MGUS}}\) is an intercept term, such that \(e^{\gamma_{\mathrm{MGUS}}}\) denotes the baseline rate of developing MGUS independent of all other covariates. Additionally, \(\beta_{\mathrm{MGUS},a}\), \(\beta_{\mathrm{MGUS},s}\), and \(\beta_{\mathrm{MGUS},r}\) are the coefficients that modulate the respective effects of age, gender, and race/ethnicity on the rate of MGUS development. In Eq. (2), \(\lambda_{\mathrm{MM}}(a,s,r)\) is the rate at which an individual of age a, gender s, and race/ethnicity r with MGUS develops MM and is calculated as \(\lambda_{\mathrm{MM}}(a,s,r)=e^{\gamma_{\mathrm{MM}}+\beta_{\mathrm{MM},a}\,a+\beta_{\mathrm{MM},a^2}\,a^2+\beta_{\mathrm{MM},s}+\beta_{\mathrm{MM},r}}\), where \(\gamma_{\mathrm{MM}}\) is the corresponding intercept term. Finally, in Eqs. (1)–(4), \(\mu_H(a,s,r)\) is the mortality rate for healthy individuals, \(\mu_{\mathrm{MGUS}}(a,s,r)\) is the mortality rate for individuals with MGUS, and \(\mu_{\mathrm{MM}}(a,s,r)\) is the mortality rate for individuals with MM. We followed Therneau et al.22 in defining distinct mortality rates for individuals with MGUS and MM.

Equations (1)–(4) cannot, in general, be solved analytically and must therefore be integrated numerically forward in age. Doing so yields \(P_H(a,s,r)\), \(P_{\mathrm{MGUS}}(a,s,r)\), \(P_{\mathrm{MM}}(a,s,r)\), and \(P_D(a,s,r)\), representing the proportion of the birth cohort of individuals of gender s and race/ethnicity r that occupies each state at age a.
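A minimal sketch of this numerical integration for a single birth cohort, using SciPy's `solve_ivp` on the differential form of Eqs. (1)–(4). The rate functions follow Eq. (5) and the expression for \(\lambda_{\mathrm{MM}}\) above, and the MGUS mortality multiplier of 1.25 for men is taken from the data description later in the Methods; every other value, including all coefficients, the Gompertz-type background mortality, and the constant MM mortality, is a hypothetical placeholder rather than a study input, and the study's own discretization may differ.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Hypothetical parameter values for one gender/race-ethnicity stratum (illustration only)
PARAMS = {
    "gamma_mgus": -10.0, "beta_mgus_a": 0.05, "beta_mgus_s": 0.0, "beta_mgus_r": 0.0,
    "gamma_mm": -9.64, "beta_mm_a": 0.142, "beta_mm_a2": -0.001,
    "beta_mm_s": 0.0, "beta_mm_r": 0.0,
}
MGUS_MORTALITY_MULT = 1.25  # MGUS mortality relative to baseline for men (from the text)
MU_MM = 0.15                # hypothetical constant MM mortality rate (yr^-1)


def lam_mgus(a, p):
    """Eq. (5): rate of MGUS onset among healthy individuals."""
    return np.exp(p["gamma_mgus"] + p["beta_mgus_a"] * a + p["beta_mgus_s"] + p["beta_mgus_r"])


def lam_mm(a, p):
    """Rate of progression from MGUS to MM, quadratic in age on the log scale."""
    return np.exp(p["gamma_mm"] + p["beta_mm_a"] * a + p["beta_mm_a2"] * a**2
                  + p["beta_mm_s"] + p["beta_mm_r"])


def mu_h(a):
    """Hypothetical Gompertz-type background mortality (the study used CDC life tables)."""
    return 1e-4 * np.exp(0.085 * a)


def rhs(a, y, p):
    """Right-hand side of Eqs. (1)-(4) with y = [P_H, P_MGUS, P_MM, P_D]."""
    p_h, p_mgus, p_mm, _ = y
    l_mgus, l_mm, m_h = lam_mgus(a, p), lam_mm(a, p), mu_h(a)
    m_mgus = MGUS_MORTALITY_MULT * m_h
    return [
        -l_mgus * p_h - m_h * p_h,                       # Eq. (1)
        l_mgus * p_h - l_mm * p_mgus - m_mgus * p_mgus,  # Eq. (2)
        l_mm * p_mgus - MU_MM * p_mm,                    # Eq. (3)
        m_h * p_h + m_mgus * p_mgus + MU_MM * p_mm,      # Eq. (4)
    ]


ages = np.arange(0, 101)
sol = solve_ivp(rhs, (0, 100), [1.0, 0.0, 0.0, 0.0], t_eval=ages, args=(PARAMS,),
                rtol=1e-8, atol=1e-10)
P_H, P_MGUS, P_MM, P_D = sol.y
print(f"P_MGUS at age 70 among survivors: {P_MGUS[70] / (1 - P_D[70]):.4f}")
```

Because the four right-hand sides sum to zero, \(P_H+P_{\mathrm{MGUS}}+P_{\mathrm{MM}}+P_D\) stays at 1, which is a convenient check on the solver tolerances.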

Prevalence and incidence

The quantities \(P_{\mathrm{MGUS}}(a,s,r)\) and \(P_{\mathrm{MM}}(a,s,r)\) obtained from Eqs. (1)–(4) do not represent the respective prevalence of MGUS and MM, because their denominator includes individuals in the birth cohort who previously died. To calculate the age-stratified prevalence p of MGUS and MM among individuals of gender s and race/ethnicity r, we conditioned on the proportion of the birth cohort that was alive at age a, such that

$$p_{\mathrm{MGUS}}(a,s,r)=\frac{P_{\mathrm{MGUS}}(a,s,r)}{1-P_D(a,s,r)}, \tag{6}$$
$$p_{\mathrm{MM}}(a,s,r)=\frac{P_{\mathrm{MM}}(a,s,r)}{1-P_D(a,s,r)}. \tag{7}$$

Similarly, calculating the age-stratified incidence i of MGUS and MM among individuals of gender s and race/ethnicity r required that we condition on the proportion of the birth cohort that was alive at age a. Therefore, we calculated \(i_{\mathrm{MGUS}}(a,s,r)\) and \(i_{\mathrm{MM}}(a,s,r)\) from Eqs. (1)–(7) as

$$i_{\mathrm{MGUS}}(a,s,r)=\lambda_{\mathrm{MGUS}}(a,s,r)\,p_H(a,s,r), \tag{8}$$
$$i_{\mathrm{MM}}(a,s,r)=\lambda_{\mathrm{MM}}(a,s,r)\,p_{\mathrm{MGUS}}(a,s,r). \tag{9}$$

In Eq. (8), \(p_H(a,s,r)\) is the prevalence of healthy individuals of age a, gender s, and race/ethnicity r, which we computed as \(1-p_{\mathrm{MGUS}}(a,s,r)-p_{\mathrm{MM}}(a,s,r)\) from Eqs. (6) and (7).
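Equations (6)–(9) translate directly into array operations. A short sketch, assuming the cohort proportions and transition rates have already been evaluated at the same single-year ages; the function name and the toy inputs are hypothetical. Because \(P_H+P_{\mathrm{MGUS}}+P_{\mathrm{MM}}+P_D=1\), the healthy prevalence computed this way equals \(P_H/(1-P_D)\).

```python
import numpy as np


def prevalence_and_incidence(P_mgus, P_mm, P_d, lam_mgus, lam_mm):
    """Apply Eqs. (6)-(9) to cohort proportions and transition rates at matching ages.

    All inputs are scalars or NumPy arrays indexed by single-year age.
    Returns (p_mgus, p_mm, i_mgus, i_mm); incidences are per person-year.
    """
    alive = 1.0 - np.asarray(P_d, dtype=float)        # proportion of the cohort still alive
    p_mgus = np.asarray(P_mgus, dtype=float) / alive  # Eq. (6)
    p_mm = np.asarray(P_mm, dtype=float) / alive      # Eq. (7)
    p_h = 1.0 - p_mgus - p_mm                         # healthy prevalence among the living
    i_mgus = lam_mgus * p_h                           # Eq. (8)
    i_mm = lam_mm * p_mgus                            # Eq. (9)
    return p_mgus, p_mm, i_mgus, i_mm


# Toy inputs for a single age (hypothetical values)
p_mgus, p_mm, i_mgus, i_mm = prevalence_and_incidence(0.02, 0.002, 0.5, 0.002, 0.01)
print(f"MGUS prevalence {p_mgus:.3f}, MM incidence {i_mm * 1e5:.0f} per 100,000")
```

SEER incidence is typically reported per 100,000 persons, so per-person-year model incidence is rescaled by \(10^5\) before comparison.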

Model fitting

Data

Using the continuous National Health and Nutrition Examination Survey (NHANES), 1999–200414,15,16, we obtained empirical estimates of MGUS prevalence for 4,355 individuals 50 years of age and older, stratified by age, gender, and race/ethnicity (Table 1). We aggregated the continuous NHANES data into 5-year age bins to increase the number of MGUS-positive samples within each group. These data are the most current nationally representative survey data on MGUS prevalence within the United States. Additionally, we obtained age-, gender-, and race/ethnicity-stratified estimates of MM incidence in 2010 from the Surveillance, Epidemiology, and End Results (SEER) Program26. MM incidence from 2010 was chosen specifically because the 6-year gap between the SEER and NHANES datasets approximates the gap between the 50–54 age group (i.e., the youngest age group in continuous NHANES for which MGUS data were available) and the 55–59 age group (i.e., the approximate age group at which MM incidence begins to increase substantially in SEER). To confirm that our conclusions were robust to the year of MM incidence data, we performed an alternative analysis in which we used SEER data from 2004 (see sensitivity analysis in the Supplementary Information).

Table 1 Characteristics of the data used for model fitting

We obtained population estimates by age, gender, and race/ethnicity in 2010 from the Centers for Disease Control and Prevention (CDC) WONDER database27. For individuals 85 years of age or older, population estimates were available only in aggregate. To disaggregate population estimates for individuals 85+ years of age, we assumed that the population distribution of individuals 85–99 years of age in 2010 was equivalent to the distribution in 2014, the first year in which CDC WONDER did not aggregate this age group28.
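The disaggregation step amounts to rescaling the reference-year age distribution to the aggregated total. A minimal sketch, assuming single-year counts for ages 85–99 are available for the reference year; the function name and the counts are hypothetical, not CDC WONDER values.

```python
def disaggregate_85plus(total_85plus: float, ref_counts: dict[int, float]) -> dict[int, float]:
    """Split an aggregated 85+ population count across single-year ages.

    ref_counts maps each single-year age (e.g. 85-99) to its population in the
    reference year (2014 in the text); that age distribution is assumed to hold
    in the target year.
    """
    ref_total = sum(ref_counts.values())
    return {age: total_85plus * n / ref_total for age, n in ref_counts.items()}


# Hypothetical reference-year counts that decline with age (not CDC WONDER data)
ref_2014 = {age: 100.0 - 6.0 * (age - 85) for age in range(85, 100)}
split_2010 = disaggregate_85plus(1_000.0, ref_2014)
print(round(split_2010[85], 1), round(sum(split_2010.values()), 6))
```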

For healthy individuals, we made use of age-, gender-, and race/ethnicity-stratified mortality rates in 2010 from the CDC Life Tables29. For MGUS-positive individuals, we followed Therneau et al.22 and assumed that mortality in MGUS-positive individuals was 1.25 times greater than the baseline age- and race/ethnicity-specific mortality rate for men and 1.11 times greater than the baseline age- and race/ethnicity-specific mortality rate for women (see sensitivity analysis in the Supplementary Information). Finally, for individuals with MM, we estimated gender- and race/ethnicity-stratified all-cause mortality rates by fitting an exponential survival distribution to MM all-cause survival data provided by SEER (Fig. S1). Because these survival curves are derived from the cohort of all MM individuals within SEER, they provide an estimate of the mean all-cause mortality rate among MM individuals, irrespective of treatment characteristics.
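Two mortality inputs described above can be sketched briefly: scaling baseline mortality for MGUS-positive individuals by the gender-specific factors from Therneau et al., and estimating a constant MM mortality rate from a survival curve. Under an exponential model \(S(t)=e^{-\mu t}\), one simple estimator is a least-squares fit of \(-\log S(t)\) on \(t\) through the origin, \(\hat{\mu}=\sum_k t_k(-\log S_k)/\sum_k t_k^2\). The text does not specify the fitting method, so this estimator, the helper names, and the survival values are assumptions for illustration.

```python
import numpy as np

# MGUS mortality multipliers stated in the text (after Therneau et al.)
MGUS_MORTALITY_MULTIPLIER = {"male": 1.25, "female": 1.11}


def mgus_mortality(mu_healthy: float, gender: str) -> float:
    """Mortality rate for MGUS-positive individuals, scaled from the baseline rate."""
    return MGUS_MORTALITY_MULTIPLIER[gender] * mu_healthy


def fit_exponential_rate(times_yr, survival) -> float:
    """Least-squares fit of S(t) = exp(-mu * t) on the -log S scale, through the origin."""
    t = np.asarray(times_yr, dtype=float)
    y = -np.log(np.asarray(survival, dtype=float))
    return float(np.sum(t * y) / np.sum(t * t))


# Hypothetical all-cause survival points for an MM subgroup (not SEER values)
times = [1, 2, 3, 5]
surv = [0.80, 0.66, 0.55, 0.38]
print(f"mu_MM ~ {fit_exponential_rate(times, surv):.3f} per year")
```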

Likelihood function

A crucial component of model fitting is the likelihood, which quantifies the probability of the observed data given the model parameters. In this study, the likelihood is the product of the likelihoods for the two data types: (1) MGUS prevalence and (2) MM incidence. Because we adopted a fully Bayesian approach, we did not specify a weight for each data type during model fitting.

MGUS prevalence

We modeled the probability that \(y_{a,s,r}\) individuals of age a, gender s, and race/ethnicity r were MGUS-positive among \(n_{a,s,r}\) surveyed individuals of age a, gender s, and race/ethnicity r from NHANES as a binomial process, such that

$$\Pr\left(y_{a,s,r}\mid n_{a,s,r},\,p_{\mathrm{MGUS}}(a,s,r)\right)=\binom{n_{a,s,r}}{y_{a,s,r}}\,p_{\mathrm{MGUS}}(a,s,r)^{y_{a,s,r}}\left(1-p_{\mathrm{MGUS}}(a,s,r)\right)^{n_{a,s,r}-y_{a,s,r}}. \tag{10}$$

In Eq. (10), \(p_{\mathrm{MGUS}}(a,s,r)\) is the model-predicted prevalence of MGUS among individuals of age a, gender s, and race/ethnicity r, obtained from Eqs. (1)–(4) and (6).

We aggregated the samples from the NHANES data into 5-year age bins to increase the number of MGUS-positive samples within each group. Accordingly, we calculated a weighted prevalence \(\bar{p}_{\mathrm{MGUS}}([a_1,a_2],s,r)\) between ages a1 and a2 as

$$\bar{p}_{\mathrm{MGUS}}\left([a_1,a_2],s,r\right)=\frac{\sum_{a=a_1}^{a_2}N_{a,s,r}\,p_{\mathrm{MGUS}}(a,s,r)}{\sum_{a=a_1}^{a_2}N_{a,s,r}}, \tag{11}$$

where \({N}_{a,s,r}\) is the size of the subpopulation of age a, gender s, and race/ethnicity r. For age-binned MGUS prevalence, the probability of observing \({y}_{\left[{a}_{1},\, {a}_{2}\right],\, s,\, r}\) individuals between the ages of a1 and a2 among \({n}_{\left[{a}_{1},\, {a}_{2}\right],\, s,\, r}\) total individuals surveyed is then calculated equivalently to Eq. (10) using the weighted predicted prevalence from Eq. (11). Therefore, the likelihood of the model given the NHANES data can be expressed as

$$\begin{aligned}
&\mathcal{L}\left(\vec{\mathbf{y}},\,\vec{\mathbf{n}},\,\vec{\mathbf{a}}_{\mathbf{l}},\,\vec{\mathbf{a}}_{\mathbf{u}}\mid\vec{\boldsymbol{\theta}}\right)\\
&\quad=\prod_{i=1}^{n}\prod_{s\in\{M,F\}}\prod_{r\in\{\mathrm{NHW},\mathrm{NHB}\}}\mathrm{Binomial}\left(y_{[a_{l,i},a_{u,i}],s,r}\mid n_{[a_{l,i},a_{u,i}],s,r},\,\bar{p}_{\mathrm{MGUS}}\left([a_{l,i},a_{u,i}],s,r\right)\right).
\end{aligned}\tag{12}$$

In Eq. (12), \(\vec{{{{{{\boldsymbol{\theta }}}}}}}\) is the vector of parameters to be estimated, and \(\vec{{{{{{\bf{y}}}}}}}\) and \(\vec{{{{{{\bf{n}}}}}}}\) are the vectors of NHANES MGUS prevalence data where for each age bin the lower bound is defined by \({\vec{{{{{{\bf{a}}}}}}}}_{{{{{{\bf{l}}}}}}}\) and the upper bound is defined by \({\vec{{{{{{\bf{a}}}}}}}}_{{{{{{\bf{u}}}}}}}\).
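A minimal sketch of the NHANES log-likelihood for one gender and race/ethnicity stratum, combining the population weighting of Eq. (11) with the binomial terms of Eqs. (10) and (12); the full likelihood in Eq. (12) also multiplies over strata, which on the log scale is a sum. It uses SciPy's `binom.logpmf`; the helper names, bin layout, and all numbers are hypothetical.

```python
import numpy as np
from scipy.stats import binom


def weighted_mean(values, weights) -> float:
    """Population-weighted average over single-year ages, as in Eqs. (11) and (13)."""
    v = np.asarray(values, dtype=float)
    w = np.asarray(weights, dtype=float)
    return float(np.sum(w * v) / np.sum(w))


def nhanes_log_likelihood(bins, p_mgus_by_age, pop_by_age) -> float:
    """Log of Eq. (12) restricted to a single gender/race-ethnicity stratum.

    bins: iterable of (a_lower, a_upper, y, n) with inclusive single-year age bounds.
    p_mgus_by_age, pop_by_age: dicts mapping single-year age to model-predicted
    MGUS prevalence and population size N_{a,s,r}.
    """
    total = 0.0
    for a_lo, a_hi, y, n in bins:
        ages = range(a_lo, a_hi + 1)
        p_bar = weighted_mean([p_mgus_by_age[a] for a in ages],
                              [pop_by_age[a] for a in ages])  # Eq. (11)
        total += float(binom.logpmf(y, n, p_bar))             # log of Eq. (10)
    return total


# Hypothetical prevalence curve and population for ages 50-59 (illustration only)
ages = range(50, 60)
p_model = {a: 0.02 + 0.001 * (a - 50) for a in ages}
pop = {a: 1_000.0 - 10.0 * (a - 50) for a in ages}
print(nhanes_log_likelihood([(50, 54, 5, 250), (55, 59, 7, 230)], p_model, pop))
```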

MM incidence

Because MM incidence was reported as a continuous rate, we modeled the logarithm of MM incidence from SEER in individuals of age a, gender s, and race/ethnicity r as a normal distribution with mean \(\log \left({i}_{{{{{{\rm{MM}}}}}}}\left(a,\, s,\, r\right)\right)\) and variance τ2 where \({i}_{{{{{{\rm{MM}}}}}}}(a,\, s,\, r)\) is the model-predicted MM incidence among individuals of age a, gender s, and race/ethnicity r calculated from Eq. (9). We estimated τ2 as a parameter in our model and assumed that it did not depend upon age, gender, or race/ethnicity.

As with the MGUS prevalence data, MM incidence was binned by age. To accommodate this in our likelihood framework, we first calculated a weighted predicted incidence \(\bar{i}\left([a_1,a_2],s,r\right)\) for individuals between ages \(a_1\) and \(a_2\) as

$$\bar{i}\left([a_1,a_2],s,r\right) = \frac{\sum_{a=a_1}^{a_2} N_{a,s,r}\, i_{\mathrm{MM}}(a,s,r)}{\sum_{a=a_1}^{a_2} N_{a,s,r}},$$
(13)

where \(N_{a,s,r}\) is the number of individuals of age a, gender s, and race/ethnicity r. The likelihood of the SEER MM incidence data can be expressed as

$$\mathscr{L}\left(\vec{\mathbf{x}},\, \vec{\mathbf{a}}_{\mathbf{l}},\, \vec{\mathbf{a}}_{\mathbf{u}} \mid \vec{\boldsymbol{\theta}}\right) = \prod_{i=1}^{n} \prod_{s\in\{M,F\}} \prod_{r\in\{\mathrm{NHW},\mathrm{NHB}\}} \mathrm{Normal}\left(\log\left(x_{[a_{l,i},a_{u,i}],s,r}\right) \mid \log\left(\bar{i}\left([a_{l,i},a_{u,i}],s,r\right)\right),\, \tau^2\right).$$
(14)

In Eq. (14), \(\vec{\boldsymbol{\theta}}\) is the vector of estimated parameters, \(\vec{\mathbf{x}}\) is the vector of SEER MM incidence rates, and \(\vec{\mathbf{a}}_{\mathbf{l}}\) and \(\vec{\mathbf{a}}_{\mathbf{u}}\) give the lower and upper bounds of each age bin.
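The SEER likelihood can be sketched in the same way. Following the description of the incidence model, each observed log incidence is scored against a normal density whose mean is the logarithm of the population-weighted predicted incidence from Eq. (13) and whose variance is \(\tau^2\). The names and the `(x, pop, inc)` bin layout below are hypothetical; observed and predicted incidence are assumed to be positive and expressed on the same scale (for example, cases per 100,000 person-years).

```python
import math
from typing import Iterable, Sequence, Tuple


def weighted_incidence(pop: Sequence[float], inc: Sequence[float]) -> float:
    """Population-weighted mean of single-year predicted incidences (cf. Eq. (13))."""
    return sum(n * i for n, i in zip(pop, inc)) / sum(pop)


def normal_logpdf(x: float, mean: float, var: float) -> float:
    """log Normal(x | mean, var), parameterized by the variance."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def seer_loglik(bins: Iterable[Tuple[float, Sequence[float], Sequence[float]]],
                tau2: float) -> float:
    """Log-likelihood of binned SEER incidence on the log scale.

    Each hypothetical bin is (x, pop, inc): observed incidence x for the bin,
    single-year population sizes N_{a,s,r}, and model-predicted single-year
    incidences i_MM(a,s,r). A single variance tau2 is shared by all bins.
    """
    return sum(normal_logpdf(math.log(x), math.log(weighted_incidence(pop, inc)), tau2)
               for x, pop, inc in bins)
```

Modeling log incidence means that over- and under-prediction by the same multiplicative factor are penalized equally, which suits rates that span several orders of magnitude across age.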

Priors

We assumed uniform prior distributions for all model parameters (Table 2). The lower and upper bounds of each prior were informed by the plausible ranges of parameter values that yielded real-valued model output. In general, we specified wide bounds to give the inference algorithm the flexibility to explore the parameter space. However, we restricted the prior distribution of \(\beta_{\mathrm{MGUS},a}\) to [0,1]. The lower bound of zero was chosen because MGUS prevalence has been observed to increase with age3. We nevertheless performed a sensitivity analysis to evaluate how the choice of prior distribution affected the parameter inferences.
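Independent uniform priors contribute a constant to the log posterior inside their bounds and rule out any value outside them. The sketch below, which assumes a dictionary of parameter values keyed by hypothetical names such as `"beta_MGUS_a"`, returns the joint log prior density, or negative infinity when any parameter leaves its support.

```python
import math
from typing import Dict, Tuple


def log_uniform_prior(theta: Dict[str, float],
                      bounds: Dict[str, Tuple[float, float]]) -> float:
    """Joint log density of independent Uniform(lo, hi) priors; -inf outside the support."""
    logp = 0.0
    for name, (lo, hi) in bounds.items():
        if not lo <= theta[name] <= hi:
            return -math.inf          # outside the prior support: zero density
        logp -= math.log(hi - lo)     # log of the constant density 1 / (hi - lo)
    return logp


# Hypothetical example: the restricted [0, 1] prior on the MGUS age effect.
BOUNDS = {"beta_MGUS_a": (0.0, 1.0)}
```

If a Metropolis-type sampler is used, this constant cancels in the acceptance ratio, so within the bounds the likelihood alone drives acceptance, and any proposal outside the bounds is rejected.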

Table 2 Parameter definitions and prior distributions

Markov chain Monte Carlo

We estimated the parameters of our model from the NHANES and SEER data using a Markov chain Monte Carlo (MCMC) algorithm30. For each chain, we drew 1,000,000 samples, discarded the first 500,000 as burn-in, and retained every 50th sample thereafter to reduce autocorrelation, yielding 10,000 posterior samples per chain. We assessed convergence by running five chains in parallel and computing the Gelman-Rubin statistic for each parameter, where values less than 1.1 provide statistical support for convergence31. The converged chains were pooled to yield a final posterior distribution of 50,000 samples.
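The post-processing arithmetic (1,000,000 draws, minus 500,000 burn-in, kept at every 50th draw, gives 10,000 draws per chain) and the convergence check can be sketched with NumPy. The Gelman-Rubin function below implements the classic potential scale reduction factor for one parameter across chains; it is an illustration only, and it assumes the statistic is computed on the retained (burned-in and thinned) draws, which the text does not specify. Many modern packages report a split-chain variant instead.

```python
import numpy as np


def burn_and_thin(chain: np.ndarray, burn_in: int, thin: int) -> np.ndarray:
    """Drop the first `burn_in` draws, then keep every `thin`-th remaining draw."""
    return chain[burn_in::thin]


def gelman_rubin(chains: np.ndarray) -> float:
    """Classic potential scale reduction factor (R-hat) for one parameter.

    `chains` has shape (m, n): m chains, each with n retained draws.
    """
    m, n = chains.shape
    chain_means = chains.mean(axis=1)
    w = chains.var(axis=1, ddof=1).mean()       # mean within-chain variance
    b = n * chain_means.var(ddof=1)             # between-chain variance
    var_hat = (n - 1) / n * w + b / n           # pooled estimate of the posterior variance
    return float(np.sqrt(var_hat / w))
```

After each chain is processed with `burn_and_thin(chain, 500_000, 50)`, stacking five chains gives an array of shape (5, 10000) for `gelman_rubin`, and `np.concatenate` over the chains pools them into 50,000 draws.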

Analyses

After fitting the model and comparing the model predictions to the NHANES and SEER data, we used the fitted model to explore the epidemiology of MGUS and MM by age, gender, and race/ethnicity. First, we analyzed the estimated model parameters to isolate the contributions of age, gender, and race/ethnicity to the rates of progression from healthy to MGUS and from MGUS to MM. Next, we computed the expected preclinical dwell time (i.e., the time from MGUS onset to MM onset) by age, gender, and race/ethnicity. The preclinical dwell time depends upon the rate of progression from MGUS to MM and the competing baseline mortality rate, both of which depend upon age, gender, and race/ethnicity. This relationship between the preclinical dwell time and the baseline mortality rate exists because individuals with MGUS must survive sufficiently long to progress to MM.
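One way to see how competing mortality shapes the dwell time is to integrate numerically over time since MGUS onset, treating progression to MM and death from other causes as competing hazards. The sketch below is an assumption-laden illustration, not the authors' calculation. It takes hypothetical age-specific hazard functions `h_mm` and `h_death` (per year), holds them constant within each small step, and defines the expected dwell time as the mean time to MM among individuals who progress before dying.

```python
import math
from typing import Callable, Tuple


def expected_dwell_time(onset_age: float,
                        h_mm: Callable[[float], float],
                        h_death: Callable[[float], float],
                        max_years: float = 100.0,
                        dt: float = 0.01) -> Tuple[float, float]:
    """Return (P(progress to MM), E[time to MM | progression]) after MGUS onset."""
    surv = 1.0        # probability of being alive with MGUS and no MM yet
    p_mm = 0.0        # cumulative probability of progressing to MM
    t_weighted = 0.0  # running sum of t * P(progression in the step at t)
    for k in range(int(max_years / dt)):
        t = (k + 0.5) * dt                      # midpoint of the current step
        lam, mu = h_mm(onset_age + t), h_death(onset_age + t)
        total = lam + mu
        exit_prob = 1.0 - math.exp(-total * dt) # leave MGUS state by MM or death
        mm_prob = surv * exit_prob * lam / total if total > 0 else 0.0
        p_mm += mm_prob
        t_weighted += t * mm_prob
        surv *= 1.0 - exit_prob
    return p_mm, (t_weighted / p_mm if p_mm > 0 else math.nan)
```

With constant hazards the result reduces to a known closed form: the probability of progression is \(\lambda/(\lambda+\mu)\) and the conditional mean dwell time is \(1/(\lambda+\mu)\), so a higher competing mortality rate shortens the expected dwell time among those who do progress.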

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.