Introduction

Rheumatoid arthritis (RA) is one of the most common autoimmune diseases, affecting nearly 1.3 million people in the United States, and can severely impact patient quality of life1. RA is associated with many comorbidities and several extra-articular manifestations, including the most prevalent lung manifestation, interstitial lung disease (ILD). ILD is a progressive fibrotic disease of the lung and is associated with increased morbidity, mortality, and healthcare resource utilization2,3,4.

The prevalence of ILD among patients with RA has shown great variability in prior studies, ranging from 1 to 58% depending on the methodology and definitions used (for example, clinically significant or asymptomatic pre-clinical ILD; baseline or cumulative prevalence)5,6,7,8,9. Clinically significant ILD presents in approximately 10% of patients with RA10, and may be defined by the presence of respiratory symptoms, such as shortness of breath and coughing9. Pre-clinical ILD may be present in 33–60% of patients with RA, measurable by high-resolution computed tomography or pulmonary function tests, with no respiratory symptoms6,9,11. While patients with RA may lack clinical symptoms of ILD, they may be at high risk for developing this comorbidity12; thus, further studies are warranted in order to better understand the prevalence and time-to-onset of RA–ILD. The 10-, 20-, and 30-year cumulative incidence rates of ILD among patients with RA have been estimated as 4%, 6%, and 8%, respectively, and are significantly higher than those among patients without RA (10-, 20-, and 30-year cumulative incidence all ≤ 1%)13. With an estimated 5-year mortality rate of approximately 36–39%, a survival time of ≤ 10 years4,14, and delays in diagnosis potentially increasing the mortality risk15, prompt diagnosis and identification of patients with RA at high risk for development of ILD is crucial.

Well-established risk factors for RA–associated ILD (RA–ILD) have been identified from observational and medical records database studies (older age, male sex, history of smoking, and seropositivity for rheumatoid factor (RF) and/or anti-cyclic citrullinated peptide (anti-CCP) antibodies)13,16,17. Nevertheless, given the increased incidence and mortality associated with RA–ILD, these risk factors alone are insufficient, which emphasizes the need to identify additional risk factors that could lead to earlier diagnosis, and the need for collaboration between rheumatologists and pulmonologists. For example, two multi-centre, prospective, early RA inception cohorts (the Early RA Study and the Early RA Network) found that a higher risk of RA–ILD may be associated with factors such as rheumatoid nodules, higher baseline erythrocyte sedimentation rate (ESR), and a longer time from first RA symptoms to first outpatient visit18. Other potential risk factors include the presence of erosions or destructive joint changes13.

There are limited real-world data available for evaluating ILD among patients with RA, and further studies are needed to better understand the prevalence of and risk factors for ILD, including how ILD impacts RA disease activity, use of biologic treatments, and rheumatologist encounters.

The objectives of this analysis of real-world data were to evaluate the prevalence and time to onset of ILD in patients with RA. Exploratory objectives included a comparison of baseline clinical characteristics of patients with RA versus patients with RA–ILD and the evaluation of risk factors for RA–ILD. Further analyses were conducted with a subset of the population in order to compare RA disease activity, rheumatologist encounters, and treatments in a cohort of patients with RA versus a cohort of patients with RA–ILD, using data collected in the periods before and after the earliest recorded ILD diagnosis date.

Methods

Data source

Patient demographics and disease characteristics were retrospectively analyzed following data extraction from the Discus Analytics JointMan database, a large US electronic medical records-based dataset initiated in March 2009. The JointMan database includes > 17,000 rheumatology patients covered by commercial, Medicare, or Medicaid insurance health plans. Practices across the following eight states are included: Washington, New York, Oregon, Florida, Georgia, California, Wisconsin, and Kentucky. Patient data were collected at rheumatology centers and were de-identified prior to analysis. In addition to electronic medical record data, the JointMan user interface collects clinical outcomes recorded by physicians at the time of the encounter.

Patient population

Patients were included if they were aged ≥ 18 years at the initial visit with a rheumatologist participating in the JointMan network, had a provider-selected diagnosis of RA between January 1, 2009 and September 20, 2019, and had ≥ 1 visit after the initial visit date. Patients were excluded if their initial encounter occurred after RA diagnosis or if they experienced a drug-induced ILD diagnosis [International Classification of Disease, Tenth Revision, Clinical Modification (ICD-10-CM) codes J70.2 and J70.4] at any time during the study period. Patients were assigned to either the RA cohort (patients with confirmed RA but no diagnosis of ILD during the study period) or the RA–ILD cohort (patients with a provider diagnosis of non–drug-induced ILD on or after the initial RA diagnosis date). RA index date was defined as the first RA diagnosis date recorded in the JointMan database (provided by the rheumatologist).

The overall study population comprised patients who were followed from the day after the RA index date to the last patient encounter date or the end of the study (September 20, 2019), whichever occurred first. RA was diagnosed according to the ICD, Ninth Revision, CM (ICD-9-CM) code 714.0 and ICD-10-CM codes M05 and M06. ILD was identified by ICD diagnosis codes (ICD-9-CM codes: 516.0, 516.2, 516.3, 516.4, 516.5, 516.8, and 516.9; ICD-10-CM codes: J84.0, J84.1, J84.2, J84.81, J84.82, J84.83, J84.89, and J84.9) or by provider indication.
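The code-based part of this cohort assignment can be sketched as follows. This is a minimal illustration, not the study's actual extraction logic: the function names are hypothetical, prefix matching of more specific child codes is an assumption, and the sketch ignores diagnosis dates and the provider-indicated route to an ILD diagnosis.

```python
# Code lists copied from the Methods text; prefix matching is an assumption.
RA_CODES = ("714.0", "M05", "M06")
ILD_CODES = (
    "516.0", "516.2", "516.3", "516.4", "516.5", "516.8", "516.9",
    "J84.0", "J84.1", "J84.2", "J84.81", "J84.82", "J84.83", "J84.89", "J84.9",
)
DRUG_INDUCED_ILD_CODES = ("J70.2", "J70.4")


def matches(code, code_list):
    """True if the code equals, or is a more specific child of, a listed code."""
    code = code.strip().upper()
    return any(code.startswith(prefix) for prefix in code_list)


def classify_patient(codes):
    """Return 'excluded', 'not RA', 'RA-ILD', or 'RA' for one patient's codes."""
    # Drug-induced ILD at any time excludes the patient.
    if any(matches(c, DRUG_INDUCED_ILD_CODES) for c in codes):
        return "excluded"
    if not any(matches(c, RA_CODES) for c in codes):
        return "not RA"
    if any(matches(c, ILD_CODES) for c in codes):
        return "RA-ILD"
    return "RA"
```

For example, `classify_patient(["M05.79", "J84.112"])` falls into the RA–ILD cohort under these assumptions, whereas a patient with `J70.4` is excluded regardless of other codes.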

A subanalysis was conducted in a set of patients grouped based on ILD diagnosis. For the subanalysis population, the ILD diagnosis index was defined as the first date of ILD diagnosis recorded in the JointMan database (for patients in the RA–ILD cohort), and patient characteristics were described for the 90-day periods before and after the ILD diagnosis index. For patients without ILD, the index date was based on distribution of the number of days from RA diagnosis to ILD diagnosis in the RA–ILD cohort; characteristics were described for the 90-day periods before and after the index date (Supplementary Fig. S1).
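One way to read the index-date assignment for patients without ILD is as a random draw from the observed RA-to-ILD intervals in the RA–ILD cohort. The sketch below assumes exactly that; the function names, the seed, and the example intervals are hypothetical, and the study's actual sampling procedure may differ.

```python
import random
from datetime import date, timedelta


def assign_pseudo_index(ra_dates, observed_gaps_days, seed=0):
    """Give each RA-only patient an index date: RA date plus a sampled gap.

    ra_dates: dict of patient id -> RA diagnosis date.
    observed_gaps_days: days from RA to ILD diagnosis seen in the RA-ILD cohort.
    """
    rng = random.Random(seed)  # fixed seed so the assignment is reproducible
    return {
        pid: ra_date + timedelta(days=rng.choice(observed_gaps_days))
        for pid, ra_date in ra_dates.items()
    }


def window(index_date, days=90):
    """Start of the pre-index period and end of the post-index period."""
    return index_date - timedelta(days=days), index_date + timedelta(days=days)
```

Sampling from the empirical intervals gives the RA cohort a distribution of index dates comparable to that of the RA–ILD cohort, so the 90-day windows are described at similar points in the disease course.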

Primary endpoints

The primary endpoints, assessed in the overall study population, were prevalence and time to onset of ILD. Prevalence was defined as the proportion of patients with RA and a diagnosis of ILD divided by the total number of patients with RA during the study period. Time to onset of ILD was defined as the time from initial RA diagnosis to first observed non-drug-induced ILD diagnosis.

Exploratory endpoints

Exploratory endpoints, assessed in the exploratory analysis population, included baseline demographics, comorbidities, RA characteristics, and overall RA disease activity in the RA cohort compared with the RA–ILD cohort. RA characteristics included joint stiffness, erosions, extra-articular disease, anti-CCP antibodies, joint swelling, ESR, C-reactive protein (CRP), and Clinical Disease Activity Index (CDAI). CDAI remission score was defined as ≤ 2.8; CDAI low, moderate, and high disease activity scores were defined as > 2.8–10, > 10–22, and > 22, respectively19. Simplified Disease Activity Index (SDAI) remission score was defined as ≤ 3.3; SDAI low, moderate, and high disease activity scores were defined as > 3.3 to 11, > 11 to 26, and > 26, respectively19. Disease Activity Score in 28 joints using CRP (DAS28 [CRP]) remission score was defined as ≤ 2.3; DAS28 (CRP) low, moderate, and high disease activity scores were defined as > 2.3 to 2.7, > 2.7 to < 4.1, and ≥ 4.1, respectively20. DAS28 (ESR) remission score was defined as < 2.6; DAS28 (ESR) low, moderate, and high disease activity scores were defined as 2.6 to < 3.2, 3.2–5.1, and > 5.1, respectively19. Routine Assessment of Patient Index Data 3 (RAPID3) remission score was defined as ≤ 3; RAPID3 low, moderate, and high disease activity scores were defined as > 3 to 6, > 6 to 12, and > 12, respectively21. Variables were assessed as potential predictors of RA–ILD.
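The category cut-offs above can be expressed as a small lookup. This is an illustrative sketch only: the dictionary layout and function name are hypothetical, and the handling of a missing score as its own category is an assumption.

```python
# For each index: (upper bound, is the bound inclusive?) for remission, low,
# and moderate. Any score above the last bound is high disease activity.
CUTOFFS = {
    "CDAI":      [(2.8, True), (10.0, True), (22.0, True)],
    "SDAI":      [(3.3, True), (11.0, True), (26.0, True)],
    "DAS28_CRP": [(2.3, True), (2.7, True), (4.1, False)],
    "DAS28_ESR": [(2.6, False), (3.2, False), (5.1, True)],
    "RAPID3":    [(3.0, True), (6.0, True), (12.0, True)],
}
LABELS = ["remission", "low", "moderate"]


def activity_category(index, score):
    """Map a disease activity score to remission/low/moderate/high."""
    if score is None:
        return "missing"  # assumption: missing scores are kept as a category
    for label, (bound, inclusive) in zip(LABELS, CUTOFFS[index]):
        if score < bound or (inclusive and score == bound):
            return label
    return "high"
```

Note that the boundary behaviour differs between indices: a DAS28 (ESR) of exactly 2.6 is low disease activity, while a CDAI of exactly 2.8 is remission.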

Subanalysis endpoints

For patients included in the subanalysis population, CDAI and RAPID3 scores, swollen and swollen28 joint counts, the number of rheumatologist encounters, and treatment utilization pre- and post-ILD diagnosis index were also assessed. The swollen and swollen28 joint counts are components of the DAS/DAS28 score: the swollen joint count is an assessment of 28 or more (up to 44) joints, while the swollen28 joint count is an assessment of only 28 pre-selected joints22.

Statistical analysis

The prevalence (95% confidence intervals [CIs]) of the first observed ILD diagnosis during follow-up was calculated. The time to ILD diagnosis was examined using unadjusted Kaplan–Meier survival curves. Descriptive statistics for continuous baseline variables were compared using Student’s t-test and percentages for categorical and binary baseline variables were compared using the Chi-square test.
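For readers unfamiliar with the Kaplan–Meier estimator, a minimal sketch follows. It is not the software used in the study; the function name and the toy data are hypothetical. Here the "event" would be the first ILD diagnosis, and patients who reach the end of follow-up without ILD are censored.

```python
def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.

    times: follow-up time per patient (e.g. years from RA diagnosis).
    events: 1 if ILD was diagnosed at that time, 0 if the patient was censored.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Group everyone whose follow-up ends at time t; patients censored
        # at t are still counted as at risk at t.
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events > 0:
            surv *= 1.0 - n_events / n_at_risk
            curve.append((t, surv))
        n_at_risk -= n_leaving
    return curve
```

With toy follow-up times `[1, 2, 2, 3, 4]` and event flags `[1, 1, 0, 1, 0]`, the estimated ILD-free proportion steps down to 0.8, 0.6, and 0.3 at times 1, 2, and 3.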

Potential predictors of RA–ILD were analyzed by a Cox regression model. Patient demographic data and comorbidities were collected at baseline and were controlled for in the Cox model. RA characteristics were identified during and after the initial RA diagnosis and were controlled for as time-varying covariates in the Cox model. The final covariate lists were based on clinical rationale and model fitting; hazard ratios, 95% confidence intervals, and p values were provided for each covariate. Statistical significance for model inclusion was set at p < 0.05.
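To illustrate what a Cox model estimates, the sketch below fits a single binary baseline covariate by solving the partial-likelihood score equation. It is a deliberately reduced example under stated assumptions: the study's model included several covariates, some of them time-varying, which this sketch does not handle; the function names and toy data are hypothetical; and bisection is used here for simplicity rather than the Newton-type routines found in statistical packages.

```python
import math


def cox_score(beta, times, events, x):
    """Derivative of the Cox partial log-likelihood for one covariate.

    Tied event times are handled with the Breslow convention.
    """
    total = 0.0
    for t_i, e_i, x_i in zip(times, events, x):
        if not e_i:
            continue  # censored patients contribute only through risk sets
        # Risk set: everyone still under follow-up at this event time.
        weights = [(math.exp(beta * x_j), x_j)
                   for t_j, x_j in zip(times, x) if t_j >= t_i]
        s0 = sum(w for w, _ in weights)
        s1 = sum(w * x_j for w, x_j in weights)
        total += x_i - s1 / s0
    return total


def fit_cox_one_covariate(times, events, x, lo=-10.0, hi=10.0, iters=100):
    """Find beta with score(beta) = 0; the score is decreasing in beta."""
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if cox_score(mid, times, events, x) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


# Toy data (hypothetical): x = 1 marks an exposure such as age >= 65 years.
times = [2, 3, 4, 5, 6, 7, 8, 9]
events = [1, 1, 1, 0, 1, 1, 0, 1]
x = [1, 0, 1, 1, 0, 1, 0, 0]
beta_hat = fit_cox_one_covariate(times, events, x)
hazard_ratio = math.exp(beta_hat)
```

The hazard ratio is `exp(beta_hat)`; a value above 1 indicates that exposed patients in the toy data experience the event at a higher rate than unexposed patients.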

The number and percentage of patients with rheumatologist visits, treatment utilization, and each disease activity score in the pre- and post-index periods were calculated. P values for disease activity score category compared pre- and post-index periods and correspond to Fisher’s exact test or Chi-square test with statistical significance set at p < 0.05.
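As a sketch of the Chi-square comparison for a 2 × 2 table, the statistic and its p value (1 degree of freedom, no continuity correction) can be computed directly. The function name is hypothetical, the omission of a continuity correction is an assumption, and the counts in the usage note are invented for illustration rather than taken from the study.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson Chi-square test for the table [[a, b], [c, d]].

    Returns (statistic, p_value) with 1 degree of freedom and no
    continuity correction.
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom the upper tail equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value
```

For a hypothetical table with 10 of 30 patients in a category pre-index and 20 of 30 post-index, `chi_square_2x2(10, 20, 20, 10)` gives a statistic of about 6.67 and a p value below 0.05. When expected cell counts are small, Fisher's exact test is the usual alternative.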

Ethical approval

This study was conducted in accordance with the International Society for Pharmacoepidemiology Guidelines for Good Pharmacoepidemiology Practices and applicable regulatory requirements23. The study protocol was reviewed by the internal BMS Observational Protocol Review Committee (OPRC). No identifiable protected health information was extracted or accessed from the database during the study; therefore, the BMS OPRC confirmed that this analysis did not require ethical oversight. Additionally, the study did not involve the collection, use, or transmittal of individually identifiable data, and data were collected in the setting for the usual care of the patient. Informed consent from the study participants was not required because the dataset used in this observational study consisted of de-identified secondary data released for research purposes.

Results

Overall study population, prevalence, and time to onset of ILD

In the overall study population, a total of 8963 patients with RA were identified during the period of January 1, 2009 to September 20, 2019. The prevalence (95% CI) of ILD in the overall population of patients with RA was 4.1% (3.7–4.5%).

Of the patients in the RA–ILD cohort, 91.8% (n = 337/367) had their first ILD diagnosis after their RA diagnosis. The mean time to onset of ILD after RA diagnosis was 3.3 years (median 2.3 years; Fig. 1).
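The reported prevalence and its confidence interval can be reproduced approximately from the counts given in this section. The sketch assumes that the 367 patients in the RA–ILD cohort form the numerator and that a normal-approximation (Wald) interval was used; the Methods do not state the interval method, and the function name is hypothetical.

```python
import math


def prevalence_with_ci(cases, total, z=1.96):
    """Proportion with a normal-approximation (Wald) 95% confidence interval."""
    p = cases / total
    half_width = z * math.sqrt(p * (1.0 - p) / total)
    return p, p - half_width, p + half_width


# 367 RA-ILD patients among 8963 patients with RA (counts from the text).
p, lower, upper = prevalence_with_ci(367, 8963)
print(f"{100 * p:.1f}% ({100 * lower:.1f}-{100 * upper:.1f}%)")
```

Under these assumptions the result rounds to 4.1% (3.7–4.5%), matching the reported estimate.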

Figure 1

Copyright of the authors. Reprinted by Nature Portfolio, part of Springer Nature.

Kaplan–Meier survival curve estimate: time to ILD onset after RA diagnosis in the overall study population. ILD Interstitial lung disease, RA Rheumatoid arthritis. Previously presented at EULAR Congress held 3–6 June, 2020, oral presentation number OP0035.

Baseline patient demographics and disease characteristics

In the exploratory analysis population, there were a total of 5817 patients; 96.5% (n = 5612) had RA and no comorbid ILD diagnosis (RA cohort) and 3.5% (n = 205) had RA–ILD (RA–ILD cohort). Compared with the RA cohort, a significantly higher proportion of patients in the RA–ILD cohort were older, male, white, had Medicare as their primary insurance category, and had a history of chronic obstructive pulmonary disease (COPD) (Table 1). The proportion of patients with a smoking status of ‘yes’ was similar between cohorts.

Table 1 Baseline patient demographics and disease characteristics of patients in the exploratory analysis populationa, and split by patients in the RA and RA–ILD cohorts. Data are n (%) unless stated otherwise.

Patients in the RA–ILD cohort also had more severe and more active RA at baseline than patients in the RA cohort. Most RA characteristics or manifestations were significantly more prevalent in the RA–ILD cohort (RF + , rheumatoid nodules, erosions, extra-articular disease, and anti-CCP positivity). In addition, baseline ESR level was significantly higher in the RA–ILD cohort (Table 1). Patients in the RA–ILD cohort versus the RA cohort had higher mean baseline scores for CDAI, SDAI, DAS28 (CRP), and DAS28 (ESR); RAPID3 scores were similar between cohorts (Table 2). A higher proportion of patients in the RA–ILD cohort were in the high disease activity category for SDAI, DAS28 (CRP), and DAS28 (ESR) than those in the RA cohort.

Table 2 Baseline RA disease activity of patients in the exploratory analysis populationa, and split by patients in the RA and RA–ILD cohorts.

Risk factors for RA–ILD

Potential predictors of RA–ILD diagnosis were assessed in the exploratory analysis population (patients with 6 months of follow-up). Older age (≥ 65 years old) and a history of COPD at baseline were shown to be risk factors for developing ILD (Fig. 2). Several time-varying covariates (anti-CCP positivity, CRP > 5 mg/L, and a moderate-to-high CDAI score) were also shown to be predictive of developing ILD. No other covariates were significant based on evaluation of confidence intervals.

Figure 2

Covariates potentially predictive of RA–ILD diagnosis in the exploratory analysis population (n = 5817)a. *p values are significant (p < 0.05); analyzed by Cox proportional hazards models. aPatients from the overall study population with a 6-month follow-up period from baseline. bBinary cut-offs were anti-CCP: > 20 (anti-CCP +) = 1, ≤ 20 (anti-CCP −), and missing = 0; ESR: > 28 mm/h = 1, ≤ 28 mm/h, and missing = 0; CRP: > 5 mg/L or > 0.5 mg/dL39 = 1, ≤ 5 mg/L or ≤ 0.5 mg/dL, and missing = 0; CDAI: moderate/high CDAI score = 1, remission/low/missing CDAI score = 0. CCP cyclic citrullinated peptide, CDAI Clinical Disease Activity Index, CI confidence interval, CRP C-reactive protein, COPD chronic obstructive pulmonary disease, ESR erythrocyte sedimentation rate, HR hazard ratio, ILD interstitial lung disease, RA rheumatoid arthritis, RA–ILD RA–associated ILD. Figure reprinted from ACR Convergence held November 5–9, 2020. The American College of Rheumatology does not guarantee, warrant, or endorse any commercial products or services. Reprinted by Nature Portfolio, part of Springer Nature.
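The binary cut-offs in the footnote above, including the coding of missing values as 0, can be sketched as a small helper. The function and key names are hypothetical; the CDAI threshold of > 10 follows the moderate/high definition given in the Methods, and CRP is assumed to be supplied in mg/L.

```python
def binarize(anti_ccp=None, esr_mm_h=None, crp_mg_l=None, cdai=None):
    """Code each covariate as 1 above its cut-off, else 0 (missing = 0)."""
    return {
        "anti_ccp_pos": int(anti_ccp is not None and anti_ccp > 20),
        "esr_high": int(esr_mm_h is not None and esr_mm_h > 28),
        # CRP in mg/L; 5 mg/L is equivalent to 0.5 mg/dL.
        "crp_high": int(crp_mg_l is not None and crp_mg_l > 5),
        # Moderate or high CDAI corresponds to a score > 10.
        "cdai_mod_high": int(cdai is not None and cdai > 10),
    }
```

Coding missing values as 0 groups patients without a measurement with those below the cut-off, which is worth keeping in mind when interpreting the hazard ratios.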

Subanalysis: comparison of outcomes for patients in the RA and RA–ILD cohorts before and after ILD diagnosis

In order to evaluate RA disease activity, rheumatologist encounters, and treatments in patients in the RA–ILD versus RA cohort, data from the 90-day periods before and after the earliest recorded ILD diagnosis date were compared. In total, there were 7150 patients with RA only and 240 patients with RA–ILD who had data in both the 90 days prior to and 90 days after the ILD diagnosis index.

For both patient cohorts, disease severity measures were missing less often in the post-index period than in the pre-index period (for example, the proportion of patients with a CDAI score in the RA–ILD cohort post- versus pre-index was 94.6% versus 13.3%, and in the RA cohort post- versus pre-index was 49.6% versus 24.7%; Table 3). In the post-index period, ≥ 90% of patients in the RA–ILD cohort had CDAI or RAPID3 scores reported compared with ~ 50% of patients in the RA cohort. In the post-index period, the proportion of patients in each severity category was similar between the RA–ILD and RA cohorts. Approximately 97% of patients in the RA–ILD cohort had a swollen or swollen28 score in the post-index period compared with 52% of patients in the RA cohort (Fig. 3). Patients in the RA–ILD cohort had more swollen joints in the post-index period compared with those in the RA cohort (Fig. 3).

Table 3 Disease activity in the subanalysis populationa: pre- and post-ILD diagnosis index date periods.
Figure 3

Subanalysisa: Mean swollen joint counts in the pre- and post-ILD diagnosis index date periods for patients in the RA cohort (left) and RA–ILD cohort (right). aPatients with data collected 90 days pre- and 90 days post-ILD diagnosis index. bNon-missing values compared overall cohort numbers: RA cohort n = 7150 and RA–ILD cohort n = 240. cIn the RA cohort (patients without ILD), a stochastically determined modifier was imputed and added to the initial RA diagnosis based on the frequency distribution of days for patients in the RA–ILD cohort and characteristics were described for the 90-day periods before and after. ILD Interstitial lung disease, RA Rheumatoid arthritis, RA–ILD RA–associated ILD, SD Standard deviation.

For both the pre- and post-index periods, a greater proportion of patients in the RA–ILD cohort had rheumatologist visits compared with patients in the RA cohort. In the RA cohort, the proportion of patients with rheumatologist visits was similar in the pre- and post-ILD diagnosis index periods: 69.8% (n = 4990/7150) versus 68.2% (n = 4877/7150), respectively. However, in the RA–ILD cohort, the proportion of patients with rheumatologist visits increased in the post-ILD diagnosis index period; pre- versus post-ILD diagnosis index periods: 74.2% (n = 178/240) versus 99.6% (n = 239/240), respectively.

For both the pre- and post-index periods, a greater proportion of patients in the RA–ILD cohort used glucocorticoids/disease-modifying antirheumatic drugs (DMARDs) and biologics compared with patients in the RA cohort. For patients in the RA–ILD cohort, a similar proportion of patients in the post-ILD versus pre-ILD diagnosis index periods used glucocorticoids/DMARDs (82% vs. 83%) and biologics (48% vs. 45%). However, for patients in the RA cohort, a lower proportion of patients used glucocorticoids/DMARDs (58% vs. 74%) and biologics (31% vs. 35%) in the post-ILD diagnosis index period compared with the pre-ILD diagnosis index period.

Discussion

In this large, real-world study, using data from the United States-based Discus Analytics JointMan database, the prevalence of RA–ILD was 4.1% and the mean time to onset of ILD after RA diagnosis was 3.3 years. We identified several risk factors for RA–ILD: age (≥ 65 years), COPD at baseline, anti-CCP positivity, CRP > 5 mg/L, and a moderate-to-high CDAI score. Patients with RA–ILD have increased morbidity compared with patients with RA without ILD3, which is supported by our results showing that patients with RA–ILD had more active RA at baseline and after ILD diagnosis. Consequently, patients with RA–ILD may require more clinical consultation.

The prevalence of RA–ILD ascertained from our study (4.1%) falls towards the lower end of the range previously reported; however, those studies had differing methodology and ILD definitions5,6,7,8,9. A recent United States-based cohort study using Medicare claims data from > 500,000 patients between 2008 and 2017 estimated the baseline prevalence of RA–ILD to be 2.0% and overall prevalence (RA–ILD was present or developed during the analysis period) to be approximately 5.0%, which is in line with our results24. A study, similar to that reported here, using the United States-based Truven Health MarketScan Commercial and Medicare Supplemental health insurance databases, showed the prevalence of RA–ILD in the US was 3.2 to 6.0 cases per 100,000 people4. A retrospective review of patient data in Jordan found prevalence of RA–ILD among 210 patients to be 3.7%25. It is important to note that the study reporting an RA–ILD prevalence at the higher end of the range of 58% was a small analysis of 36 patients with early RA (duration < 2 years); the prevalence estimate included both patients with “clinically significant ILD” and with “abnormalities compatible with ILD but no clinically significant ILD”9. As previously noted, in our study, patients were only classified as having RA–ILD if a diagnosis of ILD was definitive.

In this study, assessment of the clinical characteristics of patients in the RA and RA–ILD cohorts showed that patients with ILD were more likely to be older, male, have a history of COPD, and have more prominent RA disease characteristics (a higher proportion of patients were RF+, anti-CCP+, with rheumatoid nodules, erosions, extra-articular disease, swelling, and higher baseline ESR). A higher proportion of patients with RA–ILD had Medicare insurance when compared with the RA cohort; this can be at least partially explained by the age difference, as a larger proportion of patients with RA–ILD were over the age of 65 when compared with the RA cohort. Potential risk factors for RA–ILD were further analyzed by a Cox regression model and, in addition to older age and seropositivity, which are already established risk factors16,17,18,25,26,27,28,29, we confirmed baseline COPD30, a moderate-to-high CDAI score, and CRP > 5 mg/L as risk factors. Although smoking is an established risk factor for RA–ILD25,31, in our analysis, differences in baseline smoking prevalence were not statistically significant. However, it should be noted that identification of smoking exposures in patient data is limited by missingness, and there may have been a large proportion of false negatives, which would limit reliability. It should further be noted that although COPD and ILD have distinct pathophysiologies, they share overlapping risk factors, and so may develop either simultaneously or successively30,32.

Disease activity has previously been identified as a risk factor for RA–ILD, using DAS2833 or CDAI34 as the measure. A retrospective analysis of data from patients (n = 1419) with early/mild or severe interstitial lung abnormalities in the Brigham and Women’s RA Sequential Study revealed that those with high or moderate disease activity (defined by DAS28) had an increased risk of developing RA–ILD (compared with patients in remission or with low disease activity)33. A smaller (n = 118) case–control study showed that a CDAI score > 28 was associated with the presence of RA–ILD34. Previous studies have also identified baseline CRP level as a risk factor for RA–ILD: CRP > 10 mg/L or “higher” baseline levels35,36. Our analysis refines these further by identifying baseline CRP > 5 mg/L to be predictive of RA–ILD. The identification of new risk factors for RA–ILD may help physicians diagnose and treat patients earlier in the course of the disease.

Our subanalysis of outcomes before versus after ILD diagnosis provides some insight into RA disease severity and healthcare utilization (treatments, encounters) for patients with RA who develop ILD. Based on swollen joint counts, patients with RA–ILD appeared to have worse RA symptoms after ILD diagnosis compared with patients who did not develop ILD. It should be noted that more patients in the RA cohort had missing disease severity data, which may be an artifact of scheduling routine assessments 1–2 times per year. Missing data may also be accounted for by patients with low disease activity or those in remission being less likely to consult their physician as frequently as patients with medium/high disease activity. Thus, more complete disease activity data may highlight a greater disparity in RA symptom control between patients with RA who develop ILD and those who do not develop ILD. Our descriptive subanalyses suggest that this disparity contributes to greater use of glucocorticoids/DMARDs, biologics, and rheumatologist encounters in patients who develop ILD compared with patients with RA alone.

This was a large analysis of real-world data collected by rheumatologists across several regions of the United States. The comprehensiveness of the JointMan database, which incorporates rheumatology encounters, rheumatology-specific laboratory results, clinical evaluations, and prescriptions within the JointMan network for patients covered by commercial, Medicare, and Medicaid insurance plans, allows for longitudinal analysis of RA and related treatments and conditions. Other strengths are the integration of live patient electronic records allowing for continuous coverage, and being part of a rheumatology network which suggests the clinicians are knowledgeable on disease surveillance practice. Compared with randomized clinical trials, real-world studies are important to provide evidence that is generalizable to different populations and are useful for assessing specific characteristics of patient populations, risk factors on a pre-defined outcome, and comparative effectiveness37.

Despite the above strengths, there are naturally some limitations to the analysis. Coding errors may have occurred in the patient data, and in some instances, diagnostic codes may have been entered as rule-out criteria and not actual disease. Due to the nature of the study design, the symptoms and tests used to reach diagnosis were not captured in this study. Specific validation studies assessing the codes for RA are lacking; however, the validity of ICD-9-CM and ICD-10-CM codes versus chart review data has been shown to be comparable for rheumatic disease38. Additionally, encounters outside of the JointMan network such as inpatient visits, emergency department visits, and visits with non-rheumatology physicians are not captured. The use of the JointMan database also varied between sites and over time. Although data were collected across many regions of the United States, the JointMan database population was limited to eight states, with most of the population located in Washington. As mentioned, our dataset also had different levels of missing data for swollen joint counts and disease severity scores for patients in the RA and RA–ILD cohorts. Missing data may have been driven by lower disease activity, especially for patients in the RA cohort. Furthermore, as this study covers patients from 2009 to 2019, clinical assessment of disease activity scores may have become more common since the beginning of the study period, which may contribute to missing data.

In conclusion, this work further describes the disease and natural history of patients with the debilitating conditions of RA and ILD. The prevalence of RA–ILD in this large, real-world study using data from the United States-based JointMan database was 4.1%. This study provides insight into the increased burden of disease among patients with RA–ILD versus RA without ILD; RA disease activity may be worse after ILD diagnosis compared with the pre-ILD diagnosis index period and compared with patients with RA alone. Several previously established risk factors for developing ILD were confirmed, including older age, COPD at baseline, anti-CCP positivity, CRP > 5 mg/L, and a moderate-to-high CDAI score. Recording and tracking routine clinical disease activity metrics may help identify patients at higher risk of RA complications. Recognition of the risk factors underscored here may lead to early diagnosis of RA–ILD and quicker treatment initiation, leading to better clinical outcomes for these patients.