Extending the data collection from a clinical trial: The Extended Salford Lung Study research cohort

The Extended Salford Lung Study (Ext-SLS) is an extension of the Salford Lung Studies (SLS) in asthma and chronic obstructive pulmonary disease (COPD) through retrospective and prospective collection of patient-level electronic health record (EHR) data. We compared the Ext-SLS cohort with the SLS intention-to-treat populations using descriptive analyses to determine if the strengths (e.g., randomization) of the clinical trial were maintained in the new cohort. Historical and patient-reported outcome data were captured from asthma-/COPD-specific questionnaires (e.g., Asthma Control Test [ACT]/COPD Assessment Test [CAT]). The Ext-SLS included 1147 participants (n = 798, SLS asthma; n = 349, SLS COPD). Of participants answering the ACT, 39% scored <20, suggesting poorly controlled asthma. For COPD, 61% of participants answering the CAT scored ≥21, demonstrating a high disease burden. Demographic/clinical characteristics of the cohorts were similar at SLS baseline. EHR data provided a long-term view of participants’ disease, and questionnaires provided information not typically captured. The Ext-SLS cohort is a valuable resource for respiratory research, and ongoing prospective data collection will add further value and ensure the Ext-SLS is an important source of patient-level information on obstructive airways disease.


Data analysis
To understand whether the Ext-SLS cohort was representative of the wider SLS population, we report descriptive comparisons of the Ext-SLS and SLS intention-to-treat (ITT) cohorts using data captured at entry to the original SLS. Post-hoc analysis was performed using chi-square testing to examine differences in select baseline characteristics. The ITT populations of the SLS included patients who were randomized in the studies and received at least one prescription of study medication (FF/VI or UC). The generalizability of the COPD population to the non-trial COPD population has been assessed previously 12.
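As an illustration of the chi-square comparison described above, the following is a minimal sketch of a Pearson chi-square test for a single binary baseline characteristic in two cohorts. The function name and the counts are hypothetical and are not taken from the study tables; the source does not state which software was used or whether a continuity correction was applied, so none is applied here.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for a 2x2 table.

    Layout:      characteristic present | absent
      cohort 1            a             |   b
      cohort 2            c             |   d

    Returns (statistic, p_value) for 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column needs at least one observation")
    # Shortcut formula, algebraically equal to sum((O - E)^2 / E) for a 2x2 table.
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom the upper-tail probability is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Hypothetical counts for illustration only (not study data).
stat, p = chi_square_2x2(60, 40, 45, 55)
print(f"chi-square = {stat:.2f}, p = {p:.4f}")
```

For characteristics with more than two categories, a general contingency-table routine from a statistics package would be the more natural choice.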
A descriptive analysis of the Ext-SLS primary care EHR and questionnaire data was conducted using N and % for categorical data, and mean and standard deviation (SD) or median and interquartile range for continuous data. Secondary care data were not available at the time of analysis. Description of primary care data focused on current lifestyle information (smoking, body mass index [BMI]), comorbidities, and blood tests (eosinophils, neutrophils) recorded within 24 months prior to the Ext-SLS consent date. For participants with COPD, modified Medical Research Council (mMRC) dyspnea score, Global Initiative for Chronic Obstructive Lung Disease (GOLD) grade of airflow limitation (1-4) and lung function (forced expiratory volume in one second [FEV1]) were assessed. Questionnaires were analyzed on a per-question basis using all "known" answers (i.e., where an answer other than "Don't know" was provided) for each specific question. Additionally, questionnaire data were analyzed using the subset of participants who completed all required questions (i.e., those not part of a skip pattern).
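The descriptive summaries described above can be sketched as follows. This is an illustrative outline only: the function names are hypothetical, missing values are assumed to be encoded as None, and the quantile method (inclusive) is an assumption, since the source does not specify how the interquartile range was computed.

```python
import statistics


def summarise_continuous(values):
    """Mean/SD and median/IQR for a list of numbers (None = missing).

    Needs at least two non-missing values.
    """
    known = [v for v in values if v is not None]
    # The quantile method is an assumption; packages differ in their defaults.
    q1, _, q3 = statistics.quantiles(known, n=4, method="inclusive")
    return {
        "n": len(known),
        "mean": statistics.mean(known),
        "sd": statistics.stdev(known),  # sample SD (n - 1 denominator)
        "median": statistics.median(known),
        "iqr": (q1, q3),
    }


def summarise_question(answers, unknown=("Don't know", None)):
    """N and % per answer, using only 'known' answers as the denominator."""
    known = [a for a in answers if a not in unknown]
    counts = {}
    for a in known:
        counts[a] = counts.get(a, 0) + 1
    return {a: (n, round(100 * n / len(known), 1)) for a, n in counts.items()}


# Made-up values for illustration only.
print(summarise_continuous([1, 2, 3, 4, 5, None]))
print(summarise_question(["Yes", "No", "Yes", "Don't know", None]))
```

Because the denominator is recomputed for each question, the number of responders varies from question to question, as reflected in the variable n values reported in the Results.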

Reporting summary
Further information on research design is available in the Nature Research Reporting Summary linked to this article.

RESULTS
The Ext-SLS study population
Overall, 1183/7032 participants (16.8%) from the SLS ITT population consented to the Ext-SLS and 1147 of these (n = 798 asthma, n = 349 COPD) completed questionnaires and had primary care data available for analysis. The SLS ITT populations included 4233 asthma patients and 2799 COPD patients. While the intention was to collect full retrospective EHR data, ultimately only ~10 years of data (retrospective and some prospective) were obtained and are available. Participants consented to the Ext-SLS a mean of 3.2 years post-SLS. In the SLS, patients were randomized 1:1 to FF/VI versus UC. In the Ext-SLS, 46% (368) of participants with asthma and 48% (167) of participants with COPD were originally randomized to FF/VI during the SLS. Most Ext-SLS participants were from Salford (51%; n = 582) or Trafford (30%; n = 341) and provided consent in 2018 (91%; n = 1039).

Descriptive comparison of Ext-SLS and SLS ITT populations
Baseline demographics collected at the beginning of the SLS were broadly similar between the Ext-SLS and SLS cohorts (Table 2), with some exceptions. Among participants with asthma, the proportion aged >50 years at baseline was higher in the Ext-SLS cohort compared with the SLS cohort (p < 0.0001). Conversely, at entry to the SLS, participants in the Ext-SLS cohort with COPD were, on average, slightly younger at baseline than in the SLS COPD ITT population, with a greater proportion of participants aged <50 years (p = 0.0013). Compared with SLS asthma participants, there were more females in the Ext-SLS (p = 0.0035), and a larger proportion reported never smoking at baseline (p = 0.007). For participants with COPD, there was a greater proportion of "ex-smokers" and a smaller proportion of "current smokers" in the Ext-SLS cohort compared with the SLS cohort; however, these differences were not statistically significant. Respiratory disease characteristics for asthma (Table 3) and COPD (Table 4) were broadly similar between the SLS and Ext-SLS cohorts, but with some notable differences. Participants with asthma in the Ext-SLS potentially had less severe disease at SLS entry, compared with the SLS asthma cohort, based on self-reported symptoms, control as measured using the ACT and short-acting β2-agonist (SABA) reliever therapy use (Table 3).
Participants with COPD in the Ext-SLS had slightly better lung function, lower CAT scores (a greater proportion of participants with scores below 10), and a smaller proportion of the Ext-SLS cohort were GOLD Grade 3 or 4 (Table 4).

Questionnaire descriptions (table excerpt)
COPD and Asthma Sleep Impact Scale (CASIS); asthma and COPD: A seven-question instrument, with five items relating to disturbance falling asleep or staying awake during the day. The remaining items concern sleep quality. Responses range from 1 = never to 5 = very often, with two items reverse-scored. The sum of raw scores is linearly transformed to arrive at a total score ranging from 0 to 100, with higher scores indicating worse sleep quality. The CASIS has a look-back period of 1 week.
Asthma Control Questionnaire-6 (ACQ-6); asthma: A six-item assessment of control based on asthma symptom severity/frequency and rescue medication use. For each question, a score of 0 indicates no impairment and 6 indicates maximum impairment. The total score is the mean of responses to all six questions. A total score of 0.0-0.75 is classified as well-controlled asthma; 0.75-1.5 as a "gray zone"; and >1.5 as poorly controlled asthma. The ACQ has a look-back period of 1 week.
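The ACQ-6 and CASIS scoring rules given in the questionnaire descriptions can be sketched as below. This is not the instruments' official scoring code. The positions of the two reverse-scored CASIS items are not stated in the source, so the `reverse_items` default is a placeholder, and the handling of scores falling exactly on an ACQ-6 cut-off is an assumption.

```python
def acq6_score(items):
    """Mean of six responses, each from 0 (no impairment) to 6 (maximum)."""
    if len(items) != 6 or any(not 0 <= x <= 6 for x in items):
        raise ValueError("ACQ-6 needs six responses in the range 0-6")
    return sum(items) / 6


def acq6_category(score):
    """Classify an ACQ-6 total score; boundary handling is an assumption."""
    if score <= 0.75:
        return "controlled"
    if score < 1.5:
        return "partial control"
    return "uncontrolled"


def casis_total(items, reverse_items=(5, 6)):
    """Seven responses (1-5); the raw sum is rescaled linearly to 0-100.

    reverse_items holds zero-based positions of the reverse-scored items.
    The default is a placeholder, not the instrument's real item positions.
    """
    if len(items) != 7 or any(not 1 <= x <= 5 for x in items):
        raise ValueError("CASIS needs seven responses in the range 1-5")
    scored = [6 - x if i in reverse_items else x for i, x in enumerate(items)]
    raw = sum(scored)  # possible range 7 to 35
    return (raw - 7) / 28 * 100  # higher = worse sleep quality


# Made-up responses for illustration only.
print(acq6_category(acq6_score([0, 1, 1, 0, 1, 0])))
print(casis_total([5, 5, 5, 5, 5, 1, 1]))
```

The min-max rescaling is the only increasing linear map that sends the raw range 7-35 onto 0-100, which is why it is used here.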
Description of select Ext-SLS primary care data
A selection of variables relevant to asthma and COPD are described here, illustrating the breadth of information available. Table 5 shows BMI measurements recorded in participants' primary care EHR; most asthma and COPD participants were overweight (BMI ≥25 kg/m2) or obese (BMI ≥30 kg/m2). Average BMI measurements in Ext-SLS (at or just prior to the time of consent) reflected measurements recorded at SLS entry (asthma 29.9 and 29.9; COPD 28.1 and 27.8, respectively; Table 2).
The smoking status of participants in the Ext-SLS showed that, among participants who had been smokers at SLS baseline, 46% of those with asthma and 31% of those with COPD had stopped smoking (Table 5).
Common conditions (ever recorded in EHR) among Ext-SLS asthma participants included atopy (57.0%), pneumonia (53.8%), and diabetes mellitus (14.9%) (Table 6). Evidence of current asthma (defined based on the presence of an asthma medical code in the 12 months preceding Ext-SLS consent) was observed in 86.2% of the asthma group. Two participants had no medical information available, meaning 99.7% of the asthma group were confirmed as having ever had asthma (i.e., any record of asthma prior to Ext-SLS consent). Among Ext-SLS COPD participants, comorbidities included pneumonia (68.5%), atopy (37.0%) and arrhythmia (26.1%). In the 12 months prior to Ext-SLS consent, 21.5% of COPD participants had evidence of current asthma, while 37.2% had no prior asthma diagnosis (Table 6). A large proportion of asthma and COPD participants had records for blood eosinophil (66 and 83%, respectively) and neutrophil counts (68 and 86%, respectively), with many having multiple measures (mean 2.2 asthma, 2.6 COPD) in the 24 months prior to Ext-SLS consent. Compared with asthma participants, a slightly higher proportion of COPD participants had blood eosinophil counts ≥150/μl (69 versus 66%). Neutrophil counts <6000/μl were recorded for 88% of asthma participants and 77% with COPD.
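The "current asthma" and "ever asthma" definitions above amount to look-back windows relative to the consent date. A minimal sketch follows, assuming each participant's asthma codes are available as a list of dates. The function names are hypothetical, and treating the window as the same calendar day one year earlier up to, but not including, the consent date is an assumption rather than the study's stated rule.

```python
from datetime import date


def one_year_before(d):
    """Same calendar day one year earlier (29 February falls back to 28)."""
    try:
        return d.replace(year=d.year - 1)
    except ValueError:  # 29 February has no counterpart in the previous year
        return d.replace(year=d.year - 1, day=28)


def has_current_asthma(code_dates, consent_date):
    """True if any asthma code falls in the 12 months before consent."""
    window_start = one_year_before(consent_date)
    return any(window_start <= d < consent_date for d in code_dates)


def has_ever_asthma(code_dates, consent_date):
    """True if any asthma code was recorded before consent."""
    return any(d < consent_date for d in code_dates)


# Made-up dates for illustration only.
consent = date(2018, 6, 1)
print(has_current_asthma([date(2017, 7, 15)], consent))
print(has_ever_asthma([date(2016, 1, 1)], consent))
```

The same pattern extends to the 24-month window used for blood tests by changing the window start.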
Among the 349 Ext-SLS COPD participants, 75% had information on GOLD airflow grade; most participants were GOLD Grade 2 (43%). An average of three (range 1-12) FEV1 measurements were taken in the 24 months prior to consent.
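For readers unfamiliar with GOLD airflow grades, the sketch below maps FEV1 as a percentage of the predicted value to grades 1-4 using the standard GOLD spirometric cut-offs (≥80%, 50-79%, 30-49%, <30%). The source does not state whether grades were derived this way or recorded directly in the EHR, so this is an illustration of the classification rather than the study's procedure; the function name is hypothetical.

```python
def gold_grade(fev1_percent_predicted):
    """Map FEV1 (% of predicted) to GOLD airflow limitation grade 1-4.

    Assumes airflow obstruction has already been confirmed by spirometry.
    """
    if fev1_percent_predicted < 0:
        raise ValueError("FEV1 % predicted cannot be negative")
    if fev1_percent_predicted >= 80:
        return 1  # mild
    if fev1_percent_predicted >= 50:
        return 2  # moderate
    if fev1_percent_predicted >= 30:
        return 3  # severe
    return 4      # very severe


# Made-up values for illustration only.
for value in (85, 62, 41, 25):
    print(value, gold_grade(value))
```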
Prescriptions of respiratory medicines in the 12 months prior to Ext-SLS consent are detailed in Table 7. A high proportion of asthma and COPD participants were prescribed SABA (83 and 90%, respectively). This was also the case for inhaled corticosteroid/long-acting β2-agonist (ICS/LABA) fixed-dose combinations (78 and 88%, respectively). Leukotriene receptor antagonists were prescribed to 15% of asthma participants, while 25% of COPD participants were prescribed a single-inhaler triple therapy.

Description of participant-completed questionnaires
All participants completed some of the questionnaires, with most participants completing all required questions (asthma: 616 [77%]; COPD: 221 [63%]). The distribution of responses for each question was similar when analyzing the responses from all patients who completed the question compared to the subset of patients who completed all required questions (Tables S1, S2).

Asthma
Among those who answered disease history questions (n = 738-755, variable number of responders), 31% reported the first occurrence of symptoms at age ≥40 years. The first diagnosis was made ≥40 years of age for 35% of patients, with 40% prescribed medication for the first time aged ≥40 years. Among the participants with asthma who responded to smoking questions (n = 784), 12% identified as current smokers (compared with 14% at SLS baseline and 14% as recorded in the primary care EHR data; Table 2).
The Asthma Control Test (ACT) was completed by 787 participants, 61% of whom scored ≥20, suggesting their asthma was well-controlled at the time of consent into the Ext-SLS. Scores suggesting partially controlled (ACT score 16-19) and uncontrolled (ACT score <16) asthma were found in 20 and 19% of participants, respectively. Of the 789 participants who completed the Asthma Control Questionnaire-6 (ACQ-6), 37% scored ≤0.75, indicating controlled asthma, while 31% scored >0.75 and <1.5 suggesting partial control and 32% scored ≥1.5 suggesting uncontrolled asthma. The effect of asthma on sleep, as measured by the COPD and Asthma Sleep Impact Scale (CASIS) total score, was found to be low among the Ext-SLS participants (n = 793), with 87% scoring <60 out of 100, placing them in the first tertile, indicating low impact (Fig. 1). Commonly reported triggers for asthma symptoms were airway infections and colds (88%), dust (85%), exercise (77%), and cold air (70%) (Fig. 2). Alterations to maintenance medication dose or frequency were made without GP input by 25% of 783 participants, with 51% doing this ≥3 times in the preceding six months. The majority of responders had never been prescribed an oral corticosteroid rescue pack (82% of 759 participants) nor received a written asthma management plan (67% of 717 participants).
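The ACT banding used above (≥20 well controlled, 16 to 19 partially controlled, <16 uncontrolled) can be expressed as a short sketch. The function names are hypothetical and the example scores are made up; the ACT total is taken to be the sum of five items scored 1-5, giving a range of 5-25.

```python
from collections import Counter


def act_category(total):
    """Band an ACT total score (5-25) using the cut-offs quoted in the text."""
    if not 5 <= total <= 25:
        raise ValueError("ACT total must be between 5 and 25")
    if total >= 20:
        return "well controlled"
    if total >= 16:
        return "partially controlled"
    return "uncontrolled"


def category_percentages(totals):
    """Percentage of participants in each ACT band."""
    counts = Counter(act_category(t) for t in totals)
    return {band: 100 * n / len(totals) for band, n in counts.items()}


# Made-up scores for illustration only (not study data).
print(category_percentages([25, 20, 18, 10]))
```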

COPD
A large proportion of responders (87% of 340) reported early exposure to second-hand tobacco smoke. For 66% of 342 participants, their mother was a smoker (Table S2). Among the participants with COPD who responded to smoking questions, 38% were classified as current smokers (compared with 46% at SLS baseline and 38% in the primary care EHR data). Among the participants who had ever smoked, 60% reported beginning smoking on most days between the ages of 15 and 19 years. When asked about environmental exposures, in the 93% of participants who had ever been employed, common exposures in the workplace were dust, cigarette smoke and fumes (including chemical fumes) (Fig. 3).
In response to questions on social and physical functioning (n = 331-346, variable number of responders), 76% of participants reported visiting friends or family at least once per week, while 36 and 37% described being moderately satisfied and not at all satisfied with their health, respectively. Overall, 36% of participants reported feeling tired nearly every day, 16% on more than half of the days and 34% on several days in the two weeks prior to answering the questionnaire. Pub or social club attendance was reported by 29% of participants, 11% attended a sports club, gym, or golf club, and 14% attended other group activities at least once per week, but 53% reported that they did not attend any of the activities listed in the questionnaire. Being unable to walk unaided was described by 33% of participants, with COPD being a factor in the need for assistance in 88%.

DISCUSSION
The Ext-SLS offers a unique cohort of asthma and COPD patients who have been well characterized through involvement in a clinical trial, via EHR data collection and through completion of disease-specific questionnaires. Participants were originally recruited to the SLS after presentation to primary care 12 and are thus more representative of a real-world population than patients from traditional clinical trials.
Comparisons of the Ext-SLS cohort with the SLS ITT populations using SLS trial data demonstrated that the Ext-SLS cohort is broadly similar to the wider SLS populations, but with some key differences. Ext-SLS asthma participants were, on average, older, with fewer symptoms than the SLS asthma ITT population at the time of SLS entry. Conversely, Ext-SLS participants with COPD were, on average, younger, but similarly had less severe disease (based on airflow obstruction) at the time of SLS entry than the SLS COPD ITT population. Additionally, it is likely that the benefits of randomization have not been carried through to the Ext-SLS, even if the 1:1 ratio of FF/VI to UC participants was broadly maintained. Interestingly, a previous observational cohort study compared the cohort of SLS COPD patients with matched non-trial patients with COPD in England from the Clinical Practice Research Datalink database 12. The study found that the trial population was similar to the non-trial COPD population in terms of baseline demographics, clinical and treatment variables, including COPD exacerbations, supporting the generalizability of SLS COPD results. There was evidence of a Hawthorne effect (a phenomenon whereby participants or practitioners modify their behavior due to an awareness of being observed) 13,14, as more COPD exacerbations were reported in trial patients and/or were more likely to be recorded. However, the largest effect was observed through changes in behavior in patients and general practitioner coding practices.
While the Ext-SLS may not fully represent the SLS, it remains a detailed and valuable cohort for respiratory disease study. The comprehensive primary care data available describe participants' general health over the course of their disease, with measurements of BMI, laboratory parameters and information on comorbidities. The laboratory data provide insight into clinical biomarkers of respiratory disease, such as blood eosinophils and neutrophils, which are both potential biomarkers of disease severity 15-19. Information on biomarkers, in combination with patient-reported information from the questionnaire, makes the Ext-SLS particularly useful in identifying disease traits (e.g., eosinophilic inflammation, smoking) and may help to address questions relating to the impact of a treatable traits 6,20 approach (in contrast to stepwise treatment approaches as are routinely used in asthma management 18) in obstructive airways disease.
Additionally, data on the prescription of respiratory medication can indicate common treatment patterns and indirectly provide information on disease control. A noteworthy limitation of the prescribing data is the restriction to prescriptions issued in primary care. Most new and repeat prescriptions for respiratory medications are issued in primary care, except biological therapies for asthma. Additionally, the exact dates of prescriptions were not available at the time of analysis (only month and year), meaning that we could not accurately determine the use of multiple-inhaler triple therapies.
The questionnaires (completed in full by a large proportion of participants) complement the EHR data, providing information on disease impact that is not typically available. Among Ext-SLS participants with asthma, the validated ACT, ACQ, and CASIS questionnaires showed that many participants still had uncontrolled asthma and disturbed sleep, albeit at a low level. The questionnaires also showed a high level of self-management among the participants.
Participants with COPD in the Ext-SLS had a mean CAT score >20, suggesting that COPD had a substantial impact on the lives of these participants. However, as with the asthma population, the CASIS results suggested that most participants did not find their sleep suffering as a result of their COPD, though many reported sleep disturbances. The social impact of COPD was demonstrated by the questionnaires, with more than half of participants reporting that they did not attend social activities, aligning with previous research into the social impact of COPD 21. Data relating to early life and personal history provided valuable insights that could not be gathered from medical records, detailing factors relating to underlying etiologies. The asthma questionnaires indicated that asthma was often adult-onset in this cohort and outlined common triggers of symptoms. For participants with COPD, data from the questionnaires suggested factors that could have contributed to the disease in the household and working environments.
The Ext-SLS has some limitations that relate to the nature of the data. Patient-reported outcomes were collected in the disease-specific questionnaires and could be susceptible to inaccuracy or error due to factors such as limitations in recall, recall bias and a subjective assessment of the variables. However, validated instruments were used to collect information about disease impact and management 22-28. Similarly, the nature of real-world data means that data may be error-prone (e.g., inconsistent data recording and missing data). The time between the original SLS and the Ext-SLS led to patient attrition and there may be bias in the recruitment of patients due to the SLS population ageing and possibly becoming too unwell to participate.
In conclusion, the Ext-SLS cohort is a well-characterized and valuable resource in the field of respiratory research and may be particularly useful in addressing questions relating to treatable traits. The cohort is unique due to the inclusion of a randomized clinical trial in its timeline. Ongoing work collecting prospective data from the cohort will further increase the data available over time.

DATA AVAILABILITY
Anonymized individual participant data and study documents can be requested for further research from www.clinicalstudydatarequest.com. The secondary care data associated with this study are under license from NHS Digital and cannot be released as such.