The feasibility of pragmatic influenza vaccine randomized controlled real-world trials in Denmark and England

We estimated the frequency of non-specific influenza-associated clinical endpoints to inform the feasibility of pragmatic randomized controlled trials (RCT) assessing relative vaccine effectiveness (rVE). Hospitalization rates of respiratory, cardiovascular and diabetic events were estimated from Denmark and England’s electronic databases and stratified by age, comorbidity and influenza vaccination status. We included a seasonal average of 4.5 million Danish and 7.2 million English individuals, 17 and 32% with comorbidities. Annually, approximately 1% of Danish and 0.5% of English individuals were hospitalized for selected events, ~50% of them respiratory. Hospitalization rates were 40–50-fold and 2–10-fold higher in those >50 years and with comorbidities, respectively. Our findings suggest that a pragmatic RCT using non-specific endpoints is feasible. However, for outcomes with rates <2.5%, it would require randomization of ~100,000 participants to have the power to detect a rVE difference of ~13%. Targeting selected groups (older adults, those with comorbidities) where frequency of events is high would improve trial efficiency.


INTRODUCTION
The World Health Organization (WHO) recommends annual vaccination as the most effective method to prevent influenza 1. Randomized controlled trials (RCTs) have, in recent years, relied on laboratory-confirmed influenza as a study endpoint to demonstrate the efficacy of influenza vaccines 2. However, there is a growing body of evidence that influenza is associated with a broader spectrum of non-respiratory events including cardiovascular, neurological and other complications [3][4][5]. If influenza virus infection precipitates these events, vaccination should be expected to prevent a proportion of them 6. Limited evidence of the value of influenza vaccination in preventing non-respiratory outcomes exists, based on observational studies, meta-analysis of RCT data and reanalysis of study safety data [7][8][9].
RCTs are the most valid study designs to demonstrate causal relationships 10, but including non-specific endpoints in a traditional RCT may require large sample sizes because only a proportion of captured non-respiratory events would be influenza-associated (and therefore vaccine-preventable). Results would also be sensitive to unpredictable and poorly understood time-lagged relationships between influenza and related outcomes or complications, which could 'dilute' vaccine efficacy/effectiveness (VE) 11. Differentiated influenza vaccines have demonstrated improved immunogenicity and protection for older adults, and some studies have included non-specific cardiovascular or other secondary events to illustrate the full public health value of these vaccines as compared with traditional influenza vaccines [12][13][14][15]. Studies comparing two vaccines measure relative VE (rVE) which, using an efficacious comparator, report smaller effect sizes and therefore require even larger sample sizes to demonstrate superior protection from influenza-associated cardiovascular and non-respiratory events.
Pragmatic trials, which typically measure outcomes using real world data from existing databases or public health registers, may be a feasible method of randomizing interventions in hundreds of thousands of study participants and therefore increase power to measure rVE against non-specific outcomes in a cost-effective manner 16,17 . Their sample sizes are dictated by the estimated rVE and incidence rate of the outcome under assessment. These studies would be feasible if it is logistically and financially possible to randomize and vaccinate the necessary sample size within a single or multiple healthcare systems from which outcomes could be reliably captured.
We conducted a retrospective study using electronic medical records from Denmark and England to estimate the incidence rate of cardiovascular, respiratory and exacerbation of diabetes events in adults ≥18 years, hospitalized during influenza season, to guide future pragmatic RCTs exploring broader, clinically-important, influenza-associated endpoints. We then estimated the sample size requirements to conduct a pragmatic RCT under different population inclusion and rVE scenarios and discussed the feasibility of conducting such studies.

RESULTS
Demographics and influenza vaccination coverage rate
From Denmark, the study cohort aged ≥18 years included a seasonal average of 4,469,268 individuals, 50.7% female. Overall, 17% of the study population had ≥1 high-risk condition, increasing from 5.7% in those aged 18-34 years to 43% in those aged ≥75 years (Table 1). From England, the cohort included a seasonal average of 7,212,471 people, of whom 50.5% were female. In the overall population, 32% had ≥1 high-risk condition, increasing from 16.7% in those aged 18-34 years to 63% in those aged ≥75 years. Cardiovascular, diabetes, immunocompromised, asthma and other respiratory conditions were the most common high-risk conditions in both countries. Influenza vaccination coverage rates (VCR) captured in these healthcare databases were 13% in Denmark and 24% in England, increasing in the populations aged ≥75 years to 55 and 78%, respectively (Fig. 1; Supplementary Fig. 1). VCR was 6-10-fold higher in people with high-risk conditions vs those without in populations aged <65 years. This difference was much smaller in older adults: in Denmark 59% of adults aged ≥75 years with high-risk conditions were vaccinated vs 52% of those without; in England the corresponding proportions were 82 and 72% (Fig. 1).
Influenza hospitalization rates varied between seasons: in Denmark from 2 per 100,000 in the 2011/12 season to 52 per 100,000 in the 2017/18 season, and in England from 0.6 per 100,000 in 2011/12 to 33 per 100,000 in 2017/18. Inter-seasonal variation in cardiovascular or diabetic outcome groups was less pronounced, normally varying by <20%. Outcome rates were higher in influenza vaccine recipients than non-recipients, particularly in younger individuals in whom vaccination was less common, giving rise to IRRs between vaccine recipients and non-recipients which were nearly always >1 (Fig. 2 and Supplementary Tables 4 and 5).
In Denmark, the IRR for all outcomes in the 18-34-year-old age group was 19.4 (95% CI: 18.1-20.5) and declined in progressively older groups to 1.13 (95% CI: 1.12-1.14) in the ≥75-year-old group. In England, the trend was similar with an IRR of 8.3 (95% CI: 8.0-8.7) in the youngest group, declining to 1.10 (95% CI: 1.08-1.11) in the oldest. These trends were broadly similar across outcome groups, with high IRRs in younger age groups declining to ~1 in those aged ≥75 years.

Outcomes in individuals with existing high-risk conditions
IRRs in individuals with ≥1 high-risk condition were highest in the 18-34-year-old age group (7.3 in Denmark and 9.5 in England; Tables S8 and S9), an effect driven by low hospitalization rates in healthy younger adults, and declined in older age groups. Significantly elevated incidence rates were observed in at-risk populations irrespective of their age.

Sample size for a pragmatic RCT
Under different rVE assumptions, a range of incidence rate scenarios (which we assumed as attack rates) representing the frequency of events reported above, and a total sample size of 100,000, the power to conclude rVE >0 in an RCT varied from ~7% to ~100% (Fig. 4). With attack rates <0.5% or rVE <7%, power was low irrespective of other parameters. To achieve a power of 80% to ascertain an rVE of 10%, the frequency of events to be used as endpoints would need to be ≥1.5% in a population of at least 200,000 people. Rare event rates (<1%), as seen in certain populations, would require even higher sample sizes for a similar expected rVE.
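To illustrate how power figures of this kind can arise, the following minimal Python sketch implements one reading of an exact conditional binomial power calculation. It is not the SAS code used in the study. It assumes a one-sided type I error of 2.5%, a 1:1 allocation ratio, and an expected total number of cases that is treated as fixed; the function names `binom_logpmf` and `power_rve` are hypothetical.

```python
import math

def binom_logpmf(k, n, p):
    """Log probability mass of Binomial(n, p) at k, for 0 < p < 1."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))

def power_rve(total_n, attack_rate, rve, alpha=0.025):
    """Approximate power to conclude rVE > 0 (assumptions: one-sided alpha, 1:1 allocation)."""
    n_per_arm = total_n / 2
    # Expected cases: control arm at the attack rate, investigational arm reduced by rVE.
    n_cases = round(n_per_arm * attack_rate * (2 - rve))
    # Probability that a given case belongs to the investigational arm.
    p_alt = (1 - rve) / (2 - rve)
    # Critical value: largest c with P(X <= c | p = 0.5) <= alpha under the null rVE = 0.
    c, cum = -1, 0.0
    for k in range(n_cases + 1):
        cum += math.exp(binom_logpmf(k, n_cases, 0.5))
        if cum > alpha:
            break
        c = k
    # Power: probability of observing c or fewer investigational-arm cases under p_alt.
    return sum(math.exp(binom_logpmf(k, n_cases, p_alt)) for k in range(c + 1))

if __name__ == "__main__":
    for ar in (0.005, 0.015, 0.025):
        print(ar, round(power_rve(200_000, ar, 0.10), 3))
```

Under these assumptions, `power_rve(200_000, 0.015, 0.10)` gives roughly 0.8, in line with the scenario described above; a calculation that also models the randomness in the total number of cases would give slightly different values.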

DISCUSSION
Our analysis over 8 years in Denmark and 9 years in England showed that ~1% of Danish and ~0.5% of English individuals were hospitalized for selected health events that could be associated with influenza every season, with rates varying significantly according to age and the presence of high-risk medical conditions. Among these events, respiratory hospitalizations were the most commonly seen in patients of all ages; the proportion of cardiovascular events increased markedly with age; diabetic exacerbations were exceedingly rare; and influenza as a primary diagnosis was reported in <2% of hospitalizations, a proportion which varied by season, synchronous with recorded epidemics 18. Unsurprisingly, hospitalizations were more common in older adults: respiratory and cardiovascular hospitalizations were ~40-fold and ~100-fold higher in those aged ≥75 years than in those aged 18-34 years.
The presence of high-risk medical conditions was strongly associated with hospitalization, particularly in younger individuals, but even in older adults high-risk conditions were associated with a 2-3-fold elevated rate of hospitalization. Younger adults with cardiovascular or respiratory conditions experienced 10-50 times more hospitalizations than comparable individuals with no comorbidities, underlining the importance of chronic disease management in these vulnerable groups, irrespective of their age. The presence of high-risk conditions has been shown to elevate the risk of severe and hospitalized influenza outcomes; these individuals benefit most from influenza vaccination and could therefore be considered priorities for inclusion in influenza vaccine studies 19,20. Hospitalization rates were up to 20-fold higher in younger influenza vaccine recipients compared with unvaccinated groups of the same age, most likely because vaccination is indicated only for high-risk groups at this age. Across all ages we observed that <50% of high-risk individuals received influenza vaccination annually, as is common in European countries 21, and it is likely that only patients at highest risk, for example those in frequent contact with health services, receive annual influenza vaccination.
Fig. 4 Power to demonstrate relative vaccine efficacy >0% under different rVE, attack rate and sample size assumptions. Power calculated by exact method, based on binomial distribution of cases in investigational group among overall number of cases. Attack rate, representing the frequency of events, is assumed in the control group; type I error 2.5%; 1:1 allocation ratio. Selected attack rates reflect the range of incidence rates of events as estimated in our study (IR 1000/100,000 individuals = attack rate of 1%).
This confounding by indication or health care seeking bias, whereby baseline health condition rather than vaccination status predicts the frequency of healthcare events, has been well-described in the influenza VE literature 22,23. The magnitude of disparity in event rates between vaccine recipients and non-recipients we observed highlights the challenges of confounder adjustment in observational VE/rVE studies, and therefore the need for randomized studies, to reliably measure the performance of influenza vaccines 24.
This study was conducted to improve planning of rVE studies by identifying populations likely to suffer hospitalizations and therefore offer reduced sample sizes. For example, in Denmark the population with high-risk conditions experienced a ~4-fold higher rate of outcomes than those without, corresponding to an improvement in power from ~30% to ~90% with a sample of 50,000 if the true rVE is around 15%. Achieving the same power in the population without high-risk conditions would require >200,000 study participants, would therefore demand significantly greater resources, and may be unfeasible in many settings. Whether or not such a study is feasible would depend on the size of the eligible population within a participating healthcare system, the ability to randomize that population into treatment groups and the frequency of the outcome of interest. Individually randomized trials are more labor-intensive to conduct if a very large sample size is required to receive vaccination during a short period, which is the case for influenza vaccination campaigns that start shortly before the season 17. There are wide variations in influenza season intensity, and studies may need to be conducted over a longer period if conducted in mild seasons.
Targeting populations at highest risk, in whom outcome rates were higher, would therefore improve efficiencies at the risk of reducing generalizability of study results, but because only high-risk and older adults are recommended for influenza vaccination in most countries, limiting inclusion may offer a feasible and relevant population for study 10. Conversely, enrolling a highly comorbid population would result in a high background rate of non-specific events which are not vaccine-preventable, thereby 'diluting' and reducing rVE as endpoints become less specific, particularly in seasons with low influenza circulation where the proportion of attributable events would be low. This dilution effect may explain a recent study in patients with high-risk cardiovascular disease in which a high-dose inactivated influenza vaccine did not significantly reduce all-cause mortality or cardiopulmonary hospitalizations in comparison with a standard dose vaccine 25. To increase specificity we conducted a thorough clinical validation of included codes and included only respiratory and cardiovascular hospitalizations which were considered likely to be associated with influenza based on assessment of previous clinical studies 3,5,8,9. We assumed rVE scenarios of 5-20% based on existing data and identified a number of scenarios where pragmatic RCTs would provide high statistical power with a sample size of <200,000 participants (Fig. 4) 12,13,26,27. However, the influenza-attributable burden of broader secondary outcomes remains incompletely understood and will vary over time: endpoint selection involves a compromise between frequency and specificity which affects rVE, and these assumptions will require refinement as additional evidence arises, including from ongoing RCTs 17,28.
This study was conducted in large databases capturing comprehensive healthcare outcomes with a long history of use for medical research, but databases are not perfect. VCRs are under-estimated because influenza vaccinations delivered at nonmedical settings such as pharmacies or workplaces may not always be captured. Reassuringly, the VCRs we captured from both Denmark and England are similar to those reported in routine national statistics 29,30. Trends were consistent by country, though overall incidence rates were higher in Denmark, probably a result of differences in healthcare investment, health system characteristics, healthcare seeking behavior or clinical thresholds for hospitalization 31. These findings may not be generalizable to other healthcare settings or countries. Our study did not collect individual-level data and so could not describe the effect modification of age on high-risk or vaccination status, or intra-season correlations due to repeated observations of the same participants in multiple seasons. We included slightly different ICD-10 codes than some other researchers, differences which should be considered when interpreting the public health implications of a given rVE value 8,32. Importantly, high-risk conditions in England were based on primary care consultations rather than the hospital contact data used in Denmark, likely explaining the higher prevalence of some conditions, notably asthma and kidney disorders, in England. Focusing on specificity, we captured only the primary/main reason for hospitalization and therefore may underestimate influenza: due to laboratory confirmation and coding practices, the full influenza burden in the US, for example, has been shown to be around 3-fold higher if codes relating to "any" rather than the primary diagnostic position are included 33,34.
In conclusion, we identified groups at high risk of respiratory and cardiovascular events who would represent ideal populations for inclusion in pragmatic influenza vaccine controlled trials. In addition to older individuals, younger adults with high-risk conditions experienced frequent hospitalizations; enrolling this population in rVE studies would increase the probability of detecting true differences between influenza vaccine types and platforms, allowing policymakers to make informed decisions on vaccine recommendations for this priority population group. Such studies appear feasible, particularly if enrollment was limited to individuals aged >50 yrs and/or with high-risk conditions. Pragmatic RCTs such as these would represent a research tool to understand the influenza-attributable proportion of respiratory and non-respiratory diseases and the full public health benefits of influenza vaccines in different population age and risk groups.

METHODS
Study design and population
We conducted a retrospective cohort study in the 2010/11-2017/18 influenza seasons from Denmark and the 2010/11-2018/19 seasons from England using large healthcare databases in each country. Populations aged ≥18 years on December 1st each year were included in seasonal cohorts and the number of hospitalized events occurring between December 1st and May 31st (defined as the influenza season) was divided by these denominators to calculate seasonal incidence rates of various outcomes stratified by age (18-34 years [yrs]; 35-49 yrs; 50-64 yrs; 65-74 yrs; ≥75 yrs), influenza vaccination status and the presence of clinical high risk (hereafter "high risk") conditions. In both countries, influenza vaccination is recommended and provided free of charge for high-risk adults and adults aged ≥65 yrs 35,36 .
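The seasonal rate calculation described above can be sketched in a few lines of Python. The admission dates and cohort size below are invented purely for illustration, and the helper names `in_season` and `rate_per_100k` are hypothetical.

```python
from datetime import date

def in_season(event_date, start_year):
    """True if the event falls between 1 December of start_year and 31 May of the next year."""
    return date(start_year, 12, 1) <= event_date <= date(start_year + 1, 5, 31)

def rate_per_100k(events, population):
    """Seasonal incidence rate per 100,000 individuals in the cohort."""
    return events / population * 100_000

# Invented example: four admissions in a stratum of 250 people, 2017/18 season.
admissions = [date(2017, 11, 30), date(2017, 12, 1), date(2018, 3, 14), date(2018, 6, 1)]
cohort_size = 250

# Only events inside the December-May window count towards the numerator.
n_events = sum(in_season(d, 2017) for d in admissions)
print(n_events, rate_per_100k(n_events, cohort_size))
```

The same function applies within each age, vaccination and high-risk stratum, with the stratum-specific cohort size on December 1st as the denominator.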

Data sources
All Danish citizens are assigned a unique personal identification number which allows for exact linkage of nationwide administrative registers at the individual level. The Danish Civil Registration System, which records date of birth, emigration status and vital status for all persons residing in Denmark, was used to define cohorts 37 . The Danish National Patient Registry (DNPR) has shown high validity of cardiovascular diagnoses and captures all inpatient and outpatient hospital contacts coded in International Classification of Diseases 10 (ICD-10) 22,38 . The DNPR was used to count hospitalized events and to define high risk conditions. Influenza vaccination status was captured from the Danish National General Practitioners Reimbursement registry. Analyses were conducted by Danish researchers with access to raw, de-identified nationwide registry data in accordance with Danish law.
The UK Clinical Practice Research Datalink (CPRD) is a longitudinal and representative primary care database from a network of over 1,800 primary care practices and includes 16 million currently registered active patients [39][40][41][42]. This analysis used data from the CPRD GOLD and CPRD Aurum primary care databases to define vaccination and high-risk status, linked to secondary care data from the Hospital Episode Statistics Admitted Patient Care database to capture rates of hospitalized outcomes for the specified events 43. Influenza vaccinations administered in GP practices or community pharmacies are captured in these electronic health records. Analysis of the CPRD data was conducted internally by CPRD researchers using databases of pseudonymized patient EHRs; individual participant consent was therefore not required. The study protocol was approved by the Independent Scientific Advisory Committee (ISAC) at the Medicines and Healthcare products Regulatory Agency (protocol ref 20_115 R0 A1).

Outcome selection
We pre-specified groups of medical events, most of which were acute, based on previously documented and plausible associations with influenza, and which we considered outcomes of public health relevance for future pragmatic RCTs. ICD-10 coded primary discharge diagnoses (i.e., the main reason for hospitalization) resulting in hospitalization for ≥1 night were categorized into five groups: (1) influenza; (2) influenza and pneumonia; (3) respiratory; (4) cardiovascular; (5) exacerbations of diabetes. Groups were overlapping to explore the impact on incidence rate of including broader or more specific outcomes as potential study endpoints. The first occurrence of each event per season was included. The list of final codes within each category was selected from all "I" (cardiovascular), "J" (respiratory; of which J09-J11 were used to define 'influenza') and "E" (diabetic) ICD codes based on clinical review, available literature and discussion of the pathology and typical usage of those diagnostic codes in medical practice (Supplementary Table 1).
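A minimal Python sketch of overlapping outcome groups and first-occurrence selection is given below. The code ranges are simplifying assumptions, not the clinically reviewed lists of Supplementary Table 1: only J09-J11 for influenza comes from the text, whereas J09-J18 for influenza and pneumonia, E10-E14 for diabetes, and the use of whole "I" and "J" chapters are assumed here for illustration. The function names are hypothetical.

```python
from datetime import date

def outcome_groups(icd10):
    """Return the (overlapping) outcome groups for a primary ICD-10 diagnosis code."""
    code = icd10.strip().upper().replace(".", "")
    if len(code) < 3 or not code[1:3].isdigit():
        return set()
    chapter, block = code[0], int(code[1:3])
    groups = set()
    if chapter == "J":
        groups.add("respiratory")                  # assumption: any J code
        if 9 <= block <= 18:
            groups.add("influenza_and_pneumonia")  # assumption: J09-J18
        if 9 <= block <= 11:
            groups.add("influenza")                # J09-J11, as stated in the text
    elif chapter == "I":
        groups.add("cardiovascular")               # assumption: any I code
    elif chapter == "E" and 10 <= block <= 14:
        groups.add("diabetes")                     # assumption: E10-E14
    return groups

def first_events(admissions):
    """Keep the first admission per person, season and outcome group.

    admissions: iterable of (person_id, season, admission_date, icd10) tuples.
    """
    first = {}
    for person, season, day, code in sorted(admissions, key=lambda a: a[2]):
        for group in outcome_groups(code):
            first.setdefault((person, season, group), day)
    return first

# Invented example: one person with two respiratory admissions in one season.
example = [
    ("p1", "2017/18", date(2018, 2, 1), "J10.1"),
    ("p1", "2017/18", date(2018, 1, 5), "J18.9"),
]
print(first_events(example))
```

Because the groups overlap, a single influenza admission contributes to the influenza, influenza and pneumonia, and respiratory counts, which is what allows broader and narrower endpoints to be compared.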

Definition of high-risk conditions
Clinical high-risk conditions corresponding to eligibility for free annual influenza vaccination were modified from definitions used by the UK National Health Service and Danish Statens Serum Institut. They included cardiovascular disorders (including arrhythmias, congestive heart failure, ischemic heart disease and congenital heart disease), respiratory conditions (including asthma), hepatic and renal disorders, neurologic/neuromuscular disorders, blood disorders, metabolic/endocrine conditions including diabetes and conditions compromising the immune system 35,36. For each condition, a list of ICD-10 codes or, for diabetes only, prescription medications representing these diagnoses was defined (Supplementary Table 2). In the UK, primary care events coded with the SNOMED-CT architecture were mapped to these ICD-10 codes following review by a medical doctor (linked codes in Supplementary Data 1). Individuals diagnosed with qualifying events within the DNPR or CPRD primary care database within 3 years of the start of each influenza season, or with a diabetes prescription <6 months before the start of each season, were included within that high-risk group for that season. Individuals receiving an influenza vaccination between August 1st and January 31st of the following year were considered vaccinated for that season.
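The look-back rules above can be expressed as a short Python sketch. The season start is taken as December 1st, as in the study design; the exact handling of boundary dates (for example whether a diagnosis on the season start date counts) is an assumption, and the function names are hypothetical.

```python
from datetime import date

def is_high_risk(diagnosis_dates, diabetes_rx_dates, start_year):
    """High-risk status for the season starting on 1 December of start_year."""
    season_start = date(start_year, 12, 1)
    dx_window_start = date(start_year - 3, 12, 1)  # 3-year diagnosis look-back
    rx_window_start = date(start_year, 6, 1)       # 6 months before season start
    # Assumption: diagnoses count from the window start up to the day before season start.
    has_dx = any(dx_window_start <= d < season_start for d in diagnosis_dates)
    # Assumption: "<6 months" is read as strictly after 1 June.
    has_rx = any(rx_window_start < d < season_start for d in diabetes_rx_dates)
    return has_dx or has_rx

def is_vaccinated(vaccination_dates, start_year):
    """Vaccinated for the season if a dose falls between 1 August and 31 January."""
    window_start = date(start_year, 8, 1)
    window_end = date(start_year + 1, 1, 31)
    return any(window_start <= d <= window_end for d in vaccination_dates)

# Invented example for the 2017/18 season.
print(is_high_risk([date(2015, 1, 10)], [], 2017))
print(is_vaccinated([date(2017, 10, 15)], 2017))
```

Status is re-evaluated for every season, so an individual can move in or out of the high-risk or vaccinated strata from one seasonal cohort to the next.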

Statistical methods
The total number of incident outcome events experienced by the study population was summed for each season. Incidence rates of included outcomes, expressed as rates per 100,000 population, were calculated per season for populations overall and stratified by age group, high-risk condition, and influenza vaccination status. Average seasonal incidence rates over included seasons and their 95% confidence intervals (CIs) were estimated using a Poisson model with the number of events as the dependent variable, no independent variables, and the log of the population size as an offset, with Stata's 'glm' command. In this parameterization, the exponential of the intercept is the incidence rate. The variance was adjusted by a scale factor equal to the deviance divided by the residual degrees of freedom to account for under/overdispersion in the underlying data 46,47. Incidence rate ratios (IRRs) and their 95% CIs comparing rates in: a) vaccinated vs unvaccinated and b) individuals with high-risk conditions vs those with no high-risk conditions, were similarly estimated with a Poisson model. A range of identified incidence rates were used to estimate the power of an rVE study by exact method, specifically coded in SAS, based on binomial distribution of cases in investigational groups among overall number of cases, a type I error of 2.5%, 1:1 allocation ratio and a maximum of 200,000 participants (100k per group). We assumed rVE ranging from 5-20% and expressed the result as a series of heatmaps. Analyses were conducted separately within the Danish and UK databases; subsequent manipulations were performed using Stata v 15.1 and SAS.
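For an intercept-only Poisson model with a log-population offset, the maximum likelihood rate has a closed form (total events divided by total population), so the estimate can be sketched without a model-fitting library. The Python sketch below is an illustration under stated assumptions rather than the Stata code used in the study: it uses a normal (z = 1.96) interval on the log scale, and the seasonal counts are invented. The function name `average_seasonal_rate` is hypothetical.

```python
import math

def average_seasonal_rate(events, populations, z=1.96):
    """Average rate per 100,000 with a dispersion-scaled 95% CI.

    events, populations: per-season event counts and cohort sizes.
    Returns (rate, lower, upper, scale).
    """
    total_events = sum(events)
    total_pop = sum(populations)
    rate = total_events / total_pop  # exp(intercept) of the offset model

    # Poisson deviance of the intercept-only fit.
    deviance = 0.0
    for y, n in zip(events, populations):
        mu = rate * n
        term = y * math.log(y / mu) if y > 0 else 0.0
        deviance += 2.0 * (term - (y - mu))
    deviance = max(deviance, 0.0)  # guard against tiny negative rounding error

    # Scale factor: deviance divided by residual degrees of freedom.
    scale = deviance / (len(events) - 1)

    # Model-based variance of the log-rate is 1 / total_events; inflate it by the scale.
    se_log = math.sqrt(scale / total_events)
    lower = rate * math.exp(-z * se_log)
    upper = rate * math.exp(z * se_log)
    return rate * 1e5, lower * 1e5, upper * 1e5, scale

# Invented example: two seasons with equal cohort sizes but different event counts.
print(average_seasonal_rate([900, 1100], [100_000, 100_000]))
```

A scale factor well above 1, as in this invented example, indicates more season-to-season variation than a Poisson model expects and widens the interval accordingly.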

Reporting summary
Further information on research design is available in the Nature Research Reporting Summary linked to this article.

DATA AVAILABILITY
These data were obtained from national EHR sources which are subject to local laws and regulations. UK data were provided by Clinical Practice Research Datalink (CPRD) under a licence from the UK Medicines and Healthcare products Regulatory Agency. CPRD data can be obtained by researchers following a successful application to CPRD. All Danish data are governed by the Danish Data Protection Agency and can only be made available to any additional researchers if a formal request is filed with the Danish Authorities.