Main

COVID-19 is a viral illness caused by the coronavirus SARS-CoV-2. The acute clinical manifestations of COVID-19 have been well characterized and involve both pulmonary and extrapulmonary systemic manifestations1,2. Emerging reports suggest that—for some individuals—the symptoms of COVID-19 persist beyond the acute setting. However, the post-acute sequelae of COVID-19 are not yet clear.

Here we leveraged the breadth and depth of the US Department of Veterans Affairs electronic health databases to undertake a high-dimensional approach to comprehensively identify the 6-month outcomes of incident diagnoses (from 379 diagnostic categories), incident medication use (from 380 medication classes) and incident laboratory abnormalities (from 62 laboratory tests) in people who survived for at least the first 30 days after their COVID-19 diagnosis.

Non-hospitalized patients with COVID-19

The cohort included 73,435 users of the Veterans Health Administration (VHA) with COVID-19 who survived for at least the first 30 days after their COVID-19 diagnosis and who were not hospitalized, and 4,990,835 VHA users who did not have COVID-19 and were not hospitalized (Supplementary Fig. 1a, b). The median follow-ups were 126 (81–203; for all reported median values, parenthetical ranges refer to the interquartile range) and 130 (82–205) days for patients with COVID-19 and VHA users, respectively (Extended Data Table 1a). We examined a panel of negative-outcome controls, which yielded results that were consistent with our a priori expectations (for example, hazard ratios of 1.03 (0.94–1.12; for all hazard ratios and burdens, parenthetical ranges refer to 95% confidence intervals) and 1.03 (0.95–1.12) for neoplasms and accidental injuries, respectively); the results of all the negative-outcome controls are provided in Extended Data Table 2a. Our examination of the standardized differences of all high-dimensional variables across all outcome-specific cohorts (including those that were selected and those that were not selected in the models) showed that more than 99.99% of standardized differences were <0.1 after adjustment (Supplementary Fig. 2a, b), which resulted in similar distributions of baseline characteristics in each group after adjustment (Supplementary Table 1).
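The balance check above summarizes each covariate with a standardized difference and reports the share that falls below 0.1. Below is a minimal sketch that assumes the common definition: the difference in weighted means divided by the square root of the average of the two weighted variances. It uses generic per-person weights. The study's own weighting method is described in its Methods and is not reproduced here, and the function names are hypothetical.

```python
import math
from typing import Sequence


def weighted_mean_var(x: Sequence[float], w: Sequence[float]) -> tuple:
    """Weighted mean and weighted (population-style) variance of x."""
    total = sum(w)
    mean = sum(wi * xi for xi, wi in zip(x, w)) / total
    var = sum(wi * (xi - mean) ** 2 for xi, wi in zip(x, w)) / total
    return mean, var


def standardized_difference(x1, w1, x0, w0) -> float:
    """Absolute standardized mean difference between two weighted groups."""
    m1, v1 = weighted_mean_var(x1, w1)
    m0, v0 = weighted_mean_var(x0, w0)
    pooled_sd = math.sqrt((v1 + v0) / 2)
    if pooled_sd == 0:
        return 0.0  # covariate is constant in both groups
    return abs(m1 - m0) / pooled_sd


def share_balanced(diffs: Sequence[float], threshold: float = 0.1) -> float:
    """Fraction of standardized differences below the balance threshold."""
    return sum(d < threshold for d in diffs) / len(diffs)
```

For a binary covariate, the weighted variance reduces to p(1 − p), so the same function covers both binary and continuous variables. Reweighting the control group can bring the difference to zero, which is the behaviour the adjustment is meant to achieve.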

Beyond the first 30 days of illness, individuals with COVID-19 had an increased risk of death (hazard ratio of 1.59 (1.46–1.73)). We also estimated the adjusted excess burden of death due to COVID-19 per 1,000 persons at 6 months on the basis of the difference between the estimated incidence rate in individuals with COVID-19 and all VHA users. The excess death was estimated at 8.39 (7.09–9.58) per 1,000 patients with COVID-19 at 6 months. Individuals with COVID-19 had a higher risk of requiring outpatient care (hazard ratio of 1.20 (1.19–1.21)), at an excess burden of 33.22 (30.89–35.58; all excess burdens are given per 1,000 patients with COVID-19 at 6 months) and at a greater frequency of 0.47 (0.44–0.49) additional encounters every 30 days (Extended Data Table 2b, c).
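The excess burden is the difference between two estimated incidences, scaled to 1,000 people. The sketch below only illustrates that arithmetic. It assumes unweighted Kaplan–Meier estimates and a 180-day horizon as a stand-in for "6 months". The study's estimates are covariate-adjusted, and the function names are hypothetical.

```python
from typing import Iterable, Tuple


def km_cumulative_incidence(data: Iterable[Tuple[float, bool]], horizon: float) -> float:
    """1 - Kaplan-Meier survival at `horizon` from (time, event) pairs.

    Censored records (event=False) leave the risk set without contributing an event.
    """
    records = sorted(data)
    n_at_risk = len(records)
    survival = 1.0
    i = 0
    while i < len(records) and records[i][0] <= horizon:
        t = records[i][0]
        events = removed = 0
        while i < len(records) and records[i][0] == t:  # group tied times
            events += records[i][1]
            removed += 1
            i += 1
        survival *= 1 - events / n_at_risk
        n_at_risk -= removed
    return 1 - survival


def excess_burden_per_1000(exposed, control, horizon: float = 180.0) -> float:
    """Difference in cumulative incidence at `horizon`, per 1,000 people."""
    return 1000 * (km_cumulative_incidence(exposed, horizon)
                   - km_cumulative_incidence(control, horizon))
```

In a toy example, an exposed group with cumulative incidence 0.625 and a control group with 0.25 yields an excess burden of 375 per 1,000.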

We evaluated the risk of incident occurrence of 379 diagnoses (categorized from ICD-10 codes using the Clinical Classifications Software Refined), 380 classes of medication and 62 laboratory tests beyond the first 30 days. For each of the outcomes we examined, we built a cohort of individuals who were free of the related outcome at baseline to identify the risk of incident outcome during follow-up. We found that several conditions in almost every organ system exhibited an adjusted hazard ratio that was greater than 1 and a P value lower than 6.57 × 10⁻⁵ (significance level adjusted for multiple comparisons). The adjusted hazard ratios and burdens for all outcomes are presented in Fig. 1a–c and Supplementary Tables 2–4. The results for outcomes that were positively associated with COVID-19 are presented in Fig. 2a–c, Extended Data Fig. 1a–c and Supplementary Table 5, and are discussed here.
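Each outcome therefore gets its own cohort, restricted to people without that outcome at baseline. The following is a minimal sketch of that step. The record layout is hypothetical. The sketch also assumes that an event counts if the outcome is recorded on or after day 30, and that acute-phase records before day 30 neither count nor exclude; the text does not specify this.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class Participant:
    # Hypothetical record layout for illustration only.
    pid: str
    baseline_outcomes: Set[str]  # outcomes recorded before cohort enrolment
    followup_days: Dict[str, List[int]] = field(default_factory=dict)  # outcome -> days since enrolment


def incident_cohort(participants: List[Participant], outcome: str,
                    start_day: int = 30) -> List[Tuple[str, bool]]:
    """Outcome-specific cohort with an incident-event flag per participant."""
    cohort = []
    for p in participants:
        if outcome in p.baseline_outcomes:
            continue  # not free of the outcome at baseline, so excluded
        days = p.followup_days.get(outcome, [])
        cohort.append((p.pid, any(d >= start_day for d in days)))
    return cohort
```

Repeating this for every diagnosis category, medication class and laboratory test gives one cohort per outcome, which is why the balance check above is run across all outcome-specific cohorts.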

Fig. 1: High-dimensional identification of the incident post-acute sequelae of COVID-19.

a–c, Incident diagnoses (a), incident medication use (b) and incident laboratory abnormalities (c). All VHA users served as the referent category. Post-acute sequelae were ascertained from 30 days after infection until end of follow-up. Beginning from the outside ring, the first ring represents hazard ratios for the post-acute sequelae of COVID-19; a higher bar indicates a larger hazard ratio. Hazard ratios with a point estimate larger than 1 that were statistically significant are shown in yellow. The second ring represents the excess burden per 1,000 patients with COVID-19 at 6 months. The colour of the cell indicates the value of the excess burden (deeper shades of red indicate a higher excess burden and deeper shades of blue indicate a greater reduction in burden). The third ring represents the baseline incidence rate in the control group (deeper shades of red indicate a higher incidence rate). The fourth ring represents the negative log of the P value; a higher bar indicates a smaller P value and yellow indicates that the value is statistically significant.
ACR, albumin/creatinine ratio; AD, antidotes; AH, antihistamine drugs; Alb, albumin; ALP, alkaline phosphatase; ALT, alanine aminotransferase; AN, antineoplastic agents; AP, antiparasitic agents; AST, aspartate aminotransferase; AU, autonomic; BNP, brain natriuretic peptide; BUN, blood urea nitrogen; CD4, CD4 cell count; CD4/8, CD4/CD8 ratio; Cl, chloride; Cr, creatinine; CRP, C-reactive protein; dBIL, direct bilirubin; Derm, dermatological; DG, diagnostic; DT, dental; GFR, glomerular filtration rate; GT, genitourinary; HbA1c, haemoglobin A1c; HCT, haematocrit; HDL, high-density-lipoprotein cholesterol; Hgb, haemoglobin; hsCRP, high-sensitivity C-reactive protein; ID, irrigation or dialysis; IM, immunological; INR, international normalized ratio; IP, intrapleural; K, potassium; LDL, low-density-lipoprotein cholesterol; MS, musculoskeletal; pBNP, pro-B natriuretic peptide; Plt, platelet; Protein, total protein; PT, prothrombin time; PTT, partial thromboplastin time; RT, rectal; TBIL, total bilirubin; TC, total cholesterol; TG, triglycerides; TnI, troponin I; TnT, troponin T; WBC, white blood cell.

Fig. 2: Burden of post-acute sequelae of COVID-19.

a–c, Incident diagnoses (a), incident medication use (b) and incident laboratory abnormalities (c). All VHA users served as the referent category. Post-acute sequelae were ascertained from 30 days after infection until end of follow-up. Sequelae were selected on the basis of having a hazard ratio of more than 1 and a P value of less than 6.57 × 10⁻⁵. Excess burdens per 1,000 patients with COVID-19 at 6 months are presented with 95% confidence intervals in parentheses. Outcomes are ranked within each domain on the basis of the excess burden, from high to low. Diagnoses are coloured on the basis of the diagnosis group, medications are coloured on the basis of their class and laboratory abnormalities are coloured on the basis of their being higher or lower than the normal range. F, female; M, male; NSAIDs, non-steroidal anti-inflammatory drugs.
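The selection and ranking rule in this caption can be expressed as a short filter. The sketch below assumes each outcome's estimates are already available as a record and ranks a single domain. The record type, function names and input values are hypothetical, and the values are not study results.

```python
from typing import List, NamedTuple


class OutcomeResult(NamedTuple):
    name: str
    hazard_ratio: float
    p_value: float
    excess_burden: float  # per 1,000 patients with COVID-19 at 6 months


P_THRESHOLD = 6.57e-5  # multiplicity-adjusted significance level stated in the text


def select_sequelae(results: List[OutcomeResult],
                    p_threshold: float = P_THRESHOLD) -> List[OutcomeResult]:
    """Keep outcomes with HR > 1 and P < threshold, ranked by excess burden (high to low)."""
    kept = [r for r in results if r.hazard_ratio > 1 and r.p_value < p_threshold]
    return sorted(kept, key=lambda r: r.excess_burden, reverse=True)


# Hypothetical inputs, not study results.
demo = [
    OutcomeResult("outcome_a", 1.30, 1e-6, 5.0),
    OutcomeResult("outcome_b", 1.20, 1e-3, 9.0),   # fails the P threshold
    OutcomeResult("outcome_c", 0.90, 1e-8, -2.0),  # HR not above 1
    OutcomeResult("outcome_d", 1.50, 1e-9, 12.0),
]
print([r.name for r in select_sequelae(demo)])  # ['outcome_d', 'outcome_a']
```

To reproduce the per-domain ranking, call the function separately on the diagnosis, medication and laboratory lists.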

Respiratory conditions

The largest excess burden at 6 months after a COVID-19 infection that did not result in hospitalization in the first 30 days was that of respiratory conditions, which included respiratory signs and symptoms (excess burden of 28.51 (26.40–30.50)), respiratory failure, insufficiency and arrest (3.37 (2.71–3.92)), and lower respiratory disease (4.67 (3.96–5.28)). There was also evidence of a high burden of incident use of bronchodilators (22.23 (20.68–23.67)), antitussive and expectorant agents (12.83 (11.61–13.95)), anti-asthmatic agents (8.87 (7.65–9.97)) and glucocorticoids (7.65 (5.67–9.50)).

Diseases of the nervous system

An excess burden of nervous system conditions was also evident, and included nervous system signs and symptoms (14.32 (12.16–16.36)), neurocognitive disorders (3.17 (2.24–3.98)), nervous system disorders (4.85 (3.65–5.93)) and headache (4.10 (2.49–5.58)).

Mental health burden

Our results also showed an excess burden of sleep–wake disorders (14.53 (11.53–17.36)), anxiety and fear-related disorders (5.42 (3.42–7.29)) and trauma- and stress-related disorders (8.93 (6.62–11.09)). These findings were coupled with evidence of an excess burden of incident use of non-opioid (19.97 (17.41–22.40)) and opioid (9.39 (7.21–11.43)) analgesic drugs, antidepressant agents (7.83 (5.19–10.30)), and benzodiazepine, sedative and anxiolytic agents (22.23 (20.68–23.67)).

Metabolic disorders

An excess burden of several metabolic disorders was evident, including disorders of lipid metabolism (12.32 (8.18–16.24)), diabetes mellitus (8.23 (6.36–9.95)) and obesity (9.53 (7.55–11.37)). There was also evidence of an excess burden of incident use of antilipaemic agents (11.56 (8.73–14.19)), oral hypoglycaemic drugs (5.39 (3.99–6.64)) and insulin (4.95 (3.87–5.90)), as well as an excess burden of elevated low-density lipoprotein cholesterol (9.48 (7.02–11.81)), total cholesterol (9.94 (6.61–13.11)), triglycerides (9.40 (6.63–12.03)) and haemoglobin A1c (10.66 (6.77–14.35)).

Poor general wellbeing

Individuals with COVID-19 exhibited an excess burden of poor general wellbeing, including malaise and fatigue (12.64 (11.24–13.93)), muscle disorders (5.73 (4.60–6.74)), musculoskeletal pain (13.89 (9.89–17.71)) and anaemia (4.79 (3.53–5.93)). These diagnoses were coupled with laboratory evidence of an excess burden of anaemia, comprising decreased haemoglobin (31.03 (28.16–33.76)) and decreased haematocrit (30.73 (27.64–33.67)), and of low serum albumin (6.44 (4.84–7.92)).

Cardiovascular conditions

There was an excess burden of cardiovascular conditions, including hypertension (15.18 (11.53–18.62)), cardiac dysrhythmias (8.41 (7.18–9.53)), circulatory signs and symptoms (6.65 (5.18–8.01)), chest pain (10.08 (8.63–11.42)), coronary atherosclerosis (4.38 (2.96–5.67)) and heart failure (3.94 (2.97–4.80)). There was also evidence of excess burden of incident use of beta blockers (9.74 (8.06–11.27)), calcium channel blockers (7.18 (5.61–8.61)), loop diuretic agents (4.72 (3.59–5.72)), thiazide diuretic agents (2.52 (1.37–3.54)), and anti-arrhythmic drugs (1.28 (0.79–1.67)).

Gastrointestinal system

There was evidence of an excess burden of the following conditions: oesophageal disorders (6.90 (4.58–9.07)), gastrointestinal disorders (3.58 (2.15–4.88)), dysphagia (2.83 (1.79–3.76)) and abdominal pain (5.73 (3.7–7.62)). These conditions were coupled with evidence for an increased use of laxatives (9.22 (6.99–11.31)), anti-emetic agents (9.22 (6.99–11.31)), histamine antagonists (4.83 (3.63–5.91)), other antacids (1.07 (0.62–1.42)) and antidiarrhoeal agents (2.87 (1.70–3.91)). Laboratory abnormalities included an increased risk of incident high levels of alanine aminotransferase (7.62 (5.20–9.90)).

Other sequelae

There was also evidence of an excess burden of incident acute pulmonary embolism (2.63 (2.25–2.92)) and of incident use of anticoagulant drugs (16.43 (14.85–17.89)). Other conditions included an excess burden of skin disorders (7.52 (5.17–9.73)), arthralgia and arthritis (5.16 (3.18–7.01)) and infections, including urinary tract infections (2.99 (1.94–3.93)) (Fig. 2a–c, Supplementary Tables 2–5).

COVID-19 requiring hospitalization versus influenza

To gain a better understanding of the spectrum of clinical manifestations in patients with COVID-19 who were hospitalized, we undertook a comparative evaluation of a cohort of hospitalized individuals with COVID-19 versus individuals who were hospitalized with seasonal influenza (a well-known and well-characterized respiratory viral illness).

This cohort included 13,654 people with COVID-19 and 13,997 people with influenza who survived for at least 30 days after hospital admission (Supplementary Fig. 3a, b). The median follow-ups were 150 (84–217) and 157 (87–220) days for patients with COVID-19 and influenza, respectively (Extended Data Table 1a). We tested a panel of negative-outcome controls, which yielded results that were consistent with our a priori expectations (for example, hazard ratios of 0.98 (0.83–1.16) and 1.02 (0.90–1.15) for neoplasms and accidental injuries, respectively); the results of all the negative-outcome controls are provided in Extended Data Table 2a. Our examination of standardized differences of all high-dimensional variables (including those that were selected and those that were not selected in the models) in all outcome-specific cohorts showed that more than 99.75% of standardized differences were <0.1 after adjustment (Supplementary Fig. 4a, b), which resulted in similar distributions of baseline characteristics in each group after adjustment (Supplementary Table 6).

Beyond the first 30 days of illness, individuals with COVID-19 who had been hospitalized for this disease had an increased risk of death (hazard ratio of 1.51 (1.30–1.76)); we estimated excess death at 28.79 (19.52–36.85) per 1,000 persons at 6 months. Individuals with COVID-19 exhibited a higher risk of requiring outpatient care (hazard ratio of 1.12 (1.08–1.17)), at an excess burden of 6.37 (4.01–9.03) and with greater frequency of 1.45 (1.28–1.63) additional encounters every 30 days (Extended Data Table 2b, c).

Compared to individuals who were hospitalized with seasonal influenza (and beyond the first 30 days of illness), patients who had been hospitalized for COVID-19 had a higher burden of a broad array of pulmonary and extrapulmonary systemic manifestations, including neurological disorders (burdens of 19.78 (12.58–26.19) and 16.16 (10.40–21.19) for nervous system disorders and neurocognitive disorders, respectively), mental health disorders (for example, a burden of 7.75 (4.72–10.10) for mental and substance-use conditions), metabolic disorders (for example, a burden of 43.53 (28.71–57.08) for disorders of lipid metabolism), cardiovascular disorders (for example, a burden of 17.92 (10.73–24.35) for circulatory signs and symptoms), gastrointestinal disorders (for example, a burden of 19.28 (12.75–25.13) for dysphagia), coagulation disorders (14.31 (10.08–17.89)), pulmonary embolism (18.31 (15.83–20.25)) and other disorders including malaise and fatigue (36.49 (28.13–44.15)) and anaemia (19.08 (10.58–26.81)) (Extended Data Figs. 2a–f, 3a–c, Supplementary Tables 7–10). Analyses of risk and the burden of clinical manifestations that additionally adjusted for the severity of the acute infection yielded consistent results in both the direction and magnitude of estimates (Extended Data Figs. 4a–f, 5a–c, Supplementary Tables 11–14). Our high-dimensional comparative evaluation of six-month outcomes in a cohort of hospitalized individuals with COVID-19 (n = 13,654) versus individuals who were hospitalized for other causes (n = 901,516) yielded consistent results (Extended Data Figs. 6a–f, 7a–c, Supplementary Tables 15–18).

Analysing risk of prespecified COVID-19 outcomes

To complement our high-dimensional approach and to gain a deeper understanding of the clinical manifestations of post-acute COVID-19 across the severity of the initial acute disease, we evaluated the risks of a panel of prespecified outcomes across the care setting of the acute phase of the disease (using whether individuals were non-hospitalized, hospitalized or admitted to intensive care as a proxy indicator of disease severity) and benchmarked risk in these populations to a common reference group (the broader population of the Veterans Affairs Health Care System (n = 4,990,835)) (Extended Data Table 1b). Our assessment of standardized differences across the four groups showed that none of these differences was greater than 0.1 after adjustment (Supplementary Fig. 5). Our results reveal (1) an increased risk of a broad array of specific clinical manifestations that include acute coronary disease, arrhythmias, acute kidney injury, chronic kidney disease, memory problems and thromboembolic disease (Fig. 3, Supplementary Tables 19, 20); (2) that this risk was evident even in individuals who were not hospitalized with COVID-19; and (3) a risk gradient that increased across the care setting of the acute COVID-19 infection, from non-hospitalized individuals to those who were hospitalized, with the highest risk in patients who were admitted to intensive care (Fig. 3, Supplementary Tables 19, 20).

Fig. 3: Risks and burdens of incident prespecified high-resolution post-acute COVID-19 outcomes.

Risks and burdens were assessed at 6 months in mutually exclusive cohorts comprising non-hospitalized individuals with COVID-19, people who were hospitalized for COVID-19 and people who were admitted to intensive care for COVID-19 during the acute phase (first 30 days) of the infection. All VHA users served as the referent category. Outcomes were ascertained from day 30 after COVID-19 diagnosis until the end of follow-up. Adjusted hazard ratios and excess burdens are presented; error bars represent the 95% confidence interval. GERD, gastro-oesophageal reflux disease; ICU, intensive care unit.

To gain a better understanding of whether these post-acute, prespecified outcomes are unique to COVID-19 or whether they represent a general post-viral syndrome, we further conducted comparative analyses (which were adjusted as specified in Methods, including adjusting for the severity of the acute infection) of the prespecified outcomes among people who were hospitalized with COVID-19 or seasonal influenza (Extended Data Table 1a, Supplementary Table 6). Our results show an increased risk and excess burden of a broad array of symptoms as well as multiple organ involvement among people with COVID-19 (Extended Data Fig. 8, Supplementary Table 21).

Negative-exposure controls

In addition to testing negative-outcome controls (Extended Data Table 2a) and to further test the robustness of our approach, we developed and tested a pair of negative-exposure controls. We posited that exposure to influenza vaccination in odd- and even-numbered months between 1 October 2017 and 30 September 2019 should be associated with similar risks of clinical outcomes. We therefore tested associations between exposure to influenza vaccine in even- (n = 762,039) versus odd- (n = 599,981) numbered months and the full complement of 821 high-dimensional clinical outcomes considered in this study (including all diagnoses, medications and laboratory test results). We used the same data sources, cohort-building algorithm, variable definitions, analytical approach (including weighting method) and outcome specification, as well as a similar length of follow-up and interpretation method. Our results showed that none of the associations met the threshold of significance (P < 6.57 × 10⁻⁵) considered in this study (Supplementary Fig. 6, Supplementary Tables 22–24).
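The negative-exposure groups are defined only by when the vaccine was given. Here is a minimal sketch of that labelling. It assumes "odd- and even-numbered months" means the calendar month number and that each person has a single vaccination date; the text does not say how repeat vaccinations were handled. The function name is hypothetical.

```python
from datetime import date
from typing import Optional

# Window for the negative-exposure control, as stated in the text.
WINDOW_START = date(2017, 10, 1)
WINDOW_END = date(2019, 9, 30)


def vaccine_exposure_group(vaccination_date: date) -> Optional[str]:
    """Label a vaccination 'even' or 'odd' by calendar month; None outside the window."""
    if not (WINDOW_START <= vaccination_date <= WINDOW_END):
        return None
    return "even" if vaccination_date.month % 2 == 0 else "odd"
```

The arbitrary month parity is deliberate. Neither label should carry any causal information about later outcomes, so any significant association would point to bias in the shared pipeline.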

Discussion

Here we use a high-dimensional approach to identify the spectrum of clinical abnormalities (incident diagnoses, incident medication use and incident laboratory abnormalities) experienced by individuals with COVID-19 who survive beyond the first 30 days of illness. The results suggest that, beyond the first 30 days of illness, people with COVID-19 are at higher risk of death and are more likely to use healthcare resources, and exhibit a broad array of incident pulmonary and extrapulmonary clinical manifestations (including nervous system and neurocognitive disorders, mental health disorders, metabolic disorders, cardiovascular disorders and gastrointestinal disorders) as well as signs and symptoms related to poor general wellbeing (including malaise, fatigue, musculoskeletal pain and anaemia). We observed an increased risk of the incident use of several classes of medication, including pain medications (opioid and non-opioid), antidepressant, anxiolytic, antihypertensive, antihyperlipidaemic and oral hypoglycaemic drugs and insulin. Our analyses of prespecified outcomes complement the high-dimensional approach to identify specific post-acute sequelae with greater diagnostic resolution and reveal two key findings: (1) that the risk and associated burden of post-acute sequelae are evident even among individuals whose acute disease was not severe enough to require hospitalization (representing the majority of people with COVID-19) and (2) that the risk and associated burden increase across the severity spectrum of the acute COVID-19 infection (from non-hospitalized to hospitalized individuals, to those admitted to intensive care).
Our comparative approach to examining post-acute sequelae in individuals who are hospitalized with COVID-19 versus individuals with seasonal influenza (using a high-dimensional approach and through examination of prespecified outcomes) suggests that there is a substantially higher burden of a broad array of post-acute sequelae in the individuals who are hospitalized with COVID-19, which provides features that differentiate post-acute COVID-19 (both in the magnitude of risk and the breadth of organ involvement) from a post-influenza viral syndrome. Our results show that individuals who survive for 30 days or more after their COVID-19 diagnosis exhibit an increased risk of death and are more likely to use health resources, as well as a substantial burden of health loss that spans the pulmonary and several extrapulmonary organ systems; this highlights the need for holistic and integrated multidisciplinary long-term care of patients with COVID-19.

The mechanism or mechanisms that underlie the post-acute manifestations of COVID-19 are not entirely clear. Some of the manifestations may be driven by a direct effect of the viral infection, and may be explained by virus persisting in immune-privileged sites, an aberrant immune response, hyperactivation of the immune system or autoimmunity3. Indirect effects—including changes in social (for example, reduced social contact and loneliness), economic (for example, loss of employment) and behavioural conditions (for example, changes in diet and exercise)—that may be differentially experienced by people with COVID-19 may also shape health outcomes, and may be drivers of some of the post-acute clinical manifestations4,5,6,7,8. A better delineation of the direct and indirect effects, and a deeper understanding of the underlying biological mechanisms and epidemiological drivers, of the multifaceted long-term consequences of COVID-19 is needed9.

To our knowledge, this is the largest study of the post-acute sequelae of COVID-19; it involves 73,435 non-hospitalized patients with COVID-19, and 4,990,835 control individuals (corresponding to 2,070,615.52 person years of follow-up), as well as 13,654 hospitalized patients with COVID-19 and 13,997 patients hospitalized with seasonal influenza (corresponding to 12,179.05 person years of follow-up). We leveraged the breadth and depth of the national healthcare databases of the US Department of Veterans Affairs (the largest nationally integrated healthcare delivery system in the US) to undertake a comprehensive high-dimensional comparative approach (relative to control groups) to identify the 6-month health outcomes and clinical manifestations in patients who survived the first 30 days of COVID-19. We further examined risk in a prespecified set of outcomes with higher diagnostic resolution across care settings to enable a deeper understanding of the clinical symptomatology and diagnoses of post-acute COVID-19 across the spectrum of severity of the acute phase of the infection.

This study has several limitations. Although our approach identifies the incident post-acute sequelae in patients with COVID-19, it does not delineate which sequelae may be direct or indirect consequences of COVID-19 infection. Because of the predominantly male composition of the Veterans Affairs population, our findings may not identify clinical features of post-acute COVID-19 that may be much more pronounced in women, or non-expressed or very rare in men. Our approach demonstrated balance for more than 1,150 variables across several data domains (diagnoses, medications and laboratory data) and yielded successful testing of negative-exposure and -outcome controls, but we cannot completely rule out residual confounding effects. Finally, as the global pandemic of COVID-19 continues to evolve, as treatment strategies improve, as new variants of the virus emerge and as vaccine availability increases, it is likely that the epidemiology and short- and long-term outcomes of COVID-19 will also change over time.

Our findings show that, beyond the first 30 days of illness, a substantial burden of health loss that spans pulmonary and several extrapulmonary organ systems is experienced by individuals who survived the acute phase of COVID-19. Our results will inform global discussions on the post-acute manifestations of COVID-19, as well as health system planning and the development of care strategies that are aimed at reducing chronic and permanent health loss and optimizing wellness among patients with COVID-19.

Methods

All eligible participants were enrolled in the study; no statistical methods were used to predetermine sample size. The experiments were not randomized, and investigators were not blinded to allocation during experiments and outcome assessment.

Setting

Cohort participants were selected from US Department of Veterans Affairs (VA) electronic healthcare databases. The VHA provides healthcare to discharged veterans of the US armed forces and operates the largest nationally integrated healthcare system in the USA, with 1,255 healthcare facilities (including 170 VA Medical Centers and 1,074 outpatient sites) located across the USA. Veterans who are enrolled with the VHA have access to the comprehensive medical benefits package of the VA (which includes inpatient hospital care, outpatient services, preventive, primary and speciality care, prescriptions, mental healthcare, home healthcare, geriatric and extended care, medical equipment, and prosthetics). The VA electronic healthcare databases are updated daily.

Cohort

The cohort was constructed from 5,808,018 participants who had encountered the VHA between 1 January 2019 and 31 December 2019. Of those who were alive on 1 March 2020 (n = 5,606,309), a COVID-19 group was selected as individuals who had a positive test for COVID-19 between 1 March 2020 and 30 November 2020 (n = 98,661). Participants without hospitalization within the first 30 days of their first positive test were further selected (n = 76,877). To examine post-acute outcomes, we then selected from the COVID-19 group those alive on the 30th day after their positive test (participants with COVID-19, n = 73,435). To generate a comparison group that had a similar distribution of length of follow-up, we then matched each participant with COVID-19, without replacement, to 70 VHA users who did not have a positive test for COVID-19. In matching, the cohort enrolment date of each of the 70 corresponding VHA users was set to the cohort enrolment date of the participant with COVID-19, that is, the date of testing positive (control group n = 5,140,450). In the VHA user group, we similarly selected individuals who were not hospitalized and who were alive during the first 30 days after the date of enrolment (control group n = 4,990,835) (Supplementary Fig. 1a, b). Participants were followed until 31 January 2021.
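The matching step draws 70 controls per case without replacement, and each control inherits its case's positive-test date as its enrolment date. The arithmetic is consistent with the counts above: 73,435 × 70 = 5,140,450. Below is a minimal sketch that assumes simple random sampling from a pool of eligible VHA users. The function name is hypothetical, and the later restriction to controls who were alive and not hospitalized in their first 30 days is not shown.

```python
import random
from datetime import date
from typing import List, Sequence, Tuple


def match_controls(
    cases: Sequence[Tuple[str, date]],
    control_pool: Sequence[str],
    ratio: int = 70,
    seed: int = 0,
) -> List[Tuple[str, str, date]]:
    """Match each (case_id, positive_test_date) to `ratio` controls drawn without replacement.

    Returns (control_id, case_id, enrolment_date) triples; each control's
    enrolment date is its matched case's positive-test date.
    """
    needed = ratio * len(cases)
    if needed > len(control_pool):
        raise ValueError("control pool too small to match without replacement")
    rng = random.Random(seed)
    drawn = rng.sample(list(control_pool), needed)  # no control is drawn twice
    matched = []
    for k, (case_id, index_date) in enumerate(cases):
        for control_id in drawn[k * ratio:(k + 1) * ratio]:
            matched.append((control_id, case_id, index_date))
    return matched
```

Assigning the case's date to its controls is what aligns calendar time, and therefore the possible length of follow-up, between the two groups.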

To compare post-acute outcomes of hospitalized participants with COVID-19 and hospitalized participants with seasonal influenza, we selected 15,846 participants with COVID-19 who were admitted to a hospital within 30 days after or 5 days before their first positive test (from the 98,661 patients with a positive COVID-19 test between 1 March 2020 and 30 November 2020). Similarly, we selected 62,909 patients who had their first positive seasonal influenza test between 1 October 2016 and 29 February 2020 and who had encountered the VHA at least once in the calendar year before the test was collected. Of these patients, 14,948 were admitted to a hospital within 30 days after or 5 days before their first positive influenza test. The hospitalized cohort was further restricted to those alive on the 30th day after hospital admission (COVID-19 n = 13,654; seasonal influenza n = 14,212); the 215 patients who were in both the hospitalized COVID-19 and seasonal influenza groups contributed only their COVID-19 hospitalizations to the analyses, leaving 13,997 participants in the seasonal influenza group (Supplementary Fig. 3a, b). In this cohort, participants were considered to be enrolled at the time of hospitalization. To balance the duration of follow-up in the hospitalized COVID-19 and seasonal influenza groups, each participant in the seasonal influenza group was independently and randomly assigned a duration of follow-up on the basis of the distribution of length of follow-up of the participants in the hospitalized COVID-19 group, who were followed from the date of hospitalization to 31 January 2021.
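One way to read the follow-up balancing step is as independent draws, with replacement, from the empirical distribution of COVID-19 follow-up lengths. The sketch below makes that assumption. The text does not say exactly how the distribution was sampled, the function name is hypothetical and the durations in the usage example are toy values.

```python
import random
from typing import List, Sequence


def assign_followup(n_influenza: int, covid_followup_days: Sequence[int],
                    seed: int = 0) -> List[int]:
    """Give each influenza participant a follow-up length drawn independently
    (with replacement) from the observed COVID-19 follow-up lengths."""
    rng = random.Random(seed)
    pool = list(covid_followup_days)
    return [rng.choice(pool) for _ in range(n_influenza)]
```

For example, `assign_followup(100, [84, 150, 217])` returns 100 durations, each equal to one of the three toy values. Because the draws are independent, the assigned lengths reproduce the COVID-19 distribution in expectation rather than exactly.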

To examine high-resolution, prespecified post-acute COVID-19 outcomes across the severity spectrum of the initial acute disease, we built four mutually exclusive cohorts: VHA users without COVID-19 (n = 4,990,835), VHA users with COVID-19 (n = 73,435), VHA users who were hospitalized with COVID-19 within the first 30 days of follow-up (n = 10,068) and VHA users with COVID-19 who were admitted to intensive care within the first 30 days of follow-up (n = 3,586). Participants in these cohorts were followed up until 31 January 2021.
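The four cohorts are mutually exclusive, so each person needs exactly one label. The two hospitalized groups add up to the hospitalized COVID-19 cohort used in the influenza comparison (10,068 + 3,586 = 13,654). Below is a minimal sketch in which ICU admission takes precedence over hospitalization. The labels and function name are hypothetical. Returning None for hospitalized users without COVID-19 mirrors the restriction of the comparison group to non-hospitalized users.

```python
from typing import Optional


def severity_group(had_covid: bool, hospitalized: bool, icu: bool) -> Optional[str]:
    """Assign one of four mutually exclusive groups from first-30-day flags.

    ICU admission outranks hospitalization so no one falls into two groups.
    """
    if not had_covid:
        return None if (hospitalized or icu) else "no COVID-19"
    if icu:
        return "ICU"
    if hospitalized:
        return "hospitalized"
    return "non-hospitalized"
```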

Data sources

Electronic health records from VA Corporate Data Warehouse (CDW) were used in this study10,11,12,13. The CDW ‘outpatient encounters’ domains provided information related to outpatient encounters and ‘inpatient encounters’ domains provided information between hospital admission and discharge14. The CDW ‘outpatient pharmacy’ domain and CDW ‘bar code medication administration’ domain were used to collect medication data, and CDW ‘patient’ domain was used to collect demographic information. The CDW ‘laboratory results’ domain was used to collect laboratory test information, and the ‘COVID-19 shared data resource’ was used to collect COVID-19 test and demographic information for patients with COVID-19. In addition, the area deprivation index—which is a composite measure of income, education, employment and housing—was obtained from the University of Wisconsin15.

Post-acute use of health resources and death

Outcomes that occurred more than 30 days after cohort enrolment (including death, incident outpatient encounter and frequency of outpatient encounters) were examined in both cohorts. The frequency of outpatient encounters was computed as the number of days with an outpatient encounter divided by the number of days of follow-up after day 30, and is reported as the number of outpatient encounters per 30 days.
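The encounter-frequency measure is a rate of distinct encounter days, rescaled to a 30-day window. The sketch below counts days from enrolment (day 0) and treats day 30 itself as outside the post-acute window. That boundary choice is an assumption, and the function name is hypothetical.

```python
from typing import Iterable


def encounters_per_30_days(encounter_days: Iterable[int], followup_end_day: int,
                           start_day: int = 30) -> float:
    """Distinct days with an outpatient encounter after `start_day`, divided by
    the follow-up days after `start_day`, expressed per 30 days."""
    days_after = followup_end_day - start_day
    if days_after <= 0:
        raise ValueError("no follow-up after start_day")
    distinct = {d for d in encounter_days if start_day < d <= followup_end_day}
    return 30 * len(distinct) / days_after


# 4 distinct encounter days over 120 days of post-acute follow-up -> 1.0 per 30 days
print(encounters_per_30_days([5, 31, 31, 45, 90, 150], 150))  # 1.0
```

Counting days rather than visits means two encounters on the same day contribute once.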

High-dimensional post-acute clinical characteristics

Negative outcome and exposure controls

The application of negative controls in clinical epidemiology may help to detect both suspected and unsuspected sources of spurious bias, and may lessen concerns about unmeasured confounding and other latent biases16. Here we followed a previously published approach16 to examine a panel of eight negative-outcome controls (including neoplasms, accidental injuries, scars, fitting or adjustment of orthodontic or dental prosthetic device, fitting or adjustment of hearing device, fitting or adjustment of orthotics, fitting or adjustment of casts, and bandages), for which (based on current knowledge) there should be no causal relation between the exposures and risks of the negative-outcome controls. We also developed and tested a pair of negative-exposure controls (defined as exposure to influenza vaccine in odd- or even-numbered months during the period between 1 October 2017 and 30 September 2019). We posited that there should be no differences in risk of clinical outcomes associated with receipt of influenza vaccine in odd- versus even-numbered months. The negative-exposure controls were tested in all 821 high-dimensional outcomes considered in our analyses, including diagnoses, medications and laboratory test results; we used the same data sources, cohort-building algorithm, variable definitions, analytical approaches and outcome specification, as well as a similar length of follow-up and interpretation method. In the assessment of negative-outcome and negative-exposure controls, the relation of the exposure–outcome pairs may share the same potential biases with COVID-19 and the outcomes examined in this study (including biases in the underlying data, algorithms for the construction of cohorts, unmeasured confounders, misspecification of modelling algorithms, outcome ascertainment, analytical considerations, result interpretation and other latent biases)16,17.
The successful testing of negative controls reduces concerns about both suspected and unsuspected sources of spurious associations, including associations owing to unmeasured confounding, flaws in the analytical approach, differences in outcome ascertainment and other sources of bias16. In particular, the successful testing of the outcome controls may reduce concerns about biases in outcome ascertainment and unmeasured confounding between the comparison groups (for example, if there was bias in the ascertainment of clinical outcomes in one arm versus another, this bias might also extend to the ascertainment of neoplasms, accidental injuries or other negative-outcome controls tested in this study); the successful testing of the exposure controls may reduce concerns about biases in the analytical approach and underlying data (for example, a bias arising from the analytical approach would also be expected to affect the negative-exposure controls).
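The negative-exposure controls amount to a deterministic split of influenza-vaccine recipients by calendar month. The sketch below assumes one vaccination date per person (for instance, the first in the window); how repeat vaccinations were handled is not stated in the text, and the function names are hypothetical.

```python
from datetime import date

# Vaccination window for the negative-exposure controls, as stated in the text.
WINDOW_START = date(2017, 10, 1)
WINDOW_END = date(2019, 9, 30)


def negative_exposure_arm(vaccination_date):
    """Return 'odd' or 'even' by calendar month of vaccination, or None outside the window."""
    if not (WINDOW_START <= vaccination_date <= WINDOW_END):
        return None
    return "odd" if vaccination_date.month % 2 == 1 else "even"


def assign_arms(first_vaccination_by_person):
    """Map person id -> arm, dropping people vaccinated outside the window.

    Using a single (assumed first in-window) vaccination date per person is a
    hypothetical simplification.
    """
    arms = {}
    for person, vax_date in first_vaccination_by_person.items():
        arm = negative_exposure_arm(vax_date)
        if arm is not None:
            arms[person] = arm
    return arms


print(assign_arms({"a": date(2017, 10, 12), "b": date(2017, 11, 3), "c": date(2019, 10, 2)}))
# {'a': 'even', 'b': 'odd'}
```

Because month parity is unrelated to health status, the two arms should show no systematic difference in outcome risk under the same analytical pipeline.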

Diagnoses

All ICD-10 diagnosis codes recorded for cohort participants from day 30 after COVID-19 diagnosis until the end of follow-up were used to define the post-acute diagnosis outcomes. More than 70,000 ICD-10 diagnosis codes were classified into 540 diagnostic categories using the Clinical Classifications Software Refined (CCSR) version 2021.1, which was developed as part of the Healthcare Cost and Utilization Project sponsored by the Agency for Healthcare Research and Quality18,19,20. We examined only diagnostic categories that could plausibly be considered post-acute sequelae of COVID-19 in the adult population. Some diagnostic categories were therefore not examined, including those for external causes of morbidity; injury, poisoning and certain other consequences of external causes; congenital malformations, deformations and chromosomal abnormalities; certain conditions originating in the perinatal period; and pregnancy, childbirth and the puerperium. This yielded 379 diagnostic categories.
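The diagnosis-outcome step (restrict to the post-acute window, map ICD-10 codes to CCSR categories, drop excluded groups) can be sketched as below. This assumes CCSR category IDs begin with a three-letter body-system prefix and that the prefixes listed correspond to the excluded groups; the crosswalk and codes in the example are hypothetical placeholders, not real CCSR values.

```python
from datetime import date, timedelta

# Assumed body-system prefixes for the excluded groups (external causes,
# injury/poisoning, congenital malformations, perinatal, pregnancy/childbirth).
EXCLUDED_PREFIXES = {"EXT", "INJ", "MAL", "PNL", "PRG"}


def post_acute_categories(diagnoses, covid_date, end_of_follow_up, icd_to_ccsr):
    """Return CCSR categories recorded from day 30 after diagnosis to end of follow-up.

    diagnoses: iterable of (date, ICD-10 code) pairs.
    icd_to_ccsr: dict mapping ICD-10 code -> CCSR category ID (a stand-in for
    the real crosswalk).
    """
    day_30 = covid_date + timedelta(days=30)
    categories = set()
    for dx_date, code in diagnoses:
        if not (day_30 <= dx_date <= end_of_follow_up):
            continue  # outside the post-acute window
        category = icd_to_ccsr.get(code)
        if category is None or category[:3] in EXCLUDED_PREFIXES:
            continue  # unmapped code or excluded diagnostic group
        categories.add(category)
    return categories


# Hypothetical toy crosswalk; codes and category IDs are placeholders.
toy_map = {"X01.0": "CIR_TOY", "X02.0": "INJ_TOY", "X03.0": "RSP_TOY"}
dx = [
    (date(2020, 4, 20), "X01.0"),  # in window: kept
    (date(2020, 4, 25), "X02.0"),  # injury group: excluded
    (date(2020, 3, 10), "X03.0"),  # before day 30: excluded
]
print(post_acute_categories(dx, date(2020, 3, 1), date(2020, 9, 1), toy_map))  # {'CIR_TOY'}
```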

Medication use

The prescription records of cohort participants from day 30 after COVID-19 diagnosis until the end of follow-up were used to define post-acute medication use. We classified 3,425 medications into 543 medication classes on the basis of the VA drug classification system21,22. After removing the medication groups for investigational agents and for prosthetics, supplies and devices, we examined 380 medication outcomes in total.

Laboratory abnormalities

In total, 62 laboratory test abnormalities derived from 38 laboratory measurements, recorded from day 30 after COVID-19 diagnosis until the end of follow-up, were examined. The measurements, identified using Logical Observation Identifiers Names and Codes (LOINC), were absolute T cell count, alanine aminotransferase, aspartate aminotransferase, blood urea nitrogen, brain natriuretic peptide, C-reactive protein, carbon dioxide, CD4/CD8 ratio, direct bilirubin, estimated glomerular filtration rate, ferritin, haematocrit, haemoglobin, haemoglobin A1c, high-density-lipoprotein cholesterol, high-sensitivity C-reactive protein, international normalized ratio, low-density-lipoprotein cholesterol, microalbumin/creatinine ratio, partial thromboplastin time, platelet count, pro B natriuretic peptide, prothrombin time, serum albumin, serum alkaline phosphatase, serum calcium, serum chloride, serum creatinine, serum phosphate, serum potassium, serum sodium, serum total protein, total bilirubin, total cholesterol, total white blood cell count, triglycerides, troponin I and troponin T. Each laboratory test result was classified as abnormally high or abnormally low according to whether it was above the upper limit or below the lower limit of the normal range (where a high or low result is clinically plausible for a given test). The definition of the abnormality for each laboratory test is presented in Supplementary Tables 4, 9.
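The high/low classification can be sketched as a comparison against a normal range in which either bound may be absent when that direction is not examined. The reference ranges below are illustrative assumptions only; the study's own definitions are in its supplementary tables.

```python
def classify_lab_result(value, lower=None, upper=None):
    """Classify a result against a normal range.

    Pass lower=None or upper=None when that direction of abnormality is not
    examined for the test. Returns 'high', 'low' or None (within range).
    """
    if upper is not None and value > upper:
        return "high"
    if lower is not None and value < lower:
        return "low"
    return None


# Illustrative ranges only (assumed), not the study's definitions.
ILLUSTRATIVE_RANGES = {
    "serum potassium (mmol/L)": (3.5, 5.0),  # both directions examined
    "eGFR (mL/min/1.73 m2)": (60.0, None),   # only abnormally low examined
}

for test, value in [("serum potassium (mmol/L)", 5.6), ("eGFR (mL/min/1.73 m2)", 48.0)]:
    lower, upper = ILLUSTRATIVE_RANGES[test]
    print(test, classify_lab_result(value, lower, upper))
```

Treating a missing bound as "not examined" is how one measurement can yield one or two abnormality outcomes, which is why 38 measurements give more than 38 abnormalities.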

High-resolution, prespecified post-acute COVID-19 outcomes

To identify clinical manifestations of post-acute COVID-19 with greater diagnostic resolution, we specified a list of outcomes on the basis of data from the Centers for Disease Control and Prevention and the National Institutes of Health workshop on post-acute COVID-19. Outcomes were defined using previous definitions that have been validated for use with electronic health records, and integrated information from diagnoses, medications and laboratory measurements when appropriate23,24,25,26,27,28,29. To better understand the risks of these outcomes across the severity spectrum of the acute infection, we examined the risk according to the care setting of the acute disease (a proxy indicator of clinical severity) in four mutually exclusive cohorts: VHA users without COVID-19 (who served as the referent category); people with COVID-19 who were not hospitalized; people hospitalized for COVID-19; and people admitted to intensive care for COVID-19. In addition, we estimated the risks of these prespecified outcomes in individuals hospitalized with COVID-19 versus seasonal influenza. The prespecified, high-resolution outcomes included acute coronary disease, acute kidney injury, anxiety, arrhythmias, bradycardia, chest pain, chronic kidney disease, constipation, cough, depression, diarrhoea, type 2 diabetes mellitus, fatigue, gastro-oesophageal reflux disease, hair loss, headache, heart failure, hyperlipidaemia, hypoxaemia, joint pain, memory problems, muscle weakness, obesity, shortness of breath, skin rash, sleep disorder, smell disorder, stroke, tachycardia and thromboembolism. We restricted the capture of incident acute coronary disease, stroke and thromboembolism to inpatient diagnoses that were not present on admission. All other prespecified outcomes, which may plausibly be encountered in either the outpatient or the inpatient setting, were ascertained in whichever setting they first occurred.
Among individuals with COVID-19, and for each prespecified outcome, the percentages of outcomes that were ascertained from outpatient and inpatient data are presented in Supplementary Tables 19, 20.
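The four care-setting groups and the inpatient-only ascertainment rule can be sketched as two small functions. Assigning each person to the most intensive setting reached is an assumption made here to keep the groups mutually exclusive; the function names and group labels are hypothetical.

```python
# Outcomes whose incident capture is restricted to inpatient diagnoses not present on admission.
INPATIENT_ONLY_OUTCOMES = {"acute coronary disease", "stroke", "thromboembolism"}


def care_setting_group(covid_positive, hospitalized, admitted_to_icu):
    """Assign one of four mutually exclusive groups (most intensive setting wins, assumed)."""
    if not covid_positive:
        return "VHA users"
    if admitted_to_icu:
        return "intensive care"
    if hospitalized:
        return "hospitalized"
    return "non-hospitalized"


def counts_as_incident(outcome, setting, present_on_admission=False):
    """Whether a recorded diagnosis may count toward a prespecified outcome."""
    if outcome in INPATIENT_ONLY_OUTCOMES:
        return setting == "inpatient" and not present_on_admission
    # Other outcomes count in whichever setting they first occur.
    return setting in {"inpatient", "outpatient"}


print(care_setting_group(True, True, False))                   # hospitalized
print(counts_as_incident("stroke", "inpatient", present_on_admission=True))  # False
print(counts_as_incident("fatigue", "outpatient"))             # True
```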

Covariates

The predefined covariates included demographics (such as age, race (white, black and other), sex and receipt of long-term care) and proxies of healthcare use (such as number of outpatient encounters, number of hospital admissions, number of outpatient prescriptions and number of outpatient eGFR measurements in the year before enrolment). We also included the area deprivation index at each patient's residential address as a summary measure of socio-economic deprivation. We used the Sequential Organ Failure Assessment (SOFA) score to adjust for the severity of the acute infection in additional high-dimensional analyses of the hospitalized COVID-19 versus hospitalized seasonal influenza cohorts30,31. To account for potential nonlinear associations, all continuous variables were modelled as restricted cubic spline functions.
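A restricted cubic spline replaces a continuous covariate with a linear term plus k − 2 cubic terms that are constrained to be linear beyond the outer knots. The sketch below uses Harrell's parameterization; the number and placement of knots are not stated in the text, so four knots at the 5th, 35th, 65th and 95th percentiles (a common default) are assumed, and the covariate values are simulated.

```python
import numpy as np


def rcs_basis(x, knots):
    """Restricted cubic spline basis (Harrell's parameterization).

    Returns an (n, k - 1) design matrix: the linear term plus k - 2 nonlinear
    terms that are linear beyond the outer knots.
    """
    x = np.asarray(x, dtype=float)
    t = np.sort(np.asarray(knots, dtype=float))
    k = t.size
    if k < 3:
        raise ValueError("need at least 3 knots")

    def cube_pos(u):
        return np.maximum(u, 0.0) ** 3

    span = t[-1] - t[-2]
    scale = (t[-1] - t[0]) ** 2  # puts the nonlinear columns in the units of x
    columns = [x]
    for j in range(k - 2):
        term = (
            cube_pos(x - t[j])
            - cube_pos(x - t[-2]) * (t[-1] - t[j]) / span
            + cube_pos(x - t[-1]) * (t[-2] - t[j]) / span
        )
        columns.append(term / scale)
    return np.column_stack(columns)


rng = np.random.default_rng(0)
age = rng.uniform(20, 95, size=1_000)  # simulated covariate
knots = np.percentile(age, [5, 35, 65, 95])  # assumed knot placement
design = rcs_basis(age, knots)
print(design.shape)  # (1000, 3)
```

The resulting columns would enter the propensity score model in place of the raw covariate.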

To further adjust the models, we leveraged the multidimensionality of the VA electronic healthcare databases to algorithmically identify covariates (potential confounders) that span multiple domains (diagnoses, pharmacy records and laboratory tests) and that showed evidence of a difference in prevalence between the comparison groups24. In the COVID-19 versus VHA users cohort (and separately in the hospitalized COVID-19 versus influenza cohort), high-dimensional covariates were ascertained within the year before the date of enrolment. Among all diagnoses, medication classes and laboratory tests, we first selected variables that occurred in at least 10 patients in each group. We then estimated the unadjusted relative risk describing the association of each variable with membership in the COVID-19 group versus the comparator group. The 100 high-dimensional variables with the strongest association with group membership were used, along with the predefined covariates, in the analyses.
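The selection step can be sketched as filtering on the 10-patient minimum in each group and ranking by the magnitude of the log prevalence ratio. Using |log RR| as the measure of "strongest association" is an assumption, and the variable names and counts in the example are hypothetical.

```python
import math


def select_high_dimensional_covariates(counts_exposed, counts_comparator,
                                       n_exposed, n_comparator,
                                       min_count=10, top_k=100):
    """Rank candidate baseline variables by |log relative risk| between groups.

    counts_*: dict variable -> number of patients with that variable in the
    year before enrolment. Ranking by |log RR| is an assumed strength measure.
    """
    scored = []
    for var in counts_exposed.keys() & counts_comparator.keys():
        a, b = counts_exposed[var], counts_comparator[var]
        if a < min_count or b < min_count:
            continue  # require at least min_count patients in each group
        relative_risk = (a / n_exposed) / (b / n_comparator)
        scored.append((abs(math.log(relative_risk)), var))
    # Strongest association first; ties broken alphabetically for reproducibility.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [var for _, var in scored[:top_k]]


exposed = {"dx_A": 100, "rx_B": 50, "lab_C": 20, "dx_D": 5}
comparator = {"dx_A": 100, "rx_B": 300, "lab_C": 40, "dx_D": 100}
print(select_high_dimensional_covariates(exposed, comparator, 1_000, 2_000, top_k=2))
# ['rx_B', 'dx_A']
```

Taking the absolute log ratio treats a variable that is twice as common in either group as equally informative, so under- and over-represented variables compete on the same scale.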

To estimate the risk of the prespecified outcomes across the intensity of care needed during the acute infection, we ascertained four sets of high-dimensional covariates in total, one for each of the four mutually exclusive groups (VHA users without COVID-19, people with COVID-19 who were not hospitalized, people hospitalized with COVID-19 and people admitted to intensive care with COVID-19). Each set was selected on the basis of the unadjusted relative risk of being in that group compared with being in the remaining three groups. High-dimensional covariates were used along with predefined covariates in the analyses32.

Statistical analyses

The characteristics of VHA users with COVID-19 who were not hospitalized, VHA users without COVID-19, participants hospitalized with COVID-19 and participants hospitalized with seasonal influenza are described in Extended Data Table 1a. Flow charts of the overall analytical approach are presented in Supplementary Figs. 7, 8.

We estimated the risk of health resource use and death, and the risk of each diagnosis, medication use and laboratory abnormality, in individuals with COVID-19 compared with all VHA users and, separately, in individuals hospitalized for COVID-19 compared with those hospitalized for seasonal influenza. To estimate the risk of each incident outcome, we built a cohort of participants without a history of the outcome being examined (for example, the risk of insulin use was estimated within a cohort of participants without a history of insulin use in the year before cohort enrolment). For each outcome-specific cohort, we estimated propensity scores based on predefined variables and algorithmically selected high-dimensional variables. The propensity scores were then used to compute the overlap weight, which is the probability of membership in the non-observed exposure group (that is, one minus the propensity score for the observed group)33,34. For all outcome models, we then assessed covariate balance by calculating the standardized difference after application of the overlap weight for all predefined variables, the 100 algorithmically selected high-dimensional variables and all high-dimensional variables that were not selected for inclusion in the propensity score models. We present the distribution of these standardized differences for 20 randomly selected outcome-specific cohorts and across all outcomes, as well as the covariate distributions in the overall cohort after adjustment.
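The overlap weight and the balance check can be sketched directly from their definitions: exposed participants receive 1 − e(x), comparators receive e(x), and the weighted standardized difference divides the weighted mean difference by the square root of the average weighted variance. Propensity scores are taken as given here, and the values in the example are hypothetical.

```python
import numpy as np


def overlap_weights(propensity, exposed):
    """Overlap weight: 1 - e(x) for exposed participants and e(x) for comparators."""
    propensity = np.asarray(propensity, dtype=float)
    exposed = np.asarray(exposed, dtype=bool)
    return np.where(exposed, 1.0 - propensity, propensity)


def weighted_standardized_difference(x, exposed, weights):
    """(weighted mean difference) / sqrt(average of the two weighted variances)."""
    x = np.asarray(x, dtype=float)
    exposed = np.asarray(exposed, dtype=bool)
    weights = np.asarray(weights, dtype=float)

    def mean_var(values, w):
        mean = np.average(values, weights=w)
        return mean, np.average((values - mean) ** 2, weights=w)

    m1, v1 = mean_var(x[exposed], weights[exposed])
    m0, v0 = mean_var(x[~exposed], weights[~exposed])
    return (m1 - m0) / np.sqrt((v1 + v0) / 2.0)


ps = np.array([0.8, 0.6, 0.3, 0.2])  # hypothetical propensity scores
exposed = np.array([True, True, False, False])
w = overlap_weights(ps, exposed)  # approximately [0.2, 0.4, 0.3, 0.2]
age = np.array([70.0, 60.0, 65.0, 55.0])
d = weighted_standardized_difference(age, exposed, w)
print(round(d, 3), abs(d) < 0.1)  # compare against the conventional 0.1 threshold
```

A covariate whose weighted variances are both zero would make the denominator zero, so such variables need separate handling in a full balance report.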

The risks of health resource use (including outpatient encounters) and death in individuals with COVID-19 versus all VHA users, and in individuals hospitalized for COVID-19 versus seasonal influenza, were estimated from Cox survival models weighted by overlap weights, in which death was treated as a competing risk in the evaluation of health resource use. The frequency of outpatient encounters was modelled using weighted linear regression. Hazard ratios for each of the outcomes (incident diagnoses, incident medication use and incident laboratory abnormalities) were estimated from cause-specific hazard models weighted by overlap weights, in which death was treated as a competing risk. Event rates per 1,000 participants at 6 months (180 days) of follow-up in each group, and the adjusted excess burden based on the difference between the two groups, were estimated. Models were built only for outcomes that occurred in at least 10 participants in each group. A Bonferroni correction was applied to account for multiple hypothesis testing across the high-dimensional outcomes; a P value of less than 6.57 × 10−5 was considered statistically significant. Results are additionally presented with a focus on the identified post-acute sequelae of COVID-19, defined as outcomes with a hazard ratio greater than 1 and a P value of less than 6.57 × 10−5. High-dimensional analyses of individuals hospitalized for COVID-19 versus seasonal influenza, adjusted for the severity of the acute infection (through inclusion of SOFA scores), were additionally undertaken. High-dimensional analyses were also conducted to evaluate the risk of 6-month clinical outcomes in people hospitalized for COVID-19 versus those hospitalized for other causes.
Participants who were hospitalized for other causes who survived the first 30 days after hospital admission were enrolled between 1 October 2016 and 29 February 2020 (n = 901,516).
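The multiple-testing and selection rules can be sketched as follows: a Bonferroni threshold divides the family-wise α by the number of tests, identified sequelae are outcomes with a hazard ratio above 1 and a P value below that threshold, and the excess burden is the difference in 180-day event risk scaled to 1,000 participants. The sketch assumes the event rates are cumulative risks taken from the weighted models; the outcome results shown are hypothetical.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold controlling the family-wise error rate at alpha."""
    return alpha / n_tests


def excess_burden_per_1000(risk_exposed, risk_reference):
    """Difference in 180-day cumulative event risk, expressed per 1,000 participants."""
    return (risk_exposed - risk_reference) * 1000


def identified_sequelae(results, threshold):
    """Keep outcomes with hazard ratio > 1 and P below the corrected threshold."""
    return [r["outcome"] for r in results if r["hr"] > 1 and r["p"] < threshold]


THRESHOLD = 6.57e-5  # significance threshold reported in the text
hypothetical_results = [
    {"outcome": "outcome_1", "hr": 1.40, "p": 1e-9},
    {"outcome": "outcome_2", "hr": 0.70, "p": 1e-9},   # HR < 1: not a sequela
    {"outcome": "outcome_3", "hr": 1.10, "p": 0.001},  # not significant after correction
]
print(identified_sequelae(hypothetical_results, THRESHOLD))  # ['outcome_1']
print(round(excess_burden_per_1000(0.052, 0.040), 1))  # 12.0
```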

We examined the risk of the high-resolution, prespecified outcomes across care settings of the acute phase of the disease, analysing differences in risk of clinical manifestations of post-acute COVID-19 between mutually exclusive groups of people who were positive for COVID-19 (non-hospitalized, hospitalized and admitted to intensive care) and VHA users who were not positive for COVID-19. Propensity scores for group membership were estimated in outcome-specific cohorts free of the related disease at baseline32. Standardized differences in the predefined and algorithmically selected high-dimensional covariates are presented after application of overlap weighting35. The percentages of outcomes in the COVID-19 group that were ascertained in the inpatient and outpatient settings are presented. We then constructed Cox survival models to analyse the risk of outcomes using overlap weighting for multiple treatments. We report hazard ratios and event rate differences between groups. We also estimated the risks of prespecified outcomes among individuals hospitalized with COVID-19 or seasonal influenza, with additional adjustment for SOFA scores.
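One common generalization of overlap weights to K groups gives each participant the weight h(x)/e_z(x), where e_z(x) is the propensity for the group actually observed and h(x) = 1/Σ_k (1/e_k(x)); with two groups this reduces to the binary overlap weights. The text does not spell out the formula used, so this is an assumed sketch, with multinomial propensity estimates taken as given and example values that are hypothetical.

```python
import numpy as np


def generalized_overlap_weights(propensity, group):
    """Overlap weights for K mutually exclusive groups.

    propensity: (n, K) array of estimated membership probabilities (rows sum to 1).
    group: length-n array of observed group indices 0..K-1.
    weight_i = h(x_i) / e_{group_i}(x_i), with h(x) = 1 / sum_k (1 / e_k(x)).
    """
    propensity = np.asarray(propensity, dtype=float)
    group = np.asarray(group, dtype=int)
    inverse = 1.0 / propensity
    tilting = 1.0 / inverse.sum(axis=1)  # h(x)
    return tilting * inverse[np.arange(group.size), group]


# Four groups, e.g. VHA users, non-hospitalized, hospitalized, intensive care.
ps = np.array([
    [0.70, 0.20, 0.07, 0.03],
    [0.40, 0.40, 0.15, 0.05],
    [0.25, 0.25, 0.25, 0.25],
])
print(generalized_overlap_weights(ps, [0, 1, 3]))
```

With two groups, e_1 = e and e_0 = 1 − e give h = e(1 − e), so exposed participants receive 1 − e and comparators receive e, matching the binary definition used for the high-dimensional analyses.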

All analyses were done using SAS Enterprise Guide version 7.1. Data visualizations were performed in R 4.0.3. The study was approved by the Institutional Review Board of the Department of Veterans Affairs St. Louis Health Care System.

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this paper.