SARS-CoV-2, the viral cause of COVID-19, has triggered a global pandemic, infecting over 536 million people and causing over 6.3 million deaths worldwide as of June 2022 (ref. 1). The acute effects of COVID-19 are well documented, but longer-term effects are actively being investigated and defined. Post-acute sequelae of SARS-CoV-2 infection (PASC) has been described by the World Health Organization as the persistence of symptoms or new symptoms more than 30 days post-SARS-CoV-2 infection2,3,4. Lingering symptoms can persist months after acute infection, including recurring fatigue, muscle weakness, dyspnea, anxiety, and depression. The symptomatology and time course of PASC, however, have been based on self-selected cases rather than on a well-defined PCR-positive population that is compared to an appropriate control group and followed longitudinally.

Disease characterization and definition have changed over time, and identification via standard ICD-10 diagnosis codes was only enacted in late 2021 (ref. 5). Al-Aly and colleagues laid the groundwork by performing a comprehensive analysis of potential PASC symptomatology within the Veterans Health Administration (VHA)3. Estiri et al. expanded PASC investigation to a non-hospitalized cohort and identified 33 phenotypes in the 3–6-month and 6–9-month periods post-COVID6. However, questions remain regarding the timing of conditions and, importantly, which symptoms persisted from acute infection into late periods and which developed in the late period. Additionally, comparisons to matched populations with a negative SARS-CoV-2 test result (PCR-negative) have not been systematically conducted and are crucial to differentiating the impact of the pandemic from the impact of viral infection. A case-control approach permits a better curated PASC definition and condition identification.

Our primary aim is to define a set of PASC conditions, and to describe the timing of those conditions, in a diverse population and a comparison group of similar PCR-negative individuals. We selected a longitudinal cohort of COVID PCR-positive patients and matched them to COVID PCR-negative patients within the Kaiser Permanente Mid-Atlantic States (KPMAS), identified the clinical conditions for which there is an increased risk for those PCR-positive (vs. PCR-negative), and estimated PASC incidence among those PCR-positive.


A total of 31,390 PCR-positive patients were identified. The majority were female and over half were less than 50 years old (Table 1). Over half were from minority populations, with 39% Black and 29% Hispanic. Over half were overweight (BMI > 25 kg/m2). The most frequent pre-existing comorbidity was diabetes mellitus.

Table 1 Positive SARS-CoV-2 RT-PCR patient demographics

PASC-related conditions among PCR-positive patients

We identified 17 PASC-related conditions (Table 2). The most common acute and persistent PASC-related conditions that were either greater than the pre-existing conditions time interval or determined clinically significant during physician review were other lower respiratory disease (4.5%) and respiratory failure (2.7%). The most common late PASC-related conditions (i.e., >1.5% among PCR-positive) were abdominal pain, gastrointestinal disorders, other nervous system disorders, nausea and vomiting, nonspecific chest pain, dizziness/vertigo, malaise and fatigue, anxiety disorders, mental health disorders, other lower respiratory diseases, and cardiac dysrhythmias. Overall, 37.7% of PCR-positive patients had at least one condition (in the acute and persistent or late period) and 16.5% of PCR-positive patients had at least one PASC-related condition in either period. In the acute and persistent period, 20.4% of our PCR-positive patients had at least one condition and 4.1% had a PASC-related condition. Late period results were 26.1% and 13.6%, respectively.

Table 2 Clinical Classification Software (CCS) - PASC-related conditions deemed clinically significant by our infectious disease physicians among PCR-positive patients


The scaled matching algorithm resulted in a study population of 28,118 PCR-positive and 70,293 PCR-negative patients. Matching at a 1:3 case-to-control ratio represented 66.8% of the identified cohort, followed by 16.2% at 1:2 and 16.8% at 1:1. Overall, both case and control groups had ~57% female patients, a higher distribution of Black (~40–43%) and Hispanic (~20–24%) compared with white (~18–22%) patients, ~87% of patients less than 65 years old, and 30–33% of patients with a BMI ≥ 30 kg/m2 (Table 3). Although chi-squared statistics showed an association between cohort and age, BMI, COPD, hospitalization in the 30–120 day period post-T0, pregnancy, and race, all of those associations were extremely weak (highest Cramer’s V = 0.060) and likely an effect of the overall cohort size. Among PCR-negative patients, 2.5% had a PASC-related condition in the acute and persistent period (vs. 4.1% of PCR-positive patients) and 22.1% had any condition in this period (more than the 20.4% of PCR-positive patients). Further, among PCR-negative patients, 12.1% had a PASC-related condition in the late period (vs. 13.6% of PCR-positive patients), and 25.2% had any condition (fewer than the 26.1% of PCR-positive patients).

Table 3 Patient demographics and co-morbidities for the matched cohort

Risk analysis – CCS PASC-related conditions

There was an increased risk of numerous conditions in those PCR-positive compared to PCR-negative in both the acute and persistent and the late time periods (Table 4; Fig. 1). The risk of having any condition in the late period was 4% higher in PCR-positive versus PCR-negative (RR = 1.04; 95%CI: 1.01,1.07) and 8% lower in the acute and persistent period (RR = 0.92; 0.89,0.95). The risk of having at least one PASC-related condition, however, was increased by 12% in the late period (RR = 1.12; 1.08,1.16) and 60% in the acute and persistent period (RR = 1.60; 1.48,1.72). These PASC-related conditions had significantly higher risk among PCR-positive versus PCR-negative in the late period: anosmia (RR = 3.88; 2.79,5.40); cardiac dysrhythmias (RR = 1.25; 1.08,1.45); diabetes (RR = 1.20; 1.03,1.38); genitourinary conditions (RR = 1.21; 1.07,1.36); malaise and fatigue (RR = 1.60; 1.41,1.81); and nonspecific chest pain (RR = 1.39; 1.24,1.55). For risk and cumulative incidence of all CCS categories considered, see Supplementary Table 1.

Table 4 Risk and cumulative incidence of CCS categories in the case vs. controls
Fig. 1: Unadjusted risk ratios (and 95% confidence intervals) of PASC-related conditions comparing PCR-positive (vs. PCR-negative), in three time periods anchored on the date of SARS-CoV-2 PCR test result.

A CCS condition risk ratio comparison with a 95% CI plot for PCR-positive population vs PCR-negative population within our study time periods. Risk ratio is the measure of interest comprised of the number of CCS conditions incident in the PCR-positive cohort (n = 28,118) versus CCS conditions incident in the PCR-negative cohort (n = 70,293), with 95% confidence intervals represented by the respective bands. Utilizing 1.0 as the baseline, significant risk ratios (p < 0.05) for the PASC-related conditions can be identified in bold and compared in scale to the other conditions. (*) Asterisk designates that the metric was too large to fit within the scale of the graphic.

Some of these conditions also had increased risk among PCR-positive (vs. PCR-negative) in the acute and persistent period, including cardiac dysrhythmias (RR = 1.90; 95%CI: 1.45, 2.49), diabetes (RR = 1.96; 1.50, 2.55), malaise and fatigue (RR = 2.89; 2.10, 3.98), and nonspecific chest pain (RR = 2.39; 1.85, 3.10). Additional PASC-related conditions that had increased risk in the acute and persistent period include other lower respiratory disease (RR = 2.51; 2.15, 2.92) and respiratory failure/insufficiency/arrest (RR = 22.95; 14.78, 35.64).

Distribution analysis for PASC-related conditions

There was some variation in the demographic distributions for those experiencing at least one PASC-related condition in the acute and persistent and/or late periods (Table 5; Fig. 1). Most notably, those experiencing a PASC-related condition, versus the overall cohorts, were more often female (~62% vs ~57%), slightly older (age 65+: ~15% vs 12–13%), and had higher hospitalization in the 30–120 days post lab test date (~5.2–6.8% vs 1.8–1.9%).

Table 5 Patient demographics for the matched cohort vs patients experiencing PASC related conditions

Sensitivity analysis

The sensitivity analysis attenuated the increased risk ratios for PASC-related conditions by allowing visible conditions to occur in multiple time periods, thus removing the requirement of mutual exclusivity between time periods. None of the significantly increased risk ratios changed statistical significance (Table 6). Abdominal pain (RR = 0.73; 95%CI: 0.63,0.85) and nausea and vomiting (RR = 0.61; 0.45,0.82) had protective associations in the acute and persistent period that strengthened and were significant in the sensitivity analysis; both conditions were less burdensome among PCR-positive and PCR-negative groups.

Table 6 Sensitivity analysis - risk and cumulative incidence of symptom-based CCS categories

The case/control diabetes and corticosteroid sensitivity analysis showed no significant association between case/control status and corticosteroid use during the late time period (P = 0.89) and a significant, but weak, association (P = 0.01; Cramer’s V = 0.1169) between case/control status and corticosteroid use during the acute and persistent time period. Only 32% of diabetic patients (cases and controls) were on corticosteroids during the late period and only 18% of diabetic patients (cases and controls) were on corticosteroids during the acute and persistent time period. The risk ratio for diabetes was 1.20 (CI: 1.03, 1.38) in the late time period and 1.96 (CI: 1.50, 2.55) in the acute and persistent time period; therefore, corticosteroid use and/or abuse likely has little to no association with the increased diabetes risk.


Our study introduces a list of clinical conditions associated with PASC and an overall incidence of PASC within our population that can be used to diagnose long-term effects of COVID-19. Additionally, our results aid in characterizing an operational PASC definition and provide a time frame for identifying conditions with significantly higher incidence post-COVID-19 infection, including those described by others7. Existing literature has not fully explored PASC and long-term effects of COVID-19 infection8. We expand upon previous studies by comparing PCR-positive patients to a matched cohort of PCR-negative patients within a closed integrated health system and by utilizing a time-based approach to provide supporting evidence for the resulting common clinical conditions of PASC.

Unlike other studies, we separated conditions by time of presentation and accounted for pre-existing conditions in order to delineate conditions of PASC. While many pre-existing conditions may have been exacerbated by COVID, operationally, they should not be considered late PASC conditions. Future research should aim to understand the severity and persistence of these PASC-related conditions. Additional time periods post-COVID would be important in analyzing additional conditions or symptoms that develop well beyond expected time intervals, including PASC development with later surges/waves of COVID-19 and the impact of vaccination on PASC. We limited our cohort to this initial period of SARS-CoV-2 infection (i.e., 2020) to avoid the influence of later variants and vaccinations, and to only those with a PCR test result9.

Our study also reveals the presence of a disease burden among PCR-negative persons. While any conditions and our PASC-related conditions are distinctly different, the comparison of any condition provides additional evidence that the resulting PASC-related conditions are not only higher risk in the PCR-positive group, but higher risk compared to patients experiencing other symptoms/conditions. This comparison reinforces that the resulting PASC-related conditions are truly representative of patients experiencing PASC. Also, note that among patients without COVID, many experienced these same symptoms, providing further context for our results.

In contrast, the absolute differences in PASC-related conditions, or any conditions, are not large when comparing PCR-positive to PCR-negative groups. In fact, any condition (having any condition that was considered for PASC in our final analysis) was more common among PCR-negative than PCR-positive in the acute period. This has further implications for an operational PASC definition, in that while many conditions have been cited as potentially part of PASC, they are occurring with similar frequencies among PCR-negative patients. It is important, thus, to recognize that these symptoms, while elevated in patients with SARS-CoV-2 infection, are also not unusual in PCR-negative persons. Further, it is important to acknowledge the toll the pandemic has taken on all patients; while many of these PASC-defining conditions do not have large incidence rates, they are still very impactful to the patients who experience them and require attention from their medical providers.

Our acute and persistent and late period PASC-related conditions are not surprising, as most have been described in case reports in the literature to date or are commonly seen in sub-acute viral illnesses10,11. It also should be noted that we did not compare our results or rates to a separate viral condition, such as HIV, as COVID and PASC were unique to this time and the focus of our study. While these PASC condition categories are multifaceted, such as GU disorders, all have been described by others12,13. Respiratory symptoms were more prominent in the acute and persistent period, which is consistent with COVID-19 symptomatology14. However, most of these did not occur in the late period, and many patients had pre-existing pulmonary conditions and respiratory-related diagnoses. As described in the literature, anosmia was seen at a higher incidence for PCR-positive patients7.

An acknowledged limitation is that not all conditions described in the literature or popular press could be coded consistently during our study period in the EHR, most notably brain fog; however, malaise and mental health were prominent PASC-related conditions and likely associated with brain fog symptomatology15. Consistent with the findings from Al-Aly3, diabetes mellitus had increased incidence during the post-COVID period. One possibility is that diabetic patients were simply undiagnosed until they sought care for their COVID-19 infection and were subsequently diagnosed16,17. Another possibility is that COVID affects blood sugar and pancreatic endocrine function17. As noted in our sensitivity analysis, corticosteroid use did not greatly impact diabetes incidence. However, diabetes is relatively common in the KPMAS population, and an even further increased risk of the disease is of considerable concern for patient health. Future research is needed to understand the relationship between diabetes and COVID-19.

Al-Aly and colleagues3 provided an encompassing view of PASC conditions within the VHA population. Our study provides supporting evidence around specific conditions identified and addresses their limitation of having a primarily older (average 60 years), white and male population3. We also utilized a time interval analysis which provides context and support for our most profound PASC-related conditions. Our results differ from the VHA study, which reported an overall higher level of risk for most of its identified conditions. One potential reason for this difference is that our control group required testing negative for COVID-19, while their control group includes those who had no evidence of testing. Demographic differences may have also contributed to the result divergence. Our study also improves upon Estiri and colleagues’ PCR-negative comparison group by utilizing data from a closed integrated healthcare system with accurate membership accounting, applying a matching algorithm to better control for confounding variables, and most importantly, providing additional comparison periods to provide supporting evidence for the late conditions6.

Other limitations are relevant to our study. It is possible that pre-existing conditions could be found more than four years prior to the PCR test date. The intent of going back four years was to ensure that we capture conditions that may have been missed on more recent encounters prior to the COVID test. Individual risk estimates are also heavily dependent on time period length: longer time periods, such as the 4-year pre-existing condition period, likely have a higher probability of diagnosis capture than the 30-day acute and persistent period. This time period length discrepancy has been attenuated in our analysis because the PCR-negative and PCR-positive cohorts are compared within the same time interval. Additionally, our sensitivity analysis found no effect on the significance of our results when removing the mutual exclusivity requirement for time periods and allowing symptom-based conditions to count in the acute and persistent period as well as the late period, regardless of whether a condition was pre-existing.

Additionally, KPMAS healthcare utilization patterns were also found to be significantly altered by the pandemic18. While we capture all telehealth visits, there is the potential for missing PASC diagnoses and encounters as some patients may not have sought medical care for their symptoms. We tested this limitation and found >76% of our PCR-positive and PCR-negative cohorts had at least one encounter, virtual or in-person, in the late period. Lastly, we acknowledge that our current study period does not include the outside influence of other variants, vaccination, and widely distributed home testing which may impact future definitions and symptomatology of PASC.

Our study population consists of insured patients only, which includes Medicare, Medicaid, and charity care, so a wide demographic is included. While geographically limited, our population represents the general population well in the DC/VA/MD area4,5. We also cannot rule out the potential for missing data around PCR testing, especially in early 2020. Diagnoses, lab results and death, which was not compared with the National Death Index (NDI) for the cause of death, also have the potential for missingness; however, we believe our care model and connection to external data sources significantly reduce much of this limitation. Lastly, we cannot rule out the possibility of false negative COVID tests, but given the high community prevalence of SARS-CoV-2 during the study period, and the KPMAS protocol for testing 5–14 days post COVID-19 exposure, the likelihood of missed COVID-19 diagnosis is low.

Our strengths include examining data within an integrated and closed health system and drawing from a well-defined patient population of over 800,000 members. Further, our system has comprehensive capture of both PCR-positive and PCR-negative test results from both internal and external sources, as well as a comprehensive capture of PASC recorded symptoms and conditions. Additionally, our ability to create a matched PCR-negative population, with the majority of PCR-positive cases being matched to three PCR-negative controls, and our analysis of distinct time periods associated with condition manifestation are key distinctions of this study compared to others in the literature16.

Our study demonstrates a clearly defined set of conditions for PASC definition and delineation within an integrated care system. This delineation compared the acute and persistent time period conditions with conditions identified in the late time period. These conditions are at a significantly higher risk when compared with a PCR-negative population matched on similar demographics. However, PASC-related conditions do occur among PCR-negative populations and should not be neglected among these patients. Additionally, our study found that the overall cumulative incidence of PASC, as defined by COVID positive patients with a PASC-related diagnosis in the acute and persistent or late periods, is 16.5%. Importantly, the low-risk levels, defined by the cumulative incidence of each individual condition, provide context to the overall low burden of disease for PASC-related conditions in the KPMAS population. These findings contribute to the overall evaluation of PASC and can be employed by clinicians in their care of patients who are diagnosed with COVID-19. Our research provides supporting evidence for an accepted operational definition for PASC; however, understanding of the severity and duration of these conditions will be crucial.



Kaiser Permanente (KP) is an integrated health system in the United States, with over 800,000 members in the Mid-Atlantic region, representing Maryland, the District of Columbia, and Northern Virginia. KPMAS members are a diverse population and their demographics represent their respective jurisdictions19. They are provided comprehensive integrated health care, including, but not limited to, primary and specialty care, ambulatory and inpatient care (with integration among partner hospitals in the Mid-Atlantic region). Their healthcare is coordinated through an integrated electronic health record (EHR) system which includes clinical data, financial information (claims data) on services received external to KPMAS, and data from the Geographically Enriched Member Sociodemographic (GEMS) database20. KPMAS is a closed healthcare system with high ascertainment of COVID-19 in the population, as well as potential PASC conditions and symptoms.

Our study was approved by the KPMAS Institutional Review Board on an expedited basis.

Study population and COVID-19 classification

SARS-CoV-2 RT-PCR (PCR) testing, the most widely available test during our study period, has been regarded as the gold standard for COVID patient identification21. Given the magnitude of testing performed within our system and external testing linkages, we classified COVID positive patients as those with a confirmed PCR result and refer to those as PCR-positive. We refer to those with only COVID PCR-negative results as PCR-negative.

Utilizing the KPMAS EHR, including internal and external records incorporated into the EHR (Epic® Care Everywhere and Maryland/Washington DC health information exchange called CRISP)9, we identified adult patients (≥18 years) who had a PCR result between January 1, 2020, and December 31, 2020. We limited our cohort to this period to avoid the influence of later variants and vaccinations, and only to those with a PCR test result9. Of note, the KPMAS protocol did not test patients prior to five days post-exposure or greater than 14 days post-exposure. We prioritized PCR-positive results for each patient, then selected the first positive date as our index date. Patients were classified as cases when they received a PCR-positive COVID result or as controls if they only tested negative. We excluded patients not enrolled in KPMAS 120 days post-PCR test date.

For each PCR-positive patient (case), we matched up to three PCR-negative patients, without replacement, by PCR testing month and year (using their first negative test date), age group at the time of PCR test (18–29, 30–39, 40–49, 50–64, 65–74, 75–84, ≥85 years), race/ethnicity (Black/African American, White, Hispanic, Asian, Other/Unknown), sex (female or male), and service area (to account for any physician differences in diagnostic practices). When 1:3 matching was not possible, cases were matched to controls 1:2 or 1:1.
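The matching step described above can be illustrated with a minimal Python sketch. It is not the study's code (the analysis was run in SAS and SQL); the field names, the `match_controls` helper, and the greedy first-come assignment are assumptions made for illustration only.

```python
from collections import defaultdict

# Hypothetical matching keys; the field names are illustrative assumptions.
MATCH_KEYS = ("test_month", "age_group", "race_ethnicity", "sex", "service_area")


def match_controls(cases, controls, max_ratio=3):
    """Greedy exact matching of up to `max_ratio` controls per case, without replacement."""
    # Pool the controls by their exact combination of matching variables.
    pools = defaultdict(list)
    for control in controls:
        pools[tuple(control[k] for k in MATCH_KEYS)].append(control["id"])

    matches = {}
    for case in cases:
        pool = pools[tuple(case[k] for k in MATCH_KEYS)]
        # Take up to max_ratio controls and remove them from the pool, so a
        # control is never reused (1:3 where possible, otherwise 1:2 or 1:1).
        chosen = [pool.pop() for _ in range(min(max_ratio, len(pool)))]
        if chosen:  # cases with no eligible control remain unmatched
            matches[case["id"]] = chosen
    return matches
```

A greedy pass like this gives earlier cases first pick of a stratum's controls, so a later case in the same stratum may receive fewer. The scaled matching technique used in the study may balance ratios differently.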

Demographic covariates of interest abstracted from the EHR included: race/ethnicity, age, comorbidities (chronic kidney disease, chronic obstructive pulmonary disease, diabetes mellitus type 1 or 2, Hepatitis B, HIV, cancer), Body Mass Index (BMI; kg/m2), insurance type (commercial, Medicare, Medicaid, Charity, Affordable Care Act), pregnancy status, service area, hospitalizations (30–120 days post-test), and known deaths post-index date.

Timing and classification of symptoms and conditions

The timing and definition of the conditions identified were critical in distinguishing sequelae of significance. Our index date (T0) was the PCR test date. Three, mutually exclusive, diagnostic time intervals were identified and anchored on T0: (1) pre-existing conditions time interval - diagnoses up to four years prior to T0, (2) acute and persistent time interval – diagnoses occurring 0–30 days post-T0 and persisted into the 30–120 days period, but not previously identified in the pre-existing conditions time interval, and (3) late time interval - new disease diagnoses 30–120 days post-T0, but not previously identified in the pre-existing conditions or acute and persistent time intervals (Fig. 2).
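The three mutually exclusive intervals can be expressed as a small classification rule. The Python sketch below is one reading of the definitions above, not the study's implementation. The function name, the approximation of four years as 1,460 days, and the choice to count day 30 in the acute window are all assumptions, since the text does not state how the shared boundary was handled.

```python
PRE_EXISTING_DAYS = 4 * 365  # assumption: "four years" approximated as 1,460 days
ACUTE_END = 30               # assumption: day 30 is counted in the acute window
LATE_END = 120


def classify_condition(day_offsets):
    """Assign one patient's CCS category to a single interval.

    `day_offsets` holds the days between T0 and each diagnosis of that
    category (negative values are before the PCR test).
    """
    pre = any(-PRE_EXISTING_DAYS <= d < 0 for d in day_offsets)
    acute = any(0 <= d <= ACUTE_END for d in day_offsets)
    later = any(ACUTE_END < d <= LATE_END for d in day_offsets)

    if pre:
        return "pre-existing"
    if acute and later:
        return "acute and persistent"
    if later:
        return "late"
    return None  # e.g. an acute diagnosis that did not persist
```

Checking the pre-existing window first is what enforces mutual exclusivity: a category seen before T0 can never be counted as acute and persistent or late for that patient.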

Fig. 2: Diagnosis Observation Periods.

Diagnostic observation timeline for CCS conditions in relation to the COVID testing date as the index date. The time periods used in this study were defined as follows: Late: 30–120 days post COVID test date; Acute and persistent 0–30 days post COVID test date and persisted 30–120 days; Pre-existing conditions: four years prior to COVID test date.

Diagnostic grouping of ICD codes was performed with standard Healthcare Cost and Utilization Project (HCUP) Clinical Classifications Software (CCS)22. From this software, the CCS Category Level was chosen as the anchor group as it allowed for general diagnostic rollup but maintained enough specificity to identify distinct conditions for PASC. Further modification of the CCS condition mapping was performed after review by two KPMAS infectious disease physicians. It was determined that some ICD code mappings, for example, anosmia, did not meet expectations, and these were either excluded from the CCS mapping, remapped to another CCS Category, or placed under a CCS Category that we created (Tables 7, 8; excluded diagnoses/CCS categories in Supplementary Tables 2 and 3). This method of manual modifications to CCS Categories has been performed in previous studies23. We abstracted all diagnoses from our EHR and claims systems that occurred within our observation periods and enforced mutual exclusivity time requirements at the CCS Category Level. CCS conditions were only counted once per patient and classified based on when the condition was first recorded in the EHR.

Table 7 CCS categories merged in analysis
Table 8 Specific ICD diagnoses merged in analysis

Distribution Analysis and PASC-related conditions

To determine which CCS conditions provided the signal for PASC diagnosis, distributions were calculated for PCR-positive patients by taking the distinct number of patients with a particular CCS condition over the total number of distinct patient-CCS condition combinations, within each respective time interval. We used an aggregated total distribution percentage, summed across all time periods, of 0.04% as a cutoff for CCS conditions that merited clinical review. The 0.04% cutoff was determined by review of distribution counts for the CCS conditions, whereby it yielded an appropriate number of conditions for risk analysis. From the remaining symptoms and diagnoses, higher frequency conditions were reviewed by two KPMAS infectious disease physicians. Conditions flagged by the infectious disease physicians, based on biologic plausibility and review of the medical literature to the date of initial analysis (April 2021), were then further refined and grouped on clinical similarities and/or a more defined condition classification. For example, genitourinary symptoms were grouped together while CCS conditions related to trauma were removed from the analysis. These condition groupings were again presented to the KPMAS infectious disease physicians, with a final determination made as to which conditions had a plausible biologic association to PASC (Supplementary Table 1). CCS conditions that met our criteria of higher acute and persistent or late distributions, and were deemed clinically significant by the infectious disease physicians, were considered PASC-related conditions (Table 2).
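The distribution screen can be sketched in Python as follows. This is an illustrative reading of the description above rather than the study's code. The record layout, the `conditions_for_review` name, and treating the cutoff as inclusive (at or above 0.04%) are assumptions.

```python
from collections import defaultdict

CUTOFF_PCT = 0.04  # aggregated distribution cutoff stated in the text


def conditions_for_review(records, cutoff_pct=CUTOFF_PCT):
    """Return {ccs: summed distribution %} for categories at or above the cutoff.

    `records` is an iterable of (patient_id, ccs_category, interval) tuples.
    """
    # Distinct patients per (interval, CCS category).
    patients = defaultdict(set)
    for patient_id, ccs, interval in records:
        patients[(interval, ccs)].add(patient_id)

    # Denominator: distinct patient-CCS combinations within each interval.
    totals = defaultdict(int)
    for (interval, _ccs), ids in patients.items():
        totals[interval] += len(ids)

    # Sum each category's percentage across intervals.
    summed = defaultdict(float)
    for (interval, ccs), ids in patients.items():
        summed[ccs] += 100.0 * len(ids) / totals[interval]

    return {ccs: pct for ccs, pct in summed.items() if pct >= cutoff_pct}
```

The screen only narrows the candidate list. In the study, the conditions that passed it were then reviewed by infectious disease physicians, a step that code cannot reproduce.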

Statistical analysis

Demographic characteristics were compared by COVID status. Slight variations in demographic characteristics used for matching (caused by the scaled matching technique) were tested via Cramer’s V to investigate distribution equality in cases and controls. Further distribution analyses were performed on those experiencing at least one PASC-related condition in the acute and persistent and/or late periods. Overall counts, cumulative incidence, and unadjusted risk ratios with 95% confidence intervals (using the Wald Test method) were calculated for each CCS condition category within each time interval. Cumulative incidence was defined as the total number of distinct patients with a particular CCS condition, within each respective time period, over the total number of patients in the observed cohort. Additionally, totals for a patient having at least one CCS condition or at least one PASC-related condition were estimated and stratified by time interval. Data collection and analyses were performed using SAS software (version 9.4; Cary, North Carolina), SQL developer (version 17.3.1) and Tableau (version 2019.1). A p-value <0.05 guided statistical interpretation.
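As a worked illustration of the cumulative incidence and unadjusted risk ratio calculation, the Python sketch below assumes the common log-scale Wald interval; the text names the Wald method but does not give the formula, so this form is an assumption. The cohort sizes come from the Results, whereas the event counts are not reported in the text and are back-calculated here from the rounded late-period percentages for any PASC-related condition, so they are approximate.

```python
import math


def risk_ratio_wald(cases_with, n_cases, controls_with, n_controls, z=1.96):
    """Unadjusted risk ratio with a Wald 95% CI computed on the log scale."""
    ci_cases = cases_with / n_cases            # cumulative incidence, PCR-positive
    ci_controls = controls_with / n_controls   # cumulative incidence, PCR-negative
    rr = ci_cases / ci_controls
    # Standard error of log(RR) for two independent binomial proportions.
    se = math.sqrt(1 / cases_with - 1 / n_cases + 1 / controls_with - 1 / n_controls)
    lower = math.exp(math.log(rr) - z * se)
    upper = math.exp(math.log(rr) + z * se)
    return rr, lower, upper


# Cohort sizes are from the text; the event counts are NOT reported there and
# are back-calculated from the rounded late-period percentages (13.6%, 12.1%).
rr, lower, upper = risk_ratio_wald(3824, 28118, 8505, 70293)
print(f"RR = {rr:.2f} (95% CI: {lower:.2f}, {upper:.2f})")
```

With these approximate counts the estimate lands close to the reported late-period risk ratio for at least one PASC-related condition, which suggests the reported intervals are consistent with a calculation of this kind.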

Sensitivity analysis

To mitigate a potential limiting effect of our CCS selection criteria, we performed a sensitivity analysis on the PASC-related conditions deemed symptoms. CCS condition counts for abdominal pain, anosmia, conditions associated with dizziness/vertigo, malaise and fatigue, nausea and vomiting, and nonspecific chest pain were recalculated by removing the pre-existing conditions diagnosis exclusion requirement for identification of acute and persistent and late diagnoses, thus allowing the presence of these symptoms to be counted irrespective of any pre-existing diagnosis. Statistical analysis was performed in the same manner as our primary analysis.

In addition, to account for the potential effects of the use and/or abuse of corticosteroids on glucose levels, we performed a sensitivity analysis on patients identified as having diabetes mellitus in the acute and persistent or late periods. Dispensed corticosteroid medications during the time periods in question were identified for those patients. A chi-squared test was used to determine if an association was present between case and control patients with a diabetic diagnosis and corticosteroid use.
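For a 2x2 table such as case/control status by corticosteroid use, the chi-squared statistic, its p-value, and Cramer's V can be computed with the standard library alone. The sketch below is a generic illustration, not the study's SAS procedure. It assumes the Pearson statistic without continuity correction, and the function name and table layout are hypothetical.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-squared (no continuity correction), p-value, and Cramer's V.

    Table layout: rows = case / control, columns = corticosteroid yes / no.
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With 1 degree of freedom the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    # For a 2x2 table, min(rows - 1, cols - 1) = 1, so V = sqrt(chi2 / n).
    cramers_v = math.sqrt(chi2 / n)
    return chi2, p_value, cramers_v
```

Because V divides the statistic by the sample size, a very large cohort can yield a small p-value alongside a V near zero, which is the pattern the Results describe as a significant but weak association. For tables larger than 2x2, the denominator becomes n times min(rows - 1, cols - 1).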

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.