A retrospective cohort study of 12,306 pediatric COVID-19 patients in the United States

Children and adolescents account for ~ 13% of total COVID-19 cases in the United States. However, little is known about the nature of the illness in children. The reopening of schools underlines the importance of understanding the epidemiology of pediatric COVID-19 infections. We sought to assess the clinical characteristics and outcomes in pediatric COVID-19 patients. We conducted a retrospective cross-sectional analysis of pediatric patients diagnosed with COVID-19 from healthcare organizations in the United States. The study outcomes (hospitalization, mechanical ventilation, critical care) were assessed using logistic regression. The subgroups of sex and race were compared after propensity score matching. Among 12,306 children with lab-confirmed COVID-19, 16.5% presented with respiratory symptoms (cough, dyspnea), 13.9% had gastrointestinal symptoms (nausea, vomiting, diarrhea, abdominal pain), 8.1% had dermatological symptoms (rash), 4.8% had neurological symptoms (headache), and 18.8% had other non-specific symptoms (fever, malaise, myalgia, arthralgia and disturbances of smell or taste). In the study cohort, the hospitalization frequency was 5.3%, with 17.6% needing critical care services and 4.1% requiring mechanical ventilation. Following propensity score matching, the risk of all outcomes was similar between males and females. Following propensity score matching, the risk of hospitalization was greater in non-Hispanic Black (RR 1.97 [95% CI 1.49–2.61]) and Hispanic children (RR 1.31 [95% CI 1.03–1.78]) compared with non-Hispanic Whites. In the pediatric population infected with COVID-19, a substantial proportion were hospitalized due to the illness and developed adverse clinical outcomes.

Over 4.2 million children in the United States have tested positive for coronavirus disease-2019 (COVID-19) since the onset of the pandemic 1,2 . In comparison with adults, preliminary reports suggest that children (< 18 years of age) have relatively lower odds of adverse clinical outcomes associated with COVID-19 [3][4][5][6][7][8] . The lower observed prevalence of COVID-19 in the pediatric age-group worldwide is partially attributed to widespread school closures in response to the pandemic 7,9,10 . Furthermore, challenges in the adequate screening and testing of children, especially those who are asymptomatic or minimally symptomatic, may have also contributed to the underreporting of COVID-19 in children. The cautious reopening of schools in the United States and other countries has occurred against the backdrop of an increased possibility of community transmission of COVID-19 among children in schools 7,8,10 . Thus, it is important to characterize the demographics, clinical characteristics, and outcomes of children infected with COVID-19. There are limited data, especially from the United States, describing the demographics, clinical characteristics, and outcomes of children with lab-confirmed COVID-19 [3][4][5][6][7][8] . We present the findings of an investigation evaluating the clinical characteristics, comorbidities, and complications in 12,306 patients with lab-confirmed COVID-19 from a multicenter federated healthcare network electronic health record database.

Discussion
In this study of children with COVID-19, we observed a high prevalence of non-specific symptoms at presentation, with frequent multi-organ involvement. In our study cohort, ~ 5% were hospitalized, and among those who were hospitalized, ~ 18% required critical care, and ~ 4% needed mechanical ventilation. The clinical outcomes were similar across subgroups of sex. Non-Hispanic Black and Hispanic children with COVID-19 had a higher risk of hospitalization when compared with non-Hispanic White children. The temporal trend of the cases and hospitalization was similar to the nationwide population trends for COVID-19 cases. We confirm the findings of prior reports from smaller populations describing the relatively milder clinical course and a relatively lower incidence of adverse clinical outcomes among children compared to adults 8,[11][12][13][14][15] . The findings from our reports describe the wide spectrum of illness seen in children with COVID-19 across the demographic and age subgroups. The recognition of these clinical characteristics is important for the early identification and care of children with COVID-19. We observed a higher frequency of hospitalization in non-Hispanic Blacks and Hispanics. This is concordant with the recent findings reported by the Centers for Disease Control and Prevention (CDC) from the hospitals participating in the COVID-NET database 16 . The observed racial differences may be due to greater indirect viral exposure to children from racial/ethnic minorities due to the various socioeconomic impediments to implementation of infection control measures [17][18][19][20][21][22][23][24] . These racial disparities have been previously noted during the H1N1 pandemic 23,25,26 . These racial differences in COVID-19 burden are also evident in the current data from the pediatric intensive care units in the United States 27 . 
The trends in COVID-19 cases and hospitalizations in our database were similar to those reported by larger tracking efforts such as the COVKID project and the CDC's COVID-NET 2,28,29 .
We noted that the majority of our study population were not recorded to have typical symptoms (i.e., fever, cough, or dyspnea). Prior investigations in different populations, in diverse settings, and with varying age distributions have reported that up to 40-70% of pediatric patients may present with fever and respiratory symptoms 31,32 . In contrast, some investigations have also reported a high prevalence (up to ~ 50%) of asymptomatic or mild COVID-19 infections in children 33,34 . A previous study in the United States 32 had noted that testing indications were unclear in nearly 50% of the children, which may contribute to the low prevalence of typical symptoms at the time of diagnosis. There may also be additional factors contributing to COVID-19 testing among children, such as exposure to SARS-CoV-2 infected individuals, parental vigilance, and asymptomatic screening for travel or surgery 8,30,[33][34][35][36][37][38][39] . However, similar to our findings, nearly all prior studies have found a relatively high proportion of non-specific signs and symptoms prompting testing among pediatric COVID-19 patients, including lethargy, malaise, myalgia, sore throat, runny nose, sneezing, gastrointestinal symptoms, and fatigue 8,[30][31][32][33][34][35][36][37][38][39] .
The relatively lower rates of typical symptoms noted in our study compared with other studies may also be due to the incomplete reporting of symptoms in the electronic health records, difficulty in eliciting symptomology in pre-verbal pediatric patients, relatively higher proportion of non-typical symptoms, geographic differences in the extent of spread of COVID-19 40 , and differences in the local screening and testing approaches. Further evaluation of the clinical presentation of COVID-19 among pediatric populations is needed to adequately target our screening and testing approaches. The age-related differences in COVID-19 population prevalence 41 and associated clinical outcomes may be a result of multiple factors 3 . Poor clinical outcomes among adult COVID-19 patients are associated with a higher comorbidity burden [42][43][44][45] . Children usually do not exhibit multiple comorbidities till a later age, and this may contribute to the lower rates of adverse clinical outcomes in children compared with adults. Pre-existing empiric immunity as a result of frequent seasonal human coronavirus infections has also been hypothesized to contribute to the lower SARS-CoV-2 infection rate among children and adolescents compared with adults 46 . In the absence of patient-level data about prior infections, the role of empiric immunity in the prevention of infection and its clinical manifestation requires further investigation. There may be other factors that may contribute to the observed lower risk in children compared with adults, such as age-dependent expression of ACE2 receptor (SARS-CoV-2 binding receptor) and androgen levels [47][48][49] . We also observed relatively lower neutrophil levels in the hospitalized pediatric COVID-19 patients indicating a role of age-related neutrophil recruitment in the mild manifestation of the illness in children, as reported previously 50 .
The lower disease prevalence and severity in children may be due to both having lower susceptibility to COVID-19 infection and a lower likelihood of showing symptoms 3,4 . Our findings have public health implications. The initial risk of infection transmission among children may have been limited to some extent by the early closure of schools, colleges, and universities 9,10 . The public health impact of the school closure and reopening is not completely understood 9,10 . While the presented data do not capture the potential public health impact of the various measures, these data may help to address the lacunae of clinical evidence around the patient characteristics of children with COVID-19. We observed a higher prevalence of comorbidities among those who were hospitalized following COVID-19. Identifying children who are at greater risk of complications 51-53 may serve to create tailored strategies to screen them aggressively. Overall, our findings suggest that children and adolescents may have a milder course of illness compared with adults with COVID-19 54 . Given the high prevalence of non-specific signs and symptoms and the fact that the majority of the patients lacked typical 55 symptoms in our investigation, increased vigilance, innovative screening, and frequent testing are required among school-going children and their immediate contacts. Routine screening tools and procedures such as daily temperature checks in school may be less effective. Our study findings may guide the resource utilization and mitigation efforts by local and federal health authorities, especially in areas with high COVID-19 incidence and prevalence.
Innovative approaches, such as sentinel surveillance, random testing of children and the teachers, prioritizing children from high-risk households for COVID-19 testing, and providing education and training on the appropriate use of non-invasive pulse oximeters, may yield additional benefits and help mitigate the spread of COVID-19 among children. Implementation of these strategies may need to be enhanced among children from racial/ethnic minorities to curtail the existing COVID-19 related health disparities.
There are several limitations to our study. The patient exposure and outcomes are defined using administrative codes, which may be subject to coding errors 56,57 . Similarly, it is difficult to parse out the severity of the clinical outcomes [58][59][60][61][62][63] . Importantly, there may be significant selection bias in the children who were tested based on the indication for obtaining the tests, availability of tests, and access to testing locations and hence the eventual inclusion of the children in the study population. There were also periods early in the pandemic where testing was primarily advised for children whose clinical symptoms were thought to represent a high likelihood of COVID-19. Additionally, the information presented in our investigation is accrued from the structured data recorded in the electronic documentation. Thus, the data may be subject to inaccuracies and incomplete reporting. This may be especially important in the adequate documentation of clinical symptoms in the patient records. Furthermore, laboratory markers may not have been collected in all patients, and there may be an indication bias in the reporting of those results. Additionally, the ability to elicit symptomology is naturally limited by the nature of pediatric medicine. There may also be under-reporting of comorbidities in the administrative datasets 56,57,[64][65][66] . We included all patients with race/ethnicity data available. While this may contribute to some degree of selection bias, it allows for accurate assessment of race-stratified outcomes. We used all-cause hospitalization as the study outcome, which makes it difficult to ascertain whether the hospitalization was due to COVID-19 or another cause. Due to lack of availability of the raw dataset, we were unable to compute the time to hospitalization and the duration of hospitalization from these data. 
Specific manifestations, such as the multi-systemic inflammatory syndrome 12 in COVID-19, may not have a uniform description in electronic documentation. Thus, we could not evaluate the prevalence of this important sequela of COVID-19. Our study is also limited by its inability to assess the transmission potential of the diagnosed patients. Due to the obfuscation of counts of less than 10 for privacy concerns, we are unable to report the exact number of deaths in the overall population and subgroups.
In summary, children infected with COVID-19 present with a broad spectrum of non-specific symptoms across the age groups. Children with COVID-19 can develop severe illness requiring hospitalization and critical care, but the rates of severe illness and death are relatively low.

Methods
Data source. The TriNetX (Cambridge, MA) COVID-19 Research Network database was used for this study [58][59][60][61][62][63] . This is a federated health research network database that integrates the electronic health records of the participating healthcare organizations and includes nearly 59 million patients. The research network database integrates cloud-based, HIPAA-compliant, real-time aggregate patient-level data from the electronic health records, including diagnoses, procedures, medication use, and clinical laboratory values from the contributing organizations [58][59][60][61][62][63] . The participating organizations contribute data from inpatient, outpatient, and specialty services. The TriNetX database integrates data from all the participating organizations after clearance through local data warehouses and research data repositories. To ensure patient privacy, the stored and transmitted data are de-identified at the patient and organization levels. Structured data recorded in the electronic health records are assimilated into the database after mapping the data to standard and controlled clinical terms. A rigorous data quality assessment is done to exclude records that do not meet quality standards and basic formatting requirements for adequate data representation 63 . Referential integrity is maintained to allow comparison of data across several databases. Moreover, TriNetX software also ensures data validity by regularly monitoring the temporal trend of data volume [58][59][60][61]63 .
Measures and outcomes. We identified the baseline patient characteristics, including past medical history, presenting symptoms, medications, and lab parameters. In the hospitalized cohort, clinical features and laboratory values were identified from the 1 month preceding the index event, up to and including the index event.
Typical symptoms were defined as having any of the three symptoms: fever, cough, and shortness of breath, as defined by the CDC 55 . The lab-confirmed diagnosis of COVID-19 was defined as the index event.
The standardized ICD-10 diagnosis codes were used to identify the history of existing medical conditions. The main study outcome was the frequency of all-cause hospitalization within 30-days of testing positive in children with COVID-19 and in the abovementioned sub-groups. The additional study outcomes included mechanical ventilation and the requirement for critical care. The administrative diagnosis and procedural codes were used for the identification of the aforementioned study outcomes (Supplementary Tables 4-6).
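As an illustration of the outcome definition above, the following minimal sketch flags all-cause hospitalization within 30 days of the index event and computes its frequency in a cohort. The function names, the record layout, and the choice to treat both the index day and day 30 as inside the window are assumptions made for this example; they are not taken from the TriNetX platform, and the dates are invented.

```python
from datetime import date, timedelta


def hospitalized_within_window(index_date, admission_dates, window_days=30):
    """Return True if any admission falls in [index_date, index_date + window_days].

    Assumption: the window is inclusive at both ends, and admissions that
    precede the index event do not count toward the outcome.
    """
    window_end = index_date + timedelta(days=window_days)
    return any(index_date <= d <= window_end for d in admission_dates)


def hospitalization_frequency(patients, window_days=30):
    """Percentage of patients with at least one qualifying admission.

    `patients` is a hypothetical list of (index_date, admission_dates) tuples.
    """
    if not patients:
        raise ValueError("cohort is empty")
    hits = sum(hospitalized_within_window(i, a, window_days) for i, a in patients)
    return 100.0 * hits / len(patients)


# Toy cohort with invented dates, for illustration only.
cohort = [
    (date(2020, 7, 1), [date(2020, 7, 15)]),  # admitted on day 14: counted
    (date(2020, 7, 1), [date(2020, 8, 15)]),  # admitted on day 45: not counted
    (date(2020, 7, 1), [date(2020, 6, 20)]),  # admitted before index: not counted
    (date(2020, 7, 1), []),                   # never admitted
]
print(hospitalization_frequency(cohort))
```

Because the outcome is all-cause, the sketch deliberately ignores the admission diagnosis; restricting to COVID-19-related admissions would require additional coded data.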

Statistical analyses.
We summarized the baseline characteristics as mean ± standard deviation for continuous data and as numbers and percentages for categorical data. Continuous data were compared using the independent-sample t-test, and categorical data were compared using the z-score. The study outcomes were compared in age and sex subgroups. All primary and secondary outcomes were reported in the overall population and the sub-groups of age, sex, and race/ethnicity. A propensity score was obtained for each patient by logistic regression, implemented with the LogisticRegression function of the scikit-learn package in Python version 3.7 63,68 . The output was verified by repeating the propensity scoring in R version 3.4.4. Subsequently, 1:1 matching was done using greedy nearest neighbor matching with a caliper of 0.1 pooled standard deviations 63,69 . The populations were propensity score-matched for age, sex, race, and comorbidities (cardiovascular, respiratory, gastrointestinal, malignancy, metabolic, hematological or immunological, neurological and neuromuscular, congenital or genetic defects, renal or urological) (Supplementary Table 6) 70 . To protect against inadvertent disclosure of protected health information, patient counts for demographics, clinical characteristics, and outcomes of less than 10 are reported as ≤ 10. We report the comparative risk of the study outcomes as risk ratios with 95% confidence intervals. A two-tailed type I error rate of 0.05 was used as the threshold for statistical significance. The cloud-based TriNetX analytics platform, which utilizes a combination of Java, R, and Python, was used for all analyses [58][59][60][61]63 .
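The matching and risk-ratio steps described above can be sketched as follows. This is a minimal standard-library illustration, not the TriNetX implementation: it assumes that propensity scores have already been estimated (for example by logistic regression), applies the caliper to the raw score rather than its logit, processes patients of the first group in input order, and uses the log-transform (Katz) method for the confidence interval. None of these choices is specified in the text, and all identifiers and numbers are hypothetical.

```python
import math
from statistics import variance


def pooled_sd(scores_a, scores_b):
    """Pooled standard deviation of propensity scores across two groups."""
    return math.sqrt((variance(scores_a) + variance(scores_b)) / 2)


def greedy_match(group_a, group_b, caliper_sd=0.1):
    """Greedy 1:1 nearest-neighbor matching without replacement.

    group_a, group_b: dicts mapping patient id -> propensity score.
    A pair is kept only if the score difference is within
    caliper_sd * pooled SD (assumed here to be on the raw score scale).
    """
    width = caliper_sd * pooled_sd(list(group_a.values()), list(group_b.values()))
    available = dict(group_b)
    pairs = []
    for a_id, a_score in group_a.items():
        if not available:
            break
        # Closest still-unmatched patient in the comparison group.
        b_id = min(available, key=lambda k: abs(available[k] - a_score))
        if abs(available[b_id] - a_score) <= width:
            pairs.append((a_id, b_id))
            del available[b_id]
    return pairs


def risk_ratio(events_a, n_a, events_b, n_b, z=1.96):
    """Risk ratio of group A versus group B with a log-method 95% CI."""
    rr = (events_a / n_a) / (events_b / n_b)
    se = math.sqrt(1 / events_a - 1 / n_a + 1 / events_b - 1 / n_b)
    return rr, math.exp(math.log(rr) - z * se), math.exp(math.log(rr) + z * se)


# Invented propensity scores, for illustration only.
group_a = {"a1": 0.30, "a2": 0.60, "a3": 0.95}
group_b = {"b1": 0.31, "b2": 0.58, "b3": 0.10, "b4": 0.33}
pairs = greedy_match(group_a, group_b)
print(pairs)  # a3 has no comparison patient inside the caliper

# Invented outcome counts in two matched groups of 100 patients each.
rr, lower, upper = risk_ratio(20, 100, 10, 100)
print(f"RR {rr:.2f} [95% CI {lower:.2f}-{upper:.2f}]")
```

Greedy matching depends on the order in which patients are processed, so a different ordering can yield a different matched set; the sketch keeps input order for simplicity.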

Data availability
The data from the TriNetX COVID-19 Research database are available to member healthcare organizations through the online cloud-based TriNetX research platform available at https://www.trinetx.com/. The aggregate patient-level data are integrated from the electronic health records of the member healthcare organizations, with data available for download by request at the participating institutions.