Epidemiological characterization of SARS-CoV-2 variants in children over the four COVID-19 waves and correlation with clinical presentation

Since the start of SARS-CoV-2 pandemic, children aged ≤ 12 years have always been defined as underrepresented in terms of SARS-CoV-2 infections’ frequency and severity. By correlating SARS-CoV-2 transmission dynamics with clinical and virological features in 612 SARS-CoV-2 positive patients aged ≤ 12 years, we demonstrated a sizeable circulation of different SARS-CoV-2 lineages over the four pandemic waves in paediatric population, sustained by local transmission chains. Age < 5 years, highest viral load, gamma and delta clades positively influence this local transmission. No correlations between COVID-19 manifestations and lineages or transmission chains are seen, except for a negative correlation between B.1.1.7 and hospitalization.

www.nature.com/scientificreports/ However, due to the modest COVID-19 related symptoms, it cannot be excluded that in the first months of the pandemic, when the diagnosis was mainly symptom-based, a substantial part of SARS-CoV-2 positive children escaped the SARS-CoV-2 screening strategies 15 , thus causing an underestimation of SARS-CoV-2 in this population.
Here, our objectives are to fill gaps in the knowledge of SARS-CoV-2 paediatric epidemic thanks to the viral definition, transmission dynamics and clinical characterization of more than 600 children diagnosed with SARS-CoV-2 infection in a large, scientific paediatric institute for research, hospitalization and healthcare in the Central Italy. With an integrated approach comprising epidemiological, clinical and viral genetic data, this study provides a unique data set to better understand temporal evolutions of SARS-CoV-2 in children and to identify any changes in clinical manifestations from emerged SARS-CoV-2 variants.

Methods
Sample collection and study design. This retrospective observational study originally included 731 SARS-CoV-2-positive nasopharyngeal-swabs, obtained from COVID-19 patients aged ≤ 12 years referred to Bambino Gesù Children Hospital IRCCS (OPBG) during the period of March 05, 2020, to August 31, 2021. The SARS-CoV-2-positive nasopharyngeal-swabs were selected according to the criteria defined in Supplementary  Fig. 1. Only 612 samples were finally selected cause successfully sequenced.
Demographics, epidemiological and clinical data were obtained retrospectively by pseudonymized electronic medical records. The study protocol was approved by local Research Ethics Committee of OPBG (prot. 2384_ OPBG_2021). This study was conducted in accordance with the principles of the 1964 Declaration of Helsinki. Informed consent was waived in accordance with the hospital regulations on observational retrospective studies.
Severity of SARS-CoV-2 infection was defined according to Dong Y, et al., Pediatrics. 2020 16 , and based on the clinical features, laboratory testing, and chest radiograph imaging in our cohort. The following definitions were used: (i) asymptomatic infection defined as children tested SARS-CoV-2 positive after a contact tracing but not developing any clinical symptoms; (ii) mild infection defined as symptoms of upper respiratory tract infection, such as cough, sore throat, runny nose, and sneezing, that may include also fever, fatigue, myalgia, and/or symptoms of gastrointestinal tract infection, defined as vomiting, abdominal pain, nausea or diarrhoea; (iii) moderate/severe infection defined as symptoms of lower respiratory tract infection including clinical signs of bronchitis or pneumonia (fever, cough, dyspnoea, fast breathing) with or without signs of gastrointestinal symptoms. Patients developing dyspnoea and with oxygen saturation < 92% were included in this category.
At the time of diagnosis, SARS-CoV-2 RNA was quantified by a ddPCR home-made protocol targeting three different RNA polymerase RNA dependent (RdRp) regions 17 . Quantitative results were then normalized in number of copies per mL.
Virus amplification and sequencing. Viral RNAs were extracted from nasopharyngeal swabs by using QIAamp Viral RNA Mini Kit, followed by purification with Agencourt RNAClean XP beads. Both the concentration and the quality of all isolated RNA samples were measured and checked with the Nanodrop. Amplicons of whole genome sequences of SARS-CoV-2 were generated with a 50 ng viral RNA template, by using a multiplex approach e.g. CleanPlex SARS-CoV-2 Research and Surveillance Panel, and QIAseq DIRECT SARS-CoV-2 Kit 18,19 according to the manufacturer's protocol. Libraries were then generated using the Nextera DNA Flex library preparation kit with Illumina index adaptors and sequenced on a MiSeq instrument (Illumina, San Diego, CA, USA) with 2 × 150-bp paired-end reads.
Reference-based assembly of the raw data was performed according to 20 . 612 consensus sequences were generated using the GitHub freely distributed software vcf_consensus_builder 21 . All SNPs having a minimum supporting read frequency of 40% with a depth ≥ 10 were retained. Phylogenetic analysis. SARS-CoV-2 lineages of the 612 SARS-CoV-2 consensus sequences obtained were assigned according to the PANGOLIN application (Pangolin https:// pango lin. cog-uk. io/) 22 . In order to describe the relatedness of the paediatric sequences against SARS-CoV-2 diversity, 1233 sequences from GISAID (GISAID sequences) were selected against location and sampling date, and 410 SARS-CoV-2 sequences (local sequences) belonged to adolescent and adult population (> 12 years) living in the same area of the study paediatric population were added.
Sequences were aligned using ClustalX and manually checked using Bioedit. The final alignment had 2255 sequences of 28,655 nucleotides long. Alignment positions showing significant homoplasy were identified by combined approaches in order to account for regions that might potentially be the result of hypervariability or sequencing artifacts. Homoplasies were firstly identified using HomoplasyFinder, and then confirmed by Treetime (homoplasy setting) 23,24 .
In order to explore the phylogenetic structure of the epidemic affecting population aged ≤ 12, a maximum likelihood (ML) [25][26][27] phylogeny tree was firstly performed by using the full alignment composed of GISAID, local population, and paediatric sequences. The ML phylogeny tree was built with IqTree 25 using the best-fit model of nucleotide substitution GTR + I + γ 26 . Tree topology was assessed with the fast-bootstrapping function with 1000 replicates. The ML tree was inspected in TempEst, 27 in order to define the correlation between genetic diversity (root-to-tip divergence) and time of sample collection ( Supplementary Fig. 2). Local clusters of SARS-CoV-2 sequences were defined by an intra-genetic distance < 0.0002 and a bootstrap support ≥ 99.0%.
Bayesian coalescent methods 28,29 were further performed, in order to define the phylogenetic structure of the paediatric epidemic against time, and to confirm potential transmission chains and clusters. A first Bayesian coalescent tree analysis was undertaken with BEAST v1.10.5 28 , using the GTR + I + γ 26 29 using only paediatric sequences. The most informative sequences for virus spread and clustering identified in the first Bayesian tree and in the ML tree were incorporated in a second Bayesian tree interference, in order to yield more robust reconstructions of transmission clusters (defined by a posterior probability support = 1). Four independent chains were run for 50 million states and parameters and trees were sampled every 1,000 states. Upon completion, chains were combined using LogCombiner after removing 10% of states as burn-in and convergence was assessed with Tracer (ESS > 100). Taxon sets were defined and used to estimate the posterior probability of monophyly and the posterior distribution of the tMRCA of observed phylogenetic clusters. iTOL 30 was used to annotate phylogenetic trees with information regarding lineages, clusters, symptoms, hospitalization, and SARS-CoV-2 viral load.
Statistical analysis. Likelihood Ratio Test, followed by a multinomial logistic regression model to estimate 95% confidence intervals of odds ratios, was used to compare demographic and clinical findings between general and selected SARS-CoV-2 infected populations.
Kruskal-Wallis and Chi-squared test for trend were used to estimate significant changes among different lineages and transmission clusters. Mann-Whitney test and Fisher exact test were used to estimate significant changes between in and out cluster sequences.
A multivariable logistic regression analysis was performed to evaluate demographic, virus-related, and clinical factors independently associated with clustering and hospitalization.
Two-sided p-values were always reported. Data were analyzed using Rgui and the statistical software package SPSS (v32.0; SPSS Inc., Chicago, IL).  Table 1 and Supplementary Results. One-hundred and nine samples were excluded due to failed amplification (n = 25) or poor genomic coverage (< 60%, n = 94). The final study population thus consisted of 612 patients, whose whole genome SARS-CoV-2 were successfully sequenced with their demographic and clinical characteristics reported in Table 1.

Patients
Most patients lived in Lazio region (n = 587, 95.9%), and were Caucasian (n = 527, 86.1%). Three hundred and forty-five (56.4%) were male. The median age was 2 (interquartile range [IQR]: 1-6) years. Two hundred and fifteen (35.1%) patients were under-one year of age. At the time of testing, mild infections were the most prevalent (436 cases, 82.3%), followed by moderate/severe infections (51, 9.7%). Only the 7.1% of patients, identified as a contact of a household case, was asymptomatic. One-hundred and seven patients required hospitalization (19.9%). Four patients, 2 of them aged < 1 year, manifested a severe disease 16 and required oxygen ventilation in the critical care unit. No deaths were reported.
Median (IQR) SARS-CoV-2 nasopharyngeal load was 7.7 (6.1-8.5) log copies/mL. The 62.4% of samples had a SARS-CoV-2 viral load > 7.0 log copies/mL. Viral load was slightly (but significantly) higher in patients aged ≤ 1 year compared to viral load in patients aged 1-5 and ≥ 5 years By considering the timing of diagnosis, the 53.4% of paediatric SARS-CoV-2 infections (n = 327) were collected during no-restriction periods (white zone), and the 6.2% (n = 38) during lockdown (red zone). The remaining diagnosis were performed during light restriction periods (yellow and orange zone).  Table 1.

Demographic characteristics
Age, years: The B/B.1/B1.1 lineages characterizing the initial months of the SARS-CoV-2 pandemic were present only in 30 individuals aged 5 (1-8) years diagnosed between March and October 2020. This low number of B-related infections could be explained by the low number of children diagnosed as SARS-CoV-2 positive during the early phase of the pandemic (Supplementary Fig. 3).
Other lineages were detected in 28 patients and involved among others the B.1.160 (n = 14), the B.1.525 (n = 2), and the variant of concern B.1.351 (n = 1). Of note, the B.1.160 (20A.EU2) was found in all European recipients, 5 of them belonging to Southern East Europe. This lineage became common in Europe after summer 2020 and was characterized by the S: G22992A(S477N), known to strengthen the binding of the SARS-COV-2 spike with the human ACE2 receptor 32 .
The composition of sequences did not change substantially from SARS-CoV-2 sequences retrieved from general population of the same geographical origin and from GISAID (Fig. 1). Concordant with the lineages' modification over time, the genetic pairwise distance of the 612 sequences indicated that the SARS-CoV-2 sequences evolved progressively during time (rho = 0.682, Supplementary Fig. 2).
No significant association was found between lineages and COVID-19 clinical presentation, even if a low number of moderate/severe manifestations was found in presence of B.1.1.7 lineage (Table 1 and Fig. 2A). www.nature.com/scientificreports/ Evidence of local transmission clusters. By looking at the time-scale phylogeny, it was possible to identify clear transmission chains and clusters, and to clarify the dynamic of viral lineages in the paediatric population (Fig. 2B). The characteristics of the local clusters, with insights for clusters composed of ≥ 10 sequences were reported in Supplementary Table 2.
Overall, 129 sequences (21.1% of total paediatric sequences) were found in clusters, six of them composed of ≥ 10 sequences.
Lineage B.1.1 was characterized by limited local transmission, probably due to the low number of SARS-CoV-2 diagnoses in children during the first months of the pandemic. The only one local cluster, cluster B9, was characterized by a posterior probability of 1.00 and a tMRCA dated June, 8 2020 (May, 24-June, 13). It was composed of 6 sequences from children almost exclusively aged less than 2 years (except for one 4-year-old child). Five out of six children had a SARS-CoV-2 load in nasopharyngeal swabs > 7.0 log copies/mL. All children lived in Rome, but two of them had a Nigerian origin. Of note, a sequence diagnosed in South Africa on March 31 2020, is at the origin of this cluster, confirming the probable foreign origin of this cluster. Two children experienced a SARS-CoV-2 related pneumonia. All cluster B9 sequences were characterized by the NSP3:synC7639T.
Two transmission chains probably starting between January and February 2021 were detected within P.1.1 lineage (Cluster γ5 and γ30, Fig. 2B and Supplementary Table 2). While cluster γ30 contains only 4 paediatric sequences intermixed with adult and global SARS-CoV-2 sequences, cluster γ5 involved a tracing network of 18 individuals, 12 of them (66.7%) below 12-years of age. All patients were diagnosed in Rome, but four of them had foreign origin. The earliest and most closely related strain was a sequence from a 4-months old child of Southern East Europe origin collected in late April 2021 in Rome, confirming a multi-seeded transmission. In line with this, the most recent ancestor of this cluster dates to February 12, 2021 (Feb 8 -Feb 24). All sequences were characterized by the RdRp:C13720T(P85S), NSP2:C2445T(T547I) and NSP4:synC9565T.
Delta clade was characterized by 4 local clusters, involving 39 sequences (28.1% of delta clade sequences). The high prevalence of delta sequences in local clusters suggests the sustained circulation in the paediatric population of this clade since its emergence. Among these 4 clusters, δ26 (posterior probability = 1.00) was composed of 37 sequences, 14 of them (37.8%) paediatrics and with a SARS-CoV-2 load in nasopharyngeal swabs > 7.0 log copies/mL in 12/14 patients (Supplementary Table 2 Table 2). The remaining individuals belonged to adolescent (n = 2, aged 15 and 18 years) and adult population (n = 11, age range: 20-85 years). Sequences were characterized by the RdRp:synC16111T, the NSP13:G17671A (V479I) and the S:G23593C(G677H), reported as recurrent arising independently in many SARS-CoV-2 lineages circulating worldwide by the end of 2020, and known to enhance viral infectivity and neutralizing antibodies resistance 36,37 .
Univariate and multivariate logistic regression models were performed to identify potential factors associated with local clusters ( Correlation with hospitalization. Univariate and multivariate logistic regression models were performed to define if lineages or clusters can be potentially associated with hospitalization ( Table 3). As confounding factors gender, age, nationality, residency, SARS-CoV-2 viral load, symptoms at diagnosis, and comorbidities were considered. The results showed that in our paediatric population, lineages and clusters were not associated with hospitalization, with the exception for B.1.1.7 lineage, significantly and negatively associated with hospitalization (AOR: 0.31 [0.13-0.71], P = 0.006). This lineage was also characterized by the lowest (even if not significant) number of COVID-19 moderate or severe manifestations (n = 4, 4.0%) compared with other SARS-CoV-2 lineages ( Table 1). As expected, patients aged < 1, patients with comorbidities, and moderate/severe COVID-19 were more frequently associated with hospitalization (P < 0.0001, Table 3).

Discussion
These data, based on the genetic characterization of the SARS-CoV-2 strains circulating in paediatric population, implemented with demographics and clinical information, indicate that at least four lineages widely circulated in paediatric population over the four COVID-19 waves. Children were also part of SARS-CoV-2 community transmission, as supported by the evidence of 6 transmission chains composed of more than 10 paediatric SARS-CoV-2 sequences and characterized by fixed synonymous and non-synonymous substitutions. The 21.1% of SARS-CoV-2 sequences were indeed involved in local clusters, and 64.3% of them takes part in 6 large transmission chains.
While transmission dynamics were deeply investigated in general population, helping in defining the geographical mapping, tracking and evolution of SARS-CoV-2 20,31,34,38 , limited evidence relating to SARS-CoV-2 transmission dynamics and correlation with demographic and clinical characteristics are available so far in paediatric population. The relationship between age, viral load, lineages and transmission trajectories across  www.nature.com/scientificreports/ the full symptom spectrum of SARS-CoV-2 infection has not been comprehensively investigated. The role of different ages in transmitting the virus is also uncertain. In our study, the SARS-CoV-2 positivity rate in children aged ≤ 12 years increased from 0.6% between March and July, 2020 to 5.2% between August and December 2020 remaining almost stable since August 2021. This positivity ranges should be interpreted in the context of circulating variants. In the early phase of the pandemic, the low number of children diagnosed as SARS-CoV-2 positive 39 were infected by B/B.1/B1.1 lineages. These lineages were still characterized by local transmission, as described by the B9 cluster, emerging in late May/early June 2020, and composed of 6 sequences from children exclusively aged less than 5 years and with a SARS-CoV-2 load in nasopharynx > 7.0 log copies/mL. The presence of a local transmission cluster composed of children just three months after the start of Italian SARS-CoV-2 epidemic 40 can be interpreted as the first evidence of the active role of children in the community SARS-CoV-2 transmission dynamics. Lineage B.1.177 was found to predominate between October 2020 and January 2021 and accounted for the 20.8% of all SARS-CoV-2 sequences analyzed in this study. Sequences of B.1.177 lineage belonged to 20E (EU1) clade 31 , whose earliest sequences were found in samples collected on June 20, 2020, in Spain and in the Netherlands. By the end of August, 20E (EU1) sequences had also been detected in most of European countries, including Italy 31 . The 17.4% of 20E sequences circulating in children were also involved in local transmission clusters, one of them composed of 24 sequences characterized by 5 SNPs. Some of these SNPs were previously described in Northern Europe 41 , and as of 3 December 2021 were not related with transmission or pathogenicity alterations. Among the variants rapidly causing concerns, P.1 (gamma clade), B.1.1.7 (alpha clade), and B.1.617 (delta clade) raised in paediatric population between March and July 2021. All these variants were clearly characterized by an increased transmissibility compared with the previous wildtype lineages 34,42,43 . In line with this, B.1.1.7 and B.1.617.2 + AY were the most prevalent variants in our population after the B.1.177 (20.8% and 22.7% of the 612 SARS-CoV-2 sequences, respectively). Transmission clusters with evidence of expanded evolution were detected in both alpha and delta clades. While clusters of alpha clade involved 17.3% of total B.1.1.7 sequences and did not report evidences of genetic alterations able to increase transmission or pathogenicity, transmission chains of delta clade involved the 28.1% of delta sequences. Thirteen of them reported the Q677H mutation known to enhance viral infectivity and neutralizing antibodies resistance 36,37 . The rapid mechanism of delta clade adaptation, and its onward transmission in local paediatric population might explain its outcompeting and dominance respect pre-and co-existing lineages as P.1 and B.1.1.7, as previously observed in adult population 44    www.nature.com/scientificreports/ Differently by alpha and delta clade, a limited spread of gamma clade was observed. Only 35 sequences (5.7% of the whole population here analysed) mainly concentrated between March and May 2021 were indeed identified. Of note, 22 of these 35 sequences were involved in probable transmission patterns. The limited spread and the closely relatedness of the P.1 variants detected in paediatric population might reflect the appropriate control strategy against this variant at local level. The P.1 limited spread in paediatric population can also reflect the rapid dominance of delta clade respect to this co-existing lineage, as previously stated.
Looking at factors associated with local transmission clusters, multivariate logistic regression model identified both gamma and delta clades as positively associated with transmission chains, implying that local transmission dynamics were the key drivers of the spread of these lineages in the paediatric population. Multivariate model also defines that patients aged less than 5 years were more frequently found in clusters respect to patients aged > 5 years (Table 2). Patients aged < 5 years were also characterized by a higher SARS-CoV-2 viral load in the nasopharynx respect to older paediatric population (SARS-CoV-2 log copies/mL: 8.3 [6.3-8.6] in > 5-yearold patients vs. 7.7 [6.2-8.4] in 1-5-year-old patients vs. 7.1 [6.0-8.3] in < 1-year-old patients. P < 0.0001), thus supporting their role as common SARS-CoV-2 transmitters. Thus, our study demonstrates that the role of children in SARS-CoV-2 community transmission should not be underestimated and strengthen the vaccine EMA recommendation approval of SARS-CoV-2 vaccine for children aged 5-11 45 . Vaccination optimization models including paediatric population is worthwhile to reduce the burden of hospitalization and risk of severe manifestations in this young category of patients, but may also increase the chance of reducing SARS-CoV-2 circulation, and thus the risk of new spreading variants. Individuals aged < 5 years, mainly associated with local transmission clusters, will be the latest group to benefit of SARS-CoV-2 vaccination. Clinical trials are to date ongoing to determine the optimal dose of vaccines to protect this group of children while minimizing potential adverse events. These findings advocate for a rapid finalization of vaccine in this age group. Meanwhile, the continued molecular surveillance in this unvaccinated population remains crucial.
The effort to implement with demographic and clinical data the 612 SARS-CoV-2 sequences obtained by paediatric individuals made it possible to estimate the role of SARS-CoV-2 variability in transmission dynamic, SARS-CoV-2 presentation and hospitalization. According to other reports 46 most of the paediatric patients presented with mild disease (83.2%), while moderate conditions accounted for the 9.7% of cases. No significant association was found between lineages and COVID-19 presentation, even if a lower number of moderate/severe cases were found during alpha-clade epidemic (Table 1). In line with this, patients involved in clusters of B.1.1.7 frequently reported only mild symptoms (moderate/sever infection was detected only in one in-cluster case), and no new amino acid mutation was detected in spike or nucleocapsid proteins, both implicated in SARS-CoV-2 virulence and pathogenicity 32,33,36,37 . B.1.1.7 variant was also found negatively associated with hospitalization in our multivariate model (Table 3). These results confirmed evidence observed in adult population, affirming that no change in symptoms or their duration were associated with alpha clade 47,48 .
We acknowledge some limitations to this study. To warrant high quality sequences and good genomic coverage, only samples with Ct values ≤ 29 were selected (see Supplementary Results). In order to exclude sampling bias that could affect viral diversity driven by this selection, the characteristics of the general paediatric SARS-CoV-2 affected population were compared with those of sampled population and no major differences were highlighted (Supplementary Table 1). Limitations to the assessments of the proportions of asymptomatic cases should be noted: most of our paediatric population get tested only when children have symptoms, so relatively few asymptomatic infections are recorded. Our study doesn't assess the on-going clinical symptoms neither their persistence after SARS-CoV-2 clearance. The paediatric nature of this study, and the paediatric Hospital in which sequences were collected, have limited the possibility to include adult patients with direct epidemiological linkage with the children here described, thus affecting the possibility to define household transmission trajectories. This limitation can also represent a strength, because the presence of adult individuals not sharing contacts with paediatric patients has confirmed the role of younger population in SARS-CoV-2 transmission at community level.
In conclusion, we report unique and updated information on the temporal trends of SARS-CoV-2 variants circulating in paediatric population, and their association with demographic and clinical manifestations. The data provided increase knowledge of SARS-CoV-2 transmission dynamic and the role of children in the community transmission. Transmission chains involved children mainly aged < 5 years and were detected in all phases of the pandemic. The frequent copresence of sequences from adults with no epidemiological link according to demographic data suggests both the superimposable distribution of paediatric and global SARS-CoV-2 epidemics, and the active role of paediatric population in SARS-CoV-2 community transmission. Even if children and adolescents usually demonstrate fewer and milder symptoms of SARS-CoV-2 infection compared to adults and are less likely than adults to experience severe COVID-19, continued molecular surveillance in this unvaccinated population will be essential to prevent hospitalization risk, COVID-19 related sequelae, SARS-CoV-2 transmission and spread, and thus new variants' selection.

Data availability
The 1022 SARS-CoV-2 sequences obtained in this study are openly available on GISAID portal and European Nucleotide Archive under the accession numbers EPI_ISL_7671548-EPI_ISL_7694367 and ERS8842923-ERS8843944, respectively. The list of accession numbers is available in the Supplementary Data 3. The list of the whole-genome SARS-CoV-2 sequences (n = 3244) retrieved from GISAID (gisaid.org) on 18 September 2021 and their corresponding accession numbers and lineages are available in the Supplementary Data 1. The list of the significant homoplastic positions identified by HomoplasyFinder and TreeTime are available in the Supplementary Data 2. Information regarding SARS-CoV-2 incidence rate in population aged ≤ 14 was made available on ECDC web page (https:// www. ecdc. europa. eu/ en/ publi catio ns-data/ covid-19-data-14-day-age-notifi cati on-rate-new-cases). The study protocol was approved by local Research Ethics Committee of OPBG (prot. 2384_OPBG_2021). This