Introduction

The International Committee on Taxonomy of Viruses has divided the family Adenoviridae into 5 genera: Atadenovirus, Aviadenovirus, Mastadenovirus, Siadenovirus, and Ichtadenovirus. In the more than 6 decades since the first characterization of human adenoviruses (HAdVs)1, all HAdVs have fallen within the genus Mastadenovirus. HAdVs are classified into 7 groups (HAdV-A to HAdV-G) comprising 52 serotypes and 90 genotypes2,3,4,5,6, as recognized by the Human Adenovirus Working Group (July 2018 update; http://hadvwg.gmu.edu/). Over the past 30 years, recombinant adenovirus vectors based on the HAdV-C group have also been developed and used extensively in preclinical and clinical studies7. Among HAdVs, members of the HAdV-B group (types 3, 7, 11, 14, 16, 21, 34, 35, 50 and 55) and the HAdV-C group (types 1, 2, 5, 6 and 57) typically cause a variety of acute respiratory diseases. In particular, the HAdV-C group can cause severe bronchiolitis or pneumonia in early childhood8,9.

Three recombinant HAdV-C strains (BJ04, BJ09 and CBJ113), classified within HAdV-2 (P1H2F2), were isolated from infants with acute respiratory infection in Beijing in 2009–2013 by laboratories of the China CDC10,11. Of the three strains, CBJ113 was characterized by recombination among HAdV-2, HAdV-6, HAdV-1, HAdV-5 and HAdV-57 sequences. The BJ04 recombination event involved the parental strains HAdV-1 and HAdV-2, whereas the BJ09 event involved HAdV-1, HAdV-5 and CBJ113. HAdV-1 has therefore been involved in the recombination of other HAdV-C types.

Since surveillance began in 2014, the National Adenovirus Type Reporting System (NATRS) of the United States has described trends in the HAdV-C group circulating in the United States8. NATRS also reported that HAdV-1 was identified as the pathogen responsible for an outbreak8. Here, we describe the characterization of a novel type of HAdV-1 isolated from a hospitalized infant with severe acute respiratory infection (SARI). Whole-genome sequence analysis indicated possible intraspecies recombination within the HAdV-C group. To better understand this phenomenon, we determined and analyzed the whole-genome sequence of HAdV-1 strain SH2016.

Results

Isolation and complete genomic characterization of the novel HAdV-C type

Throat swabs that were positive for HAdV and negative for other viral nucleic acids were used for viral isolation. The isolated strain caused a visible cytopathic effect (CPE) in culture. It was archived as strain “human/China/SH/2016/1[P1H1F1]”, referred to here as strain SH2016. Using next-generation and Sanger sequencing, the full-length genomic sequence of strain SH2016 was determined and deposited in GenBank (accession number: MH183293). The genome length of SH2016 (35,946 bp) was similar to that of the prototype strain of human adenovirus type 1 (AC_000017, 36,001 bp). The G + C content of the genome was 55.2%, similar to that of other members of species C adenoviruses1, and the plus strand had an overall base composition of 23.23% A, 27.99% C, 27.21% G, and 21.57% T. As in the genome of the prototype HAdV-1 reference strain (AC_000017), 40 coding sequences (Table 1) and 35 non-coding motifs (Table 2) were recognized. Whole-genome phylogenetic analysis of 43 archived complete HAdV genomes from GenBank showed that strain SH2016 clustered within HAdV-1 but formed an independent branch with human/EGY/E13/2001/1[P1H1F1]12 (Fig. 1A). Phylogenetic analysis of the 3 major antigen genes of SH2016 (penton, hexon and fiber) showed that they were classified as P1, H1 and F1, respectively (Fig. 1B–D).
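
The base composition and G + C content reported above can be recomputed from any genome sequence with a few lines of Python. The following is a minimal sketch, not the procedure used in this study; the function names and the toy sequence are illustrative assumptions, and ambiguous bases are simply excluded from the total.

```python
from collections import Counter
from typing import Dict


def base_composition(seq: str) -> Dict[str, float]:
    """Return the percentage of A, C, G and T among unambiguous bases."""
    counts = Counter(seq.upper())
    total = sum(counts[b] for b in "ACGT")
    if total == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return {b: 100.0 * counts[b] / total for b in "ACGT"}


def gc_content(seq: str) -> float:
    """Return the G + C percentage of a sequence."""
    comp = base_composition(seq)
    return comp["G"] + comp["C"]


# Toy sequence for illustration only (not the SH2016 genome).
toy = "ATGCGCGCTA"
print(base_composition(toy))
print(gc_content(toy))
```

Applied to the reported values, the same arithmetic is internally consistent: 27.99% C + 27.21% G = 55.20% G + C, and the four percentages sum to 100.00%.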

Table 1 Genome annotation of SH2016.
Table 2 SH2016 genome non-coding motifs annotations.
Figure 1

Neighbor-joining phylogenetic trees based on the open reading frame sequences of the whole genome (A), hexon gene (B), penton gene (C) and fiber gene (D) of strain SH2016 and of HAdV-C whole-genome reference strains from GenBank. Strain SH2016, characterized in this study, is highlighted with a solid square. The trees were constructed using the neighbor-joining method of MEGA 6.06 with 1000 bootstrap trials performed to assign confidence to the grouping.

Comparative genome analysis

Compared with the complete genome sequences of the 5 HAdV-C prototype strains, HAdV-1 (AC_000017), HAdV-2 (AC_000007), HAdV-5 (AC_000008), HAdV-6 (HQ413315) and HAdV-57 (HQ003817), the SH2016 strain was conserved and shared the highest nucleotide identity (97.93%) with the prototype strain of HAdV-1 (Table 3). In nucleotide alignments of individual genes, the penton, hexon and fiber genes showed the highest identity with the prototype strain of HAdV-1 (99.82%, 99.68% and 98.79%, respectively). The genomic map of strain SH2016, which contains 40 ORFs (33 rightward and 7 leftward), is shown in Fig. 2A. Comparison of the nucleotide sequences of the 11 coding regions (E1A, E1B, E2B, L1 13.6 kDa, pTP, L1 52 kDa, pIIIa, pVII, E3 and E4) showed that SH2016 was most similar to HAdV-2, HAdV-5, HAdV-6 or HAdV-57 in these regions, with identities of 98.19–99.75%. In contrast, HAdV-1 and HAdV-57 showed the greatest similarity to SH2016 in the pIX gene (99.53%), HAdV-6 and HAdV-57 in the putative protein U gene (98.19%), and HAdV-1, HAdV-5 and HAdV-6 in the pX gene (99.59%). The IVa2, pV, pX, pVI, DBP and L4 coding regions displayed the highest similarity with HAdV-1. Overall, comparative genomic analysis showed limited sequence variation between the novel HAdV-1 type and the other members of the HAdV-C group.
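
The identities quoted above are pairwise percentages taken from sequence alignments. As an illustration of the underlying calculation, the sketch below computes percent identity from two pre-aligned sequences. It is an assumption-laden example rather than the BioEdit procedure used in this study: the function name is hypothetical, and columns in which either sequence has a gap are excluded, which is only one of several common conventions.

```python
def percent_identity(aln_a: str, aln_b: str) -> float:
    """Percent identity over alignment columns where neither sequence has a gap.

    Both inputs must come from the same alignment (equal length, '-' for gaps).
    Excluding gapped columns is an assumed convention; other tools may count
    gaps as mismatches.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aln_a.upper(), aln_b.upper()):
        if x == "-" or y == "-":
            continue  # skip columns containing a gap
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy alignment for illustration only: 6 ungapped columns, 5 of them identical.
print(round(percent_identity("ATGC-TA", "ATGAGTA"), 2))
```

Because the gap convention changes the denominator, identities computed this way can differ slightly from values produced by other software for the same alignment.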

Table 3 Nucleotide sequence identities between SH2016 and representative HAdV-C strains.
Figure 2

Genetic recombination analyses of the complete genome of the novel strain SH2016. (A) Genomic map of strain SH2016. The l-strand of the genome is represented by a straight line. Rightward (top) and leftward (bottom) ORFs are represented by grey arrows. (B) Recombination events predicted in strain SH2016. The SH2016 genome is shown as a thick black line. The likely backbone is shown as a cyan line. Genetic components predicted by RDP4 to be involved in a recombination event are shown as a purple line. Likely breakpoint positions are shown below the genome. (C) Similarity analyses of SH2016. SH2016 was used as the query sequence for comparison with 2 other representative strains of HAdV-C. The default settings of SimPlot were used as follows: window size 200 bp, step size 20 bp, 1000 replicates and the neighbor-joining tree model.

Genomic recombination analysis of strain SH2016

The RDP4 package strongly predicted that strain SH2016 was a homologous recombinant of HAdV-1 (strain: human/USA/VT2672/2003/1[P1H1F1], GenBank ID: JX173083) and HAdV-2 (strain: T215/Ft Jackson South Carolina USA/2002, GenBank ID: KX384959). The beginning breakpoint was located around position 28040 (without gaps) of HAdV-1, within the gene coding for the putative host modulation protein E3 (early E3 12.5 kDa glycoprotein), and the ending breakpoint around position 31067 (without gaps) of HAdV-1, within the gene coding for the fiber protein (Fig. 2B). The similarities to the probable major parent strain (HAdV-1) and minor parent strain (HAdV-2) were 99.3% and 98.6%, respectively. Of the methods used to predict potential recombination events between the input sequences (RDP, GENECONV, BootScan, MaxChi, Chimaera, SiScan, 3Seq, LARD and PhylPro), 7 supported this event (Supplemental Figs S1–7), with p-values ranging from 2.347 × 10^−187 to 2.179 × 10^−12 (Table 4). Similarity plot analysis with SimPlot was performed to confirm the recombination event within the SH2016 genome. It indicated that the mosaic SH2016 genome derived mainly from the prototype HAdV-1 strain, with a small portion derived from HAdV-2 (Fig. 2C). These results agreed with the phylogenetic analyses, in which both the region to the left of the recombination breakpoints (5′-end, 1–28039) and the region to the right (3′-end, 31067–35946) of SH2016 clustered with the HAdV-1 group with high confidence (bootstrap value = 100% or 97%, Fig. 3A,B), whereas the recombinant region clustered with the HAdV-2/6/57 group (bootstrap value = 100%, Fig. 3C). Together, these findings support the conclusion that SH2016 arose from a recombination event involving HAdV-1 and HAdV-2.
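
The acceptance rule described in the Methods (an event is kept only if at least 4 detection methods support it, with 0.05 as the highest acceptable P-value) can be expressed as a simple filter. The sketch below is illustrative only: the `Event` class, the function names and the P-values are hypothetical placeholders, not RDP4 output and not the values in Table 4.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Event:
    """A candidate recombination event with one P-value per detection method."""
    name: str
    p_values: Dict[str, float]  # method name -> P-value (methods that reported one)


def supporting_methods(event: Event, alpha: float = 0.05) -> List[str]:
    """Methods whose P-value does not exceed the acceptance threshold."""
    return sorted(m for m, p in event.p_values.items() if p <= alpha)


def accept(event: Event, min_methods: int = 4, alpha: float = 0.05) -> bool:
    """Keep an event only if enough independent methods support it."""
    return len(supporting_methods(event, alpha)) >= min_methods


# Placeholder P-values for illustration only (not taken from Table 4).
candidate = Event(
    name="hypothetical event",
    p_values={"RDP": 1e-10, "GENECONV": 1e-5, "MaxChi": 0.01,
              "3Seq": 0.04, "SiScan": 0.2},
)
print(supporting_methods(candidate))
print(accept(candidate))
```

Requiring agreement among several methods guards against a false positive from any single algorithm, which is the rationale given in the Methods.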

Table 4 Number of algorithms in the RDP4 package that predicted the recombination event, and the recombinant score.
Figure 3

Neighbor-joining phylogenetic trees based on the region to the left of the recombination breakpoints (A; 5′-end), the region to the right (B; 3′-end) and the recombinant region (C) of strain SH2016 and the corresponding sequences of HAdV-C whole-genome reference strains from GenBank. Strain SH2016, characterized in this study, is highlighted with a solid square. The trees were constructed using the neighbor-joining method of MEGA 6.06 with 1000 bootstrap trials performed to assign confidence to the grouping.

Discussion

To verify the results, phylogenetic trees were also constructed by the maximum likelihood (ML) method implemented in IQ-TREE 1.6.7.113 under the best-fitting nucleotide substitution model for each data set, as selected by jModelTest14 (Supplemental Figs S8–14). The topologies of all neighbor-joining (NJ) trees in this study were consistent with those of the ML trees. The penton NJ tree was not very informative, because the 43 sequences showed little divergence from each other (Fig. 1C), and the same was true of the maximum likelihood tree (Fig. S10). This indicates that the penton region of SH2016 is highly similar to known HAdV-C sequences. As the full-genome trees showed, SH2016 was related to AC_000017, the prototype HAdV-1 genome, which can be considered its backbone (Fig. 1A, Supplemental Fig. S8). However, the SH2016 genome sequence diverged in the E3 region and the putative protein U region, both of which are located in the recombinant region. These two regions (the major areas of recombination) were more divergent than the rest of the genome, with only 82.79% and 85.50% identity to the prototype HAdV-1, respectively (Table 3).

In summary, the complete genome sequence of the novel recombinant HAdV-1 strain (SH2016), isolated in Shanghai, China, was determined and characterized. Phylogenetic and SimPlot analyses both showed that the novel HAdV-1 subtype (SH2016) resulted from a recombination event involving HAdV-1 and HAdV-2 (Figs 2B,C and 3). The recombinant region was located between positions 28040 and 31067 and included most of E3, all of U and a small part of L5 (Fig. 2A,B). However, the evolutionary history of this intratypic recombination event remains unclear, and more sequences are needed to investigate the spatiotemporal relationships of the novel HAdV-C strains worldwide.

Comparison of the amino acid sequences of the fiber, hexon and penton of strain SH2016 with those of other HAdV-1 strains showed that only the fiber of SH2016 carried mutations, three in total. According to the protein structure, the fiber of SH2016 can also be divided into three components: an N-terminal tail (FNPVYPYD)2,15,16, two repeat/shaft regions and a C-terminal globular knob17,18,19,20. One mutation (A71T) occurred in the first repeat/shaft region, and the other two (V432I and H470N) occurred in the C-terminal globular knob, which is typically responsible for interaction with cell receptors. Whether these mutations lead to antigenic drift needs to be validated experimentally.
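
Substitutions written in the form A71T (reference residue, position, query residue) can be listed directly from a pairwise protein alignment. The sketch below is a minimal illustration under stated assumptions: the function name is hypothetical, positions are numbered along the ungapped reference sequence, and insertions or deletions are ignored. The toy sequences are not the SH2016 fiber.

```python
from typing import List


def substitutions(ref_aln: str, query_aln: str) -> List[str]:
    """List amino acid substitutions such as 'A71T' from two aligned sequences.

    Positions are 1-based and counted along the reference without gaps
    (an assumed numbering convention). Columns with a gap in either
    sequence are skipped, so indels are not reported.
    """
    if len(ref_aln) != len(query_aln):
        raise ValueError("aligned sequences must have equal length")
    found = []
    pos = 0
    for r, q in zip(ref_aln.upper(), query_aln.upper()):
        if r != "-":
            pos += 1  # advance the reference coordinate
        if r == "-" or q == "-":
            continue
        if r != q:
            found.append(f"{r}{pos}{q}")
    return found


# Toy sequences for illustration only.
print(substitutions("MKRAV", "MKRTV"))
```

Numbering along the reference keeps positions stable when the query carries insertions, which is why it is assumed here.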

In conclusion, we propose that the SH2016 strain is a novel intratypic HAdV-C strain and may be an etiological agent of SARI. On the basis of its complete genome sequence, it arose through recombination of two HAdV genotypes, HAdV-1 and HAdV-2, which frequently cause respiratory infection9,21,22,23. Whether the emergence of this recombinant strain increases virulence, thereby posing a new global challenge with regard to acute respiratory diseases in the near future, warrants further investigation. Epidemiological and virological surveillance of this uninvestigated respiratory pathogen should therefore be strengthened.

Material and Methods

Specimen collection and identification

Throat swab specimens were collected at designated intervals from outpatients with respiratory tract infection who were surveillance subjects, by trained medical staff of Xinhua Hospital Affiliated to Shanghai Jiao Tong University School of Medicine. The SH2016 specimen was collected on February 4 in the outpatient department. The patient was more than two years old and was clinically diagnosed with bronchitis and upper respiratory tract infection. Human adenovirus infection was diagnosed, and other common viral infections were ruled out, using our previously described diagnostic methods24. After 3 days of antiviral treatment, the patient recovered.

Cell culture and virus isolation

HEp-2 cells (American Type Culture Collection, ATCC Number CCL-23, Manassas, VA, USA) were maintained in complete DMEM supplemented with 10% FBS, 100 U/mL penicillin, and 100 µg/mL streptomycin (Invitrogen, Carlsbad, CA, USA) at 37 °C with 5% CO2. For virus culture, DMEM with 2% FBS and antibiotics was used. Cells were inoculated with clinical samples that had been passed through a 0.22 µm filter (Millipore, Merck, Germany) and were incubated at 37 °C for 7 days. If no cytopathic effect (CPE) was observed, the culture supernatants were used to inoculate fresh cells for 2 additional passages. If adenovirus-like CPE appeared, the cultures were passaged again to confirm the presence of virus. Virus-infected cells and supernatant were collected and used for subsequent detection and genome sequencing.

DNA extraction, PCR strategy and sequencing

Strain SH2016 was isolated from a throat swab and underwent three passages in HEp-2 cells to obtain high-titer stocks. Viral DNA was extracted using a QIAamp MinElute Virus Spin Kit (Qiagen, Germany) following the manufacturer’s instructions. The primer pairs used to amplify the complete genome (Supplementary Table 1) were designed based primarily on the sequences of human mastadenovirus C strain CBJ113 (KR699642), human mastadenovirus C isolates human/CHN/BJ04/2012/[P1/H2/F2] (MF315028) and human/CHN/BJ09/2012/[P1/H2/F2] (MF315029), and human adenovirus C strain human/EGY/E13/2001/1[P1H1F1] (JX173080)12.

Twenty-four overlapping PCR fragments covering the entire genome were amplified using Platinum™ Taq DNA polymerase (Invitrogen, Thermo Fisher, CA, USA) according to the manufacturer’s protocol. PCR amplification consisted of one denaturation cycle at 95 °C for 5 min, followed by 40 amplification cycles of 95 °C for 30 s, 55 °C for 30 s and 72 °C for 180 s. A final extension at 72 °C for 10 min was included, after which the samples were stored at 4 °C. For the 5′/3′-terminal genome sequences, the covalent junction between the purified DNA template and the terminal protein (TP) was broken by the addition of 0.4 N NaOH as described in Xu’s protocol2. The PCR products were separated by electrophoresis on 1.5% agarose gels and visualized under UV light. Each PCR amplicon was then inserted into the pGEM-T Easy Vector by TA cloning. The recombinant plasmids were identified by PCR with the corresponding amplification primer pairs and confirmed by sequencing with M13 forward and M13 reverse primers. The recombinant plasmids were sequenced directly on an ABI 3730XL automatic DNA analyzer using an ABI Prism BigDye Terminator cycle sequencing kit 3.1 (Applied Biosystems). Either bracketing PCR primers or internal primers were used as sequencing primers to obtain overlapping and complementary sequences with a minimum of twofold coverage. The whole-genome sequence was assembled from the 24 overlapping sequences in ContigExpress Project (Vector NTI).

Nucleotide sequence accession number

The annotated genome sequence of SH2016 was submitted to the GenBank database under accession number MH183293.

Genome annotation

The BLASTn program (National Center for Biotechnology Information, Bethesda, MD, USA) was used to identify homologous nucleotide sequences in the GenBank database (https://blast.ncbi.nlm.nih.gov/Blast.cgi?PROGRAM=blastn&PAGE_TYPE=BlastSearch&LINK_LOC=blasthome). The SH2016 genome sequence was annotated based on the previous annotation of the HAdV-C strain human/EGY/E13/2001/1[P1H1F1]. DNA and protein sequence alignments were created using the BioEdit sequence alignment editor (version 7.1.3; Tom Hall, Ibis BioSciences, CA).

Phylogenetic analysis

Phylogenetic trees were generated with MEGA6.06 using the neighbor-joining (NJ) method with the maximum composite likelihood nucleotide substitution model, and a bootstrap test of phylogeny with 1000 replicates was used to assign confidence to the grouping. The maximum likelihood (ML) phylogenetic trees were reconstructed with IQ-TREE 1.6.7.113 under the corresponding models. The optimal evolutionary models were identified with jModelTest 2.1.714. The resulting ML trees were visualized and edited using FigTree (http://tree.bio.ed.ac.uk/software/figtree/). Strain SH2016, characterized in this study, is highlighted with a solid square in the NJ trees and in red font in the ML trees.

Recombination analysis

The aligned sequences of SH2016 and of the HAdV-C genomes available from GenBank were subjected to recombination analysis. The Recombination Detection Program (RDP) package Beta 4.96 was used to identify recombinant sequences. Multiple methods (RDP, GENECONV, BootScan, MaxChi, Chimaera, SiScan, 3Seq, LARD and PhylPro), each in its default mode, were used to predict potential recombination events between the input sequences. Only recombination events supported by at least 4 methods were considered, to avoid misidentification by a single method. The best signals for recombination are associated with the lowest P-values; the highest acceptable P-value was set to 0.05. Recombination events detected with RDP Beta 4.96 were confirmed and visualized with SimPlot version 3.5.1. Bootscan analysis in the SimPlot package was used to test potential recombination events. Similarity was calculated in each window of 200 bp by the Kimura (2-parameter) distance model with a transition-transversion ratio of 2.0. The window was advanced successively along the genome alignment in 20 bp increments. For bootscan analysis, the neighbor-joining algorithm was run with 1000 bootstrap replicates. A threshold of 70% or more of the observed permuted trees indicated potential recombination events. Potential genomic components were identified based on genetic distances and phylogenetic analyses.
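
The sliding-window similarity calculation described above (200 bp windows advanced in 20 bp steps, Kimura 2-parameter distance) can be sketched as follows. This is an approximation for illustration and not SimPlot's implementation: the function names are hypothetical, the distance uses the standard Kimura 2-parameter formula d = −½ ln[(1 − 2P − Q)·√(1 − 2Q)] estimated from the observed transition (P) and transversion (Q) proportions rather than a fixed transition-transversion ratio of 2.0, and similarity is assumed to be 1 − d.

```python
import math
from typing import Iterator, Tuple

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(a: str, b: str) -> float:
    """Kimura 2-parameter distance over columns where both bases are A/C/G/T."""
    n = ts = tv = 0
    for x, y in zip(a.upper(), b.upper()):
        if x not in "ACGT" or y not in "ACGT":
            continue  # skip gaps and ambiguous bases
        n += 1
        if x == y:
            continue
        if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
            ts += 1  # transition: purine<->purine or pyrimidine<->pyrimidine
        else:
            tv += 1  # transversion
    if n == 0:
        return float("nan")
    p, q = ts / n, tv / n
    t1, t2 = 1 - 2 * p - q, 1 - 2 * q
    if t1 <= 0 or t2 <= 0:
        return float("inf")  # sequences too divergent for the model
    return -0.5 * math.log(t1 * math.sqrt(t2))


def sliding_similarity(query: str, ref: str, window: int = 200,
                       step: int = 20) -> Iterator[Tuple[int, float]]:
    """Yield (window midpoint, 1 - K2P distance) along a pairwise alignment."""
    if len(query) != len(ref):
        raise ValueError("aligned sequences must have equal length")
    for start in range(0, len(query) - window + 1, step):
        d = k2p_distance(query[start:start + window], ref[start:start + window])
        yield start + window // 2, 1.0 - d


# Toy alignment for illustration only: identical sequences give similarity 1.0.
points = list(sliding_similarity("ACGT" * 100, "ACGT" * 100))
print(len(points), points[0])
```

A drop in similarity to one reference accompanied by a rise in similarity to another over the same windows is the pattern that suggests a recombinant segment; plotting the yielded points for each reference reproduces the general shape of a similarity plot.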

Amino acid analysis

SH2016 ORFs were compared to 5 prototype sequences and the 38 remaining HAdV-C whole genome sequences from GenBank. The complete genome of SH2016 strain was annotated using AC_000017 (HAdV-1 prototype strain) as template.

Ethics statement

This study was reviewed and approved by the human Research Ethics Committee Ethics Review Committee of the Shanghai Public Health Clinical Center. All methods used in this study were performed in accordance with the relevant guidelines. Written informed consent for the collection of throat swabs for pathogenic identification was obtained from the participants involved in this study.