Respiratory viral co-infections among SARS-CoV-2 cases confirmed by virome capture sequencing

Accumulating evidence supports the high prevalence of co-infections among Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) patients, and their potential to worsen the clinical outcome of COVID-19. However, there are few data on Southern Hemisphere populations, and most studies to date have investigated a narrow spectrum of viruses using targeted qRT-PCR. Here we assessed respiratory viral co-infections among SARS-CoV-2 patients in Australia, through respiratory virome characterization. Nasopharyngeal swabs of 92 SARS-CoV-2-positive cases were sequenced using pan-viral hybrid-capture and the Twist Respiratory Virus Panel. In total, 8% of cases were co-infected, with rhinovirus (6%) or influenzavirus (2%). Twist capture also achieved near-complete sequencing (> 90% coverage, > tenfold depth) of the SARS-CoV-2 genome in 95% of specimens with Ct < 30. Our results highlight the importance of assessing all pathogens in symptomatic patients, and the dual-functionality of Twist hybrid-capture, for SARS-CoV-2 whole-genome sequencing without amplicon generation and the simultaneous identification of viral co-infections with ease.

,047 probes (120 mers) targeting 29 human respiratory viruses representing six major pathogenic viral clades, from Twist Biosciences. Furthermore, we evaluated the feasibility of simultaneously performing SARS-CoV-2 whole genome sequencing (WGS) analysis using data generated from both methods. This demonstrated the utility of such a process, and greater WGS coverage achieved using hybrid-capture sequencing over existing amplicon-based procedures in situations where primer binding sites are abolished by genomic deletions.

Rate of viral co-infection in Australian cases.
We examined the respiratory virome of 92 SARS-CoV-2 cases who tested positive for SARS-CoV-2 RNA between March and May 2020 in New South Wales (NSW), Australia (Supplementary Table 1). The abundance of SARS-CoV-2 RNA in the respiratory specimens obtained from these cases was diverse, with qRT-PCR cycle threshold (Ct) values ranging between 13.3 and 39.7. This was equivalent to a viral load range between 1.4 × 10 8 copies/mL and less than 10 copies/mL (Supplementary Table 2). Sequencing all vertebrate-infectious viruses in these specimens using VirCapSeq hybrid-capture generated a total of 982 million raw reads, an average of 3.6 million adapter/host filtered reads per sample. Overall, 47 species of viruses belonging to 17 different genera were detected with a minimum of 20 virus-classified reads per million (rpM; Supplementary File 1; Fig. 1a). None of these viruses were detected in the negative controls (SSC_1 and SSC_2), ruling out laboratory contamination or index switching as a source of spurious virus detection (Fig. S1a). SARS-CoV-2 reads above the positivity threshold were detected in 80% of samples (74/92), of which the highest Ct was 39.7. No SARS-CoV-2 reads were detected in the coronavirus-negative clinical controls (nCoV_neg_1 and _2; Fig. 1a). Among other respiratory viruses, picornaviruses (all of which were rhinoviruses) were detected in 5% (5/92) of cases and in nCoV_neg_1, and influenzavirus A in 2% (2/92) of cases. Consistent with our published data in other cohorts [9][10][11] , the non-respiratory viruses detected included mammarenaviruses (41%), roseoloviruses (36%), alphapolyomaviruses (35%), papillomaviruses (20%), lymphocryptoviruses (12%), lentivruses (9%), anelloviruses (3%), simplexviruses (3%), pestiviruses (3%), mastadenovirus (1%) and norovirus (1%). Overall, sequences of viruses other than SARS-CoV-2 were detected in 74% (68/92) of cases, and the rate of co-infection between SARS-CoV-2 and other respiratory viruses was 8% (7/92).   Fig. 1b). None of these viruses were detected in the negative controls (SSC_3 and SSC_4), ruling out laboratory contamination or index switching as a source of spurious virus detection (Fig. S1b). Some sequences in the negative controls were determined as false-positive hits to human adenovirus B, arising from alignment of reads to human genomic DNA sequences cloned within an adenovirus vector backbone (Fig. S1c). SARS-CoV-2 reads were detected in 95% (79/83) of cases, including 16/83 samples undetected using VirCapSeq. SARS-CoV-2 was absent in both coronavirus-negative controls. Consistent with VirCapSeq results, rhinovirus and influenzavirus were the only other respiratory viruses detected. Moreover, the Twist and VirCapSeq panels showed good concordance, with all samples positively identified for these viruses being detected on both platforms, with the exception of one additional rhinovirus-positive case detected by Twist (nCoV_235; Fig. 1a,b). We note that this discordant case was only marginally above the positivity threshold and reads covered < 5% of the reference genome (Supplementary Table 3). Other low-level positives included non-respiratory viruses: bocaparvoviruses, cytomegalovirus, flavivirus and roseolovirus (Fig. 1b).
Validation across multiple bioinformatics pipelines. The choice of de novo assembler can profoundly impact the analysis and interpretation of virome sequencing data 12 . To test whether the respiratory viral co-infections detected by VirCapSeq and Twist capture sequencing were reproducible using other pipelines that apply a different assembler or a k-mer approach, we compared IDseq results to outputs of VirMAP 13 and OneCodex 14 (Supplementary  Table 3; Fig. S2). In contrast, VirCapSeq consistently achieved higher mean depth of coverage across most segments of the influenzavirus (Supplementary Table 5; Fig. S2). Nevertheless, Twist capture sequencing provided complete genome coverage of co-infecting influenzaviruses at high depth, sufficient to detect single nucleotide variants (SNVs) at the consensus level (Fig. 2). For both rhinoviruses and influenzaviruses, identified types were concordant between VirCapSeq and Twist panels. Therefore, both platforms were suitable for sequence determination of co-infecting respiratory viruses. www.nature.com/scientificreports/ SARS-CoV-2 genome coverage. Viral WGS is being widely applied to study the transmission and evolution of SARS-CoV-2. Amplicon-based sequencing is currently used most frequently but has some limitations in scalability and reproducibility. Given the high sensitivity of the Twist panel for detecting SARS-CoV-2 reads even at viral loads near the qRT-PCR limit of detection (Fig. 3a), we investigated whether the Twist sequencing data provided sufficient coverage of SARS-CoV-2 for WGS analysis. The mean number of SARS-CoV-2 reads detected by the Twist was > tenfold higher than for VirCapSeq (Fig. S3). For samples quantitated with a Ct < 30 on qRT-PCR, Twist capture sequencing achieved a minimum > tenfold sequencing depth across > 90% the SARS-CoV-2 genome for 95% (57/60) of samples (Fig. 3b), and of > 30-fold depth for 89% (53/60) of samples (Fig. S4). The highest Ct at which Twist provided > 90% coverage of > tenfold depth was 32.1. Even in the sample with the lowest viral load (Ct 39.7), 91% of genome was covered at 1X depth (Supplementary File 2). www.nature.com/scientificreports/ Compared with the Twist platform, VirCapSeq did not achieve > 90% coverage even at 1X depth for all samples, except two with Ct < 20 ( Fig. 3b; Fig. S4; Supplementary File 2). To test whether this low genome coverage was primarily due to the fewer number of SARS-CoV-2 reads detected compared to Twist, SARS-CoV-2 reads in the Twist dataset were sub-sampled to equal to that of VirCapSeq. Even after sub-sampling, Twist-enriched sequences maintained > 90% coverage of 10X depth across the SARS-CoV-2 genome in > 50% of the samples (Fig. S5). This demonstrated that the superior coverage of the SARS-CoV-2 genome achieved using Twist capture sequencing was only partially due to the higher number of SARS-CoV-2 reads. Coverage heterogeneity was the more important determinant, with reads being uniformly distributed across the SARS-CoV-2 genome in Twist samples but scattered unevenly for VirCapSeq (Fig. 3c). Therefore, while the VirCapSeq panel is suitable for detection of SARS-CoV-2 and coinfecting viruses, it is unsuitable for SARS-CoV-2 WGS.

Detection of inter-individual variation of SARS-CoV-2.
We evaluated the capacity to identify interindividual genetic variation of SARS-CoV-2 from Twist sequencing data. Among 83 cases examined by Twist capture sequencing, the SARS-CoV-2 genomes of 48 cases were previously characterized from same samples through amplicon-based WGS on the Illumina platform. This confirmed the presence of inter-individual single nucleotide variants (SNVs) at the consensus level 15 . Taking the amplicon-WGS data as the truth set, we assessed the sensitivity and precision of consensus sequence variants detected from Twist sequences. Overall, 338 consensus level SNVs were detected with 96% sensitivity and 98% precision, perfectly identified in 88% (42/48) of samples examined (Supplementary Table 6).

Detection of ORF8 deletion and validation using amplicon WGS.
Multiple studies report major structural variations (SVs) in the SARS-CoV-2 genome [16][17][18][19] , namely the 382 nt deletion in the open reading frame 8 (ORF8), associated with changes in the replicative fitness 17 and milder infections 18 . Therefore, we investigated if similar ORF8 deletions could be detected from the cases examined by Twist capture sequencing. We identified two cases with a common 328 bp deletion in ORF8 (nCoV_200 and nCoV_225). Providing further validation, the same deletion was detected in both cases through amplicon based WGS ( Fig. 4; Fig. S6), using Oxford Nanopore Technology (ONT). Interestingly, in both cases, the deletion abolished a primer site, causing the failure of an adjacent amplicon (2.5 kb) and resulting in incomplete coverage of the SARS-CoV-2 genome when profiled by amplicon sequencing. By contrast, hybrid capture sequencing was able to achieve complete genome coverage. This demonstrates that Twist capture sequencing achieves sufficient coverage to reliably detect large deletions in the SARS-CoV-2 genome for clinical specimens of Ct < 30 and is more robust to large deletion or rearrangements in the genome, which can disrupt amplicon schemes.

Discussion
Determining the co-infection rate and consequent clinical impacts on COVID-19 is critical, particularly where therapeutic interventions for some coinfecting agents such as influenzavirus are available. In this study, we sequenced the respiratory virome using two hybrid-capture approaches and multiple taxonomic read classification pipelines, demonstrating an 8% rate of co-infection with other respiratory viruses among SARS-CoV-2 cases in Australia. This is less than half the rate reported in Northern California 2 , but greater than the initial estimates www.nature.com/scientificreports/ from Wuhan (0.0%) 1 and rates of viral co-infection reported from Chicago (1.6%) 20 , New York (2.0%) 21 , Singapore (1.4%) 22 , Barcelona (0.6%) 23 and Turkey (2.0%) 6 . Furthermore, it is higher than the 4.6% of co-infection observed among 175 cases from the same region, diagnosed and tested during a similar time period using multiplex qRT-PCR 7 . Such high inter-and intra-regional variability warrants further investigation, particularly in developed countries with similar SARS-CoV-2 incidence to Australia. Recent data from Iran 4 and Poland 24 support higher mortality of COVID-19 patients as a result of respiratory viral co-infections. Previous studies have reported co-infections between SARS-CoV-2 and common respiratory viruses including rhinovirus, influenzavirus, meatapneumovirus, parainfluenzavirus and respiratory syncytial virus 5,25 . In our results, co-infections between SARS-CoV-2 and rhinoviruses (6%) were predominant, lower for influenzaviruses (2%). Interestingly, co-infection between SARS-CoV-2 and influenza was not observed in a recent report of Australian SARS-CoV-2 cases 7 . The case specimens examined in the present study were collected between March and May 2020, overlapping with the start of the Southern Hemisphere influenza season. In Australia, the flu season thus far has reported > 90% reduction in incidence of influenzavirus infections compared to the same period in 2019 as a result of social distancing and mandatory quarantine measures applied during the COVID-19 pandemic 26 . Therefore, the significant reduction in circulating influenzavirus may have contributed to its low co-infection rate with SARS-CoV-2.
Our study has several limitations. All specimens examined in this study were freeze-thawed twice before library preparation. This may have prevented detection of viruses that were originally at very low titer. Therefore, the actual rate of co-infection may exceed 8%. To eliminate this potential in future analyses, double-stranded cDNA should be generated on the same day as SARS-CoV-2 qRT-PCR, from the same nucleic acid extracts avoiding freeze-thaw. A key limitation of our analysis was the lack of clinical metadata, precluding examination of potential associations between respiratory viral co-infection with SARS-CoV-2 and clinical outcomes of COVID-19. Although comparable to other co-infection studies to date, our sample size was small and included only a single timepoint for each case. Nevertheless, this represents the largest metagenomic sequencing study to date, examining co-infections between SARS-CoV-2 and other respiratory viruses.
There is growing appreciation for SARS-CoV-2 WGS as an essential tool to investigate the transmission and evolution of SARS-CoV-2, critical for research and public health responses to COVID-19 [15][16][17][27][28][29][30][31] . Existing WGS approaches can be divided into two main categories: 1. Amplicon sequencing; and 2. Hybrid-capture sequencing using SARS-CoV-2-specific probes. Neither are capable of simultaneously detecting co-infecting viruses. Our analysis of the SARS-CoV-2 genome using Twist-enriched sequenced demonstrated high breadth and depth of coverage for samples with Ct < 30, sufficient for downstream analysis of SNV, indels and SVs. This was despite using single-end sequence data. Hence, even greater confidence in variant calling can be achieved using paired-end sequencing. Overall, target enrichment sequencing using the Twist Respiratory Virus Panel offers dual-functionality, providing effective characterization of co-infecting respiratory viruses and the full genome of the SARS-CoV-2, simultaneously.
Unlike amplicon sequencing, Twist hybrid-capture does not require generation of SARS-CoV-2 amplicons. This significantly reduces processing time and manual handling, lowering the risk of cross contamination. Using the Twist's fast hybridization and multiplexed capture workflow, libraries ready for high throughput sequencing can be constructed from clinical specimen extracts in < 8 h. Although the amplicon approach can also construct libraries within a similar timeframe, in our experience of using two published amplicon WGS methods 15,28,32 , generating amplicons often took longer than anticipated due to certain parts of the genome amplifying poorly, requiring continuous optimization. Furthermore, in this study, all libraries were hybridized with Twist probes for 2 h. However, this can be reduced to 30 min with minimal loss in capture efficiency.
The default protocol for Twist hybrid-capture supports multiplexing up to 8 libraries (8-plex) per capture hybridization, combining libraries by equal mass to make up 1.5 µg of total DNA, or up to 4 µg total without compromising the efficacy of target enrichment. In this study, we performed Twist capture on libraries pooled up to 20-plex, whilst still maintaining the 4 µg total DNA limit. This highly multiplexed sample processing significantly reduced processing time, labor and cost per sample. Current per-sample cost of Twist Respiratory Virus Panel in a 20-plex sample format is $25 USD. This compares favorably with the cost of VirCapSeq (~ $23 USD per sample), particularly given its advantages in sensitivity and genome coverage of SARS-CoV-2.
Taken together, we provide a practical and cost-effective strategy for characterizing both respiratory viral co-infections and the full SARS-CoV-2 genome simultaneously, from clinical specimens with Ct < 30 or viral load > 3,000 copies/mL. We also recommend IDseq as the preferred pipeline for taxonomic classification of viral sequences in SARS-CoV-2 specimens, based on its high sensitivity for SARS-CoV-2 and other respiratory viruses, ease of use, and minimal requirements in terms of infrastructure and bioinformatic expertise. We envision broad application of our approach across research and clinical settings.

Clinical samples and SARS-CoV-2 qRT-PCR. Respiratory specimens of SARS-CoV-2 cases (adults) in
NSW diagnosed between March and May 2020 were obtained from at the Prince of Wales Hospital in Randwick, Sydney, Australia. Ethical approval and informed consent waiver was received from the South Eastern Sydney Local Health District Human Research Ethics Committee (2020/ETH02639). All methods were performed in accordance with the relevant guidelines and regulations. Prior to this study, samples were freeze-thawed twice and stored at − 80 °C following diagnostic testing at the NSW Health Pathology East Serology and Virology Division (SaViD). In total, 92 nasopharyngeal swabs suspended in Viral Transport Media (VTM) were selected for virome capture sequencing, all positive for a combination of four SARS-CoV-2 target genes (RdRp, S, N and E) on the Allplex SARS-CoV-2 qRT-PCR Assay (Seegene, Seoul, Korea). The approximate copy number of SARS-CoV-2 RNA was calculated by plotting the Ct against a standard curve built from tenfold serial dilution of a  34 and RAPsearch2 35 , respectively. Putative accessions were assigned to each read using the NCBI accession2taxid database (ftp://ftp.ncbi.nih.gov/pub/taxon omy/ acces sion2 taxid ) and a BLAST + (v 2.6.0) 36 database. In parallel, short reads were de novo assembled into contigs using SPAdes 37 . Raw reads were mapped back to the resulting contigs using Bowtie2 38 , to identify the contig to which they belong. Finally, each contig was aligned to the set of possible accessions represented by the BLAST database, to improve the specificity of alignments to all the underlying reads. Only viruses detected at ≥ 20 rpM based on nt alignments (NT rPM) were deemed positive and included in heatmaps generated using iheatmapr 39 .
For the comparative analysis between IDseq and other bioinformatic pipelines, taxonomic read classification summaries were generated using VirMAP 13 and OneCodex 14 . VirMAP was installed and run on the National Computing Infrastructure (NCI) HPC Gadi with modifications described at https ://githu b.com/rsoft one/virma p. OneCodex is a premium cloud-based pipeline, for which raw fastq files generated using Twist hybrid-capture sequencing were uploaded and summary reports downloaded using the web browser interface (https ://app. oneco dex.com/).

Data availability
All de-identified metagenomic sequencing data (raw and processed fastq files) will be made publicly available in time for publication.