Detection of SARS-COV-2 variants and their proportions in wastewater samples using next-generation sequencing in Finland

Lipponen, Anssi; Kolehmainen, Aleksi; Oikarinen, Sami; Hokajärvi, Anna-Maria; Lehto, Kirsi-Maarit; Heikinheimo, Annamari; Halkilahti, Jani; Juutinen, Aapo; Luomala, Oskari; Smura, Teemu; Liitsola, Kirsi; Blomqvist, Soile; Savolainen-Kopra, Carita; Pitkänen, Tarja

doi:10.1038/s41598-024-58113-8


Article
Open access
Published: 02 April 2024


Anssi Lipponen^1,2,
Aleksi Kolehmainen¹,
Sami Oikarinen³,
Anna-Maria Hokajärvi¹,
Kirsi-Maarit Lehto³,
Annamari Heikinheimo^4,5,
Jani Halkilahti⁶,
Aapo Juutinen⁷,
Oskari Luomala⁷,
Teemu Smura⁸,
Kirsi Liitsola⁶,
Soile Blomqvist⁶,
Carita Savolainen-Kopra⁶,
Tarja Pitkänen^1,4 &
WastPan Study Group

Scientific Reports volume 14, Article number: 7751 (2024)


Abstract

Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) variants may have different characteristics, e.g., in transmission, mortality, and the effectiveness of vaccines, indicating the importance of variant detection at the population level. Wastewater-based surveillance of SARS-CoV-2 RNA fragments has been shown to be an effective way to monitor the COVID-19 pandemic at the population level. Wastewater is a complex sample matrix affected by environmental factors and PCR inhibitors, which can cause insufficient sequencing coverage. Consequently, results in which part of the genome lacks sufficient coverage are not uncommon. To identify variants and their proportions in wastewater over time, we utilized next-generation sequencing with the ARTIC Network's primer set and a bioinformatics pipeline to evaluate the presence of variants in partial genome data. Based on the wastewater data from November 2021 to February 2022, the Delta variant was dominant until mid-December in Helsinki, Finland’s capital, and thereafter, in late December 2021, Omicron became the most common variant. At the same time, the Omicron variant of SARS-CoV-2 outcompeted the previous Delta variant in Finland in new COVID-19 cases. The SARS-CoV-2 variant findings from wastewater are in agreement with the variant information obtained from the patient samples when visually comparing trends in the sewerage network area. This indicates that the sequencing of wastewater is an effective way to monitor temporal and spatial trends of SARS-CoV-2 variants at the population level.


Introduction

Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) has caused a worldwide pandemic with severe socio-economic impacts, indicating the importance of monitoring this pandemic at the population level¹. SARS-CoV-2 variants may have different characteristics, e.g., in transmission, mortality, and the effectiveness of vaccines, indicating the importance of variant detection at the population level².

Recently, it has been reported that subvariants of SARS-CoV-2 are escaping neutralizing antibodies, causing risk of reinfection and indicating the importance of variant tracing³. Wastewater-based surveillance of SARS-CoV-2 RNA fragments has been shown to be an effective way to monitor COVID-19 pandemic trends at the population level⁴. Wastewater-based surveillance has also indicated the potential to detect mutations that have not been detected in clinical samples, and so it could be a valuable complement for clinical surveillance^5,6.

SARS-CoV-2 variants have been detected in wastewater by next-generation sequencing (NGS) around the world, e.g., in Europe^7,8,9 as well as in North and South America^10,11. Various NGS library methods such as enrichment panels¹², amplicon panels^11,13, and metagenomic shotgun sequencing¹² have also been utilized. Frequently, part of the genome does not have sufficient coverage, which may be because wastewater is a complex sample matrix containing environmental factors and PCR inhibitors¹⁴. This, together with a highly variable number of variant-defining mutations between known SARS-CoV-2 variants, makes variant detection in wastewater samples difficult, especially when some of the variant-defining mutations lie in genomic regions with low or no coverage.

In this study, we present next-generation sequencing utilizing the ARTIC Network’s primer set, including the bioinformatics pipeline to follow temporal and spatial trends of SARS-CoV-2 variants in Finland. The pipeline outcome includes (1) sequence quality control with parameters of uniformity, coverage, and the number of mapped reads, (2) a probability calculation with a hypergeometric test to evaluate the existence of a variant, even if part of the genome is not covered, and (3) the evaluation of proportions of SARS-CoV-2 variants in wastewater.

Materials and methods

Wastewater samples

For the purpose of variant detection, 24-h composite wastewater samples were collected, as described in Hokajärvi et al.¹⁵ and delivered within 24 h to a laboratory for analysis. Sixteen influent wastewater samples from 8 November 2021 to 28 February 2022 were collected weekly, following the standard biosafety precautions for handling untreated wastewater, from Viikinmäki Wastewater Treatment Plant (WWTP) in Helsinki, Finland. Viikinmäki WWTP serves the municipalities of Helsinki, Sipoo, Kerava, Tuusula, Järvenpää, Pornainen, Mäntsälä, and partly also Vantaa, covering a total population of 860 000 inhabitants. The steps of the analysis pipeline starting from wastewater samples collected from the wastewater treatment plant are summarized in Fig. 1.

Extraction of nucleic acids

Nucleic acids were extracted from wastewater, as described in Hokajärvi et al. and Tiwari et al.^15,16. In brief, 70 ml of wastewater was pre-centrifuged and the supernatants were concentrated with Centricon Plus-70 centrifugal filters (#UFC701008, Merck, Germany) for 25 min at 3000 × g, producing 240–930 µl of concentrate. Concentrate volumes below 700 µl were adjusted to 700 µl with flow-through wastewater. Nucleic acids were extracted from 300 µl of concentrate using a Chemagic-360D (Perkin-Elmer, MA, USA) instrument with a Chemagic Viral300 DNA/RNA extraction kit (#CMG-1033-S, Perkin-Elmer), and eluted in 50 or 60 µl of the kit elution buffer.

Generation of amplicon libraries and next-generation sequencing

Nucleic acids extracted from wastewater samples may contain PCR inhibitors that reduce assay sensitivity, which also needs to be taken into account in NGS-based methods¹⁴. To minimize the risk of inhibition and improve the chance of obtaining high-quality sequencing data, both undiluted and diluted (1:5 or 1:10) RNA were used for cDNA synthesis with LunaScript RT SuperMix (#E3010L, New England Biolabs, MA, USA) in a 20-µl reaction volume in accordance with the manufacturer’s protocol.

To generate the amplicon sequencing library, multiplex PCR was performed by using ARTIC.v4 (samples from 8 and 21 November 2021) or ARTIC.v4.1 primers (samples for any other date)¹⁷. An ARTIC Illumina library constructed by using 15 µl of amplified cDNA was purified with QuickStep™ 2 SOPE Resin and EdgeBio Optima DTR 96-well Plate by Edge Biosystems (Edge Biosystems Inc, CA, USA). Library preparation was performed in accordance with COVID-19 ARTIC v3 Illumina library construction and sequencing protocol v.4. Library preparation was miniaturized to × 0.25 from the original reaction volume. Unique Dual Index UMI oligos by IDT (Integrated DNA Technologies, IA, USA) were used as ligation adapters. Illumina-specific p5 and p7 primers were introduced in library amplification PCR. An equivolume pool was formed from amplified libraries and purified from adapter-dimers using Agencourt AMPure XP SPRI paramagnetic bead chemistry (Beckman Coulter, Indianapolis, IN, USA). The library pool was quantified for sequencing using LabChip GX Touch HT High Sensitivity assay (PerkinElmer, Waltham, MA, USA). Sequencing was performed with the Illumina NovaSeq 6000 system using an SP flow cell with a lane divider (Illumina, San Diego, CA, USA) with paired-end 251 bp reads.

Identification of variant specific mutations

Variant-specific mutations were identified from the CoV-Spectrum platform (https://cov-spectrum.org), which is based on the Global Initiative on Sharing All Influenza Data (GISAID)^18,19. From CoV-Spectrum, lists of mutations and their frequencies in the sequences of each variant likely to be present in the population (in this case Delta, Omicron BA.1, and Omicron BA.2) were downloaded. Using an in-house R script, mutations that were present in more than 80% of the sequences of a variant were identified and then compared to the mutations of the other variants. Only mutations unique to each variant were left on the mutation reference list.

To use this pipeline in other time frames of interest, a new list of variant specific mutations would need to be generated. In addition, other strategies for generating the reference list can be used, for example including mutations which might affect transmission or disease severity.
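
The selection rule described in this section can be sketched in a few lines. The authors used an in-house R script that is not shown in the text, so the Python below is only an illustration: the function name, the input layout, and the toy mutation labels are hypothetical, and it assumes that "unique" means a mutation does not pass the 80% threshold in any other variant.

```python
from typing import Dict, Set


def variant_specific_mutations(
    freqs: Dict[str, Dict[str, float]], min_freq: float = 0.80
) -> Dict[str, Set[str]]:
    """Per variant, keep mutations above min_freq that no other variant shares."""
    # Step 1: mutations present in more than min_freq of each variant's sequences.
    common = {
        variant: {mut for mut, freq in table.items() if freq > min_freq}
        for variant, table in freqs.items()
    }
    # Step 2: drop mutations that are also common in any other variant (assumption).
    specific = {}
    for variant, muts in common.items():
        others = set().union(*(s for v, s in common.items() if v != variant))
        specific[variant] = muts - others
    return specific


# Toy frequencies with placeholder labels (illustrative only, not CoV-Spectrum data).
toy = {
    "Delta": {"mutA": 0.99, "mutB": 0.95, "mutC": 0.40},
    "BA.1": {"mutB": 0.97, "mutD": 0.91},
    "BA.2": {"mutD": 0.85, "mutE": 0.88},
}
reference = variant_specific_mutations(toy)
```

In the toy input, `mutB` and `mutD` are each shared by two variants and `mutC` falls below the threshold, so only `mutA` and `mutE` remain on the reference list.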

Mapping of the reads and quality control

Reads were mapped to the SARS-CoV-2 reference genome (NC_045512.2) with the Burrows-Wheeler Alignment Tool (BWA-MEM)²⁰ using the default settings. Quality control of the raw reads and mapped reads was performed using FastQC and MultiQC^21,22. To evaluate the quality of sequencing and mapping to the reference genome, we calculated, for each sample, the percentage of on-target nucleotides covered at a read depth of > 100 and of > 1, as well as uniformity. Uniformity was calculated as follows:

$$\text{Uniformity}=\frac{\text{Total number of bases with coverage}\ge 20\%\text{ of mean}}{\text{Total number of targeted bases}}$$

For further analysis, sequencing data from RNA dilution producing the highest uniformity was used.
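
As a minimal sketch of these quality metrics, the functions below compute the coverage fraction and uniformity from a list of per-position read depths and pick the dilution with the highest uniformity. The function names, the dilution labels, and the toy depth values are hypothetical; how per-position depths are obtained from the BAM file is outside this sketch.

```python
from statistics import mean
from typing import Dict, List


def coverage_fraction(depths: List[int], min_depth: int) -> float:
    """Fraction of targeted positions whose read depth exceeds min_depth."""
    return sum(d > min_depth for d in depths) / len(depths)


def uniformity(depths: List[int]) -> float:
    """Fraction of targeted positions with depth >= 20% of the mean depth."""
    threshold = 0.2 * mean(depths)
    return sum(d >= threshold for d in depths) / len(depths)


def best_dilution(depths_by_dilution: Dict[str, List[int]]) -> str:
    """Return the dilution label whose depth profile has the highest uniformity."""
    return max(depths_by_dilution, key=lambda k: uniformity(depths_by_dilution[k]))


# Toy per-position depths for two dilutions of one sample (illustrative only).
toy = {
    "undiluted": [0, 0, 500, 500, 500, 500, 500, 500, 500, 500],
    "1:5": [150, 200, 250, 300, 150, 200, 250, 300, 150, 50],
}
chosen = best_dilution(toy)
```

The undiluted toy profile has a higher mean depth but two uncovered positions, so the more evenly covered 1:5 profile is selected.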

Identification of mutations from wastewater samples

ARTIC primers were trimmed from the BAM file with iVar using the ivar trim command, using the default settings (min-length 30; min-quality 20; sliding-window-width 4)²³. Mutations, deletions, and insertions were identified from the trimmed BAM file with ivar variants by using the default settings (minimum quality 20; minimum frequency threshold 0.03, minimum depth 0)^23,24.

Based on the iVar output, mutations in the samples were divided into three categories:

SARS-CoV-2 detected in the sample: > 20 reads were mapped across the reference genome (NC_045512.2).
Mutation in SARS-CoV-2 in a specific locus can be detected: > 20 mapped reads supporting the mutation per nucleotide²⁵.
Allelic frequency of the SARS-CoV-2 variants in the sample can be calculated: > 20 mutation reads, coverage > 100 mapped reads per nucleotide, and a p-value < 0.05 from iVar's Fisher’s exact test for the mutation, which ensures that the allelic frequency at a given position is higher than the mean error rate²³.
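
The per-mutation thresholds in the list above can be expressed as two simple predicates. This is a sketch only: the `MutationCall` class and its field names are hypothetical stand-ins for the corresponding values in the iVar output, and the default thresholds are the ones stated in the list.

```python
from dataclasses import dataclass


@dataclass
class MutationCall:
    alt_reads: int    # reads supporting the mutation at this position
    total_depth: int  # mapped reads covering this position
    p_value: float    # Fisher's exact test p-value reported by iVar


def is_detected(call: MutationCall, min_alt_reads: int = 20) -> bool:
    """Mutation can be detected: more than min_alt_reads supporting reads."""
    return call.alt_reads > min_alt_reads


def is_quantifiable(
    call: MutationCall,
    min_alt_reads: int = 20,
    min_depth: int = 100,
    alpha: float = 0.05,
) -> bool:
    """Allelic frequency can be calculated: support, depth, and p-value all pass."""
    return (
        call.alt_reads > min_alt_reads
        and call.total_depth > min_depth
        and call.p_value < alpha
    )


# Toy calls (illustrative only).
low_depth = MutationCall(alt_reads=25, total_depth=60, p_value=0.001)
good = MutationCall(alt_reads=40, total_depth=400, p_value=0.001)
```

A call such as `low_depth` counts toward variant detection but is excluded from the proportion estimate because its depth is below the coverage threshold.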

Because the capability to detect mutations in a sample varies with sequencing coverage and the uniformity of reads across the genome, not all nucleotides of the genome may be sequenced at high coverage. A hypergeometric test was therefore applied to assess the presence of a variant in the sample. The hypergeometric test was performed using the phyper function in R (version 4.3.1), utilizing the total number of mutations in the sample, the total number of variant specific mutations found in the sample, the total number of variant specific mutations in the reference, and the total number of nonspecific mutations found in the sample. Variants with a p-value < 0.05 were considered detected.
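
For readers without R, the upper-tail hypergeometric probability can be written out directly. The sketch below computes P(X >= k), which corresponds to `phyper(k - 1, m, n, draws, lower.tail = FALSE)` in R. The text does not state how the four counts listed above were mapped onto the `phyper` arguments, so the argument names here are generic and any mapping to the study's counts is an assumption left to the reader.

```python
from math import comb


def hypergeom_upper_tail(k: int, successes: int, failures: int, draws: int) -> float:
    """P(X >= k) when drawing `draws` items without replacement from an urn
    holding `successes` marked and `failures` unmarked items."""
    total = comb(successes + failures, draws)
    upper = min(successes, draws)
    # Sum the exact probabilities of observing k, k + 1, ..., upper marked items.
    favourable = sum(
        comb(successes, i) * comb(failures, draws - i) for i in range(k, upper + 1)
    )
    return favourable / total


# Toy urn: 5 marked and 5 unmarked items, 5 draws, all 5 drawn items marked.
p = hypergeom_upper_tail(5, 5, 5, 5)
```

Exact integer arithmetic with `math.comb` avoids a dependency on a statistics library; for large counts a library routine would be the more practical choice.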

Proportions of variants in wastewater

To define the proportions of the SARS-CoV-2 variants in a sample, the mean and standard deviation of the allelic frequencies were calculated over the variant specific nucleotide mutations in that sample that fulfilled the allelic frequency calculation criteria described above.
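
A minimal sketch of this proportion estimate follows. The function name and toy frequencies are hypothetical, and because the text does not say whether the sample or the population standard deviation was used, the sample standard deviation is assumed here.

```python
from statistics import mean, stdev
from typing import List, Optional, Tuple


def variant_proportion(allele_freqs: List[float]) -> Optional[Tuple[float, float]]:
    """Mean and sample SD of the allelic frequencies of one variant's
    qualifying variant-specific mutations; None if nothing qualifies."""
    if not allele_freqs:
        return None
    # stdev needs at least two values; report 0.0 for a single mutation (assumption).
    sd = stdev(allele_freqs) if len(allele_freqs) > 1 else 0.0
    return mean(allele_freqs), sd


# Toy allelic frequencies of three qualifying mutations (illustrative only).
result = variant_proportion([0.2, 0.4, 0.6])
```

The standard deviation gives a rough indication of how consistently the individual marker mutations agree on the variant's proportion.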

Results

SARS-CoV-2 genomic coverage of sequenced samples

Variation in the genomic coverage, uniformity, and number of mapped reads between sequenced samples was observed (Fig. 2). The median of genomic coverage when the depth was > 100 reads per nucleotide was 92.0% (range 66.3–97.1%), the mean of uniformity was 85.5% (range 72.2–90.9%), and the mean of mapped reads 1.916 M (range 0.032–5.403 M) when the highest coverage from two sample dilutions of the same sample was used to identify variants from the given sample. As genomic coverage in some of the samples was < 100%, not all potential mutations could be reliably identified (Table 1, Fig. 2).

Table 1 Quality parameters of ARTIC amplicon sequencing and mapping of the reads to the SARS-CoV-2 reference genome. The best result by coverage from two sequenced nucleic acid dilutions is presented in the table. Data from Viikinmäki Wastewater Treatment Plant, Helsinki, Finland.


Identification of variant specific mutations from CoV-Spectrum

SARS-CoV-2 variant-specific mutations were identified from the CoV-Spectrum platform (cov-spectrum.org [accessed 24.03.2023]), which is based on sequences of SARS-CoV-2 clinical samples in GISAID¹⁸. Variant specific mutations were used to recognize SARS-CoV-2 variants from wastewater. To use this pipeline in other time frames where other variants are present, the reference list needs to be updated accordingly.

For Delta B.1.617.2, 33 variant specific nucleotide mutations were identified. In total, 21 of these were nucleotide substitutions, of which 18 led to change of amino acid, two were synonymous, and one was an intergenic mutation. A total of 12 nucleotide deletions were identified, from which 11 led to the deletion of amino acids D119-, F120-, E156-, and F157-, and one nucleotide deletion was identified as an amino acid mutation (Table 2).

Table 2 Variant specific mutations identified for Delta B.1.617.2, Omicron BA.1, and Omicron BA.2 variants of the SARS-CoV-2 CoV-Spectrum platform, used to identify variants from wastewater samples.


For Omicron BA.1, a total of 37 variant-specific nucleotide mutations were identified. A total of 16 of these were nucleotide substitutions, of which 12 led to the mutation of amino acid, and four were synonymous mutations. The rest (n = 21) were nucleotide deletions, and 18 of these were mutations leading to amino acid deletion, while three were associated with amino acid change (Table 2).

A total of 53 Omicron BA.2 variant-specific nucleotide mutations were identified. Of these, 24 were nucleotide substitutions, of which 17 led to amino acid mutation, and seven were synonymous mutations. Twenty-nine nucleotide deletions led to amino acid deletion and one to amino acid change. A total of 17 were intergenic mutations (Table 2).

Detected SARS-CoV-2 variants in wastewater and their proportions

Based on the detected variant specific mutations, Delta B.1.617.2 was found in wastewater samples from the start of the current study (8.11.2021) until 17.1.2022, according to a hypergeometric test (p < 0.05, Table 3). Delta B.1.617.2 was the dominant variant in the Helsinki wastewater until 20.12.2021, when Omicron BA.1 became the variant with the highest proportion (p < 0.05) (Fig. 3). Omicron BA.1 was observed for the first time in the influent wastewater of Viikinmäki WWTP in Helsinki in a composite sample taken on 12–13.12.2021. Omicron BA.2 was observed for the first time on 24.1.2022 (p < 0.05). Both Omicron BA.1 and BA.2 were observed until the end of this study period (28.2.2022). On 28.2.2022, the Omicron BA.2 variant became more common than Omicron BA.1.

Table 3 Detected SARS-CoV-2 mutations found in Helsinki, Viikinmäki WWTP wastewater samples between November 2021 and February 2022. The p-value is from the hypergeometric test comparing the mutations found in the sample with the known mutations of each variant. When the p-value of the hypergeometric test was < 0.05, the variant was considered found. Underlining represents synonymous nucleotide mutations.


Notably, the Delta variant was detected by the hypergeometric test in the wastewater sample collected on 8.11.2021, even though only 22 of the 33 variant specific mutations were found (p < 0.05). This may be related to the 76.9% uniformity and 89.0% coverage of that sample, indicating that part of the genome did not have enough reads to assess Delta mutations. This indicates that a hypergeometric test is able to support the existence of the variant in the sample, even with partial genome coverage (Tables 1 and 3). Interestingly, 13 samples out of 16 contained mutations that were not assigned to the Delta, Omicron BA.1, or Omicron BA.2 variants (Table 3), indicating that the method also identified mutations that are not included in the list of variant specific mutations (Table 2).

Discussion

The present study introduces a complete protocol to identify variants of the SARS-CoV-2 virus and their proportions in wastewater samples. The protocol was tested with 24-h composite influent samples from the Viikinmäki Wastewater Treatment Plant in Helsinki, Finland. The composite wastewater samples were concentrated with ultrafiltration columns prior to nucleic acid extraction, followed by cDNA synthesis. cDNA was used to generate the ARTIC amplicon library, which was then sequenced with the Illumina NGS platform. Bioinformatics of the NGS data comprised mapping of reads to the SARS-CoV-2 reference genome with BWA-MEM, followed by primer trimming, detection of mutations, and calculation of allelic frequencies with iVar, and then quality control based on uniformity, coverage, and the number of mapped reads. Finally, the existence of the variant was evaluated with the hypergeometric distribution, followed by the calculation of variant proportions in the wastewater sample. The protocol presented here was used in Finland to assess virus variants and their proportions in the population, and could also be used to evaluate variants in wastewater elsewhere.

To identify the variant specific mutations, we utilized the CoV-Spectrum platform, which reports the percentage of amino acid and nucleotide mutations in the clinical sample sequences submitted to the database¹⁸. CoV-Spectrum has been used as a source of variant specific mutations for comparing clinical and wastewater data and for identifying variants in wastewater studies^26,27,28. In this way, we could systematically identify variant specific mutations, use them to recognize SARS-CoV-2 variants from wastewater sequencing data in the time frame of interest, and update the list of specific mutations to recognize other variants. To extend the pipeline to follow other properties of the virus in the population, mutations which might affect transmission or disease severity could be included in the reference list.

As wastewater is a difficult sample matrix affected by variations in the amount and composition of environmental factors and PCR inhibitors, this results in a situation where part of the target genome may not have sufficient coverage¹⁴. As the data presented herein shows, the whole genome may not be covered, even when uniformity percentage and genome coverage are at a sufficient level. Our strategy of using two RNA dilutions before cDNA synthesis from the same sample resulted in over 90% genomic coverage and over 85% uniformity on average, indicating a high success rate but also gaps in the sequencing data. In the optimal case, when sequencing reads are evenly distributed to the target genome and have at least 100× coverage, it should be possible to detect a virus proportion of about 3–6%²³. To evaluate the quality of sequencing, we utilized genome coverage percentage and uniformity as quality parameters. The use of one of these parameters alone could lead to a situation where the sample has sufficient mapped reads, but they are unevenly distributed; or reads are evenly distributed but the coverage is too low to identify mutations and evaluate the allelic frequency.

Similar to our study, many others have used iVar for primer clipping and identification of nucleotide mutations in SARS-CoV-2 wastewater studies^{9,23,29,30,31}. Identified mutations in the samples were systematically categorized into three categories in accordance with the number of mapped reads, sequencing depth, and iVar p-value to evaluate their capability to identify variants and calculate variant proportions in the samples. For the detection of mutations, we used the threshold of 20 mapped reads, as suggested by World Health Organization instructions for clinical samples to detect mutations²⁵. Since wastewater samples might contain more diverse mutations than clinical samples⁵, and some errors in sequencing are possible, the use of any lower threshold might lead to the detection of erroneous mutations. However, to calculate allelic frequency and later in evaluating proportions of the variant, we used > 100 reads per mutation as a coverage threshold to ensure sufficient coverage and reliable value for allelic frequency, as previously described by Rios et al.⁹. This coverage should be enough to detect a frequency of about 10%. However, higher coverage would reduce variation in the estimation of allelic frequency²³.

Evaluation of the presence of variants may be difficult if there are gaps in genome coverage and a variation in the number of variant specific mutations, which have been recognized as challenges in variant detection from wastewater⁵. To evaluate the presence of variants in the sample, we used a hypergeometric test, which has been previously used widely in bioinformatics analysis to evaluate, e.g., enriched pathways in gene expression data^32,33. With this strategy, the evaluation of the presence of variants in the wastewater sample is feasible, even if the sequencing data does not cover all genome areas.

During the period of the study, Delta B.1.617.2, Omicron BA.1, and Omicron BA.2 variants were identified in the influent wastewater of Helsinki. Overall, the identified variants and their proportions agree with clinical samples in the Helsinki area, indicating that this pipeline follows temporal and spatial variation of variants at the population level when visually comparing trends³⁴. We also found some mutations that were not on the list of variant-specific mutations. These mutations may be associated with the founder effect of some variant in Finland, or represent traits of new variants in the population³⁵. Alternatively, those mutations may be shared by two or more variants, or may have a frequency below 80% and thus not be included in the list of variant-specific mutations. This indicates that a limitation of our strategy is the identification of novel variants, since variants are recognized by predefined mutations. Discovering novel variants from wastewater may also be challenging due to the presence of several virus variants/types in a sample and short sequencing reads^5,12,35.

To conclude, the pipeline presented herein is suitable for detecting variants of SARS-CoV-2 from wastewater samples to follow spatial and temporal trends in the population. This pipeline could also be easily modified to detect variants of other pathogens, e.g., influenza or respiratory syncytial virus (RSV), by designing novel amplicon panels with an appropriate tool³⁶ and by replacing the reference genome and primer clipping files of this protocol with those corresponding to the primer panel in use.

Data availability

The amplicon sequencing data in FASTQ files generated in this publication have been deposited with links to BioProject accession number PRJNA1042787 in the NCBI BioProject database (https://www.ncbi.nlm.nih.gov/bioproject).

References

Nicola, M. et al. The socio-economic implications of the coronavirus pandemic (COVID-19): A review. Int. J. Surg. (London, England) 78, 185–193 (2020).
Tiwari, A. et al. Tracing COVID-19 trails in wastewater: A systematic review of SARS-CoV-2 surveillance with viral variants. Water 15, 1018 (2023).
Hachmann, N. P. et al. Neutralization escape by SARS-CoV-2 omicron subvariants BA.2.12.1, BA.4, and BA.5. N. Engl. J. Med. 387, 86–88 (2022).
Medema, G., Heijnen, L., Elsinga, G., Italiaander, R. & Brouwer, A. Presence of SARS-coronavirus-2 RNA in sewage and correlation with reported COVID-19 prevalence in the early stage of the epidemic in The Netherlands. Environ. Sci. Technol. Lett. 7, 511–516 (2020).
Smyth, D. S. et al. Tracking cryptic SARS-CoV-2 lineages detected in NYC wastewater. Nat. Commun. 13, 1–9 (2022).
Wu, F. et al. Making waves: Wastewater surveillance of SARS-CoV-2 in an endemic future. Water Res. 219, 118535 (2022).
Agrawal, S. et al. Prevalence and circulation patterns of SARS-CoV-2 variants in European sewage mirror clinical data of 54 European cities. Water Res. 214, 118162 (2022).
Pechlivanis, N. et al. Detecting SARS-CoV-2 lineages and mutational load in municipal wastewater and a use-case in the metropolitan area of Thessaloniki, Greece. Sci. Rep. 12, 1–12 (2022).
Rios, G. et al. Monitoring SARS-CoV-2 variants alterations in Nice neighborhoods by wastewater nanopore sequencing. Lancet Regional Health Europe 10, 100202 (2021).
Barbosa, M. R. F. et al. One-year surveillance of SARS-CoV-2 in wastewater from vulnerable urban communities in metropolitan São Paulo, Brazil. J. Water Health 20, 471–490 (2022).
Sutton, M. et al. Detection of SARS-CoV-2 B.1.351 (Beta) variant through wastewater surveillance before case detection in a community, Oregon, USA. Emerg. Infect. Dis. 28, 1101–1109 (2022).
Crits-Christoph, A. et al. Genome sequencing of sewage detects regionally prevalent SARS-CoV-2 variants. mBio. https://doi.org/10.1128/mbio.02703-20 (2021).
Bar-Or, I. et al. Detection of SARS-CoV-2 variants by genomic analysis of wastewater samples in Israel. Sci. Total Environ. 789, 148002 (2021).
Ahmed, W. et al. Minimizing errors in RT-PCR detection and quantification of SARS-CoV-2 RNA for wastewater surveillance. Sci. Total Environ. 805, 149877 (2022).
Hokajärvi, A.-M. et al. The detection and stability of the SARS-CoV-2 RNA biomarkers in wastewater influent in Helsinki, Finland. Sci. Total Environ. 770, 145274 (2021).
Tiwari, A. et al. Detection and quantification of SARS-CoV-2 RNA in wastewater influent in relation to reported COVID-19 incidence in Finland. Water Res. Oxford. 215, 118220 (2022).
Itokawa, K., Sekizuka, T., Hashino, M., Tanaka, R. & Kuroda, M. Disentangling primer interactions improves SARS-CoV-2 genome sequencing by multiplex tiling PCR. PLoS ONE 15 (2020).
Chen, C. et al. CoV-Spectrum: Analysis of globally shared SARS-CoV-2 data to identify and characterize new variants. Bioinformatics 38, 1735–1737 (2022).
Elbe, S. & Buckland-Merrett, G. Data, disease and diplomacy: GISAID’s innovative contribution to global health. Glob. Challenges 1, 33–46 (2017).
Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics (Oxford, England) 25, 1754–1760 (2009).
Andrews, S. FastQC: A Quality Control Tool for High Throughput Sequence Data. https://www.bioinformatics.babraham.ac.uk/projects/fastqc/.
Ewels, P., Magnusson, M., Lundin, S. & Kaller, M. MultiQC: Summarize analysis results for multiple tools and samples in a single report. Bioinformatics (Oxford, England) 32, 3047–3048 (2016).
Grubaugh, N. D. et al. An amplicon-based sequencing framework for accurately measuring intrahost virus diversity using PrimalSeq and iVar. Genome Biol. 20, 8 (2019).
Danecek, P. et al. Twelve years of SAMtools and BCFtools. GigaScience. https://doi.org/10.1093/gigascience/giab008 (2021).
World Health Organization. Genomic Sequencing of SARS-CoV-2: A Guide to Implementation for Maximum Impact on Public Health. https://www.who.int/publications-detail-redirect/9789240018440 (2021).
Caduff, L. et al. Inferring transmission fitness advantage of SARS-CoV-2 variants of concern from wastewater samples using digital PCR, Switzerland, December 2020 through March 2021. Eurosurveillance 27, 2100806 (2022).
Jahn, K. et al. Early detection and surveillance of SARS-CoV-2 genomic variants in wastewater using COJAC. Nat. Microbiol. 7, 1151–1160 (2022).
Kuroiwa, M. et al. Targeted amplicon sequencing of wastewater samples for detecting SARS-CoV-2 variants with high sensitivity and resolution. Sci. Total Environ. 893, 164766 (2023).
Amman, F. et al. Viral variant-resolved wastewater surveillance of SARS-CoV-2 at national scale. Nat. Biotechnol. 40, 1814–1822 (2022).
Izquierdo-Lara, R. et al. Monitoring SARS-CoV-2 circulation and diversity through community wastewater sequencing, the Netherlands and Belgium. Emerg. Infect. Dis. 27, 1405–1415 (2021).
Sapoval, N. et al. Enabling accurate and early detection of recently emerged SARS-CoV-2 variants of concern in wastewater. Nat. Commun. 14, 2834 (2023).
Gillespie, M. et al. The reactome pathway knowledgebase 2022. Nucleic Acids Res. 50, D687–D692 (2022).
Sherman, B. T. et al. DAVID: A web server for functional enrichment analysis and functional annotation of gene lists (2021 update). Nucleic Acids Res. 50, W216–W221 (2022).
Genomic surveillance of SARS-CoV-2-THL. Finnish Institute for Health and Welfare https://thl.fi/en/web/infectious-diseases-and-vaccinations/what-s-new/coronavirus-covid-19-latest-updates/coronavirus-variants/genomic-surveillance-of-sars-cov-2.
Karthikeyan, S. et al. Wastewater sequencing reveals early cryptic SARS-CoV-2 variant transmission. Nature 609, 101–108 (2022).
Quick, J. et al. Multiplex PCR method for MinION and Illumina sequencing of Zika and other virus genomes directly from clinical samples. Nat. Protoc. 12, 1261–1276 (2017).


Acknowledgements

cDNA amplification, library preparation, and sequencing were performed by the Sequencing Laboratory at the Institute for Molecular Medicine Finland FIMM Technology Centre, University of Helsinki, Finland. The authors are grateful to the personnel of Helsinki’s Viikinmäki Wastewater Treatment Plant for providing wastewater samples for this study. The authors wish to acknowledge the CSC-IT Center for Science, Finland, for their computational resources. This study was supported by the emergency support instrument (ESI) under Council Regulation (EU) 2016/369 as amended by Council Regulation (EU) 2020/521, which support Member States in establishing national systems, local collection points, and digital infrastructure for monitoring COVID-19 and its variants in wastewaters. The authors would like to express special thanks to Mirka Korhonen, Marjo Tiittanen, Tiina Heiskanen, Tarja Rahkonen, Eveliina Nurmi, Annika Laaksonen, and Kirsi Ristkari for their assistance in the laboratory, and Kristiina Valkama for her help with sample transportation arrangements.

Author information

A list of authors and their affiliations appears at the end of the paper.

Authors and Affiliations

Expert Microbiology Unit, Department of Health Security, Finnish Institute for Health and Welfare, Kuopio, Finland
Anssi Lipponen, Aleksi Kolehmainen, Anna-Maria Hokajärvi, Tarja Pitkänen, Ananda Tiwari & Anniina Sarekoski
Institute of Biomedicine, Faculty of Health Sciences, University of Eastern Finland, Kuopio, Finland
Anssi Lipponen
Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland
Sami Oikarinen, Kirsi-Maarit Lehto, Annika Länsivaara, Rafiqul Hyder & Erja Janhonen
Department of Food Hygiene and Environmental Health, Faculty of Veterinary Medicine, University of Helsinki, Helsinki, Finland
Annamari Heikinheimo, Tarja Pitkänen, Viivi Heljanko, Venla Johansson, Paula Kurittu, Ahmad Al-Mustapha & Anniina Sarekoski
Microbiology Unit, Laboratory and Research Division, Finnish Food Authority, Helsinki, Finland
Annamari Heikinheimo
Expert Microbiology Unit, Department of Health Security, Finnish Institute for Health and Welfare, Helsinki, Finland
Jani Halkilahti, Kirsi Liitsola, Soile Blomqvist, Carita Savolainen-Kopra & Kati Räisänen
Infectious Disease Control and Vaccinations Unit, Department of Health Security, Finnish Institute for Health and Welfare, Helsinki, Finland
Aapo Juutinen, Oskari Luomala & Teemu Möttönen
Department of Virology, Faculty of Medicine, University of Helsinki, Helsinki, Finland
Teemu Smura

Authors

Anssi Lipponen
Aleksi Kolehmainen
Sami Oikarinen
Anna-Maria Hokajärvi
Kirsi-Maarit Lehto
Annamari Heikinheimo
Jani Halkilahti
Aapo Juutinen
Oskari Luomala
Teemu Smura
Kirsi Liitsola
Soile Blomqvist
Carita Savolainen-Kopra
Tarja Pitkänen

Consortia

WastPan Study Group

Sami Oikarinen, Kirsi-Maarit Lehto, Annika Länsivaara, Rafiqul Hyder, Erja Janhonen, Annamari Heikinheimo, Viivi Heljanko, Venla Johansson, Paula Kurittu, Ananda Tiwari, Ahmad Al-Mustapha, Tarja Pitkänen, Anssi Lipponen, Anna-Maria Hokajärvi, Anniina Sarekoski, Aleksi Kolehmainen, Teemu Möttönen, Oskari Luomala, Aapo Juutinen, Soile Blomqvist, Kati Räisänen & Carita Savolainen-Kopra

Contributions

Conceptualization A.L., T.P., J.H., S.O., and C.S-K.; methodology A.L., A.K., S.B., K.L., S.O., and T.S.; software A.L., A.K., J.H., O.L., T.S., and A.J.; validation A.L., T.P., and A.K.; formal analysis A.L., O.L., A.J., and A.K.; investigation A.L., A.K., S.B., A-M.H., and K.L.; resources T.P. and C.S-K.; data curation A-M.H., A.L., and A.K.; writing – original draft A.L., A.K., O.L., and A.J.; writing – review & editing A.L., J.H., A-M.H., S.B., and T.P.; visualization A.L. and A.K.; supervision A.L., T.P., and C.S-K.; project administration A.L., T.P., and C.S-K.; and funding acquisition T.P. and C.S-K.

Corresponding author

Correspondence to Anssi Lipponen.

Ethics declarations

Competing interests

Sami Oikarinen and Kirsi-Maarit Lehto report a relationship with Greenseq Ltd that includes board membership. The other authors declare no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Lipponen, A., Kolehmainen, A., Oikarinen, S. et al. Detection of SARS-COV-2 variants and their proportions in wastewater samples using next-generation sequencing in Finland. Sci Rep 14, 7751 (2024). https://doi.org/10.1038/s41598-024-58113-8

Received: 26 November 2023
Accepted: 25 March 2024
Published: 02 April 2024
DOI: https://doi.org/10.1038/s41598-024-58113-8
