RT-qPCR-based tests for SARS-CoV-2 detection in pooled saliva samples for massive population screening to monitor epidemics

Swab-based RT-qPCR tests remain the gold standard for diagnosing SARS-CoV-2 infections. These tests are costly and have limited throughput. We developed a 3-gene, seminested RT-qPCR test with SYBR green-based detection, designed to be oversensitive rather than overspecific, for high-throughput diagnostics of populations. This two-tier approach depends on decentralized self-collection of saliva samples, pooling, 1st-tier testing with a highly sensitive screening test and subsequent 2nd-tier testing of individual samples from positive pools with the IVD test. The screening test was able to detect five copies of the viral genome in 10 µl of isolated RNA with 50% probability and 18.8 copies with 95% probability, and its Ct values depended highly linearly on RNA concentration. In the side-by-side comparison, the screening test attained slightly better results than the commercially available IVD-certified RT-qPCR diagnostic test DiaPlexQ (100% specificity and 89.8% sensitivity vs. 100% and 73.5%, respectively). Testing of 1475 individual clinical samples pooled in 374 pools of four revealed 0.8% false positive pools and no false negative pools. In weekly prophylactic testing of 113 people within 6 months, a two-tier testing approach enabled the detection of 18 infected individuals, including several asymptomatic individuals, with substantially lower cost than individual RT-PCR testing.


Results
Validation of primer design. Three detection primer pairs targeting the SARS-CoV-2 genes encoding RNA-dependent RNA polymerase (RdRp), Spike protein (Spike) and Nucleocapsid protein (N) were designed, together with a control primer pair targeting the human housekeeping gene glyceraldehyde-3-phosphate dehydrogenase (GAPDH). Bioinformatic analysis showed that the chosen viral detection primers reached 99.857% specificity across the analyzed non-SARS-CoV-2 Coronaviridae genomes (four false positive results) and 99.995% sensitivity towards SARS-CoV-2 genomes (four false negative results). A summary of this information is presented in the confusion matrix in Fig. 1.
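The reported percentages can be reproduced from the genome counts given in the primer verification methods (2802 non-SARS-CoV-2 genomes and 80,770 SARS-CoV-2 genomes). The short sketch below assumes that these counts are the denominators of the confusion matrix in Fig. 1; the function names are illustrative only.

```python
def specificity(true_negatives, false_positives):
    """Fraction of non-target genomes correctly left unamplified."""
    return true_negatives / (true_negatives + false_positives)


def sensitivity(true_positives, false_negatives):
    """Fraction of target genomes correctly amplified."""
    return true_positives / (true_positives + false_negatives)


# Counts taken from the text; their use as denominators is an assumption.
NON_SARS_GENOMES = 2802
SARS_GENOMES = 80770
FALSE_POSITIVES = 4
FALSE_NEGATIVES = 4

spec_percent = 100 * specificity(NON_SARS_GENOMES - FALSE_POSITIVES, FALSE_POSITIVES)
sens_percent = 100 * sensitivity(SARS_GENOMES - FALSE_NEGATIVES, FALSE_NEGATIVES)

print(f"specificity: {spec_percent:.3f}%")
print(f"sensitivity: {sens_percent:.3f}%")
```

Under this assumption, both values agree with the percentages quoted above to three decimal places.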

Criteria of SARS-CoV-2 detection in screening test.
Melting temperatures of specific qPCR products for viral genes: data from several independent experiments using several different template RNA sources (titrated standard RNA, RNA isolated from the BEI standard or RNA isolated from clinical samples) were analyzed. Data from samples with a high probability of being true positive measurements (both replicates similar, melting temperature (Tm) close to preliminarily measured values and cycle threshold (Ct) < 30 cycles) were extracted. Melting temperatures for individual genes were as follows (mean ± 3×SD (°C), number of data points): RdRp 78.25 ± 0.311, 140; Spike 80.27 ± 0.319, 140; N 80.72 ± 0.322, 140; GAPDH 78.99 ± 0.445, 344. Considering the screening purpose of the test, we intentionally chose slightly wider threshold values to keep the overall output oversensitive rather than overspecific, especially because weaker signals (higher Ct values) tended to deviate more. The final qualification rule for each gene was as follows: "to be considered specific, the measured melting temperature must lie within the following intervals (center of an interval ± deviation (°C)): 78.3 ± 0.35 for RdRp; 80.3 ± 0.35 for Spike; 80.7 ± 0.35 for N; 79.0 ± 0.50 for GAPDH". Representative melting curves are presented in Fig. 2a. The thermal cycler used in this study (LightCycler 480) always saves the colder peak as Tm1 and the warmer peak as Tm2 in the case of a two-product melting curve, regardless of the height of the peaks; when two products occurred, we checked both measured melting temperatures against the chosen threshold. In the next series of experiments, seven SARS-CoV-2-positive saliva samples were pooled in duplicate with randomly chosen negative samples at a pooling factor of four, six or eight (as described in "Materials and methods"), and RNA was isolated.
The screening test correctly identified 20/20 negative samples. Based on Ct values for the minimal RNA concentration that still gives a detectable positive signal (from the calibration curve experiment), a Ct value of 36 was chosen as the threshold above which amplification was considered nonspecific by default. In the case of the control human gene (GAPDH), Ct values were extracted from several independent experiments in which RNA isolated from individual or pooled saliva samples was tested. GAPDH from 243 tested samples had a mean Ct value of 20.53 with an SD of 1.61, and the highest Ct value reached in the tested population was 26.39. Therefore, we decided to use a slightly higher threshold value of 28.0, above which the result of the test is questionable, probably due to inefficient RNA isolation or an excessive inhibitor load in the sample.

Based on the results collected during the preparation of calibration curves, the probability of obtaining a positive result of the screening test, depending on the viral RNA concentration in the tested sample, was estimated. A logistic curve was fitted to the calculated probability values (Fig. 3b); the P50 value was reached at a concentration of 4.86 ± 0.11 genome copies per 10 µl of RNA sample (mean ± SEM), and the P95 value was 18.8 ± 0.75.

Next, we assessed the efficiency and reproducibility of viral RNA isolation. Seven representative saliva samples were used as carriers for 10^4, 10^5 and 10^6 viral particles/ml; subsequently, RNA was isolated according to a standard protocol, and a screening test was performed. Only 54% of viral gene replicates at a concentration of 10^4/ml were positive, and thus only the two higher concentrations (which were 100% positive) were used for further analysis. The results are presented in Fig. 3c. The median RNA isolation efficiency was 5.00%, with lower and upper quartiles of 1.33% and 11.09%, respectively.
This value is somewhat dependent on the concentration of viral particles in the source material (3.42× higher median for the 10^6/ml group than for the 10^5/ml group) and strongly dependent on individual saliva sample parameters (13.2× difference between the saliva samples with the highest and lowest efficiency). To complete the methodology of quantitative measurements, an experimental cross reactivity test with RNA isolated from four respiratory viruses was performed and revealed no amplification of the viral genes of the screening test. The results are presented in Fig. 3d. None of the viral gene replicates shown in the figure can be considered specific according to the assumed test thresholds (Tm and Ct values). GAPDH, the control gene, showed specific amplification because the RNA samples were derived from human cell cultures.
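The qualification rules described in this section (Tm windows per gene, a Ct limit of 36 for viral genes and 28.0 for GAPDH, and checking both melting peaks when two are reported) can be expressed as a short sketch. The function and constant names are hypothetical, and treating the interval bounds and Ct limits as inclusive is an assumption; the number of positive replicates required to call a pool positive is not covered here.

```python
# Tm windows from the text: (center, allowed deviation) in degrees Celsius.
TM_WINDOWS = {
    "RdRp": (78.3, 0.35),
    "Spike": (80.3, 0.35),
    "N": (80.7, 0.35),
    "GAPDH": (79.0, 0.50),
}

# Ct limits from the text: 36 for viral genes, 28.0 for the GAPDH control.
CT_LIMITS = {"RdRp": 36.0, "Spike": 36.0, "N": 36.0, "GAPDH": 28.0}


def replicate_is_specific(gene, ct, tm_peaks):
    """Return True if a single qPCR replicate passes the Ct and Tm criteria.

    ct is None when no amplification was recorded; tm_peaks holds one or two
    melting temperatures (Tm1 and, if present, Tm2), and any peak inside the
    window is accepted.
    """
    if ct is None or ct > CT_LIMITS[gene]:
        return False
    center, deviation = TM_WINDOWS[gene]
    return any(abs(tm - center) <= deviation for tm in tm_peaks)


if __name__ == "__main__":
    print(replicate_is_specific("RdRp", 30.0, [78.25]))         # single peak
    print(replicate_is_specific("Spike", 31.5, [75.0, 80.4]))   # two peaks
    print(replicate_is_specific("N", 37.2, [80.7]))             # Ct too high
```

For GAPDH the text describes results above the Ct limit as questionable rather than nonspecific; the sketch simply reports them as failing the check.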

Economic analysis of pooled screening test.
The cost of the pooled screening test is compared with that of individual, in vitro diagnostics (IVD)-certified diagnostic testing in Fig. 4. This analysis suggests that pooling of samples is economically justified only when the predicted proportion of positive cases in the tested population is below 30%. At a higher prevalence of infection, the cost of the pooling approach per sample exceeds the cost of individual diagnostic tests, and pooling negatively affects workflow organization (additional RNA isolation and detection steps), significantly delaying the results. However, in the case of lower prevalence, the presented approach may significantly decrease the cost of testing large groups of people. Comparing different pooling factors, the fourfold option seems to be the most cost effective, assuring the lowest overall cost in the widest range of higher positive case ratios and maintaining a very good baseline cost. Higher pooling factors present better economic outcomes only when screening a population with a relatively low prevalence of infection (< 2.5% of positive cases in the population). It should also be taken into consideration that the cost of pooled testing is highly sensitive to the false positive ratio of the screening test; therefore, such a test must be highly sensitive for screening purposes but also provide high specificity.
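The dependence of pooling cost on prevalence can be illustrated with the classical two-stage pooling calculation. This is a simplified sketch, not the cost model behind Fig. 4: it assumes an ideal test, independent and randomly distributed positives, and equal cost for a pool test and an individual test, and it counts tests only (no labor, isolation or disposables). The function names are hypothetical.

```python
def expected_tests_per_sample(prevalence, pool_size):
    """Expected number of tests per person in two-stage pooling.

    One pool test is shared by pool_size people; every member of a positive
    pool is then retested individually.
    """
    pool_positive = 1.0 - (1.0 - prevalence) ** pool_size
    return 1.0 / pool_size + pool_positive


def break_even_prevalence(pool_size):
    """Prevalence at which pooling needs as many tests as individual testing."""
    return 1.0 - (1.0 / pool_size) ** (1.0 / pool_size)


if __name__ == "__main__":
    for k in (4, 6, 8):
        print(k, round(break_even_prevalence(k), 3))
    for p in (0.01, 0.05, 0.30):
        print(p, [round(expected_tests_per_sample(p, k), 3) for k in (4, 6, 8)])
```

Under these assumptions, the break-even prevalence for pools of four is about 29%, in line with the threshold of roughly 30% stated above; the exact crossover points between pooling factors depend on the real cost structure and will differ from this test-count approximation.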
Validation of screening test in routine use. We performed multiple rounds of monitoring for SARS-CoV-2 infection among employees of the two institutions involved in the study (a total of 113 people who decided to take part). Testing was performed on a weekly basis. All positive pools were unraveled, and individual samples were tested using a reference test (IVD-certified commercial diagnostic test). In selected rounds, negative pools were also unraveled, and individual samples were tested with a reference test. The data on the performance of the screening test under such a scenario are presented in Fig. 5. The screening test showed a 0.80% false positive rate, defined as the number of pools assigned as positive by the screening test that did not contain at least one individual component categorized as positive or inconclusive in the reference test after unraveling the pool, divided by the total number of pools tested in screening. In 22 unraveled negative pools, there was no false negative, defined as a pool tested as negative with at least one individual component categorized as positive or inconclusive in the reference test.

Discussion
We have developed and characterized a qPCR-based method for the detection of SARS-CoV-2 dedicated to massive testing using pooled saliva samples. The test has been employed in practice in a two-tier approach, in which each pool positive in the screening test is unraveled and individually tested using a standard diagnostic test. The idea of massive screening of populations for the presence of viruses is being explored in different countries with different testing methodologies. One potential method of such testing is based on simple point-of-care rapid tests such as antigen tests 11,12 or loop-mediated isothermal amplification (LAMP)-based genetic tests 13,14. Such an approach has certain obvious advantages, such as decentralization of testing, fast turnaround of test results, and relatively low cost. On the other hand, available rapid individual tests do not have sufficient sensitivity to identify infected individuals at the earliest phase of infection (prior to onset of symptoms) and asymptomatic carriers 15. Massive screening of populations performed with the use of such tests has resulted in unsatisfactory outcomes in several countries, among which testing of almost the entire population of Slovakia is a well-publicized example 16. In contrast, testing of over 7 million people in the Chinese city of Qingdao in October 2020 with a PCR-based, pooled screening test made it possible to nip the epidemic in the bud 17. Comparison of the further development of the epidemics in Slovakia and Qingdao after the publication date of the articles cited above seems to further support the better outcome of the Qingdao screening 18. Our approach for massive PCR-based screening for SARS-CoV-2 is based on the detection of viral RNA in saliva. While nasopharyngeal swabs are still considered by the WHO as the diagnostic gold standard, there is already a substantial body of literature validating saliva as a diagnostic specimen.
Several clinical studies comparing nasopharyngeal swabs and saliva as diagnostic specimens have been performed on relatively large numbers of patients in different countries and have demonstrated comparable sensitivity and specificity of both diagnostic approaches 19-23. This conclusion is also supported by a systematic review of 28 original reports 24 and by a meta-analysis of clinical data from 16 clinical trials performed on 5922 patients, which showed 83.2% sensitivity of saliva-based tests compared to 84.8% for nasopharyngeal swabs 25. Given the comparable sensitivity, it should be stressed that saliva-based testing has a significant advantage over swabs in screening children in schools, where the discomfort of performing swabs may prevent compliance with routine testing 26-28. It is also preferable in terms of mitigating the risk of exposure of medical personnel to infection, as self-collected samples can be delivered directly to designated drop-off points, which allows scattered sample collection without involving additional healthcare workers.
The proposed seminested qPCR with SYBR green-based detection of viral RNA, based on the combination of the Ct threshold and range of Tm (Fig. 2a,b), showed high sensitivity and specificity. The sensitivity allows for the detection of five copies of the viral genome (probability > 50%) and 95% probability of detection of 18.8 copies of the genome in 10 µl of isolated RNA, which is comparable to the sensitivity of qPCR detection based on fluorescent hybridization probes. According to the manufacturer manuals of both IVD reference tests used in this study (DiaPlexQ (SolGent) and Triplex (Vazyme)), both tests have a limit of detection of 200 genomes per milliliter of sample prior to RNA isolation (2 copies/10 µl) in nasopharyngeal swabs. The screening test in saliva showed a detection threshold of 27 copies/10 µl. Direct comparison of published sensitivity thresholds is challenging because of the different RNA isolation methods and diagnostic specimens employed. Additionally, the limit of detection values published for a group of diagnostic assays officially approved until mid-2020 varied by 10,000-fold 29. However, in side-by-side comparison, our test in saliva-based testing performs as well as Triplex (data not shown) and slightly better than DiaPlexQ. Our data suggest that practical test sensitivity will be limited by the RNA isolation step; for samples with a lower number of viral particles, isolation may limit detection even to 5% of the sensitivity observed with the reference RNA solution and allow the detection of approximately 27 viral particles in 10 µl of saliva (> 50% chance of detection) (Fig. 3c). Relatively high sensitivity is matched by high specificity demonstrated in silico (Fig. 1) and experimentally (Fig. 3d). This method also showed good linearity within a wide range of viral RNA concentrations (Fig. 3a) and viral load (Fig. 3c).
This is expected because SYBR green-based qPCR is well suited for quantification of nucleic acids, and it makes the proposed method applicable in a quantitative approach for detecting the viral load in SARS-CoV-2-infected patients. In particular, quantitative measurements of SARS-CoV-2 RNA concentration in saliva are well correlated with the clinical outcome of infection 28,30. High sensitivity of detection of viral RNA allowed for effective application of the test to pooled biological samples, in which it demonstrated 100% specificity and 89.8% sensitivity in saliva pools with pooling factors of 4, 6, and 8. This is consistent with published data on the application of fluorescent hybridization probe-based qPCR tests to pooled saliva samples, where there was 90-94% conformity with individual testing 31. The test has already been implemented on a small scale for the routine testing of a group of 113 employees with promising results as the first tier of a two-tier testing approach. With such an approach, testing pooled saliva samples shows significant cost savings compared to individual testing based on commercially available IVD RT-qPCR tests (Fig. 4). Our results revealed that pooling saliva samples for the detection of SARS-CoV-2 infection supports testing with substantial cost savings, especially at lower prevalence levels. Similar results have been obtained by other authors 23,32,33. The best time and cost efficiency is achieved when a 96-well plate is filled with samples (360 individual samples pooled by 4, and controls). Testing as few as 12 pools is still cost efficient compared with individual tests but does not result in any time saving. Therefore, the presented method is most suitable for routine repetitive testing of selected populations (e.g. students, workers etc.) when a large number of samples is collected at the same time. In the case of fewer samples or samples from high prevalence populations (e.g. sets of samples from an infectious diseases hospital), the entire pooling protocol may be omitted and samples tested individually at comparable cost but faster. Implementation of the developed test for routine testing generated data for 1475 individual samples combined into 374 pooled samples that demonstrated very good test performance, with 0.8% false positives and undetectable false negatives for 88 individually analyzed samples (Fig. 5). The lack of false negatives suggests that our pooled saliva-based qPCR testing may have a much lower false negative ratio than the level of 10% previously predicted for saliva pools with pooling factors of 2, 4, 8, 16 and 32 34. However, the calculated numbers of false positives and, especially, false negatives must be treated with caution, as a relatively low number of true positives was detected and not all samples were tested for false negativity. It is worth mentioning that routine testing based on this test several times led to the detection of SARS-CoV-2 infection in asymptomatic persons, preventing them from coming to work and most likely decreasing the risk of virus transmission among coworkers. This observation, which is difficult to control objectively but was repeatable, is well aligned with the output of an in silico model of epidemics in hospitals predicting that weekly screening by saliva-based PCR tests would be able to detect 95% of symptomatic and 30% of asymptomatic SARS-CoV-2 infections and reduce the number of new infections 35,36. Furthermore, the cumulative percentage of employees identified in prophylactic screening as SARS-CoV-2-positive was 15.9%, which is nearly threefold larger than the 6.21% of cumulative positive cases for the whole Polish population as of April 1, 2021 18 and for a particular region (Voivodeship of Lodz with 6.53% of cumulative positive cases vs. 6.95% of the mean for the whole country, according to data from the Polish Ministry of Health as of April 15, 2021).
At the same time, the average percentage of positives per number of individual tests was 1.22%, which is well below the 5% maximum threshold suggested by the WHO as a prerequisite for effective monitoring of epidemic dynamics 37. This is once again in contrast with the percentage of positives reported in diagnostic tests in Poland, which has continuously exceeded 20% since the middle of October 2020, with a peak of 59% on 16.11.2020 and a temporary fading period, with a positive level of approximately 10-15%, at the turn of January and February of 2021 (Polish Ministry of Health). Taken together, these data support the validity of the proposed analytical approach for prophylactic screening of selected populations for infection during COVID-19 epidemics.

Choice of conserved regions in SARS-CoV-2 genome. First, 94,155 genome sequences were downloaded from the GISAID database, including 94,139 SARS-CoV-2 and 16 non-SARS-CoV-2 genomes (Appendix 1, access 07.09.2020). Then, one reference genome (NC_045512.2) was chosen, from which sequences of the RdRp, Spike and N genes were selected. Using BLAST 2.9.0+ (default parameters) and reference gene sequences as queries, the desired genes were extracted from the remaining SARS-CoV-2 genomes. Sequences were aligned in MAFFT (v7.310) with adjustment of direction and the FFT-NS-1 method to build a full MSA (multiple sequence alignment) and to find conserved regions among the selected sequences. Moreover, as a control for the reaction, consensus sequences of the human GAPDH gene were extracted from 5 transcriptional variants: 1, 2, 3, 4 and 7 (NM_002046.7, NM_001256799.3, NM_001289745.3, NM_001289746.2 and NM_001357943.2).

Primer design and verification. Sets of primers (Table 1) targeting the mentioned fragments were designed using the open access software UGENE (v.35). Then, the primers' sensitivity and specificity were verified. For this purpose, all Coronaviridae genomes marked as complete were downloaded from the nucleotide database (NCBI). First, non-SARS-CoV-2 genomes were chosen for specificity analysis, resulting in 2802 sequences. Then, from all SARS-CoV-2 genomes, any incomplete duplicates or sequences shorter than 90% of the reference NC_045512.2 genome were deleted, and 80,770 SARS-CoV-2 genomes were used for the sensitivity analysis. In silico PCR was conducted 38 with the mismatch threshold set to 1 bp. In this way, a set of primers (both seminested and detection primers) was chosen. Primers were synthesized by Genomed (Warsaw, Poland) and purified by high-performance liquid chromatography. Parameters of primers (GC content, Tm and self-complementarity) were analyzed in Oligo Calc 39. Primers used in the screening test were routinely premixed and stored as ready-to-use stock solutions with each primer at a concentration of 10 µM in ultrapure water; details are listed in Table 2.

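The idea of in silico PCR with a mismatch threshold can be illustrated with a minimal sketch. This is not the tool cited in the text; it is a simplified stand-in that assumes the threshold applies to each primer separately, ignores insertions and deletions, and uses short made-up sequences. All names and the product length limit are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A, C, G, T only)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def count_mismatches(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def find_sites(template, primer, max_mismatches=1):
    """Start positions where the primer matches within the mismatch limit."""
    k = len(primer)
    return [
        i
        for i in range(len(template) - k + 1)
        if count_mismatches(template[i:i + k], primer) <= max_mismatches
    ]


def amplifies(template, forward, reverse, max_mismatches=1, max_product=1000):
    """True if both primers bind in the right orientation within max_product."""
    forward_sites = find_sites(template, forward, max_mismatches)
    reverse_sites = find_sites(template, reverse_complement(reverse), max_mismatches)
    for f in forward_sites:
        for r in reverse_sites:
            product_length = r + len(reverse) - f
            if r >= f + len(forward) and product_length <= max_product:
                return True
    return False


if __name__ == "__main__":
    # Made-up toy sequences, for illustration only.
    fwd = "GATTACAGATTACA"
    rev = "TTGCCGGATCAC"
    genome = "AAAA" + fwd + "CCCCCCCC" + reverse_complement(rev) + "TTTT"
    print(amplifies(genome, fwd, rev))
```

Running such a check over every non-SARS-CoV-2 genome (hits count as false positives) and every SARS-CoV-2 genome (misses count as false negatives) yields the kind of confusion matrix summarized in Fig. 1.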
Design of seminested RT-PCR. The first step of the procedure consisted of reverse transcription of viral RNA and 20 cycles of multiplex PCR. First, 5 µl of Mastermix 1 (3.75 µl of water; 0.25 µl of Primer mix 1; 1 µl of dNTPs mix) and 10 µl of RNA solution were added to each well in a 96-well PCR plate. The plate was sealed,

RpRp_F   GAA ATC AAT AGC CGC CAC TAGAG     153  SARS-CoV-2  Detection
RpRp_R   GGC ATG GCT CTA TCA CAT TTAGG     153  SARS-CoV-2  Detection
Spike_F  AGA AGT CCC TGT TGC TAT TCA TGC   199  SARS-CoV-2  Detection
Spike_R  TGC CCG CCG AGG AGA ATT AGT       199  SARS-CoV-2  Detection
N_F      CGC GAT CAA AAC AAC GTC GGC       96   SARS-CoV-2  Detection
N_R      GGA ATT TAA GGT CTT CCT TGC CAT G

Cross reactivity with RNA of selected respiratory viruses. Genomic RNA standards of human respiratory viruses were used according to the manufacturer's instructions. 5 µl of HCoV 229E, 10 µl of HCoV NL63, 2 µl of RSV A2001/3-12, and 5 µl of RSV B1 were added per reaction and tested according to the main protocol of the method. As a positive control, additional reactions were performed with primers specific for these viruses (Table 1). In control reactions, RNA was reverse transcribed to cDNA with a single reverse primer. The first PCR step was omitted, and 3 µl of undiluted cDNA sample was directly used as a template for qPCR.

RNA-based Ct calibration curve.
Titrated SARS-CoV-2 RNA isolated from infected Vero cell culture was serially diluted in ultrapure water. At each concentration, reverse transcription coupled with seminested PCR was performed in duplicate, followed by qPCR in triplicate for each gene, resulting in 18 qPCR replicates in total and six for each individual gene. Only data points with correct melting temperatures were employed for further analysis. Ct values were plotted against logarithmic concentration of viral RNA, and linear regression curves were fitted to data using GraphPad Prism software.
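The calibration step can be sketched as an ordinary least-squares fit of Ct against log10 of the RNA concentration, which can then be inverted to estimate the number of copies from a measured Ct. The curves in this study were fitted in GraphPad Prism; the sketch below is only an equivalent illustration, and the data points in it are invented for demonstration, not measured values.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def copies_from_ct(ct, slope, intercept):
    """Invert the calibration line: Ct -> genome copies per reaction."""
    return 10 ** ((ct - intercept) / slope)


if __name__ == "__main__":
    # Invented illustrative data: copies per 10 µl and the corresponding Ct.
    copies = [10, 100, 1000, 10000]
    cts = [33.0, 29.7, 26.4, 23.1]
    slope, intercept = fit_line([math.log10(c) for c in copies], cts)
    print(f"slope = {slope:.2f}, intercept = {intercept:.2f}")
    print(f"Ct 28.0 corresponds to about {copies_from_ct(28.0, slope, intercept):.0f} copies")
```

Only replicates with a correct melting temperature would enter such a fit, as stated above.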
Establishing criteria for SARS-CoV-2 detection in screening tests. First, melting curves from both reference viral RNA standards and from clinical samples with high probability of being true positives were used to determine the melting temperature range for specific PCR products in qPCR outcome for viral gene detection. Next, the Ct threshold distinguishing positive from negative qPCR outcomes for viral gene detection was determined by parallel analysis of RNA from a set of pooled saliva samples with a screening test and reference test (DiaPlexQ). To this end, 7 representative SARS-CoV-2-positive saliva samples (with different viral loads) and 20 negative saliva samples were used. Each positive sample was pooled by combining 3, 5 or 7 randomly chosen negative samples. Each pool was prepared in duplicate (but with different negative samples). All prepared pools as well as individual saliva samples (positive and negative) were isolated according to the described protocol and tested with the screening and reference test. Data from this experiment were also employed to determine the number of positive qPCR replicates needed to consider the clinical pooled saliva sample as positive.
Estimation of the probability of obtaining a positive result. Data points from calibration curve preparation were extracted. For each tested concentration of viral RNA, the number of positive replicates of viral genes (N, Spike and RdRp), determined according to the criteria of SARS-CoV-2 detection in the screening test, was divided by the total number of tested viral replicates (n = 18). The result is an observed probability (p) of obtaining a positive replicate of the viral gene at a given concentration of viral RNA. As the criteria of the screening test assign to each replicate only a value of 1 or 0, we assumed that the variable followed a Bernoulli (binomial) distribution. We used the normal distribution approximation approach to calculate the 95% confidence interval (CI) of the data: where p is an observed probability for a single replicate and p positive is the probability for the whole test to give a positive result. Newly calculated intervals were plotted against the logarithmic concentration of viral RNA, and a four-parameter logistic curve was used to calculate P50 and P95 (the concentrations at which the test has a 50% and 95% chance of giving a positive outcome, respectively). Curve fitting was performed in GraphPad Prism software.
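The normal approximation interval for a binomial proportion mentioned above is commonly written as p ± z·sqrt(p(1 − p)/n). The sketch below assumes this standard (Wald) form with z = 1.96 and n = 18 replicates; the conversion from the single-replicate probability to the probability for the whole test is not reproduced here, because it depends on the replicate rule of the screening test. The function name is hypothetical.

```python
import math


def wald_interval(positives, n=18, z=1.96):
    """Observed proportion and its normal-approximation 95% CI, clipped to [0, 1]."""
    p = positives / n
    margin = z * math.sqrt(p * (1.0 - p) / n)
    return p, max(0.0, p - margin), min(1.0, p + margin)


if __name__ == "__main__":
    for k in (0, 5, 9, 18):
        p, low, high = wald_interval(k)
        print(f"{k}/18 positive: p = {p:.3f}, 95% CI = ({low:.3f}, {high:.3f})")
```

Note that this approximation collapses to a zero-width interval when all or none of the replicates are positive, which is a known limitation of the method at the extremes.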

RNA isolation efficiency. From a group of SARS-CoV-2 negative (confirmed by 2019-Novel Coronavirus (2019-nCoV) Triplex RT-qPCR diagnostic test, Vazyme), heat-inactivated, frozen saliva samples, 21 specimens were randomly chosen. Subsequently, samples were divided into seven three-element groups, and saliva within each group was pooled in equal volumes and mixed, forming seven averaged saliva representations. Each pooled sample was divided into four 500 µl aliquots in test tubes, and 10 µl of either control (Hanks' balanced salt solution (HBSS)) or the appropriate dilution of standard (titrated, heat-inactivated SARS-CoV-2 virus in HBSS) was added. Next, RNA was isolated, and a screening test was performed. RNA isolation efficiency was calculated based on measured Ct values and the Ct calibration curve; the theoretical viral load was calculated for each gene replicate. Subsequently, that value was multiplied by the volume coefficient of 26.7 (which converts [n/10 µl of RNA] into [n/ml of saliva]: 10 µl of RNA sample out of 80 µl of RNA isolate from 300 µl of saliva sample) and divided by the initial viral load in the saliva sample.
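The efficiency calculation described above can be written out as a short sketch. The volumes (10 µl of RNA per reaction, 80 µl of RNA isolate, 300 µl of saliva) come from the text; the function names and the example input values are hypothetical.

```python
def volume_coefficient(isolate_ul=80.0, reaction_ul=10.0, saliva_ul=300.0):
    """Factor converting copies per reaction into copies per ml of saliva."""
    return (isolate_ul / reaction_ul) * (1000.0 / saliva_ul)


def isolation_efficiency(copies_per_reaction, initial_copies_per_ml):
    """Recovered fraction of the viral load originally added to the saliva."""
    recovered_per_ml = copies_per_reaction * volume_coefficient()
    return recovered_per_ml / initial_copies_per_ml


if __name__ == "__main__":
    print(f"volume coefficient: {volume_coefficient():.1f}")
    # Hypothetical example: 18.75 copies measured in 10 µl of RNA from saliva
    # spiked with 10^4 viral particles per ml.
    print(f"efficiency: {100 * isolation_efficiency(18.75, 1e4):.2f}%")
```

Here copies_per_reaction stands for the viral load read from the Ct calibration curve for a single gene replicate.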

Economic analysis of pooled screening test. Center of Molecular Diagnostics of Pathogens, Proteon Pharmaceuticals S.A. (Lodz, Poland) provided information on costs included in testing viral samples according to the gold standard diagnostic approach (based on commercially available one-step RT-qPCR diagnostic kits, isolation of RNA, disposables, personal protective equipment (PPE), labor and other direct costs of diagnostics).
The summarized costs were taken as the 100% reference, i.e. the whole cost of a standard diagnostic test per sample. Subsequently, the cost of the screening test was calculated based on the following premises: (1) positive cases were randomly and evenly distributed among the tested population; (2) the test had ideal sensitivity and specificity; and (3)