Accuracy of self-collected versus healthcare worker collected specimens for diagnosing sexually transmitted infections in females: an updated systematic review and meta-analysis

The use of self-collected specimens as an alternative to healthcare worker-collected specimens for diagnostic testing has gained increasing attention in recent years. This systematic review aimed to assess the diagnostic accuracy of self-collected specimens compared to healthcare worker-collected specimens across different sexually transmitted infections (STIs) including Chlamydia trachomatis (CT), human papillomavirus (HPV), Mycoplasma genitalium (MG), Neisseria gonorrhoea (NG), Treponema pallidum and Trichomonas vaginalis (TV) in females. A rigorous process was followed to screen for studies in various electronic databases. The quality of included studies was assessed using the Quality Assessment of Diagnostic Accuracy Studies 2 tool. There were no studies on syphilis that met the criteria for inclusion in the review. A total of six studies for chlamydia, five studies for HPV, four studies for MG, and seven studies for gonorrhoea and trichomoniasis were included in the review. However, not all studies were included in the sub-group meta-analysis. The analysis revealed that self-collected specimens demonstrated comparable diagnostic accuracy to healthcare worker-collected specimens across most STIs. This indicates that the diagnostic accuracy of self-collected specimens can provide accurate results and enhance access to diagnostic testing, potentially improving healthcare service delivery. Future research should further explore the diagnostic accuracy of self-collected specimens in larger and more diverse populations.

healthcare worker collected specimens as the reference or gold standard, (c) the study population comprised of specimens that had been tested for STIs including HPV, NG, CT, Treponema pallidum (syphilis), Trichomonas vaginalis (TV), and Mycoplasma genitalium (MG), (d) examined self-collected versus clinician-collected samples using different diagnostic assays including nucleic-acid-based assays, and manual methods that included wet mount, culture, and gram stain peer-reviewed studies published in 2015 and onwards to diagnose STIs.Data on investigations conducted on females was extracted from studies that include people of another gender.There were no language restrictions applied and studies with different study designs were included.Studies were excluded if: (a) the time of self-sampling and healthcare worker specimen collection exceeded three weeks due to the window period for seroconversion, (b) presented information on combined specimen results, (c) self-sampling was not conducted in females, (d) self-sampling and healthcare worker collected specimen was collected from different individuals.

Index test
The diagnostic accuracy of self-collected specimens to diagnose STIs was evaluated against healthcare worker specimens.Self-collected specimens for STI diagnosis included vaginal swabs, urine, cervical swabs and tampons.The sensitivity and specificity of each diagnostic assay for each STI were evaluated.

Reference standard
Healthcare worker-collected specimens for the diagnosis of STIs were used as the gold reference standard in this study.

Search strategy
A systematic search of data was conducted in Cochrane, Medline, Scopus, Web of Science, and PubMed electronic databases (see Table 1).The search was limited to studies from 2015 onwards.The Principal Investigator (PI) developed the search strategy with an experienced librarian at the University of Pretoria.Medical Subject Headings (MeSH) terms were used to define our searches with Boolean operators (AND/OR) between search terms.The search terms used included but were not limited to (1) "Self-sampling" or "self-collected" or 'selfadministered" or "self-obtained" (2) "sexually transmitted infections" (3) "diagnostic specimens" or "diagnostic samples" (4) "women" or "females".A hand search for grey literature was also conducted on the WHO website, the Department of Health South Africa (DoH SA), and the Open Grey website.

Study selection
Screening of studies suitable for inclusion in the systematic review and meta-analysis was conducted on the studies between 2015 and 2022.Since this systematic review stems from the findings of a scoping review which was conducted in 2021.Studies which had been screened for the scoping review from 2015 to 2021 were rescreened using eligibility criteria for the systematic review.To ensure the inclusion of studies conducted in 2022, the assisting librarian conducted a new search for studies that were published in 2022.An EndNote library was then created for all studies that were eligible for full-text screening.Thereafter, ZNJ and TD performed fulltext screening of all studies that fulfilled the eligibility criteria of the systematic review and meta-analysis.NT resolved discrepancies that arose during full-text screening by ZNJ and TD.Thereafter, ZNJ and NT extracted data from studies found eligible for inclusion at the full-text screening stage.Thereafter, any disagreements were resolved by discussion until an agreement was reached.Study selection for the systematic review was guided by the PRISMA flowchart.

Data extraction
ZNJ and NT independently extracted data from eligible studies using a data extraction tool that was designed to extract data from the included primary studies.The tool was piloted using 10% of the included studies and amended accordingly before final use.The extracted data was divided into two separate sections namely a section for basic qualitative information and another section for the quantitative outcomes of interest.Basic information extracted included author name(s) and year of publication, study title, study aims, study population, study design, sample size, eligibility criteria, reference standard specimen, type of self-collected specimen, type of laboratory assay, main findings, and conclusions.Data extracted for the section on the outcome of primary studies true positive, true negative, false positive, false negative, sensitivity and specificity, positive predictive value, negative predictive value, and evidence of agreement or concordance between self-collected and healthcare worker collected specimens.In some instances, the true negative, true positive false positive and false negative results were not available, and the relevant data was requested from the authors.A 2 × 2 table was produced based on the collected data.Any discrepancies that arose between the reviewers were discussed until a unanimous resolution was reached.

Assessment of methodological quality
The Quality Assessment of Diagnostic Accuracy Studies 2 (QUADAS-2) tool for primary diagnostic accuracy studies, was utilised to assess the quality of all the included studies 25 .This tool consists of four main domains that include patient selection, index test, reference standard, and flow and timing 25 , which were adapted to the current study accordingly.To determine the risk of bias, signalling questions answered as "yes" "no" or "unclear", were used in each phase 25

Statistical analysis and data synthesis
For included studies in which sensitivity and specificity had been assessed and reported a meta-analysis of diagnostic accuracy was performed.The Review Manager (RevMan) software was used to conduct statistical analysis.The RevMan software was also used to calculate the pooled sensitivity, specificity, and diagnostic odds ratio with a 95% confidence interval.Cochran's Q statistics were utilised to determine heterogeneity among the included primary studies.Statistical significance in all the analyses was calculated using the p-value where a p-value of < 0.05 indicated statistical significance.

Ethical approval
Ethical clearance for the study was obtained from the University of Pretoria's Faculty of Health Sciences Research Ethics Committee.The reference number is 136/2022.Participant consent was not applicable.

Study selection and characteristics of included studies
Sixteen studies conducted in 2015, which were retrieved during a database search for the scoping review underwent title screening using the relevant eligibility criteria for the systematic review.For the new database search conducted by the librarian to ensure the inclusion of studies in Aug 2022, forty-eight search results were retrieved.Nine were duplicates, which left only thirty-nine eligible for title screening.The abstract screening was then conducted on fifty-five studies (thirty-nine plus sixteen studies).Post abstract screening, thirty-seven studies ( sampling OR sample OR "self sampling" OR "self sample" OR "sti testing" OR "sti diagnosis" OR "sexually transmitted infections test*" OR "self-collect*" OR "sexually transmitted disease testing*" ) AND ( (MH "Sexually Transmitted Diseases + ") OR "Sexually Transmitted Disease*" OR "sexually transmitted infection*" OR STI OR STD ) AND ( "Specimen Handling" OR (MH "Specimen Handling + ") ) NOT ( (MH "HIV") OR (MH "Acquired Immunodeficiency Syndrome") OR aids OR "HIV Infections" OR hiv OR "human immunodeficiency virus" OR "acquired immunodeficiency syndrome" ) www.nature.com/scientificreports/were excluded and only eighteen studies were eligible for data extraction.Reasons for exclusion were studies presenting data on pooled specimens, studies not presenting data on self-collected and healthcare worker collected specimens, and studies not about self-sampling STIs.Post full text screening of the studies only fourteen were eligible for inclusion in the systematic review.Four studies were excluded for being conducted before 2015, studies not about self-sampling, not about STIs, and a study presenting data on pooled specimens.Ultimately, data extraction was conducted on a total of fourteen studies (see Fig. 1 below).There was moderate agreement between the reviewers at full-text screening (kappa = 0.5).

Characteristics of included studies
The characteristics of included studies are all depicted in Table 2. Fourteen studies were included in the systematic review but not all of them were included in the meta-analysis.A large portion of the studies, five studies, were from the United States of America (USA) [26][27][28][29][30] , one study in Canada 31 , one in Haiti 32 , one in France 33 , one study in Saudi Arabia 34 , one in India 35 , one in the Republic of Korea 36 , one study in Kenya 37 , one in Chad 19 , one study in Ghana 38 .See Table 3 for quantitative characteristics of included studies.It is important to note that some of the sensitivity and specificity measurements were obtained from the articles as calculated by the authors.However, where the measurements were not available, the researchers calculated using data that was already available on the manuscripts and original data obtained from authors of some of the included studies.Furthermore, for studies where this information was not available at all, it was not reported.The characteristics of the included studies were further divided into sub-groups for meta-analysis for each STI as outlined in the following sections: Chlamydia A total of six studies compared the diagnostic accuracy of self-collected specimens to healthcare worker collected specimens in females 19,27,30,31,33,37 .Five of the studies were conducted in a clinic 19,27,31,33,37 , and study location was not reported for one of the studies 30 .Of these six studies, three of them compared healthcare worker collected vaginal swabs to self-collected vaginal swabs 30,31,33 .In two of the studies healthcare workers collected cervical swabs were compared to self-collected cervicovaginal swabs 27,37 .Only one study compared healthcare worker collected endocervical swabs to self-collected veil specimens 19 .STI testing was performed using automated NAAT based assays.All six studies were cross-sectional studies.In five of the studies, research participants had received instructions on how to self-collect specimens for testing 19,27,31,33,37 , and in one study the research participants did not receive any instructions 30 .The number of research participants in the studies ranged from 189 to 3860.Only four of the six studies were included in the subgroup meta-analysis 19,30,31,33  www.nature.com/scientificreports/study was excluded because only agreement data was reported and the other parameters were not reported 37 .
Similarly, the other study only reported sensitivity and specificity data 27 .Figure 2 presents research findings for the subgroup analysis of four studies, where the summary estimate for sensitivity was 0.85 (95% Confidence  Interval 0.77-0.92),while specificity was 0.95 (95% Confidence Interval 0.91-0.98).The SROC plot (Fig. 3) is a depiction of the pooled sensitivity and specificity of the studies.
The studies show statistical significance in the studies, but there is moderate evidence of heterogeneity among the studies.The diagnostic tests have a good discriminatory ability to differentiate between individuals with and without chlamydia (Table 4).

Human papilloma virus
Five studies compared the diagnostic accuracy of healthcare worker collected specimens with self-collected specimens to diagnose HPV 27,28,32,36,38 .Three of the studies compared healthcare worker collected cervical swabs were compared to self-collected vaginal swabs 32,36,38 , while one study compared healthcare worker collected   www.nature.com/scientificreports/cervical swabs with self-collected tampons and vaginal swabs 28 , and another study compared healthcare worker collected cervical swab with self-collected cervicovaginal swabs 27 .All the studies were conducted in a research clinic.The sample size of the studies ranged from 151 to 1836 study participants.Study participants received instructions on how to self-collect their specimens for STI diagnosis, prior to specimen collection.NAAT based diagnostic assays were used in all the studies.Four of the studies were cross-sectional studies 28,32,36 , and only one was a clinical trial 27 .In one study, the sensitivity and specificity of self-collected specimens was 100 and 88.9% respectively, while healthcare worker collected diagnostic result sensitivity and specificity were 100 and 90% respectively 27 .In another study, the sensitivity and specificity of self-collected specimens compared to healthcare worker collected specimens were 92.6 and 95.9% respectively 38 .One study reported the sensitivity of self-collected specimens as 100% 36 .Another study reported the sensitivity and specificity of only self-collected swab as 86 and 94% respectively, while for the self-collected tampon it was 77 and 100% respectively 28 .Another study reported sensitivity results of self-collected specimens as 89.1% and sensitivity of healthcare workers collected specimens as 87.9% 32 .However, a sub-group meta-analysis was not performed because the relevant data for TN, FN, TP and FP was not available.

Mycoplasma genitalium
Out of the four studies that investigated MG infection, two studies compared self-collected cervicovaginal swabs with healthcare worker collected cervical swabs 27,37 ; one study compared healthcare worker collected vaginal and cervical swabs with self-collected vaginal swabs 33 , and another one compared healthcare worker collected endocervical swabs with self-collected veil specimens 19 .Diagnostic testing was performed using NAAT based assays in all the studies.All the studies were conducted in clinics.In all the studies, research participants received instructions on how to self-collect specimens before collecting their own specimens.The sample size ranged from 193 to 1028 participants.All studies were cross-sectional.Only two of the included studies had sufficient data for a meta-analysis for this subgroup 19,33 .Figure 4 presents the analysis of the two studies where the summary estimate for sensitivity was 0.49 (95% Confidence Interval 0.39-0.58)and for specificity was 0.88 (95% Confidence Interval 0.81-0.94).Presented below in Fig. 5 is the SROC plot depicting the diagnostic accuracy of the studies in this subgroup.
The sub-group meta-analysis suggests that the accuracy of the diagnostic test may vary across studies, with poor sensitivity in one study and poor specificity in the other.However, overall, the test shows a moderate to high diagnostic accuracy, as indicated by the high DOR value (Table 5).

Gonorrhoea
Seven studies investigated the diagnostic accuracy of self-collected specimens in comparison to healthcare worker collected specimens in diagnosing NG.Six of these studies were cross-sectional 19,26,30,31,33,37 , and only one was a clinical trial 27 .The sample size of the studies ranged from 89 to 3860.Laboratory diagnosis was performed using automated NAAT based assays in all the studies, and one of the studies also used manual diagnostic methods 26 .Six studies reported that specimen collection had occurred at research clinics 19,26,27,31,33,37 , and one study did not indicate 30 .In six of the studies, research participants received instructions before specimen collection 19,26,27,31,33,37 , but in one study there was no report about whether research participants had been instructed how to self-collect their specimen 30 .Two studies compared diagnostic accuracy in healthcare worker collected vaginal swabs to self-collected vaginal swabs 30,31 .One study compared self-collected vaginal swabs to cervical and vaginal swabs collected by healthcare workers 33 .Two studies compared diagnostic accuracy in self-collected cervicovaginal swabs and healthcare worker collected cervical swabs 27,37 .In one study diagnostic accuracy is compared between healthcare worker collected endocervical swabs with self-collected vaginal swabs 26 .Another study compared diagnostic accuracy in self-collected veil specimens with healthcare worker collected endocervical swabs 19 .Figure 6 below presents summary estimates for the sensitivity and specificity of diagnostic accuracy of healthcare worker collected specimens compared to self-collected specimens.The summary estimate for sensitivity and specificity is 0.59 (95% Confidence Interval 0.49-0.68)and 0.84 (95% Confidence Interval 0.76-0.91).
Presented below in Fig. 7 is the SROC plot depicting the diagnostic accuracy of the studies in this subgroup.The Cochran's Q test shows significant heterogeneity among the studies at 17.156.The diagnostic odds ratio of 2.579 suggests that the overall accuracy of the diagnostic test is low to moderate.The p-value indicates statistical significance (Table 6).

Trichomoniasis
Seven studies investigated the diagnostic accuracy of self-collected specimens in comparison to healthcare worker collected specimens in diagnosing trichomoniasis.Six of the studies were cross-sectional 19,29,[33][34][35]37 , and one study was a clinical trial 27 . For studies utilised automated NAAT-based assays 19,27,33,37 , one study used manual testing methods 35 , while two studies used both automated NAAT assays and manual methods for TV diagnosis 29,34 .Study participants in all the studies collected their specimens at the research clinics.In five of the studies the participants received instructions on how to self-collect specimens before collecting their specimens 19,27,[33][34][35]37 , and in one study this was not reported 29 .The sample size of research participants ranged from 174 to 1867. One study compared te diagnostic accuracy of healthcare worker collected vaginal and cervical swabs with self-collected swabs 33 .Two studies compared healthcare worker collected cervical swabs with self-collected vaginal swabs 29,37 .
One study compared endocervical swabs collected by healthcare workers with self-collected veil specimens 19 .Two studies compared diagnostic accuracy between self-collected vaginal swabs with healthcare worker collected vaginal swabs 34,35 .Only one study compared healthcare worker collected endocervical swabs with self-collected vaginal swabs 19 .Figure 8 below presents summary estimates for the sensitivity and specificity of diagnostic accuracy healthcare worker collected specimens compared to self-collected specimens.The summary estimate for sensitivity and specificity is 0.94 (95% Confidence Interval 0.89-0.98)and 0.91 (95% Confidence Interval 0.85-0.96)respectively and it is depicted on the SROC in Fig. 9 below.Additionally, Fig. 9 depicts the diagnostic accuracy of the studies in this subgroup.
The Cochran's Q test result shows that there is significant heterogeneity among the studies and the diagnostic test is moderately accurate in identifying patients with disease (Table 7).

Methodological quality of studies
Table 8 below depicts the risk of bias and applicability assessment of included studies using the QUADAS-2 tool used to assess quality 25 .The domains of the QUADAS-2 tool are patient selection, index test, reference standard, and flow and timing.Patient selection outlines the process of selecting study participants in the primary studies which includes setting, presentation, prior testing, and intended use of index test; index test describes how the test of interest was conducted and interpreted; reference standard describes how the standard test was conducted and interpreted, and flow and timing describe excluded studies and intervals between the index and reference   tests 25 .For the current study, the index test is designated as the self-collected specimens, while the reference test refers to the healthcare worker collected specimens.For the majority of the studies, the sampling approach utilised was convenience sampling and not random or consecutive sampling which are the options available in the patient selection domain.Although convenience sampling was used for most of the studies and therefore introduced a high-risk bias, that is unlikely to interfere with the diagnostic accuracy of self-sampling and healthcare worker collected specimens.The reference standard domain and flow and timing domains were found to mostly be at low risk of bias in all the studies.Concerning applicability, all studies were at low risk of bias.However, regarding the applicability of the reference standard, it was unclear for most studies.The graphical results of the included studies from the QUADAS-2 quality assessment tool are indicated in Fig. 10.

Discussion
This study compared the diagnostic accuracy of self-collected specimens to healthcare worker collected specimens for diagnosing CT, HPV, MG, NG, syphilis, and TV in females.No studies on syphilis fulfilled the eligibility criteria for inclusion in this review.For CT, six studies were included in the analysis, out of which four were included in the subgroup meta-analysis.The summary estimate for sensitivity was 0.85 (0.77-0.92), while specificity was 0.95 (0.91-0.98).For HPV, five studies were included, and there was insufficient data to perform a sub-group meta-analysis.However, the sensitivity and specificity of self-collected specimens of the individual studies compared to healthcare worker collected specimens varied between studies, with sensitivity ranging from 86 to 100%, and specificity ranging from 88.9% to 100%.For MG, four studies investigated diagnostic accuracy, and two studies had sufficient data for a sub-group meta-analysis.The summary estimate for sensitivity was low at 0.49 (0.39-0.58), while specificity was 0.88 (0.81-0.94).For NG, seven studies were included in the analysis, and four studies were included in the sub-group meta-analysis.The pooled sensitivity and specificity estimate was 0.59 (0.49-0.68) and 0.84 (0.76-0.91) respectively.In the case of CT and NG, it is important to note that the low sensitivity and high specificity are comparable to previous findings 15 .For TV, seven studies investigated diagnostic accuracy, and four studies were included in the sub-group meta-analysis.The results of the meta-analysis showed that self-collected specimens have high sensitivity and specificity for the diagnosis of trichomoniasis, with a summary estimate for sensitivity and specificity of 0.94 (0.89-0.98) and 0.91 (0.85-0.96), respectively.
The study found that there was significant heterogeneity among the studies.This may be attributed to differences in the methods used to collect and test specimens across the different studies.The DOR results indicated that the diagnostic tests used in the studies had a good ability to differentiate between individuals with and without CT, HPV, NG, MG and TV.The study also presented a SROC curve to visualize the sensitivity and specificity of all included studies, with most points falling between 0.9 and 1.00 on the y-axis (sensitivity), indicating better performance in distinguishing between the presence and absence of infection.
The QUADAS-2 tool was used to assess the quality of the included studies, and it showed that a majority of them used convenience sampling to select patients.Although this sampling method can increase the risk of bias, it did not appear to affect the diagnostic accuracy of self-collected specimens and specimens collected by  healthcare workers.Most of the included studies had a low risk of bias in the index test, reference standard, flow, and timing domains.Overall, the included studies introduced minimal bias, which enhances the quality of the research findings.Study screening, selection, and data extraction were conducted systematically to ensure the most suitable studies were included in the review.A comprehensive approach to reviewing existing evidence on the diagnostic accuracy of self-collected specimens versus those collected by healthcare workers was employed.Only peer-reviewed and published studies were included to ensure reliable results.Some of the included studies utilized convenience sampling, which may have introduced bias in the patient selection process.
Since we classified healthcare worker collected specimens as the gold-standard diagnostic accuracy was presumed to be 100%.For CT the healthcare worker collected sensitivity ranged between 50 and 100%, while specificity was 88 and 99.2%; for MG sensitivity ranged between 97 and 100%, while specificity was 88 and 100%; NG sensitivity ranged between 40 and 97%, while specificity was 88 and 100%; and TV sensitivity ranged between 96 and 100%, while specificity was 88 and 100%.
The results indicate that self-collected specimens are a comparative alternative to healthcare worker collected specimens for STI testing.This is in keeping with previous studies that advocate for the use of self-sampling interventions as alternative tools to enable and promote screening of STIs even in asymptomatic patients and resource-limited settings 15,39 .These findings have important implications for STI testing, particularly in settings where access to healthcare workers may be limited or where stigma and embarrassment may prevent individuals from seeking testing.

Limitations
The lack of eligible studies for syphilis and insufficient study data for meta-analysis in HPV limits the comprehensiveness of the review.There was significant heterogeneity among included studies, likely due to varying specimen collection and testing methods, which introduced variability and challenges with generalizability of the findings.Despite efforts to minimise bias during data analysis, the use of convenience sampling in most studies introduced potential bias in patient selection.Assuming the accuracy of the gold standard of healthcare worker-collected specimens may not fully capture variability in sensitivity and specificity among these samples.Conversely, the wide range of sensitivity and specificity values across individual studies underscores the complexity of interpreting overall diagnostic accuracy.Lastly, it is important to consider that the findings of this study may not be generalizable to resource-limited settings where access to healthcare workers and testing facilities differs.

Conclusion
This study presents evidence of the accuracy of self-collected specimens when used to diagnose STIs in females.The meta-analysis findings highlight that the diagnostic accuracy of self-collected specimen to diagnose STIs in females is comparable with that of healthcare worker collected specimens.When considering the global burden of STIs on the public health system, such findings are an indication of how self-sampling for STI diagnosis could be used to improve STI management services across the globe.Although much evidence exists on the use of this intervention in high-income countries 22 , the researchers hope that the findings of this study will capture the attention of governments in LIMCs and cause them to see their need for it.Furthermore, the potential of selfsampling interventions to improve screening of asymptomatic STIs must be recognized and utilized as a tool to fulfil goal 3 of the sustainable development goals which is targeted at treating and improving access to quality healthcare for all people across the globe.The study is limited in that the investigation of diagnostic accuracy of self-collected specimens was only conducted on females.Therefore, the findings are not representative of selfcollected specimens among a broader and more diverse population.We, therefore, recommend a future study to investigate the accuracy of self-collected specimens for diagnosing a wide range of STIs in a more diverse and broader population.

Figure 2 .
Figure 2. Forest plot of chlamydia studies that compared self-collected vaginal swabs with healthcare worker collected cervical and vaginal specimens.

Figure 3 .
Figure 3. SROC depicting diagnostic accuracy of included studies for chlamydia.

Figure 4 .
Figure 4. Forest plot of MG studies that compared self-collected vaginal swabs with healthcare worker collected cervical and vaginal specimens.

Figure 5 .
Figure 5. SROC depicting diagnostic accuracy of MG in included studies.

Figure 7 .
Figure 7. SROC depicting diagnostic accuracy of NG in included studies.

Figure 8 .
Figure 8. Forest plot of TV studies that compared self-collected vaginal swabs with healthcare worker collected cervical and vaginal specimens.

Figure 9 .
Figure 9. SROC depicting diagnostic accuracy of TV in included studies.

Table 4 .
Heterogeneity and statistical significance for CT.

Table 5 .
Heterogeneity and statistical significance for MG.Figure 6. Forest plot of gonorrhoea studies that compared self-collected vaginal swabs with healthcare worker collected cervical and vaginal specimens.

Table 6 .
Heterogeneity and statistical significance for NG.

Table 7 .
Heterogeneity and statistical significance for TV.