The performance of the SEPT9 gene methylation assay and a comparison with other CRC screening tests: A meta-analysis

The SEPT9 gene methylation assay is the first FDA-approved blood assay for colorectal cancer (CRC) screening. The fecal immunochemical test (FIT), the FIT-DNA test and the CEA assay are also in vitro diagnostic (IVD) tests used in CRC screening. This meta-analysis reviews the performance of the SEPT9 assay and compares it with other IVD CRC screening tests. By searching the Ovid MEDLINE, EMBASE, CBMdisc and CJFD databases, 25 out of 180 studies were identified as reporting the SEPT9 assay performance. A total of 2613 CRC cases and 6030 controls were included, and sensitivity and specificity were used to evaluate the assay performance under various algorithms. The 1/3 algorithm exhibited the best sensitivity, while the 2/3 and 1/1 algorithms exhibited the best balance between sensitivity and specificity. The performance of the blood SEPT9 assay was superior to that of the serum protein markers and the FIT test in the symptomatic population, while it appeared to be less potent than the FIT and FIT-DNA tests in the asymptomatic population. In conclusion, the 1/3 algorithm is recommended for CRC screening, and the 2/3 or 1/1 algorithm is suitable for early detection for diagnostic purposes. The SEPT9 assay exhibited better performance in the symptomatic population than in the asymptomatic population.

Scientific Reports | 7: 3032 | DOI: 10.1038/s41598-017-03321-8

The blood SEPT9 gene methylation assay exhibits adequate sensitivity and specificity in CRC detection and screening. Table 1 summarizes the sensitivity, specificity, PLR, NLR, OR, algorithm and kits used in each study. The sensitivity and specificity of the cohort or case-control studies were affected by study design, population, selection of cases, and choice of kit and algorithm. Overall, the sensitivity of these studies ranged from 48.2% to 95.6%, and the specificity ranged from 79.1% to 99.1% (Table 1). The latest commercialized SEPT9 assay, the Epi proColon 2.0, exhibited a higher sensitivity of 71.1-95.6% and maintained a high specificity of 81.5-99% (Table 1). The effect of the algorithm can be observed in studies that applied multiple algorithms, and it is discussed further in this article.
The PLR of the 25 studies ranged from 2.87 to 73.00, indicating a high ratio of the true positive rate to the false positive rate and suggesting a high probability of a true positive when a test result is positive. The NLR of the 25 studies ranged from 0.05 to 0.57, indicating a low ratio of the false negative rate to the true negative rate and suggesting a high probability of a true negative when a test result is negative. The OR of the 25 studies ranged from 5.67 to 349.63, indicating that SEPT9 methylation is a high-risk factor and has diagnostic significance for CRC. PLR, NLR and OR varied widely among the studies. This may be due to differences in study design, since the inclusion of CRC cases, non-CRC colonic diseases, and normal controls in case-control, cohort or screening studies can greatly affect the positive and negative rates. It may also be due to differences in kit performance, as early-stage research and commercialized kits exhibited lower detection capability than later improved kits.

Table 1. Sensitivity, specificity, PLR, NLR and OR of the blood-based SEPT9 gene methylation assay in CRC detection or screening with various algorithms. PLR = positive likelihood ratio, NLR = negative likelihood ratio, OR = odds ratio, CRC = colorectal cancer, NED = no evidence of disease, AA = advanced adenoma, NAA = non-advanced adenoma, SP = small polyps, LDT = laboratory developed test.

… algorithms. The sROC curves represent the ROC plot of the hierarchical summary estimates of sensitivity and specificity for the SEPT9 assay with 95% confidence and prediction ellipses for various algorithms. In the Forest plot, the sensitivity or specificity for each study is plotted as a solid square with bars indicating the 95% confidence interval. The red lines indicate the pooled estimates of sensitivity or specificity. The rhombus indicates the 95% confidence interval of the pooled estimates.
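For readers less familiar with these parameters, the relationship between a 2x2 table and the PLR, NLR and OR discussed above can be sketched as follows. This is a minimal illustration only: the function name and the example counts are hypothetical and are not taken from any study in Table 1, and no continuity correction is applied for zero cells.

```python
def diagnostic_ratios(tp, fp, fn, tn):
    """Compute diagnostic accuracy parameters from 2x2 counts.

    tp/fn: CRC cases with a positive/negative test result.
    fp/tn: controls with a positive/negative test result.
    Note: no continuity correction, so fp and fn must be non-zero.
    """
    sensitivity = tp / (tp + fn)           # true positive rate
    specificity = tn / (tn + fp)           # true negative rate
    plr = sensitivity / (1 - specificity)  # TPR / FPR
    nlr = (1 - sensitivity) / specificity  # FNR / TNR
    odds_ratio = (tp * tn) / (fp * fn)     # diagnostic odds ratio
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "plr": plr,
        "nlr": nlr,
        "or": odds_ratio,
    }


# Hypothetical counts, chosen only to illustrate the arithmetic.
example = diagnostic_ratios(tp=80, fp=10, fn=20, tn=90)
```

A PLR well above 1 and an NLR well below 1 both point to a more informative test, which is why a low NLR (rather than a high one) supports a negative result.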
Currently, the PRESEPT study is the only screening study performed in an average-risk population from 50 to 75 years old. The sensitivity reported (48.2%) using the 1/2 algorithm was lower than those reported in previous cohort or case-control studies. In a later report by Potter and colleagues 9, triplicate PCRs were performed using samples from the same study; the sensitivity increased to 68.2% and the specificity decreased to 80.0%. The US FDA approved the Epi proColon, the commercialized SEPT9 assay, based on the data from the PRESEPT study with the 1/3 algorithm 7,9. Apart from screening in the average-risk population, the SEPT9 assay was also used in opportunistic screening of the high-risk population. In one study performed in four northern Chinese hospitals (RESEPT study) using the SensiColon assay, the SEPT9 assay exhibited a sensitivity of 76.6% with a specificity of 95.9% at a total positive rate of 25.8% 10. In another recent opportunistic screening study using the Epi proColon 2.0 CE kit, the SEPT9 assay exhibited a sensitivity of 75.1% with a specificity of 95.1% 11.
The choice of algorithm affects the performance of the SEPT9 assay. Table 1 lists the four algorithms currently used in studies of the SEPT9 assay. A test result was called positive with at least one positive count out of three PCRs (1/3 algorithm), at least one positive count out of two PCRs (1/2 algorithm), at least two positive counts out of three PCRs (2/3 algorithm), or one positive count out of one PCR (1/1 algorithm). The majority of studies used the 1/3 or 2/3 algorithm, while the 1/2 and 1/1 algorithms were used in some studies.
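The four interpretation rules above share one form: a sample is called positive when at least k of n replicate PCRs are positive. A minimal sketch of that rule is given below; the function name and the "k/n" string encoding are illustrative assumptions, not part of any kit's software.

```python
def call_positive(replicates, algorithm):
    """Apply a k/n interpretation rule to replicate PCR results.

    replicates: list of booleans, one per PCR (True = positive reaction).
    algorithm:  string such as "1/3", "2/3", "1/2" or "1/1" (assumed encoding).
    Returns True when at least k of the n replicates are positive.
    """
    k, n = (int(part) for part in algorithm.split("/"))
    if len(replicates) != n:
        raise ValueError(f"{algorithm} algorithm expects {n} replicate(s)")
    return sum(replicates) >= k


# The same triplicate result is positive under 1/3 but negative under 2/3.
one_of_three = [True, False, False]
```

This makes the trade-off explicit: lowering k for a fixed n can only turn negative calls into positive ones, which raises sensitivity and lowers specificity.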
To compare the performance of the SEPT9 assay under the various algorithms, study data for each algorithm were pooled and meta-analyzed. As shown in Fig. 2, the 1/3 algorithm exhibited the best sensitivity (0.78) with the lowest specificity (0.84) among all algorithms (Fig. 2A), while the 2/3 algorithm exhibited the highest specificity (0.96) (Fig. 2B). The sensitivity and specificity of the 2/3 and 1/1 algorithms (Fig. 2B and D) were very similar (sensitivity: 0.73 vs 0.74, specificity: 0.96 vs 0.94). The 1/2 algorithm (Fig. 2C) exhibited the lowest sensitivity (0.59) with satisfactory specificity (0.91). The area under the curve (AUC) was very similar for the 1/3, 2/3 and 1/2 algorithms, while the AUC for the 1/1 algorithm appeared to be slightly lower than the others. Thus, the 1/3 algorithm provides the best sensitivity at the price of lower specificity, while the 2/3 algorithm provides the best balance between sensitivity and specificity.
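One simple way to put a number on "balance between sensitivity and specificity" is Youden's J (sensitivity + specificity - 1). The sketch below applies it to the pooled estimates quoted above. Youden's J is an assumption introduced here for illustration; it is not the criterion used in this meta-analysis.

```python
# Pooled (sensitivity, specificity) per algorithm, as quoted in the text.
pooled = {
    "1/3": (0.78, 0.84),
    "2/3": (0.73, 0.96),
    "1/2": (0.59, 0.91),
    "1/1": (0.74, 0.94),
}


def youden_j(sensitivity, specificity):
    """Youden's J statistic: 0 for an uninformative test, 1 for a perfect one."""
    return sensitivity + specificity - 1


# Rank algorithms from the highest to the lowest J.
ranking = sorted(pooled, key=lambda name: youden_j(*pooled[name]), reverse=True)
```

Under this summary measure the 2/3 and 1/1 algorithms rank above the 1/3 algorithm, which is consistent with the statement that they offer the better balance, while the 1/3 algorithm remains the most sensitive.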
The SEPT9 assay is effective for the detection of early-stage CRC. Although the SEPT9 assay exhibited high overall sensitivity and specificity in detecting CRC, the ability to detect early-stage CRC is more important than the ability to detect later stages, since early detection allows early intervention to enhance the cure rate and reduce mortality. Stages I and II are regarded as early stages in this article. Table 2 summarizes the stage-related detection rate of all available studies and the pooled data categorized by the algorithm used in these studies. The data in Table 2 are plotted in Fig. 3 to compare the detection rate at each stage (Fig. 3A) and the effect of the algorithm on the stage-dependent detection rate (Fig. 3B). The detection rate increased with clinical stage, indicating that the level of SEPT9 methylation correlated with the degree of malignancy, regardless of the algorithm. Furthermore, the algorithm exhibited a clear effect on the sensitivity for every stage (Fig. 3B). The sensitivity of the various algorithms can be ranked as 1/3 > 2/3 > 1/1 > 1/2 from highest to lowest. The SEPT9 assay generally exhibited the highest detection rate for all stages of CRC with the 1/3 algorithm (Fig. 3B). The detection rates for stage I and II reached 59.6% and 85.7%, respectively (Table 2), representing a very high detection rate for early-stage CRC among current in vitro diagnostic methods. The SEPT9 assay with the 2/3 or 1/1 algorithm detected approximately half of stage I and 70% of stage II CRC, which was also satisfactory for early CRC detection (Table 2, Fig. 3B). For CRC screening aimed at early-stage cancer detection, algorithms with high sensitivity should therefore be adopted, although false positive detection will also increase with higher-sensitivity algorithms. These data show that the SEPT9 assay is effective for early-stage CRC detection.
The performance of the SEPT9 assay is superior to CEA and the FIT tests in the screening of symptomatic population. To further evaluate the performance of the SEPT9 assay in CRC screening and diagnosis, the sensitivity and specificity of the assay were compared with those of the serum markers currently in clinical use (CEA, CA50, CA242, CA724 and CA199) 32 in the screening of the symptomatic population. As the 1/3 algorithm was recommended by the US FDA as the interpretation method, the pooled data from the 1/3 algorithm were used for comparison. The sensitivity and specificity of the serum markers were meta-analyzed by reviewing studies on the performance of each individual test in CRC detection. A Forest plot was made for each serum marker, and the estimated sensitivity and specificity were compared with those of the SEPT9 assay (Fig. 4A and B). Fig. 4A and B show that the sensitivity of the SEPT9 assay (78%) in CRC detection was much higher than that of any of the five serum markers, and its specificity (84%) was in the same range as that of these serum markers, including the most commonly used CRC serum marker, CEA. The detailed data for the serum markers are summarized and analyzed in Supplementary Figures 2 (CEA), 3 (CA50), 4 (CA242), 5 (CA724) and 6 (CA199).
FIT is widely used for CRC screening in the symptomatic population, and three studies compared the performance of FIT with that of the SEPT9 assay side by side. Table 3 summarizes the sensitivity and specificity of FIT and the SEPT9 assay in the three studies. The pooled data show that the SEPT9 assay exhibited significantly higher sensitivity than the FIT test (75.6% vs 67.1%, p < 0.05), while the two tests showed essentially identical specificity. The performance of the SEPT9 assay in screening of the symptomatic population therefore appeared to be better than that of the FIT test.
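A comparison of two sensitivities such as the one above is typically made with a χ² test on a 2x2 table of detected versus missed cases. The sketch below shows the calculation using only the Python standard library. The counts are hypothetical, since the pooled case numbers are given in Table 3 and not reproduced here, and the use of the uncorrected Pearson statistic (no Yates correction) is an assumption.

```python
from math import erfc, sqrt


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (1 degree of freedom, no Yates correction).

    Table layout:
                 detected   missed
        test A       a         b
        test B       c         d
    Returns (statistic, p_value).
    """
    n = a + b + c + d
    statistic = n * (a * d - b * c) ** 2 / (
        (a + b) * (c + d) * (a + c) * (b + d)
    )
    # For 1 degree of freedom, the upper tail is erfc(sqrt(x / 2)).
    p_value = erfc(sqrt(statistic / 2))
    return statistic, p_value


# Hypothetical example: test A detects 150 of 200 cases, test B 130 of 200.
stat, p = chi_square_2x2(150, 50, 130, 70)
significant = p < 0.05
```

With paired designs, where both tests are run on the same subjects, a paired test such as McNemar's would usually be preferred; the unpaired form is shown here only because it matches the χ² test named in the Methods.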
The SEPT9 assay is less potent than the FIT and the FIT-DNA tests in the screening of asymptomatic population. Currently, the PRESEPT study 7,9 is the only study that has investigated the performance of the blood SEPT9 assay in an average-risk asymptomatic population. We therefore compared the data from this report with data from the FIT 33 and FIT-DNA 34 tests in the same type of population. Fig. 5A and B show that the SEPT9 assay exhibited lower sensitivity (68.0% for the SEPT9 assay, compared with 79.0% for FIT and 92.3% for the FIT-DNA test) and lower specificity (80.0% for the SEPT9 assay, compared with 94.0% for FIT and 86.6% for the FIT-DNA test) than the FIT or FIT-DNA test. The performance of the blood SEPT9 assay in asymptomatic population screening appeared to be lower than that of the FIT and FIT-DNA tests. However, the SEPT9 assay exhibited better compliance than the FIT test. One recent study 35

Figure 4. The sensitivity and specificity of the SEPT9 assay were superior to serum protein markers in symptomatic population. The sensitivity of the SEPT9 assay in CRC screening or detection appeared to be much higher than that of CEA, CA199, CA242, CA50 and CA724, as shown in panel A. In contrast, its specificity was in the same range as that of these serum protein markers (panel B). Bars indicate 95% confidence interval.

Figure 5. The sensitivity and specificity of the SEPT9 assay were lower than FIT and FIT-DNA tests in asymptomatic population. The sensitivity of the SEPT9 assay in CRC screening in the asymptomatic average-risk population appeared to be lower than that of the FIT and FIT-DNA tests, as shown in panel A. Similarly, its specificity was also lower than that of the other two tests (panel B). Bars indicate 95% confidence interval.

Discussion
It is notable that the screening sensitivity of 48.2% reported in the PRESEPT study was much lower than those reported in previous cohort or case-control studies. This can be explained from three aspects. Firstly, duplicate PCR reactions, instead of triplicate PCRs, were used in this study. The sensitivity was obtained from at least one positive reaction out of two PCRs instead of three; therefore, the chance of detecting abnormally methylated SEPT9 DNA was lower than in previous studies performing three PCRs. Secondly, the study setting was different from those of previous studies. The PRESEPT study aimed at screening an asymptomatic average-risk population between 50 and 75 years old. CRC patients found in an asymptomatic population are more likely to have early-stage CRC. As the sensitivity in early-stage CRC appeared to be lower than the overall sensitivity across all stages, the sensitivity in an asymptomatic population tends to be lower than that in a symptomatic population, i.e. CRC subjects recruited from hospitals in case-control or cohort studies. Thirdly, the reaction system varied between studies, and this may explain why the 1/1 algorithm exhibited higher performance than the 1/2 algorithm. The PCR reaction in the Epi proColon series products used a 30 µl reaction system, whereas it was reported that the latest SensiColon product used a 60 µl reaction system with a doubled amount of DNA template 10. This allowed a higher chance of detecting methylated DNA, although only one PCR reaction was performed.
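The first explanation can be illustrated with a simple binomial model. The sketch below assumes that each replicate PCR detects methylated SEPT9 independently with the same probability p; both the independence assumption and the example value of p are hypothetical and serve only to show why adding a replicate raises the chance of at least one positive reaction.

```python
from math import comb


def prob_positive_call(p, k, n):
    """Probability that at least k of n replicates are positive.

    Assumes independent replicates, each positive with probability p
    (a simplifying assumption, not a property reported by the studies).
    """
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical per-reaction detection probability for a low-copy sample.
p = 0.4
chance = {
    "1/3": prob_positive_call(p, 1, 3),
    "1/2": prob_positive_call(p, 1, 2),
    "2/3": prob_positive_call(p, 2, 3),
    "1/1": prob_positive_call(p, 1, 1),
}
```

Under this model the 1/3 rule always detects more than the 1/2 rule for the same p. The model cannot reproduce the observation that the 1/1 algorithm outperformed the 1/2 algorithm, which the text attributes to a larger reaction volume and template amount, that is, to a higher per-reaction p.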
The PRESEPT study represents the real-life CRC screening setting and therefore provides better guidance than case-control studies. Three PCRs were applied in order to identify as many cancer patients as possible in the screening setting. The sensitivity obtained in the study with the 1/3 algorithm (68.2%) was much better than that obtained with the 1/2 algorithm 9. Therefore, the 1/3 algorithm was accepted by the US FDA for the approved product.
Apart from the screening of the average-risk population, the SEPT9 assay was also used in opportunistic screening of the high-risk population, in which the chance of identifying positive subjects is much higher than in the average-risk population, as this screening is commonly performed in a hospital environment. The positive rate, sensitivity and specificity therefore depend on the specific population screened. The SEPT9 assay exhibited high sensitivity and specificity in opportunistic screening 10. These parameters appeared to be higher than those in the PRESEPT study, as the case composition in this opportunistic screening was quite different from that in the PRESEPT study.
A distinct feature of the SEPT9 assay is that most studies performed multiple PCRs to enhance test sensitivity. This is because the SEPT9 assay is designed to detect trace amounts of methylated SEPT9 gene copies in a strong background of unmethylated SEPT9 DNA, and the amount of detectable methylated SEPT9 DNA can be as low as 7.8 pg/ml, equivalent to 1-2 copies of genomic DNA 9. This raises the question of how to interpret the PCR data when some of the PCR reactions are positive while others are negative. Because different algorithms are applied in data analysis, sensitivity and specificity vary with the method of interpretation.
The choice of algorithm is based on the specific needs of a test. Two aspects need to be considered before a method can be properly chosen. One is to exclude the healthy negative population, which requires high specificity, and the other is to identify as many real patients as possible to enhance the disease detection rate, which requires high sensitivity. These two aspects normally work against each other, like the two ends of a seesaw. As the 1/3 algorithm showed the best sensitivity while the 2/3 algorithm showed the best specificity, the choice between the 1/3 and 2/3 algorithms depends on the purpose of a study. If detecting cancer is the main purpose, as in a screening test, the 1/3 algorithm can be used to maximize the number of patients detected. False positive subjects can then be identified by a further diagnostic method, such as colonoscopy. In contrast, if the main purpose is to exclude healthy subjects to ensure a low misdiagnosis rate, the 2/3 algorithm can be used, and a routine screening program should be implemented to minimize the proportion of missed cancer patients.
As the SEPT9 assay was shown to detect early-stage CRC with high sensitivity, it has great advantages over serum protein markers in early CRC detection. CEA is the most commonly used marker for CRC detection; however, due to its low sensitivity for early-stage CRC, it is more widely used in post-surgery monitoring of CRC recurrence and therapeutic effects than as a screening marker. SEPT9 gene methylation appeared to be the best blood-based single marker for CRC screening and early detection so far.
Although the SEPT9 assay did not match the sensitivity and specificity of FIT and FIT-DNA in asymptomatic population screening, it is a competitive option for CRC screening and early detection, as it has been shown to have higher compliance in CRC screening than FIT and colonoscopy 35. A good screening test should not only have high sensitivity and specificity, but also have a high uptake rate in the population.

Conclusions
The SEPT9 assay is the first blood-based test aiming at ctDNA detection for CRC screening and early detection. It shows high sensitivity and specificity in CRC screening and early detection. The choice of algorithm is based on the needs of a test. An algorithm with high sensitivity (1/3) can be used in screening, while an algorithm with high specificity (2/3 or 1/1) can be used in early CRC detection for diagnostic purposes to exclude normal subjects. The SEPT9 assay exhibited better performance than protein markers and the FIT test in the symptomatic population, while it showed lower sensitivity and specificity than the FIT and the FIT-DNA tests in the asymptomatic population. However, it provides an effective option for patients who have low compliance with FIT or colonoscopy.

Methods
All literature searches, study selection, data extraction, study quality assessment, and data analysis were performed according to the rules, guidelines or recommendations of the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) and the Quality Assessment of Diagnostic Accuracy Studies (QUADAS) 8,30,31,36,37. Relevant software was used for the above process and data analysis, including Review Manager 5.

Study inclusion and exclusion criteria.
The aim of the study selection was to identify clinical studies evaluating the performance of the SEPT9 assay using blood samples from human subjects (Fig. 1). Duplicates from all databases were excluded, and 180 studies remained. In the next screening, a total of 104 articles were excluded, including letters, reviews, meta-analyses and guidelines (18 in MEDLINE and EMBASE, 61 in CBMdisc and CJFD), basic research studies (3 in MEDLINE and EMBASE, 7 in CBMdisc and CJFD) and articles irrelevant to mSEPT9 detection assays (3 in MEDLINE and EMBASE, 12 in CBMdisc and CJFD). In the following eligibility screening, a further 48 articles were excluded, including studies that did not use plasma or serum samples (16 in MEDLINE and EMBASE, 5 in CBMdisc and CJFD) and studies that did not detect gene methylation (9 in MEDLINE and EMBASE, 18 in CBMdisc and CJFD). Finally, 28 studies were included in the qualitative synthesis. After excluding studies that did not include a statistically significant number of CRC or non-CRC cases (3 in CBMdisc and CJFD), 25 studies were included in the quantitative synthesis for this meta-analysis.
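The selection counts above can be checked with a short tally. The sketch below simply re-adds the numbers quoted in the text; the variable names and dictionary keys are illustrative.

```python
# Records remaining after duplicate removal, as stated in the text.
after_duplicates = 180

# Exclusions at the first screening: (MEDLINE and EMBASE, CBMdisc and CJFD).
screening_exclusions = {
    "letters, reviews, meta-analyses, guidelines": (18, 61),
    "basic research studies": (3, 7),
    "irrelevant to mSEPT9 detection assays": (3, 12),
}

# Exclusions at the eligibility screening.
eligibility_exclusions = {
    "not using plasma or serum samples": (16, 5),
    "not detecting gene methylation": (9, 18),
}

# Exclusions before the quantitative synthesis (CBMdisc and CJFD only).
insufficient_cases = 3

screened_out = sum(a + b for a, b in screening_exclusions.values())
ineligible = sum(a + b for a, b in eligibility_exclusions.values())

qualitative = after_duplicates - screened_out - ineligible
quantitative = qualitative - insufficient_cases
```

The tallies reproduce the reported totals: 104 and 48 exclusions, 28 studies in the qualitative synthesis and 25 in the quantitative synthesis.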

Data extraction.
A standardized data abstraction form was developed, and key elements related to test parameters were collected by two independent reviewers. The study design, type of study (case-control, cohort or screening), sample types, sample size (number of cases, controls, males, females), subject age distribution, algorithm and gold standard for diagnosis were first examined and recorded to ensure the validity of a study. The test sensitivity, specificity and positivity rate, with detailed positive, negative and total numbers of cases, were then collected. The PLR, NLR and OR were calculated based on these numbers.

Study quality and risk of bias assessment.
The quality of the included reports was assessed using the QUADAS system 8 with the Review Manager 5.2 software. The methodological quality of the studies, with a focus on the risk of bias and applicability, was systematically assessed, as shown in Supplementary Figure 1. All items in the PRISMA checklist were assessed and completed, as shown in Supplementary Table 1. Most studies were cohort or case-control studies, and only three of them were prospective screening studies. Therefore, some bias inevitably appeared in the cohort or case-control studies. Although most of them clearly described the patient selection criteria and procedure, and the number of CRC cases was comparable with the number of controls, a few of them lacked a description of the selection criteria or had an unbalanced number of cases against controls, which may lead to a risk of bias. As the kits from Epigenomics, Inc. were used in the majority of studies, the cutoff values were identical for most studies, and the index test (marker test) and the reference test were performed in parallel. Finally, colonoscopy and subsequent pathological examination were used as the gold standard for cancer or normal subject determination in all studies, which ensured the quality of the data.

Figure 6. Deeks' funnel plot asymmetry test for all studies included in this meta-analysis.
The overall bias of the included studies was tested using the Deeks' funnel plot (Fig. 6), and the P value of 0.77 indicates that the distribution of studies is symmetric and there is no systematic bias across all studies analyzed in this study.
Subgroup analyses and definition of diagnostic outcomes.
All studies were divided into four subgroups based on the algorithm used in the interpretation of multiple qRT-PCR or high-resolution melting (HRM) data. If the final positive test result was determined from at least one positive count out of three repeats, the study was categorized into the 1/3 algorithm group, and if the final positive test result was determined from at least two positive counts out of three repeats, the study was categorized into the 2/3 algorithm group. Similarly, if the final positive test result was determined from at least one positive count out of two repeats, the study was categorized into the 1/2 algorithm group, while if the final positive test result was determined from only one reaction, the study was categorized into the 1/1 algorithm group. The sensitivity, specificity, PLR, NLR and OR were calculated based on the algorithm, and parameters for each algorithm were calculated by pooling the data from studies in the same algorithm group. The stage-dependent sensitivity was calculated based on the number of positive cases and the total number of cases for a certain stage and the algorithm used in the study.
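The stage-dependent pooling described above can be sketched as a sum of counts within each algorithm group and stage. The function name, the record layout and the example counts below are hypothetical, and direct summation of counts is an assumed simplification; the summary estimates in this meta-analysis were produced with dedicated statistical software.

```python
from collections import defaultdict


def pooled_stage_sensitivity(records):
    """Pool stage-dependent sensitivity by algorithm group.

    records: iterable of (algorithm, stage, positive_cases, total_cases),
             one entry per study and stage.
    Returns {(algorithm, stage): pooled sensitivity}, pooling by summing
    positive and total case counts across studies.
    """
    sums = defaultdict(lambda: [0, 0])
    for algorithm, stage, positives, total in records:
        sums[(algorithm, stage)][0] += positives
        sums[(algorithm, stage)][1] += total
    return {key: pos / total for key, (pos, total) in sums.items()}


# Hypothetical per-study counts, for illustration only.
records = [
    ("1/3", "I", 6, 10),
    ("1/3", "I", 12, 20),
    ("2/3", "I", 5, 10),
    ("1/3", "II", 17, 20),
]
pooled = pooled_stage_sensitivity(records)
```

Summing counts weights each study by its number of cases at that stage, so large studies dominate the pooled rate.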
The heterogeneity of the studies was analyzed for each of the four algorithms used for SEPT9 assay interpretation, and RR (relative risk) and OR (odds ratio) values were calculated to assess the heterogeneity within each algorithm. Fig. 7 shows that all RR analyses (the left figure in each panel) had high I² values (69.3% to 92.6%) with very small P values, indicating the existence of heterogeneity among studies for all four algorithms. The OR analysis (the right figure in each panel) showed high I² values (54.3-77.0%) with very small P values for the 1/3, 2/3 and 1/1 algorithms, suggesting the presence of heterogeneity, while it did not show heterogeneity for the 1/2 algorithm. Taken together, heterogeneity was present among studies in all algorithms, and the random effect model should be used for analysis.

Statistical analysis.
The sROC curves for each algorithm were simulated, and the Forest plot for each algorithm and the relevant statistics (including estimated sensitivity, specificity, area under the curve (AUC), 95% confidence interval, 95% confidence contour, and 95% prediction contour) were generated using the Stata 14.0 software. The Forest plots for the five serum protein markers and the relevant statistics were generated using the Meta-disc 1.4 software (Supplementary Figures 2-6). The estimates and 95% confidence intervals for all data in histograms were calculated and plotted using the PRISM 5.0 software. The comparison of ratio parameters was performed using the χ² test, and p < 0.05 was regarded as a significant difference.
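The I² statistic reported in the heterogeneity analysis is conventionally derived from Cochran's Q as I² = max(0, (Q - df)/Q) x 100%. A minimal sketch of that calculation follows. It assumes inverse-variance weights on a log-scale effect such as log OR or log RR, and the input values are hypothetical; the exact settings used by the software in this meta-analysis are not stated.

```python
def cochran_q(effects, variances):
    """Cochran's Q with inverse-variance weights.

    effects:   per-study effect estimates (e.g. log OR; assumed scale).
    variances: per-study variances of those estimates.
    """
    weights = [1.0 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    return sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))


def i_squared(q, df):
    """I² in percent: share of variation attributed to heterogeneity."""
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0


# Hypothetical log-scale effects from three studies with equal variances.
effects = [0.1, 0.5, 0.9]
variances = [0.01, 0.01, 0.01]
q = cochran_q(effects, variances)
i2 = i_squared(q, df=len(effects) - 1)
```

Values of I² above roughly 50% are commonly read as substantial heterogeneity, which is the usual motivation for preferring a random effect model over a fixed effect model.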