Main

Vulval cancer accounts for 3–5% of all gynaecological malignancies and 1% of all cancers in women, with an estimated 27 000 women diagnosed each year (Berek and Hacker, 2005). Standard treatment for squamous cell carcinoma of the vulva involves excision of the primary tumour and inguinofemoral lymphadenectomy (IFL) in all but FIGO stage Ia or superficially invasive disease. Groin lymph node status has been identified as the most important factor in predicting mortality attributable to vulval cancer (Royal College of Obstetricians and Gynaecologists, 1999). The efficacy of this treatment is good, with reported groin recurrence rates varying between 1% and 10% (Burger et al, 1995; Bell et al, 2000). However, only a third of patients with early-stage disease will have lymph node metastases, and the remainder will not benefit from elective IFL while risking significant morbidity (de Hullu et al, 2006; van der Zee et al, 2008). Complications affect over 50% of patients having IFL, including infection of groin wounds, wound breakdown, lymphocyst formation, lymphoedema and cellulitis (Gould et al, 2001; Pereira de Godoy et al, 2002; Gaarenstroom et al, 2003; Beesley et al, 2007).

Inguinofemoral lymphadenectomy is the standard of care because unrecognised disease in the inguinofemoral lymph nodes is usually fatal. A sentinel lymph node (SLN) is the first lymph node that receives drainage directly from the primary tumour and therefore has the highest probability of containing metastatic disease. The SLNs can be identified by lymphoscintigraphy using the radioactive tracer technetium-99m (99mTc) and/or with blue dye. The lymph node obtained can be examined using standard histopathology with haematoxylin and eosin (H&E), frozen section or enhanced testing (ultrastaging) with serial sectioning of the lymph node and immunohistochemistry for cytokeratins. Accurate identification of the sentinel node in early-stage vulval cancer may spare the patient from undergoing IFL with its associated morbidity. The diagnostic performance of SLN biopsy in the ‘real-world’ setting to guide omission of IFL where the SLN is negative is not yet fully established, but is the subject of ongoing multicentre studies (GOG-0270 and GROINSS-V II). We conducted a systematic review to evaluate the accuracy of SLN biopsy in vulval cancer.

Materials and methods

Protocol development and overview

A protocol was developed for undertaking systematic reviews of test accuracy, diagnostic and therapeutic impact. Scoping searches for relevant systematic reviews were conducted in MEDLINE, EMBASE and the Cochrane Library. Systematic reviews were carried out using established methods (Higgins and Green, 2011; Diagnostic Test Accuracy Working Group, 2012). Presentation of results is according to the PRISMA guidelines (Moher et al, 2009). Inclusion of studies, data extraction and quality assessment were carried out in duplicate using predesigned and piloted data extraction sheets with differences resolved by consensus and/or arbitration involving a third reviewer. A two-stage process was used, firstly by screening titles and abstracts. For all references categorised as ‘include’ or ‘uncertain’ by both reviewers, full text was retrieved wherever possible and final inclusion decisions were made on the full paper.

Search strategy, inclusion and exclusion criteria and quality assessment

Comprehensive searches from database inception to 25 October 2013 were conducted in MEDLINE, Embase, Science Citation Index, the Cochrane Library, MEDION, Cochrane Database of Systematic Reviews, Database of Abstracts of Reviews of Effects, the Health Technology Assessment Database, Clinical Trials.com as well as a search of internet resources (UK Clinical Research Network Portfolio, specialist search gateways (OMNI and the National Cancer Institute), Google and Copernic) to identify relevant published and unpublished studies and studies in progress. Electronic searches were supplemented by checking reference lists, handsearching the journal Gynecologic Oncology and contacting authors of included studies for information on any relevant published or unpublished studies. No language restrictions were applied. Search strategies were designed from a series of test searches and discussions of the search results among the review team. Both MeSH terms and text words were used and included ‘vulva cancer’, ‘sentinel lymph node biopsy’ and ‘lymphoscintigraphy’.

The population of interest was women with early stages of vulval cancer: at least 75% of population with FIGO stage I and II or TNM categories T1-2 N0 M0. We excluded studies on patients with vulval melanomas, advanced cancer – FIGO stage IV, inoperable tumours, tumours unsuitable for primary surgery, patients with clinical suspicion of metastases, that is, with palpable inguinofemoral lymph nodes, enlarged lymph nodes (>1.5 cm) on imaging or cytologically proven inguinofemoral lymph node metastases at the start of the study. The index testing strategies were SLN biopsy with 99mTc, blue dye or combined technique (99mTc with blue dye), with histopathology by H&E either on formalin-fixed or frozen sections or enhanced testing with thinner sections and/or immunohistochemistry. Where studies reported any of ultrastaging, serial sections, multiple slices, additional sections or step sections, these were all classified as ‘ultrastaging’. Studies on other imaging modalities and novel metastasis detection techniques were excluded. Reference standard was histology of IFL or clinical follow-up for SLN-negative patients. Outcomes of interest were diagnostic accuracy, morbidity following SLN biopsy, mortality and disease-free survival, quality of life, and impact on surgeon’s and team’s skills and experience (learning curve). Studies with nonclinical outcomes and those that reported outcomes per groin only were excluded. Any prospective or retrospective test accuracy study designs, studies investigating the diagnostic and therapeutic impact with or without concurrent assessment of test accuracy and prospective cohort studies of outcomes of patients tested with 99mTc, blue dye or combined technique for SLN biopsy were included. Case studies were excluded.

Study quality was assessed using standard guidelines for test accuracy (QUADAS) and diagnostic and therapeutic impact studies (Meads and Davenport, 2009; Whiting et al, 2010).

Statistical analysis

RevMan version 5.2 (The Nordic Cochrane Centre, Copenhagen, Denmark) was used for statistical analyses and Meta-Disc 1.4 (Unit of Clinical Biostatistics team of the Ramón y Cajal Hospital, Madrid, Spain) was used for meta-analysis. Sensitivity, specificity, true positives (TP), false positives (FP), true negatives (TN) and false negatives (FN) were taken directly from the source papers. If that was not possible, values were calculated from the data provided. Based on an investigation of heterogeneity, summary estimates of sensitivity, specificity and likelihood ratios (LRs) were derived as appropriate. Results were displayed graphically on forest plots and receiver operating characteristic (ROC) plots. Summary SLN detection rates and their 95% confidence intervals were calculated using Meta-Disc.
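As an illustration of how the accuracy measures named above follow from a 2×2 table, the sketch below derives sensitivity, specificity and likelihood ratios from TP, FP, TN and FN counts. It is not the code used in the review (RevMan and Meta-Disc were used); the class name `TwoByTwo` and the example counts are hypothetical and chosen only for demonstration.

```python
from dataclasses import dataclass
import math


@dataclass(frozen=True)
class TwoByTwo:
    """Counts from a 2x2 table of index test (SLN biopsy) vs reference standard."""
    tp: int  # true positives
    fp: int  # false positives
    tn: int  # true negatives
    fn: int  # false negatives

    def sensitivity(self) -> float:
        # Proportion of reference-positive patients with a positive SLN.
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # Proportion of reference-negative patients with a negative SLN.
        return self.tn / (self.tn + self.fp)

    def positive_lr(self) -> float:
        # LR+ = sensitivity / (1 - specificity); infinite when specificity is 1.
        spec = self.specificity()
        if spec == 1.0:
            return math.inf
        return self.sensitivity() / (1.0 - spec)

    def negative_lr(self) -> float:
        # LR- = (1 - sensitivity) / specificity; infinite when specificity is 0.
        spec = self.specificity()
        if spec == 0.0:
            return math.inf
        return (1.0 - self.sensitivity()) / spec


# Hypothetical counts for a single study (not taken from any included paper).
example = TwoByTwo(tp=19, fp=0, tn=40, fn=1)
print(example.sensitivity(), example.specificity())
print(example.positive_lr(), example.negative_lr())
```

With no false positives, specificity is 1 and the positive likelihood ratio is unbounded, which is why the sketch returns infinity rather than dividing by zero.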

Results

Characteristics of included studies

There were 2950 citations identified from searches, of which 82 full papers were obtained and 29 relevant studies (38 publications) were included. Figure 1 displays the PRISMA diagram. Most studies were small with <50 patients, but there were 3 larger studies with 127 patients (Hampl et al, 2008), 452 patients (269 with tumours under 4 cm; Levenback et al, 2012) and 403 patients (van der Zee et al, 2008). The characteristics of the studies are presented in Supplementary Tables 1 and 2. Patients with early-stage vulval cancer varied between 86% and 100% of subjects, with 18 out of 29 (62%) studies having all patients at early stage. Where reported, tumour locations were evenly spread between midline and lateral positions. The most commonly reported tumour type was squamous cell carcinoma. Either TNM or FIGO staging alone, or a combination of both, was given in all studies.

Figure 1

The PRISMA diagram for diagnostic review.

Index tests and histopathological techniques used for SLN biopsy, and reference standards used in each of the studies are summarised in Table 1. Out of 29 studies, 24 presented results for both blue dye and 99mTc tests for SLN identification, although not all patients underwent both tests in every study. Presentation of results varied considerably. In 21 studies, detection rates per groin were presented for each test separately, and both tests combined. It is worth noting that SLNs were always subject to rigorous examination, whereas histopathological techniques for corresponding IFL nodes were less detailed and were assumed to be H&E unless otherwise stated.

Table 1 Studies included in the systematic review showing details of index test and reference standard

Quality of included studies

Quality assessment is reported in Supplementary Table 3. Of the 29 included studies, 4 had no information about histopathological methods (Pitynski et al, 2003; Nyberg et al, 2007; Vakselj and Bebar, 2007; Camara et al, 2009). One study used frozen section as reference standard (Camara et al, 2009). In 19 studies, upon negative H&E, immunohistochemistry using antibodies such as AE1, AE3, S-100, HMB-45, Mab, CKMNF, CK-88 and EMA was performed. In others, additional sections/ultrastaging were used if samples were negative by H&E staining and standard sectioning. Thickness of slices varied between studies. Only de Hullu et al (2000) achieved blinding of pathologists.

Test accuracy results

Reporting of results was frequently ambiguous, making it difficult to distinguish patients who had no SLN detected from those with negative SLN biopsy on histology. Results of test accuracy are presented on the basis of detected SLN. In all, 24 studies evaluated the test accuracy of SLN with IFL for all, and 5 studies evaluated SLN with clinical follow-up for test-negative patients and IFL for patients with malignancy detected in SLN biopsy (Van den Eynden et al, 2003; Terada et al, 2006; Moore et al, 2008; van der Zee et al, 2008; Achimas-Cadariu et al, 2009). For calculation of sensitivity and specificity, studies have been categorised into groups by the reference standards used, the index test used and the histopathological techniques used as follows:

  1. Inguinofemoral lymphadenectomy for all

    • Technetium 99 with blue dye (Supplementary Table 4)

      ▪ Haematoxylin and eosin only or insufficient details to determine whether immunohistochemistry or ultrastaging were used

      ▪ Immunohistochemistry

      ▪ Frozen section only

      ▪ Immunohistochemistry with ultrastaging

    • Technetium 99 only (Supplementary Table 5)

      ▪ Haematoxylin and eosin only or insufficient details to determine whether immunohistochemistry or ultrastaging were used

      ▪ Immunohistochemistry

    • Blue dye only (Supplementary Table 6)

      ▪ Immunohistochemistry with ultrastaging

  2. Inguinofemoral lymphadenectomy for SLN positive and clinical follow-up for SLN negative (Supplementary Table 7)

    • Technetium 99 and blue dye

      ▪ Immunohistochemistry

      ▪ Ultrastaging

Point estimates of specificity are 100%. The ROC plane was unhelpful and not presented. Although the point estimates of sensitivity are close to 100%, confidence intervals were wide, reflecting the small sample sizes available.

The SLN detection rates for each of the analysed techniques (blue dye, 99mTc and blue dye/99mTc) are presented in Table 2. The detection rate calculated per patient was available in all included studies. Combined blue dye and 99mTc testing had the highest rate of SLN detection. Pooled rates were 94.0% for 99mTc alone (95% CI 90.5–96.4), 68.7% for blue dye alone (95% CI 63.1–74.0) and 97.7% (95% CI 96.6–98.5) for 99mTc and blue dye combined.
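For readers who wish to see how a pooled detection rate and its 95% confidence interval can be obtained, a minimal sketch follows. It pools by simply summing counts across studies and uses a Wilson score interval; this is an assumption made for illustration and is not necessarily the method implemented in Meta-Disc. The per-study counts and the function names are hypothetical.

```python
import math


def wilson_interval(successes: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a proportion (z = 1.96 gives roughly 95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1.0 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return centre - half_width, centre + half_width


def pooled_detection_rate(studies: list[tuple[int, int]]) -> tuple[float, tuple[float, float]]:
    """Pool (patients with SLN detected, patients tested) pairs by summing counts."""
    detected = sum(d for d, _ in studies)
    tested = sum(n for _, n in studies)
    return detected / tested, wilson_interval(detected, tested)


# Hypothetical per-study counts: (SLN detected, patients tested).
hypothetical_studies = [(18, 20), (45, 48), (30, 30)]
rate, (lower, upper) = pooled_detection_rate(hypothetical_studies)
print(f"pooled rate {rate:.3f}, 95% CI {lower:.3f} to {upper:.3f}")
```

Summing counts gives larger studies more weight and ignores between-study heterogeneity, so a random-effects model may be preferable where detection rates differ substantially between studies.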

Table 2 The SLN detection rate of blue dye, 99mTc and both

Training and experience

Studies commonly specified the first 10 cases as learning curve (Hampl et al, 2008; van der Zee et al, 2008; Achimas-Cadariu et al, 2009), after which SLN biopsy without IFL could be performed. Only Levenback et al (2001) calculated that the rate of SLN detection was worse in the first 2 years of the study (failure rate 16% vs 7%).

Recurrence rates following SLN biopsy

Two groups presented recurrences at follow-up. The first group used full IFL at initial operation to establish diagnostic accuracy, but also presented follow-up data (Martinez-Palonez et al, 2006; Vakselj and Bebar, 2007; Vidal-Sicart et al, 2007; Klat et al, 2009; Crosbie et al, 2010). The second group used clinical follow-up for SLN-negative patients to establish diagnostic accuracy (Van den Eynden et al, 2003; Terada et al, 2006; Moore et al, 2008; van der Zee et al, 2008; Achimas-Cadariu et al, 2009). In the first group, the number of recurrences seen (18) was lower than the number of SLN-positive patients (36) in the 4 studies that presented follow-up data by SLN status (Martinez-Palonez et al, 2006; Vakselj and Bebar, 2007; Klat et al, 2009; Crosbie et al, 2010). The SLN-negative patients developed recurrence in 3 of these 4 studies (Martinez-Palonez et al, 2006; Vakselj and Bebar, 2007; Klat et al, 2009). Of these, two studies showed a higher recurrence rate in SLN-negative patients than in patients who underwent IFL after false-negative SLN biopsies (Martinez-Palonez et al, 2006; Vakselj and Bebar, 2007). This may imply a therapeutic benefit to IFL or the confounding effect of adjuvant radiotherapy.

In the second group with clinical follow-up for SLN-negative patients, recurrence rates for groin and distant recurrence were calculated (Supplementary Table 7). Pooled sensitivity from SLN with clinical follow-up (91%; CI 85–95%) is comparable to estimates where patients received IFL as the gold standard (pooled sensitivity 95%; CI 92–98%), but with a lower NPV (95.3% vs 97.9%; see Figures 2 and 3).
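The relationship between sensitivity and negative predictive value (NPV) can be made explicit with a short calculation. The sketch below assumes a specificity of 100% (as reported above) and a prevalence of nodal metastases of one third (the approximate figure quoted in the introduction for early-stage disease); the true prevalence in the pooled studies may differ, so the output is illustrative and need not reproduce the reported NPVs exactly. The function name is hypothetical.

```python
def negative_predictive_value(sensitivity: float, specificity: float, prevalence: float) -> float:
    """NPV = TN / (TN + FN), expressed per unit of population."""
    true_negatives = specificity * (1.0 - prevalence)
    false_negatives = (1.0 - sensitivity) * prevalence
    return true_negatives / (true_negatives + false_negatives)


ASSUMED_PREVALENCE = 1.0 / 3.0   # assumption: about a third of patients are node positive
ASSUMED_SPECIFICITY = 1.0        # point estimate of specificity reported in the results

# Pooled sensitivities quoted in the text for the two reference standards.
npv_follow_up = negative_predictive_value(0.91, ASSUMED_SPECIFICITY, ASSUMED_PREVALENCE)
npv_ifl = negative_predictive_value(0.95, ASSUMED_SPECIFICITY, ASSUMED_PREVALENCE)
print(f"NPV with clinical follow-up: {npv_follow_up:.3f}")
print(f"NPV with IFL for all: {npv_ifl:.3f}")
```

Under these assumptions the NPVs are about 0.957 and 0.976, showing how a drop of a few percentage points in sensitivity translates into a smaller drop in NPV when most patients are node negative.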

Figure 2

Forest plot of sensitivity of SLN biopsy in group with IFL for all, 99mTc with blue dye – ultrastaging with immunohistochemistry.

Figure 3

Forest plot of sensitivity of SLN biopsy in group with IFL for SLN positive, clinical follow-up for SLN negative, 99mTc and blue dye, ultrastaging, groin and distant recurrences only* (*data from Achimas-Cadariu et al, 2009 could not be included as there were insufficient data in the paper).

Survival rates

Nine studies gave information about survival (Martinez-Palonez et al, 2006; Terada et al, 2006; Vakselj and Bebar, 2007; Moore et al, 2008; van der Zee et al, 2008; Achimas-Cadariu et al, 2009; Klat et al, 2009; Crosbie et al, 2010; Oonk et al, 2010). All studies were consistent with a relatively low survival rate for patients with groin relapse. Deaths were reported by Achimas-Cadariu et al (2009) (12 out of 46 patients, survival 61.2 months for whole cohort and 16.2 months for 8 patients with relapse), Crosbie et al (2010) (2 out of 32 patients, follow-up 62 months), Klat et al (2009) (1 out of 23 patients, follow-up 8–46 months), Moore et al (2008) (1 out of 35 patients died of intercurrent disease, follow-up 29 months), Terada et al (2006) (follow-up 55 months, 2/3 node-positive patients died of cancer), Vakselj and Bebar (2007) (6 out of 10 node-positive patients, 2 out of 25 node-negative patients, 1 died of disease at 49 months) and Vidal-Sicart et al (2007) (1 out of 50 patients died of disease, follow-up 20 months).

The largest study presented disease-specific survival for node-negative patients with a median follow-up of 35 months, with 202 out of 276 (73.2%) patients having at least 24 months of follow-up (van der Zee et al, 2008). The 3-year disease-specific survival for patients with unifocal vulval disease and negative SLN was 97.0%. In a subsequent paper, 5-year disease-specific survival was 77.3% in patients with positive SLN. However, survival varied depending on the histopathology technique used: 64.9% when malignant SLNs were identified by routine pathology and 92.1% when identified by ultrastaging, and was lower when SLN metastases were >2 mm in size (Oonk et al, 2010).

Quality of life

One study (62 patients) investigating quality of life found few differences between SLN and IFL with EORTC QLQ-C30; only the financial difficulties scale was statistically significantly worse in the IFL group. For the FACT-V questionnaire, there were significantly worse results for the contentment functional scale, and oedema, complaints and stockings symptom scales (Oonk et al, 2009).

Adverse events

Information about adverse events was generally poorly reported. Eight studies (Table 3) provided data (Van den Eynden et al, 2003; Terada et al, 2006; Brunner et al, 2008; Johann et al, 2008; Moore et al, 2008; van der Zee et al, 2008; Achimas-Cadariu et al, 2009; Crosbie et al, 2010). Patients undergoing IFL had worse morbidity than those undergoing SLN alone. Definitions of morbidity are not standardised, and therefore statistical comparisons were not possible.

Table 3 Summary of adverse events from SLN or SLN biopsy with IFL

Discussion

This systematic review comprises 29 studies with information on test accuracy of 99mTc and/or blue dye identification of the SLN, with a reference standard of either IFL for all (24 studies) or IFL for SLN-positive nodes (containing metastases) and clinical follow-up for SLN-negative nodes (5 studies). There were, in effect, three index tests (99mTc, blue dye and 99mTc with blue dye) and five reference standard groups (H&E only or insufficient details to determine the histopathological techniques used, frozen section only, immunohistochemistry, ultrastaging, and ultrastaging with immunohistochemistry). Therefore, calculating the sensitivity and specificity of finding metastases in a SLN biopsy compared with the reference standard was not straightforward and no meta-analysis of all 29 studies was appropriate. Unfortunately, it was not possible to establish the success rate of SLN detection (technical success) as distinct from sensitivity of SLN for detection of metastatic disease.

The largest group of studies (21 studies) using 99mTc, blue dye and immunohistochemistry demonstrated a pooled SLN detection rate of 97.7% (95% CI 96.6–98.5) with blue dye and 99mTc, 94.0% (95% CI 90.5–96.4) with 99mTc alone and 68.7% (95% CI 63.1–74.0) with blue dye alone. We could not statistically compare accuracy of 99mTc with blue dye as there were insufficient studies of similar clinical characteristics to be able to conduct meta-regression. Nevertheless, these results provide evidence that a combination of blue dye/technetium and ultrastaging is the most accurate test. Using blue dye and technetium may also benefit the learning curve as blue dye enables direct visualisation of the SLN (Bass et al, 1999).

Recurrence occurred in two cohorts of patients; the first underwent IFL and then follow-up, and the second underwent follow-up after a negative SLN. The number of clinically apparent recurrences at follow-up was less than the number of SLN-positive patients. This is either because microscopic metastases detected with sophisticated histology techniques in SLN biopsy are less likely to be clinically significant than those detected with standard histopathology techniques, or because subsequent IFL and/or the subsequent adjuvant treatment is therapeutic. It is noteworthy that nodes retrieved as ‘sentinel’ were subject to intensive examination to detect micrometastases, whereas nodes retrieved at subsequent IFL were either subject to routine histopathology or details of histopathological examination were not presented. Thus, accuracy results are skewed to overscoring positives for SLN or underscoring positives for IFL. Unless both SLN and IFL are subjected to the same technique, the true value of micrometastases detected by ultrastaging of SLN will not be established. This review highlights the challenges in truly assessing the value of a test such as SLN when there is such variability in how the nodes are examined at pathology. There is an urgent need for consensus to define the standards of histopathology and the need for ultrastaging. Our review did not identify any papers that presented management of groin lymph nodes at recurrence.

One recent systematic review has reported high rates of detection of SLNs, but did not attempt to stratify studies by the techniques used to examine the SLN or to differentiate studies based on follow-up for SLN negatives (Hassanzade et al, 2013). The strength of the present test accuracy systematic review is the rigour of its conduct and the focus on comparing and contrasting the different types of index test and reference standard. The studies had considerable methodological limitations, including lack of an adequate description of inclusion criteria, population (especially stage of disease) and reference standard used. Histological methods varied considerably, particularly with regard to ‘ultrastaging/additional sections’, and the optimum methods of examining the SLN were unclear. The results from the two largest studies (van der Zee et al, 2008; Levenback et al, 2012) are highly consistent, showing FN rates of <3% in unifocal vulval tumours <4 cm in size and supporting the use of SLN biopsy in these patients. It is noteworthy that van der Zee et al (2008) implemented a robust training protocol before entry for SLN biopsy. Our results show a higher FN rate of 9% with clinical follow-up for SLN biopsy negatives, reflecting pooled estimates from smaller studies and highlighting the importance of the learning curve effect. Gynaecological oncologists will value the clinical utility of knowing the FN rate of SLN in counselling patients.

It is also uncertain whether patients would rather risk groin metastases by forgoing IFL if they are SLN negative. One small study surveyed 106 patients who had undergone IFL as part of treatment; 66% would recommend IFL if the risk of missing metastasis from SLN biopsy was 1 in 80 and 84% would recommend IFL if the risk of missing metastasis from SLN biopsy was 1 in 8. Age and the presence or degree of side effects experienced by the patients surveyed, including 39% with severe lymphoedema and 28% with severe pain, did not affect preferences for each procedure (de Hullu et al, 2001). Further research on factors that influence lymphatic spread (for example, age, stage of disease and grade of tumour) and exploration of patients’ preferences may aid decision making in the individual patient. Sophisticated quality-of-life studies need to investigate the impact of SLN vs IFL in patients.

At this stage, given the relatively small numbers of studies evaluating SLN with clinical follow-up, SLN should only be implemented within a research protocol, for unifocal tumours <4 cm, at selected centres with sufficient numbers and expertise to establish quality control. Careful patient counselling is essential, referring to the trade-off between morbidity from lymphadenectomy and the slightly higher rate of recurrence with SLN biopsy. Consensus standards for histopathological examination and reporting for SLN and IFL nodes are urgently needed, particularly with regard to ultrastaging and immunohistochemistry protocols. Given the higher recurrence rate in patients receiving SLN biopsy, we recommend that where patients have opted for SLN only and have not undergone full IFL, and the SLN is negative, they should be followed up at close intervals (e.g., 2-monthly for 2 years) to detect any missed groin node metastases early, and facilitate an attempt at salvage therapy. In the absence of data to guide the optimal method of follow-up (clinical vs imaging), careful clinical monitoring would seem pragmatic.