Background

Sentinel lymph node biopsy (SLNB) in cutaneous melanoma is critical for therapeutic decision making1,2. When the method was first devised, a positive SLNB signaled the need for a completion lymph node dissection (CLND)3. At the time, effective systemic adjuvant therapy was unavailable, and lymphadenectomy was the dominant option among a handful of feasible interventions. Surgical oncologists, who saw their patients relapse and die from unstoppable disease, sought to intervene early and aggressively, following the adage that a chance to cut is a chance to cure4. Eventually, both the Multicenter Selective Lymphadenectomy Trial (MSLT)-2 and the Dermatological Cooperative Oncology Group (DeCOG)-SLT trial failed to demonstrate a survival benefit for CLND in SLNB-positive patients5,6. While CLND may still have a role in selected patients7, the procedure is no longer standard treatment.

Today, SLNB is used to determine eligibility for adjuvant therapy1. Still, not all patients are considered at high enough risk to warrant the procedure. Per current guidelines, SLNB is not recommended if the risk of nodal metastasis is <5%, as in T1a melanoma with a Breslow thickness of <0.8 mm and no adverse features. SLNB should be considered if the risk of nodal metastasis is between 5% and 10% (T1b; Breslow thickness, 0.8–1.0 mm) and is recommended if the risk exceeds 10% (T2 to T3; Breslow thickness, >1.0 mm)1,2. Patients with T3b and T4 melanoma (Breslow thickness of >2.0–4.0 mm with ulceration, or >4.0 mm) qualify for adjuvant therapy irrespective of nodal status and may forgo the procedure8. Of all SLNB-eligible patients, fewer than 20% have nodal metastasis9,10. When performed by a skilled surgeon, SLNB is safe, but patients still face a risk of surgical complications, including bleeding, infection, seroma, and lymphedema11,12.
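The risk thresholds above amount to a simple three-tier decision rule. The following is a minimal Python sketch, assuming the estimated probability of nodal metastasis is already known; the function name `slnb_recommendation` is hypothetical, and the sketch deliberately leaves out T3b/T4 disease and the adverse features that modify the decision in practice.

```python
def slnb_recommendation(nodal_risk: float) -> str:
    """Map an estimated probability of nodal metastasis (0 to 1) to a guideline tier.

    Simplified reading of the thresholds in the text (hypothetical helper):
      < 5%      -> SLNB not recommended
      5% - 10%  -> SLNB should be considered
      > 10%     -> SLNB recommended
    Does not model T3b/T4 adjuvant eligibility or adverse histologic features.
    """
    if not 0.0 <= nodal_risk <= 1.0:
        raise ValueError("nodal_risk must be a probability between 0 and 1")
    if nodal_risk < 0.05:
        return "not recommended"
    if nodal_risk <= 0.10:
        return "consider"
    return "recommended"


print(slnb_recommendation(0.07))
```

In this sketch a risk of exactly 10% falls into the "consider" tier, because the text recommends SLNB only when the risk exceeds 10%.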

While SLNB is an important tool for individualizing therapy, the method has several disadvantages. Its false-negative rate has been estimated at 15–20%9,13,14,15, which is consistent with our experience. False-negative results have been attributed to technical problems, operator dependency, disruption of lymphatic drainage by diagnostic biopsies, idiosyncratic lymphatic obstruction, complex lymphatic drainage in the head and neck, inadequate histological analysis, and complex metastasis patterns. SLNB can also be false-positive when benign melanocyte-marker-positive cells in sentinel nodes are called malignant16. Up to 5.1% of lymph nodes from patients with no history of melanoma contain single MART-1-positive cells17. In a series of T1 to T3 cutaneous melanomas from Mayo Clinic, 24.3% of positive SLNs showed only individual melanocytes or melanocytic cell clusters <0.1 mm10. Distinguishing single benign melanocytes from malignant ones is challenging, especially when the cells are found only on immunohistochemistry sections and not on the corresponding hematoxylin and eosin sections16.
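The 15–20% figure refers to the false-negative rate of SLNB, which is conventionally computed over node-positive patients rather than over all patients with a negative biopsy. A minimal Python sketch of that conventional definition follows; the function name is hypothetical, and the individual studies cited above may define numerator and denominator slightly differently (for example, by length of follow-up).

```python
def slnb_false_negative_rate(false_negatives: int, true_positives: int) -> float:
    """Conventional SLNB false-negative rate: FN / (FN + TP).

    false_negatives: patients with a negative SLNB who later develop nodal
        recurrence in the mapped basin
    true_positives: patients with a positive SLNB
    The denominator is therefore all node-positive patients (hypothetical helper).
    """
    node_positive = false_negatives + true_positives
    if node_positive == 0:
        raise ValueError("no node-positive patients in the cohort")
    return false_negatives / node_positive
```

With hypothetical counts of 34 nodal recurrences after a negative SLNB and 166 positive SLNBs, the function returns 0.17. Because the denominator is node-positive patients, this rate differs from the share of SLNB-negative patients who later recur, which is much smaller when, as stated above, most SLNB-eligible patients are node negative.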

When Donald Morton devised the SLNB procedure for cutaneous melanoma in the 1990s, molecular diagnostics was in its infancy, and routine molecular analysis of paraffin-embedded diagnostic biopsy tissue was not feasible. The paradigm shifted after 2000 with the introduction of OncotypeDx18 and MammaPrint19, gene expression-based assays that stratify the likelihood of breast cancer recurrence. Today, a vast array of sophisticated molecular tools can be applied to routinely processed biopsy samples. This raises the question of whether a complex surgical procedure like SLNB20 can be supplemented by the molecular analysis of diagnostic biopsy tissue, for example, to better select patients for the procedure (Fig. 1).

Fig. 1: Based on current patient selection, approximately 85% of sentinel lymph node biopsies (SLNB) are negative (SLNB-) and non-therapeutic.

In the future, molecular tests based on gene expression profiling (GEP) may be combined with clinicopathologic (CP) variables such as Breslow thickness and patient age to better select patients for SLNB. Patients at low risk by CP-GEP could then forgo SLNB without compromising oncologic safety.

Clinicopathologic predictors of metastasis risk

Breslow thickness is the bedrock on which melanoma staging is built2. Conceived in the late 1960s by Alexander Breslow21, the idea of using tumor thickness as a predictor of metastasis and survival is both powerful and simple. Measuring Breslow thickness accurately, however, requires complete excision of the primary tumor, which does not always take priority in today’s busy clinical practice. Shave biopsies, which are fast, easy to perform, and sutureless, are increasingly used but can transect the tumor22,23. Other variables, such as patient age, have also stood the test of time. Paradoxically, numerous studies have found that younger age is associated with both a higher risk of SLN metastasis and higher survival rates24, whereas elderly patients are less likely to be node positive but have a worse prognosis25. In pediatric patients, SLNB loses its prognostic value because sentinel nodes are frequently positive while the clinical course is excellent26.

Additional primary tumor-derived prognostic variables include tumor ulceration, Clark level, mitotic rate (MR), lymphovascular invasion (LVI), tumor-infiltrating lymphocytes (TILs), and histologic type, among others. Although these have been known for decades and studied extensively, they have limitations. For example, Clark level, MR, LVI, and TILs are reported inconsistently, highlighting problems with reliability27. Discordance in MR reporting between institutions is high and can result from counting mitotic figures at the dermato-epidermal junction instead of in the invasive component of a melanoma, or from counting mitotic non-melanocytes28. Problems with reliability led to the removal of MR as a staging variable for thin melanomas, a decision that was widely applauded and cited as the reason for the greater reliability of current guidelines29. Other histologic features have powerful implications but are rare. Pure desmoplastic melanomas, for example, almost never metastasize to sentinel nodes but account for only 3% of SLNB-eligible patients10. Acral melanomas often metastasize to sentinel nodes but account for only 1–2% of SLNB-eligible patients in the United States10,30.

Which clinicopathologic (CP) variables are used to make predictions? The American Joint Committee on Cancer (AJCC) relies on two primary tumor-derived histologic features for staging: Breslow thickness and tumor ulceration2. The Memorial Sloan Kettering Cancer Center (MSKCC) nomogram for predicting SLNB outcome uses five CP variables: Breslow thickness, tumor ulceration, patient age, Clark level, and biopsy location31. The more recent Melanoma Institute Australia (MIA) nomogram considers six: Breslow thickness, tumor ulceration, patient age, histologic subtype, MR, and LVI32. The most ambitious models for predicting SLNB status, such as those developed by the University of Padova and MIA, combine up to eight CP variables and put special emphasis on MR as a feature that outperforms Breslow thickness33.

A review of patients seen at Mayo Clinic did not reproduce the outsized role of MR as a predictor of nodal metastasis. When we applied statistical methods to identify CP-based predictors of nodal metastasis that maximize the separability of patients and minimize prediction error, Breslow thickness and patient age were sufficient10. More complex CP models did not improve model performance. We concluded that there is a limit to the ability of CP factors to predict SLNB status10. We further hypothesized that the vast new knowledge of cancer cell biology, combined with the availability of new molecular techniques, should facilitate the discovery of predictive and prognostic molecular features34.

Molecular predictors of metastasis risk

Ambitious CP models like those by Mocellin and coworkers33 identified 18–30% of SLNB-eligible patients who could safely forgo the procedure, with a negative predictive value (NPV) of a low-risk result greater than 90%. CP models, operationalized as nomograms, are therefore considered promising clinical tools for SLNB decision making31,32. To improve on the performance of CP variables, gene expression-based molecular biomarkers for melanoma risk stratification have been explored in translational research34,35,36,37 and are now being introduced into clinical care10,38,39. Current clinical guidelines acknowledge the potential of molecular tools but also highlight the need for extensive prospective validation1. Recently, a consensus statement by the Melanoma Prevention Working Group, a group of melanoma key opinion leaders, defined the evidence required to endorse the use of expression-based melanoma risk-stratification assays in clinical care40. The group noted a lack of high-quality evidence supporting the routine use of prognostic gene expression profile (GEP)-based testing and highlighted the need for both prospective validation and benchmarking against CP variables.

Two prognostic GEPs are used in patient care or clinical trials, namely 31-GEP (DecisionDx)38 and 11-GEP (MelaGenix)41. 31-GEP is reimbursed by Medicare for identifying patients at risk of nodal metastasis42, but it was developed as a purely prognostic tool38. 11-GEP is being studied as a decision tool for postoperative adjuvant therapy in an ongoing multicenter trial for stage II melanoma in Germany43. The added value of these GEPs over established CP variables has not yet been convincingly shown.

The CP-GEP (Merlin) test for melanoma risk assessment

Another approach to molecular testing was introduced with CP-GEP (Merlin)10,44,45. The CP-GEP assay, which is reimbursed by Medicare, is the only GEP-based classifier specifically developed to identify patients at risk of nodal metastasis. Unlike purely GEP-based models such as 31-GEP or 11-GEP, the CP-GEP approach, introduced by Suman and colleagues34, first developed models of the likelihood of nodal metastasis based on either CP variables (CP models) or GEPs of the primary tumor (GEP models) and then assessed the performance of a combined model of CP and GEP factors (CP-GEP models). The CP variables considered included Breslow thickness, Clark level, MR, ulceration, LVI, biopsy location, regression, histologic type, and patient age at diagnosis. Of these, variable selection and regularization techniques selected Breslow thickness and patient age; more complex CP models did not improve performance. The combined CP-GEP model was based on Breslow thickness, patient age, and the expression of eight well-characterized genes that have been functionally linked to melanoma invasion and metastasis, namely Melanoma Antigen Recognized By T-Cells 1 (MLANA), Macrophage Inhibitory Cytokine 1 (GDF15), Interleukin 8 (CXCL8), Lysyl Oxidase Like 4 (LOXL4), TGF-β Receptor I (TGFBR1), β3 Integrin (ITGB3), Tissue-Type Plasminogen Activator (PLAT), and Protease Nexin 1 (SERPINE2)10.

In the original publication on CP-GEP10, performance was stratified by tumor (T) stage because T stage currently forms the basis of SLNB decision making. The added predictive value of CP-GEP over established staging parameters was expressed as the SLNB reduction rate, a metric introduced by Mocellin and colleagues33 to quantify the fraction of patients who can be deselected for SLNB by a test. The SLNB reduction rate of CP-GEP varied by T stage; it was highest for T1b melanoma and decreased as tumor thickness increased10. Of note, the SLNB reduction rate remained positive even for T3b melanoma, indicating that CP-GEP outperformed CP variables across a broad range of tumor thickness. In a critical review, Bartlett and colleagues46 from MSKCC acknowledged that CP-GEP achieved a 14% improvement in SLNB reduction rate over the best CP models. The overall SLNB reduction rate relative to current clinical practice, which is based on T staging, was ~60%10.
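Read as the share of otherwise SLNB-eligible patients that a test classifies as low risk, the reduction rate and its stratification by T stage can be sketched in Python as follows. This reading is an assumption about the exact definition used by Mocellin and colleagues; the function names are hypothetical, and the metric is only meaningful when paired with an NPV check.

```python
def slnb_reduction_rate(n_low_risk: int, n_eligible: int) -> float:
    """Fraction of SLNB-eligible patients that a rule-out test deselects for SLNB.

    n_low_risk: eligible patients classified as low risk by the test
    n_eligible: all patients who would undergo SLNB under current (T-stage) criteria
    (Hypothetical helper; assumes reduction rate = low-risk / eligible.)
    """
    if n_eligible <= 0:
        raise ValueError("n_eligible must be positive")
    if not 0 <= n_low_risk <= n_eligible:
        raise ValueError("n_low_risk must lie between 0 and n_eligible")
    return n_low_risk / n_eligible


def reduction_rate_by_stage(counts: dict[str, tuple[int, int]]) -> dict[str, float]:
    """Apply the reduction rate separately to each T stage.

    counts maps a T stage label to (n_low_risk, n_eligible) for that stage.
    """
    return {stage: slnb_reduction_rate(low, total)
            for stage, (low, total) in counts.items()}
```

With placeholder counts (not study data) of `{"T1b": (80, 100), "T2a": (50, 100)}`, the per-stage rates are 0.8 and 0.5. Stratifying by stage mirrors the way the text reports performance, because a single pooled rate would be dominated by whichever stage contributes the most patients.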

While a high SLNB reduction rate is desirable, test results must also be oncologically safe. For a rule-out test like CP-GEP, this means that fewer than 5% of patients with a low-risk result may be node positive, that is, the NPV must exceed 95%. This threshold mirrors current patient selection. SLNB should be considered and discussed with patients if Breslow thickness is ≥0.8 mm (stage T1b) because T1b melanoma metastasizes to sentinel nodes at a rate of ≥5%. SLNB is recommended for T2a melanoma because T2a melanoma metastasizes to sentinel nodes at a rate of ≥10%. T1a melanoma, in contrast, metastasizes at a rate below 5% and therefore does not require SLNB1. Even though fewer than 5% of T1a patients are node positive, we likely miss a non-trivial number of node-positive T1a patients because T1a melanoma is so common. As part of the trade-off, however, we avoid a very large number of unnecessary, non-therapeutic SLNB procedures. The idea of a test like CP-GEP is to extract molecular information from slightly thicker melanomas, such as T1b and T2 melanomas, and use it to lower the probability of a positive SLNB in low-risk patients to below 5% (NPV > 95%). Patients who test low risk can then forgo SLNB under established patient selection criteria, which dramatically reduces the number of unnecessary SLNB procedures. However, because SLNB is no longer performed on every patient with T1b or T2 melanoma, a few patients with positive sentinel nodes will also be missed. The trade-off is the same as for T1a melanoma.
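The safety criterion above can be written as a check on the negative predictive value among patients with a low-risk result. Below is a minimal Python sketch; the function names and the counts in the usage note are hypothetical placeholders, and the 0.95 default simply encodes the NPV > 95% threshold stated in the text.

```python
def negative_predictive_value(true_negatives: int, false_negatives: int) -> float:
    """NPV = TN / (TN + FN) among patients with a low-risk (negative) test result.

    true_negatives: low-risk patients whose sentinel nodes are negative
    false_negatives: low-risk patients whose sentinel nodes are positive
    """
    n_low_risk = true_negatives + false_negatives
    if n_low_risk == 0:
        raise ValueError("no low-risk results to evaluate")
    return true_negatives / n_low_risk


def meets_rule_out_threshold(true_negatives: int, false_negatives: int,
                             min_npv: float = 0.95) -> bool:
    """True if NPV exceeds min_npv, i.e. residual risk (1 - NPV) is below 5% by default."""
    return negative_predictive_value(true_negatives, false_negatives) > min_npv
```

The residual risk of a positive SLNB among low-risk patients is 1 − NPV. Placeholder counts of 960 true negatives and 40 false negatives give an NPV of 0.96, which passes the threshold, whereas 940 and 60 give 0.94, which does not.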

Some have argued that ruling out patients for SLNB with CP-GEP is problematic because patients with false-negative test results are precluded from receiving adjuvant systemic therapy46. However, the probability that a patient with a low-risk CP-GEP result is node positive is no higher than for a patient with T1a melanoma, for whom SLNB is also not considered. Further, CP-GEP testing might aid in the interpretation of SLNB with minimal tumor burden. More than 20% of patients with positive SLNB have single positive nodes with minimal numbers of melanocytes (individual tumor cells or cell clusters <0.1 mm)10,47, which are often detectable only by highly sensitive methods such as immunohistochemistry. The prognostic significance of minimal SLN tumor burden is unclear, and optimal treatment is uncertain. Preliminary evidence from an international multi-institutional cohort of 1,684 melanoma patients suggests that CP-GEP might help stratify patients with minimal SLN tumor burden according to recurrence risk and guide clinical decision making beyond nodal assessment48. The prognostic value of CP-GEP was previously established by Eggermont et al., who showed that CP-GEP very effectively stratifies SLNB-negative patients by relapse risk45,49,50.

During the COVID-19 pandemic, CP-GEP helped prioritize patients for surgery, thereby optimizing healthcare resources20,51. CP-GEP is the only test with a GEP component that has been shown to add predictive value over complex CP models such as the MSKCC10 and MIA nomograms44. A multicenter registry study is ongoing in the United States to prospectively validate CP-GEP52. A number of retrospective validation studies have also been published47,53,54,55.

Molecular testing: Why bother?

Some authors have argued that CP-GEP adds cost to patient care without significantly outperforming CP-based models like the MIA nomogram56. Here, we argue for molecular testing and discuss the limitations of models that solely rely on CP-based nomograms.

Issue #1: There is variability in the reporting of CP variables used in nomograms

Patient age, Breslow thickness, tumor ulceration, and TNM pathological stage are consistently reported among trained pathologists from different centers and geographical areas. There is less agreement on other histopathological features, including Clark level, regression, TILs, MR, and LVI, which are often used in CP-based nomograms27,28,57,58, and on experimental variables such as calculated tumor area, which have yet to be validated59. As Monshizadeh and colleagues27 pointed out, the majority of melanomas are diagnosed in non-specialized, non-academic centers, and variability in reporting is a function of training and experience. In the United States, there are disparities in the reporting of MR across institutions, including within traditionally disadvantaged populations60. Of note, Hispanics and non-Whites are also less likely to undergo excisional biopsies61. Because of the variability in histopathology read-out, some dermatopathologists have argued for the routine use of GEP-based testing, which they consider methodologically precise62.

Issue #2: CP-based nomograms produce inconsistent results and have not been extensively validated

Three CP-based SLNB prediction models from specialized academic centers are accessible online: the MSKCC31,63, MIA32,64, and Life Math (Harvard)65 nomograms. These models use different sets of predictors. For example, the newer MIA model considers LVI, whereas the older MSKCC and Life Math models do not. In some of the nomograms, not all patients fit the preset categories of nominal variables such as histologic type. Similarly, for MR, models require discrete values, whereas pathologists often report ordinal ranges. In our experience, MIA nomogram risk scores could not be calculated for 10 to 20% of patients, mostly because of incompatible histologic types44,54. Moreover, the nomograms disagree in their predicted outcomes, which directly impacts patient care66. These discrepancies likely reflect differences in variable choice, historical differences between model development cohorts, and local idiosyncrasies of pathology interpretation.

Expectations expressed by melanoma key opinion leaders for the development and validation of GEP-based tools40 greatly exceed what has been the norm for CP-based nomograms. 31-GEP leads in the number of validation studies and publications; however, clinical and statistical experts have criticized its data quality, study design, and statistical approach67,68,69. A multicenter clinical trial is underway in Germany to study 11-GEP for postoperative adjuvant therapy in stage II patients43. CP-GEP has been validated retrospectively in multiple independent cohorts in both the United States and Europe47,53,54,55. A multicenter prospective registry study for CP-GEP is recruiting in the United States52.

Issue #3: Molecular testing has the potential to be cost saving

For patients with a pretest probability of SLNB positivity greater than 5–10% who do not otherwise qualify for adjuvant pembrolizumab8, SLNB continues to be critical for therapeutic decision making1. However, if patient selection is based solely on Breslow thickness and tumor ulceration, as is currently the case1, approximately 85% of SLNB procedures will be negative and non-therapeutic10. Molecular testing will likely have a role in better quantifying the pretest probability of SLNB metastasis and in reducing the number of SLNB-eligible patients without sacrificing oncologic safety. Whether molecular testing is cost saving depends on the cost of the test, the cost of SLNB, and the SLNB reduction rate that a test can achieve while maintaining oncologic safety at an NPV of ~95%. It follows that the great majority of T1a patients should not be tested: their risk of nodal metastasis is already <5% based on Breslow thickness alone, so there is no SLNB to avoid. Exceptions may include patients with transected biopsies, elevated MR, or other worrisome histopathologic features recognized by an experienced dermatopathologist. T1b and T2 melanomas, in contrast, are well suited for a test like CP-GEP because the SLNB reduction rate achieved at an oncologically safe NPV of ~95% is high (up to 80%)10. Given an average SLNB cost in the United States of ~$19,00070, CP-GEP testing would be cost saving even at a price point of several thousand dollars. Studies on this important topic are forthcoming. Of note, molecular testing not only reduces the rate of SLNB but also improves health-related quality of life, as CP-GEP low-risk patients can forgo a complex surgical procedure20.
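The cost argument reduces to simple expected-cost arithmetic. The following is a minimal Python sketch under a deliberately simplified model (an assumption, not a published cost analysis): every eligible patient is tested, a fraction equal to the reduction rate forgoes SLNB, and all downstream costs (missed nodal disease, imaging, adjuvant therapy, complications) are ignored. The function names and the $3,000 test price in the usage note are hypothetical.

```python
def expected_saving_per_patient(slnb_cost: float, test_cost: float,
                                reduction_rate: float) -> float:
    """Simplified per-patient saving from testing every SLNB-eligible patient.

    Without testing, every patient undergoes SLNB.
    With testing, every patient pays for the test and a fraction
    `reduction_rate` forgoes SLNB. Downstream costs are ignored.
    """
    if not 0.0 <= reduction_rate <= 1.0:
        raise ValueError("reduction_rate must be between 0 and 1")
    cost_without_test = slnb_cost
    cost_with_test = test_cost + (1.0 - reduction_rate) * slnb_cost
    return cost_without_test - cost_with_test


def break_even_test_cost(slnb_cost: float, reduction_rate: float) -> float:
    """Test price at which the simplified model predicts zero net saving."""
    return reduction_rate * slnb_cost
```

Under these assumptions, the break-even price equals the reduction rate times the SLNB cost. With the ~$19,000 SLNB cost quoted above, a reduction rate of 0.8 gives a break-even price of 0.8 × $19,000 = $15,200, and a hypothetical $3,000 test at a reduction rate of 0.5 saves $6,500 per tested patient. A full cost-effectiveness analysis would also need to price missed nodal disease and its consequences.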

Future directions

Melanoma risk assessment is changing with the recent advent of molecular tests like CP-GEP, which hold the promise of improving patient selection for SLNB and other interventions10. We regard these tests as an important step forward, as they may improve patient outcomes while reducing healthcare costs. Molecular tests like CP-GEP might help us avoid negative, non-therapeutic SLNB so that healthcare resources can be reallocated to more deserving interventions20,51. CP-based nomograms, which have been promoted as alternatives to molecular testing, tend to be unreliable because they rely on complex CP features such as TILs, MR, and LVI. These features are inconsistently reported among pathologists, which has been attributed to variability in training, experience, the local quality of care, and idiosyncrasies of histopathology interpretation27,61,66. It is therefore unsurprising that CP nomograms perform well in development cohorts from world-renowned subspecialized centers32 but validate poorly in other settings54. Molecular tests, on the other hand, are methodologically precise across continents and patient populations and can be combined with clear-cut CP predictors like Breslow thickness and patient age10,71. As we await the transition to a molecular-driven world, careful evaluation of melanoma histopathology and sentinel nodes will remain an important cornerstone of melanoma staging and therapy.