Main

In 1993 a pilot series of sentinel lymph node (SLN) biopsies in breast cancer patients was published in which the SLN was identified using a handheld gamma probe after injection of a radioisotope tracer around the breast tumor.1 This made it possible to identify the location of the sentinel node before skin incision and use the probe to guide surgery. This new technique complemented sentinel node biopsy research using vital blue dyes, which was being investigated particularly in melanoma but also in breast cancer. The sentinel node biopsy technique for breast cancer rapidly diffused throughout the surgical community. Sentinel nodes were more likely to contain metastases, if they were present, and occult metastases deeper in paraffin blocks were more likely to be identified in SLNs than non-SLNs.2 The occult metastasis investigation was initially applied as a ‘proof of principle’ but was also viewed as a mechanism to more accurately stage breast cancer patients, reopening a Pandora's box from the 1940s. It has been difficult to reach consensus on SLN standardization because reference outcome studies in breast cancer often take decades, especially in subgroups with more favorable prognoses, the group to which most sentinel node patients belong. In fact, we have observed our node positive rate in sentinel nodes decrease from 26 to 15% over the last decade as ultrasound screening with fine needle aspirate biopsy of abnormal regional nodes has improved patient selection for sentinel node biopsy. Another complicating factor is our new understanding of intrinsically aggressive biologic subtypes of breast cancer such as ‘triple negative’ basal phenotype tumors and Her2/neu overexpressing tumors. Minimal nodal tumor burden may have a different significance for these patients compared to Her2 negative, estrogen receptor positive patients. 
In short, the issues surrounding evaluation of breast SLNs are complicated by limited or unavailable clinical outcome data, understanding definitions and applying classification criteria, and a lack of standardized node evaluation protocols.

Detection and clinical significance of micrometastases

In 2009, we are still debating and trying to understand the clinical significance of micrometastases in breast SLNs. In the context of studies using pre-2003 data, micrometastases include all metastases no larger than 2.0 mm in greatest dimension. A recent analysis of population-based data from the National Cancer Institute's Surveillance, Epidemiology, and End Results (SEER) national cancer database showed that the presence of micrometastases no larger than 2.0 mm in lymph nodes is associated with an overall decrease in survival at 10 years of 1, 6, and 2% for T1 (no larger than 2.0 cm), T2 (larger than 2.0 cm but no larger than 5.0 cm), and T3 (larger than 5.0 cm) tumors, respectively, compared to patients with no nodal metastases detected.3 This SEER analysis included years prior and subsequent to the widespread use of SLN biopsy. Thus, for the usual sentinel node patient with a mammographically detected tumor <2.0 cm, this study suggests there is little expected detrimental impact associated with the presence of micrometastases. However, the study does suggest that for larger tumors, detection of micrometastases may be more important. As this is a population-based study, we have no idea how the nodes were sampled, leaving open the possibility that micrometastases in larger tumors are a marker of aggressive intrinsic biology or a marker of undetected macrometastases deeper in paraffin blocks. In contrast to this SEER analysis, a large retrospective analysis of pre-SLN era data from California and Massachusetts showed no impact on 15-year mortality estimates in any tumor size category when only a single lymph node contained a metastasis, regardless of primary tumor size.4 In that analysis, the median metastasis size was 6.0 mm. That study supports the hypothesis that primary tumor characteristics have more prognostic importance than minimal lymph node tumor burden. 
The study also raises questions about grouping patients with one positive node together with patients having two or more nodes positive. One other issue that often confuses the discussion of significance of micrometastases in SLNs is the predictive vs prognostic significance. The predictive significance concerns whether to perform a completion axillary dissection after detection of a metastasis in a sentinel node; this is not the topic of the current discussion. The prognostic significance concerns disease-free survival (risk for axillary or systemic recurrence) and overall survival.

A new category of metastases, isolated tumor cell clusters (ITCs), was defined in the 2003 version of the AJCC/UICC staging manuals to accommodate the increasing frequency of detecting tiny tumor deposits in sentinel nodes and to limit stage migration secondary to comprehensive histopathology evaluation of lymph nodes. Micrometastases are defined as tumor deposits larger than 0.2 mm but no larger than 2.0 mm and ITCs are defined as cell clusters or single cells with no single cluster larger than 0.2 mm.5 The threshold limits for ITCs and micrometastases represent a 1000-fold difference in volume of a spherical metastasis (Table 1). The threshold upper size limit for an ITC also is an attempt to acknowledge that humans are imperfect and screening for minute metastases is imprecise (see below). The application of the definition of an ITC has been mildly problematic. The AJCC definition focuses exclusively on size and the UICC definition includes additional restrictions including assessments of potential proliferation and stromal reaction.6 These definitional differences can lead to differences in patient stratification.7, 8 However, when a uniform definition is applied together with example training images, reproducible and uniform classification can be achieved.9
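The 1000-fold figure follows directly from the cubic dependence of volume on diameter. A minimal sketch, assuming ideal spheres at the two threshold diameters (the function and constant names are illustrative and not taken from the staging manuals):

```python
import math


def sphere_volume_mm3(diameter_mm: float) -> float:
    """Volume of a sphere of the given diameter, in cubic millimetres."""
    return math.pi / 6.0 * diameter_mm ** 3


ITC_LIMIT_MM = 0.2       # upper size limit for an isolated tumor cell cluster
MICROMET_LIMIT_MM = 2.0  # upper size limit for a micrometastasis

v_itc = sphere_volume_mm3(ITC_LIMIT_MM)
v_micromet = sphere_volume_mm3(MICROMET_LIMIT_MM)
fold_difference = v_micromet / v_itc

print(f"0.2 mm sphere: {v_itc:.5f} mm^3")
print(f"2.0 mm sphere: {v_micromet:.3f} mm^3")
print(f"ratio: {fold_difference:.0f}-fold")
```

A tenfold change in diameter always gives a thousandfold change in volume, so the result does not depend on the particular thresholds chosen, only on their ratio.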

Table 1 Tumor volume for spherical metastatic foci

An essential aspect of patient care is an understanding of the uncertainty inherent in our examination of SLNs. The fact that what we observe on a microscopic section may not be the largest dimension of a metastasis translates to prognostic overlap between the size-defined nodal classification groups (ITCs and micrometastases). This will confound statistical outcome analyses of ITCs and micrometastases, particularly for small studies, making it difficult to show differences in outcome for the two groups. Moving away from dichotomous concepts such as ‘node positive’ and ‘node negative’ and embracing the continuum of semi-quantitative nodal tumor burden is in the best interest of our patients regardless of these limitations. We know that increasing numbers of positive lymph nodes confer a worse prognosis10 and conversely that decreased metastatic tumor volume is associated with improved outcome. This concept can intuitively be extended to micrometastatic tumor burden where the predicted prognostic impact should be somewhere between a patient with negative nodes and one with a single node positive; a common sense test we should apply to prognostic analyses of SLN outcomes. Nodal tumor burden is a continuum from a single cell to bulky palpable disease and prognosis should follow tumor burden when all other intrinsic biologic factors are equal.

In several respects, detecting nodal metastases is similar to trawling for fish: the size of the net will largely determine the size of the fish. In the case of sentinel nodes, the size of the net is determined by the thickness of unexamined tissue. In 1971, Huvos et al11 showed that patients with metastases no larger than 2.0 mm had similar survival to patients with negative nodes. This concept was incorporated into the Manual for Staging Cancer and the dividing line between micrometastases and macrometastases was set. There is virtually uniform international agreement that the first net we cast when looking for nodal metastases is 2.0 mm. In other words, historical precedent and outcome evidence indicate that we do not want to miss metastases larger than 2.0 mm. How do we accomplish this seemingly simple task? First, we must inspect the node and any adherent fat. If any dimension is larger than 2.0 mm, the node must be sectioned. Most lymph nodes take the form of an asymmetric ellipsoid, or are bean shaped, with one long axis and two shorter axes. We recommend cutting the node parallel to the longest axis even though this is harder than sectioning perpendicular to this axis. Cutting parallel to the long axis produces fewer 2.0 mm slices to examine and there is old anatomic data that suggest afferent lymphatics are more likely to enter the node in this plane. Thus, the number of afferent lymphatic junctions with the subcapsular sinus may be increased when the sections are in the same plane as the two ‘faces’ of a bean-shaped node. In reality, it is often difficult to discern this microanatomy. What is important is assuring that no slice is thicker than 2.0 mm. This is such a fundamental concept that it bears emphasis: the single most important advancement in breast cancer staging and pathologic assessment of axillary lymph nodes attributable to SLN biopsy is thin slicing prior to embedding and the high likelihood of detecting all metastases larger than 2.0 mm. 
This principle is relevant regardless of whether the pathologist is examining a full axillary dissection, an axillary sampling, or a well executed sentinel node biopsy. In our institution, and in many others, thin sectioning of nodes has trickled into all node evaluations in the surgical pathology suite including evaluation of colectomy specimens, pelvic node dissections, head and neck dissections, and any other oncologic specimen with node evaluation. I was first taught this technique as a pathology resident evaluating nodes for malignant lymphoma where the goal was superb fixation.

The capsular relaxation induced by bisecting nodes will often be sufficient to produce 2.0 mm sections. In this situation, the two opposing cut faces should be placed down in the tissue processing cassette and full face sections should be examined microscopically. Histology technicians are taught to cut from the surface placed down in the cassette. When a node is trisected, the two end pieces should be placed cut surface down; the middle section is placed randomly unless gross examination identifies a suspicious lesion and then this is placed down in the cassette. When more than three sections are submitted, the middle sections must be carefully placed in the cassette so that two opposing faces are not placed down in such a manner that microscopic sections are more than 2.0 mm apart (Figure 1). All pathology laboratories should strive to meet this standard protocol of submitting sections that are no thicker than 2.0 mm and assuring that at least one microscopic section is examined every 2.0 mm through the node. We recently showed that the median lymph node section thickness for a group of SLN study patients was 2.1 mm but the modal thickness was 2.3 mm with over half the blocks containing node slices thicker than 2.0 mm and 9% with slices thicker than 3.0 mm.12 Continued diligence is necessary if we are going to efficiently identify macrometastases larger than 2.0 mm.

Figure 1

Gross sectioning and embedding of sentinel nodes. The primary objective of the gross management of sentinel nodes is assuring that all macrometastases larger than 2.0 mm are identified microscopically by assuring no slice is thicker than 2.0 mm before embedding in paraffin. When nodes are serially sectioned, special care must be taken to place the sections into the embedding cassette in a manner that eliminates more than 2.0 mm of unexamined tissue. Histology technicians are taught to embed and cut the surface that is placed down in the tissue-processing cassette. Dashed lines represent the surface placed down in the cassette. (a and b) Central serial sections are placed in the cassette so that nonopposing surfaces are examined microscopically. One of the end sections will be an opposing surface. (c) This shows incorrect grossing and embedding preparation. The central serial sections were placed in the cassette so that neither surface containing the micrometastasis was evaluated microscopically.
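The embedding rule can be checked with simple arithmetic on slice positions. The sketch below is a hypothetical illustration, not a laboratory procedure: it models a node as consecutive slices along one axis, records which face of each slice is placed down (`"near"` for the face closer to the start of the node, `"far"` for the other), and reports the thickest stretch of tissue that contains no examined surface. All names are assumptions introduced for the example.

```python
def examined_planes(thicknesses_mm, faces_down):
    """Return the sorted positions of examined surfaces and the node length."""
    planes = []
    start = 0.0
    for thickness, face in zip(thicknesses_mm, faces_down):
        # The face placed down in the cassette is the one that gets sectioned.
        planes.append(start if face == "near" else start + thickness)
        start += thickness
    return sorted(planes), start


def largest_unexamined_gap(thicknesses_mm, faces_down):
    """Thickest run of tissue between examined surfaces or the node ends."""
    planes, node_length = examined_planes(thicknesses_mm, faces_down)
    edges = [0.0] + planes + [node_length]
    return max(b - a for a, b in zip(edges, edges[1:]))


slices = [2.0, 2.0, 2.0, 2.0]  # four 2.0 mm slices through one node

# End pieces cut-surface down; central slices oriented so that the shared
# central cut is examined on one side.
correct = ["far", "far", "near", "near"]
# Central slices both turned away from the shared central cut.
incorrect = ["far", "near", "far", "near"]

print(largest_unexamined_gap(slices, correct))
print(largest_unexamined_gap(slices, incorrect))
```

With the second orientation neither face of the central cut is examined, so 4.0 mm of tissue goes unsampled even though every slice is 2.0 mm thick.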

Statistical principles

By slicing nodes at 2.0 mm intervals then embedding all the slices and examining one section from the top of the paraffin block we have a high probability of detecting all metastases larger than 2.0 mm. The term ‘detection’ needs further clarification. Surely some smaller fish will have been caught in our net by chance. Unlike the fishing analogy, we cannot see the entire metastasis in a two-dimensional microscopic section. We do not mean that each metastasis larger than 2.0 mm will actually measure more than 2.0 mm on the section. We may be looking at the largest dimension of a nodal metastasis or at the tip of a metastatic iceberg where more tumor is lurking deeper in the paraffin block or has already been cut away during facing of that block. As scientists, we could solve this dilemma: all we need to do is mount every section from the paraffin block on glass and examine these sections microscopically. Now, we would be able to reconstruct the three-dimensional size of any metastases present and we would be much more certain that a node was negative when we fail to detect metastases. If we are very careful to assure that our gross sections are no thicker than 2.0 mm (2000 μm) and we cut our sections at 0.005 mm (5 μm) we will need to examine 400 sections per paraffin block. Most of us are not willing to do this, so how do we proceed? First, we must accept uncertainty. Second, we must develop statistical reference groups based on what we see in our sections, not what might actually be present in the block. Third, we compare the admittedly inaccurate findings of an individual patient assessment to the admittedly inaccurate reference groups to determine an ‘estimate of outcome’ that is also flawed but has practical value. However, it must be stressed that the empiric outcomes of the reference groups are entirely dependent on the assay system used to develop the groups. 
With respect to node classification groups, the assay is the procedure used by the pathologist to determine whether nodal metastases are present and if present, the procedure used to quantify the metastatic burden. This may include gross examination, microscopic examination, serial sections, immunohistochemistry (IHC), or molecular techniques. The dilemma becomes more complex when we deviate from the reference assay, in this case when our examination of the lymph node differs from the examination used in the reference population. Over time, we have developed and adopted more sensitive detection assays (eg thin sectioning of nodes, serial sections, etc) that have shifted ‘node negative’ patients to ‘minimally node positive’ patients. The resulting medical Will Rogers effect tells us that both groups will have improved outcomes compared to historical reference standards.
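The Will Rogers effect is easier to see with numbers. The sketch below uses entirely hypothetical survival probabilities (they are not data from any study): a few patients with occult metastases are counted as ‘node negative’ under the old assay and as ‘node positive’ under the more sensitive one, and the group averages are compared.

```python
from statistics import mean

# Hypothetical per-patient 10-year survival probabilities, for illustration only.
truly_negative = [0.95] * 8
occult_positive = [0.85] * 2   # missed by the old assay, detected by the new one
overt_positive = [0.70] * 10

# Old assay: occult disease is classified as node negative.
old_negative = truly_negative + occult_positive
old_positive = overt_positive

# New assay: the same patients migrate to the node positive group.
new_negative = truly_negative
new_positive = overt_positive + occult_positive

old_rates = (mean(old_negative), mean(old_positive))
new_rates = (mean(new_negative), mean(new_positive))
overall = mean(truly_negative + occult_positive + overt_positive)

print(f"node negative: {old_rates[0]:.3f} -> {new_rates[0]:.3f}")
print(f"node positive: {old_rates[1]:.3f} -> {new_rates[1]:.3f}")
print(f"all patients:  {overall:.3f} (unchanged)")
```

Both group averages rise although no individual patient's prognosis has changed, which is why outcomes from a new assay cannot be compared directly with historical reference groups.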

Historically, lymph nodes were examined grossly. Microscopic examination taught us that some nodes that appeared negative by gross examination were in fact positive on microscopic examination. These observations were established long before I began to practice pathology. In my training, except when nodes were small, we rarely submitted the entire node for microscopic examination. When the entire node is submitted, the likelihood of detecting metastases increases; conversely, when we fail to examine the entire node, we as pathologists miss metastases that are present.2 Thin sectioning of nodes accompanying SLN biopsy detects more metastases. In fact, in the United States SEER national cancer database, node positive Stage II breast cancer increased from 60 to 80 cases per 100 000 population-based individuals during the period from 1995 to 1999 when it reached a new plateau (Figure 2). We would expect these patients to have outcomes somewhere between node positive and node negative patients because we know that nodal tumor burden is a continuum with respect to survival.10 As we have monumentally altered the sensitivity of our assay (ie the pathologic node examination) we cannot expect minimally node positive patients (micrometastases and ITCs) to have outcomes comparable to historical reference groups whose pathologic examination occurred before the early 1990s. In other words, retrospective studies examining the clinical significance of occult metastases in lymph nodes are doomed to predictive failure—they are overly pessimistic—because we did not routinely submit the entire node or thinly section lymph nodes prior to the ‘sentinel node era.’ Occult metastases in these studies may represent an epiphenomenon of macrometastases that were ‘left in the bucket’ and never embedded in paraffin; we will never really know. 
All these studies can tell us is that it is worth taking another look at micrometastases in the setting of our new and more comprehensive assay system of thinly slicing lymph nodes.

Figure 2

Stage II breast cancer incidence, US women 50–64-years old, 1992–2000, National Cancer Institute, Surveillance, Epidemiology, and End Results (SEER) data base. Stage II breast cancers are heterogeneous and include tumors larger than 2.0 cm with negative lymph nodes (Stage 2, N−) and tumors smaller than 2.0 cm with positive lymph nodes (Stage 2, N+). The sentinel lymph node biopsy (SLNB) technique for breast cancer rapidly disseminated after it was initially reported in the early 1990s. The steady rise in Stage II node positive breast cancer (solid diamonds) from 1995 to a new plateau in 1999 can be attributed to SLNB. These new Stage II breast cancers were recruited from Stage I node negative patients because of the more comprehensive evaluation of sentinel nodes and increased detection of nodal micrometastases. Note that the incidence of Stage II node negative breast cancer (open squares) has remained steady over the same time frame.

Sensitivity and sensibility of immunohistochemistry

I have deliberately avoided discussing serial sections and IHC up to this point because they cannot be considered part of a ‘standard’ protocol. To date, we do not have reference populations with outcome data that are the result of a uniform approach to microscopic pathologic evaluation of our thinly sliced nodes. All protocols that use serial sections and IHC must be considered experimental until validated with outcome data. It is a verified fact that IHC can enhance detection of small tumor clusters and single cells; however, searching for metastases in a paraffin block with IHC is a little like fishing for whales with a minnow net: scooping the net through the surface of the ocean does not mean there are no whales in deeper waters. If we are interested in the prognostic significance of metastases <2.0 mm, I would suggest that it is more important to make sure we do not miss 1.0 mm metastases in a paraffin block rather than place too much emphasis on finding tumor clusters <0.1 mm in our initial sections. Many of the advocated experimental protocols consist of several levels that include routine and IHC stains. These protocols generally have sections separated by 20–200 μm and usually only evaluate the top 0.5 mm of the 2.0 mm block, leaving 1.0–1.5 mm of the block unexamined and potentially harboring fairly large micrometastases. A protocol using only routine H&E stains and compulsive attention to thinly slicing lymph nodes to no more than 2.0 mm before embedding will detect more significant micrometastases than cytokeratin IHC performed on SLN paraffin blocks that are thicker than 2.0 mm.
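The point about unexamined tissue can be made concrete by listing the depths at which levels are taken and finding the thickest stretch with no examined section. This is a simplified sketch: it treats one slice in isolation, ignores the thickness of the sections themselves, and uses two hypothetical protocols chosen only to illustrate the contrast.

```python
def largest_unexamined_stretch_mm(level_depths_mm, slice_thickness_mm=2.0):
    """Thickest run of tissue in one slice that contains no examined level.

    Depths are measured from the block surface and are assumed to lie
    between 0 and the slice thickness.
    """
    depths = sorted(level_depths_mm)
    edges = [0.0] + depths + [slice_thickness_mm]
    return max(b - a for a, b in zip(edges, edges[1:]))


# Hypothetical closely spaced protocol: six levels confined to the top 0.5 mm.
closely_spaced = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]
# Hypothetical evenly spaced protocol: four levels through the whole slice.
evenly_spaced = [0.0, 0.5, 1.0, 1.5]

print(largest_unexamined_stretch_mm(closely_spaced))
print(largest_unexamined_stretch_mm(evenly_spaced))
```

Under these assumptions the closely spaced protocol leaves 1.5 mm unexamined despite using more slides, while the evenly spaced one never leaves more than 0.5 mm.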

Dealing with uncertainty

The best candidate protocols for meaningful prognostic information will include sections at predetermined intervals through the block. Even this can be an expensive proposition. For efficiency and economics, the NSABP B-32 experimental protocol used a three-level examination. Nodes were sectioned thinly and a single section from the top of the block was used for clinical management. IHC was not allowed routinely but could be used when findings on routine stains were suspicious. After this, a blinded examination of all ‘negative’ blocks was conducted at a central laboratory (University of Vermont). A routine stain and an IHC cytokeratin stain were evaluated at 0.5 and 1.0 mm deeper into the block relative to the surface (four total additional sections). This protocol was a compromise on equally spaced sections through the block, chosen because we anticipated that many SLNs would be bisected; the approach oversamples the middle of the node, or is ‘volume weighted.’ The primary objective of the B-32 protocol is to determine whether axillary recurrence rates or systemic recurrence rates are higher in women treated with SLN biopsy alone compared to axillary dissection.13 If this study proves to be a negative study (no statistically significant increase in axillary recurrence), then there is one other dilemma that must be solved in the SLN era of breast cancer: is it necessary to perform a completion axillary dissection in all patients with a positive SLN? This is the immediate predictive significance of a positive SLN that was mentioned above and generates another set of questions that is beyond the scope of this discussion. It was also the aim of the American College of Surgeons Oncology Group Z0011 trial, which had to be closed because of poor accrual. 
The secondary aim of the B-32 trial is to determine whether patients with occult metastases on deeper sections, the experimental pathology component of the protocol, represent a population at increased risk for axillary or systemic recurrence compared to the group with no further metastases detected. If this group does prove to be at increased risk, then we will have a randomized prospective reference population with outcome data on 4000 patients linked to a standardized gross examination and paraffin block sectioning strategy. We would be able to infer that any future patients evaluated with the same strategy would have similar outcomes to those in the reference population or subpopulations identified. Significant deviations from the pathology evaluation protocol may invalidate comparison to the outcome group(s).

As part of our ongoing quality assurance for this trial we have shown that humans screening lymph nodes will frequently miss tumor cell clusters that are isolated and up to 0.07 mm (70 μm) but also miss clusters as large as 0.1 mm even when IHC is used.14 When only routine hematoxylin and eosin stains are used, we expect slightly larger clusters would be missed, probably as large as 0.2 mm. This further complicates the practical and applied clinical significance of tumor clusters smaller than 0.1 mm (100 μm) or 0.2 mm (200 μm). Although prognostic impact estimates can be calculated for a large group of patients with detected metastatic deposits in this size range, the estimates will be inaccurate for an individual because many of the reference patients would have been incorrectly classified as ‘node negative’ and would have been excluded from the calculation or would in reality have micrometastases because we failed to detect the largest diameter of the metastasis. In other words, because there is a random component to detecting ITCs, the expected outcome for an individual patient with ITCs detected would be somewhere between calculated outcome estimates for ‘node negative’ patients and patients with ITCs or micrometastases detected.

In another quality assurance project, we examined the empiric detection rates for ITCs and micrometastases using several different paraffin block sampling protocols compared to a comprehensive sectioning protocol where one microscopic section was evaluated every 0.18 mm (180 μm) until the lymph node tissue block was exhausted.12 It is no surprise that the most comprehensive protocol detects the most tumor deposits; however, this has significant economic implications with respect to health-care dollars spent manufacturing the slides and pathologist time screening the slides. For less comprehensive sampling strategies, the best performance is observed when the microscopic sections examined are widely spaced; a strategy that maximizes detection of larger metastatic deposits at the expense of missing smaller deposits (Figure 3). The protocol with the worst performance examined several sections with narrow spacing (180 μm) between each section. As expected, this study also showed that micrometastases (larger than 0.2 mm) would be misclassified as ITCs for any sectioning protocol with spacing between levels more than 0.2 mm. For example, compared to the comprehensive protocol with sections examined through the block every 0.18 mm, examining three levels with 0.5 mm spacing between each level under classified 22% of cases with micrometastases. This observation clearly shows why it will be difficult to evaluate prognostic differences between ITCs and micrometastases except in studies with very large numbers of patients and sufficient statistical power. To accomplish this task, we would need a standardized comprehensive evaluation protocol (for example, levels through the block at 0.1 mm spacing), accurate assessment of primary tumor prognostic variables, and long-term follow-up. To perform the appropriate subset analyses (tumor size, grade, hormone receptor status, Her2 status) we would need tens of thousands of patients followed for at least 10 years. 
This exceeds the expectations of a clinical trial and could only be accomplished in a population-based observational database where the inherent variability in data quality would increase the number of patients necessary to achieve statistical power. In the interest of our patients, our energy will probably be better spent focusing on a less comprehensive but statistically valid sampling of sentinel node paraffin blocks, standardizing the evaluation protocol nationally or even internationally, standardizing nodal classification, and then evaluating outcome at some point in the future through population-based registries or national cancer databases.
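Why widely spaced levels favor larger deposits can be illustrated with a toy geometric model. The sketch assumes a spherical deposit whose centre is uniformly distributed through the depth of a single 2.0 mm slice, so that a level at depth z cuts the deposit when the centre lies within one radius of z. The protocol encodings are hypothetical readings of the sectioning schemes compared in the study, and the model ignores the examined faces of neighbouring slices, so it understates detection near the bottom of a slice. It is not a reproduction of the empiric results.

```python
def detection_probability(level_depths_mm, diameter_mm, slice_thickness_mm=2.0):
    """Chance that at least one examined level cuts a spherical deposit."""
    radius = diameter_mm / 2.0
    # Range of centre depths that each level would detect, clipped to the slice.
    intervals = sorted(
        (max(z - radius, 0.0), min(z + radius, slice_thickness_mm))
        for z in level_depths_mm
    )
    covered = 0.0
    current_lo, current_hi = intervals[0]
    for lo, hi in intervals[1:]:
        if lo <= current_hi:      # overlaps the running interval: extend it
            current_hi = max(current_hi, hi)
        else:                     # disjoint: bank the finished interval
            covered += current_hi - current_lo
            current_lo, current_hi = lo, hi
    covered += current_hi - current_lo
    return covered / slice_thickness_mm


# Hypothetical level depths (mm from the block surface) for each scheme.
protocols = {
    "a": [0.0],
    "b": [0.0, 0.18, 0.36],
    "c": [0.0, 0.5, 1.0],
    "d": [0.0, 0.5, 1.0, 1.5, 2.0],
    "e": [round(0.18 * i, 2) for i in range(12)],
}

for name, levels in protocols.items():
    p_micromet = detection_probability(levels, 1.0)
    p_itc = detection_probability(levels, 0.1)
    print(f"({name}) 1.0 mm deposit: {p_micromet:.2f}   0.1 mm cluster: {p_itc:.2f}")
```

In this model schemes (b) and (c) use the same number of slides, yet the wider spacing in (c) covers far more of the slice for a 1.0 mm deposit, while the two perform identically for a 0.1 mm cluster.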

Figure 3

Performance of various microscopic sectioning protocols for detecting occult micrometastases. (a–e) All SLNs were grossly sectioned at close to 2.0 mm thick sections. (a) The reference protocol examined one section from the top of the block. All cases with negative initial sections were evaluated with protocol (e); protocols (b–d) were simulated by examining only specific sections from protocol (e). Detection rates for micrometastases >0.2 mm and no larger than 2.0 mm (Micromet) and ITCs no larger than 0.2 mm (ITC) were calculated for each protocol. (b) Two additional sections separated by 0.18 mm. (c) Two additional sections separated by 0.5 mm. (d) Four additional sections separated by 0.5 mm. (e) Multiple additional sections separated by 0.18 mm completely through the block (median 11 sections per block). Only protocol (e) can reliably detect all micrometastases present but will still miss ITCs. The maximum size of missed metastases is dependent on the thickness of tissue not examined between each section or remaining in the block. Protocols (c) and (d) are compromise protocols that perform better than protocol (b) and do not perform as well as protocol (e) but are less expensive and less time consuming than protocol (e). (Data adapted from Ref. 12).

Standard recommendation

In summary, only one standard protocol for evaluating SLNs can be supported and endorsed based on evidence, albeit old evidence, at this time. Thin sectioning of nodes at 2.0 mm intervals, embedding all sections, and examining one section from the surface of the block is a strategy designed to detect all metastases larger than 2.0 mm. The resulting metastases can then be placed into statistically stratified groups such as those defined by AJCC and UICC. This strategy is recommended by the College of American Pathologists and the American Society of Clinical Oncology.15, 16 It is recognized that more comprehensive sampling will identify additional micrometastases and ITCs; however, we have not yet developed the reference populations with these experimental protocols to understand their prognostic significance for patients. Candidate new standard protocols should only include evenly spaced levels through the block. Clinical trials such as the NSABP B-32 pathology study of occult metastases and other studies where the SLN gross examination protocol has been standardized will help us determine whether additional levels are clinically useful.