Main

Correct diagnosis of precancerous endometrial lesions enables treatment before progression to an invasive neoplasm. Correspondingly, exclusion of a premalignant lesion is important to avoid overtreatment. Precancerous lesions in the current WHO schema are encompassed within a spectrum of four types of endometrial hyperplasia, the diagnosis of which has high intra- and interobserver variability1, 2, 3, 4, 5 in addition to modest predictive value for carcinoma.1, 2, 6 New diagnostic criteria, for fewer diagnostic categories, are required to resolve these issues.

Endometrial intraepithelial neoplasia (EIN) is a high-cancer-risk lesion first proposed by an international group of pathologists in 2000,7 in response to newly emergent molecular, histomorphometric, and clinical outcome studies.7, 8, 9 EIN fulfills all lines of evidence necessary to define a premalignant lesion, as established by a National Cancer Institute-sponsored consensus conference in 2006.10 Molecular studies have shown that these lesions are clonally related to invasive carcinomas.11, 12, 13 Correlation of molecular, histomorphometric, and clonal characteristics of endometrial lesions with clinical cancer outcomes has permitted de novo discovery of the features that characterize premalignant lesions, thus enabling precise definition of EIN diagnostic criteria.14, 15 Key changes include objective architectural thresholds (gland area exceeds that of stroma), interpretation of cytological changes relative to background normal glands in the same sample, and minimum lesion size (1 mm) within a single tissue fragment. EIN classification has better diagnostic reproducibility16 and cancer predictive value17 than WHO endometrial hyperplasias. Patients with EIN lesions have a 45-fold increased endometrial cancer risk.13

EIN has simple diagnostic criteria, outlined above, that can be applied routinely using only standard hematoxylin and eosin-stained tissue, without the need for additional immunohistochemical or sophisticated molecular studies.8 This characteristic makes it very appealing to the routine pathologist. Successful implementation of any new diagnostic schema, however, requires learning new criteria and consistently applying them in the intended manner. This is especially challenging with EIN, as the background tissue appearance is highly variable, and some lesion features such as size, method of assessing cytological change, and threshold gland densities are unfamiliar to many pathologists. Prior studies testing the reproducibility of this terminology have involved teams consisting only of gynecological pathologists; moreover, some of these were carried out using the morphometric D-score rather than plain light microscopy.9, 18, 19 In this study we test the reproducibility of the EIN diagnosis using only light microscopy. We further sought to examine its simplicity, with respect to the ease of learning and applying the diagnostic criteria, even for the inexperienced routine pathologist.

Materials and methods

Case Selection

Sixty-two endometrial biopsy and curettage specimens, obtained for suspicion of endometrial hyperplasia between 2001 and 2008, were retrieved from the archives of the Pathology Department of Hacettepe University. We compiled a case mix of roughly equal numbers of atypical (n=25), complex non-atypical (n=21), and simple non-atypical (n=4) endometrial hyperplasias, supplemented with examples previously diagnosed as diagnostically difficult benign lesions suspected by the pathologist to represent some form of hyperplasia (n=8) or carcinoma (n=4). One representative slide from each of the 62 technically adequate endometrial biopsies was assessed by the reviewing study pathologists.

EIN Diagnostic Categories

Each case was classified into one of the following diagnostic entities: specimen technically inadequate for diagnosis, benign non-EIN, EIN, or adenocarcinoma. EIN was diagnosed according to published criteria that include presence of all five diagnostic elements as listed here:15

  1. Gland area greater than stromal area
  2. Cytological demarcation of the lesion relative to background
  3. Lesion size exceeds 1 mm in minimum diameter
  4. Exclusion of mimics
  5. Exclusion of cancer
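The requirement that all five elements be present amounts to a logical conjunction, which can be sketched as below. This is only an illustration of how the listed criteria combine, not a diagnostic tool: the `LesionFindings` record, its field names, and the encoding of criterion 1 as a gland fraction above 0.5 are assumptions introduced here and do not appear in the published criteria.

```python
from dataclasses import dataclass


@dataclass
class LesionFindings:
    """Hypothetical record of one pathologist's observations on one lesion."""
    gland_area_fraction: float              # gland area / (gland + stromal area)
    cytology_differs_from_background: bool  # criterion 2
    lesion_size_mm: float                   # criterion 3
    mimics_excluded: bool                   # criterion 4
    cancer_excluded: bool                   # criterion 5


def meets_ein_criteria(f: LesionFindings) -> bool:
    """Return True only when all five listed elements are satisfied."""
    return (
        f.gland_area_fraction > 0.5         # gland area greater than stromal area
        and f.cytology_differs_from_background
        and f.lesion_size_mm > 1.0          # lesion size exceeds 1 mm
        and f.mimics_excluded
        and f.cancer_excluded
    )
```

For example, a focus with a gland fraction of 0.6 and altered cytology that measures only 0.8 mm would fail criterion 3 and would therefore not be classified as EIN under this sketch.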

Reference EIN Diagnoses by Expert Panel (‘Expert Pathologists’)

All 62 cases were rediagnosed using published EIN criteria by two subspecialty trained gynecological pathologists involved in the initial development of EIN diagnostic criteria (expert 1 and expert 2). Each expert independently reviewed all cases blinded to history and prior diagnoses, and a consensus expert diagnosis (‘reference’) was attained by blinded agreement, or in cases of disagreement, by adjudication at a multiheaded microscope.

Review by Practicing Pathologists (‘Reviewing Pathologists’)

Following review of published training materials, 20 practicing pathologists rediagnosed all cases blinded to the reference EIN diagnosis and all clinical features. These 20 European pathologists differed in the duration of practice experience and practice context. Seventeen were gynecological pathologists working in different regions of Turkey, either in university or community hospitals, with varying gynecological material workloads. Three were general surgical pathologists without specialized training in gynecopathology. Before evaluating the cases, all participants were advised to consult two references: the ‘Benign Endometrial Hyperplasia and EIN’ chapter in Robboy's Pathology of the Female Reproductive Tract20 and the online instructional material at www.endometrium.org. After scoring the slides, each pathologist completed a questionnaire grading (on a 5-point scale) the learnability and usefulness of the EIN terminology in daily practice (Table 1).

Table 1 Answers of 20 reviewing pathologists to the questionnaire

Statistical Analysis

A measure of diagnostic accuracy for each reviewing pathologist was generated by comparison with reference diagnoses. Reference diagnoses were the adjudicated consensus diagnoses of experts 1 and 2. Accuracy was expressed in two metrics: the percentage of cases in which the reference and reviewing pathologist diagnoses were concordant, and the weighted kappa statistic (κw) between the reviewing pathologist and reference diagnoses. Interobserver agreement between the 20 reviewing pathologists, a measure of diagnostic reproducibility, was calculated using the kappa statistic (κ) for multiple raters when there are more than two diagnostic outcomes.21
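The two accuracy metrics can be illustrated with a minimal sketch. The text does not state which weighting scheme was used for κw, so linear disagreement weights over the ordered categories benign < EIN < cancer are assumed here; the function names and the category labels are likewise illustrative.

```python
from collections import Counter

# Ordered by ascending severity; the ordering drives the disagreement weights.
CATEGORIES = ["benign", "EIN", "cancer"]


def percent_agreement(reviewer, reference):
    """Percentage of cases with exactly concordant diagnoses."""
    matches = sum(a == b for a, b in zip(reviewer, reference))
    return 100.0 * matches / len(reference)


def weighted_kappa(reviewer, reference, categories=CATEGORIES):
    """Cohen's weighted kappa with linear weights (an assumed choice)."""
    k = len(categories)
    n = len(reference)
    index = {c: i for i, c in enumerate(categories)}

    def weight(i, j):
        # 0 for agreement, 1 for the most distant pair of categories
        return abs(i - j) / (k - 1)

    observed = Counter((index[a], index[b]) for a, b in zip(reviewer, reference))
    row = Counter(index[a] for a in reviewer)
    col = Counter(index[b] for b in reference)

    d_obs = sum(weight(i, j) * c for (i, j), c in observed.items()) / n
    d_exp = sum(
        weight(i, j) * row[i] * col[j] for i in range(k) for j in range(k)
    ) / (n * n)
    # Undefined (division by zero) if both raters use a single category only.
    return 1.0 - d_obs / d_exp
```

With linear weights, a benign/cancer disagreement is penalized twice as heavily as a benign/EIN disagreement, which reflects the ordered severity of the three diagnoses.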

To evaluate the effect of current institutional affiliation, type of practice (specialty gynecological pathologist vs general surgical pathologist), and the diagnostic system currently used (EIN vs WHO) on the % agreement between reviewer and reference, we used Mann–Whitney U and Kruskal–Wallis tests. We explored the effect of the continuous variable of years of pathology experience on the % agreement using Spearman correlation analysis. Kappa analyses and the statistical tests were performed in STATA version 10.0. Statistical significance was set at P<0.05.
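As an illustration of the Spearman analysis, the coefficient is the Pearson correlation of the rank-transformed variables, with tied values sharing their average rank. The sketch below uses only the standard library and does not compute a P value; the helper names are introduced here and are not taken from the STATA analysis.

```python
def average_ranks(values):
    """Rank values from 1..n, assigning tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2.0 + 1.0
        for i in order[pos:end + 1]:
            ranks[i] = mean_rank
        pos = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; undefined if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman rank correlation of two equal-length sequences."""
    return pearson(average_ranks(x), average_ranks(y))
```

In this setting `x` would hold each reviewer's years of experience and `y` the corresponding % agreement with the reference.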

Diagnostic trends were examined by hierarchical cluster analysis in a heat-map (color=diagnosis) matrix of reviewer (columns) by case (X and Y axis, respectively). On the basis of the hierarchical clustering of 20 reviewers, each was assigned to a diagnostic style group. Cluster analysis using a metric of percentage agreement and Ward's linkage was performed in Systat (v13, 2009, Systat Software, Chicago, IL, USA).
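The percentage-matching metric that feeds the clustering can be sketched as a pairwise distance between reviewers, taken here as the fraction of cases on which two reviewers disagree. Expressing the metric as a mismatch fraction is an assumption about how Systat defines it, and the Ward's linkage step itself is not reproduced; the function names and the dictionary layout are illustrative.

```python
from itertools import combinations


def percent_mismatch(a, b):
    """Fraction of cases on which two reviewers gave different diagnoses."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def distance_matrix(diagnoses):
    """Build a symmetric reviewer-by-reviewer distance lookup.

    diagnoses: dict mapping reviewer name -> list of per-case labels,
    with every list in the same case order.
    """
    names = sorted(diagnoses)
    dist = {(r, r): 0.0 for r in names}
    for r, s in combinations(names, 2):
        d = percent_mismatch(diagnoses[r], diagnoses[s])
        dist[(r, s)] = dist[(s, r)] = d
    return names, dist
```

For 20 reviewers this yields 190 distinct reviewer pairs, which a hierarchical clustering routine would then merge step by step into the dendrogram.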

Results

Specimen Adequacy

All 62 specimens were considered technically adequate by the expert reviewers, who assigned diagnoses to each. Among the 20 reviewers reviewing the 62 cases, there were 1240 ‘diagnostic passes’. Of these, 99% (1229/1240) were considered diagnosable by the reviewers, who rendered a specific diagnosis.

Expert Review and Reference Consensus Diagnoses

The two expert reviewers agreed on the diagnostic classification as EIN vs other (benign non-EIN, or adenocarcinoma), and the consensus expert reference diagnoses included 27 benign non-EIN, 26 EIN, and 9 adenocarcinoma cases.

Agreement Between Expert Reference and Each of 20 Observers

The percent agreement with the reference diagnoses for each of the 20 individual pathologists is shown in Table 2. Overall, 79% of all reviewer diagnoses were exactly concordant with the reference expert diagnoses. As expected, unanimous agreement among all 20 reviewers was uncommon, occurring in 18 cases (17 benign, 1 adenocarcinoma). The weighted κ values of each observer with the final consensus diagnosis, shown in Table 2, averaged 0.72 and varied between 0.45 and 0.84.

Table 2 Agreement between each of the 20 reviewing pathologists and consensus expert reference diagnoses (Percent concordant, weighted kappa statistics, κw)

Interobserver Reproducibility Among 20 Independent Observers

The extent of interobserver diagnostic concordance between all pairwise permutations of each of the 20 reviewers was calculated as a 20 × 20 Spearman correlation matrix of 62 cases across all reviewers. Of all 190 possible pairwise interobserver comparisons of reviewer classification, the average Spearman correlation coefficient was 0.72, range 0.37–0.93. The overall κ (Interobserver reproducibility) among 20 independent observers was calculated as 0.58 for all diagnostic groups (benign vs EIN vs cancer), 0.64 for benign (vs EIN+CA), 0.47 for EIN (vs benign+cancer), and 0.64 for carcinoma (vs benign + EIN).
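A multi-rater κ of this kind can be sketched as follows, assuming the Fleiss formulation (the text cites a multiple-rater kappa without naming it) and assuming every case was rated by the same number of reviewers, which ignores the small number of passes without a specific diagnosis. The function name and input layout are illustrative.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for multiple raters and categorical outcomes.

    counts[i][j] = number of raters who assigned case i to category j.
    Every row must sum to the same number of raters.
    """
    n_cases = len(counts)
    n_raters = sum(counts[0])
    n_categories = len(counts[0])
    total = n_cases * n_raters

    # Overall proportion of assignments falling in each category.
    p_j = [sum(row[j] for row in counts) / total for j in range(n_categories)]

    # Per-case agreement: share of rater pairs that agree on that case.
    p_i = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]

    p_bar = sum(p_i) / n_cases        # mean observed agreement
    p_e = sum(p * p for p in p_j)     # agreement expected by chance
    return (p_bar - p_e) / (1.0 - p_e)
```

The category-specific values (for example EIN vs benign+cancer) would follow from the same function after collapsing the count columns into two groups.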

Characteristics of 20 Independent Reviewers

Table 2 gives descriptive statistics of the 20 reviewing pathologists. The 20 reviewers were trained at 13 different institutions and were practicing at 11 different hospitals at the time of the study. In response to the questionnaire (Table 1), only one observer read none of the recommended material; the other observers read at least one (two observers) or both of the references before evaluating the cases. On a 5-grade scale (1 defines ‘disagree’, while 5 defines ‘absolutely agree’), observers rated the EIN terminology as learnable (mean 4.16) and moderately easy to use in daily practice (mean 3.76).

The extent of concordance between reviewer and expert reference diagnoses (Table 2) was not significantly different by reviewer years of experience (r=0.149, P=0.529), practice type (P=0.926), training institution (P=0.082), practice institution at the time of study (P=0.255), or classification system used in practice (P=0.437).

Diagnostic Style Groups

Figure 1 shows patterns of pathologist diagnosis based on percentage of matching interobserver diagnoses. Discordant cases (Figure 1, rows marked by arrows) that polarized the reviewing pathologists (columns) into different diagnostic clusters (diagnostic ‘style groups’) were identified.

Figure 1

Diagnostic style groups of 20 independent reviewers. Hierarchical clustering of diagnoses rendered (tile color blue=benign, green=EIN, gold=cancer) in 62 cases (rows) by each of 20 reviewing pathologists (columns). Reviewer assignment to one of three major diagnostic style groups is shown at the top by a colored branching tree: green (n=4), yellow (n=11), and red (n=5). Expert reference diagnoses cosegregate with the red cluster (not shown). Marked rows (arrows) are example case diagnostic discordances that distinguish the green (Cases 11, 26, 14, 38, 16) and red (Cases 51, 6, 25, 44, 61) style groups from the majority yellow group. Hierarchical dendrogram calculated using Ward's linkage and a percentage-matching distance metric.

Figure 1 is a detailed hierarchical dendrogram organizing each pathologist into one of three distinct diagnostic style groups designated here by the color of the branch termini (green, yellow, red). When the expert reference diagnoses are reintroduced in the same analysis by comparison, the reference diagnoses cosegregate with the red style. Diagnostic style group membership was not significantly associated with reviewer years of experience (P=0.435), practice type (exact P=0.228), training institution (exact P=0.236), practice institution at the time of study (exact P=0.204), or classification system used in practice (exact P=0.376).

Unblinded secondary review was undertaken by one expert pathologist (expert 1) and the study head (reviewer E) for subsets of cases (Figure 1, arrows), which most distinguished the minority red and green diagnostic style groups (Figure 1). This included five cases preferentially diagnosed as EIN by the red group but benign by the others (Figure 1, arrows on the right), and five cases preferentially diagnosed as benign non-EIN by the green group, but as EIN by the others (Figure 1, arrows on the left). Unblinded observations from the secondary review of these cases are shown in Table 3.

Table 3 Unblinded review of selected cases that distinguish diagnostic styles

Discussion

This study shows good interobserver diagnostic reproducibility by a mixed group of pathologists applying published criteria for diagnosis of endometrial biopsies (cases with a previous diagnosis of hyperplasia) according to the EIN schema (benign, EIN, cancer). Benign, premalignant (EIN), and malignant (cancer) diagnoses represent an ascending severity of disease, each initiating different clinical interventions, which underscores the need to discriminate all three entities. A full-spectrum case mix permitted assessment of diagnostic reproducibility across two different decision thresholds: benign vs neoplastic (EIN or cancer), and non-malignant (benign or EIN) vs malignant (cancer). Interobserver diagnostic reproducibility for both thresholds was very good, each with a κ of 0.64 across all 20 observers. This shows a balanced reproducibility between EIN and the range of lesser and worse entities with which it is likely to be confused.

Previous studies testing the reproducibility of the WHO hyperplasia classification have revealed poor to good interobserver agreement for the high-risk group (Table 4).1, 2, 5, 22, 23 A peculiar feature is that, although the 1994 WHO hyperplasia classification contains four discrete hyperplasia entities (simple/complex, combined with atypical/non-atypical), few of these studies record and analyze four categories exactly as defined by the WHO. Atypical simple hyperplasia has such a low interobserver reproducibility (κ 0.06–0.08)5 and low frequency of occurrence that some groups doubt whether it truly exists5 (Skov), whereas others beg the question by combining it with complex atypical hyperplasias during analysis. The result is that few of the studies use the 4-class WHO scheme as published and most published results represent contractions thereof. In its most extreme form, aggregation of complex and simple atypical hyperplasias with well differentiated carcinoma under the rubric of ‘endometrial neoplasia’ has been proposed.1 In general, these aggregation strategies do improve interobserver reproducibility, there being fewer choices to consider. Thus the low overall interobserver reproducibility for the full 4-class WHO scheme (κ 0.2–0.25),5 improves substantially upon combining simple and complex atypical hyperplasias into a single group (κ 0.40–0.69).1, 2, 22 In these cases, what is actually being reported upon is something different than the published WHO 4-class system. A further factor is that the overall reproducibility likelihood is affected by the range of non-hyperplastic tissues included in the series, especially whether carcinomas and normal tissues are included or excluded during case selection.

Table 4 Summary of the previous studies analyzing the reproducibility of the endometrial hyperplasia diagnosis

The EIN lesion is a singular entity representing a high-risk premalignant lesion, which from its inception has been documented by a published body of molecular, histopathological, and clinical outcome evidence. Similar to the aggregated WHO hyperplasia categories, it has the reproducibility advantage of fewer entities, but differs in achieving this by design and precise primary definition rather than secondary bundling of categories during analysis. Most pathologists now recognize that the four histopathological WHO hyperplasias do not represent four different clinico-biological groups.1, 2, 7, 22 The evidence base for EIN includes new clinical outcome predictive diagnostic features such as lesion size, quantitative gland density, and methods of interpreting cytology that provide pathologists with improved diagnostic criteria relative to those within the 1994-vintage WHO scheme. Implementation of a tripartite (benign-EIN-cancer) scheme is most responsibly accomplished by using expanded and explicit criteria such as those published for EIN.

We carried forward the diagnostic patterns of each pathologist through complete data analysis, allowing us to discern how individual cases divided pathologists into diagnostic clusters or ‘styles’. This is shown intuitively as a graphic matrix of cases by pathologists organized into clusters based on diagnostic proximity (Figure 1). Pathologists fell into three different style groups, which can simply be described as follows: (1) a majority group of 11 pathologists who used a balanced spectrum of benign, EIN, and carcinoma diagnoses (Figure 1, Yellow); (2) four pathologists who favored benign diagnoses and infrequently diagnosed EIN or carcinoma (Figure 1, green); and (3) five pathologists who favored EIN diagnosis more than their peers (Figure 1, red). Pathologists are well aware of this phenomenon of diagnostic style, using terms such as ‘splitters vs lumpers,’ or ‘benign vs malignant’ to classify the diagnostic personality of colleagues. These data demonstrate its expression in practice and suggest that diagnostic behavior of the individual pathologist is subject to limited modification by simple elaboration of diagnostic criteria or retraining. There are many possible personal experiences that might contribute to individual diagnostic style, but in this limited study we were unable to show a correlation with place of training, practice environment, years of practice, or practice type.

Detailed examination of those individual cases capable of polarizing pathologists into style groups provides clues to the basis of predictable, nonrandom diagnostic disagreement. The red style group had a tendency to diagnose reference EIN lesions not appreciated by the other groups, including those with confounding factors of altered differentiation (case 51), small size of localized EIN focus (case 6), or background stromal breakdown (case 25). Correspondingly, the red style group overdiagnosed as EIN some reference benign cases that were technically inferior preparations (cases 44, 61) or had confusing global estrogen-induced changes (case 61). The green style group tended to diagnose EIN reference cases as benign when tubal differentiation was geographically present in the EIN (cases 11, 14, 26, 38), when the EIN was focally distributed within only one of many fragments (case 26), or when it was present within a polyp (case 16).

The cases with discordant results were usually due to reasons previously described in the literature.8, 16 EIN lesions may arise completely or partially within endometrial polyps and the diagnosis of EIN within a polyp can be problematic24 because polyps themselves have variable gland density and somewhat altered gland cytology.8, 16 It has been stressed before that when making a diagnosis of EIN within a polyp the suspicious focus should be compared with the polyp background and not that of the uninvolved endometrium, so as not to mistake the somewhat altered gland cytology that can be seen within polyps for EIN.25 The minimum size for a lesion to be considered as EIN was determined to be 1 mm in maximum dimension within a single tissue fragment.18, 26, 27 Small foci can be overlooked, also accounting for discordance in our study. Technical artifacts, due to poor fixation and processing, in some cases made it difficult to evaluate both large scale (architecture) and small scale (cytology) features of EIN lesions. This is a problem not specific to the endometrium. Lastly, the altered cytology that characterizes EIN must be interpreted within the context of both cytoplasmic and nuclear findings. Several cases with prominent cytoplasmic changes were construed as benign metaplasias rather than EIN. The clue to an EIN diagnosis in these examples is the clustered and discrete nature of the altered gland focus, in contrast to randomly distributed hormonally induced metaplasias, or reactive cytological change in association with stromal breakdown.25

An important finding of our study is that EIN morphological criteria can be easily learnt and that learning ability is not affected by years of experience. One of the reviewers in the study, who was not a gynecological pathologist and was unfamiliar with the terminology and diagnostic criteria, had good (κ 0.72) diagnostic agreement with the reference diagnoses. Furthermore, survey results showed that the EIN morphological criteria are easy to learn (graded as 4.16, with 5.0 being ‘agree completely’). These results underscore that self-learning of the morphological criteria of EIN is possible and easy. However, reviewing pathologists thought that applying this knowledge in practice is not as easy as learning the criteria. Observers graded ease of application of the diagnostic criteria to daily practice as 3.76 on a 5-grade scale. The pathologists who were using the EIN terminology and diagnostic criteria in their daily practice had similar results and good agreement with the consensus diagnosis (0.69–0.78). We must remember that all of the WHO studies mentioned above use criteria with which every pathologist is familiar. We may assume that the diagnostic reproducibility of the pathologists will increase as EIN criteria are used in daily practice, as the results of our study suggest.

In summary the reproducibility of the EIN diagnoses was high in our study, especially considering the fact that the group consisted of a variety of pathologists. Our group included a mix of speciality gynecological and general pathologists, and we were still able to reach κ values for interobserver agreement of moderate to substantial degree. The high interobserver reproducibility under these circumstances underlines the fact that this terminology is relatively simple, can easily be learned and applied by the inexperienced pathologist and hence has a potential for widespread appeal.