Main

While determination of histological type for ovarian carcinoma can be excellent,1 reproducibility of histological type for endometrial carcinoma has been reported to be quite variable, depending on whether all endometrial carcinomas or just the subset of high-grade cases are examined.2, 3 Despite comprehensive literature on the morphological features of endometrial carcinoma types,4, 5 problems remain in distinguishing serous carcinomas from FIGO grade 3 endometrioid carcinomas, as well as in diagnosing clear-cell carcinoma.6, 7, 8, 9 Histological-type assessment in high-grade endometrial carcinomas was singled out as one of the major challenges in gynecological pathology at a recent Society of Gynecologic Oncologists of Canada meeting. It will gain importance as different treatment strategies are considered for different types of endometrial carcinoma.

Recent progress in molecular pathology has provided a rough framework of oncogenic alterations in endometrial carcinomas, showing type-specific molecular alterations in high-grade endometrioid versus serous carcinomas.10, 11, 12, 13, 14, 15

The aim of this study was to assess specifically the interobserver agreement on histological type in high-grade endometrial carcinomas among gynecological pathologists from five academic centers across Canada. A second aim was to correlate morphological diagnosis with a set of six routine immunohistochemical markers (TP53, CDKN2A (p16), ER, PGR, Ki67, and VIM) as well as a set of six experimental immunohistochemical markers (PTEN, ARID1A, CTNNB1, IGF2BP3, HNF1B, and TFF3). PTEN, ARID1A, CTNNB1 (β-catenin), and HNF1B have all been studied recently for their association with histological types of endometrial carcinoma. IGF2BP3, insulin-like growth factor II mRNA-binding protein 3 (also known as IMP3), is an oncofetal protein that is highly expressed in endometrial serous carcinomas. TFF3, or trefoil factor 3, has been reported to be specifically expressed in endometrioid carcinomas. We chose a marker panel that is in routine diagnostic use in many laboratories and appended some experimental markers from the recent literature that might aid in the differential diagnosis of endometrioid, serous, or clear-cell types.6, 7, 8, 11, 16, 17, 18, 19, 20, 21, 22, 23

Materials and methods

Case Selection and Participants

One hundred and sixteen hysterectomy cases with high-grade endometrial carcinomas, collected between 2005 and 2011 with slides and blocks archived in Calgary Laboratory Services, were identified from the Tom Baker Cancer Centre gynecological oncology tumor board review consultation file of one author (MAD). Histological slides were reviewed and a pathologist (GH) selected one to two representative slides from each case. The interobserver agreement study involved five pathologists with specialist training in gynecological pathology from five Canadian academic institutions (PC, MK, CE, JA, and MC).

Inter- and Intraobserver Agreement

All participants reviewed the glass slide set independently, blinded to clinical information and immunohistochemical results. The participants were asked to submit type and grade according to their practice without prior training or agreement on criteria (submitted diagnosis). These submitted diagnoses were then grouped into six categories (categorized diagnosis) whose distinction may lead to a significant difference in clinical management or prognosis: endometrioid adenocarcinoma FIGO grade 3, endometrioid carcinoma low-grade (comprises FIGO grade 1 or 2 endometrioid carcinomas), categorized serous carcinoma (comprises pure serous carcinoma and mixed endometrioid/serous carcinoma), categorized clear-cell carcinoma (comprises pure clear-cell carcinoma and mixed endometrioid/clear-cell carcinoma), high-grade endometrial carcinoma, not otherwise specified (comprises dedifferentiated and undifferentiated carcinomas), and non-endometrial carcinoma (comprises carcinosarcoma and tumors suspicious for metastatic disease).
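The grouping of submitted diagnoses into the six categories can be written as a simple lookup. The following is a minimal sketch only: the label strings and the names `CATEGORY_OF` and `categorize` are hypothetical and were not part of the study workflow; only the grouping itself follows the description above.

```python
# Hypothetical label strings; the grouping mirrors the six categories in the text.
CATEGORY_OF = {
    "endometrioid, FIGO grade 3": "endometrioid FIGO grade 3",
    "endometrioid, FIGO grade 1": "endometrioid low-grade",
    "endometrioid, FIGO grade 2": "endometrioid low-grade",
    "serous": "categorized serous",
    "mixed endometrioid/serous": "categorized serous",
    "clear-cell": "categorized clear-cell",
    "mixed endometrioid/clear-cell": "categorized clear-cell",
    "dedifferentiated": "high-grade NOS",
    "undifferentiated": "high-grade NOS",
    "carcinosarcoma": "non-endometrial carcinoma",
    "suspicious for metastasis": "non-endometrial carcinoma",
}


def categorize(submitted: str) -> str:
    """Map one submitted diagnosis to its category (KeyError for unknown labels)."""
    return CATEGORY_OF[submitted]
```

For example, `categorize("mixed endometrioid/serous")` returns `"categorized serous"`, reflecting that mixed endometrioid/serous tumors were grouped with pure serous carcinoma.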

The histological slides were scanned using an Aperio ScanScope CS (Aperio, Vista CA, USA) and are accessible under http://diagnostics.vetmed.ucalgary.ca/ (username: hgecgroup; password: rememdium). The case IDs were rearranged in a random manner, and three of the participants (MK, MC, and JA) also reviewed the scanned slides online on a computer screen, with a 5-month interval between the two reviews, to assess intraobserver reproducibility.

A consensus diagnosis was defined by ≥80% agreement, that is, four out of five observers agreed. Cases without consensus were classified into major disagreement with a discrepancy between low-grade endometrioid versus high-grade endometrial and minor disagreement with a discrepancy of type within the high-grade endometrial group.
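The consensus rule can be illustrated with a short sketch. It assumes five categorized calls per case and, as a simplification not stated in the text, treats every category other than low-grade endometrioid as belonging to the high-grade group; the function name and label string are hypothetical.

```python
from collections import Counter

LOW_GRADE = "endometrioid low-grade"  # hypothetical category label


def classify_case(calls, min_agree=4):
    """Classify one case from its observers' categorized calls.

    Returns ("consensus", diagnosis) when at least `min_agree` observers agree
    (4 of 5 corresponds to the >=80% rule), otherwise ("major", None) or
    ("minor", None).
    """
    top, count = Counter(calls).most_common(1)[0]
    if count >= min_agree:
        return ("consensus", top)
    # No consensus: a low-grade call among the others marks a major disagreement
    # (assumption: all remaining categories count as high-grade).
    if LOW_GRADE in calls:
        return ("major", None)
    return ("minor", None)
```

Under these assumptions, four serous calls plus one FIGO grade 3 call yield a serous consensus, whereas a 3:2 split between serous and FIGO grade 3 endometrioid counts as a minor disagreement.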

Tissue Microarray Construction and Immunohistochemistry

Tissue microarrays of the 116 cases were reconstructed in 0.6-mm duplicate cores (Pathology Devices, Westminster, MD, USA). Immunohistochemistry was performed using standard semiautomated platforms. Antibodies and description of scoring cutoffs are depicted in Table 1.24 Scoring was independently performed by two reviewers (GH and MK). In discrepant cases, consensus was achieved at a multiheaded microscope.

Table 1 Antibodies and scoring methods

Statistics

Inter- and intraobserver agreements were calculated using the κ-statistic. κ-Values were calculated based on both the submitted diagnosis and the categorized diagnosis. A κ-value of <0.4 indicates poor agreement, 0.4–0.6 moderate agreement, 0.6–0.8 substantial/good agreement, and 0.8–1.0 near-perfect/excellent agreement.25
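For readers unfamiliar with the κ-statistic, the sketch below shows how a pairwise value could be computed and mapped to the agreement bands above. It assumes the unweighted Cohen's κ for two raters, which the text does not specify; the function names are illustrative, and values falling exactly on a band boundary are assigned to the upper band as an arbitrary choice.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two equal-length lists of categorical calls.

    Undefined (division by zero) if both raters use one identical category throughout.
    """
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement from the two raters' marginal frequencies
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)


def interpret(kappa):
    """Agreement bands as quoted in the text (boundaries go to the upper band)."""
    if kappa < 0.4:
        return "poor"
    if kappa < 0.6:
        return "moderate"
    if kappa < 0.8:
        return "substantial/good"
    return "near-perfect/excellent"
```

With two raters who agree on three of four cases and have marginal frequencies giving a chance agreement of 0.5, κ is (0.75 − 0.5)/(1 − 0.5) = 0.5, which falls in the moderate band.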

To search for the most characteristic immunoprofile of endometrioid carcinoma FIGO grade 3 and serous carcinoma, consensus diagnoses of endometrioid carcinoma FIGO grade 3 and serous carcinoma were used as end points for nominal logistic regression modeling. Modeling started by fitting the full 12-marker panel, followed by a manual, iterative backward elimination process.23 At each step, the marker with the highest P-value in the effect likelihood ratio test was excluded. From the model-predicted diagnosis of endometrioid carcinoma FIGO grade 3 or serous carcinoma, a receiver operator characteristic area under the curve was calculated and a confusion matrix of model-predicted versus morphological consensus type was generated. From the confusion matrix, sensitivity, specificity, positive predictive value, and negative predictive value were calculated, including 95% confidence intervals. Recursive partitioning modeling was used to search for a hierarchical order of immunohistochemical markers that could distinguish between histological types. The statistical analyses were performed using JMP v. 9.0.1 (SAS Institute, Cary, NC, USA).
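The test-performance measures derived from the confusion matrix can be sketched as follows. This is an illustration, not the JMP procedure used in the study: the Wilson score interval is an assumed choice of confidence-interval method, so it will not necessarily reproduce the reported intervals, and the function names are hypothetical.

```python
import math


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a proportion (assumed CI method, z=1.96 for 95%)."""
    p = successes / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return (centre - half, centre + half)


def diagnostic_metrics(tp, fp, fn, tn):
    """Point estimate and 95% interval per measure; 'positive' = serous carcinoma.

    Each denominator must be non-zero.
    """
    pairs = {
        "sensitivity": (tp, tp + fn),
        "specificity": (tn, tn + fp),
        "ppv": (tp, tp + fp),
        "npv": (tn, tn + fn),
    }
    return {name: (k / n, wilson_interval(k, n)) for name, (k, n) in pairs.items()}
```

Calling `diagnostic_metrics(tp, fp, fn, tn)` with the four cells of the model-predicted versus consensus table returns, for each measure, the point estimate and its interval. Even when a point estimate is 100%, the lower bound stays below 100% for small groups, which is why the intervals matter in a cohort of this size.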

Results

For the 116 cases, the five reviewers submitted up to 11 diagnostic categories according to their daily practice. The paired interobserver agreement for the submitted diagnosis ranged from κ 0.37 to 0.56 (median 0.47) and for the categorized diagnosis from κ 0.50 to 0.63 (median 0.575; Table 2). Intraobserver reproducibility for the three reviewers who repeated the review after a 5-month interval ranged from κ 0.43 to 0.65 (median 0.51) for the submitted diagnosis and from κ 0.49 to 0.67 (median 0.61) for the categorized diagnosis. The three most common submitted diagnoses were endometrioid carcinoma FIGO grade 3, serous carcinoma, and clear-cell carcinoma (Figure 1). A consensus diagnosis was established in 74 out of 116 cases (64%), with the submitted diagnoses and frequencies shown in Table 3. Clear-cell, dedifferentiated, and undifferentiated carcinomas were called with a relatively narrow range close to the consensus frequency. In contrast, serous, FIGO grade 2 endometrioid adenocarcinoma, and mixed endometrioid/serous carcinomas showed a wide range of calls, much higher than their consensus frequency. For example, mixed endometrioid/serous carcinoma was diagnosed on average nine times, but consensus was not reached in a single case. Submitted diagnoses were categorized into six categories (see Materials and methods and Table 3). For the categorized diagnosis, consensus was reached in 84 out of 116 cases (72%) (Table 3). Of the 32 (28%) cases with no consensus, 6 cases (5%) had major disagreement (ie, high-grade versus low-grade), and 26 cases (23%) had minor disagreement (ie, tumor type within the high-grade group).

Table 2 Kappa-values for inter- and intraobserver reproducibility
Figure 1

Examples of endometrial carcinomas with consensus diagnosis. (a and b) Serous carcinoma showing papillary budding, slit-like spaces and diffuse severe nuclear atypia. (c and d) Endometrioid carcinoma, FIGO grade 3 showing >50% of area with solid architecture, and moderate nuclear atypia. (e and f) Clear-cell carcinoma displaying tubulocystic and papillary architecture with stromal hyalinization and low mitotic activity.

Table 3 Submitted and categorized diagnosis from a total of 116 cases

For the resulting categorized consensus diagnosis, the mean age for endometrioid carcinoma FIGO grade 3 was 61 years (range 38–88 years), for categorized serous carcinoma 67 years (range 46–83 years), and for categorized clear-cell carcinoma 71 years (range 56–86 years), compared with a mean age of 65 years (range 38–88 years) for the entire cohort. Of the cases with a consensus diagnosis of endometrioid carcinoma FIGO grade 3, 46% were in FIGO stage III, versus 69% of categorized serous carcinoma and 20% of categorized clear-cell carcinoma (as compared to 47% for the entire cohort).

In a next step, we assessed the expression of 12 immunohistochemical markers on a tissue microarray containing these 116 cases. The frequency of marker expression across submitted consensus diagnosis is shown in Table 4. The total number of cases per group varies slightly because of dropout of cores from tissue microarray sections. We then assessed the diagnostic value of marker combinations for the differential diagnosis of endometrioid carcinoma FIGO grade 3 versus pure serous carcinoma, that is, submitted diagnosis. We restricted the analysis to these two types because their distinction is a common diagnostic problem in practice and they were the two largest groups in our study. For n=40 endometrioid carcinoma FIGO grade 3 and n=15 serous carcinoma, data from the full marker set were available for analysis. A nominal logistic regression model was fitted with all 12 markers, with consensus endometrioid carcinoma FIGO grade 3 versus serous carcinoma as the model end point. After eliminating five markers, a seven-marker combination consisting of ER, PGR, CDKN2A (p16), TP53, VIM, PTEN, and IGF2BP3 still showed an area under the curve by receiver operator characteristic of 1.00, that is, 100% concordance in predicting endometrioid carcinoma FIGO grade 3 versus serous carcinoma compared with the morphological gold standard (Table 5). The performance of alternative smaller two- to four-marker combinations is shown in Table 5.

Table 4 Marker positivity across submitted consensus diagnosis
Table 5 Diagnostic test performance of marker combinations for serous carcinoma (n=15) versus endometrioid carcinoma FIGO grade 3 (n=40) morphological consensus diagnosis

We next used the most accurate seven-marker combination to predict the type of cases with major and minor disagreement. Of the 26 cases with minor disagreement, the diagnostic difficulty in 16 cases (14%, 16/116) was endometrioid carcinoma FIGO grade 3 versus serous carcinoma. Using the seven-marker immunohistochemical classifier, 6 of these 16 cases (38%) were predicted as serous carcinoma, while the remaining 62% were predicted as endometrioid carcinoma FIGO grade 3 with >90% probability. In five of the six cases with major disagreement, the model predicted a diagnosis of serous carcinoma in one and endometrioid carcinoma in the remaining cases (Table 6 and Figure 2).

Table 6 Description of six cases with major disagreement
Figure 2

Histology of the six cases with major disagreement (for details see Table 6).

In an alternative approach, we fitted the five most widely used immunohistochemical markers (TP53, CDKN2A (p16), ER, PGR, and VIM) in a recursive partitioning model to distinguish between endometrioid carcinoma FIGO grade 3 and serous carcinoma. A hierarchical decision tree was generated based on three of the markers: TP53, CDKN2A (p16), and ER (Figure 3). The majority (97%) of the cases with TP53 wild-type expression were endometrioid carcinoma FIGO grade 3. Eleven of the 43 endometrioid carcinoma FIGO grade 3 (26%) and 15 of the 16 serous carcinoma (94%) were found to have aberrant TP53 expression (Figures 3 and 4). Cases with aberrant TP53 expression and low ER expression were almost always serous carcinoma, while cases with aberrant TP53 expression but high ER and patchy (non-diffuse) CDKN2A (p16) expression were all endometrioid carcinoma FIGO grade 3. A triple positive immunoprofile was seen in endometrioid carcinoma FIGO grade 3 and serous carcinoma at similar frequencies. A supplementary figure (Figure S1) shows the results from recursive partitioning using the seven-marker panel from the best nominal logistic regression model (TP53, PTEN, VIM, IGF2BP3, PGR, CDKN2A, ER) as input.

Figure 3

Recursive partitioning modeling of 43 consensus FIGO grade 3 (The International Federation of Gynecology and Obstetrics grade 3) endometrioid carcinoma (EC3) and 16 consensus serous carcinoma (SC). Three levels of hierarchical splits are shown based on the expression of TP53 aberrant versus wild-type, estrogen receptor (ER) low vs high and CDKN2A (p16) patchy vs diffuse.
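The three-level split can be expressed as a small rule function. This is a sketch of the tree's logic only, returning the predominant type in each leaf as described in the Results; the function name, the boolean encoding of the markers, and the "indeterminate" label are assumptions, and the scoring cutoffs that define each marker state (Table 1) are not reproduced here.

```python
def predict_type(tp53_aberrant: bool, er_high: bool, p16_diffuse: bool) -> str:
    """Return the predominant consensus type for each leaf of the three-marker tree."""
    if not tp53_aberrant:
        return "EC3"  # TP53 wild-type: 97% of such cases were EC3
    if not er_high:
        return "SC"  # aberrant TP53 with low ER: almost always serous
    if not p16_diffuse:
        return "EC3"  # aberrant TP53, high ER, patchy p16: all EC3
    return "indeterminate"  # triple positive: seen in both types
```

The leaves are not pure, so the returned label is the predominant type in that branch rather than a definitive diagnosis, and a triple positive profile is deliberately left unresolved.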

Figure 4

Immunohistochemistry of typical consensus FIGO grade 3 (The International Federation of Gynecology and Obstetrics grade 3) endometrioid carcinoma and serous carcinoma. (a) Serous carcinoma, diffuse strong TP53 expression in almost all cells (aberrant TP53 expression). (b) Endometrioid carcinoma FIGO grade 3, patchy TP53 expression. Note that the majority of tumor cells show weak to moderate expression. (c) Serous carcinoma, estrogen receptor (ER) staining in <50% of tumor cells. (d) Endometrioid carcinoma FIGO grade 3, diffuse ER expression. (e) Serous carcinoma, diffuse CDKN2A (p16) expression; note the negative normal endometrium in the adjacent area. (f) Endometrioid carcinoma FIGO grade 3, patchy CDKN2A (p16) expression.

Discussion

Establishing accurate histological cell type diagnosis using a uniform approach is critical for inclusion into histological type-specific clinical trials. One example of such an approach is provided by a recently launched clinical trial, which includes endometrial clear-cell carcinomas together with other rare tumor types (NCIC: IND.206). Our study shows that interobserver reproducibility for high-grade endometrial carcinoma is moderate (κ 0.575), slightly lower than the κ of 0.67 reported by Nedergaard et al2 seventeen years ago. The difference between the study by Nedergaard et al2 and ours is that we enriched for challenging cases within the high-grade category, as compared to using disease prevalence-based selection criteria, which predominantly yield cohorts composed of ≥85% of endometrioid type. In 28% of our cases, no consensus could be reached even after categorization according to current clinical management. Among those, 5% had a major disagreement, which we defined as low-grade versus high-grade endometrial carcinoma. Within the minor disagreements, the most common problem was the distinction between serous carcinoma and endometrioid carcinoma FIGO grade 3, which occurred in 16 cases (14%). Similar problems have been identified in several previous articles.7, 8 Other studies have looked at interobserver agreement of endometrial carcinomas preselected for the presence of clear cells and reported moderate interobserver agreement.9, 26 In our study, clear-cell carcinoma was diagnosed in a more robust manner, possibly because we did not preselect for difficult cases with clear-cell changes. Interestingly, diagnosis of dedifferentiated or undifferentiated carcinoma was also made with good agreement.27, 28

The question remains as to why the distinction of endometrioid and serous carcinomas is so difficult in the endometrium, while it seems to be resolved in the ovary. A possible explanation may be that endometrioid carcinomas account for only 10% of ovarian carcinomas, of which fewer than 20% are high-grade, making high-grade endometrioid versus high-grade serous carcinoma a less frequently encountered problem in the ovary.29 The majority of the high-grade ovarian carcinomas with glandular architecture are currently considered to be the same as high-grade serous carcinomas.30, 31 High-grade serous and endometrioid carcinomas of the ovary have different cells of origin, with fallopian tube-type tissue and endometriosis being the respective putative tissue of origin in the majority of cases. Hence, to prove serous cell lineage, a specific biomarker such as WT1 can assist in difficult cases. In contrast, endometrial carcinomas of both endometrioid and serous type are presumably derived from the same cell of origin: the endometrium.32 Therefore, unlike in the ovary, discrimination relies purely on differential oncogenic events. The shared cell lineage, and potentially overlapping oncogenic pathways, may be reflected by endometrial carcinomas displaying ambiguous morphology that lies between the classical serous and endometrioid types. This notion is supported by the fact that the intraobserver agreement is in a similar range as the interobserver agreement; this suggests that it is less likely that differences in training or use of diagnostic criteria between institutions led to disagreements.

Several biomarkers have been implicated in carcinogenesis of endometrioid and serous carcinoma of the endometrium, but no single marker shows sufficient sensitivity or specificity to serve as a stand-alone diagnostic tool. We assessed 12 biomarkers for which immunohistochemical assays were available on the study cohort. The staining frequency of individual markers in our study is consistent with previous studies.6, 7, 8, 11, 16, 17, 18, 19, 20, 21, 22 By analyzing marker combinations using nominal logistic regression models and restricting the analysis to cases with an unambiguous morphological diagnosis (submitted consensus diagnosis), we identified a seven-marker panel (ER, CDKN2A (p16), TP53, VIM, PTEN, PGR, and IGF2BP3) that could differentiate endometrioid carcinoma FIGO grade 3 from serous carcinoma with 100% concordance compared with the morphological gold standard. Interestingly, a similar approach to the subclassification of high-grade endometrial carcinomas using mutational profiles did not generate a reliable multivariate logistic regression model.13 This is possibly due to the lower discrimination of mutational data between types: for example, PI3K mutations are seen in both endometrioid carcinoma FIGO grade 3 and serous carcinoma. It has also been shown that certain immunohistochemical markers such as PTEN ‘outperform gene sequencing’.22 Nevertheless, it is prudent to study how mutational status and immunohistochemical profiles overlap with each other and with morphology. Potentially, the best markers that arise from both techniques could be used as an ancillary technique to morphology to subclassify high-grade endometrial carcinomas in the future. Of note, this model is based on only 40 consensus endometrioid carcinoma FIGO grade 3 and 15 consensus serous carcinoma for which the full data set was available, and there is the potential for overfitting of the model. Rigorous external validation using larger sample sizes in a consortium-type approach is needed. In this exploratory analysis, the 95% confidence intervals indicate that the seven-marker panel has a negative predictive value of at least 89% to rule out serous carcinoma 19 times out of 20. This is important clinical information since treatment of endometrioid carcinoma may require a less aggressive surgical approach and a different adjuvant combination therapy when compared with serous carcinoma. A seven-marker panel may not be practical. An issue for consideration is the interpretation of complex biomarker panels, that is, more than three markers, with numerous possible combinations and potentially contradictory results. The generation of probabilities using nominal logistic regression models for marker combinations that exceed three markers may be necessary in future validation studies.

Our study and other recent studies have indicated that wild-type TP53 expression in combination with tumor morphology is often sufficient to distinguish endometrioid carcinoma FIGO grade 3 from serous carcinoma.18, 19 However, a significant number of endometrioid carcinoma FIGO grade 3 can show aberrant TP53 expression. In daily practice with limited immunohistochemical marker availability, a hierarchical decision tree similar to Figure 3 may be useful. In our study, among all consensus endometrioid carcinoma FIGO grade 3 and serous carcinoma with aberrant TP53 expression, cases with low ER expression were serous carcinoma, whereas high ER expression and patchy CDKN2A (p16) expression were almost exclusively associated with endometrioid carcinoma FIGO grade 3. A triple positive immunoprofile (TP53 aberrant, CDKN2A (p16) diffuse, and ER high expression) is non-informative, and additional biomarkers such as ARID1A and PTEN are needed for this differential diagnosis.

Garg and Soslow8 and Alkushi et al33 promoted the use of a new grading system, which is independent of type, for cases with ambiguous morphology. However, our data show that reproducible histological type diagnosis is possible in approximately two-thirds of high-grade endometrial carcinomas and that immunohistochemistry can predict a certain histological type in the remaining cases. We recognize that a small number of tumors may demonstrate ambiguous morphology and inconclusive immunohistochemical results. We believe the term ‘ambiguous carcinoma, high grade’ should be strictly reserved for such cases. Based on our data showing promising results in achieving robust differentiation of traditional histological type by immunohistochemistry, we would be cautious about the liberal use of yet another typing/grading system. We would favor the classical approach of trying to commit to a histological type/grade combination, aided, of course, by ancillary markers. Further study is needed to determine whether the immunohistochemically aided diagnosis of histological type correlates with response to specific therapies. Such attempts are in their infancy, and any new biomarkers will require robust biological and technical validation in independent studies before their inclusion into the clinical panel of any laboratory becomes common practice. Potentially, unbiased comprehensive large-scale projects, such as the Cancer Genome Atlas project (TCGA), may identify further biomarkers that can add to the current classification systems.