Main

The MYC oncogene is involved in many types of human cancer. The discovery of a consistent balanced chromosomal translocation involving the MYC gene in Burkitt lymphoma provided the first evidence that MYC is a human oncogene.1 Subsequently, MYC gene alterations were discovered in B-cell neoplasms other than Burkitt lymphoma.2 Among those neoplasms, diffuse large B-cell lymphoma is the most widely studied. The presence of MYC rearrangements in patients with diffuse large B-cell lymphoma treated with rituximab, cyclophosphamide, doxorubicin, vincristine and prednisone (R-CHOP) has been shown to be associated with poor prognosis.3, 4 In particular, the so-called ‘double-hit’ lymphomas, which are characterized by a MYC rearrangement and a concurrent rearrangement of another B-cell lymphoma-associated gene such as BCL2 or BCL6, are associated with poor response to therapy, an aggressive clinical course and a dismal prognosis.5, 6, 7 These lymphomas are classified as ‘B-cell lymphoma, unclassifiable, with features intermediate between diffuse large B-cell lymphoma and Burkitt lymphoma’ in the current 2008 WHO classification of hematopoietic neoplasms.8

Evaluation of MYC alterations in diffuse large B-cell lymphoma is typically performed by fluorescence in situ hybridization (FISH), which can detect MYC gene alterations resulting from translocation or amplification of the gene. These genetic alterations result in MYC protein overexpression, which is ultimately responsible for the oncogenic effect.1, 2 However, MYC protein overexpression has also been found to result from other genomic events that are not detected by FISH.9 Thus, FISH may miss a subset of diffuse large B-cell lymphoma cases that demonstrate MYC protein overexpression.

MYC protein expression has been evaluated by immunohistochemistry in formalin-fixed paraffin-embedded tissue in multiple studies.9, 10, 11, 12, 13 These studies have shown that cases of diffuse large B-cell lymphoma with concurrent overexpression of MYC and BCL2 proteins have a dismal prognosis similar to that of double-hit lymphomas. The rate of such double-hit lymphoma-like cases identified by immunohistochemistry is higher than the rate of double-hit lymphomas detected by FISH.9, 10 In these studies, MYC expression was evaluated by estimating the percentage of tumor cells expressing MYC protein, and in the majority of them a cutoff of ≥40% was used to define MYC protein overexpression.

Although high rates of inter-observer concordance have been reported in prior studies, scoring in those studies was performed on the scant material in tissue microarrays.9, 10, 11, 12, 13 Tissue microarrays sample only a small portion of the tumor and therefore may under-represent the heterogeneity of staining among tumor cells encountered in daily practice; this heterogeneity was not adequately addressed in these studies. Additionally, concordance was assessed among only two10, 11, 13 or three9 pathologists, which might underestimate inter-observer variability among practicing pathologists. We hypothesized that these concordance rates might not be reproducible in daily practice. In this study, we examined the concordance rate in MYC scoring among nine hematopathologists from two institutions evaluating entire tumor sections, and investigated whether scoring a tissue microarray-sized field instead of the entire section improves concordance. We also identified features that characterize discrepant cases and evaluated whether careful scoring of these cases can improve concordance. Finally, we assessed the impact of using an image analysis program.

Materials and methods

Case Selection

Following institutional review board approval, two sets of high-grade B-cell lymphomas were selected. The training set contained 13 cases of diffuse large B-cell lymphoma and 4 cases of Burkitt lymphoma diagnosed between 2003 and 2011 at the University of New Mexico and Presbyterian Hospital in Albuquerque, NM, USA. The validation set included 18 cases of diffuse large B-cell lymphoma and 1 case of Burkitt lymphoma diagnosed between 2013 and 2014 in the Department of Pathology at the University of New Mexico. The training set was used to identify potential factors leading to discrepant scoring, whereas the validation set was used to evaluate whether careful scoring of cases characterized by these factors can improve the concordance rate among hematopathologists. The cases in both sets were selected to represent various sites, including nodal (neck, mediastinal, axillary, pelvic, para-aortic and inguinal) and extranodal (tonsil, brain, thyroid, stomach, small bowel, spleen, uterine cervix, bone marrow and spine) sites, and different specimen types, including resection (thyroid and spleen), excisional biopsy, needle core biopsy and bone marrow biopsy. The Burkitt lymphoma cases were expected to show very high MYC expression and served as quality controls.

Immunohistochemistry

Paraffin immunohistochemistry was performed using a monoclonal MYC antibody (clone Y69; Epitomics, Burlingame, CA, USA) at 1:50 dilution with a 24-min incubation. Briefly, 4-μm-thick recuts of representative paraffin-embedded tissue blocks were baked in an oven at 60 °C for at least 30 min. Deparaffinization, antigen retrieval (CC1 Ventana, pH 8), blocking of endogenous peroxidase activity, antibody dispensing and incubation steps were all performed on an automated BenchMark Ultra instrument (Ventana, Tucson, AZ, USA). After completion of the run, the slides were removed from the instrument, dipped 10–15 times in water containing Dawn detergent to remove the oil, rinsed in tap water, dehydrated through a graded series of reagent alcohols, dipped in xylene and coverslipped for microscopic review.

Whole-Slide Digitalization

Cases were de-identified, and slides were scanned using an Aperio whole-slide scanner (ScanScope CS system, Leica Biosystems, Buffalo Grove, IL, USA) at ×20 magnification. A password-protected account was created, and the pathologists were given access to it. Slides were reviewed with the Aperio ImageScope software (v11.2.0.780). One-millimeter fields were marked on the digital slides using the circle annotation tool, and the diameter of each field was confirmed using the measure tool. Fields were selected to contain the largest amount of tumor and the least necrosis and/or crush artifact. For the discrepant cases, image analysis was performed with the Aperio immunohistochemistry nuclear algorithm (v9.1.19.1569) to generate an automated score. The algorithm parameters were: threshold type, edge threshold method; segmentation type, cytoplasmic rejection; lower threshold, 0; upper threshold, 230; and nuclear threshold, 220. The most critical parameter was the nuclear threshold, which was selected by scoring a specific field with various thresholds and comparing the resulting automated scores with the pathologists' consensus score for that field.
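The threshold-selection step can be illustrated with a small calibration loop. This is a minimal sketch under stated assumptions, not the Aperio algorithm: it assumes each detected nucleus has already been reduced to one mean intensity value on a 0–255 scale in which lower values mean darker staining, and that a nucleus is called positive when its value falls below the nuclear threshold. The per-nucleus values, the candidate thresholds and the function names (`percent_positive`, `calibrate_threshold`) are hypothetical.

```python
from typing import Sequence


def percent_positive(nuclear_intensities: Sequence[float], nuclear_threshold: float) -> float:
    """Percentage of detected nuclei called MYC positive at a given nuclear threshold.

    Hypothetical convention (not Aperio's documented internals): each value is a
    nucleus's mean intensity on a 0-255 scale, lower = darker staining, and a
    nucleus is called positive when its value is below the threshold.
    """
    if not nuclear_intensities:
        raise ValueError("no nuclei detected in the field")
    positive = sum(1 for value in nuclear_intensities if value < nuclear_threshold)
    return 100.0 * positive / len(nuclear_intensities)


def calibrate_threshold(
    nuclear_intensities: Sequence[float],
    consensus_score: float,
    candidate_thresholds: Sequence[float],
) -> float:
    """Pick the candidate threshold whose automated score is closest to the consensus.

    Ties are resolved in favour of the earliest candidate in the list.
    """
    return min(
        candidate_thresholds,
        key=lambda t: abs(percent_positive(nuclear_intensities, t) - consensus_score),
    )


# Hypothetical per-nucleus intensities for one 1-mm field (10 nuclei for brevity).
field = [90, 120, 150, 180, 200, 210, 215, 225, 235, 245]
for threshold in (200, 210, 220, 230):
    print(threshold, percent_positive(field, threshold))
print("chosen:", calibrate_threshold(field, consensus_score=70, candidate_thresholds=[200, 210, 220, 230]))
```

In this toy field, moving the threshold from 200 to 230 changes the automated score for the same nuclei from 40% to 80%, which is why a single parameter can flip the MYC status of a case that sits near the 40% cutoff.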

Scoring

Six hematopathologists from the University of New Mexico and three from Presbyterian Hospital scored the two sets of cases at two different time points. Scoring was performed on the digital slides, ensuring that the exact same section was scored by all the pathologists. Prior to scoring of both sets, relevant publications were discussed in a journal club, and pathologists were asked to assign their scores based on their understanding of the literature.9, 10 For the training set, pathologists were instructed to avoid areas of necrosis, to assign each case a single score (ie, no range was permitted) and to comment on cases they perceived to be difficult to score. Pathologists estimated the percentage of positive tumor cells and reported their scores in increments of 5%.

For the validation set, pathologists were again instructed to avoid areas of necrosis and to assign each case a single score. Additionally, if a pathologist identified in a case any of the factors that had explained discrepant scores in the training set, or if the score was within 15 percentage points of the 40% cutoff, the pathologist was instructed to score that case twice, on two different days, and to report the mean of the two scores.
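The validation-set rule can be written as a small decision function. This is a sketch under assumptions: the 15-point window is taken to be inclusive (25% to 55%, matching the range given in the Discussion), the training-set factors are represented by hypothetical text labels, and the names `needs_second_read` and `reported_score` are illustrative only.

```python
from statistics import mean
from typing import AbstractSet, Optional

MYC_CUTOFF = 40.0
RESCORE_WINDOW = 15.0  # percentage points on either side of the cutoff
# Factors identified in the training set (labels are hypothetical strings).
TRAINING_SET_FACTORS = frozenset(
    {"geographic variation", "intensity variation", "necrosis", "crush artifact"}
)


def needs_second_read(first_read: float, observed_factors: AbstractSet[str]) -> bool:
    """Whether a validation-set case must be scored again on a different day."""
    near_cutoff = abs(first_read - MYC_CUTOFF) <= RESCORE_WINDOW  # 25%-55%, inclusive (assumed)
    has_training_factor = bool(set(observed_factors) & TRAINING_SET_FACTORS)
    return near_cutoff or has_training_factor


def reported_score(
    first_read: float,
    observed_factors: AbstractSet[str] = frozenset(),
    second_read: Optional[float] = None,
) -> float:
    """Score a pathologist reports: one read, or the mean of two reads when required."""
    if not needs_second_read(first_read, observed_factors):
        return first_read
    if second_read is None:
        raise ValueError("a second read on a different day is required for this case")
    return mean([first_read, second_read])


print(reported_score(80))                  # far from the cutoff, no factors: single read
print(reported_score(30, second_read=45))  # within 15 points of 40%: mean of two reads
```

Note that averaging two reads reported in 5% increments can produce a value such as 37.5%, so the reported score is no longer restricted to 5% steps.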

Scoring of tissue microarray-sized (1-mm diameter) fields in the discrepant cases was performed by eight hematopathologists. These fields were clearly marked on the digital slides, ensuring that the exact same field was scored by all the pathologists.

Definition of MYC Positivity and Discrepancy

Cases with a mean score of ≥40% MYC nuclear expression in tumor cells were defined as MYC positive. A discrepant score was defined as any individual score that resulted in a different MYC status designation (ie, negative instead of positive or vice versa) from that of the mean score of the case, and a discrepant case was defined as any case with at least one discrepant score.
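These definitions translate directly into a per-case summary. The sketch below applies them to one case's individual scores; the nine example scores are hypothetical and were not taken from the study tables, and the function names are illustrative.

```python
from statistics import mean
from typing import Dict, Sequence

MYC_CUTOFF = 40.0  # % MYC-positive tumor cells


def myc_status(score: float) -> str:
    """MYC status for a single score or a case mean (>=40% is positive)."""
    return "positive" if score >= MYC_CUTOFF else "negative"


def summarize_case(scores: Sequence[float]) -> Dict[str, object]:
    """Apply the positivity and discrepancy definitions to one case's scores."""
    case_mean = mean(scores)
    case_status = myc_status(case_mean)
    # A score is discrepant if, on its own, it gives the opposite status to the case mean.
    discrepant_scores = [s for s in scores if myc_status(s) != case_status]
    return {
        "mean": case_mean,
        "status": case_status,
        "n_discrepant_scores": len(discrepant_scores),
        "is_discrepant_case": len(discrepant_scores) >= 1,
    }


# Hypothetical scores from nine pathologists for one case (5% increments).
example = summarize_case([30, 35, 40, 45, 50, 30, 25, 35, 40])
print(example)
```

Here the case mean is about 36.7%, so the case is MYC negative, and the four scores of 40% or more are each discrepant, making this a discrepant case with 4 of 9 discrepant scores.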

Statistical Analysis

Each case received a total of nine individual scores, and the mean, s.d. and range of these scores were calculated for each case. Concordance was evaluated using kappa statistics. Cohen's kappa and the corresponding P-value were first calculated for every pair of pathologists,14 and Fleiss' kappa was then calculated as an index of concordance among all pathologists.15 The analyses were performed using R (http://www.R-project.org/) with the package irr (R package version 0.84, http://CRAN.R-project.org/package=irr).
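The kappa statistics can be reproduced from their standard formulas. The study used the R package irr; the Python sketch below is not that package. It only re-derives the point estimates (no P-values) and assumes that kappa was computed on the dichotomized MYC status (positive/negative) rather than on the raw percentages, which the text does not state explicitly. The ratings in the usage lines are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, Hashable, List, Sequence, Tuple


def cohen_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two raters who classified the same cases."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must rate the same, non-empty set of cases")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement from each rater's marginal category frequencies.
    expected = sum(freq_a[c] * freq_b[c] for c in set(freq_a) | set(freq_b)) / (n * n)
    if expected == 1.0:
        return float("nan")  # both raters used one identical category: kappa undefined
    return (observed - expected) / (1.0 - expected)


def pairwise_cohen(ratings: Sequence[Sequence[Hashable]]) -> Dict[Tuple[int, int], float]:
    """Cohen's kappa for every pair of raters; ratings[case][rater]."""
    n_raters = len(ratings[0])
    columns = [[row[r] for row in ratings] for r in range(n_raters)]
    return {(i, j): cohen_kappa(columns[i], columns[j]) for i, j in combinations(range(n_raters), 2)}


def fleiss_kappa(ratings: Sequence[Sequence[Hashable]]) -> float:
    """Fleiss' kappa; ratings[case][rater], same number of raters for every case."""
    n_cases = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2:
        raise ValueError("at least two raters are required")
    category_totals: Counter = Counter()
    case_agreement: List[float] = []
    for row in ratings:
        if len(row) != n_raters:
            raise ValueError("every case needs the same number of ratings")
        counts = Counter(row)
        category_totals.update(counts)
        # Proportion of agreeing rater pairs for this case.
        agreeing_pairs = sum(k * (k - 1) for k in counts.values())
        case_agreement.append(agreeing_pairs / (n_raters * (n_raters - 1)))
    p_bar = sum(case_agreement) / n_cases
    total = n_cases * n_raters
    p_expected = sum((k / total) ** 2 for k in category_totals.values())
    if p_expected == 1.0:
        return float("nan")
    return (p_bar - p_expected) / (1.0 - p_expected)


# Hypothetical MYC status calls: 3 cases x 3 pathologists.
statuses = [["pos", "pos", "pos"], ["neg", "neg", "neg"], ["neg", "pos", "pos"]]
print(pairwise_cohen(statuses))
print(fleiss_kappa(statuses))
```

With nine pathologists, `pairwise_cohen` yields 36 pairwise values and `fleiss_kappa` a single summary value per set of cases; the results could be cross-checked against `kappa2` and `kappam.fleiss` in irr.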

Results

Training Set

There was moderate concordance among hematopathologists for scoring MYC expression in the training set, with a Fleiss' kappa of 0.69 (P<0.001) for the diffuse large B-cell lymphoma cases and 0.71 (P<0.001) for all cases (Table 1). Seven of the 17 cases (41%) were discrepant, including six cases of diffuse large B-cell lymphoma and one case of Burkitt lymphoma. Among the discrepant diffuse large B-cell lymphoma cases, the number of discrepant scores was 3 of 9 (33%) in one case, 2 of 9 (22%) in two cases and 1 of 9 (11%) in three cases. The discrepant Burkitt lymphoma case had 4 of 9 (44%) discrepant scores (Table 2). Of note, the reviewing hematopathologists commented on all of the discrepant cases except case 8 (Table 2). The factors that contributed to discrepant results were identified as geographic variation of MYC staining, variation in staining intensity, necrosis and crush artifact (Figures 1, 2, 3, 4). Only one non-discrepant case was commented on by the pathologists (Table 2). All of the discrepant cases had a mean score within 15 percentage points of the 40% cutoff.

Table 1 Fleiss’s Kappa score calculated for different combinations of cases
Table 2 Results of training set scoring
Figure 1

Geographic variation of MYC staining. Variability in MYC expression among different areas of the tumor; immunoperoxidase stain, original magnification ×200.

Figure 2

Intensity variation of staining. Variability in intensity of MYC expression among tumor cells; immunoperoxidase stain; original magnification × 200.

Figure 3

(a, b) Inconsistency in MYC staining in necrotic foci. Geographic necrosis is evident on this hematoxylin and eosin-stained section of a diffuse large B-cell lymphoma; (a) hematoxylin and eosin stain; (b) immunoperoxidase stain showing artifactual lack of MYC expression in the necrotic portion of the tumor, original magnification ×200.

Figure 4

Crush artifact. The majority of tumor in this section is crushed precluding adequate evaluation of MYC expression; immunoperoxidase stain, original magnification × 200.

Validation Set

The concordance rate for scoring the validation set was similar to that of the training set, with a Fleiss' kappa of 0.69 (P<0.001) for all cases and 0.67 (P<0.0001) for the diffuse large B-cell lymphoma cases (Table 1). Six of the 19 cases (32%) were discrepant, all of which were diffuse large B-cell lymphoma. Among the discrepant cases, the number of discrepant scores was 4 of 9 (44%) in two cases, 3 of 9 (33%) in two cases and 1 of 9 (11%) in two cases (Table 3).

Table 3 Results of validation set scoring

Tissue Microarray-Sized Fields on Discrepant Cases

In the discrepant cases, concordance for scoring preselected tissue microarray-sized fields was much higher than for scoring entire sections, with Fleiss' kappa values of 0.66 and 0.17, respectively (P<0.001 for each). With preselected tissue microarray-sized fields, the total number of discrepant cases decreased from 13 to 7 and the total number of discrepant scores decreased from 30 to 9 (Table 4).

Table 4 Comparison of scoring entire sections versus 1-mm fields only in discrepant cases

Scoring of Discrepant Cases Using Image Analysis Program

When entire sections and 1-mm fields were scored, 9 of the 13 (69%) and 12 of the 13 (92%) discrepant cases, respectively, had an automated score yielding a MYC status designation concordant with that of manual scoring (Table 5).

Table 5 Comparison of manual versus automated scoring on entire sections and 1-mm fields of discrepant cases

Discussion

Evaluation of MYC protein overexpression by immunohistochemistry is becoming an important tool in the prognostic stratification of patients with diffuse large B-cell lymphoma. Expression of MYC protein by ≥40% of the neoplastic cells has been applied as the cutoff in most studies, and these studies report high concordance among pathologists in scoring MYC expression in diffuse large B-cell lymphoma. However, concordance has been assessed among only a few pathologists who performed scoring on tissue microarrays. Accurate scoring of MYC expression on tissue microarrays can be problematic given the limited tissue available for evaluation, which might not be representative. In this study, we investigated the concordance rate among a larger number of hematopathologists who performed MYC scoring on entire biopsy sections of diffuse large B-cell lymphoma. The study also identified features of discrepant cases and investigated whether careful scoring of such cases can improve the concordance rate.

The overall concordance rate among the nine hematopathologists who participated in our study was lower than that reported previously in the literature (Table 1). Twelve of the 31 cases of diffuse large B-cell lymphoma (39%) from the training and validation sets combined showed discrepant results in MYC scoring. Two of the discrepant cases (cases 26 and 27) were particularly difficult to score, and the pathologists were almost evenly divided on MYC status in these two cases (Table 3). The most important feature noted in the discrepant cases was variation of MYC staining across the tumor, including variation in the distribution of staining, the intensity of staining or both (Figures 1 and 2). Additional features that contributed to discrepant results included necrosis and crush artifact (Figures 3 and 4). Our findings indicate that scoring MYC expression on a representative biopsy section can be a challenging task because of heterogeneity in the distribution and intensity of MYC staining, which may not be a factor in a tissue microarray. As the current literature offers no established criteria for dealing with staining heterogeneity, substantial inter-observer discrepancy in MYC scoring is to be expected when adequate tissue samples are evaluated.

The impact of a careful scoring strategy was evaluated in the validation set. Pathologists were instructed to score certain cases twice on two separate days and to report the mean of the two scores. Cases demonstrating staining heterogeneity, necrosis or crush artifact were all scored twice. Additionally, because all discrepant cases in the training set had a mean score within 15 percentage points of the 40% cutoff, any case in the validation set that was assigned a score between 25% and 55% was also scored twice. Despite this additional re-evaluation step, concordance did not differ significantly between the two sets (Table 1).

The effect of preselecting tissue microarray-sized fields was evaluated in the discrepant cases. When pathologists scored 1-mm diameter circular fields instead of entire sections, concordance was much higher, as indicated by the kappa scores (see Table 1). The number of discrepant cases dropped by 46% (from 13 to 7), and the total number of discrepant scores dropped by 70% (from 30 to 9). This indicates that preselecting tissue microarray-sized fields substantially improves concordance among pathologists but does not entirely eliminate discrepant cases.

We also investigated whether an image analysis program, the Aperio immunohistochemistry nuclear algorithm, could improve scoring of challenging cases. We applied this algorithm to entire sections and tissue microarray-sized fields of the discrepant cases and compared the automated scores with the mean of the pathologists' scores for each case. The automated and manual scores resulted in concordant MYC designations in 9 of the 13 (69%) and 12 of the 13 (92%) cases when entire sections and tissue microarray-sized fields were scored, respectively. This indicates that image analysis might be helpful in cases that are difficult to score. Preselection of tissue microarray-sized fields still had a positive effect on concordance when automated scoring was employed. One important caveat of image analysis is that its performance depends strongly on the parameters used in the algorithm. In our study, the most important parameter was the nuclear threshold, whose value determines how intense nuclear staining has to be for the software to call a nucleus positive.

Our findings indicate that a substantial number of diffuse large B-cell lymphoma cases are inherently difficult to score for MYC protein expression. In our study, careful scoring of potentially difficult cases did not improve concordance. Prior studies indicated that discrepant cases were resolved through group review at a multiheaded microscope but did not provide any further details.9, 10, 13 As staining heterogeneity is the most important factor causing discrepancy, specific instructions on how to address it are needed to improve concordance.

As expected, four of the five Burkitt lymphoma cases had very high MYC expression and perfect scoring consistency. Interestingly, one case of Burkitt lymphoma (case 9) was very difficult to score. This case was re-reviewed in its entirety by three hematopathologists. Based on the morphological and immunohistochemical findings, two hematopathologists agreed with the original diagnosis of Burkitt lymphoma, and one considered it to represent a B-cell lymphoma, unclassifiable, with features intermediate between diffuse large B-cell lymphoma and Burkitt lymphoma. FISH analysis showed no evidence of MYC-IGH rearrangement. The discrepant scoring in this case could reflect that it actually represents a grey-zone lymphoma rather than Burkitt lymphoma, although up to 10% of Burkitt lymphoma cases may lack a demonstrable MYC translocation by FISH.16 Other explanations include the presence of extensive necrosis or antigen decay due to the 9-year storage of the paraffin block in this case.

In summary, our findings indicate that accurate evaluation of MYC protein overexpression by immunohistochemistry is more challenging than previously described and may lead to discrepant MYC status designations among pathologists in a substantial proportion of cases. Until specific instructions on how to deal with staining heterogeneity become available, pathologists are advised to exercise caution when interpreting MYC protein expression by immunohistochemistry, especially in cases with staining heterogeneity or scores close to 40%.