Main

The 2004 WHO classification of lung cancer recognized four major patterns of adenocarcinoma (bronchioloalveolar, acinar, papillary and solid), the most common pattern being a mixture of these four subtypes.1 The recent IASLC/ATS/ERS lung adenocarcinoma classification introduced several major changes.2 First, the mixed subtype category was discontinued, and tumors are now subtyped according to the predominant pattern after comprehensive histologic subtyping, in which the percentage of each adenocarcinoma histologic pattern is estimated semiquantitatively. However, evidence that using the predominant pattern improves the reproducibility of pattern diagnosis was not available at the time; this was therefore put forth as a weak recommendation based on low-quality evidence (Pathology Recommendation 4).1 Second, the term bronchioloalveolar carcinoma (BAC) is no longer used, as BAC was being interpreted in four different ways: (1) adenocarcinoma in situ, (2) minimally invasive adenocarcinoma, (3) overtly invasive adenocarcinoma with a lepidic pattern and (4) invasive mucinous adenocarcinoma (formerly mucinous BAC). In addition, micropapillary adenocarcinoma was added as a fifth major pattern because of its association with poor prognosis.3, 4 Diagnostic inconsistencies may arise from difficulties in interpretation owing to subjective application of existing criteria.

Previous studies have shown that the distinction between small-cell and non-small-cell lung cancer is highly accurate and reproducible.5, 6 In resection specimens, squamous cell carcinoma has also repeatedly been distinguished accurately from adenocarcinoma, although poorly differentiated tumors may remain difficult to classify by morphology alone.5, 6, 7, 8, 9, 10 However, data on the reproducibility of identifying the predominant patterns of adenocarcinoma are still lacking.

The intention of this study was therefore to assess the reproducibility of histopathological subtyping for adenocarcinomas among pulmonary pathologists from three continents, with respect to both ‘histologic patterns’ and ‘invasion’.

Materials and methods

To assess the reproducibility of adenocarcinoma subtyping, two ring studies were performed. In the first study, 19 pathologists were each asked to submit six cases as photomicrographs pasted into a PowerPoint slide. Five of the six cases represented one example each of the five typical histologic patterns of adenocarcinoma, as perceived by the contributing pathologist: acinar, non-mucinous lepidic (formerly BAC), micropapillary, papillary and solid.1 The sixth case was one that the contributing pathologist regarded as difficult. Each PowerPoint slide contained two images: on the left, a very-low-magnification image (×2 to ×4 objective) showing the general architectural pattern, and on the right, a higher-magnification image (×10 objective, or at least four ‘normal’ alveolar spaces along the longest axis of the image) highlighting the diagnostic area to be evaluated. It was assumed that assessments would be based only on the high-magnification images. The cases (n=115) were randomized and presented blinded to the participants, who classified each case by its dominant pattern; if more than one pattern was recognized, the additional pattern(s) were also recorded (Table 1).

Table 1 The CRF, listing the possible categories for diagnosing each case

After the first round, participants felt that the difficult cases centered on the concept of ‘invasion’ (distinguishing a pure lepidic pattern from the others), which led to a second ring study. For this study, 10 pathologists submitted photomicrographs in the same manner as before, showing typical invasion (n=20), no invasion (n=20) and ‘problem cases’ (n=24). All cases were randomized (JK), and for each case the reviewers were asked to score invasion using five categories: definite invasion, probable invasion, undetermined, probable no invasion and definite no invasion.

Statistical Analysis

Kappa scores were calculated for the typical cases (separately for the five typical patterns and for typical invasion) and for the difficult cases (separately for patterns and invasion) by comparing the score of the submitting pathologist with the ‘blind’ readings of 26 pathologists. For the difficult cases, kappa scores were calculated for all combinations of pathologists. For pattern subtyping, a dominant score and a subscore were calculated for each case and pathologist. The subscore for each subpattern (that is, each non-dominant histologic pattern) was 32, 24, 20 or 16 when 1, 2, 3 or 4 subpatterns were recorded, respectively. The dominant pattern received 512 points minus the number of subpatterns times the subscore. The total score for each pattern was obtained by summing these scores over all pathologists, and the overall dominant score for each case was defined as the highest total score divided by the sum of all total scores. For each ‘typical’ pattern, the number of cases in which 70% of the pathologists scored the submitted pattern as a single or predominant pattern was also calculated.
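The dominant-score rule above can be illustrated with a minimal Python sketch. It assumes that each pathologist's call on a case is recorded as a dominant pattern plus a list of non-dominant subpatterns; the function names and data layout are hypothetical and not taken from the study. Because the points assigned to a single call always sum to 512, the overall dominant score is simply the highest pattern total divided by 512 times the number of pathologists.

```python
from collections import defaultdict

# Subscore per non-dominant pattern, keyed by how many subpatterns were recorded
# (values as stated in the text).
SUBSCORE = {1: 32, 2: 24, 3: 20, 4: 16}
CALL_POINTS = 512  # points shared by the dominant pattern and its subpatterns


def call_points(dominant, subpatterns):
    """Points that one pathologist's call contributes to each pattern."""
    k = len(subpatterns)
    if k == 0:
        return {dominant: CALL_POINTS}  # single pattern: 512 - 0 * subscore
    sub = SUBSCORE[k]
    points = {dominant: CALL_POINTS - k * sub}
    for pattern in subpatterns:
        points[pattern] = sub
    return points


def dominant_score(calls):
    """calls: list of (dominant, [subpatterns]) tuples, one per pathologist.

    Returns the pattern with the highest total and the overall dominant score
    (highest total divided by the sum of all pattern totals)."""
    totals = defaultdict(int)
    for dominant, subpatterns in calls:
        for pattern, pts in call_points(dominant, subpatterns).items():
            totals[pattern] += pts
    best = max(totals, key=totals.get)
    return best, totals[best] / sum(totals.values())


# Two solid calls (one with an acinar subpattern) and one acinar call.
print(dominant_score([("solid", []), ("solid", ["acinar"]), ("acinar", [])]))
```

Unanimous single-pattern calls give a score of exactly 1.0; the more the calls are spread over different patterns, the closer the score moves towards its minimum.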

In the second study on ‘invasion’, kappa was analyzed for five and for three categories. For each case, an invasion score was calculated from the readings and summarized as mean±s.d. across pathologists: definite invasion=1 point; probable invasion=2 points; undetermined=3 points; probable no invasion=4 points; definite no invasion=5 points.
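A minimal Python sketch of the invasion score for one case follows, using the point values above and the three-category collapse described in the Results (probable and definite readings pooled). The function names are hypothetical, and the use of the sample standard deviation is an assumption, since the text does not state which estimator was used.

```python
import statistics

# Points per reading category, as defined in the text.
POINTS = {
    "definite invasion": 1,
    "probable invasion": 2,
    "undetermined": 3,
    "probable no invasion": 4,
    "definite no invasion": 5,
}

# Three-category scheme: probable and definite readings are pooled.
THREE_CAT = {1: "invasion", 2: "invasion", 3: "undetermined",
             4: "no invasion", 5: "no invasion"}


def invasion_score(readings):
    """Mean and sample s.d. (assumption) of the invasion points for one case.

    Needs at least two readings, as statistics.stdev does."""
    pts = [POINTS[r] for r in readings]
    return statistics.mean(pts), statistics.stdev(pts)


def collapse(readings):
    """Map five-category readings onto the three-category scheme."""
    return [THREE_CAT[POINTS[r]] for r in readings]


# 28 readings of definite invasion give a score of 1 with s.d. 0.
print(invasion_score(["definite invasion"] * 28))
```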

Results

Pattern Reproducibility

In the first ring study, the reproducibility of pattern classification was assessed on 115 cases by 26 pathologists. The 115 cases comprised acinar (n=20), lepidic (formerly BAC, n=19), micropapillary (n=16), papillary (n=19), solid (n=20) and difficult cases (n=21). In total, the numbers of scores for the typical patterns combined and for the difficult cases were 2444 and 546, respectively.
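The score totals above are consistent with each of the 26 pathologists scoring every case once (94 typical and 21 difficult cases). The arithmetic can be checked in a few lines of Python:

```python
# Case counts per submitted category, as reported in the text.
typical = {"acinar": 20, "lepidic": 19, "micropapillary": 16,
           "papillary": 19, "solid": 20}
difficult = 21
readers = 26

n_typical = sum(typical.values())
print(n_typical, n_typical + difficult)          # 94 115
print(n_typical * readers, difficult * readers)  # 2444 546
```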

The kappa score (mean±s.d.) calculated between all pathologists for the five typical patterns combined was 0.77±0.07; for the difficult cases, it was 0.38±0.14. The distribution of kappa scores for all cases is shown in Figure 1.
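The text does not state which kappa statistic was used. The Python sketch below assumes Cohen's kappa computed for every pair of pathologists and summarized as mean±s.d., matching the pairwise distributions described for Figure 1. The function names and the dictionary layout are hypothetical.

```python
import itertools
import statistics
from collections import Counter


def cohen_kappa(a, b):
    """Cohen's kappa for two raters' categorical labels on the same cases."""
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    ca, cb = Counter(a), Counter(b)
    # Chance agreement from each rater's marginal label frequencies.
    expected = sum(ca[c] * cb[c] for c in set(ca) | set(cb)) / (n * n)
    if expected == 1:  # both raters used one identical label throughout
        return 1.0
    return (observed - expected) / (1 - expected)


def pairwise_kappa(ratings):
    """ratings: dict rater -> list of labels (same case order for every rater).

    Returns the kappa for every pair of raters plus their mean and s.d.
    (needs at least three raters so that there are two or more pairs)."""
    kappas = [cohen_kappa(ratings[r1], ratings[r2])
              for r1, r2 in itertools.combinations(sorted(ratings), 2)]
    return kappas, statistics.mean(kappas), statistics.stdev(kappas)


a = ["acinar", "solid", "solid", "lepidic"]
b = ["acinar", "solid", "lepidic", "lepidic"]
print(round(cohen_kappa(a, b), 3))
```

Plotting the list of pairwise values as a histogram gives a distribution of the kind shown in Figure 1.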

Figure 1

Distribution of kappa scores between all pairs of pathologists for typical (a) and difficult (b) cases.

Dominant scores were calculated as described in the Statistical Analysis section. A dominant score close to 1 indicates perfect agreement, whereas a score close to 0 indicates major disagreement. The distribution of scores for each pattern is shown in Figure 2. Overall, two patterns, lepidic and solid, showed reasonable to good concordance. Dominant scores for acinar, micropapillary and papillary carcinoma varied over a wider range. The difficult case category had the lowest average agreement.

Figure 2

Box plot distribution of the dominant pattern score (1=perfect agreement, 0=no agreement) for the ‘typical’ patterns, according to the submitting pathologist, for each of the five histologic subtypes: acinar, lepidic, microp(apillary), papillary and solid. The box represents the interquartile range (IQR), the line in the box is the median, the whiskers extend to 1.5 × IQR and outliers (o, *) are labeled with case numbers.

For the typical cases, more than one pattern was recorded in 848 of the 2444 (35%) scores, indicating the heterogeneity of adenocarcinoma patterns. In 1048 of the 1205 (87%) diagnostic scores with more than one pattern, two patterns were scored; three or more patterns were recorded in the remaining 13%. Overlap in calls existed between all patterns except solid and lepidic. Patterns of overlap in adenocarcinoma subclassification are shown in Table 2. Perhaps not surprisingly, the highest overlap was between papillary and micropapillary patterns.

Table 2 Cross-tabulation of the five adenocarcinoma patterns, showing the overlap (in percentages) among diagnostic scores with more than one pattern

Among the 26 pathologists, the proportion who recognized at least 70% of the submitted typical cases as a single pattern ranged from 12% to 65%, being lowest for micropapillary and highest for solid (Table 3). However, when the submitted pattern only had to be recognized as the predominant pattern (combining single calls and multiple-pattern calls in which the submitted pattern was predominant), the concordance between submitted and dominant patterns reached 62–100% (Table 3); all patterns except micropapillary scored ≥92%. In general, the pattern of overlap was similar for submitted ‘typical’ and ‘difficult’ cases. Examples of overlap in pattern diagnoses are shown in Figure 3.
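The two concordance criteria above (submitted pattern recognized as a single pattern, or as the predominant pattern) can be sketched in Python as follows. It is assumed that each pathologist's rate is computed over the cases submitted as a given pattern, in line with the Table 3 caption; the call format and function names are hypothetical.

```python
def correct(call, submitted, mode):
    """call: (dominant, [subpatterns]).

    'single' requires the submitted pattern alone; 'predominant' only requires
    it to be the dominant pattern, with other subpatterns allowed."""
    dominant, subpatterns = call
    if mode == "single":
        return dominant == submitted and not subpatterns
    return dominant == submitted


def share_of_pathologists(calls_by_pathologist, submitted, mode, threshold=0.7):
    """Fraction of pathologists scoring >= threshold of the cases correctly.

    calls_by_pathologist: dict pathologist -> list of calls on the cases
    that were submitted as the `submitted` pattern."""
    hits = 0
    for calls in calls_by_pathologist.values():
        rate = sum(correct(c, submitted, mode) for c in calls) / len(calls)
        if rate >= threshold:
            hits += 1
    return hits / len(calls_by_pathologist)


calls = {
    "p1": [("solid", []), ("solid", []), ("solid", [])],
    "p2": [("solid", ["acinar"]), ("solid", ["acinar"]), ("solid", [])],
}
print(share_of_pathologists(calls, "solid", "single"))       # 0.5
print(share_of_pathologists(calls, "solid", "predominant"))  # 1.0
```

The example shows why the predominant criterion yields higher concordance: a pathologist who adds a minor subpattern fails the single-pattern criterion but still passes the predominant one.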

Table 3 The number of pathologists (of 26) who scored at least 70% of the cases correctly as a single pattern or as the predominant pattern
Figure 3

Examples of unanimous lepidic (a), acinar (b), micropapillary (c), papillary (d) and solid (e) patterns are shown, as well as examples with more than one pattern (f–j; judgement based on the second image), with the percentage of pathologists recording each pattern. Patterns scored by >10% of the pathologists are listed. f: overlap solid (96%), acinar (93%), micropapillary (15%), papillary (15%); g: overlap micropapillary (96%), papillary (11%); h: overlap AIS (92%), acinar (78%); i: overlap AIS (78%), papillary (54%); j: overlap AIS (66%), micropapillary (42%), papillary (30%) and acinar (27%).

Invasion Reproducibility

In the second ring study, which focused on the reproducibility of invasion, 28 pathologists scored the 64 cases, and an invasion score was calculated for each case. A score of 1 with a standard deviation of 0 indicated that all 28 readings were definite invasion; conversely, a score of 5 indicated perfect agreement on non-invasion. The distribution of invasion scores is shown in Table 4. When the probable and definite categories were combined, complete agreement was present in only 6 of the 64 cases. In 37 cases, at least five readings differed from the majority call (presence or absence of invasion).
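The agreement measures above (complete agreement after pooling probable and definite readings, and the number of readings that differ from the majority call) can be sketched in Python. Readings are coded 1 to 5 as in the Statistical Analysis section. The function name is hypothetical, and counting undetermined readings as differing from the majority is an assumption, since the text does not say how they were handled.

```python
from collections import Counter

# Probable and definite readings pooled, as in the text; 3 = undetermined.
SIDE = {1: "invasion", 2: "invasion", 3: "undetermined",
        4: "no invasion", 5: "no invasion"}


def agreement(points):
    """points: one 1-5 invasion reading per pathologist for a single case.

    Returns (complete_agreement, majority_call, n_differing). Undetermined
    readings count as differing from the majority call (assumption)."""
    sides = Counter(SIDE[p] for p in points)
    complete = len(sides) == 1
    majority, majority_count = sides.most_common(1)[0]
    return complete, majority, len(points) - majority_count


print(agreement([5] * 20 + [4] * 3 + [1] * 5))  # (False, 'no invasion', 5)
```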

Table 4 The invasion score (mean±standard deviation, s.d.) is shown for each case, ranked from high (4.93 non-invasive score) to low (1.0 invasive score)

In 15 cases, the scores were equivocal: at least 9 readings favored invasion and at least 9 favored no invasion. In these cases, the same pathologists systematically scored invasion, whereas another group of pathologists consistently diagnosed no invasion.
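The criterion for an equivocal case can be written as a one-line check in Python, with readings coded 1 to 5 as in the Statistical Analysis section. The function name is hypothetical, and undetermined readings (3) are assumed not to count towards either side.

```python
def is_equivocal(points, min_each=9):
    """True if at least `min_each` readings favor invasion (1-2) and at least
    `min_each` favor no invasion (4-5); undetermined readings (3) are ignored."""
    for_invasion = sum(1 for p in points if p <= 2)
    against = sum(1 for p in points if p >= 4)
    return for_invasion >= min_each and against >= min_each


print(is_equivocal([1] * 10 + [5] * 9 + [3] * 9))  # True
```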

Cases with definite non-invasion resembled the AIS/lepidic pattern. Examples of cases with equivocal and definite invasion scores are shown in Figure 4. Morphologic features that appeared to underlie discrepant pattern and invasion judgments included: (i) the character of the background stroma (fibroblastic reaction versus dense fibrosis), (ii) the occurrence of inflammation, (iii) tumor architecture, (iv) the presence of a micropapillary component and (v) the detection of a mucinous component.

Figure 4

Examples of unanimous absence of invasion (a) and definite invasion (b) are shown, as well as examples (judgement based on the second image) with split opinion (c–h), in which at least nine pathologists favored invasion (‘invasion yes ≥9’) and a different group of at least nine favored non-invasion (‘NO≥9’). In two cases, images of another slide from the same case were also available (e3, e4 and f3, f4, elastic stains).

The kappa statistic (mean±s.d.) for the easy invasion cases was 0.55±0.06 with both five and three categories (that is, pooling probable invasion with invasion and probable no invasion with non-invasion). For the difficult invasion cases, kappa was 0.08±0.02 with five categories and 0.15±0.05 with three. When the pathologists were split into two groups based on the 15 equivocal cases, group A, which favored invasion (n=14), and group B, which favored no invasion, the kappa scores for the easy cases (three categories) were 0.61±0.06 and 0.59±0.07, respectively. For the difficult cases, kappa was 0.16±0.09 for group A and 0.27±0.15 for group B. These within-group kappa values, higher than those for all pathologists combined, support a difference in diagnostic interpretation between the groups. There was some segregation of pathologists between the two groups by country (χ² test, P=0.02).
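The text does not specify the rule used to split pathologists into groups A and B. The Python sketch below is one hypothetical rule, not the study's method: a pathologist joins group A if their invasion readings (1 to 2) outnumber their no-invasion readings (4 to 5) across the equivocal cases, joins group B in the reverse case, and remains unassigned on a tie.

```python
def assign_group(readings_by_pathologist):
    """readings_by_pathologist: dict name -> list of 1-5 readings on the
    equivocal cases. Hypothetical rule: 'A' if invasion calls (1-2)
    outnumber no-invasion calls (4-5), 'B' if the reverse, None on a tie."""
    groups = {}
    for name, points in readings_by_pathologist.items():
        yes = sum(p <= 2 for p in points)
        no = sum(p >= 4 for p in points)
        groups[name] = "A" if yes > no else "B" if no > yes else None
    return groups


print(assign_group({"p1": [1, 2, 1, 5], "p2": [5, 4, 3, 1], "p3": [1, 5, 3]}))
```

Within-group kappa can then be computed by restricting the pairwise kappa calculation to the pathologists in each group.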

Discussion

In these image-based ring studies, reproducibility was substantial for typical patterns of pulmonary adenocarcinoma. Kappa was good for cases showing classic architectural patterns (0.77) and fair for classic invasion (0.55), but low to poor for problematic cases, including those with multiple patterns and those requiring an assessment of invasion.

The kappa score for adenocarcinoma subtyping (0.38–0.77) was higher than in a previously reported study using an older classification (0.18).11 In our study, solid and lepidic patterns without collapse were clearly recognized more reliably than the others; distinctions such as micropapillary versus papillary and acinar versus papillary versus lepidic were less reliable, in particular with respect to what constituted invasion (wholly lepidic versus other). A second study was therefore undertaken to examine this area more closely, and its overall kappa value (0.40) was very similar to that for the submitted problematic cases in the first phase. Although the difficulty of distinguishing micropapillary from papillary patterns is self-evident, problems in distinguishing acinar/papillary patterns from a lepidic pattern are less obvious. However, pulmonary adenocarcinoma poses particular challenges for pathologists because its neoplastic growth is superimposed on the underlying lung architecture. As these tumors grow into aerated alveolar tissue, cross-cutting of growth along alveolar walls (lepidic) can mimic papillary structures, and a desmoplastic reaction can produce acinar structures that in reality are collapsed areas lacking invasion (Figure 4d, g and i). This can be further complicated by pre-existing architectural changes in the lung, such as emphysema or interstitial fibrosis, and by inconsistent use of formalin inflation to fix the tumor specimens.

Thus, the prime confounder is the use of a two-dimensional histological section to diagnose a lesion with a complex three-dimensional architecture, an issue that, judging by the kappa values in this study, is not adequately addressed by current definitions. This problem was borne out in both parts of this study, in which there were clearly two ‘groups’: one applying the diagnostic criteria very literally (group A) and one taking a more interpretive approach (group B), translating ‘two-dimensional data’ into a ‘three-dimensional categorization’. The difference between the two groups may also be related to country of practice. We believe a constructive approach is to improve definitions and increase education on the use of this terminology.12, 13 Further studies are ongoing in this respect.

Post-study discussion also identified variation in the interpretation of several morphological features. First, some pathologists interpreted a stromal component as tumor-related stroma with fibroblasts (also called desmoplastic stroma), whereas others considered the same feature to be benign scarring/fibroelastosis (Figure 4c and i). Second, the presence of elastin was taken by some pathologists, but not by others, to represent native alveolar wall (Figure 4e and f). Third, some regarded inflammation in alveolar walls as implying invasive disease. Fourth, although agreement was good in cases with a prominent micropapillary component, structures that some pathologists interpreted as a focal micropapillary component were interpreted by others as tangentially cut lepidic or true papillary structures. Finally, some pathologists interpreted a mucinous lepidic component as invasive, on the reasonable assumption that an invasive component with scarring is highly likely elsewhere in the tumor, whereas others interpreted the image itself as non-invasive (Figure 4h). Much of the interobserver variation therefore stems from interpretation based on operator experience and opinion, and improved definitions and better education on their use are required to reduce it.

The main limitation of the study was the use of digitized photographic images to present pulmonary adenocarcinoma. The advantages of this approach were that diagnosis could be focused precisely on specific areas and that the study could be completed in a timely manner. The main disadvantage was that the whole section was not examined, so the procedure was not representative of daily diagnostic practice. To add some context, a low-power image was therefore included with each case. Although this did not fully compensate for the lack of microscope-based examination of whole sections, we believe that it allowed a robust assessment of pathologists' ability to identify histological patterns. Another limitation is that the review of highly selected images does not reflect the frequency of problems encountered in routine practice. For example, this study placed great emphasis on distinguishing invasive from non-invasive patterns, whereas AIS, MIA and lepidic-predominant patterns occur in 10–20% of all early-stage resected lung adenocarcinomas, and there is no prognostic difference between the AIS and MIA categories, both of which should have 100% 5-year disease-free survival if completely resected.3, 4

In conclusion, given that most cases of pulmonary adenocarcinoma show mixed morphology with respect to the five major histological patterns, this study provides strong evidence that a predominant pattern can be identified reproducibly, with high concordance among pathologists, in resection specimens. This supports adoption of the ‘predominant pattern’ approach for subtyping invasive adenocarcinoma in the updated classification, as more data are published highlighting its clinical relevance. Recognition of the adenocarcinoma in situ pattern is more problematic, although kappa values are fair; this could be improved by more precise definitions, better education on the interpretation of existing terminology and/or additional markers of invasion.