The PD-1/PD-L1 axis, consisting of the programmed death 1 (PD-1) receptor and its ligand programmed death ligand-1 (PD-L1), plays a crucial role in T-cell regulation. PD-1 is a checkpoint molecule expressed in various immune cell types and negatively regulates T-cell activity after binding to its ligand PD-L1. PD-L1 is normally expressed in antigen-presenting cells to suppress unnecessary immune activation and reduce autoimmune responses, but tumor cells may also express PD-L1 to escape immune surveillance [1]. Targeted inhibition of the PD-1/PD-L1 axis can reactivate the anti-tumor immune response in PD-L1-expressing tumors, and this has demonstrated efficacy and promise in treating multiple human cancers [2, 3]. Several such agents have already been approved by the FDA for the treatment of non-small cell lung carcinoma and many more have entered clinical trials. Appropriate biomarker selection is therefore essential to improve treatment efficacy. However, a specific PD-L1 immunohistochemical assay has now been developed for each PD-1-targeting or PD-L1-targeting agent, and this complicates the establishment of PD-L1 assays in pathology laboratories.

The current study aimed to evaluate the prevalence of PD-L1 expression and its association with clinicopathological features in a large cohort of non-small cell lung carcinoma using the PD-L1 IHC 22C3 PharmDx kit, an FDA-approved companion diagnostic test. Moreover, we compared the analytical and clinical performance of 22C3 with those of three other commercially available PD-L1 diagnostic assays, namely, PD-L1 IHC 28-8 PharmDx, VENTANA PD-L1 SP142 and SP263, in selecting patients for first-line and second-line anti-PD-1/PD-L1 treatment.

Materials and methods

Sample cohort

Formalin-fixed, paraffin-embedded samples from 780 consecutive non-small-cell lung carcinoma patients who underwent surgical resection between 1995 and 2011 were obtained from the archives of the Department of Anatomical and Cellular Pathology, Prince of Wales Hospital, Hong Kong. Medical records were reviewed and clinicopathological data were collected. The pathological stages were determined according to the seventh edition of the American Joint Committee on Cancer (AJCC) tumor-node-metastasis (TNM) classification system. The study protocol was approved by the Joint CUHK-NTE Clinical Research Ethics Committee, Hong Kong. Demographic characteristics of the study cohort are summarized in Supplementary Table S1, and the driver mutation status of this cohort was published previously [4].

A pathologist reviewed all cases to confirm the histological diagnosis and select the representative tumor area with appropriate tumor content for the study. Tissue microarrays were constructed using a tissue arrayer (Beecher Instruments, Silver Spring, MD). The tissue microarray blocks were made in triplicate. For each tumor, three 1-mm cores sampled from different areas were punched out and transferred to three recipient blocks separately. Serial sections at 4 µm were then made from each tissue array block for PD-L1 immunohistochemistry testing.

PD-L1 immunohistochemistry diagnostic assays

Four PD-L1 diagnostic assays, Dako PD-L1 IHC 22C3 PharmDx, 28-8 PharmDx, Ventana PD-L1 SP142 and SP263, were performed according to the manufacturers’ instructions (Supplementary Table S2). PD-L1 expression on tumor cells and on tumor-infiltrating immune cells was scored separately. The tumor proportion score, which is the percentage of tumor cells with partial or complete membranous staining of any intensity, was assigned in 1% increments over a range of 0–5% and 5% increments over a range of 5–100%. The tumor-infiltrating immune cells were scored as a percentage of tumor area covered by PD-L1 positive immune cells. A specimen was considered adequate if at least one core yielded an adequate number of viable tumor cells (>100). The highest score among the triplicate cores was used to classify the PD-L1 status of each case. Two trained pathologists independently scored the sections and a meeting between the two was convened to review the discordant cases. The final scores were consensus “true” results reached by the two.
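The case-level rule described above (adequacy by viable tumor cell count, then the highest score among the triplicate cores) can be illustrated with a minimal Python sketch. The function names, the tuple layout and the category labels are hypothetical and are not part of the study workflow; the three categories follow the <1%, 1–49% and ≥50% groupings used in the survival analysis.

```python
from typing import Optional, Sequence, Tuple

# A core is adequate when it holds more than 100 viable tumor cells.
MIN_VIABLE_TUMOR_CELLS = 100

# Hypothetical layout: (tumor proportion score in %, viable tumor cell count)
Core = Tuple[float, int]


def case_tps(cores: Sequence[Core]) -> Optional[float]:
    """Return the highest score among adequate cores, or None if no core is adequate."""
    adequate = [tps for tps, cells in cores if cells > MIN_VIABLE_TUMOR_CELLS]
    return max(adequate) if adequate else None


def pd_l1_category(tps: Optional[float]) -> str:
    """Map a case-level tumor proportion score to the three groups used in the text."""
    if tps is None:
        return "inadequate"
    if tps >= 50:
        return "high (>=50%)"
    if tps >= 1:
        return "moderate (1-49%)"
    return "negative (<1%)"


# Illustrative triplicate: the 60% core is ignored because it has too few cells.
example = [(0.0, 500), (60.0, 50), (10.0, 300)]
example_category = pd_l1_category(case_tps(example))
```

In this example the case is classified from the 10% core, because the core scored at 60% does not meet the adequacy threshold.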

Statistical analysis

Associations between PD-L1 status and clinicopathological parameters were analyzed by a chi-squared test or Fisher’s exact test for categorical variables, and one-way ANOVA for continuous variables. The Kaplan–Meier method was used to calculate the survival rates for different groups. A log-rank test was used to compare the survival curves. Cox proportional hazards regression was employed for univariate and multivariate survival analyses. The tumor and immune cell scores were plotted for each assay by case, and best-fit lines were determined by regression analysis to demonstrate the relationship between assays. Pairwise concordance between assays was evaluated using scatter plots. The inter-rater agreement between the two raters was assessed by intraclass correlation coefficients for tumor cell scores and by Cohen’s kappa when the tumor cell score was dichotomized at cutoff points of ≥1% and ≥50%. All statistical analyses were performed using R version 3.02 (R Foundation for Statistical Computing, Vienna, Austria). All P-values were two-sided and a P-value < 0.05 was regarded as statistically significant.
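To make the binary inter-rater measure concrete, the following is a minimal Python sketch of Cohen's kappa for two raters after dichotomizing their scores at a cutoff. It is an illustration only: the study itself used R, and the function name and the example scores are hypothetical.

```python
from typing import Sequence


def cohen_kappa_binary(rater_a: Sequence[float], rater_b: Sequence[float], cutoff: float) -> float:
    """Cohen's kappa for two raters after calling each score positive when score >= cutoff."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must score the same non-empty set of cases")
    a = [x >= cutoff for x in rater_a]
    b = [y >= cutoff for y in rater_b]
    n = len(a)
    # Observed agreement: fraction of cases given the same call by both raters.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement from each rater's marginal positive rate.
    pa = sum(a) / n
    pb = sum(b) / n
    expected = pa * pb + (1 - pa) * (1 - pb)
    if expected == 1.0:
        # Kappa is undefined when both raters use a single identical class;
        # returning 1.0 here is a convention chosen for this sketch.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical tumor cell scores (%) from two raters on four cases.
kappa_50 = cohen_kappa_binary([0, 0, 60, 60], [0, 60, 60, 60], cutoff=50)
```

Here the raters agree on three of four cases (observed agreement 0.75) against a chance agreement of 0.5, giving a kappa of 0.5.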

Results

Prevalence and clinicopathological correlation of PD-L1 expression

Sixty-seven cases were excluded from analysis due to insufficient tumor content. The final cohort consisted of 713 non-small-cell lung carcinoma patients. Of these patients, 396 (56%) were classified as PD-L1 positive by the 22C3 assay with the cutoff at ≥1% tumor proportion score, while 149 (21%) were positive with a ≥50% cutoff (Table 1). Increased PD-L1 expression (using ≥50% tumor proportion score as a cutoff) was significantly associated with male gender (P = 0.001), ever smoking (P < 0.001), squamous cell carcinoma (P = 0.001), large cell carcinoma (P < 0.001), lymphoepithelioma-like carcinoma (P = 0.006) and sarcomatoid carcinoma (P < 0.001).

Table 1 Clinicopathologic correlation of PD-L1 expression by 22C3 PharmDx in non-small cell lung cancer and adenocarcinoma

In patients with lung adenocarcinoma (N = 399), the PD-L1 positive rate was 46% with a cutoff at ≥1% tumor proportion score, and 12% with a ≥50% cutoff. The expression of PD-L1 in adenocarcinoma patients was significantly lower than that in patients with other histologic subtypes (P < 0.001). Similar to the findings in non-small cell lung carcinoma, PD-L1 expression in adenocarcinoma was significantly higher in male patients (P = 0.005) and ever-smokers (P = 0.002). In addition, EGFR wild-type and KRAS-mutant tumors were found to have significantly higher levels of PD-L1 expression (Table 1). Using the Cox proportional hazards model, we found that high PD-L1 expression (i.e., ≥50% tumor proportion score) in adenocarcinoma was significantly associated with shorter overall survival (OS) compared with no PD-L1 expression (i.e., <1% tumor proportion score) (hazard ratio (HR), 1.80; 95% confidence interval (CI), 1.10–2.94; P = 0.019). It remained an independent prognostic factor in multivariate analysis (adjusted HR, 1.71; 95% CI, 1.03–2.86; P = 0.039) (Supplementary Table S3). Fig. 1 shows the Kaplan–Meier curves of OS stratified by PD-L1 expression in non-small cell lung carcinoma and adenocarcinoma. Sub-group analysis of patients with other histological subtypes, including squamous cell carcinoma, large cell carcinoma, adenosquamous carcinoma, lymphoepithelioma-like carcinoma and sarcomatoid carcinoma, showed no significant difference in OS between PD-L1-positive and -negative groups.

Fig. 1

a, b Kaplan–Meier curves of overall survival stratified by PD-L1 strong (tumor cell proportion score ≥50%), moderate (1–49%) and negative (<1%) expression in non-small cell lung carcinoma (a, N = 713) and adenocarcinoma (b, N = 399). c, d Kaplan–Meier curves of overall survival stratified by PD-L1 positive (tumor cell proportion score ≥1%) and negative (<1%) expression in non-small cell lung carcinoma (c, N = 713) and adenocarcinoma (d, N = 399)

Analytical comparison of the four PD-L1 diagnostic assays

Representative immunohistochemical images for the four assays are shown in Fig. 2, while analytical comparisons of the tumor and immune cell scores for these assays are shown in Fig. 3. 22C3 and 28-8 demonstrated similar staining patterns in tumor and immune cells across most of the cases. SP263 showed higher tumor and immune cell scores, whereas SP142 consistently showed lower tumor and immune cell scores. Supplementary Figure S1 shows the pairwise comparisons of the four assays for tumor and immune cell scores. A high degree of agreement was observed among the 22C3, 28-8 and SP263 assays in tumor cell scoring. Among the three, 22C3 versus 28-8 showed the highest correlation in tumor cell score (Pearson R² = 0.873), followed closely by SP263 versus 28-8 (R² = 0.865) and SP263 versus 22C3 (R² = 0.841). On the other hand, all three assays showed lower correlation (Pearson R² ≈ 0.70) with SP142. Nevertheless, the Wilcoxon signed rank test showed a significant difference among all assays for tumor cell score (Table 2). For immune cell score, all assays showed low concordance (with Pearson R² ranging from 0.263 to 0.682), and were significantly different from each other.
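A high Pearson R² and a significant paired difference are not contradictory: correlation measures how closely two assays track each other linearly, whereas a paired test such as the Wilcoxon signed rank test responds to a systematic shift between them. The minimal Python sketch below illustrates this with hypothetical scores in which one assay reads a constant 10 points higher than the other; the function names and data are assumptions made for illustration, not values from the study.

```python
from typing import Sequence


def pearson_r_squared(x: Sequence[float], y: Sequence[float]) -> float:
    """Squared Pearson correlation coefficient between two paired score lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two paired lists with at least two cases")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return (sxy * sxy) / (sxx * syy)


def mean_paired_difference(x: Sequence[float], y: Sequence[float]) -> float:
    """Average of (y - x) across cases; a simple summary of systematic shift."""
    return sum(b - a for a, b in zip(x, y)) / len(x)


# Hypothetical scores: assay B reads 10 points above assay A on every case.
assay_a = [0.0, 5.0, 20.0, 50.0, 80.0]
assay_b = [a + 10.0 for a in assay_a]
r2 = pearson_r_squared(assay_a, assay_b)
shift = mean_paired_difference(assay_a, assay_b)
```

In this constructed example the two assays are perfectly correlated, yet every case differs by the same offset, which is the kind of pattern a paired test detects and a correlation coefficient does not.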

Fig. 2

Examples of the range of PD-L1 immunohistochemical staining by four assays

Fig. 3

Analytical comparison of tumor and immune cell staining for the four PD-L1 diagnostic assays. Distributions of the tumor cell proportion score (a) and immune cell proportion score (by tumor area, b) for each assay were plotted by case. For a better illustration of the association among the four assays, the first 300 cases with low tumor cell score and the first 400 cases with low immune cell score (<1%) were removed from the plots

Table 2 Pairwise comparison of tumor proportion score and immune cell proportion score between assays

Comparison of four PD-L1 assays across clinically relevant cutoffs

At the time of this study, pembrolizumab was the only PD-1/PD-L1 inhibitor approved by the FDA as the first-line single-agent treatment for metastatic non-small cell lung carcinoma patients showing high PD-L1 expression (at a cutoff of ≥50% tumor proportion score), as determined by the companion diagnostic 22C3 PharmDx assay. At a cutoff of ≥50% tumor proportion score, 149 (21%), 158 (22%), 111 (16%) and 166 (23%) of non-small cell lung carcinoma patients were classified as PD-L1 positive by 22C3, 28-8, SP142 and SP263, respectively. A total of 110 (15%) patients were classified as PD-L1 positive and 541 (76%) patients were classified as PD-L1 negative by all four assays. Sixty-two (9%) patients showed discordant PD-L1 status (Fig. 4a). A high degree of positive (>96%) and negative (>96%) agreement was observed among 22C3, 28-8 and SP263 (Supplementary Table S4). However, a lower rate of positive agreement was observed between 22C3 and SP142 (74%) as SP142 stained significantly fewer tumor cells.

Fig. 4

Comparisons of the four assays at clinically relevant cutoffs (tumor cell proportion score ≥50% and ≥1%)

If a tumor cell score cutoff of ≥1% was used, 396 (56%), 416 (58%), 194 (27%) and 400 (56%) patients were classified as PD-L1 positive by 22C3, 28-8, SP142 and SP263, respectively. A quarter of cases (178 out of 713 patients) were classified as positive and 201 cases (28%) were classified as negative by all four assays (Fig. 4b). Thus, a large proportion of the cases (333/713, 48%) showed discordant PD-L1 status. The overall percentage agreements using a ≥1% cutoff (68.6–82%) were also lower than those using a ≥50% cutoff (94.4–97.9%).
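For readers unfamiliar with agreement statistics, the minimal Python sketch below shows how positive, negative and overall percentage agreement can be computed for one assay against a reference assay at a shared cutoff. It assumes the commonly used definitions (positive agreement relative to reference-positive cases, negative agreement relative to reference-negative cases); the exact formulas used in the study are not stated in the text, and the function name and example scores are hypothetical.

```python
from typing import Dict, Sequence


def percentage_agreement(reference: Sequence[float], comparator: Sequence[float], cutoff: float) -> Dict[str, float]:
    """Positive (ppa), negative (npa) and overall (opa) percentage agreement at one cutoff."""
    if len(reference) != len(comparator) or len(reference) == 0:
        raise ValueError("both assays must score the same non-empty set of cases")
    ref_pos = [r >= cutoff for r in reference]
    cmp_pos = [c >= cutoff for c in comparator]
    n = len(ref_pos)
    both_pos = sum(r and c for r, c in zip(ref_pos, cmp_pos))
    both_neg = sum((not r) and (not c) for r, c in zip(ref_pos, cmp_pos))
    n_ref_pos = sum(ref_pos)
    n_ref_neg = n - n_ref_pos
    return {
        # Undefined (NaN) when the reference assay has no cases in that class.
        "ppa": 100.0 * both_pos / n_ref_pos if n_ref_pos else float("nan"),
        "npa": 100.0 * both_neg / n_ref_neg if n_ref_neg else float("nan"),
        "opa": 100.0 * (both_pos + both_neg) / n,
    }


# Hypothetical tumor cell scores (%) for five cases, compared at the 50% cutoff.
agreement_50 = percentage_agreement([60, 70, 0, 0, 55], [50, 10, 0, 5, 80], cutoff=50)
```

In this example the comparator misses one of three reference-positive cases, so positive agreement is about 67% while negative agreement is 100% and overall agreement is 80%.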

Supplementary Table S5 shows the between-assay agreement across clinically relevant cutoffs of all assays. 28-8 at 1%, 5% and 10%, SP142 at TC50IC10 (≥50% tumor cells or ≥10% immune cell in tumor area), and SP263 at 25% were set as reference standards for comparisons against the other assays at matched cutoffs.
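The composite TC50IC10 rule combines the tumor cell and immune cell scores, unlike the single-score cutoffs of the other assays. A minimal Python sketch of the rule as defined above follows; the function name is hypothetical.

```python
def tc50_ic10_positive(tumor_cell_score: float, immune_cell_score: float) -> bool:
    """Positive when tumor cells are >=50% OR immune cells cover >=10% of tumor area."""
    return tumor_cell_score >= 50 or immune_cell_score >= 10


# Illustrative cases: positive by immune cells alone, and negative on both scores.
immune_only = tc50_ic10_positive(tumor_cell_score=5, immune_cell_score=15)
neither = tc50_ic10_positive(tumor_cell_score=40, immune_cell_score=5)
```

Because either criterion is sufficient, a case with few PD-L1 positive tumor cells can still be classified as positive under this rule.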

Inter-rater variation

To evaluate the inter-rater variation between the two pathologists, intraclass correlation coefficients of the raw tumor cell score obtained with the four assays were computed. The highest intraclass correlation coefficient was observed in the SP263 assay (0.967, 95% CI: 0.961–0.971), followed by 22C3 (0.963, 95% CI: 0.957–0.968), 28-8 (0.932, 95% CI: 0.922–0.941) and SP142 (0.916, 95% CI: 0.904–0.927). Pairwise comparisons of raw tumor cell score between the two raters for all four assays are shown in Supplementary Figure S2. We also assessed the inter-rater variation of the tumor cell score when it was dichotomized at cutoff points of ≥1% and ≥50% using Cohen’s kappa, and better inter-rater agreement was observed using a ≥50% cutoff than using a ≥1% cutoff (Table 3).

Table 3 Inter-rater variation between two pathologists for tumor cell score by 4 PD-L1 assays

Discussion

Targeting immune checkpoints has become one of the most promising modalities in cancer treatment. Several PD-1/PD-L1 inhibitors such as nivolumab, pembrolizumab and atezolizumab have been approved for pretreated metastatic non-small cell lung carcinomas [5]. More recently, pembrolizumab has been included as a standard first-line treatment option for patients with advanced non-small cell lung carcinomas [6]. An appropriate predictive biomarker for patient selection is therefore essential for the implementation of personalized therapy, and PD-L1 immunohistochemistry is a biomarker waiting to be approved for non-small-cell lung carcinoma patient selection in clinical practice. The current study is the largest cohort study of ethnic Chinese patients with non-small cell lung carcinoma that provides the prevalence as well as clinicopathological features of PD-L1 expression. We noted that the overall prevalence of PD-L1 expression in our cohort was lower than that reported in the KEYNOTE studies (KEYNOTE-001, 010 and 024) [7,8,9] (20.9% versus 28% using a ≥50% cutoff, and 55.5% versus 66% using a ≥1% cutoff) [10]. Factors including ethnicity, histologic subtype, smoking status and driver mutation status may contribute to this difference in positive rates of PD-L1 expression. Although the KEYNOTE studies recruited patients globally, >70% of them were Caucasians. Asian and Western patients with non-small cell lung carcinoma are known to have different characteristics epidemiologically and genetically. High PD-L1 expression (using a tumor cell score ≥50% cutoff) assessed by the 22C3 assay was previously found in 25% and 29.6% of non-small cell lung carcinoma patients in studies from Denmark (N = 204) and the US (N = 71), respectively [11, 12], but was observed in only 6% of the patients in a Korean study (N = 1090) [13]. Ethnicity might therefore be a significant factor affecting the prevalence of PD-L1 expression. Moreover, Asian lung cancer patients have more EGFR mutations and fewer KRAS mutations compared to Caucasians [14, 15]. The EGFR and KRAS mutation rates were 27.3% and 8.8%, respectively, in the current patient cohort, whereas they were 15.5% and 26.1%, respectively, in KEYNOTE-001 and 8% and unavailable, respectively, in KEYNOTE-010 [7, 8]. Our data indicated that PD-L1 expression was negatively associated with EGFR mutation and positively associated with KRAS mutation. This is in agreement with the emerging concept that PD-L1 negative status is associated with low mutation burden. EGFR-mutant tumors have a lower mutation burden than EGFR wild-type tumors, while KRAS mutants are associated with smoking, increased somatic mutation and neoantigens [16]. A recent study further demonstrated that mutant KRAS induced PD-L1 expression through p-ERK signaling in lung adenocarcinoma [17]. Therefore, the high prevalence of EGFR mutations and low prevalence of KRAS mutations might contribute to the lower PD-L1 expression in our patient cohort compared to the KEYNOTE studies.

The prognostic value of PD-L1 is still controversial. Some studies reported that high PD-L1 expression was significantly associated with poor prognosis in several cancer types including lung cancer [18,19,20]. Others suggested that there is no significant correlation between PD-L1 expression and survival or prognosis [12, 21, 22]. Multiple factors such as variation in the antibodies used, the choice of cutoffs, ethnicity and histological subtypes can all contribute to the discrepant results. A meta-analysis of 7,319 patients from 29 studies covering 12 carcinoma types showed that PD-L1 expression was associated with unfavorable prognosis (HR, 1.81; 95% CI, 1.33–2.46) [23]. Similar findings were reported in two other meta-analyses focusing on lung cancer [24, 25]. In their sub-group analyses by ethnicity, the HRs were 1.51 (95% CI: 1.24–1.7954) for Asian and 1.35 (95% CI: 1.08–1.63) for non-Asian patients in Zhou’s study, and 1.83 (95% CI: 1.41–2.38) for Asian and 1.54 (95% CI: 0.99–2.39) for non-Asian patients in Wang’s study. A recent meta-analysis enrolling 11,444 non-small cell lung cancer patients from 47 studies [26] concluded that although PD-L1 expression was associated with unfavorable prognosis in pooled populations (HR = 1.26, 95% CI: 1.05–1.52), PD-L1 is an indicator of poor prognosis in Asian populations (HR = 1.64, 95% CI: 1.36–1.96, P < 0.001), but not in non-Asian populations (HR = 0.85, 95% CI: 0.70–1.02, P = 0.07). Our study suggested that PD-L1 expression was associated with poor OS in patients with adenocarcinoma but not in non-small cell lung carcinoma overall. In concordance with our findings, a Korean study found high PD-L1 expression to be associated with poorer prognosis, and the association was driven mainly by the patients with adenocarcinoma [13].

One major limitation of our study is that the analysis was done on tissue arrays. Significant intra-tumor heterogeneity of PD-L1 expression has been reported in lung cancer [27]. Using tissue arrays may under-represent tumor heterogeneity. We took the highest PD-L1 score of the triplicate tumor cores, as reported previously [28]. In order to eliminate the possible effect of sample selection bias, we ran a separate set of clinicopathologic correlation and survival analyses using the average 22C3 score from the triplicates. Lower PD-L1 positive rates were observed at both the 1% and 50% cutoffs when average scores were used (Supplementary Table S6). However, the clinicopathological correlation and prognostic value remained similar between the maximal and the average PD-L1 scores. PD-L1 positivity by average score was associated with male sex, smoking, specific histological subtypes (squamous cell carcinoma, large cell carcinoma, lymphoepithelioma-like carcinoma and sarcomatoid carcinoma), EGFR wild-type status and KRAS mutation (Supplementary Table S6). High (≥50%, HR = 1.721, 95% CI: 1.009–2.938, P = 0.046) and intermediate (1–49%, HR = 1.565, 95% CI: 1.055–2.321, P = 0.026) PD-L1 expression were associated with unfavorable OS in patients with adenocarcinoma. High PD-L1 expression also remained an independent prognostic factor for poor OS in adenocarcinoma (adjusted HR = 1.857, 95% CI: 1.05–3.284, P = 0.033) by multivariable analysis (Supplementary Table S3).

Several PD-L1 immunohistochemical assays using different antibody clones, testing platforms and scoring systems have been developed and linked to their specific therapeutic anti-PD-1/PD-L1 agents. This complicates the establishment of PD-L1 assays in pathology laboratories, since not all of these resources are readily available in one diagnostic laboratory. Efforts for harmonization have been made to evaluate the analytical equivalence of these commercially available PD-L1 immunohistochemical assays [29,30,31,32,33]. However, some previous studies were limited by small sample size and some were conducted in collaboration with pharmaceutical companies. The current project is the largest cohort study to date, with 713 surgically resected non-small cell lung carcinomas, that compared four commercially available PD-L1 immunohistochemical assays, namely, 22C3, 28-8, SP142 and SP263, using tissue arrays. In agreement with the findings from the Blueprint and NCCN studies, we demonstrated that the analytical performances of 22C3, 28-8 and SP263 were highly concordant and produced similar results in tumor cell staining. SP142 stained fewer tumor cells compared to the other three assays, resulting in lower positive percentage agreements with the other assays. Although our results apparently suggested that the three assays might be used interchangeably, it should be noted that 2–3% of the non-small cell lung carcinoma patients would be classified differently using a 50% cutoff (i.e., the threshold for first-line patient selection).

Since our study is a retrospective cohort study, the patients were not treated with PD-1/PD-L1 inhibitors. Therefore, the lack of treatment response data hindered us from evaluating the predictive power of PD-L1 immunohistochemical tests. Further studies will be needed to address the predictive value of the assays, especially in those cases with discordant results. The concordant rate among the four assays decreased further when a tumor cell score of ≥1% was used as the cutoff. This is consistent with the finding of a recent study comparing the performance of SP263 and that of 22C3 [32], and other studies also reported a lower inter-rater agreement using a 1% cutoff [29, 33]. Hence, binary classification using a low cutoff value (i.e., ≥1%) poses a great challenge to pathological assessment.

In conclusion, this large cohort study demonstrated distinct prevalence and clinicopathological correlation of PD-L1 expression in Asian patients with non-small cell lung carcinomas. Three commercially available PD-L1 immunohistochemical assays, PD-L1 IHC 22C3 PharmDx, 28-8 PharmDx and VENTANA PD-L1 SP263 showed high agreement with each other in analytical performance. Further studies comparing the predictive value of the assays will be needed to address the interchangeability of these assays for clinical use.