Immunohistochemical detection of PD-L1 among diverse human neoplasms in a reference laboratory: observations based upon 62,896 cases

Targeting of the PD1/PD-L1 immune checkpoint pathway has rapidly gained acceptance as a therapeutic strategy for a growing number of malignancies. Testing for PD-L1 expression in tumor cells and immune cells has been used as a companion or complementary test for drugs targeting the PD1/PD-L1 pathway. We evaluated the results of PD-L1 testing in a large reference laboratory cohort. Using Food and Drug Administration-approved methods and the interpretive instructions for each individual test, 62,896 cases were evaluated for PD-L1 using antibody clone 22C3, 28-8, SP142, or SP263. Case data analyzed included test results and information on tumor location and clinical history. No clinical outcome information was available, and no attempt was made to correlate PD-L1 results with any other tests performed. The following numbers of cases were evaluated: 22C3 with tumor proportion score [n = 52,585], 22C3 with combined positive score [n = 2631], 28-8 [n = 4191], SP142 [n = 850], and SP263 [n = 70]. In 22C3/tumor proportion score cases, the general results were as follows: negative 33.1% (n = 17,405), (low) expression 33.9% (n = 17,822), and high expression 29.5% (n = 15,486). In cases identified as metastatic, the results were as follows: negative 35.9% (n = 1411), (low) expression 30.8% (n = 1211), and high expression 30.7% (n = 1208). We found broad ranges of expression across tumor types: positivity increased as adenocarcinomas were reported as poorly differentiated, whereas squamous cell carcinomas showed more positivity when described as well-differentiated. Many individual tumor types were evaluated and, in general, showed high levels of positive expression. Practical challenges and observations in PD-L1 staining and interpretation are also discussed.


Introduction
Programmed death 1 (PD1) is a cell surface receptor expressed on cytotoxic T cells and pro-B cells that binds its cognate ligands, PD-L1 and PD-L2, which are expressed on macrophages, epithelial cells, and other normal cells. Under normal physiologic conditions, the PD1/PD-L1 interaction produces specific conformational changes that protect normal cells from immune recognition and inhibit their destruction by cytotoxic T cells, which would otherwise lead to a state of autoimmunity. As a result of this inhibition, reactive T cells become exhausted through signaling pathways that lead to cessation of division and proliferation and to programmed cell death, or apoptosis [1][2][3].
Certain neoplasms have developed mechanisms of evading this immune surveillance by upregulating PD-L1 on the surface of neoplastic cells, so that tumor-cell PD-L1 binds the PD1 receptor on activated T cells, ultimately rendering them inactive and subject to clonal exhaustion. In this way, the neoplastic cells escape this so-called "immune checkpoint" and continue to proliferate unabated. In recent years, cancer immunotherapy has focused on developing a new generation of agents, known as immune checkpoint inhibitors, that specifically block this interplay between tumor cells and the immune system. The PD1/PD-L1 axis is only one such target [4,5]; other targets of interest include cytotoxic T lymphocyte-associated protein 4, lymphocyte-activation gene 3, and killer-cell immunoglobulin-like receptor [4][5][6][7].
There are currently five approved therapeutic agents targeting the PD1/PD-L1 pathway: two of them (nivolumab and pembrolizumab) are human or humanized IgG monoclonal antibodies directed at the PD-1 receptor, whereas the other three (atezolizumab, durvalumab, and avelumab) are human or humanized IgG monoclonal antibodies directed at the PD-L1 ligand [Table 1]. Each of these drugs binds a different epitope on its target and therefore has a distinct immunogenic profile and, by extension, its own dynamic range. Large-scale clinical trials have shown statistically significant response rates and improvements in overall survival in a variety of solid tumors, including non-small cell lung carcinomas, gastrointestinal carcinomas, head and neck squamous cell carcinomas, renal cell carcinomas, urothelial carcinomas, cervical carcinomas, and breast carcinomas, as well as in lymphomas and melanoma. On this basis, all have been approved as second-line treatment for patients whose tumors have stopped responding to conventional chemotherapy. In addition, pembrolizumab has recently been approved as first-line therapy for advanced/metastatic non-small cell lung carcinomas in which ≥ 50% of tumor cells express PD-L1, because it was associated with significantly longer progression-free and overall survival, with fewer adverse events, than platinum-based chemotherapy [8][9][10][11][12].
The advent of personalized healthcare, that is, developing targeted therapeutics for specific patients or patient subgroups by identifying which patients are most likely to experience a favorable benefit-risk outcome with a selected therapy, has necessitated an array of in vitro laboratory tests designed to measure predictive biomarker levels, with a view to tailoring individual treatment protocols. These diagnostic assays fall into one of two categories, companion diagnostics and complementary diagnostics, depending on whether they are required for drug eligibility [13,14]. Companion diagnostic tests provide information that is essential for the safe and effective use of a corresponding drug; they are typically linked to a specific drug within its approved label and determine patient eligibility for treatment with that drug. Complementary diagnostic tests may assist in therapeutic decision-making by indicating which patients are most likely to benefit from a therapy, but they do not restrict access to the co-developed therapy based on the test result, because therapeutic benefit with that drug has been demonstrated in all patients, regardless of biomarker expression status. The first companion diagnostic test to receive Food and Drug Administration (FDA) approval was the HER2 in-situ hybridization assay for trastuzumab in 1998. Although the term "complementary diagnostic" had been in use since the 1990s, the PD-L1 immunohistochemical assay for use with nivolumab was the first complementary diagnostic test to meet FDA regulatory requirements [13].
Both categories of tests can identify enhanced benefit in subgroups of patients depending on the degree of biomarker expression at varying cutoffs. Matching PD-L1 biomarker assays have been developed for each of the five immune checkpoint inhibitors; each was developed by a different company, runs on a different analytic platform, and required its own validation studies, with some distinctive scoring methods [15,16] [Table 2]. This study is intended to be largely observational: we evaluate the ordering and expression patterns of various PD-L1 antibodies using their individual FDA-approved methodologies. In addition, we address individual tumor- and sample-type expression results. Finally, we examine some common pitfalls and challenges in interpreting PD-L1 immunohistochemical staining.

Materials and methods
Materials were sent for consultation to Neogenomics Laboratories from multiple locations, and testing was performed at our laboratory in Aliso Viejo, California. In each case, testing was performed either in response to a specific request for PD-L1 testing or as part of a comprehensive evaluation for diagnostic or prognostic/theranostic markers in a tumor. As is typical in reference laboratory testing, submitted clinical history was minimal in most cases and was limited to tumor site (in most cases), with some indication of general tumor type (often) or a specific diagnosis given either as text (occasionally) or as an International Classification of Diseases code (occasionally). No clinical follow-up is available on individual results. Further, because of the limited scope of this research, no attempt was made to correlate PD-L1 results with any other tests performed. All research was performed in accordance with local and national standards for ethical research.
Data searches were performed using a natural language search of the submitted site and clinical history. This approach has limits: for example, a submitted site of "lung" with a history of "cough" does not allow thorough categorization. As a result, not all cases from the larger data set may be represented, because of ambiguous or missing submitted information. In many cases, a primary tumor could not be distinguished from a metastasis (e.g., at lung, brain, or liver sites). When possible, searches were performed with parameters that included or excluded data so as to make the results relatively unambiguous. However, rare cases with unusual presentations (e.g., lung carcinoma metastatic to thyroid) may have been included in some search sets. For the lung, two different search parameters were used, both to assess whether internal results were relatively consistent and to obtain fairly "pure" results for lung cases.
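The include/exclude logic described above can be sketched as simple keyword rules applied to the combined site and history text. This is a minimal illustration only: the category names, the regular-expression terms, and the `categorize_case` function are hypothetical, and the laboratory's actual search parameters are not given in the text.

```python
import re

# Hypothetical keyword rules (not the laboratory's published parameters).
# A case joins a category when an include term matches and no exclude term does.
RULES = {
    "lung_primary": {
        "include": [r"\blung\b", r"\bpulmonary\b"],
        "exclude": [r"\bmetasta", r"\bpleural fluid\b", r"\bbrain\b", r"\bliver\b"],
    },
    "bone": {
        "include": [r"\bbone\b"],
        "exclude": [],
    },
}


def categorize_case(site: str, history: str) -> list[str]:
    """Return every category whose include terms match and whose exclude terms do not."""
    text = f"{site} {history}".lower()
    matched = []
    for category, rule in RULES.items():
        has_include = any(re.search(p, text) for p in rule["include"])
        has_exclude = any(re.search(p, text) for p in rule["exclude"])
        if has_include and not has_exclude:
            matched.append(category)
    return matched


# "lung" + "cough" is assigned on the site alone; the rules cannot tell a
# primary tumor from a metastasis, mirroring the limitation described above.
print(categorize_case("lung", "cough"))
# An exclude term ("metastatic") removes the case from the lung set.
print(categorize_case("liver", "lung adenocarcinoma, metastatic"))
```

Exclusion terms trade completeness for purity: they drop some true lung primaries in exchange for fewer misclassified metastases.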
Cases considered quantity not sufficient lacked appreciable tumor on the PD-L1 stain. Although FDA guidelines specify that stains should not be scored when fewer than 100 tumor cells are present, in practice scoring was attempted if appreciable aggregates or clusters of tumor were present on the PD-L1 stain. Other causes for rejection included diffuse necrosis, diffuse granular staining without specific membrane staining, and tissue that was unreadable because of histologic limitations (wrinkles, folds, tissue fall-off, etc.).

Results
We evaluated results for a total of 62,896 cases (Fig. 1), received from February 2017 through May 2018. Cases lacking any identification of gender or age, and cases submitted as part of clinical trials, were not evaluated further (n = 2577). Among all evaluated cases, 3.7% (n = 2226) were considered quantity not sufficient for analysis, and no score was generated; no additional analysis was performed on these cases. The male-to-female ratio of all tested cases was 52:48. The average age was 68.6 years (range < 1-105 years) [Table 4] [Supplementary Materials 1]. As part of routine laboratory quality assurance, monthly results for PD-L1 22C3 tumor proportion score (no expression, expressed, highly expressed) and combined positive score (no expression, expression) were compared. Over the 7 months analyzed, tumor proportion scores showed minimal month-to-month variation (percent positive range 61.9-66.2%), and combined positive scores showed slightly more variation (percent positive range 77.9-86.1%) [Supplementary Materials 2].

22C3: combined positive score
At the time of this study, immunohistochemistry for PD-L1 using 22C3 with the combined positive score was indicated for the evaluation of gastric and gastroesophageal adenocarcinoma. In addition to this indication, a variety of other cases were submitted for combined positive score evaluation, irrespective of testing/therapeutic guidelines.
A total of 2623 cases were evaluated using 22C3/combined positive score; the results are summarized in Table 5. The age range was 51-78 years (average 65.5), with a male-to-female ratio of 67:33. Quantity not sufficient cases accounted for 3.7% (n = 97). Esophageal and gastric cancers showed comparable proportions of cases with expression (85.8% vs. 83.6%).

Fig. 1 Combined results for positive expression, negative, and quantity not sufficient for SP142, 28-8, and 22C3 tumor proportion score (TPS), 22C3 combined positive score (CPS), and SP263

22C3: tumor proportion score
A total of 52,585 cases were evaluated. The age range was < 1-105 years (average 68.8), with a male-to-female ratio of 51:49. Quantity not sufficient cases accounted for 3.6% (n = 1872); the quantity not sufficient rate varied across subset analyses (< 1-11.1%), although the groups with the highest rates (Hodgkin lymphoma, thymoma) included only small numbers of cases. In cases scored with 22C3/tumor proportion score, the general results were as follows: negative 33.1% (n = 17,405), (low) expression 33.9% (n = 17,822), and high expression 29.5% (n = 15,486) [Table 6]. The tumor categories with the highest proportions of cases with no expression (> 45% of cases) were neuroendocrine, endometrial, mucinous adenocarcinoma, well-differentiated adenocarcinoma, thyroid, bladder, and renal. The categories with the highest proportions of highly expressed cases (> 40% of cases) were pericardial fluid, mycosis fungoides, poorly differentiated adenocarcinoma, and adenosquamous carcinoma. Staining intensity was recorded for 22C3/tumor proportion score but not analyzed further (Fig. 2).
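The three 22C3 reporting categories used above follow from the tumor proportion score, generally defined as the percentage of viable tumor cells showing partial or complete membrane staining, with cutoffs of < 1% (no expression), 1-49% (expression), and ≥ 50% (high expression). The sketch below assumes that definition and applies the strict 100-cell minimum; it does not model the practical exception described in Materials and methods, and in routine practice the TPS is estimated visually rather than counted cell by cell. The function names are hypothetical.

```python
from typing import Optional

MIN_TUMOR_CELLS = 100  # assumed minimum number of viable tumor cells for scoring


def tumor_proportion_score(stained_tumor_cells: int, viable_tumor_cells: int) -> Optional[float]:
    """TPS (%) = membrane-stained viable tumor cells / all viable tumor cells x 100.

    Returns None when fewer than MIN_TUMOR_CELLS are present (quantity not sufficient).
    """
    if stained_tumor_cells < 0 or stained_tumor_cells > viable_tumor_cells:
        raise ValueError("stained cells must be between 0 and the viable tumor cell count")
    if viable_tumor_cells < MIN_TUMOR_CELLS:
        return None
    return 100.0 * stained_tumor_cells / viable_tumor_cells


def tps_category(tps: Optional[float]) -> str:
    """Map a TPS to the three reporting categories used for 22C3."""
    if tps is None:
        return "quantity not sufficient"
    if tps < 1:
        return "no expression"
    if tps < 50:
        return "expression"
    return "high expression"


for stained, total in [(0, 500), (50, 200), (120, 200), (10, 80)]:
    score = tumor_proportion_score(stained, total)
    print(stained, total, score, tps_category(score))
```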

Comparison of 22C3/tumor proportion score in well-, moderately, and poorly differentiated adenocarcinoma, squamous cell carcinoma, and adenosquamous carcinoma
Although this search captured only a subset of cases (n = 1928), we compared the overall results of adenocarcinomas identified as well- vs. moderately and poorly differentiated. The difference between poorly and well-differentiated cases was statistically significant (p = 0.02), whereas moderately differentiated cases did not differ significantly from either well- or poorly differentiated cases [Table 6a]. Furthermore, cases identified as mucinous adenocarcinoma were far more similar to well-differentiated than to poorly differentiated adenocarcinomas (p = 0.91 vs. well-differentiated; p = 0.02 vs. poorly differentiated).
Similarly, we compared cases of squamous cell carcinoma identified as well- vs. moderately and poorly differentiated [Table 6b]. In contrast to adenocarcinoma, which showed the greatest expression in poorly differentiated cases (75.5%), squamous cell carcinoma showed the greatest expression in well-differentiated cases (81.7%). Moderately and poorly differentiated cases were each statistically significantly different from well-differentiated cases (p = 0.04 and p = 0.03, respectively).
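The text reports p-values for the between-grade comparisons but does not name the statistical test. One conventional choice, assumed here, is a Pearson chi-square test on a 2 × 2 table of expressed vs. not expressed cases for two grades. The counts below are hypothetical and are not data from this study; with small cell counts, Fisher's exact test would be more appropriate.

```python
import math


def chi_square_2x2(a: int, b: int, c: int, d: int) -> tuple[float, float]:
    """Pearson chi-square test (no continuity correction) for the 2x2 table

                 expressed   not expressed
        group 1      a             b
        group 2      c             d

    Returns (statistic, p-value with 1 degree of freedom).
    """
    n = a + b + c + d
    row1, row2, col1, col2 = a + b, c + d, a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column total must be non-zero")
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom, P(chi2 > stat) = P(|Z| > sqrt(stat)) = erfc(sqrt(stat / 2)).
    p = math.erfc(math.sqrt(stat / 2))
    return stat, p


# Hypothetical counts for two grades (not values from Table 6a or 6b).
stat, p = chi_square_2x2(10, 20, 20, 10)
print(f"chi2 = {stat:.3f}, p = {p:.4f}")
```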

Evaluation of 22C3/tumor proportion score expression in metastases
Expression patterns were evaluated in cases identified as metastatic (n = 3933). The following groups were also considered metastatic: pleural fluid (n = 2105), pericardial fluid (n = 44), bone (n = 2259), adrenal (n = 549), and cases metastatic to brain (n = 47). Although each of these could in theory represent primary disease (e.g., primary adrenal tumors), metastatic disease is far more likely. These groups may also overlap somewhat, depending on the limitations of the submitted tumor history and site information. In cases identified as metastatic, the results were as follows: no expression 35.9% (n = 1411), expressed 30.8% (n = 1211), and highly expressed 30.7% (n = 1208) [Table 6]. Surprisingly, pericardial fluid had an exceedingly high rate of positivity (100% highly expressed) compared with pleural fluid, although only a relatively small number of pericardial cases were analyzed (n = 44) (Fig. 3).
Compared with all cases identified as metastatic, bone cases had a slightly higher quantity not sufficient rate and only a mildly increased proportion of cases with no expression. The overall results did not differ significantly, suggesting that decalcification of bone specimens likely has little or no effect on PD-L1 results with 22C3.

Comparison of 22C3/tumor proportion score and combined positive score in esophageal/gastric cases
In many cases, tumors or adenocarcinomas of gastric, gastroesophageal, or gastroesophageal junction sites were submitted for tumor proportion score rather than combined positive score evaluation. This allows these cases to be compared between the combined positive score and the tumor proportion score, and indicates how much immune cell scoring contributes in 22C3-stained cases. For esophageal cancers, expression was reported in 49.2% of cases with the tumor proportion score vs. 85.8% with the combined positive score (an increase of 36.6 percentage points). For gastric cancers, the corresponding figures were 50.3% vs. 83.6% (an increase of 33.3 percentage points).
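The gap between the two scores follows from their definitions. As generally defined for the 22C3 assay, the combined positive score counts PD-L1-stained tumor cells, lymphocytes, and macrophages over the number of viable tumor cells (× 100, capped at 100), whereas the tumor proportion score counts stained tumor cells only over the same denominator. The sketch below assumes those definitions; the function names and the cell counts are hypothetical.

```python
def combined_positive_score(stained_tumor_cells: int, stained_lymphocytes: int,
                            stained_macrophages: int, viable_tumor_cells: int) -> float:
    """CPS: stained tumor cells + lymphocytes + macrophages over viable tumor cells, x 100, capped at 100."""
    if viable_tumor_cells <= 0:
        raise ValueError("viable_tumor_cells must be positive")
    stained = stained_tumor_cells + stained_lymphocytes + stained_macrophages
    return min(100.0, 100.0 * stained / viable_tumor_cells)


def tumor_proportion_score(stained_tumor_cells: int, viable_tumor_cells: int) -> float:
    """TPS: same denominator, but only stained tumor cells in the numerator."""
    if viable_tumor_cells <= 0:
        raise ValueError("viable_tumor_cells must be positive")
    return 100.0 * stained_tumor_cells / viable_tumor_cells


# Hypothetical field: 200 viable tumor cells, 1 stained tumor cell,
# 2 stained lymphocytes, and 1 stained macrophage.
tps = tumor_proportion_score(1, 200)           # 0.5: below a 1% TPS cutoff
cps = combined_positive_score(1, 2, 1, 200)    # 2.0: at or above a CPS cutoff of 1
print(tps, cps)

# Percentage-point increases in the share of positive cases reported above.
print(round(85.8 - 49.2, 1), round(83.6 - 50.3, 1))  # esophageal, gastric
```

Because immune cells enter only the CPS numerator, a case can be negative by TPS yet positive by CPS, which is consistent with the direction of the differences observed here.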

28-8
The 28-8 antibody was evaluated in 4191 cases. The age range was 2-103 years, with an average age of 68 years. The male-to-female ratio was 53:47. Quantity not sufficient cases accounted for 4.7% (n = 197). Negative results were seen in 45.5% of cases (n = 1905) and positive results in 49.8% (n = 2089) [Table 7]. High levels of expression (positive in > 40% of cases) were identified in all subgroups analyzed: non-small cell lung cancer, urothelial carcinoma, melanoma, mesothelioma, adenocarcinoma, squamous cell carcinoma, and metastatic disease.
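Because each antibody's results are reported as counts and percentages of the same total, they can be cross-checked arithmetically. The helper below (a hypothetical name) verifies that the categories account for every case and recomputes each percentage to one decimal place; it is applied here to the 28-8 counts reported above, and the same check can be run on the other antibodies.

```python
def category_percentages(counts: dict[str, int], total: int) -> dict[str, float]:
    """Check that the categories sum to the total, then return rounded percentages."""
    observed = sum(counts.values())
    if observed != total:
        raise ValueError(f"categories sum to {observed}, expected {total}")
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


# Counts reported for the 28-8 antibody.
result = category_percentages(
    {"negative": 1905, "positive": 2089, "quantity not sufficient": 197}, total=4191
)
print(result)  # {'negative': 45.5, 'positive': 49.8, 'quantity not sufficient': 4.7}
```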

SP142
The SP142 antibody was evaluated in 850 cases. The age range was 2-96 years, with an average age of 69 years. The male-to-female ratio was 48:52. Quantity not sufficient cases accounted for 6.2% (n = 53). Negative results were seen in 68.4% of cases and positive results in 25.4% [Table 8].

SP263
The SP263 antibody was evaluated in 70 cases. The age range was 2-94 years, with an average age of 64 years. The male-to-female ratio was 25:45. Quantity not sufficient cases accounted for 10% (n = 7). No expression was seen in 62.9% of cases (n = 44) and positive results in 27.1% (n = 19) [Table 9]. Cases of urothelial carcinoma (n = 21) were …

Comparison of 22C3/tumor proportion score, 28-8, and SP142 in tumor types
As highlighted in other publications, SP142 had the lowest levels of positivity [Table 10]. This was true in all tumor types examined: lung, breast, esophagus, adenocarcinoma, squamous cell carcinoma, and metastatic disease.

Comparison with Keynote and Checkmate studies
We compared our results with those published previously in select Keynote and Checkmate studies [8,12,17-21]. These results are summarized in Table 11. For Keynote 59 (evaluation of gastroesophageal carcinomas using the combined positive score), we found significantly higher expression than was reported.

(Table footnotes: NSCLC, non-small cell lung cancer. a In contrast to other antibodies, NSCLC was indicated as a distinct parameter in the database for SP142.)

Practical observations in PD-L1 interpretation
Interpretation should be performed according to the specific instructions for each antibody and indication. Staining intensity can vary significantly within a single case (Fig. 4). Positive staining in tumor cells should be membranous but does not have to encompass the entire membrane (Fig. 5). Staining of only the apical surfaces of glands is not considered a positive result. Occasionally, macrophages within gland lumens are strongly positive while tumor cells show no staining (Fig. 6); this is generally not considered a positive result.
Cytologic specimens can be particularly challenging to interpret, especially when positive histiocytes show cytologic atypia and are comparable in size to tumor cells or clusters. This is especially true in cytologic specimens with large numbers of histiocytes. As in all cytology specimens, interpretation and scoring can be challenging when tumor cells are rare. In the less common situation in which tumor cells are positive and immune cells are negative, tumor cells can be quite easy to detect on the PD-L1 stain.
Occasionally, tumor cells are positive at an interface between tumor and stroma or histiocytes/lymphocytes, with no significant staining in the more central portions of the tumor (Fig. 7). This edge effect likely results from direct interaction with adjacent immune cells, which upregulate PD-L1 expression in the tumor cells. It should be distinguished from the edge artifact frequently seen with many immunohistochemical stains.

Normal staining and artifacts
As mentioned previously, expression of PD-L1 can be seen in many histiocytes/macrophages in various body sites. Other cells that are usually or always positive for PD-L1 staining include perineurial cells, nerve fibers, plasma cells, follicular dendritic cells, mast cells, and megakaryocytes (Fig. 8).
Bacteria and acellular debris may show significant positivity and are ignored for stain interpretation. Furthermore, because platelets express PD-L1, their aggregation in debris or tissue may impart positivity. Although incomplete membrane staining is considered positive, granular cytoplasmic staining in tumor cells is not considered positive in any of the scoring systems (Fig. 9). Rarely, nuclear staining may be identified, but it is not considered positive in any scoring system (Fig. 9).
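The tumor-cell staining rules stated in this and the preceding section can be summarized as a small decision function: only membranous staining, complete or partial, makes a tumor cell positive, while apical-only, granular cytoplasmic, and nuclear patterns do not, and non-tumor objects never add to the tumor-cell count. This is a hypothetical sketch (the enum, its labels, and `counts_toward_tps` are illustrative names); it covers tumor-cell scoring only, since immune cells do contribute under the combined positive score and immune-cell scoring systems, and recognizing pigment or debris still depends on comparison with the hematoxylin and eosin section.

```python
from enum import Enum, auto


class StainPattern(Enum):
    """Hypothetical labels for the staining patterns discussed in the text."""
    COMPLETE_MEMBRANE = auto()
    PARTIAL_MEMBRANE = auto()      # incomplete membrane staining still counts
    APICAL_ONLY = auto()           # apical (luminal) surfaces of glands only
    GRANULAR_CYTOPLASMIC = auto()
    NUCLEAR = auto()
    NONE = auto()


# Only membranous staining, complete or partial, makes a tumor cell positive.
POSITIVE_TUMOR_PATTERNS = frozenset({StainPattern.COMPLETE_MEMBRANE, StainPattern.PARTIAL_MEMBRANE})


def counts_toward_tps(cell_type: str, pattern: StainPattern) -> bool:
    """Return True if an object adds to the tumor-cell numerator of a TPS.

    Non-tumor objects (macrophages, bacteria, debris, platelets, pigment)
    never count here, however strongly they stain.
    """
    if cell_type != "tumor":
        return False
    return pattern in POSITIVE_TUMOR_PATTERNS


for cell_type, pattern in [
    ("tumor", StainPattern.PARTIAL_MEMBRANE),
    ("tumor", StainPattern.APICAL_ONLY),
    ("tumor", StainPattern.GRANULAR_CYTOPLASMIC),
    ("macrophage", StainPattern.COMPLETE_MEMBRANE),
]:
    print(cell_type, pattern.name, counts_toward_tps(cell_type, pattern))
```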
Rarely, pigment can complicate stain interpretation. Melanin pigment in primary or metastatic melanomas, anthracotic pigment (typically in the lung and hilar lymph nodes), tattoo pigment (lymph nodes), or extensive hemosiderin deposition needs to be carefully excluded from PD-L1 interpretation (Fig. 10). As in all cases, careful assessment of immunohistochemical stains and of positive and negative controls, together with comparison with hematoxylin and eosin-stained sections, can minimize this difficulty.

Discussion
PD-L1 expression in tumors and, in some cases, immune cells, evaluated by immunohistochemical staining, is currently a widely used test in conjunction with anti-PD1 and anti-PD-L1 therapies. Good-to-excellent reliability of scoring PD-L1 expression on tumor cells has been demonstrated, although immune cell scoring is less reliable [15,22]. We present the results of a large number of tests across a broad range of tumor types using a variety of available stains. We showed significantly higher expression in gastroesophageal carcinomas than in the original Keynote 59 study [Table 11]. Results from initial studies could be affected by case selection biased toward advanced-stage disease, higher pathologic grade, or cases with a marked immune reaction. In addition, we found significant differences from the reported results of Keynote 10 (previously treated non-small cell lung cancer), Checkmate 141 (recurrent or metastatic squamous cell carcinoma), and Checkmate 238 (adjuvant therapy for melanoma). The comparisons between the current results and those of these studies are not exact. For example, we compared Checkmate 141 with "all" squamous cell carcinoma results and Checkmate 238 with "all" melanoma results, and the comparisons are limited to stated stain results, not keyed to outcomes. This limitation is highlighted by the comparison with Checkmate 57 (metastatic lung cancer): compared with the current results for lung cancers, the difference was not significant (p = 0.4), but compared with the results for all metastases, it was significant (p = 0.01). We found no significant differences from the results published in Checkmate 275 (urothelial carcinoma), Keynote 21 (lung cancer), or Checkmate 67 (melanoma). Our findings parallel those of Rimm et al [23], who showed that SP142 had considerably lower reactivity than other antibodies [Table 10].
Our data suggest fewer negatives and higher overall expression in poorly differentiated adenocarcinomas than in those identified as well-differentiated (within the largest data group, 22C3/tumor proportion score: poorly differentiated 75.5% expressed, moderately differentiated 60.6%, and well-differentiated 45.7%). However, statistical comparisons show significance only between poorly and well-differentiated adenocarcinomas (p = 0.02) [Table 6a]; comparisons of moderately vs. poorly and moderately vs. well-differentiated adenocarcinomas were not significant (p = 0.57 and 0.09, respectively). This would support the general hypothesis that more neoantigens generate more potent expression of immune checkpoint markers [24].
In parallel, mucinous adenocarcinomas were also statistically different from poorly differentiated adenocarcinomas (p = 0.02), but not from moderately (p = 0.07) or well-differentiated (p = 0.91) adenocarcinomas. Despite a tendency to be histologically poorly differentiated, mucinous adenocarcinomas appear to have reactivity similar to that of well-differentiated adenocarcinomas. A possible explanation is that tumor antigens are masked by mucus and thus not exposed to the immune response, or that there are generally fewer tumor-infiltrating lymphocytes [25].
In contrast to adenocarcinomas, in the 22C3/tumor proportion score group, squamous cell carcinomas show increasing positivity with greater differentiation (poorly differentiated 73.0%, moderately differentiated 75.2%, and well-differentiated 81.7%). There are statistically significant differences between well- and moderately differentiated squamous cell carcinomas (p = 0.04) and between well- and poorly differentiated ones (p = 0.03) [Table 6b]. Adenosquamous carcinoma is significantly different from poorly differentiated squamous cell carcinoma (p = 0.04) and borders on significance …
There is no obvious explanation for the difference between tumor expression in pericardial fluid (100% strong expression) and in pleural fluid (30% no expression, 33% expression, and 35% strong expression). However, pericardial effusions are far rarer than pleural effusions, and involvement of the pericardial space may elicit a more robust or vigilant immune response, whereas the pleural fluid may be somewhat more permissive to the immunologic challenges of tumor involvement.
The data presented reflect the ordering patterns of pathologists and the requests of oncologists. These may include "off-label" uses, such as requesting combined positive scores on samples that have no current indication and no supporting research for their use. Conversely, other orders do not request the appropriate antibody or scoring system for the intended drug. Although these patterns may represent a practical response to the "information overload" associated with the myriad available antibodies and scoring systems, the burden of education in this area appears to lie most heavily on the drug manufacturers and the producers of the antibodies.
Our data show that large numbers of tests are being performed to assess PD-L1 expression in tumors. Performing this testing in a coordinated manner, using the best-performing antibodies rather than those approved for companion or complementary diagnostic testing, may provide a better and more consistent understanding of PD-L1 expression in tumor and immune cells. These data raise many interesting questions about the expression patterns detected by anti-PD-L1 antibodies in tumors and immune cells. Future research will likely identify subsets of results that better predict the most efficacious responses to anti-PD1/PD-L1 therapies.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

Fig. 9 (a) Faint cytoplasmic staining for PD-L1 in breast carcinoma. This is not considered true positive staining (22C3). (b) Nuclear staining for PD-L1. This is not considered positive staining in any scoring system (22C3).