Introduction

AT-rich interacting domain containing protein 1A (ARID1A; also known as BAF250a and SMARCF1) is a key non-catalytic component of the SWItch/sucrose non-fermentable (SWI/SNF) chromatin remodeling complex [1]. Of all human tumors, ~20% harbor mutations involving members of the SWI/SNF complex, including ARID1A, ARID1B, ARID2, PBRM1, SMARCA4, and others [2]. Mutations in ARID1A are frequent across cancer types, present in 30–50% of ovarian and endometrial carcinomas [3, 4, 5], 10% of hepatocellular [6, 7] and bladder carcinomas [8], and 5–10% of colorectal [8], gastric [9, 10], and non-small cell lung carcinomas [8, 11, 12]. The functional significance of ARID1A is context-dependent. ARID1A functions primarily as a tumor suppressor in most tumor types, including ovarian clear cell and endometrioid carcinomas [3, 4]. ARID1A loss is associated with PI3K-Akt pathway activation in ovarian clear cell carcinomas [13], resistance to trastuzumab in HER2-positive breast carcinomas [14], and impairment in enhancer-mediated gene regulation in murine colorectal tumor models [15]. In contrast, in hepatocellular carcinoma, ARID1A acts as an oncogene in tumor initiation, but as a tumor suppressor in subsequent maintenance and metastasis [16].

Data on the functional and clinical significance of ARID1A mutations in non-small cell lung carcinomas are limited. Studies suggest that disruption of different members of the SWI/SNF complex may have similar implications for lung cancer phenotype [17]. In particular, mutations in SMARCA4 are present in 6–11% of non-small cell lung carcinomas [12, 18, 19]; SMARCA4 mutations and expression loss have been implicated in the pathogenesis of a subset of aggressive thoracic sarcomas [20, 21, 22] and lung adenocarcinomas [18, 19, 23, 24, 25, 26, 27]. Given the nuanced context-dependent functions of ARID1A, it is nonetheless difficult to infer the impact of ARID1A alterations based on studies from other tumor types or SWI/SNF members. Notably, ARID1A may be included as part of multigene panels for targeted next-generation sequencing [28]. A recent study on plasma DNA from 185 patients with treatment-naive lung adenocarcinoma identified ARID1A mutations in 12% of patients, with co-occurrence of other oncogenic mutations such as KRAS and EGFR [29]. Given the preclinical data on targeting ARID1A-mutant tumors with EZH2 inhibitors [30] and emerging data on implications for immunotherapy outcomes [31, 32, 33], we anticipate a growing need to detect, interpret, and understand the impact of ARID1A alterations in diverse tumors including non-small cell lung carcinomas. This study aims to characterize the spectrum of ARID1A genetic alterations, correlate ARID1A mutations with expression by immunohistochemistry, and assess the clinicopathologic significance of ARID1A mutations and expression loss in non-small cell lung carcinomas.

Materials and methods

Case selection and clinicopathologic evaluation

The study was approved by the Institutional Review Boards at Brigham and Women’s Hospital and Dana-Farber Cancer Institute and included a consecutive series of 2440 patients whose non-small cell lung carcinomas were analyzed by targeted next-generation sequencing assays at the Center for Advanced Molecular Diagnostics, Brigham and Women’s Hospital, between January 2014 and December 2017. This study included patients diagnosed with non-small cell lung carcinomas (adenocarcinoma, squamous cell carcinoma, adenosquamous carcinoma, and pleomorphic/giant cell/spindle cell carcinoma); patients with other diagnoses, including small cell carcinoma, other neuroendocrine tumors (carcinoid tumors and large cell neuroendocrine carcinoma), combined small cell or large cell neuroendocrine with non-small cell carcinomas, and adenoid cystic carcinoma, were excluded. Clinical features recorded in the DFCI/BWH Oncology Data Retrieval Systems (OncDRS) for the entire cohort included sex, age at molecular diagnosis, and survival/last follow-up. For the subset of patients with ARID1A-mutant tumors, age at pathologic diagnosis, smoking history, and clinical stage (based on the American Joint Committee on Cancer Staging Manual 7th edition) were recorded.

Targeted next-generation sequencing and variant interpretation

Targeted next-generation sequencing (OncoPanel) was performed as previously described [34, 35], using DNA (at least 50 ng) extracted from formalin-fixed paraffin-embedded whole-tissue sections, solution-based hybrid capture with Illumina HiSeq 2500 (San Diego, CA), and a custom set of Agilent SureSelect capture probes (Santa Clara, CA). Over the 4-year study period, three versions of the in-house sequencing panel were used, targeting 275 genes covering 757,787 bp in 1305 (50.6%) cases, 298 genes covering 831,033 bp in 739 (28.7%) cases, and 447 genes covering 1,315,708 bp in 535 (20.7%) cases (a total of 2579 cases from 2440 patients). Sample reads (with mean target coverage of at least 50×) were analyzed using a custom bioinformatics pipeline with Picard and BWA (Broad Institute, Cambridge, MA) for alignment, VisCap Cancer (Dana-Farber Cancer Institute) for copy number variants, BreaKmer for structural variants, MuTect and GATK indelocator (Broad Institute) for single-nucleotide and insertion–deletion variants, and Oncotator and Integrative Genomics Viewer (Broad Institute) for annotation.

For interpretation of ARID1A variants, we excluded synonymous variants, single-nucleotide variants with a minor variant allele frequency of >0.1% in the Exome Sequencing Project database (University of Washington, Seattle, WA), variants listed in the genome Aggregation Database (gnomAD; Broad Institute, Cambridge, MA), and suspected private germline variants based on the allele frequencies. Variants listed in the Catalogue of Somatic Mutations in Cancer (COSMIC) database (Wellcome Trust Sanger Institute, UK) were rescued. ARID1A variants were interpreted based on tumor percentage and associated copy number alterations if present. Nonsense, frameshift, conserved splice site, and structural rearrangement/truncating variants were classified as loss-of-function (LOF) mutations. In tumors with known driver oncogene (such as KRAS or EGFR) mutations, an ARID1A mutation clonality index was derived by normalizing the variant allele frequency of ARID1A relative to that of the driver mutation. Samples with <20% tumor were excluded from this analysis. We defined an ARID1A mutation with a clonality index of ≥0.6 as truncal and <0.6 as sub-clonal. Given the relative lack of robust informatics tools for assessing clonal architecture in solid tumors undergoing targeted next-generation sequencing, we determined that this cutoff point was a reasonable discriminator between truncal and sub-clonal variants, based on empiric data derived from whole-genome sequencing studies in myeloid tumors [36]. Clonality was indeterminate in tumors lacking driver mutations. Tumor mutational burden (TMB) was derived by tabulating the total number of non-synonymous missense and small insertion–deletion variants relative to the megabases (Mb) sequenced. Copy number loss was interpreted by initial pathology review based on copy number VisCap plots in log2 ratio values relative to the genome baseline. 
Genomic instability index was calculated by tabulating the total number of copy number variants relative to the Mb sequenced. Copy-neutral loss of heterozygosity of ARID1A was defined by showing either a presumed somatic variant allele frequency of ≥0.7 or a clonality index of ≥1.4, in the absence of a detectable copy change. Biallelic inactivation was defined as the presence of two LOF mutations (with no evidence of being in cis), one LOF mutation plus one copy deletion by copy number analysis, or one LOF mutation plus suspected copy-neutral loss of heterozygosity based on allele fraction of known polymorphisms. Cases were considered insufficient for this analysis when tumor content was estimated at <20% (the validated lower limit for copy number detection) or when sample quality (reflected in low mean target coverage or high sequencing noise) precluded confident assessment of copy number alterations or quantification of variant alleles. Assessment of mismatch repair (MMR) deficiency was performed on sequencing data using an in-house clinically validated algorithm based on the number of insertion–deletion mutations involving homopolymer repeats [37].
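
The interpretation rules above can be summarized in a short sketch. This is an illustrative restatement of the stated cutoffs, not the pipeline used in the study; all function and parameter names are hypothetical, and the copy-deletion and loss-of-heterozygosity calls are assumed to be supplied as already-reviewed booleans.

```python
from typing import Optional

TRUNCAL_CUTOFF = 0.6        # clonality index >= 0.6 is called truncal (from the text)
LOH_VAF_CUTOFF = 0.7        # somatic VAF >= 0.7 suggests copy-neutral LOH
LOH_CLONALITY_CUTOFF = 1.4  # clonality index >= 1.4 suggests copy-neutral LOH


def clonality_index(arid1a_vaf: float, driver_vaf: Optional[float]) -> Optional[float]:
    """ARID1A VAF normalized to the driver-mutation VAF; None when no driver exists."""
    if driver_vaf is None or driver_vaf <= 0:
        return None
    return arid1a_vaf / driver_vaf


def classify_clonality(index: Optional[float]) -> str:
    """Truncal vs. sub-clonal call; indeterminate when there is no driver mutation."""
    if index is None:
        return "indeterminate"
    return "truncal" if index >= TRUNCAL_CUTOFF else "sub-clonal"


def copy_neutral_loh(arid1a_vaf: float, index: Optional[float],
                     copy_change_detected: bool) -> bool:
    """Copy-neutral LOH: high VAF or high clonality index without a copy change."""
    if copy_change_detected:
        return False
    high_index = index is not None and index >= LOH_CLONALITY_CUTOFF
    return arid1a_vaf >= LOH_VAF_CUTOFF or high_index


def biallelic_inactivation(n_lof_not_in_cis: int, copy_deletion: bool,
                           cn_loh: bool) -> bool:
    """Two LOF mutations, or one LOF mutation plus a deletion or copy-neutral LOH."""
    if n_lof_not_in_cis >= 2:
        return True
    return n_lof_not_in_cis == 1 and (copy_deletion or cn_loh)


def per_megabase(count: int, bases_sequenced: int) -> float:
    """Events per Mb, as used for mutational burden and the genomic instability index."""
    return count / (bases_sequenced / 1_000_000)
```

For example, an ARID1A variant at a variant allele frequency of 0.30 alongside a KRAS driver at 0.40 gives a clonality index of 0.75 and would be called truncal under this scheme.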

Immunohistochemistry

Immunohistochemistry for ARID1A was performed on 4-µm-thick formalin-fixed paraffin-embedded whole-tissue sections using a rabbit anti-human ARID1A polyclonal antibody (1:500 dilution; HPA005456, Sigma) following pressure cooker antigen retrieval (0.01 M citrate buffer pH 6.0). Immunohistochemistry for ARID1A was performed on all ARID1A-mutant non-small cell lung carcinomas with available material (143 cases from 139 patients), along with 40 ARID1A-wild-type non-small cell lung carcinomas. ARID1A immunohistochemistry was reviewed by two pathologists (YPH and LMS), and a consensus score was recorded; both were blinded to the ARID1A molecular data at the time of the immunohistochemistry review. The percentage of cells with altered ARID1A expression and the pattern of loss were noted. Complete ARID1A loss was defined as the absence of perceptible ARID1A nuclear staining, while diminished ARID1A expression was defined as a barely perceptible to moderate decrease in ARID1A nuclear staining, and ARID1A immunoreactivity was considered intact when there was no perceptible change in ARID1A nuclear staining in the tumor cells as compared with background control (stromal or inflammatory) cells. Diffuse loss of expression was defined as aberrant staining in at least 90% of cells, whereas heterogeneous loss of expression was defined as aberrant staining in <90% of cells.
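
The scoring categories described above can be expressed as a small decision rule. The sketch below is only an illustration of those definitions; the function name, the three staining labels, and the representation of extent as a fraction between 0 and 1 are assumptions made for the example.

```python
DIFFUSE_CUTOFF = 0.90  # aberrant staining in at least 90% of tumor cells is "diffuse"


def classify_arid1a_ihc(staining: str, aberrant_fraction: float) -> str:
    """Combine staining intensity and extent into one expression pattern.

    staining: "intact", "diminished", or "absent" (hypothetical labels for the
        consensus read of nuclear staining relative to internal control cells).
    aberrant_fraction: fraction (0 to 1) of tumor cells with altered staining.
    """
    if staining not in {"intact", "diminished", "absent"}:
        raise ValueError(f"unknown staining label: {staining!r}")
    if not 0.0 <= aberrant_fraction <= 1.0:
        raise ValueError("aberrant_fraction must be between 0 and 1")
    if staining == "intact" or aberrant_fraction == 0.0:
        return "intact"
    extent = "diffuse" if aberrant_fraction >= DIFFUSE_CUTOFF else "heterogeneous"
    kind = "complete loss" if staining == "absent" else "diminished expression"
    return f"{extent} {kind}"
```

Under this rule, absent nuclear staining in 95% of tumor cells is scored as diffuse complete loss, while a moderate decrease in 40% of cells is scored as heterogeneous diminished expression.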

Statistical analysis

The landscape of ARID1A alterations and mutation maps were illustrated using Microsoft Excel 2016 and Illustrator for Biological Sequences version 1.0.3 [38]. Statistical analysis was performed using GraphPad InStat version 3.1 and GraphPad Prism version 5.02 (La Jolla, CA). Categorical and non-categorical data were analyzed using chi-square tests and Mann–Whitney U tests, respectively. For patients with ARID1A-mutant tumors and the remainder of the cohort, survival was defined as the interval from the time of pathologic diagnosis and the time of molecular diagnosis, respectively, to the time of death from any cause or to the time of last clinical follow-up, at which point the data were censored. For assessment of mutational hotspots, a P value was calculated at each amino acid position in comparison with the expected value based on the Poisson distribution ($P(x) = \mu^{x} e^{-\mu}/x!$). Statistical significance was defined by a P value < 0.05, with Bonferroni correction used in multiple comparisons.
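
The hotspot assessment can be sketched as an upper-tail Poisson test with a Bonferroni-corrected threshold. This is a minimal illustration under stated assumptions rather than the authors' implementation: the expected count per position is assumed to be the total number of mutations divided by the number of positions tested, the total of 200 mutations in the usage note is a hypothetical input, and all names are illustrative.

```python
import math


def poisson_upper_tail(k: int, mu: float, n_terms: int = 200) -> float:
    """P(X >= k) for X ~ Poisson(mu), summing pmf terms mu^x e^(-mu) / x!.

    Each term is computed in log space; n_terms is ample when mu is small.
    """
    if k <= 0:
        return 1.0
    if mu <= 0:
        return 0.0
    return sum(
        math.exp(-mu + x * math.log(mu) - math.lgamma(x + 1))
        for x in range(k, k + n_terms)
    )


def hotspot_test(count_at_position: int, total_mutations: int,
                 n_positions: int, alpha: float = 0.05):
    """Return (p_value, bonferroni_threshold, is_hotspot) for one position."""
    mu = total_mutations / n_positions   # assumed uniform expectation per position
    p_value = poisson_upper_tail(count_at_position, mu)
    threshold = alpha / n_positions      # Bonferroni correction across positions
    return p_value, threshold, p_value < threshold
```

With 2286 positions the corrected threshold is 0.05/2286, or about 2.2 × 10⁻⁵. Calling `hotspot_test(4, 200, 2286)` with the hypothetical total of 200 mutations flags a position mutated four times as a hotspot, whereas a position mutated twice is not flagged.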

Results

Clinicopathologic and molecular features of our study cohort

Our institutional cohort comprised 2440 patients (Supplementary Fig. 1), including 1442 (59%) females and 998 (41%) males, with a median age of 67 years (range 11–96). The histologic types included 1946 (80%) adenocarcinomas; 324 (13%) squamous cell carcinomas; 147 (6%) poorly differentiated non-small cell lung carcinomas, not otherwise specified; 12 (0.5%) pleomorphic/giant cell/spindle cell carcinomas; and 11 (0.5%) adenosquamous carcinomas. Targeted next-generation sequencing was performed on 1531 (63%) primary tumors, 848 (35%) metastases, 36 (2%) local recurrences, and 25 from unspecified sites. The cohort harbored mutations in KRAS and EGFR overall in 33% and 19% of cases, respectively, comparable with published data [39].

Mutations in ARID1A in non-small cell lung carcinomas

ARID1A mutations were detected in 184 (7.5%) non-small cell lung carcinomas, of which 138 (75%) were adenocarcinomas, 33 (18%) were squamous cell carcinomas, and 13 (7%) were other subtypes. LOF alterations were present in 127 (69%) tumors (Fig. 1a), including nonsense mutations in 68 (37%), frameshift mutations in 52 (28%), splice site mutations in 6 (3%), and structural rearrangement/truncating mutations in 5 (3%). Missense mutations were noted in 77 (42%). Multiple ARID1A mutations were present in 28 (15%) tumors (Fig. 1b), including two mutations each in 26 tumors and three mutations in 2 tumors. The mutations were scattered throughout the ARID1A gene, as expected for a tumor suppressor gene (Fig. 1c). Of all ARID1A mutations in this study, 4 (2%) each were located at Q515 (4 nonsense) and Q1519 (2 nonsense; 2 frameshift), representing recurrently affected positions in this gene (P = 2.3 × 10⁻⁶ < Bonferroni-corrected threshold 0.05/2286 = 2.2 × 10⁻⁵).

Fig. 1: Mutation spectrum and protein expression patterns of ARID1A in non-small cell lung carcinomas.

a Pie chart showing the presence and spectrum of ARID1A mutations in non-small cell lung carcinomas. b Heatmap showing the distribution of ARID1A alterations in non-small cell lung carcinomas (214 from 184 patients), with the types of alterations designated by their respective colors. c Location, number, and type (loss-of-function, red; missense, blue) of ARID1A mutations, with hotspot loss-of-function mutations at Q515 and Q1519 (P = 2.3 × 10⁻⁶). ARID1A expression patterns included d intact expression, e diminished expression, f complete loss of expression in all tumor cells, g heterogeneous expression with loss in a subset of tumor cells in a geographic pattern, or h, i heterogeneous expression in an interspersed pattern, with ARID1A-deficient cells interspersed among ARID1A-intact cells.

Immunohistochemistry for ARID1A in non-small cell lung carcinomas

Intact nuclear immunoreactivity (Fig. 1d), at an intensity commensurate with benign internal control cells, was observed in all 40 ARID1A-wild-type lung tumors. Of 144 (78%) ARID1A-mutant tumors tested, 5 had insufficient tumor cells or internal positive control (stromal or inflammatory cells) for scoring and were excluded from further analysis. Among the 139 quantifiable cases, ARID1A expression was intact in 75 (54%) and aberrant in 64 (46%) tumors. Patterns of aberrant expression included: diffuse diminished expression (Fig. 1e) in 17 (12%) ARID1A-mutant tumors, diffuse complete loss (Fig. 1f) in 13 (9%) tumors, and intratumoral heterogeneous loss of ARID1A expression in 34 (25%) tumors. Heterogeneous patterns of loss of expression could be geographic, involving a contiguous area (Fig. 1g); or interspersed, with ARID1A-deficient cells interspersed among ARID1A-intact cells (Fig. 1h, i).

Correlation between ARID1A mutation and expression status in non-small cell lung carcinomas

The landscape of the 139 ARID1A-mutant non-small cell lung carcinomas with corresponding ARID1A expression patterns, molecular profile, and clinicopathologic features is illustrated in Fig. 2. The list of ARID1A mutations along with their corresponding clonality index, TMB, and ARID1A immunohistochemical patterns are provided in Supplementary Table 1. The relationship between ARID1A mutation status and protein expression is complex. This is exemplified by three lung adenocarcinomas harboring an identical ARID1A nonsense mutation Q515* (Supplementary Fig. 2a–c) but showing different ARID1A expression patterns, with diffuse complete loss, heterogeneous loss, and intact expression in one case each (Supplementary Fig. 2d–f).

Fig. 2: Landscape of ARID1A protein expression and genomic alterations in 139 non-small cell lung carcinomas.

The data are grouped by ARID1A expression patterns and include types and numbers of ARID1A mutations, ARID1A mutation clonality assessment, and evidence of biallelic inactivation. Concurrent mutations in EGFR, KRAS, HRAS, NRAS, BRAF, ERBB2, TP53, ARID1B, ARID2, PBRM1, SMARCA4, and SMARCB1 are displayed if present, along with tumor mutation burden, patients’ sex, smoking history, clinical stage, tumor site sequenced, and histology.

To dissect the relationship between ARID1A mutation and expression status (Table 1), we examined the following parameters: (1) quantity, types, and positions of mutations in ARID1A; (2) clonality index to determine if an ARID1A mutation is truncal or sub-clonal; and (3) evidence of biallelic inactivation (see “Methods”). We found complete loss or diminished ARID1A expression to be significantly associated with ARID1A LOF mutations (88% vs. 55%; P < 0.0001; Table 1) and evidence of biallelic inactivation (59% vs. 5%; P < 0.0001; Table 1). On the other hand, the presence of multiple ARID1A mutations alone had no significant effect on the presence of aberrant expression (20% vs. 9%; P = 0.09; Table 1), nor could the positions of ARID1A LOF mutations within the gene explain the expression patterns (Supplementary Fig. 3). Notably, for the 13 non-small cell lung carcinomas with complete loss of ARID1A expression, each harbored one or multiple ARID1A LOF mutations, all of which were truncal rather than sub-clonal, with evidence of biallelic inactivation identified in all cases for which these parameters could be assessed.

Table 1 Correlation between ARID1A protein expression and ARID1A molecular alterations in ARID1A-mutant non-small cell lung carcinomas.

ARID1A mutations and loss of expression correlate with higher tumor mutational burden

We examined the clinical and functional significance of ARID1A mutations in non-small cell lung carcinomas; their correlations are summarized in Table 2.

Table 2 Clinicopathologic and molecular characteristics of non-small cell lung carcinomas with ARID1A mutations.

Compared with ARID1A-wild-type tumors, ARID1A-mutant tumors showed similar distributions in age, gender, and histologic types. ARID1A-mutant tumors were less likely to harbor EGFR mutations (9% vs. 20%; P = 0.0003) and more likely to harbor TP53 mutations (69% vs. 52%; P < 0.0001) and showed higher mutational burden than ARID1A-wild-type tumors (P < 0.0001). Overall, 84% of ARID1A-mutant tumors harbored greater than the median number of mutations seen in our entire cohort of non-small cell lung carcinomas. Subgroup analysis demonstrated no significant differences in the mutational burden between tumors with ARID1A LOF mutations and those with missense mutations (P = 0.54). ARID1A mutations or aberrant ARID1A expression patterns were not associated with mutations in other SWI/SNF members ARID1B, ARID2, SMARCA4, and SMARCB1, though PBRM1 mutations were more frequent in ARID1A-mutant tumors (Supplementary Table 2).

For the 139 ARID1A-mutant non-small cell lung carcinomas, the correlations of ARID1A expression status with clinicopathologic features are summarized in Table 3. Compared with the 75 ARID1A-intact tumors, the 64 tumors with aberrant expression showed similar distributions in age, gender, histologic types, mutations in EGFR/TP53/KRAS, and mutational burdens; their mutational burden was significantly higher than that of ARID1A-wild-type tumors (P < 0.0001).

Table 3 Clinicopathologic and molecular characteristics of ARID1A-mutant non-small cell lung carcinomas with aberrant ARID1A expression.

Of note, in this cohort of 2440 non-small cell lung carcinoma patients, as expected, the presence of TP53 mutation was associated with increased copy number alterations and genomic instability (median index 31.2 vs. 18.2; P < 0.0001). However, there were no statistically significant differences in the genomic instability index between ARID1A-wild-type and ARID1A-mutant tumors (Table 2) and between ARID1A-intact tumors and tumors with aberrant ARID1A expression (Table 3).

Loss of ARID1A protein expression, but not mutation, correlates with worse overall survival among ARID1A-mutant tumors

In 2107 non-small cell lung carcinoma patients with available survival data, no significant differences in overall survival were noted for tumors harboring mutations in ARID1A, ARID1B, ARID2, or PBRM1 (Fig. 3a); however, survival was worse for patients with SMARCA4-mutant tumors, consistent with the published literature [18, 19, 21, 22]. Among the 125 patients with ARID1A-mutant non-small cell lung carcinomas and available survival data, survival was similar among those with ARID1A LOF versus missense variants (Fig. 3b), regardless of clinical stages (Supplementary Fig. 4a, b). In this group of patients, aberrant ARID1A expression was significantly associated with decreased overall survival (median 2.6 years vs. not reached; P = 0.03; Fig. 3c and Table 3). Subgroup analysis revealed that survival differences persisted in advanced clinical stages III/IV (median 2.0 years vs. not reached; P = 0.014; Supplementary Fig. 4c) but not in stages I/II (median 6.0 years vs. not reached; P = 0.53; Supplementary Fig. 4d); survival differences also persisted for tumors with diffuse ARID1A loss (median 2.5 years vs. not reached; P = 0.04; Table 3) but not for tumors with heterogeneous ARID1A loss (median 6.0 years vs. not reached; P = 0.10; Table 3). Of note, the median overall survival was 2.1 years, 2.5 years, and 2.1 years for patients with tumors showing diffuse complete loss, diffuse diminished expression, and heterogeneous diminished expression of ARID1A, respectively; whereas the median overall survival was not reached for patients with tumors with intact expression or heterogeneous complete loss of ARID1A (Supplementary Fig. 4e).
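
A median survival reported as "not reached" follows directly from how a Kaplan–Meier curve is built: the estimate only drops at observed deaths, so with enough censored follow-up it may never fall to 50%. The sketch below is a minimal pure-Python product-limit estimator for illustration only; it is not the software used in this study, the function names are hypothetical, and the median is taken as the first event time at which the estimate is at or below 0.5 (other conventions exist).

```python
from typing import List, Optional, Sequence, Tuple


def kaplan_meier(times: Sequence[float], events: Sequence[bool]) -> List[Tuple[float, float]]:
    """Product-limit estimate: list of (time, survival) at each distinct death time.

    events[i] is True for death from any cause, False for censoring at last follow-up.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all patients who die or are censored at the same time t.
        while i < len(data) and data[i][0] == t:
            deaths += 1 if data[i][1] else 0
            leaving += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / n_at_risk
            curve.append((t, survival))
        n_at_risk -= leaving
    return curve


def median_survival(curve: List[Tuple[float, float]]) -> Optional[float]:
    """First time at which survival is <= 0.5; None means the median is not reached."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None
```

For instance, five patients followed for 1 to 5 years with deaths at years 1, 2, and 3 have a median survival of 3 years, whereas four patients with a single death at year 1 and three censored observations yield a median that is not reached.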

Fig. 3: Overall survival of non-small cell lung carcinoma patients with ARID1A mutations.

a Forest plots of overall survival in 2107 patients with non-small cell lung carcinomas harboring mutations in ARID1A, ARID1B, ARID2, PBRM1, and SMARCA4, showing worse survival (P < 0.0001) only in SMARCA4-mutant tumors and not in other groups. Kaplan–Meier overall survival of 125 ARID1A-mutant non-small cell lung carcinoma patients b harboring ARID1A loss-of-function (LOF) mutation as compared with non-LOF mutation, and c showing intact expression as compared with loss of ARID1A expression.

Diffuse loss of ARID1A expression in non-small cell lung carcinomas associated with poorly differentiated histology and frequent lack of driver mutations

We next focused on the 30 non-small cell lung carcinomas with diffuse loss of ARID1A expression (Fig. 2 and Table 3). This group included 18 women and 12 men, with a median age of 69 (range 40–88); all but one (97%) were smokers. Histologically, this group included 20 adenocarcinomas (including solid-predominant in 13 and acinar-predominant with minor solid or cribriform patterns in seven tumors; Fig. 4a–f), six squamous cell carcinomas (including one with focal pleomorphic features and complete loss of ARID1A expression in both squamous and pleomorphic components; Fig. 4g–k), two pleomorphic carcinomas, and two poorly differentiated non-small cell carcinomas, not otherwise specified. Overall, 27 (90%) tumors had high-grade features. TTF-1 immunoreactivity was noted in 8 of 20 (40%) tumors tested, including 7 of 13 (54%) adenocarcinomas and one pleomorphic carcinoma.

Fig. 4: Photomicrographs of ARID1A-deficient non-small cell lung carcinomas.

a Metastatic lung adenocarcinoma (ARID1A Q474*; no known driver mutation) characterized by tumor cells with moderate eosinophilic cytoplasm and a cribriform pattern. b Lung adenocarcinoma (ARID1A S617*, KRAS G12F) in a solid architecture, with moderate amphophilic-to-vacuolated cytoplasm and brisk background chronic inflammation. c Metastatic lung adenocarcinoma (ARID1A E1718* with loss of heterozygosity, KRAS G13C) showing a solid architecture, prominent admixed neutrophils, and scattered mucin droplets. d–f All three tumors demonstrated complete loss of ARID1A expression. g–i Lung squamous cell carcinoma (ARID1A S261* with loss of heterozygosity; no known driver mutation) composed of h squamoid-to-clear cells with focal keratinization, i along with a minor pleomorphic component with conspicuous mitoses and prominent spindling. j, k Complete ARID1A expression loss was noted in both components.

Genetically, 14 (47%) tumors harbored KRAS activating mutations (G12V in 4; G12C in 3; G12F, G12A, G12L, G12D, G13C, V14I, and Q61H in one tumor each), and an HRAS activating mutation G13V was present in one additional tumor. No driver mutations or rearrangements were detected in the remaining 15 (50%) tumors, including in 8 of 13 (62%) tumors with complete ARID1A loss. Other genomic alterations included pathogenic TP53 mutations in 20 (67%) tumors, including in 12 (80%) tumors with no mitogenic driver mutations. A PBRM1 nonsense mutation and a SMARCA4 frameshift mutation were present in one case each (Supplementary Table 3). Compared with ARID1A-wild-type tumors, non-small cell lung carcinomas with diffuse loss of ARID1A expression showed increased mutational burden.

Diffuse loss of ARID1A expression in rare non-small cell lung carcinomas with MMR deficiency

MMR pathway-deficient non-small cell lung carcinomas were rare, with only three (0.12%) tumors identified in our cohort of 2440 patients. All three contained ARID1A mutations. Indeed, MMR deficiency was significantly enriched in tumors with diffuse ARID1A loss (2 of 30; 6.7%; P < 0.0001), involving a tumor with complete ARID1A loss and a tumor with diminished ARID1A expression. Both tumors demonstrated an MMR-deficient mutational signature with frequent homopolymer insertion-deletion mutations (12.1–13.7/Mb) and increased mutational burden (49.4/Mb; >99th percentile). One MMR-deficient lung adenocarcinoma with an ARID1A frameshift mutation showed intact ARID1A protein expression, suggesting this represented a hypermutation-related passenger event in this tumor.

Heterogeneous loss of ARID1A expression in non-small cell lung carcinomas

The 34 patients with heterogeneous ARID1A loss (Fig. 2 and Table 3) included 20 women and 14 men, with a median age of 68 (range 43–91); all except one (97%) were smokers, and 85% of tumors were adenocarcinomas. This subgroup showed no significant differences in its clinicopathologic and molecular features as compared with ARID1A-mutant tumors with intact expression or diffuse loss of ARID1A expression (Table 3), including no survival differences as compared with tumors with intact ARID1A expression (P = 0.10) or with those with diffuse ARID1A loss (P = 0.64). The patterns of heterogeneous ARID1A loss included complete loss in 11 (geographic in 3, interspersed in 8) and diminished expression in 23 tumors (geographic in 11, interspersed in 12). Geographic complete loss of ARID1A expression was notable in a squamous cell carcinoma and two adenocarcinomas, including one that showed high-grade fetal-like differentiation corresponding to areas of ARID1A loss (Fig. 5a–c).

Fig. 5: Geographic loss of ARID1A protein expression in a lung adenocarcinoma showing focal high grade fetal-like differentiation.

Photomicrographs of a lung adenocarcinoma (ARID1A Q2070*, KRAS G12V) at a low- and b high-power magnification, showing c heterogeneous geographic loss of ARID1A expression with corresponding high-grade fetal-like differentiation.

Discussion

In this large institutional cohort of 2440 non-small cell lung carcinomas, ARID1A mutations were detected in 7.5%; however, we estimated that only 1–2% of non-small cell lung carcinomas show corresponding loss of ARID1A protein expression, consistent with a recent report of ARID1A loss in 1.3% of 1013 non-small cell lung carcinomas using tissue microarrays [17]. Aberrant ARID1A expression patterns included complete loss or diminished expression in a diffuse or heterogeneous pattern, the latter of which could be geographic or interspersed. While complete or heterogeneous geographic ARID1A loss has been described, the interspersed ARID1A loss pattern has not been previously reported to our knowledge. Aberrant ARID1A expression correlated with ARID1A LOF mutations and evidence of biallelic inactivation. Both ARID1A mutations and aberrant expression correlated with a lack of EGFR mutations, frequent TP53 mutations, and increased mutational burden. ARID1A-mutant tumors showed similar overall survival compared with ARID1A-wild-type tumors; however, among patients with ARID1A-mutant tumors, ARID1A expression loss correlated with worse overall survival. Lung tumors with diffuse loss of ARID1A expression were predominantly adenocarcinomas, poorly differentiated (including pleomorphic carcinomas in two tumors), almost exclusively from smokers, lacking driver mutations in 50% of cases, and enriched for MMR deficiency. Heterogeneous geographic ARID1A loss was notable in three tumors, including an adenocarcinoma showing fetal-like differentiation in areas with ARID1A loss.

This study was limited by its use of data from a single institution with potential selection/referral bias, tumor-only sequencing with germline variants confounding interpretation of somatic alterations, and the use of a targeted sequencing panel that limits confident assessment of variant clonality. ARID1A also contains repetitive regions that are susceptible to sequencing artifacts and misalignment, rendering some variant interpretation challenging. Nonetheless, we reported ARID1A alterations in non-small cell lung carcinomas at a prevalence similar to that in other studies [5, 8, 11, 12, 25, 29].

Despite the association of aberrant ARID1A expression with ARID1A LOF mutations and evidence of biallelic inactivation, we noted that ARID1A mutation status alone was an unreliable predictor of ARID1A expression. This disconnect between ARID1A mutation and expression status had been documented in a subset of endometrial adenocarcinomas [5] and MMR-stable EBV-infected gastric adenocarcinomas [9]. In endometrial adenocarcinomas, while ARID1A mutations were early clonal events in only 25% of cases, loss of ARID1A expression was commonly seen in large subsets of both primary tumors and metastases [40]. The discordance between ARID1A mutation status and its expression status in non-small cell lung carcinomas may be due to several technical and biological reasons. First, a substantial fraction (47%) of ARID1A mutations herein were sub-clonal, raising the possibility that at least some of these are passenger mutations of questionable functional relevance. Second, our data may be limited by tumoral heterogeneity, as a tumor with heterogeneous loss of expression may show only an intact or complete-loss pattern in a limited sample; this bias was minimized by performing immunohistochemistry on whole-tissue sections rather than tissue microarrays. Third, loss of ARID1A expression may be due to posttranscriptional, posttranslational, and epigenetic mechanisms [1] not examined herein; however, all 13 tumors with diffuse complete loss of ARID1A expression harbored at least one ARID1A LOF mutation. Fourth, as we measured ARID1A expression as a correlate of ARID1A gene function, we did not capture deleterious effects from missense or in-frame indel variants that nonetheless retain an intact epitope. Rare missense mutations in SMARCA4 had been reported to disrupt chromatin remodeling by altering its conserved ATPase surfaces and the accessibility landscape of enhancers [41].
Nevertheless, our findings of ARID1A mutational status being a poor predictor for ARID1A expression status may have implications for interpretation of ARID1A sequence variants if ARID1A status is used in consideration of choice of therapy.

In describing the clinicopathologic and molecular features of non-small cell lung carcinomas with ARID1A mutations or aberrant ARID1A expression, we noted the following observations:

First, our findings of frequent TP53 mutations in ARID1A-mutant non-small cell lung carcinomas were consistent with a recent analysis on plasma DNA from patients with advanced lung adenocarcinoma, in which 70% of patients with ARID1A-mutant tumors harbored concurrent TP53 mutations. Interestingly, this observation demonstrated a different genetic context and relationship between TP53 and ARID1A alterations as compared with other tumors, as ARID1A-mutant gastric, endometrial, and ovarian carcinomas typically harbored wild-type TP53 [9, 10, 42], while no relationship was apparent between ARID1A and p53 expression in esophageal adenocarcinoma [43].

Second, heterogeneous geographic ARID1A loss was notable in one adenocarcinoma showing high-grade fetal-like differentiation. Though rare, lung adenocarcinomas with a high-grade fetal adenocarcinoma-like component carry a poor prognosis [44]. The loss of ARID1A expression precisely in the fetal-like areas suggested that ARID1A loss may be a molecular correlate of this histologic transformation in some cases.

Third, while the association between ARID1A loss and microsatellite instability we observed in rare non-small cell lung carcinomas had been described in endometrial [42, 45, 46], ovarian [46], colorectal [47,48,49], and gastric adenocarcinomas [9, 10, 47], the mechanistic links between ARID1A loss and MMR deficiency varied by tumor types. ARID1A loss may represent a consequence of MMR deficiency, as ARID1A contains repetitive sequences that are prone to mutagenesis, as seen in MMR-deficient gastric adenocarcinomas [9]. ARID1A loss is associated with MLH1 promoter hypermethylation resulting in sporadic MMR deficiency in a subset of colorectal [48] and endometrial adenocarcinomas [42]. ARID1A is also a binding partner of MSH2; loss of ARID1A can thus directly compromise MMR [31].

ARID1A loss has been associated with improved response to immunotherapy across diverse tumor types [31, 33, 50]. ARID1A can directly interact with EZH2 to antagonize EZH2-mediated interferon response [51]. Specific to non-small cell lung carcinomas, ARID1A mutation status may predict improved response to durvalumab plus tremelimumab [32]. However, it is unclear whether ARID1A mutation status or protein expression status is a better biomarker in predicting response in these patients. Further correlative studies of ARID1A may clarify its role in selecting patients who benefit from immune checkpoint blockade.

In summary, while ARID1A mutations were present in 7.5% of non-small cell lung carcinomas, <2% showed diffuse loss of ARID1A expression, which was associated with high-grade histologic features, frequent TP53 mutations, and lack of targetable driver mutations. With ARID1A emerging as a potential therapeutic target, our findings have implications for ARID1A variant interpretation in clinical sequencing assays. Functional characterization of alterations including in ARID1A that contribute to the phenotypic spectrum of non-small cell lung carcinomas may enable better patient selection for personalized treatment.