ATRX, DAXX or MEN1 mutant pancreatic neuroendocrine tumors are a distinct alpha-cell signature subgroup

The commonly mutated genes in pancreatic neuroendocrine tumors (PanNETs) are ATRX, DAXX, and MEN1. We genotyped 64 PanNETs and found 58% carry ATRX, DAXX, and MEN1 mutations (A-D-M mutant PanNETs) and this correlates with a worse clinical outcome than tumors carrying the wild-type alleles of all three genes (A-D-M WT PanNETs). We performed RNA sequencing and DNA-methylation analysis to reveal two distinct subgroups with one consisting entirely of A-D-M mutant PanNETs. Two genes differentiating A-D-M mutant from A-D-M WT PanNETs were high ARX and low PDX1 gene expression with PDX1 promoter hyper-methylation in the A-D-M mutant PanNETs. Moreover, A-D-M mutant PanNETs had a gene expression signature related to that of alpha-cells (FDR q-value < 0.009) of pancreatic islets including increased expression of HNF1A and its transcriptional target genes. This gene expression profile suggests that A-D-M mutant PanNETs originate from or transdifferentiate into a distinct cell type similar to alpha cells.

P ancreatic neuroendocrine tumors (PanNETs) or islet cell tumors are a relatively rare neuroendocrine malignancy with an annual incidence of <1 per 100,000 per year 1 (about 1000 new cases per year in the United States) but currently represent the second most common epithelial neoplasm after ductal adenocarcinoma of the pancreas and account for 1-2% of pancreatic tumors. PanNETs were erroneously considered a benign group of neoplasm because they were initially mostly comprised of benign symptomatic insulin-producing tumors (insulinomas). However, in the past three decades, it has become apparent that at least half of all PanNETs are nonfunctional, and they are a heterogeneous group of tumors with often unpredictable and varying degrees of malignancy. As many as 50-80% of PanNETs are associated with synchronous or metachronous metastatic disease 2 . Knowledge of functional PanNETs has evolved from insulinoma to almost a dozen other diverse hormone-secreting tumors. These individual lesions may have specific clinical, pathologic, and genetic associations, including multiple endocrine neoplasia type 1 (MEN-1), tuberous sclerosis, and von Hippel-Lindau (VHL) syndromes. Thus, the entity of PanNET represents a diverse group of heterogeneous neoplasms where combined clinical and pathologic assessment is required to further identify their genetic basis for neoplasia and to define their specific clinical behavior. The nonfunctional tumors require further elucidation to characterize their diverse pathogenesis and to predict outcome with potential biomarkers and molecular signatures. Current classification scheme for PanNETs include grade and stage 1 . The World Health Organization (WHO) classification, which assesses the proliferative index of neoplastic cells, divides PanNETs into low grade (G1), intermediate grade (G2), and high grade (G3). The higher grade PanNETs are generally associated clinically with more aggressive behavior 3 . Poorly differentiated neuroendocrine tumor of the pancreas is extremely rare and clinically aggressive, which represents a different pathogenesis from the well differentiated counterpart 4 . While well-differentiated PanNETs can be successfully treated with surgery, there are few treatments for metastatic PanNETs, and they do not respond to conventional chemotherapy. A greater understanding of Pan-NET pathogenesis may guide the development of novel therapeutic options.
Molecular studies have identified mutations in MEN1, ATRX, and DAXX to be the most commonly found in PanNETs 5,6 (found in approximately 40, 10, and 20% of tumors, respectively). All three genes play a role in chromatin remodeling. MEN1 is a component of a histone methyltransferase complex 7 that specifically methylate Lysine 4 of histone H3 and functions as a transcriptional regulator. ATRX and DAXX interact to deposit histone H3.3-containing nucleosomes in centromeric and telomeric regions of the genome 8 . Additional mutations in mTOR pathway genes including TSC2, PTEN, and PIK3CA are found in one in six well-differentiated PanNETs 5 . Other reported rare mutations in PanNETs include DNA damage repair genes (MUTYH, CHEK2, BRCA2) and chromatin remodeling gene SETD2 6 .
The neuroendocrine cells in the pancreas include alpha, beta, delta, pancreatic polypeptide (pp)-producing and vasoactive intestinal peptide (VIP)-producing cells. The cell of origin for nonfunctional PanNETs is not well established. Here, we genotyped 64 well differentiated PanNETs for mutations in ATRX, DAXX, and MEN1 and performed RNA sequencing (n = 33) and DNA methylation (n = 32) analysis to identify distinct molecular phenotypes of A-D-M mutant PanNETs which potentially reveals their distinct cell of origin or transdifferentiated state.

Results
Clinical annotation and genotyping for ATRX, DAXX, and MEN1. We initially performed Sanger sequencing to genotype the ATRX, DAXX, and MEN1 genes in 64 individual PanNETs. All cases were histologically confirmed to be well-differentiated PanNETs of WHO G1/G2 grade, and cases of poorly differentiated neuroendocrine carcinoma were excluded. The mean patient age was 52 ± 1.5 years (mean ± standard error, ranging from 26-73) with a 59% male population. The locations of the tumors were 38% proximal/mid body, and 62% distal pancreas. Eighty-one percent of the cases were clinically non-functional and the remaining cases included insulinomas, glucagonomas, gastrinomas, and VIPomas. The median size of tumor was 3.6 ± 0.4 cm (ranging from 1.0-14.5 cm). Sixty-eight percent of patients had localized disease without distant metastasis at the time of initial diagnosis (Supplementary data 1).
An A-D-M mutant genotype was identified in 58% (37/64) of cases with ATRX, DAXX, MEN1, MEN1/ATRX, and MEN1/ DAXX mutations in 8, 16, 20, 3, and 11% cases, respectively (Fig. 1a). The majority of mutations in ATRX, DAXX, and MEN1 were truncation mutations (stopgain or frameshift) and loss of function consistent with their role as tumor suppressors (Supplementary Data 2). Similar to the observations in our previously published data 9 , the 5-year disease specific survival was associated with tumor stage (p-value < 0.04, log-rank tests), tumor grade (G1 vs G2 p-value < 0.02, log-rank tests), and distant metastasis (p-value < 0.002, log-rank tests), respectively. Among 44 patients who initially presented with localized disease without distant metastasis, those with the A-D-M mutant genotype had a worse recurrence free survival than those of A-D-M WT genotype (Fig. 1b). Furthermore, in comparison to A-D-M WT PanNETs, the A-D-M mutant PanNETs were associated with larger tumor size (3.6 ± 0.6 cm vs. 5.6 ± 0.7 cm, p-value < 0.03, log-rank tests) and higher tumor stage (T1 and T2 vs. T3, p-value < 0.04, logrank tests). Other demographic and clinical characteristics (including gender, age, tumor functionality, and lymph node metastasis) revealed no statistically significant differences between the two genotypes of PanNETs.
Gene expression and DNA methylation reveal two subtypes of PanNETs. We performed RNA sequencing on 33 randomly selected tumors (19 A-D-M mutant, and 14 A-D-M WT). Unsupervised hierarchical clustering of the top 3000 variable genes across the PanNETs revealed two distinct clusters where almost all A-D-M mutant PanNETs were found in one cluster (Fig. 2a). The grouping of A-D-M mutant PanNETs into one distinct cluster by gene expression was robust to the number of most variable genes used for clustering (Supplementary Figure 1 To investigate the global histone methylation level in PanNETs with and without A-D-M mutations, we performed immunohistochemistry on H3K4me3, H3K9me3, H3K27me3, and H3K36me3 on 36 PanNETs. There was a general trend of lower histone methylation level for MEN1 mutated PanNETs when compared to WT PanNETs (Supplementary Figure 6 and Supplementary Table 1).
A-D-M mutant PanNET gene expression resembles that of alpha cells. There are multiple neuroendocrine cell types in the pancreas including alpha, beta, gamma, delta, and epsilon. We used gene expression data for these various pancreatic neuroendocrine and exocrine cell types from a single cell sequencing study 11 (Supplementary Table 2) to identify gene-set signatures representing highly expressed cell-type-specific genes (Supplementary data 4). The A-D-M mutant PanNETs uniformly exhibited a gene expression signature that was very similar to that of alpha cells (Fig. 4a). The A-D-M WT PanNETs were more heterogeneous in their expression of the genes among the gene set signatures for the different pancreatic neuroendocrine cell types. Greater heterogeneity of gene expression signature in A-D-M WT PanNETs was consistent with the greater heterogeneity found in global gene expression.
To further investigate the gene expression signature of A-D-M mutant PanNETs we performed gene set enrichment analysis 12 (GSEA) on the 13 manually curated gene sets for pancreatic endocrine and exocrine cells from a previous study. This study assessed gene expression of individual pancreatic cell  Table 2). Our analysis indicates that only the alpha cell gene signature was significantly enriched in A-D-M mutant PanNETs (FDR q-value < 0.009) (Fig. 4b, c) (Supplementary Table 3).    . Additionally, we found alpha cell signatures to be significantly enriched (FDR q < 0.001) only in the A-D-M mutant PanNETs from the two validation data sets using GSEA (Fig. 5b, c).  GCG  LOXL4  PLCE1  IRX2  GC  KBTBD10  CRYBA2  TTR  TM4SF4  RGS4  FEV  ARX  PTGER3  HMGB3  RFX6  MAFB (Supplementary data 7). Of the 287 differentially methylated genic CpG sites, 70 (associated with 59 genes) were found at promoter (transcriptional start site, TSS1500 and TSS200) or within first exon, a region where DNA methylation is associated with transcriptional repression 17 . Thirteen of the 59 genes were also found to be differentially expressed (with fold change greater than 3 and corrected p-value < 0.05, Benjamini-Hochberg) and seven genes that were hypomethylated in A-D-M mutant and over-expressed are APOH, CCL15, EMID2, PDZK1, HAO1, BAIAP2L2, and NPC1L1. One gene, TACR3, was hypomethylated in A-D-M WT and over-expressed (Supplementary data 7). Four of the 70 CpG sites were found in the gene PDX1 (pancreatic and duodenal homeobox 1), a transcription factor necessary for pancreatic development and beta cell maturation. PDX1 functions in the cell fating of endocrine cells, favoring the production of insulin positive beta cells and somatostatin positive delta cells while repressing glucagon positive alpha cells 18 . These four CpG sites were all hypermethylated in A-D-M mutant PanNETs (Fig. 7a) and the expression of PDX1 was 2.92 fold higher in A-D-M WT PanNETs (p-value < 0.005, DeSeq2) (Fig. 7a, b). In contrast, while ARX was highly expressed in A-D-M mutant Pan-NETs compared to A-D-M WT PanNETs, the promoter and first exon of ARX are not differentially methylated.

Discussion
Similar to a number of recent studies 19,20 , we have demonstrated in this cohort of PanNETs that, in additional to pathologic stage and grade of the tumor, mutations in DAXX, ATRX, and MEN1 are associated with adverse clinical outcome in comparison to those without these mutations. Our results seem to be in contradiction to the findings initially reported by Jiao et al. 5 , in that 15 patients with PanNETs carrying mutations in DAXX or ATRX genes had better survival than did 12 patients with wild-type PanNET. This discrepancy between our data and their data could be attributed to a different composition of the tumors. Indeed, all   ATRX-DAXX and MEN1 are involved in distinct biochemical pathways to regulate gene expression. Therefore, we would expect that loss of these proteins during transformation of A-D-M mutant PanNETs would result in a more heterogeneous gene expression profile. Due to the high degree of homogeneity of the A-D-M mutant PanNETs at the level of gene expression and the strong expression of genes that are known to be alpha cell specific, we hypothesize that alpha cells are the cell-of-origin for this group of tumors. In addition, MEN1 and ATRX/DAXX mutations occur alone or in a combined pattern suggest that they have independent oncogenic activities in A-D-M mutant PanNETs, making the idea of reprogramming to a homogeneous alpha-like cell state less probable. Some of the A-D-M WT PanNETs have a strong beta cell signature and these may have arisen from beta cells (Fig. 4a). However, other A-D-M WT PanNETs have neither alpha nor beta cell signatures, which may arise from other cell types in the pancreas.
Conditional knockouts of MEN1 in mice support the model of an alpha cell origin for A-D-M PanNETs 21 . The restricted deletion of MEN1 to alpha cells surprisingly led to the development of insulinomas 22,23 . Most of our PanNETs were nonfunctional (26 of 33 PanNETs) but the functional tumors were insulinoma and VIPoma, even though their gene expressions have alpha cell signature. Moreover, some PanNETs express combinations of neuroendocrine hormones (GCG, INS, SST, PPY, GHRL, VIP, and GAST), suggesting that regulation of cell type specific hormone may be disrupted. To create robust gene signatures that are not sensitive to changes in expression of a few genes, we use a large number of genes to create the A-D-M mutant and alpha cell signatures. In other mouse models of PanNETs 21,24,25 , MEN1 deletion using the insulin or PDX1 promoter driven Cre construct, insulinomas, glucagon-expressing tumors and well differentiated PanNETs were also observed. However, Cre expression may be leaky in these models and further study is needed to understand the heterogeneity of the cells in the tumors that develop and trace the cell of origin or transdifferentiated state of the cancer cells.
In our gene expression analysis, we have not identified the oncogenic pathways activated in A-D-M mutant PanNETs. MEN1 has been shown to upregulate expression of long noncoding RNA MEG3 in MIN6 mouse insulinoma cell line 26 . In the same study, they show MEG3 represses expression of the oncogene MET leading to delayed cell cycle progression and reduced cell proliferation. In a different study, MEN1 and DAXX were shown to repress the expression of the membrane metalloendopeptidase (MME) and mutations in MEN1 or DAXX result in loss of this repression leading to neuroendocrine tumor proliferation 27 . Our data is consistent with these studies when comparing A-D-M mutant to WT PanNETs, showing that A-D-M mutant PanNETs have lower expression of MEG3 (7.3 fold lower, p-value < 4.3E-07, DeSeq2), higher expression of MET (3 fold higher, p-value < 0.003, DeSeq2), and higher expression of MME (4 fold higher, p-value < 0.001, DeSeq2). Among A-D-M mutant PanNETs, we do not see expression differences of MEG3, MET, and MME depending on mutation status of ATRX, DAXX, and MEN1.
While PanNETs may seemingly represent as a single clinical disease, they can be further characterized into different subtypes based upon their cell lineage and the associated molecular genotype. Understanding the epigenetic and transcriptional dysregulation of PanNETs will require comparison to their proper cells of origin which may explain the unpredictable outcome of the disease and facilitate the development of unique and targeted therapeutic strategies.

Methods
Patient's information. Retrospective and prospective reviews of well-differentiated, pancreatic neuroendocrine neoplasms were performed using the pathology files and pancreatic database at MSKCC with IRB approval. All patients were evaluated clinically at our institution with confirmed pathologic diagnoses, appropriate radiological and laboratory studies, and surgical or oncological management. Follow-up information was obtained for all cases.
Tissue acquisition and nucleotide extraction. Briefly, cases of pancreatic neuroendocrine tumors were identified. Fresh-frozen tumor and paired normal tissues were obtained from MSKCC's tissue bank under an Institutional Review Board protocol. Histopathology of all tissues was evaluated on hematoxylin and eosin stained sections by an experienced gastrointestinal-hepato-pancreatobiliary pathologist to insure the nature of the tissue, greater than 80% tumor cellularity and absence of necrosis. The relevant tissues were then macro-dissected (20-25 mg) and DNA/RNA extraction using Qiagen's DNeasy Blood & Tissue Kit and RNeasy Mini Kit, respectively was carried out according to the manufacturer's protocols (Qiagen, Valencia, CA).
Sanger sequencing for gene mutation. All exons of the DAXX, ATRX, and MEN1 genes were amplified by PCR and then sequenced using Sanger sequencing. Every mutation detected was validated by bidirectional Sanger sequencing on the tumornormal pairs. To maintain the correct sample annotation, we used mutation status as sample name with sample ID (for example, A_mk11 sample is ATRX mutant and mk11 is sample ID). Supplementary data 1-3 contains all the clinical information, mutational profile, sample annotation and ESTIMATE tumor purity. Online Oncoprint was used to plot create Fig. 1a.
PanNETs transcriptome sequencing and data analysis. RNA Library preparation and RNA sequencing was done by MSKCC Genomics Core Laboratory using Illumina HiSeq with (2 × 75 bp paired end reads) to a minimum depth of~50 million reads were generated for each sample. Raw fastq files were probed for sequencing quality control using FastQC [http://www.bioinformatics.babraham.ac. uk/projects/fastqc]. Sequencing reads were mapped to human transcripts corresponding to Genepattern 28 genome (hg19 version) GTF annotations using RSEM with default parameters. RSEM package 29 was used to prepare the reference genome with given GTF and calculated expression from mapped BAM files. STAR 30 aligner was used to map reads in RSEM algorithm. Transcripts mapped data were normalized to TPM (Transcript Per Million) from RSEM and log2 transformed (Supplementary data 8). This log2TPM values were used for all downstream analysis. Unsupervised clustering and Principal Component analysis was conducted to elucidate subtypes structure using top 3000 variant genes as input. To query robustness of this subtyping, multiple variant gene sets were used and repeated the same process of unsupervised clustering. Top 100 variable genes were used to find genes, which were highly expressed in each subtype. Subset of these genes is selected to show in Fig. 2d for liver and complement system genes. To find differentially expressed genes (DEgenes) between A-D-M mutant PanNETs and A-D-M WT panNETs, we used DeSeq2 R package 31 on raw count (values from RSEM). We used significance cutoff with greater than 3 fold change and corrected p-value < 0.05 (Benjamini-Hochberg) to call a gene as DEgenes. GSEA Preranked 12 method was used on DEgenes to find significant KEGG pathways, motif and biological process.
Clustering and principal component analysis. For unsupervised clustering on log2TPM, we used Pearson distance metric and ward.D2 hclust method (unless stated otherwise). PCA analysis was done using prcomp in R. R (http://www.rproject.org/) was used for all the analysis and visualization of data.
PEEGset from published dataset. The neuroendocrine cells in the pancreas include alpha, beta, delta, pancreatic polypeptide (pp)-producing and vasoactive intestinal peptide (VIP)-producing cells. Gene sets representing different endocrine islet and exocrine pancreatic cells (PEEGset) were obtained from three metadata 11,15,32 (Supplementary Table 2). We created 13 PEEGset representing all major cells from endocrine and exocrine pancreases. Supplementary Table 2 shows these gene sets with major cell types and number of genes in each set. These gene set were used as prior defined gene set for GSEA analysis.
Gene set enrichment analysis on major islets cell types. Gene Set Enrichment Analysis 12 (GSEA) was performed on the log2TPM expression values of all samples using downloaded version of GSEA software (Broad Institute, Cambridge, MA, USA) to identify the statistically enriched gene sets between A-D-M mutant and A-D-M WT PanNETs. Published pancreatic islet endocrine and exocrine cells signatures were used as prior defined sets as an input. We used all default parameters to perform GSEA on this gene sets to determine the enrichment of specific cell signature enrichment in the PanNET subtypes. We ran GSEA on 1000 permutation mode on phenotypic label to generate FDR and enrichment score (ES) for each gene set. Significant gene set was filtered based on FDR q-values (cutoff of 0.05).
Bramswig et al. FACs sorted alpha and beta cells gene expression. We extensively used Bramswig et al. 15 FACs sorted RNAseq data to understand normal alpha and beta cells and correlated their gene signature sets with our A-D-M mutant and A-D-M WT panNETs. We downloaded supplement file for total RNA seq normalized expression data for alpha (3 replicate) and beta (3 replicate) and exocrine cells (2 replicate). Bramswig et al. 15 provide strong genes associated with alpha, beta and exocrine cells as supplement file. We used this strong cell specific genes and created gene set for alpha, beta and exocrine and named as Bramswig et al. gene set. HNF1A gene expression values were fetched to check whether HNF1A is over expression in normal alpha as compared to beta and exocrine. We applied Student ttest's between three alpha and three beta samples to calculate pvalue for HNF1A gene expression. Bramswig et al. strong alpha cell genes (n = 465) were queried to check for HNF1A transcription factor motif enriched using online GSEA version (C3 TFs motif database).
Microarray gene expression analysis. RNA extracted from PanNETs samples were submitted for Affymetrix microarray profiling using chip hgu133a2 array. All following analysis was done in R. Briefly, the raw Affymetrix CEL files were loaded in R (simpleaffy 33 ). Normalized expression values were calculated by the GC Robust Multi-array Average (GCRMA 34 ) algorithm and subjected to mean transformation in order to collapse all probes to respective genes in R using col-lapseRow 35 . We then followed the same clustering and PCA procedure that were done on RNAseq expression data. Collapsed average gene expression values were imported in GSEA to run against 13 PEEGset cell types to find enrichment. 450 K DNA methylation array analysis. DNA extracted from PanNETs samples and interrogated using the Illumina 450 K platform (Illumina Inc. San Diego, CA) to access the DNA methylation profiles. All the analysis was performed using ChAMP 36 version 2.6.0 open source software implemented in R. Briefly, IDAT file raw data were imported in R and filtered to exclude samples with detection p-value < 0.01 (DeSeq2) and beadcount <3 in at least 5% of samples and normalized using FunctionNormalization 37 . This normalization method correct for background; remove dye bias followed by Quantile normalization. Unsupervised clustering and PCA were done on top variants 2000 probes (Var2000) across all samples to find classes of PanNETs. We repeated this clustering using different number (Var10000, Var5000, Var3000, Var1000, and Var500) of probes to check robustness of this subtyping. Differentially methylated CpG sites (DMP) between the A-D-M mutant and A-D-M WT PanNETs were identified using champ.MVP using the all default parameter method (Bonferroni-Hochberg) to adjust the p-value(<0.01, DeSeq2). Significant DMP sites from respective genes were compared to DEgenes to find overlapping dysregulated genes in each subtype.  13 and pval was calculated using Wilcox test. We performed GSEA analysis on A-D-M mutant and WT panNETs from Sadanandam et al. 13 .
Immunohistochemistry (IHC). A representative, formalin-fixed, paraffinembedded tissue section (4 μm thick) of each case was submitted to our institution's core facility to perform immunohistochemistry-using antibodies recognizing the APOH proteins. Briefly, sections were de-paraffinized and pre-treated in Cell Conditioning 1 (CC1 mild; Ventana Medical Systems, AZ, USA) using an automated staining system (Ventana Discovery XT Autostainer; Ventana Medical Systems Inc, Tucson, AZ). Primary antibodies were applied for 60 min at a dilution of 1:100 for APOH (anti-APOH, polyclonal antibody; Proteintech). The sections were then incubated for 60 min with secondary antibody (1:200) followed by DAB Map detection (DAB visualization; Ventana Medical Systems). Cytoplasmic (APOH) labeling in at least 50% of the tumor cells was considered positive. In the case of APOH, normal liver tissue was used as a positive control in each experiment.
Histone marks IHC. Serial unstained slides (4 μm) were prepared from each block for subsequent immunohistochemistry with the following Histone 3 lysine antibodies cones: H3K4me3 clone C42D8 (Cell Signaling Technologies Catalog number 9751 1:1000 dilution), H3K9me3 clone EPR16601 (abCam, Catalog number Ab8898 1:1000 dilution), H3K27me3 clone C36B11 (Cell Signaling, Catalog number 9733 1:100 dilution) and H3K36me3, clone 333 (Active motif, Catalog number 61021, 1:500 dilution). Staining for all clones was performed on the Leica Bond immunohistochemistry platform according to the manufacturer's protocol with the Lieca DAB IHC detection kit. All slides were pretreated with epitope retrieval 2 solution (Lieca Biostystems) for 30 min. Primary antibodies were incubated for 30 min. Multi-tissue normal positive control was used. The PanNET cases (n = 36, 14 A-D-M mutant and 22 A-D-M WT) were read and interpreted by an independent observer blinded to the clinicopathologic information. Scoring of all histone marks was performed using previously validated scoring systems H3K4me3 38 , H3K9me3 39 , H3K27me3 40 , H3K36me3 41 . The tumor was considered positive for the histone mark if there was histological evidence of nuclear staining. Every tumor was scored on a scale of 0-3 according to the percentage of cells with nuclear staining: (0, 0-5% positive cells; 1, 6-50% positive cells; 2, 51-75% positive cells; 3, 76-100% positive cells). Scores of 0-1 were estimated as low expression and scores of 2-3 indicated high expression. Student t-test was used to test significance for histone methylations across MEN1 panNETs as compared to MEN1 WT PanNETs.
Statistical analysis. Data are represented as mean ± standard deviation. GraphPad Prism 6 (GraphPad Software Inc, La Jolla, Ca) was used for statistical and survival analyses. Survival analysis p-values (2-sided) were based on log-rank tests. Significance was defined as P < 0.05.