ATAC-seq reveals alterations in open chromatin in pancreatic islets from subjects with type 2 diabetes

Impaired insulin secretion from pancreatic islets is a hallmark of type 2 diabetes (T2D). Altered chromatin structure may contribute to the disease. We therefore studied the impact of T2D on open chromatin in human pancreatic islets. We used assay for transposase-accessible chromatin using sequencing (ATAC-seq) to profile open chromatin in islets from T2D and non-diabetic donors. We identified 57,105 and 53,284 ATAC-seq peaks representing open chromatin regions in islets of non-diabetic and diabetic donors, respectively. The majority of ATAC-seq peaks mapped near transcription start sites. Additionally, peaks were enriched in enhancer regions and in regions where islet-specific transcription factors (TFs), e.g. FOXA2, MAFB, NKX2.2, NKX6.1 and PDX1, bind. Islet ATAC-seq peaks overlap with 13 SNPs associated with T2D (e.g. rs7903146, rs2237897, rs757209, rs11708067 and rs878521 near TCF7L2, KCNQ1, HNF1B, ADCY5 and GCK, respectively) and with additional 67 SNPs in LD with known T2D SNPs (e.g. SNPs annotated to GIPR, KCNJ11, GLIS3, IGF2BP2, FTO and PPARG). There was enrichment of open chromatin regions near highly expressed genes in human islets. Moreover, 1,078 open chromatin peaks, annotated to 898 genes, differed in prevalence between diabetic and non-diabetic islet donors. Some of these peaks are annotated to candidate genes for T2D and islet dysfunction (e.g. HHEX, HMGA2, GLIS3, MTNR1B and PARK2) and some overlap with SNPs associated with T2D (e.g. rs3821943 near WFS1 and rs508419 near ANK1). Enhancer regions and motifs specific to key TFs including BACH2, FOXO1, FOXA2, NEUROD1, MAFA and PDX1 were enriched in differential islet ATAC-seq peaks of T2D versus non-diabetic donors. Our study provides new understanding into how T2D alters the chromatin landscape, and thereby accessibility for TFs and gene expression, in human pancreatic islets.

(http://broadinstitute.github.io/picard/) v1.126 was used to remove PCR duplicated reads and for insert size distribution analysis. Reads mapping to mitochondrial DNA and the Y chromosome were filtered and only uniquely mapping paired reads with a quality score≥30 were kept for subsequent analysis. The read start sites were adjusted to represent the center of the transposon binding event (5bp on forward strand and 4bp on reverse strand) as described 3 . Phantompeakqualtools v2.0 was used to calculate the strand cross-correlation and deepTools v2.0.1 was used to calculate correlation between samples 4,5 . MACS2 v2.1.0 was used for peak calling with parameters "--nomodel--nolambda-keep-dup all--call-summits-g hs-B", and peaks were filtered based on default settings and false discovery rate (FDR) < 0.1 6 . Peaks in regions known to show artificially high signal were filtered using the hg19 blacklist file from ENCODE. BEDTools v2.26.0 7 was used for various analyses based on peak locations. Peaks located less than 50bp apart were merged in order to facilitate comparison between samples. At this stage, ~40% of peaks were only present in one sample and these reflect individual-specific peaks. A peak was only considered for subsequent analyses if it was identified in three or more islet donors.
Fisher's exact test was used for an occupancy-based analysis (i.e. to identify islet ATAC-seq peaks that were found in significantly more non-diabetic compared to T2D donors or vice versa). The R Diffbind package and edgeR package 8 were used for an affinity-based analysis (i.e. to identify islet ATAC-seq peaks where the mean number of mapped reads differ between the groups). Gender and sample treatment (frozen/non-frozen) were included as covariates in the design matrix. The whole ATAC-seq workflow was implemented in Snakemake. All sequencing track figures were generated with the UCSC Genome Browser.

Islet RNA-seq
Total RNA was extracted using the AllPrep DNA/RNA mini kit (Qiagen). RNA quality was measured by 2100 Bioanalyzer and 2200 Tape station (Agilent Technologies). RNA quantity was measured by NanoDrop 1000 (NanoDrop Technologies) or Qubit 2.0 Fluorometer (Life Technologies). 1µg of total RNA with a RIN value≥8 was used for library preparation with TruSeq RNA sample preparation kit (Illumina) before sequencing on Illumina HiSeq2000 platform as described 9,10 .
Expression values are presented as log2 transcripts per million (TPM) after quality control, normalization and transformation as described 9 . Transcripts were categorized into four categories; transcripts with TPM<0.1 were considered as non-expressed and the remaining transcripts were equally categorized as low-, medium-and high-expressed based on their TPM value 11 . For the purpose of correlating ATAC-seq peaks with mRNA expression, genes were classified based on whether any of their annotated features (TSS, TTS, intron, exon, 3'UTR, 5'UTR) were within 1500 bp of a peak summit.

Overlapping ATAC-seq peaks with public ChIP-seq datasets
Easeq version 1.03 12 was used for co-localization analysis e.g. overlapping ATAC-seq peaks with histone modifications from ENCODE and TF binding 13 . Chi-square tests were used to analyze enrichment of ChIP-seq peaks overlapping with ATAC-seq peaks. Expected frequencies were set to 0.05 of the total number of ChIP-seq peaks found for each studied attribute. 4

Availability of data
Datasets supporting the conclusions of this article are available in the GEO repository, accession number GSE50398 and GSE129383 or are available upon request.