Main

Gene expression2,3 and genetic variation4 studies suggest gene-regulatory changes may underlie AD, but regulatory epigenetic alterations during neurodegeneration remain uncharacterized, given the inaccessible nature of human brain samples. To address this need, we profiled transcriptional and epigenomic changes during neurodegeneration in the hippocampus of the CK-p25 mouse model of AD5,6,7 and CK littermate controls at both early and late stages of neurodegeneration (2 weeks and 6 weeks after p25 induction). CK-p25 mice, in which accumulation of the Cdk5 activator protein p25 is inducible, exhibit DNA damage, aberrant gene expression and increased amyloid-β levels at early stages7, followed by neuronal and synaptic loss and cognitive impairment at late stages5,6.

For transcriptome analysis, we used RNA sequencing to quantify gene expression changes for 13,836 ENSEMBL genes (see Methods, Extended Data Fig. 1a and Supplementary Table 1). We found 2,815 upregulated genes and 2,310 downregulated genes in the CK-p25 AD mouse model as compared to CK littermate controls (at q < 0.01; Supplementary Table 1), which we classified into transient (2 weeks only), late-onset (6 weeks only) and consistent (both) expression classes (Fig. 1a, Extended Data Fig. 4a and Supplementary Table 1). These showed distinct functional enrichments (Fig. 1a and Supplementary Table 2), with transient-increase genes enriched in cell cycle functions (P < 10−92), consistent-increase genes enriched in immune (P < 10−10) and stimulus-response (P < 10−4) functions, and consistent- and late-decrease genes enriched in synaptic and learning functions (P < 10−12).

Figure 1: Conserved gene expression changes between mouse and human AD are associated with immune and neuronal functions.

a, Six distinct temporal classes of differentially expressed genes are denoted; transient (early) increase (pink) or decrease (light blue), consistent increase (red) or decrease (blue), and late (6 week) increase (dark red) or decrease (navy blue). Expression is shown relative to the mean of three replicates at 2-week control (CK) mice. Shown are the most significant distinct biological process Gene Ontology (GO) categories in each class of differentially regulated genes (asterisk denotes enrichment of hypergeometric P < 0.01). Grey boxes indicate no overlapping genes. b, T-statistic identifying the bias of each differentially regulated class of genes in AD cases relative to controls; negative t denotes lower expression in AD, positive t denotes higher expression in AD. c, Enrichment of Gene Ontology categories for differentially expressed genes between AD cases and controls in human2. d, e, Enrichment of each Gene Ontology category examined in the gene expression analysis was calculated for H3K4me3 promoters (pro.; red) (d) and H3K27ac enhancers (enh.; yellow) (e). Asterisk denotes categories with a binomial P < 0.01. f, g, Enrichment of regulatory motifs within changing promoters (top) (f) and enhancers (bottom) (g) in the mouse AD model. h, i, Overlap of changing promoters (top) (h) and enhancers (bottom) (i) with regions shown to be bound by immune (orange) and neuronal (purple) transcriptional factors (TF) and co-factors profiled using ChIP-seq in mouse immune and neuronal tissues15,16,17,18,19.

These coordinated neuronal and immune changes are consistent with the pathophysiology of AD2 and probably reflect both cell-type-specific expression changes and changes in cell composition. Indeed, comparison with expression in microglia8 (the resident immune cells of the brain) shows that both the cell type composition (P = 2.7 × 10−4) and microglia-specific activation (P = 2.9 × 10−6) significantly contribute to the gene expression changes (see Methods). Additionally, reverse transcription followed by quantitative PCR (RT–qPCR) of increased-level genes in purified CD11b+ CD45low microglia populations confirms cell-type-specific activation for five of the seven microglia-specific genes tested (Extended Data Fig. 2).

Confirming the biological relevance of our mouse model for human AD, the observed changes in gene expression in mouse, especially for the consistent and late classes, agreed with gene expression differences between 22 patients with AD and 9 controls in human post-mortem laser capture microdissected hippocampal grey matter2 (Fig. 1b). The enriched Gene Ontology classes also agreed between mouse and human, with higher immune gene expression and lower neuronal gene expression in patients with AD (Fig. 1c).

For epigenome analysis, we used chromatin immunoprecipitation sequencing (ChIP-seq) to profile seven chromatin marks9: histone 3 Lys 4 trimethylation (H3K4me3; associated primarily with active promoters); H3K4me1 (enhancers); H3K27 acetylation (H3K27ac; enhancer/promoter activation); H3K27me3 (Polycomb repression); H3K36me3 and H4K20me1 (transcription); and H3K9me3 (heterochromatin) (Extended Data Fig. 1a). We used ChromHMM (http://compbio.mit.edu/ChromHMM/) to learn a chromatin state model (Methods and Extended Data Fig. 3a) defined by recurrent combinations of histone modifications, consisting of promoters, enhancers, transcribed, bivalent, repressed, heterochromatin and low-signal states (Extended Data Fig. 3a). We defined 57,840 active promoters using H3K4me3 peaks within promoter chromatin states, and 151,447 active enhancer regions using H3K27ac peaks within enhancer chromatin states (Extended Data Fig. 1a, Supplementary Table 3 and Methods).

We mapped orthologous genes between mouse and human using ENSEMBL one-to-one orthologues (see Methods). We also mapped orthologous noncoding regions using multiple mammalian sequence alignments, mapping each mouse peak to its best human match (see Methods). We found matches for 90% of promoter regions, 84% of enhancers, 74% of Polycomb-repressed regions and 33% of heterochromatin regions (Supplementary Table 3). Comparing our mouse chromatin states to human hippocampus chromatin states10, we found significant epigenomic conservation at orthologous noncoding regions (Extended Data Fig. 3b), consistent with recent results11.

We quantified epigenomic changes in promoter regions using relative differences in H3K4me3 levels resulting in 3,667 increased-level and 5,056 decreased-level peaks (q < 0.01; Extended Data Fig. 4b and Supplementary Table 3), which we classified into transient, consistent and late-stage, as for gene expression changes. For enhancer regions, we used relative levels of H3K27ac, resulting in 2,456 increased-level and 2,154 decreased-level peaks (Extended Data Fig. 4c and Supplementary Table 3). Only a very small number of peaks showed differences in Polycomb-repressed and heterochromatin regions, leading us to focus on enhancer and promoter changes for the remaining analyses (Extended Data Fig. 4d, e and Supplementary Table 3).

Genes flanking increased- and decreased-level regulatory regions (see Methods) showed consistent gene expression changes for both promoter and enhancer regions (Extended Data Fig. 5), and were consistently enriched in immune and stimulus-response functions for increased-level enhancers and promoters, and in synapse and learning-associated functions for decreased-level enhancers and promoters (Fig. 1d, e), consistent with our Gene Ontology results for changing gene expression levels.

Increased- and decreased-level regulatory regions showed distinct regulatory motif enrichments (Fig. 1f, g). Increased-level peaks were enriched in NFκB, E2F, PPARG, IRF and PU.1 (ref. 12) transcription factor motifs for both enhancers and promoters, consistent with immune regulator targeting. Decreased-level peaks in enhancers were enriched for DNA-binding RFX motifs, and peaks in promoters were enriched for zinc-finger ZIC motifs, two known neurodevelopmental regulators13,14.

Consistent with the observed motif enrichments, increased-level enhancers and promoters showed in vivo binding of PU.1 in mouse embryos15,16 (Fig. 1h, i). Only increased-level promoters were bound in macrophages and BV-2 microglial-like cells17,18,19 that are both implicated in AD20, while both increased- and decreased-level promoters were bound in several immune cell lineages (Fig. 1h). The PU.1 regulator itself (encoded by the SPI1 gene) showed increased expression and enhancer levels (Extended Data Fig. 1b), possibly contributing to immune enhancer and promoter upregulation, consistent with roles for PU.1, ETS-1 and other ETS family members in microglia activation and proliferation during neurodegeneration21,22. By contrast, neuronal function regulators were not enriched in increased-level enhancers (except for a weak enrichment of fetal brain CREB; Fig. 1i), consistent with primarily immune and inflammatory function of these regions.

Decreased-level enhancers and promoters were targeted by different regulators, suggesting distinct regulatory programs. Decreased-level promoters were preferentially bound by CREB and SRF (P < 10−21 and P < 10−16), two known regulators of neuronal activity in cortical neurons23, and decreased-level enhancers were preferentially bound by CBP (Phypergeometric = 5.4 × 10−20), a known co-activator for neuronal activity16 (Fig. 1h, i). Surprisingly, p300-bound regions15 did not show any enrichment, suggesting distinct roles for CBP and p300, despite a general association with enhancers for both. The distinct neuronal and immune targeting of decreased-level and increased-level regulatory regions provides a mechanistic basis for the expression differences observed for neuronal and immune genes, and suggests potential therapeutic targets for reversing observed alterations during neurodegeneration.

On the basis of chromatin state annotations in 127 human cell types and tissues10 (Fig. 3a and Supplementary Table 4), regions orthologous to increased-level enhancers in mouse showed immune cell enhancer activity in human (P < 10−100), while orthologues of decreased-level enhancers in mouse showed fetal brain tissue enhancer activity in human (P < 10−8 consistent; P < 10−17 late-stage; Fig. 2a and Supplementary Table 4). Adult brain tissues (including hippocampus) were not as strongly enriched, suggesting changes are biased towards neuronal plasticity. These results are consistent with decreased neuronal plasticity, and increased microglial activation and proliferation during AD progression24.

Figure 2: AD GWAS loci are preferentially enriched in increasing enhancer orthologues with immune function.

a, Enrichment (y axis) of changing mouse AD enhancer orthologues, with a focus on consistently increasing (red) category of enhancers, in 127 cell and tissue types profiled by the Roadmap Epigenomics Consortium10 (columns). Roadmap samples are grouped into fetal brain (purple), adult brain (green), immune/blood cell types (orange) and all other (grey). b, Cell-type-specific fold luciferase reporter expression change relative to control (ctrl) for selected increasing enhancer regions in BV-2 microglia (orange) versus N2A neurons (purple) (n = 3, P < 0.05, two-tailed t-test). c, Enrichment of AD-associated SNPs (y axis, binomial P value) in human regions orthologous to the mouse enhancers. d, e, Enrichment of AD-associated SNPs (y axis, permutation P value) in tissue-specific enhancer annotations from the Roadmap Epigenomics Consortium (points), relative to their enrichment for consistently increasing (d) and consistently decreasing (e) orthologous enhancer regions in the mouse AD model (x axis, hypergeometric P value). Linear regression trend line and R2, based on Pearson correlation, is shown.

To verify whether the increased-level putative enhancer regions were indeed functional, we used a luciferase reporter assay to evaluate their ability to drive in vitro gene expression in immortalized murine microglial (BV-2) and neuroblastoma (N2a) cell lines. Eight of the nine increased-level human orthologues tested were indeed able to drive in vitro reporter expression. Two of these, BIN1 and ZNF710, were active in both cell types, while the remaining six showed a BV-2-cell-specific increase in luciferase expression (Fig. 2b and Supplementary Table 5), confirming both functional conservation and tissue specificity of increased-level enhancer regions implicated by our mouse model of AD.

Human orthologues of increased-level enhancers were also enriched for expression quantitative trait loci (eQTLs) in CD4+ T cells and CD14+ monocytes25,26 (Extended Data Fig. 6 and Supplementary Table 6), indicating that they contain driver mutations controlling immune cell regulatory programs. The enrichment was strongest for CD14+ monocytes (Extended Data Fig. 6), which also showed the highest enhancer enrichment and is consistent with the observed inflammatory response Gene Ontology category.

To test whether the implicated regulatory regions are causal, we examined their enrichment for AD-associated variants from genome-wide association studies (GWAS). Genetic variants associated with AD in a meta-analysis of 74,000 individuals4 were enriched in increased-level enhancer orthologues (Fig. 2c) (4.4-fold enrichment, binomial P = 1.2 × 10−10 at GWAS cutoff P < 0.001; 9.7-fold enrichment, binomial P < 3.7 × 10−6 at GWAS cutoff P < 10−5). By contrast, decreased-level enhancer orthologues were surprisingly not enriched (0.61-fold), suggesting a causal role specifically for immune-related processes. Promoter regions were only weakly enriched, strongly implicating distal enhancers in mediating AD predisposition (Extended Data Fig. 7).

Across diverse cell types and tissues, we found concordance between the enrichment for AD GWAS single nucleotide polymorphisms (SNPs) and the enrichment for increased-level enhancer orthologues (R2 = 0.49; Fig. 2d, Extended Data Fig. 8a, left and Supplementary Table 4), with CD14+ immune cells being the most enriched in both, followed by other immune cell types, and with fetal brain enhancers showing the smallest enrichment in both. By contrast, decreasing enhancer orthologues showed a very weak correlation (R2 < 0.08) (Fig. 2e, Extended Data Fig. 8b, right and Supplementary Table 4). The increased-level enhancer orthologue enrichment for AD GWAS SNPs persisted both within CD14+ enhancers (3.0-fold enrichment, binomial P = 1.3 × 10−5) and outside CD14+ enhancers (3.4-fold, P = 0.005), suggesting it is not solely a feature of CD14+ cell type enrichment (see Methods).

These results are consistent with enhanced microglial expression of CD14 in brains of animal models of AD, and a regulatory role of the CD14 receptor in microglial inflammatory response, which modulates amyloid-β deposition24. Thus, the enrichment of AD-associated variants in CD14+ primary immune cells, but not neuronal cells, indicates that AD genetic predisposition is primarily associated with immune function, while decrease in neuronal plasticity may be affected primarily by non-genetic effects, such as diet, education, physical activity and age, which are thought to lead to epigenetic changes related to cognitive reserve27.

We next used the epigenomic annotations of increased-level enhancer orthologues to gain insights into AD-associated loci (Supplementary Table 7). Among the 20 genome-wide significant AD-associated loci4, 11 contain no protein-altering SNPs in linkage disequilibrium (LD), indicating they may have noncoding roles. Of these, five localize within increased-level enhancer orthologues, including two well-established GWAS loci (PICALM and BIN1), and three loci (INPP5D, CELF1 (also containing the SPI1 gene) and PTK2B) only recently recognized as significant by combining all AD cohorts.

For INPP5D (Fig. 3a), a known regulator of inflammation28, the most significant variants localize within an increased-level enhancer orthologue, which also shows CD14+ enhancer activity. In the CELF1 locus (Fig. 3b) a large region of association spans several genes, but the strongest genetic signal (P = 2 × 10−6) localizes upstream of SPI1 (PU.1), and specifically within an increased-level enhancer orthologue that is also active in immune cells. We confirmed that the AD-associated C–T substitution, rs1377416, in the SPI1 enhancer leads to increased in vitro enhancer activity in murine BV-2 microglia cells using a luciferase reporter assay (Fig. 3d). In addition, the AD-associated SNP rs55876153 near SPI1, which overlaps an increased-level mouse enhancer orthologue, is in strong linkage disequilibrium (LD = 0.89, see Methods) with a known SPI1 eQTL, rs10838698 (ref. 25), even though it did not significantly alter enhancer activity in the luciferase assay.

Figure 3: Increasing enhancer orthologues help interpret AD-associated non-coding loci.

a–c, Overlap of disease-associated SNPs (top) with increasing enhancers (second row, red) and immune enhancers in human (CD14+ primary cells) is shown for genome-wide significant (INPP5D and CELF1 (containing the SPI1 gene); a and b) and below-significance (ABCA1; c) AD GWAS loci. Roadmap chromatin state annotations for immune cells (CD14+ primary; E029), hippocampus (E071) and fetal brain (E081), with colours as shown in the key. Light red highlight denotes increasing enhancer regions tested in luciferase assay. kb, kilobases; Mb, megabases. d, AD-associated SNP rs1377416 amplifies in vitro luciferase activity of putative enhancer region 38,313–37,359 base pairs (bp) upstream of SPI1 (PU.1) gene in BV-2 cells. n = 3, P < 0.0001, one-way analysis of variance (ANOVA); P < 0.01, Tukey’s multiple comparison post-hoc test. NS, not significant.

Outside known GWAS loci, an additional 22 weakly associated regions (3.9-fold, P < 4.9 × 10−7) contain variants within increased-level enhancer orthologues (Supplementary Table 7), of which 17 lack protein-altering variants in linkage disequilibrium (R2 < 0.4), providing strong candidates for directed experiments. One such example includes ABCA1 (P = 6.9 × 10−5; Fig. 3c), a paralogue of AD-associated ABCA7 and encoding a glial-expressed transporter that influences APOE metabolism in the central nervous system29. The region lacks protein-altering variants and all five SNPs in the cluster of association lie specifically within an increased-enhancer orthologue, which is also active in CD14+ immune cells and, to a lesser extent, in human hippocampus and fetal brain.

Overall, our study revealed contrasting changes in immune and neuronal genes and regulatory regions during AD-like neurodegeneration in mouse, strong human–mouse conservation of gene expression and epigenomic signatures, and enrichment of AD-associated loci in increased-level enhancer orthologues in human. While immune genes are known to be among the most significant genetic loci associated with AD, the depletion of neuronal promoters and enhancers is particularly notable for a cognitive disorder with well-established environmental and experiential factors that include diet, exercise, education and age. These results are consistent with a model in which increased immune susceptibility to environmental factors during ageing and cognitive decline is mediated by interactions between genetically driven immune cell dysregulation and environmentally driven epigenomic alteration in neuronal cells.

Our study also illustrates the power of model organisms for the study of human disease progression, especially for disorders affecting inaccessible tissues for which only post-mortem samples are available in human. We find that molecular changes in both genes and regulatory regions are highly conserved between human AD and CK-p25 neurodegeneration, enabling detailed studies of the molecular signatures associated with disease progression across diverse environmental conditions, in a variety of brain regions and cell types, and in response to therapeutic agents before or after disease onset.

Lastly, our results indicate specific therapeutic targets for AD, including putative causal nucleotides lying in increased-level enhancer orthologues that may be targeted by CRISPR/Cas9 genome editing30, and trans-acting regulators. In particular, the transcription factor PU.1 is implicated as a therapeutic target by its genetic association with AD, as well as the enrichment of the PU.1 motif and the PU.1 in vivo binding sites at increased-level regulatory regions during mouse neurodegeneration. The conservation of neuronal and immune regulatory circuitry between mouse and human suggests that CK-p25 mice may offer a powerful model for studying the gene-regulatory and cognitive effects of such interventions.

Methods

Animals

All mouse work was approved by the Committee on Animal Care of the Division of Comparative Medicine at MIT. Adult (3-month-old) female double-transgenic CK-p25 (ref. 5) mice and their respective control littermates were used for the experiments. Brain tissue was collected at either 2 or 6 weeks after p25 induction. Upon dissection tissue was flash-frozen in liquid nitrogen. No animals were excluded from the study and no randomization or blinding was required.

Chromatin immunoprecipitation

Mouse hippocampus was collected immediately after euthanasia. Chromatin immunoprecipitation was then performed as described in Broad ChIP protocol (http://www.roadmapepigenomics.org/protocols/type/experimental/). In brief, tissues were minced and crosslinked in 1% formaldehyde (Thermo Scientific) for 15 min at room temperature and quenched with glycine for 5 min (Sigma). The samples were homogenized in cell lysis buffer containing proteinase inhibitors (complete, Roche) and chromatin was then fragmented to a size range of 200–500 bp using a Branson 250 digital sonifier. Solubilized chromatin was then diluted and incubated with 1 μg antibody at 4 °C overnight. Immune complexes were captured with Protein-A-sepharose beads, washed and eluted. Enriched chromatin was then subjected to crosslink reversal and proteinase K digestion at 65 °C, phenol–chloroform extraction and ethanol precipitation. Isolated ChIP DNA was resuspended and quantified using the Qubit assay (Invitrogen). H3K4me1 (Abcam, ab8895), H3K4me3 (Millipore, 07-473), H3K9me3 (Abcam, ab8898), H3K27me3 (Millipore, 07-449), H3K27ac (Abcam, ab4729), H3K36me3 (Abcam, ab9050) and H4K20me1 (Abcam, ab9051) were used to immunoprecipitate endogenous proteins.

ChIP-seq high-throughput sequencing, read mapping and quality control

Sequencing libraries were prepared from 1–5 ng ChIP (or input) DNA as described previously31. Gel electrophoresis was used to retain library fragments between 300 and 600 bp. Before sequencing, libraries were quantified using Qubit (Invitrogen) and quality-controlled using Agilent’s Bioanalyzer. The 36-bp single-end sequencing was performed using the Illumina HiSeq 2000 platform according to standard operating procedures. For each histone modification, five biological replicate data sets were produced with corresponding whole-cell extract controls, except for H3K4me3, H4K20me1 and H3K27me3 in the 2-week control (CK) sample, where an optimal number of reads for sufficient coverage was obtained from four biological replicates. Reads were mapped to the mm9 reference mouse genome with MAQ v0.7.1-9 using default parameters32. Reads mapping to multiple locations were discarded. Duplicates were marked and filtered using PICARD (http://picard.sourceforge.net/). After filtering, roughly 55–60 million unique reads were obtained for each histone modification in each condition (9–12 million reads per replicate) and 110–145 million reads in total for the whole-cell extract controls in each condition. All replicate data sets passed quality control according to ENCODE ChIP-seq data standards, based on read quality, read mapping statistics, library complexity and strand cross-correlation analysis (to measure signal-to-noise ratios)33.

RNA sequencing

Mouse brains were homogenized and total RNA was extracted using Trizol reagent (Ambion). Total RNA was quality-controlled using Agilent’s Bioanalyzer and prepared for sequencing using Illumina’s TruSeq Stranded Total RNA Sample Preparation Kit with Ribo-Zero. High-throughput sequencing was performed on an Illumina HiSeq 2000 platform. Roughly 15 million 76-bp paired-end reads were generated for each data set. Sequence reads were aligned to the mouse mm9 genome with Bowtie. On the basis of the reproducibility of the results (Fig. 2a), three replicate biological data sets were produced for each condition. A small number of replicates suffices for RNA sequencing (RNA-seq) studies34 and we were able to detect large-scale changes in read counts in coherent gene ontology categories, with similarities to human AD (Fig. 2c, d). Therefore, we decided that additional replicates were not necessary.

Peak calling and signal coverage tracks for ChIP-seq data

For each histone modification in each condition, mapped reads were pooled across ChIP-seq replicates and regions of enrichment (peaks) were identified for the pooled ChIP-seq data set relative to the pooled control using the MACS2 peak caller (version 2.0.10.20130712)35 (https://github.com/taoliu/MACS/) with a relaxed P value threshold of 0.01. For each histone modification, overlapping peaks (at least 1 bp overlap) were merged across all conditions to obtain a non-redundant master list of regions of enrichment. Master lists of broad domains of enrichment for the diffuse marks H3K27me3, H3K9me3, H3K36me3 and H4K20me1 were obtained by additionally merging peaks across conditions that were within 1 kb of each other. Genome-wide signal coverage tracks representing per-base fold enrichment and the likelihood ratio of ChIP relative to control were also computed using MACS2.
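The merging step can be illustrated with a short Python sketch. This is not the authors' code: the function name merge_peaks, the max_gap parameter and the half-open (chromosome, start, end) tuples are hypothetical choices made for illustration. With max_gap left unset, only peaks sharing at least 1 bp are merged; with max_gap=1000, peaks within 1 kb are merged as described for the broad marks.

```python
def merge_peaks(peaks, max_gap=None):
    """Merge (chrom, start, end) intervals with half-open coordinates.

    Peaks on the same chromosome are merged when they overlap by at least
    1 bp, or, if max_gap is given, when the gap between them is at most
    max_gap bp. Names and coordinate convention are illustrative assumptions.
    """
    merged = []
    for chrom, start, end in sorted(peaks):
        if merged:
            p_chrom, p_start, p_end = merged[-1]
            overlaps = start < p_end
            nearby = max_gap is not None and start - p_end <= max_gap
            if chrom == p_chrom and (overlaps or nearby):
                # Extend the previous region instead of starting a new one.
                merged[-1] = (p_chrom, p_start, max(p_end, end))
                continue
        merged.append((chrom, start, end))
    return merged


# Peaks pooled from all conditions for one histone mark (toy values).
pooled = [
    ("chr1", 100, 200),
    ("chr1", 150, 300),
    ("chr1", 300, 400),
    ("chr1", 1200, 1500),
    ("chr2", 100, 200),
]
narrow_master = merge_peaks(pooled)               # overlap of >= 1 bp only
broad_master = merge_peaks(pooled, max_gap=1000)  # also merge within 1 kb
```

In this toy input the adjacent peaks chr1:150–300 and chr1:300–400 share no base under half-open coordinates, so they stay separate in the narrow list but are joined in the broad list.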

Learning combinatorial chromatin states

We used ChromHMM to learn combinatorial chromatin states jointly across all four conditions36. ChromHMM was trained using all seven chromatin marks in virtual concatenation mode across all conditions. Reads from replicate data sets were pooled before learning states. The ChromHMM parameters used are as follows: reads were shifted in the 5′ to 3′ direction by 100 bp; for each ChIP-seq data set, read counts were computed in non-overlapping 200-bp bins across the entire genome; each bin was discretized into two levels, 1 indicating enrichment, and 0 indicating no enrichment. The binarization was performed by comparing ChIP-seq read counts to corresponding whole-cell extract control read counts within each bin and using a Poisson P value threshold of 1 × 10−4 (the default discretization threshold in ChromHMM). We trained several models with the number of states ranging from 12 to 23. We decided to use a 14-state model for all further analyses as it captured all the key interactions between the chromatin marks, and larger numbers of states did not capture significantly new interactions. To assign biologically meaningful mnemonics to the states, we used the ChromHMM package to compute the overlap and neighbourhood enrichments of each state relative to coordinates of known gene annotations. The trained model was then used to compute the posterior probability of each state for each genomic bin in each condition. The regions were labelled using the state with the maximum posterior probability. The chromatin state models and browser tracks can be downloaded from http://www.broadinstitute.org/~anshul/projects/liz/segmentation/results/S14/webpage_14.html.
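The idea behind the binarization can be sketched in a few lines of Python. This is a simplified illustration rather than ChromHMM's own implementation: it assumes that the expected count in a bin is the control count plus a pseudocount, rescaled by the ratio of total ChIP to total control reads, and the function names poisson_sf and binarize are hypothetical. ChromHMM's actual background model may differ in detail.

```python
import math


def poisson_sf(k, lam):
    """Upper tail P(X >= k) for X ~ Poisson(lam), computed by direct summation."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)  # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= lam / i    # P(X = i) from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


def binarize(chip_counts, control_counts, threshold=1e-4, pseudocount=1.0):
    """Return 1 for bins whose ChIP count is significantly above background.

    Assumption for illustration: the background rate of a bin is
    (control count + pseudocount) * (total ChIP reads / total control reads).
    """
    scale = sum(chip_counts) / sum(control_counts)
    calls = []
    for chip, ctrl in zip(chip_counts, control_counts):
        lam = (ctrl + pseudocount) * scale
        calls.append(1 if poisson_sf(chip, lam) < threshold else 0)
    return calls


# Four 200-bp bins (toy counts): only the second bin is clearly enriched.
calls = binarize([2, 30, 3, 1], [3, 2, 4, 3])
```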

Differential analysis and visualization

We used the DESeq2 method that models read count statistics from replicates across multiple conditions to identify differentially expressed genes and regions of enrichment of histone marks37. Our procedures are consistent with the standards for ChIP-seq and RNA-seq analysis determined by rigorous benchmarking as a part of the ENCODE project33. The minimal recommended depth for sufficient sensitivity of peak detection for histone marks for the human or mouse genome is 20 million mapped reads33. However, owing to the limited amount of starting material obtained from a single mouse, we obtained 10 million unique mapped reads from each biological replicate. Directly using read counts from the original replicates would result in significant loss of power to detect differential events. To improve sensitivity, for each histone mark in each condition, we pooled mapped reads from all replicates and created a pair of pseudo-replicates with equal numbers of reads (30 million) by randomly subsampling (without replacement) from the pool. Reads were then extended to the predominant fragment length. Extended-read counts were computed within all regions in the master peak list of a histone mark for all pseudo-replicates in all conditions and the table of counts was used as input to DESeq2. The raw data are available online (NCBI GEO GSE65159).
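The pseudo-replicate construction can be sketched as follows. This is an illustrative Python example, not the pipeline actually used: the function name make_pseudoreplicates and the fixed seed are hypothetical, and it assumes that each pseudo-replicate is drawn independently from the pool (without replacement within a draw), since the text does not state whether the two draws were required to be disjoint.

```python
import random


def make_pseudoreplicates(replicates, reads_per_pseudorep, seed=0):
    """Pool reads from all replicates and draw two equal-sized subsamples.

    Each subsample is drawn without replacement from the pooled reads.
    Assumption for illustration: the two draws are independent of each other.
    """
    pool = [read for rep in replicates for read in rep]
    if reads_per_pseudorep > len(pool):
        raise ValueError("not enough pooled reads for the requested depth")
    rng = random.Random(seed)
    first = rng.sample(pool, reads_per_pseudorep)
    second = rng.sample(pool, reads_per_pseudorep)
    return first, second


# Toy example: three replicates of unequal depth, reads represented by integers.
replicates = [list(range(0, 10)), list(range(10, 25)), list(range(25, 30))]
pseudo_a, pseudo_b = make_pseudoreplicates(replicates, reads_per_pseudorep=12, seed=1)
```

In practice the same operation would be applied to aligned read records rather than integers, with the target depth set to 30 million.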

For RNA-seq data, the numbers of reads overlapping ENSEMBL gene models38 were determined by HTSeq (http://www-huber.embl.de/users/anders/HTSeq/). The raw data are available online (NCBI GEO GSE65159). To ensure that the genes we chose were sufficiently quantifiable, we removed every gene for which fewer than 20 reads were found across all samples. The resulting set of genes is given in Supplementary Table 1.
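As a minimal sketch of this filter in Python, assuming that "across all samples" refers to the total read count summed over samples (the alternative reading, fewer than 20 reads in every individual sample, would need a different test), and with the hypothetical function name filter_low_count_genes:

```python
def filter_low_count_genes(counts, min_total=20):
    """Keep genes whose read count summed over all samples is at least min_total.

    counts maps a gene identifier to a list of per-sample read counts.
    Summing across samples is an assumption made for illustration.
    """
    return {gene: c for gene, c in counts.items() if sum(c) >= min_total}


# Toy count table with four samples per gene.
counts = {
    "geneA": [0, 3, 5, 2],     # total 10, removed
    "geneB": [10, 12, 9, 15],  # total 46, kept
    "geneC": [5, 5, 5, 5],     # total 20, kept (not fewer than 20)
}
kept = filter_low_count_genes(counts)
```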

IGV39 is used to visualize the histone marks, gene expression, chromatin state and AD GWAS data relative to the RefSeq gene model. Gene expression levels shown are raw read density. Levels of histone marks plotted are the log-likelihood ratio of ChIP signal relative to whole-cell extract control.

Within the DESeq2 framework of generalized linear models, we used a combination of different models to determine the significantly regulated genes and significantly regulated histone mark levels. We compared the set of all 2-week and 6-week controls to the three following groups: (1) the 2-week CK-p25 samples; (2) the 6-week CK-p25 samples; (3) a group containing both the 2-week and 6-week samples. The first two tests identified changes that might be 2-week or 6-week specific. The third test identified changes that might be too subtle to detect at any one time point alone. In each case, the most basic equation (count ≈ CKp25 status) was used, but for a subset of samples. A stringent threshold of q < 0.01 (Benjamini–Hochberg) was used to determine significantly changing gene expression levels and histone mark levels. Next, to determine the temporal bias of gene expression levels and histone marks we built another model (count ≈ time), which compared the 2-week and 6-week CK-p25 samples. Levels considered likely to change (q < 0.5) were categorized as transient (2-week bias) or late-stage (6-week bias). The results of the RNA-seq analysis are found in Supplementary Table 1, while the results of the histone mark analysis are in Supplementary Table 2.
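The decision logic described above can be sketched in Python. This is an illustration, not the analysis code: DESeq2 itself computes the adjusted values, and the function names, argument names and the sign convention for temporal bias are assumptions. In particular, lfc_time is taken here to be the log fold change of 6-week relative to 2-week CK-p25 samples, and a "2-week bias" is interpreted as the change being stronger at 2 weeks than at 6 weeks.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted q-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotone q-values.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        qvalues[i] = running_min
    return qvalues


def classify_change(q_tests, lfc_disease, q_time, lfc_time, sig=0.01, bias=0.5):
    """Assign one of the six temporal classes, or 'unchanged'.

    q_tests     -- q-values from the 2-week, 6-week and combined comparisons
    lfc_disease -- log fold change of CK-p25 relative to control
    q_time      -- q-value of the 2-week versus 6-week CK-p25 comparison
    lfc_time    -- assumed: log fold change of 6-week relative to 2-week CK-p25
    """
    if min(q_tests) >= sig:
        return "unchanged"
    direction = "increase" if lfc_disease > 0 else "decrease"
    if q_time >= bias:
        return "consistent " + direction
    # Assumed convention: the change is 'late' when it grows in the same
    # direction between 2 and 6 weeks, and 'transient' when it recedes.
    grows = lfc_time > 0 if direction == "increase" else lfc_time < 0
    return ("late " if grows else "transient ") + direction


q = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
label = classify_change((0.001, 0.2, 0.005), lfc_disease=1.5, q_time=0.1, lfc_time=-1.0)
```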

For the histone modifications, we defined promoters using H3K4me3 peaks labelled with the promoter state annotation under any of the conditions (CK-p25 or control, and 2 or 6 weeks). We define enhancers based on peaks of H3K27ac labelled by the enhancer chromatin state. We define Polycomb-repressed regions based on H3K27me3 peaks labelled by the Polycomb-repressed chromatin state. Our definitions are consistent with known roles of these histone modifications40. Defining the boundaries of the regulatory regions using the peaks of the relevant histone modifications, and not the chromatin states, maximizes our power to detect changes in histone mark levels.

Pathway and Gene Ontology analyses for the gene expression data were performed using DAVID41,42. We present the most significant biological process gene ontology category result as well as a subset of non-redundant less significant categories that still pass our significance threshold (q < 0.01). For the regulatory regions, GREAT (with default parameters) was used to find the fold enrichment in the same Gene Ontology categories43.

Statistical framework for comparing CK-p25 changing genes and regulatory regions to other data sets

A common theme throughout the analysis is the characterization of regulatory regions that change in the CK-p25 mouse model. The most stringent control for this characterization is genes or regions of the same type that do not change in CK-p25. Owing to the six categories of direction (increasing and decreasing) and temporal pattern (transient, consistent and late-stage), we chose a discrete statistical framework as opposed to trying to define a ranking across these different conditions. To measure the overlap between these discrete categories and other discrete data sets, we could use either a hypergeometric P value or a binomial P value. For every test in the material described below, we computed both significance values and obtained consistent results, with only minor differences in exact P value. In general, we chose the hypergeometric test, which is the most direct way to assess the overlap of annotated regions. In contrast to the overlap of the CK-p25 mouse categories with other ChIP-seq peaks, the overlap with transcription factor binding site motifs or SNPs can be thought of as sampling with replacement, which lends itself to the binomial P value. No power analysis was done to estimate sample size.
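The two overlap statistics discussed above can be written out explicitly. The following is a minimal sketch of the upper-tail hypergeometric and binomial P values using exact integer arithmetic; the function names and the example counts are hypothetical and are not taken from the study.

```python
from math import comb


def hypergeom_upper_tail(k, total, annotated, drawn):
    """P(X >= k) when `drawn` regions are sampled without replacement
    from `total` regions, of which `annotated` carry the annotation."""
    upper = min(annotated, drawn)
    favourable = sum(
        comb(annotated, i) * comb(total - annotated, drawn - i)
        for i in range(max(k, 0), upper + 1)
    )
    return favourable / comb(total, drawn)


def binom_upper_tail(k, trials, p):
    """P(X >= k) for `trials` independent draws with success probability p
    (sampling with replacement)."""
    return sum(
        comb(trials, i) * p**i * (1 - p) ** (trials - i)
        for i in range(max(k, 0), trials + 1)
    )


# Illustrative counts: 40 of 200 changing enhancers overlap an annotation
# that covers 1,000 of 10,000 enhancers overall.
p_hyper = hypergeom_upper_tail(40, 10_000, 1_000, 200)
p_binom = binom_upper_tail(40, 200, 1_000 / 10_000)
```

When the changing set is small relative to the background, as in this example, the two values are close, which is consistent with the agreement between tests noted above.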

Comparison of histone marks and gene expression

As described above, DESeq2 was used to determine the log fold change in expression at 2 and 6 weeks in CK-p25 mice relative to control. Each enhancer and promoter was mapped to the closest ENSEMBL gene model based on distance to the transcription start site. For each category of histone mark direction and temporal pattern, we examined the enrichment of each category of CK-p25 gene expression change relative to unchanging genes. The significance of the enrichment was calculated using a hypergeometric test.
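The region-to-gene assignment can be sketched as a nearest-neighbour lookup on sorted transcription start sites. This sketch assumes that each region is represented by its midpoint, that the lookup is done separately per chromosome, and that ties go to the upstream gene; none of these details is specified above, and the function name and coordinates are hypothetical.

```python
from bisect import bisect_left


def closest_gene(region_mid, tss_positions, gene_ids):
    """Return the gene whose TSS is nearest to `region_mid`.

    `tss_positions` must be sorted ascending (one chromosome) and
    `gene_ids` is the parallel list of gene identifiers.
    """
    i = bisect_left(tss_positions, region_mid)
    # Only the TSSs immediately left and right of the midpoint can be closest.
    candidates = [j for j in (i - 1, i) if 0 <= j < len(tss_positions)]
    best = min(candidates, key=lambda j: abs(tss_positions[j] - region_mid))
    return gene_ids[best]


# Hypothetical TSS coordinates and gene labels on a single chromosome.
tss = [100, 500, 900]
genes = ["geneA", "geneB", "geneC"]
assigned = closest_gene(450, tss, genes)
```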

Identification of orthologous human regions

The promoter (H3K4me3 peaks annotated as transcription start site by chromatin state), enhancer (H3K27ac peaks annotated as enhancer by chromatin state) and Polycomb-repressed regions (H3K27me3 peaks annotated as Polycomb-repressed by chromatin state) were mapped to the human genome. BED files representing the coordinates of these peaks in mm9 were mapped to mm10 using liftOver44. Those peaks were then mapped to the human genome using the UCSC multiple alignment chain files (http://hgdownload.soe.ucsc.edu/goldenPath/mm10/multiz60way/)45. More specifically, the alignments that overlap the mouse peak and include hg19 were extracted. We calculated the human–mouse pairwise alignment for each multiple alignment using the ‘globalms’ function of biopython (http://biopython.org/, version 1.59; python version 2.71). The highest-scoring pairwise alignment formed the basis of the orthologous region in human. This region was extended on either side using lower-scoring multiple alignments. The orthologous region in hg19 was required to be greater than 30 bp and no more than twice the length of the region in mouse. The mean conservation was examined using the phastCons score across placental mammals46 based on the same 60-way multiple sequence alignment. The mapped enhancer regions were annotated with their chromatin state in human hippocampus, and across all 127 cell types and tissues, using BEDTools47. The information from human tissues was collected according to protocols described in more detail in the companion publication as a part of the Roadmap Epigenomics project10 (http://www.roadmapepigenomics.org/). The protocols are approved by the NIH and no sequence information from identifiable subjects is provided.
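The length criterion applied to candidate orthologous regions can be expressed as a simple predicate. The sketch below assumes BED-style half-open coordinates and covers only the final length filter, not the alignment or extension steps; the function name and parameter names are hypothetical.

```python
def passes_length_filter(mouse_start, mouse_end, human_start, human_end,
                         min_len=30, max_ratio=2.0):
    """Keep a human region only if it is longer than `min_len` bp and at
    most `max_ratio` times the length of the mouse region.

    Coordinates are assumed to be BED-style (0-based, end-exclusive).
    """
    mouse_len = mouse_end - mouse_start
    human_len = human_end - human_start
    return human_len > min_len and human_len <= max_ratio * mouse_len


# A 100-bp mouse peak mapped to a 150-bp human region passes the filter.
keep = passes_length_filter(0, 100, 5_000, 5_150)
```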

Computational analysis of cell type proportion

To estimate computationally the relative composition of the neural and immune cell types, we compared the changing expression patterns in our data set to a set of established cell-type-specific markers48,49,50. This analysis indicates that cell type composition is likely to be changing in the CK-p25 mouse model, consistent with a known decrease in the number of neurons and astrogliosis at 6 weeks5. In summary, a transient enrichment of monocyte-specific transcripts was observed at 2 weeks, a consistent enrichment of microglial-specific transcripts was observed at 2 and 6 weeks, while astrocyte-, oligodendrocyte- and endothelial-specific markers were primarily increased at 6 weeks (Extended Data Fig. 9a, b). We could also detect a signature of neuronal loss, primarily at 6 weeks as well (Extended Data Fig. 9a, b). On the basis of these results alone, it is possible that changes in cell type composition are contributing to some of the differences we observe in our mouse model.

We also compared our data to a published study of microglial activation in another mouse model of AD8, to dissect out computationally the changes that are probably due to cell type proportion versus changes due to activation within cells. If the changes in our mouse model were primarily due to cell type proportion, then the increase we observed in the CK-p25 mice should be proportional to the expression level of those genes in microglia. If the changes we observed were primarily due to activation, then the changes we observe in the CK-p25 mouse should be proportional to the amount of activation found during neurodegeneration8. Using the genes with published gene expression changes during activation8, we modelled these two possibilities as a linear regression problem and examined the relative significance of both hypotheses in the R programming language: CK-p25 log fold change ≈ microglial expression + microglial activation log fold change. We found that the changes in the CK-p25 mice were significantly related to the changes in cell activation (P = 2.9 × 10−6) as well as the changes in cell type proportion (P = 2.7 × 10−4), suggesting that both cell activation and composition changes occur.
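The structure of this two-predictor regression can be sketched in a few lines. The analysis itself was run in R; the Python sketch below only shows how the coefficients of a model of the form y ≈ b0 + b1·x1 + b2·x2 are obtained by least squares, and does not compute the P values reported above. The function name and the toy data are hypothetical.

```python
def fit_two_predictor_ols(y, x1, x2):
    """Least-squares fit of y ~ b0 + b1*x1 + b2*x2; returns (b0, b1, b2).

    Here y would hold CK-p25 log fold changes, x1 microglial expression
    and x2 microglial activation log fold changes (one value per gene).
    """
    rows = [(1.0, a, b) for a, b in zip(x1, x2)]
    # Normal equations (X^T X) beta = X^T y as an augmented 3x4 matrix.
    A = [
        [sum(r[i] * r[j] for r in rows) for j in range(3)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(3)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(3):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return tuple(A[i][3] / A[i][i] for i in range(3))


# Toy data generated from y = 1 + 2*x1 + 3*x2 (illustrative only).
coefficients = fit_two_predictor_ols([4, 3, 11, 10], [0, 1, 2, 3], [1, 0, 2, 1])
```

The sketch assumes the two predictors are not collinear; a singular design would cause a division by zero in the elimination step.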

Comparison of gene expression in mouse model and human AD

To examine the relationship between AD in the mouse model and human, we mapped each 1–1 orthologous gene from mouse to human in ENSEMBL (http://www.biomart.org/)51. For each category of expression change in mouse, we examined how that set of genes behaved in human AD cases relative to controls in whole hippocampus52 as well as laser capture microdissected hippocampal grey matter2. To make this comparison we first downloaded both data sets from GEO (GSE1297 and GSE28146), applied a variance stabilization normalization, and then used limma53 to find the log fold change in expression of all cases relative to controls. For each category of mouse gene expression, we calculated a P value based on a t-test for the bias of genes to increase or decrease in human AD relative to control. Because the original study52 had more confounders owing to changes in grey/white matter proportion, we focused our analysis on the 22 cases and 9 controls from the laser capture samples2.
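One way to quantify the bias of a mouse gene class in human AD is a one-sample t statistic on the human log fold changes of its orthologues, tested against zero. The exact form of the t-test is not stated above, so the one-sample form is an assumption here, and the function name and values are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def one_sample_t(log_fold_changes):
    """t statistic for the mean log fold change differing from zero.

    Positive t indicates higher expression in AD cases, negative t lower.
    Requires at least two values with non-zero variance.
    """
    n = len(log_fold_changes)
    return mean(log_fold_changes) / (stdev(log_fold_changes) / sqrt(n))


# Hypothetical human log fold changes for the orthologues of one mouse class.
t_value = one_sample_t([1.0, 2.0, 3.0])
```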

Enrichment of cofactors and transcription factors

Peaks representing both neural15,16 and immune17,18,19 enhancers or transcription factor binding were used to annotate the H3K27ac enhancers and H3K4me3 promoters. We used a hypergeometric test to evaluate whether or not these external annotations were enriched in the set of increased-level or decreased-level enhancers relative to the enhancers whose levels do not change. This same procedure was used to look at the enrichment of the CK-p25 enhancer orthologues in Roadmap Epigenome data. In this case, only enhancers that map to human are taken to be the background.

The putative binding sites based on transcription factor binding site motifs were identified independent of conservation and have been previously published54. The transcription factor binding sites were further clustered based on similarity55. The less significant of two statistical tests was used as a stringent measure of binding site enrichment. (1) The real transcription factor binding site motifs in the category of interest were compared to shuffled control motifs that preserved nucleotide content. (2) The real transcription factor binding site motifs in the category of interest were compared to the real motifs in enhancers that are stable in the CK-p25 mice. To estimate the significance for test (1), we estimated the probability of finding a site per base pair and used a binomial P value, because the length distribution of changing regulatory regions differs from that of unchanging regions. To estimate the significance for test (2), we used a hypergeometric test. After identifying significant transcription factor binding sites in categories of regulatory regions, we collapsed the results into clusters of almost identical motifs, representing families. The group members can be found in a companion manuscript10 as well as online (http://www.broadinstitute.org/~pouyak/motifs-table/).

Luciferase reporter assay

A total of 14 oligonucleotide gBlocks (IDT), ranging from 500 to 1,000 nucleotides in length and corresponding to 10 enhancer regions, were synthesized. Each gBlock contained a constant 5′-GCTAGCCTCGAGGAT and 3′-ATCAAGATCTGGCCT region, for direct cloning into an EcoRV (NEB) linearized minimal promoter firefly luciferase vector pGL4.23[luc2/minP] (Promega). The resulting reporter constructs were verified by DNA sequencing. BV-2 cells were provided by B. Yankner. N2a cells were purchased from the American Type Culture Collection and maintained following their protocols. In brief, cells were grown in RPMI-1640 and DMEM respectively, supplemented with 10% FBS and 1% penicillin/streptomycin, and split 1:10 every 3 days. Cells were seeded into 24-well plates 1 day before transfection. Transfections into BV-2 and N2a cells were performed with 1 μg of a pGL4.23 plasmid and 200 ng of Renilla luciferase construct pGL4.74[Rluc/TK] (Promega). Luciferase activities were measured 24 h after transfection using the Dual-Glo Luciferase Assay (Promega) and an EnVision 2103 Multilabel Plate Reader (PerkinElmer) and normalized to Renilla luciferase activity. All assays were performed in triplicate.
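The normalization step amounts to dividing each firefly reading by the Renilla reading from the same well and summarizing across the triplicate. The sketch below assumes the triplicate is summarized by the mean of the per-well ratios, which is not stated above; the function name and readings are hypothetical.

```python
from statistics import mean


def normalized_luciferase(firefly, renilla):
    """Mean firefly/Renilla ratio across replicate wells.

    `firefly` and `renilla` are parallel lists of raw luminescence
    readings, one pair per well.
    """
    ratios = [f / r for f, r in zip(firefly, renilla)]
    return mean(ratios)


# Hypothetical raw readings from one construct measured in triplicate.
activity = normalized_luciferase([200.0, 300.0, 400.0], [100.0, 100.0, 200.0])
```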

Microglia isolation

The 2-week-induced CK-p25 mice and age-matched controls were perfused with 50 ml PBS to wash away blood and minimize macrophage contamination in the brains. Hippocampal tissue was collected immediately after perfusion and a single-cell suspension was prepared as described previously56. FACS was then used to purify CD11b+ CD45low microglia cells using allophycocyanin (APC)-conjugated CD11b mouse clone M1/70.15.11.5 (Miltenyi Biotec, 130-098-088) and phycoerythrin (PE)-conjugated CD45 antibody (BD Pharmingen, 553081). Cells were collected directly into RNA lysis buffer (Qiagen, 74104).

cDNA synthesis and qPCR

Total RNA was extracted using the RNeasy Mini kit (Qiagen, 74104) according to the manufacturer’s instructions. RNA concentration and purity were determined using Agilent’s Bioanalyzer, and RNA was reverse transcribed using the iScript cDNA Synthesis Kit (Bio-Rad, 170-8891). For gene expression analysis, cDNA from three biological replicates was quantitatively amplified on a thermal cycler (Bio-Rad) using SYBR green (Bio-Rad) and gene-specific primers (Supplementary Table 8). The comparative Ct method57 was used to examine differences in gene expression. Values were normalized to expression levels of Cd11b (also known as Itgam). Three technical replicates were used for each gene.
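The comparative Ct calculation can be sketched as follows, with Cd11b as the reference gene. The sketch assumes 100% amplification efficiency (a doubling per cycle), which is the standard simplification of the 2^−ΔΔCt form; the function name and Ct values are hypothetical.

```python
def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_control, ct_ref_control):
    """Fold change of a target gene in sample versus control (2^-ddCt).

    Each Ct is normalized to the reference gene (here Cd11b) measured
    in the same condition before the two conditions are compared.
    """
    delta_sample = ct_target_sample - ct_ref_sample
    delta_control = ct_target_control - ct_ref_control
    return 2 ** -(delta_sample - delta_control)


# Hypothetical mean Ct values: the target crosses threshold two cycles
# earlier, relative to Cd11b, in CK-p25 microglia than in control.
fold_change = relative_expression(24.0, 20.0, 26.0, 20.0)
```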

eQTL analysis

The human orthologous regions of mouse enhancers that change in the CK-p25 mouse were compared to control regions for enrichment of overlap with regulatory SNPs from published eQTL studies in immune cell types under a variety of conditions25,26. Because the eQTLs were processed separately, we applied our own threshold (P < 1 × 10−4). We then calculated the enrichment of human orthologues of different categories of CK-p25 enhancers relative to stable regions and used a binomial P value to estimate the significance.

Enrichment of AD GWAS SNPs in Roadmap enhancers

The enrichment of AD GWAS SNPs that map to Roadmap enhancer regions was calculated on the basis of permutations of SNPs. In brief, SNPs were permuted 1,000,000 times, preserving distance to gene, minor allele frequency and the number of SNPs in LD. The 1000 Genomes Project database was used as the reference for this information.
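Given the overlap counts from the permuted SNP sets, an empirical P value can be computed as the fraction of permutations that match or exceed the observed overlap. The sketch below uses the common add-one correction so that the estimate is never exactly zero; whether that correction was applied here is not stated, and the function name and counts are hypothetical. The matching of permuted SNPs on distance to gene, allele frequency and LD is not shown.

```python
def empirical_p(observed, permuted):
    """Empirical upper-tail P value from permutation results.

    `observed` is the real overlap count and `permuted` the overlap
    counts from each matched permutation of the SNP set.
    """
    exceed = sum(1 for value in permuted if value >= observed)
    # Add-one correction: counts the observed data as one permutation.
    return (exceed + 1) / (len(permuted) + 1)


# Hypothetical overlap counts from four permutations.
p_value = empirical_p(5, [1, 2, 5, 7])
```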

Comparison of regulatory regions to AD meta-analysis

The enrichment of CK-p25 human enhancer orthologues in AD was calculated by comparing the number of changing regions that overlap SNPs4 to the number of unchanging regions that overlap SNPs. We calculated the significance using a binomial P value, in which the probability of success in the changing enhancers is based on the frequency in the unchanging enhancers. The results for the consistently increasing enhancers were slightly more significant when using a hypergeometric test instead of the binomial. To test whether the enrichment of increasing enhancer orthologous regions was due to the overlap with CD14+ cell enhancers, we repeated the above enrichment procedure within the set of CK-p25 enhancer orthologues that also overlap CD14+ cell enhancers. The enrichment using this control was still significant (3.0-fold enrichment, binomial P = 1.3 × 10−5). AD GWAS SNPs that were in mouse enhancer orthologues were expanded using an LD of 0.8 and then tested for potential coding SNPs58 or eQTLs (Supplementary Table 7).