Large eQTL meta-analysis reveals differing patterns between cerebral cortical and cerebellar brain regions

The availability of high-quality RNA-sequencing and genotyping data of post-mortem brain collections from consortia such as CommonMind Consortium (CMC) and the Accelerating Medicines Partnership for Alzheimer’s Disease (AMP-AD) Consortium enable the generation of a large-scale brain cis-eQTL meta-analysis. Here we generate cerebral cortical eQTL from 1433 samples available from four cohorts (identifying >4.1 million significant eQTL for >18,000 genes), as well as cerebellar eQTL from 261 samples (identifying 874,836 significant eQTL for >10,000 genes). We find substantially improved power in the meta-analysis over individual cohort analyses, particularly in comparison to the Genotype-Tissue Expression (GTEx) Project eQTL. Additionally, we observed differences in eQTL patterns between cerebral and cerebellar brain regions. We provide these brain eQTL as a resource for use by the research community. As a proof of principle for their utility, we apply a colocalization analysis to identify genes underlying the GWAS association peaks for schizophrenia and identify a potentially novel gene colocalization with lncRNA RP11-677M14.2 (posterior probability of colocalization 0.975).


Introduction
Defining the landscape of genetic regulation of gene expression in a tissue-specific manner is useful for understanding both normal gene regulation and how variation in gene expression can alter disease risk. In the latter case, a variety of approaches now leverage the association between genetic variants and gene expression changes, including colocalization analysis 1-7 , transcriptome-wide association studies (TWAS) 8,9 , and gene regulatory network inference [10][11][12][13][14][15][16] .
There has been a relative lack of expression quantitative trait loci (eQTL) studies from the brain. Because of the more accessible nature of tissues such as blood or lymphoblastoid cell lines (LCLs), much of the large-scale identification of expression quantitative trait loci (eQTL) has occurred in these tissues [17][18][19][20] . For most other tissues, obtaining samples for RNA sequencing (RNA-seq) requires invasive biopsy, and brain tissues are typically only available in post-mortem brain samples. One effort, the Genotype-Tissue Expression (GTEx) project 21,22 , has profiled a broad range of tissues (42 distinct) for eQTL discovery, however, samples sizes in brain have been small

Results
We generated eQTL from the publicly available AMP-AD (ROSMAP 27,28,35,37 , Mayo RNAseq 29,[38][39][40] ) and CMC 23,24,26 (MSSM-Penn-Pitt 24,26 , HBCC 26 ) cohorts with available genotypes and RNA-seq data, using a common analysis pipeline (Supplementary Table 1) (https://www.synapse.org/#!Synapse:syn17015233). Analyses proceeded separately by cohort. Briefly, the RNA-seq data were normalized for gene length and GC content prior to adjustment for clinical confounders, processing batch information, and hidden confounders using Surrogate Variable Analysis (SVA) 41 . Genes having at least 1 count per million (CPM) in at least 50% of samples were retained for downstream analysis (Supplementary Table 2). Genotypes were imputed to the Haplotype Reference Consortium (HRC) reference panel 42 . eQTL were generated adjusting for diagnosis (AD, control, other for AMP-AD cohorts and schizophrenia, control, other for CMC cohorts) and principal components of ancestry separately for ROSMAP, Mayo temporal cortex (TCX), Mayo cerebellum (CER), MSSM-Penn-Pitt, and HBCC. For HBCC, which had a small number of samples derived from infant and adolescents, we excluded samples with age-of-death less than 18, to limit heterogeneity due to differences between the mature and developing brain.
We then performed a meta-analysis using the eQTL from cortical brain regions from the individual cohorts (dorsolateral prefrontal cortex (DLPFC) from ROSMAP, MSSM-Penn-Pitt, and HBCC and TCX from Mayo). The meta-analysis identifies substantially more eQTL than the individual cohorts (Table 1, Fig. 1). There is a strong relationship between the sample size in the individual cohorts and meta-analysis and the number of significant eQTL and genes with eQTL (Fig. 1b,c). Notably, the meta-analysis identified significant eQTL (at FDR ≤ 0.05) in >1000 genes for which no eQTL were observed in any individual cohort. Additionally, we find significant eQTL for 18,295 (18,433 when considering markers with minor allele frequency (MAF) down to 1%) of the 19,392 genes included in the analysis.
We then compared our cortical eQTL to those from GTEx (v7) 21 , which is the most comprehensive brain eQTL database available in terms of number of available brain tissues (Table 1, Table 2). Due to the substantially larger power in these data, we find >3.8 million eQTL not identified in GTEx cortical regions (Anterior Cingulate Cortex, Cortex or Frontal Cortex) and we find eQTL for >11,000 genes with no eQTL in these cortical regions in GTEx. While GTEx employs a stricter approach to the control of false discovery rate (FDR), we find that re-analysis of the GTEx cortical regions using an approach similar to ours (see Methods) did not account for the number of eQTL and genes with eQTL discovered in this analysis, but not in GTEx (3,619,693 and 6,866 for eQTL and genes, respectively, when using the less conservative approach). Next, we evaluated the replication within our cortical and cerebellar eQTL of the region specific eQTL identified in GTEx. The cortical eQTL generated through the current analyses strongly replicate the eQTL available through GTEx, not only for cortical regions, but for all brain regions including cervical spinal cord (Table 2). Interestingly, the replication in these cortical eQTL of eQTL derived from the two GTEx cerebellar brain regions (cerebellum and cerebellar hemisphere) is consistently lower than for other brain regions represented in GTEx. However, replication of GTEx cerebellar eQTL is high when compared to the cerebellar eQTL generated in this analysis from the Mayo Clinic CER samples. We also performed the reverse comparison, by examining the replication of our eQTL in those region-specific eQTL identified in GTEx. Unsurprisingly, the replication levels were substantially lower, due to the lower power in the GTEx analyses. Replication rates were not substantially changed by using GTEx eQTL discovered using our less conservative approach. Additionally, we compared our eQTL to a publically available fetal brain eQTL resource 43 and found good replication of these eQTL as well (estimated replication rate π 1 = 0.909 for the cortical meta-analysis, and π 1 = 0.861 for cerebellum), though somewhat lower than the replication observed in the GTEx cohorts, which are comprised of adult-derived samples.
Finally, as a proof of concept, we performed a colocalization analysis between our eQTL meta-analysis and the Psychiatric Genomics Consortium (PGC) v2 schizophrenia GWAS summary statistics 36 . Seventeen genes showed posterior probability of colocalization using coloc 7 (PP(H 4 )) > 0.7 (Table 3), with 3 showing PP(H 4 ) > 0.95 (FURIN, ZNF823, RP11-677M14.2). FURIN, having previously identified as a candidate through colocalization 24 has recently been shown to reduce brain-derived neurotrophic factor (BDNF) maturation and secretion when inhibited by miR-338-3p 44 . ZNF823 has been identified in previous colocalization analyses 45,46 . RP11-677M14.2, a lncRNA located inside NRGN, while not previously identified through colocalization analysis, has been shown to be down-regulated in the amygdala of schizophrenia patients 47 Table 3). At the THOC7 locus, the competing gene, C3orf49 shows slightly lower strength for colocalization (PP(H 4 ) = 0.820), and the associations do not appear to be independent (R 2 between best SNPs = 0.979). At the FAM85B locus, the competing pseudo gene FAM86B3P shows substantially lower evidence for colocalization (PP(H 4 ) = 0.513) and in this case too, the associations appear to be non-independent (R 2 = 0.902).    Table 2. Replication rates between GTEx and publicly available eQTL from these analyses.

Discussion
Using resources generated in the AMP-AD and CMC consortia, we have generated a well-powered brain eQTL resource for use by the scientific community. Unsurprisingly, we see a strong relationship between the number of significant eQTL, as well as genes with significant eQTL, and sample size using analyses from the individual cohorts as well as the meta-analysis of those cohorts. This result has previously been shown for lower sample sizes 21 . We also show higher replication of GTEx eQTL in the meta-analysis relative to the individual cohorts. These conclusions appear to be independent of methodological differences between our analysis and the one done by GTEx. Notably, we find significant eQTL for nearly every gene in our analysis, which include all but very lowly expressed genes (less than 1 cpm in more than 50% of samples). The wide discovery of eQTL is potentially beneficial for analyses utilizing these results, such as colocalization analysis or TWAS imputation, because more genes with significant eQTL means more genes can be evaluated with these approaches. Because we have discovered eQTL for most genes, further increasing sample size will not substantially increase the number of genes with significant eQTL. However it is likely that the number of significant eQTL associations within each gene would continue to increase, which may include additional associations tagging the same regulator or independent associations tagging weaker regulators, along with the accuracy of estimated effect sizes. This will result in a more accurate landscape of regulatory association, which will improve the ability to fine-map causal regions, and colocalize eQTL signal with clinical traits of interest. Thus, it will be valuable to continue to update this meta-analysis with additional data from these consortia and other resources as they become available, and continue to improve this resource as future data permits. Future work may also focus on using well-powered analysis to study the landscape of causal variation and co-variation in gene regulation.
We found distinct eQTL patterns across cerebral cortical and cerebellar brain regions in our resource. Specifically, comparison of eQTL from our resource with those from GTEx shows high replication for the majority of brain regions. However, cerebellar regions show consistently lower replication with the cerebral cortical eQTL generated here. In contrast, the cerebellar eQTL generated from the Mayo Clinic study replicate GTEx cerebellar eQTL at a substantially higher rate, suggesting a different pattern of regulatory variation affecting expression in cerebellum versus other brain regions. Indeed, epigenomic analyses show substantial differences between cerebellar and cerebral cortical regions [48][49][50][51] , particularly in methylation patterns, which could drive different eQTL association patterns. This is further corroborated by the observation of substantial coexpression differences between cerebellar and other brain regions 52 . These effects could be due to differences in cell type composition, with cerebellar regions consisting of substantially more neurons than other brain regions 53 . This is supported by a gene enrichment analysis of genes showing different eQTL association patterns between cerebellum and cortex, which showed that many of the top gene sets were neuron or signaling related (Supplementary Table 4). One recent report suggests that there are also widespread differences in histone modifications within cell types derived from cerebellar and cortical regions 54 , though this effect had not been noted in other studies. In particular, Ma et al. 54 observed that both neuronal and non-neuronal cell types show differing histone modifications across tissue of origin. Further work is necessary to confirm this finding and to develop models to deconvolve the cell-type specific regulatory effects in different brain regions [55][56][57] , however our analysis demonstrates that this meta-analysis is representative of eQTL across the majority of brain regions, with the exception of cerebellum. Future meta-analytic analyses may also cast a wider net in terms of brain regions included.  www.nature.com/scientificdata www.nature.com/scientificdata/ The replication of fetal eQTL, while significant, is somewhat lower than the replication of adult eQTL represented in GTEx. This may be due to multiple factors. The fetal eQTL analysis was generated from brain homogenate, rather than dissected brain regions, though the lower replication likely also reflects broad transcriptional differences between developing and mature brain 58 . These transcriptional differences may also explain why we find substantially more eQTL than a recently published, similarly sized eQTL analysis which uses samples from across developmental and adult timepoints 25 , and why this meta-analysis shows higher replication of GTEx eQTL.
Previous studies report a lack of widespread disease-specific eQTL observed in schizophrenia (CMC) 24 and Alzheimer's (ROSMAP) 35 . In accordance, we find a strong overlap among eQTL across disparate disease samples, particularly those with neuropsychiatric and neurodegenerative disorders, as well as normal individuals from these and other cohorts such as GTEx 24,35 . This suggests that disease-specific eQTL, if they exist, are likely few in number and/or small in effect size, relative to condition-independent eQTL in general. If they do exist, disease-specific eQTL discovery may be successful in more targeted analyses or with larger sample sizes or meta-analyses, but was not explored for the purpose of this general resource. Thus, the heterogeneous samples derived from different disease-based cohorts can be meta-analyzed to create a general-purpose brain eQTL resource representing adult gene regulation, despite comprising samples with different disease backgrounds, along with normal controls. Therefore, these eQTL will be useful both within and outside these specific disease contexts. For example, since these eQTL are not disease specific they may be used to understand healthy gene expression regulation in the brain, as well as to infer colocalization of eQTL signatures with disease risk for any disease whose tissue etiology is from the brain, since these signatures are reflective of normal brain regulation. It should be stated that while many eQTL are not disease specific, i.e. they are identified under various central nervous system (CNS) disease diagnoses and in control brains, they may still contribute to common CNS diseases as previously demonstrated 24,[32][33][34]45,46 . While we have demonstrated a proof-of-concept colocalization analysis with a previously published schizophrenia GWAS, these eQTL are a broadly useful resource for studying neuropsychiatric and neurodegenerative disorders, as well as for understanding the landscape of gene regulation in brain.
For the AMP-AD studies (ROSMAP, Mayo RNAseq), Picard v2.2.4 (https://broadinstitute.github.io/picard/) was used to generate FASTQ files from the available BAM files, using the Picard SamToFastq function. Picard SortSam was first applied to ensure that R1 and R2 reads were correctly ordered in the intermediate SAM file before converting to FASTQ. The converted FASTQs were aligned to the GENCODE24 (GRCh38) reference genome using STAR v2.5.1b, with twopassMode set as Basic. Gene counts were computed for each sample by STAR by setting quantMode as GeneCounts.

RNA-seq normalization.
To account for differences between samples, studies, experimental batch effects and unwanted RNA-seq-specific technical variations, we performed library normalization and covariate adjustments for each study separately using fixed/mixed effects modeling. A mixed effect model was required to jointly normalize both tissues from the Mayo cohort. All other cohorts contained only one tissue, so a fixed effect model was used. The workflow consisted of the following steps: 1. Gene filtering: Out of ~56 K aligned and quantified genes, only genes showing at least modest expression were used in this analysis. Genes that were expressed more than 1 CPM (read Counts Per Million total reads) in at least 50% of samples in each tissue and diagnosis category were retained for analysis. Additionally, genes with available gene length and percentage GC content from BioMart December 2016 archive were subselected from the above list. This resulted in approximately 14 K to 16 K genes in each study.

Calculation of normalized expression values:
Sequencing reads were then normalized in two steps. First, conditional quantile normalization (CQN) 61 was applied to account for variations in gene length and GC content. In the second step, the confidence of sampling abundance was estimated using a weighted linear model using the voom-limma package in bioconductor 62,63 . The normalized observed read counts, along with the corresponding weights, were used in the following steps. 3. Outlier detection: Based on normalized log2(CPM) of expression values, outlier samples were detected using principal component analysis (PCA) 64,65 and hierarchical clustering. Samples identified as outliers using both the above methods were removed from further analysis. 4. Covariate imputation: Before identifying associated covariates, important missing covariates were imputed. Principally, post-mortem interval (PMI), or the latency between death and tissue collection, which is frequently an important covariate for the analysis of gene expression from post-mortem brain tissue, was imputed for a portion of samples in Mayo RNAseq data for which true values were unavailable. Genomic predictors of PMI were estimated using ROSMAP and MSSM (an additional RNA-seq study available through AMP-AD) samples and were used to impute missing values as necessary. 5. Covariate identification: Normalized log2(CPM) counts were then explored to determine which known covariates (both biological and technical) should be adjusted. Except for the HBCC study, we used a stepwise (weighted) fixed/mixed effect regression modeling approach to select the relevant covariates having a significant association with gene expression. Here, covariates were sequentially added to the model if they were significantly associated with any of the top principal components, explaining more than 1% of variance of expression residuals. For HBCC, we used a model selection based on Bayesian information criteria (BIC) to identify the covariates that improve the model in more than 50% of genes.
www.nature.com/scientificdata www.nature.com/scientificdata/ 6. SVA adjustments: After identifying the relevant known confounders, hidden-confounders were identified using the Surrogate Variable Analysis (SVA) 41 . We used a similar approach as previously defined 24 to find the number of surrogate variables (SVs), which is more conservative than the default method provided by the SVA package in R 66 . The basic idea of this approach is that for an eigenvector decomposition of permuted residuals each eigenvalue should explain an equal amount of the variation. By the nature of eigenvalues, however, there will always be at least one that exceeds the expected value. Thus, from a series of 100 permutations of residuals (white noise) we identified the number of covariates as shown in Supplementary  Table 1. We applied the "irw" (iterative re-weighting) version of SVA to the normalized gene expression matrix, along with the covariate model described above to obtain residual gene expression for eQTL analysis. 7. Covariate adjustments: We performed a variant of fixed/mixed effect linear regression, choosing mixed-effect models when multiple tissues or samples, were available per individual, as shown here: gene expression ~ Diagnosis + Sex + covariates + (1|Donor), where each gene was linearly regressed independently. Here Donor (individual) was modeled as a random effect when multiple tissues from the same individual were present. Observation weights (if any) were calculated using the voom-limma 62,63 pipeline, which has a net effect of up-weighting observations with inferred higher precision in the linear model fitting process to adjust for the mean-variance relationship in RNA-seq data. The diagnosis component was then added back to the residuals to generate covariate-adjusted expression for eQTL analysis.
This workflow was applied separately for each study. For the AMP-AD studies, gene locations were lifted over to GRCh37 for comparison with the genotype imputation panel (described below). For HBCC, samples with age <18 were excluded prior to analysis. Supplementary Table 1 shows the covariates and surrogate variables identified in each study.

AD diagnosis harmonization.
Prior to RNA-seq normalization, we harmonized the LOAD definition across AMP-AD studies. AD controls were defined as patients with a low burden of plaques and tangles, as well as lack of evidence of cognitive impairment. For the ROSMAP study, we defined AD cases to be individuals with a Braak 67 greater than or equal to 4, CERAD score 68 less than or equal to 2, and a cognitive diagnosis of probable AD with no other causes (cogdx = 4), and controls to be individuals with Braak less than or equal to 3, CERAD score greater than or equal to 3, and cognitive diagnosis of 'no cognitive impairment' (cogdx = 1). For the Mayo Clinic study, we defined disease status based on neuropathology, where individuals with Braak score greater than or equal to 4 were defined to be AD cases, and individuals with Braak less than or equal to 3 were defined to be controls. Individuals not meeting "AD case" or "control" criteria were retained for analysis, and were categorized as "other" for the purposes of RNA-seq adjustment.
Genotype QC and imputation. Genotype QC was performed using PLINK v1.9 69 . Markers with zero alternate alleles, genotyping call rate ≤ 0.98, Hardy-Weinberg p-value < 5e−5 were removed, as well as individuals with genotyping call rate < 0.90. Samples were then imputed to HRC (Version r1.1 2016) 42 as follows: marker positions were lifted-over to GRCh37, if necessary. Markers were then aligned to the HRC loci using HRC-1000G-check-bim-v4.2 (http://www.well.ox.ac.uk/~wrayner/tools/), which checks the strand, alleles, position, reference/alternate allele assignments and frequencies of the markers, removing A/T & G/C single nucleotide polymorphisms (SNPs) with minor allele frequency (MAF) > 0.4, SNPs with differing alleles, SNPs with > 0.2 allele frequency difference between the genotyped samples and the HRC samples, and SNPs not in the reference panel. Imputation was performed via the Michigan Imputation Server 70 using the Eagle v2.3 71 phasing algorithm. Imputation was done separately by cohort and by chip within cohort, and markers with R 2 ≥ 0.7 and minor allele frequency (MAF) ≥ 0.01 (within cohort) were retained for analysis.
Genetic ancestry inference. GEMTOOLs 72 was used to infer ancestry and compute ancestry components separately by cohort. The GEMTOOLs algorithm uses a significance test to estimate the number of eigenvectors (ancestry components) necessary to represent the variability in the data 73 . For each cohort, we used the top components as estimated by the GEMTOOLs algorithm, which resulted in some variation in the number of components selected. For MSSM-Penn-Pitt and HBCC, which are multi-ethnic cohorts, only Caucasian samples were retained for eQTL analysis. eQTL analysis. eQTL were generated separately in each cohort and tissue using MatrixEQTL 74 adjusting for harmonized Diagnosis and inferred Ancestry components using "cis" gene-marker comparisons: Expression ~ Genotype + Diagnosis + PC 1 + … + PC n, , where PC k is the k th ancestry component, using Expression variables which were previously covariate adjusted as described above. Here we define "cis" as ± 1 MB around the gene, and GRCh37 gene locations were used for consistency with the marker imputation panel. Meta-analysis was performed via fixed-effect model 75 using an adaptation of the metareg function in the gap package in R. In order to assess potential inflation of Type 1 error, we performed 5 permutations of the gene expression values, relative to genotype and ancestry components, within diagnosis for each cohort, and repeated the regression analyses as described above. For each of the 5 iterations of permutation, a meta-analysis was then performed across the 4 cohorts. We found that Type 1 error was well controlled (Fig. 1a). Given that multiple tissues were present, we also evaluated a random-effect model, but found substantially deflated p-values (less significant) in the permutations, relative to the expected distribution, suggesting that this model is over-conservative.
Comparison with GTEx and fetal eQTL. Full summary statistics for the GTEx v7 21 eQTL for all available brain regions were obtained from the GTEx Portal (https://gtexportal.org/), and fetal eQTL were obtained from www.nature.com/scientificdata www.nature.com/scientificdata/ Figshare 76 . For each replication comparison (e.g. meta-analysis vs GTEx or meta-analysis vs. fetal eQTL), only markers and genes present in both the external eQTL and our analysis were retained for comparison. As this was done separately for GTEx and for the fetal eQTL resource, the list of genes and SNPs varies slightly for each comparison. The replication rate was estimated as the π 1 statistic using the qvalue package 77 in R as follows: we extracted the meta-analysis p-values for all SNP-gene pairs, which were significant in GTEx at FDR ≤ 0.05. We then applied the 'qvalue' command to the meta-analysis p-values to generate π π = − 1 1 0 , which corresponds to estimated proportion of non-null p-values 77 . The 'smoother' option was used to estimate π 0 as a function of the tuning parameter λ as it approaches 1. The variance around this estimate is relatively small (see Supplementary  Figures 1 and 2 for example) and does not materially affect the observations in this manuscript.
Conversely, we estimated the replication rate of significant meta-analysis eQTL SNP-gene pairs in GTEx. Analogous methods were used to estimate all other replication rates. For the purposes of reporting the total number of eQTL not present in GTEx, and genes without eQTL in GTEx (Table 1), we have included genes and SNP-gene pairs not present in GTEx in the count, however this accounts for a relatively small proportion of the difference (472,995 eQTL and 1481 genes).

GTEx eQTL generation.
In order to verify that the observed power increase and replication imbalances were not due to methodological differences between this manuscript and those performed by GTEx, we obtained access to the GTEx v7 data, and generated eQTL for cortex, anterior cingulate cortex, and frontal cortex using our approach. We used gene expression and imputed genotypes as provided, as well as the provided covariates, which included 3 ancestry covariates, 14-15 surrogate variable covariates, sex and platform. We then repeated the comparisons with the meta-analysis described in the previous section, using a MAF cutoff of 0.03, which best appeared to control Type 1 error, as observed by permutation between genotype and gene expression, while maximizing the number of significant eQTL in the true data. Results did not change materially. Pathway analysis of cerebellar eQTL genes. In order to identify whether genes showing cerebellar-specific eQTL patterns showed any biological coherence, we performed a pathway analysis as follows. For genes with at least 5 significant cerebellar eQTL, we computed the Spearman correlation of effect-size between cerebellum eQTL and cortical eQTL for the loci that were significant in cerebellum. We then selected genes for which the effect-sizes did not show positive correlation (Spearman's ρ < 0.1) between the two tissues as showing different eQTL association patterns across the gene and performed a pathway analysis with GO biological processes Fisher's exact test. Note that due to the (power-mediated) greater detection of eQTL in cortex, we did not perform the reverse comparison. The results were relatively robust to the choice of minimum number of significant eQTL, correlation cutoff, and choice of correlation statistic (Spearman vs Pearson).
Coloc analysis. We applied Approximate Bayes Factor colocalization (coloc.abf) 7 from the coloc R package to the summary statistics from the PGC2 Schizophrenia GWAS 36 downloaded from the PGC website (http://pgc. unc.edu), and the summary statistics from the eQTL meta-analysis. Each gene present in the meta-analysis was compared to the GWAS in turn, and suggestive and significant GWAS peaks with p-value < 5e-6 were considered for analysis.

Data availability
Data for the ROSMAP 37 and Mayo cohorts 40 are available through the AMP-AD Knowledge Portal 31 . Data for the MSSM-Penn-Pitt and HBCC cohorts are available through the CommonMind Knowledge Portal 23 .
eQTL results for the ROSMAP 78 , Mayo TCX 79 , Mayo CER 80 and cortical meta-analysis 81 are available through the AMP-AD Knowledge Portal. These results include SNP (location, rsid, alleles, and allele frequency) and gene (location, gene symbol, strand and biotype) information, as well as estimated effect size (beta), statistic (z), p-value, FDR, and expression-increasing allele.

Code availability
An R package with all code for the gene expression normalization is available at https://github.com/Sage-Bionetworks/ampad-diffexp. All other analyses were generated using packages publicly available from their respective authors.