Gene expression genetics of the striatum of Diversity Outbred mice

Brain transcriptional variation is a heritable trait that mediates complex behaviors, including addiction. Expression quantitative trait locus (eQTL) mapping reveals genomic regions harboring genetic variants that influence transcript abundance. In this study, we profiled transcript abundance in the striatum of 386 Diversity Outbred (J:DO) mice of both sexes using RNA-Seq. All mice were characterized using a behavioral battery of widely-used exploratory and risk-taking assays prior to transcriptional profiling. We performed eQTL mapping, incorporated the results into a browser-based eQTL viewer, and deposited co-expression network members in GeneWeaver. The eQTL viewer allows researchers to query specific genes to obtain allelic effect plots, analyze SNP associations, assess gene expression correlations, and apply mediation analysis to evaluate whether the regulatory variant is acting through the expression of another gene. GeneWeaver allows multi-species comparison of gene sets using statistical and combinatorial tools. This data resource allows users to find genetic variants that regulate differentially expressed transcripts and place them in the context of other studies of striatal gene expression and function in addiction-related behavior.


Background & Summary
Substance use disorder is a highly heritable trait involving variation in neural circuitry underlying motivated behavior and behavioral inhibition.Characterization of addiction-related brain regions in genetically diverse mice can lead to the discovery of molecular mechanisms of addiction-related behaviors.These mechanisms can in turn, aid in connecting genetic, genomic and behavioral variation within and across species through shared target genes 1 .
Drug-induced transcriptional changes in the corticostriatal system have been reported in rodent models 2,3 and substance dependent individuals 4,5 .Additionally, behavioral correlates of substance use disorder, namely impulsivity and incentive sensitization, all involve corticostriatal circuitry [6][7][8][9][10] ; however, the molecular mechanisms underlying these relationships are unknown.The striatum plays a central role in addiction-related behavior 11,12 and influences behaviors (e.g., sensation seeking) that predict the development of substance use disorders [13][14][15][16] .It receives inputs from diverse brain regions, including the midbrain, prefrontal cortex, and thalamus, and plays a fundamental role in goal-directed actions and habits 11 .The dopamine projections from the ventral tegmental area to the nucleus accumbens and prefrontal cortex are at the heart of this reward circuit, and their importance to drug reward is well established 17 .Neuroimaging studies of people with a history of cocaine use disorders and rodent studies have indicated that addiction is a circuit-level disorder involving several functionally inter-connected brain regions 18,19 .Identifying drug-induced changes in gene expression 20,21 and resulting neural plasticity 22,23 in addiction-relevant neurocircuits can reveal underlying sources of addiction risk and resilience.
Gene expression quantitative trait locus (eQTL) and systems genetic analyses facilitate the identification of genes and variants associated with complex traits, including those related to addiction 24 .Such data allow the reduction of large numbers of positional candidate genes and variants implicated in quantitative trait locus (QTL) for behavioral traits.These data are also useful for discovering transcripts significantly correlated with behavior and uncovering transcriptional co-expression networks to identify the biomolecular mechanisms underlying complex traits 24 .Researchers can also use eQTL data to identify genetic variants regulating differentially expressed genes, such as those discovered in drug exposure studies.This data can be used with data from other model organisms 25 to identify convergent evidence for biological mechanisms of addiction across species 26,27 .Model organism eQTLs can also be related to convergent findings in humans to prioritize genome-wide association study (GWAS) results and to contextualize the role of the identified variant 1 .
To ensure variation in nearly every gene in the genome and to increase the precision of genetic mapping and variant associations, the Diversity Outbred [28][29][30] (J:DO) mice were developed as an advanced intercross of the eight-way hybrid Collaborative Cross (CC) population 31,32 .Within the J:DO population, there are over 45 million single-nucleotide polymorphisms (SNPs) and millions of segregating structural variants 12,33,34 .This high genetic diversity results in high phenotypic and transcriptomic variation 35,36 , enabling the discovery of genes and variants associated with behaviors.
Transcription regulatory variation is often context specific.Studies of transcriptional variation in J:DO mice have revealed precise genetic variation affecting proteomes and cellular transcriptional states in addition to bulk transcriptomics of tissues relevant to metabolism in health and disease [37][38][39] .
To construct a versatile reference data resource for addiction genetics, we performed a series of addiction-relevant exploratory and risk-taking behavioral assays 35,40 , and then profiled transcript abundance in striatum using RNA-Seq on 368 drug naïve J:DO mice.Data from this study are delivered in a platform that allows for the identification of eQTL effects, analysis of local SNP associations, assessment of gene expression correlations, and application of mediation analysis.This provides a resource for genetic studies of transcriptional diversity in the striatum of drug naïve mice.Combined with behavioral phenotyping, this resource enables the prioritization of behavioral QTL positional candidates by incorporating evidence from strong cis-eQTL effects and their underlying allelic patterns.Furthermore, behavioral QTLs can be subjected to local SNP association analysis followed by prioritization of positional candidates where the SNP strain distribution pattern of positional candidates matches the allelic effects of interest.Positional candidates can then be queried in this resource for the presence or absence of strong cis-eQTLs.Finally, the data can be used in global analyses of the relationship of trait-relevant variation across species, using increasingly sophisticated approaches for leveraging model organism data to predict, model, and explain polygenic risk for human disease [41][42][43] .
Behavioral Phenotyping.At three to six months of age, mice were phenotyped four separate times with a different assay on each day, Monday to Thursday (open field, light-dark, hole-board, and novelty place preference) 44 and euthanized on Friday in batches of 16-24 by decapitation.Phenotyping protocols are available in the Mouse Phenome Database, a recognized NIH biomedical data repository, and at https://www.addiction-neurogenetics.org/data-and-resources/.The Jackson Laboratory (JAX) follows husbandry practices in accordance with the American Association for the Accreditation of Laboratory Animal Care (AAALAC), and all work was done with the approval of the JAX Institutional Animal Care and Use Committee (Approval #10007).
Dissections.Testing and euthanasia were consistently performed between 8 AM to 12 PM to control for circadian effects.All surgical instruments were cleaned with RNAase Away (ThermoFischer Scientific, Waltham, MA) prior to use and between samples.Whole intact brains were removed, hemisected, and incubated in RNAlater (ThermoFischer Scientific, Waltham, MA) for 8-14 minutes.Then, under a dissection microscope, the striatum, hippocampus, and prefrontal cortex were removed and soaked for 24 hours in RNAlater at room temperature before being stored at −80 °C until processing.
RNA Isolation and sequencing.The striatum was homogenized, and total RNA was isolated using a TRIzol Plus kit (Life Technologies, City, State) with on-the-column DNase digestion according to the manufacturer's instructions.The quality of the isolated RNA was assessed using an RNA 6000 Nano LabChip using an Agilent 2100 Bioanalyzer instrument [RRID:SCR_019389 (Agilent Technologies, Santa Clara, CA)] and a NanoDrop spectrophotometer [RRID:SCR_018042 (ThermoFisher Scientific, Wilmington, DE)].The External RNA Controls Consortium spike-in (ERCC, Ambion, Austin, TX) was added to the samples to allow for normalization in accordance with the core facility's standard operating procedure but was not used in our downstream analyses.An RNA-Seq library was prepared using the KAPA Stranded RNA-Seq Kit with RiboErase (Kappa Biosystems, City, State).Libraries were then pooled and sequenced at The Jackson Laboratory using a 100 bp paired-end process on a HiSeq2500 (Illumina) sequencing system (RRID:SCR_016383) targeting 40 million read pairs per sample.Sequencing achieved a median read depth of 65.7 million read pairs per sample (range: 31.4 million to 117.4 million reads).
Sequencing analysis.Raw read data were demultiplexed and converted to FASTQ files.Paired-end FASTQ files from multiple lanes were concatenated together prior to alignment.All paired-end FASTQ files were aligned to the Mus musculus GRCm38 reference (GenBank accession number: GCA_000001635.2) with Ensembl v94 (October 2018) annotation using STAR (v2.6.1c)(RRID:SCR_004463) set to produce both genome and transcriptome Binary Alignment Map (BAM) files.STAR was used with default options ensuring that these defaults allowed for a maximum of 10 multi-mapped reads and a maximum of 10 mismatches.Reads exceeding these criteria were excluded from further downstream processing.Expression estimation was performed using the RSEM package (v1.3.0)(RRID:SCR_013027) with-estimate-rspd using the transcriptome BAM files obtained following alignments from STAR.RSEM expected counts per transcript were used for downstream analysis.
Genotyping, haplotype reconstruction and sample QC.Genotyping was performed on tail biopsies by Neogene Genomics (Lincoln, NE) using the Mouse Universal Genotyping Array (GigaMUGA) 47 consisting of 143,259 markers.Based on published genotype QC workflows 48 , 110,524 markers and 386 mice were retained for further analysis 48 .Genotypes were converted to founder strain-haplotype reconstructions using R/ qtl2 49 (qtl2_0.21-1,http://kbroman.org/qtl2)(RRID:SCR_018181).We computed Pearson correlations between GBRS-based 50 and array-based haplotype probabilities on 69,000 common grids.We found that 90% of samples have Pearson correlation r > 0.8.The median correlation coefficient was 0.834.Sample swaps were identified by detecting incorrect pairing of RNA-Seq and genotyping array data among these samples.We corrected the sample mix-ups by reassigning gene expression and genotypes that maximize correlation on haplotype probabilities.
Expression QtL mapping.Prior to eQTL mapping, gene expression counts were obtained by summing expected counts over all transcripts for a given gene.Expression for eQTL analysis was adjusted for choroid plexus contamination by regressing the log-mean of Kl and Ttr as additive covariates.eQTL mapping was performed on regression residuals of 17,248 genes using the R/qtl2 package and the founder haplotype regression method.To correct for population structure, kinship matrices were computed with the Leave One Chromosome Out (LOCO) option for kinship correction (http://kbroman.org/qtl2) 51.Additive covariates of sex and J:DO generation were used in the eQTL mapping model.Specifically, for each gene, the following linear model was fit, where y i is the gene expression abundance of the i th animal, s i is the sex of animal i, β s is the effect of sex, gen i is the generation of animal i, β gen is the effect of generation, g ij is the founder probability for founder allele j in animal i, β j is the genotype coefficient, and γ i is the random effect representing the polygenic influence of animal i as modeled by a kinship matrix.All variant associations are visible in the eQTL viewer and can be obtained from reanalysis of the whole data set.Each eQTL was categorized as either cis or trans for the purpose of listing them in GeneWeaver, where cis is defined as eQTLs within +/− 2 MB of the transcription start site (TSS) of the gene, and trans are eQTLs further away or on other chromosomes.Although LD for this generation is estimated to be quite small (LD 1/2 ~ 0.1 Mb 52 ) a2 MB interval was chosen to ensure inclusion of longer distance effects on expression, e.g.distal enhancer variation.It is important to note that such distal variants need not be in high LD with the target gene to regulate it in cis.creation of the eQTL viewer object.The results of the eQTL analysis, along with expression estimates, genotypes, and covariates, are encapsulated in an RData object.RData objects are designed for use in R and contain all the objects necessary for reproducing an analysis.We followed the instructions provided by the developers of the eQTL viewer 53,54 .Specifically, we created the following elements: kinship, map, genotype probabilities (genoprobs), markers, and a dataset object that contains information on the gene annotations, covariates, expression data, and sample annotations.
WGcNA analysis.RNA-Seq data was analyzed with WGCNA (RRID:SCR_003302) 55 .A soft thresholding power of 3 was selected using the WGCNA scale-free topology R 2 threshold of 0.9 with a signed network with a minimum module size of 30.This soft threshold of 3 was chosen because that was the lowest threshold that yielded a high scale-free topology R 2 of >0.9 while providing maximal connectivity (median k = 2,270).The correlation calculation utilized was bicor, and modules used numeric labels instead of colors.
Paraclique analysis.RNA-Seq data was analyzed with paraclique 56 using a bicor with a correlation coefficient threshold of |0.5| (unsigned), minimum seed clique size of 5, minimum finished paraclique size of 10, proportional glom factor of 0.2 for paraclique construction.
Gene sets for analysis in geneweaver.Sets of genes representing the J:DO striatum eQTL, the WGCNA modules and the paracliques were deposited in GeneWeaver.org (RRID:SCR_009202) 57 and accession numbers for each set of genes are found in Supplemental Table 1.In addition the eQTLs have been separated into cis and trans sets and are also presented as sets per chromosome.
Primary phenotyping data.Phenotyping data has been deposited at the Mouse Phenome Database (RRID:SCR_003212) under project CSNA03 60 .This data is part of a multi-arm phenotyping project within the Center for Systems Neurogenetics of Addiction, and connect the data through canonical correlation using the baseline behavioral battery as described in Skelly 61 .

QTL viewer repository. QTL Viewer is an interactive web-based analysis tool allowing users to replicate
the analyses reported for a study (Fig. 1).The tool with the data set described here is available at https://qtlviewer.jax.org/viewer/CheslerStriatum.It includes the ability to search various subsets of data from a study, such as phenotypes or expression data, and then visualize data with profile, correlation, LOD, effect, mediation, and SNP association plots (Fig. 2).Detailed information about the structure of the QTL viewer objects is available at https://github.com/churchill-lab/qtl-viewer/blob/master/docs/QTLViewerDataStructures.md.A complementary dataset from the hippocampus previous described in Skeelly et al. 61 is already available at https://churchilllab.jax.org/qtlviewer/DO/hippocampus.
QTL viewer rdata object.The primary data record associated with this study is qtlviewer_DO_ Striatum_02102020.Rdata.This RData object contains the following: • genoprobs -the genotype probabilities • K -the kinship matrix created using the leave one chromosome out (loco) method • map -list of one element per chromosome, with the genomic position of each marker • markers -marker names and positions • dataset.DO_Striatum_416 -Fig. 1 Screenshot of a query for the gene "Rab3b" in QTL Viewer.QTL results for Rab3b expression in the Diversity Outbred (J:DO) mice striatum.Metadata related to the J:DO generation and sex is displayed, and genes co-expressed with the selected gene can be accessed from the correlation tab.The allele effect plots, SNP association mapping, and mediation analysis can also be performed and viewed from the page by click on the tab and then the QTL peak to run the analysis.Upper right: A genome scan for Rab3b expression in the Diversity Outbred (J:DO) mice striatum identifies a strong (LOD > 65) cis-eQTL on chromosome 4. Lower Right: The allele effect plot for the haplotypes of the J:DO that regulate the expression of Rab3b.There are strong effects of WSB/EiJ, NZO/HILtJ, and C57BL/6 J on the expression in one direction and 129S1/SvJ, CAST/EiJ, and A/J in the opposite direction.
• annot.mrna-annotations of the mRNA data • annot.samples-sample annotations • covar.info-specific information about the covariates • covar.matrix-matrix of covariates data, samples (rows) x covariates (columns) • data -expression data, samples (rows) x mRNA (columns).This matrix is used in the eQTL mapping analysis.
• datatype -type of data set, either mRNA or protein • display.name-simple display name for the viewer • ensembl.version-version of Ensembl used to annotate locations • lod.peaks -LOD peaks over a certain threshold, set to >7 in this dataset

technical Validation
Blinding and randomization.In a population genetics study, mouse genotypes are collected randomly.The J:DO population is bred using a pseudorandom mating scheme, and test mice are obtained from several breeding cohorts.Experimenters are unaware of mouse genotypes and their relationship to gene expression.Coat color diversity in this population may create some experimenter bias in conventional mouse populations, but coat color is rarely a predictor of behavior in J:DO mice.In the ideal genetic population, genotypes are fully randomized, and individuals are all genetically equidistant.Because this is not the case, genetic mapping analyses include a relationship matrix, a structured covariance matrix that estimates the relations among individuals based on genotype similarity.Mice of both sexes are counterbalanced across test runs, with each run containing either male or female mice to avoid pheromonal effects on behavior.
RNA quality.RNA quality includes three primary components: integrity, purity, and concentration.RNA integrity was determined by the Agilent Bioanalyzer 2100 using an RNA Integrity Number (RIN).This metric uses the ratio of 28 S:18 S rRNA to rate RNA quality on a scale of 1-10.The median RIN was 9.1 (range: 7.9-9.9).RNA purity and concentration were determined using a NanoDrop spectrophotometer.Using this method, RNA concentration is determined using absorbance at 260 nm, while purity is determined using the ratio of absorbance at 260 nm to 280 nm (A260:A280) and the ratio of absorbance at 260 nm to 230 nm (A260:A230).The median concentration was 74.0 ng/µL (range: 9.4-186.3ng/µL), the median A260:A280 ratio was 2.00 (range: 1.67-2.06),and the median A260:A230 ratio was 2.02 (range: 0.98-3.23).These values indicated that the RNA quality was sufficient for RNA-Seq analysis.

Heritability of transcript variation.
Based on variance accounted for by genotypes across the genome, and an additive covariant of generation, transcript abundance has a median heritability of 0.229, which is about two times the observed median heritability observed initially in the BXD 24 Cis-eQTLs were highly detectable across the entire genome, as a diagonal band (seen in Fig. 1).Trans-eQTLs were independent of each other in the genetically unstructured and large population as would be expected.UNC506203, on chr 1 and 40.21Mbp and is the peak marker for the most number (42) of cis-and trans-eQTL.The largest interval between markers (13.852282 Mbp) that have no eQTL is on the X chromosome.This is all consistent with technically valid eQTL mapping.

Usage Notes
There are four means by which the processed drug naïve striatum gene expression datasets can be used.First, users can download the entire processed dataset in the RData format.Once downloaded, it is readily readable in the R programming environment.Second, users can access the data at https://qtlviewer.jax.org/viewer/CheslerStriatum and proceed with the eQTL profile by entering their specific gene of interest in the search text box.Thirdly, users can access these data using the QTL viewer API interface (https://github.com/churchill-lab/qtl2api).Fig. 2 The Rab3b cis-eQTL data retrieved in QTL Viewer.(A) SNP association mapping within the QTL peak interval displaying all the SNPs that drive the QTL.Most of the highest score SNPs are around Rab3b, as expected for a gene regulated in cis.User tip, select the SNP Association tab, then click on the QTL peak marker to launch the analysis.(B) The genes that are negatively (Scp2-ps2) or positively (Ttc4, 0610037L13Rik, Zyg11b) correlated with Rab3b expression can be displayed.A scatter plot can be generated for each gene with the gene of interest.(C).Mediation analysis can be performed on the data set to identify candidate causal mediators.This analysis retests the QTL effect at the locus of interest, iteratively conditioned on candidate mediators.Here the SNP in Scp2-ps2 creates the greatest LOD drop.User tip, select the Mediation tab, then click on the QTL peak marker to launch the analysis.
Finally users can download sets of genes derived from the eQTL data by various methods such as WGCNA and paraclique, from the online data repository and suite of tools GeneWeaver.org.