Background & Summary

Substance use disorder is a highly heritable trait involving variation in neural circuitry underlying motivated behavior and behavioral inhibition. Characterization of addiction-related brain regions in genetically diverse mice can lead to the discovery of molecular mechanisms of addiction-related behaviors. These mechanisms can in turn, aid in connecting genetic, genomic and behavioral variation within and across species through shared target genes1.

Drug-induced transcriptional changes in the corticostriatal system have been reported in rodent models2,3 and substance dependent individuals4,5. Additionally, behavioral correlates of substance use disorder, namely impulsivity and incentive sensitization, all involve corticostriatal circuitry6,7,8,9,10; however, the molecular mechanisms underlying these relationships are unknown. The striatum plays a central role in addiction-related behavior11,12 and influences behaviors (e.g., sensation seeking) that predict the development of substance use disorders13,14,15,16. It receives inputs from diverse brain regions, including the midbrain, prefrontal cortex, and thalamus, and plays a fundamental role in goal-directed actions and habits11. The dopamine projections from the ventral tegmental area to the nucleus accumbens and prefrontal cortex are at the heart of this reward circuit, and their importance to drug reward is well established17. Neuroimaging studies of people with a history of cocaine use disorders and rodent studies have indicated that addiction is a circuit-level disorder involving several functionally inter-connected brain regions18,19. Identifying drug-induced changes in gene expression20,21 and resulting neural plasticity22,23 in addiction-relevant neurocircuits can reveal underlying sources of addiction risk and resilience.

Gene expression quantitative trait locus (eQTL) and systems genetic analyses facilitate the identification of genes and variants associated with complex traits, including those related to addiction24. Such data allow the reduction of large numbers of positional candidate genes and variants implicated in quantitative trait locus (QTL) for behavioral traits. These data are also useful for discovering transcripts significantly correlated with behavior and uncovering transcriptional co-expression networks to identify the biomolecular mechanisms underlying complex traits24. Researchers can also use eQTL data to identify genetic variants regulating differentially expressed genes, such as those discovered in drug exposure studies. This data can be used with data from other model organisms25 to identify convergent evidence for biological mechanisms of addiction across species26,27. Model organism eQTLs can also be related to convergent findings in humans to prioritize genome-wide association study (GWAS) results and to contextualize the role of the identified variant1.

To ensure variation in nearly every gene in the genome and to increase the precision of genetic mapping and variant associations, the Diversity Outbred28,29,30 (J:DO) mice were developed as an advanced intercross of the eight-way hybrid Collaborative Cross (CC) population31,32. Within the J:DO population, there are over 45 million single-nucleotide polymorphisms (SNPs) and millions of segregating structural variants12,33,34. This high genetic diversity results in high phenotypic and transcriptomic variation35,36, enabling the discovery of genes and variants associated with behaviors.

Transcription regulatory variation is often context specific. Studies of transcriptional variation in J:DO mice have revealed precise genetic variation affecting proteomes and cellular transcriptional states in addition to bulk transcriptomics of tissues relevant to metabolism in health and disease37,38,39.

To construct a versatile reference data resource for addiction genetics, we performed a series of addiction-relevant exploratory and risk-taking behavioral assays35,40, and then profiled transcript abundance in striatum using RNA-Seq on 368 drug naïve J:DO mice. Data from this study are delivered in a platform that allows for the identification of eQTL effects, analysis of local SNP associations, assessment of gene expression correlations, and application of mediation analysis. This provides a resource for genetic studies of transcriptional diversity in the striatum of drug naïve mice. Combined with behavioral phenotyping, this resource enables the prioritization of behavioral QTL positional candidates by incorporating evidence from strong cis-eQTL effects and their underlying allelic patterns. Furthermore, behavioral QTLs can be subjected to local SNP association analysis followed by prioritization of positional candidates where the SNP strain distribution pattern of positional candidates matches the allelic effects of interest. Positional candidates can then be queried in this resource for the presence or absence of strong cis-eQTLs. Finally, the data can be used in global analyses of the relationship of trait-relevant variation across species, using increasingly sophisticated approaches for leveraging model organism data to predict, model, and explain polygenic risk for human disease41,42,43.

Methods

Mice

416 J:DO mice (strain #:009376) from both sexes, spanning generations G21, G22, and G23, were purchased from The Jackson Laboratory. The mice were housed in an elevated barrier pathogen- and opportunistic-free animal room (Health report available at: https://www.jax.org/-/media/jaxweb/health-reports/g200.pdf?la=en&hash=7AD522E82FA7C6D614A11EFB82547476157F00E1) before being transferred at weaning to an intermediate barrier specific pathogen-free room (https://www.jax.org/-/media/jaxweb/health-reports/g3b.pdf?la=en&hash=914216EE4F44ADC1585F1EF219CC7F631F881773). Mice were individually housed under (12:12) light/dark cycle and allowed ad libitum access to standard rodent chow [sterilized NIH31 5K52 6% fat chow (LabDiet/PMI Nutrition, St. Louis, MO)] and acidified water (pH 2.5–3.0) supplemented with vitamin K. Mice cages contained a pine-shaving bedding (Hancock Lumber) and environmental enrichment consisting of a nestlet and a Shepard’s Shack. The mice were identified by ear notching at weaning and moved between cages and testing using metal forceps.

Behavioral Phenotyping

At three to six months of age, mice were phenotyped four separate times with a different assay on each day, Monday to Thursday (open field, light-dark, hole-board, and novelty place preference)44 and euthanized on Friday in batches of 16–24 by decapitation. Phenotyping protocols are available in the Mouse Phenome Database, a recognized NIH biomedical data repository, and at https://www.addiction-neurogenetics.org/data-and-resources/. The Jackson Laboratory (JAX) follows husbandry practices in accordance with the American Association for the Accreditation of Laboratory Animal Care (AAALAC), and all work was done with the approval of the JAX Institutional Animal Care and Use Committee (Approval #10007).

Dissections

Testing and euthanasia were consistently performed between 8 AM to 12 PM to control for circadian effects. All surgical instruments were cleaned with RNAase Away (ThermoFischer Scientific, Waltham, MA) prior to use and between samples. Whole intact brains were removed, hemisected, and incubated in RNAlater (ThermoFischer Scientific, Waltham, MA) for 8–14 minutes. Then, under a dissection microscope, the striatum, hippocampus, and prefrontal cortex were removed and soaked for 24 hours in RNAlater at room temperature before being stored at −80 °C until processing.

RNA Isolation and sequencing

The striatum was homogenized, and total RNA was isolated using a TRIzol Plus kit (Life Technologies, City, State) with on-the-column DNase digestion according to the manufacturer’s instructions. The quality of the isolated RNA was assessed using an RNA 6000 Nano LabChip using an Agilent 2100 Bioanalyzer instrument [RRID:SCR_019389 (Agilent Technologies, Santa Clara, CA)] and a NanoDrop spectrophotometer [RRID:SCR_018042 (ThermoFisher Scientific, Wilmington, DE)]. The External RNA Controls Consortium spike-in (ERCC, Ambion, Austin, TX) was added to the samples to allow for normalization in accordance with the core facility’s standard operating procedure but was not used in our downstream analyses. An RNA-Seq library was prepared using the KAPA Stranded RNA-Seq Kit with RiboErase (Kappa Biosystems, City, State). Libraries were then pooled and sequenced at The Jackson Laboratory using a 100 bp paired-end process on a HiSeq2500 (Illumina) sequencing system (RRID:SCR_016383) targeting 40 million read pairs per sample. Sequencing achieved a median read depth of 65.7 million read pairs per sample (range: 31.4 million to 117.4 million reads).

Sequencing analysis

Raw read data were demultiplexed and converted to FASTQ files. Paired-end FASTQ files from multiple lanes were concatenated together prior to alignment. All paired-end FASTQ files were aligned to the Mus musculus GRCm38 reference (GenBank accession number: GCA_000001635.2) with Ensembl v94 (October 2018) annotation using STAR (v2.6.1c) (RRID:SCR_004463) set to produce both genome and transcriptome Binary Alignment Map (BAM) files. STAR was used with default options ensuring that these defaults allowed for a maximum of 10 multi-mapped reads and a maximum of 10 mismatches. Reads exceeding these criteria were excluded from further downstream processing. Expression estimation was performed using the RSEM package (v1.3.0) (RRID:SCR_013027) with–estimate-rspd using the transcriptome BAM files obtained following alignments from STAR. RSEM expected counts per transcript were used for downstream analysis.

Expression estimate data were imported into R v3.5.1 (RRID:SCR_001905) using tximport v 1.10.1 (RRID:SCR_016752). Data were TMM-normalized with edgeR v3.24.3 (RRID:SCR_012802) and log-transformed to stabilize variance using voom + limma in limma v3.38.3 (RRID:SCR_010943). We used the biomaRt R package v2.38.0 (RRID:SCR_019214) to annotate the data using the v94 Ensembl archive (oct2018.archive.ensembl.org). Using X and Y chromosome gene expression, we discovered that some samples had sex chromosome aneuploidies (X0 females and partial XXY males), a previously documented phenomenon among J:DO mice45. These samples were excluded from downstream analyses. Additionally, we discovered that some samples included choroid plexus contamination. We remediated this contamination by taking the residuals of expression regressed on the log-mean expression of the genes klotho (Kl, ENSMUSG00000058488) and transthyretin (Ttr, ENSMUSG00000061808), which are unambiguous markers for the choroid plexus46.

Genotyping, haplotype reconstruction and sample QC

Genotyping was performed on tail biopsies by Neogene Genomics (Lincoln, NE) using the Mouse Universal Genotyping Array (GigaMUGA)47 consisting of 143,259 markers. Based on published genotype QC workflows48, 110,524 markers and 386 mice were retained for further analysis48. Genotypes were converted to founder strain-haplotype reconstructions using R/qtl249 (qtl2_0.21–1, http://kbroman.org/qtl2) (RRID:SCR_018181). We computed Pearson correlations between GBRS-based50 and array-based haplotype probabilities on 69,000 common grids. We found that 90% of samples have Pearson correlation r > 0.8. The median correlation coefficient was 0.834. Sample swaps were identified by detecting incorrect pairing of RNA-Seq and genotyping array data among these samples. We corrected the sample mix-ups by reassigning gene expression and genotypes that maximize correlation on haplotype probabilities.

Expression QTL mapping

Prior to eQTL mapping, gene expression counts were obtained by summing expected counts over all transcripts for a given gene. Expression for eQTL analysis was adjusted for choroid plexus contamination by regressing the log-mean of Kl and Ttr as additive covariates. eQTL mapping was performed on regression residuals of 17,248 genes using the R/qtl2 package and the founder haplotype regression method. To correct for population structure, kinship matrices were computed with the Leave One Chromosome Out (LOCO) option for kinship correction (http://kbroman.org/qtl2)51. Additive covariates of sex and J:DO generation were used in the eQTL mapping model. Specifically, for each gene, the following linear model was fit,

$${y}_{i}={s}_{i}{\beta }_{s}+ge{n}_{i}{\beta }_{gen}+{\sum }_{j=1}^{7}{g}_{ij}{\beta }_{j}+{\gamma }_{i}+\varepsilon $$

where yi is the gene expression abundance of the ith animal, si is the sex of animal i, βs is the effect of sex, geni is the generation of animal i, βgen is the effect of generation, gij is the founder probability for founder allele j in animal i, βj is the genotype coefficient, and γi is the random effect representing the polygenic influence of animal i as modeled by a kinship matrix. All variant associations are visible in the eQTL viewer and can be obtained from reanalysis of the whole data set. Each eQTL was categorized as either cis or trans for the purpose of listing them in GeneWeaver, where cis is defined as eQTLs within +/− 2 MB of the transcription start site (TSS) of the gene, and trans are eQTLs further away or on other chromosomes. Although LD for this generation is estimated to be quite small (LD1/2 ~ 0.1 Mb52) a2 MB interval was chosen to ensure inclusion of longer distance effects on expression, e.g. distal enhancer variation. It is important to note that such distal variants need not be in high LD with the target gene to regulate it in cis.

Creation of the eQTL viewer object

The results of the eQTL analysis, along with expression estimates, genotypes, and covariates, are encapsulated in an RData object. RData objects are designed for use in R and contain all the objects necessary for reproducing an analysis. We followed the instructions provided by the developers of the eQTL viewer53,54. Specifically, we created the following elements: kinship, map, genotype probabilities (genoprobs), markers, and a dataset object that contains information on the gene annotations, covariates, expression data, and sample annotations.

WGCNA analysis

RNA-Seq data was analyzed with WGCNA (RRID:SCR_003302)55. A soft thresholding power of 3 was selected using the WGCNA scale-free topology R2 threshold of 0.9 with a signed network with a minimum module size of 30. This soft threshold of 3 was chosen because that was the lowest threshold that yielded a high scale-free topology R2 of >0.9 while providing maximal connectivity (median k = 2,270). The correlation calculation utilized was bicor, and modules used numeric labels instead of colors.

Paraclique analysis

RNA-Seq data was analyzed with paraclique56 using a bicor with a correlation coefficient threshold of |0.5| (unsigned), minimum seed clique size of 5, minimum finished paraclique size of 10, proportional glom factor of 0.2 for paraclique construction.

Gene sets for analysis in geneweaver

Sets of genes representing the J:DO striatum eQTL, the WGCNA modules and the paracliques were deposited in GeneWeaver.org (RRID:SCR_009202)57 and accession numbers for each set of genes are found in Supplemental Table 1. In addition the eQTLs have been separated into cis and trans sets and are also presented as sets per chromosome.

Data Records

Primary sequence data

Primary raw paired end RNA-Seq data files (FASTQ formatted) from 416 J:DO mice were submitted to the Sequence Read Archive (RRID:SCR_004891) and are available https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE16273258.

Primary genotyping data

Raw data has been deposited at the Diversity Outbred Database https://divdb.jax.org (RRID:SCR_018180)59.

Primary phenotyping data

Phenotyping data has been deposited at the Mouse Phenome Database (RRID:SCR_003212) under project CSNA0360. This data is part of a multi-arm phenotyping project within the Center for Systems Neurogenetics of Addiction, and connect the data through canonical correlation using the baseline behavioral battery as described in Skelly61.

QTL viewer repository

QTL Viewer is an interactive web-based analysis tool allowing users to replicate the analyses reported for a study (Fig. 1). The tool with the data set described here is available at https://qtlviewer.jax.org/viewer/CheslerStriatum. It includes the ability to search various subsets of data from a study, such as phenotypes or expression data, and then visualize data with profile, correlation, LOD, effect, mediation, and SNP association plots (Fig. 2). Detailed information about the structure of the QTL viewer objects is available at https://github.com/churchill-lab/qtl-viewer/blob/master/docs/QTLViewerDataStructures.md. A complementary dataset from the hippocampus previous described in Skeelly et al.61 is already available at https://churchilllab.jax.org/qtlviewer/DO/hippocampus.

Fig. 1
figure 1

Screenshot of a query for the gene “Rab3b” in QTL Viewer. QTL results for Rab3b expression in the Diversity Outbred (J:DO) mice striatum. Metadata related to the J:DO generation and sex is displayed, and genes co-expressed with the selected gene can be accessed from the correlation tab. The allele effect plots, SNP association mapping, and mediation analysis can also be performed and viewed from the page by click on the tab and then the QTL peak to run the analysis. Upper right: A genome scan for Rab3b expression in the Diversity Outbred (J:DO) mice striatum identifies a strong (LOD > 65) cis-eQTL on chromosome 4. Lower Right: The allele effect plot for the haplotypes of the J:DO that regulate the expression of Rab3b. There are strong effects of WSB/EiJ, NZO/HILtJ, and C57BL/6 J on the expression in one direction and 129S1/SvJ, CAST/EiJ, and A/J in the opposite direction.

Fig. 2
figure 2

The Rab3b cis-eQTL data retrieved in QTL Viewer. (A) SNP association mapping within the QTL peak interval displaying all the SNPs that drive the QTL. Most of the highest score SNPs are around Rab3b, as expected for a gene regulated in cis. User tip, select the SNP Association tab, then click on the QTL peak marker to launch the analysis. (B) The genes that are negatively (Scp2-ps2) or positively (Ttc4, 0610037L13Rik, Zyg11b) correlated with Rab3b expression can be displayed. A scatter plot can be generated for each gene with the gene of interest. (C). Mediation analysis can be performed on the data set to identify candidate causal mediators. This analysis retests the QTL effect at the locus of interest, iteratively conditioned on candidate mediators. Here the SNP in Scp2-ps2 creates the greatest LOD drop. User tip, select the Mediation tab, then click on the QTL peak marker to launch the analysis.

QTL viewer rdata object

The primary data record associated with this study is qtlviewer_DO_Striatum_02102020.Rdata. This RData object contains the following:

  • genoprobs - the genotype probabilities

  • K - the kinship matrix created using the leave one chromosome out (loco) method

  • map - list of one element per chromosome, with the genomic position of each marker

  • markers - marker names and positions

  • dataset.DO_Striatum_416 –

    • annot.mrna - annotations of the mRNA data

    • annot.samples - sample annotations

    • covar.info - specific information about the covariates

    • covar.matrix - matrix of covariates data, samples (rows) x covariates (columns)

    • data - expression data, samples (rows) x mRNA (columns). This matrix is used in the eQTL mapping analysis.

    • datatype - type of data set, either mRNA or protein

    • display.name - simple display name for the viewer

    • ensembl.version - version of Ensembl used to annotate locations

    • lod.peaks - LOD peaks over a certain threshold, set to >7 in this dataset

Technical Validation

Blinding and randomization

In a population genetics study, mouse genotypes are collected randomly. The J:DO population is bred using a pseudorandom mating scheme, and test mice are obtained from several breeding cohorts. Experimenters are unaware of mouse genotypes and their relationship to gene expression. Coat color diversity in this population may create some experimenter bias in conventional mouse populations, but coat color is rarely a predictor of behavior in J:DO mice. In the ideal genetic population, genotypes are fully randomized, and individuals are all genetically equidistant. Because this is not the case, genetic mapping analyses include a relationship matrix, a structured covariance matrix that estimates the relations among individuals based on genotype similarity. Mice of both sexes are counterbalanced across test runs, with each run containing either male or female mice to avoid pheromonal effects on behavior.

RNA quality

RNA quality includes three primary components: integrity, purity, and concentration. RNA integrity was determined by the Agilent Bioanalyzer 2100 using an RNA Integrity Number (RIN). This metric uses the ratio of 28 S:18 S rRNA to rate RNA quality on a scale of 1–10. The median RIN was 9.1 (range: 7.9–9.9). RNA purity and concentration were determined using a NanoDrop spectrophotometer. Using this method, RNA concentration is determined using absorbance at 260 nm, while purity is determined using the ratio of absorbance at 260 nm to 280 nm (A260:A280) and the ratio of absorbance at 260 nm to 230 nm (A260:A230). The median concentration was 74.0 ng/µL (range: 9.4–186.3 ng/µL), the median A260:A280 ratio was 2.00 (range: 1.67–2.06), and the median A260:A230 ratio was 2.02 (range: 0.98–3.23). These values indicated that the RNA quality was sufficient for RNA-Seq analysis.

Heritability of transcript variation

Based on variance accounted for by genotypes across the genome, and an additive covariant of generation, transcript abundance has a median heritability of 0.229, which is about two times the observed median heritability observed initially in the BXD24 Cis-eQTLs were highly detectable across the entire genome, as a diagonal band (seen in Fig. 1). Trans-eQTLs were independent of each other in the genetically unstructured and large population as would be expected. UNC506203, on chr 1 and 40.21 Mbp and is the peak marker for the most number (42) of cis- and trans-eQTL. The largest interval between markers (13.852282 Mbp) that have no eQTL is on the X chromosome. This is all consistent with technically valid eQTL mapping.

Usage Notes

There are four means by which the processed drug naïve striatum gene expression datasets can be used. First, users can download the entire processed dataset in the RData format. Once downloaded, it is readily readable in the R programming environment. Second, users can access the data at https://qtlviewer.jax.org/viewer/CheslerStriatum and proceed with the eQTL profile by entering their specific gene of interest in the search text box. Thirdly, users can access these data using the QTL viewer API interface (https://github.com/churchill-lab/qtl2api). Finally users can download sets of genes derived from the eQTL data by various methods such as WGCNA and paraclique, from the online data repository and suite of tools GeneWeaver.org.