Cocaine induces paradigm-specific changes to the transcriptome within the ventral tegmental area

During the initial stages of drug use, cocaine-induced neuroadaptations within the ventral tegmental area (VTA) are critical for drug-associated cue learning and drug reinforcement processes. These neuroadaptations occur, in part, from alterations to the transcriptome. Although cocaine-induced transcriptional mechanisms within the VTA have been examined, various regimens and paradigms have been employed to examine candidate target genes. In order to identify key genes and biological processes regulating cocaine-induced processes, we employed genome-wide RNA-sequencing to analyze transcriptional profiles within the VTA from male mice that underwent one of four commonly used paradigms: acute home cage injections of cocaine, chronic home cage injections of cocaine, cocaine-conditioning, or intravenous-self administration of cocaine. We found that cocaine alters distinct sets of VTA genes within each exposure paradigm. Using behavioral measures from cocaine self-administering mice, we also found several genes whose expression patterns corelate with cocaine intake. In addition to overall gene expression levels, we identified several predicted upstream regulators of cocaine-induced transcription shared across all paradigms. Although distinct gene sets were altered across cocaine exposure paradigms, we found, from Gene Ontology (GO) term analysis, that biological processes important for energy regulation and synaptic plasticity were affected across all cocaine paradigms. Coexpression analysis also identified gene networks that are altered by cocaine. These data indicate that cocaine alters networks enriched with glial cell markers of the VTA that are involved in gene regulation and synaptic processes. Our analyses demonstrate that transcriptional changes within the VTA depend on the route, dose and context of cocaine exposure, and highlight several biological processes affected by cocaine. Overall, these findings provide a unique resource of gene expression data for future studies examining novel cocaine gene targets that regulate drug-associated behaviors.

Background 1, p-val1. Testing all genes from the RNA-seq analysis after filtering out low gene expressions: the number of background genes used is 15000 (the last cell in the contingency table).
Background 2, p-val2. Testing only the DEGs in two groups: for example in CPP VS IVSA, the total number of DEGs from both groups is 579, then 579 is the number of background genes used (the last cell in the contingency table).

Supplemental Figure 5.
Supplemental Figure 5. Predicted biological processes altered by each cocaine paradigm. Top gene ontology (GO) terms of both upregulated and downregulated DEGs within each paradigm are listed (left). Heat maps displaying the log-fold changes of DEGs associated with a particular GO term are shown (right).
Supplemental Figure 6. Predicted biological processes altered by each cocaine paradigm. Top gene ontology (GO) terms of both upregulated and downregulated DEGs within each paradigm are listed (left). Heat maps displaying the log-fold changes of DEGs associated with a particular GO term are shown (right).   Common predicted upstream regulators of both upregulated genes and downregulated genes induced by cocaine in each paradigm. a. Upstream regulators of genes upregulated by cocaine. There were no common upstream regulators of genes upregulated by cocaine across all paradigm from the top 30 regulators. However, several common upstream regulators were found between paradigms. b. Upstream regulators of genes downregulated by cocaine. Supplemental Methods.
RNA Sequencing: RNA was isolated from the ventral tegmental area punches, using the RNeasy minikit (Qiagen, 74104). RNA quality was assessed by Bioanalyzer and only samples with an RNA integrity number greater than 9 were included in our analysis. cDNA libraries for all groups were prepared using the TruSeq RNA Sample Preparation Kit (Illumina, RS-122-2001). One hundred nanograms of total RNA from each mouse was purified with poly-T oligo-attached magnetic beads and heat fragmented. First and second strands of CDNA were synthesized and then purified. After blunting of the ends occurred, the 3′ end was adenylated to prevent concatenation of the template during adapter ligation. For each treatment group, a unique adapter set was added to the ends of the cDNA and the c. d.
libraries were amplified using PCR. The quality of the library was assessed by Bioanalyzer and quantified using qRT-PCR with a standard curve prepared from a commercial sequencing library (Illumina). Samples were mutliplexed, with each treatment group represented in each flow cell of the sequencer. 10 nM of each library was pooled in four multiplex libraries and sequenced on an Illumina HiSeq 2500 instrument during a single-read 100 bp sequencing, run by the Genomic High-Throughput Facility at the University of California, Irvine. The resulting sequencing data for each library were postprocessed to produce FastQ files. The data were then demultiplexed and filtered using Illumina CASAVA 1.8.2 software as well as in-house software. Control reads successfully aligned to the PhiX control genome or poor quality reads failing Illumina's standard quality tests were removed from the following analyses. The quality of the remaining sequences was further assessed using PHRED quality scores produced in real time during the basecalling step of the sequencing run.
Alignment to the reference genome and transcriptome The reads from each experiment were separately aligned to the mm10 reference genome using Bowtie2. The first 10 bases from the start of the read are trimmed using Trimmomatic, and reads shorter than 50 bp are excluded. Reads uniquely aligned by both tools to known exons or splice junctions with no more than two mismatches on any 25 bp fragment of the read were included in the transcriptome. Similarly, reads matching several locations in the reference genome were removed from analysis. The percentage of reads assigned to the reference genome and transcriptome using this protocol is reported for each group of replicates (Supp. Data 4). This resulted in an average of approximately 53 million reads per sample.

Downstream DEG analysis:
Top up-or down-regulated genes are identified for further analysis. This includes identification of differentially expressed upstream transcription factors and their downstream targets using transcription factor binding site (TFBS) enrichment analysis of all promoters using the MotifMap database. In MotifMap, the default parameters of 10,000 bp upstream and 2,000 bp downstream were used. Gprofiler was used to identify Gene Ontology terms from DEGs. This rank-rank hypergeometric overlap (RRHO) plots (Raudvere et al. 2019). All RNA-seq files are available on Gene Expression Ominbus (GEO GSE155313).

Significance of Overlapping Genes:
We performed Fisher exact test (FET) to calculate the significance in overlapping genes in Figure 2a-2h. Two sets of FET were ran using different background genes in the last cell of the contingency table: (1) we included roughly 15000 genes as background genes and those genes were obtained by filtering out genes with low FPKM values. The FET statistics from this set of background genes were shown in p-val1 in the table below.
(2) we included only the DEGs relevant to the groups in comparison (for example, the background genes used in Figure 2a are DEGs in HC Acute plus DEGs in HC Chronic). The number of background genes used for FET and the FET statistics were shown in background2 and p-val2 in the table. By using the first set of background genes described in (1), most of the overlaps were significant except for 2b and 2c. By using the other set of background genes described in (2), the large p-values were improved and overlapping genes in all groups were significant.

WGCNA analysis:
Co-expression network analysis was performed using the R WGCNA library. The soft thresholding power beta was chosen for each paradigm based on the criterion of approximate scale-free topology. By using hierarchical clustering on the dissimilarity in the topological overlap matrix (dissTOM = 1-TOM), gene modules were calculated for each paradigm. Module eigengenes, 1 st principle component of the module, were computed. GO term enrichment analysis was performed on top three modules with the most genes using R library enrichr. The top modules enriched from each paradigm were calculated using the FET statistics of over-representation of differential genes (DEGs) in each module. We selected a top modules that showed the highest significant enrichment (p-value<0.05) after FDR correction. For each paradigm, the top 30 hub genes were identified and visualization of the top 30 hub genes were shown as network graphs plotted by Cytoscape (Otasek et al. 2019). Using the R function softconnectiviity, we calculated the kME, which was used for calculating intra-modular connectivity for every gene. We took the top 30 most connected genes within a module and labeled them hub genes. For the WGNCA enrichment, we used DEGs that were detected from each paradigm to determine whether that same paradigm had modules enriched for their own DEGs (i.e. DEGs from IVSA enriched in the IVSA WGCNA).
Cell-type enrichment analysis was performed on the top five modules in each paradigm. Cell-type specific marker genes for six cell types in the midbrain include: astrocyte, cholinergic, dopaminergic, microglia, oligo, and serotenergic (Mancarci et al. 2017). Pvalue from Fisher exact test was calculated and corrected by FDR multi-test correction.
qPCR: RT-qPCR was performed as described previously. cDNA was created using the Transcriptor First Strand cDNA Synthesis kit (Roche Applied Science). Primers were designed using the Roche Universal Probe Library (Supp. Data 5). All values were normalized to Hprt5 expression levels. Analyses and statistics were performed using the Roche and REST 2009 software based on the Pfaffl method.
Statistical Analysis for Behavior. Prism was used for all graphing and statistical analysis of behavior. Significance thresholds were set at p < 0.05. Main effects and interactions were analyzed by either a one-way ANOVA (CPP data) or two-way ANOVA repeated measures (IVSA data). Tukey post-hoc comparisons were performed when appropriate. All analyses are two-tailed, with an α value of 0.05 required for significance. Error bars in all figures represent SEM. For all experiments, values ± 2SD from the group mean were considered outliers and were removed from analyses.