Transcriptomic analyses of murine ventricular cardiomyocytes

Mice are used universally as model organisms for studying heart physiology, and a plethora of genetically modified mouse models exist to study cardiac disease. Transcriptomic data for whole-heart tissue are available, but not yet for isolated ventricular cardiomyocytes. Our lab therefore collected comprehensive RNA-seq data from wildtype murine ventricular cardiomyocytes as well as from knockout models of the ion channel regulators CASK, dystrophin, and SAP97. We also report ion channel expression in wildtype cells to help advance the debate about which ion channels are expressed in cardiomyocytes. Researchers studying the heart, and especially cardiac arrhythmias, may benefit from these cardiomyocyte-specific transcriptomic data to assess expression of genes of interest.


Background & Summary
In this study, we present next-generation RNA sequencing (RNA-seq) data of murine ventricular cardiomyocytes (CMCs). To date, only whole-heart RNA-seq data have been published [1][2][3], in which a variety of cell types, such as fibroblasts, endothelial cells, and atrial and ventricular cardiomyocytes, are pooled. We endeavoured to provide RNA-seq data of isolated CMCs for several reasons. Firstly, since the pump function of the heart relies on proper CMC function, CMCs are the most thoroughly studied cardiac cell type. Researchers studying CMCs may benefit from CMC-specific RNA-seq data from which expression of genes of interest can be extracted. Secondly, because of the crucial role of ion channels in cardiac electrical excitability and arrhythmogenesis, researchers who study cardiac arrhythmias have long debated which ion channels are expressed in CMCs. However, existing ion channel expression data are low-throughput, often contradictory 4-6, fragmented 7, or derived from the whole heart. The present work reveals the expression of more than 350 ion channel family members, including pore-forming and auxiliary subunits, in CMCs (see Fig. 1 and Tables 1-3 (available online only)). We therefore believe that these data will be valuable for ion channel researchers attempting to resolve the ongoing debate.
We have also included cardiac-specific knockout models of the ion channel regulators dystrophin, synapse-associated protein-97 (SAP97), and calmodulin-activated serine kinase (CASK). These proteins interact with ion channels and modify their cell biological properties, such as membrane localization 3,[8][9][10][11]. Notably, CASK provides a direct link between ion channel function and gene expression. It regulates transcription factors (TFs) in the nucleus, such as Tbr-1, and induces transcription of T-element-containing genes 12. CASK also regulates TFs of the basic helix-loop-helix family, which bind E-box elements in promoter regions, by modulating the inhibitor of the DNA-binding-1 TF 13. Additionally, CASK and SAP97 directly interact with each other 11. For these reasons, we include CASK, SAP97, and dystrophin knockout mice to investigate whether these three proteins have a similar effect on gene expression, which may suggest their involvement in similar pathways. However, research beyond the scope of this paper would be needed to determine whether CASK-dependent TF regulation caused the differential expression that we observed.
To date, mutations in approximately 27 ion channel genes have been associated with cardiac arrhythmias, such as congenital short- and long-QT syndrome (SQTS and LQTS), Brugada syndrome (BrS), and conduction disorders (see http://omim.org) [14][15][16]. Notably, our ion channel expression data, as presented in Fig. 1 and Tables 1-3 (available online only), reveal that several arrhythmia-associated ion channel genes are not expressed, or are only scarcely expressed, in murine ventricular CMCs (including Kcne2, Kcne3, Scn2b, and Scn3b). Although murine and human ion channel expression may differ, we are presently unaware of any available transcriptome of human CMCs 17,18. We are also unable to either exclude or assess the effect of enzymatic isolation on the transcriptome. Finally, other cardiac cell types such as (myo)fibroblasts may express these ion channels and therefore may be important for arrhythmogenesis. Indeed, many ion channel genes that are not expressed in cardiomyocytes have been reported in murine whole-heart tissue 2. These include Scn1a, Scn3b, 10 voltage-gated Ca2+ channels, 10 Kv channels, and four two-pore K+ channels. Conversely, all ion channel genes expressed in CMCs are also reported in whole-heart expression data.
In sum, this study presents RNA-seq data from wildtype murine ventricular CMCs, as well as from SAP97, CASK, and dystrophin knockouts and controls (see Fig. 2 for a schematic overview of study design). We performed differential gene expression analysis to compare the knockouts to their controls, and we extracted wildtype ion channel gene expression data (Tables 1-3 (available online only), Fig. 1). We believe that these data will be valuable for researchers studying cardiomyocytes and ion channels to assess expression of genes of interest.

Mouse models
All animal experiments conformed to the Guide to the Care and Use of Laboratory Animals (US National Institutes of Health, publication No. 85-23, revised 1996); were approved by the Cantonal Veterinary Administration, Bern, Switzerland; and complied with the Swiss Federal Animal Protection Law. Mice were kept on a 12-hour light/dark cycle, with lights on from 6:30 AM to 6:30 PM. To avoid the influence of circadian rhythm, mice were sacrificed between 10:00 AM and 1:00 PM. All mice were male and between 8 and 15 weeks of age.
αMHC-Cre. The cardiac-specific murine alpha-myosin heavy chain (αMHC) promoter drives the expression of Cre recombinase, which, in turn, can recombine LoxP sequences. The αMHC-Cre strain was generated as previously described 19 and acquired from the Jackson Laboratory (stock #011038).
CASK and SAP97 knockout mice. CASK KO and SAP97/Dlg1 KO mice were generated as previously described 9,20. Both the CASK and SAP97 mouse lines were on mixed backgrounds. The appropriate control mice were selected in accordance with the publications that characterized both mouse lines 9,20. CASK control mice express Cre while the first CASK exon is not floxed. SAP97 control mice are Cre-negative and the first SAP97 gene was floxed.
Dystrophin knockout (MDX-5CV) mice. The MDX-5CV strain demonstrates total deletion of the dystrophin protein. It was created as previously described 21 and acquired from the Jackson Laboratory (stock #002379). MDX mice were on pure Bl6/Ros backgrounds. Control mice were on pure Bl6/J background, except for MDX_Ct5 and MDX_5, which were Bl6/Ros mice backcrossed three times on Bl6/J.

RNA extraction and sequencing
RNA-seq was performed by the Next Generation Sequencing Platform at the University of Bern. Total RNA was isolated from freshly dissociated cardiomyocytes with an FFPE Clear RNAready kit (AmpTec, Germany), which included a DNase treatment step. RNA quality was assessed with Qubit and Bioanalyzer, and RNA quantity was checked with Qubit. To allow sequencing of long non-coding RNA (lncRNA), libraries were constructed with 1 μg RNA using the TruSeq Stranded Total RNA kit after Ribo-Zero Gold (Illumina) treatment for rRNA depletion. Library molecules with inserts <300 base pairs (bp) were removed. Paired-end libraries (2x150 bp) were sequenced on an Illumina HiSeq3000 machine.

Data Records
The data were submitted to NCBI Gene Expression Omnibus (GEO) (Data Citation 1). This GEO project contains raw data and TPM values from all samples, and differential gene expression analysis between knockout and control samples. See Table 4 for an overview of RNA-seq metrics, including mapping rates. One sample (MDX_1) yielded few reads and was therefore excluded from further analyses. The proportion of reads mapping to annotated exons ranged from 65 to 77%. Mapping, no-feature (2-13%), and ambiguous (11-23%) read pairs together accounted for 89-97% of the total number of RNA reads (Table 4). Read pairs covered 49,671 genes of the Mus musculus reference genome (GRCm38.83).
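The TPM (transcripts per million) values mentioned above follow a standard normalization that can be sketched in a few lines. This is a minimal illustration, not the pipeline used to produce the deposited values: the function name `tpm`, the use of a single length per gene, and the toy counts and lengths are all assumptions made for the example.

```python
def tpm(counts, lengths_bp):
    """Convert raw read counts to transcripts per million (TPM).

    counts: raw read (pair) counts per gene.
    lengths_bp: gene (or effective transcript) lengths in base pairs;
                how lengths are defined is an assumption of this sketch.
    """
    # Step 1: normalize each count by gene length in kilobases.
    rates = [c / (length / 1000.0) for c, length in zip(counts, lengths_bp)]
    # Step 2: rescale so that the values of one sample sum to one million.
    total = sum(rates)
    return [r / total * 1e6 for r in rates]


# Toy input (hypothetical values, not taken from the dataset).
toy_counts = [100, 400, 500]
toy_lengths = [1000, 2000, 5000]
toy_tpm = tpm(toy_counts, toy_lengths)
```

Because every sample is rescaled to the same total, TPM values of a gene can be compared between samples more directly than raw counts.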

Quality assessment
The quality of all samples was assessed with FastQC. Except for MDX_1, all samples were of high quality. Where applicable, a representative example (MDX_Ct1) is shown. Firstly, the insert size histogram (Fig. 3a) shows that the inferred insert size of each sample exceeded 150 base pairs, that is, the read length, demonstrating that the sequencing was not contaminated by adapter sequences. Secondly, the GC content plot (Fig. 3c) ideally shows a roughly normal distribution centred around the average GC content of the genome, which varies between species. The peaks observed in Fig. 3c are likely caused by sequences that are detected at high copy numbers, and should not pose problems for downstream analyses. Furthermore, Phred scores (Fig. 3d) are well within the green area of the graph, indicating good base quality along the length of reads. In addition, the gene coverage graph (Fig. 3e) of sample MDX_Ct1 shows that reads are distributed evenly along the length of the gene body. Because the gene coverage for all other samples is highly comparable to that of MDX_Ct1, only one example is shown. Lastly, the saturation report (Fig. 3f) represents the number of splice junctions detected using different subsets of the data from 5 to 100% of all reads. At sequencing depths sufficient to perform alternative splicing analysis, at least the red line, representing known junctions, should reach a plateau where adding more data does not substantially increase the number of detected junctions. Only MDX_1 does not reach this plateau.
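To make the Phred scores discussed above concrete, the following sketch converts a FASTQ quality string into Phred scores and error probabilities using the standard relation P = 10^(-Q/10). The helper names, the ASCII offset of 33, and the toy quality string are assumptions for illustration; this is not how FastQC is implemented.

```python
def phred_to_error_prob(q):
    """Probability that a base call with Phred score q is wrong: 10^(-q/10)."""
    return 10 ** (-q / 10)


def decode_quality(qual_string, offset=33):
    """Decode a FASTQ quality string into Phred scores.

    offset=33 (Phred+33 encoding) is an assumption of this sketch.
    """
    return [ord(ch) - offset for ch in qual_string]


def mean_error_prob(qual_string, offset=33):
    """Average per-base error probability over one read."""
    scores = decode_quality(qual_string, offset)
    return sum(phred_to_error_prob(q) for q in scores) / len(scores)


# Hypothetical quality string: 'I' -> 40, 'F' -> 37, '5' -> 20.
toy_quality = "IIIIFFFF5555"
toy_scores = decode_quality(toy_quality)
```

A Phred score of 20 corresponds to an error probability of 1 in 100, and a score of 30 to 1 in 1,000, which is why consistently high scores along the read indicate reliable base calls.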

Gene expression variation of biological replicates
We performed principal component analysis (PCA) to assess whether samples from the same experimental group have similar gene expression profiles (Fig. 3b). Of note, samples within each sample group still show considerable variation. The mixed genetic background of most sample groups may explain this variation; only the MDX control mice are on a pure Bl6/J background. The variation seen in MDX control mice is likely due to a batch effect, as two rounds of samples were sequenced. However, the PCA plots are based only on the 500 genes with the highest variability in one sample, whereas our genes of interest, including all ion channel genes, show similar expression levels throughout all samples.
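The idea of a PCA restricted to the most variable genes can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis used for Fig. 3b: the function name `pca_top_variable`, the ranking of genes by variance across samples, the use of NumPy's singular value decomposition, and the simulated toy matrix are all hypothetical choices.

```python
import numpy as np


def pca_top_variable(expr, n_top=500, n_components=2):
    """PCA of samples using the n_top genes with the highest variance.

    expr: array of shape (n_samples, n_genes), for example log-transformed
          expression values (the transformation is an assumption here).
    Returns (scores, explained_variance_ratio) for the leading components.
    """
    expr = np.asarray(expr, dtype=float)
    # Rank genes by their variance across samples and keep the top n_top.
    gene_var = expr.var(axis=0, ddof=1)
    top = np.argsort(gene_var)[::-1][:n_top]
    x = expr[:, top]
    # Centre each gene, then obtain principal components via SVD.
    x = x - x.mean(axis=0)
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    scores = u[:, :n_components] * s[:n_components]
    explained = s ** 2 / np.sum(s ** 2)
    return scores, explained[:n_components]


# Simulated toy data: 6 hypothetical samples x 1000 hypothetical genes.
rng = np.random.default_rng(0)
toy = rng.normal(size=(6, 1000))
toy[3:, :50] += 5.0  # shift 50 genes in the last three samples (mock group effect)
scores, explained = pca_top_variable(toy)
```

In this toy case the first principal component separates the two simulated groups. Genes outside the selected set do not influence the plot at all, which is why stable genes of interest can coexist with visible spread in a PCA.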