Transcriptomes of cochlear inner and outer hair cells from adult mice

Inner hair cells (IHCs) and outer hair cells (OHCs) are the two anatomically and functionally distinct types of mechanosensitive receptor cells in the mammalian cochlea. The molecular mechanisms defining their morphological and functional specializations are largely unclear. As a first step to uncover the underlying mechanisms, we examined the transcriptomes of IHCs and OHCs isolated from adult CBA/J mouse cochleae. One thousand IHCs and OHCs were separately collected using the suction pipette technique. RNA sequencing of IHCs and OHCs was performed and their transcriptomes were analyzed. The results were validated by comparing some IHC and OHC preferentially expressed genes between the present study and published microarray-based data as well as by real-time qPCR. Antibody-based immunocytochemistry was used to validate preferential expression of SLC7A14 and DNM3 in IHCs and OHCs. These data are expected to serve as a highly valuable resource for unraveling the molecular mechanisms underlying different biological properties of IHCs and OHCs as well as to provide a road map for future characterization of genes expressed in IHCs and OHCs.


Background & Summary
Hair cells are the sensory receptors of both the auditory system and the vestibular system in the ears of all vertebrates. Hair cells transduce mechanical stimuli, i.e., movement in their environment, into electrical activity 1,2 . There are two types of hair cells in the mammalian cochlea, inner hair cells (IHCs) and outer hair cells (OHCs). These two types of hair cells are anatomically and functionally distinct 3 . Although much is known about how IHCs and OHCs function in hearing, we have limited knowledge of molecular mechanisms, i.e., gene expression and regulation, that underlie their distinct morphological and functional specializations.
While all cells in a multicellular organism have a nearly identical genome, the genes that are transcribed differ among cell types. Diverse patterns of gene expression and post-transcriptional regulation of gene expression by miRNA underlie phenotypic variances of different cell types. Genome-wide characterization of cell-specific transcriptomes is central to understanding the biological property of a cell or a population of cells. High-throughput mRNA sequencing (RNA-seq) allows simultaneous transcript discovery and abundance estimation with a wide dynamic range and lower false-negative and false-positive discovery rates 4,5 . Direct sequencing of RNA libraries also provides the opportunity to explore alternative splicing, a key mechanism that contributes to transcriptome diversity 6-8 . Transcriptome analysis has emerged as a powerful tool in revealing the genetic and molecular profile of a cell or a population of cells.
In a previous study, we used the microarray technique to examine transcriptional profiling of purified IHCs and OHCs from adult mice 9 . Although microarray is a powerful technique, it has limitations in dynamic range and identification of multiple splice variants of the genes. Furthermore, it relies on prerequisite sequence information, which precludes analysis of unannotated genes 10 . Because of this, 22 to 24% of the transcripts detected in our previous microarray study were uncharacterized or unannotated transcripts or genes 9 . Hair cell-specific transcriptomes have been analyzed using RNA-seq in several recent studies 11-14 . However, these studies analyzed transcriptomes of cochlear and vestibular hair cells only from embryonic and neonatal mice. Furthermore, those studies did not distinguish between IHCs and OHCs.
Here, we describe transcriptome-wide profiling of IHCs and OHCs obtained from one-month-old CBA/J mouse cochleae to provide a comprehensive view of the gene expression in IHCs and OHCs. Unlike some other mouse strains (such as C57/B6) which carry mutations that can cause early onset of age-related hearing loss, CBA/J mice do not exhibit age-related hearing loss until 18 months of age. We took advantage of the established pulled glass pipette technique 9,15 and the distinct morphology of the two types of hair cells to separately collect 1,000 isolated IHCs and OHCs. Two biological replicates of IHCs and three replicates of OHCs, each containing 1,000 hair cells, were prepared for RNA-seq. An overview of the study design is depicted in Fig. 1a. Transcriptomes of adult IHCs and OHCs from the microarray technique 9 , as well as neonatal hair cells from RNA-seq 11-14 , are presented along with transcriptomes from the current study. We validated our results by comparing some IHC and OHC preferentially expressed genes between the present study and previous studies 9,11-14 as well as by real-time quantitative PCR (RT qPCR). In addition, we used antibody-based immunostaining to show the preferential expression of SLC7A14 and DNM3, whose function in hair cells has not been characterized. While SLC7A14 showed strong staining in the soma of IHCs, DNM3 was detected in the stereocilia bundle of only OHCs. These two genes/proteins can be used as specific markers for adult IHCs and OHCs. Finally, we examined the expression of deafness-related genes in hair cells. Mutations or deficiencies affecting approximately 150 genes have been linked to inherited syndromic or non-syndromic hearing loss 16 . We analyzed the expression of 143 known deafness genes, excluding X-chromosome-linked genes, and showed that 128 of these genes are expressed in hair cells.
Our dataset is expected to serve as a highly valuable resource for unraveling the molecular mechanisms underlying different biological properties of IHCs and OHCs. The dataset will also provide a road map for future characterization of genes expressed in these two types of hair cells and for assisting the auditory research community in exploring the functions of deafness-related genes.

Methods
Hair cell isolation and collection
CBA/J mice aged between 28 and 35 days were used for the study. The basilar membrane together with the organ of Corti was isolated as described before 15 . The sensory epithelium was transferred to an enzymatic digestion medium containing 1 ml L-15 and 1 mg Collagenase IV (Sigma) in a small Petri dish. After 5 min of incubation at room temperature (20 ± 2 °C), the tissue was transferred to a small plastic chamber (0.8 ml in volume) containing enzyme-free Leibovitz's L-15 medium (pH 7.35, 300 mOsm). Hair cells were separated by gentle trituration of the basilar membrane with a 200 μL Eppendorf pipette tip. The chamber containing the hair cells was then mounted onto the stage of an inverted Olympus IX71 microscope equipped with a video camera. The chamber (with inlet and outlet) was perfused with fresh L-15 medium for 5 min to wash out debris. IHCs and OHCs in most cases retained their distinct morphological features after isolation. Some representative images of solitary IHCs and OHCs are presented in Fig. 1b.
To collect solitary hair cells, two pulled glass pipettes with a diameter of ~30 μm were used to pick up and transfer IHCs and OHCs. Each pipette was designated for one cell type to prevent cell type contamination in the pipette. The pickup pipette was fabricated from 1.5 mm thin-wall glass tubing pulled by a two-stage electrode puller. The pipettes were held in two separate electrode holders mounted on two Narishige micromanipulators (Narishige, Japan). The suction port of the pipette holder, held by the micromanipulator, was connected to a micrometer-driven syringe to provide positive or negative pressure to draw in or expel the cells. An image of an OHC before being drawn into a pickup pipette is shown in Fig. 1c. A video showing a mouse OHC being drawn into a pickup pipette is provided (Data Citation 1). IHCs and OHCs were identified based on their morphology under direct visual observation, and only solitary hair cells that were not attached to any other cell types were collected. Any hair cells with ambiguous morphology were excluded. Hair cells were transferred to a microcentrifuge tube containing 50 μl RNAlater (Thermo Fisher Scientific, Waltham, MA) after ~10 cells were collected in the pipette. Cells were expelled from the pipette by applying positive pressure. This step was repeated until approximately 50 to 80 IHCs and 100 to 150 OHCs were collected from each mouse. Thirty mice were used for the collection of two biological replicates of IHCs and three replicates of OHCs.
www.nature.com/sdata/ SCIENTIFIC DATA | 5:180199 | DOI: 10.1038/sdata.2018.199

RNA isolation and amplification
Approximately 1,000 cells suspended in 100 μL RNAlater from each biological replicate were used to extract total RNA, including small RNAs (> ~18 nucleotides), using the Qiagen miRNeasy mini plus Kit (Qiagen Sciences Inc, Germantown, MD). DNA contamination was eliminated by on-column DNase digestion. The quality and quantity of RNA after purification were examined using an Agilent 2100 BioAnalyzer (Agilent Technologies, Santa Clara, CA) and compared to examples of pure RNA results found in the Agilent 2100 Bioanalyzer 2100 Expert User's Guide. Total RNA from each sample was approximately 8 to 10 ng/μl (with ~3-4 μl total for each sample). These samples were reverse transcribed into cDNA and amplified using the SMART-Seq V4 Ultra Low Input RNA kit (Clontech Laboratories, Inc., Mountain View, CA).

RNA-sequencing and bioinformatic analyses
Genome-wide transcriptome libraries were produced from biological replicates of IHCs and OHCs. The SMART-Seq V4 Ultra Low Input RNA kit (Clontech) was used to generate cDNA in combination with the Nextera Library preparation kit (Illumina, Inc., San Diego, CA). To ensure the inserts were the appropriate size and to determine concentration prior to sequencing, a Bioanalyzer 2100 and a Qubit fluorometer (Invitrogen) were used to assess library size and concentration. Transcriptome libraries were sequenced using the HiSeq 2500 Sequencing System (Illumina). Libraries were multiplexed and three samples per lane were sequenced as 100-bp paired-end reads. This generated approximately 100 million reads per sample. The files from the multiplexed RNA-seq samples were demultiplexed, and fastq files representing each library, together with quality control data, were generated.

Bioinformatics analyses
CLC Genomics Workbench software (CLC bio, Waltham, MA, USA) was used to map the reads to the mouse genome (mm10, build name GRCm38) and to generate normalized gene expression values as reads per kilobase of transcript per million mapped reads (RPKM). Reads were mapped to exonic, intronic, and intergenic sections of the genome. Gene expression estimates were derived from the mapped reads using HTSeq count 17 . Ingenuity IPA program (www.ingenuity.com) and DAVID 18 were used for functional annotation. Entrez Gene, HGNC, OMIM, and Ensembl databases were used for verification, reference, and analyses.
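For readers less familiar with the RPKM unit, the normalization can be illustrated with a minimal sketch. This is not the CLC Genomics Workbench implementation; the function name `rpkm` and the numbers in the example are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def rpkm(read_count, transcript_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads.

    RPKM = reads / (transcript length in kb * mapped reads in millions)
         = reads * 1e9 / (transcript length in bp * mapped reads)
    """
    if transcript_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and total mapped reads must be positive")
    return read_count * 1e9 / (transcript_length_bp * total_mapped_reads)


# Hypothetical example: 500 reads on a 2,000-bp transcript
# in a library of 100 million mapped reads.
example_rpkm = rpkm(500, 2_000, 100_000_000)
print(example_rpkm)
```

Because the value is scaled by both transcript length and library size, it allows rough comparison of genes of different lengths and of libraries of different depths.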

Code availability
No custom code was used in any of these analyses.

Real-time qPCR
We validated the expression of 26 genes using RT qPCR. RT qPCR experiments were run on an Applied Biosystems 7500 Fast Real-Time PCR system. Ten microliters of PowerUp SYBR Green Master Mix (Thermo Fisher Scientific, Waltham, MA, USA) was used in each 20 microliter reaction. Primer concentrations were 450 nM. The original cDNA samples were diluted twenty-fold, and two microliters were used in every reaction. The fast thermal cycling mode of the Applied Biosystems 7500 instrument was used. We calculated the ΔCt value of each gene (gene of interest or GOI) by normalizing to the average Ct value of the house-keeping genes (HKG): ΔCt = Ct(GOI) − Ct(AVG HKG). For comparing differential expression of a gene between IHCs and OHCs, we calculated ΔΔCt, where ΔΔCt = ΔCt(IHCs) − ΔCt(OHCs) 19 . Thus, a positive value would suggest that this gene has a higher expression value in IHCs than OHCs, whereas a negative value would suggest higher expression in OHCs than in IHCs. The oligonucleotide primers were designed using A plasmid Editor (ApE) software (http://biologylabs.utah.edu/jorgensen/wayned/ape/) and BLAST searches (http://blast.ncbi.nlm.nih.gov/Blast.cgi) to find unique and appropriate sequences with melting temperatures above 60°C that had predicted low rates of homodimerization. Oligonucleotide primers were acquired from Integrated DNA Technologies (Coralville, Iowa). The sequences of oligonucleotide primers are shown in Table 1.
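The two normalization steps defined above can be written out as a short sketch. The function names and the Ct values are hypothetical and serve only to show the arithmetic; the code follows the formulas as stated in the text and does not reproduce any instrument software.

```python
from statistics import mean


def delta_ct(ct_goi, ct_hkgs):
    """ΔCt = Ct(GOI) − average Ct of the house-keeping genes."""
    return ct_goi - mean(ct_hkgs)


def delta_delta_ct(dct_ihc, dct_ohc):
    """ΔΔCt = ΔCt(IHCs) − ΔCt(OHCs), as defined in the text."""
    return dct_ihc - dct_ohc


# Hypothetical Ct values for one gene of interest and two house-keeping genes.
dct_ihc = delta_ct(24.0, [20.0, 22.0])   # 24.0 - 21.0
dct_ohc = delta_ct(26.5, [20.5, 21.5])   # 26.5 - 21.0
ddct = delta_delta_ct(dct_ihc, dct_ohc)
print(dct_ihc, dct_ohc, ddct)
```

Averaging the house-keeping genes before subtraction keeps the normalization from depending on a single reference gene.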

Data Records
Raw fastq sequencing files, comprised of 2 biological repeats of IHCs and 3 biological repeats of OHCs, each with 2 technical repeats, have been deposited in the NCBI Sequence Read Archive (Data Citation 2). The individual accession numbers for each biological and technical replicate are provided in Table 2. An Excel file containing the RPKM gene expression values of each biological and technical repeat of IHCs and OHCs is included as "GSE111348_Inner_and_outer_hair_cells_RPKM.xlsx" (Data Citation 3). Since microarray-based transcriptomes of IHCs and OHCs from adult CBA/J mice are available from our previous study (Data Citation 4) 9 , we aligned the expression values of all the genes detected from RNA-seq and microarray, according to the Ensembl annotated gene names (symbols). We also obtained transcriptome datasets (Data Citation 5 and Data Citation 6) of neonatal cochlear hair cells from two published studies 12,13 . The gene expression values together with transcriptome datasets from these published studies are included for comparison in Data Citation 7. Alignment of each gene from different studies was also assisted by reference to Ensembl, HGNC, Entrez Gene and OMIM. Additional resources such as the gEAR (https://www.umgear.org) and SHIELD (https://shield.hms.harvard.edu/index.html) were also used for reference and verification.

RNA quality control and RNA-seq quality control
We analyzed RNA quality and concentration of our samples to determine their suitability for RNA-sequencing using an Agilent 2100 BioAnalyzer. In addition to using the 2:1 ratio (28S:18S) as an indication for determining the integrity of RNA in the electropherogram, we also used the RIN (RNA integrity number) software algorithm to evaluate the quality of our RNA samples. All of our samples had a RIN of 9, indicating that the integrity of RNA samples was high with minimal degradation.

Sequencing accuracy
We used the FastQC app (version 1.0.0) on the Illumina cloud computing interface (https://basespace.illumina.com/ome/index) to examine the quality of the reads. The analysis compared the read signals to the probability of accurate base-reading with a Phred quality score 20 . The fastq files generated from RNA-sequencing were analyzed for base-reading accuracy. The Phred quality scores of all our sequencing runs exceeded 30, which corresponds to a base-call accuracy of at least 99.9% at a given nucleotide in the sequence. This suggests that the RNA-sequencing performed was of high quality and unambiguous. We used Phred quality score ≥ 30 as the high-quality cutoff in our analysis for all samples.
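The relationship between a Phred quality score and base-call accuracy follows from the standard definition Q = −10 log10(P), where P is the probability of an incorrect base call. The sketch below only illustrates this conversion; the function names are hypothetical and it is not part of the FastQC app.

```python
import math


def phred_to_error_prob(q):
    """Probability of an incorrect base call for Phred score q."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p):
    """Phred score corresponding to error probability p (0 < p <= 1)."""
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    return -10 * math.log10(p)


# A Phred score of 30 corresponds to 1 error in 1,000 base calls.
error_q30 = phred_to_error_prob(30)
accuracy_q30 = 1 - error_q30
print(f"error = {error_q30:.4f}, accuracy = {accuracy_q30:.1%}")
```

Each increase of 10 in the score reduces the error probability tenfold, so a score of 20 corresponds to 99% accuracy and a score of 40 to 99.99%.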

Reproducibility of biological samples
Correlation coefficients were used to examine the reproducibility of biological and technical replicates of IHCs and OHCs (Fig. 2a). The mean correlation coefficient between biological replicates of OHCs is 0.994 ± 0.0003 (mean ± SD), while the mean correlation coefficient between technical repeats of OHCs is 0.999 ± 0.0045. The correlation coefficient between biological replicates of IHCs is 0.9984 ± 0.0003 (mean ± SD), and the coefficient between technical repeats is 0.994 ± 0.0045. The analysis suggests that the results were highly reproducible. Principal component analysis (PCA) is a technique commonly used to measure levels of variation and similarity among gene expression datasets. We used PCA to examine similarity of gene expression of different cell populations as well as reproducibility of biological replicates. Fig. 2c shows PCA of the gene expression profiles of IHCs and OHCs. Transcriptome data of mouse liver cells from a published study 21 were downloaded and normalized with our data set. As shown, the expression profiles of OHCs are highly reproducible, as the data points from three biological and three technical repeats cluster together with small variability. Similarly, the expression profiles of IHCs are also highly reproducible. However, the datasets of IHCs and OHCs are separated by a large distance, suggesting that their gene expression profiles are different. The gene expression profile of liver cells is also distinct from those of IHCs and OHCs, as liver cells are further away from hair cells in the graph.
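A replicate-to-replicate correlation of this kind can be sketched as follows. The text does not state which correlation coefficient or data transformation was used, so the Pearson coefficient and the log2(RPKM + 1) transform below are assumptions, and the expression values are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("inputs must have equal length of at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def log2_transform(rpkm_values, pseudocount=1.0):
    """log2(RPKM + pseudocount); the pseudocount avoids log2(0)."""
    return [math.log2(v + pseudocount) for v in rpkm_values]


# Hypothetical RPKM values for five genes in two replicates.
replicate_1 = [0.0, 3.0, 15.0, 127.0, 1023.0]
replicate_2 = [0.0, 1.0, 31.0, 63.0, 1023.0]
r = pearson_r(log2_transform(replicate_1), log2_transform(replicate_2))
print(round(r, 3))
```

The log transform keeps a handful of very highly expressed genes from dominating the coefficient, which is one common reason to apply it before comparing replicates.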

Real-time qPCR validation
Fifteen additional CBA/J mice were used to prepare three biological replicates of IHCs and OHCs for RT qPCR to validate the expression of 26 genes, 14 of which were highly expressed in OHCs and 12 of which were highly expressed in IHCs. The expression values were all normalized to the cycle threshold (Ct) values of Nono and Ppia. Nono and Ppia, used as reference genes in a previous study 19 , had similar levels of expression, with no statistically significant difference between the two populations of hair cells in both the previous microarray 9 and present RNA-seq studies. We compared the patterns of differential expression of these genes between IHCs and OHCs using expression values from qPCR and RNA-seq. While the log2 fold difference for each gene was computed using the RPKM values of IHCs vs. OHCs from RNA-seq, the ΔΔCt for each gene was calculated from RT qPCR. Fig. 2d shows such a comparison after the expression values were normalized to fold changes. Although the values from the two analyses are different, the trend of differential expression of these genes is highly consistent between the two datasets.

Immunocytochemistry
We used immunocytochemistry to detect the expression of SLC7A14 and DNM3; the function of these proteins in the two populations of hair cells has not been characterized. Slc7a14 and Dnm3 are differentially expressed in IHCs and OHCs, respectively, as shown in our previous microarray-based transcriptome analysis 9 . The current study (Fig. 2d) also shows that Slc7a14 and Dnm3 are preferentially expressed in IHCs and OHCs, respectively. Slc7a14 is predicted to encode a glycosylated, cationic amino acid transporter protein that mediates lysosomal uptake of cationic amino acids. This gene is expressed in the photoreceptor layer of the retina and mutations in this gene are associated with autosomal recessive retinitis pigmentosa 22 . Dnm3 encodes dynamin-3, which is predicted to be involved in producing microtubule bundles and to be able to bind and hydrolyze GTP. We used antibodies (against SLC7A14 and DNM3) and confocal microscopy to determine where they are expressed and whether they are differentially expressed. As shown in Fig. 3a,b, strong staining of SLC7A14 is detected in the soma of IHCs but not in the soma of OHCs. Thus, SLC7A14 may be used as a specific marker for IHCs. Conversely, DNM3 expression is detected in the stereocilia bundle of OHCs, but not in the bundle of IHCs and vestibular hair cells (Fig. 3c-h), suggesting that DNM3 may play an important role in the biological property of OHC stereocilia and that the components of the IHC and OHC stereocilia may be different. The functional roles of these two proteins in OHCs and IHCs are yet to be determined.

Validation by comparison with published studies
Previous studies have identified and characterized many genes expressed in hair cells in developing and adult animals using immunocytochemistry, molecular biology, and electrophysiology techniques. These genes encode some proteins for unique structure and function of hair cells as well as transcription factors important for hair cell differentiation, specification and maintenance. Since the expression of these genes has been validated by either in situ hybridization, antibody staining or molecular biology and electrophysiology techniques, comparison of the genes detected in our RNA-seq analysis with the genes that have already been described in the inner ear is a good way to validate our dataset. We compiled a list of genes that were identified in previous studies and present it in Table 3 (available online only). In the table, the expression (RPKM) values from our RNA-seq analysis are included for comparison. As shown, most genes that were previously detected in hair cells are also expressed in our dataset. We should point out, though, that some genes (especially those encoding transcription factors) are known to be expressed during development and significantly downregulated in adulthood. This may explain why some genes are expressed at lower levels (e.g., Atoh1 and Jag2) or no longer detected (Foxj1, Scn11a, and Tmc2) in adult hair cells. Several previous studies used microarray and RNA-seq to examine the gene expression profiles of cochlear and vestibular hair cells from embryonic and neonatal mice 9,11-14 as well as hair cells in the inner ear and lateral lines of larval and adult zebrafish 11,23-27 . Comparison of our dataset with the transcriptome datasets from previous studies offers another way to validate our results. The gene names and their expression values from microarray-based transcriptomes of IHCs and OHCs from adult mice 9 are presented in Data Citation 7.
Although the expression values are not directly comparable because of the two different techniques used, the majority of the genes that are detected in hair cells in different datasets are highly consistent. In the same file, we also included transcriptome datasets from neonatal mouse hair cells 12,13 . Since these datasets were obtained from a mixed population of both IHCs and OHCs from neonatal mice, some differences between the datasets are expected.
Although the expression values from microarray and RNA-seq are not directly comparable, we expect that the genes that are differentially expressed in one cell population in the two studies should largely be consistent. We used the top differentially expressed genes in IHCs and OHCs from Fig. 4b,c of the microarray study 9 for comparison. We computed the log2 fold difference between the two hair cell types from each study and present a side-by-side comparison of the fold difference values from the two techniques in Fig. 4. As shown, none of the differentially expressed genes in IHCs (IHCs/OHCs in Fig. 4a) or OHCs (OHCs/IHCs in Fig. 4b) display fold changes in opposite directions in the two studies, suggesting that the differentially expressed genes identified by the two techniques are highly consistent. These differentially expressed genes may provide valuable information to understand different biological properties (such as structural and functional differences) of IHCs and OHCs in the adult inner ear.
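The direction-of-change comparison described above can be sketched in a few lines. The gene names, expression values, and the pseudocount of 1 are hypothetical and serve only to illustrate the check; they are not taken from either dataset.

```python
import math


def log2_fold(value_a, value_b, pseudocount=1.0):
    """log2 fold difference of a over b; the pseudocount avoids division by zero."""
    return math.log2((value_a + pseudocount) / (value_b + pseudocount))


def discordant_genes(folds_1, folds_2):
    """Genes present in both datasets whose log2 fold differences have opposite signs."""
    shared = folds_1.keys() & folds_2.keys()
    return sorted(g for g in shared if folds_1[g] * folds_2[g] < 0)


# Hypothetical IHC and OHC RPKM values for two placeholder genes.
rnaseq_folds = {
    "GeneA": log2_fold(63.0, 7.0),    # higher in IHCs
    "GeneB": log2_fold(3.0, 31.0),    # higher in OHCs
}
# Hypothetical log2 fold differences (IHCs/OHCs) from a second technique.
microarray_folds = {"GeneA": 1.2, "GeneB": -0.8}

print(discordant_genes(rnaseq_folds, microarray_folds))
```

Comparing only the sign of the fold difference sidesteps the fact that the magnitudes from the two techniques are not directly comparable.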

Usage notes
While acquired deafness associated with age or noise exposure is more common than genetic deafness by roughly two orders of magnitude, congenital deafness occurs in 1 out of every 1,000 to 2,000 births. Hereditary hearing loss and deafness can be regarded as syndromic or non-syndromic. Mutations or deficiencies affecting approximately 140 genes have been linked to inherited syndromic or non-syndromic hearing loss 16 . Although the majority of these genes are known to be expressed in the inner ear, it is important to determine whether they are expressed in hair cells. We analyzed the expression of 125 known deafness genes. Table 4 (available online only) shows expression levels of the 125 deafness genes in adult IHCs and OHCs. As shown, most of these genes are detected in hair cells. We should point out that several genes are known to be expressed during development and significantly downregulated in adulthood. Other genes may be expressed in spiral ganglion neurons, supporting cells, and stria vascularis and play important roles in those cells. Thus, it is not surprising that the expression of some genes is not detected in hair cells. However, the analysis will be highly useful for assisting the auditory research community in exploring the function of these deafness-related genes in hair cells.