Transcriptomes of electrophysiologically recorded Dbx1-derived respiratory neurons of the preBötzinger complex in neonatal mice

Breathing depends on interneurons in the preBötzinger complex (preBötC) derived from Dbx1-expressing precursors. Here we investigate whether rhythm- and pattern-generating functions reside in discrete classes of Dbx1 preBötC neurons. In a slice model of breathing with ~ 5 s cycle period, putatively rhythmogenic Type-1 Dbx1 preBötC neurons activate 100–300 ms prior to Type-2 neurons, putatively specialized for output pattern, and 300–500 ms prior to the inspiratory motor output. We sequenced Type-1 and Type-2 transcriptomes and identified differential expression of 123 genes including ionotropic receptors (Gria3, Gabra1) that may explain their preinspiratory activation profiles and Ca2+ signaling (Cracr2a, Sgk1) involved in inspiratory and sigh bursts. Surprisingly, neuropeptide receptors that influence breathing (e.g., µ-opioid and bombesin-like peptide receptors) were only sparsely expressed, which suggests that cognate peptides and opioid drugs exert their profound effects on a small fraction of the preBötC core. These data in the public domain help explain the neural origins of breathing.


Results and discussion
We analyzed Dbx1 preBötC neurons using Patch-Seq 18 , which entails whole-cell patch-clamp recording followed by next-generation sequencing ( Supplementary Fig. 1A) and bioinformatics ( Supplementary Fig. 1B).
The maximum yield of high-quality RNA was inversely proportional to whole-cell recording duration (5 min on average, 3-8 min in all experiments). Inspiratory burst characteristics and intrinsic membrane properties for Type-1 and Type-2 neurons (Fig. 1A top and bottom, respectively) are already well established 14,15 . Given the time constraints, we focused on measuring the intrinsic membrane properties i.e., delayed excitation and sag potentials, at the expense of recording fewer inspiratory burst cycles.
Our patch-clamp recordings confirmed the previously published disparities between Type-1 and Type-2 neurons. Type-1 neurons showed delayed excitation of 167 ± 40 ms (n = 7 neurons from 6 slices) when subjected to suprathreshold current steps from a baseline membrane potential below − 70 mV (i.e., evidence of I A ; Fig. 1B, top trace) but negligible sag potentials (2 ± 1 mV, n = 7 neurons from 6 slices) when subjected to hyperpolarizing current steps from a baseline membrane potential of − 50 mV (i.e., no evidence of I h ; Fig. 1C, top trace).
Type-2 neurons exhibited minimal delays in excitation (76 ± 40 ms, n = 9 neurons from 8 slices) when subjected to suprathreshold current steps from a baseline membrane potential below − 70 mV (i.e., no evidence of I A ; Fig. 1B, bottom trace) but their membrane potential trajectory 'sagged' 11 ± 4 mV (n = 9 neurons from 8 slices) in the direction of baseline when subjected to hyperpolarizing current steps from a baseline membrane potential of − 50 mV (i.e., evidence of I h ; Fig. 1C, bottom trace). The disparities between delayed excitation and sag potentials measured in Type-1 and Type-2 neurons are unlikely to occur by random sampling from a single population with normally distributed expression of I A and I h with probabilities of p delay = 0.0006 (t = 4.53, df = 13) and p sag = 0.0002 (t = 4.96, df = 14), respectively. Therefore, we reject the null hypothesis and reconfirm that Type-1 and Type-2 are unique subpopulations of Dbx1 preBötC neurons 14 .
One Dbx1 preBötC inspiratory neuron we recorded and sequenced did not fit the criteria for Type-1 or Type-2, so we analyzed it as an Unknown.
We mapped all 17 samples from 14 slices to the murine genome (mm10 from Ensembl); 83% ± 3% of the sequences aligned uniquely resulting in an average of 10,335,384 uniquely aligned reads (Supplementary Table 1).
Transcriptomic differences between type-1 and type-2 neurons. The 31,543 genes that were expressed in at least one sample (7 Type-1 neurons and 9 Type-2 neurons) were examined for differential expression by DESeq2 ( Fig. 2A) We used principal component analysis (PCA) to assemble Dbx1 preBötC neurons in a plane based on transcriptome similarity, which revealed that Type-1 and Type-2 neurons form two distinct clusters (Fig. 2B). When we scrambled their Type-1 or Type-2 identities, PCA produced non-zero classification errors in the majority of trials (14/17), indicating that if half of the neuron transcriptomes were divorced from their identities as Type-1 or Type-2 then discrete clusters generally did not form ( Supplementary Fig. 2). These data suggest that Type-1 and Type-2 neurons are separate neuron classes based on their transcriptome ( Fig. 2B and Supplementary Fig. 2) in addition to their unique neurophysiological properties (Fig. 1).
Genes upregulated in type-1. We report an upregulation of the 5-HT 1D receptor gene, Htr1d (  Table 2). In rhythmic slices from neonatal rats, bath application of 5-HT increases inspiratory burst frequency by evoking a non-selective inward cation current via at 5-HT 2 receptors [19][20][21] . Our data suggest that 5-HT may affect Type-1 neurons via 5-HT 1D receptor although we have no corroborating physiological data yet. If so, we speculate that 5-HT could modulate I A , which would not result in depolarization but could affect inspiratory rhythm nonetheless 16 . I A is subject to neuromodulation by 5-HT receptors in mouse trigeminal ganglion neurons 22 and CA1 pyramidal neurons 23,24 .
Genes upregulated in type-2. Depletion of Ca 2+ in the endoplasmic reticulum (ER) activates the stromal interaction molecule (STIM) proteins, which subsequently activate the Ca 2+ release-activated Ca 2+ (CRAC) channels on the plasma membrane via the key subunit of CRAC channel, Orai1 [25][26][27][28] . We report Type-2 upregulation of the CRAC channel regulator 2A gene, Cracr2a, as well as the serine/threonine protein kinase gene, Sgk1, which activates STIM1 and Orai1 and thus enhances store-operated Ca 2+ entry (SOCE) 29 . SOCE-related mechanisms that could support or augment inspiratory (eupnea-related) and sigh-related pattern generation remain important topics for investigation. For example, regarding inspiration, intracellular Ca 2+ signaling in the context of SOCE could recruit Ca 2+ -activated non-specific cationic current (I CAN ), which profoundly contributes to amplify inspiratory bursts [30][31][32] . One of the likely candidates giving rise to I CAN in preBötC is the Trpm4 ion channel 31,33 . We do not see differential expression of Trpm4; however it does appear to be more highly expressed in Type-1 Additionally, burst amplification is also important for inspiratory rhythm because burstlets, proposed to be the core rhythmogenic mechanism, are subthreshold from the standpoint of inspiratory burst generation 12,13 . Type-1 neurons, putatively rhythmogenic, need to cross this threshold, possibly via Trpm4 ion channels (Fig. 4), in order to trigger a pattern-generating inspiratory burst. Thus, both Type-1 and Type-2 neurons require the burst amplifying I CAN but in different settings: Type-1 to cross the threshold to trigger an inspiratory burst and Type-2 neurons to further recruit the premotor and motor neurons, probably using the SOCE mechanisms, to generate inspiratory breathing movements. Sigh breaths, which occur at lower frequencies but are two-fold larger in magnitude 34 are likely also to involve Ca 2+ signaling and possibly SOCE mechanisms that recruit I CAN

35
. Dbx1 preBötC neurons are glutamatergic 3,4 and excitatory synaptic interactions, predominantly mediated by postsynaptic AMPA receptors, are essential for inspiratory rhythm and pattern generation 36,37 . We detect Type-2 upregulation of the AMPA receptor, Gria3 (Figs. 3 and 4, Supplementary Table 2), which may at first seem counterintuitive for the neuron class with shorter inspiratory drive latency and typically lower-amplitude inspiratory bursts 11,14,15 . Nevertheless, because the longer inspiratory drive latency in Type-1 neurons may be attributable to the rich topology of their excitatory synaptic interconnections 38,39 , one possible interpretation would be that upregulation of Gria3 in Type-2 neurons augments inspiratory drive in these less richly interconnected preBötC neurons. Upregulation of Gria3 may amplify postsynaptic AMPA receptor-mediated inspiratory drive to accomplish the output pattern-related role of Type-2 neurons. www.nature.com/scientificreports/ Synaptic inhibition influences inspiratory rhythm and output pattern 5,[40][41][42][43] . We report upregulation of the GABA A receptor, Gabra1, in Type-2 neurons; Gabra3 appears to be more highly expressed in Type-2 neurons but it did not pass the DESeq2 criteria for differential expression (Figs. 3 and 4, Supplementary Table 2). Of course GABAergic drive acts on Type-1 neurons, but Gabra1 (and perhaps Gabra3) upregulation in Type-2 neurons suggests that a stronger source of inhibitory drive may be equipped to bypass the oscillator (i.e., the predominant rhythmogenic Type-1 neuron class) and selectively act on the output pattern-generating Type-2 subpopulation to arrest inspiration with immediacy. Behavioral exigencies that might apply include the breath-hold dive reflex upon submersion; attentive immobility, that is, the arrest of all movement (including breathing) for predators stalking prey or prey attempting to camouflage themselves in the context of being hunted; or, finally, in the context of a Valsalva maneuver.
Transcription factors program cell fate during embryonic development and regulate gene expression postnatally. This study was performed postnatally (P0-2) so one does not expect pre-mitotic transcription factors. Nevertheless, we detected genes that help differentiate cardinal cell populations in the ventral progenitor zone of the neural tube including Dbx2, a sonic hedgehog-repressed (Class I) transcription factor co-expressed with Dbx1 in the p0 domain, and, surprisingly, a host of sonic hedgehog-induced (Class II) Nkx genes associated with more ventral domains ( Supplementary Fig. 4). Because all 17 of our neurons were derived from Dbx1-expressing precursors and 8 of 17 expressed Evx1 or Evx2 ( Supplementary Fig. 4, Supplementary Table 3), and only 1 of 17 expressed Pax7, our sample represents the V0 ventral (V0 V ) cardinal class 44,45 .
Further, we report upregulation of transcription factors Akna and Runx1 in Type-2 neurons (Fig. 3, Supplementary Table 2). Runx1 helps consolidate spinal motor neuron identity by suppressing interneuron programs 46 . It may, therefore, seem counterintuitive that Runx1 is upregulated in Type-2, but we speculate that it may be . Quantitative gene expression for ionotropic and metabotropic synaptic receptors, neuropeptides, neuropeptide receptors, and Trp channels. The first two bars show group data for Type-1 (n = 7; magenta bar) and Type-2 (n = 9; blue-cyan bar). The height of the bar is log 2 (mean FPKM) value and the error bar with horizontal cap shows log 2 (mean + SD). The next set of 17 bars shows log 2 (FPKM) values for each neuron in the following order: 7 Type-1 neurons (magenta), 1 Unknown neuron (green), and 9 Type-2 neurons (blue-cyan). Gene names are color-coded according to the subfamily to which they belong. Gene names in bold indicate DE and contain a superscript 1 if upregulated Type-1 neurons and 2 if upregulated in Type-2 neurons. www.nature.com/scientificreports/ acting to suppress Type-1 programs or else halting any further programming or developmental changes to Type-2 neurons. The potential role of Akna is not known.
Non-coding RNA. We report 15 differentially expressed long non-coding RNA (such as 2610035D17Rik, Table 2). The function of these transcripts and their role(s) in inspiratory rhythm-and pattern-generation is unexplored for now.
Transcripts associated with cellular neurophysiology. We next examined a broad spectrum of ionotropic and metabotropic synaptic receptors, peptides, peptide receptors, and transient receptor potential (Trp) ion channels (Fig. 4); voltage-dependent ion channels, regulatory subunits, and intracellular receptors (Fig. 5); purine receptors, monoamine receptors, and cell adhesion molecules ( Supplementary Fig. 3); as well as transcription factors ( Supplementary Fig. 4) irrespective of whether they are DE or non-DE genes. Here, our goal was to understand preBötC neuron excitability and signaling in general, not differential expression, so the criteria for inclusion were relaxed: any genes that were expressed in > 25% of the neurons (4 out of 17), regardless of the type of neuron, were quantified (Supplementary Table 3  . Quantitative gene expression for voltage-dependent ion channels, regulatory subunits, and intracellular receptors. The first two bars show group data for Type-1 (n = 7; magenta bar) and Type-2 (n = 9; blue-cyan bar). The height of the bar is log 2 (mean FPKM) value and the error bar with horizontal cap shows log 2 (mean + SD). The next of 17 bars shows log 2 (FPKM) values for each neuron in the following order: 7 Type-1 neurons (magenta), 1 Unknown neuron (green), and 9 Type-2 neurons (blue-cyan). Gene names are colorcoded according to the subfamily to which they belong. None are DE.  Table 3) that substantially enhance I A without affecting its voltage-dependence or kinetics 49,50 . Neither the genes for K v ion channels nor these regulatory genes cross our significance threshold in DESeq2 so we make no statistical claims. Nevertheless, we posit that Kcnip2 and Kcnip4 are important for augmenting I A in Type-1. The mean expression levels appear to differ greatly between the subpopulations (see kcnip2 and kcnip4 bar heights in Fig. 5) but the high variability in the expression level for these genes preludes their ratification as "DE" by DESeq2 analysis. The high variability in the gene expression level is probably due to the technical limitations of Patch-Seq that are discussed in "Perspectives" below. The sag potential in Type-2 neurons is a classic property of hyperpolarization-activated, cyclic nucleotide-gate channels (HCN) channels encoded by Hcn1-4 (Fig. 5, Supplementary Table 3). We do not detect any significant difference in the expression of these genes.
Our data show that Type-1 and Type-2 neurons comprise electrophysiologically discrete classes (Fig. 1) and yet they do not express different levels of ion channels typically associated with I A and I h (Fig. 5). We propose that the magnitude of I A and I h , and thus their influence on membrane potential trajectories, are instead governed by either Kcnip2 and Kcnip4 or by neuromodulation, in which neurons presynaptic to the preBötC regulate its constituent neurons' membrane properties. The influence of neurons outside of-and presynaptic to-the preBötC would be impossible to assess by Patch-Seq, the potential role of Htr1d notwithstanding (see above).  Table 3). Our results are congruent with the recent demonstration that ~ 8% of Dbx1 preBötC neurons express Oprm1 52,55 . The apparent sparsity of Oprm1 expression does not negate the obvious potency of µ-opioid receptor-mediated effects in the preBötC but it constrains the mechanism underlying opioid-induced respiratory depression to operate on a small fraction of constituent preBötC neurons, both Type-1 and Type-2, which affect rhythm and pattern.
Sigh breaths draw on the inspiratory reserve volume of the lungs to reinflate collapsed (or collapsing) alveoli and optimize pulmonary function 34 . Bombesin-like peptide receptors, Nmbr and Grpr, that modulate sighing were detected in ~ 7% of all preBötC neurons, including Dbx1 and non-Dbx1 subpopulations 57 . We report more than double that expression level: 18% of our samples (n = 3) expressed bombesin-like peptide receptor transcripts: 1 Type-1 and 1 Type-2 neuron expressed Nmbr for a combined mean expression level of 0.05 ± 0.19 log 2 (FPKM) and 1 Type-2 neuron expressed Grpr at an expression level of 0.01 log 2 (FPKM) (Supplementary Table 3). These data are in line with expectations because our study focuses exclusively on Dbx1 preBötC neurons, the core for inspiratory breathing rhythm and pattern, whereas Li et al. 57 measured transcripts in any preBötC neuron, ~ 50% of which are not derived from Dbx1-expressing progenitors.
Pituitary adenylate cyclase-activating peptide (PACAP) is important for breathing responses to CO 2 . PACAP mutant mice exhibit blunted chemosensitivity and die within 3-weeks after birth due to respiratory defects 58 . Bilateral microinjections of PACAP in preBötC, in vivo, increases breathing frequency and inspiratory motor output 58 and in vitro leads to an increase in XII motor output frequency 58,59 . We report expression of PACAP receptor (Adcyap1r1) in 15 out of 17 samples (6 Type-1 neurons, 8 Type-2 neurons and 1 Unknown) for a combined mean expression level of 3.52 ± 3.24 log 2 (FPKM), which would explain the effect of PACAP on both, rhythm and pattern (Fig. 4, Supplementary Table 3).
SST (and SST receptors) are expressed in core preBötC neurons, including-but not limited to-those derived from Dbx1-expressing precursors 3,4,60 . Several seminal studies in vitro and in vivo show that SST modulates inspiratory rhythm and pattern 6,11,61 . However, more recent studies specifically posit that SST-expressing (SST + ) preBötC neurons are specialized for output pattern 6,11 . We report Sst expression in 14 out of 17 Dbx1 neurons: 5 Type-1, 8 Type-2, and 1 Unknown for a combined mean expression level of 10.11 ± 11.65 log 2 (FPKM) (Fig. 4). Therefore, our data are incongruent with the theory that divides rhythm and pattern Dbx1 preBötC neurons on the basis of SST expression. Since we find that SST + Dbx1 neurons are well apportioned among putative rhythmand pattern-generating subpopulations, Type-1 and Type-2, respectively, we posit that a substantial population of non-Dbx1 SST + neurons with exclusively pattern-generating functionality exists in the preBötC, which we did not sample because of our focus on Dbx1-derived subpopulations.
Perspectives. Dbx1-derived preBötC neurons operate in unison to generate breathing movements. We distinguish rhythm and pattern functions in seeking to understand the neural origins of breathing but there is just one behavior. Type-1 and Type-2 neurons are significantly different on the basis of only 123 genes out of the 31,420 genes we detected, although it should be noted that our criteria of p adj < 0.01 and L2FC > 1.5 are relatively stringent [62][63][64] . Thus, these putative rhythm-and pattern-generating populations have much more in common in terms of their transcriptomes, compared to that which differentiates them.
We acknowledge technical limitations could have limited our ability to resolve Type-1 versus Type-2 transcriptome disparity. The minute amount of starting material (usually ≤ 10 pg RNA in the retrieved cytoplasm) from a single neuron necessitates substantial amplification to get sufficient cDNA for sequencing (> 150 pg/ www.nature.com/scientificreports/ µL). Amplification leads to bias favoring some sequences and invariably drowning-out others 65 . Also, reverse transcription errors can lead to faulty replication followed by a failure to map to the reference genome. These caveats produce false zeros for genes that are biologically expressed at non-zero levels 62,66,67 . We performed multiple quality checks ( Supplementary Fig. 1B) for our sequences, used stringent criteria for selecting DE genes and scrambling to ensure the differential expression analysis was efficient in detecting DE genes. However, these checks cannot differentiate a technical zero from a biological zero. The upshot of these caveats is that our Patch-Seq analysis assuredly missed some expressed genes and incorporates some false zeros. We further acknowledge that the disparities between Type-1 and Type-2 neurons may come about during translation and post-translational modifications, which impact phenotypic properties like inspiratory burst magnitude (I CAN , i.e., Trpm4-dependent), delayed excitation (I A ), and sag potentials (I h ). Nevertheless, these data provide the electrophysiology and transcriptomic data, including non-coding transcripts, for Dbx1 preBötC inspiratory neurons at the core of inspiratory rhythm and pattern generation. The transcriptomic data reported here can be utilized or meta-analyzed to design new experiments studying the neural control of breathing.

Methods
The Institutional Animal Care and Use Committee at William & Mary approved these protocols, which conform to policies set forth by the Office of Laboratory Animal Welfare (National Institutes of Health) and the National Research Council 68 .
The animals were housed in colony cages on a 12/12 h light/dark cycle with controlled humidity and temperature at 23 °C and fed ad libitum on a standard commercial mouse diet (Teklad Global Diets, Envigo) with free access to water.
In vitro slice preparations. The workbench was cleaned with RNase ZAP (Thermo Fisher, Waltham, MA) before beginning each experiment. All the beakers and tools were either autoclaved or cleaned first with RNase ZAP and then with nuclease-free water (NFW).
Dbx1;Ai148 pups were anesthetized by hypothermia, consistent with the American Veterinary Medical Association (AVMA) guidelines for euthanasia 71 . The neuraxis, from the pons to lower cervical spinal cord, was removed within ~ 2 min and submerged in ice-cold artificial cerebrospinal fluid (aCSF) containing (in mM): 124 NaCl, 3 KCl, 1.5 CaCl 2 , 1 MgSO 4 , 25 NaHCO 3 , 0.5 NaH 2 PO 4 , and 30 dextrose. The aCSF was prepared in an RNase-free environment and then aerated continually with 95% O 2 and 5% CO 2 during the experiment. We trimmed the neuraxis and glued the dorsal surface of the brainstem onto an agar block (exposing the ventral side). The block and brainstem were affixed rostral side up within a vibratome (Campden Instruments 7000 smz-2, Leicester, UK) while perfusing with aerated ice-cold aCSF. We cut a single transverse slice 450-500 µm thick (n = 14 slices total) with preBötC on its rostral surface 72 . We started a 3-h countdown clock as soon as the mouse was anesthetized and discarded the slice at the end of the interval to avoid sample degradation and contamination.
Whole-cell patch-clamp recording and cytoplasmic sample collection. We perfused slices with aCSF (28 °C) at 2-4 ml/min in a recording chamber on a fixed-stage upright microscope. The external K + concentration ([K + ] ext ) of the aCSF was raised from 3 to 9 mM to facilitate robust respiratory rhythm and XII motor output. We recorded XII motor output using suction electrodes fabricated from autoclaved borosilicate glass pipettes (OD: 1.2 mm, ID: 0.68 mm) fire polished to a diameter of ~ 100 µm. XII motor output was amplified by 2000, band-pass filtered at 0.3-1 kHz, and RMS smoothed for display.
Starting from a quiescent membrane potential between inspiratory bursts, we defined inspiratory drive latency as the elapsed time from first detection of summating synaptic potentials (EPSPs) until the onset of the inspiratory burst.
We measured two intrinsic membrane properties from a baseline membrane potential of − 60 mV: input resistance and cell capacitance. We applied 1 s current steps in a 10-step sequence and plotted the resulting current-voltage (I-V) relationship. For input resistance, we then measured the slope of the I-V curve in the linear region between (approximately) − 60 and − 40 mV. The membrane time constant was fitted by regression to an Scientific Reports | (2022) 12:2923 | https://doi.org/10.1038/s41598-022-06834-z www.nature.com/scientificreports/ exponential function based on the membrane voltage response to a 500 ms hyperpolarizing current pulse. We measured the cell capacitance using the membrane time constant and input resistance. We tested for A-type K + current (I A ) by applying suprathreshold depolarizing current step commands of 750-1000 ms duration from holding potentials of − 70 mV (Fig. 1B) and − 50 mV. The net applied current during the step command was equivalent regardless of holding potential. Neurons expressing I A exhibited delayed excitation of 120-220 ms from a holding potential of − 70 mV, but not from a holding potential of − 50 mV 14,15 .
We tested for hyperpolarization-activated cationic current (I h ) by applying hyperpolarizing current step commands of 750-1000 ms duration, which caused initial voltage excursions exceeding − 30 mV from a holding potential of − 50 mV (Fig. 1C). Neurons expressing I h exhibited a time-and voltage-dependent depolarizing 'sag' 14,15 .
After, categorizing the Dbx1 preBötC neuron as Type-1 and Type-2 ( Supplementary Fig. 1A a ), cytoplasmic contents were extracted under voltage clamp (− 60 mV holding potential) by applying negative pressure (0.7-1.5 psi). Successful extraction left the neurons visibly shrunken. Negative pressure was applied for a maximum of 10 min or until the neuron was electrophysiologically unstable, indicated by holding currents exceeding − 600 pA, whichever happened first. The patch pipettes were retracted promptly, and the cytoplasmic contents were ejected by breaking the pipette tip at the bottom of the RNase-free PCR tube containing 4 µL of stock solution (stock solution = NFW with 2% RRI) while applying positive pressure ( Supplementary Fig. 1A b ). Great care was taken to avoid any bubbles while applying positive pressure. Samples were briefly spun in a mini centrifuge, then snapfrozen in liquid nitrogen and stored at − 80 °C until further processing.
We monitored for potential contamination by collecting negative control samples during each experiment. Patch pipettes were filled with internal solution and then inserted into the tissue without targeting any neuron for recording; their contents were processed identically. cDNA synthesis, library preparation and sequencing. RNA from the recovered cytoplasm of patchclamped neurons was converted to complementary DNA (cDNA) according to the SMART-Seq HT protocol (Takara Bio USA, Mountain View, CA), which incorporates the template-switching activity of the reverse transcriptase to select for full-length cDNAs and to add PCR adapters to both ends of the first-strand DNA (SMART = Switching Mechanism at 5' end of RNA Template). Samples were denatured at 72 °C for 3 min. poly(A) + RNA was reverse transcribed using a tailed oligo(dT) primer. First strand cDNA synthesis and double-stranded cDNA amplification were performed in a thermocycler using the following program: 42 °C for 90 min; 95 °C for 1 min; 18 cycles of 98 °C for 10 s, 65 °C for 30 s, 68 °C for 3 min; and finally, 72 °C for 10 min. PCR-amplified cDNA was purified by immobilization on Agencourt AMPure XP beads (Beckman Coulter, Brea, CA), and then washed with 80% ethanol and eluted with elution buffer. Sequencing libraries were prepared from the amplified cDNA using SMART-Seq PLUS kits (Takara Bio USA, Mountain View, CA). Unique dual indexes were used on the amplified libraries to identify samples. We verified average cDNA size, abundance, and quality control of the final library using a Bioanalyzer High Sensitivity kit (Agilent, Santa Clara, CA) and Qubit dsDNA High-sensitivity Assay Kit (Molecular Probes, Eugene, OR) ( Supplementary Fig. 1A c ). cDNA samples containing less than 150 pg/µl cDNA were not sequenced. The cDNA sequencing libraries passing quality control were sequenced using an Illumina HiSeq X Sequencing system ( Supplementary Fig. 1A d ) with paired-end (150 bp) reads (Admera Health Biopharma Services, South Plainfield, NJ). A total of 18 samples were sequenced. Investigators were blinded to cell type during library construction and sequencing.
Quality control, pre-processing, and alignment to reference genome. Nucleotide sequences along with their corresponding quality scores were returned as FASTQ files. We received an average of 18,724,864 (n = 18 samples) paired-end reads per sample. The quality of reads was verified using FastQC v0.11.8 (Supplementary Fig. 1B a ). One sample returning 688 reads was discarded.
The mouse reference genome, mm10 from Ensembl, was used to create the genome directory for aligning the reads in STAR using the following commands: The raw reads were aligned to the mm10 reference genome by the splice-aware STAR software v2.7.7a, which generates BAM alignment files ( Supplementary Fig. 1B c ), using the following command: 1. STAR --genomeDir mm10ReferenceGenome --readFilesIn inputFASTQFile1.fastq inputFASTQFile2.fastq --outFileNamePrefix outputBAMFile --outSAMtype BAM SortedByCoordinate --outReadsUnmapped Fastx The alignment procedure (above) was done a total of 3 times for each sample to monitor the quality of the samples. The first alignment corresponds to raw reads. The second and third alignments are done after removing adapter and overrepresented sequences, respectively. www.nature.com/scientificreports/ Adapters added during library construction: AGA TCG GAA GAG CAC ACG TCT GAA CTC CAG TCA (paired end 1) and AGA TCG GAA GAG CGT CGT GTA GGG AAA GAG TGT (paired end 2) were trimmed ( Supplementary  Fig. 1B b ) by bbduk v38.00 using the following command: 1. sh bbduk.sh in1 = inputFASTQFile1.fastq. in2 = inputFASTQFile2.fastq out1 = outputFASTQFile1.fastq out2 = outputFASTQFile2.fastq ktrim = r -Xmx27g mm = f k = 33 hdist = 1 literal = AGA TCG GAA GAG CAC  ACG TCT GAA CTC CAG TCA,AGA TCG GAA GAG CGT CGT GTA GGG AAA GAG TGT tbo tpe The SMART-Seq HT kit uses dT priming for first-strand cDNA synthesis, annealing to the poly A tails of mRNA. The sequencing reads contained poly T/A sequences that were identified by FASTQC and tagged as overrepresented sequences, and finally trimmed ( Supplementary Fig. 1B b ) by cutAdapt v3.2 using the following command: 1. cutadapt -a overrepresentedSequence -A overrepresentedSequence' -o outputFASTQFile1.fastq -p output-FASTQFile2.fastq inputFASTQFile1.fastq inputFASTQFile2.fastq −m 10 −j 4 The STAR v2.7.7a alignment software tallies the number of sequences that (1) align uniquely, (2) align at multiple portions, or (3) fail to align with mm10. We present these alignment statistics for each step (raw reads, adapter-trimmed reads, and adapter-trimmed reads following removal of overrepresented sequences, i.e., processed reads) in Supplementary Intergenic reads exceeding 30% indicate DNA contamination. Our samples showed an average of 5.37% ± 2.33% intergenic reads (n = 17) so we conclude that our samples were not contaminated and thus could be used for downstream analyses. The quality control metrics of the processed reads, computed by Qualimap, are shown in Supplementary Table 1.
Uniquely aligned reads were converted to fragment counts ( Supplementary Fig. 1B d ) using featureCounts from the Rsubread package v2.4.2. The data pre-processing was performed using computing facilities at William & Mary (https:// www. wm. edu/ it/ rc). Data analysis. We wrote custom R scripts (R v4.0.3, RRID:SCR_001905) that quantify gene expression as fragment counts per kilobase of exon per million mapped reads (FPKM; Supplementary Fig. 1B e ). This quantification method is ideally suited for paired-end reads and normalizes for gene length and quantity of mapped reads. We also used R scripts to compute the mean and standard deviation (SD) of FPKM values. The log 2 transformed value of FPKM, mean FPKM, or mean + SD FPKM were used for visualization.
Genes that are part of mm10 but had zero fragment counts in all 17 samples were omitted from all further analyses and consideration (23,824 genes). We performed differential expression analyses on the remaining nonzero genes (31,543 genes) using DESeq2 v1.30.1 software. DESeq2 uses fragment count (not FPKM) for each gene to calculate its geometric mean (non-zero counts only) across all the samples ( Supplementary Fig. 1B f ). Next, it normalizes each count by dividing the fragment count of the gene by its geometric mean. The fold change (L2FC) between Type-1 and Type-2 Dbx1 neurons is calculated using logarithm (base 2) of the normalized counts. Any gene where the L2FC exceeds 1.5 and adjusted p value (p adj ) is less than 0.01 was deemed differentially expressed (DE).
Custom MATLAB scripts (RRID:SCR_001622) implemented unsupervised principal component analysis (PCA) for dimensionality reduction and clustering of the log 2 (FPKM) expression profiles of 123 DE genes and 16 samples. Although the PCA was performed without regard for sample category, clustering of the Type-1 and Type-2 samples is evident using the first two principal component scores that represent 32% of the variation in the data (Fig. 2B, axes labelled PC1 and PC2). A boundary line calculated using linear discriminant analysis (LDA) shows that an accurate Type-1 versus Type-2 classification may be performed using the first two principal component scores.
As a control, we permuted the labels identifying the 16 samples as Type-1 versus Type-2 and repeated our analyses of the log 2 (FPKM) expression profiles. In each of 17 scrambled data sets, 50% of the samples were correctly labelled and 50% were "imposters" with false identities. In each case, we performed DESeq2 analysis to obtain a list of "DE genes" and performed PCA on this subset of genes. Supplementary Fig. 2A shows a representative LDA using PC1 and PC2 for genes differentially expressed between two groups of neurons with scrambled Type-1 and -2 identities. The classification error in this case is 0.25, because 4 of the 16 neurons are misclassified (4 Type-2 neurons, one of which is a Type-2 imposter, are on or below the boundary line). Overall, the performance of classifiers obtained by LDA of such "DE genes" was severely degraded compared to the unscrambled case, especially when the LDA was restricted to the first several principal component scores ( Supplementary Fig. 2). This result adds confidence to our list of DE genes for Type-1 versus Type-2 neurons.