Transcriptome analysis in a humanized mouse model of familial dysautonomia reveals tissue-specific gene expression disruption in the peripheral nervous system

Familial dysautonomia (FD) is a rare recessive neurodevelopmental disease caused by a splice mutation in the Elongator acetyltransferase complex subunit 1 (ELP1) gene. This mutation results in a tissue-specific reduction of ELP1 protein, with the lowest levels in the central and peripheral nervous systems (CNS and PNS, respectively). FD patients exhibit complex neurological phenotypes due to the loss of sensory and autonomic neurons. Disease symptoms include decreased pain and temperature perception, impaired or absent myotatic reflexes, proprioceptive ataxia, and progressive retinal degeneration. While the involvement of the PNS in FD pathogenesis has been clearly recognized, the underlying mechanisms responsible for the preferential neuronal loss remain unknown. In this study, we aimed to elucidate the molecular mechanisms underlying FD by conducting a comprehensive transcriptome analysis of neuronal tissues from the phenotypic mouse model TgFD9; Elp1Δ20/flox. This mouse recapitulates the same tissue-specific ELP1 mis-splicing observed in patients while modeling many of the disease manifestations. Comparison of FD and control transcriptomes from dorsal root ganglion (DRG), trigeminal ganglion (TG), medulla (MED), cortex, and spinal cord (SC) showed significantly more differentially expressed genes (DEGs) in the PNS than the CNS. We then identified genes that were tightly co-expressed and functionally dependent on the level of full-length ELP1 transcript. These genes, defined as ELP1 dose-responsive genes, were combined with the DEGs to generate tissue-specific dysregulated FD signature genes and networks. Within the PNS networks, we observed direct connections between Elp1 and genes involved in tRNA synthesis and genes related to amine metabolism and synaptic signaling. Importantly, transcriptomic dysregulation in PNS tissues exhibited enrichment for neuronal subtype markers associated with peptidergic nociceptors and myelinated sensory neurons, which are known to be affected in FD. In summary, this study has identified critical tissue-specific gene networks underlying the etiology of FD and provides new insights into the molecular basis of the disease.

results in the tissue-specific skipping of exon 20 2 .The nervous system expresses the lowest amount of full-length ELP1 transcript and protein 2 .
FD patients exhibit a range of neurological symptoms that manifest from birth and worsen over time, including diminished pain and temperature sensation, visual loss, kyphoscoliosis, proprioceptive ataxia, and difficulty regulating body temperature and blood pressure 1, [24][25][26] .Loss of sensory neurons in the dorsal root ganglion (DRG), including nociceptors and proprioceptors, is a prominent feature of FD 25,[27][28][29] .Nociceptors are specialized sensory neurons that detect and transmit signals related to pain and temperature perception.In FD patients, there is a diminished ability to perceive pain and temperature, which can lead to insensitivity to potentially harmful stimuli 30 .Proprioceptors are sensory neurons responsible for detecting body position and movement and their loss in FD results in proprioceptive ataxia, causing difficulties in coordinating movements and maintaining balance 30,31 .The loss of both nociceptors and proprioceptors contributes to the complex neurological symptoms observed in FD patients.
Mouse models of FD have contributed significantly to our understanding of the role of ELP1 in neural development and function.Multiple studies have provided evidence supporting the crucial role of ELP1 in maintaining neuronal survival and tissue innervation [32][33][34] .The Elp1 KO mouse provided the first insights into the role of Elp1 in transcriptional elongation and gene expression regulation despite leading to lethality at the mid-gastrulation stage 7 .RNA-seq transcriptomic profiling of mouse embryos expressing increasing levels of human ELP1 revealed dysregulation of genes essential to early-stage nervous system development and to the identification of a set of co-expressed genes whose expression highly correlated with the level of ELP1 35 .These ELP1 dose-responsive genes were enriched for axon and cell projection formation which supports the role of ELP1 in the expression of genes important for target tissue innervation and is consistent with the innervation failure observed in FD 35 .A phenotypic FD mouse model was generated by introducing the human ELP1 transgene carrying the FD major splice mutation (TgFD9) into a hypomorphic Elp1 Δ20/Flox mouse 27 .This humanized mouse mimics the tissuespecific mis-splicing seen in FD patients, as well as many phenotypic characteristics of the human disease 36,37 .
While the PNS is known to be significantly affected in FD, the specific gene networks responsible for this disruption have not been identified.In the current study, we uncovered putative, tissue-specific, and convergent molecular mechanisms underlying FD by analyzing the transcriptomes of several neuronal tissues (Fig. 1).We collected DRG and trigeminal ganglion (TG) as representative PNS tissues, and cortex, medulla (MED), and spinal cord (SC) as representative CNS tissues, from both control and FD phenotypic mice 36 (Fig. 1A).To unravel tissue-specific transcriptomic dysregulation, we identified differentially expressed genes (DEGs) and ELP1 doseresponsive genes (Fig. 1A).We then constructed FD-dysregulated gene networks from these transcriptional signatures based on their known protein-protein interactions (Fig. 1B).Finally, we compared FD signature genes to determine functional convergence across tissues in FD (Fig. 1C).This comprehensive transcriptome analysis provides valuable insights into the regulatory mechanisms underlying FD pathogenesis and sheds light on the shared dysregulation observed in FD PNS tissue.

ELP1 reduction leads to tissue-specific transcriptome changes
To uncover the tissue-specific molecular alterations underlying FD, we conducted transcriptome analysis in DRG, TG, cortex, MED, and SC from 3-month-old, humanized FD-phenotypic mouse TgFD9; Elp1 Δ20/flox36 .The TgFD9; Elp1 Δ20/flox mouse carries one Elp1 allele in which exon 20 is deleted (Elp1 Δ20 allele), one Elp1 allele with loxP sites flanking exon 20 (hypomorphic Elp1 flox allele) and the human TgFD9 transgene containing 9 copies of the complete ELP1 gene into which we inserted the major IVS20+6T>C splice mutation 36,38 (Supplementary Fig. S1).We first measured the expression levels of full-length ELP1 transcript in each tissue and found a significant downregulation of full-length transcript in all five mouse neuronal tissues (Fig. 2A).In mouse tissues, the expression of FD full-length ELP1 transcript compared to controls was 30.97% in DRG, 39.51% in TG, 44.56% in MED, 54.45% in Cortex, and 44.62% in SC.Next, we explored transcriptional 'signatures' representing the most significant transcriptional changes across tissues following ELP1 reduction by performing DEG analyses and gating results on those with false discovery rate (FDR) less than 0.1 and fold changes (FCs) either greater than 1.2 (i.e.upregulated) or less than 0.8 (i.e.downregulated), compared to controls (see "Methods").Using this approach, we observed 148 DEGs (FDR < 0.1) in DRG, 194 DEGs in TG, 65 DEGs in MED, 19 DEGs in SC and 59 DEGs in cortex (Fig. 2B, Supplementary Fig. S2A, Supplementary Table S1), demonstrating significantly higher dysregulation in the PNS tissues.As expected, ELP1 was the most downregulated gene in all five tissues.The strongest increase in expression was observed with Fev (alias Pet1) in the DRG, TG, MED, and SC.Fev is a transcription factor known to play a crucial role in the differentiation and functional maturation of serotonergic neurons and displayed a 49-fold and 37-fold increase in expression in the DRG and TG respectively [39][40][41][42][43] (Fig. 2B, Supplementary Table S1).In DRG and TG, we also observed upregulation of Th (1.66-fold in DRG and 2.14-fold in TG), which encodes a tyrosine hydroxylase and serves as a marker for dopaminergic neurons (Fig. 2B, Supplementary Table S1).Interestingly, the expression of the other Elongator subunits (Elp2-6) was not affected by the reduction of ELP1 in any of the five mouse neuronal tissues (Supplementary Fig. S3).
We subsequently conducted functional enrichment analyses on the DEGs for each tissue using gene ontology (GO) to identify the pathways that were significantly enriched for DEGs in each FD neuronal tissue (see "Methods", Supplementary Table S2).In DRG and TG, which exhibited the most pronounced gene expression dysregulation, we found DEGs were enriched in multiple GO terms associated with synaptic signaling and amine-related metabolic processes at FDR < 0.1 (Fig. 2C).Conversely, no significant enrichment was observed in CNS (Supplementary Fig. S2B).
The substantial number of observed DEGs, coupled with the significant enrichment of functional terms in the PNS, aligns with the observation of a drastic reduction (average ~ 65%) in full-length ELP1 transcript within the PNS tissues.The dramatic ELP1-dependent gene dysregulation in the PNS is consistent with the significant neuronal loss observed in DRG from FD patients 25 .The fact that an average 52% decrease in ELP1 in the CNS does not result in dramatic gene dysregulation underscores different tissue-specific sensitivity to ELP1 reduction.

Dose-responsive genes create a connection between the DEGs specific to each tissue and ELP1 reduction
Although there were significant expression alterations (DEGs) between FD and control tissues, we did not observe a direct connection to Elp1 in the mouse protein-protein interaction network (PPI).Therefore, we sought to identify ELP1 dose-responsive genes, which are defined as genes that display co-expression and tight correlation with the level of full-length ELP1 transcript 35 .These genes are highly sensitive to ELP1 dosage even though their expression does not significantly change between FD and control.To identify the ELP1 dose-responsive genes, we adopted a two-step approach as previously described in Morini et al. 35 .Initially, we identified coexpressed gene modules with eigengenes that correlate with full-length ELP1 expression, followed by filtering for individual genes within these modules that display the strongest correlation (see "Methods").As expected, each tissue had one co-expression module containing Elp1 (Supplementary Table S3).Among 641 co-expression modules across five tissues, only seven modules (1.09%) in DRG, TG, and MED showed high correlation with the expression of full-length ELP1 transcript (Supplementary Fig. S4A).In DRG, we identified 156 ELP1 doseresponsive genes, while in TG and MED, we identified 137 and 514 such genes, respectively (Supplementary Fig. S4B-H, Supplementary Table S4).www.nature.com/scientificreports/Next, we intersected DEGs and ELP1 dose-responsive genes to generate a set of tissue-specific FD signature genes that showed a strong correlation with ELP1 expression.Using the annotated mouse PPI data, we constructed gene networks to uncover the potentially disrupted molecular pathways in FD (see "Methods") (Fig. 3A, Supplementary Fig. S5A).Interestingly in DRG, TG and MED we observed a significantly higher number of interactions among the FD signature genes compared to what would be expected by chance (PPI enrichment p-value < 1.0E−12, Fig. 3A, Supplementary Fig. S5A).In all three tissues, Elp1 was found to be connected to a network that encompassed at least 48% of the FD signature genes (DRG, 48.21% or 121/251; TG, 49.82% or 138/277, and MED 86.27% or 490/568, Supplementary Table S5).Notably, these networks highly relied on the inclusion of the ELP1 dose-responsive genes (Fig. 3A, Supplementary Fig. S5A), as their exclusion resulted in an Elp1 network that contained less than 1% of the DEGs.For instance, in the DRG FD-dysregulated network, three ELP1 dose-responsive genes, namely Iars (FDR = 0.041, fold change = 108%), Asns (FDR = 0.015, fold change = 110%), and Aldh18a1 (FDR = 0.011, fold change = 111%), mediated the interactions between Elp1 and all network DEGs, except for Hdhd3 and Pxylp1 (Fig. 3A).Iars encodes isoleucyl-tRNA synthetase 44 , Asns encodes asparagine synthetase 45 , and Aldh18a1 encodes pyroline-5-carboxylate synthetase 46 , which are all associated with cellular amino acid metabolism.They establish connections between Elp1 and a series of solute carrier 7 (Slc7) family members, among which, Slc7a5 and Slc7a3, are responsible for neuronal amino acid transport across the cell membrane and are DEGs between FD and control 47,48 .
The FD-dysregulated network in DRG showed several FD signature genes with five or more connections to their neighbors that serve as "hubs" within the network.It is noteworthy that 53.85% (21/39) of these hub genes were upregulated DEGs, while only 12.82% (5/39) of the hub genes were downregulated DEGs (Fig. 3B).To assess their relative contribution to the transcriptomic dysregulation observed in FD DRG, we ranked these hub genes based on their number of connections (Fig. 3B).The top five hubs that are DEGs in the network, Rac2, Mmp9, Cxcl12, Hp, and Adipoq, directly connect to 42.19% (27/64) of the network DEGs in DRG, as well as to each other.Further investigation into the molecular function of these genes and their role in maintaining neuronal health will provide valuable insights into the etiology of FD.We then examined the abundance of DEGs at different distances from Elp1 in the FD-dysregulated network of DRG, which provided insights into the influence of ELP1 reduction on the transcriptome.Interestingly, we found that 28.69% (72/251) of FD signature genes (and 26.35% DEGs) were located six to seven steps away from Elp1 in the DRG dysregulated network (Fig. 3C).In contrast, there were fewer FD signature genes (13.55%) within five steps of Elp1.This observation aligns with the placement of the top five hub genes within the network and suggests that the impact of reduced ELP1 levels may be amplified along the FD-dysregulated network in DRG.
The FD signature genes specific to TG showed significant enrichment in synaptic signaling, GABAergic synapse and neurotransmitter pathways (Supplementary Fig. S5A, Supplementary Table S6).Similar to DRG, the connection between Elp1 and the rest of the FD signature genes was mediated by an ELP1 dose-responsive tRNA synthetase, Cars 49 (FDR = 0.0011, fold change = 113%, Supplementary Fig. S5A).Furthermore, like in DRG, the FD signature genes in TG are located far away from Elp1 (Supplementary Fig. S5B), and the majority (71.88%) of TG hub genes are significantly upregulated in FD compared to control samples (Supplementary Fig. S5C).
In MED, both the FD signature genes and FD-dysregulated gene network were distinct from those in the PNS tissues.Out of 568 FD signature genes specific to MED, 87.85% (499/568) were ELP1 dose-responsive genes.The significant number of dose-responsive genes, coupled with the low count of DEGs, suggests a relatively mild impact of ELP1 reduction on the MED transcriptome.The FD signature genes were significantly enriched in the chromatin regulator term (Supplementary Fig. S6A, S6C, Supplementary Table S6), which was not observed in the FD signature genes of the PNS tissues.Additionally, these gene signatures were found to be closer to Elp1, with 63.38% (360/568) of MED signature genes located within four steps of Elp1 (Supplementary Fig. S6B).
Collectively, our findings indicate that ELP1 dose-responsive genes play a crucial role in mediating the connections between tissue-specific DEGs and Elp1.Furthermore, they nominate highly connected loci in the FDregulatory gene network, shedding light on important contributors to the molecular etiology of FD.

Convergence of transcriptomic dysregulation in FD DRG and TG
We compared the hub genes across the three tissue-specific FD networks, and we discovered that seven hubs were shared between DRG and TG, while only one hub was shared by all three tissues (Fig. 3D, Supplementary Fig. S6D).To identify common and convergent molecular mechanisms underlying FD, we evaluated the extent of similarity between any two transcriptomes in relation to ELP1 reduction.From a differential expression perspective, we observed that DRG and TG shared 44 DEGs (~ 26%, p = 4.55E−54, hypergeometric test) enriched for synaptic signaling, dendrite tree development, and ammonium ion metabolic processes (Fig. 4B, Supplementary Table S7).In contrast, there were less than five overlapping DEGs between any two CNS tissues, although the number of overlaps was significantly different from what would be expected by chance (Fig. 4A).Among the three CNS tissues, only three DEGs, including ELP1, were shared (Supplementary Table S1).
The PNS convergent DEGs, when connected to Elp1 in both DRG-and TG-specific FD-dysregulated gene networks, exhibited a significantly similar distribution (Pearson correlation coefficient = 0.88, p < 2.2E−16) and were relatively distant from Elp1 (Fig. 4C).Additionally, the magnitude of dysregulation of these 44 PNS convergent DEGs was nearly identical, as indicated by a high Pearson correlation coefficient of 0.98 (p < 2.2E−16) for the fold change correlation between DRG and TG (Fig. 4D).Next, we compared the total level of transcriptomic disruption across tissues.We utilized the rank-rank hypergeometric overlap which evaluates the concordance of transcriptomic changes between any two tissues 50,51 .The two PNS tissues exhibited the strongest concordance (odds ratio > 1, Chi-squared test), whereas evidence of concordance among the CNS tissues was much weaker (refer to Supplementary Fig. S7, see "Methods").Discordance became apparent when comparing a CNS tissue to a PNS tissue (Supplementary Fig. S7).We also found that 88.5% (46/52) of the co-expression modules in DRG were preserved in TG (Fig. 4E).Taken together, these findings suggest convergent dysfunction in the two PNS tissues in FD.

PNS convergent DEGs show association with specific neuronal subtypes in FD
To gain a deeper understanding of how PNS convergent DEGs contribute to FD etiology, we hypothesized that the observed dysregulation might be driven by specific neuronal subtypes unique to the PNS.To investigate this, we analyzed publicly available single-cell RNA sequencing (scRNA-seq) data from wildtype mouse DRG 42,52 and TG 53 .By combining neuronal subtype markers provided in these studies with the novel markers identified through our analyses (Supplementary Fig. S8A-C, Supplementary Table S8, see "Methods"), we discovered significant overlaps with the DRG, TG, and PNS convergent DEGs (Fig. 5A).These sets of DEGs were enriched for specific neuronal subtype markers.Notably, we observed enrichment of markers for peptidergic nociceptors in www.nature.com/scientificreports/DRG and TG, myelinated sensory neurons, and TH+ neurons in DRG, and c-fiber mechanoreceptors and cold nociceptors in TG (all FDRs < 0.07, hypergeometric test).Furthermore, out of 44 PNS convergent DEGs, 15 were identified as neuronal subtype markers (Fig. 5B), with 11 of them being markers for peptidergic nociceptors and myelinated sensory neurons 42 .These findings suggest that these two neuronal subtypes might be particularly susceptible to ELP1 reduction in the PNS.Collectively, our results provide novel insights into the dysregulation of peripheral nervous system gene expression in FD.Tissue-specific analyses revealed a greater impact of ELP1 reduction on PNS tissues compared to CNS tissues, as evidence by the number of DEGs.The FD-dysregulated gene networks showed upregulated hub genes that were significantly enriched in the PNS tissues.Cross-tissue comparisons further highlighted convergent mechanisms of disrupted synaptic signaling and amine-related metabolic processes in the PNS tissues, while such convergence was not observed across CNS tissues.

Discussion
FD is a progressive neurodegenerative disease that manifests in various debilitating symptoms including diminished pain and temperature perception, decreased or absent myotatic reflexes, proprioceptive ataxia, and retinal degeneration.Recent studies have provided compelling evidence linking the reduction of Elp1 to sensory neuronal loss and diminished tissue innervation [54][55][56] .However, the intricate molecular mechanisms connecting ELP1 reduction with the phenotypic manifestations of the disease remain largely unknown.
Using a humanized FD phenotypic mouse that recapitulates the same tissue-specific reduction of ELP1 observed in patients, we have conducted a comprehensive transcriptomic study to investigate the disrupted gene expression and pathways underlying FD etiology in disease-relevant neuronal tissues.We collected DRG and TG as representative PNS tissues, and cortex, MED, and SC as representative CNS tissues, from both control and FD-phenotypic mice.We found significant differences in the levels of full-length ELP1 transcripts between PNS and CNS.The PNS tissues, DRG and TG, exhibited lower levels of full-length ELP1 transcripts (~ 35% of the control level) when compared to the three CNS tissues, MED, cortex, and SC (~ 48% of the control level).
To gain deeper insights into the molecular networks and pathways involved in FD, we assembled a set of FD signature genes by combining tissue-specific DEGs that represented the most prominent transcriptional alterations, and ELP1 dose-responsive genes that exhibited moderate expression changes but they closely correlated with ELP1 levels.The FD signature genes formed interconnected gene networks providing a comprehensive view of how ELP1 reduction impacts the expression of many genes.This study shows that merely using DEGs is not sufficient to build a meaningful PPI network.Adding dose-response genes interconnects the robust signals from DEGs and creates a more interconnected and expansive network.FD is a recessive disease but, unlike most recessive diseases, it does not result from complete loss of a protein but is caused by tissue-specific reduction of ELP1 57,58 .It is reasonable to think that the reduction of ELP1, instead of its complete depletion, might cause moderate transcriptomic changes (e.g.ELP1 dose-responsive genes) in its immediate downstream genes.These moderate responders could then synergistically lead to more dramatic expression alterations (DEGs) deeper in the molecular network.We indeed observed such patterns in both DRG and TG FD-dysregulated networks.
In the DRG, for instance, the connection of Elp1 with the other DEGs is dependent on the inclusion of the three dose-responsive genes Iars, Aldh18a1 and Asns (Fig. 3A).These three genes encode synthase proteins.It is well known that the cellular concentrations of tRNA synthase must be precisely regulated and overproduction of them leads to various disorders including neurodegeneration 23,59,60 .In addition, we observed significant upregulation of amino acid transporter genes Slc7a5 (alias Lat1) and Slc7a3 (alias Cat3).Both tRNA synthesis and amino acid transport are the immediate upstream biological steps of tRNA wobble modification, one of the well-studied functions of ELP1 13,15,23,59,61 .Interestingly, the expression of the other Elongator subunits (Elp2-6) was not affected by the reduction of ELP1.This suggests that the expression of each subunit might be independently regulated, and they may not be linked by negative feedback regulation.However, it is important to note Figure 3. Tissue-specific dysregulated gene network due to ELP1 reduction.(A) The dysregulated gene network in DRG.Each node is either a DEG indicated by a diamond shape or an ELP1 dose-responsive gene indicated by a round shape.The colors for the nodes reflect the fold changes in the genes between FD and Control.The red color domain represents upregulation between FD and Control while the blue color domain represents downregulation.The deeper the color, the stronger the fold changes.Each edge represents an interaction between the two connected genes, where only an interaction score of more than 0.4 (default) in String-DB is displayed.The thicker the edge, the higher the interaction score.Only the dysregulated genes with at least one interaction are displayed.The rings outside the nodes represent significant functional enrichment with FDR < 0.1 using all the dysregulated genes (i.e., DEGs and ELP1 dose-responsive genes).The names of three tRNA synthetases next to Elp1 were marked in magenta.The associated functional enrichment terms with the ring colors are given, where the values in the brackets are the enrichment FDRs for the terms.(B) The bar plot demonstrates hub genes in DRG ranked by their number of connections to the neighbor genes in the network of panel (A).The x-axis represents the hub gene names, where each gene is colored according to its dysregulation direction and gene category.(C) The bar plot demonstrates the number of dysregulated genes in DRG at different distances to ELP1.The x-axis represents the distance of the shortest path to a gene.The genes in the "0-connectivity" distance category refer to those dysregulated genes not displayed in panel (A) because they don't have any interaction score > = 0.4.The y-axis represents the number of genes at each distance.(D) The table demonstrates the shared hub genes across DRG, TG, and MED.The rows represent the tissues while the columns represent the hub genes shared by at least two tissues.The colors of the grids reflect the genes' categories and dysregulation directions.that the functional integrity of Elongator is compromised in the absence of any of its subunits 13,62 .Therefore, the observed dysregulation is likely a result of the compromised function of the entire Elongator complex.We acknowledge that these findings are based on the expression profiles identified from an FD-phenotypic mouse model and rely on the annotated mouse PPI network.Further evaluation is needed to determine the extent to which these findings can be translated to FD patients.Transcriptome-wide, we observed striking similarities in the response to ELP1 reduction between DRG and TG.The PNS convergent DEGs in this study were enriched for synaptic signaling and amine transport.This PNS enrichment aligns with the functional enrichment patterns observed in DEGs specific to each tissue.Further exploration of these convergent and tissue-specific DEGs may provide valuable insights into the underlying mechanisms of FD.
Finally, to determine if the observed dysregulation was specific to unique neuronal subtypes in the PNS, we combined the markers identified through our analyses with mouse DRG and TG neuronal subtype markers from publicly available single-cell RNA sequencing (scRNA-seq) data 42,52,53 .Remarkably, we found significant overlaps between the PNS convergent DEGs and the single-cell markers associated with peptidergic nociceptors and myelinated sensory neurons supporting the hypothesis that certain neuronal subtypes are more susceptible to ELP1 reduction.
Overall, our study provided a comprehensive analysis of the disrupted transcriptomic dynamics in FD from both tissue-specific and cross-tissue perspectives.By examining gene expression patterns, we identified several gene sets that may contribute to the disease manifestations.The FD signature genes in the PNS tissues were found to be enriched in amine-related metabolic processes, which in turn influenced synaptic signaling.Our findings suggest the peptidergic nociceptors and myelinated sensory neurons in the PNS are particularly affected in FD, as evidenced by significant alterations in the expression of their marker genes upon ELP1 reduction.Our study not only provides valuable insights into the intricate molecular mechanisms underlying FD but also have broader implications for other neurological disorders associated with Elongator dysfunction.The heatmap demonstrates the overlap between DEGs identified from this study (columns) and the neuronal subtype markers identified from publicly available scRNA datasets for mouse DRG and TG.The numbers in the brackets after the row or column names indicate the number of genes in each category.The numbers in the grids indicate the number of gene overlaps between the two categories.The grid colors reflect the significance of overlap according to the hypergeometric test followed by FDR correction, in the − log10-transformed scale.The deeper the color, the more significant the overlap.The grids in white mean no significant overlap where FDR > = 0.1.(B) The overlaps between the PNS convergent DEGs (columns) and the neuronal subtype markers identified from publicly available scRNA datasets for mouse DRG and TG (rows, the same as panel (A)).The colors of the grids reflect the genes' dysregulation directions between FD and Control, where red is for upregulation while blue is for downregulation.

Animals
The generation of the TgFD9 mouse line carrying the human ELP1 transgene with the NM_003640.5:c.2204+6T>Cmutation can be found in Hims et al. 38 .Descriptions of the original targeting vector used to generate the Elp1 flox allele and the strategy to generate the Elp1 Δ20 allele have been previously published 63,64 .
All experimental protocols were approved by the Institutional Animal Care and Use Committee of the Massachusetts General Hospital and were in accordance with ARRIVE (https:// arriv eguid elines.org/) and NIH guidelines.

Tissue-specific RNA-seq
RNA was extracted from DRG, trigeminal ganglion, cortex, medulla, and spinal cord collected from 12 control (6 males and 6 females) and 13 FD (4 males and 9 females) 3-month-old mice.Using the QIAzol Reagent, following the manufacturer's instructions.RNA-seq libraries were prepared using the Tru-Seq Stranded ® mRNA library Prep Kit (Illumina, 20020594) using 100 ng of total RNA as input.Final library concentration was quantified using size distribution by the Agilent 2200 Tape Station and/or qPCR using the Library Quantification Kit (KK4854, Kapa Biosystems).Equimolar amounts of each library were pooled prior to multiplex sequencing.Libraries were 50 basepair paired end sequenced on the Illumina HiSeq 2500 across multiple lanes.The HiSeq Sequencing Control Software was used for real-time image analysis and base calling before CASAVA (version 1.8) was used for FastQ generation.

RNA-seq pre-processing
A custom transcriptome reference was generated by adding the human ELP1 gene (ENSG00000070061) Ensemble Human transcriptome reference GRCh37.75 to the Ensembl Mouse Transcriptome GRCm38.83 as an independent chromosome.RNA-seq reads were mapped to this synthesized transcriptome reference by STAR v2.5.22b allowing only uniquely mapped reads with 5% mismatch 65 .Illumina TruSeq reads were trimmed using Trimmomatic (v0.36) with minimal length set to 105 and other default parameters 66 .

Differential gene expression analysis
Gene counts were performed via HTSeq-counts 67 (version 0.11.2) with '-s reverse' option to be compatible with the Illumina TruSeq library.Genes were further filtered so that only genes whose median expression was no less than 0.1 counts-per-million in at least one genotype were kept for analysis.Raw gene counts were then normalized using sample-wise size factors estimated by the Bioconductor package DESeq2 68 (v1.34.0).To get the most robust DEGs between the two genotypes of interest (FD and control), surrogate variables unrelated to genotype were first estimated from the normalized counts via the Bioconductor package SVA 69 (v3.42.0) and then built into a generalized linear model (GLM) together with genotype via DESeq2.Data from both males and females were combined for this analysis.We performed a correlation analysis on several parameters including sex with principal components and surrogate variables.Sex was corrected by surrogate variable analysis and did not correlate with any principal components.Since the phenotypes in FD patients do not exhibit sex differences 1,25 , in this study we aimed at identifying the common disease mechanism regardless of sex.

Concordance of FD transcriptomic dynamics between tissues
To assess the similarity of the transcriptomic response we compared the expression changes between FD and Control across any two tissues using the Rank-Rank Hypergeometric Overlap test via the RRHO2 Bioconductor package 50,51 .The significance of overlap was calculated in log-transformed odds ratio from the Chi-squared test.

Gene networks according to PPI
These networks were built to include the query gene sets using the "stringApp" 70 (version 2.0.1) in Cytoscape 71 (version 3.9.1).For a tissue-specific FD-dysregulated gene network, the query set consisted of tissue-specific DEGs and ELP1 dose-responsive genes.For the PNS convergent network, the query set consisted of shared DEGs between DRG and TG.To construct each network, the "STRING: protein query" mode was used, with species set as "Mus musculus", confidence (score) cutoff set as 0.4 and maximum additional interactor set as 0.

Shortest distance from Elp1
The "Edge Table" from the stringApp network construction was exported.A customized R script was used to convert the pair-wised interactions in the "Edge Table " to an R list object, where the list names represented the nodes (i.e., genes) in the network while the list elements were vectors representing genes connected to each node.Then the shortest distance was calculated via the "shortest_paths" function from an opensource R package "igraph" (version 1.3.5, https:// github.com/ igraph/ igraph).

GO analysis
In each analysis, the query gene set was searched against a background gene list consisting of either expressed genes from the same tissue or non-redundant union of expressed genes from multiple tissues where the query set was built from.For functional enrichment of tissue-specific DEGs, the resources of GO were from the Gene Set Enrichment Analysis website 72 (https:// www.gsea-msigdb.org/, v. MS1).For functional enrichment of FD signature genes or PNS convergent DEGs that were used to build interaction networks, the "Functional enrichment" function from the stringApp in Cytoscape was used, with the appropriate background expressed genes as reference.For any two significant functional terms, an overlap score was calculated to reflect their semantic similarity.An overlap score cutoff of either 0.1 or 0.2 was applied on the raw result to remove redundancy in the results.
For all the functional enrichment analyses in this study, complete lists of the results were provided in the supplementary tables.

Co-expression module analysis
Once generalized linear models for differential gene expression were established, the effects from surrogate variables were regressed out from the normalized gene counts to create a cleaned matrix whose variance was mainly due to the genotype difference.Then the R package WGCNA 73 (version 1.71) was implemented upon this cleaned matrix of each tissue, identifying genes co-expressed together and grouping them into modules.To achieve the best performance, the soft-thresholding power was heuristically selected for each tissue (cortex power = 5, dorsal root ganglion power = 6, medulla power = 5, spinal cord power = 6, trigeminal ganglion power = 7) at the beginning of the WGCNA approach.A signed network was used, and minimal module size was set to 30 and the raw modules were merged with a dis-similarity cut-off of 0.25.

Correlation between co-expression modules' eigengenes and the full-length ELP1 expression
The full-length ELP1 transcript expression was measured as the expression sum of exon 20 (human) and exon 26 (mouse), in the unit of counts per million (CPM).The values across all samples from the same tissue were correlated with each eigengene representing the identified co-expression modules using Pearson correlation.ELP1 dose-responsive genes were defined as the genes meeting both of the following criteria: (1) the Pearson correlation between their co-expression module eigengene and the full-length ELP1 had a coefficient no less than 0.8; and (2) the absolute value of the Pearson correlation coefficient between their normalized expression and the module eigengene had a coefficient no less than 0.8.

Analysis of publicly available scRNA-seq data
For DRG scRNA-seq, the processed data was downloaded from GEO (GSE59739).Cells with the top and bottom 2.5% of the number of RNA features were filtered out.The normalized counts were what the authors provided 42,52,53 .Dimensionality reduction was first done via multiple correspondence analysis (MCA) using the CelliD package 74 (version 1.6.0),followed by uniform manifold approximation and projection (UMAP) using the Seurat package 75 (version 4.2.1).Unsupervised clustering was done in the UMAP space using 20-nearest neighbor graph construction with a resolution of 0.2.For each unsupervised cluster, its gene signatures were defined as the top 200 nearest genes to the cluster (i.e., cells) center in the MCA space.The cell-type markers provided by the authors were then compared with the unsupervised cluster signatures.If the overlap was significant under the hypergeometric test, the unsupervised cluster would be marked as the cell-type provided by the authors and the cell-type markers would be replaced by the unsupervised cluster signatures.If the cell-type provided by the authors was found to have no significant overlaps with the markers from the unsupervised clusters, the authors' cell-types and markers were retained in the final marker list.
For TG scRNA-seq, the raw counts data were downloaded from GEO (GSE197289).Cells with top and bottom 2.5% of the number of RNA features were filtered out.Normalization was done using SCTransform package 76 (version 0.3.5),with variance stabilization flavor set to "v2".Like the DRG processing, the dimension reduction was done by MCA followed by UMAP.The UMAP visual separation already agreed with the cell types provided by the authors.The signature of each cell type was called by the top 200 nearest genes to the cell-type center in the MCA space.

Statistical analysis
Wald test was used to estimate the significance of DEGs from the DESeq2 models.Within each tissue, genes with false discovery rate (FDR) < 0.1 and a fold change cut-off was applied (more than 120% for upregulated or less than 80% for downregulated genes) and these genes were then considered as significant.Fisher's exact test was used for GO analysis where a significant enrichment was defined as FDR < 0.1.A significant correlation throughout this study was defined as Pearson correlation coefficient ≥ 0.8.For overlap significance, hypergeometric test was used and the p-value < 0.05 (or FDR < 0.1 when multiple test correction was applicable) was considered as significant.The four values used in the hypergeometric test were the size of gene list A, gene list B, their overlaps, and their non-redundant background genes (e.g., all expressed genes in the transcriptome where list A and B derived from).

Figure 1 .
Figure 1.Experiment design and analysis workflow.(A) Tissue-specific DEG analyses to reveal the most dominant influence of ELP1 reduction.(B) Assembly of FD gene signatures and the disrupted gene network by integrating DEGs and ELP1 dose-responsive genes.(C) Identification of convergent disease mechanisms of FD across tissues using shared DEGs.

Figure 2 .
Figure 2. Tissue-specific DEGs and their functional enrichment.(A) The violin plot demonstrates the summed expression of full-length human ELP1 and mouse Elp1 in counts per million (CPM, log2-transformed) per tissue.The shape represents the distribution of expression values with individual points representing the actual CPM data points.The line in the middle of the box inside the violin distribution represents the median value with the upper and lower hinges of the box representing the first and third quartiles respectively.A t-test was used to calculate the p-value of the difference between FD and Control, followed by FDR correction.The '***' indicates FDR < 0.001.(B) The heatmap represents the 25 most upregulated and downregulated genes between FD and Control across five tissues.Each row represents a gene, and each column represents a tissue.The red color domain represents upregulation between FD and Control while the blue color domain represents downregulation, where the expression changes are measured by log2transformed fold changes.The deeper the color, the stronger the fold changes.The range was set to [− 1.5, 2].Values beyond this range were clipped to either − 1.5 or 2, whichever is closer.The rows are ordered, from top to bottom, by the greatest fold change across the five tissues.(C) Bar plot that represents the functional enrichment based on DEGs in DRG and TG, respectively.Note, only 10 significant terms (FDR < 0.1) are selected to display per tissue.The x-axis represents the enrichment significance in − log10-transformed FDR while the y-axis represents the selected significant terms.The bar colors indicate the resources of GO.The vertical black dashed line represents an FDR of 0.1. https://doi.org/10.1038/s41598-023-51137-6

Figure 4 .
Figure 4.The convergence of transcriptomic dysregulation in the PNS tissues.(A) The UPSET plot demonstrates the DEGs overlaps between two out of the five tissues.The x-axis of the bar plot represents overlap comparisons while the y-axis of it represents the number of genes.(B) The gene network of PNS convergent DEGs.Each node represents a PNS convergent DEG indicated by a round shape.Each edge represents a potential interaction between the two connected genes, where only an interaction score of more than 0.4 (default) in String-DB is displayed.The thicker the edge, the higher the interaction score.The rings outside the nodes represent significant functional enrichment with FDR < 0.1 of all the PNS convergent DEGs.The associated functional enrichment terms with the ring colors are given, where the values in the brackets are the enrichment FDRs for the terms.(C) The scatter plot demonstrates the shortest distances of PNS convergent DEGs from ELP1 in the DRG dysregulated network (x-axis) and the TG dysregulated network (y-axis), respectively.Each dot represents a PNS convergent DEG with non-zero connectivity in both networks.The blue line represents the best-fitted linear regression line while the grey zone around the line represents the 95% confidence intervals.(D) The scatter plot demonstrates the log2-transformed fold change of PNS convergent DEGs in DRG (x-axis) and TG (y-axis), respectively.Each dot represents a PNS convergent DEG.The blue line represents the best-fitted linear regression line while the grey zone around the line represents the 95% confidence intervals.(E) The scatter plot demonstrates the DRG co-expression modules' sizes (x-axis) and their similarity to the TG co-expression modules, measured by Z summary preservation scores (y-axis).The score indicates the degree of relatedness of each module to other modules in other co-expression networks.The dots represent the co-expression modules identified in DRG.The three modules whose eigengene highly correlated with the full-length ELP1 expression, namely black, brown, and salmon, are highlighted in their corresponding colors.The blue dashed line indicates a module preservation score of 2 below which the preservation is not considered strong, while the red dashed line represents a module preservation score of 10 above which the preservation is considered very strong.

Figure 5 .
Figure 5.The potential transcriptomic dysregulation at neuronal subtype levels in the PNS tissues.(A) The heatmap demonstrates the overlap between DEGs identified from this study (columns) and the neuronal subtype markers identified from publicly available scRNA datasets for mouse DRG and TG.The numbers in the brackets after the row or column names indicate the number of genes in each category.The numbers in the grids indicate the number of gene overlaps between the two categories.The grid colors reflect the significance of overlap according to the hypergeometric test followed by FDR correction, in the − log10-transformed scale.The deeper the color, the more significant the overlap.The grids in white mean no significant overlap where FDR > = 0.1.(B) The overlaps between the PNS convergent DEGs (columns) and the neuronal subtype markers identified from publicly available scRNA datasets for mouse DRG and TG (rows, the same as panel (A)).The colors of the grids reflect the genes' dysregulation directions between FD and Control, where red is for upregulation while blue is for downregulation.