Rare disease research workflow using multilayer networks elucidates the molecular determinants of severity in Congenital Myasthenic Syndromes

Exploring the molecular basis of disease severity in rare disease scenarios is a challenging task provided the limitations on data availability. Causative genes have been described for Congenital Myasthenic Syndromes (CMS), a group of diverse minority neuromuscular junction (NMJ) disorders; yet a molecular explanation for the phenotypic severity differences remains unclear. Here, we present a workflow to explore the functional relationships between CMS causal genes and altered genes from each patient, based on multilayer network community detection analysis of complementary biomedical information provided by relevant data sources, namely protein-protein interactions, pathways and metabolomics. Our results show that CMS severity can be ascribed to the personalized impairment of extracellular matrix components and postsynaptic modulators of acetylcholine receptor (AChR) clustering. This work showcases how coupling multilayer network analysis with personalized -omics information provides molecular explanations to the varying severity of rare diseases; paving the way for sorting out similar cases in other rare diseases.


Acetylcholine biosynthesis and release
Acetylcholine, the main neurotransmitter involved in skeletal muscle contraction, is synthesized in the presynaptic neuron, by the choline acetyltransferase (enzyme encoded by CHAT gene), using Acetyl-CoA and choline as substrate in the reaction 1 .Compound heterozygous mutations in this gene were identified by Ohno et al. 2 causing CMS in 5 patients.Solute carriers are critical for this process.Three genes encoding this class of transporters have been previously related to CMS and neuromuscular transmission defects, namely SLC5A7 3 , SLC25A 4 , and SLC18A3 5 .SLC5A7 encodes the membrane choline transporter 6,7 .Acetyl-CoA presence is in part dependent on malate exported from mitochondria, by the action of SLC25A1 transporter 8 .Finally, after CHAT generates the acetylcholine, this is carried into synaptic vesicles by SLC18A3 gene product, the VAChT transporter 9 .Another CMS causal gene that might have a detrimental effect at presynaptic level is PREPL.This gene encodes a serine oligopeptidase essential for the activation of clathrin associated adaptor protein 1 (AP1), which is needed by VAChTr to fill the synaptic vesicles with acetylcholine 10 .Régal et al. 11 described a CMS case caused by a heterozygous deletion.Rabphilin 3a (RPH3A) is also involved in vesicle trafficking in the presynaptic element 12,13 and has recently been described as causative of a specific form of CMS 14 .Other genes described as causal of CMS related to the vesicle generation and exocytosis are SNAP25 15 , VAMP1 16,17 , SYT2 18,19 and UNC13B 20 .SNAP25 encodes synaptosomal-associated protein 25 21 , which is a part of the SNARE complex, where also synaptobrevin 1 (VAMP1) is allocated 22 .This SNARE complex is key for the Ca²⁺-induced exocytosis of synaptic vesicles, a process in which Synaptotagmin 2 (SYT2), the Ca2+ sensor, is also critical 23 .UNC13B encodes a homolog protein to rat Munc13-1.This protein has a calmodulin site and also regulates synaptic vesicles by mediating in the SNARE complex conformation 24 .

Acetylcholine Receptor clustering
While acetylcholine is the main neurotransmitter in the neuromuscular junction contraction process, another important molecule, the proteoglycan agrin (AGRN), is released by exocytosis from the motor neuron into the synaptic cleft, where it binds the LRP4 receptor.A special type of myosin, MYO9, is known to affect AGRN exocytosis upon depletion, causing a characteristic type of CMS 25,26   .AGRN binding to LRP4 leads to MuSK (MUSK) self-phosphorylation. Activated MuSK recruits Dok-7 (DOK7), which in the end stimulates Rapsyn (RAPSN) for AChRs (acetylcholine receptors) clustering at the skeletal muscle fiber membrane 27 .MuSK, Dok-7 and Rapsyn absence has been previously reported to result in AChR deficiency, poor neuromuscular junction development and causal of some CMS cases [28][29][30][31] .Interestingly, promoting MuSK activity has been described as capable of preserving neuromuscular synapses in Amyotrophic Lateral Sclerosis mice models 32 .Plectin, encoded by the gene PLEC, is essential in the AChR clustering process as it bridges AChRs to the postsynaptic intermediate filament network (IF) via interaction with rapsyn anchoring of endplate acetylcholinesterase (AChE) at the NMJ extracellular matrix (ECM), via a collagen-like peptide encoded by COLQ gene 36 .AChE is involved in terminating impulse transmission, by hydrolysis of acetylcholine.Mutations in COLQ have been reported as causative for a specific form of CMS 37,38 .The acetylcholine receptor itself is the main source of CMS-related mutations.In adult individuals, the receptor acts as a cation ligand-gated ion channel formed by 5 homologous subunits, being α2βδε its stoichiometry.The channel is mainly permeable to Na⁺ and K⁺, and to Ca²⁺ in a lesser way.When acetylcholine binds to AChR, the channel opens triggering the membrane depolarization 39 .All the genes encoding the receptor subunits (CHRNA, CHRNB, CHRND, CHRNE and CHRNG) have been described as causal for different CMS types [40][41][42][43][44] .CHRNE, which encodes the ε subunit of the AChR receptor, is causative for ~50% of all reported CMS cases, although frequencies might vary depending on ethnicity 45,46 .The high prevalence of ε subunit mutations may be the result of partial compensation of its functionality by the embryonic γ (encoded by CHRNG), which is substituted after birth given its lower conductance levels.Mutations in other subunits reduce patient survival as no compensation mechanism occurs 47 .Both Fast-Channel CMS (abnormally short AChR opening time) and Slow-Channel (abnormally long AChR opening time) CMS have been reported for mutations on CHRNE.SCN4A gene encodes the α subunit of the voltage-gated sodium channel (Na v 1.4), which is key for the generation and propagation of action potentials through the skeletal muscle fiber, which causes Ca²⁺ release and fiber contraction.Many SCN4A mutations have been associated with different muscle channelopathies [48][49][50] , including CMS.Another process related to AChR clustering is the Endoplasmic Reticulum glycosylation pathway.Normally, mutations in genes that are part of these processes cause congenital disorders of glycosylation (CDG) 51 .However, mutations in some of the pathways components (DPAGT1, ALG2, ALG14 , GFPT1 and GMPPB) have been also described as causal of some CMS variants 52-55   .As for the ECM, collagens are also involved in the receptor clustering process.Collagen XIII (encoded by COL13A1 gene) is known to be a key regulator of NMJ maturation process and AChR clustering 56 .Logan et al. 57 reported a specific form of CMS being caused by mutations on this gene.Laminins α5 and β2 are also involved molecules in AChR clustering 58 .Each one of the different laminins have its own role during NMJ maturation and development, with mutations in LAMA5 59 and LAMB2 60 being causative of the CMS disease.

Segregation analyses
We employed Rbbt 61 framework to stratify CMS patients based on mutations, aiming to assess whether nonsevere (n=12) and severe (n=8) patients segregate any of the following mutation types (Figure S1): 'overlapping' = the mutation overlaps the span of the gene, from first exon to last, including introns 'mutated_isoform' = the mutation produces a mutated isoform, i.e. an AA change (on one isoform or just the principal isoform, depending on the options used) 'splicing' = the mutation falls within a splicing site, they are deemed to break the protein function 'affected' = the mutations affects the encoded protein, by introducing a mutated isoform or a splice site mutation 'damaged_mutated_isoform' = the mutation makes a specific protein isoform damaged as predicted by damage or pathogenicity predictions 'broken' = the mutation seems to break the protein function, due it introducing a damaging mutation or a splice site mutation 'TSS' = the mutation falls within a transcription starting site (1000 bases from TSS) 'compound' = the gene has at least two mutations that affect it 'homozygous' = the gene is affected by a homozygous mutation 'missing' = the genes function may be entirely missing due to a homozygous or a compound mutation possibly affecting both alleles 'gc19_pc' .promCore'= core promoter of protein coding gene (hg19) 'gc19_pc.promDomain'= promoter domain of protein coding gene (hg19) 'gc19_pc.5utr'= 5'UTR of a protein coding gene (hg19) 'gc19_pc.3utr'= 3'UTR of a protein coding gene (hg19) 'gc19_pc.ss'= splicing site of a protein coding gene (hg19) 'lncrna.promDomain'= core promoter of a long noncoding RNA with coding potential 'lncrna.promCore'= core promoter of a long noncoding RNA with coding potential 'lncrna.ss'= splicing site of a long noncoding RNA with coding potential 'lncrna.ncrna'= long noncoding RNA 'smallrna.ncrna'= small RNA Figure S1.Segregation analysis of several mutation types (described in the text).The number of mutations that overlap in the two groups (Nonsevere and Severe) are reported for sets of individuals (0 to 12 for nonsevere individuals, 0 to 8 for severe individuals).
We define complete segregating mutations that are present in one group and not in the other.No complete segregating mutations were observed (Figure S1), while partial segregation mutations (i.e.present in at least 50% of the patients of one group and not in the other) can be appreciated (Suppl.Dataset 2).

Multilayer community detection analysis
In this work, we performed a multilayer community detection analysis using MolTi software 62 , which adapts the Louvain clustering algorithm with modularity maximization to multilayer networks.The algorithm is parametrized by the resolution parameter γ: the higher the value of γ, the smaller the size of the detected multilayer communities.Given the intrinsic resolution limit of modularity, the reliability of community detection should be assessed ad hoc using quality functions that are able to capture the actual community structure of a network 63 .In this work, we were interested in the identification of communities that robustly express functional relationships among the CMS linked genes (i.e.known CMS causal genes, and severe and nonsevere compound heterozygous variants and CNVs).Accordingly, we sought to determine the largest module of CMS linked genes that are found in the same multilayer communities at any value of resolution within the parameter range in which the community structure is more variable (see Supplementary Figure 12).The adopted procedure is illustrated in Figure S2.The module corresponding to the highest n contains genes that are systematically found in the same community across the entire range of resolution.

Figure S2 .
Figure S2.Module identification based on detected multilayer communities.Genes that are found in the same community at n values of the resolution parameter γ are represented as fully connected modules (upper panel).The resolution range considered is γ∈(0,4] with intervals of 0.5 (Methods).The module corresponding to the highest n contains genes that are systematically found in the same community across the entire range of resolution.