Distribution of disease-causing germline mutations in coiled-coils implies an important role of their N-terminal region

Next-generation sequencing resulted in the identification of a huge number of naturally occurring variations in human proteins. The correct interpretation of the functional effects of these variations necessitates the understanding of how they modulate protein structure. Coiled-coils are α-helical structures responsible for a diverse range of functions, but most importantly, they facilitate the structural organization of macromolecular scaffolds via oligomerization. In this study, we analyzed a comprehensive set of disease-associated germline mutations in coiled-coil structures. Our results suggest an important role of residues near the N-terminal part of coiled-coil regions, possibly critical for superhelix assembly and folding in some cases. We also show that coiled-coils of different oligomerization states exhibit characteristically distinct patterns of disease-causing mutations. Our study provides structural and functional explanations on how disease emerges through the mutation of these structural motifs.


Scientific Reports | (2020) 10:17333 | https://doi.org/10.1038/s41598-020-74354-9 | www.nature.com/scientificreports/
repeats) were also discovered 9 . Folding studies of selected coiled-coils indicated the importance of a specific segment, the trigger sequence, that is required for initiating the proper interaction between the helices 10 . However, it is not yet entirely clear whether specific sequence patterns are required for assembly, or whether the accumulation of interaction-promoting residues at critical positions generally aids coiled-coil formation.
In recent decades many studies addressed how DMs perturb protein structure. The majority of frequently occurring structural elements (e.g., transmembrane 11 and intrinsically disordered protein regions 12 ), as well as various structurally distinct functional regions (e.g., protein-protein interfaces, buried domains 13 ) were analyzed in detail. However, coiled-coils are a largely understudied class in this respect with only individual cases discussed. To our knowledge, only one large-scale study has been published, highlighting the critical role of register positions and pointing out mutations frequently associated with pleiotropy 14 . In this study we integrate multiple prediction algorithms and structural information for an in-depth analysis to assess how non-synonymous disease-associated germline mutations affect coiled-coil structures and thereby their functions. Our work revealed that disrupting hydrophobic and electrostatic interactions impairs coiled-coil structure and disease-associated mutations accumulate near the N-terminal of coiled-coil regions. We also showed that even if their destabilizing effect is small, DMs are enriched in antiparallel homodimer coiled-coils. On one hand, understanding how these variations modulate the structure and function of proteins may improve prediction algorithms. On the other hand, the rational coiled-coil design can be achieved through a detailed understanding of the sequence-structure relationship 15 . However, a missing piece of the puzzle is how DMs perturb coiled-coil structures.

Results
DMs are depleted in coiled-coil regions and they are most often associated with central nervous system diseases. To obtain an overall picture of how disease-associated mutations (DMs) and coiled-coils are related, we determined the relative frequency of DMs and PMs. We also calculated how proteins having coiled-coil regions are affected. We found that DMs are less frequent in coiled-coils, producing a mean odds ratio of 0.56; however, coiled-coil-containing proteins gather nearly the same amount of DMs as other proteins (Supplementary Material, Supplementary Fig. 1). Most coiled-coils (~95%) do not contain any variation, and the majority of variations occupy the coiled-coil segment alone. There is a non-significant trend showing a slight increase in the ratio of coiled-coil regions with multiple DMs compared to PMs (Supplementary Fig. 2).
To reveal disease groups that are most often associated with DMs falling into coiled-coil regions, we calculated the number of occurrences of each disease category using DiseaseOntology. According to our analysis, the most enriched disease terms are skin diseases, muscular diseases, carbohydrate metabolic diseases, and central nervous system diseases (Supplementary Material, Supplementary Fig. 3).
Coiled-coils are often perturbed by DMs affecting charged residues. The main driving force of protein domain folding and stability is achieved through hydrophobic interactions. Coiled-coils are special structural units where the balanced contribution of hydrophobic interactions and electrostatic interactions aid the stability together. This is reflected by the different amino acid preferences of the different positions in the heptad repeat unit, corresponding to the distinct spatial position and role of these within the superhelical structure. To assess how residues are affected by variations, we grouped amino acids based on their basic physicochemical features (positive: HKR, negative: DE, hydrophobic: AILMV, other: CFGNPQSTWY), then we calculated the log ratio of the substitutions observed in DMs. Figure 1 shows preferred residue type changes in coiled-coils relative to other non coiled-coil regions of the proteome. We calculated the relative frequency of amino acid substitutions in the coiled-coil regions, and in the proteome, then calculated the log ratio of substitution frequencies. According to our results hydrophobic residues are targeted in similar proportions. However, in the case of coiled-coils, charged residues aiding electrostatic interactions are much more frequently affected by DMs (Fig. 1, right).
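The grouping and log-ratio calculation described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, the base-2 logarithm is an assumption (the text does not state the base), and class pairs absent from either set are simply skipped.

```python
import math
from collections import Counter

# Residue classes exactly as listed in the text.
CLASSES = {
    "positive": "HKR",
    "negative": "DE",
    "hydrophobic": "AILMV",
    "other": "CFGNPQSTWY",
}
RESIDUE_CLASS = {aa: name for name, letters in CLASSES.items() for aa in letters}


def class_change_frequencies(substitutions):
    """Relative frequency of each (from_class, to_class) pair.

    substitutions: iterable of (wild_type, mutant) one-letter residue pairs.
    """
    counts = Counter(
        (RESIDUE_CLASS[wt], RESIDUE_CLASS[mut]) for wt, mut in substitutions
    )
    total = sum(counts.values())
    return {pair: n / total for pair, n in counts.items()}


def log_ratios(coiled_coil_subs, background_subs):
    """log2(frequency in coiled-coils / frequency in the background set)."""
    cc = class_change_frequencies(coiled_coil_subs)
    bg = class_change_frequencies(background_subs)
    # Assumption: only pairs observed in both sets get a log ratio.
    return {pair: math.log2(cc[pair] / bg[pair]) for pair in cc if pair in bg}
```

A positive value means the residue-class change is relatively more frequent among coiled-coil DMs than in the comparison set; a negative value means it is less frequent.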
In contrast, several residue types indispensable for stable domain structure (e.g., cysteines forming disulfide bridges) do not influence coiled-coil formation, thus their replacement does not cause stability problems (Fig. 1, left; for more details see Supplementary Fig. 4). The most prevalent changes in coiled-coil regions by DMs are replacements by oppositely charged residues. Interestingly, the negatively charged Glu and Asp are generally not interchangeable residues in coiled-coils, in contrast to the positively charged residues Lys and Arg. In coiled-coils DMs most likely target A, E, I, K, L, M, N and Q residues, as opposed to C, G and P residues being more often targeted in other proteins in the proteome (Supplementary Fig. 4). Both the non-redundant and the full human proteome show similar trends (Supplementary Table 12).

DMs accumulate at the N-terminal region of coiled-coils.
We investigated the distribution of variations in coiled-coils, considering their coverage, abundance in the N-terminal region, and coiled-coil length. We divided coiled-coil regions into five equal parts, and calculated the proportion of variations in these parts.
Although the first half of the sequences contain slightly more DMs, the difference is not significant compared to PMs ( Supplementary Fig. 5).
To reduce bias originating from the varied length of coiled-coils, we performed the enrichment calculation considering only the first 28 residues of all coiled-coils, this time by dividing sequences into four equal parts, i.e., using seven-residue bins, keeping in mind that predictors were optimized for heptad repeats (Fig. 2A). Using this approach, the accumulation of DMs at the N-terminus became visible, showing a monotonic decline of DMs towards the C-terminus; however, we could not confirm that this trend is significant.
To demonstrate that the first seven residues of coiled-coils contain significantly more DMs compared to the rest of the coiled-coil regions, we counted the number of DMs and PMs in the first seven residues of coiled-coils and in their succeeding part. The result is significant (χ² test, p < 0.01), and the odds ratio between DMs and PMs is 1.33 (Fig. 2B). This result is confirmed by all predictors independently (Supplementary Fig. 6) and on each dataset (Supplementary Fig. 7). To eliminate possible bias caused by shorter coiled-coil segments, we also shuffled the position of variations inside each protein, and calculated the same statistics. Using this approach no abundance is visible at the N-terminus; the variations are randomly distributed along the sequence without any significant accumulation (p > 0.01 with all predictors) (Supplementary Table 7).
Notably, this effect is strong enough to influence the distribution of variations considering coiled-coils with different lengths. The relative frequency of DMs is significantly higher in shorter coiled-coils (Fig. 2C), as they utilize most of their residues as an "N-terminal" segment that may contribute to stability, while in longer coiled-coils, other residues have a lesser role in sustaining the complex form. In contrast, PMs show uniform distribution in coiled-coil regions with different lengths. This effect is also visible on other datasets (Supplementary Fig. 8), and is confirmed by all prediction methods (Supplementary Fig. 9). It is arguable whether very short predicted coiled-coil segments (below 10 residues) are biologically relevant; however, we did not want to tailor prediction outputs. Moreover, omitting the first bin only further strengthens our result.
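The counting step and the shuffling control described above can be sketched as follows. This is an illustration under stated assumptions: region coordinates are taken to be 1-based and inclusive, the function names are hypothetical, and the shuffle simply redraws distinct positions uniformly along the protein, which may differ from the authors' exact procedure.

```python
import random


def count_first_heptad(regions, positions, window=7):
    """Count variants in the first `window` residues of a coiled-coil vs. the rest.

    regions: list of (start, end) coiled-coil regions, 1-based inclusive.
    positions: variant positions in the same protein.
    Variants outside every region are ignored.
    """
    first = rest = 0
    for pos in positions:
        for start, end in regions:
            if start <= pos <= end:
                if pos - start < window:
                    first += 1
                else:
                    rest += 1
                break  # each variant is counted once
    return first, rest


def shuffled_positions(protein_length, n_variants, seed=0):
    """Control: redraw variant positions uniformly along the protein."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(1, protein_length + 1), n_variants))
```

Running `count_first_heptad` separately on DM and PM positions gives the four cells of a 2x2 table (DM/PM by first seven residues/remainder), and repeating the count on `shuffled_positions` output gives the randomized baseline.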
We performed the same analysis around C-terminal residues, however according to our results the accumulation of DMs is only detectable at the N-terminal region (Supplementary Table 7).
Oligomerization state affects which register positions are vulnerable. The periodic property of coiled-coils enables a position type classification of residues, grouping amino acid positions based on their location in the helix, uncovering preferred physico-chemical features and interaction types. Regardless of oligomerization state, residues at "a" and "d" positions are often hydrophobic and face each other, forming the core of the complex, while "e" and "g" residues may be charged and promote stability via electrostatic interactions on the outer face of the superhelical structure.
We analyzed the distribution of variations on the different positions, considering different oligomerization states. As expected from early results, PMs are more abundant in every heptad position. However, considering DMs only, residues falling into "stabilizing" positions are more vulnerable to variations (Fig. 3, left). Interestingly, residue type changes affect the heptad positions differently: replacement of amino acids in "a" and "d" positions likely perturb the structure, even when the substitution seems conserving on the basis of physicochemical properties (i.e., variation replacing a hydrophobic residue with another one is also often harmful). In contrast, "e" and "g" positions seem to be slightly more resilient, and residue type change (i.e., charge change) is more often required to disrupt the structure. PMs change the residue type to a lesser extent.
The oligomerization state of the coiled-coil also affects its vulnerability. We have to add, the number of mutations on different datasets highly varies. The less sensitive method (Marcoil) on the less populated dataset (tetramers: ~ 14%) suggests there are 253 disease-associated mutations on this subset (mean 36.14 mutations on each register). The most sensitive method (Ncoils) on the most populated dataset (trimers: ~ 44%) suggests there are 1577 mutations on this subset (with a mean 225.28). Nevertheless, in general, antiparallel formations (both dimers and tetramers) are slightly more preferred targets of DMs. Oligomerization also influences which positions are modulated (Fig. 3, right): "e" and "g" (charged) positions are more often affected by DMs in parallel dimers, while "a" and "d" (hydrophobic) positions are primarily targeted in antiparallel dimers. Hydrophobic interaction promoting positions are less likely to be targeted by DMs in parallel dimers. When these positions are mutated, the mutation changes the type of the residue in almost every case, showing an opposite trend compared to other oligomerization modes.
Variations in trimeric and tetrameric coiled-coils are similar: in these cases, structures are most often perturbed via amino acids in "a", "g" and "e" positions, and the mutations also often change the residue type. DMs on "d" positions are rare.
The different prediction methods show high agreement (Supplementary Fig. 10). We also performed the same calculations on the full proteome, and on the random sampled non-redundant dataset (Supplementary Fig. 11). In general, there seems to be an opposing trend: in positions where the DM frequency is lower, any change can carry disease, while in positions where the DM frequency is higher, the mutations more likely change the physico-chemical property of the residue.

Structure analysis reveals most DMs occur in homooligomeric coiled-coils with a subtle destabilizing effect.
To gain detailed insights on how DMs perturb the formation of coiled-coils, we searched for structures in the PDB and identified coiled-coil segments using SOCKET. Although the number of variations falling into characterized coiled-coil structures is rather low, and sometimes insufficient for performing reliable statistical tests to draw convincing conclusions, such analysis can open prospects to recognize interesting trends.
First, we analyzed the distribution of DMs in different heptad positions. As the number of cases was low, we classified the positions into three categories: responsible for hydrophobic stabilization (a, d), electrostatic stabilization (e, g) and outward facing/solvent exposed (b, c, f). Disease-associated mutations are enriched on residues responsible for forming the hydrophobic core of coiled-coils, have nearly the same occurrence as PMs in positions reserved for charged residues, and show a decline on outward facing residues (Fig. 4A). At first glance this does not seem to confirm the prediction data, where DMs have a higher frequency on 'e' and 'g' positions. However, this discrepancy is due to the very different composition of the two datasets with regard to oligomerization state: the most prevalent class of structures is two-stranded antiparallel coiled-coils, the only class where mutations on hydrophobic positions dominate in prediction data too (Fig. 3).
Next, we investigated whether N-terminal segments of the coiled-coils gather more variations. Although both types of variations (PMs and DMs) seem to accumulate in the first seven residues of coiled-coils, the two kinds of variations exhibit an opposing trend, with a higher frequency of DMs around the N-terminus and PMs in other residues (Fig. 4B). Moreover, while the PM data are not significant, the distribution of DMs is marginally significant according to the χ² test (p < 0.1).
Sequence data alone can be rather difficult to utilize for defining the monomeric or oligomeric state of coiled-coil assemblies, and predictions are also limited in detecting how many strands the coiled-coils are composed of. However, from structural data we can readily classify coiled-coils as monomers (both strands are part of the same protein, typically antiparallel coiled-coils with a short linker between the two helices), homooligomers (interaction of identical proteins) or heterooligomers (interaction between different proteins). While PMs show a uniform distribution among these classes, DMs mainly occur in homooligomers (Fig. 4C). The rationale behind this can be that a single mutation might (but in a heterozygous case, not necessarily) affect multiple constituent helices simultaneously, so their effect is instantly multiplied, in contrast to heterooligomers and monomers where the interacting partner/segment does not amplify the impact of the mutation.
The energy change calculated for the introduced mutation can be used as an approximation of the contribution of a mutation to the overall stability of the coiled-coil. Figure 4D shows the calculated energy changes upon mutation in the proteome and in coiled-coil structures. Generally, the mean energetic contribution of PMs can outline the range of changes a protein can tolerate without damage. In both cases (proteome, and coiled-coil proteins), DMs have a higher average ΔΔG. However, in the case of coiled-coils, despite keeping the same trend, both variation types seem to have a lower effect compared to those of other proteins of the proteome.
We also performed the same analyses on the full structure dataset (where the full proteome was assigned to PDB structures). To reduce bias, we removed PDB: 2FXM (Myosin7) from the structures, as 18% of the variations belonged to this protein. On the full dataset, the accumulation of DMs on the first seven residues is visible, yet not significant, furthermore DMs on heterooligomeric proteins are more frequent. Other statistics are in agreement on the full structure dataset ( Supplementary Fig. 12, Supplementary Table 20).
Further context can be added by the joint analysis of structural data.
coiled-coils (80%), with most cases occurring at position 'e'. We mapped the variations to only one chain of PDB structures, thus the real energetic contribution of a mutation may be even more stabilizing in homooligomers, abolishing transient interactions. In contrast, most mutations affecting monomeric coiled-coils are definitely highly destabilizing (94%), suggesting greater energetic effect is required to disrupt the overall structure that also includes intrachain interactions outside the coiled-coil, in contrast to mutations of complexes where coiled-coil interchain interactions are the only forces keeping the complex together.

Discussion
The structural consequences of inherited disease-causing mutations are an often revisited topic 16 . Recently Mohanasundaram et al. investigated how DMs affect coiled-coils 14 . While they mostly focused on pleiotropy and irregularities, in this paper we focused on general patterns. The Mohanasundaram et al. paper also quantified variations in different PFAM families. In contrast, here we performed an analysis of the non-redundant human proteome. Members of the same family share sequence and structural similarities and might carry out similar functions. For the same reason, these proteins also usually share their mutation hotspots, meaning disease-associated mutations emerge in the same regions. We performed redundancy filtering to rule out bias caused by counting the "same" mutation falling into the same domain regions in more populated families, and deemphasizing features of smaller protein families. Mohanasundaram et al. also investigated how heptad positions in coiled-coils are affected; however, they relied on MarCoil alone. In contrast, we used four different predictors to assess the structural consequences of variations in coiled-coils, then extended our analysis by incorporating structural data and features responsible for the proper assembly of coiled-coil complexes. We found that DMs accumulate in heptad positions critical for the assembly of coiled-coils (in line with the findings published by Mohanasundaram et al. 14 ), that DMs are more abundant in the N-terminal parts of coiled-coils, and that mutations mostly affect homooligomeric coiled-coils. Interestingly, in recent analyses some coiled-coil prediction methods showed a rather low accuracy and their results are sometimes contradictory 17,18 ; however, based on the agreement of the distribution of variations predicted by different methods, they show balanced performance on our dataset.
Simple properties of targeted residues suggest how structure is impaired. Sequence properties are often used to characterize substitutions, as they often can be connected to structural changes. In this case, grouping amino acids based on their possible role in coiled-coil formation highlights the critical role of charged residues. While mutations on hydrophobic residues impair coiled-coil structures to the same extent as in the case of globular proteins, charge changes often perturb coiled-coil formation. Notably, not only the change of net charge influences coiled-coils, but residues bearing negative charges also do not seem to be interchangeable. This effect is attributable to the helix formation tendency of glutamic acid 19 that was also proposed in the case of single-α helices 20 . The most characteristic feature of coiled-coils is their repeated register position pattern. Steric clashes and loss of hydrophobic interactions dominate in 'a' and 'd' positions (Fig. 6, top, left), while the loss of electrostatic interactions mostly occurs in 'e' and 'g' positions (Fig. 6, top, middle). Outward-facing residues can also carry essential roles sometimes: they can serve as outside staples that stabilize the alpha-helix by electrostatic interactions, or they can provide a binding site for other molecules (Fig. 6, top, right). For example, the ubiquitin-binding domain (UBAN), conserved in optineurin (OPTN), is part of a coiled-coil, specifically recognizing ubiquitin chains binding to the accessible surface of the coiled-coil 21 . The nuclear factor-κB (NF-κB) pathway plays an important role in regulating inflammation, adaptive and innate immune responses, and cell death via transcriptional targets, such as IL-1β 22 . In the canonical pathway, NF-κB factors are retained in an inactive state via binding to OPTN 21 .
The E478G mutation in the UBAN of OPTN abolishes its NF-κB suppressive activity 23 , as residues involved in linear ubiquitin-binding correspond to the residues crucial for keeping NF-κB inactive 24 . The mutation results in significant up-regulation of IL-1β, causing neuroinflammation and neuronal cell death of motor neurons, leading to Amyotrophic Lateral Sclerosis 25 .

Mutations in coiled-coils influence protein function with different mechanisms.
From a functional point of view, mutations falling into distinct structural categories may have different effects. DMs affecting residues that contribute to the hydrophobic core usually have an indirect consequence. In the first scenario, the effect of the mutation manifests outside the coiled-coil region. Desmins are large scaffolding proteins connecting the sarcolemma, Z-discs, and the nucleus 26 . They consist of elongated coiled-coil regions, with a head and tail unit at their termini. Mutations in the coiled-coil regions disrupt the coiled-coil structure (e.g., DESM: L345P), eventually leading to the disorganization of Z-discs and affecting the integrity of the cellular IF network 27 (Fig. 6, bottom, left). In the second scenario, mutations impair coiled-coils directly, so they lose (some of) their binding affinity to molecules interacting with them. The H486R mutation in OPTN perturbs the structure of the UBAN domain and causes low-grade inflammation that leads to glaucoma 28 . However, in contrast to other mutants that were shown to have a direct role in interacting with ubiquitin, this mutated residue points inside the coiled-coil, and only reduces the binding affinity 29 (Fig. 6, bottom, middle). In the third scenario, mutations occur in the coiled-coil, but on a residue facing outward. An example of disruption of direct binding is the mutation affecting the interactions of the PIK3CA-PIK3R2-glycerol complex. PIK3R2 possesses a two-stranded coiled-coil and forms a heterodimer regulatory unit with PIK3CA via an H-bond between N345 of PIK3CA and D557 of PIK3R2. Their complex structure also preserves a groove, providing room for binding glycerol 30 , which is perturbed by the D557H mutation. The lost direct contact of the Asp sidechain with the glycerol, as well as the lack of negative charge positioning the molecule (which is abolished with the positively charged and larger histidine) were proposed to impact binding negatively 31 . PIK3R2 was associated with Megalencephaly-Polymicrogyria-Polydactyly-Hydrocephalus 32 , although the molecular details of the disease have not yet been revealed (Fig. 6, bottom, right). Besides perturbing structural stability and folding leading to toxic conformations, mutations may also modulate degradation or lead to improper trafficking 33 . For example, assembly of the Non-POU domain-containing octamer-binding protein is mediated via antiparallel coiled-coil domains and single-α helices 34,35 . The R293H mutation in the coiled-coil domain was shown to lead to subnuclear mislocalization and to result in endocrine-related tumors 36 .

Putative link between N-terminal accumulation of DMs and trigger sequences.
The different types of coiled-coils utilize different strategies to achieve a folded state. Many studies have already suggested that highly conserved sequence patterns (so-called trigger sequences) are responsible for initiating coiled-coil assembly: for example, a highly conserved seven-residue motif is required for the folding of the Human Macrophage Scavenger Receptor oligomerization domain 37 , and a germline mutation in this region is associated with prostate cancer risk 38 . Another way to initialize assembly occurs during co-translation, as in the case of Peripherin, which includes two-stranded parallel coiled-coils 39 and also accommodates a disease-causing mutation at the N-terminal region in one of its coiled-coil regions 40 . Co-translational assembly generally occurs via N-terminally biased interaction domains 41 and a possible interpretation for the N-terminal accumulation of DMs might lie in the co-translational initiation of the folding and stabilization of α-helices as they emerge from the ribosome 42 .
Although abolishing the process likely affects superhelix assembly, this phenomenon only serves as an explanation for mutations in parallel coiled-coils. The critical role of terminal regions is also well-marked in antiparallel coiled-coils: SMC1 forms a complex with SMC3 via their globular N- and C-terminal domains. In both proteins the head and tail regions are connected by antiparallel coiled-coils, and most of the identified DMs gather at the beginning/end of their coiled-coil domains 43 . The proposed antiparallel intramolecular coiled-coil of KIF21A gathers several DMs, predominantly occupying the termini of the coiled-coil 44 , responsible for congenital fibrosis. Thus, although the exact molecular background has not yet been revealed, there is a substantial amount of evidence supporting the critical role of certain segments in coiled-coils (trigger sites or terminal regions), highlighting the role of N-terminal residues.

Conclusion
A handful of popular methods are available to predict the effect of variations 45,46 or to highlight vulnerable regions in proteins 47,48 , yet most of these are based on purely statistical approaches. Methods incorporating structural information are largely limited to general features of PDB structures, or to the prediction of transmembrane domains or disordered segments, and no currently available method incorporates features of coiled-coils. We showed that basic properties of coiled-coils, such as register position, oligomerization state and position along the region significantly influence the formation of coiled-coils. Since coiled-coil region prediction typically has short run times, we suggest that including such data in state-of-the-art predictors to increase their accuracy would be feasible.

Methods
Datasets. The human proteome was downloaded from UniProt 49 , and germline variations were obtained from humsavar 4 (Supplementary Table 1). For redundancy filtering CD-HIT 50 was applied on the human proteome in an incremental manner, filtering proteins to 90, 70, 50 and finally to 40% identity using word lengths of 5, 4, 3 and 2, respectively (Supplementary Table 2). We performed the analyses on the "non-redundant human proteome" and on the "full human proteome". Moreover, we also performed "random sampling on the non-redundant dataset", by selecting 80% of the data 100 times. Differences between the results of various datasets are highlighted in the text.
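The incremental redundancy filtering can be sketched as a chain of CD-HIT runs, each taking the previous run's output as input. The sketch below only builds the command lines; the file names are hypothetical, and it assumes the standard `cd-hit` options `-i` (input), `-o` (output), `-c` (identity threshold) and `-n` (word length). Any further options the authors may have used are not known from the text.

```python
def cdhit_commands(fasta, steps=((0.9, 5), (0.7, 4), (0.5, 3), (0.4, 2))):
    """Build chained cd-hit command lines for incremental filtering.

    steps: (identity threshold, word length) pairs, as given in the text.
    Output file names (nr90.fasta, ...) are illustrative placeholders.
    """
    commands = []
    current = fasta
    for identity, word_length in steps:
        output = f"nr{round(identity * 100)}.fasta"
        commands.append(
            ["cd-hit", "-i", current, "-o", output,
             "-c", str(identity), "-n", str(word_length)]
        )
        current = output  # the next step filters the previous result
    return commands
```

Each returned list could be passed to `subprocess.run` in order; the last output file would then hold the set filtered to 40% identity.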
Coiled-coil predictions. Coiled-coil regions were determined using DeepCoil 51 , MarCoil 52 , Ncoils 53 and Paircoil 54 (Supplementary Table 3), applying default cutoff values suggested in their descriptive articles. In the case of DeepCoil we utilized the 'PSSM' flavor: we generated a PSSM for each sequence, using PSI-BLAST with three iterations and an e-value cutoff of 10⁻⁵ on the SwissProt database. Coiled-coil heptad positions were predicted using MarCoil, Ncoils and Paircoil (Supplementary Table 4). Oligomerization states were defined using LogiCoil (Supplementary Table 4). Single-α helix regions 55 were used as a filter to reduce false positive hits (Supplementary Table 5). All statistics were calculated independently, using the appropriate predictors, i.e., amino acid substitutions and distribution of variations along the sequence with DeepCoil, MarCoil, Ncoils and Paircoil; impact on heptad positions with MarCoil, Ncoils and Paircoil; distribution in different oligomeric states with LogiCoil (using MarCoil, Ncoils and Paircoil as input).
Each time we also calculated the mean value of the results of different predictors; these results are shown in the main text. If there were differences between the results of the applied methods, we noted them in the main text.
Statistical tests. χ² tests were performed on contingency tables (Table 1). Odds ratios were defined as:
Enrichments in Supplementary Fig. 1 were defined as:
The χ² test was applied to find the significance of the relation between DMs and coiled-coils (Supplementary Table 6) and the significance of the first seven residues of coiled-coil sequences (Supplementary Table 7).
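For orientation, the test on a 2x2 contingency table can be sketched as follows. The original formulas did not survive in this text, so both definitions here are assumptions: the odds ratio is the usual cross-product ratio, and the χ² statistic is Pearson's without continuity correction. The cell labels a, b, c, d are hypothetical (for example DMs inside/outside coiled-coils in the first row, PMs inside/outside in the second).

```python
import math


def odds_ratio(a, b, c, d):
    """Cross-product odds ratio for the table [[a, b], [c, d]]."""
    return (a * d) / (b * c)


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for [[a, b], [c, d]].

    Returns (statistic, p_value) with one degree of freedom.
    """
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # With one degree of freedom the chi-square survival function
    # reduces to erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value
```

An odds ratio above 1 indicates that the first row is over-represented in the first column relative to the second row; the p-value indicates whether the association is distinguishable from chance.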
To estimate the significance of residue changes, we eliminated the sporadic error of the data by performing bootstrap analysis. We randomly selected 80% of the data 100 times and the significance was determined by calculating the average and standard deviations of the data according to the 68-95-99.7 rule (Supplementary Tables 11-12). χ² was used to find the significance of the distribution of DMs into different coiled-coil positions (Supplementary Table 13).
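The resampling procedure can be sketched as follows. This is an illustration, not the authors' implementation: it assumes sampling without replacement (as "selecting 80% of the data" suggests), and the comparison against a null value in units of standard deviations is one plausible reading of the 68-95-99.7 rule. The function names and the default null value of 0 are hypothetical.

```python
import math
import random
import statistics


def subsample_statistic(data, statistic, fraction=0.8, rounds=100, seed=0):
    """Mean and standard deviation of `statistic` over random subsamples."""
    rng = random.Random(seed)
    size = int(len(data) * fraction)
    values = [statistic(rng.sample(data, size)) for _ in range(rounds)]
    return statistics.mean(values), statistics.stdev(values)


def sigma_distance(mean, sd, null_value=0.0):
    """Distance of the mean from the null value in standard deviations.

    Under the 68-95-99.7 rule, distances above 2 and 3 correspond to
    roughly the 95% and 99.7% levels.
    """
    if sd == 0:
        return math.inf
    return abs(mean - null_value) / sd
```

For a log ratio of substitution frequencies, `null_value=0.0` corresponds to no difference between coiled-coils and the comparison set.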
All tests and analyses were performed for each predictor separately. To produce figures, in each case, we calculated the mean of different predictors (Supplementary Tables 6-13).
DiseaseOntology term analysis. Disease ontology terms were mapped using MIM identifiers from humsavar and DiseaseOntology 56 . Only identifiers linked to DMs at positions where all methods predicted a coiled-coil were used. For the analysis the top three levels of the ontology were used, and the number of mutations was counted in each disease category. Only terms occurring in coiled-coil containing proteins are shown in Supplementary Table 14, and only terms responsible for at least 5% of all annotated diseases are shown in Supplementary Fig. 2.
Next we mapped all mutations in a similar manner. Expected values were calculated by normalizing these numbers on each term with the proportion of all coiled-coil mutations.
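The normalization described above can be sketched as follows, under one reading of the text: the count of all mutations per disease term is scaled by the overall proportion of mutations that fall into coiled-coils, giving the count expected if coiled-coils carried no term preference. The function names and the observed/expected enrichment are illustrative assumptions.

```python
def expected_counts(all_counts, cc_counts):
    """Expected coiled-coil mutation count per disease term.

    all_counts: term -> number of all mapped mutations.
    cc_counts: term -> number of mutations inside coiled-coils.
    """
    proportion = sum(cc_counts.values()) / sum(all_counts.values())
    return {term: n * proportion for term, n in all_counts.items()}


def enrichment(all_counts, cc_counts):
    """Observed / expected coiled-coil mutation count per term."""
    expected = expected_counts(all_counts, cc_counts)
    return {term: cc_counts.get(term, 0) / expected[term] for term in expected}
```

Terms with an enrichment above 1 gather more coiled-coil mutations than their overall mutation load would predict.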
Assigning structures to amino acid sequences. We used BLAST on sequences from the non-redundant human proteome against the PDB with an e-value cutoff of 10⁻⁵. Chimeric proteins were discarded. We used a greedy algorithm to select structures with 100% identity and with the most variations mapped on them (Supplementary Table 15). On all PDB structures we considered biomatrix transformations as defined in the PDB files to detect all possible coiled-coils.
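One way such a greedy selection could work is sketched below: repeatedly pick the structure that covers the most variations not yet covered by earlier picks. This is an assumption about the procedure, which the text does not detail; the function name and the structure identifiers in the usage example are hypothetical, and only structures that already passed the 100% identity filter are assumed to be supplied.

```python
def greedy_select(coverage):
    """Greedy choice of structures by number of newly covered variations.

    coverage: structure identifier -> set of variation identifiers mapped
    onto that structure. Ties are broken by identifier order.
    """
    remaining = set().union(*coverage.values())
    chosen = []
    while remaining:
        # sorted() makes tie-breaking deterministic.
        best = max(sorted(coverage), key=lambda s: len(coverage[s] & remaining))
        chosen.append(best)
        remaining -= coverage[best]
    return chosen
```

For example, with `{"1ABC": {1, 2, 3}, "2DEF": {3, 4}, "3GHI": {4}}` the first pick covers three variations and the second pick adds the one variation still missing.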