Global computational mutagenesis of domain structures associated with inherited eye disease

Multidomain proteins account for 70% of the eukaryotic proteome. In genetic disease, multidomain proteins are often affected by numerous mutations, but the effects of these mutations on protein stability and their roles in genetic disease are not well understood. Here, we analyzed protein globular domains to understand how genetic mutations affect the stability of multidomain proteins in inherited disease. In total, 291 domain atomic structures from nine multidomain proteins were modeled by homology, equilibrated using molecular dynamics in water, and subjected to global computational mutagenesis. The domains were separated into 7 groups based on protein fold homology. Mutation propensities within each group of domains were then averaged to select residues critical for domain fold stability. The consensus derived from the sequence alignment shows that the critical residues determined by global mutagenesis are conserved within each group. From this analysis, we concluded that 80% of known disease-related genetic variants are associated with critical residues and are expected to have significant destabilizing effects on domain structure. Our work provides an in silico quantification of protein stability and could help to analyze the complex relationship among missense mutations, multidomain protein stability, and disease phenotypes in inherited eye disease.

The proteins were split into individual domains and then divided into 7 groups by homology. These domains were epidermal growth factor-like (EGF-like), laminin globular (laminin-G), sushi, immunoglobulin-like C2-type (Ig-like C2-type), fibronectin type-III, cadherin, and transforming growth factor-beta (TB). In total, the 291 protein domain structures were each individually homology-modeled (Supplemental Table S1), equilibrated using 2 ns molecular dynamics in water to achieve better domain stereochemistry, and subjected to global computational mutagenesis using the UMS 13 to evaluate the effect of mutations on their protein stability. Panels A-I images were prepared using the SMART (http://smart.embl-heidelberg.de/).
The disease-mutation data for each protein were then quantified using unfolding propensities, which were retrieved for each mutation from its appropriate unfolding matrix. These data revealed that most inherited-disease-causing mutations are associated with large destabilizing effects (unfolding above 0.9), consistent with the literature 16 . To quantify the overall pattern of mutation changes between similar domains, the unfolding propensities of homologous domains within each protein were averaged to filter out noise related to structural domain variations. This procedure isolates residues that have a higher propensity for protein structure destabilization and are therefore critical for protein stability. Mapping of disease-causing mutations and identification of critical residues shows correspondence between residues identified as critical in silico and residues associated with disease-causing mutations. Mutations in critical residues are associated with a wide range of inherited diseases, revealing the critical roles of these residues in protein structure and stability. Furthermore, the sequence www.nature.com/scientificreports www.nature.com/scientificreports/ alignment of each domain group reveals conservation of critical residues, with 50% of all conserved residues predicted to be critical residues. These results support a critical stability framework of residues that is conserved across domains to provide essential components of protein structure and stability.
Quality of domain structures. Domain structures were modeled as described in the Methods section to ensure viable, stable structures with proper stereochemistry. To assess the quality of the improved models, they were subjected to an internal control designed to verify the quality of the side-chain rotamers of the model 13 , and Ramachandran plots were produced to verify plausible dihedral angles 17 .
An internal control was used to create self-mutated structures for each residue in the protein sequence. Quality models subjected to the internal control were expected to produce confidence intervals centered on 0.5 with small p-values (~10⁻²). Supplemental Table S2 contains the internal control values for the domains modeled, and Supplemental Fig. S1 shows the distribution of p-values produced by the internal control. More than 75% of the domain models produced p-values of 0.05 or smaller. Potentially, the quality of these domains individually could be improved using longer periods of molecular dynamics. However, performing this kind of simulation for all domains would require significant additional computational time. Averaging the domain propensities offsets this shortcoming and serves to improve the overall descriptors of domain structure stability.
The structures were further validated using Ramachandran plots to illustrate the distribution of backbone dihedral angle data, and the plots are included in Fig. 2 for the 7 domain groups studied. The Ramachandran plots shown have concentrated backbone dihedral angles in the energetically allowed region. Additionally, the Ramachandran plots in Fig. 2 are accompanied by structurally aligned domain structures to show the consistency and reliability of the structures built using homology modeling.
Due to computing limitations brought about by Chimera, only a subset of 15 domains from the cadherin and EGF-like domains could be studied using structural alignment. Therefore, only 15 domains were shown in Fig. 2 for each group, to allow better visualization of domain superposition. These 15 domains were selected at random to reduce bias. To ensure the 15-domain representative sample was a valid representation of the whole set, 5 different 15-domain subsets of EGF-like domains were structurally aligned. The alignments produced by these subsets were compared upon their primary structure alignment and secondary structure alignment. The alignment of these was determined to be similar enough across subsets to validate the methodology proposed in the manuscript. Once this approach was validated, 15-domain subsets of the EGF-like and cadherin domains were selected for this study. Thus, the Ramachandran plots, internal control, and structural alignment of each domain set validated our modeled structures.

Disease-related mutant variants.
Considering that many inherited human disorders are believed to arise from protein destabilization 16 , the effect of disease-causing mutations on domain stability was quantified. A number of disease-causing missense mutations were retrieved from the HGMD (http://www.hgmd.cf.ac.uk/) for each of the nine multidomain proteins (Table 2). Mutations that did not map to the individual protein domains were excluded from our analysis. The number of retrieved missense mutations is shown in Table 1. For each mutation, the propensity of domain destabilization was obtained from the unfolding propensity data matrix produced by the UMS. The average unfolding fraction of all missense mutations ranged between 0.59 ± 0.26 and 0.82 ± 0.22. Mutations with predicted unfolding fractions above 0.9 were categorized as severe-destabilization variants. Medium- and low-destabilization mutations resulted in propensities below 0.9.
We found that the average unfolding propensity due to disease-causing missense mutations for all proteins was 0.71 ± 0.25, a value that corresponds to a medium-level destabilizing effect. The list of disease-causing mutations was further filtered by selection of mutations with high degrees of unfolding (>0.9). The total number of disease-related mutations with predicted severe destabilization effects on domain structure ranged from 13% (FAT4) to 57% (FBN1) and was on average 33%.
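The severity filter described above can be illustrated with a minimal Python sketch. The function names and the example values are hypothetical and are not taken from the UMS code; only the 0.9 cutoff comes from the text. A value of exactly 0.9 is treated here as not severe, which is an assumption.

```python
from collections import Counter

SEVERE_CUTOFF = 0.9  # threshold stated in the text


def classify_mutation(unfolding):
    """Label a mutation by its unfolding fraction (0 to 1).

    Values strictly above the cutoff are 'severe'; everything else is
    grouped as 'medium/low' (the handling of exactly 0.9 is an assumption).
    """
    return "severe" if unfolding > SEVERE_CUTOFF else "medium/low"


def severe_fraction(unfolding_values):
    """Fraction of mutations whose unfolding fraction is above the cutoff."""
    counts = Counter(classify_mutation(u) for u in unfolding_values)
    return counts["severe"] / len(unfolding_values)


# Hypothetical unfolding fractions for six mutations (illustrative only).
example = [0.95, 0.40, 0.72, 0.99, 0.88, 0.61]
print(severe_fraction(example))
```

Applied per protein, a calculation of this kind would give the percentage of severe-destabilization mutations reported above.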
The data show that 82% of all severe-destabilization mutations occur in residues considered critical for protein structure. The percent of severe mutations mapping to critical residues ranged from 64% to 100%, revealing a correlation between the residues implicated in disease and those believed to be critical for the structure of individual domains and the whole protein. Moreover, on average, critical residues associated with disease-causing mutations had foldability scores of approximately 14.
In Fig. 3, a laminin-G domain (top: A and B) and an EGF-like domain (bottom: C and D) of EYS are shown. The structures are colored by foldability (left) and by severity of disease-causing mutations (right). The mutations identified on the right are colored by the corresponding unfolding propensity. On the left, high-foldability residues are red. Overall, 95% of severe mutations occurred in high-foldability residues (Table 2). Of the 155 known disease-causing mutations, 34% occur in EGF-like domains, and 32% occur in laminin-G domains. Additionally, Supplemental Figs S2 through S6 contain instances of each domain type for ROBO3 (S2), CFH (S3), FBN1 and FBN2 (S4), FAT1 and FAT4 (S5), and CDH23 and PCDH15 (S6). The structures are colored by foldability and disease-causing mutations, and these results support our findings that severe mutations are associated with high-foldability residues in many cases.
The identification of protein missense mutations associated with disease phenotypes showcases the relationship between protein residues associated with disease and individual domain residues that are believed to be critical for domain stability. The results above suggest that stabilizing protein structure residues are critical for maintaining proper folding in individual protein domains.
Critical residues are conserved across homologous domains. The foldability of identically conserved and similarly conserved residues was examined to assess the relationship between residue conservation and foldability. Supplemental Table S3 and Figure S7 show that 52% of all conserved residues in the seven domain groups were considered critical. By contrast, only 34% of all nonconserved residues in the domain sets were described as critical.

Figure 3 (partial caption). The foldability scale ranges from 0 to 20, with low-foldability residues shown in blue and high-foldability residues shown in red. The tan residues on the right structure are not associated with any disease-causing mutations, while green, yellow, and red residues correspond to low-, medium-, and high-destabilization mutations. The unfolding parameter ranges from 0 to 1. Mutations colored in purple do not have an associated unfolding propensity.
In Fig. 4, an example from each group of domains is shown colored by residue conservation (left) and foldability (middle). Each domain sequence is also represented as a secondary structure plot (right), highlighting critical residues in the domain sequence that correspond to secondary structure components. Conserved residues in EGF-like domains (Fig. 4A) were found to have high foldability in 41% of cases. The secondary structure components show a critical role for six cysteine residues, which form disulfide bridges. The conserved residues in laminin-G domains were high-foldability residues in 56% of cases (Fig. 4B). A single disulfide bridge is present in the domain structure of laminin-G, and the cysteine residues that form this bridge are also critical. The alignment of 105 cadherin domains revealed that 49% of conserved residues were also high-foldability residues (Fig. 4C). The TB domain alignment showed that 45% of conserved residues were high-foldability residues (Fig. 4D); the secondary structure components show four disulfide bridges, and all the cysteine residues involved are considered critical. The conserved residues of the sushi domains of CFH had 57% correspondence with high-foldability residues (Fig. 4E), and the cysteine residues involved in the two disulfide bridges of the domain structure were all considered critical. The conserved residues of the Ig-like C2-type (Fig. 4F) domains of ROBO3 were associated with high foldability in 56% of cases, while the conserved residues of the fibronectin type-III domains (Fig. 4G) of ROBO3 were associated with high foldability in 62% of cases. The disulfide bridge present in the Ig-like C2-type domain structure corresponded to critical cysteine residues. Table 3 shows the differences in average foldabilities of conserved and nonconserved residues for each domain type.
Bar graphs highlighting identically conserved residues and similarly conserved residues over the length of the domain sequence are shown for each domain in Fig. 5. Figure 5A shows the alignment consensus of the 132 EGF-like domains. Overall, 40% of conserved residues were considered critical, while only 25% of nonconserved residues were critical. Figure 5B contains the consensus of the 8 aligned laminin-G domains; 57% of conserved residues were described as critical, and only 32% of nonconserved residues were categorized as critical residues. Figure 5C shows the consensus of the alignment of 110 cadherin domains present in the proteins PCDH15, FAT1, FAT4, CDH23, and CDH3. The consensus showed higher conservation of nonpolar residues towards the middle of the sequence. Aligned residues were considered critical in 45% of cases, and only 39% of nonconserved residues were considered critical. The 18 TB domains of FBN1 and FBN2 were aligned, and 45% of aligned residues of TB domains were also described as critical residues; only 23% of nonconserved residues were described as critical. The alignment consensus of CFH can be found in Fig. 5E. Of all aligned residues, 60% were considered critical. Lastly, the alignment consensus of Ig-like C2-type domains and fibronectin type-III domains from ROBO3 can be found in Fig. 5F,G, respectively. For Ig-like C2-type domains, conserved residues were critical in 52% of cases, and nonconserved residues were critical in 47% of cases. Conserved residues of fibronectin type-III domains had 62% correspondence with critical residues, while nonconserved residues were critical in only 30% of cases. Overall, although the percentage of conserved critical residues for each domain set varied greatly, critical residues were more often associated with conserved residues.
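The comparison reported for each domain set reduces to two fractions: the share of conserved alignment positions that are critical, and the share of nonconserved positions that are critical. The following is a minimal sketch of that arithmetic; the function name and the example flags are hypothetical and do not reproduce any data from this study.

```python
def critical_fractions(conserved, critical):
    """Return (fraction of conserved positions that are critical,
    fraction of nonconserved positions that are critical).

    conserved, critical: equal-length lists of booleans, one entry per
    alignment position.
    """
    if len(conserved) != len(critical):
        raise ValueError("inputs must have the same length")
    in_conserved = [crit for cons, crit in zip(conserved, critical) if cons]
    in_nonconserved = [crit for cons, crit in zip(conserved, critical) if not cons]

    def fraction(flags):
        # An empty group has no defined fraction.
        return sum(flags) / len(flags) if flags else float("nan")

    return fraction(in_conserved), fraction(in_nonconserved)


# Hypothetical eight-position alignment (illustrative flags only).
conserved = [True, True, False, False, True, False, False, True]
critical = [True, False, False, True, True, False, False, True]
print(critical_fractions(conserved, critical))  # (0.75, 0.25)
```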

Discussion
In this work, we sought to find determinants of multidomain protein stability. Multidomain protein structure estimation has remained elusive due to the shortage of suitable experimental and computational methods designed to handle such large proteins. The lack of defined structures for multidomain proteins hinders our ability to understand the relationship between protein stability and disease. Using in silico methods, we developed an add-on to the previously published UMS 14 . Nearly 300 domains from 9 multidomain proteins associated with inherited eye disease were homology-modeled and subjected to global computational mutagenesis. This mutagenesis produced unfolding propensity data for all domains studied and a measure of protein stability derived from domain stability for proteins that have otherwise been largely unstudied.
The multidomain UMS identifies the effects of all possible missense mutations on globular domain structures, and defining a foldability parameter allows us to identify residues that provide critical stability for proper domain folding. The method developed here uses free energy measures to quantify domain stability (as an unfolding propensity) in response to genetic perturbations, and these descriptors of domain stability are integrated to describe multidomain protein stability. Each domain was subjected to the UMS, which produces unfolding propensity data and stores it in a 2D matrix. Unfolding data matrices for sets of homologous domains of each protein were averaged to obtain consensus patterns of domain stability and filter noise associated with structural dissimilarities and modeling errors. A foldability parameter was derived from the unfolding propensities as a measure of the sensitivity of a residue to missense mutations; a high foldability parameter would correspond to a residue that, when mutated, causes significant destabilizing effects on the domain structure. Residues that are consistently critical for domain structure across all homologous domains were identified. Amino acid residues with high foldabilities across most of each homologous protein domain were determined to be residues critical for protein stability.
The determination of protein stability allowed us to investigate the relationship between protein stability and disease-causing mutations. Disease-related mutations were described by the associated unfolding values to gain an understanding of the effect of disease-related missense mutations on the stability of the domain and, subsequently, the entire protein. The proteins described were found to have missense mutations that resulted, on average, in a destabilizing effect measured as 0.7 unfolding. The distribution of unfolding propensities shows that only ~30% of disease-causing mutations have 'severe' destabilizing effects on domain structure, while the remaining 70% of mutations have medium- or low-level destabilizing effects.
The sequence alignment produced for each set of domains reveals that a high percentage of conserved residues are considered critical. The data show that nearly 50% of conserved residues are critical for each domain, while nonconserved residues are described as critical in only 30% of cases. These results suggest an apparent relationship between conserved residues and those acting as determinants of protein stability. In part, this observation agrees with our previous results on strong correlation between the protein sequence conservation index and foldability determined for 9 eye disease-related proteins 15,18 . However, a comparison of the average foldabilities of conserved and nonconserved residues of each domain type, as shown in Table 3, does not support this result. Although the average foldability of nonconserved residues was found to be lower than the average foldability of conserved residues for 5 domains, the differences do not show the same trend for the EGF-like and Ig-like domains.
Protein polypeptides fold to produce a native protein structure within approximately 1 to 30 ms 19 , but simulations of protein folding for such lengths are computationally expensive. At 2 ns, only very early events of protein destabilization can be modeled. Although these very short simulations may not represent a model for the folding-unfolding process, we assessed the proteins using an internal control designed as a measure of protein model quality. The distribution of p-values from each domain internal control, shown in Fig. S1, includes the percentage of domains with p-values less than 10⁻². Although a well-refined model should have a p-value with a magnitude of 10⁻⁵ or better, domains with p-values smaller than 0.05 were still considered attainable structures in our application. In addition, Ramachandran plots of homologous domains showed dihedral angles primarily in energetically allowed regions, which allowed us to appraise these structures as representative models of domain structures.
Protein domain structures and functions are, to some extent, conserved between homologous domains in different proteins. The seven homologous domains from the 9 selected proteins were studied to inspect which residues are mostly conserved across proteins, as well as the relationship between conserved residues and critical residues. For each domain, conserved residues were also classified as critical residues. Over 600 disease-causing mutations were associated with severe destabilization effects on domain structures, and approximately 30% of the disease-causing mutations occurred in conserved residues that were considered critical across all proteins studied. These results highlight the importance of residues described as critical for proper protein folding across many different protein families and establish their importance for healthy phenotypes.
An evident restriction of our current approach is the lack of domain-domain interactions that may favor stability. It is known that in multidomain proteins, domain-domain interactions aid in stabilizing domains 5 . However, since it is also known that only a fraction of all domain residues are involved in interactions with other domains 7 , our approach remains valid. In the future, we hope to develop a multidomain UMS application that incorporates stability derived from domain assembly, since correctly arranged domain structures are often crucial for a full understanding of the functions of multidomain proteins 20 .
In eukaryotes, about 67% of proteins are multidomain proteins 21 . Multidomain proteins vary significantly in molecular weight. For example, the proteins analyzed in this work (Fig. 1) range in molecular weight from 139 kDa (CFH) to 507 kDa (FAT1). The largest multidomain protein, titin from human muscle, has 34,000 residues in its sequence and a molecular weight of 3816 kDa, and it includes about 132 fibronectin type III and 152 Ig-type domains (UniProt, # Q8WZ42). Many multidomain proteins are difficult to analyze structurally, and their full-length atomic structures are currently not available.
Protein databases contain structures of many small domains, which are the building blocks of multidomain proteins. This information could be used for the computational analysis of protein domain stability. Here we assume that a multidomain protein is a combination of domains, each of which is an independent folding unit. The assumption of domain independence could be called into question if there are wide-ranging interactions between neighboring domains within a chain. Indeed, it was recently shown that domain folding for some proteins might depend on the inter-domain interface or on linker length and flexibility 7 . This consideration limits the analysis of single domains and requires applying global mutagenesis to the whole multidomain protein (rather than to independent folding units). The availability of an atomic model of the multidomain protein is the only limitation for the global mutagenesis analysis. The analysis is standard for our method, and numerous protein structures have already been analyzed using this technique (https://neicommons.nei.nih.gov/#/proteomeData).
Two large proteins, FBN1 and FBN2, form fiber-like multidomain structures 22 ; each is composed of 47 EGF-like domains interrupted by 9 TB domains (Fig. 1). From crystal data (PDB files: 2BO2, 2BOU, 2BOX), we might expect small interfaces and very short linkers between domains, suggesting independent folding of EGF-like domains, which agrees with the analysis of the disease-causing mutations 23,24 . TB domains are included in a sequence of protein domains and are separated from other domains by linkers of 6 and 18 residues at the N- and C-termini, respectively. This also indicates that folding of TB domains could be independent. Loosely packed small interfaces are also expected for other domains, such as the laminin-G domain (PDB file: 1H30), cadherin domains (PDB files: 1L3W, 2NCJ, 2MVS), and the sushi domain (PDB file: 2QFG). Two domain types, Ig-like and fibronectin type III, are independent folding units 7 .

Table 3. Average foldability of conserved and nonconserved residues of each domain type.
Individual domains of a multidomain protein are synthesized consecutively in vivo by the ribosome. Protein biosynthesis starts from the protein N-terminus, with folding proceeding by a mechanism known as co-translational folding 25 . During protein synthesis, the nascent peptide travels through the polypeptide exit tunnel of the ribosome, which has a length of about 100 Å and covers about 30-40 amino acids of the nascent peptide in a fully-extended conformation 25 . Smaller structures such as secondary structure elements can form within the tunnel. However, larger structural domains can form when the protein domain emerges from the peptide exit tunnel of the ribosome 26 . Synthesis of a whole multidomain protein might require a significantly longer time (4-16 minutes for the proteins from Table 1) if we assume that protein synthesis proceeds at a rate of 50-300 residues/min in cell-free systems and somewhat faster in vivo 27 . The secondary structure formation and protein domain compaction might require less than 1 s 28 .
A pathogenic genetic mutation will destabilize protein domain folding at the very first stages of protein synthesis, when the isolated protein fragment containing the mutation is localized to either the ribosomal tunnel or the tunnel exit (for larger domains). The domain will be relatively 'isolated' from other parts of the multidomain protein, and the perturbation caused by the mutation can be analyzed computationally.
The automation of the multidomain UMS reduces the need for human labor and has the potential for cloud-based applications. If the domains and unfolding mutation data are prepared in advance, stability information about a protein can be retrieved virtually instantaneously. The data produced in this work are freely available at https://neicommons.nei.nih.gov/#/proteome.
In conclusion, the multidomain UMS pipeline developed in this work allows us to predict residues and regions in multidomain proteins that are critical for protein structure and function. In the future, a knowledge of critical residues will allow us to examine the relationship among mutations in multidomain proteins, effects on protein stability, and disease phenotypes.

Methods
Molecular modeling. Protein domain, sequence, and mutation information for 9 multidomain proteins are shown in Table 1. Protein amino acid sequences and domain ranges were automatically retrieved from the UniProt database (http://www.uniprot.org/) using the accession ID numbers and a Python script. The domain sequence for each protein can be found in Fig. 1, which shows the variability in domain number and structure of the nine proteins.
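The retrieval script itself is not reproduced here. As an illustration only, the sketch below shows one way domain ranges could be extracted from a UniProt entry in JSON form. The endpoint URL and the JSON field names ("sequence", "features", "type", "location") are assumptions based on the public UniProt REST API and are not taken from the script used in this work; the function names and the mock entry are hypothetical.

```python
import json
from urllib.request import urlopen

# Assumed endpoint of the public UniProt REST API (not taken from this work).
UNIPROT_URL = "https://rest.uniprot.org/uniprotkb/{accession}.json"


def fetch_entry(accession):
    """Download one UniProt entry as a dict (requires network access)."""
    with urlopen(UNIPROT_URL.format(accession=accession)) as response:
        return json.load(response)


def extract_domains(entry):
    """Return (description, start, end, subsequence) for each 'Domain' feature.

    The field names follow the assumed UniProt JSON layout.
    """
    sequence = entry["sequence"]["value"]
    domains = []
    for feature in entry.get("features", []):
        if feature.get("type") != "Domain":
            continue
        start = feature["location"]["start"]["value"]
        end = feature["location"]["end"]["value"]
        # UniProt positions are 1-based and inclusive.
        domains.append(
            (feature.get("description", ""), start, end, sequence[start - 1:end])
        )
    return domains


# Hypothetical mock entry so the parser can be tried without network access.
mock_entry = {
    "sequence": {"value": "MKTAYIAKQR"},
    "features": [
        {"type": "Signal",
         "location": {"start": {"value": 1}, "end": {"value": 1}}},
        {"type": "Domain", "description": "EGF-like 1",
         "location": {"start": {"value": 2}, "end": {"value": 4}}},
    ],
}
print(extract_domains(mock_entry))  # [('EGF-like 1', 2, 4, 'KTA')]
```

Separating the download from the parsing keeps the parsing step testable offline; for a real entry, `extract_domains(fetch_entry(accession))` would be called with a UniProt accession ID.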
Each protein domain was generated by homology using the molecular graphics, modeling and simulation program YASARA 29 . A homology modeling experiment in YASARA uses the target protein sequence to identify possible structural templates by running 3 PSI-BLAST iterations to extract a position specific scoring matrix (PSSM) from UniRef90. The Protein Data Bank (PDB) is then searched for a match, and YASARA builds models for each matched template. For each template, if the alignment is unambiguous, a single model is built. If the alignment is ambiguous, several alternative models may be built. A 'quality Z-score' is calculated from the molecular dynamics force field energies, capturing the correctness of the backbone-and side-chain dihedrals as well as packing interactions. The models are ranked by their overall quality Z-scores, and the best parts of each model are combined to obtain a hybrid model, in hopes of synergistically increasing the accuracy.
The obtained 3D atomic structure for each domain was equilibrated by molecular dynamics in water using the standard macro 'md_runfast' in YASARA. The macro is optimized with the LINCS algorithm to run accurate molecular dynamics with a 'fast' speed setting at a 2 × 2.5 fs time step. Structures are placed in a cubic simulation cell extending 20 Å around the domain structure. The AMBER14 force field is used with a periodic cell boundary, and long-range electrostatics are treated with a particle mesh Ewald algorithm with an 8.0 Å distance cutoff. The macro is preset to achieve pH 7.4 and 0.9% NaCl (153 mM) concentration at 25 °C. Sodium and chloride ions are placed at the locations of the lowest and highest electrostatic potentials until the cell is neutralized. The location of the counter ions does not matter in practice, as they will randomly diffuse away through the simulation 15 . The simulation aims to reach a water density of 0.997 g/ml and will adjust the pressure accordingly to obtain the previously stated parameters. The simulation frames are saved every 250 ps. The domain structure is subjected to 2 ns of fast molecular dynamics ('md_fast.mcr') until the structure is determined to be feasible. A total of 291 domains were modeled. Supplementary Table S1 shows the templates used to build the structures.

Global mutagenesis.
The domains were then subjected to the UMS to generate unfolding propensities for all possible missense mutations, which were organized into a 2-dimensional matrix 12 . The identity mutation assessment 13 was applied to these models. In this method, a residue is mutated to itself. Since no significant changes in structure should occur under a self-mutation, the stability and free energy should remain the same. The mean, standard deviation, p-value, and 95% confidence interval of each identity mutation in the protein sequence were calculated to provide a measure of the quality of each domain. The protein domain structures, disease information, results of global computational mutagenesis, and disease-related mutations are available at the Ocular Proteome website (https://neicommons.nei.nih.gov/#/proteome).
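The summary statistics of the identity mutation assessment can be sketched as follows. This is a minimal illustration, not the UMS implementation: the function name and the input values are hypothetical, the 95% confidence interval uses a normal approximation (an assumption), and the p-value is omitted because the underlying test is not specified here.

```python
import math
import statistics


def identity_summary(self_propensities):
    """Mean, sample standard deviation, and approximate 95% confidence
    interval of the unfolding propensities obtained from self-mutations.

    The interval uses the normal approximation mean ± 1.96 * sd / sqrt(n),
    which is an assumption of this sketch.
    """
    n = len(self_propensities)
    mean = statistics.mean(self_propensities)
    sd = statistics.stdev(self_propensities)
    half_width = 1.96 * sd / math.sqrt(n)
    return mean, sd, (mean - half_width, mean + half_width)


# Hypothetical self-mutation propensities for an eight-residue fragment.
values = [0.48, 0.52, 0.50, 0.47, 0.53, 0.51, 0.49, 0.50]
mean, sd, (low, high) = identity_summary(values)
print(round(mean, 3), round(sd, 3), (round(low, 3), round(high, 3)))
```

For a well-behaved model, the interval would be expected to bracket 0.5, in line with the internal control described in the Results.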
Foldability. Foldability is a parameter first described by McCafferty and Sergeev 12 ; this parameter is used to describe the sensitivity of each location to missense mutations. Foldability is calculated for each alignment position to evaluate the frequency of severe mutations for each location through summation of all unfolding propensities greater than 0.9. The foldability scale ranges from 0 to 19, where a foldability of 0 represents an alignment position at which residues can be mutated without significant effects on structure, and 19 represents a residue that results in severe destabilization when mutated.
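As a minimal sketch of this definition, foldability can be computed for one position from its row of 19 unfolding propensities (one per possible substitution). The function name and example rows are hypothetical; the code follows the description literally by summing the propensities above 0.9, which is an interpretation rather than the published implementation.

```python
def foldability(row, cutoff=0.9):
    """Sum the unfolding propensities in one row that exceed the cutoff.

    row: unfolding propensities (0 to 1) for the 19 possible substitutions
    at a single position. The result lies between 0 and 19.
    """
    return sum(p for p in row if p > cutoff)


# Hypothetical rows (illustrative values only).
tolerant = [0.2] * 19                 # no substitution is severe
sensitive = [1.0] * 19                # every substitution is severe
mixed = [0.95] * 10 + [0.5] * 9       # ten severe substitutions

print(foldability(tolerant), foldability(sensitive), foldability(mixed))
```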
Critical residues. Critical residues for each protein were first described by McCafferty and Sergeev 13,14 as the residues with the highest foldabilities. In our work, foldability is calculated from an aligned ensemble of residues. At each alignment position in the aligned FASTA file (AFASTA), a p-value is calculated using analysis of variance (ANOVA). This p-value quantifies the variability in the unfolding propensities of all residues averaged. Here, critical positions are described as the AFASTA alignment positions with the highest foldability values (>10) and p-values below 0.05. High foldability values identify positions that result in severe protein destabilization, while the accompanying p-value describes the variability at that position. AFASTA alignment positions that invariably result in high destabilization (high foldability, small p-value) are determined to be critical across all domains, creating a critical-residue framework that provides stability across all domains.
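The selection rule for critical positions (foldability above 10 and p-value below 0.05) can be written as a short filter. The sketch below assumes that the per-position foldabilities and ANOVA p-values have already been computed; the function name and the example numbers are hypothetical.

```python
def critical_positions(foldabilities, p_values, min_foldability=10, alpha=0.05):
    """Return the 0-based alignment positions that satisfy both criteria:
    foldability > min_foldability and p-value < alpha."""
    if len(foldabilities) != len(p_values):
        raise ValueError("inputs must have the same length")
    return [
        index
        for index, (fold, p) in enumerate(zip(foldabilities, p_values))
        if fold > min_foldability and p < alpha
    ]


# Hypothetical values for four alignment positions.
fold = [14.2, 3.0, 11.5, 18.0]
pval = [0.01, 0.001, 0.20, 0.04]
print(critical_positions(fold, pval))  # [0, 3]
```

In this example, position 2 has high foldability but fails the p-value criterion, and position 1 passes the p-value criterion but has low foldability, so neither is reported.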

Multidomain UMS algorithm.
A Python/Bash script was developed to automate the processing of protein domain data. The code is available on request, and the method design is shown in Fig. 6. Given a UniProt accession ID, the multidomain UMS algorithm can retrieve protein domain information, model the domains, and subject them to the UMS (1 and 2 in Fig. 2). Homologous domains are then structurally aligned, and using the structural alignment produced by UCSF Chimera, the unfolding matrices produced by the UMS are averaged to generate a unique unfolding matrix for each domain type. P-values of the averaged unfolding propensities at each alignment position were calculated using ANOVA. The averaged unfolding matrix of each domain type is used to calculate the foldability of the residues comprising the domains. The foldability is written into an attribute file for each domain, which is used to color all domain structures using UCSF Chimera. Critical residues are identified using the criteria described in the Methods.
Domain superposition. The structures of the homologous domains of each protein were superimposed using UCSF Chimera 30 , and the sequence alignment was output as an AFASTA file based on the structural superposition. To increase the accuracy of critical-residue identification, the unfolding propensities of the aligned residues of each domain were averaged for each domain set. At each AFASTA alignment position, a p-value was calculated from the distinct unfolding propensities of each residue averaged using ANOVA.
Homologous protein domains were aligned using PROMALS3D 31 , a server that constructs alignments for multiple protein sequences. The alignment of each domain set provides an alignment consensus that identifies the aligned residues identical within each domain (identically conserved residues), as well as positions in the domain sequence where similar residues are conserved (similarly conserved residues). The averaging of domain structures removes the noise introduced by each structure and illuminates residues with consistent unfolding effects across domain structures.
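The averaging of unfolding matrices along an alignment can be sketched as follows. This is an illustration under stated assumptions, not the script used in this work: each domain is given as a gapped aligned sequence plus a matrix with one row of unfolding propensities per residue, gaps are skipped, and each alignment column is averaged over the domains that have a residue there. The function name and the toy data (two columns per row instead of a full set of substitutions) are hypothetical.

```python
def average_by_alignment(aligned_seqs, matrices, gap="-"):
    """Average per-residue unfolding rows column by column.

    aligned_seqs: equal-length aligned sequences with gap characters.
    matrices: for each sequence, a list of rows (one per ungapped residue).
    Returns one averaged row per alignment column, or None for a column
    that contains only gaps.
    """
    n_cols = len(aligned_seqs[0])
    sums = [None] * n_cols
    counts = [0] * n_cols
    for seq, matrix in zip(aligned_seqs, matrices):
        residue_index = 0  # index into this domain's ungapped rows
        for col, symbol in enumerate(seq):
            if symbol == gap:
                continue
            row = matrix[residue_index]
            residue_index += 1
            if sums[col] is None:
                sums[col] = list(row)
            else:
                sums[col] = [a + b for a, b in zip(sums[col], row)]
            counts[col] += 1
    return [
        [value / counts[col] for value in sums[col]] if counts[col] else None
        for col in range(n_cols)
    ]


# Toy example: two aligned domains, two propensity columns per residue.
aligned = ["AC-D", "A-GD"]
matrices = [
    [[0.2, 0.4], [0.6, 0.8], [1.0, 1.0]],  # residues A, C, D
    [[0.4, 0.6], [0.5, 0.5], [0.8, 0.6]],  # residues A, G, D
]
for column in average_by_alignment(aligned, matrices):
    print([round(v, 2) for v in column])
```

Columns 0 and 3 are averaged over both domains, while columns 1 and 2 each carry the values of the single domain that has a residue at that position.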

Data Availability
Ocular proteome protein domain structures and UMS data from this work are available at the NEI Commons website (https://neicommons.nei.nih.gov/#/proteome).