Design principles of gene evolution for niche adaptation through changes in protein–protein interaction networks

Carmi, Gon; Tagore, Somnath; Gorohovski, Alessandro; Sivan, Aviad; Raviv-Shay, Dorith; Frenkel-Morgenstern, Milana

doi:10.1038/s41598-020-71976-x

Published: 24 September 2020

Gon Carmi, Somnath Tagore, Alessandro Gorohovski (ORCID: 0000-0002-4126-9781), Aviad Sivan, Dorith Raviv-Shay & Milana Frenkel-Morgenstern (ORCID: 0000-0002-0329-4599)

Scientific Reports volume 10, Article number: 15628 (2020)

Abstract

In contrast to fossorial and aboveground organisms, subterranean species have adapted to the extreme stresses of living underground. We analyzed the predicted protein–protein interactions (PPIs) of all gene products, including those of stress-response genes, among nine subterranean, ten fossorial, and 13 aboveground species. We considered 10,314 unique orthologous protein families and constructed 5,879,879 PPIs in all organisms using ChiPPI. We found a strong association between PPI network modulation and adaptation to specific habitats, noting that mutations in genes and changes in protein sequences were not linked directly with niche adaptation in the organisms sampled. Thus, orthologous hypoxia, heat-shock, and circadian clock proteins were found to cluster according to habitat, based on PPIs rather than on sequence similarities. Curiously, "ordered" domains were preserved in aboveground species, while "disordered" domains were conserved in subterranean organisms, a finding confirmed for proteins in the DisProt database. Furthermore, proteins with disordered regions were found to adopt significantly less optimal codon usage in subterranean species than in fossorial and aboveground species. These findings reveal design principles of protein networks by means of alterations in protein domains, thus providing insight into deep mechanisms of evolutionary adaptation, generally, and particularly of species to underground living and other confined habitats.

Introduction

Subterranean animals represent an excellent model for studying the evolution of adaptation to life underground and its stresses, generally associated with life in confined environments, such as drywood, dampwood, and caves. These animals spend their entire lives below ground. As such, they experience relatively stable temperature and humidity, yet face multiple stresses, such as darkness, hypoxia, hypercapnia (high levels of carbon dioxide), and multiple pathogens^1,2,3,4. Fossorial animals inhabit both underground and aboveground habitats, with varying amounts of time spent in each^5,6,7. Thus, comparing subterranean animals with fossorial and aboveground animals offers a prime opportunity for studying evolution in the face of environmental stresses^8,9. Although underground-dwelling organisms have been extensively studied^1,2,10,11,12, the evolution of their cellular networks and protein–protein interactions (PPIs), particularly those involving stress response genes, remains elusive. While extreme changes in habitat may affect protein sequence, structure, and function, the impact of such changes on corresponding cellular networks has not been studied in detail. According to the domain-oriented view, proteins are built from a set of domains corresponding to conserved regions with distinct functional and structural characteristics^13,14,15. As might be expected, rearranged domain combinations (via exon shuffling or mixing) may result in the emergence of new PPI networks (as occurred during metazoan evolution). The evolutionary pressure of niche adaptation is assumed to act upon random changes in gene expression. Here, we considered an alternative view whereby functional properties of proteins within defined PPI networks can be directly selected by such evolutionary pressure.

Our previously developed ChiPPI¹⁵ predictive tool is based on the integration of experimentally verified PPI data from BioGrid (release 3.4.163)¹⁶ and the protein domain content of the interacting proteins. ChiPPI utilizes protein domains of interacting proteins to predict interactions of orthologous proteins. In the current study, we used ChiPPI to identify changes in PPI networks that occur upon switching protein domains within otherwise conserved orthologous proteins. Accordingly, we found that every change in protein sequence or domain content in an orthologous protein throughout evolution resulted in an advantageous addition to or disruption of a PPI network. The ChiPPI tool is designed to interpret and represent such changes as alterations in PPI networks. We thus predicted all PPI networks of 32 species living in three broad ecological niches, namely, subterranean (terrestrial and aquatic caves, woods and underground), fossorial, and aboveground (terrestrial and aquatic niches, such as rivers) habitats.

Our efforts revealed that, as species adapt to a new niche, the functional expression of genetic change is mostly associated with changes in PPI networks rather than with changes in protein sequences. Since niche adaptation likely requires changes in cellular functions that regulate heat, oxygen, carbon dioxide levels, and light, we studied the PPI networks of the relevant stress response proteins using ChiPPI. Our findings imply that organisms adapt to their environment largely by species-specific alterations in PPI networks, and by "shuffling" (or "mixing") protein domains, rather than by point sequence mutations. Orthologous hypoxia, heat-shock, and circadian clock proteins were found to cluster according to their corresponding broad ecological niches (i.e., subterranean, fossorial or aboveground), based on PPI conservation, rather than by protein sequence conservation. Interestingly, we found that over the course of evolution, "ordered" domains (domains with defined two-dimensional (2D) or 3D structure) were preserved in aboveground species, while "disordered" domains were conserved in subterranean organisms. Moreover, we found that genes encoding proteins with disordered regions adopted non-optimal codon usage. Accordingly, such proteins form at least 35% fewer PPIs than do abundant proteins with ordered and mixed regions. Furthermore, proteins of subterranean animals have significantly lower codon usage preference scores, by at least 14%, than do those of animals from the other habitats. Thus, we demonstrated that the evolution-driven "ordered" domains of aboveground species adapted to include more connected networks than did domains in the homologous proteins of subterranean species. These findings highlight a complicated adaptation process based on protein networks rather than on point mutations, which are the mechanism described most frequently in evolutionary studies.

Results

Data collection

We hypothesized that the evolution of underground species affected protein networks in a unique manner in which various types of protein domains served as building blocks of protein evolution. To study the evolution of protein networks, we collected genomic, proteomic, and protein domain classification data, namely, fully sequenced genomes with coding sequences and annotated proteomes, together with protein ortholog assignments, from 32 species living in three broad ecological niches, namely subterranean, fossorial, and aboveground (Table 1, and listed in Materials and Methods). We first sought overall statistics regarding the number of proteins and the number of corresponding orthologous protein families. Overall PPI statistics were calculated, including those predicting PPIs in organisms for which experimentally verified PPI data are missing. We used the KEGG orthologs (KO) group of orthologous proteins in KEGG (Kyoto Encyclopaedia of Genes and Genomes)¹⁷ to reproduce gain and loss of protein domains in orthologous proteins. We collected 1,350,898 proteins from the studied organisms that belong to 624,787 KO groups (10,314 are unique ortholog groups). The matching number of interactors and networks for every organism were exhaustively calculated for all these proteins (Fig. 1). We found that 361,615 of the 1,350,898 proteins are distributed among 5,879,879 (predicted and real) PPIs. The mean number of interactors per protein within each habitat, namely, aboveground (A), fossorial (F), and subterranean (S) were 32.07, 32.48, and 32.67, respectively (see details in the supplementary results and in Tables S1–S3). This shows that the number of interactors per protein is similar for organisms from different ecologies.

Table 1 All organisms included in the PASTORAL database, with a complete number of proteins in the corresponding proteome.

Additional analysis showed that PPI features of orthologous proteins (516 KOs) common to all organisms were similar across ecologies. These features included the number of interactors, the number of PPIs, and global/individual clustering coefficients (supplementary results, Figures S1, S2, Table S4). We therefore studied PPI properties of genes encoding products related to stresses that differ across the ecologies considered, such as hypoxia. Our findings support our hypothesis that the design principles of the evolution of underground species involve various types of protein domains serving as building blocks of protein evolution.
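
The network features named above can be illustrated with a minimal sketch. It assumes a PPI network given as an undirected list of protein pairs and uses the average of local clustering coefficients as one common definition of a network-level coefficient; the function names are hypothetical and this is not the code used to produce the reported statistics.

```python
from itertools import combinations


def build_adjacency(edges):
    """Build an undirected adjacency map from (protein_a, protein_b) pairs."""
    adj = {}
    for a, b in edges:
        if a == b:
            continue  # ignore self-interactions in this sketch
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    return adj


def local_clustering(adj, node):
    """Fraction of neighbour pairs of `node` that interact with each other."""
    neighbours = adj.get(node, set())
    k = len(neighbours)
    if k < 2:
        return 0.0
    links = sum(1 for u, v in combinations(neighbours, 2) if v in adj[u])
    return 2.0 * links / (k * (k - 1))


def mean_clustering(adj):
    """Average local clustering coefficient over all proteins."""
    if not adj:
        return 0.0
    return sum(local_clustering(adj, n) for n in adj) / len(adj)


def mean_interactors(adj):
    """Mean number of interactors (degree) per protein."""
    if not adj:
        return 0.0
    return sum(len(nbrs) for nbrs in adj.values()) / len(adj)


# Toy network: a triangle A-B-C plus one pendant protein D attached to C.
toy = build_adjacency([("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")])
```

On the toy network, proteins A and B each have a local coefficient of 1, C has 1/3, and D has 0, so per-protein values can differ widely even when the mean number of interactors is similar.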

Analysis of the PPIs of stress-response proteins clusters organisms according to habitat

To examine how organisms might have adapted to the various stresses in each habitat, we analyzed mutations and changes in the PPIs encoded by stress response genes. Heat-shock, hypoxia, and circadian stresses differ considerably between aboveground and underground environments, and are likely to drive evolutionary selection of proteins that provide optimal function in each niche^1,9. We assumed that organisms subject to a shared ecological experience would face similar environmental stresses. PPI networks of stress-related proteins would thus be expected to differ substantially according to ecology.

To test our hypothesis, we performed clustering analysis of all the organisms included in our study, based on mutations and PPI network features, and compared the results for each classification. Such analysis included all orthologous stress-response, hypoxia, heat-shock, and circadian stress proteins (Table 1). In total, 85,173 PPIs related to stress-response proteins were found to be distributed among 1,103 proteins. These comprised 730 heat-shock proteins in 71,940 PPIs, 254 hypoxia-related proteins in 10,256 PPIs, and 119 circadian proteins in 2,977 PPIs (Table 1, Tables S1–S7). All orthologous stress-response genes (KO groups) were obtained by querying the KEGG database with the terms "heat-shock", "hypoxia", and "circadian". The results are listed in Table 2, while the corresponding lists of proteins are found in Tables S5, S6 and S7, respectively.

Table 2 KEGG Orthologs: Heat-shock (upper), hypoxia-related (middle) and circadian (bottom) proteins.

Next, we performed clustering analysis based on sequence mutations and PPI features for the full set of heat-shock, hypoxia, and circadian stress proteins (Table 2). Remarkably, proteins related to hypoxia, heat-shock, and circadian stresses in the 32 organisms studied did not all cluster according to shared ecology based on sequence mutations (Fig. 2A) but did so significantly on the basis of the "PPI network clustering coefficient" (Fig. 2B–D; p value (AU) < 0.02, p value = 0.0018, and p value = 0.0013, respectively, Pearson's χ²-test). Moreover, the observed clustering of organisms according to ecological niches reflects adaptation towards a specific stress, rather than to the particular identity of the environment, such as a cave or soil. Interestingly, we observed that the bat clustered with other subterranean organisms based on hypoxia-related proteins. As hypoxia has been associated with spill-over, i.e., transmission of virulent viruses to other species¹⁸, other subterranean organisms may also have innate protection from virulent viruses. Moreover, the little brown bat (Myotis lucifugus) is associated with the emergence of SARS-CoV-2, responsible for the current COVID-19 pandemic¹⁹. Additional contributors to the spill-over of virulent viruses from bats include arousal from hibernation and the fact that hundreds of these bats hibernate in caves^18,20. Taken together, these results showed better assignment of organisms to broad ecological niches based on their cellular PPI networks than on sequence mutations, and support the hypothesis that organisms adapt to their specific ecologies by modulating PPI networks rather than by mutation of protein sequences.

Additional analysis of PPI networks involving hypoxia-related proteins (e.g. HIF2A) revealed that the distribution of central proteins within the PPI network discriminates between PPIs of different ecologies. For example, DMAD3, XPO1 and EWSR1 were unique to subterranean animals (supplementary results, Figure S3). This finding indicates that adaptation to ecology via PPI modulation could rely on "shuffling" of protein domains, resulting in global changes in PPI networks in an ecology-specific manner.

Genes encoding common orthologous proteins of subterranean animals adopted non-optimal codons

Due to redundancy of the genetic code, amino acids are encoded by multiple synonymous codons. Moreover, the use of synonymous codons is non-uniform, such that there is a strong preference for certain codons in highly expressed genes^21,22,23. According to the strength of affinity of codon-anticodon interactions, codons with high and low affinities are referred to as optimal and non-optimal, respectively^24,25. We previously showed that subterranean animals adopted non-optimal codon usage as part of their adaptation to their stressful environment²⁶. We now hypothesized that orthologous proteins of subterranean animals adopted different codon usage preferences than those of fossorial and aboveground species.

To examine differences in codon usage preferences, we considered 516 orthologous proteins from KO groups common to the 32 organisms of study by developing a tendency score to estimate codon usage preferences (codon usage preference score (CUPS), defined by Eq. (3)) from a codon usage table (CUT). Accordingly, we classified codons as optimal and non-optimal²⁵. For the 516 common KO proteins, we computed the probability of subterranean animals adopting non-optimal and optimal codon usage, as calculated from the area under the density distribution curve, relative to aboveground animals (Fig. 3). Using the bootstrapping procedure described below, we found that subterranean and fossorial animals adopted 75.0% (p value = 0.0019) and 58.8% (p value = 0.076) more non-optimal codon usage, respectively, compared with aboveground animals (Fig. 3). Briefly, 10,000 random groups of 516 KOs were generated (as bootstrap replicates) and codon usage was calculated. p values were defined as the frequency of bootstrap replicates with calculated values equaling or exceeding observed values (see Materials and Methods). We found that subterranean animals adopted, on average, 50.05% less optimal codon usage (CUPS: subterranean (S) = 11.20, aboveground (A) = 22.42; A vs. S, p value < 2.2 × 10^–16, Wilcoxon rank sum test with continuity correction; Table S4).

Proteins with disordered regions are encoded by genes that adopted non-optimal codon usage and form fewer PPIs

Traditionally, proteins have been thought to realize their function based on their 3-dimensional structure. However, in recent years, protein segments (> 30 residues) lacking stable secondary and/or tertiary structure, referred to as intrinsically disordered regions (IDRs) or intrinsically disordered protein regions (IDPRs), have been shown to exhibit functional capabilities within core molecular processes^27,28,29,30,31,32. The tendency of a protein region to exhibit structure can be represented on a spectrum³³. At one extreme, proteins without IDRs are considered as structured, while at the other end, proteins without structure over the entire sequence are referred to as intrinsically disordered proteins (IDPs)^27,28,29,30,31,32. Differential inclusion of IDRs via alternative splicing was found to increase protein function capabilities. IDRs contain sequence motifs which mediate interactions, and can contain post-translational modification sites^34,35,36. Differential inclusion of IDRs was also found to modulate PPIs in a tissue-specific manner by including or excluding IDRs that interact directly with protein partners^34,35. IDR composition, length and position were, moreover, shown to affect protein half-life, in addition to expanding protein functional capabilities^37,38,39,40. Misregulation and mutations within IDRs affect molecular function^41,42,43. The presence of a high proportion of missense disease mutations within IDRs indicates the importance of IDRs to proper molecular function, as well as to the development of disease. Therefore, we expanded the analysis from the 516 KO groups common to all organisms addressed in this study to all KO groups, and considered intrinsically disordered regions in proteins, defined as continuous stretches longer than seven residues with an IUPred score ≥ 0.5⁴⁴ that do not overlap with Pfam domains.
We thus hypothesized that disordered segments would affect ecological adaptation, and examined this by systematic analysis of multiple data sets that describe the sequences of various ordered and disordered domains, as well as proteins with both ordered and disordered regions, roughly corresponding to structured proteins, IDPs and IDRs, respectively.
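
The working definition of a disordered region given above can be sketched as follows. The sketch assumes per-residue disorder scores are already available as a list (for example, parsed from IUPred2A output) and that Pfam domains are given as 0-based inclusive coordinate pairs. It interprets "do not overlap" as discarding any stretch that overlaps a domain, which is an assumption; the function name is hypothetical.

```python
def disordered_regions(scores, domains, threshold=0.5, min_len=8):
    """Return (start, end) pairs (0-based, inclusive) of disordered stretches.

    A stretch is a run of consecutive residues with score >= threshold.
    It is kept only if it is longer than seven residues (min_len=8) and
    does not overlap any (start, end) interval in `domains`.
    """
    regions = []
    start = None
    # A sentinel below any threshold closes a run that reaches the C-terminus.
    for i, score in enumerate(list(scores) + [float("-inf")]):
        if score >= threshold:
            if start is None:
                start = i
            continue
        if start is not None:
            end = i - 1
            long_enough = end - start + 1 >= min_len
            overlaps = any(start <= d_end and d_start <= end
                           for d_start, d_end in domains)
            if long_enough and not overlaps:
                regions.append((start, end))
            start = None
    return regions


# Hypothetical scores: a 10-residue run, a 7-residue run, and a 9-residue run.
example_scores = [0.9] * 10 + [0.1] * 5 + [0.6] * 7 + [0.2] * 3 + [0.8] * 9
example_domains = [(30, 40)]  # hypothetical Pfam domain coordinates
```

With these inputs, the 7-residue run is too short and the final run overlaps the hypothetical domain, so only the first stretch is retained.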

Once again, we calculated the total number of PPIs and CUPS and generated scatter plots (Fig. 4). These plots were generated from orthologous proteins, with the total number of PPIs differing significantly, at least by 1.2-fold, between ecologies. We found that proteins with disordered regions generally form fewer PPIs and are encoded by genes showing a stronger preference for non-optimal codon usage (Fig. 4A), relative to their counterparts containing mixed (Fig. 4B) and ordered (Fig. 4C) regions. On average, proteins with disordered regions formed 35.2%, 36.92%, and 35.6% fewer PPIs than did proteins with ordered regions within aboveground, fossorial, and subterranean ecologies, respectively (p value < 2.2e−16, Wilcoxon rank sum test with continuity correction; Table S2). Moreover, proteins with disordered regions adopt, on average, 11.2%, 12.8%, and 7.6% less optimal codon usage (CUPS) than do proteins with mixed regions from aboveground, fossorial, and subterranean ecologies, respectively (p value < 0.024, Wilcoxon rank sum test with continuity correction; Table S8). These results indicate that proteins with disordered regions form fewer PPIs and are encoded by genes that adopted less optimal codon usage than do counterpart proteins with ordered and mixed regions.

Collectively, our findings are consistent with and extend observations made with the fungus Neurospora, namely that non-optimal codons are used more often in intrinsically disordered regions, while optimal codons are preferentially used in structured (ordered) domains⁴⁵. Moreover, experimentally optimizing codon usage of the circadian clock gene was found to impair gene function⁴⁵, thus demonstrating the functional role of IDRs in protein function, in general^32,46, and the functional role of non-optimal codons, in particular. The results were similar when proteins with disordered regions were compared across ecologies (supplementary results, Tables S2, S8 and S9).

We observed a proportionally higher mean number of interactors among aboveground than among subterranean animals (93.1% (ordered), 97.7% (mixed), and 147.8% (disordered), p value = 1.15e−11, Pearson's χ²-test; Table S8). This result indicates higher connectivity in the PPI networks of aboveground animals. Additionally, PPIs in subterranean, fossorial, and aboveground species displayed significant enrichment, compared with the 17,266 instances of loss of protein domains in 10,000 random PPI networks (125,956; 172,613; and 212,941, respectively, compared to 81,622 PPIs; V = 52, p value = 0.009766; V = 40, p value = 0.03906; and V = 91, p value = 0.0002441, respectively; Wilcoxon signed rank test with continuity correction).

These observed interactions involved 9,429; 10,676; and 13,077 proteins, on average, in subterranean, fossorial, and aboveground species, respectively (V = 36, p value = 0.4316; V = 21, p value = 0.9102; and V = 89, p value = 0.0007324, respectively; Wilcoxon signed rank test with continuity correction). These values are thus significantly higher than the average of 10,000 random PPI networks only for aboveground species. This is possibly due to the low number of proteins in PPIs belonging to the fossorial and subterranean insects considered. Indeed, the average numbers of interactors per protein as a function of habitat (i.e., 32.67 (S), 32.48 (F) and 32.07 (A)) were significantly higher compared with the random value (16.5) (V = 55, p value = 0.001953; V = 45, p value = 0.003906; and V = 91, p value = 0.0002441, respectively; Wilcoxon signed rank test with continuity correction). To confirm our results regarding codon usage preferences and PPIs, we collected such information for 61 proteins from the DisProt^47,48,49 database with over 98% disorder content from Rattus norvegicus, Mus musculus, Homo sapiens, Drosophila melanogaster, Danio rerio, Sus scrofa and Bos taurus (Table S10).

Orthologous proteins were found in aboveground, fossorial and subterranean animals, and CUPS and PPI analyses were performed. We found that subterranean animals adopt extreme non-optimal codon usage preferences and form fewer PPIs, on average, relative to aboveground and fossorial ecologies (CUPS (PPIs): −17.37 (43.57), −14.71 (45.74) and −14.76 (44.86), respectively; p value < 0.041, Wilcoxon rank sum test with continuity correction; Tables S11, S12, S13, respectively). These patterns are apparent in a scatter plot showing density distributions (Fig. S4). The results replicated the observations obtained from our classification of proteins as ordered, disordered or mixed. Moreover, as this analysis was performed without consideration of our ecology-based classification, our results are independent of our domain classification method. Furthermore, the results confirm that our classification method captures many aspects of the disordered nature of proteins, at least in relation to their adaptation to a subterranean environment.

The user-friendly interface of the PASTORAL server

Finally, we organized all our data in a dedicated resource, PASTORAL (Protein–Protein Interactions of Stress-Response Genes in Subterranean and Fossorial Animals). The PASTORAL database interface is user-friendly and accepts the following parameters for a selected animal as an input query: gene symbols, NCBI Entrez identifiers (NCBI_ID), protein IDs, chromosomes, and gene descriptions. When a search query is matched, the user is directed to the entry webpage. From this page, all PPI data can be obtained (particularly for the heat-shock and hypoxia-related proteins) using annotations and the corresponding KEGG orthologs¹⁷ (see Fig. 5). Querying PASTORAL for at most two protein names (interactors), or their NCBI_IDs, returns interactions for bi-level PPIs. Querying for three or more identifiers (maximum 380) returns interactions between these entities (single-level PPIs). The interactors can also be downloaded as a file in tab-delimited format. PASTORAL, built on MySQL, enables users to study proteins and their interactions in an intuitive workflow, as displayed in Fig. 5 and Figures S5–S7. Here, PASTORAL was used in an analysis involving NCBI_IDs for input proteins from 23 organisms listed in Table S1.

Discussion and conclusions

The blind mole rat Spalax galili is an outstanding model for studying adaptation to life underground, with a remarkable resistance to disease, including cancer^1,50. A number of studies have shown that reduced mutations and chromosomal alterations are probably linked to hypoxia and hypercapnia, and that both may have significant roles in enhancing resistance to cancer in the blind mole rat¹. With this in mind, the current study explored gene niche adaptation that results in the rewiring of PPI networks. We utilized fully sequenced genomes and proteomes of diverse taxa that inhabit similar ecological niches, namely, aboveground and underground habitats. These surroundings differ markedly in terms of environmental stresses. Accordingly, diverse organisms experience identical stresses imposed by virtue of their inhabiting a particular environment. We examined whether organisms sharing an ecological niche exhibit common attributes, distinct from those of organisms from a different ecological niche, specifically comparing subterranean and aboveground species.

We comprehensively assessed PPI network features between aboveground, fossorial, and subterranean organisms, considering various groups of orthologous proteins (KOs), and their domain content, namely, proteins with ordered, disordered, and mixed (ordered and disordered) regions. We evaluated PPI features, such as the total and average numbers of PPIs, as well as codon usage preferences of the encoding genes. We found that proteins with disordered regions generally form fewer PPIs and are encoded by genes that adopt more non-optimal codon usage (i.e., more negative CUPS) than do counterpart proteins with ordered and mixed regions. Both PPIs and non-optimal codon usage were observed as more prevalent in subterranean than in fossorial and aboveground species. Taken together, these observations indicate that the sharing of an ecological niche by distantly related organisms is manifested in PPI networks and in the DNA and amino acid sequences of the interacting proteins. This is presumably a consequence of these organisms experiencing shared ecological stresses.

The above observations led us to hypothesize that substantial differences in the severity of stresses between aboveground and underground habitats account for the great variance observed between organisms living in these habitats. This was reflected in differences between PPI networks and in the properties of interacting proteins. We presented evidence from PPIs of stress-related, hypoxia-related, heat-shock, and circadian proteins. All the organisms investigated demonstrated complete clustering according to PPI features, such that these clusters reflect the ecological niche-based classification of the organism considered. We confirmed that the distribution of hubs (key proteins) in ecology-specific PPI sub-networks accounts for such clustering by ecology. Accordingly, the key proteins (hubs) and essential interactions in the PPI networks of PAS (Per-Arnt-Sim⁵¹) domain-containing proteins are central players in environmental stress response pathways, such as hypoxia, heat-shock, and circadian and dioxin response pathways. This demonstrated the applicability of PPI network analysis to understanding biological phenomena. Together, our results allude to the intimate relation between ecology and evolution, in general, and convergent evolution, in particular, due to the shared stress experienced by species confined to the same ecology. Finally, we organized all PPIs and codon usage data in a dedicated user-friendly resource, PASTORAL, which provides evolutionary biologists with an extensive and comprehensive tool to study convergent evolution related to stress responses and other essential cellular processes.

Materials and methods

Data resources

Five core resources were used: Entrez/NCBI⁵², KEGG¹⁷, BioGrid (release 3.4.163)¹⁶, Pfam (release 31.0)⁵³, and the Gene Ontology^54,55 (GO) consortium. Complete genomes, proteomes, and coding sequences were obtained from NCBI. Ortholog annotations were obtained from KEGG, whereas annotations for organisms not included in the KEGG database, coded by an upper-case three-letter code, were obtained using the BLASTKOALA web-tool^17,56. Annotations from KEGG included ortholog (KO) groups (https://rest.kegg.jp/list/ko) and GO annotations (linkDB within KEGG). Annotations for Pfam domains were retrieved from Pfam (release 31.0)⁵³. We collected data for the following 32 organisms: ten fossorial species, namely Cricetulus griseus (Chinese hamster, cge), Condylura cristata (star-nosed mole, COC)⁵⁷, Dasypus novemcinctus (nine-banded armadillo, DAN), Microtus ochrogaster (prairie vole, MIO), Octodon degus (common degu, OCD), Peromyscus maniculatus bairdii (deer mouse, PEM), Dipodomys ordii (Ord's kangaroo rat, DIO), Camponotus floridanus (Florida carpenter ant, cfo), Manis javanica (Malayan pangolin, mjv), and Solenopsis invicta (red fire ant, soc); nine subterranean species, namely Chrysochloris asiatica (Cape golden mole, CHA), Fukomys damarensis (Damara mole rat, FUD), Heterocephalus glaber (naked mole-rat, hgl), Nannospalax galili (blind mole rat, ngi), Astyanax mexicanus (Mexican tetra, ASM), Cryptotermes secundus (drywood termite, CRS), Zootermopsis nevadensis (dampwood termite, zne), and Myotis lucifugus (little brown bat, MYL); and 13 organisms that live aboveground, namely Erinaceus europaeus (European hedgehog, ERE), Ornithorhynchus anatinus (platypus, oaa), Orycteropus afer (aardvark, ORA), Homo sapiens (human, hsa), Mus musculus (mouse, mmu), Rattus norvegicus (rat, rno), Pan troglodytes (chimpanzee, ptr), Gallus gallus (chicken, gga), Felis catus (cat, fca), Drosophila melanogaster (fruitfly, dme), Bos taurus (cow, bta), Sus scrofa (swine, ssc), and Danio rerio (zebrafish, dre).

Conservation and point mutations in protein domain sequences

Protein domains, delineated by coordinates identified by Perl scripts written in the lab, along with the Pfam search tool⁵³, were extracted using the extractseq program (EMBOSS:6.6.0.0) with the ‘regions’ option. The highest scoring domain was reserved for multiple sequence alignment analysis, thus ensuring a single sequence per organism. Only domains conserved among 10 or more animals were analyzed. Multiple sequence alignment was performed using T-Coffee⁵⁸ with default parameters. Statistical analysis was performed using R, and hierarchical clustering was performed and assessed using pvclust⁵⁹.

PPIs of stress response genes

We identified PPIs encoded by all stress-response genes using the ChiPPI tool, which we previously described¹⁵. ChiPPI assumes that PPIs can be approximated by calculating the propensity of discrete domains by means of a pre-computed domain-domain co-occurrence table (DDCOT) from all interactions in BioGrid¹⁶. A new PPI network was generated based on the DDCOT for each organism examined in this study. Thus, we identified domain-domain co-occurrences for each PPI to detect potential interacting proteins that reflect the overall structure of the PPI network¹⁵. Additionally, we used the agglomerative hierarchical clustering method to classify the stress-response genes for all 32 organisms in PASTORAL.
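
The idea of a domain-domain co-occurrence table can be illustrated with a toy sketch. This is not the ChiPPI implementation or its scoring scheme: the counting rule, the additive score, and all names below are assumptions chosen only to show how domain content of known interactors could inform a candidate pair.

```python
from collections import Counter
from itertools import product


def domain_pairs(domains_a, domains_b):
    """Unordered domain pairs formed between two proteins' domain lists."""
    return {tuple(sorted(pair)) for pair in product(domains_a, domains_b)}


def build_ddcot(interactions, domains_of):
    """Count how often each domain pair co-occurs across known interactions.

    interactions: iterable of (protein_a, protein_b) pairs.
    domains_of:   dict mapping a protein to its list of domain names.
    """
    table = Counter()
    for a, b in interactions:
        table.update(domain_pairs(domains_of.get(a, ()), domains_of.get(b, ())))
    return table


def cooccurrence_score(domains_a, domains_b, table):
    """Toy score: sum of table counts over the candidate pair's domain pairs."""
    return sum(table[pair] for pair in domain_pairs(domains_a, domains_b))


# Hypothetical proteins and domain annotations, for illustration only.
domains_of = {"P1": ["PAS", "bHLH"], "P2": ["PAS"], "P3": ["Kinase"]}
known = [("P1", "P2"), ("P2", "P3")]
ddcot = build_ddcot(known, domains_of)
```

Under this toy rule, an ortholog that has lost a domain simply contributes fewer domain pairs, so its score against former partners drops, which mirrors the qualitative behavior described in the text.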

Domain prediction method

The Pfam database represents protein domains as profile hidden Markov models (HMMs)⁵³. Accordingly, protein sequences were searched by HMMER (version 3.2.1, 13 June 2018)⁶⁰ with Pfam-provided HMM profiles, to predict protein domains. In addition, disordered regions were predicted based on the redox state using IUPred2A⁴⁴. Predicted disordered regions are treated as a single generic type (DISORDERED) for the generation of PPI networks. Hence, disordered regions are incorporated within a PPI model as an additional DISORDERED domain, a component of domain-domain co-occurrence scores from which PPIs are predicted.

Codon usage preference score (CUPS)

Previously, we found that genes adopt non-optimal codon usage to modulate protein expression in a cell-cycle-dependent manner²⁵. The distinction between optimal and non-optimal codons refers to the strength of the codon-anticodon interaction, where optimal and non-optimal codons have high and low affinity, respectively, due to the “wobble” effect^24,25. To quantify codon usage preferences, we devised optimal and non-optimal codon usage scores for the respective sets of optimal and non-optimal codons. Moreover, we defined the difference between the optimal and non-optimal scores as a tendency score. A positive tendency score represents a preference for optimal codons, while a negative score represents a preference for non-optimal codons. Both optimal and non-optimal codon usage scores are computed from codon frequencies obtained from a Codon Usage Table (CUT), calculated using the cusp program (EMBOSS:6.6.0.0) for a set of coding sequences (CDSs).

$$Optimal\left(S\right)=100\cdot {\left[1+\frac{\left|{Cod}_{Opt}\right|\cdot {\sum }_{j\in {Cod}_{nonOpt}}{\nu \left(S\right)}_{j}}{\left|{Cod}_{nonOpt}\right|\cdot {\sum }_{i\in {Cod}_{Opt}}{\nu \left(S\right)}_{i}}\right]}^{-1}$$

(1)

$$nonOptimal\left(S\right)=100\cdot {\left[1+\frac{\left|{Cod}_{nonOpt}\right|\cdot {\sum }_{i\in {Cod}_{Opt}}{\nu \left(S\right)}_{i}}{\left|{Cod}_{Opt}\right|\cdot {\sum }_{j\in {Cod}_{nonOpt}}{\nu \left(S\right)}_{j}}\right]}^{-1}$$

(2)

Here ν is the frequency of codons obtained from the Codon Usage Table (CUT), Cod_Opt and Cod_nonOpt are sets of optimal and non-optimal codons²⁵, respectively, |Cod_Opt| and |Cod_nonOpt| are the numbers of elements of these sets and S is a set of CDSs.

$$tendency\left(S\right)=Optimal\left(S\right)-nonOptimal\left(S\right)$$

(3)

where tendency is referred to as a codon usage preference score (CUPS).
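Equations (1) to (3) can be written out directly, as in the sketch below. Dividing numerator and denominator by the set sizes shows that Optimal(S) equals 100 times the mean optimal-codon frequency divided by the sum of the mean optimal and mean non-optimal frequencies, so the two scores always sum to 100 and the tendency lies between -100 and 100. The function name, the dictionary input, and the toy codon sets in the usage example are hypothetical; the actual optimal and non-optimal codon sets are those of ref. 25.

```python
def cups(freq, optimal, non_optimal):
    """Compute (Optimal, nonOptimal, tendency) per Eqs. (1)-(3).

    freq: dict mapping codon -> frequency from a codon usage table.
    optimal, non_optimal: sets of codons (supplied by the caller).
    """
    sum_opt = sum(freq.get(c, 0.0) for c in optimal)
    sum_non = sum(freq.get(c, 0.0) for c in non_optimal)
    if sum_opt == 0 or sum_non == 0:
        # One of the two ratios in Eqs. (1)-(2) would be undefined.
        raise ValueError("both codon sets need a non-zero total frequency")
    # Ratio inside the brackets of Eq. (1); Eq. (2) uses its reciprocal.
    ratio = (len(optimal) * sum_non) / (len(non_optimal) * sum_opt)
    optimal_score = 100.0 / (1.0 + ratio)
    non_optimal_score = 100.0 / (1.0 + 1.0 / ratio)
    return optimal_score, non_optimal_score, optimal_score - non_optimal_score

# Toy example with one codon per set (illustrative only).
opt, non, tendency = cups({"AAA": 30.0, "AAG": 10.0}, {"AAA"}, {"AAG"})
```

In the toy example the optimal codon is three times as frequent as the non-optimal one, giving scores of 75 and 25 and a positive tendency of 50.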

To evaluate the significance of codon usage (CUPS) among subterranean and fossorial organisms, relative to aboveground species (observed values), bootstrap analysis was performed. Bootstrap analysis consisted of generating 10,00 random groups of 516 KOs (bootstrap replicates) and calculating p-values associated with observed values.

Codon usage was calculated as the summed areas under the CUPS density distribution, i.e., the areas over negative (non-optimal) and positive (optimal) CUPS values. These areas were scaled relative to those of aboveground animals. Bootstrap replicates were generated by assigning a random number to each KO, using the random_normal() function of the Math::Random Perl module, sorting the KOs by these numbers, and selecting the first 516 KOs from the sorted list.

P-values were calculated as the proportion of bootstrap replicates with codon usage equal to or exceeding the observed codon usage. P-values of 1% or less were considered significant.
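The resampling procedure described above can be sketched as follows. This is an illustration in Python rather than the Perl code used in the study: the function names and the `statistic` callback are hypothetical, and the number of replicates is left as a parameter.

```python
import random

def random_group(kos, size, rng):
    """Assign a normally distributed random key to each KO, sort by key,
    and keep the first `size` KOs (mirrors the procedure in the text)."""
    keyed = sorted((rng.gauss(0.0, 1.0), ko) for ko in kos)
    return [ko for _, ko in keyed[:size]]

def bootstrap_p_value(observed, kos, statistic, size, n_replicates, seed=0):
    """Proportion of random groups whose statistic equals or exceeds
    the observed value."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_replicates):
        if statistic(random_group(kos, size, rng)) >= observed:
            hits += 1
    return hits / n_replicates
```

Here `statistic` would map a group of KOs to its scaled codon usage area, and `size` would be 516. Because each replicate takes the first KOs from a randomly ordered list, the groups are drawn without replacement.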

References

Fang, X. et al. Genome-wide adaptive complexes to underground stresses in blind mole rats Spalax. Nat. Commun. 5, 3966. https://doi.org/10.1038/ncomms4966 (2014).
Nevo, E. Stress, adaptation, and speciation in the evolution of the blind mole rat, Spalax, in Israel. Mol. Phylogenet. Evol. 66, 515–525. https://doi.org/10.1016/j.ympev.2012.09.008 (2013).
Emerling, C. A. & Springer, M. S. Eyes underground: regression of visual protein networks in subterranean mammals. Mol. Phylogenet. Evol. 78, 260–270. https://doi.org/10.1016/j.ympev.2014.05.016 (2014).
Sun, H. et al. Evolution of circadian genes PER and CRY in subterranean rodents. Int. J. Biol. Macromol. 118, 1400–1405. https://doi.org/10.1016/j.ijbiomac.2018.06.133 (2018).
Maddin, H. C. & Sherratt, E. Influence of fossoriality on inner ear morphology: insights from caecilian amphibians. J. Anat. 225, 83–93. https://doi.org/10.1111/joa.12190 (2014).
Su, J. et al. Abundance and characteristics of microsatellite markers in Gansu zokor (Eospalax cansus), a fossorial rodent endemic to the Loess plateau, China. J. Genet. 93, e25-28 (2014).
Williams, C. T., Barnes, B. M. & Buck, C. L. Integrating physiology, behavior, and energetics: biologging in a free-living arctic hibernator. Comp. Biochem. Physiol. A Mol. Integr. Physiol. 202, 53–62. https://doi.org/10.1016/j.cbpa.2016.04.020 (2016).
Nevo, E., Filippucci, M. G. & Beiles, A. Genetic diversity and its ecological correlates in nature: comparisons between subterranean, fossorial, and aboveground small mammals. Prog. Clin. Biol. Res. 335, 347–366 (1990).
Tavares, W. C. & Seuánez, H. N. Changes in selection intensity on the mitogenome of subterranean and fossorial rodents respective to aboveground species. Mamm. Genome 29, 353–363. https://doi.org/10.1007/s00335-018-9748-5 (2018).
Malik, A. et al. Genome maintenance and bioenergetics of the long-lived hypoxia-tolerant and cancer-resistant blind mole rat, Spalax: a cross-species analysis of brain transcriptome. Sci. Rep. 6, 38624 (2016).
Gorbunova, V. et al. Cancer resistance in the blind mole rat is mediated by concerted necrotic cell death mechanism. Proc. Natl. Acad. Sci. 109, 19392–19396 (2012).
Schmidt, H. et al. Hypoxia tolerance, longevity and cancer-resistance in the mole rat Spalax: a liver transcriptomics approach. Sci. Rep. 7, 14348 (2017).
Marsh, J. A. & Teichmann, S. A. How do proteins gain new domains?. Genome Biol. 11, 126. https://doi.org/10.1186/gb-2010-11-7-126 (2010).
Frenkel-Morgenstern, M. & Valencia, A. Novel domain combinations in proteins encoded by chimeric transcripts. Bioinformatics 28, i67 (2012).
Frenkel-Morgenstern, M. et al. ChiPPI: a novel method for mapping chimeric protein–protein interactions uncovers selection principles of protein fusion events in cancer. Nucleic Acids Res. 45, 7094–7105 (2017).
Chatr-Aryamontri, A. et al. The BioGRID interaction database: 2017 update. Nucleic Acids Res. 45, D369–D379. https://doi.org/10.1093/nar/gkw1102 (2017).
Kanehisa, M., Sato, Y., Kawashima, M., Furumichi, M. & Tanabe, M. KEGG as a reference resource for gene and protein annotation. Nucleic Acids Res. 44, D457-462. https://doi.org/10.1093/nar/gkv1070 (2016).
Subudhi, S., Rapin, N. & Misra, V. Immune system modulation and viral persistence in bats: understanding viral spillover. Viruses https://doi.org/10.3390/v11020192 (2019).
Zhou, P. et al. A pneumonia outbreak associated with a new coronavirus of probable bat origin. Nature 579, 270–273. https://doi.org/10.1038/s41586-020-2012-7 (2020).
Keen, R. & Hitchcock, H. B. Survival and Longevity of the Little Brown Bat (Myotis lucifugus) in Southeastern Ontario. J. Mammal. 61, 1–7. https://doi.org/10.2307/1379951 (1980).
Sharp, P. M. & Li, W.-H. An evolutionary perspective on synonymous codon usage in unicellular organisms. J. Mol. Evol. 24, 28–38 (1986).
Lavner, Y. & Kotlar, D. Codon bias as a factor in regulating expression via translation rate in the human genome. Gene 345, 127–138. https://doi.org/10.1016/j.gene.2004.11.035 (2005).
Goodenbour, J. M. & Pan, T. Diversity of tRNA genes in eukaryotes. Nucleic Acids Res. 34, 6137–6146. https://doi.org/10.1093/nar/gkl725 (2006).
Crick, F. H. C. Codon—anticodon pairing: the wobble hypothesis. J. Mol. Biol. 19, 548–555. https://doi.org/10.1016/S0022-2836(66)80022-0 (1966).
Frenkel-Morgenstern, M. et al. Genes adopt non-optimal codon usage to generate cell cycle-dependent oscillations in protein levels. Mol. Syst. Biol. 8, 572 (2012).
Li, K. et al. Transcriptome, genetic editing, and microRNA divergence substantiate sympatric speciation of blind mole rat, Spalax. Proc. Natl. Acad. Sci. 113, 7584–7589 (2016).
van der Lee, R. et al. Classification of intrinsically disordered regions and proteins. Chem. Rev. 114, 6589–6631. https://doi.org/10.1021/cr400525m (2014).
Wright, P. E. & Dyson, H. J. Intrinsically disordered proteins in cellular signalling and regulation. Nat. Rev. Mol. Cell Biol. 16, 18–29. https://doi.org/10.1038/nrm3920 (2015).
Tompa, P. Intrinsically disordered proteins: a 10-year recap. Trends Biochem. Sci. 37, 509–516. https://doi.org/10.1016/j.tibs.2012.08.004 (2012).
Gsponer, J. & Babu, M. M. The rules of disorder or why disorder rules. Prog. Biophys. Mol. Biol. 99, 94–103. https://doi.org/10.1016/j.pbiomolbio.2009.03.001 (2009).
Uversky, V. N. A decade and a half of protein intrinsic disorder: biology still waits for physics. Protein Sci. 22, 693–724. https://doi.org/10.1002/pro.2261 (2013).
Latysheva, N. S., Flock, T., Weatheritt, R. J., Chavali, S. & Babu, M. M. How do disordered regions achieve comparable functions to structured domains?. Protein Sci. 24, 909–922. https://doi.org/10.1002/pro.2674 (2015).
Babu, M. M., Kriwacki, R. W. & Pappu, R. V. Structural biology. versatility from protein disorder. Science 337, 1460–1461. https://doi.org/10.1126/science.1228775 (2012).
Buljan, M. et al. Tissue-specific splicing of disordered segments that embed binding motifs rewires protein interaction networks. Mol. Cell 46, 871–883. https://doi.org/10.1016/j.molcel.2012.05.039 (2012).
Buljan, M. et al. Alternative splicing of intrinsically disordered regions and rewiring of protein interactions. Curr. Opin. Struct. Biol. 23, 443–450. https://doi.org/10.1016/j.sbi.2013.03.006 (2013).
Weatheritt, R. J., Davey, N. E. & Gibson, T. J. Linear motifs confer functional diversity onto splice variants. Nucleic Acids Res. 40, 7123–7131. https://doi.org/10.1093/nar/gks442 (2012).
van der Lee, R. et al. Intrinsically disordered segments affect protein half-life in the cell and during evolution. Cell Rep. 8, 1832–1844. https://doi.org/10.1016/j.celrep.2014.07.055 (2014).
Inobe, T. & Matouschek, A. Paradigms of protein degradation by the proteasome. Curr. Opin. Struct. Biol. 24, 156–164. https://doi.org/10.1016/j.sbi.2014.02.002 (2014).
Fishbain, S. et al. Sequence composition of disordered regions fine-tunes protein half-life. Nat. Struct. Mol. Biol. 22, 214–221. https://doi.org/10.1038/nsmb.2958 (2015).
Prakash, S., Tian, L., Ratliff, K. S., Lehotzky, R. E. & Matouschek, A. An unstructured initiation site is required for efficient proteasome-mediated degradation. Nat. Struct. Mol. Biol. 11, 830–837. https://doi.org/10.1038/nsmb814 (2004).
Babu, M. M., van der Lee, R., de Groot, N. S. & Gsponer, J. Intrinsically disordered proteins: regulation and disease. Curr. Opin. Struct. Biol. 21, 432–440. https://doi.org/10.1016/j.sbi.2011.03.011 (2011).
Vacic, V. & Iakoucheva, L. M. Disease mutations in disordered regions—exception to the rule?. Mol. Biosyst. 8, 27–32. https://doi.org/10.1039/c1mb05251a (2012).
Pajkos, M., Mészáros, B., Simon, I. & Dosztányi, Z. Is there a biological cost of protein disorder? Analysis of cancer-associated mutations. Mol. Biosyst. 8, 296–307. https://doi.org/10.1039/c1mb05246b (2012).
Mészáros, B., Erdos, G. & Dosztányi, Z. IUPred2A: context-dependent prediction of protein disorder as a function of redox state and protein binding. Nucleic Acids Res. 46, W329–W337. https://doi.org/10.1093/nar/gky384 (2018).
Zhou, M., Wang, T., Fu, J., Xiao, G. & Liu, Y. Nonoptimal codon usage influences protein structure in intrinsically disordered regions. Mol. Microbiol. 97, 974–987. https://doi.org/10.1111/mmi.13079 (2015).
Flock, T., Weatheritt, R. J., Latysheva, N. S. & Babu, M. M. Controlling entropy to tune the functions of intrinsically disordered regions. Curr. Opin. Struct. Biol. 26, 62–72. https://doi.org/10.1016/j.sbi.2014.05.007 (2014).
Sickmeier, M. et al. DisProt: the database of disordered proteins. Nucleic Acids Res. 35, D786-793. https://doi.org/10.1093/nar/gkl893 (2007).
Hatos, A. et al. DisProt: intrinsic protein disorder annotation in 2020. Nucleic Acids Res. https://doi.org/10.1093/nar/gkz975 (2019).
Piovesan, D. et al. DisProt 7.0: a major update of the database of disordered proteins. Nucleic Acids Res. 45, D219–D227. https://doi.org/10.1093/nar/gkw1056 (2017).
Zhao, Y. et al. Adaptive methylation regulation of p53 pathway in sympatric speciation of blind mole rats, Spalax. Proc. Natl. Acad. Sci. USA 113, 2146–2151. https://doi.org/10.1073/pnas.1522658112 (2016).
McIntosh, B. E., Hogenesch, J. B. & Bradfield, C. A. Mammalian Per-Arnt-Sim proteins in environmental adaptation. Annu. Rev. Physiol. 72, 625–645 (2010).
Benson, D. A. et al. GenBank. Nucleic Acids Res. 46, D41–D47. https://doi.org/10.1093/nar/gkx1094 (2018).
Finn, R. D. et al. The Pfam protein families database: towards a more sustainable future. Nucleic Acids Res. 44, D279-285. https://doi.org/10.1093/nar/gkv1344 (2016).
Mulder, N. J. et al. InterPro, progress and status in 2005. Nucleic Acids Res. 33, D201-205. https://doi.org/10.1093/nar/gki106 (2005).
Lee, H., Deng, M., Sun, F. & Chen, T. An integrated approach to the prediction of domain-domain interactions. BMC Bioinform. 7, 269. https://doi.org/10.1186/1471-2105-7-269 (2006).
Kanehisa, M., Sato, Y. & Morishima, K. BlastKOALA and GhostKOALA: KEGG tools for functional characterization of genome and metagenome sequences. J. Mol. Biol. 428, 726–731 (2016).
Petersen, K. E. & Yates, T. L. Condylura cristata. Mammalian Species 129, 1–4 (1980).
Notredame, C., Higgins, D. & Heringa, J. T-Coffee: a novel method for fast and accurate multiple sequence alignment. J. Mol. Biol. 302, 205–217 (2000).
Suzuki, R. & Shimodaira, H. Pvclust: an R package for assessing the uncertainty in hierarchical clustering. Bioinformatics 22, 1540–1542 (2006).
Potter, S. C. et al. HMMER web server: 2018 update. Nucleic Acids Res. 46, W200–W204. https://doi.org/10.1093/nar/gky448 (2018).

Acknowledgements

We thank Dr. Eviatar Nevo for his expertise and helpful comments on the manuscript. This work was supported by PBC (VATAT) fellowships for outstanding post-Docs from China and India (22351 and 20027) and Israel Cancer Association Grants (24562-01 and 24562-02). M.F.M. is a member of the Dangoor Center for Personalized Medicine and the Data Science Institute (DSI), Bar-Ilan University, Israel.

Author information

These authors contributed equally: Gon Carmi, Somnath Tagore and Alessandro Gorohovski.

Authors and Affiliations

The Azrieli Faculty of Medicine, Bar-Ilan University, 8 Henrietta Szold St, 13195, Safed, Israel
Gon Carmi, Somnath Tagore, Alessandro Gorohovski, Aviad Sivan, Dorith Raviv-Shay & Milana Frenkel-Morgenstern
Department of Systems Biology, Columbia University Medical Center, Herbert Irving Cancer Research Center, New York, USA
Somnath Tagore

Contributions

M.F.M. designed and supervised the study and wrote the paper; G.C., A.G., and S.T. produced the study, verified the results and wrote the paper. A.S. and D.R.S. participated in the study and wrote the paper.

Corresponding author

Correspondence to Milana Frenkel-Morgenstern.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Table S1

Supplementary Table S2

Supplementary Table S3

Supplementary Table S4

Supplementary Table S5

Supplementary Table S6

Supplementary Table S7

Supplementary Table S8

Supplementary Table S11

Supplementary Table S12

Supplementary Table S13

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Carmi, G., Tagore, S., Gorohovski, A. et al. Design principles of gene evolution for niche adaptation through changes in protein–protein interaction networks. Sci Rep 10, 15628 (2020). https://doi.org/10.1038/s41598-020-71976-x

Received: 02 August 2019
Accepted: 24 August 2020
Published: 24 September 2020
DOI: https://doi.org/10.1038/s41598-020-71976-x
