Computational modelling of wet adhesive mussel foot proteins (Bivalvia): Insights into the evolutionary convolution in diverse perspectives

Underwater adhesion in mussels (Bivalvia) is an extreme adaptation to achieve robust and firm wet adhesion in the freshwater/brackish/ocean, which biochemically shaped through millions of years. The protein-based adhesion has huge prospective in various fields like industry, medical, etc. Currently, no comprehensive records related to the systematic documentation of structural and functional properties of Mussel foot proteins (Mfps). In this study, we identified the nine species of bivalves in which the complete sequence of at least one adhesive protein is known. The insilico characterization revealed the specific physio-chemical structural and functional characters of each Mfps. The evolutionary analyses of selected bivalves are mainly based on Mfps, Mitogenome, and TimeTree. The outcome of the works has great applications for designing biomimetic materials in future.


Mytilus galloprovincialis
The Mytilus unguiculatus foot protein (Mufp), contains three types of protein and their variant. Mufp2 showed the transmembrane signaling receptor activity, endopeptidase activity, signal transducer activity, serine hydrolase activity, cytokine, and zinc ion binding activity. In Mufp3 and their variants exhibit the peptidase inhibitory activity and G-protein coupled receptor activity. All proteins and their variables in Mufp6 showed the cytokine activity. Mufp6 and their variant like v3 and v9 exhibit the zinc ion binding activity. Except for Mufp6, all variants showed the cytokine receptor binding activity.
Mizuhopecten yessoensis foot protein (Myfp), showed the DNA binding (GO:0003677), cytoskeleton protein binding (GO:0008092) and nucleic acid binding (GO:0003676) activities. The Myfp1 v2 showed the sequence-specific DNA binding transcription factor activity. The Perna canaliculus foot protein (Pcfp), the fp1 have four variants and all variants showed the poly (A) RNA binding (GO:0044822) except Pcfp1 v1. Perna viridis foot protein (Pvfp), each protein has unique molecular functions. The Pvfp1 v1 and Pvfp6 showed the G-protein coupled receptor binding activity. Glycosaminoglycan binding (GO:0005539) activity observed in Pvfp1 v2. The variants like fp3 and fp5 exhibit the zinc ion binding activity. chemical structural evaluation of Mfps. Amino acid compositional analysis of Apfp1, the major amino acid composition is lysine (15.3%) and proline (15.1%). Most of the amino acid in neutral charge and with positive charge clusters from 121 to 146, (KKPPVYKPKKPVYKPKKRPAYKPKKK), mixed and negative charge clusters are absent in Apfp1. Core block tandem repeats like PPVD, KPPV and PDYKP repeated two times and YKPKK repeated three times. Dpfp1 showed the highest abundance of proline (22.3%) and tyrosine (14.9%), the charge cluster analysis revealed the absence of positive, negative and mixed charge clusters in Dpfp1. tandem repeated blocks, FTTK, PVYPT, PVYPY, PVYPP, PEYP repeated two times and a four-times repetition of PVYP are also observed.
The amino acid composition of each Mfps, glycine, and lysine are the major component of the Mfps other than tyrosine (Detailed data in Supplementary File S4). Recently discovered as multiple pairs of Dopa-lysine contribute to the critical underwater adhesion 5,6,14,18 . The polymorphism in Mfps, may indicated as the versatility of adhesion as variety forms of an adhesive protein can interact the various surfaces.
physiochemical characterization of Mfps. Expasy protparam server revealed the physio-chemical properties of each protein (Table 3 and Fig. 3). This server helps to the grouping of Mfps, the all Mfps half-life more than 10 hours in E.coli and >20 hours in yeast. In Comparing the other fps, fp3 (0.260) and fp6 (0.102) is polar nature protein because the GRAVY value is positive in nature. Except for fp3 (44.26), other variants are highly stable because of the II is below 40.
The physiochemical structural and functional characterization of all Mfps is the first time. All protein is hydrophobic nature but except the Pvfp3 and Pvfp6 is polar but it is hydrophobic nature. In Mytilus sp. Mfp-3f is polar but hydrophobic nature, the protein may play vital role in metal and mineral surface adhesion 7,18,19 . ion ligand binding sites of Mfps. Understanding the general properties of the ligand-binding ability of the protein sites is the great importance to understand the functional diversity of the Mfps. One of the fundamental features of the Mfps receptor surface is the set of amino acids available for interactions with ligands. The stabilization and interlinking of Mfps mainly mediated by metal ions, by divulging the metal and acid radical ion binding ability helps to understanding the functional diversity of Mfps. (Detailed predicted binding residues (s) of each Mfps provided in Supplementary File S5).
Foot protein: Apfp1, mainly 3 ligand binding sites were identified in this protein i.e., Zn 2+ , Ca 2+ and Na + . Among these ligand binding sites; 67 sites are available for zinc-binding and followed by 22 sites for sodium binding and three amino acid sites for calcium (I249 E255 E341). Except for calcium, the zinc and sodium bind to the tyrosine and they may associate with DOPA to help them interlinking. No binding site detected for the following ions:(Cu 2+ , Fe 2+ , Fe 3+ , Mg 2+ , Mn 2+ , K + , CO  www.nature.com/scientificreports www.nature.com/scientificreports/ binding site detected for the following ions:(Fe 2+ , Fe 3+ , Mg 2+ , CO 3 2− , NO 2 − , SO 4 2− , PO 4 3− ). Comparing the two variants of Mcfp1, the v2 contained different ligand binding sites such as Zn 2+ , Cu 2+ , Ca 2+ , Mn 2+ , K + and Na + . The metal ion Na + is mostly preferring the v1 and the Zn 2+ prefers the v2.
Mcfp2: in the foot protein only two metal ion binding sites are present and also one acid radical ion sites also detected. Comparing the Zn 2+ and Na + metal ion binding sites, Zn 2+ (~221 sites) is widely distributed or binding the most of the regions and followed by Na + (~103 sites) and Ca 2+ (97 binding sites). The Zn 2+ , Na + and Ca 2+ randomly bind the different amino acids present in the protein. ). In v8, 19 predicted sites for Zn 2+ , and followed by Mg 2+ binding site (two sites): Y51 K62, Na + binding site (three sites): Y42 Y48 Y51. No binding site detected for the following ions:(Cu 2+ , Fe 2+ , Fe 3+ , Ca 2+ , Mn 2+ , K + , CO 3 2− , NO 2 − , SO 4 2− , PO 4 3− ). The v9, showed the five different metal ions has the ability to bind this protein (Zn 2+ , Ca 2+ , Mn 2+ , Na + , and K + ). 20 amino acid residues capable to bind the Zn 2+ metal ion and followed by Ca 2+   ). The catechol containing polymers and peptides has the ability to bind various metal ions like Zn 2+ , Cu 2+ , Fe 3+ , Ca 2+ , Mg 2+ , Mn 2+ , Na + and K + and acid radical ions, CO 3 2− , NO 2 − , SO 4 2− and PO 4 3− . The overall analysis most of the foot proteins showed the Zn 2+ and Na + metal ion binding ability and only few Mfps have the ability to bind acid radical ions.
Unbosoming the emergence of Bivalvia: Perspectives on mitogenome, TimeTree, and Mfps. The Bivalvia evolution is very complex in nature, the evolution of Bivalvia starts from Cambrian periods 3 . The bivalve origins, evolution of their phenotype and functional divergence of Mfps is largely remained unresolved. Evolutionary pattern of byssus thread producing Bivalvia in different perspectives revealed the functional divergence and speciation pattern.
Mitogenome -phylogenetic construction analysis of bio-adhesive producing bivalves. The complete mitochondrial genome sequence analysis revealed the mitochondrial genome evolutionary pattern in the byssus thread producing bivalves. Based on the mitochondrial genome evolution, the speciation of all Mytilus species is from the same clad but interestingly founded that Perna perna is also originated from the speciation node of Mytilus. In the genus of Perna, currently, three living species only exist, the stem branch of P. canaliculus and P. viridis is entirely separated from P. perna. The monophyletic origin of mitogenome initially separated into two branches. The one branch contained three species of Bivalvia, with entirely different order taxa, the D. polymorpha under the order Myoida and another are Mytiloida (P.canaliculus and P.viridis). And another set of clad, the first separated taxa is Pectinoida and then followed by Ostreoida and Myoida. The Ostreoida is closely resembling the mitogenome of Mytiloida. Among the Mytilus species, the M.galloprovincialis is the first originated species and it is the ancestor of all other Mytilus sp. and P.perna also. The M.unguiculatus and M.californianus is existed in the same clad.
The mitochondrial genome revealed the fascinating key of Mfps producing Bivalvia origin, M.galloprovincialis is the first evolved Bivalvia then followed by M.californianus. But, in the case of TimeTree analysis, the first evolved Bivalvia is D.polymorpha, it is an brackish/fresh water forms, then followed by M.califroninanus. Both mitogenome and TimeTree revealed the second evolved form is M.californianus. The natural selection behavior, molecular evolutionary clock speciation indicated as the fresh/brackish form bivalve is the first evolved form, is evidently supported by the Bivalvia taxa evolution and diversification. Contradiction outcome observed in the mitogenome analysis, under the natural selection pressure is marine bivalve to brackish/fresh aquatic forms of species diversification raised. The mitogenome is an important potential target of natural taxa selection spread across the gradients of the ecosystem 20 . The bivalve spread in the costal belt habitats with dynamic changes such as temperature fluctuation, salinity, dissolved oxygen, desiccation, UV-radiation and exposure to chemical pollutants etc., which can induce oxidative stress to them 21 (Fig.5).
The evolutionary and environmental forces equally blend together to tune the unique constitution, magnitude and function of each proteins in the adhesive secretion. The functional property of each protein in root, stem, thread and plaque of byssus thread are highly predisposed and extremely intricate 2,4,23 . The evolutionary lineage of Mfps revealed the sedentary mode of life style preference of an adult organisms. The Mfps property determined by the geographical habits of the organisms. These byssus threads producing bivalves is randomly distributed all over the world, each geographic zone has the specific dynamic characters are presents, in the case tidal power, salinity, temperature, wave actions etc. Based on this property the evolution and functional divergence of Mfps may evolved. The evolutionary divergence of Mfps producing bivalve, M.edulis is showed the highly complex geographical distribution pattern. The geographical distributional pattern of Mytilus sp. are widespread that exhibit an anti-tropical distribution with M.edulis, M.californianus, and M.unguiculatus occurring in the Northern Hemisphere and M.galloprovincialis distributed in Northern and Southern Hemispheres 24 . The geographical distribution of Perna sp. is mainly occurred in the tropical zone. P. canaliculus is randomly distributed in Southern temperate region ( Fig. 6 and Supplementary data S1- Table 1).
By analyzing all Mfps, the fp3 and then followed by fp5 and fp6 is the evolutionary lineage of foot proteins in selected bivalves species (all Mytilus sp. except M.edulis fp3 is not available) except P.viridis. Because the fp3,5 and 6 are predominantly found in plaque region of the byssus threads and it contribute to the wet adhesion 2 . It can be easily concluded that in all byssus thread producing bivalves, fp3 is the ancestor of all other existing Mfps and the wet adhesion property is the core phenomena of all Mfps. In Perna sp. first evolved Mfps is fp1, they actually provide the hydrophobic nature and act as protective varnish layer 25 . Comparing to the all Mfps, the fp1 and fp2 is the last evolved Mfps, because the fp1 mainly act as protective functions and fp2, provide the structural integrity to the adhesive plaques 14 . After the fp1, fp5 and followed by fp3 and fp6 is the evolutionary lineages of wet adhesive property of Mfps in Perna sp. except fp3, the fp5 has the specialized mechanism for ability to bind into the calcareous mineral substrate. Fp5 contained the phosphoserine, the post translational modification of phosphoserine gives the ability to bind calcareous materials. The fp5 is the first produced foot protein by P. viridis for surface water replacement and then followed by fp3 and fp6 it gives the stability to wet adhesion 13 . Mfps revealed functional evolutionary origin characterization and speciation of Perna sp 26 . The Darwin natural selection pressure is observed in the expression of Mfps diversification because the Mfps is played a vital role in wet adhesion and helps to development of the sedentary mode of lifestyle. The natural selection depends on the environment and requires existing heritable variation in a group. This is the first report of the phylogeny construction of all available Mfps and evolutionary analysis based on the functional divergence.
conclusion This is the first report by using the insilico methods to evaluate the physiochemical structural and functional characterization of all available Mfps revealed the unique characteristic features of each mussel foot proteins (Mfps). Required more than a thousand mussels for each Mfps extraction and characterization from different species of mussels in the aim of creating strong adhesives materials. In this works highlighted the several biochemicals, molecular, structural and functional features of the Mfps, these results help to the future development of bio-adhesives in different perspectives. We are not only revealed the bio-adhesive property of Mfps and also

Materials and Methods
Datasets. Bivalve Mfps (mussel foot proteins) sequences in FASTA format were retrieved from the NCBI protein database (August 2019) (http: www.ncbi.mlm.gov/protein). Selection criteria are mainly based on Mfps producing bivalves in which the complete sequence of at least one adhesive protein is identified. Molecular modeling. The MUSTER algorithm used for protein modeling 10,27 . This server (https://zhanglab. ccmb.med.umich.edu/MUSTER/) analyzes the previous sequence profile-profile alignment (PPA) method and the best template used for the homology modeling of Mfps. The models were evaluated in PROCHECK 27,28 and PDBsum server 12,27 , and the visualization of the protein model in PyMol 27 and EzMol 2.1 29 .
Signal peptide prediction. Phobius (http://phobius.sbc.su.se/) 30  functional characterization of Mfps. FFPred 3 (http://bioinf.cs.ucl.ac.uk/psipred/) 17 server used for functional characterization of Mfps. The predictions are made by scanning the input sequences against an array of Support Vector Machines (SVM). In this server, large SVM library that extends its coverage to the cellular component sub-ontology for the first time, prompted by the establishment of a dedicated evaluation category within the critical assessment of functional annotation. For further analysis of the functional characterization of Mfps, the probability range set to be above 0.800. chemical structural characterization of Mfps. SAPS 32 (https://www.ebi.ac.uk/Tools/seqstats/saps/) server evaluates a wide variety of protein sequence properties using statistics. Properties considered include compositional biases, clusters and runs of charge and other amino acid types, different kinds and extents of repetitive structures, locally periodic motifs, and anomalous spacing between identical residue types.
physio-chemical characterization of Mfps. Expasy protparam (https://web.expasy.org/protparam/) server 33 analyze the physicochemical properties of Mfps likes, isoelectric point (pI), molecular weight (Mw), extinction coefficient (EC-quantitative study of protein-protein and protein-ligand interactions), instability index (II-stability of protein), aliphatic index (AI-relative volume of protein occupied by aliphatic side chains), and Grand Average of Hydropathicities (GRAVY -sum of all hydropathicity values of all amino acids divided by number of residues in a sequences). www.nature.com/scientificreports www.nature.com/scientificreports/ Accessible surface area (ASA) analysis. VADAR (http://vadar.wishartlab.com/) server 34 is a compilation of more than 15 different algorithms and programs for analyzing and assessing peptide and protein structures from their PDB coordinate data. ion ligand-binding site prediction. IonCom (https://zhanglab.ccmb.med.umich.edu/IonCom/) 35 is a ligand-specific method for small ligand (including metal and acid radical ions) binding site prediction. Starting from given sequences or structures of the query proteins, IonCom performs a composite binding-site prediction that combines ab intio training and template-based transferals. The server focuses on binding site prediction of thirteen most important small ligand molecules, including nine metal ions (Zn 2+ , Cu 2+ , Fe 2+ , Fe 3+ , Ca 2+ , Mg 2+ , Mn 2+ , Na + , K + ) and four acid radical ions (CO phylogeny construction of Mfps. Phylogenetic analysis of 78 Mfps were performed in MEGA X software 36 and 78 Mfps were aligned by using MUSCLE software. The evolutionary history was inferred by using the Maximum Likelihood method and the JTT matrix-based model 37 . The tree with the highest log likelihood (−3861.65) and Initial tree(s) for the heuristic search was obtained automatically by applying Neighbor-Join and BioNJ algorithms to a matrix of pairwise distances estimated using a JTT model and then selecting the topology with superior log likelihood value. The tree is drawn to scale, with branch lengths measured in the number of substitutions per site.
Ancestral analysis -mitogenome based. Ancestral states were inferred using the Maximum Likelihood method 38 and the Tamura-Nei model 39 . The tree shows a set of possible nucleotides (states) at each ancestral node based on their inferred likelihood at site 1. For each node, only the most probable state is shown. The initial tree was inferred using the method. The rates among sites were treated as being uniform among sites (Uniform rates option). This analysis involved ten nucleotide sequences. Codon positions included were 1 st + 2 nd + 3 rd + Noncoding. Evolutionary analyses were conducted in MEGA X 36 with MUSCLE alignment.

Data availability
The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.